Encyclopedia of Earth Sciences Series
Encyclopedia of Mathematical Geosciences Edited by B. S. Daya Sagar · Qiuming Cheng Jennifer McKinley · Frits Agterberg
Encyclopedia of Earth Sciences Series Series Editors Charles W. Finkl, Department of Geosciences, Florida Atlantic University, Boca Raton, FL, USA Rhodes W. Fairbridge, New York, NY, USA
The Encyclopedia of Earth Sciences Series provides comprehensive and authoritative coverage of all the main areas in the Earth Sciences. Each volume comprises a focused and carefully chosen collection of contributions from leading names in the subject, with copious illustrations and reference lists. These books represent one of the world's leading resources for the Earth Sciences community. Previous volumes are being updated and new works published so that the volumes will continue to be essential reading for all professional earth scientists, geologists, geophysicists, climatologists, and oceanographers as well as for teachers and students. Most volumes are also available online. Accepted for inclusion in Scopus.
B. S. Daya Sagar • Qiuming Cheng • Jennifer McKinley • Frits Agterberg Editors
Encyclopedia of Mathematical Geosciences With 555 Figures and 97 Tables
Editors B. S. Daya Sagar Systems Science & Informatics Unit Indian Statistical Institute – Bangalore Centre Bangalore, India Jennifer McKinley Geography School of Natural and Built Environment Queen’s University Belfast Belfast, UK
Qiuming Cheng Institute of Earth Sciences China University of Geosciences Beijing, China Frits Agterberg Geological Survey of Canada Ottawa, ON, Canada
ISSN 1388-4360    ISSN 1871-756X (electronic)
Encyclopedia of Earth Sciences Series
ISBN 978-3-030-85039-5    ISBN 978-3-030-85040-1 (eBook)
https://doi.org/10.1007/978-3-030-85040-1

© Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
We dedicate the Encyclopedia of Mathematical Geosciences to two scientists: John Cedric Griffiths and William Christian Krumbein, who inspired the four of us and were directly involved in the creation of the field of mathematical geoscience. These two scientists symbolize our intention for the Encyclopedia of Mathematical Geosciences to connect the basic and applied divisions of the mathematical geosciences.
William Christian Krumbein (1902–1979) was a founding officer of the International Association for Mathematical Geosciences (IAMG). Born in Beaver Falls, Pennsylvania, in January 1902, Krumbein attended the University of Chicago, receiving the degree of bachelor of philosophy in business administration in 1926, an MS in geology in 1930, and a PhD in geology in 1932. He taught at the University of Chicago from 1933 to 1942, advancing from instructor to associate professor. During World War II, from 1942 until 1945, he served in Washington, D.C., with the Beach Erosion Board of the U.S. Army Corps of Engineers. Following a short stint with Gulf Research and Development Company immediately after the end of the war, he joined Northwestern University in 1946, serving there until mandatory retirement in 1970. He was named the William Deering Professor of Geological Sciences in 1960. Krumbein died on August 18, 1979, a few months after Syracuse University had awarded him a DSc (honoris causa). At his memorial service, former Northwestern colleague Larry Sloss said of Krumbein “that by constitutionally rejecting conventional wisdom, he continually pursued innovative methods whereby the natural phenomena of geology could be expressed with mathematical rigor.” The IAMG instituted the William Christian Krumbein Medal, fittingly named for him; he is considered one of the fathers of the subject.
John Cedric Griffiths (1912–1992), formerly Professor of Petrography at the Pennsylvania State University, was the first recipient of the IAMG’s Krumbein Medal in 1976. Like Krumbein, Griffiths remains well known as a pioneer in introducing quantitative methods across a wide spectrum of geological and economic problems. Born in Llanelli, Carmarthenshire, Wales, he graduated in petrology from the Universities of Wales and London before working for a petroleum company in Trinidad for 7 years. In June 1987, he retired after 40 years of teaching in the Department of Mineralogy and Petrology at the Pennsylvania State University. He directed more than 50 MSc and PhD students. From his classes, students learned the commonality of scientific problems across different disciplines, including industrial engineering, computer science, and psychology. During 1967–1968, he was the first Distinguished Visiting Lecturer at the University of Kansas in a program instigated by Professor Daniel Merriam. In addition to his well-known 1967 textbook Scientific Method in Analysis of Sediments, the scientific literature has been greatly enriched by his many articles. His work on unit regional values received international acclaim. Professor Griffiths’ scientific distinction, coupled with his wit and lively, often provocative oral presentations, stimulated everyone lucky enough to have experienced them.
Foreword
It is a great honor for me to be invited to contribute this foreword to what is in so many ways an outstanding and impressive work. Dr. Sagar and his team are to be congratulated on bringing together such a complete and comprehensive reference work that will be invaluable to students, researchers, and educators for years to come.

Many people who are not already aware of the progress that has been made in the mathematical geosciences in the past few decades may wonder about the existence of such a field at the intersection between mathematics and the geosciences. The natural world often appears random and unpredictable, and thus very unlike the pure logic and beautiful symmetries of mathematics. To be sure, there are examples of something approaching purity and simplicity in the natural world: one thinks of the sine-generated curves of river meanders, the almost-perfect circles of craters, the elegant geometry of some periglacial features such as frost-wedge polygons, or the statistics of stream networks. It was features such as these that attracted my interest many years ago, when I found myself enrolled as a PhD student working under a karst geomorphologist (Derek Ford) after a conventional background in undergraduate physics, where the idea that the world can be reduced to a few simple principles has dominated science for centuries.

At the opposite extreme, the geosciences have provided a rich set of applications for the mathematics of disorder and chaos. Many of Mandelbrot’s initial examples of his developing mathematics of fractals were provided by the geosciences, including the geometry of shorelines. Moreover, some simple discoveries in the geosciences, such as the scaling laws and the laws of stream numbers, can be attributed to nothing more than the asymptotic result of randomness.

In a way, history has brought us full circle in the increasing use of artificial intelligence and machine learning in the geosciences. Thoroughly grounded in data science and in the fourth-paradigm mantra “let the data speak for themselves,” these methods abandon any semblance of theory or generalizability in the search for algorithms that support searches and provide predictions based in complexity. While Occam’s Razor and the traditional scientific search for simplicity may have largely failed to provide understanding and explanation in the geosciences, these methods are at least capable of revealing similarities and supporting an early, hypothesis-generating phase of a research project.

Before we go too far into data science, however, it is worth remembering the famous saying of Korzybski (1933): “The map is not the territory,” that is, what the map or the data are trying to capture is no more than a representation of reality, and will be at best only an approximation, generalization, abstraction, or interpretation. The interpretations drawn from data, whether through statistical analysis or machine learning, will always be subject to the uncertainties that must be present in the data, along with other uncertainties that result from the use of imperfect models, and there will always be a danger that conclusions tell us more about those uncertainties than about reality.

In recent years, many branches of science have been the subject of inquiries into ethics. This may have begun with concerns about reproducibility and replicability in experimental psychology (NASEM 2019), and it grew further during the civil unrest in the USA in 2020 with the call to improve diversity, equity, and inclusion.
Ethical issues arise in many areas of the geosciences, and more broadly in the social and environmental sciences. Individual privacy and surveillance often feature strongly in debates over ethics, but many other issues can be regarded as problems of ethics (AAG 2022). They include making inappropriate or biased inferences from data, ignoring the presence of uncertainty in findings, or unwarranted generalization across space or in time. While none of the entries in the encyclopedia addresses ethical issues exclusively, and there are none on replicability or diversity, many of them provide a context for further discussion of ethics in specific areas, and perhaps future editions of the encyclopedia might include more on social issues such as these.

Anyone reviewing the encyclopedia and its many entries will wonder what it is that is different about the mathematics that is useful in the geosciences, and similarly about the statistical and computational methods, and the methods of data science. They might wonder what new developments in each of these areas have originated in the mathematical geosciences and perhaps spread into other fields. There are several examples of this in the encyclopedia: the article on Geographically Weighted Regression, which is entirely driven by the need for a relaxed form of replicability in order to deal with the essential spatial heterogeneity of the Earth’s surface; and the entire field of geostatistics, which arose because of the necessity of dealing with geospatial data that exhibit spatial dependence, or regionalized variables in the terminology of Matheron. In inferential statistics, it is clear that the simple model, developed by Fisher and others in order to draw conclusions about populations from samples drawn randomly and independently from those populations, requires special methods to deal with samples in the geosciences that may be far from independent, and often represent an entire population rather than a sample from it.

This is an exciting time for the mathematical geosciences: new data streams are coming online, at finer and finer temporal and spatial resolution; new methods are emerging in machine learning and advanced analytics; computational power is no longer as expensive and constrained as it has been; the commercial sector is expanding; and new graduate students and faculty are bringing new ideas and enthusiasm. I hope this encyclopedia proves to be as useful and defining as it clearly has the potential to be, and I congratulate Dr. Sagar and his team once more on a magnificent achievement.

University of California, Santa Barbara
Michael F. Goodchild
References

American Association of Geographers (2022) A white paper on locational information and the public interest. https://www.aag.org/wp-content/uploads/1900/09/2022-White-Paper-onLocational-Information-and-the-Public-Interest.pdf. Accessed 16 Nov 2022

Korzybski A (1933) Science and sanity. An introduction to non-Aristotelian systems and general semantics. The International Non-Aristotelian Library Pub. Co., pp 747–761

National Academies of Sciences, Engineering, and Medicine (2019) Reproducibility and replicability in science. National Academies Press, Washington, DC
Preface
The past six decades have proved monumental in the discovery and accumulation of facts and detailed data on the solid earth, oceans, atmosphere, and space science. Now an established field, Mathematical Geosciences covers the original approaches while bringing new applications of well-established mathematical and statistical techniques to address the various challenges encountered. Pioneering academics and scientists who developed the inceptive fabric of mathematical geosciences include, in chronological order and to name a few, Andrey Kolmogorov (Probability Theory), Ronald Fisher (Mathematical Statistics), Danie Krige (Ore Reserve Valuation), William Christian Krumbein (Sedimentary Geology), John C. Griffiths (Econometrics and Sedimentary Petrology), John Tukey (Exploratory Data Analysis), Georges Matheron (Geostatistics), Benoit Mandelbrot (Fractal Geometry), Frits Agterberg (Geomathematics), Geoffrey Watson (Directional Statistics), Daniel Merriam (Computational Geosciences), Jean Serra (Mathematical Morphology), Dietrich Stoyan (Stochastic Geometry), John Aitchison (Compositional Data Analysis), and William Newman (Complexity Science); many others have emerged as mathematical geoscientists during the last three decades. Some of the pioneering associations, besides the International Association for Mathematical Geosciences (IAMG), that promote the mathematical geosciences include the Society for Industrial and Applied Mathematics (SIAM), IEEE’s Geoscience and Remote Sensing Society (GRSS), the American Geophysical Union (AGU), the European Geosciences Union (EGU), and the Indian Geophysical Union (IGU).

With this backdrop, we have worked toward releasing the Encyclopedia of Mathematical Geosciences – a sequel to the Dictionary of Mathematical Geosciences and the Handbook of Mathematical Geosciences published by Springer during the latter part of the last decade – in the hope that it receives the requisite attention from mathematicians and geoscientists alike, thus facilitating dialogue among the stakeholders to address outstanding issues in “Mathematics in Geosciences” in the years to come. We bring together scientists, engineers, and mathematicians contributing chapters related to the geosciences, and we believe that this encyclopedia is essential for the next generation of geoscience researchers and educators. It provides more than 300 entries, presented in a lucid manner, concerned with relevant conventional and modern mathematical, statistical, and computational techniques for application in the geosciences and space science. Such a resource should be highly useful for the next generation of Mathematical Geoscientists.

Chapters in this encyclopedia fall into three categories. Experienced pioneers have contributed about 30 Category-A chapters on much broader topics of relevance to the Mathematical Geosciences and about 265 Category-B chapters on entries of relevance to the mathematical geosciences, while Category-C chapters are brief biographies of fifty pioneering mathematical geoscientists. Most existing papers and books published in the Mathematical Geosciences are intended for mathematically inclined readers prepared to go through the texts from beginning to end. The main goal of this Encyclopedia of Mathematical Geosciences is to make important keyword-specific chapters accessible to readers interested in specific topics but lacking the time to peruse the more comprehensive publications in their areas of interest.
This encyclopedia humbly provides comprehensive references to over 300 topics in the field of Mathematical Geosciences. Needless to say, keywords that are mathematically important but without established geoscientific relevance, and keywords that are relevant to the geosciences without a direct connection to mathematics, were omitted. Each chapter was prepared in such a way that a layperson with high-school qualifications should be able to understand its content without too much difficulty. As much as possible, each chapter was prepared in a descriptive style with minimal mathematical jargon, directing mathematically oriented readers to key references for further study. To the best of our ability, we have avoided redundancy in this two-volume encyclopedia, and we suggest that the reader refer to other encyclopedias in the Earth Sciences Series whenever needed.

A host of challenges of geoscientific relevance commands the attention of applied mathematicians. Addressing some of those challenges offers opportunities, as well as the need, to advance the foundations and techniques of applied mathematics. Further scope entails several other entries – from dynamical systems, ergodic theory, renormalization, and multiscaling – which possess high potential to address challenges encountered in the geosciences. Some challenges relevant to fluid and solid media – such as the Earth’s interior and exterior, and the analysis, simulation, and prediction of terrestrial processes ranging from climate change to earthquakes, space sciences, etc. – command the involvement of dynamical systems and ergodic theory, renormalization, and topological data analysis.

While we would like to receive feedback on what has been given emphasis within the gamut of “Mathematical Geosciences,” we also would like to explore what more needs to be done to broaden the portfolio of our scientific discipline. The encyclopedia will require revisions and updates at least once per decade. Its four editors reflect diverse disciplines, and they have contributed significantly from the EOMG’s inceptive stage to its release. The encyclopedia is available not only as a two-volume printed version but also as an online reference work at http://springerlink.com, which will allow us to make regular updates as new entries and techniques emerge.

B. S. Daya Sagar
Qiuming Cheng
Jennifer McKinley
Frits Agterberg
Acknowledgments
Half-a-decade-long work in bringing the Encyclopedia of Mathematical Geosciences to its concluding stage was a herculean task. During the last 5 years (2018–2023), countless hours were spent on listing keyword-specific entries, inviting potential authors to contribute keyword-specific chapters, reviewing the submitted chapters, finalizing and forwarding them for subsequent work at the production department, and organizing the corrected galley proofs, followed by the preparation of preliminary content and the front and back matter. Huge coordination was involved throughout among authors, reviewers, section editors, editors, the staff of Springer Nature, and the production department. We gratefully acknowledge the help and support of the authors, ranging from those with multiple decades of experience to the new generation, who have graciously contributed excellent chapters, investing their time, knowledge, and expertise.

The basic idea of producing this encyclopedia came from Daya Sagar, who subsequently has by far made the largest contribution to making this project a success. The other editors are grateful to him for his initiative and guidance. This kind of project succeeds mostly because of the support, cooperation, and guidance received from highly experienced senior colleagues, viz., Frits Agterberg, Ricardo Olea, Gabor Korvin, Eric Grunsky, John Schuenemeyer, and younger colleagues Jaya Sreevalsan-Nair and Lim Sin Liang, among others. We (Sagar, Cheng, and McKinley) owe a lot to Frits for his guidance and sharp memory, and we feel fortunate to have him; his 24/7 working style has helped us past several hiccups during the past 5 years. Frits: we apologize for requiring your time very often, even while you were hospitalized and ill for three months during the first quarter of 2021.

The timely permissions granted or arranged for a few copyrighted items by Teja Rao, Laurent Mandelbrot, Karen Kafadar, Ramana Sonty, and Britt-Louise Kinnefors are gratefully acknowledged. The considerable support received from Annett Buettner, Sylvia Blago, and Johanna Klute of Springer Nature is simply remarkable, and the efficiency, patience, relentless reminders, and strong-willed determination that they have shown throughout every phase of this monumental project are highly professional. The lion’s share of the success should be theirs, besides the contributions of the authors.

B. S. Daya Sagar
Qiuming Cheng
Jennifer McKinley
Frits Agterberg
Contents
Accuracy and Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . P. A. Dowd
1
Additive Logistic Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gianna Serafina Monti, Glòria Mateu-Figueras and Karel Hron
4
Additive Logistic Skew-Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Glòria Mateu-Figueras and Karel Hron
10
Agterberg, Frits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qiuming Cheng
15
Aitchison, John . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . John Bacon-Shone
17
Algebraic Reconstruction Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Utkarsh Gupta and Uma Ranjan
17
Allometric Power Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gabor Korvin
24
Argand Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . James Irving and Eric P. Verrecchia
28
Artificial Intelligence in the Earth Sciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norman MacLeod
31
Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuanyuan Tian, Mi Shu and Qingren Jia
43
Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alejandro C. Frery
47
Automatic and Programmed Gain Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gabor Korvin
49
Backprojection Tomography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Utkarsh Gupta and Uma Ranjan
55
Bayes’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Konstantinos Modis
61
Bayesian Inversion in Geoscience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dario Grana, Klaus Mosegaard and Henning Omre
65
Bayesian Maximum Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Junyu He and George Christakos
71
Best Linear Unbiased Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Peter K. Kitanidis
79
Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiaogang Ma
85
Binary Mathematical Morphology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Aditya Challa, Sravan Danda and B. S. Daya Sagar
92
Binary Partition Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sravan Danda, Aditya Challa and B. S. Daya Sagar
94
Bonham-Carter, Graeme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eric Grunsky
95
Bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Oktay Erten and Clayton V. Deutsch
96
Burrough, Peter Alan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Andrew U. Frank
100
Cartogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michael T. Gastner
103
Chaos in Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg and Qiuming Cheng
107
Chayes, Felix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Richard J. Howarth
113
Cheng, Qiuming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
114
Circular Error Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
115
Cloud Computing and Cloud Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Liping Di and Ziheng Sun
123
Cluster Analysis and Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Antonella Buccianti and Caterina Gozzi
127
Compositional Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vera Pawlowsky-Glahn and Juan José Egozcue
133
Computational Geoscience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eric Grunsky
143
Computer-Aided or Computer-Assisted Instruction . . . . . . . . . . . . . . . . . . . . . . . . Madhurima Panja and Uttam Kumar
165
Concentration-Area Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Behnam Sadeghi
169
Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Deeksha Aggarwal and Uttam Kumar
175
Convex Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Indu Solomon and Uttam Kumar
180
Copula in Earth Sciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sandra De Iaco and Donato Posa
184
Correlation and Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
193
Correlation Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guocheng Pan
201
Coupled Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mohammad Ali Aghighi and Hamid Roshan
208
Cracknell, Arthur Phillip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kasturi Devi Kanniah
213
Cressie, Noel A.C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jay M. Ver Hoef
214
Crystallographic Preferred Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Helmut Schaeben
215
Cumulative Probability Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sravan Danda, Aditya Challa and B. S. Daya Sagar
222
Dangermond, Jack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lowell Kent Smith
225
Data Acquisition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rajendra Mohan Panda and B. S. Daya Sagar
226
Data Dictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Madhurima Panja, Tanujit Chakraborty and Uttam Kumar
230
Data Life Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fang Huang
235
Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tao Wen
238
Data Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
241
Database Management System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rajendra Mohan Panda and B. S. Daya Sagar
247
David, Michel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Denis Marcotte
250
Davis, John Clements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ricardo A. Olea
251
De Wijs Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shuyun Xie
252
Decision Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rajendra Mohan Panda and B. S. Daya Sagar
257
Deep CNN-Based AI for Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mehala Balamurali
262
Deep Learning in Geoscience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fuchang Gao
267
Dendrogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rogério G. Negri
271
Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sergio Zlotnik and Jaume Soler Villanueva
274
Digital Elevation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pradeep Srivastava and Sanjay Singh
277
Digital Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gabor Korvin
290
Digital Geological Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chengbin Wang and Xiaogang Ma
293
Digital Twins of the Earth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stefano Nativi and Max Craglia
295
Dimensionless Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ekta Baranwal
299
Discrete Prolate Spheroidal Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dionissios T. Hristopulos
303
Discriminant Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D. Arun Kumar
307
Discriminant Function Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norman MacLeod
311
Doveton, John Holroyd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . John C. Davis
317
Earth Surface Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vikrant Jain, Ramendra Sahoo and R. N. Singh
319
Earth System Science . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gang Liu and Hongfei Zhang
325
Earthquake Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . William I. Newman
328
Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
336
Electrofacies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . John H. Doveton
339
Ensemble Kalman Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. Jaime Gómez-Hernández
342
Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Julian M. Ortiz and Jorge F. Silva
346
Euler Poles of Tectonic Plates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Helmut Schaeben, Uwe Kroner and Tobias Stephan
350
Expectation-Maximization Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
355
Exploration Geochemistry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . David R. Cohen
358
Exploratory Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Emmanuel John M. Carranza
364
FAIR Data Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abdullah Alowairdhi and Xiaogang Ma
369
Fast Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
372
Fast Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Indu Solomon and Uttam Kumar
376
Favorability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guocheng Pan
380
Fisher, Ronald A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
387
Flow in Porous Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Daniele Pedretti
388
Forward and Inverse Stratigraphic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cedric M. Griffiths
393
Fractal Geometry and the Downscaling of Sun-Induced Chlorophyll Fluorescence Imagery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan Quiros-Vargas, Bastian Siegmann, Alexander Damm, Ran Wang, John Gamon, Vera Krieger, B. S. Daya Sagar, Onno Muller and Uwe Rascher
404
Fractal Geometry in Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qiuming Cheng and Frits Agterberg
407
Fractal Landscape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ramendra Sahoo
430
Frequency Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ricardo A. Olea
435
Frequency-Wavenumber Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
437
Full Normal Plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gianna Serafina Monti and Glòria Mateu-Figueras
445
Fuzzy C-Means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
447
Fuzzy Inference Systems for Mineral Exploration . . . . . . . . . . . . . . . . . . . . . . . . . Bijal Chudasama, Sanchari Thakur and Alok Porwal
449
Fuzzy Set Theory in Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Behnam Sadeghi, Alok Porwal, Amin Beiranvand Pour and Majid Rahimzadegan
454
Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D. Arun Kumar
465
Geocomputing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alice-Agnes Gabriel, Marcus Mohr and Bernhard S. A. Schuberth
468
Geographical Information Science . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Parvatham Venkatachalam
473
Geographically Weighted Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiang Que and Shaoqiang Su
485
Geohydrology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Faramarz Doulati Ardejani, Behshad Jodeiri Shokri, Soroush Maghsoudy, Majid Shahhosseiny, Fojan Shafaei, Farzin Amirkhani Shiraz and Andisheh Alimoradi
489
Geoinformatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jeffrey M. Yarus, Jordan M. Yarus and Roger H. French
494
Geologic Time Scale Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Felix M. Gradstein and Frits Agterberg
502
Geomathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
512
Geomatics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rifaat Abdalla
519
Geomechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pejman Tahmasebi
523
Geoscience Signal Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
526
Geosciences Digital Ecosystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stefano Nativi and Paolo Mazzetti
534
Geostatistical Seismic Inversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Leonardo Azevedo and Amílcar Soares
539
Geostatistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . H. S. Pandalai and A. Subramanyam
543
GeoSyntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cedric M. Griffiths
568
Geotechnics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pejman Tahmasebi
573
Geothermal Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Katsuaki Koike and Shohei Albert Tomita
575
Global and Regional Climatic Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuriy Kostyuchenko, Igor Artemenko, Mohamed Abioui and Mohammed Benssaou
582
Goodchild, Michael F. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B. S. Daya Sagar
586
Gradstein, Felix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
587
Grain Size Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michael Klichowicz and Holger Lieberwirth
588
Graph Mathematical Morphology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rahisha Thottolil and Uttam Kumar
594
Grayscale Mathematical Morphology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sravan Danda, Aditya Challa and B. S. Daya Sagar
600
Griffiths, John Cedric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Donald A. Singer
601
Harbaugh, John W. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Johannes Wendebourg
603
Harff, Jan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Martin Meschede
604
High-Order Spatial Stochastic Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Roussos Dimitrakopoulos and Lingqing Yao
605
Hilbert Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Daisy Arroyo
613
Horton, Robert Elmer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Keith Beven
618
Howarth, Richard J. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. M. McArthur
619
Hurst Exponent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sid-Ali Ouadfeul and Leila Aliouane
620
Hyperspectral Remote Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Daniele Marinelli, Francesca Bovolo and Lorenzo Bruzzone
625
Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rogério G. Negri
630
Hypsometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sebastiano Trevisani and Lorenzo Marchi
633
Imputation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Javier Palarea-Albaladejo
637
Independent Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
642
Induction, Deduction, and Abduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
644
International Generic Sample Number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sarah Ramdeen, Kerstin Lehnert, Jens Klump and Lesley Wyborn
656
Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
660
Interquartile Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alejandro C. Frery
664
Inverse Distance Weight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Narayan Panigrahi
666
Inversion Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R. N. Singh and Ajay Manglik
672
Inversion Theory in Geoscience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shib Sankar Ganguli and V. P. Dimri
678
Iterative Weighted Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sara Taskinen and Klaus Nordhausen
688
Journel, André . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R. Mohan Srivastava
693
K-Means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
695
K-Medoids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
697
K-nearest Neighbors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
700
Kolmogorov, Andrey N. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
702
Korvin, Gabor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B. S. Daya Sagar
703
Krige, Daniel Gerhardus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Richard Minnitt
704
Kriging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xavier Freulon and Nicolas Desassis
705
Krumbein, William Christian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E. H. Timothy Whitten
713
Laplace Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
715
Least Absolute Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. Erhan Tercan
717
Least Absolute Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U. Radojičić and Klaus Nordhausen
720
Least Mean Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mark A. Engle
724
Least Median of Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sara Taskinen and Klaus Nordhausen
728
Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mark A. Engle
731
LiDAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
735
Linear Unmixing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Katherine L. Silversides
739
Linearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Christien Thiart
741
Local Singularity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wenlei Wang, Shuyun Xie and Zhijun Chen
744
Locally Weighted Scatterplot Smoother . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Klaus Nordhausen and Sara Taskinen
748
Logistic Attractors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Balakrishnan Ashok
751
Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D. Arun Kumar
756
Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Helmut Schaeben
759
Log-Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alejandro C. Frery
766
Lognormal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Glòria Mateu-Figueras and Ricardo A. Olea
769
Lorenz, Edward Norton . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kerry Emanuel
773
Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tanujit Chakraborty and Uttam Kumar
774
Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Feifei Pan
781
Mandelbrot, Benoit B. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Katepalli R. Sreenivasan
784
Markov Chain Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Swathi Padmanabhan and Uma Ranjan
785
Markov Chains: Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Adway Mitra
791
Markov Random Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Uma Ranjan
795
Marsily, Ghislain de . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Craig T. Simmons
799
Mathematical Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qiuming Cheng
801
Mathematical Minerals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . John H. Doveton
818
Mathematical Morphology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jean Serra
820
Matheron, Georges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jean Serra
835
Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Deeksha Aggarwal and Uttam Kumar
836
Maximum Entropy Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dionissios T. Hristopulos and Emmanouil A. Varouchakis
842
Maximum Entropy Spectral Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eulogio Pardo-Igúzquiza and Francisco J. Rodríguez-Tovar
845
Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
851
McCammon, Richard B. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michael E. Hohn
853
Membership Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Candan Gokceoglu
854
Merriam, Daniel F. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D. Collins
857
Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simon J. D. Cox
858
Mine Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. Caciagli
860
Mineral Prospectivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Emmanuel John M. Carranza
862
Minimum Entropy Deconvolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
868
Minimum Maximum Autocorrelation Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U. Mueller
870
Mining Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Youhei Kawamura
875
Modal Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
881
Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jessica Silva Lomba and Maria Isabel Fraga Alves
883
Monte Carlo Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Klaus Mosegaard
890
Moran’s Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Giuseppina Giungato and Sabrina Maggio
897
Morphological Closing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Aditya Challa, Sravan Danda and B. S. Daya Sagar
901
Morphological Dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sravan Danda, Aditya Challa and B. S. Daya Sagar
902
Morphological Erosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Aditya Challa, Sravan Danda and B. S. Daya Sagar
906
Morphological Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jean Serra
910
Morphological Opening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sravan Danda, Aditya Challa and B. S. Daya Sagar
921
Morphological Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sin Liang Lim and B. S. Daya Sagar
923
Morphometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sebastiano Trevisani and Igor V. Florinsky
928
Moving Average . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
934
Multidimensional Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Francky Fouedjio
938
Multifractals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Renguang Zuo
945
Multilayer Perceptrons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C. Özgen Karacan
951
Multiphase Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juliana Y. Leung
954
Multiple Correlation Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U. Mueller
958
Multiple Point Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jef Caers, Gregoire Mariethoz and Julian M. Ortiz
960
Multiscaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
970
Multivariate Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Monica Palma and Sabrina Maggio
974
Multivariate Data Analysis in Geosciences, Tools . . . . . . . . . . . . . . . . . . . . . . . . . . John H. Schuenemeyer
981
Němec, Václav . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jan Harff and Niichi Nichiwaki
985
Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vladik Kreinovich
986
Nonlinear Mapping Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jie Zhao, Wenlei Wang, Qiuming Cheng and Yunqing Shao
990
Nonlinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wenlei Wang, Jie Zhao and Qiuming Cheng
994
Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
999
Object Boundary Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hannes Thiergärtner
1003
Object-Based Image Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D. Arun Kumar, M. Venkatanarayana and V. S. S. Murthy
1008
Olea, Ricardo A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C. Özgen Karacan
1012
Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Torsten Hahmann
1013
Open Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lucia R. Profeta
1017
Optimization in Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ilyas Ahmad Huqqani and Lea Tien Tay
1020
Ordinary Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R. N. Singh and Ajay Manglik
1024
Ordinary Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Christopher Kotsakis
1032
Partial Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R. N. Singh and Ajay Manglik
1039
Particle Swarm Optimization in Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joseph Awange, Béla Paláncz and Lajos Völgyesi
1047
Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Uttam Kumar
1053
Pattern Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sanchari Thakur, LakshmiKanthan Muralikrishnan, Bijal Chudasama and Alok Porwal
1057
Pattern Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Katherine L. Silversides
1061
Pattern Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rogério G. Negri
1063
Pawlowsky-Glahn, Vera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ricardo A. Olea
1065
Pengda, Zhao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qiuming Cheng and Frits Agterberg
1066
Plurigaussian Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nasser Madani
1067
Point Pattern Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dietrich Stoyan
1073
Polarimetric SAR Data Adaptive Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mohamed Yahia, Tarig Ali, Md Maruf Mortula and Riadh Abdelfattah
1079
Pore Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Weiren Lin and Nana Kamiya
1083
Porosity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pham Huy Giao and Pham Huy Nguyen
1086
Porous Medium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jennifer McKinley
1090
Power Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abhey Ram Bansal and V. P. Dimri
1092
Predictive Geologic Mapping and Mineral Exploration . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
1095
Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alessandra Menafoglio
1108
Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abraão D. C. Nascimento
1112
Proximity Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
1116
Q-Mode Factor Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norman MacLeod
1119
Quality Assurance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. Caciagli
1126
Quality Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. Caciagli
1130
Quantitative Geomorphology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vikrant Jain, Shantamoy Guha and B. S. Daya Sagar
1135
Quantitative Stratigraphy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Felix Gradstein and Frits Agterberg
1152
Quantitative Target Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1166
Brandon Wilson, Emmanuel John M. Carranza and Jeff B. Boisvert
Quaternions and Rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1169
Helmut Schaeben
Radial Basis Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1175
Michael Hillier
Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1182
Emil D. Attanasi and Timothy C. Coburn
Random Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1185
J. Jaime Gómez-Hernández
Random Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1188
Ricardo A. Olea
Rank Score Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1192
Alejandro C. Frery
Rao, C. R. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1194
B. L. S. Prakasa Rao
Rao, S. V. L. N. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1195
B. S. Daya Sagar
Realizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1196
C. Özgen Karacan
Reduced Major Axis Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199
Muhammad Bilal, Md. Arfan Ali, Janet E. Nichol, Zhongfeng Qiu, Alaa Mhawish and Khaled Mohamed Khedher
Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1203
D. Arun Kumar, G. Hemalatha and M. Venkatanarayana
Remote Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ranganath Navalgund and Raghavendra P. Singh
1206
Reproducible Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anirudh Prabhu and Peter Fox
1209
Rescaled Range Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gabor Korvin
1213
Reyment, Richard A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Peter Bengtson
1218
R-Mode Factor Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norman MacLeod
1219
Robust Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1225
Peter Filzmoser
Rock Fracture Pattern and Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1229
Katsuaki Koike and Jin Wu
Rodionov, Dmitriy Alekseevich . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hannes Thiergärtner
1234
Rodriguez-Iturbe, Ignacio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B. S. Daya Sagar
1235
Root Mean Square . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tianfeng Chai
1236
Sampling Importance: Resampling Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1239
Anindita Dasgupta and Uttam Kumar
Scaling and Scale Invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1244
S. Lovejoy
Scattergram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Richard J. Howarth
1256
Schuenemeyer, John H. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Donald Gautier
1259
Schwarzacher, Walther . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jennifer McKinley
1260
Sedimentation and Diagenesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1261
Nana Kamiya and Weiren Lin
Self-Organizing Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sid-Ali Ouadfeul, Leila Aliouane and Mohamed Zinelabidine Doghmane
1264
Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Valentina Ciriello and Daniel M. Tartakovsky
1271
Sequence Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mehala Balamurali
1273
Sequential Gaussian Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. Jaime Gómez-Hernández
1276
Serra, Jean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Philippe Salembier
1279
Shape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norman MacLeod
1280
Signal Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mohamed Zinelabidine Doghmane, Sid-Ali Ouadfeul and Leila Aliouane
1288
Signal Processing in Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E. Chandrasekhar and Rizwan Ahmed Ansari
1297
Simulated Annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
1320
Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Behnam Sadeghi and Julian M. Ortiz
1322
Singular Spectrum Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . U. Radojičić, Klaus Nordhausen and Sara Taskinen
1328
Singularity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Behnam Sadeghi and Frits Agterberg
1332
Smoothing Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alejandro C. Frery
1339
Spatial Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qiyu Chen, Gang Liu, Xiaogang Ma and Xiang Que
1342
Spatial Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Donato Posa and Sandra De Iaco
1345
Spatial Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sabrina Maggio and Claudia Cappello
1353
Spatial Data Infrastructure and Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . . Jagadish Boodala, Onkar Dikshit and Nagarajan Balasubramanian
1358
Spatial Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Noel Cressie and Matthew T. Moores
1362
Spatiotemporal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sandra De Iaco, Donald E. Myers and Donato Posa
1373
Spatiotemporal Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shrutilipi Bhattacharjee, Johannes Madl, Jia Chen and Varad Kshirsagar
1382
Spatiotemporal Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shrutilipi Bhattacharjee, Johannes Madl, Jia Chen and Varad Kshirsagar
1386
Spatiotemporal Weighted Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiang Que, Xiaogang Ma, Chao Ma, Fan Liu and Qiyu Chen
1390
Spectral Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Katherine L. Silversides
1396
Spectrum-Area Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Behnam Sadeghi
1398
Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bengt Fornberg
1403
Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alejandro C. Frery
1408
Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Francky Fouedjio
1410
Statistical Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gilles Bourgault
1414
Statistical Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alan D. Chave
1428
Statistical Inferential Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1439
R. Webster
Statistical Outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mehala Balamurali and Raymond Leung
1443
Statistical Quality Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1451
Jörg Benndorf
Statistical Rock Physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gabor Korvin
1456
Statistical Seismology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1472
Jiancang Zhuang
Stereographic Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
1486
Stereology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Leszek Wojnar
1490
Stochastic Geometry in the Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1501
Dietrich Stoyan
Stoyan, Dietrich . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jan Harff
1510
Stratigraphic Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1511
Katherine L. Silversides
Structuring Element . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1513
B. S. Daya Sagar and D. Arun Kumar
Sums of Geologic Random Variables, Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 1516
G. M. Kaufman
Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rogério G. Negri
1520
t-Distributed Stochastic Neighbor Embedding . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mehala Balamurali
1527
Text Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chengbin Wang and Xiaogang Ma
1535
Thickening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B. S. Daya Sagar and D. Arun Kumar
1537
Thiergärtner, Hannes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1539
Heinz Burger
Three-Dimensional Geologic Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qiyu Chen, Gang Liu, Xiaogang Ma and Junqiang Zhang
1540
Time Series Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Abraão D. C. Nascimento
1545
Time Series Analysis in the Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1551
Klaus Nordhausen
Tobler, Waldo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michael Batty
1559
Topology in Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Florian Wellmann
1561
Total Alkali-Silica Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1566
Ricardo A. Olea
Trend Surface Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
1567
Tukey, John Wilder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
1575
Turbulence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
1576
Turcotte, Donald L. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lawrence Cathles and John Rundle
1578
Unbalanced Analysis of Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R. Webster
1579
Uncertainty Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Behnam Sadeghi, Eric Grunsky and Vera Pawlowsky-Glahn
1583
Unit Regional Production Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gunes Ertunc
1589
Unit Regional Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Frits Agterberg
1591
Univariate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alejandro C. Frery
1593
Universality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dionissios T. Hristopulos
1597
Upper Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Touhid Mohammad Hossain and Junzo Watada
1600
Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Claudia Cappello and Monica Palma
1605
Variogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Behnam Sadeghi
1609
Very Fast Simulated Reannealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dionissios T. Hristopulos
1614
Virtual Globe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jaya Sreevalsan-Nair
1619
Vistelius, Andrey Borisovich . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stephen Henley
1623
Watson, Geoffrey S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Noel Cressie and Carol A. Gotway Crawford
1625
Wavelets in Geosciences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guoxiong Chen and Henglei Zhang
1626
Wave-Number Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sid-Ali Ouadfeul and Leila Aliouane
1636
Whitten, Eric Harold Timothy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . John Cubitt
1642
Zadeh, Lotfi A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Behnam Sadeghi
1643
Z-Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sathishkumar Samiappan, Rajendra Mohan Panda and B. S. Daya Sagar
1644
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1647
Subject Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1651
About the Editors
B. S. Daya Sagar (born February 24, 1967) is a Full Professor of the Systems Science and Informatics Unit at the Indian Statistical Institute. Sagar received his MSc (1991) and PhD (1994) degrees in Geoengineering and Remote Sensing from the Andhra University, India. He worked at Andhra University (1992–1998), The National University of Singapore (1998–2001), and Multimedia University, Malaysia (2001–2007). Sagar has made significant contributions to the field of mathematical earth sciences, remote sensing, spatial data sciences, and mathematical morphology. He has published over 90 papers in journals and has authored or guest-edited 14 books or journal special issues. He authored Mathematical Morphology in Geomorphology and GISci (CRC Press, 2013, p. 546) and Handbook of Mathematical Geosciences (Springer, 2018, p. 942). He was elected a Fellow of Royal Geographical Society, an IEEE Senior Member, a Fellow of the Indian Geophysical Union, and a Fellow of the Indian Academy of Sciences. He was awarded the Dr. Balakrishna Memorial Award – 1995, the IGU-Krishnan Medal – 2002, the IAMG “Georges Matheron Award – 2011,” the IAMG Certificate of Appreciation – 2018 Award, and the IEEE-GRSS Distinguished Lecturership Award. He is a member of the AGU’s Honors and Recognition Committee (2022–23). He is (was) on the Editorial Boards of Computers and Geosciences, Frontiers: Environmental Informatics, and Mathematical Geosciences. He is the Co-Editor-in-Chief of the Springer’s Encyclopedia of Mathematical Geosciences. Qiuming Cheng is currently a Professor at the School of Earth Science and Engineering, Sun Yat-Sen University, Zhuhai, and the Founding Director of the State Key Lab of Geological Processes and Mineral Resources, China University of Geosciences (Beijing, Wuhan). He received his PhD in Earth Science from the University of Ottawa under the supervision of Dr. Frits Agterberg in 1994. Qiuming was a faculty member at York University, Toronto, Canada (1995–2015). Qiuming has specialized in mathematical geoscience with a research focus on nonlinear mathematical modeling of earth processes and geoinformatics for mineral resources prediction. He has authored over 300 research articles. He received the IAMG’s Krumbein Medal – 2008 and the AAG’s Gold Medal – 2020. He is a member of the Chinese Academy of Sciences and a foreign member of the Academia Europaea. He was President of the IAMG (2012–2016) and President of the IUGS (2016–2020). He xxxi
has promoted broadening the scope of MG from more traditional statistical geology to interdisciplinary mathematical geosciences and the collaboration among geounions and other organizations on big data and geo-intelligence in solid earth science and geoscientific input to the Future Earth Program. He initialized and established the IUGS big science program on Deep-time Digital Earth (DDE). Jennifer McKinley is Professor of Mathematical Geoscience, in Geography and Director of the Centre for GIS and Geomatics at Queen’s University Belfast (QUB), UK. As a Chartered Geologist, her research has focused on the application of spatial analysis techniques, including geostatistics, spatial data analysis, compositional data analysis, and Geographical Information Science (GIS), to natural resource management, soil and water geochemistry, environmental and criminal forensics, human health and the environment, and nature-based solutions for urban environments. Jennifer’s international leadership roles include Councilor of the International Union of Geosciences (IUGS 2020–2024), President of the Governing Council of the Deep-time Digital Earth Initiative (DDE 2020–2024), and Past President of the International Association of Mathematical Geoscientists (IAMG 2016–2020). Jennifer has served on learned committees including the Royal Irish Academy and Geological Society of London and sits on the Giant’s Causeway UNESCO World Heritage Site Steering Group. Frederik Pieter Agterberg (born November 15, 1936) is a DutchCanadian mathematical geoscientist. He obtained BSc (1957), MSc (1959), and PhD (1961) degrees at Utrecht University. After 1 year as WARF postdoctoral fellow at the University of Wisconsin, he joined the Canadian Geological Survey (GSC) as “petrological statistician.” In 1969, he formed the GSC Geomathematics Section, which he headed until his retirement in 1996 when he became a GSC Emeritus Scientist. “Frits” taught Statistics in Geology course at the University of Ottawa from 1968 to 1992 before becoming an Adjunct Professor in the Earth Science Department at the same university. From 1983 to 1989 he was also Adjunct Professor in Mathematics Department at Carleton University. He was instrumental in establishing the International Association for Mathematical Geosciences (IAMG) in 1968, received the IAMG’s William Christian Krumbein Medal in 1978, and was IAMG President from 2004 to 2008. He was named IAMG Distinguished Lecturer in 2004 and IAMG Honorary Member in 2017. Frits was made Correspondent of the Royal Netherlands Academy of Arts and Sciences in 1981. He is the sole author of 4 books, editor or co-editor of 11 other books, and author or co-author of over 300 scientific publications. He was co-editor of the Handbook of Mathematical Geosciences published in 2018.
Editorial Board
Eric Grunsky Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, ON, Canada Lim Sin Liang Faculty of Engineering, Multimedia University, Persiaran Multimedia, Selangor, Malaysia Gang Liu School of Computer Science, China University of Geosciences, Wuhan, Hubei, China Xiaogang (Marshall) Ma Department of Computer Science, University of Idaho, Moscow, ID, USA Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India Ricardo A. Olea Geology, Energy and Minerals Science Center, U.S. Geological Survey, Reston, VA, USA Dinesh Sathyamoorthy Science and Technology Research Institute for Defence (STRIDE), Ministry of Defence, Malaysia John H. Schuenemeyer Southwest Statistical Consulting, LLC, Cortez, CO, USA
Contributors
Rifaat Abdalla Department of Earth Sciences, College of Science, Sultan Qaboos University, Al-Khoudh, Muscat, Oman Riadh Abdelfattah COSIM Lab, Higher School of Communications of Tunis, Université of Carthage, Carthage, Tunisia Departement of ITI, Télécom Bretagne, Institut de Télécom, Brest, France Mohamed Abioui Department of Earth Sciences, Faculty of Sciences, Ibn Zohr University, Agadir, Morocco Deeksha Aggarwal Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India Mohammad Ali Aghighi School of Minerals and Energy Resources Engineering, University of New South Wales, Sydney, NSW, Australia Frits Agterberg Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada Md. Arfan Ali School of Marine Sciences, Nanjing University of Information Science and Technology, Nanjing, China Tarig Ali GIS and Mapping Laboratory, American University of Sharjah United Arab Emirates, Sharjah, UAE Department of Civil Engineering, American University of Sharjah, Sharjah, UAE Andisheh Alimoradi Department of Mining Engineering, Imam Khomeini International University, Qazvin, Iran Leila Aliouane LABOPHT, Faculty of Hydrocarbons and Chemistry, University of Boumerdes, Boumerdes, Algeria Faculty of Hydrocarbons and Chemistry, University M’hamed Bougara of Boumerdes, Boumerdes, Algeria Abdullah Alowairdhi Department of Computer Science, University of Idaho, Moscow, ID, USA Farzin Amirkhani Shiraz Mine Environment and Hydrogeology Research Laboratory (MEHR Lab), University of Tehran, Tehran, Iran xxxv
Rizwan Ahmed Ansari Department of Electrical Engineering, Veermata Jijabai Technological Institute, Mumbai, India Daisy Arroyo Department of Statistics, Universidad de Concepción, Concepcion, Chile Igor Artemenko Heat and Mass Exchange in Geo-systems Department, Scientific Centre for Aerospace Research of the Earth, National Academy of Sciences of Ukraine, Kiev, Ukraine D. Arun Kumar Department of Electronics and Communication Engineering and Center for Research and Innovation, KSRM College of Engineering, Kadapa, Andhra Pradesh, India Balakrishnan Ashok Centre for Complex Systems and Soft Matter Physics, International Institute of Information Technology Bangalore, Electronics City, Bangalore, Karnataka, India Emil D. Attanasi US Geological Survey, Reston, VA, USA Joseph Awange School of Earth and Planetary Sciences, Discipline of Spatial Sciences, Curtin University, Perth, WA, Australia Leonardo Azevedo CERENA, DECivil, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal John Bacon-Shone Social Sciences Research Centre, The University of Hong Kong, Hong Kong, China Mehala Balamurali Rio Tinto Centre for Mine Automation, Australian Centre for Field Robotics, Faculty of Engineering, The University of Sydney, Sydney, NSW, Australia Nagarajan Balasubramanian Department of Civil Engineering, Indian Institute of Technology Kanpur, Kanpur, UP, India Abhey Ram Bansal Gravity and Magnetic Group, CSIR- National Geophysical Research Institute, Hyderabad, India Ekta Baranwal Department of Civil Engineering, Jamia Millia Islamia, New Delhi, India Michael Batty University College London, London, UK Peter Bengtson Institute of Earth Sciences, Heidelberg University, Heidelberg, Germany Jörg Benndorf TU Bergakademie Freiberg, Freiberg, Germany Mohammed Benssaou Department of Earth Sciences, Faculty of Sciences, Ibn Zohr University, Agadir, Morocco Keith Beven Lancaster University, Lancaster, UK Shrutilipi Bhattacharjee National Institute of Technology Karnataka (NITK), Surathkal, India Muhammad Bilal School of Marine Sciences, Nanjing University of Information Science and Technology, Nanjing, China
Jeff B. Boisvert University of Alberta, Edmonton, AB, Canada Jagadish Boodala Department of Civil Engineering, Indian Institute of Technology Kanpur, Kanpur, UP, India Gilles Bourgault Computer Modelling Group Ltd., Calgary, AB, Canada Francesca Bovolo Center for Digital Society, Fondazione Bruno Kessler, Trento, Italy Lorenzo Bruzzone Department of Information Engineering and Computer Science, University of Trento, Trento, Italy Antonella Buccianti Department of Earth Sciences, University of Florence, Florence, Italy Heinz Burger Geoscience, Free University Berlin, Berlin, Germany N. Caciagli Metals Exploration, BHP Ltd, Toronto, ON, Canada Barrick Gold Corp, Toronto, ON, USA Jef Caers Department of Geological Sciences, Stanford University, Stanford, CA, USA Claudia Cappello Dipartimento di Scienze dell’Economia, Università del Salento, Lecce, Italy Emmanuel John M. Carranza Department of Geology, Faculty of Natural and Agricultural Sciences, University of the Free State, Bloemfontein, Republic of South Africa University of KwaZulu-Natal, Durban, South Africa Lawrence Cathles Department of Earth and Atmospheric Sciences, Cornell University, Ithaca, NY, USA Tianfeng Chai Cooperative Institute for Satellite Earth System Studies (CISESS), University of Maryland, College Park, MD, USA NOAA/Air Resources Laboratory (ARL), College Park, MD, USA Tanujit Chakraborty Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India Aditya Challa Computer Science and Information Systems, APPCAIR, Birla Institute of Technology and Science, Pilani, Goa, India E. Chandrasekhar Department of Earth Sciences, Indian Institute of Technology Bombay, Mumbai, India Alan D. Chave Department of Applied Ocean Physics and Engineering, Deep Submergence Laboratory, Woods Hole Oceanographic Institution, Woods Hole, MA, USA Guoxiong Chen State Key Laboratory of Geological Processes and Mineral Resources, China University of Geosciences, Wuhan, China Jia Chen Technical University of Munich, Munich, Germany
Qiyu Chen School of Computer Science, China University of Geosciences, Wuhan, Hubei, China Zhijun Chen China University of Geosciences, Wuhan, China Qiuming Cheng School of Earth Science and Engineering, Sun Yat-Sen University, Zhuhai, China State Key Lab of Geological Processes and Mineral Resources, China University of Geosciences, Beijing, China George Christakos Department of Geography, San Diego State University, San Diego, CA, USA Bijal Chudasama Geological Survey of Finland, Espoo, Finland Valentina Ciriello Dipartimento di Ingegneria Civile, Chimica, Ambientale e dei Materiali (DICAM), Università di Bologna, Bologna, Italy Timothy C. Coburn Department of Systems Engineering, Colorado State University, Fort Collins, CO, USA David R. Cohen School of Biological, Earth and Environmental Sciences, The University of New South Wales, Sydney, NSW, Australia D. Collins Emeritus Associate Scientist, The University of Kansas, Lawrence, KS, USA Simon J. D. Cox CSIRO, Melbourne, VIC, Australia Max Craglia European Commission –Joint Research Centre (JRC), Ispra, Italy Carol A. Gotway Crawford Albuquerque, NM, USA Noel Cressie School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, Australia John Cubitt Wrexham, UK Alexander Damm Department of Geography, University of Zurich, Zürich, Switzerland Eawag, Swiss Federal Institute of Aquatic Science & Technology, Surface Waters – Research and Management, Dübendorf, Switzerland Sravan Danda Computer Science and Information Systems, APPCAIR, Birla Institute of Technology and Science, Pilani, Goa, India Anindita Dasgupta Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India John C. Davis Heinemann Oil GmbH, Baldwin City, KS, USA B. S. Daya Sagar Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India
Sandra De Iaco Department of Economics, Section of Mathematics and Statistics, University of Salento, National Biodiversity Future Center, Lecce, Italy Nicolas Desassis Mines ParisTech, PSL University, Centre de Géosciences, Fontainebleau, France Clayton V. Deutsch Centre for Computational Geostatistics, University of Alberta, Edmonton, AB, Canada Liping Di Center for Spatial Information Science and Systems, George Mason University, Fairfax, VA, USA Onkar Dikshit Department of Civil Engineering, Indian Institute of Technology Kanpur, Kanpur, UP, India Roussos Dimitrakopoulos COSMO – Stochastic Mine Planning Laboratory, Department of Mining and Materials Engineering, McGill University, Montreal, Canada V. P. Dimri CSIR-National Geophysical Research Institute, Hyderabad, Telangana, India Mohamed Zinelabidine Doghmane Department of Geophysics, FSTGAT, University of Science and Technology Houari Boumediene, Algiers, Algeria Faramarz Doulati Ardejani School of Mining, College of Engineering, University of Tehran, Tehran, Iran Mine Environment and Hydrogeology Research Laboratory (MEHR Lab), University of Tehran, Tehran, Iran John H. Doveton Kansas Geological Survey, University of Kansas, Lawrence, KS, USA P. A. Dowd School of Civil, Environmental and Mining Engineering, The University of Adelaide, Adelaide, SA, Australia Juan José Egozcue Department of Civil and Environmental Engineering, U. Politècnica de Catalunya, Barcelona, Spain Kerry Emanuel Lorenz Center, Massachusetts Institute of Technology, Cambridge, MA, USA Mark A. Engle Department of Earth, Environmental and Resource Sciences, University of Texas at El Paso, El Paso, Texas, USA Department of Geological Sciences, University of Texas at El Paso, El Paso, TX, USA Oktay Erten Centre for Computational Geostatistics, University of Alberta, Edmonton, AB, Canada Gunes Ertunc Department of Mining Engineering, Hacettepe University, Ankara, Turkey Peter Filzmoser Institute of Statistics & Mathematical Methods in Economics, Vienna University of Technology, Vienna, Austria Igor V. Florinsky Institute of Mathematical Problems of Biology, Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
Bengt Fornberg Department of Applied Mathematics, University of Colorado, Boulder, CO, USA Francky Fouedjio Department of Geological Sciences, Stanford University, Stanford, CA, USA Peter Fox Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, USA Maria Isabel Fraga Alves CEAUL & DEIO, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal Andrew U. Frank Geoinformation, Technical University, Vienna, Austria Roger H. French Material Sciences and Engineering, Case Western Reserve University, Cleveland, OH, USA Alejandro C. Frery School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand Xavier Freulon Mines ParisTech, PSL University, Centre de Géosciences, Fontainebleau, France Alice-Agnes Gabriel Department of Earth and Environmental Sciences, LMU Munich, Munich, Germany John Gamon Center for Advanced Land Management Information Technologies, School of Natural Resources, University of Nebraska–Lincoln, Lincoln, NE, USA Department of Earth and Atmospheric Sciences and Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada Shib Sankar Ganguli CSIR-National Geophysical Research Institute, Hyderabad, Telangana, India Fuchang Gao Department of Mathematics and Statistical Science, University of Idaho, Moscow, ID, USA Michael T. Gastner Division of Science, Yale-NUS College, Singapore, Singapore Donald Gautier DonGautier L.L.C., Palo Alto, CA, USA Pham Huy Giao PetroVietnam University (PVU), Baria-Vung Tau, Vietnam Vietnam Petroleum Institute (VPI), Hanoi, Vietnam Giuseppina Giungato Department of Economics, Section of Mathematics and Statistics, University of Salento, Lecce, Italy Candan Gokceoglu Department of Geological Engineering, Hacettepe University, Ankara, Turkey J. Jaime Gómez-Hernández Research Institute of Water and Environmental Engineering, Universitat Politècnica de València, Valencia, Spain
Caterina Gozzi Department of Earth Sciences, University of Florence, Florence, Italy Felix M. Gradstein Founding Member and Past Chair, Geologic Time Scale Foundation, Emeritus Natural History Museum, University of Oslo, Oslo, Norway Dario Grana Department of Geology and Geophysics, University of Wyoming, Laramie, WY, USA Cedric M. Griffiths Predictive Geoscience, StrataMod Pty. Ltd., Greenmount, WA, Australia Department of Exploration Geophysics, Curtin University, Perth, Australia Eric Grunsky Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, ON, Canada Shantamoy Guha Discipline of Earth Sciences, IIT Gandhinagar, Gandhinagar, India Utkarsh Gupta Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, Karnataka, India Torsten Hahmann Spatial Informatics, School of Computing and Information Science, University of Maine, Orono, ME, USA Jan Harff Institute of Marine and Environmental Sciences, University of Szczecin, Szczecin, Poland Junyu He Ocean College, Zhejiang University, Zhoushan, China G. Hemalatha Department of Electronics and Communication Engineering, KSRM College of Engineering, Kadapa, Andhra Pradesh, India Stephen Henley Resources Computing International Ltd, Matlock, UK Michael Hillier Geological Survey of Canada, Ottawa, ON, Canada Michael E. Hohn Morgantown, WV, USA Touhid Mohammad Hossain Department of Computer and Information Sciences, Faculty of Science and Information Technology, Universiti Teknologi PETRONAS, Seri Iskandar, Malaysia Richard J. Howarth Department of Earth Sciences, University College London (UCL), London, UK Dionissios T. Hristopulos School of Electrical and Computer Engineering, Technical University of Crete, Chania, Crete, Greece Karel Hron Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czech Republic Fang Huang CSIRO Mineral Resources, Kensington, WA, Australia
Ilyas Ahmad Huqqani School of Electrical and Electronics Engineering, Universiti Sains Malaysia, Penang, Malaysia James Irving Institute of Earth Sciences, University of Lausanne, Lausanne, Switzerland Vikrant Jain Discipline of Earth Sciences, IIT Gandhinagar, Gandhinagar, India Qingren Jia College of Electronic Science and Technology, National University of Defense Technology, Changsha, China Behshad Jodeiri Shokri Department of Mining Engineering, Hamedan University of Technology, Hamedan, Iran Nana Kamiya Earth and Resource System Laboratory, Graduate School of Engineering, Kyoto University, Kyoto, Japan Laboratory of Earth System Science, School of Science and Engineering, Doshisha University, Kyotanabe, Japan Kasturi Devi Kanniah TropicalMap Research Group, Centre for Environmental Sustainability and Water Security (IPASA), Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia, Johor Bahru, Malaysia C. Özgen Karacan U.S. Geological Survey, Geology, Energy and Minerals Science Center, Reston, VA, USA G. M. Kaufman Sloan School of Management, MIT, Cambridge, MA, USA Youhei Kawamura Division of Sustainable Resources Engineering, Hokkaido University, Sapporo, Hokkaido, Japan Khaled Mohamed Khedher Department of Civil Engineering, College of Engineering, King Khalid University, Abha, Saudi Arabia Department of Civil Engineering, High Institute of Technological Studies, Mrezgua University Campus, Nabeul, Tunisia Peter K. Kitanidis Stanford University, Civil and Environmental Engineering & Computational and Mathematical Engineering, Stanford, CA, USA Michael Klichowicz Institute of Mineral Processing Machines and Recycling Systems Technology, Technische Universität Bergakademie Freiberg, Freiberg, Germany Jens Klump Mineral Resources, CSIRO, Perth, WA, Australia Katsuaki Koike Department of Urban Management, Graduate School of Engineering, Kyoto University, Kyoto, Japan Gabor Korvin Earth Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Kingdom of Saudi Arabia Yuriy Kostyuchenko Heat and Mass Exchange in Geo-systems Department, Scientific Centre for Aerospace Research of the Earth, National Academy of Sciences of Ukraine, Kiev, Ukraine International Institute for Applied Systems Analysis (IIASA), Laxenbourg, Austria
Christopher Kotsakis Department of Geodesy and Surveying, School of Rural and Surveying Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece Vladik Kreinovich Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA Vera Krieger Institute of Bio- and Geosciences, IBG-2: Plant Sciences, Forschungszentrum Jülich GmbH, Jülich, Germany Uwe Kroner Technische Universität Bergakademie Freiberg, Freiberg, Germany Varad Kshirsagar Birla Institute of Technology and Science, Pilani, Hyderabad, India Uttam Kumar Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India Kerstin Lehnert Columbia University, Lamont Doherty Earth Observatory, Palisades, NY, USA Juliana Y. Leung University of Alberta, Edmonton, AB, Canada Raymond Leung Rio Tinto Centre for Mine Automation, Australian Centre for Field Robotics, Faculty of Engineering, The University of Sydney, Sydney, NSW, Australia Holger Lieberwirth Institute of Mineral Processing Machines and Recycling Systems Technology, Technische Universität Bergakademie Freiberg, Freiberg, Germany Sin Liang Lim Faculty of Engineering, Multimedia University, Cyberjaya, Selangor, Malaysia Weiren Lin Earth and Resource System Laboratory, Graduate School of Engineering, Kyoto University, Kyoto, Japan Fan Liu Google, Sunnyvale, CA, USA Gang Liu State Key Laboratory of Biogeology and Environmental Geology, Wuhan, Hubei, China School of Computer Science, China University of Geosciences, Wuhan, Hubei, China S. Lovejoy Physics, McGill University, Montreal, Canada Xiaogang Ma Department of Computer Science, University of Idaho, Moscow, ID, USA Chao Ma State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, Chengdu University of Technology, Chengdu, China Norman MacLeod Department of Earth Sciences and Engineering, Nanjing University, Nanjing, Jiangsu, China Nasser Madani School of Mining and Geosciences, Nazarbayev University, Nur-Sultan city, Kazakhstan Johannes Madl Technical University of Munich, Munich, Germany
Sabrina Maggio Dipartimento di Scienze dell’Economia, Università del Salento, Lecce, Italy Soroush Maghsoudy School of Mining, College of Engineering, University of Tehran, Tehran, Iran Mine Environment and Hydrogeology Research Laboratory (MEHR Lab), University of Tehran, Tehran, Iran Ajay Manglik CSIR-National Geophysical Research Institute, Hyderabad, India Lorenzo Marchi Consiglio Nazionale delle Ricerche – Istituto di Ricerca per la Protezione Idrogeologica, Padova, Italy Denis Marcotte Department of Civil, Geological and Mining Engineering, Polytechnique Montreal, Montreal, QC, Canada Gregoire Mariethoz Institute of Earth Surface Dynamics, University of Lausanne, Lausanne, Switzerland Daniele Marinelli Department of Information Engineering and Computer Science, University of Trento, Trento, Italy Glòria Mateu-Figueras Department of Informatics, Applied Mathematics and Statistics, University of Girona, Girona, Spain Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Girona, Spain Paolo Mazzetti Institute of Atmospheric Pollution Research, Earth and Space Science Informatics Laboratory (ESSI-Lab), National Research Council of Italy, Rome, Italy J. M. McArthur Earth Sciences, UCL, London, UK Jennifer McKinley Geography, School of Natural and Built Environment, Queen’s University, Belfast, UK Alessandra Menafoglio MOX-Department of Mathematics, Politecnico di Milano, Milano, Italy Martin Meschede Institute of Geography and Geology, University of Greifswald, Greifswald, Germany Alaa Mhawish School of Marine Sciences, Nanjing University of Information Science and Technology, Nanjing, China Richard Minnitt School of Mining Engineering, University of the Witwatersrand, Johannesburg, South Africa Adway Mitra Centre of Excellence in Artificial Intelligence, Indian Institute of Technology Kharagpur, Kharagpur, India Konstantinos Modis School of Mining and Metallurgical Engineering, National Technical University of Athens, Athens, Greece
Marcus Mohr Department of Earth and Environmental Sciences, LMU Munich, Munich, Germany Gianna Serafina Monti Department of Economics, Management and Statistics, University of Milano-Bicocca, Milan, Italy Matthew T. Moores School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, Australia Md Maruf Mortula Department of Civil Engineering, American University of Sharjah, Sharjah, UAE Klaus Mosegaard Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark U. Mueller School of Science, Edith Cowan University, Joondalup, WA, Australia Onno Muller Institute of Bio- and Geosciences, IBG-2: Plant Sciences, Forschungszentrum Jülich GmbH, Jülich, Germany LakshmiKanthan Muralikrishnan Cybertech Systems and Software Limited, Thane, Maharashtra, India V. S. S. Murthy KSRM College of Engineering, Kadapa, Andhra Pradesh, India Donald E. Myers Department of Mathematics, University of Arizona, Tucson, AZ, USA Abraão D. C. Nascimento Universidade Federal de Pernambuco, Recife, Brazil Stefano Nativi Joint Research Centre (JRC), European Commission, Ispra, Italy Ranganath Navalgund Indian Space Research Organisation (ISRO), Bengaluru, India Rogério G. Negri São Paulo State University (UNESP), Institute of Science and Technology (ICT), São José dos Campos, São Paulo, Brazil William I. Newman Departments of Earth, Planetary, & Space Sciences, Physics & Astronomy, and Mathematics, University of California, Los Angeles, CA, USA Pham Huy Nguyen Imperial College, London, UK Niichi Nichiwaki Professor Emeritus of Nara University, Nara, Japan Janet E. Nichol Department of Geography, School of Global Studies, University of Sussex, Brighton, UK Klaus Nordhausen Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland Ricardo A. Olea Geology, Energy and Minerals Science Center, U.S. Geological Survey, Reston, VA, USA
Henning Omre Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway Julian M. Ortiz The Robert M. Buchan Department of Mining, Queen’s University, Kingston, ON, Canada Sid-Ali Ouadfeul Algerian Petroleum Institute, Sonatrach, Boumerdes, Algeria Swathi Padmanabhan Indian Institute of Science, Bangalore, Karnataka, India Béla Paláncz Department of Geodesy and Surveying, Budapest University of Technology and Economics, Budapest, Hungary Javier Palarea-Albaladejo Biomathematics and Statistics Scotland, Edinburgh, UK Monica Palma Università del Salento-Dip. Scienze dell’Economia, Complesso Ecotekne, Lecce, Italy Guocheng Pan Hanking Industrial Group Limited, Shenyang, Liaoning, China Feifei Pan Rensselaer Polytechnic Institute, Troy, NY, USA Rajendra Mohan Panda Geosystems Research Institute, Mississippi State University, Starkville, MS, USA H. S. Pandalai Department of Earth Sciences, Indian Institute of Technology Bombay, Mumbai, India Narayan Panigrahi Scientist and Head of GIS Division, Center for Artificial Intelligence and Robotics (CAIR), Bangalore, India Madhurima Panja Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India Eulogio Pardo-Igúzquiza Instituto Geológico y Minero de España (IGME), Madrid, Spain Vera Pawlowsky-Glahn Department of Computer Science, Applied Mathematics and Statistics, Universitat de Girona, Girona, Spain Daniele Pedretti Dipartimento di Scienze della Terra “A. Desio”, Università degli Studi di Milano, Milan, Italy Alok Porwal Centre for Studies in Resources Engineering, Indian Institute of Technology Bombay, Mumbai, India Centre for Exploration Targeting, University of Western Australia, Crawley, WA, Australia Donato Posa Department of Economics, Section of Mathematics and Statistics, University of Salento, Lecce, Italy Amin Beiranvand Pour Institute of Oceanography and Environment (INOS), Universiti Malaysia Terengganu (UMT), Terengganu, Malaysia
Anirudh Prabhu Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, USA B. L. S. Prakasa Rao CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, India Lucia R. Profeta Lamont-Doherty Earth Observatory, Columbia University in the City of New York, New York, NY, USA Zhongfeng Qiu School of Marine Sciences, Nanjing University of Information Science and Technology, Nanjing, China Xiang Que Computer and Information College, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China Department of Computer Science, University of Idaho, Moscow, ID, USA Juan Quiros-Vargas Institute of Bio- and Geosciences, IBG-2: Plant Sciences, Forschungszentrum Jülich GmbH, Jülich, Germany U. Radojičić Institute of Statistics and Mathematical Methods in Economics, TU Wien, Vienna, Austria Majid Rahimzadegan Water Resources Engineering and Management Department, Faculty of Civil Engineering, K. N. Toosi University of Technology, Tehran, Iran Sarah Ramdeen Columbia University, Lamont Doherty Earth Observatory, Palisades, NY, USA Uma Ranjan Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, Karnataka, India Uwe Rascher Institute of Bio- and Geosciences, IBG-2: Plant Sciences, Forschungszentrum Jülich GmbH, Jülich, Germany Francisco J. Rodríguez-Tovar Universidad de Granada, Granada, Spain Hamid Roshan School of Minerals and Energy Resources Engineering, University of New South Wales, Sydney, NSW, Australia John Rundle Departments of Physics and Earth and Planetary Sciences, Earth and Physical Sciences, University of California Davis, Davis, CA, USA Behnam Sadeghi EarthByte Group, School of Geosciences, University of Sydney, Sydney, NSW, Australia Earth and Sustainability Research Centre, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, Australia Ramendra Sahoo Discipline of Earth Sciences, IIT Gandhinagar, Gandhinagar, India Department of Earth and Climate Science, IISER Pune, Pune, India Philippe Salembier Department of Signal Theory and Communication, Universitat Politècnica de Catalunya, BarcelonaTech, Barcelona, Spain
Sathishkumar Samiappan Geosystems Research Institute, Mississippi State University, Starkville, MS, USA Helmut Schaeben Technische Universität Bergakademie Freiberg, Freiberg, Germany Bernhard S. A. Schuberth Department of Earth and Environmental Sciences, LMU Munich, Munich, Germany John H. Schuenemeyer Southwest Statistical Consulting, LLC, Cortez, CO, USA Jean Serra Centre de morphologie mathématique, Ecoles des Mines, Paristech, Paris, France Fojan Shafaei School of Mining, College of Engineering, University of Tehran, Tehran, Iran Mine Environment and Hydrogeology Research Laboratory (MEHR Lab), University of Tehran, Tehran, Iran Majid Shahhosseiny School of Mining, College of Engineering, University of Tehran, Tehran, Iran Mine Environment and Hydrogeology Research Laboratory (MEHR Lab), University of Tehran, Tehran, Iran Yunqing Shao China University of Geosciences, Beijing, China Mi Shu Tencent Technology (Beijing) Co., Ltd, Beijing, China Bastian Siegmann Institute of Bio- and Geosciences, IBG-2: Plant Sciences, Forschungszentrum Jülich GmbH, Jülich, Germany
Jorge F. Silva Information and Decision System Group (IDS), Department of Electrical Engineering, University of Chile, Santiago, Chile Jessica Silva Lomba CEAUL, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal Katherine L. Silversides Australian Centre for Field Robotics, Rio Tinto Centre for Mining Automation, The University of Sydney, Camperdown, NSW, Australia Craig T. Simmons College of Science and Engineering & National Centre for Groundwater Research and Training, Flinders University, Adelaide, SA, Australia Donald A. Singer U.S. Geological Survey, Cupertino, CA, USA R. N. Singh Discipline of Earth Sciences, Indian Institute of Technology, Gandhinagar, Palaj, Gandhinagar, India Raghavendra P. Singh Space Applications Centre (ISRO), Ahmedabad, India Sanjay Singh Signal and Image Processing Group, Space Applications Centre (ISRO), Ahmedabad, GJ, India Lowell Kent Smith Emeritus, University of Redlands, Redlands, CA, USA
Amílcar Soares CERENA, DECivil, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal Indu Solomon Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India Katepalli R. Sreenivasan New York University, New York, NY, USA Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India Pradeep Srivastava Earth Sciences, Indian Institute of Technology, Gandhinagar, Gujarat, India R. Mohan Srivastava TriStar Gold Inc, Toronto, ON, Canada Tobias Stephan Department of Geoscience, University of Calgary, Calgary, AB, Canada Dietrich Stoyan Institut für Stochastik, TU Bergakademie Freiberg, Freiberg, Germany Shaoqiang Su Computer and Information College, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China Key Laboratory of Fujian Universities for Ecology and Resource Statistics, Fuzhou, China A. Subramanyam Department of Mathematics, Indian Institute of Technology Bombay, Mumbai, India Ziheng Sun Center for Spatial Information Science and Systems, George Mason University, Fairfax, VA, USA Pejman Tahmasebi College of Engineering and Applied Science, University of Wyoming, Laramie, WY, USA Daniel M. Tartakovsky Department of Energy Resources Engineering, Stanford University, Stanford, CA, USA Sara Taskinen Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland Lea Tien Tay School of Electrical and Electronics Engineering, Universiti Sains Malaysia, Penang, Malaysia A. Erhan Tercan Department of Mining Engineering, Hacettepe University, Ankara, Turkey Sanchari Thakur University of Trento, Trento, Italy Christien Thiart Department of Statistical Sciences, University of Cape Town, Cape Town, South Africa Hannes Thiergärtner Department of Geosciences, Free University of Berlin, Berlin, Germany
Rahisha Thottolil Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India Yuanyuan Tian School of Geographical Science and Urban Planning, Arizona State University, Tempe, AZ, USA Shohei Albert Tomita Department of Urban Management, Graduate School of Engineering, Kyoto University, Kyoto, Japan Sebastiano Trevisani Università Iuav di Venezia, Venice, Italy Emmanouil A. Varouchakis School of Mineral Resources Engineering, Technical University of Crete, Chania, Greece Parvatham Venkatachalam Centre of Studies in Resources Engineering, Indian Institute of Technology, Mumbai, India M. Venkatanarayana Department of Electronics and Communication Engineering and Center for Research and Innovation, KSRM College of Engineering, Kadapa, Andhra Pradesh, India Jay M. Ver Hoef Alaska Fisheries Science Center, NOAA Fisheries, Seattle, WA, USA Eric P. Verrecchia Institute of Earth Surface Dynamics, University of Lausanne, Lausanne, Switzerland Jaume Soler Villanueva E.T.S. de Ingeniería de Caminos, Universitat Politècnica de Catalunya, Barcelona, Spain Lajos Völgyesi Department of Geodesy and Surveying, Budapest University of Technology and Economics, Budapest, Hungary Chengbin Wang State Key Laboratory of Geological Processes and Mineral Resources & School of Earth Resources, China University of Geosciences, Wuhan, China Ran Wang Center for Advanced Land Management Information Technologies, School of Natural Resources, University of Nebraska–Lincoln, Lincoln, NE, USA Wenlei Wang Institute of Geomechanics, Chinese Academy of Geological Sciences, Beijing, China Junzo Watada Graduate School of Information, Production Systems, Waseda University, Fukuoka, Japan R. Webster Rothamsted Research, Harpenden, UK Florian Wellmann Computational Geoscience and Reservoir Engineering, RWTH Aachen University, Aachen, Germany Tao Wen Department of Earth and Environmental Sciences, Syracuse University, Syracuse, NY, USA Johannes Wendebourg TotalEnergies Exploration, Paris, France
E. H. Timothy Whitten Riverside, Widecombe-in-the-Moor, Devon, UK Brandon Wilson University of Alberta, Edmonton, AB, Canada Leszek Wojnar Krakow University of Technology, Krakow, Poland Jin Wu China Railway Design Corporation, Tianjin, China Lesley Wyborn Research School of Earth Sciences, Australian National University, Canberra, ACT, Australia Shuyun Xie State Key Laboratory of Geological Processes and Mineral Resources (GPMR), Faculty of Earth Sciences, China University of Geosciences, Wuhan, China Mohamed Yahia GIS and Mapping Laboratory, American University of Sharjah United Arab Emirates, Sharjah, UAE Laboratoire de recherche Modélisation analyse et commande de systèmes- MACS, Ecole Nationale d’Ingénieurs de Gabes – ENIG, Université de Gabes Tunisia, Zrig Eddakhlania, Tunisia Lingqing Yao COSMO – Stochastic Mine Planning Laboratory, Department of Mining and Materials Engineering, McGill University, Montreal, Canada Jeffrey M. Yarus Material Sciences and Engineering, Case Western Reserve University, Cleveland, OH, USA Jordan M. Yarus Geoscience Lead Site Reliability Engineering Halliburton, Houston, TX, USA Hongfei Zhang State Key Laboratory of Biogeology and Environmental Geology, Wuhan, Hubei, China School of Earth Science, China University of Geosciences, Wuhan, Hubei, China Henglei Zhang Institution of Geophysics and Geomatics, China University of Geosciences, Wuhan, China Junqiang Zhang School of Computer Science, China University of Geosciences, Wuhan, China Jie Zhao School of the Earth Sciences and Resources, China University of Geosciences, Beijing, China Jiancang Zhuang The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tokyo, Japan Sergio Zlotnik E.T.S. de Ingeniería de Caminos, Universitat Politècnica de Catalunya, Barcelona, Spain Renguang Zuo State Key Laboratory of Geological Processes and Mineral Resources, China University of Geosciences, Wuhan, China
A
Accuracy and Precision
P. A. Dowd, School of Civil, Environmental and Mining Engineering, The University of Adelaide, Adelaide, SA, Australia
Definitions

Accuracy is a measure of how close a measured value of a variable is to its true value. In the context of the mathematical geosciences, it could also be a measure of how close an estimated value of a variable is to the unknown true value at a given location when the estimate is obtained as some function of the measured data values. In the latter sense, the accuracy comprises two components: the accuracy of the measurements of the individual data values and the accuracy of the estimate based on those values.

Precision is a measure of the extent to which independent measurements of the same variable agree when using the same measuring device and procedure, i.e., the degree of reproducibility. Precision is also used to refer to the number of significant digits used in the specified measured value. For example, when comparing two measuring devices, if one can measure in smaller increments than the other, it has a higher precision of measurement. To distinguish this definition from the previous one, it will be referred to as the resolution of the measuring device.

As accuracy is independent of precision, a measurement may be precise but inaccurate or it may be accurate but not precise. It is difficult to achieve high accuracy without a good level of precision. A common way of illustrating the concepts of accuracy and precision is the game of darts, in which multiple darts are thrown at a dart board with the aim of hitting the bull’s eye at the center of the board. In terms of the accuracy and precision of the dart thrower, there are four general outcomes (Fig. 1):
Both accuracy and precision are related to uncertainty. Uncertainty arises from the inability to measure a variable exactly and is quantified as the range of values of the variable within which the true value lies. The uncertainty may be due to the calibration of a measuring device or the manner in which a sample is collected and/or analyzed. Uncertainty also refers to the range of values of an estimated value of a variable when the estimate is obtained as some function of a set of measured data values. The most common measure of accuracy is the relative error in a set of n measurements:

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\left|\text{measured value}_i - \text{true value}\right|}{\text{true value}}.$$

The most common measure of precision is the coefficient of variation:

$$\frac{1}{\overline{\text{measured value}}}\sqrt{\frac{\sum_{i=1}^{n}\left(\text{measured value}_i - \overline{\text{measured value}}\right)^2}{n-1}}.$$
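As a minimal numerical illustration of the two measures above, the following Python sketch computes the mean relative error and the coefficient of variation for a small set of hypothetical assay values; the data, the assumed true value, and the averaging of the relative error over the n measurements are illustrative assumptions rather than part of the entry.

import numpy as np

def relative_error(measured, true_value):
    # mean absolute relative error against a known (or assumed) true value
    measured = np.asarray(measured, dtype=float)
    return np.mean(np.abs(measured - true_value)) / true_value

def coefficient_of_variation(measured):
    # sample standard deviation divided by the sample mean
    measured = np.asarray(measured, dtype=float)
    return measured.std(ddof=1) / measured.mean()

assays = [2.31, 2.42, 2.35, 2.48, 2.29]   # hypothetical repeated assays, e.g., g/t gold
print(relative_error(assays, true_value=2.40))
print(coefficient_of_variation(assays))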
The Mathematical Geosciences Context Geoscience variables may be quantitative or qualitative. These variables may be measured directly by collecting physical samples and subjecting them to some process or they may be measured indirectly by using one or more of many sensing devices. Quantitative variables are numerical and they measure specific quantities. They are generally measured on specific volumes and their values are averages over those volumes. For example, the gold content of a specific volume of rock is measured as the proportion of the volume of rock that is gold. As the variance of such variables is inversely proportional to the volume on which they are measured, values measured on different volumes should not be combined without accounting for the volume-variance effect.
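The dependence of the variance on the measurement volume can be illustrated with a small simulation, sketched below in Python under purely synthetic assumptions (lognormal point-support values, blocks formed by averaging 25 points): the block-support variance is markedly smaller than the point-support variance, which is why values measured on different volumes should not be mixed without a volume-variance correction.

import numpy as np

rng = np.random.default_rng(0)

# synthetic point-support "grades" (illustrative only)
point_values = rng.lognormal(mean=0.0, sigma=0.8, size=10_000)

# average non-overlapping groups of 25 points to mimic a larger support volume
block_values = point_values.reshape(-1, 25).mean(axis=1)

print(point_values.var(ddof=1))   # point-support variance
print(block_values.var(ddof=1))   # block-support variance is much smaller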
Accuracy and Precision, Fig. 1 The four general outcomes for the dart thrower: poor accuracy and poor precision; good accuracy and poor precision; poor accuracy and good precision; good accuracy and good precision
Qualitative (or categorical) variables may also be measured on specific volumes and they may be nominal (e.g., rock types or the presence or absence of a characteristic such as lithology) or ordinal (i.e., an ordered rank or sequence in which the distance between the elements is irrelevant). Examples of the latter include measures of mineral hardness or rock strength characteristics. Quantifying the uncertainty of qualitative variables is more problematic; see, for example, Pulido et al. (2003) for examples of methods, and Lindsay et al. (2012) for quantifying geological uncertainty in three-dimensional modelling. All geoscience variables are spatial in the sense that their values are a function of the location at which they are measured, and they generally have a level of spatial correlation that reflects the structure of the geological or geomorphological setting in which they occur. The accuracy of a measurement of a variable can only be known if the true value of the variable is known. If the true value is not known, which is the case for many geoscience variables, then the accuracy of a measurement can only be estimated. Assuming that the measurement instrument is properly calibrated, and the approved measurement process is followed, a practical method of quantifying accuracy is by repeatedly measuring the same sample and expressing the accuracy as:

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\left|\text{measured}_i - \text{average}\right|}{\text{average}}.$$
However, in the geosciences, often only some representative proportion of the entire sample is measured. In addition, the direct quantitative measurement of many types of samples requires destructive testing of a subsample of the total sample. A common example is measuring the mineral content of a cylindrical drill core sample. In common practice, the core is
split longitudinally, and one half is kept for reference. The other half-core is crushed to a given particle size and a subsample is taken, which is then crushed to a smaller particle size and again resampled. This procedure is repeated a number of times until a final subsample is obtained and chemically analyzed for mineral content. Each step of the procedure may introduce error and/or affect the representativity of the subsample and hence affect accuracy and precision. For a detailed account of the sampling of particulate materials, see Gy (1977) and Royle (2001). For a detailed approach to accuracy and precision in sampling particulate materials for analytical purposes, see Gy (1998). For quantifying uncertainty in analytical measurements see Ellison and Williams (2012).
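A rough way to see how the successive splitting and crushing stages degrade precision is a Monte Carlo propagation of per-stage relative errors; the stage error magnitudes and the multiplicative error model in the Python sketch below are illustrative assumptions, not values taken from Gy's theory.

import numpy as np

rng = np.random.default_rng(1)

true_grade = 2.5                          # hypothetical grade of the half-core, g/t
stage_rel_sd = [0.04, 0.03, 0.02, 0.01]   # assumed relative error of each split/crush stage

n_trials = 100_000
grades = np.full(n_trials, true_grade)
for sd in stage_rel_sd:
    # each preparation stage perturbs the subsample grade multiplicatively
    grades *= rng.normal(1.0, sd, size=n_trials)

# the combined relative error is roughly the root-sum-square of the stage errors
print(grades.mean(), grades.std(ddof=1) / grades.mean())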
Resolution of the Measuring Device: Significant Figures
It is useful to distinguish between exact numbers and measured numbers. An exact number is a number that is perfectly known. It could be a number obtained by counting items, such as the number of samples prepared for analysis. It could also be a defined number, such as 1 m = 100 cm. Exact numbers have an infinite number of significant figures. The interest here is in measured numbers, which are not exactly known because of the limitations of the measuring device and process, and in values that are calculated from measured numbers. An indication of the uncertainty of a measured value is the number of significant figures used to report it. Results of calculations should not be reported to a greater level of precision than the data. The rules for determining the number of significant figures are:
• Nonzero digits are always significant.
• Zeros between significant digits are significant. For example, 3045 has four significant digits.
• Zeros to the left of the first nonzero digit are not significant. For example, 0.00542 has three significant digits. This is clearer when using exponential notation: 5.42 × 10⁻³.
• A final zero and all trailing zeros in the decimal portion of a number are significant. For example, 0.0004700 has four significant digits (4700), and all the digits of 6.720 × 10⁴ are significant, including the trailing zero.
• Trailing zeros are not significant in numbers without decimal points. For example, 250 has two significant figures, and 5000, or 5 × 10³, has only one significant digit.
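The rules above are mechanical enough to encode; the short Python function below (an illustrative sketch that operates on the textual representation of a number, since trailing-zero information is lost once a value is parsed as a float) counts significant figures according to these rules.

def significant_figures(value_str):
    # count significant figures of a number given as text, e.g. "0.0004700" or "6.720e4"
    s = value_str.strip().lstrip("+-").lower()
    if "e" in s:                              # keep only the mantissa of exponential notation
        s = s.split("e")[0]
    has_point = "." in s
    digits = s.replace(".", "").lstrip("0")   # leading zeros are never significant
    if not has_point:
        digits = digits.rstrip("0")           # trailing zeros without a decimal point are not significant
    return max(len(digits), 1)

for example in ["3045", "0.00542", "0.0004700", "250", "5000"]:
    print(example, significant_figures(example))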
Sensed Data
Some variables are not directly measured but are extracted from a signal generated by a sensing device. Such a device might be used locally (e.g., lowered into a drillhole to sense mineral grades at incremental depths) or used remotely (e.g., from a satellite or an aircraft to sense features of the surface of the Earth). Widely used subsurface sensing techniques include seismic tomography, magnetotellurics, and ground-penetrating radar. These signal data must be processed to extract the required quantitative or qualitative data. For quantitative variables, such as mineral grades, the depth that the signal penetrates the rock may differ depending on the local characteristics of the rock, and thus each signal measurement at different locations may refer to different volumes. Such measurements cannot be combined without accounting for the volume-variance relationship. Quantification of the accuracy and precision of sensed geoscience data has largely been limited to the surface of the Earth using techniques such as Landsat and LiDAR. These techniques can detect and measure surface features and generate two-dimensional and three-dimensional data with relatively high accuracy and precision. Congalton and Green (2020) is a good reference for quantifying the accuracy and precision of these techniques. Quantifying the accuracy and precision of subsurface sensed data is more complex. In many cases, the detection of significant qualitative features is relatively accurate. However, it is much more difficult to quantify the accuracy and precision of quantitative measurements. The increasing use of sensed quantitative subsurface data requires some meaningful measure of accuracy and precision, especially when these data are integrated with directly measured data, which will have different levels of accuracy and precision. An indication of the increasing use of subsurface sensed data for minerals, energy, and water resources, including the identification and reduction of all sources of uncertainty, can be found in the Deep Earth Imaging Project (CSIRO 2016).
Integrating Sensed and Directly Measured Data
Until recently, sensed data in the geosciences have largely been limited to qualitative geological and geomorphological characteristics and to quantitative variables on the surface, or near subsurface, of the Earth, particularly for applications such as hydrology (e.g., Reichle 2008). For subsurface applications, the exceptions have been quantitative variables, such as elastic moduli and rock-strength parameters, that can be derived from sensed variables such as shear velocity, compressive velocity, and bulk density; see Dowd (1997) for an example of deriving these variables from measured variables. Increasingly, direct quantitative subsurface sensing devices (e.g., measuring mineral grades while drilling) are being developed and implemented. In economic geology and mineral resource applications, these sensed data are used together with directly measured values of the same characteristic or property (e.g., sensed mineral grade and directly measured grade), usually at different locations. This requires the two types of data to be integrated in a way that accommodates the different accuracies and uncertainties of the measured values and the different volumes of the measurements. This is an area of ongoing research in many areas of the mathematical geosciences. See Carrassi et al. (2018) for approaches in environmental modelling.
Using Integrated Data Types for Estimation
Quantitative data may be used to estimate the value of a variable at an unsampled location on the same volume as the data or, more often, on a significantly larger volume. Qualitative data may be used to estimate the presence or absence of a characteristic or feature at an unsampled location, or to estimate the boundaries of features, or in the various forms of geological domaining. These (statistical or geostatistical) estimates are, or should be, accompanied by a measure of the accuracy of the estimate. In this context, the accuracy refers to the uncertainty of the estimate. The estimate will also be reported with a level of precision. For various reasons, more often than not, the measure of the uncertainty of the estimate does not include the uncertainty of the data used in the estimate. See de Marsily (1986) for kriging with uncertain data, and Cecinati et al. (2018) for an application. As the amount of sensed data increases, there is a corresponding need to account appropriately for the relative uncertainties of directly measured and sensed data, especially in mineral resource applications and more generally in economic geology. The data are also used to inform the model used for estimation; for example, the variogram used in geostatistical estimation (cf. Journel and Huijbregts 1978, 1984). The amount of data available, and their relative spatial locations,
will affect the quantification of spatial variability in the variogram model, which in turn will affect the accuracy of estimates. The various forms of kriging yield estimation variances that quantify the accuracy of the estimate, but the quantification does not generally include the uncertainty of the parameters in the variogram model. In addition, sparse data at significant distances from the location at which the estimate is made will overestimate low values and underestimate high values (Goovaerts 1997), thus affecting the accuracy and precision of the estimate. There is a need, especially in economic geology and mineral resource estimation, for the adequate incorporation of these sources of uncertainty in the estimates.
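To make the notion of an estimation variance concrete, the following Python sketch solves the ordinary kriging system for a single target location under an assumed isotropic exponential covariance model; the data coordinates, sill, and range are hypothetical, and, as discussed above, the resulting kriging variance reflects neither the uncertainty of the data values nor that of the variogram parameters.

import numpy as np

def exp_cov(h, sill=1.0, range_a=300.0):
    # isotropic exponential covariance C(h) = sill * exp(-3h / a)
    return sill * np.exp(-3.0 * np.asarray(h) / range_a)

def ok_weights_and_variance(data_xy, target_xy, sill=1.0, range_a=300.0):
    # ordinary kriging weights, Lagrange multiplier, and kriging variance for one target
    d = np.asarray(data_xy, dtype=float)
    n = len(d)
    dd = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=-1)              # data-data distances
    d0 = np.linalg.norm(d - np.asarray(target_xy, dtype=float), axis=-1)     # data-target distances
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(dd, sill, range_a)
    A[n, n] = 0.0
    b = np.append(exp_cov(d0, sill, range_a), 1.0)
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    ok_variance = sill - w @ exp_cov(d0, sill, range_a) - mu
    return w, ok_variance

weights, variance = ok_weights_and_variance([(0, 0), (100, 0), (0, 150)], (50, 50))
print(weights, variance)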
Summary
This entry summarizes the concepts of accuracy and precision in the context of the mathematical geosciences. Both concepts relate to direct, and sensed, in situ measurements of variables, to the integration of these two types of measurement, to the quantification of the spatial variability of these two types of measurement, and to the use of the measurements to estimate values of variables at unsampled locations. It also identifies areas that require further research to enable a greater understanding of accuracy and precision and to include additional sources of uncertainty that affect accuracy and precision.

Cross-References
▶ Geologic Time Scale Estimation
▶ Geostatistics
▶ Kriging
▶ Variogram
▶ Variance

Bibliography
Carrassi A, Bocquet M, Bertino L, Evensen G (2018) Data assimilation in the geosciences: an overview of methods, issues, and perspectives. WIREs Clim Change 9(5)
Cecinati F, Moreno-Ródenas AM, Rico-Ramirez MA, ten Veldhuis M-C, Langeveld JG (2018) Considering rain gauge uncertainty using kriging for uncertain data. Atmosphere 9:446
Congalton RG, Green K (2020) Assessing the accuracy of remotely sensed data: principles and practices, 3rd edn. CRC Press, 348pp
CSIRO (2016) Deep Earth Imaging. https://research.csiro.au/dei/about-us/
De Marsily G (1986) Quantitative hydrogeology: groundwater hydrology for engineers. Academic, San Diego, 440pp
Dowd PA (1997) Geostatistical characterization of three-dimensional spatial heterogeneity of rock properties at Sellafield. Trans Inst Min Metall, Section A 106:A133–A147
Ellison SLR, Williams A (eds) (2012) Quantifying uncertainty in analytical measurement. EURACHEM/CITAC Guide CG 4, 3rd edn. ISBN 978-0-948926-30-3
Gy P (1977) The sampling of particulate materials – theory and practice. Elsevier, 431pp. ISBN 978-0444420794
Gy P (1998) Sampling for analytical purposes. Wiley, 153pp. ISBN 0-471-97956-2
Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic, 610pp. ISBN-13 978-0123910509. Re-published by Blackburn Press (2004), ISBN-13 978-1930665910
Lindsay MD, Aillères L, Jessell MW, de Kemp EA, Betts PG (2012) Locating and quantifying geological uncertainty in three-dimensional models: analysis of the Gippsland Basin, southeastern Australia. Tectonophysics 546–547:10–27
Pulido A, Ruisánchez I, Boqué R, Rius FX (2003) Uncertainty of results in routine qualitative analysis. Trends Anal Chem 22(10):647–654
Reichle RH (2008) Data assimilation methods in the Earth sciences. Adv Water Resour 31:1411–1418
Royle AG (2001) Simulations of gold exploration samples. Trans Inst Min Metall: Section B Appl Earth Sci 110(3):136–144

Additive Logistic Normal Distribution
Gianna Serafina Monti1, Glòria Mateu-Figueras2,3 and Karel Hron4
1 Department of Economics, Management and Statistics, University of Milano-Bicocca, Milan, Italy
2 Department of Informatics, Applied Mathematics and Statistics, University of Girona, Girona, Spain
3 Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Girona, Spain
4 Department of Mathematical Analysis and Applications of Mathematics, Palacký University, Olomouc, Czech Republic

Synonyms
Logistic normal; Logratio normal; Normal on the simplex

Definition
The additive logistic normal model is a family of distributions of compositional data defined on the simplex using the logratio approach (Aitchison 1986; Pawlowsky-Glahn et al. 2015) and the multivariate normal distribution for real random vectors. It is the most used distribution on the simplex, introduced by Aitchison (1982) as an alternative to the well-known Dirichlet model. For the construction of the distribution, the standard strategy of the logratio methodology, the principle of working on coordinates, is applied; specifically, the multivariate normal distribution in the real space is used to model the additive logratio representation of the random composition. Originally, the model was defined using the additive logratio (alr) representation, and hence the name additive logistic normal was assigned. However, other orthonormal logratio representations, originally known as isometric logratio (ilr) representations (Egozcue et al. 2003), can be used too. For this reason, the term isometric logistic normal or, more generally, the term logistic normal is adopted for this model as well. Later, motivated by the principle of working on coordinates, the term normal distribution on the simplex was introduced to emphasize the idea of staying on the simplex (Mateu-Figueras et al. 2013). Nowadays, the term logratio normal is also preferred by some authors (Comas-Cufí et al. 2016) to highlight the use of logratio coordinates instead of the logistic transformation.
Sample Space of Compositional Data and Its Geometry
The content of this section is largely covered in the chapter ▶ "Additive Logistic Skew-Normal Distribution," and in more detail in the chapter ▶ "Compositional Data". For readers' convenience, we introduce here only notation and some essential concepts useful in the following. The simplex is the sample space of random compositions with D parts. It is denoted as

$$\mathcal{S}^D = \left\{ \mathbf{x} = (x_1, \ldots, x_D) : x_i > 0,\ i = 1, 2, \ldots, D,\ \sum_{i=1}^{D} x_i = k \right\},$$
where k is a constant, usually taken to be 1 or 100. The simplex S^D has a (D − 1)-dimensional real Euclidean vector space structure with the following operations (Pawlowsky-Glahn et al. 2015):

• Perturbation: $\mathbf{x} \oplus \mathbf{y} = \mathcal{C}(x_1 y_1, \ldots, x_D y_D)$,
• Powering: $a \odot \mathbf{x} = \mathcal{C}(x_1^a, \ldots, x_D^a)$,
• Inner product: $\langle \mathbf{x}, \mathbf{y} \rangle_a = \sum_{i=1}^{D} \ln\dfrac{x_i}{g_m(\mathbf{x})}\,\ln\dfrac{y_i}{g_m(\mathbf{y})}$,
for x, y ∈ S^D, a scalar a ∈ ℝ, g_m(x) the geometric mean of the parts of x, and C the closure operator which normalizes any vector to the constant sum k. This algebraic-geometric structure of the sample space assures the existence of a basis and the corresponding coordinate representation of compositions. These coordinates are real vectors that obey the standard Euclidean geometry. Consequently, once a basis has been chosen, all standard geometric operations, or statistical methods, can be applied to compositional data in coordinates and transferred to the simplex by preserving their properties. This is known as the principle of working on coordinates (see the ▶ "Compositional Data" entry). Some frequently used coordinate representations are:

• $\operatorname{alr}(\mathbf{x}) = \left(\ln\dfrac{x_1}{x_D}, \ldots, \ln\dfrac{x_{D-1}}{x_D}\right) \in \mathbb{R}^{D-1}$,
• $\operatorname{clr}(\mathbf{x}) = \left(\ln\dfrac{x_1}{g_m(\mathbf{x})}, \ldots, \ln\dfrac{x_D}{g_m(\mathbf{x})}\right) \in \mathbb{R}^{D}$, where $\sum_{i=1}^{D} \ln\dfrac{x_i}{g_m(\mathbf{x})} = 0$,
• $h(\mathbf{x}) = \left(\langle \mathbf{x}, \mathbf{e}_1 \rangle_a, \ldots, \langle \mathbf{x}, \mathbf{e}_{D-1} \rangle_a\right) \in \mathbb{R}^{D-1}$, where $\{\mathbf{e}_1, \ldots, \mathbf{e}_{D-1}\}$ is an orthonormal basis of $\mathcal{S}^D$.
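A compact Python sketch of these three representations is given below; the olr function uses one particular orthonormal basis (pivot-style coordinates) purely for illustration, since any orthonormal basis of the simplex would serve equally well.

import numpy as np

def closure(x, k=1.0):
    x = np.asarray(x, dtype=float)
    return k * x / x.sum()

def alr(x):
    x = np.asarray(x, dtype=float)
    return np.log(x[:-1] / x[-1])

def clr(x):
    x = np.asarray(x, dtype=float)
    g = np.exp(np.mean(np.log(x)))          # geometric mean of the parts
    return np.log(x / g)

def olr(x):
    # coordinates w.r.t. one particular orthonormal (pivot) basis
    x = np.asarray(x, dtype=float)
    D = len(x)
    z = np.empty(D - 1)
    for i in range(D - 1):
        gm_rest = np.exp(np.mean(np.log(x[i + 1:])))
        z[i] = np.sqrt((D - i - 1) / (D - i)) * np.log(x[i] / gm_rest)
    return z

x = closure([0.2, 0.3, 0.5])
print(alr(x), clr(x), olr(x))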
Note that the vectors alr(x) and clr(x), where the latter stands for centered logratio (clr) coefficients, are non-orthogonal logratio representations. In particular, the clr representation has one extra dimension, which makes it a constrained vector. The vector h(x) is an orthonormal logratio representation. It contains the coordinates of x with respect to the orthonormal basis {e1, . . ., eD−1}. Egozcue et al. (2003) introduced it using a particular orthonormal basis, naming it the isometric logratio (ilr(x)) representation. Quite recently, Martín-Fernández (2019) renamed it the orthogonal logratio (olr(x)) representation to emphasize this intrinsic property. There are several popular ways to build such an orthonormal basis (Egozcue and Pawlowsky-Glahn 2005; Filzmoser et al. 2018). In order to avoid confusion, here we will denote a generic orthonormal representation as h(x). Also, given a real vector of olr coordinates y, we will denote the corresponding composition as h⁻¹(y). The different logratio coordinate representations can be linked through the change of basis matrix. We briefly review the concept of subcomposition, used in the remainder of the chapter. Given a random composition X ∈ S^D, a C-part subcomposition is a subvector in S^C formed only by C parts (C < D). A subcomposition can be obtained as C(SX), where S is a (C, D)-matrix with C elements equal to 1 (one in each row and at most one in each column) and the remaining elements equal to 0. Furthermore, to define probability laws it is important to establish a measure with respect to which our density function is expressed. Traditionally, the Lebesgue measure λ has been used in the simplex, but the Aitchison measure λa is preferred in the context of the Aitchison geometry (Mateu-Figueras and Pawlowsky-Glahn 2008). It can be proved that λa is absolutely continuous with respect to λ. The relationship between them is described by the Jacobian

$$\frac{d\lambda_a}{d\lambda} = \frac{1}{\sqrt{D}\,\prod_{i=1}^{D} x_i}, \qquad (1)$$

and allows one to express density functions on S^D with respect to either λ or λa.
The Logistic Normal Model Definition The additive logistic normal (aln) model is defined by Aitchison (1982) using the alr coordinates. Indeed, a D-part random composition X is said to have an aln distribution when its additive logratio representation, alr(X), has a (D – 1)-dimensional normal distribution in the space of
coordinates. The corresponding density function on S^D with respect to the measure λ is

$$f(\mathbf{x}) = \frac{(2\pi)^{-(D-1)/2}|\boldsymbol{\Sigma}|^{-1/2}}{\prod_{i=1}^{D} x_i}\exp\left\{-\tfrac{1}{2}(\operatorname{alr}(\mathbf{x})-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\operatorname{alr}(\mathbf{x})-\boldsymbol{\mu})\right\}. \qquad (2)$$

This distribution will be denoted as X ~ N_S^D(μ, Σ) (Mateu-Figueras et al. 2013). Note that the total number of parameters of an aln distribution is (D − 1)(D + 2)/2. We can express density (2) in terms of any orthonormal logratio representation h(X). We only have to use the matrix relationship between the two logratio representations, h(X) = A alr(X) (see the ▶ "Additive Logistic Skew-Normal Distribution" entry for the matrix A), and the linear transformation property of the normal distribution. The resulting density is defined on the simplex and also expressed with respect to the Lebesgue measure λ:

$$f(\mathbf{x}) = \frac{(2\pi)^{-(D-1)/2}|\boldsymbol{\Theta}|^{-1/2}}{\sqrt{D}\,\prod_{i=1}^{D} x_i}\exp\left\{-\tfrac{1}{2}(h(\mathbf{x})-\boldsymbol{\xi})'\boldsymbol{\Theta}^{-1}(h(\mathbf{x})-\boldsymbol{\xi})\right\}, \qquad (3)$$

where

$$\boldsymbol{\xi} = A\boldsymbol{\mu}, \qquad \boldsymbol{\Theta} = A\boldsymbol{\Sigma}A'. \qquad (4)$$

A similar strategy can be applied to obtain the density in terms of the clr representation, but it has limited use in practice because a degenerate density is obtained due to the zero-sum constraint of the vector clr(X). Both densities (2) and (3) define exactly the same probability law on S^D and are defined with respect to the Lebesgue measure λ. Using the relationship (1), it is easy to express it with respect to the Aitchison measure λa as

$$f_a(\mathbf{x}) = (2\pi)^{-(D-1)/2}|\boldsymbol{\Theta}|^{-1/2}\exp\left\{-\tfrac{1}{2}(h(\mathbf{x})-\boldsymbol{\xi})'\boldsymbol{\Theta}^{-1}(h(\mathbf{x})-\boldsymbol{\xi})\right\}; \qquad (5)$$

here the subindex a is only used as a reminder that the density is expressed with respect to λa. Mateu-Figueras et al. (2013) define the logistic normal model using density (5) and call it the normal distribution on the simplex. Even though this change of measure gives the same law of probability, it produces changes in some characteristic values of the distribution. An interested reader can find a detailed explanation in Mateu-Figueras et al. (2013). If we want to study properties of the distribution directly on the simplex, it is better to use the density with respect to the Aitchison measure, as it is compatible with the simplex space structure. Moreover, the Aitchison measure enables further generalization of the logistic normal distribution, if parts of the random composition are weighted, for example, according to measurement precision (Egozcue and Pawlowsky-Glahn 2016; Hron et al. 2022). However, for the computation of the probability of an event we have to use the density with respect to the Lebesgue measure in order to compute the integrals. In Fig. 1 we represent the isodensity curves of two logistic normal densities on (a) S³ with respect to λa and on (b) ℝ² using the orthonormal logratio representation with respect to the particular orthonormal basis defined in Egozcue et al. (2003). Note that using the coordinate representation, the typical circumferences and ellipses of the multivariate normal in real space are obtained as isodensity curves. The isodensity curves on the simplex are the corresponding logratio circumferences and logratio ellipses.

Additive Logistic Normal Distribution, Fig. 1 Isodensity curves of two logistic normal models (a) on S³ and (b) on ℝ² using an orthonormal logratio representation. The solid line corresponds to ξ = (1, 0), Θ = I and the dash-dotted line to ξ = (1, 1), Θ = (0.2, 0.8; 0.8, 6)
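Density (2) can be evaluated directly through the alr representation, as in the following Python sketch; the composition, μ, and Σ used in the call are arbitrary illustrative values.

import numpy as np
from scipy.stats import multivariate_normal

def aln_logpdf(x, mu, Sigma):
    # log of density (2), i.e., the aln density w.r.t. the Lebesgue measure,
    # evaluated through the alr coordinates plus the Jacobian term 1/prod(x_i)
    x = np.asarray(x, dtype=float)
    y = np.log(x[:-1] / x[-1])                      # alr coordinates
    log_normal = multivariate_normal(mu, Sigma).logpdf(y)
    log_jacobian = -np.sum(np.log(x))
    return log_normal + log_jacobian

x = np.array([0.2, 0.3, 0.5])
print(aln_logpdf(x, mu=[0.0, 0.0], Sigma=np.eye(2)))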
Key Properties
Real random vectors obtained as linear transformations of a multivariate normal distribution on real space are still multivariate normally distributed. This is the well-known linear transformation property of the multivariate normal. Applying this property to the logratio vectors, we can easily prove the closure under perturbation, power transformation, and subcomposition of the logistic normal family of distributions. We list here the main properties using the density in terms of the logratio representation h(x) with respect to a generic orthonormal basis, but the density in terms of alr(x) could also be used. An interested reader can find a detailed proof of each property in Mateu-Figueras et al. (2013).

Property 1 (Closure under ⊕ and ⊙) Let X ~ N_S^D(ξ, Θ) be a random composition, a ∈ S^D, and b ∈ ℝ. Then, the D-part random composition X* = a ⊕ (b ⊙ X) ~ N_S^D(h(a) + bξ, b²Θ). From this property, and using the density function with respect to the Aitchison measure λa, we obtain the invariance under the perturbation operation, that is, f_a(a ⊕ x) = f_a(x). This has important consequences, because when working with compositional data, it is usual to center the input data set, which can be expressed as perturbation by a compositional vector (the center of the distribution). However, this property would not be obtained using the density with respect to the measure λ. Accordingly, although both densities define the same probability law, this is one of the differences between them.
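Property 1 can be checked numerically: in the Python sketch below (written in alr rather than olr coordinates, which is the analogous affine statement), compositions are simulated from a logistic normal model, perturbed and powered, and the sample mean of the transformed alr coordinates is compared with the affine image of the original parameters; all numerical values are illustrative.

import numpy as np

rng = np.random.default_rng(2)

def closure(x):
    return x / x.sum(axis=-1, keepdims=True)

def alr(x):
    return np.log(x[..., :-1] / x[..., -1:])

# simulate a logistic normal composition: normal alr coordinates mapped back to the simplex
y = rng.multivariate_normal(mean=[0.3, -0.1], cov=[[0.4, 0.1], [0.1, 0.2]], size=5000)
X = closure(np.exp(np.column_stack([y, np.zeros(len(y))])))   # inverse alr

a = np.array([0.5, 0.3, 0.2])     # perturbing composition
b = 1.7                           # powering scalar
Xstar = closure(a * X**b)         # a ⊕ (b ⊙ X)

# alr of the perturbed/powered composition is alr(a) + b*alr(X),
# so its sample mean should match the transformed parameters up to sampling error
print(alr(Xstar).mean(axis=0), alr(a) + b * np.array([0.3, -0.1]))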
Property 2 (Closure under subcomposition) Let X ~ N_S^D(ξ, Θ) and X_S = C(SX) be a C-part random subcomposition. Then X_S ~ N_S^C(ξ_S, Θ_S) with

$$\boldsymbol{\xi}_S = U^{*\prime} S U \boldsymbol{\xi}, \qquad \boldsymbol{\Theta}_S = \left(U^{*\prime} S U\right)\boldsymbol{\Theta}\left(U^{*\prime} S U\right)',$$

where the columns of the matrices U and U* contain the clr representation of the corresponding orthonormal basis in S^D and S^C. The center of a random composition X plays the role of the expected value for a real random vector (Pawlowsky-Glahn et al. 2015). Mateu-Figueras et al. (2013) show that this center can be expressed as the expected value using the density function with respect to λa; for this reason it is denoted as Ea(X).

Property 3 (Location) Let X ~ N_S^D(ξ, Θ); then the mode and the expected value with respect to the measure λa coincide and are mode_a(X) = Ea(X) = h⁻¹(ξ), independently of the chosen orthonormal coordinates h(x). The expected value with respect to the Lebesgue measure corresponds to the standard expected value for real random vectors. Although it exists, there is no closed form for this value because the corresponding integral is not reducible to any simple form, and numerical integration should be used. Nevertheless, for random compositions, the use of Ea(X) is recommended with the same arguments as, when working with a compositional data set, the center, that is, the sample geometric mean, is used instead of the arithmetic mean. Variability of random compositions can be measured with the metric variance, also called the total variance (Aitchison 1986).

Property 4 (Metric variance) Let X ~ N_S^D(ξ, Θ); then Mvar(X) = trace(Θ).

Given a compositional data set, the parameters of the logistic normal model, that is, the vector of expected values ξ and the covariance matrix Θ, may be estimated using the maximum likelihood method based on the logratio representation of the data set. If we change the logratio representation, the estimates will be related through the corresponding transformation matrix A as in (4); the only deviations will be due to the numerical nature of the procedure. Thus, the estimates are invariant under the choice of the logratio representation. Finally, given a compositional data set, we can validate the assumption of logistic normality via testing for multivariate normality of the olr coordinate representation using a proper goodness-of-fit test. In this case, univariate normality tests applied to each marginal are not invariant under the choice of the logratio representation, but the χ² goodness-of-fit test applied to the Mahalanobis distances or the multivariate Shapiro-Wilk test of normality is invariant.

Applications to Geosciences
In this section, by means of some compositional data analyses, we briefly show the central role of the logistic normal distribution in geosciences. Like the normal distribution in real space, the logistic normal distribution is used in many statistical compositional applications and represents an essential distribution for statistical inference on the simplex, such as regression analysis, hypothesis testing, and the construction of confidence intervals. We consider, as a first example, the GEMAS data set (Reimann et al. 2014a, b), a geochemical data set on agricultural and grazing land soil, collected as the GEMAS data in the R package robCompositions (Templ et al. 2011), to show the fit of the logratio normal distribution. Here the complementary particle size distributions are considered, that is, the three-part compositions x = (sand, silt, clay). For this example only samples from Italy are used, filtered from outliers (Filzmoser and Hron 2008), resulting in n = 109 soil samples. Using the orthonormal logratio representation called pivot coordinates
$$h(\mathbf{x}) = \left( \sqrt{2/3}\,\ln\frac{\mathrm{sand}}{\sqrt{\mathrm{silt}\cdot\mathrm{clay}}},\ \sqrt{1/2}\,\ln\frac{\mathrm{silt}}{\mathrm{clay}} \right)$$

(Filzmoser et al. 2018), the following parameter estimates are obtained: ξ̂ = (0.245, 0.346) and Θ̂ = (0.285, 0.031; 0.031, 0.084). Goodness-of-fit tests applied to each marginal show no significant departures from normality (p-values > 0.1 for the Anderson-Darling and Shapiro-Wilk tests). The Anderson-Darling test applied to the Mahalanobis distances to check the χ²₂ assumption shows no significant departure (p-value > 0.1). The fitted isodensity curves on S³ are displayed in Fig. 2. For the multivariate analysis of variance (MANOVA) to test whether significant differences exist in the three-part compositions of particle size distributions among soil groups, multivariate logistic normality is one of the fundamental assumptions. In other words, we assume that the dependent variables, that is, the (sand, silt, clay) compositions, follow multivariate logistic normality within the groups defined by the soil classes. The Shapiro-Wilk test for multivariate normality applied to the two pivot coordinates shows no significant strong departures from the null hypothesis (p-values = 0.1). Figure 3 reports normal Q-Q plots of the residuals of the MANOVA for the two pivot coordinates, which confirm that a normal distribution is reasonable in both cases. Finally, the MANOVA test suggests significant differences between the soil groups (p-value < 0.001 for the Pillai test). Note that we would obtain the same inference in multivariate linear regression, using the soil classes as a categorical covariate. Similar to the MANOVA, Linear Discriminant Analysis (LDA) for compositions, principal component analysis, and canonical correlation are other techniques which use the assumption of logistic normality to represent group distributions. For demonstration purposes, we use a second example related to the whole-rock composition of lavas from the Atacazo and Ninahuilca volcanoes, Ecuador (Hidalgo et al. 2008). In particular, we look at the major elements (sub)composition with five parts: Fe2O3, MgO, K2O, La, and Yb. The logistic normality assumption is confirmed by the univariate p-values of the Shapiro-Wilk tests applied to each pivot coordinate listed in Table 1; all reported p-values are greater than 0.05 within each level of the grouping variable, related to the different volcanoes. Moreover, the last row of Table 1 reports results of the Shapiro-Wilk test for multivariate normality within the two groups, which again are compatible with the assumption. The purpose of LDA in this example is to find the linear combinations of the original variables, namely the four pivot coordinates, that give the best possible separation between the groups in the data set. The LDA results offer a satisfactory separation of the two groups; in fact the model accuracy is equal to 0.945. The discriminant scores themselves are approximately normally distributed within groups, as we can see from Fig. 4.
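The first steps of such an analysis can be sketched in Python as follows; since the GEMAS subset itself is not reproduced here, the compositions are replaced by synthetic Dirichlet draws of the same size, and the Shapiro-Wilk checks on the pivot coordinates mirror the tests reported above only in spirit.

import numpy as np
from scipy import stats

def pivot_coords(parts):
    # pivot olr coordinates for an (n, D) array of compositions
    parts = np.asarray(parts, dtype=float)
    n, D = parts.shape
    z = np.empty((n, D - 1))
    for i in range(D - 1):
        gm_rest = np.exp(np.mean(np.log(parts[:, i + 1:]), axis=1))
        z[:, i] = np.sqrt((D - i - 1) / (D - i)) * np.log(parts[:, i] / gm_rest)
    return z

# hypothetical (sand, silt, clay) proportions standing in for the GEMAS subset
rng = np.random.default_rng(3)
comp = rng.dirichlet([4.0, 5.0, 3.0], size=109)

z = pivot_coords(comp)
xi_hat = z.mean(axis=0)                 # estimate of the coordinate mean vector
theta_hat = np.cov(z, rowvar=False)     # estimate of the coordinate covariance matrix

print(xi_hat, theta_hat)
# marginal normality checks on the coordinates, as in the text
print([stats.shapiro(z[:, j]).pvalue for j in range(z.shape[1])])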
Additive Logistic Normal Distribution, Fig. 2 Fitted isodensity curves for the logistic normal model to the composition (sand, silt, clay) of the GEMAS data set. Samples from Italy were taken and filtered from outliers (n = 109)

Additive Logistic Normal Distribution, Fig. 3 Normal Q-Q plots of the residuals of the MANOVA for the two pivot coordinates, sand_si.cl and silt_cl (GEMAS data set)

Additive Logistic Normal Distribution, Table 1 Results of the univariate, and multivariate (last row), Shapiro-Wilk normality tests (test statistic W and p-value) within the Atacazo and Ninahuilca groups, for the pivot coordinates √(4/5) ln(Fe2O3/(MgO·K2O·La·Yb)^(1/4)), √(3/4) ln(MgO/(K2O·La·Yb)^(1/3)), √(2/3) ln(K2O/(La·Yb)^(1/2)), and √(1/2) ln(La/Yb)

Summary
In this chapter the logistic normal model, a follower of the additive logistic normal distribution compatible with the algebraic-geometric structure of the simplex, was presented as a general tool for distributional modeling of compositional data in geosciences. It is flexible enough to model data
Additive Logistic Skew-Normal Distribution

$$\mathcal{S}^D = \left\{ \mathbf{x} = (x_1, \ldots, x_D) : x_i > 0,\ i = 1, \ldots, D,\ \sum_{i=1}^{D} x_i = k \right\},$$
where k is a constant, usually taken to be one (proportional representation of compositional data) or one hundred (representation in percentages). Below, the algebraic-geometric structure of the simplex, contained also in the entry ▶ "Compositional Data", is briefly recalled for the purpose of this chapter. The simplex S^D has a (D − 1)-dimensional real Euclidean vector space structure with the following operations (Pawlowsky-Glahn et al. 2015):

• Perturbation: $\mathbf{x} \oplus \mathbf{y} = \mathcal{C}(x_1 y_1, \ldots, x_D y_D)$,
• Powering: $a \odot \mathbf{x} = \mathcal{C}(x_1^a, \ldots, x_D^a)$,

for x, y ∈ S^D and a scalar a ∈ ℝ. The operator C is the closure, which normalizes the compositional vector in the argument by dividing each component by the sum of all components and multiplying them by the constant k. Also, an inner product can be defined that induces a norm and a distance (see the ▶ "Compositional Data" entry for details). This algebraic-geometric structure of the sample space assures the existence of a basis and the corresponding coordinate representation of compositions. These coordinates are real vectors that obey the standard Euclidean geometry, with the sum of two vectors and the multiplication of a vector by a scalar instead of perturbation and powering. Consequently, once a basis has been chosen, all standard geometric operations or statistical methods can be applied to coordinates and transferred to the simplex by preserving their properties. This is known as the principle of working on coordinates (see the ▶ "Compositional Data" entry). Some frequently used coordinate representations defined by Aitchison (1986) involve logratios:

$$\operatorname{alr}(\mathbf{x}) = \left(\ln\frac{x_1}{x_D}, \ldots, \ln\frac{x_{D-1}}{x_D}\right), \qquad \operatorname{clr}(\mathbf{x}) = \left(\ln\frac{x_1}{g_m(\mathbf{x})}, \ldots, \ln\frac{x_D}{g_m(\mathbf{x})}\right).$$

The additive logratio (alr) representation is a (D − 1)-dimensional vector which corresponds to coordinates with respect to an oblique basis. Consequently, the geometric operations are preserved, as alr(x ⊕ y) = alr(x) + alr(y) and alr(a ⊙ x) = a·alr(x), but distances and orthogonal projections computed using the alr coordinates will not be the same as those computed with the original compositions using the Aitchison distance or inner product. This is the main drawback of the additive logratio coordinates. The centered logratio (clr) representation is a D-dimensional vector obtained as coefficients with respect to a generating system,
not coordinates of a basis. We have one extra dimension, and for this reason the clr vector is constrained: the sum of its components equals zero. Although this representation is widely used in the literature because geometric operations, distances, and the inner product are preserved, its use to define families of distributions is very limited because degenerate densities are obtained. We can also consider coordinates with respect to an orthonormal basis, initially called the isometric logratio (ilr) representation, but recently renamed the orthogonal logratio (olr) representation, which better reflects its intrinsic properties. Egozcue et al. (2003) proposed a particular orthonormal basis for the construction of these coordinates, but there are also other popular ways to build such a basis. One of them, explained in the ▶ "Compositional Data" entry, uses a sequential binary partition of a generic composition and is called balances (Egozcue and Pawlowsky-Glahn 2005). Some other olr coordinate systems, called pivot coordinates, highlight the role of single compositional parts and were proposed in Filzmoser et al. (2018). In order to avoid confusion, here we will denote a generic orthonormal representation as h(x). Also, given a real vector of olr coordinates y, we will denote the corresponding composition as h⁻¹(y). Different logratio coordinate representations can be linked through a matrix which stands for the change of basis. For example, we can construct a (D − 1) × (D − 1) matrix A which relates the alr representation with a particular orthonormal representation h(x) as

$$h(\mathbf{x}) = A\,\operatorname{alr}(\mathbf{x}) \quad \text{or} \quad \operatorname{alr}(\mathbf{x}) = A^{-1} h(\mathbf{x}), \qquad (1)$$

where A = U′F⋆, U is a (D, D − 1)-matrix which contains in its columns the clr representation of the orthonormal basis, and F⋆ is the (D, D − 1)-matrix

$$F^{\star} = \frac{1}{D}\begin{pmatrix} D-1 & -1 & \cdots & -1 \\ -1 & D-1 & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & D-1 \\ -1 & -1 & \cdots & -1 \end{pmatrix}.$$
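The change-of-representation matrix can be built and checked numerically; in the Python sketch below, the orthonormal basis is taken to be the pivot basis purely as an example, and the test verifies that mapping alr coordinates through F⋆ reproduces the clr coefficients and hence the same olr coordinates.

import numpy as np

def pivot_basis_clr(D):
    # columns: clr representation of one particular orthonormal (pivot) basis of S^D
    U = np.zeros((D, D - 1))
    for i in range(D - 1):
        r = D - i - 1                          # number of "remaining" parts
        U[i, i] = np.sqrt(r / (r + 1))
        U[i + 1:, i] = -np.sqrt(r / (r + 1)) / r
    return U

def F_star(D):
    # the (D, D-1) matrix relating alr coordinates to clr coefficients
    F = np.full((D, D - 1), -1.0 / D)
    F[:D - 1, :] += np.eye(D - 1)
    return F

D = 4
U, F = pivot_basis_clr(D), F_star(D)
A = U.T @ F                                     # h(x) = A alr(x)

x = np.array([0.1, 0.2, 0.3, 0.4])
alr_x = np.log(x[:-1] / x[-1])
clr_x = np.log(x / np.exp(np.mean(np.log(x))))
print(np.allclose(U.T @ clr_x, A @ alr_x))      # olr coordinates via both routes agree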
Given a D-part random composition X, we may be interested in the composition formed only by C parts (C < D). This can be regarded as a projection onto a simplex with fewer parts and, accordingly, a random subcomposition is obtained. Formally, the formation of a subcomposition can be achieved as C(SX), where S is a (C, D)-matrix with C elements equal to 1 (one in each row and at most one in each column) and the remaining elements equal to 0. To define probability laws it is important to establish a measure with respect to which our density function is expressed. Traditionally, the Lebesgue measure λ has been
used in the simplex, but the alternative measure, called the Aitchison measure and denoted as λa, is preferred in the context of the Aitchison geometry (Mateu-Figueras and Pawlowsky-Glahn 2008). This measure is defined using the Lebesgue measure in the space of olr coordinates, and it is compatible with the vector space structure defined above. It can be proved that the Aitchison measure λa is absolutely continuous with respect to the Lebesgue measure λ. The relationship between them is

$$\frac{d\lambda_a}{d\lambda} = \frac{1}{\sqrt{D}\,\prod_{i=1}^{D} x_i}. \qquad (2)$$

This relationship allows one to express density functions on S^D with respect to λ or λa. Specifically, the chain rule for measures relates two probability densities, dP/dλa and dP/dλ, as dP/dλ = (dP/dλa)(dλa/dλ).
The Multivariate Skew-Normal Model
According to the definition given by Azzalini and Capitanio (1999), a (D − 1)-variate random vector Y follows a multivariate skew-normal (sn) distribution on ℝ^(D−1) with parameters μ, Σ, and α if its density function is given by

$$f(\mathbf{y}) = 2(2\pi)^{-(D-1)/2}|\boldsymbol{\Sigma}|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{y}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu})\right\}\Phi\left(\boldsymbol{\alpha}'\mathbf{v}^{-1}(\mathbf{y}-\boldsymbol{\mu})\right), \qquad (3)$$
where Φ is the standard normal distribution function and v is the square root of the diagonal matrix formed from Σ. The (D − 1)-variate vector α regulates the shape of the density and indicates the direction of maximum skewness. The sn family is only able to model moderate levels of skewness, as the skewness coefficient of each univariate sn marginal takes values in the interval (−0.995, 0.995). When α = 0, the multivariate normal density with parameters μ and Σ is obtained. In the following, the notation Y ~ SN_(D−1)(μ, Σ, α) is used. The moment generating function provided in Azzalini and Capitanio (1999) is

$$M_{\mathbf{Y}}(\mathbf{t}) = 2\exp\left(\mathbf{t}'\boldsymbol{\mu} + \tfrac{1}{2}\mathbf{t}'\boldsymbol{\Sigma}\mathbf{t}\right)\Phi(\boldsymbol{\delta}'\mathbf{v}\mathbf{t}), \qquad (4)$$

where

$$\boldsymbol{\delta} = \frac{\mathbf{v}^{-1}\boldsymbol{\Sigma}\mathbf{v}^{-1}\boldsymbol{\alpha}}{\sqrt{1 + \boldsymbol{\alpha}'\mathbf{v}^{-1}\boldsymbol{\Sigma}\mathbf{v}^{-1}\boldsymbol{\alpha}}}. \qquad (5)$$
The expected value and the covariance matrix derived from (4) are
$$E(\mathbf{Y}) = \boldsymbol{\mu} + \sqrt{2/\pi}\,\mathbf{v}\boldsymbol{\delta}, \qquad \operatorname{Var}(\mathbf{Y}) = \boldsymbol{\Sigma} - (2/\pi)\,\mathbf{v}\boldsymbol{\delta}\boldsymbol{\delta}'\mathbf{v}.$$
The appealing properties of this distribution are listed by Azzalini and Capitanio (1999). Most of them are analogues of the multivariate normal properties; in particular, random vectors obtained as linear transformations of the form Y* = AY, for any real matrix A, are still multivariate sn distributed. Another interesting property, in connection with the multivariate normal random compositions, is that the random variable
$$Z = (\mathbf{Y}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{Y}-\boldsymbol{\mu}) \qquad (6)$$
follows a χ² distribution with D − 1 degrees of freedom. Given a random sample, the maximum likelihood (ML) method is proposed to estimate the parameters, but numerical procedures have to be used. The method of moments is proposed to obtain starting values for the iterative procedure. The R package sn is available and contains a suite of functions for handling univariate and multivariate sn distributions (Azzalini 2020).
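Outside R, density (3) is straightforward to evaluate from its definition, as in the following Python sketch (the parameter values in the call are arbitrary illustrative choices, and this is not a substitute for the fitting functionality of the sn package).

import numpy as np
from scipy.stats import multivariate_normal, norm

def msn_pdf(y, mu, Sigma, alpha):
    # density (3): 2 * phi(y; mu, Sigma) * Phi(alpha' v^{-1} (y - mu))
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    v_inv = np.diag(1.0 / np.sqrt(np.diag(Sigma)))   # inverse of the diagonal scale matrix v
    dens = multivariate_normal(mu, Sigma).pdf(y)
    skew = norm.cdf(alpha @ v_inv @ (y - mu))
    return 2.0 * dens * skew

print(msn_pdf([0.3, -0.2], mu=[0.0, 0.0],
              Sigma=[[1.0, 0.3], [0.3, 0.5]], alpha=[4.0, -1.0]))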
The Logistic Skew-Normal Model

Definition
By analogy with the additive logistic normal (aln) model defined by Aitchison (1986) using the alr coordinates, Mateu-Figueras et al. (2005) use the same strategy to define the additive logistic skew-normal (alsn) model. Indeed, a D-part random composition X is said to have an alsn distribution when its additive logratio representation, alr(X), has a (D − 1)-dimensional sn distribution in the space of coordinates. The corresponding density function on S^D with respect to the Lebesgue measure λ is

$$f(\mathbf{x}) = \frac{2(2\pi)^{-(D-1)/2}|\boldsymbol{\Sigma}|^{-1/2}}{\prod_{i=1}^{D} x_i}\exp\left\{-\tfrac{1}{2}(\operatorname{alr}(\mathbf{x})-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\operatorname{alr}(\mathbf{x})-\boldsymbol{\mu})\right\}\Phi\left(\boldsymbol{\alpha}'\mathbf{v}^{-1}(\operatorname{alr}(\mathbf{x})-\boldsymbol{\mu})\right). \qquad (7)$$

This distribution will be denoted as X ~ SN_S^D(μ, Σ, α). A particular member of this class of distributions is determined by (D − 1)(D + 4)/2 parameters, D − 1 more than the aln class due to the α vector. The aln model is obtained as a particular case when α = 0. For this reason, we can say that the alsn model is a generalization of the aln model. We can express the density function in terms of any orthonormal logratio representation. We only have to use the matrix relationship between the h(X) and alr(X) coordinates (1) and the linear transformation property of the sn distribution. Accordingly, for h(X) = A alr(X) we can derive the previous density function in terms of h(X). It is defined on the simplex and also expressed with respect to the Lebesgue measure λ:

$$f(\mathbf{x}) = \frac{2(2\pi)^{-(D-1)/2}|\boldsymbol{\Theta}|^{-1/2}}{\sqrt{D}\,\prod_{i=1}^{D} x_i}\exp\left\{-\tfrac{1}{2}(h(\mathbf{x})-\boldsymbol{\xi})'\boldsymbol{\Theta}^{-1}(h(\mathbf{x})-\boldsymbol{\xi})\right\}\Phi\left(\boldsymbol{\varrho}'\boldsymbol{\theta}^{-1}(h(\mathbf{x})-\boldsymbol{\xi})\right), \qquad (8)$$
where θ stands for the square root of the diagonal matrix formed from Θ. The relationship between the parameters of both densities is

$$\boldsymbol{\xi} = A\boldsymbol{\mu}, \qquad \boldsymbol{\Theta} = A\boldsymbol{\Sigma}A', \qquad \boldsymbol{\varrho} = \boldsymbol{\theta}\,(A^{-1})'\,\mathbf{v}^{-1}\boldsymbol{\alpha}. \qquad (9)$$
Both densities (7) and (8) define exactly the same probability law on S^D and are defined with respect to the Lebesgue measure λ. Using the relationship (2), it is easy to express it with respect to the Aitchison measure λa as

$$f_a(\mathbf{x}) = 2(2\pi)^{-(D-1)/2}|\boldsymbol{\Theta}|^{-1/2}\exp\left\{-\tfrac{1}{2}(h(\mathbf{x})-\boldsymbol{\xi})'\boldsymbol{\Theta}^{-1}(h(\mathbf{x})-\boldsymbol{\xi})\right\}\Phi\left(\boldsymbol{\varrho}'\boldsymbol{\theta}^{-1}(h(\mathbf{x})-\boldsymbol{\xi})\right); \qquad (10)$$

here the subindex a is only used as a reminder that the density is expressed with respect to λa. Mateu-Figueras and Pawlowsky-Glahn (2007) define the logistic skew-normal model using this density and call it the skew-normal distribution on the simplex, by analogy to the normal on the simplex law defined using the same strategy (Mateu-Figueras et al. 2013). Even though this change of measure gives us the same law of probability, it produces changes in some characteristic values of the distribution. An interested reader can find a detailed explanation in Mateu-Figueras and Pawlowsky-Glahn (2007) and Mateu-Figueras et al. (2013). If we want to study properties of the distribution directly on the simplex, it is better to use the density with respect to the Aitchison measure because it is compatible with the simplex space structure. Moreover, the Aitchison measure enables further generalization of the logistic skew-normal distribution, if parts of the random composition are weighted, e.g., according to measurement precision (Egozcue and Pawlowsky-Glahn 2016; Talská et al. 2020). However, for the computation of the probability of an event we have to use the density with respect to the Lebesgue measure in order to compute the integrals. In Fig. 1 we present ternary diagrams with isodensity curves for two logistic skew-normal densities with respect to the Aitchison measure. We can compare them with the corresponding logistic normal densities obtained by taking α = 0. Observe that using the logistic skew-normal models, we will be able to capture more forms of variability besides the typical arc-shape or rounded-shape forms of the logistic normal model.

Additive Logistic Skew-Normal Distribution, Fig. 1 In solid line, isodensity curves of two logistic skew-normal densities in S³ with: (a) ξ = (0.05, 0.08), Θ = (0.2, 0.5; 0.5, 1.5), ϱ = (5, 5); (b) ξ = (1.5, 0.5), Θ = Id, ϱ = (1.5, 2). In dash-dotted line, the isodensity curves for the corresponding logistic normal model obtained with ϱ = (0, 0)
Key Properties
Using the linear transformation property of the multivariate sn distribution, it is easy to prove the closure under perturbation, power transformation, and subcomposition of the logistic skew-normal family of distributions. We list here the main properties using the density in terms of the logratio representation h(x) with respect to a generic orthonormal basis, but the density in terms of alr(x) could also be used. An interested reader can find a detailed proof of each property in Mateu-Figueras and Pawlowsky-Glahn (2007) and Mateu-Figueras et al. (2013).

Property 1 (Closure Under ⊕ and ⊙). Let X ~ SN_S^D(ξ, Θ, ϱ) be a random composition, a ∈ S^D, and b ∈ ℝ. Then, the D-part random composition X* = a ⊕ (b ⊙ X) ~ SN_S^D(h(a) + bξ, b²Θ, ϱ). From this property and using the density function with respect to the Aitchison measure λa, we obtain the invariance under the perturbation operation, i.e., f_a(a ⊕ x) = f_a(x). This has important consequences, because when working with compositional data, it is usual to center the input data set, which can be expressed as perturbation by a compositional vector (the center of the distribution). However, this property would not be obtained using the density with respect to the measure λ. Accordingly, although both densities define the same probability law, this is one of the differences between them.

Property 2 (Closure Under Subcomposition). Let X ~ SN_S^D(ξ, Θ, ϱ) and X_S = C(SX) be a C-part random subcomposition. Then X_S ~ SN_S^C(ξ_S, Θ_S, ϱ_S) with

$$\boldsymbol{\xi}_S = U^{*\prime} S U \boldsymbol{\xi}, \qquad \boldsymbol{\Theta}_S = \left(U^{*\prime} S U\right)\boldsymbol{\Theta}\left(U^{*\prime} S U\right)', \qquad \boldsymbol{\varrho}_S = \frac{\boldsymbol{\theta}_S \boldsymbol{\Theta}_S^{-1} B' \boldsymbol{\varrho}}{\sqrt{1 + \boldsymbol{\varrho}'\left(\boldsymbol{\theta}^{-1}\boldsymbol{\Theta}\boldsymbol{\theta}^{-1} - B\boldsymbol{\Theta}_S^{-1}B'\right)\boldsymbol{\varrho}}},$$

where B = θ⁻¹Θ(U′S′U*), and θ_S and θ are the square roots of the diagonal matrices formed by Θ_S and Θ, respectively. The matrices U and U* contain in their columns the clr representation of the corresponding orthonormal basis in S^D and S^C. The center of a random composition X plays the role of the expected value for a real random vector (Pawlowsky-Glahn et al. 2015). Mateu-Figueras et al. (2013) show that this center can be expressed as the expected value using the density function with respect to λa; for this reason it is denoted Ea(X).
Property 3 (Location). Let X ~ SN_S^D(ξ, Θ, ϱ); then the expected value with respect to the measure λa is Ea(X) = h⁻¹(β) with β = ξ + √(2/π)·θδ, where θ stands for the square root of the diagonal matrix formed by Θ and δ is defined as in (5) but using the parameters ξ, Θ, and ϱ. The expected value with respect to the Lebesgue measure corresponds to the standard expected value for real random vectors. For a logistic skew-normal random composition there is no closed form for this value, although it exists. The reason is that the corresponding integral is not reducible to any simple form, and numerical integration should be used. Nevertheless, for random compositions, the use of Ea(X) is recommended with the same arguments as, when working with a compositional data set, the center (the geometric mean) is used instead of the arithmetic mean. Variability of random compositions can be measured with the metric variance, also called the total variance (Aitchison 1986).

Property 4 (Metric Variance). Let X ~ SN_S^D(ξ, Θ, ϱ); then the metric variance of X is Mvar(X) = trace(Θ − (2/π)·θδδ′θ), where θ stands for the square root of the diagonal matrix formed by Θ and δ is defined as in (5) but using the parameters ξ, Θ, and ϱ.

Given a compositional data set, the parameters of the logistic skew-normal model can be estimated using the ML method based on the logratio representation of the data set. These estimates cannot be expressed in analytic form, and numerical methods have to be used to compute them. If we change the logratio representation, the estimates will be related through the corresponding transformation matrix A, as in (9); the only deviations will be due to the numerical nature of the procedure. Thus, the estimates are invariant under the choice of the logratio representation. The logistic normal model is obtained when ϱ = 0 and, consequently, a likelihood ratio test can be used to decide whether a logistic skew-normal model provides a significantly better fit. Finally, given a compositional data set, we can validate the assumption of logistic skew-normality via testing for multivariate skew-normality of the olr coordinate representation using a proper goodness-of-fit test. One possibility is to apply a goodness-of-fit test for the underlying χ² distribution with D − 1 degrees of freedom of the random variable Z in (6), using an olr coordinate representation and the corresponding parameter estimates.

Application to Geochemical Data
For demonstration of fitting the skew-normal distribution on the simplex to real-world geochemical data, we use the Kola data resulting from a large geochemical mapping of the Kola Peninsula in Northern Europe (Reimann et al. 1998). In total,
approximately 600 soil samples were taken and analyzed for more than 50 chemical elements. The three major lithogenic elements Al, Si, and K from the bhorizon data set (Filzmoser 2020) are used, filtered from outlying observations (Filzmoser and Hron 2008). Using the particular olr representation

$$h(\mathbf{x}) = \left(\sqrt{1/6}\,\ln\frac{x_1 x_2}{x_3^2},\ \sqrt{1/2}\,\ln\frac{x_1}{x_2}\right)$$

and the R package sn, the following estimated parameters are obtained: ξ̂ = (0.127, 1.156), Θ̂ = (0.332, 0.139; 0.139, 0.074), and ϱ̂ = (2.658, 0.928). Each marginal of the two-dimensional logratio data set exhibits a slight skewness that the skew-normal model can capture. The values of the estimated parameters depend on the particular logratio representation we use, but the final model on S³ is invariant under this choice. The fitted isodensity curves on S³ for these three-part compositions are displayed in Fig. 2.

Additive Logistic Skew-Normal Distribution, Fig. 2 Subcomposition (Al, Si, K) of the bhorizon data set and the fitted isodensity curves for the logistic skew-normal model
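Where the R package sn is not available, the ML fit described above can be approximated by directly minimizing the negative skew-normal log-likelihood of the olr coordinates; the Python sketch below does this for synthetic two-dimensional coordinates standing in for the Kola (Al, Si, K) coordinates, with a Cholesky-based parameterization chosen purely for convenience.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal, norm

def neg_loglik(params, z):
    # negative log-likelihood of a bivariate skew-normal:
    # params = mu (2), log of diagonal Cholesky (2), off-diagonal (1), alpha (2)
    mu = params[0:2]
    L = np.array([[np.exp(params[2]), 0.0],
                  [params[4],         np.exp(params[3])]])
    Sigma = L @ L.T
    alpha = params[5:7]
    v_inv = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
    ll = (np.log(2.0)
          + multivariate_normal(mu, Sigma).logpdf(z)
          + norm.logcdf((z - mu) @ v_inv @ alpha))
    return -np.sum(ll)

# hypothetical olr coordinates; replace with the real two-dimensional coordinates
rng = np.random.default_rng(4)
z = rng.multivariate_normal([0.1, 1.1], [[0.3, 0.1], [0.1, 0.08]], size=500)

start = np.array([z[:, 0].mean(), z[:, 1].mean(), 0.0, 0.0, 0.0, 0.0, 0.0])
fit = minimize(neg_loglik, start, args=(z,), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-6})
print(fit.x[:2])     # estimated location parameters
print(fit.x[5:7])    # estimated shape parameters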
Summary
In this chapter the logistic skew-normal model, a follower of the additive logistic skew-normal distribution compatible with the algebraic-geometric structure of the simplex, was presented as a general tool for distributional modeling of compositional data in geosciences. It is flexible enough to model data coming from a wide range of applications and to provide a sensible alternative to the well-known Dirichlet distribution
which is not fully compatible with the logratio approach (Pawlowsky-Glahn et al. 2015).
Cross-References ▶ Compositional Data
Bibliography
Aitchison J (1986) The statistical analysis of compositional data. Monographs on statistics and applied probability. Chapman & Hall, London, 416 pp. (Reprinted in 2003 with additional material by The Blackburn Press)
Azzalini A (2020) The R package sn: the skew-normal and related distributions such as the skew-t (version 1.6-2). Università di Padova, Italia. http://azzalini.stat.unipd.it/SN
Azzalini A, Capitanio A (1999) Statistical applications of the multivariate skew-normal distribution. J R Stat Soc B 61(3):579–602
Egozcue JJ, Pawlowsky-Glahn V (2005) Groups of parts and their balances in compositional data analysis. Math Geol 37:795–828
Egozcue JJ, Pawlowsky-Glahn V (2016) Changing the reference measure in the simplex and its weighting effects. Aust J Stat 45(4):25–44
Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barceló-Vidal C (2003) Isometric logratio transformations for compositional data analysis. Math Geol 35:279–300
Filzmoser P (2020) StatDA: statistical analysis for environmental data. https://CRAN.R-project.org/package=StatDA, R package version 1.7.4
Filzmoser P, Hron K (2008) Outlier detection for compositional data using robust methods. Math Geosci 40(3):233–248
Filzmoser P, Hron K, Templ M (2018) Applied compositional data analysis. Springer, Cham
Martín-Fernández JA (2019) Comments on: compositional data: the sample space and its structure, by Egozcue and Pawlowsky-Glahn. TEST 28:653–657
Mateu-Figueras G, Pawlowsky-Glahn V (2007) The skew-normal distribution on the simplex. Commun Stat Theory Methods 36(9):1787–1802
Mateu-Figueras G, Pawlowsky-Glahn V (2008) A critical approach to probability laws in geochemistry. Math Geosci 40(5):489–502
Mateu-Figueras G, Pawlowsky-Glahn V, Barceló-Vidal C (2005) The additive logistic skew-normal distribution on the simplex. Stoch Env Res Risk A 18:205–214
Mateu-Figueras G, Pawlowsky-Glahn V, Egozcue JJ (2013) The normal distribution in some constrained sample spaces. SORT 37:29–56
Pawlowsky-Glahn V, Egozcue J, Tolosana-Delgado R (2015) Modeling and analysis of compositional data. Wiley, Chichester
Reimann C, Äyräs M, Chekushin V, Bogatyrev I, Boyd R, Caritat P, Dutter R, Finne TE, Halleraker JH, Jæger Ø, Kashulina G, Lehto O, Niskavaara H, Pavlov VK, Räisänen ML, Strand T, Volden T (1998) Environmental geochemical atlas of the central parts of the Barents region. Geological Survey of Norway, Trondheim
Talská R, Menafoglio A, Hron K, Egozcue JJ, Palarea-Albaladejo J (2020) Weighting the domain of probability densities in functional data analysis. Stat 9(1):e283

Agterberg, Frits
Qiuming Cheng
School of Earth Science and Engineering, Sun Yat-Sen University, Zhuhai, China
State Key Lab of Geological Processes and Mineral Resources, China University of Geosciences, Beijing, China
Fig. 1 Photo of Frits Agterberg taken in China University of Geosciences, Beijing, in November 2017
Biography Frederik Pieter (Frits) Agterberg was born in 1936 in Utrecht, the Netherlands. He studied geology and geophysics at Utrecht University, obtaining B.Sc. (1957), M.Sc. (1959), and Ph.D. (1961). These three degrees were obtained “cum laude” (with distinction). After a 1-year Wisconsin Alumni Research Foundation postdoctorate fellowship at the University of Wisconsin, he joined the Geological Survey of Canada in 1962, initially as petrological statistician working on the Canadian contribution to the International Upper Mantle Project. Later, he formed and headed the Geomathematics Section of the Geological Survey of Canada in Ottawa (1971–1996). He has made a major contribution to the geomathematical literature in numerous areas, often as the first to explore new applications and methods, and always with mathematical rigor blended with practical examples. The major research
areas include statistical frequency distributions applied to geoscience data, mineral-resources quantitative assessment, stratigraphic analysis and timescales, and fractal and multifractal modeling. As one of the founders of the International Association for Mathematical Geosciences (IAMG). Frits Agterberg, in his association with the IAMG, has been a key figure in its ongoing success and is widely recognized as one of the fathers of mathematical geology. Agterberg has authored or coauthored over 350 scientific journal articles and books, including Computer Programs for Mineral Exploration published in 1989 by Science (Agterberg 1989), which has become a routinely used computer program for mineral resources assessments worldwide. The books he had published include the textbook Geomathematics: Mathematical Background and GeoScience Applications, published in 1974 by Elsevier with approximately 10,000 copies sold worldwide (Agterberg 1974), the monograph Automated Stratigraphic Correlation, published in 1990 by Elsevier (Agterberg 1990), and the textbook Geomathematics: Theoretical Foundations, Applications and Future Developments, published in 2014 by Springer (Agterberg 2014). He has edited or coedited seven other books and many special issues in scientific journals. Frits Agterberg received several prestigious awards and recognitions, including the third W.C. Krumbein medalist of the International Association for Mathematical Geology in 1978, Best Paper Awards for articles in the journal Computers & Geosciences for 1978, 1979, and 1982, correspondent member of the Royal Dutch Academy of Sciences in 1981, and as Honorary Professor of the China University of Geosciences in 1987. A newly discovered fossil, Adercotrima agterbergi, was named after him to recognize his contributions to quantitative stratigraphy. Since 1968, he was associated with the University of Ottawa, where he has taught an undergraduate course on “Statistics in Geology” for 25 years and served as a supervisor for undergraduate and graduate students. Several of his former students now occupy prominent positions in the universities, government organizations, and mining industry in Canada and abroad. Other academic positions included being Distinguished Visiting Research Scientist at the Kansas Geological Survey of the University of Kansas (1969–1970), Adjunct Professor at Syracuse University (1977–1981), Esso Distinguished Lecturer for the Australian Mineral Resource Foundation, University of Sydney (August–November 1980), and Adjunct Research Professor, Department of Mathematics, Carleton University, Ottawa (1986–1994). Since 2000, he has served as a guest and distinguished professor at China
Agterberg, Frits
University of Geosciences (CUG) both in Beijing and Wuhan, where he has co-supervised a dozen Ph.D. students with his former student Professor Qiuming Cheng at York University. Agterberg has lectured in more than 40 short courses worldwide, including several lectures delivered when he was named IAMG Distinguished Lecturer in 2004. From 1979 to 1985, he was leader of the International Geological Correlation Programme’s Project (IGCP) on “Quantitative Stratigraphic Correlation Techniques.” He has served on numerous committees, editorial boards, and councils of national and international organizations. For example, he served as an associate editor of several journals, including the Canadian Journal of Earth Sciences, the Bulletin of the Canadian Institute of Mining and Metallurgy, Mathematical Geosciences, Computers & Geosciences, and Natural Resources Research, the president of IAMG from 2004 to 2008, and general secretary of IAMG from 2012 to 2016. He chaired the Quantitative Stratigraphy Committee of the International Stratigraphic Commission (ICS), which is part of the International Union of Geological Sciences (IUGS). In 1996, Frits Agterberg commenced a phased retirement from the Geological Survey of Canada to work as part-time independent geomathematical consultant for the industry. He continues to teach and supervise graduate students at the University of Ottawa. In November of 2017, his Chinese students and friends organized a workshop with a party on the CUGB campus for celebrating his 81st birthday and his contributions to Chinese Mathematical Geosciences. Now he is still tirelessly working on scientific publications, including coediting a large volume of the Encyclopedia of Mathematical Geosciences, which includes several hundred international scientists working on over 300 separate articles on different topics of mathematical geosciences.
Bibliography Agterberg F (1974) Geomathematics, mathematical background and geo science application. Development in geomathematics. Elsevier, Amsterdam Agterberg F (1989) Computer programs for mineral exploration. Science 245(4913):76–81 Agterberg F (1990) Automated stratigraphic correlation. Elsevier, Amsterdam Agterberg F (2014) Geomathematics: theoretical foundations, applications and future developments. Springer, Cham, Switzerland
Algebraic Reconstruction Techniques
Aitchison, John John Bacon-Shone Social Sciences Research Centre, The University of Hong Kong, Hong Kong, China
17
London, for which he later received the Guy Medal in Silver. This influential paper (over 5000 citations) finally solved Pearson’s problem of spurious correlation (Pearson 1897), which Chayes (Chayes 1960) had linked to compositional data. While others had used the logistic transformation for binary outcome probabilities, John alone saw the potential for modeling compositions in general by using the logistic Normal distribution. This paper developed into an essential monograph (Aitchison 1986). This book (which has received over 5000 citations) and many other influential related papers, which together revolutionized the statistical analysis of compositions, led to John receiving the William Christian Krumbein Medal of the International Association for Mathematical Geosciences in 1997 for a “distinguished career as author and researcher in the field of mathematical geology.” After retirement from Hong Kong, John became Professor of Statistics in the University of Virginia, before finally returning to Glasgow.
Bibliography
Fig. 1 John Aitchison, courtesy of Prof. Bacon-Shone
Biography John Aitchison was born in East Linton, East Lothian, Scotland, on July 22, 1926, and passed away in Glasgow on December 23, 2016. John first studied mathematics at the University of Edinburgh, where he received his MA, before attending Cambridge on a scholarship, where he initially studied mathematics, before a course from Frank Anscombe diverted him to Statistics. He started his career as a statistician in the Department of Applied Economics at the University of Cambridge, where he wrote his first classic text on the lognormal distribution with Alan Brown (Aitchison and Brown 1957). This book already showed the importance of accounting for restricted sample spaces by transformation or using nonEuclidean distance metrics. He then returned to Scotland, where he was a Lecturer in Statistics in the University of Glasgow. After 5 years at the University of Liverpool, he returned to Glasgow to become Titular Professor of Statistics and Mitchell Lecturer in Statistics and write his second classic book on statistical prediction analysis (Aitchison and Dunsmore 1975). Then John took a big step into the unknown as Professor of Statistics at the University of Hong Kong. During his time in Hong Kong, John presented his seminal paper on statistical compositional data analysis (Aitchison 1982) as a read paper to the Royal Statistical Society in
Aitchison J (1982) The statistical analysis of compositional data. J R Stat Soc Ser B Methodol 44(2):139–177 Aitchison J (1986) The statistical analysis of compositional analysis. Chapman & Hall, London Aitchison J, Brown J (1957) The lognormal distribution. Cambridge University Press, Cambridge Aitchison J, Dunsmore IR (1975) Statistical prediction analysis. Cambridge University Press, Cambridge Chayes F (1960) On correlation between variables of constant sum. J Geophys Res 65(12):4185–4193 Pearson K (1897) On a form of spurious correlation which may arise when indices are used, etc. Proc R Soc 60:489–498
Algebraic Reconstruction Techniques Utkarsh Gupta and Uma Ranjan Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, Karnataka, India
Definition Algebraic reconstruction techniques (ART) were initially proposed for radiology and medical applications but have been extensively used to study various seismological phenomena (Del Pino 1985; Peterson et al. 1985). These algorithms are based on iterative reconstruction techniques and efficiently help invert matrices of large dimensions, taking advantage of the sparse nature of the matrices. These methods gained popularity due to the fact that they were able to handle various data geometries with irregular sampling and limited projection angle (Peterson et al. 1985). In addition, these techniques
A
18
Algebraic Reconstruction Techniques
are relatively more straightforward to implement compared to transform methods described by Scudder (Scudder 1978). This entry describes fundamental principles and derivations of ART along with various modifications to ART, namely simultaneous iterative reconstructive technique (SIRT) and simultaneous algebraic reconstructive technique (SART).
Tomographic Projection Model In iterative reconstruction techniques, one starts by discretizing the problem, as shown in the Fig. 1. We achieve discretization by superimposing a square grid on the image g(x, y); we assume that each square grid cell holds a constant value denoted by gj (subscript j denotes the jth cell), and N represents the total number of cells. In addition, a ray denotes a line in the x y plane, and pi represents the ray sum (also referred as line-integral) along ith ray as shown in Fig. 1. Relation between g0j s and p0i s can be expressed as in Eq. 1. N
aij gj ¼ pi , i ¼ 1, 2, . . . , M
ð1Þ
j¼1
where M represents the total number of rays across all the projection direction, and aij represents the weight factor which is assigned for jth cell and ith ray. Eq. 1 represents a large set of linear equations that needs to be inverted to determine the unknowns g0j s:. For small values of N and M, conventional linear algebra methods like matrix inversion to determine the unknowns g0j s Algebraic Reconstruction Techniques, Fig. 1 Illustration of tomographic projection process, with aij representing the weight contribution for jth pixel and corresponding ith ray
can be applied. But, in almost all the practical scenarios, values of N and M are quite large. For example, for the image of dimension 256 256, then the value of N is as large as 65,000. For real scenarios, M’s magnitude follows the same order as of N. Direct matrix inversion is often impossible even for cases where values of N and M are small due to measurement noise in the projection data or the cases when M < N. In all such cases, we depend upon simple least square method. These least square-based methods are not applicable for real scenarios as M and N will be huge. Therefore, in situations where conventional linear algebra fails to achieve the desired solution, algebraic reconstruction techniques (ART) comes into picture.
ART Algorithm ART algorithm is based on Methods of Projections, proposed by Kaczmarz (1937) (hence sometimes referred to as Kaczmarz method). Eq. 1 can be unrolled as set of linear equations as follows: a11 g1 þ a12 g2 þ a13 g3 þ þ a1N gN ¼ p1 , a21 g1 þ a22 g2 þ a23 g3 þ þ a2N gN ¼ p2 , ⋮
ð2Þ
aM1 g1 þ aM2 g2 þ aM3 g3 þ þ aMN gN ¼ pM : !
Further, we will represent an image g(x, y) as g ¼ ½g1 , g2 , g3 , . . . , gN which is a N-dimensional vector of unknowns g0j s in a N-dimensional Euclidean space. Each sub-equation in Eq. 2 represents an individual hyperplane.
Algebraic Reconstruction Techniques
19
A unique solution will exist only when all the M hyperplanes intersect at a single point. For better understanding and visualization, we explain the algorithm in two-dimensional Euclidean space with two unknowns g1 and g2. These unknowns satisfy the following equations: a11 g1 þ a12 g2 ¼ p1 , a21 g1 þ a22 g2 ¼ p2 :
ð3Þ
Sub-equations in Eq. 3 represent the equations of 2D-lines, and the same is graphically shown in Fig. 2. In order to, iteratively solve the pair of linear equations (Eq. 3), we start ! with an initial guess g0 , which is projected perpendicularly ! onto the first equation resulting in g1 , then reprojecting the ! resulting point g1 onto the second equation and then back projecting onto the first line and so on. If a unique solution exists, the system will always converge to the solution (which ! in this case is represented as gs in Fig. 2). We can extend the same process described for solving a pair of linear equations to a large set of linear equations. The method begins with an initial guess represented by vector !0 g ¼ g01 , g02 , g03 , . ! . . , g0N . In most of the computational ! implementations, g0 is represented as a 0 vector. The initial ! guess is projected onto the first hyperplane of Eq. 2 giving g1 : !1 Thereafter, g !is projected onto the ! second hyperplane of Eq. 2 to yield g2 and so on. When gði1Þ is projected onto ith
Algebraic Reconstruction Techniques, Fig. 2 The Kaczmarz method of solving algebraic equations is illustrated for the case of two unknowns. It begins with some arbitrary initial guesses and then projects onto the line corresponding to the first equation. The resulting point is now projected onto the line representing the second equation. If there are only two equations, this process is continued back and forth, until convergence is achieved. [Modified from Kak and Slaney (2001)]
! hyperplane resulting in gðiÞ , this operation can be mathematically expressed as: g ! gðiÞ ¼ gði1Þ
!
ði1Þ
! ai pi
! ai ! ai
A ! ai
ð4Þ
where ! ai ! ai represents the scalar product operation and ! ai ¼ ½ai1 , ai2 , ai3 : . . . :aiN . Eq. 4 represents the vectorized update equation for the ART algorithm. Next section covers the derivation for Eq. 4. Vector form of the first hyperplane in Eq. 2 can be written as: ! ! a1 g ¼ p1
ð5Þ
where the normal vector to the hyperplane is given by ! a1 ¼ ! ½a11 , a12 , a13 : . . . :a1N (represented by OD in Fig. 3). Eq. 5 ! simply says that the projection of any vector OC (where C is any random point on the hyperplane) on the vector a1 is of constant length. Unit vector along the direction of normal ! !) is represented by OU vector (a given by: 1 ! OU ¼
! a1 ! a a ! 1
ð6Þ
1
Algebraic Reconstruction Techniques, Fig. 3 The hyperplane ! a1 ! ! g ¼ p perpendicular to the vector OD ! a 1
1
20
Algebraic Reconstruction Techniques
Also, the perpendicular distance of the hyperplane ! from the origin is represented by OA in Fig. 3, is given by ! ! OC OU : ! ! ! j OA j ¼ OU OC ¼ ¼
! 1 ! a1 OC ! ! a1 a1 p1 1 ! ! a1 g ¼ ! ! ! a1 a1 a1 ! a1
ð7Þ
! ! To calculate g1 , we subtract the initial guess g0 from the ! vector HG : ! ! ! gð1Þ ¼ gð0Þ HG
ð8Þ
! where the length of the vector HG is represented by: ! ! ! j HG j ¼ j OF j j OA j ! ! ! ¼ gð0Þ OU j OA j
ð9Þ
! gs lim gðkM Þ ¼ !
k!þ1
ð13Þ
Few more comments about convergence have been presented below: Case 1: Suppose, in Fig. 2, two lines are perpendicular to each other. In that case, it is possible to arrive at the actual solution in just two iterations following the update Eq. 4, and the convergence is independent of the choice of the initial guess. Case 2: Suppose, in Fig. 2, two lines make a very small angle between them. In that case, we may require a significant number of iterations (k value in Eq. 13) to reach the actual solution depending upon the choice of the initial guess. Clearly, rate of convergence of ART algorithm is directly influenced by the angles between the hyperplanes. If M hyperplanes in Eq. 2 can be made orthogonal with respect to one another, we can reach the actual solution in just one pass through the M equation (only in case a unique solution exists).
Substituting Eqs. 6 and 7 in Eq. 9, we get:
! j HG j ¼
Characteristics of the Solution
! gð0Þ ! a1 p1
ð10Þ
! a1 ! a1
! Since HG follows the same direction as the unit vector ! OU , we can write:
! ! ! HG ¼ j HG j OU ¼
! gð0Þ ! a1 p1 ! a1 ! a1
! a1
Overdetermined System (M > N): Not an uncommon case, it is possible that we have more number of hyperplanes (M) than the number of unknowns (N ) in Eq. 2, and also projection data ( p1, p2. . .pM) is corrupted by measurement noise. In
ð11Þ
Finally substituting Eq. 11 in Eq. 8, we get: ! gð0Þ ! a1 p1 ! ! ! ð1Þ ð0Þ g ¼g a1 ! a1 ! a1
ð12Þ
Eq. 4 represents the generalized form of Eq. 12.
Convergence of ART Algorithm Tanabe (1971) proved that if a unique solution ! gs exists to a system of linear equations described by Eq. 2 (refer Fig. 2), then:
Algebraic Reconstruction Techniques, Fig. 4 Illustration of the case where all hyperplanes do not intersect at a single point. In such cases, iterations will continue to oscillate in the neighborhood of the intersections of the hyperplanes
Algebraic Reconstruction Techniques
21
such cases, no unique solution is possible (as shown in Fig. 4). In all such cases, ART iteration never converges to a unique point, but iterations will continue to oscillate in the neighborhood of the intersections of the hyperplanes (as shown in Fig. 4). Under-determined System (M < N): In such cases, there is no unique solution, but in fact, there are an infinite number of solutions possible. It can be shown in such cases that ART ! iteration converges to a solution ! g such that g0 ! g is s
s
minimized.
Advantages and Disadvantages of ART The key feature of the iterative steps presented for ART algorithm here is that it is now possible to integrate the prior information about the reconstructed image g(x, y). For example, if it is known that pixels of the reconstructed image cannot take a nonnegative value, then in each iteration, we can set negative components equal to zero. Similar to the above example, we can incorporate other information about the image g(x, y) if known or conveyed in advance. In applications requiring large-sized reconstructions, difficulties arises in calculation, storage, and efficient retrieval of weight coefficients aij. In most of the ART implementations, these weights aij exactly represent the length of the intersection of the ith ray with the jth pixel. It should be noted that linelength is not the only way to describe the distribution of weights; various physics-based concepts like attenuation and point spread function can also be incorporated. Eq. 4 represents the vectorized update equation for the ART algorithm. Below we describe the ART algorithm as the gray levels/pixel (gj) wise update equation: ð iÞ
ði1Þ
gj ¼ gj
þ
ð pi qi Þ ! a N 2 ij k¼1 aik
ð14Þ
where ! ai qi ¼ gði1Þ ! N
ði1Þ
¼
gk
aik
ð15Þ ð16Þ
k¼1
These equations represent that when we project ((i 1)th) ði1Þ ) is onto the ith equation, the gray level of the jth pixel (gj ðiÞ corrected by a factor of Dgj where: ðiÞ
ðiÞ
ði1Þ
Dgj ¼ gj gj
¼
pi qi a N 2 ij k¼1 aik
ð17Þ
Here, pi is the measured line-integral/ray sum along the ith ray, and qi is the computed line-integral/ray sum for the same ray but using the (i 1)th solution for the image gray levels. ART reconstruction, in general, suffers from salt and pepper noise which generally arises due to inconsistencies introduced while modeling the weights (aik) in the forward projection process. Due to these inconsistencies, computed ray sum (qi in Eq. 15) is different from the measured ray sum ( pi). In the practical scenario, it is possible to reduce the effect of noise by relaxation, in which instead of updating the gray ðiÞ ðiÞ level by the factor Dgj , we update it by the factor of aDgj , where α is less than 1. At the same time, introducing the relaxation parameter α in the update equation leads to the slower convergence of the ART algorithm. Finally, the update equation for the ART algorithm including the relaxation parameter α is described below: ðiþ1Þ
gj
ðiÞ
ðiÞ
¼ gj þ aDgj
! ! gðiþ1Þ ¼ gðiÞ þ a
! pi gðiÞ ! ai ! ai ai !
ð18Þ
! ai
ð19Þ
Simultaneous Iterative Reconstructive Technique (SIRT) Simultaneous iterative reconstructive technique (SIRT) (Gilbert 1972) is another iterative algorithm that produces a smoother and better-looking reconstructed image than those produced from the ART algorithm at the cost of slower conðiÞ vergence. We again compute the correction term Dgj (using Eq. 17) in the jth pixel caused by ith equation. But before adding the correction term to the jth gray level, we calculate all the correction terms calculated from all the M equations in Eq. 2, after that, we update the gray level in the jth cell by adding the average of all the obtained correction terms obtained from each equations. This concludes the one iteration of the SIRT algorithm. In the second iteration, we again start with the first equation in Eq. 2, and the process is repeated. Mathematically SIRT update equation can be represented as: ðkþ1Þ
gj
ðk Þ
¼ gj þ a
ðiÞ M i¼1 Dgj
M
ð20Þ
where k represents the iteration number, M signifies number of total projection rays or total number of algebraic equations, and α represents the relaxation parameter.
A
22
Algebraic Reconstruction Techniques
Simultaneous Algebraic Reconstructive Technique (SART)
N
pi ¼
bj aij
ð25Þ
j¼1
Simultaneous algebraic reconstructive technique (SART) (Andersen and Kak 1984) was designed with mainly two objectives in mind: first is to reduce the salt and pepper noise commonly found in ART-type reconstruction algorithms, and secondly to achieve good quality and numerical accuracy in only a single iteration. We will now describe the key steps involved in SART algorithm.
bj ðx, yÞ ¼
Modeling the Forward Projection Process For SART-type reconstruction algorithm, we express the image g(x, y) as the linear combination of N basis images (bj(x, y)) weighted by real numbers β1, β2, β3. . .. . .βN. This can be mathematically expressed as: N
gðx, yÞ gðx, yÞ ¼
bj bj ðx, yÞ
ð21Þ
j¼1
gðx, yÞ is a discrete approximation of the actual image g(x, y), and b0i s are the finite set of numbers which completely describes the image relative to the chosen basis images (bj(x, y)). If ri(x, y) ¼ 0 is the equation of ith ray, the projection operator Rj along the direction of the ray can be expressed as follows: þ1 þ1
pi ¼ Ri gðx, yÞ ¼
gðx, yÞ dðr i ðx, yÞÞdxdy
Here, aij represent the line integral of bj(x, y) along the ith ray. Eq. 25 shares the same structure as Eq. 1 and is yet more generalized since βj does not represent the image gray levels as gj in Eq. 1. Eq. 25 and Eq. 1 are equivalent if in case image frame is divided into N identical sub-squares, and we use the basis image as described by Eq. 26.
ð22Þ
1, 0,
inside the jth pixel otherwise
ð26Þ
SART algorithm provides superior reconstruction using the forward projection process described above. Accurate reconstructions are obtained using bilinear elements which are simplest higher order basis functions. It can be shown that using bilinear basis we obtain a more continuous image reconstruction than those which are obtained using pixel basis. However, computing line-integral across these bilinear basis is a computationally expensive task. Hence we approximate the overall ray integral Ri gðx, yÞ by finite sum involving Mi equidistant point fgðsim Þg (see Fig. 5a). Mi
pi ¼
gðsim ÞDs
ð27Þ
m¼1
gðsim Þ is determined using bilinear interpolation using the values βj of g(x, y) on the four neighboring points on the sample lattice. We get:
1 1 N
gðsim Þ ¼
Assuming the projection operator Rj as a linear operator, we have: N
bj Ri bj ðx, yÞ
Ri gðx, yÞ ¼
ð23Þ
i¼1
pi ¼ Ri gðx, yÞ Ri gðx, yÞ ¼
Mi
i¼1
ð29Þ
d ijm bj Ds, 1 i J
ð30Þ
Mi
j¼1 m¼1
N
bj aij
d ijm bj Ds m¼1 j¼1
N
j¼1
N
pi ¼
pi ¼
bj Ri bj ðx, yÞ
ð28Þ
where dijm represents the contribution that is made by the jth image to the mth point on the ith ray. Overall, we can approximate the line-integral pi as the linear combination of the βj:
In practical implementation of SART-type algorithm, we compute Ri gðx, yÞ using some approximations aij to reduce the computational load of the algorithm. Hence, we get: N
dijm bj j¼1
ð24Þ
N
¼
aij bj j¼1
ð31Þ
Algebraic Reconstruction Techniques
23
Algebraic Reconstruction Techniques, Fig. 5 (a) Illustration of the ray-integral equations for a set of equidistant points along a straight line cut by the circular reconstruction region. (b) Longitudinal Hamming window for set of rays
A
where aij is computed as the sum of contribution from different points along the ray. Mi
aij ¼
dijm Ds
ð32Þ
m¼1
Also, the overall weights (aij) are adjusted in such a way that Nj¼1 aij is actually equal to the physical length Lj. The precise formula used for updating the b0i s using SART algorithm is stated as: !
ðk Þ ! p i b ai i
!
ðkþ1Þ
bj
! ðkÞ ¼ bj þ
tij
N
a j¼1 ij i aij
Summary and Conclusions This chapter covers three essential algebraic reconstruction techniques: ART, SIRT, and SART, prominent in geoscience, primarily for seismological research and application. Also, these algorithms form the basis for several other algebraicbased reconstruction techniques proposed for various tasks. For instance, C-SART (constrained simultaneous algebraic reconstruction technique) algorithm has been applied for ionospheric tomography (Hobiger et al. 2008) that has a higher convergence rate than classical SART. These algorithms, being iterative in nature, have the added advantage of requiring less computational storage along with fast convergence to the solution in case of large and sparse linear systems (Xun 1987).
ð33Þ
Bibliography
where Mi
tij ¼
him dijm Ds
ð34Þ
m¼1
where the summation with respect to i is over the rays intersecting the jth image cell for a given projection direction. The sequence him, for 1 m Mi, is a Hamming window of length Mi and length of the window varies according to the number of point Mi describing the part of ray inside the reconstruction circle. On the whole, SART algorithm described using the update Eq. 33 results in a less noisy reconstructed image when compared to images reconstructed using ART and SIRT type algorithms.
Andersen AH, Kak AC (1984) Simultaneous algebraic reconstruction technique (SART): a superior implementation of the ART algorithm. Ultrason Imaging 6(1):81–94 Del Pino E (1985) Seismic wave polarization applied to geophysical tomography. In: SEG technical program expanded abstracts 1985. Society of Exploration Geophysicists, Tulsa, pp 620–622 Dines KA, Lytle RJ (1979) Computerized geophysical tomography. Proc IEEE 67(7):1065–1073 Gilbert P (1972) Iterative methods for the three-dimensional reconstruction of an object from projections. J Theor Biol 36(1):105–117 Hobiger T, Kondo T, Koyama Y (2008) Constrained simultaneous algebraic reconstruction technique (C-SART) – a new and simple algorithm applied to ionospheric tomography. Earth Planets Space 60(7): 727–735 Kak AC, Slaney M (2001) Principles of computerized tomographic imaging. Society for Industrial and Applied Mathematics, Philadelphia
24 Karczmarz S (1937) Angenaherte auflosung von systemen linearer gleichungen. Bull Int Acad Pol Sic Let Cl Sci Math Nat 35:355–357 Peterson JE, Paulsson BN, McEvilly TV (1985) Applications of algebraic reconstruction techniques to crosshole seismic data. Geophysics 50(10):1566–1580 Scudder HJ (1978) Introduction to computer aided tomography. Proc IEEE 66:628–637 Tanabe K (1971) Projection method for solving a singular system of linear equations and its applications. Numer Math 17(3):203–214 Xun L (1987) Art algorithm and its applications in geophysical inversion. Comput Techn Geophys Geochem Explor 9(1):34
Allometric Power Laws Gabor Korvin Earth Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Kingdom of Saudi Arabia
Definition Allometry has its origin in biology, anatomy, and physiology; it studies the effect of body size on shape and metabolic rate, the laws of growth, and relationships between size and different measurable properties of living organisms. Its laws were first summarized by D’Arcy Thompson in 1917 and Julian Huxley in 1932 (Stevens 2009). Frequently, relationships between size and other measured quantities obey power law equations (Allometric Power Laws) of the form y ¼ kxα (with standard notation y / xα or y~xα, where / or ~ denote proportionality). The exponent α is in most cases a noninteger real number, which explains the name of the discipline (allometry < Gr. αllomεtron ¼ “strange scale”). APLs of the form y / xα are also called scaling laws, because they are scale-invariant, and they arise when describing self-similar objects, such as earth-materials and formations on the surface and in the interior of the Earth and planets.
Introduction: Allometric Power Laws and SelfSimilarity A power law is a functional relationship between two quantities, and it has the form y ¼ kxα (with standard notation y / xα or y~xα, where / or ~ denote proportionality). Famous power laws are the allometric power laws (APLs in what follows), originally used to express relationships between body size and biological variables (Stevens 2009), such as the empirical rule that metabolic rate q0 is proportional to body mass M raised to the (¾)th power: q0 / M3/4, or that heart
Allometric Power Laws
rate t is inversely proportional to the (1/4)th power of M: t / M1/4. Dimensional considerations would suggest a metabolic rate proportional to the surface area available for energy transfer, so one would expect that metabolic rates of similar organism should scale as q0 / M2/3 rather than the empirical “M0.75” law, and the very name of such laws (allometry < Gr. αllomεtron ¼ “strange scale”) expresses the fact that most observed APL exponents are “strange,” in the sense that their values usually cannot be derived from physical principles or dimensional analysis (Korvin 1992: 233). APLs of the form y ¼ kxα (where both sides are positive) can be written, taking logarithm, as logy ¼ α log x þ log k or lny ¼ α ln x þ ln k that is they show up as straight lines when plotted on double-logarithmic axes, and the scaling exponent is obtained from the slope. APLs are scale invariant. Given an APL f(x) ¼ k ∙ xα, multiplying the argument x by a factor l causes only a proportionate change in the function itself: (lx) ¼ k(lx)α ¼ lαf(x) / f(x). The frequent occurrence of such APLs in Geosciences (see Table 1) which connect two measured properties of a geologic object (as, e.g., perimeter and area of islands, width and depth of rivers, etc.) is a consequence of the similarity of all geological objects of the same kind (as, e.g., all mountains, all islands, all sea coasts, all meandering rivers), or it arises because of their self-similarity, that is the (statistical) similarity of their parts observed at different scales (Korvin 1992: 4–5). Consider, as an example, the relation between wavelengths L and widths W of meanders of different size (Fig. 1), and assume there exists a functional relation W ¼ W(L ) (Eq. 1) between them. Consider three pieces of the same meander, or from different meanders, with wavelengths L1 > L2 > L3. Because of the similarity of meanders, one has for any two W ð La Þ La wavelengths La and Lb that W ðLb Þ ¼ f Lb with a yet unknown function f(x), i.e., W ðLa Þ ¼ W ðLb Þf LLab (Eq. 2), in particular W 1 ¼ W 3 f LL13 ); W 1 ¼ W 2 f LL12 ); W 2 ¼ W 3 f LL23 ) (Eqs. 3 a-c). Combining Eqs. (3. a, b, c) yields LL13 ¼ f LL12 ∙f LL23 , that is, letting LL12 ¼ a, LL23 ¼ b we see that for all a, b > 0 the unknown function f(x) satisfies f(a ∙ b) ¼ f(a) ∙ f(b) (Eq. 4). This is Cauchy’s functional equation (Aczél & Dhombres 1989; Korvin 1992: 73–76) whose only continuous solution (apart from f(x) 0) is f(x) ¼ xα for some real number α. Writing up Eq. 2 for the case La ¼ L, Lb ¼ 1, W(1) ¼ w we get an APL: W ¼ wLα (Eq. 5) connecting meander width with wavelength (Korvin 1992: 6). Note that the value of exponent α cannot be found from scaling arguments!
Allometric Power Laws
25
Allometric Power Laws, Table 1 Some empirically found APLs in Geosciences (the exponent b is different in each case) # 1
Name of the law Cloud areas and perimeters
Equation
2
Destruction of basaltic bodies by high velocity impacts
M(m) / mb (Eqs. 7a, b)
3
Drainage area vs. channel length
L / A0:5 d (Eq. 8)
4
Grain size distribution in sedimentary rock
N ðr Þ /
5
Horton’s law of stream numbers
N o ¼ ROo B (Eq. 10)
6
Meander width and channel width vs. channel depth
7
Number of clusters of entrapped oil particles in rock
Wc ¼ 6.8h1.54 W m ¼ 7:44W 1:01 c (Eqs. 11. a, b) ns / sb (Eq. 12)
8
Porosity vs. grain radius and eff. hydraulic radius for sandstone and graywacke Porosity vs. pore-size for sandstone
F ¼ 0:5
Wc / h1.54 (Eq. 15)
11
Rivers’ width vs. depth (for sinuosity >1.7) Size distribution of caves
12
Size distribution of islands
13
Undesirable porosity in concrete
D
P / A2 (Eq. 6)
b
r r0
Meaning of the constants P ¼ cloud perimeter A ¼ cloud area, D ¼ 1.35 0.05 M(m)¼ total # of fragments of mass larger than m L¼ channel length Ad¼ drainage area
Authors Lovejoy (1982)
Discussion of the law Korvin (1992): 62
Fujiwara et al. (1977)
Korvin (1992): 203
Hack (1957)
Korvin (1992): 6
N(r) ¼ # of grains longer than r, r0 reference length
Johnson and Kotz (1970)
Korvin (1992): 203
No¼ # of streams of order o, Ω is the order of the basin, RB is the bifurcation ratio Wm meander width, Wc channel width, h Channel depth (feet)
Horton (1945)
Scheidegger (1968)
Lorenz et al. (1985)
Korvin (1992): 1–3
ns ¼ # of clusters s ¼ # of oil droplets in a cluster
Sherwood (1986)
F porosity, rgrain grain radius. reff effective hydraulic radius
Pape et al. (1984)
Korvin (1992): 193–195 Korvin (1992): 288
F porosity, l1 and l2 smallest and largest pore size
Katz and Thompson (1985) Leeder (1973) Curl (1986)
Korvin (1992): 286–287 Korvin (1992): 1 Korvin (1992): 197
Korčak (1940) Caquot (1937)
Korvin (1992): 191 Korvin (1992): 14
(Eq.9)
9
r grain r eff
0:25
(Eq. 13) F¼
l1 l2
b
(Eq. 14) 10
N ðlÞ /
l l0
Wc bankfull width, h bankfull depth b
(Eq. 16) Prob (A > a) / ab (Eq. 17) F¼
l1 l2
1=5
N(l ) # of caves longer than l, l0 reference length A, a areas F porosity, l1 and l2 smallest and largest grain size
(Eq. 18)
Allometric Power Laws, Fig. 1 Derivation of the APL between meander width W and wavelength L. (From Korvin 1992: 5)
Several kinds of phenomena can produce, or explain, the observed APLs in earth-processes, earth-formations, and geomaterials. If an earth-process is observed to similarly behave across a range of spatial and/or temporal scales, the created formation usually becomes self-similar, and thus described by APLs. Deterministic chaos, self-organized
criticality (SOC, Hergarten 2002), critical phenomena (as rock fragmentation and failure, Turcotte 1986) are also associated with APLs. Other relevant models leading to APLs include (Mitzenmacher 2004): preferential attachment (applied in transportation geography, geomicrobiology, soil science, hydrology, and geomorphology); minimization of
A
26
Allometric Power Laws
costs, entropy, etc. (applicable to channel networks and drainage basins); multiplicative cascade models (used in meteorology, soil physics, geochemistry); and DLA (Diffusion Limited Aggregation, Korvin 1992: 349–366, used to model the growth of drainage networks, escarpments, eroded plateaus). However, as we know from geomorphology, similar landforms and other earth-formations might arise through quite different processes. This is the principle of equifinality (Bertalanffy 1968): same or similar outcomes can be produced by different processes, and thus there is no way to reconstruct the formative mechanism from the APLs obeyed by the resulting geologic objects.
Examples for Allometric Power Laws in Geosciences Some examples for APLs are compiled in Table 1. Discussion of these laws and references to their sources are found in Korvin (1992).
Allometric Relations with Modified Power Laws Empirical data are sometimes better described by approximate power laws instead of the ideal APL y ¼ axk because of random errors in the observed values, or their systematic deviation from the power law. Such functional forms, of course, do not have scaling symmetry any more. Popular variants are (https://en.wikipedia.org/wiki/Power_law): Power law with error term: y ¼ axk þ ε (Eq. 19). Broken power law: It is a piece-wise function consisting of one or more power laws, for two power laws, for example: y / xa 1 for x < xth (Eq. 20, the threshold xth is y / xath1 a2 xa2 for x > xth the cross-over). Power law with exponential cutoff: y / xαeβx (Eq. 21). Curved power law: f(x) / xα þ βx (Eq. 22). Rigout’s formula: Rigout (1984, cited in Korvin 1992: 290) described the Richardson-Mandelbrot plot (Mandelbrot 1982) for the length of Britain’s coastline measured with resolution l as LðlÞ ¼ Lmax 1 þ
l L0
c
1
(Eq. 23, L0 is the cross-over length). This approach makes possible to describe two slopes (one seen for small, one for large l values) on the double-logarithmic plot. The same trick was used by Pape et al. (1984, cited in Korvin 1992: 290) who studied the scaling of pore volume1F(l) with resolution l and 0:6434 found FðlÞ ¼ Fmax 1 þ 0:5676l (Eq. 24, rpore is pore r radius). pore
Allometric Power Laws for Size Distribution From among the APLs compiled in Table 1, #7 (number of clusters of entrapped oil particles in rock), #11 (size distribution of caves), and #12 (size distribution of islands, called Korčak’s law) are examples for the probability distribution (or the total number, total volume, or total mass) of geologic objects with respect to their size. While Korčak claimed that b ¼ 0.5 in his law Prob (A > a) / ab (Eq. 17), careful regression of his data gave b ¼ 0.48; Mandelbrot reported b ¼ 0.65 for the whole Earth with considerable variation for the different regions, for example, b ¼ 0.5 was found for Africa, b ¼ 0.75 for Indonesia and North America (Korvin 1992: 191; Mandelbrot 1982). Eq. (17) defines what is called in probability theory the “power law,” “hyperbolic,” or “Pareto” distribution. The power-law probability distribution p(x) / xα has a well-defined mean over x [1, 1) only if α > 2, and it has a finite variance only if α > 3. For most observed power-law distributions in Geosciences, the exponents α are such that the mean is well-defined but the variance does not exist because typically the exponent falls in the range 2 < α < 3. It is important to note that for power-law distributions p(x) ¼ Cxα for x > xmin and α > 1, the median does exist, and pmedian ¼ 21/(α1)xmin (Eq. 25). In this case, p(x) can be written a1 in the normalized form pðxÞ ¼ ax min
x xmin m
a
(Eq. 26). Its 1 m xmin x pðxÞdx
¼ moments can also be computed: hx i ¼ a1 m x for m < α 1 (Eq. 27). For m α 1, all moments a1m min diverge: in the case α 2, the mean and all higher-order moments are infinite; when 2 < α < 3, the mean exists, but the variance and higher-order moments diverge. To avoid divergences, power-law distributions can be made more tractable with an exponential cutoff, i.e., by using p(x) / xαelx (Eq. 28), where the exponential decay elx suppresses the power-law-like increase at large values of x. This distribution does not scale for large values of x; but it is approximately scaling over a finite region before the cutoff (whose size is controlled with the parameter l). CAVEAT: The geoscientist should never forget that the straightness of the plotted line is a necessary, but not sufficient, condition for the data to follow a power-law relation, and many non-power-law distributions will also appear as approximately straight lines on a log–log plot. For example, lognormal distributions are often mistaken for power-law distributions (Clauset et al. 2009, Mitzenmacher 2004, Korvin 1989). Equations and practical advice for fitting a power-law distribution, estimating the uncertainty in the fitted parameters, calculating the p-value for the fitted power-law model can be found in Clauset et al. (2009) and its very useful “Companion paper” (https://aaronclauset.github.io/powerlaws/) which provides usable code (in Matlab, Python, R and C++) for the estimation and testing routines.
Allometric Power Laws
Concluding Remarks: The Power of Allometric Power Laws As two case histories will show, APLs are a useful tool in the hand of the practicing or theoretical geoscientist. Perhaps the most beautiful practical use of APLs was made in 1985, in a reserve estimation project. J.C. Lorenz and co-workers at the Sandia National Laboratories, Albuquerque, tried to estimate the maximum width of the meanders of an ancient fluvial system (that is the present-day reservoir extension) using only the depths of paleostreams measured in wells (their work is reviewed, and referenced, in Korvin 1992: 1–3). Actually, they only used core and log data from a single well. They identified 49 fining-upward sand bodies from the meandering fluvial environment of the Mesaverde group, from the histogram of their preserved thicknesses estimated the maximal paleo-channel depth as 3–4 m (10–13 ft). To take into account postdepositional compaction of sand into sandstone, they increased this thickness by 11% which gave 3.3–4.4 m (11–13 ft) depth at the time of deposition. Then they substituted the decompacted channel depth into the empirical APL Wc ¼ 6.8h1.54 Eq. (11a, see Law #6 in Table 1). The formula gave an estimated channel width Wc ¼ 45 67m(150 220ft), plugging this into the APL W m ¼ 7:44W 1:01 (Eq. 11.b) yielded the predicted meander c belt width Wc ¼ 350 520m(1,100 1,700ft). Many empirical laws of the physics of sedimentary rock describe some physical property of the rock as a power of porosity, as, e.g., the conductivity s of fluid-filled rock satisfies s / Fm1 (Archie’s law, Eq. 29), for some rocks and some porosity ranges the permeability k satisfies k / Fm2 (Eq. 30). Such power laws are not considered allometric power laws (APL) because while the dependence is power-like, the measured quantity is a property of the whole rock, and not of its size. To find a heuristic explanation of such empirical findings, one must keep in mind that very often we can assume hierarchical (or fractal) structures and processes, and hidden APLs lying behind such power law-like behavior. Consider, e.g., the empirically observed k / Fm2 relation (Eq. 30) and try to explain it by scaling arguments (Korvin 1992: 292–294). According to the Kozeny-Carman relation: k ¼ 1b F3 ∙ S21 ∙ t12 spec (Eq. 31), where k is permeability, F porosity, Sspec specific surface of the pore space, t tortuosity, and b a constant depending on the shape of the pore-throats available for the flow. Denote by ε the smallest pore size, and consider a piece of rock of fixed size R, ε. If the total pore volume in an R-sized D rock scales with fractal dimension Dp, then Ftotal / Re p , and D D R p 3 porosity scales with ε as F FRtotal ∙R ¼ ReDp3 / em4 3 / p e (Eq. 32, in this heuristic consideration m1, m2, are the corresponding exponents). If the pore surface has fractal dimension DS, the specific surface of a piece of rock of size D 3 D R scales as Sspec / Re S =R3 ¼ ReDSS / em5 (Eq. 33). Next, consider tortuosity t (¼ hydraulic path/straight line path) over
27
distance R. If R is not too great, the hydraulic path is a planar curve, that is its fractal dimension is DS 1 (by Mandelbrot’s rule, Mandelbrot 1982: 365, Korvin 1992: 125 Eq. 2.3.1.7) and D 1 its length scales as Lhydr / Re S , so tortuosity satisfies Lhydr R DS 1 1 m6 t¼ R / e ∙R / e (Eq. 34). Plugging Eqs. (32, 33, 34) into the right-hand side of Eq. (31), we find that permeability also scales as a power of e: k / em7 ¼ ðem4 Þðm7 =m4 Þ / Fðm10 =m4 Þ (Eq. 35) (m4 ¼ Dp ≈ 3, certainly m4 6¼ 0) which explains the power-like relation k / Fm2 in (Eq. 30). Thus, fractal geometry, assumption of self-similar pores, and simple scaling arguments yield a heuristic explanation for the empirical power-law relation between porosity and permeability over a limited range of specimen and pore size. Because of the principle of equifinality, this – of course – is only one of the many possible explanations.
Cross-References ▶ Chaos in Geosciences ▶ Flow in Porous Media ▶ Fractal Geometry in Geosciences ▶ Frequency Distribution ▶ Grain Size Analysis ▶ Lognormal Distribution ▶ Pore Structure ▶ Porosity ▶ Porous Medium ▶ Probability Density Function ▶ Rock Fracture Pattern and Modeling ▶ Scaling and Scale Invariance ▶ Singularity Analysis ▶ Statistical Rock Physics
References Aczél J, Dhombres J (1989) Functional equations in several variables. Cambridge University Press, Cambridge Caquot A (1937) Rôle des matériaux inertes dans le béton. Mem Soc Ing Civils France:562–582 Clauset A, Shalizi CR, Newman MEJ (2009) Power-law distributions in empirical data. SIAM Rev 51(4), 661–703. See also its companion paper: Power-law Distributions in Empirical Data https:// aaronclauset.github.io/powerlaws/ Curl RL (1986) Fractal dimensions and geometries of caves. Math Geol 18(8):765–783 Fujiwara A, Kamimoto G, Tsukamoto A (1977) Destruction of basaltic bodies by high-velocity impact. Icarus 31:277–288 Hack JT (1957) Studies of longitudinal stream-profiles in Virginia and Maryland. US Geol Surv Prof Pap 294B:45–97 Hergarten S (2002) Self-organized criticality in earth systems. Springer, New York Horton RE (1945) Erosional development of streams and their drainage basins: hydrophysical approach to quantitative morphology. Bull Geol Soc Am 56:275–370
A
28 Johnson NJ, Kotz S (1970) Continuous univariate Distributions-2. Houghton Mifflin, Boston Katz AJ, Thompson AH (1985) Fractal sandstone pores: implications for conductivity and pore formation. Phys Rev Lett 54:1325–1328 Korčak J (1940) Deux types fondamentaux de distribution statistique. Bull Inst Int Stat 30:295–299 Korvin G (1989) Fractured but not fractal: fragmentation of the Gulf of Suez basement. PAGEOPH 131(1–2):289–305 Korvin G (1992) Fractal models in the earth sciences. Elsevier, Amsterdam Leeder MR (1973) Fluviatile fining-upwards cycles and the magnitude of palaeochannels. Geol Mag 110(3):265–276 Lorenz JC, Heinze DM, Clark JA (1985) Determination of width of meander-belt sandstone reservoirs from vertical downhole data, Mesaverde group, Piceance Greek Basin, Colorado. AAPG Bull 69(5):710–721 Lovejoy S (1982) Area-perimeter relation for rain and cloud areas. Science 216(4542):185–187 Mandelbrot B (1982) The fractal geometry of nature. W.H. Freeman & Co., New York Mitzenmacher M (2004) A brief history of generative models for power law and lognormal distributions. Internet Math 1(2):226–251 Pape H, Riepe L, Schopper JR (1984) The role of fractal quantities, as specific surface and tortuosities, for physical properties of porous media. Part Part Syst Charact 1(1–4):66–73 Rigout JP (1984) An empirical formulation relating boundary lengths to resolution in specimens showing non-ideally fractal dimensions. J Microsc 133:41–54 Scheidegger AE (1968) Horton’s law of stream numbers. Water Resour Res 4(3):655–658 Sherwood JD (1986) Island size distribution in stochastic simulations of the Saffman-Taylor instability. J Phys A Math Gen 19(4):L195–L200 Stevens CF (2009) Darwin and Huxley revisited: the origin of allometry. J Biol 8:14 Turcotte DL (1986) Fractals and fragmentation. J Geophys Res B 91: 1921–1926 von Bertalanffy L (1968) General systems theory, foundations, development, applications. George Braziller, New York
Argand Diagram James Irving1 and Eric P. Verrecchia2 1 Institute of Earth Sciences, University of Lausanne, Lausanne, Switzerland 2 Institute of Earth Surface Dynamics, University of Lausanne, Lausanne, Switzerland
Synonyms Argand plane; Cole-Cole plot; Complex plane; Gauss plane; Impedance diagram; Zero-pole diagram; Zero-pole plane; z-plane
Argand Diagram
Definition The Argand diagram denotes a graphical method to plot complex numbers, expressed in the form z ¼ x þ iy, where (x, y) are used as coordinates and plotted on a plane with two orthogonal axes, the abscissa representing the real axis and the ordinate the imaginary axis.
Argand Diagram The Argand diagram is a way to plot a complex number in the form of z ¼ x þ iy, using the ordered pair (x, y) as coordinates and p where the constant i represents the imaginary unit, i.e., 1. The Argand diagram (or plane) is therefore defined by two orthogonal axes where the abscissa refers to a real axis and the ordinate to an imaginary axis. Historical Origin of the Argand Diagram The family name attached to the Argand diagram designates that this geometric representation of complex numbers is credited to Swiss mathematician Jean-Robert Argand, who was born in Geneva in 1768 and died in Paris in 1822. Indeed, this graphical method has been detailed in Argand (1806). Nevertheless, Howarth (2017) suggested that the plot was first introduced by Caspar Wessel (1799), a Norwegian-Danish mathematician. Wessel (1745–1818) presented his work about the analytical representation of the direction of numbers to the Academy of Science in Copenhagen on March 10, 1797; this contribution was printed in 1798 and finally published in 1799. Nevertheless, Valentiner (1897), in the first preface of the French translation of Wessel’s essay, noted that Carl Friedrich Gauss (1777–1855) arrived at the same idea in the same year (1799), but without citing any specific source. This remark led some researchers to refer to the Argand plane as the Gauss plane. In the same preface, Valentiner also acknowledged a British mathematician, John Wallis (1616–1703). It seems that Wallis proposed a similar approach, even earlier than Wessel, in his “Treatise of Algebra,” published in London in 1685. Valentiner noted that Wessel’s essay includes “a theory as complete as Argand’s, and in my opinion, more correct [. . .]” (Valentiner, 1897, p. iii). Regarding the various synonyms for the Argand diagram, such as the complex z-plane, zero-pole diagram, or zero-pole plot, they were introduced during the twentieth century, mostly by mathematicians and geophysicists. For example, according to Howarth (2017), the concept of “complex z-plane,” with z referring to z ¼ x þ iy, was coined by the American mathematician William Fogg Osgood at the turn of the twentieth century (Osgood 1901) and used later by engineers studying signal processing as the “zero-pole diagram” and “zero-pole plot.” In geophysics, the concept of “zero-
Argand Diagram
29
A
Argand Diagram, Fig. 1 Left, representation of an imaginary number (z), and its conjugate ðzÞ , as ordered pairs (x, y), and (x, y), on an Argand diagram, respectively. The x-axis is the real axis, whereas the
pole” usually refers to the z-transform of a discrete impulse response function when it reaches infinite values (Buttkus 2000).
y-axis refers to the imaginary axis. Right: representation of the same numbers in terms of the modulus (M ) and the angle (θ) as polar coordinates (M, θ)
cosðyÞ ¼
eiy þ eiy eiy eiy and sinðyÞ ¼ 2 2i
ð2Þ
we arrive at The Argand Diagram: A Geometric Plot The Argand diagram is a kind of Cartesian plane upon which an imaginary number, z ¼ x þ iy, is represented with its real part along the x-axis and its imaginary part along the y-axis. It provides a geometrical way of locating imaginary numbers (Fig. 1, left). Therefore, any complex number can be recognized by the coordinates of a unique point in the diagram according to its ordered pair (x, y). Any complex number with a null real part (z ¼ 0 þ iy) will lie on the vertical imaginary axis, whereas any real number (z ¼ x þ i0) will lie on the real horizontal axis. The conjugate of a complex number, i.e., z ¼ x iy, can be easily represented by its symmetric location along the real x-axis (Fig. 1, left), i.e., by the (x, y) coordinates on the Argand diagram. The distance from the origin (0, 0) to a given point (x, y) on the Argand plane is known as the modulus of z (or its absolute value), and is usually denoted by r, M , or |z|. The modulus is easily calculated as jzj ¼ x2 þ y2 . Complex numbers can also be expressed in polar form on the Argand diagram, i.e., using the modulus (M) and the angle (θ) (also called the argument Arg(z)) as polar coordinates, the angle (θ) being measured along the positive direction of the real x-axis. Consequently, the ordered pair (x, y) is replaced by the pair (M, θ) as illustrated in Fig. 1 (right). The ordered pair (x, y) can now be calculated as: x ¼ M: cosðyÞ and y ¼ M: sinðyÞ
ð1Þ
Therefore, z ¼ M. cos (θ) þ i. M. sin (θ). Using Euler’s formulae expressing the relationshippbetween trigonometric functions, e (Euler’s number) and i ( 1 number), i.e.,
z ¼ MðcosðyÞ þ i: sinðyÞÞ ¼ M
eiy þ eiy eiy eiy þ i: 2 2i
¼ M:eiy If z ¼ M. eiθ, then z ¼ M:eiy . Eq. (1):
x2 þ y2 Following
y sin y y ¼ ¼ tan y ) y ¼ arctan x cos y x
ð3Þ
Some Applications of the Argand Diagram in Geosciences In many areas of the geosciences, complex numbers play an important role, and the Argand diagram can serve as a useful tool for data analysis. A wide range of Earth materials, for example, possess dispersive material properties, whereby the steady-state material response to a sinusoidal forcing depends on the frequency at which the forcing is applied. Due to various relaxation phenomena, both the amplitude and phase of the response will vary with frequency, meaning that the corresponding physical property, which links the response and forcing through a constitutive equation, takes the form of a complex and frequency-dependent variable in the Fourier domain. One commonly used formulation to describe such behavior is the Cole-Cole model (Cole and Cole 1941). Originally proposed to describe the relaxation behavior of the dielectric permittivity, it is given by
30
Argand Diagram
Argand Diagram, Fig. 2 Plot of the imaginary versus real part of the complex dielectric permittivity corresponding to the Cole-Cole model, for different values of α. Frequency was varied from 1 Hz to 1020 Hz, with fixed parameters ε1 ¼ 5 F.m1, ε0 ¼ 80 F.m1, t ¼ 1010 s
e0 ðoÞ ie00 ðoÞ ¼ e1 þ
es e1 1 þ ðiotÞa
ð4Þ
where ε0(o) and ε00(o) are the real and imaginary parts of the permittivity [F.m1], ε1 and εs are the high- and lowfrequency limiting values of the real part, o is the angular frequency [s1], t is a relaxation time constant [s], and α is a unitless time-constant distribution parameter that varies between 0 and 1. Plots on an Argand diagram of complex permittivity measurements made at different frequencies provide insight into the nature of the relaxation (Fig. 2). For instance, α ¼ 1 in the Cole-Cole equation describes a Debye-type relaxation response involving a single polarization mechanism, and the complex values plot along a semicircular arc whose intersections with the real axis correspond to ε1 and εs. Lower values of α result in Argand plots with the same real-axis intersections but lower complex amplitudes, indicating a distribution of relaxations within the considered frequency range. Multiple polarization mechanisms whose relaxations are well separated in frequency may appear as more than one arc. Similar analyses using the Cole-Cole and other related models have been performed for the electrical conductivity as well as for various elastic moduli. Argand diagrams have also proven highly useful in the context of linear time-invariant (LTI) system identification and digital filter design, for which there are many important applications in the geosciences. In the z-transform domain, the input X(z1), output Y(z1), and impulse response H(z1) of a discrete LTI system are related as follows: Y z1 ¼ H z1 :X z1
F z1 ¼
n¼1
f ðnT Þzn
H z1 ¼
Aðz1 Þ Bðz1 Þ
ð7Þ
where A(z1) and B(z1) correspond to moving-average and feedback operations, respectively; the positions of the roots of A(z1) (zeros) and those of B(z1) (poles) in the complex plane with respect to the unit circle can be used to understand the system frequency response as well as design effective digital filters for environmental data.
ð5Þ
where the z-transform of time series f(t) with sampling interval T is given by 1
Inspection of the complex roots of the z-transform polynomials in Eq. (5) in the Argand plane can provide important information regarding the system impulse response. De Laine (1970), for example, suggested that the unit hydrograph for a catchment rainfall-runoff model could be estimated via comparison of the roots of the runoff polynomials corresponding to different storm events. Runoff roots having similar locations in the Argand plane were assumed to correspond to the catchment system and not the input. Turner et al. (1989) similarly suggested that roots for a single runoff time series that fell along a “skew circle” in the Argand plane could be attributed to the unit hydrograph and thus allow for its identification. In seismic data analysis, examination of the roots of H(z1) also has important implications for the stability of inverse filtering operations such as deconvolution; if the roots of the wavelet polynomial under consideration lie outside the unit circle |z| ¼ 1, the deconvolution operation will be unstable. Further, if one considers of the LTI system as a rational function in the z-transform domain, i.e.,
ð6Þ
Summary The Argand diagram provides a means of plotting a complex number in the form of z ¼ x þ iy, using the ordered pair (x, y) as coordinates and p where the constant i represents the imaginary unit, i.e., 1 . The Argand diagram (or plane) is
Artificial Intelligence in the Earth Sciences
therefore defined by two orthogonal axes, a real one (abscissa) and an imaginary one (ordinate). In many areas of the geosciences, complex numbers play an important role, and the Argand diagram can serve as a useful tool for data analysis. Examples are provided regarding dispersive material properties, linear time-invariant (LTI) system identification, and digital filter design, as well as seismic data analysis.
31
Regardless, this difficulty has not prevented philosophers, artists, writers, filmmakers, scientists, engineers, and/or earth scientists from imagining what an artificially intelligent device might be like, how it might be constructed, what it might be used to do, and what dangers it might pose.
Introduction Bibliography Argand JR (1806) Essai sur une manière de représenter les quantités imaginaires dans les constructions géométriques. Mme Veuve Blanc Edn, Paris Buttkus B (2000) Spectral analysis and filter theory in applied geophysics. Springer, Berlin Cole KS, Cole RH (1941) Dispersion and absorption in dielectrics. J Chem Phys 9:341–351 De Laine RJ (1970) Deriving the unit hydrograph without using rainfall data. J Hydrol 10:379–390 Howarth RJ (2017) Dictionary of mathematical geosciences. SpringerNature, Cham Osgood WF (1901) Note on the functions defined by infinite series whose terms are analytic functions of a complex variable; with corresponding theorems for definite integrals. The Annals of Mathematics 31:25–34 Turner JE, Dooge JCI, Bree T (1989) Deriving the unit hydrograph by root selection. J Hydrol 110:137–152 Valentiner H (1896) Première préface. In: Wessel C Essai sur la représentation analytique de la direction, French translation, Host, Copenhague, pp III–X Wallis J (1685) Treatise of algebra. John Playford printer, London Wessel C (1799) Om Directionens analytiske Betegning et Forsog, anvendt fornemmelig til plane og sphæriske Polygoners Oplosning. Nye Samling af det Kongelige Danske Videnskabernes Selskabs Skrifter 5:469–518
Artificial Intelligence in the Earth Sciences
Norman MacLeod
Department of Earth Sciences and Engineering, Nanjing University, Nanjing, Jiangsu, China
Definition
There are many definitions of artificial intelligence (AI), but perhaps it is most commonly understood to mean any machine, computer, or software system that accepts information and performs the cognitive functions humans associate with intelligence, such as learning and problem-solving. Part of the difficulty in defining AI more precisely is that no common, universally accepted, formal definitions of the core concepts needed to define human intelligence exist (e.g., thinking, intelligence, experience, consciousness). Regardless, this difficulty has not prevented philosophers, artists, writers, filmmakers, scientists, engineers, and/or earth scientists from imagining what an artificially intelligent device might be like, how it might be constructed, what it might be used to do, and what dangers it might pose.
Introduction
Few can have failed to notice the plethora of news bulletins, editorials, perspectives, symposia, conferences, course offerings, lectures, and the like that, over the past few years, have announced the imminent arrival of artificially intelligent machines. These are portrayed as being designed to assist with the tasks traditionally assigned to human workers in fields that, thus far, have proven resistant to automation. While AI is not (so far) mooted as a replacement for scientists, artists, or senior managers, presumably because of the creativity required by these fields, there is widespread agreement among employers that many repetitive tasks could, and perhaps should, be given over to automation if possible. In addition to creativity, scientific research often involves complex, laborious, and time-consuming searches through datasets, library references, catalogues, image sets, and, increasingly, online resources for bits of information critical to the resolution of a particular problem at hand or hypothesis test. In principle, many researchers would welcome access to automated intelligent agents that could conduct such searches quickly, systematically, and objectively, thus freeing themselves, their colleagues, their assistants, and their students to focus on tasks better suited to their training, knowledge, and core interests. The Massachusetts Institute of Technology's recent announcement that a team of its medical researchers used a deep-learning system named "Hal" successfully to find new generalized antibiotic molecules that kill a number of the most common disease-causing bacteria after screening more than 107,000,000 candidate molecules represents an interesting case in point, especially insofar as the material discovered often exhibited a form structurally divergent from conventional antibiotics (Stokes et al. 2020). Consequently, current debates over artificial intelligence (AI) techniques and technologies are as relevant – though perhaps not quite as threatening – to research scientists as they are to professionals in other areas of human activity. Curiously, the earth science community appears to be of two minds regarding AI. On one hand, the use of computer software designed to process and identify patterns in earth science data without having to be programmed explicitly to do so is old hat in many fields, so ubiquitous it hardly bears comment. On the other hand, some recent articles have claimed earth science research is either unusually susceptible to the misuse of AI (e.g., Ebert-Uphoff et al. 2019) or focuses on problem classes inherently resistant to AI-based
approaches (e.g., Anonymous 2019; see also Reichstein et al. 2019). Instead, these critics advocate a vaguely defined partnership between traditional "physical process models" and artificial expert systems. At present, most new AI applications in the earth sciences center on machine-learning systems and software, especially the newer "deep-learning" artificial neural network architectures. However, machine learning is only one aspect of the much larger artificial intelligence field. In order to determine whether AI has a past, present, or future in earth science research, a more general review is required.
What Is Artificial Intelligence?
The earliest written record of an AI device was the character of ΤΑΛΩΝ or Talon (also Talos), a giant robot made of bronze, gifted to Europa by Zeus, that circled the island triannually and threw stones at approaching ships to protect the island from invaders. Modern variations of the Talon myth can be found in Homer's description of Hephaestus' serving devices in the Iliad, the Jewish parable of the Prague golem, Mary Shelley's Frankenstein, Arthur C. Clarke's HAL from 2001: A Space Odyssey, Ridley Scott's dystopian epic Blade Runner (based on the Philip K. Dick novel Do Androids Dream of Electric Sheep?), and later installments of the Terminator film franchise. The first computational steps toward realizing artificial intelligence were taken by Alan Turing, the mathematician and computer-design pioneer who understood that binary arithmetic could be used to perform any conceivable act of mathematical deduction. This insight led to the Church-Turing thesis (Church 1936; Turing 1937, 1950), which was a key advance that ultimately enabled digital computers to be designed. At the same time, several researchers recognized the similarity between Turing's switch-based computational system and the operation of mammalian neurons, which are the fundamental units of the human nervous system. This transfer of insight across disciplines led to Warren McCulloch and Walter Pitts' (1943) proposal of a design for a Turing-complete artificial neuron and the possibility of constructing a complete, artificial, electronic brain. In the wake of these mid-twentieth-century developments, the MIT and Stanford University computer scientist John McCarthy coined the term "artificial intelligence" to describe machines that can perform tasks characteristic of human intelligence such as planning, understanding language, recognizing objects and sounds, learning, and problem-solving (McCorduck 1979). McCarthy's list of the tasks or capabilities that define AI operations is still reflected in the major components of this field (Fig. 1).
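The McCulloch and Pitts proposal can be illustrated with a minimal threshold unit of the kind they envisaged; the weights and thresholds in the sketch below are illustrative choices, not values taken from the 1943 paper.

```python
# Minimal sketch of a McCulloch-Pitts-style threshold unit: binary inputs,
# fixed weights, and a hard threshold decide whether the neuron "fires".
def mcp_neuron(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Logical AND and OR realized as threshold units (illustrative settings)
AND = lambda a, b: mcp_neuron([a, b], weights=[1, 1], threshold=2)
OR = lambda a, b: mcp_neuron([a, b], weights=[1, 1], threshold=1)
print(AND(1, 1), AND(1, 0), OR(1, 0), OR(0, 0))   # 1 0 1 0
```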
Development and Activity Domains
Two broad classes of AI are recognized: narrow AI and general AI. Narrow AI includes all devices and systems that have some characteristics of human intelligence. Systems that exhibit narrow AI have existed at least since the late 1950s (e.g., Samuel 1959), and it is devices of this sort that are being referred to in contemporary media reports of specific, realized AI applications. The list of fields in which narrow AI systems have, or could be, applied is very long indeed. No general AI system currently exists, in part owing to a lack of agreement on the characteristics such a system would need to possess in order to be so designated, but also owing to the broad assumption that a genuine general AI system would need to be self-aware or exhibit consciousness. The basis of fear over the capabilities of a general AI system can be traced to the assumption that a conscious system would seek to establish local or general dominance in the sense that many social species do. But as Pinker has noted, dominance is usually associated with mammalian male behavior patterns and is derived from genetic strategies that have no necessary correspondence to AI system development goals (https://bigthink.com/videos/steven-pinker-on-artificial-intelligence-apocalypse?jwsource=cl). Conscious AI systems could, just as easily, adopt a female perspective and prioritize problem-solving along with social coherence. Six activity domains are commonly recognized by AI researchers. These are identified principally by the problems associated with each rather than by the technologies or algorithms that happen to be associated with attempts to address these problems at any given time. Activity has waxed and waned within and between each of these domains over the course of the past 75 years as new approaches were tried and met either with success or failure. Indeed, the methods employed in the current most active domain – machine learning – are being applied, in whole or in part, to all of the other domains at present. Since this domain is the subject of a separate article in this volume, only a very abbreviated description of its core concepts and scope will be provided here. However, it is important to note that the inherent topical disparity that exists within this field has, at certain times, been as much a hindrance as a help to its overall development. In her history of AI, Pamela McCorduck (1979) noted that, soon after its formulation, AI shattered into subfields, the practitioners of which "would hardly have anything to say to each other" (p. 424). This self-inflicted breach of cohesion and collegiality, along with a tendency to overpromise and under-deliver to funders, exposed AI research to criticisms that led to a reduction in both academic interest and extra-academic support funding over two successive intervals. The first of these "AI winters" extended from 1974 to 1980 and was precipitated by the very damaging US Automatic Language Processing Advisory Committee (ALPAC) report into
Artificial Intelligence in the Earth Sciences, Fig. 1 Schematic representation of the primary domains of contemporary artificial intelligence research and applications. A non-exhaustive listing of some subordinate activity domains has been provided as a rough indication of domain diversity as well as relative activity levels.
prospects for machine translation (Pierce et al. 1966) and the report of Sir James Lighthill (1973) into the state of AI research in the UK. The second AI winter, from 1987 to 1995, was caused by the collapse of the commercial market for AI-based, specialist expert system computers, especially those based on the LISt processor (LISP machines) that employed a syntax close to that of natural language to manipulate source code as if it were a data structure. Arguably, these difficulties for the whole of the AI field were not addressed successfully until the advent of convolution-based artificial neural networks in the late 1990s (e.g., LeCun et al. 1998). It is from this point that the present revival of AI's fortunes has stemmed.
Expert Systems
Expert systems are computer systems that accept information, usually in the form of alpha-numeric data, and employ various processing strategies to address problems via autonomous reasoning based on relations known, or suspected, to exist in a set of data rather than by passing the data through a set of deterministic mathematical procedures. Expert systems were among the first successful AI applications, proliferating commercially in the late 1970s through 1980s via sales of dedicated computer systems based on the LISP family of computer languages, all of which are based on Alonzo Church's (1936) development of lambda calculus. Interestingly, LISP is the second-oldest computer language, the first versions of which were released in 1958, just 1 year after the release of the first FORTRAN compilers (Wexelblat 1981). Typically, expert systems are organized into a knowledge base and an inference engine. The knowledge base contains definitions, facts, rules, and systems of weights that govern how the rules are applied in particular situations. These rules can be provided by human experts based on their personal or collective knowledge or transferred from the inference engine based on the system's experience processing datasets. In this way, the knowledge base can grow and improve the system's performance, giving the system, overall, a degree of machine learning capability. The inference engine applies the knowledge base to data and, by virtue of relations identified within and between datasets, can infer new facts/rules that, in turn, can be passed back to the knowledge base. Classic expert system designs represent knowledge, for the most part, via specification of if-then rules. In this sense, expert system designs resemble aspects of decision trees.
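The knowledge-base/inference-engine organization and if-then rule representation just described can be sketched in a few lines of Python; the geological rules below are invented for illustration and do not reproduce any real system's rule base.

```python
# Minimal sketch of forward chaining: facts plus if-then rules are applied
# repeatedly until no new facts can be inferred; inferred facts are fed
# back into the knowledge base, mirroring the description in the text.
def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if set(conditions) <= facts and conclusion not in facts:
                facts.add(conclusion)   # new fact returned to the knowledge base
                changed = True
    return facts

# Hypothetical exploration rules, purely for illustration
rules = [
    (["quartz veins", "sulfide minerals"], "hydrothermal alteration"),
    (["hydrothermal alteration", "fault zone"], "target for follow-up sampling"),
]
print(forward_chain(["quartz veins", "sulfide minerals", "fault zone"], rules))
```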
Many information-technology historians date the introduction of expert systems to 1965 and the Stanford Heuristic Programming Project, whose original application was to identify novel organic molecules based on their mass spectra with a rule set derived from human expert knowledge of organic chemistry. In 1982, the Synthesis of Integral Design (SID) system was the first system to generate a set of logical routines sufficiently extensive to outperform expert logicians. Despite these milestones, however, the performance predictions made by advocates of expert systems failed to be realized and were recognized to have fallen short of expectations by the late 1980s. The reasons behind the inherent limitations in expert system designs had been studied and explained by Karp (1972). But in the early optimism over AI his concerns were discounted or ignored. Nevertheless, failure to meet expectations and deliver on commercial investments in expert systems caused a collapse of the market for specialist systems by the late 1980s, a situation from which classical LISP-based expert systems never really recovered, though aspects of expert system design continue as codified algorithms in many business-analysis applications. Hayes-Roth et al. (1983) published an extensive summary and classification of expert system designs which remains a standard reference.
Natural Language Processing
The very large field of natural language processing (NLP) includes all aspects of the theory and application of the ability of humans and computers to interact with one another using natural spoken or written languages. While machines that speak and understand what is being said when spoken to have been a staple of science fiction for over a century, sophisticated and effective NLP applications are familiar to most people today in the form of personal computer assistants such as Apple Corporation's Siri, Google's Google Assistant, and Amazon's Alexa, in addition to a plethora of online automated technical service and customer support systems that respond to typed text. This field represents the intersection between linguistics, computer science, information engineering, and AI. Alan Turing (1950) regarded NLP as a primary goal of AI in the form of his "imitation game" – later referred to as the "Turing test" – for the degree to which any machine can be judged to demonstrate intelligent behavior. Turing's test has been criticized for confusing the concepts of thinking and acting, but it remains a well-known, popular, and thus important AI benchmark. This concept probably inspired, at least in part, the so-called Georgetown experiment a few years after Turing's article appeared when, in 1954, an IBM 701 mainframe computer successfully translated 60 Russian sentences into English based on a dictionary of 250 words and a knowledge base of six rules (Hutchins 2004). Not only did this demonstration receive wide media coverage, it stimulated government and commercial interest in NLP, and in AI generally, prompting some to predict generalized systems for language translation could be produced by 1960. When this goal had not been achieved by the mid-1960s, it prompted a withdrawal of US government funding, precipitating the first AI winter. Natural language processing systems were based originally on expert-determined rule sets. This design was replaced in the late 1980s by a statistical approach in which
Artificial Intelligence in the Earth Sciences, Fig. 2 Example of a typical hidden Markov model. In this situation, the task is to infer the state of the weather by collecting data on what a friend who lives in another city is doing. Since you cannot observe the weather there, that layer is hidden. But if probabilities can be assigned to various weather-dependent activities, its state can be inferred by collecting information about the friend's activity. If the friend spent the day shopping, there's a 40% chance it was raining. But if they went for a walk, there's a 60% chance it was sunny. (Redrawn from an original by Terrance Honles: see https://commons.wikimedia.org/wiki/File:HMMGraph.svg)
rules were learned by systems analyzing large sets of documents and could be applied in a probabilistic manner that incorporated information from context. Modern NLP systems are quite sophisticated and are gaining in accuracy, and popularity, with each passing year. Amazon's Alexa is representative of the best of the current crop of NLP devices. Alexa records ambient sounds (i.e., listens) continuously and subdivides spoken words into individual phonemes. When the "wake" word "Alexa" is detected, the statement that follows is shunted to a very large database of word pronunciations to find which words most closely correspond to the combinations of sounds. A second (huge) database of likely phrases is then searched to infer the most likely word order using a hidden Markov model (Fig. 2; also see Rabiner 1989). Once the statement has been reconstructed, the Alexa system searches the statement for important or key terms that signal instructions to perform particular tasks. All these searches, inferences, and reconstructions are performed in real time using cloud computing. Responses to spoken commands are formulated in the same manner, but in reverse order. Owing to the combination of deep learning with NLP, along with the number of times they are used by system subscribers, the performance of these systems is improving rapidly.
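The hidden-Markov-model step mentioned above can be illustrated with a toy Viterbi decoder for the weather/activity example of Fig. 2. Only the 40% (shopping when rainy) and 60% (walking when sunny) values come from the figure caption; the remaining start, transition, and emission probabilities below are assumptions made purely for illustration, and the sketch is not a description of any production system.

```python
# Viterbi decoding of the most likely hidden weather sequence given observed
# activities, for a small two-state HMM shaped like Fig. 2.
states = ["Rainy", "Sunny"]
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def viterbi(observations):
    # v[s] = probability of the best path ending in state s; path[s] = that path
    v = {s: start[s] * emit[s][observations[0]] for s in states}
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        v_new, path_new = {}, {}
        for s in states:
            prev = max(states, key=lambda p: v[p] * trans[p][s])
            v_new[s] = v[prev] * trans[prev][s] * emit[s][obs]
            path_new[s] = path[prev] + [s]
        v, path = v_new, path_new
    best = max(states, key=lambda s: v[s])
    return path[best], v[best]

print(viterbi(["walk", "shop", "clean"]))   # most likely hidden weather sequence
```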
Moreover, the recent release of 5G network technology will enable NLP systems such as Siri, Google Assistant, Alexa, and others to communicate with an ever-increasing array of devices so they can, effectively, serve as intermediaries for two-way human-device communication. This capability is set to have an extremely wide range of applications, including scientific research.
Machine Learning
Without question, machine learning (ML) is the AI activity domain that has received the most attention over the last two decades, driven largely by a revival of interest in artificial neural networks (only the basics of machine learning will be covered in this section; the reader is directed to the companion article in this volume for additional information). Formally, ML is the study and application of algorithms that enable computer systems to learn and improve performance in specific task areas without being programmed explicitly to do so. This is an algorithmically diverse and mathematically complex field that, to this author's way of thinking, has its conceptual origins in multivariate data analysis, especially eigenvector-based methods such as principal component analysis (ordination, dimensionality reduction) and multigroup discriminant analysis (classification). The techniques and search strategies employed by current machine learning systems are far removed from these origins, and, given adequate data, their performance exceeds that delivered by these original and more primitive approaches by a very substantial margin. But on a conceptual level the goals are largely the same. Key to understanding the core structure of ML is the difference between unsupervised and supervised learning. Both approaches require a set of representative data used to
train algorithms to identify regularities and differences among subgroups contained therein. Like all statistical samples, these training data must (ideally) constitute an unbiased, representative summary of patterns present in the parent population of interest, at least at the level of resolution necessary to address the problem at hand. In most cases, owing to the complexity of the machine learning algorithms and the number of variables that must be estimated, the size of the training dataset must be large; uncommonly large in many instances, though this depends on the distinctiveness of the subgroups included in the population of interest, or lack thereof. In any unsupervised learning problem, only the training dataset is input and the task is twofold: (1) determination of how many different groups of observations are present in the training data and (2) identification of characteristic patterns of variation within and between these subgroups that will allow unknown observations to be associated with the correct subgroup. Under a supervised learning design, the classification, or pattern of group membership, present in the training data is known before training is undertaken; this information is supplied to the system at the outset of training, and only the second learning task is performed. Under both approaches, assessments of training performance and/or evaluations of the training dataset are made post hoc, preferably via reference to a new or validation dataset whose members were not part of the training process (if necessary, it is permissible to approach the estimation of post-training performance using a jackknifed testing strategy, though this cannot be considered as rigorous a test and might be impractical owing to the time required to train ML systems such a large number of times in succession). Some ML systems employ other learning designs (e.g., semi-supervised learning, reinforcement learning, self-learning, feature learning, transfer learning), but most adopt either a supervised or unsupervised approach, with the former being more common at present. A very large and diverse array of algorithm families is currently employed in ML systems (e.g., logistic regression, Bayesian networks, artificial neural networks, decision trees, self-organizing maps, deep-learning networks). Each has advantages and disadvantages in particular situations, a discussion of which is beyond the scope of this article.
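The supervised/unsupervised distinction can be seen compactly in code. The sketch below uses scikit-learn (assumed to be available); the synthetic blob data stand in for a training dataset and are not drawn from any earth science source.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical "training data": 300 observations falling into three groups
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Unsupervised: the algorithm must discover the groups itself
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", [int((clusters == k).sum()) for k in range(3)])

# Supervised: group labels are supplied, and performance is checked on a
# held-out validation set that took no part in training
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
```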
Computer Vision
This activity domain involves more than locating and identifying images, or parts thereof, tasks that properly fall into the domain of machine learning. Rather, computer vision involves imbuing machines with the capacity to understand image-based information and the ability to make decisions or take actions based on that understanding. Specific skills associated with this domain include scene reconstruction, event detection, tracking, object recognition, pose estimation, motion estimation, indexing, and image restoration. Since humans have particularly well-developed visual systems, this has been one of the more attractive of the AI activity domains. Interest in computer vision grew out of research into image processing and was funded originally by national militaries interested in the problem of automatically detecting and identifying hostile aircraft. From a strictly AI perspective, however, this field was considered a developmental priority because of its strong link to the issue of endowing intelligent devices with the ability to move about in local environments autonomously. While vision is but one of the five human senses, the problems associated with its artificial manifestation are, essentially, the same for all the other senses (e.g., signal extraction, boundary recognition, identification, cognition). Therefore, research into the development of computer (or machine) vision has direct implications for the development of other machine senses. Drawing from its origin in image processing, research into computer vision began in the 1960s with algorithms designed to locate the boundaries and establish the forms of objects in two-dimensional (2D) scenes, but quickly progressed to the inference of three-dimensional (3D) structure from clues provided by shading, texture, and focus. Rapid reconstruction of 3D scenes was aided greatly via the importation of techniques drawn from mathematical bundle adjustment theory (Triggs et al. 1999; Hartley and Zisserman 2004), photogrammetry (Wiora 2001; Jarve and Liba 2010), and graph cut optimization (Boykov et al. 2001; Boykov and Kolmogorov 2003). More recently, this domain has taken a substantial leap forward as a result of developments in image-based, deep-learning subsystems and their incorporation into computer/machine vision systems. Contemporary, state-of-the-art computer vision systems have found wide, and increasingly diverse, application in manufacturing system-control and quality-assurance, object identification, surveillance and law enforcement, autonomous vehicle/robot, and military applications. Indeed, these systems have reached the point where, in some respects, their performance is vastly superior to (even) expert human capabilities (e.g., the sorting of object images into large numbers of fine-grained categories; see Culverhouse et al. 2013), whereas in others they still struggle with tasks humans find trivial (e.g., recognition of small or thin objects).
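A minimal illustration of the boundary-location task with which early computer-vision research began is sketched below, applying Sobel gradient filters to a synthetic two-dimensional scene (NumPy and SciPy assumed available); the scene and the threshold are arbitrary choices made only for demonstration.

```python
import numpy as np
from scipy import ndimage

# A toy 2-D "scene": one bright square object on a dark background
scene = np.zeros((64, 64))
scene[20:44, 20:44] = 1.0

# Sobel gradients along the two image axes; their magnitude highlights boundaries
gx = ndimage.sobel(scene, axis=0)
gy = ndimage.sobel(scene, axis=1)
edges = np.hypot(gx, gy)
print("edge pixels found:", int((edges > 1.0).sum()))
```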
Artificial Intelligence in the Earth Sciences, Fig. 3 A simple Markov decision process with three states (red), two actions (green), and two rewards (blue). Note the greater dependency of state S2 on states S0 and S1. (Redrawn from an original by Waldo Alvarez: see https://commons.wikimedia.org/wiki/File:Markov_Decision_Process.svg)
Planning Systems
Many human activities consist of sequences of actions that, ideally, are directed toward the attainment of a particular goal and, operationally, must be performed in a particular sequence. When the goal is specific (e.g., arrival at a particular destination from a given starting point), the domain over which the plan must be executed is limited (e.g., it must utilize local road systems or, in urban areas, sidewalks, tram lines, or underground railways), and the optimality criterion is known (e.g., shortest route, quickest travel time), planning can be a relatively simple exercise. However, when this problem is extended over multiple domains (e.g., multiple destinations, opening/closing times of various attractions along the way) whose characteristics might not be well established and whose states are subject to conditions that may change while the plan is being executed (e.g., traffic), determination of the optimal set and sequence of actions can be very complex. When the set of domains over which the plan must be executed is unknown, the challenge of determining optimal solutions unaided can be daunting. This problem category represents the activity domain of AI-empowered planning systems. A number of generalized strategies have been developed to address planning problems, which are fundamental to a (growing) range of different industries (e.g., logistics, workflow management, career progression, robot task management). For example, the range of possible actions and their consequences can be modeled by a network or as a surface and the optimal path found deterministically via constrained optimization (Bertsekas 2014; Verfaillie et al. 1996), Markovian decision processes (Burnetas and Katehakis 1997; Feinberg and Shwartz 2002; Fig. 3), or trial-and-error simulations (Kirkpatrick et al. 1983), with alternative routes found, by any of these approaches, as and when information pertinent to the plan's execution and derived from local or remote sensors changes. Alternatively, route domains characterized by features that may not be fully observable can be planned using a partially observable Markovian decision process (POMDP) approach (Kaelbling et al. 1998).
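For readers unfamiliar with Markovian decision processes, the sketch below runs value iteration on a toy MDP with the same shape as Fig. 3 (three states, two actions, two rewards); all transition probabilities and reward values are hypothetical and are not taken from the figure or from any cited method.

```python
GAMMA = 0.9
# P[state][action] = list of (probability, next_state, reward); values are invented
P = {
    0: {"a0": [(0.5, 0, 0.0), (0.5, 2, 0.0)], "a1": [(1.0, 2, 0.0)]},
    1: {"a0": [(0.7, 0, 5.0), (0.3, 1, 0.0)], "a1": [(1.0, 1, 0.0)]},
    2: {"a0": [(0.4, 0, 0.0), (0.6, 2, 0.0)], "a1": [(0.3, 0, -1.0), (0.7, 1, 0.0)]},
}

# Value iteration: repeatedly back up the best expected discounted return per state
V = {s: 0.0 for s in P}
for _ in range(200):
    V = {s: max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values())
         for s in P}

# Greedy policy extracted from the converged values
policy = {s: max(P[s], key=lambda a: sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a]))
          for s in P}
print(V, policy)
```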
These procedures can be embedded in a number of different planning scenarios (e.g., classical planning, probabilistic planning, conditional planning, conformant planning).
Robotics
Owing to the unique problems associated with the construction of machines that can move about in the environment and manipulate objects autonomously, robotics is the AI activity area that has seen the strongest mechanical engineering input. This activity area represents the intersection of electronic engineering, bioengineering, computer science, nanotechnology, and AI. The first widely acknowledged autonomous robot was developed by George Devol, based on a patent application filed in 1954 (approved in 1961) for a single-arm, stationary robot. This industrial robot, the Unimate, relied on a unique rotating drum memory module to perform a number of routine tasks. First employed to take die-cast auto parts from a conveyor belt and weld them to auto bodies at General Motors' Inland Fisher Guide Assembly Plant in New Jersey, the original Unimate, along with the more flexible Programmable Universal Manipulation Arm (PUMA) system, could perform many other tasks. Today, robots are so common in such a wide variety of manufacturing, construction, agricultural, law enforcement, consumer service, entertainment, emergency/rescue service, medical, and military industries, employ such a wide variety of technologies, and exhibit such an extreme range of designs that the idea of listing their commonalities might seem hopeless. However, all extant robots do share certain features with the
original Unimate: (1) mechanical construction, (2) electrical components to power the mechanical parts, and (3) computer programs to control the robot's movements and/or responses to its environment. Not all robots employ AI. Indeed, many simply perform sets of preprogrammed movements activated by remote human control. These devices are better thought of as automations rather than robots per se. However, AI-enabled robots are becoming more common with each passing year and are increasingly able to respond autonomously to changes in local conditions as those are transmitted to the system from various sensors located both on the robot's body and remotely via wireless communication links. Fully autonomous, mobile, multipurpose, self-actuated robots remain a science-fiction fantasy at the moment. But a veritable international army of well-funded technologists is working diligently toward the ultimate goal of assembling such a device. Already there is growing concern among a variety of philosophical, civil rights, legal, regulatory, public health, law enforcement, and military bodies with regard to the opportunities and concerns that the existence of this category of machines will raise. Such concerns are even beginning to be raised within the earth sciences (e.g., van Wynsberghe and Donhauser 2018). These concerns are, to some extent, characteristic signs of contemporary interest in the entire subject of AI. Regardless, it is difficult to deny the symbolic role increasingly common, and increasingly autonomous, robots will have in shaping this public debate, irrespective of the many uses, services, capabilities, and economic savings they will provide.
Applications in the Earth Sciences
All forms of AI are currently being employed across the broad range of earth science research projects, and earth science researchers have an excellent historical record of taking advantage of the latest developments in this field. Among the first applications of AI technology in a geological context were the PROSPECTOR (Hart and Duda 1977; Hart et al. 1978) and muPETROL expert systems (Miller 1986, 1987, 1993). Both were Bayesian LISP-based systems programmed originally for IBM-PC platforms. PROSPECTOR was a mineral exploration system and was based on the analysis of geochemical data associated with three different types of mineral deposits. It remains available today as PROSPECTOR II (McCammon 1994). The muPETROL system focused on petroleum resource exploration and was based on Klemme's (1975, 1980, 1983) basin classification system with a set of consensus inference rules devised by experienced petroleum exploration geologists. Subsequent geological expert systems include DIPMETER ADVISOR (Carlborn 1982; Smith and Baker 1983), Fossil (Brough and Alexander 1986), EXPLORER (Mulvenna et al. 1991),
ARCHEO-NET (Fruitet et al. 1990), and Expert (Folorunso et al. 2012). Listings and descriptions of additional geological expert systems can be found in Aminzadeh and Simaan (1991) and Bremdal (1998). Expert systems are also available for a number of other earth science disciplines, including oceanography (e.g., Nada and Elawady 2017), atmospheric science (e.g., Peak and Tag 1989), geography (Fisher et al. 1988), and biology (Edwards and Cooley 1993 and references therein). A number of NLP initiatives have been funded and/or are currently underway in the earth sciences. An initial, albeit limited, attempt to develop a system to automate the extraction of biodiversity data from taxonomic descriptions was developed by Curry and Connor (2007). This system, which included information on the stratigraphic ranges of fossil taxa, was based on the XML markup language, which was used to annotate relevant terms, assignments, names, and other taxonomically relevant bits of information in taxonomic descriptions. Once annotated, these data could be migrated quickly, easily, and reliably into relational databases where appropriate data extractions, integrations, and analyses could take place. While the system proposed by Curry and Connor (2007) fell well short of a fully automated NLP system, it did demonstrate the importance of expert-level data annotation. Annotated data of precisely this sort today serve as training datasets for fully automated NLP systems, allowing them to learn how to identify different aspects of text passages to extract and tag for database migration. A more complete and up-to-date discussion of NLP techniques applied to the systematics and taxonomy literature has been provided by Thessen et al. (2012). A sophisticated and successful, fully automated NLP initiative of this type was the PaleoDeepDive project (Peters et al. 2014), which focused on the extraction of fossil occurrence data from the English, German, and Chinese language paleontological literature. This system employed NLP tools developed for other purposes (e.g., Tesseract and Cuneiform for text, Abbyy Fine Reader for tables, StanfordCoreNLP for linguistic context) to define factor graphs that were then used to aggregate variables (names, stratigraphic ranges, geographic occurrences) into meaningful combinations. The dataset used to train the PaleoDeepDive system was the Paleobiology Database (http://paleobiodb.org) which, at the time of training, consisted of over 300,000 taxonomic names and 1.2 million taxonomic occurrences, the majority of which had been extracted by hand from over 40,000 publications. This system was successful in extracting 59,996 taxonomic occurrences from a collection of literature sources that had not been entered into the Paleobiology Database previously.
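A toy version of the markup-and-extract workflow described above is sketched below; the tag names and the annotated passage are hypothetical and are not taken from Curry and Connor (2007) or the Paleobiology Database.

```python
import xml.etree.ElementTree as ET

# A (hypothetical) taxonomic description annotated with XML tags
record = """<description>
  <taxon>Elphidium excavatum</taxon> occurs from the
  <stratigraphy>Pliocene</stratigraphy> to the
  <stratigraphy>Holocene</stratigraphy> of the
  <locality>North Sea</locality>.
</description>"""

# Parse the annotations into database-ready fields
root = ET.fromstring(record)
row = {
    "taxon": root.findtext("taxon"),
    "range": [e.text for e in root.findall("stratigraphy")],
    "locality": root.findtext("locality"),
}
print(row)
```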
In another, unrelated, initiative, ClearEarth (https://github.com/ClearEarthProject) is a US National Science Foundation project established to bring computational linguistics and earth science researchers together to experiment with ways to extract and integrate data from a variety of sources for the purpose of supporting research within and across the disciplinary fields of geology, ecology, and cryology. ClearEarth employs the ClearTK package of NLP routines, which is a well-established system for processing medical patient notes that is used on a routine basis by a large number of institutional health providers (Thessen et al. 2018). In essence, ClearEarth represents an attempt to port a mature and proven NLP technological infrastructure to aspects of earth science where, it is hoped, a similar penetration of institution-level integration and usage will result. Since 2016, ClearEarth has focused on the expert-informed markup of a well-annotated corpus of earth science text sources that will be used to train the ClearEarth system. To date, this project has released annotation guidelines for sea ice and earthquake events as well as sponsoring a "hackathon" in 2017 at the University of Colorado (Boulder) where bio-ontologies were produced and extended. Additional earth science research and management areas that have undertaken NLP-based initiatives include oceanography (e.g., Manzella et al. 2017) and geography (e.g., Lampoltshammer and Heistracher 2012 and references therein). Machine learning applications in the earth sciences are very common and have received a substantial boost over the past decade with the arrival of high-performance convolutional neural network and deep-learning architectures. Examples will be reviewed in the companion to this article. For the sake of completeness though, a few references to reviews and initial trials of ML applications in various earth science fields are provided for the fields of stratigraphy (Zhou et al. 2019), sedimentology (Demyanov et al. 2019),
paleontology (Monson et al. 2018), mineralogy (Caté et al. 2017), petrology (Rubo et al. 2019), geophysics (Russell 2019), archaeology (Nash and Prewitt 2016; MacLeod 2018), geography (Kanevski et al. 2009), systematics and taxonomy (MacLeod 2007), biology (Tarca et al. 2007), ecology (Christin et al. 2019), oceanography (Dutkiewicz et al. 2015), glaciology (Zhang et al. 2019), atmospheric sciences (Ghada et al. 2019), and climate sciences (Cho 2018; Reichstein et al. 2019). In the area of robotics, many of the most exciting AI developments have come from an earth science context. Two interesting recent deployments in this area are the EU-funded UNEXMIN project's UX-1 underwater mine exploration robot and the Woods Hole Oceanographic Institution's (WHOI) Nereid Under Ice robot (Fig. 4). The UX1-Neo was developed by the European Union's UNEXMIN project primarily as an autonomous system for mapping and assessing Europe's flooded mines, into which it would be extraordinarily dangerous for humans to venture, much less work. Various UX-1 systems currently exist, each with a different set of sensors and analytic packages onboard. The system is designed to operate completely autonomously with only a guideline attached so the robot can be recovered if it becomes stuck or ceases to function properly. Aside from the technological challenge, creation of the UX-1 series was justified as a means of determining whether it would be feasible and commercially profitable to reopen old, disused, flooded properties to extract rare-earth elements whose price has escalated dramatically in recent years. Regardless, such a system would be useful in many other earth science contexts. In a similar vein, the WHOI Nereid Under Ice robot was recently sent to explore the Kolumbo volcano off Greece's Santorini Island. Santorini was, of course, the site of one of
Artificial Intelligence in the Earth Sciences, Fig. 4 Portraits of the UNEXUP UX1-Neo robot with autonomous capabilities and the WHOI Nereid Under Ice robot, which was rendered fully autonomous via installation of an automation technology package developed by NASA's PSTAR program. Autonomous robots such as these are pioneering AI technology developments that will have broad application across the earth and space sciences in the coming years. Image credits: UNEXMIN and NOAA for the UX1-Neo and Nereid Under Ice robots, respectively
the largest volcanic eruptions recorded in human history. For the 500 m Santorini dive, the Nereid was equipped with a new automation-technology package developed by the US National Aeronautics and Space Administration's (NASA) Planetary Science and Technology Through Analog Research (PSTAR) program. This NASA-PSTAR package included an AI-driven planning system that allowed the Nereid to plan its own dive as well as to determine which samples to take and how to take them. On the Santorini dive, the AI-enabled Nereid became the first robot to collect an underwater sample autonomously (see video clip at: https://www.whoi.edu/press-room/news-release/whoi-underwater-robot-takes-firstknown-automated-sample-from-ocean/). Both the UX1-Neo and the NASA-PSTAR system-equipped Nereid represent a new breed of fully autonomous robots that will not only fill critical gaps in the capabilities of earth scientists now, but also provide test beds for the development of new and even more sophisticated autonomous robots in the future, for use both here on Earth and beyond. While many believe the most efficient way to explore the geologies of distant planets is via a remote-controlled robot (e.g., Spudis and Taylor 1992), and despite the fact that all current lunar and Martian rovers have been remotely controlled automations, the distances involved in the exploration of Mars and the moons of Jupiter and Saturn are such that a large degree of autonomous capability will need to be incorporated into the designs of future planet-exploration robots (Anonymous 2018). At present, the ExoMars Phase 2 Mission (due to be launched in 2022) will include a substantially automated robotic system designed to complement the ExoMars Phase 1 Mission (launched in 2016). The difference between the levels of autonomous capability incorporated into these two rovers reflects developments – and confidence – in the field of robot autonomous systems (RAS) over the past
5 years. Similar levels of autonomous robotic capabilities are expected to be part of the Mars Sample Return (MSR) Mission (expected launch 2026), Martian Moon (Phobos) Sample Return Mission (expected launch 2024), and various orbital missions currently in planning stages.
Planning Systems
While AI planning systems might appear to have more to do with the logistics of earth science research than the science itself, it is possible to take a broader view of planning, which is, in essence, an attempt to order a set of data into a sequence according to some error-minimization criterion. In this context, the routine – though complex – task of inferring the correct sequence of stratigraphic datums from samples collected from multiple stratigraphic sections or cores is not dissimilar to a planning problem. Here, the geologist/stratigrapher must infer the correct order of a set of stratigraphic datums whose true order is obscured by a variety of distributional and preservation factors (Fig. 5). This problem can be approached in many ways (e.g., Shaw 1964; Gordon and Reyment 1979; Edwards 1989; MacLeod and Keller 1991). However, several recent and quite sophisticated approaches have treated it as a quasi-planning problem and developed either probabilistic (Agterberg et al. 2013), rule-based (Guex and Davaud 1984; Guex 2011), or constrained optimization (Kemple et al. 1995; Sadler and Cooper 2003; Sadler 2004; Fan et al. 2020) approaches to its solution; the constrained-optimization approaches, in particular, strongly mimic those of an AI-based planning system.
Artificial Intelligence in the Earth Sciences, Fig. 5 Stratigraphic coexistence matrix before (left) and after (right) permutation to infer the correct relative order, or slotting, of the datums. In a spatial sense, this problem is similar to a minimum trip distance problem in which the distances between locations are known, but the most efficient sequence of destinations is not. Such problems can be solved using an AI-based planning system approach. For these data, the coexistence of B and G is implied but not observed. The column of numbers to the right represents reciprocal averaging scores. (Redrawn from Sadler 2004)
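A toy illustration of treating datum ordering as an optimization problem is sketched below: a simulated-annealing search (in the spirit of Kirkpatrick et al. 1983) for a composite sequence that minimizes pairwise contradictions with the orders observed in individual sections. The section data are invented, and the sketch does not reproduce the probabilistic, rule-based, or constrained-optimization methods cited above.

```python
import math
import random

# Hypothetical local event orders observed in three stratigraphic sections
sections = [list("ABDCEF"), list("ABCDFE"), list("BACDEF")]

def misfit(sequence):
    """Count pairwise orderings in the sections that the composite contradicts."""
    pos = {d: i for i, d in enumerate(sequence)}
    penalty = 0
    for sec in sections:
        for i in range(len(sec)):
            for j in range(i + 1, len(sec)):
                if pos[sec[i]] > pos[sec[j]]:
                    penalty += 1
    return penalty

random.seed(0)
current = list("ABCDEF")
best, best_cost = current[:], misfit(current)
temperature = 2.0
for step in range(5000):
    i, j = random.sample(range(len(current)), 2)
    candidate = current[:]
    candidate[i], candidate[j] = candidate[j], candidate[i]   # swap two datums
    delta = misfit(candidate) - misfit(current)
    if delta <= 0 or random.random() < math.exp(-delta / temperature):
        current = candidate
    temperature *= 0.999
    if misfit(current) < best_cost:
        best, best_cost = current[:], misfit(current)
print(best, best_cost)
```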
Conclusion and Prospectus
Artificial intelligence-based applications have been a part of earth science research and industry for over 40 years, virtually as long as such applications have been available for use outside the fields of mathematics and information technology. Levels of interest and activity in AI by the earth science community have waxed and waned over these decades in much the same way as the fortunes of AI generally. While it is true most of the activity in this community has come from technology and data-analysis enthusiasts, AI is undergoing a renaissance of interest across the earth science disciplines currently, which mirrors renewed government and commercial interest in AI-based solutions to a wide range of technical, technological, and social problems generally. In large part, this renewed interest has been driven by advances in the machine learning AI activity domain. But these advances have had the effect of stimulating research, development, and interest in all forms of AI. Many seem to be concerned with the effect AI will have on the direction of the fields to which it is applied and especially on the employment prospects for young people. Such concerns are, in my view, misplaced. Earth sciences as a whole have become increasingly quantitative over the past 70 years, and there is little evidence to support a conclusion that this has led to unethical practices, or reduced employment prospects, in the sector. Routine identification, sampling, and data-collection tasks are, increasingly, able to be performed by quasi-autonomous AI-enabled machines. This trend is very likely to continue. But in many ways it is these tasks that often take us away from the more creative and integrative aspects of our science; two areas of performance in which AI approaches struggle to deliver. In addition, it is simply beyond question that human experts suffer from a wide variety of performance limitations that the use of AI-based technologies can assist in addressing (e.g., MacLeod et al. 2010; Culverhouse et al. 2013). Instead of viewing the rise of AI as a threat to the development of the earth sciences, the community should, as it has with other technological advances in the past, welcome such developments as much-needed and much-valued facilitators/partners in the task of exploring and understanding our planet and its inhabitants: past, present, and future. As with any genuine partnership though, both sides need to work to make the most of the collaboration. Devices that employ aspects of AI can be a massive boon to earth science research, but AI cannot deliver on this promise alone. In order to be effective, AI systems need data and guidance. Earth scientists have data, but we have not been diligent in aggregating, cleaning, updating, and making our data available. Various earth science big data initiatives are currently
underway (e.g., Spina 2018; Normile 2019; Stephenson et al. 2019). Not only do these need to be supported and contributed to by earth scientists; the guidance AI applications require must also be delivered via engagement with, and active participation in, AI-related initiatives by expert earth scientists of all types. In addition, the training of students interested in the earth sciences at all levels needs to be reorganized and expanded to equip them with the skills they will need to take full advantage of the coming AI revolution. Finally, the earth science community needs to establish AI as a priority – perhaps a top priority – for ensuring the future success of our science through the diverse, ongoing professional education, outreach, and influencing activities of our professional societies.
Bibliography Agterberg FP, Gradstein FM, Cheng Q, Liu G (2013) The RASC and CASC programs for ranking, scaling and correlation of biostratigraphic events. Comput Geosci 54:279–292 Aminzadeh F, Simaan M (1991) Expert systems in exploration. Society of Exploration Geophysicists, Tulsa, Oklahoma Anonymous (2018) Space robotics & autonomous systems: widening the horizon of space exploration. Available via UK-RAS Network. https://www.ukras.org/publications/white-papers/. Accessed 4 Mar 2020 Anonymous (2019) Artificial intelligence alone won’t solve the complexity of earth sciences. Nature 566:153 Bertsekas DP (2014) Constrained optimization and Lagrange multiplier methods. Academic, New York Boykov Y, Kolmogorov V (2003) Computing geodesics and minimal surfaces via graph cuts. In: Proceedings Ninth IEEE International Conference on Computer Vision 2003, pp 26–33 Boykov Y, Veksler O, Zabih R (1998) Markov random fields with efficient approximations: proceedings. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1998, pp 648–655 Boykov Y, Veksler O, Zabih R (2001) Fast approximate energy minimization via graph cuts. IEEE Trans Pattern Anal Mach Intell 23: 1222–1239 Bremdal BA (1998) Expert systems for management of natural resources. In: Liebowitz J (ed) The Handbook of Applied Expert Systems. CRC Press, Boca Ration, Louisiana, pp 30-1–30-44 Brough DR, Alexander IF (1986) The Fossil expert system. Expert Syst 3:76–83 Burnetas AN, Katehakis MN (1997) Optimal adaptive policies for Markov decision processes. Math Oper Res 22:222–255 Carlborn I (1982) Dipmeter advisor expert system. Am Assoc Pet Geol Bull 66:1703–1704 Caté A, Perozzi L, Gloaguen E, Blouin M (2017) Machine learning as a tool for geologists. Leading Edge, pp 64–68 Cho R (2018) Artificial intelligence—a game changer for climate change and the environment. State of the Planet, pp 1–13 Church A (1936) An unsolvable problem of elementary number theory. Am J Math 58:345–363 Christin S, Hervet É, Lecomte N (2019) Applications for deep learning in ecology. Methods Ecol Evol 10:1632–1644
42 Culverhouse PF, MacLeod N, Williams R, Benfield MC, Lopes RM, Picheral M (2013) An empirical assessment of the consistency of taxonomic identifications. Mar Biol Res 10:73–84 Curry GB, Connor RJ (2007) Automated extraction of biodiversity data from taxonomic descriptions. In: Curry GB, Humphries CJ (eds) Biodiversity databases: techniques, politics and applications. The Systematics Association and CRC Press, Boca Raton Demyanov V, Reesink AJH, Arnold DP (2019) Can machine learning reveal sedimentological patterns in river deposits? In: Corbett PWM, Owen A, Hartley AJ, Pla-Pueyo S, Barreto D, Hackney C, Kape SJ (eds) River to reservoir: geoscience to engineering, Geological Society of London, Special Publication No. 488, London Dutkiewicz A, Müller RD, O’Callaghan S, Jónasson H (2015) Census of seafloor sediments in the world’s ocean. Geology 43:795–798 Ebert-Uphoff I, Samarasinghe S, Barnes E (2019) Thoughtfully using artificial intelligence in earth science. Eos 100:1–5 Edwards LE (1989) Supplemented graphic correlation: a powerful tool for paleontologists and nonpaleontologists. Palaios 4:127–143 Edwards M, Cooley RE (1993) Expertise in expert systems: knowledge acquisition for biological expert systems. Comput Appl Biosci 9: 657–665 Fan J et al. (2020) A high-resolution summary of Cambrian to Early Triassic marine invertebrate biodiversity. Science 367:272–277 Feinberg EA, Shwartz A (2002) Handbook of Markov decision processes. Kluwer, Boston Fisher PF, Mackaness WA, Peacegood G, Wilkinson GG (1988) Artificial intelligence and expert systems in geodata processing. Progr Phys Geogr Earth Environ 12:371–388 Folorunso IO, Abikoye OC, Jimoh RG, Raji KS (2012) A rule-based expert system for mineral identification. J Emerg Trends Comput Inform Sci 3:205–210 Fruitet J, Kalloufi L, Laurent D, Boudad L, de Lumley H (1990) “ARCHEO-NET” a prehistoric and paleontological material data base for research and scientific animation. In: Tjoa AM, Wagner R (eds) Database and expert systems applications. Springer, Wien/ New York Ghada W, Estrella N, Menzel A (2019) Machine learning approach to classify rain type based on Thies disdrometers and cloud observations. Atmos 10:1–18 Gordon AD, Reyment RA (1979) Slotting of borehole sequences. Math Geol 11:309–327 Guex J (2011) Some recent ‘refinements’ of the unitary association method: a short discussion. Lethaia 44:247–249 Guex J, Davaud E (1984) Unitary associations method: use of graph theory and computer algorithm. Comput Geosci 10:69–96 Hart PE, Duda RO (1977) PROSPECTOR – a computer-based consultation system for mineral exploration. Artificial Intelligence Center, Technical Note 155, SRI International, Meno Park Hart PE, Duda RO, Einaudi MT (1978) PROSPECTOR – a computerbased consultation system for mineral exploration. J Int Assoc Math Geol 10:589–610 Hartley RI, Zisserman A (2004) Multiple view geometry in computer vision. Cambridge University Press, Cambridge, UK Hayes-Roth F, Waterman DA, Lenat DB (1983) Building expert systems. Addison-Wesley, Reading, Massachusetts Hutchins WJ (2004) The Georgetown-IBM experiment demonstrated in January 1954. In: Frederking RE, Taylor KB (eds) Machine translation: from real users to research. Lecture notes in computer science 326. Springer, Berlin Jarve I, Liba N (2010) The effect of various principles of external orientation on the overall triangulation accuracy. Technol Mokslai 86:59–64 Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. 
Artif Intell 101:99–134 Kanevski M, Foresti L, Kaiser C, Pozdnoukhov A, Timonin V, Tuia D (2009) Machine learning models for geospatial data. In: Bavaud F,
Artificial Neural Network Yuanyuan Tian1, Mi Shu2 and Qingren Jia3 1 School of Geographical Science and Urban Planning, Arizona State University, Tempe, AZ, USA 2 Tencent Technology (Beijing) Co., Ltd, Beijing, China 3 College of Electronic Science and Technology, National University of Defense Technology, Changsha, China
Definition An artificial neural network (ANN), also known as a neural network, is a computational model capable of processing information to tackle tasks such as classification and regression. It is a component of artificial intelligence inspired by the human brain and nervous system.
Introduction Artificial neurons and the signals coded between them in an ANN simulate the electrical activity by which the nervous system communicates. Each artificial neuron receives input signals, processes them, and generates an output signal. A general practice is to sum the weighted inputs and then feed the sum into a transfer function that decides whether the value is passed on to another neuron as an input. The simplest transfer function compares the sum with a set threshold, while nonlinear functions allow more flexibility. Starting from the 1940s, ANNs have evolved from the perceptron, an algorithm for supervised learning of binary classifiers, through backpropagation, a way to train a multilayer network, to deep learning. One of the reasons why ANNs have become so popular is that their self-learning capability enables them to solve complex
problems, which are difficult or infeasible for humans, and even produce better results than statistical methods. Also, they can deal with incomplete, noisy, and ambiguous data. Nowadays, ANNs are used in a variety of research and industrial areas. The applications of ANNs are wide, encompassing engineering, finance, communication, medicine, and so on. The use of ANNs in geography stems from the need to handle big spatial data. With the development of geo-information technologies such as Remote Sensing (RS), Geographic Information Systems (GIS), and Global Navigation Satellite Systems (GNSS), spatial data processing and analysis increasingly call for artificial intelligence to handle complex geographical problems. The purpose of this encyclopedia entry is to introduce the basic concept of ANNs, review major applications in geoinformatics, and discuss future trends.
ANN Models Components of ANNs An ANN is an artificial neuron system built by imitating human neurons for information processing. It is a computing model composed of a large number of interconnected neurons. Each neuron can be regarded as a calculation and storage unit. A single neuron generally has multiple inputs; after some calculation (i.e., weighted addition), its output is used as the input of subsequent neurons. In a neural network, every connection between two neurons carries a weight that reflects the strength of the connection. Finally, neurons are combined layer by layer to form a neural network (Fig. 1). In more detail, let a_i represent the inputs of a neuron and w_i the weights on the connections. Then the output of this neuron is z = f(∑_i w_i a_i), where f is an activation function. With the help of the activation function, a neuron can do more than simply weight and sum the received information and pass it on to the next neuron. The activation function simulates the biological activation process: in reality, neurons in the organism are not
necessarily activated after receiving information; instead, they determine whether the information has reached a threshold. If the value of the information does not reach this threshold, the neuron is in a state of inhibition; otherwise, the neuron is activated and transmits the information to the next neuron. Similarly, a neuron in an ANN decides whether and how to pass the information on. In order to improve the utilization of neurons, nonlinear functions are usually used instead of simple thresholds as the activation functions of neurons. Moreover, by introducing nonlinear activation functions, a neural network can theoretically fit any objective function, which makes it powerful. Organization Structurally, a neural network can be divided into an input layer, an output layer, and hidden layers. The input layer accepts signals and data from the outside world, and the output layer delivers the results of the system's processing. The hidden layers lie between the input layer and the output layer and cannot be observed from outside the system. When designing a neural network, the number of neurons in the input layer and output layer is often fixed, while the hidden layers can be freely specified. The number of hidden layers and the number of neurons in each layer determine the complexity of the neural network. In the 1950s, the ANN presented itself in the form of a perceptron, without a hidden layer. Later, in the 1980s, the ANN was developed into a multilayer perceptron, with only one hidden layer. From 2012, ANNs tended to have more and more hidden layers (deep learning). The reason for this development of neural network structure is that as the number of layers increases, the number of parameters of the entire network increases. More parameters mean that the function simulated by the ANN can be more complicated, so that harder problems can be solved (Fig. 2). Learning A neural network training algorithm adjusts the values of all the weights so that the prediction performance of the entire network is as good as possible. In order to fit the target better, the neural network needs to know how far its prediction is from the target. Let the prediction be y_p and the true target be y. Then a loss function needs to be defined to measure the distance between y_p and y, so as to determine which weight parameters are appropriate. There are different loss functions in practice; for example, the calculation formula could be loss = (y_p − y)².
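The neuron model and loss just described can be written compactly in code. The following is a minimal sketch (Python/NumPy; the function names, the tanh activation, and the bias term are illustrative assumptions, not part of the original entry) of the single-neuron computation z = f(∑_i w_i a_i) and of the squared-error loss:

```python
import numpy as np

def neuron_output(a, w, b=0.0, activation=np.tanh):
    """Weighted sum of the inputs followed by a nonlinear activation: z = f(sum_i w_i * a_i + b)."""
    return activation(np.dot(w, a) + b)

def squared_loss(y_pred, y_true):
    """Squared-error loss between the prediction and the target."""
    return (y_pred - y_true) ** 2

# Example: one neuron with three inputs
a = np.array([0.2, -1.0, 0.5])   # inputs a_i
w = np.array([0.7, 0.1, -0.4])   # connection weights w_i
z = neuron_output(a, w)
print(z, squared_loss(z, 1.0))
```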
Artificial Neural Network, Fig. 1 (a) a natural neuron vs. (b) a neuron in an ANN
Then, the training goal is to minimize the loss function and make the prediction result of the neural network for the input sample closer and closer to the true value by adjusting each
Artificial Neural Network, Fig. 2 The development of neural network structure. (a) perceptron; (b) multilayer perceptron, MLP; (c) deep learning
weight parameter. In practice, optimization algorithms represented by the gradient descent method can be used to solve the above minimization problem. The back-propagation method is the specific implementation of gradient descent on a neural network. Its main purpose is to pass the output error backward and distribute it to the neurons of each layer, so as to obtain the error of each neuron and then modify each weight accordingly. In general, in ANN models, a collection of artificial neurons is created and connected together, allowing them to send messages to each other. The network is asked to solve a problem, which it attempts by iteration, each time strengthening the connections that lead to success and diminishing those that lead to failure. Applications Tasks suited for ANNs are classification, regression, clustering, etc. ANNs have been applied in many disciplines, especially in earth science. ANNs have a long history of use in the geosciences (Van der Baan and Jutten 2000), for example in hydrology, ocean modeling and coastal engineering, and geomorphology, and they remain popular for modeling both regression and classification (supervised or unsupervised) of nonlinear systems. As early landmarks, revived neural networks were applied to high-resolution satellite data to classify land cover and clouds (Benediktsson et al. 1990; Lee et al. 1990) almost 30 years ago. In addition to spatial prediction, ANNs have also been used to study dynamics in the earth system by mapping time-varying features to temporally varying target variables in land, ocean, and atmosphere domains. For instance, using (multitemporal) remote sensing data, an ANN can be used to detect forest changes caused by a prolonged drought in the Lake Tahoe Basin in California (Gopal and Woodcock 1996). Also, an artificial neural network with one hidden layer was able to filter out noise, predict the
diurnal and seasonal variation of carbon dioxide (CO2) fluxes, and extract patterns such as increased respiration in spring during root growth (Papale and Valentini 2003). In these early works, ANNs were shown to provide better classification or regression performance than traditional methods in given scenarios. However, compared to statistics-based machine learning (ML) methods, ANNs did not have many overwhelming advantages. Deep neural networks (DNNs), or deep learning (DL), extend the classical ANN by incorporating multiple hidden layers and enable good model performance without requiring well-chosen features as inputs. DNNs have attracted wide attention in the new millennium, mainly by outperforming alternative machine-learning algorithms in numerous application areas. In 2012, AlexNet, a convolutional neural network (CNN) model, won ImageNet's image classification contest with an accuracy of 84%, a jump over the 75% accuracy that earlier ML models had achieved, based on its efficient use of graphics processing units (GPUs), rectified linear units (ReLUs), and a large training data volume. This win triggered a new DNN boom globally. In the geoscience domain, researchers observe, collect, and process geological data and discover knowledge of interest about geoscientific phenomena from the data. Because the earth system is a complex system, building a quantitative understanding of how the Earth works and evolves by using traditional tools from geology, chronology, physics, chemistry, geography, biology, and mathematics is difficult. However, as the volume of data accumulated through long-term tracking and recording grew, the scientific research paradigm shifted from knowledge-driven to data-driven (Kitchin 2014). This shift provides opportunities for ANN-based machine learning methods in geoscience fields, as they can capture nonlinear relationships between the inputs as well as reduce the dimensions of the model and help convert the input into a more useful output.
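To make the gradient-descent and back-propagation training described above concrete, the following minimal sketch (Python/NumPy; the toy data, network size, learning rate, and variable names are illustrative assumptions, not taken from this entry) trains a one-hidden-layer network on a small regression problem by explicitly propagating the prediction error backward through the layers and updating every weight:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) sampled on [-pi, pi]
x = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
y = np.sin(x)

# One hidden layer with tanh activation, one linear output
n_hidden = 8
W1 = rng.normal(0, 0.5, (1, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, (n_hidden, 1)); b2 = np.zeros(1)
lr = 0.05

for epoch in range(2000):
    # Forward pass
    h = np.tanh(x @ W1 + b1)               # hidden activations
    y_pred = h @ W2 + b2                   # network output
    err = y_pred - y                       # prediction error
    loss = np.mean(err ** 2)

    # Backward pass (chain rule: back-propagation of the error)
    grad_y = 2 * err / len(x)
    grad_W2 = h.T @ grad_y
    grad_b2 = grad_y.sum(axis=0)
    grad_h = grad_y @ W2.T * (1 - h ** 2)  # derivative of tanh
    grad_W1 = x.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)

    # Gradient-descent update of every weight
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

print("final mean squared error:", loss)
```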
Specifically, there are mainly three popular research fields in geomatics:
• Although applications to problems in geosciences are still in their infancy (Reichstein et al. 2019), tasks using remote-sensing images benefit greatly from deep learning methods that are rapidly developing in computer science. Since 2014, the remote sensing community has shifted its attention to DL, and DL algorithms have achieved significant success at many image analysis tasks including land use and land cover classification, scene classification, and object detection. The CNN is more popular than other DL models, and spectral-spatial features of images are utilized for analysis. The main focus of the studies is on classification tasks, but several other applications are worth mentioning, including fusion, segmentation, change detection, and registration. Therefore, it appears that DL can be applied to almost every step of remote sensing image processing.
• In solid Earth geoscience, DNNs trained on simulation-generated data can learn a model that approximates the output of physical simulations (Bergen et al. 2019). They can be used to accurately reproduce the time-dependent deformation of the Earth (DeVries et al. 2017), or to perform fast full-wavefield simulations by training on synthetic data from a finite difference model using a CNN (Moseley et al. 2018). In another example, Shoji et al. (2018) use the class probabilities returned by a CNN to identify the mixing ratio for ash particles with complex shapes. Several recent studies have applied DNNs with various architectures for automatic earthquake and seismic event detection, phase-picking, and classification of volcano-seismic events. All these tasks are difficult for expert analysts.
• In dynamic earth system research, such as climatology and hydrology, ANNs, especially DNNs, also play a vital role. For instance, a neural network was used to predict the evolution of the El Niño Southern Oscillation (ENSO), and it shows that ENSO precursors exist within the South Pacific and Indian Oceans (Ham et al. 2019). Recent studies demonstrate the application of deep learning to the problem of extreme weather, for instance, hurricane detection (Racah et al. 2017). Also, Gagne II et al. (2019) employed neural networks to predict the likelihood that a convective storm would produce hail, showing that the neural networks made accurate predictions by identifying known types of storm structures. The studies report success in applying deep learning architectures to objectively extract spatial features to define and classify extreme situations (for example, storms, hail, atmospheric rivers) in numerical weather prediction model output. Such an approach enables rapid detection of such events and
forecast simulations without using either subjective human annotation or methods that rely on predefined arbitrary thresholds for wind speed or other variables.
Conclusions Based on many existing works in recent years, the adoption of ANNs in geoinformatics has been quite successful. Although ANNs have been widely used in geoscience applications, geoscientists sometimes hesitate to use them due to the lack of interpretability. The ANN is criticized as being oversold as a "black box" that delivers outputs without an understanding of the network structure. Recently, there have been efforts within the geoscience community to compile methods for improving machine learning model interpretability (Gagne II et al. 2019; McGovern et al. 2019). These interpretation methods are applied to trace the decision of a neural network back onto the original dimensions of the inputs and thereby permit an understanding of which input variables are most important for the neural network's decisions. In one such application, backward optimization and layerwise relevance propagation (LRP) reveal that the neural network mainly focuses its attention on the tropical Pacific, which is consistent with expert perception (Toms et al. 2020). This can help uncover the "black-box" construction of ANNs. Also, expert knowledge can be introduced to guide network structure formation and optimization. Empirical, internal model comparisons and assessments are needed to facilitate understanding of the physical mechanism of the spatial processes. Therefore, for geoscientists who embrace neural network methods, interpretable ANNs will be one of the promising directions where breakthroughs may be made.
Cross-References ▶ Deep Learning in Geoscience ▶ Geographical Information Science ▶ Machine Learning
Bibliography Benediktsson JA, Swain PH, Ersoy OK (1990) Neural network approaches versus statistical methods in classification of multisource remote sensing data Bergen KJ, Johnson PA, Maarten V, Beroza GC (2019) Machine learning for data-driven discovery in solid Earth geoscience. Science 363 DeVries PMR, Ben TT, Meade BJ (2017) Enabling large-scale viscoelastic calculations via neural network acceleration. Geophys Res Lett 44:2662–2669
Gagne DJ II, Haupt SE, Nychka DW, Thompson G (2019) Interpretable deep learning for spatial analysis of severe hailstorms. Mon Weather Rev 147:2827–2845 Gopal S, Woodcock C (1996) Remote sensing of forest change using artificial neural networks. IEEE Trans Geosci Remote Sens 34: 398–404 Ham Y-G, Kim J-H, Luo J-J (2019) Deep learning for multi-year ENSO forecasts. Nature 573:568–572 Kitchin R (2014) Big data, new epistemologies and paradigm shifts. Big Data Soc 1:2053951714528481 Lee J, Weger RC, Sengupta SK, Welch RM (1990) A neural network approach to cloud classification. IEEE Trans Geosci Remote Sens 28: 846–855 McGovern A, Lagerquist R, John Gagne D et al (2019) Making the black box more transparent: understanding the physical implications of machine learning. Bull Am Meteorol Soc 100: 2175–2199 Moseley B, Markham A, Nissen-Meyer T (2018) Fast approximate simulation of seismic waves with deep learning. arXiv preprint arXiv:180706873 Papale D, Valentini R (2003) A new assessment of European forests carbon exchanges by eddy fluxes and artificial neural network spatialization. Glob Chang Biol 9:525–535 Racah E, Beckham C, Maharaj T et al (2017) ExtremeWeather: a largescale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events. Adv Neural Inf Proces Syst 30:3402–3413 Reichstein M, Camps-Valls G, Stevens B et al (2019) Deep learning and process understanding for data-driven Earth system science. Nature 566:195–204 Shoji D, Noguchi R, Otsuki S, Hino H (2018) Classification of volcanic ash particles using a convolutional neural network and probability. Sci Rep 8:8111 Toms BA, Barnes EA, Ebert-Uphoff I (2020) Physically interpretable neural networks for the geosciences: applications to earth system variability. J Adv Model Earth Syst 12:e2019MS002002 Van der Baan M, Jutten C (2000) Neural networks in geophysical applications. Geophysics 65:1032–1047
Autocorrelation Alejandro C. Frery School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
Synonyms Second moment; Serial correlation (when referring to time series)
Definition The autocorrelation is a measure of the association between two random variables of the same process. In geosciences, it is often called autocovariance.
There are many ways of measuring the association of two random variables. Among them, the Pearson coefficient of correlation between two random variables X, Y : Ω → ℝ is defined as

r(X, Y) = \frac{E(XY) - E(X)E(Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}},   (1)
provided this quantity is well defined. The numerator in Eq. (1) is called "covariance." The main properties of Eq. (1) are: Symmetry: r(X, Y) = r(Y, X). Boundedness: −1 ≤ r(X, Y) ≤ 1. Invariance under linear transformations: For any a, b ∈ ℝ it holds that r(aX + b, aY + b) = r(X, Y), and that r(aX + b, −aY + b) = −r(X, Y). If X and Y are independent, then they are uncorrelated, i.e., r(X, Y) = 0. The converse is not necessarily true, but if X and Y are jointly normal random variables, then it holds. When X and Y belong to the same random process, Eq. (1) is called "autocorrelation." In this case, instead of X and Y, one usually denotes the random variables with an index, e.g., X_t and X_u, and their correlation as r_{t,u}.
Discussion Time Series Consider an infinite series of real-valued random variables X = (…, X_{t−2}, X_{t−1}, X_t, X_{t+1}, X_{t+2}, …). A general model allows both the mean E(X_u) = μ_u and the variance Var(X_u) = σ²_u to vary with the index u, and the covariance Cov(X_u, X_v) to vary with the pair of indexes (u, v). We will make three assumptions in order to facilitate our presentation:
H1: The (finite) mean does not vary with the index: μ_u = μ < ∞ for every u ∈ ℤ.
H2: The (finite) variance does not vary with the index: σ²_u = σ² < ∞ for every u ∈ ℤ.
H3: The covariance depends only on the absolute value of the difference between the indexes: Cov(X_u, X_v) = Cov_k, where k = |u − v|, for every (u, v) ∈ ℤ × ℤ.
This absolute difference k is known as "lag," and Cov_k is referred to as "autocovariance." A random process X that satisfies these three hypotheses is called "weakly stationary."
The autocorrelation of X is

\rho_k = \frac{\mathrm{Cov}_k}{\sigma^2} = \frac{E[(X_{t+k} - \mu)(X_t - \mu)]}{\sigma^2},   (2)

which, by the hypotheses above, does not depend on t. Notice that ρ_0 = 1. Finlay et al. (2011) provide a comprehensive account of conditions for a function ρ_k : ℤ → [−1, 1] to be a valid autocorrelation function (ACF). Such a characterization is primarily helpful for theoretical purposes. In practice, one encounters finite time series x = (x_1, x_2, …, x_n) of length n, with which one has to make several types of inference. If we are interested in estimating the autocorrelation, we may use the estimator of the autocovariance

\widehat{\mathrm{Cov}}_k = \frac{1}{n}\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x}),   (3)

where \bar{x} is the sample mean of the whole time series, and then obtain the estimator of the autocorrelation

\widehat{\rho}_k = \frac{\widehat{\mathrm{Cov}}_k}{\widehat{\mathrm{Cov}}_0}.   (4)

Notice that \widehat{\mathrm{Cov}}_0 is the sample variance of the whole time series. If ρ_k = 0, then \widehat{\rho}_k is asymptotically distributed as a Gaussian random variable with mean −1/n and variance 1/n. This result provides a simple rule for building approximate confidence intervals around 0. Figure 1 shows 1000 observations of atmospheric noise (top) collected by https://www.random.org, along with the first 30 estimates of its autocorrelation function (ACF) \widehat{\rho}_1, \widehat{\rho}_2, …, \widehat{\rho}_{30} and the approximate confidence interval at the 95% level. As expected for this kind of noise, there is no evidence of serial correlation. Figure 2 shows 870 measurements of the daily number of sun spots (top) downloaded from https://www.ngdc.noaa.gov/stp/solar/ssndata.html, along with the first 29 estimates of its autocorrelation function (ACF) \widehat{\rho}_1, …, \widehat{\rho}_{29} and the approximate confidence interval at the 95% level. Seasonal observations like these often show significant autocorrelation values. The reader is referred to the texts by Gilgen (2006) for specific applications to geosciences, and to the more general books by Box et al. (1994), Cryer and Chan (2008), and Cowpertwait and Metcalfe (2009).

Extensions Not rejecting a test for zero autocorrelation (or autocovariance) does not imply independence. Székely et al. (2007) discuss extensions (distance autocorrelation and distance covariance) which are zero only if the random vectors are independent. Autocorrelation and autocovariance are defined over real-valued (univariate) random vectors. Székely and Rizzo (2009) define a new measure, the Brownian Distance Covariance, that compares vectors of arbitrary dimension.
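For readers who want to reproduce ACF estimates such as those plotted in Figs. 1 and 2, the estimators in Eqs. (3) and (4) can be coded directly. The following is a minimal sketch (Python/NumPy; the function name, the simulated noise, and the ±1.96/√n bound used for the approximate 95% interval are illustrative choices, not from the original entry):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocovariance (Eq. 3) and autocorrelation (Eq. 4) up to max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    cov = np.array([np.sum(d[: n - k] * d[k:]) / n for k in range(max_lag + 1)])
    return cov[1:] / cov[0]              # rho_hat_1, ..., rho_hat_max_lag

rng = np.random.default_rng(42)
noise = rng.normal(size=1000)            # stand-in for the atmospheric noise of Fig. 1
acf = sample_acf(noise, 30)
ci = 1.96 / np.sqrt(len(noise))          # approximate 95% bounds around 0
print(np.abs(acf) > ci)                  # lags with apparently significant correlation
```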
Autocorrelation, Fig. 1 Gaussian atmospheric noise
Autocorrelation, Fig. 2 Number of sun spots
Summary
Autocorrelation functions are powerful tools for an exploratory analysis of the dependence structure of time series.
Cross-References ▶ Standard Deviation ▶ Univariate
Bibliography Box GEP, Jenkins GM, Reinsel GC (1994) Time series analysis: forecasting and control, 3rd edn. Prentice, Englewood Cliffs Cowpertwait PSP, Metcalfe AV (2009) Introductory time series with R. Springer, New York Cryer JD, Chan KS (2008) Time series analysis with applications in R, 2nd edn. Springer, New York Finlay R, Fung T, Seneta E (2011) Autocorrelation functions. Int Stat Rev 79(2):255–271. https://doi.org/10.1111/j.1751-5823.2011.00148.x Gilgen H (2006) Univariate time series in geosciences. Springer, Berlin. https://www.ebook.de/de/product/11432954/hans_gilgen_univariate_time_series_in_geosciences.html Székely GJ, Rizzo ML (2009) Brownian distance covariance. Ann Appl Stat 3(4). https://doi.org/10.1214/09-AOAS312. http://projecteuclid.org/euclid.aoas/1267453933 Székely GJ, Rizzo ML, Bakirov NK (2007) Measuring and testing dependence by correlation of distances. Ann Stat 35(6):1–31. https://doi.org/10.1214/009053607000000505
Automatic and Programmed Gain Control Gabor Korvin Earth Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Kingdom of Saudi Arabia
Definition In the electronic acquisition of geoscientific data, and their subsequent digital processing, we frequently subject the data to Gain Control, which is a nonlinear transformation to maintain an approximately stationary signal of a prescribed Root Mean Square (RMS) amplitude at its output, despite time-dependent slow variations of the signal amplitude level at the input. In simple terms, Gain Control reduces the signal if it is strong and amplifies it when it is weak. There are two ways to control signal amplitude: automatically or in a preset, programmed manner. Automatic Gain Control (AGC, also called Automatic Volume Control (AVC), Automatic Level Control (ALC), Time-Varying Gain (TGC)) is realized by an algorithm where the output energy level in a sliding time-window controls the gain applied to the input to keep it within prescribed limits. Programmed Gain Control (PGC) applies a gain which is a function of record time and was determined beforehand. In some AGC or PGC systems the gain can only vary by a factor of two (Binary Gain Control, BGC). Gain Control (AGC or PGC) is an important step of data processing to improve the visibility of such data where propagation effects (such as attenuation or spherical divergence) have
caused amplitude decay, and to assure approximate stationarity which is a prerequisite for statistical time-series analysis.
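As an illustration of the sliding-window principle described in the Definition, the following minimal sketch (Python/NumPy; the window length, target RMS level, and function names are illustrative assumptions, not part of this entry) applies a simple AGC that scales each sample by the inverse of the local RMS amplitude, so that strong early parts of a decaying trace are reduced and weak late parts are amplified:

```python
import numpy as np

def agc(signal, window=201, target_rms=1.0, eps=1e-12):
    """Automatic gain control: divide each sample by the RMS amplitude
    measured in a sliding window centered on it."""
    s = np.asarray(signal, dtype=float)
    half = window // 2
    padded = np.pad(s, half, mode="edge")
    kernel = np.ones(window) / window
    local_ms = np.convolve(padded ** 2, kernel, mode="valid")  # local mean square
    local_rms = np.sqrt(local_ms) + eps
    return target_rms * s / local_rms

# A decaying synthetic trace: strong early arrivals, weak late ones
t = np.linspace(0.0, 1.0, 2000)
trace = np.exp(-3.0 * t) * np.sin(2.0 * np.pi * 40.0 * t)
balanced = agc(trace)
print(trace[:5].round(3), balanced[:5].round(3))
```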
Introduction: Stationary and Nonstationary Processes, Gain Control Before delving into the treatment of Gain Control, we must discuss the concepts of stationarity and nonstationarity of a stochastic process. Stationarity of a time series has come to be considered, paradoxically, of crucial importance in analyzing Earth data (Myers 1989), even though most dynamic geoprocesses (such as "expansion of the Universe, the evolution of stars, global warming on the Earth, and so on," Guglielmi and Potapov 2018) are actually nonstationary. In simple terms, stationarity means that the statistical properties of a time series (more exactly: of the stochastic process generating the time series) do not change over time. In other words, a stationary time series has statistical properties (such as mean, variance, and autocovariance) that do not change under time translation. Stationary processes are easier to analyze, to display, to model, and their future behavior is in a certain sense predictable because their statistical properties will be the same in the future as they have been in the past. There are different notions of stationarity, such as (Myers 1989; Nason 2006): (a) Strong stationarity (also called strict stationarity) requires that the joint distribution of any finite sub-sequence of random variables of the process remains the same as we shift it along the time index axis: a stochastic process {Y_t}, t = −∞, …, ∞, is strongly stationary if, for any integer n and any set of time instants {t_1, t_2, …, t_n}, the joint distribution of the random variables (Y_{t_1−t}, Y_{t_2−t}, …, Y_{t_n−t}) depends only on {t_1, t_2, …, t_n} but not on t. (b) Weak stationarity (also called covariance stationarity) only requires the shift-invariance of the mean and the autocovariance: a process is weakly stationary or covariance stationary if (letting E denote expected value) E(Y_t) = μ ≠ ∞ for all t and var(Y_t) = σ² < ∞ for all t.

The Lorenz Attractor
For c > 1, the system has two symmetrical fixed points, representing two steady convective states. At c = 24.74, these two convective states lose stability; at c = 28, the system
Chaos in Geosciences, Fig. 6 The Lorenz attractor as revealed by the never-repeating trajectory of a single chaotic orbit (Motter and Campbell 2013). The spheres represent iterations of the Lorenz equations, calculated using the parameters originally used by Lorenz (1963). Spheres are
colored according to iteration count. The two lobes of the attractor resemble a butterfly, a coincidence that helped earn sensitive dependence on initial conditions its nickname: "the butterfly effect." (Source: Agterberg 2014, Fig. 12.9)
shows nonperiodic trajectories. Such trajectories orbit along a bounded region of 3-D space known as a chaotic attractor, never intersecting themselves (Fig. 6). For larger values of c, the Lorenz equations exhibit different behaviors that have been catalogued by Sparrow (1982). Conceptually, chaos is closely connected to fractals and multifractals. The attractor's geometry can be related to its fractal properties. For example, although Lorenz could not resolve this particular problem, his attractor has fractal dimension equal to approximately 2.06 (Motter and Campbell 2013, p. 30). An excellent introduction to chaos theory and its relation to fractals can be found in Turcotte (1997). The simplest nonlinear single differential equation illustrating some aspects of chaotic behavior is the logistic equation, which can be written as dx/dt = x(1 − x), where x and t are nondimensional variables for population size and time, respectively. There are no parameters in this equation because characteristic time and representative population size were defined and used. The solution of this logistic equation has fixed points at x = 0 and 1, respectively. The fixed point at x = 0 is unstable in that solutions in its immediate vicinity diverge away from it. On the other hand, the fixed point at x = 1 has solutions in its immediate vicinity that are stable. Introducing the new variable x₁ = x − 1 and neglecting the quadratic term, the logistic equation has the solution x₁ = x₁₀e^(−t), where x₁₀ is assumed to be small but constant. All such solutions "flow" toward x = 1 in time. They are not chaotic. Chaotic solutions evolve in time with exponential sensitivity to the initial conditions. The so-called logistic map arises from the recursive relation x_{k+1} = a x_k(1 − x_k) with iterations for k = 0, 1, . . . May (1976) found that the resulting iterations have a remarkable range of behavior depending on the value of the constant a that is chosen. Turcotte (1997, Sect. 10.1) discusses in detail that there now are two fixed points at x = 0 and x = 1 − a⁻¹, respectively. The fixed point at x = 0 is stable for 0 < a < 1 and unstable for a > 1. The other fixed point is unstable for 0 < a < 1, stable for 1 < a < 3, and unstable for a > 3. At a = 3, a so-called flip bifurcation occurs. Both singular points are unstable and the iteration converges on an oscillating limit cycle. At a = 3.449479, another flip bifurcation occurs and there suddenly are four limit cycles. The approach can be generalized by defining successive constants a_i (i = 1, 2, . . ., ∞) leading to limit cycles that satisfy the iterative relation a_{i+1} − a_i = F⁻¹(a_i − a_{i−1}), where F = 4.669202 is the so-called Feigenbaum constant. Turcotte (1997, Figs. 10.1–10.6) shows examples of this approach. Sornette et al. (1991) and Dubois and Cheminée (1991) have treated the eruptions of the Piton de la Fournaise on Réunion Island and Mauna Loa and Kilauea in Hawaii using this generalized logistic model.
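The period-doubling behavior of the logistic map described above is easy to reproduce numerically. The following minimal sketch (Python; the chosen values of a and the iteration counts are illustrative) iterates x_{k+1} = a·x_k(1 − x_k) and prints the long-run values the orbit settles onto:

```python
def logistic_orbit(a, x0=0.5, n_transient=1000, n_keep=8):
    """Iterate the logistic map and return the last n_keep values,
    which reveal fixed points, limit cycles, or chaotic wandering."""
    x = x0
    for _ in range(n_transient):          # discard transient behavior
        x = a * x * (1.0 - x)
    orbit = []
    for _ in range(n_keep):
        x = a * x * (1.0 - x)
        orbit.append(round(x, 4))
    return orbit

for a in (2.8, 3.2, 3.5, 3.9):
    # a = 2.8: single fixed point at 1 - 1/a; a = 3.2: 2-cycle;
    # a = 3.5: 4-cycle (past the flip bifurcation at a = 3.449479);
    # a = 3.9: apparently chaotic, never-repeating values
    print(a, logistic_orbit(a))
```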
Chaos in Geosciences
The preceding examples of deterministic processes result in chaotic results. However, one can ask the question of whether there exist deterministic processes that can fully explain fractals and multifractals? The power-law models related to fractals and multifractals can be partially explained on the basis of the concept of self-similarity. Various chaotic patterns observed for element concentration values in rocks and orebodies could be partially explained as the results of multiplicative cascade models such as the model of de Wijs. These models invoke random elements such as increases of element concentration that are not deterministic. Successful applications of nonlinear modeling in geoscience also include the following: Rundle et al. (2003) showed that the Gutenberg-Richter frequency-magnitude relation is a combined effect of the geometrical (fractal) structure of fault networks and the nonlinear dynamics of seismicity. Most weather-related processes taking place in the atmosphere, including cloud formation and rainfall, are multifractal (Lovejoy and Schertzer 2007). Other space-related nonlinear processes include “current disruption” and “magnetic reconnection” scenarios (Uritsky et al. 2008). Within the solid Earth’s crust, processes involving the release of large amounts of energy over very short intervals of time including earthquakes (Turcotte 1997), landslides (Park and Chi 2008), flooding (Gupta et al. 2007), and forest fires (Malamud et al. 1998) are nonlinear and result in fractals or multifractals.
Summary and Conclusions Nonlinear process modeling is providing new clues to where the randomness in nature comes from. From chaos theory it is known that otherwise deterministic Earth process models can contain terms that generate purely random responses. The solutions of such equations may contain unstable fixed points or bifurcations. It was shown that multifractals provide a novel approach to problem solving in situations where the attributes display strongly positively skewed frequency distributions. Randomness in the geosciences has been known to exist since the sixteenth century, but it is only fairly recently that chaos-theoretical explanations using deterministic differential equations have been developed. The results of chaos theory often are shown to possess fractal or multifractal properties. The model of de Wijs applied to gold concentration values in till was used as an example in this entry. It can be assumed that future developments of chaos theory will help to explain various forms of randomness and multifractals in the geosciences.
Cross-References ▶ Additive Logistic Normal Distribution ▶ De Wijs Model ▶ Frequency Distribution ▶ Lognormal Distribution ▶ Multifractals ▶ Self-Organizing Maps
Bibliography Agterberg FP (2007) New applications of the model of de Wijs in regional geochemistry. Math Geosci 39(1):1–26 Agterberg FP (2014) Geomathematics: theoretical foundations, applications and future developments. Springer, Dordrecht Bak P (1996) How nature works. Copernicus, Springer, New York Beirlant J, Goegebeur Y, Segers J, Teugels J (2005) Statistics of extremes. Wiley, New York Cheng Q (1994) Multifractal modeling and spatial analysis with GIS: gold mineral potential estimation in the Mitchell-Sulphurets area, northwestern British Columbia. Unpublished doctoral dissertation, University of Ottawa Cheng Q, Agterberg FP (2009) Singularity analysis of ore-mineral and toxic trace elements in stream sediments. Comput Geosci 35: 234–244 Cheng Q, Agterberg FP, Ballantyne SB (1994) The separation of geochemical anomalies from background by fractal methods. J Geochem Explor 51(2):109–130 de Wijs HJ (1951) Statistics of ore distribution I. Geol Mijnb 13:365–375 Dubois J, Cheminée H (1991) Fractal analysis of eruptive activity of some basaltic volcanoes. J Volcanol Geotherm Res 45:197–208 Garrett RG, Thorleifson LH (1995) Kimberlite indicator mineral and till geochemical reconnaissance, southern Saskatchewan. Geological Survey of Canada Open File 3119, pp 227–253 Gupta VK, Troutman B, Dawdy D (2007) Towards a nonlinear geophysical theory of floods in river networks: an overview of 20 years of progress. In: Nonlinear dynamics in geosciences. Springer, New York, pp 121–150 Hacking I (2006) The emergence of probability, 2nd edn. Cambridge University Press, New York Lorenz EN (1963) Deterministic nonperiodic flow. J Atmos Sci 20: 130–141 Lovejoy S, Schertzer D (2007) Scaling and multifractal fields in the solid earth and topography. Nonlinear Process Geophys 14:465–502 Malamud BD, Morein G, Turcotte DL (1998) Forest fires: an example of self-organized critical behavior. Science 281:1840–1842 Mandelbrot B (1989) Multifractal measures, especially for the geophysicist. Pure Appl Geophys 131:5–42 May RM (1976) Simple mathematical models with very complicated dynamics. Nature 261:459–467 Motter E, Campbell DK (2013) Chaos at fifty. Physics Today, May issue, pp 27–33 Park NW, Chi KH (2008) Quantitative assessment of landslide susceptibility using high-resolution remote sensing data and a generalized additive model. Int J Remote Sens 29(1):247–264 Poincaré H (1899) Les methods nouvelles de la mécanique celeste. Gauthier-Villars, Paris Rundle JB, Turcotte DL, Sheherbakov R, Klein W, Sammis C (2003) Statistical physics approach to understanding the multiscale dynamics of earthquake fault systems. Rev Geophys 41:1019. https://doi. org/10.1029/2003RG000135 Sagar BSD, Murthy MBR, Radhakrishnan P (2003) Avalanches in numerically simulated sand dune dynamics. Fractals 11(2):183–193
Schuster HG (1995) Deterministic chaos, 3rd edn. Wiley, Weinheim Sornette A, Dubois J, Cheminée JL, Sornette D (1991) Are sequences of volcanic eruptions deterministically chaotic. J Geophys Res 96(11): 931–945 Sparrow C (1982) The Lorenz equations: bifurcations, chaos, and strange attractors. Springer, New York Turcotte DL (1997) Fractals and chaos in geology and geophysics, 2nd edn. Cambridge University Press, Cambridge, UK Uritsky VM, Donovan E, Klimas AJ (2008) Scale-free and scale-dependent modes of energy release dynamics in the night time magnetosphere. Geophys Res Lett 35(21):L21101, 1–5 pages Xie S, Bao Z (2004) Fractal and multifractal properties of geochemical fields. Math Geol 36(7):847–864 Zipf GK (1949) Human behavior and the principle of least effort. Addison-Wesley, Cambridge, MA
Chayes, Felix Richard J. Howarth Department of Earth Sciences, University College London (UCL), London, UK
Fig. 1 Felix Chayes in 1970. (Reproduced with permission of the Geophysical Laboratory, Carnegie Institution of Washington)
Biography The quantitative petrographer (Charles) Felix Chayes was born in New York City, on 10 May 1916. He majored in geology at New York University (BA 1936), then moved to Columbia University, NY, to study the alkaline intrusives of Bancroft, Ontario, Canada (MA 1939; PhD 1942). Joining the US Bureau of Mines in 1942, he introduced grain-fragment counting to monitor the composition of quarried granite products. In 1947 he joined the Geophysical Laboratory, Carnegie Institution, Washington, DC, and studied the compositional variation of fine-grained New England granites, using “modal analysis” (i.e., “point-counting” the proportions of mineral
grains in a thin-section; Chayes 1956) and undertook similar investigations elsewhere. Advised by statisticians Joseph Cameron (1922–2000) and William Youden (1900–1971) from the National Bureau of Standards (NBS), Washington, he tried to alert geologists to problems posed by data in which "the sum of the variates is itself a variable . . . [which] is actually, or practically, constant. The effect . . . is to generate negative correlation" (Chayes 1948, p. 415). He introduced the term "closed table" for one having a constant-sum property. Realizing that major-element geochemical data caused similar ratio-correlation problems which affected both petrographic attributes and derived CIPW- and Niggli-normative compositions, he repeatedly drew attention to these difficulties thereafter. Advised by statistician William Kruskal (1919–2005) of the University of Chicago, he developed a test for the significance of such spurious correlations, but it subsequently became apparent that there were problems with it. In 1961, Spanish petrologist José Luis Brändle began work with Chayes to assemble a database of major-element analyses of Cenozoic volcanic rocks and in 1971, funded by the National Science Foundation, Chayes began to undertake data-retrieval requests. He became Chair of the International Union of Geological Sciences (IUGS) Subcommission on Electronic Databases for Petrology in 1984. With the advice of NBS statisticians Cameron, Calyampudi Rao (1920–) and Geoffrey Watson (1921–1998), Chayes and Danielle Velde (1936–; née Métais) were early users of discriminant analysis to distinguish between igneous rock types, based on their major-element chemical compositions (Chayes and Métais 1964), but later work revealed the inherent variability of such procedures when using small numerical sample sizes. Chayes (1968) also used least-squares mixing-models to estimate the composition of the unexposed "hidden zone" of the Skaergaard layered intrusion, East Greenland. Chayes and Brändle sought to enlarge their database to include both trace-element and petrographic data. This resulted in the IUGS "IGBADAT" database for igneous petrology and its support software; keeping it updated dominated Chayes' later years – its fifth release in 1994 contained 19,519 rock analyses. After 1986, he worked part-time at the Geophysical Laboratory and at the National Museum of Natural History, Washington. He died on February 28, 1993, as the result of a car accident. A Charter Member (1968–1993) of the International Association for Mathematical Geology (IAMG), he was awarded its William Christian Krumbein Medal in 1984. The IAMG's Felix Chayes Prize for Excellence in Research in Mathematical Petrology was established in his memory in 1997. For more information, see Howarth (2004).
Bibliography Chayes F (1948) A petrographic criterion for the possible replacement origin of rocks. Am J Sci 246:413–425 Chayes F (1956) Petrographic modal analysis. An elementary statistical appraisal. Wiley, New York, p 113 Chayes F (1968) A least squares approximation for estimating the amounts of petrographic partition products. Mineral Petrogr Acta 14:111–114 Chayes F, Métais D (1964) On distinguishing basaltic lavas of intra- and circum-oceanic types by means of discriminant functions. Trans Am Geophys Union 45:124 Howarth RJ (2004) Not "Just a Petrographer": The life and work of Felix Chayes (1916–1993). Earth Sci Hist 23:343–364
Cheng, Qiuming Frits Agterberg Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada
Fig. 1 Qiuming Cheng, courtesy of Prof. Cheng
Biography Professor Qiuming Cheng was born in 1960 in Taigu County, China. He received a BSc in mathematics (1982) and MSc in mathematical geology (1985) at Changchun University of Earth Sciences where he was appointed Associate Professor. In 1991 he became a PhD student at the University of Ottawa. Within 3 years, he completed his doctoral thesis (Cheng 1994)
recognized by the university as its best doctoral dissertation for that year. The university's citation included the sentence "Dr. Cheng has demonstrated his potential as a world-class scientist, producing results that are important not only in the geosciences but also in physics and mathematics." This prediction was amply fulfilled. After a one-year postdoctoral fellowship at the Geological Survey of Canada in Ottawa, Dr. Cheng was awarded a professorship at York University with a cross-appointment in the Department of Earth and Space Science and the Department of Geography. In 2004 he also became a Changjiang Scholar in China and Director of the State Key Laboratory of Geological Processes and Mineral Resources (GPMR), which is part of the China University of Geosciences with campuses in Wuhan and Beijing. In 2017 he retired from York in order to perform research and teach full-time at CUG Beijing. In 2019 he was elected to membership in the Chinese Academy of Sciences. Professor Cheng's research focuses on the development of mathematical geocomplexity theory for modeling nonlinear geo-processes and quantitative prediction of mineral resources (see, e.g., Cheng 2005). His pioneering work on new fractal theory and local singularity analysis (Cheng 2007) has made major impacts on several geoscientific subdisciplines, including those concerned with extreme geological events originating from nonlinear processes of plate tectonics such as the formation of supercontinents, mid-ocean ridge heat flow, earthquakes, and the formation of ore deposits (Cheng 2016). He has authored or co-authored over 300 refereed journal papers or book chapters and delivered over 100 invited papers or keynote lectures. In 1997 he obtained the President's Award of the International Association for Mathematical Geosciences (IAMG) given to scientists aged 35 or younger. In 2008, the IAMG awarded him the William Christian Krumbein Medal, its highest award for lifetime achievements in mathematical geoscience, service to the profession and contributions to the IAMG, of which he became the 2012–2016 President. From 2016 to 2020, Professor Cheng was President of the International Union of Geological Sciences with over a million members in its affiliated organizations.
Bibliography Cheng Q (1994) Multifractal modeling and spatial analysis with GIS: gold mineral potential estimation in the Mitchell-Sulphurets area, northwestern British Columbia. Unpublished doctoral dissertation. University of Ottawa, Ottawa Cheng Q (1999) Spatial and scaling modelling for geochemical anomaly separation. J Geochem Explor 65:175–194 Cheng Q (2005) A new model for incorporating spatial association and singularity in interpolation of exploratory data. In: Leuangthong D, Deutsch CV (eds) Geostatistics Banff 2004. Springer, Dordrecht, pp 1017–1025
115 Cheng Q (2007) Mapping singularities with stream sediment geochemical data for prediction of undiscovered mineral deposits in Gejiu, Yunnan Province, China. Ore Geol Rev 32:314–324 Cheng Q (2016) Fractal density and singularity analysis of heat flow over ocean ridges. Sci Rep 6:1–10
Circular Error Probability Frits Agterberg Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada
Definition Circular error probability applies to the directions of linear features in two-dimensional space that are subject to uncertainty. Statistical analysis of directional data differs from ordinary statistical analysis based on the normal distribution. Linear features are either directed or undirected. For example, paleocurrents can result in features from which it can be seen in which direction the current was flowing, or they result in symmetrical features from which the direction of flow cannot be determined. A sample of n features has values that can be written as x_i (i = 1, 2, . . ., n). The probability P(x_i) of occurrence of x_i is defined for x_i ranging from 0° to 360° (or from 0 to 2π). The system is closed because x_i and x_i ± 360° represent the same direction. Estimates of average directions are affected by closure if relatively many measurements deviate significantly from the arithmetic mean, e.g., when the standard deviation of the sample significantly exceeds 30°.
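The closure effect on the arithmetic mean is easy to demonstrate. In the following minimal sketch (Python/NumPy; the two example azimuths are arbitrary), two directions just either side of north average to due south arithmetically, whereas the vector mean introduced later in this entry handles the wraparound correctly:

```python
import numpy as np

azimuths = np.array([350.0, 10.0])           # degrees, both close to north

arithmetic_mean = azimuths.mean()            # 180.0 -- points due south!

rad = np.radians(azimuths)
vector_mean = np.degrees(np.arctan2(np.sin(rad).sum(), np.cos(rad).sum())) % 360.0

print(arithmetic_mean, vector_mean)          # 180.0 vs. 0.0 (north)
```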
Introduction Various geological attributes may be approximated by lines or planes. Measuring them results in “angular data” consisting of azimuths for lines in the horizontal plane or azimuths and dips for lines in 3D space. Although a plane usually is represented by its strike and dip, it is fully determined by the line perpendicular to it, and statistical analyses of data sets for lines and planes are analogous. Examples of angular data are strike and dip of bedding, banding, and planes of schistosity, cleavage, fractures, or faults. Azimuth readings with or without dip are widely used for sedimentary features such as axes of elongated pebbles, ripple marks, foresets of cross-bedding and indicators of turbidity flow directions (sole markings). Then there are the B-lineations in tectonites. Problems at the microscopic level include that of finding the preferred orientation of crystals in a matrix (e.g., quartz axes in petrofabrics).
Initially, circular data in geoscience were treated like ordinary data not subject to the closure constraint (see, e.g., the analysis of variance treatment by Potter and Olson 1954). However, increasingly 2D (and 3D) directional features were treated by methods that incorporated the closure constraint of vectorial data (Watson and Williams 1956; Stephens 1962). However, Agterberg and Briggs (1963) pointed out that using the normal distribution provides a good approximation if the standard deviation is less than 30° or if the range of observations does not exceed 114° (also see Agterberg 1974, Fig. 102). The following types of mean can be estimated:

AM (arithmetic mean): \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

VM (vector mean): \bar{g} = \arctan\left(\frac{\sum_{i=1}^{n}\sin x_i}{\sum_{i=1}^{n}\cos x_i}\right)

SVM (semicircular vector mean): \bar{g}_2 = \frac{1}{2}\arctan\left(\frac{\sum_{i=1}^{n}\sin 2x_i}{\sum_{i=1}^{n}\cos 2x_i}\right)
VM and SVM apply to directed and undirected features, respectively. There is a close connection between 2D circular and 3D spherical statistics. The features taken as examples in this chapter are observed in 2D outcrops in the field. In the first example, they are approximately 2D, because the rock layers containing them are subhorizontal. In the second example, they are 3D but can be regarded as 2D because they resulted from geological processes that took place at significantly different times, during the Hercynian and Alpine orogenies, respectively. In a so-called marked point process, the marks are values at the points (cf. Stoyan and Stoyan 1996 or Cressie 1991). Thus, on maps showing azimuths and dips of
Circular Error Probability, Fig. 1 Axial symmetric rose diagram of the direction of pebbles in glacial drift in Sweden according to Köster (1964). (Source: Agterberg 1974, Fig. 99)
features at observation points for the second example, the dips could be regarded as marks of the azimuths, or vice versa. The body of publications dealing with statistics of orientation data is large. Introductory papers on unit vectors in the plane include Rao and Sengupta (1970). Statistical theory for the treatment of unit vectors in 2D (Fisher 1993) and 3D (Fisher et al. 1987) is well developed. The comprehensive textbook by Mardia (1972) covers both 2D and 3D applications. Pewsey and Garcia-Portugués (2020) is a more recent review pointing out that https://github.com/egarpor/DirectionalStatsbib contains over 1700 references to publications on statistics of 2D and 3D directional features.
Doubling the Angles
It is useful to plot angular data sets under study in a diagram before statistical analysis is undertaken. Azimuth readings can be plotted on various types of rose diagrams (Potter and Pettijohn 1963). If the lines are directed, the azimuths can be plotted from the center of a circle and data within the same class intervals may be aggregated. If the lines are undirected, a rose diagram with axial symmetry can be used such as the one shown in Fig. 1. Then the method of Krumbein (1939) can be used in which the average is taken after doubling the angles and dividing the resulting mean angle by two. In Agterberg (1974), problems of this type and their solutions are discussed in more detail. Krumbein's solution is identical to fitting a major axis to the points of intersection of the original measurements with a circle. Axial symmetry is preserved when the major axis is obtained. The three types of mean unit vector listed previously correspond to the following three types of frequency distribution:
AM: normal (Gaussian) with frequency distribution f_n(\theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\theta^2}{2\sigma^2}\right)

VM: circular normal (von Mises) with f(\theta) = \frac{1}{2\pi I_0(k)} \exp(k \cos\theta)

SVM: semicircular normal with f_2(\theta) = \frac{1}{\pi I_0(k_2)} \exp(k_2 \cos 2\theta)
In these expressions, θ (previously denoted as x) represents the angle measured clockwise from the pole (usually the north direction); I0 (. . .) is the Bessel function of the first kind of pure imaginary argument.
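The three means defined above (AM, VM, and SVM based on doubled angles) can be computed with a few lines of code. This minimal sketch (Python/NumPy; function names and the example strike values are illustrative assumptions) uses arctan2 so that the resulting angle falls in the correct quadrant:

```python
import numpy as np

def arithmetic_mean(az_deg):
    return np.mean(az_deg)

def vector_mean(az_deg):
    """VM for directed features: direction of the resultant of unit vectors."""
    a = np.radians(az_deg)
    return np.degrees(np.arctan2(np.sin(a).sum(), np.cos(a).sum())) % 360.0

def semicircular_vector_mean(az_deg):
    """SVM for undirected features: double the angles, take the vector mean, halve."""
    a = 2.0 * np.radians(az_deg)
    doubled = np.degrees(np.arctan2(np.sin(a).sum(), np.cos(a).sum())) % 360.0
    return 0.5 * doubled

strikes = np.array([2.0, 178.0, 5.0, 175.0])   # undirected lines clustered around N-S
print(arithmetic_mean(strikes))                # 90.0 -- misleading (E-W)
print(semicircular_vector_mean(strikes))       # close to 0 (or 180), i.e., essentially N-S
```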
Bjorne Formation Paleodelta Example The Bjorne Formation is a predominantly sandy unit of Early Triassic age. It was developed at the margin of the Sverdrup Islands of the Canadian Arctic Archipelago. On northwestern Melville Island, the Bjorne Formation consists of three separate members which can be distinguished, mainly on the basis of clay content, which is the lowest in the upper member, C. The total thickness of the Bjorne Formation does not exceed 165 m on Melville Island. The formation forms a prograding fan-shaped delta. The paleocurrent directions are indicated by such features as the dip azimuths of planar foresets and axes of spoon-shaped troughs (Agterberg et al. 1967). The average current direction for 43 localities of Member C is shown in Fig. 2a. These localities occur in a narrow belt where the sandstone member is exposed at the surface. The azimuth of the paleocurrents changes along the belt. The attitude of sandstone layers is subhorizontal in the entire study area. The variation pattern (Fig. 3a) shows many local irregularities but is characterized by a linear trend. In the (U, V) plane for the coordinates (see Fig. 3b), a linear trend surface can be written as x = F(u, v) = 292.133 + 27.859u + 19.932v (degrees). At each point on the map, x represents the tangent of a curve for the paleocurrent trend that passes through that point, or dv/du = tan x. It is readily shown that

du = \frac{dx}{a_1 + a_2 \tan x},

where a_1 = 27.859 and a_2 = 19.932. Integration of both sides gives

u = C + \frac{1}{a_1^2 + a_2^2}\left[a_1 x + a_2 \log_e\left(a_1 \cos x + y \sin x\right)\right],

with y = a_2 if tan x > −a_1/a_2 and y = −a_2 if tan x < −a_1/a_2. The result is applicable when x is expressed in degrees. When x ≥ 360°, the quantity 360° may be subtracted from x, because azimuths are periodic with period 360°. The constant C is arbitrary. A value can be assigned to it by inserting specific values for u and v into the equation. For example, if
u = 0 and x = 0, C = 0.7018. It can be used to calculate a set of values for u forming a sequence of values for x; the corresponding values for v then follow from the original equation for the linear trend surface. The resulting curves (1) and (2) are shown in Fig. 3b. If the value of C is changed, the curves (1) and (2) become displaced in the x = 54.4° direction. Thus, a set of curves is created that represents the paleocurrent direction for all points in the study area. Suppose that the paleocurrents were flowing in directions perpendicular to the average topographic contours at the time of sedimentation of the sand. If curve (1) of Fig. 3b is moved in the 144.4° direction over a distance that corresponds to 90° in x, the result represents a set of directions which are perpendicular to the paleocurrent trends. Four of these curves which may represent the shape of the delta are shown in Fig. 4c. These contours, which are labeled a, b, c, and d, satisfy the preceding equation for u with different values of C and with x replaced by (x + 90°) or (x − 90°). Definite values cannot be assigned to these contours because x represents a direction and not a vector with both direction and magnitude. In trend surface analysis, the linear trend surface (also see Fig. 4a) has explained sum of squares ESS = 78%. The complete quadratic and cubic surfaces have ESS of 80% and 84%, respectively. Analysis of variance for the step from linear to quadratic surface resulted in F(3, 37) = [(80 − 78)/3]/[(100 − 80)/37] = 1.04. This would correspond to F_{0.60}(3, 37),
showing that the improvement in fit is not statistically significant. It is tacitly assumed that the residuals are not autocorrelated, as suggested by their scatter around the straight line of Fig. 3. Consequently, the linear trend surface as shown in Fig. 4a is acceptable in this situation. The 95% confidence interval for this linear surface is shown in Fig. 4b. Suppose that X is the matrix representing the input data. This is a so-called half-confidence interval with values equal to ±[(p + 1)F s²(Ŷ)]^{1/2}, with F = F_{0.95}(3, 40) = 2.84 and s²(Ŷ) = s² X′_k(X′X)^{−1}X_k, with residual variance s² = 380 square degrees. All flow lines in Fig. 4a converge to a single line-shaped source. They have a single asymptote in common which suggests the location of a river. An independent method for locating the source of the sand consists of mapping the grade of the largest clasts contained in the sandstone at a given place. Four grades of largest clasts could be mapped in the area of study. They are: (1) pebbles, granules, coarse sand; (2) cobbles, pebbles; (3) boulders, cobbles; and (4) boulders (max. 60 cm). The size of the clasts is larger where the velocity of the currents was higher. Approximate grade contours for classes 1–4 are shown in Fig. 4c for comparison. This pattern corresponds to the contours constructed for the paleodelta.
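The paleocurrent flow lines derived from the fitted linear trend surface can also be obtained by direct numerical integration, which avoids the closed-form integral altogether. The following minimal sketch (Python/NumPy; the step size, starting point, and function names are illustrative assumptions, and this numerical tracer is an alternative to, not a reproduction of, the analytical solution above) traces a flow line by repeatedly stepping a small distance in the direction given by the local trend-surface azimuth x = F(u, v), using dv/du = tan x:

```python
import numpy as np

A0, A1, A2 = 292.133, 27.859, 19.932          # fitted linear trend surface (degrees)

def trend_azimuth(u, v):
    """Linear trend surface x = F(u, v) of the preferred paleocurrent direction."""
    return np.radians(A0 + A1 * u + A2 * v)

def trace_flow_line(u0, v0, step=0.01, n_steps=500):
    """Forward-Euler tracing of a flow line: each step moves a small distance
    proportional to (cos x, sin x), which satisfies dv/du = tan x."""
    pts = [(u0, v0)]
    u, v = u0, v0
    for _ in range(n_steps):
        x = trend_azimuth(u, v)
        u, v = u + step * np.cos(x), v + step * np.sin(x)
        pts.append((u, v))
    return np.array(pts)

line = trace_flow_line(0.0, 0.0)
print(line[0], line[-1])                       # start and end of one traced flow line
```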
Circular Error Probability, Fig. 2 Variation of preferred paleocurrent direction along straight line coinciding with surface outcrop of Member C, Bjorne Formation (Triassic, Melville Island, Northwest
Territories, Canada); systematic change in azimuth is represented by trend line. (Source: Agterberg 1974, Fig. 12)
Directed and Undirected Unit Vectors
Directional features play an important role in structural geology. Basic principles were originally reviewed and developed by Sander (1948), who associated a three-dimensional (A, B, C) Cartesian coordinate system with folds and other structures. The A-axis and C-axis were defined to be parallel to the directions of compression and expansion, respectively, with the B-axis perpendicular to the AC-plane representing the main plane of motion of the rock particles. Two examples of Hercynian minor folds with clearly developed B-axes are shown in Fig. 5 (from Agterberg 1961) for quartzphyllites belonging to the crystalline basement of the Dolomites in northern Italy. Measurements on B-axes from these quartzphyllites were analyzed previously (Agterberg 1961, 1974, 2012) and are again used here.
Circular Error Probability, Fig. 3 Reconstruction of approximate preferred paleocurrent direction and shape of paleodelta from preferred current directions, Member C, Bjorne Formation, Melville Island. (After Agterberg et al. 1967). (a) Preferred directions of paleocurrent indicators
at measurement stations; 0-isopach applies to lower part of Member C only. (b) Graphical representation of solution of differential equation for paleocurrent trends. Inferred approximate direction of paleoriver near its mouth is N54.4°E. (Source: Agterberg 1974, Fig. 13)
Stereographic projections using either the Wulff net or the equal-area Schmidt net (see, e.g., Agterberg 1974) continue to be useful tools for representing sets of directional features from different outcrops within the same neighbourhood, or for directions derived from crystals in thin sections of rocks. Contouring on the net is often applied to find maxima representing preferred orientations, which can also be estimated using methods developed by mathematical statisticians, especially for relatively small sample sizes (see, e.g., Fisher et al. 1987). Reiche (1938) used the vector mean to find the preferred orientation of directional features. Using a paleomagnetic data set, Fisher (1953) further developed this method for estimating the mean orientation from a sample of directed unit vectors. The frequency distribution of the unit vectors may satisfy the so-called Fisher distribution

f(\theta, \phi) = \frac{\kappa}{4\pi \sinh \kappa} \exp(\kappa \cos \theta),

which is the spherical equivalent of a two-dimensional normal distribution.
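As a quick check of the Fisher density just quoted, the following sketch verifies numerically that f(θ, φ) integrates to one over the sphere and evaluates the density at a few colatitudes. The concentration parameter κ = 10 is an assumed value for illustration, not taken from the text.

```python
import numpy as np
from scipy.integrate import quad

kappa = 10.0  # assumed concentration parameter, for illustration only

def fisher_pdf(theta, kappa=kappa):
    """Fisher density f(theta, phi) = kappa/(4*pi*sinh(kappa)) * exp(kappa*cos(theta));
    it does not depend on the longitude phi."""
    return kappa / (4.0 * np.pi * np.sinh(kappa)) * np.exp(kappa * np.cos(theta))

# Integrate over the sphere: d(area) = sin(theta) dtheta dphi, theta in [0, pi], phi in [0, 2*pi]
integral, _ = quad(lambda t: fisher_pdf(t) * np.sin(t) * 2.0 * np.pi, 0.0, np.pi)
print(round(integral, 6))                          # ~1.0, confirming the normalization constant

print(fisher_pdf(np.radians([0.0, 30.0, 90.0])))   # density falls off away from the mean direction
```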
TRANSALP Profile Example
Figure 6 (based on a map by Agterberg 1961) shows mean B-axes and schistosity planes in quartzphyllites belonging to the crystalline basement of the Dolomites in an area around the Gaderbach, which was part of the seismic Vibroseis and explosive transects of the TRANSALP Transect. The TRANSALP profile was oriented approximately along a north-south line across the Eastern Alps from Munich to Belluno, with its position locally determined by irregular shapes of the topography (see, e.g., Gebrande et al. 2006). Structurally, the quartzphyllites are of two types: in some outcrops the Hercynian schistosity planes are not strongly folded and their average orientation can be measured. However, elsewhere, as in the examples of Fig. 6 and in most of the Pustertal to the east of Bruneck, there are relatively many minor folds (as illustrated in Fig. 5) of which the orientation can be measured. However, then it may not be possible to measure representative strike and dip of schistosity at outcrop scale. In the parts of the area where the strike and dip of schistosity planes can be determined, there usually are B-lineations for microfolds on the schistosity planes that can also be measured. Consequently, the number of possible measurements on B-axes usually exceeds the number of representative measurements on schistosity planes. Figure 6 shows that the average strike of schistosity is fairly constant in this area, but the average azimuth and dip of the B-axes is more variable. There is a trend from intermediate eastward dip of B-axes in the southern parts of the region to subvertical attitude in the north. This systematic change can also be seen on Schmidt net plots for the six measurement samples along the Gaderbach (Agterberg 1961, Appendix III). In Fig. 7, the B-axes measured along the Gaderbach are projected onto a North-South line approximately parallel to the average orientation of the TRANSALP profile. In a general way, the results of the preceding linear unit vector field analysis confirm the earlier conclusions on North-South regional change in average B-lineation attitude. Original estimates of mean B-axis orientations, of which some are shown in Fig. 6, were obtained by separately averaging the azimuths and dips. Table 1 shows results from another test. In Fig. 7 such average values for the six measurement samples (Method 1) are compared with ordinary unit vector means (Method 2) at mean distances south of the (Periadriatic) Pusteria Lineament near Bruneck.
Circular Error Probability, Fig. 4 (a) Linear trend for data of Fig. 3a. Computed azimuths are shown by line segments for points on a regular grid, both inside and outside exposed parts of Member C; flow lines are based on curves 1 and 2 of (c). (b) Contour map of 95% half-confidence interval for A; confidence is greater for the area supported by observation points. (c) Estimated topographic contours for delta obtained by shifting curve 1 of Fig. 3b, 90° in the southeastern direction (= perpendicular to isoazimuth lines), and then moving it into four arbitrary positions by changing the constant C. Contoured grades of largest clasts provide independent information on shape of delta. (Source: Agterberg 1974, Fig. 41)
Circular Error Probability, Fig. 5 Two examples of minor folds in the Pustertal quartzphyllite belt. Left side: East-dipping minor folds near Welsberg (Monguelfo). Right side: ditto along TRANSALP profile near subvertical contact with Permotriassic (grid coordinates 206–767; part of MS112, see Fig. 8). (Source: Agterberg 2012, Fig. 1)
Circular Error Probability, Fig. 6 Schematic structure map of surroundings of TRANSALP profile (TAP) in tectonites of crystalline basement of the Dolomites near Bruneck. Arrows represent average azimuths of B-axes for measurement samples. Average dips of B-axes are also shown. Average s-plane attitudes are shown if they could be determined with sufficient precision. MS: Measurement Sample as in Agterberg (1961, Appendix II). PI: Permian Intrusion; PL: Periadriatic Lineament. Coordinates of grid are as on 1:25,000 Italian topographic maps. (Source: Agterberg 2012, Fig. 5)
Circular Error Probability, Fig. 7 Azimuths (diamonds) and dips (triangles) of B-axes from six measurement samples along the Gaderbach shown in Fig. 6; vertical axis: angle (degrees), horizontal axis: distance (km) south of the Periadriatic Lineament near Bruneck. Note that along most of this section, the B-axes dip nearly 50° East. Closest to the Periadriatic Lineament, their dip becomes nearly 90° SSW (also see Table 1). (Source: Agterberg 2012, Fig. 6)
Circular Error Probability, Table 1 Comparison of average orientations obtained by two methods. MS: measurement sample; distance in km south of Insubric Line; Method 1: average azimuth/dip; Method 2: Fisher azimuth/dip; Azimuth and dip are given in degrees
MS #   Distance   Method 1   Method 2
92     2.38       217/77     220/78
95     4.38       104/58     103/58
112    7.08       101/51     100/52
96     8.68       86/39      85/41
111    10.58      113/52     113/53
112    11.88      104/43     102/44

Differences between the two sets of mean azimuth and dip values are at most a few degrees, indicating that regionally, on the average, there is a dip rotation from about 45° dip to the east in the south to about 80° dip in the north, accompanied by an azimuth rotation of about 120° from eastward to WNW-ward. Widespread occurrence of steeply dipping B-axes near the Periadriatic Lineament and along the southern border of the Permian (Brixen granodiorite) intrusion, however, is confirmed by results derived from other measurement samples. The rapid change in average azimuth that accompanies the B-axis steepening is probably real, but it should be kept in mind that the azimuth of vertical B-axes becomes indeterminate. The northward steepening of attitudes of s-planes in the northern quartzphyllite belt to the west of Bruneck, from gentle southward dip near Brixen, is probably due to Neo-Alpine shortening. The northward steepening of the B-axes near Bruneck, which on average are contained within the s-planes, could be due to Neo-Alpine sinistral movements along the Periadriatic Lineament and south of the Permian (Brixen granodiorite) intrusion.
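The two averaging procedures compared in Table 1 can be sketched in a few lines: Method 1 averages azimuths and dips separately, while Method 2 converts each B-axis to a unit vector (direction cosines), averages the vectors, and converts the resultant back to azimuth and dip. The coordinate convention below (azimuth clockwise from north, dip measured downward) and the sample values are illustrative assumptions, not the original measurements.

```python
import numpy as np

def to_unit_vector(azimuth_deg, dip_deg):
    """Direction cosines (north, east, down) of a line with given azimuth and dip (plunge)."""
    az, dip = np.radians(azimuth_deg), np.radians(dip_deg)
    return np.array([np.cos(dip) * np.cos(az), np.cos(dip) * np.sin(az), np.sin(dip)])

def from_unit_vector(v):
    """Azimuth and dip (degrees) of a vector given in (north, east, down) coordinates."""
    north, east, down = v
    azimuth = np.degrees(np.arctan2(east, north)) % 360.0
    dip = np.degrees(np.arcsin(down / np.linalg.norm(v)))
    return azimuth, dip

# Hypothetical B-axis measurements (azimuth/dip in degrees) for one measurement sample
azimuths = np.array([100.0, 105.0, 98.0, 110.0, 102.0])
dips = np.array([50.0, 55.0, 48.0, 60.0, 52.0])

# Method 1: average azimuths and dips separately
method1 = azimuths.mean(), dips.mean()

# Method 2: mean (resultant) unit vector, in the spirit of Fisher's approach
vectors = np.array([to_unit_vector(a, d) for a, d in zip(azimuths, dips)])
method2 = from_unit_vector(vectors.mean(axis=0))

print(method1, method2)   # for tightly clustered axes the two methods agree within a few degrees
```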
Summary and Conclusions Circular error probability is associated with two-dimensional directional features that are observed in different types of rock. Examples in sedimentary rocks include cross-bedding, ripple marks and features associated with turbidity currents. Crystals in igneous rocks may display preferred orientations. Microfolds and minor folds in structural geology provide other examples. Different procedures must be used for directed and undirected features, especially in sedimentology. Elongated pebbles in Sweden were used to illustrate one type of statistical method to treat undirected features. There is a
close association between spherical and circular error probability. Directional cross-bedding in sandstones of the Early Triassic Bjorne Formation in the Canadian Arctic was used to reconstruct the shape of a paleodelta with approximate location of the river associated with it. In structural geology the preferred orientations of minor folds in the Hercynian basement of the Dolomites in Northern Italy were taken as an example. Although these features are three-dimensional, their azimuths and dips can be analyzed separately by using a marked 2D model. The Hercynian minor folds were subjected to rotations during the Alpine orogeny.
Cross-References
▶ Trend Surface Analysis

Bibliography
Agterberg FP (1961) Tectonics of the crystalline basement of the Dolomites in North Italy. Kemink, Utrecht, p 232
Agterberg FP (1974) Geomathematics – mathematical background and geo-science applications. Elsevier, Amsterdam, p 596
Agterberg FP (2012) Unit vector field fitting in structural geology. Zeitschr D Geol Wis 40(4/6):197–211
Agterberg FP, Briggs G (1963) Statistical analysis of ripple marks in Atokan and Desmoinesian rocks in the Arkoma Basin of east-central Oklahoma. J Sediment Petrol 33:393–410
Agterberg FP, Hills LV, Trettin HP (1967) Paleocurrent trend analysis of a delta in the Bjorne Formation (Lower Triassic) of northeastern Melville Island, Arctic Archipelago. J Sediment Petrol 37:852–862
Cressie NAC (1991) Statistics for spatial data. Wiley, New York, p 900
Fisher RA (1953) Dispersion on a sphere. Proc R Soc Lond Ser A 217:295–305
Fisher NI (1993) Statistical analysis of circular data. Cambridge University Press, Cambridge, UK, p 277
Fisher NI, Lewis T, Embleton BJJ (1987) Statistical analysis of spherical data. Cambridge University Press, Cambridge, UK/New York, p 316
Gebrande H, Castellarin A, Lüschen E, Millahn K, Neubauer F, Nicolich R (2006) TRANSALP – a transect through a young collisional orogen: introduction. Tectonophysics 414:1–7
Köster E (1964) Granulometrische und morphometrische Messmethoden an Mineralkörnern, Steinen und sonstigen Stoffen. Enke, Stuttgart, p 336
Krumbein WC (1939) Preferred orientations of pebbles in sedimentary deposits. J Geol 47:673–706
Mardia KV (1972) Statistics of directional data. Probability and mathematical statistics. Academic, London, p 358
Pewsey A, Garcia-Portugués E (2020) Recent advances in directional statistics. Test 30:1–61
Potter PE, Olson JS (1954) Variance components of cross-bedding direction in some basal Pennsylvanian sandstones of the Eastern Interior Basin. J Geol 62:50–73
Potter PE, Pettijohn FJ (1963) Paleocurrents and basin analysis. Academic, New York, p 296
Rao JS, Sengupta S (1970) An optimum hierarchical sampling procedure for cross-bedding data. J Geol 78(5):533–544
Reiche P (1938) An analysis of cross-lamination of the Coconino sandstone. J Geol 46:905–932
Sander B (1948) Einführung in die Gefügekunde der geologischen Körper, vol I. Springer, Vienna, p 215
Stephens MA (1962) The statistics of directions – the Fisher and von Mises distributions. Unpublished PhD thesis, University of Toronto, Toronto, p 211
Stoyan D, Stoyan H (1996) Fractals, random shapes and point fields. Wiley, Chichester, p 389
Watson GS, Williams EJ (1956) On the construction of significance tests on the circle and sphere. Biometrika 43:344–352

Cloud Computing and Cloud Service
Liping Di and Ziheng Sun
Center for Spatial Information Science and Systems, George Mason University, Fairfax, VA, USA

Synonyms
API: application programming interface; AWS: Amazon Web Service; CPU: central processing unit; DevOps: development and operations; EC2: Elastic Computing Cloud; GPU: graphical processing unit; IRIS: Incorporated Research Institutions for Seismology; IT: information technology; PC: personal computer; RAID: redundant array of independent disks; REST: representational state transfer; S3: Simple Storage Service; SSH: Secure Shell
Definitions
Cloud computing: A technique that virtualizes IT resources and delivers them to users over the Internet on demand, with pay-as-you-go pricing.
Cloud service: The commodity services delivered by cloud computing vendors to end-point users, including storage, servers, applications, and networking.
Background
The exploding volume of Earth observations is driving scientists to seek better schemes for computing and storage. Large-scale and high-performance computing is much more necessary in operational use today than it was 10 years ago. Many institutes and commercial companies are working intensively to provide such computing capabilities and make them available
to scientists. The cloud has become one of the major providers of computing resources in the geosciences (Hashem et al. 2015). This entry reviews existing cloud platforms, cloud-native services, and the schemes for using cloud resources in geoscientific research. It offers insights on how to leverage clouds correctly in geoscientific research and on how clouds and the geosciences can thrive together in the future.
What Is Cloud Computing?
The leading cloud provider, Amazon Web Service, explains it this way: “Cloud computing is the on-demand delivery of IT resources over the Internet with pay-as-you-go pricing. Instead of buying, owning, and maintaining physical data centers and servers, you can access technology services, such as computing power, storage, and databases, on an as-needed basis from a cloud provider” (https://aws.amazon.com/what-is-cloud-computing/). The definition shows the two major drivers behind cloud computing: server costs and on-demand needs. For example, if a geoscientist wants to serve a valuable dataset to the public, in the old days the scientist would have to buy a server, install it in a data center, and pay the data center for electricity, network, air conditioning, room space, racks, and on-site staff (Dillon et al. 2010). Meanwhile, all the risks fall on the scientist, such as disk failures, network interruptions, vulnerability fixes, and Internet attacks. Before long, the scientist finds the research team in the awkward situation of spending significant effort on disk replacement, network testing, and vulnerability fixing. As the owner of the server, the team has to pay a great deal to keep the server running, whether it is busy or idle. Even worse, many servers in university data centers never run at 100% of capacity. Most of the computing hours are wasted on servers running idle, which is a big waste of research resources and results in a serious imbalance between research hours and server maintenance hours. To redress this imbalance, cloud computing was proposed, implemented, and instantly gained popularity among server administrators. Compared to the old routine of server hosting, cloud computing provides a new paradigm by adding a virtualization layer on top of the physical hardware layer. The virtualization layer is powered by virtual machine technology, which can create multiple new virtual machines from one physical machine. The virtual machines use the host's CPU, memory, and disks, but run separately. From the user's perspective, a virtual machine is no different from a real physical machine, and users can do everything on it that they would do on a physical machine. In cloud computing, each virtual machine is called an instance, and users interact only with instances; they do not have access to the physical hardware level. Cloud data centers host tens of thousands of servers,
link them into cloud clusters, sell the computing hours to the public by creating separate virtualized instances, and provide web-based interfaces or remote-messaging channels (such as SSH and RESTful APIs) for users to manipulate the instances remotely (users normally cannot access the data centers physically). The bills are calculated based on the occupied computing hours and the instance type. The rates (cost per hour) vary across instance types (configuration of CPU, GPU, memory, storage, and network) and the location of the data centers. For example, the rate for an AWS EC2 m5.xlarge instance at the US East (N.Va) data center is $0.192 per hour, while the rate for the same instance at the Asia Pacific (Sydney) data center is $0.24 per hour (https://aws.amazon.com/emr/pricing/). Overall, cloud computing is a scheme in which computing resources are pooled together and disseminated in the form of virtual machines over the Internet to reduce the labor and financial burdens on both the owners and the users of IT resources. Thinking of computing resources as commodities, the cloud providers are sellers and the users are consumers. Cloud services denote all the service commodities provided by cloud platforms. To better commercialize the clouds, cloud providers have made many innovative developments to customize their services to fit users' scenarios and further reduce the workloads on the user side. This customization categorizes cloud computing into several types (Armbrust et al. 2010); an illustrative API example follows this list.
• Infrastructure as a Service (IaaS) (Bhardwaj et al. 2010) offers complete instance machines, including all computing resources: data storage, servers, applications, and networking. Users are responsible for managing all four kinds of resources. Examples: AWS EC2, Google Compute Engine, Digital Ocean, Microsoft Azure, and Rackspace.
• Platform as a Service (PaaS) (Pahl 2015) offers a programming language execution environment associated with a database. It encapsulates the environment where users can build, compile, and run their programs without worrying about the underlying infrastructure. In this service model, users only manage software and data, and the other resources are managed by the vendors. Examples: AWS Elastic Beanstalk, Engine Yard, and Heroku.
• Software as a Service (SaaS) (Cusumano 2010) offers pay-per-use of application software. Users do not need to install the software on a PC and access it via a web browser or lightweight client applications. Examples: Google ecosystem apps (Gmail, Google Docs), Microsoft Office365, Salesforce, Dropbox, Cisco WebEx, Citrix GoToMeeting, and Workday.
There are many more types, such as Artificial Intelligence as a Service, Hadoop as a Service, Storage as a Service, Network as a Service, Backend as a Service, Blockchain as a Service, Database as a Service, Knowledge as a Service,
etc. (Tan et al. 2016). Users should consult cloud experts to determine which cloud computing model is most suitable for their use cases. An appropriate selection can save costs significantly while getting more out of the investment.
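As a concrete illustration of the IaaS model and pay-as-you-go billing, the following sketch requests a single on-demand virtual machine through the AWS API using the boto3 library. The machine image identifier and SSH key name are placeholders that would have to be replaced with real values, and running the call against a real account incurs charges.

```python
import boto3

# Connect to the EC2 service in a chosen region (credentials come from the AWS configuration)
ec2 = boto3.client("ec2", region_name="us-east-1")

# Request one on-demand instance; billing starts once the instance is running
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image
    InstanceType="m5.xlarge",          # instance type quoted in the pricing example above
    KeyName="my-ssh-key",              # placeholder SSH key pair for remote access
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Launched", instance_id)

# Stop the instance when the computation is finished, so that compute charges stop accruing
ec2.stop_instances(InstanceIds=[instance_id])
```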
Using Cloud in Geosciences
Geoscientists on Cloud
The paradigm of cloud computing makes it possible for scientists to buy only the necessary computing hours and storage space instead of owning entire real servers, and thereby get rid of all the hassles of server maintenance. With cloud computing, scientists can worry less about tedious technical issues, pay only for the resources that are used on demand, and avoid the constant waste of idle servers. It improves the efficiency of scientific research and boosts the utilization ratio of computing resources. In addition, it is easier to scale scientific models up to big data, which was previously impossible for many scientists under the old computing routine. Like other sharing-economy schemes, cloud computing takes advantage of its vast number of users to spread the huge hardware maintenance costs and lower the bill for each user.
Geoscientific Use Cases
The geosciences have numerous study cases that meet the scenario requirements of cloud computing, e.g., earthquakes, landslides, land cover/land use, volcanoes, rainfall, groundwater, snow, ice, climate change, hurricanes, agriculture, wildfire, air quality, etc. Each research domain has undergone many years of observation activities and accumulated many datasets. For example, the IRIS (Incorporated Research Institutions for Seismology) Data Management Center (DMC) has operated a public repository of seismological data spanning three decades. IRIS offers a wide and growing variety of web services that Earth scientists in over 150 countries worldwide rely on. The IRIS DMC archive is nearing one petabyte in volume. The funding of IRIS comes mainly from the National Science Foundation, and the annual cost is around $30 million (https://ds.iris.edu/ds/newsletter/vol20/no2/498/geoscicloud-exploring-the-potential-for-hosting-a-geoscience-data-center-in-the-cloud/). A significant amount is spent on maintaining the data repository to serve downloading requests from all over the world. Cloud computing is considered a promising solution to significantly lower the data-hosting costs and put more funding into data-collection and data-mining activities.
Cloud-Based Data Storage
Most use cases of the cloud in geosciences focus on storage, such as storing spatial data that can be accessed remotely. For big spatial data, cloud storage provides a more elastic, efficient,
robust storage service than conventional storage plans. The disks on servers are consumable products that will eventually fail after data has been read from and written to them a certain number of times (e.g., tens of millions of times). The standard warranty of a normal hard disk is about 5 years. RAID (Redundant Array of Independent Disks) is a common practice to ensure data security when some disks reach the end of their life. RAID creates a pool of disks and stores the data scattered among all the disks instead of relying solely on one disk. Once a disk is detected in alert status, the data on it is automatically transferred to other healthy disks. Some RAID levels (e.g., mirrored or parity-based schemes such as RAID-5) store redundant information, so that when some disk regions are damaged the data can be recovered from the redundant copy or parity. However, RAID involves all the disks in every I/O operation, which significantly decreases the average life of the disks. Disks in alert status need to be replaced as quickly as possible; otherwise, all the data will be lost if the percentage of failed disks reaches a certain ratio. Scientists can find routine RAID checks exhausting. Cloud data centers have on-site engineers who can perform disk checks and replacements. Data is safer in the cloud than in individual small data centers that are less well attended.
Cloud-Based Data Analysis
Thanks to the continuous efforts of generations of scientists deploying sensors and building data repositories, geoscientists have many direct channels for obtaining the data required to support their research. The fact today is that geoscientists do not lack data in most cases. The real missing piece is the capability that allows scientists to manipulate the data in a time-wise and cost-wise manner. Artificial intelligence and machine learning algorithms are being used in geospatial analysis and run against spatial data to automate processing and reveal insights (Sun et al. 2020). Spatial analysis, such as network analysis, clustering, pattern extraction, classification, and predictive modeling, adds more intelligence to information extraction. However, intelligent algorithms come with higher complexity, which requires a more efficient solution for processing large amounts of data. For example, it takes about 10 minutes for a PC to run a buffer process on the road network of Washington D.C., but weeks to buffer the road network of the entire world. The time cost increases at an exponential rate with the volume of the input datasets. Big data processing is one of the major applications of clouds. The cloud can create multiple instances that work simultaneously on the same task. Each instance processes one portion of the dataset (Map), and a reduce process summarizes the results from the instances into one result (Reduce). This parallel processing scheme is called MapReduce. Typical MapReduce software includes the most widely used Apache Hadoop, Apache Spark, and Google Earth Engine (GEE) (Gorelick et al. 2017). For example, GEE is a cloud-based platform for planetary-scale geospatial analysis that brings
Google's massive computational capabilities to process tremendous amounts of remote-sensing data products, helping scientists study deforestation, drought, disasters, disease, food security, oceans, water management, climate, and environmental protection. These cloud-based big-data-processing platforms have been heavily utilized and have driven a series of significant discoveries in the geoscience communities in recent years.
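The map–reduce idea described above can be sketched with nothing more than the Python standard library and numpy: the dataset is split into portions, each portion is processed independently (the map step, here run in parallel processes), and a reduce step combines the partial results. The "dataset" and the per-chunk statistic are invented for illustration; production systems would instead use frameworks such as Hadoop or Spark.

```python
from functools import reduce
from multiprocessing import Pool
import numpy as np

def map_chunk(chunk):
    """Map step: compute a partial result (sum and count) for one portion of the data."""
    return chunk.sum(), chunk.size

def reduce_partials(a, b):
    """Reduce step: combine two partial (sum, count) results into one."""
    return a[0] + b[0], a[1] + b[1]

if __name__ == "__main__":
    data = np.random.default_rng(0).normal(size=1_000_000)   # stand-in for a large observation set
    chunks = np.array_split(data, 8)                          # partition the data across workers

    with Pool(processes=4) as pool:
        partials = pool.map(map_chunk, chunks)                # map: one partial result per chunk

    total, count = reduce(reduce_partials, partials)          # reduce: single combined result
    print("global mean:", total / count)
```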
Challenges and Opportunities
Turning Geoscientific Applications to Cloud Native
Cloud native is an approach to building and running applications that are small, independent, and loosely coupled. Cloud-native applications are purposefully built for the cloud model to take full advantage of clouds and to enable resilience and flexibility of the application. Cloud-native app development typically involves DevOps, Agile methodology, microservices, cloud platforms, containers and orchestration tools such as Docker and Kubernetes, and continuous delivery. However, existing geoscientific applications are not cloud native. Most of them need wrapping and reprogramming, which may cause a series of compatibility issues between the traditional application environment and the container environment. Many geoscientific applications are large programs that are not directly suitable for packaging as microservices and need to be broken down. Geoscientific workflows should be decomposed into several small-granularity processes and run on the clouds with scalability and elasticity (Sun et al. 2019).
Managing Cloud Expenses
In most cases, cloud computing can save money. Instead of a one-time payment, the cloud cost recurs based on usage. For applications with big data and many users, more cloud resources will be used to cope with the huge data volume and the increasing number of requests. Correspondingly, the cost will rise as well and might make the entire bill unaffordable. For short-lived or small-scale projects, it is expensive to transfer all the data into public clouds. Using clouds therefore requires spend optimization, which aims to generate the largest possible savings through predictive analytics and actionable resource-purchasing recommendations. It is financially wise to automatically identify idle resources and unused instances and shut them down to save costs. Cloud platforms usually provide cost optimization options through financial analytics and governance policies for automatically releasing unused resources and decreasing costs.
Data Security and Privacy
This is the top concern of most users on the cloud. Uploading data into the cloud feels like handing over control of the data to a third party. This is true from a certain perspective, as
the data is physically transferred from users' machines to machines owned by another institution. But from an engineering perspective, cloud-based storage is more secure than local disks. Besides the common credential protection (passwords), state-of-the-art cloud storage uses professional security methods such as firewalls, traffic monitoring, event logging, encryption, and physical security to keep the data private even though it resides on a publicly accessible server. Another benefit of using the cloud is easy updating to fix vulnerabilities. Users do not need to worry about occasionally upgrading their PC systems to fix newly discovered vulnerabilities, and it is also smoother to apply new security technology provided by cloud vendors on the fly. On the other side, users must follow the vendors' security guidelines, and negligent use could compromise even the best protection.
Governance
In industry, cloud computing governance creates business-driven policies and principles that establish the appropriate degree of investment and control around the life-cycle process for cloud services (http://www.opengroup.org/cloud/gov_snapshot/p3.htm). It ensures that all sponsored activities are aligned with the project objective. For the geosciences, one fundamental measure in governance is whether an activity directly contributes to problem-solving. Priorities should be assigned based on the significance of each job in the entire workflow. In addition, governance of the cloud is also an implementation of project management policies. The management models and standards for team members and end-point users are enforced in the form of a cloud computing governance framework. Coordinating team members to work simultaneously on the cloud is another challenge facing geoscientists, and also an important research area for cloud vendors to innovate and advance their cloud services.
Private Cloud
In reality, many scientists will not use public clouds for various reasons (cost and trust); however, they would like to use cloud features within their institutes. A private cloud would be the answer. Building a cloud is similar to building a cluster or supercomputer: all of them need multiple servers, including one management server and several client/slave servers. All the servers are connected via high-speed networks to pass messages among them. The operating systems of the member servers are mostly Linux-based, even in Microsoft Azure, because of Linux's better support for virtualization hypervisors (the software used for creating virtual machines). Cloud platform software includes OpenStack, CloudStack, Eucalyptus, and OpenNebula. Among them, OpenStack is the choice of most industrial companies, e.g.,
RedHat, IBM, Rackspace, Ubuntu, SUSE, etc. The installation of OpenStack is complicated. For scientific teams, it is better to consult cloud professionals before installing the servers.
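If a private cloud is built with OpenStack, day-to-day interaction can go through the openstacksdk Python client rather than the web dashboard. The sketch below assumes a cloud profile named "institute-cloud" is already defined in a clouds.yaml file; the profile name is a placeholder, and the calls shown are only a minimal illustration of how such a private cloud could be queried.

```python
import openstack

# Connect using a profile defined in clouds.yaml (authentication URL, project, credentials)
conn = openstack.connect(cloud="institute-cloud")   # placeholder profile name

# List the virtual machine instances visible to this project on the private cloud
for server in conn.compute.servers():
    print(server.name, server.status)

# List the images that could be used to launch new instances
for image in conn.image.images():
    print(image.name)
```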
Summary
This entry has introduced the concept of cloud computing and the state of cloud use in the geosciences. Cloud computing has proven to be an effective solution for lowering the barrier for scientists to access and process big geoscientific datasets, and more and more use cases of cloud computing in the geosciences are expected soon. Several challenges are associated with the transition from the conventional computing paradigm to the cloud, including transforming old geoscientific models into cloud-native applications, managing cloud expenses over time, ensuring data security and privacy within the cloud data centers, coordinating team efforts to work simultaneously on the cloud, and creating private clouds for scientists to integrate computing resources and serve users within their institutions. Cloud vendors could take on these challenges to advance their cloud services, further extend their collaboration with geoscientists, and assist them in solving critical scientific problems that were previously insolvable.

Bibliography
Armbrust M, Fox A, Griffith R, Joseph AD, Katz R, Konwinski A, Lee G, Patterson D, Rabkin A, Stoica I (2010) A view of cloud computing. Commun ACM 53:50–58
Bhardwaj S, Jain L, Jain S (2010) Cloud computing: a study of infrastructure as a service (IAAS). Int J Eng Inf Technol 2:60–63
Cusumano M (2010) Cloud computing and SaaS as new computing platforms. Commun ACM 53:27–29
Dillon T, Wu C, Chang E (2010) Cloud computing: issues and challenges. In: Proceedings of 24th IEEE international conference on Advanced Information Networking and Applications (AINA), pp 27–33
Gorelick N, Hancher M, Dixon M, Ilyushchenko S, Thau D, Moore R (2017) Google Earth Engine: planetary-scale geospatial analysis for everyone. Remote Sens Environ 202:18–27
Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Khan SU (2015) The rise of “big data” on cloud computing: review and open research issues. Inf Syst 47:98–115
Pahl C (2015) Containerization and the PaaS cloud. IEEE Cloud Comput 2:24–31
Sun Z, Di L, Cash B, Gaigalas J (2019) Advanced cyberinfrastructure for intercomparison and validation of climate models. Environ Model Softw 123:104559
Sun Z, Di L, Burgess A, Tullis JA, Magill AB (2020) Geoweaver: advanced cyberinfrastructure for managing hybrid geoscientific AI workflows. ISPRS Int J Geo Inf 9:119
Tan X, Di L, Deng M, Huang F, Ye X, Sha Z, Sun Z, Gong W, Shao Y, Huang C (2016) Agent-as-a-service-based geospatial service aggregation in the cloud: a case study of flood response. Environ Model Softw 84:210–225

Cluster Analysis and Classification
Antonella Buccianti and Caterina Gozzi
Department of Earth Sciences, University of Florence, Florence, Italy

Definition
Cluster analysis and classification – a family of data analysis methods that attempt to find natural groups clearly separated from each other, thus giving indications about potential classification rules to be exploited in discriminant analysis procedures.
Historical Material
Our numerical world produces masses of data every day, in the earth sciences as well as in several other fields of investigation. The analysis of this wealth of available data offers the possibility of exploring unknown paths leading to important advances in knowledge. However, the management of high-dimensional data, with the aim of extracting as much information as possible, needs adequate methods and procedures. Cluster analysis and classification is one of the most interesting fields of statistical analysis, both as an explorative tool for the grouping structure of data and as a set of model-based techniques for classification. The aim of cluster analysis is to find meaningful groups in data that are internally coherent and well separated from others. The members of a group share some common features in the multivariate space, and this property is brought out by using appropriate similarity/dissimilarity measures. The grouping of cases into cohesive groups dates back to Aristotle in his History of Animals in 342 B.C., while in the natural and taxonomical sciences a typical example is that of Linnaeus (1735) for plants and animals. The systematic use of numerical methods in searching for groups in quantitative data goes back to anthropological studies (Czekanowski 1909), where measures of similarity between objects began to be the focus. In the 1950s, several advances were proposed by considering the way of linking cases in a hierarchical procedure of aggregation whose final graphical result is the dendrogram (a tree diagram that shows the hierarchical relationships between objects). In the 1960s, a turning point was the publication of the book by Sokal and Sneath (1963), and some recent interesting developments concern model-based clustering, where each group is modeled by its own probability distribution, thus obtaining a finite mixture model (Bouveyron et al. 2019). The application of cluster analysis to compositional data has also been discussed,
considering the constrained working sample space (Aitchison 1986) and the implications of managing distances inside it (Templ et al. 2008). Compositional data, in fact, describe parts of the same whole and are commonly presented as vectors of proportions, percentages, or concentrations. Since they are expressed as real numbers, people are tempted to interpret or analyze them as real multivariate data, leading to paradoxes and misinterpretations. When applied to geological and environmental data, cluster analysis can be used to group the variables, with the aim of highlighting the correlation structure of the data matrix, or to cluster cases into homogeneous subsets for subsequent analysis (e.g., discriminant analysis or mapping): these two different approaches are known in the literature as R-mode and Q-mode, respectively.
Essential Concepts and Applications
In cluster analysis, there are basically two different paths, depending on whether each case is allocated to just one cluster (hard clustering) or is distributed among several clusters with a certain degree of association (fuzzy clustering, Dumitrescu et al. 2000). The input of most hierarchical clustering algorithms is an n × n distance matrix (distances between observations) generated from an original input matrix of size n × p (n = number of cases, p = number of variables). Many different distance measures can be considered as measures of similarity between cases in the multivariate space. A detailed discussion can be found in Shirkhorshidi et al. (2015).
The Minkowski family includes the Euclidean and Manhattan distances, which are particular cases of the Minkowski distance. The latter is defined by

d_{min} = \left( \sum_{i=1}^{n} |x_i - y_i|^m \right)^{1/m},    (1)

with m ≥ 1 and x_i and y_i the components of two vectors in n-dimensional space. The distance performs well when the data clusters are isolated or compact; otherwise, the large-scale attributes dominate the others. The Manhattan distance is the special case of the Minkowski distance with m = 1 and, like the general case, is sensitive to outliers. The Euclidean distance is the special case with m = 2 and similarly performs well under the same conditions. It is very popular in cluster analysis, but it has some drawbacks. In fact, even if two data vectors have no attribute values in common, they may have a smaller distance than another pair of data vectors containing the same attribute values. Another problem with the Euclidean distance, as a member of the Minkowski family, is that the largest-scaled feature can dominate the others. Data normalization is a common solution to overcome this problem.
The average distance is a modified version of the Euclidean distance developed to address the previous problems. For two data points x and y in an n-dimensional space, the average distance is defined as

d_{ave} = \left( \frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2}.    (2)

The weighted Euclidean distance is instead defined as

d_{we} = \left( \sum_{i=1}^{n} w_i (x_i - y_i)^2 \right)^{1/2},    (3)

where w_i is the weight given to the ith component, to be used if information on the relative importance of each attribute is available.
The chord distance was proposed to overcome the problems of the scale of measurements and the critical issues of the Euclidean distance. It is defined as

d_{chord} = \left( 2 - 2 \frac{\sum_{i=1}^{n} x_i y_i}{\|x\|_2 \|y\|_2} \right)^{1/2},    (4)

where \|x\|_2 is the L2 norm, \|x\|_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}.
The Mahalanobis distance, in contrast with the previous ones, is a data-driven measure; it is used to extract hyperellipsoidal clusters and to tone down the distortion caused by linear correlation among features. It is defined by

d_{mah} = \sqrt{(x - y) S^{-1} (x - y)^T},    (5)

where S is the covariance matrix of the data set.
The cosine similarity measure is given by

\cos_{xy} = \frac{\sum_{i=1}^{n} x_i y_i}{\|x\|_2 \|y\|_2},    (6)

where \|y\|_2 is the Euclidean norm of the vector y = (y_1, y_2, ..., y_n), defined as \|y\|_2 = \sqrt{y_1^2 + y_2^2 + ... + y_n^2}. This measure is invariant to rotation but not to linear transformations. It is also independent of the vector length.
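A minimal sketch of these distance measures, using scipy for the standard ones and direct formulas for the average and chord distances; the two example vectors and the weights are arbitrary illustrations.

```python
import numpy as np
from scipy.spatial.distance import minkowski, euclidean, cityblock, mahalanobis, cosine

x = np.array([1.0, 4.0, 2.0, 7.0])
y = np.array([2.0, 3.0, 5.0, 6.0])

d_minkowski = minkowski(x, y, p=3)                          # Eq. (1) with m = 3
d_manhattan = cityblock(x, y)                               # m = 1
d_euclidean = euclidean(x, y)                               # m = 2
d_average = np.sqrt(np.mean((x - y) ** 2))                  # Eq. (2)
w = np.array([0.1, 0.2, 0.3, 0.4])                          # illustrative weights
d_weighted = np.sqrt(np.sum(w * (x - y) ** 2))              # Eq. (3)
d_chord = np.sqrt(2 - 2 * np.dot(x, y) /
                  (np.linalg.norm(x) * np.linalg.norm(y)))  # Eq. (4)

# The Mahalanobis distance (Eq. 5) needs the inverse covariance of a data set, here a random one
data = np.random.default_rng(1).normal(size=(50, 4))
VI = np.linalg.inv(np.cov(data, rowvar=False))
d_mahalanobis = mahalanobis(x, y, VI)

cos_similarity = 1 - cosine(x, y)                           # Eq. (6): scipy's cosine() is 1 - similarity

print(d_minkowski, d_manhattan, d_euclidean, d_average,
      d_weighted, d_chord, d_mahalanobis, cos_similarity)
```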
When the target of the analysis is the clustering of variables, a p × p similarity matrix is generated from the n × p original matrix by choosing a correlation measure. The Pearson correlation coefficient is widely used, even if it is very sensitive to outliers; it is defined by

r_{xy} = \frac{\sum_{i=1}^{n} (x_i - m_x)(y_i - m_y)}{\sqrt{\sum_{i=1}^{n} (x_i - m_x)^2 \sum_{i=1}^{n} (y_i - m_y)^2}},    (7)
where m_x and m_y are the means of x and y, respectively.
Agglomerative techniques start by considering that each sample forms its own cluster and then enlarge the clusters stepwise with new contributions, linking the most similar groups at each step. The process ends when there is one single cluster containing all the cases. On the other hand, divisive clustering starts with one cluster containing all samples and then successively splits it into groups in subsequent steps. This reverse procedure generally requires intensive computation. To link clusters in the agglomerative procedure, several different methods exist. The best known are average linkage, complete linkage, and single linkage (Kaufman and Rousseeuw 2005). The average linkage method considers the averages of all pairs of distances between the cases of two clusters. The two clusters with the minimum average distance are combined into one new cluster. Complete linkage (or the farthest-neighbor method) looks for the maximum distance between the samples of two clusters. The clusters with the smallest maximum distance are combined. Single linkage (or the nearest-neighbor method) considers the minimum distance between all samples of two clusters. The clusters with the smallest minimum distance are linked. The dendrograms obtained with these linking methods show some differences: single linkage tends to give cluster chains; complete linkage gives very homogeneous clusters in the early stages of agglomeration and small resulting clusters; average linkage is a compromise between the two. The Ward method (Kaufman and Rousseeuw 2005) is an alternative to the single-link procedure that is able to create compact clusters, even if it is computationally intensive. Instead of measuring the distance directly, it monitors the variance of the clusters during the merging process, keeping the growth as small as possible by using the within-group sum of squares as a measure of homogeneity. In the dendrogram, the horizontal lines indicate the links between cases in clusters, whereas the vertical axis is related to the similarity, that is, to the chosen measure of distance or correlation. The axes can be inverted without problems. The linking of two groups at a large height indicates strong dissimilarity, and vice versa.
Cluster Analysis and Classification, Fig. 1 Dendrogram for the chemical composition of water data sampled in a sedimentary aquifer (Minkowski distance and complete linking method)
If the dendrogram is cut at a given height, the assignment of the cases to distinct clusters is obtained and, if desired, their spatial distribution can be visualized. To demonstrate the procedure, a set of samples defining the chemical composition of an aquifer mainly developed in a sedimentary context has been chosen. The n × p basic matrix was given by n = 30 cases and p = 7 variables, representing the main chemical components Na+, K+, Ca2+, Mg2+, HCO3−, Cl−, and SO42−, all expressed in mg/L. The dendrograms based on the cited anions and cations are reported in Figs. 1, 2, and 3 for the n × n similarity matrix, considering the same similarity metric (Minkowski) but different linkage methods (complete, single, and average). The numbers on the vertical and horizontal axes are related to the labels of the cases and to the measure of closeness of either individual data points or clusters. The figures highlight the differences produced by the different linkage methods, due to the hierarchical way used to construct the clusters and to associate cases with them, step by step, as described above. However, some groups of data appear to be well characterized and homogeneous, while others (e.g., 22 and 23; 9, 10, 29, and 30) need more inspection to verify the origin of their isolated behavior, as reported in Table 1. At this stage of the investigation, mapping the data will help to associate the presence of the clusters with possible environmental drivers such as lithology, depth of the well, and climate, giving interesting information about processes and forcing variables. In the context of geoscience, cluster analysis may be very useful for unraveling the complexity of nature and its laws, opening paths toward classification rules for unknown cases and thus representing the basic explorative tool of discriminant analysis.
When the aim of the investigation is the analysis of the relationships among variables, the p × p Pearson correlation matrix can represent the basis for cluster analysis. Results are reported in Fig. 4, using the average linkage method. No appreciable differences were obtained by changing the similarity measure (e.g., the sample Spearman rank correlation between variables) or the linking method. The cluster analysis highlights the relationships characterizing the water chemistry, revealing a strong link between Na+ and HCO3− and a more isolated behavior of K+, perhaps related to its complex biogeochemical cycle and a possible anthropogenic contribution due to the extensive use of fertilizers in agriculture. The identification of groups of variables helps in evaluating which types of water-rock interaction processes can be considered in the investigated sedimentary context.
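The dendrograms of Figs. 1, 2, and 3 can be reproduced in outline with scipy's hierarchical clustering tools. The snippet below uses a small synthetic stand-in for the 30 × 7 hydrochemical matrix (the real values are listed in Table 1) and compares the complete, single, and average linkage methods.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
# Synthetic stand-in for the n = 30 by p = 7 matrix of major-ion concentrations (mg/L)
X = rng.lognormal(mean=3.0, sigma=0.8, size=(30, 7))

for method in ("complete", "single", "average"):
    Z = linkage(X, method=method, metric="euclidean")     # Minkowski distance with m = 2
    labels = fcluster(Z, t=4, criterion="maxclust")       # cut the tree into four clusters
    print(method, labels)

# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree itself (requires matplotlib)
```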
Cluster Analysis and Classification, Fig. 2 Dendrogram for the chemical composition of water data sampled in a sedimentary aquifer (Minkowski distance and single linking method)
Cluster Analysis and Classification, Fig. 3 Dendrogram for the chemical composition of water data sampled in a sedimentary aquifer (Minkowski distance and average linking method)
For example, the relationship between SO42− and Ca2+ invites one to check for the presence of sulfate rocks, while that between HCO3− and Na+ suggests exploring the role of cation exchange in the presence of clays.
In contrast to hierarchical clustering methods, partitioning methods require the number of clusters to be known a priori (Kaufman and Rousseeuw 2005). A popular partitioning algorithm is the k-means algorithm, which works by minimizing the average of the squared distances between cases and their cluster centroids. Starting from k initial cluster centroids, the algorithm assigns the observations to their closest centroids using a given distance measure. Then the cluster centroids are recomputed, and the reallocation of the data points to the closest centroids is repeated iteratively. In this framework, the Manhattan distance is used for k-medians, where the centroids are defined by the medians of each cluster rather than their means. Several clustering methods have been proposed by Kaufman and Rousseeuw (2005) and are available in several software packages (SPSS, Matlab, R), such as the PAM (Partitioning Around Medoids) method, which minimizes the average distances to the cluster medoids, like k-medians, and CLARA (Clustering Large Applications), which is based on random sampling to reduce computing time and RAM storage problems.
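A partitioning run along these lines can be sketched with scikit-learn's KMeans; the number of clusters (here 4) must be supplied in advance, as noted above, and the data matrix is the same kind of synthetic stand-in used earlier.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.lognormal(mean=3.0, sigma=0.8, size=(30, 7))       # stand-in for the hydrochemical matrix

# Standardize columns so that high-concentration ions do not dominate the squared distances
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_std)
print(kmeans.labels_)           # cluster membership of each of the 30 samples
print(kmeans.cluster_centers_)  # centroids in the standardized variable space
```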
Two different concepts of validity criteria need to be considered to evaluate the performance of a clustering procedure. External criteria compare the obtained partition with a partition that is known a priori, and several indices can be used for this purpose (Haldiki et al. 2002). Internal criteria, on the other hand, are cluster validity measures that evaluate the clustering results of an algorithm by using only quantities and features inherent in the data set, based on within- and between-cluster sums of squares, analogous to the multivariate analysis of variance (MANOVA); an example is the average silhouette width of Kaufman and Rousseeuw (2005).
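The average silhouette width mentioned above, an internal validity measure, is available directly in scikit-learn and can be used to compare partitions with different numbers of clusters; the data are again a synthetic stand-in for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
X = rng.lognormal(mean=3.0, sigma=0.8, size=(30, 7))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_std)
    # a higher average silhouette width indicates better-separated, more cohesive clusters
    print(k, round(silhouette_score(X_std, labels), 3))
```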
Data Problems in the Context of Cluster Analysis
In multi-element analysis of geological materials, elements occur in very different concentrations, so that, when methods exploring the variance-covariance structure are used, the variable with the greatest variance will have the greatest effect on the outcome.
Cluster Analysis and Classification, Table 1 Chemical composition in mg/L of the analyzed samples. The colored rows are related to cases that in cluster analysis have given isolated groups (22 and 23, 9, 10, 29 and 30, Figs. 1, 2, and 3)
Case  Ca2+   Mg2+   Na+    K+     HCO3-  Cl-    SO42-
1     122    10.6   29.5   1.78   367    19.1   44.3
2     124    10.0   25.0   1.10   388    17.0   50.0
3     97.7   32.2   81.0   1.46   434    78.0   38.3
4     101    31.0   76.5   0.91   466    79.0   45.0
5     177    14.9   34.4   4.31   390    40.1   115
6     40.3   12.4   161    0.79   545    22.3   9.10
7     91.4   28.7   105    0.83   542    40.8   53.3
8     55.0   14.5   141    0.63   511    52.5   0.70
9     24.6   8.70   255    0.57   651    83.3   0.70
10    24.6   7.80   256    0.65   704    79.0   0.48
11    123    18.9   36.8   1.49   472    30.5   30.6
12    67.7   17.2   114    0.55   420    53.9   54.6
13    19.9   5.80   183    0.98   447    64.5   0.02
14    152    11.2   23.8   1.92   445    20.9   42.1
15    142    11.0   27.2   11.3   375    31.6   42.3
16    136    9.50   25.5   13.0   396    28.7   48.0
17    93.2   25.5   68.7   1.67   396    69.1   39.9
18    96.5   25.0   64.5   0.85   438    68.0   38.0
19    150    13.3   26.2   1.51   440    23.8   39.1
20    156    13.0   18.3   1.51   417    22.0   51.4
21    142    14.4   38.8   3.33   459    23.4   63.4
22    45.2   10.5   39.4   1.49   214    12.8   35.4
23    38.0   8.60   156    1.70   230    184    37.0
24    161    9.00   24.7   1.39   356    37.2   59.0
25    165    10.0   25.0   1.20   378    31.0   62.0
26    98.5   48.9   97.6   0.72   512    104    61.5
27    127    8.35   16.7   0.85   389    16.3   31.9
28    76.9   30.9   149    0.89   547    71.9   81.3
29    102    52.0   211    0.99   586    195    102
30    73.0   46.0   253    0.83   621    149    129
Consequently, variables expressed in different units should not be mixed or, as an alternative, adequate transformation and standardization techniques are needed (i.e., centering and scaling the data, possibly using a robust version of the mean and standard deviation, such as the median and MAD for skewed variables). The presence of outliers can have severe effects on cluster analysis, affecting the distance measures and distorting the underlying data structure. Thus, it is advisable to remove the outliers prior to the application of the clustering procedure or to adopt methods able to handle them (Kaufman and Rousseeuw 2005). From this point of view, the use of explorative univariate and bivariate tools (e.g., histograms, box plots, binary diagrams) to probe the data structure is fundamental before proceeding with any multivariate methodology. The method of substitution of censored data can have a big effect on the results of the clustering procedure, particularly when a high percentage of cases is substituted with an identical value (e.g., the detection limit or a percentage of it). The elimination of variables characterized by a large amount of data below the detection limit often represents the only solution. However, in order not to lose the information, these variables can be transformed into dichotomous variables (values below and above the detection limit transformed to 0 and 1, respectively), so that a contingency table can be constructed by cross-tabulating the clusters obtained by the analysis without these variables against the dichotomous variables. Compositional (closed) data can also be a problem in cluster analysis, since they pertain to the simplex sample space and not to real space (Aitchison 1986), thus compromising the correct use of every similarity/distance measure. One of the transformations of the log-ratio family, in particular the centered or the isometric log-ratio, should be applied, or the use of the Aitchison distance promoted (Templ et al. 2008).
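For compositional variables, the centered log-ratio (clr) transformation and the Aitchison distance mentioned above can be written in a few lines. The two example compositions are arbitrary; they are expressed as percentages only for readability, since the result is the same for any equivalent (rescaled) representation.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    logx = np.log(x)
    return logx - logx.mean()

def aitchison_distance(x, y):
    """Aitchison distance = Euclidean distance between clr-transformed compositions."""
    return np.linalg.norm(clr(x) - clr(y))

a = np.array([60.0, 30.0, 10.0])        # e.g., percentages of three parts
b = np.array([55.0, 35.0, 10.0])
print(aitchison_distance(a, b))
print(aitchison_distance(10 * a, b))    # unchanged: the distance ignores the closure constant
```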
Cluster Analysis and Classification, Fig. 4 Dendrogram for the chemical composition of water data sampled in a sedimentary aquifer. The linking method was the average linkage and the similarity measure the Pearson correlation coefficient

Conclusions
Cluster analysis is widely used in the geological sciences as an exploratory methodology to verify the presence of natural groups in data. From this point of view, cluster analysis is not a statistical inference technique, where parameters from a sample are assessed as possibly being representative of a population. Instead, it is an objective methodology for quantifying the structural characteristics of a set of observations. However, two critical issues must be considered: the representativeness of the sample and multicollinearity. In the first case, the researcher should be confident that the obtained sample is truly representative of the population, and the presence of multimodality or outlier cases should be carefully monitored in single variables (univariate analysis) and in the multivariate distribution. In the second case, attention must be paid to the extent to which a variable can be explained by the other variables in the analysis. In this context, the use of clustering techniques with compositional data is very critical due to the constrained interrelationships among all the variables. As an alternative, model-based (Mclust) clustering can be used. It is not based on distances between the cases but on models describing the shape of possible clusters. The algorithm selects the cluster models (e.g., spherical or elliptical cluster shape) and determines the cluster memberships of all cases for solutions over a range of different numbers of clusters. The estimation is done using the expectation-maximization (EM) algorithm (Bouveyron et al. 2019).

Cross-References
▶ Compositional Data
▶ Correlation and Scaling
▶ Correlation Coefficient
▶ Dendrogram
▶ Discriminant Analysis
▶ Exploratory Data Analysis
▶ Fuzzy C-means Clustering

Bibliography
Aitchison J (1986) The statistical analysis of compositional data. Chapman and Hall, London. 416 pp (reprinted in 2003 by Blackburn Press)
Bouveyron C, Celeux G, Brendan Murphy T, Raftery AE (2019) Model-based clustering and classification for data science with applications in R. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press. 427 pp
Czekanowski J (1909) Zur Differentialdiagnose der Neandertalgruppe. Korrespondenz-Blatt der Deutschen Gesellschaft für Anthropologie, Ethnologie und Urgeschichte 40:44–47
Dumitrescu D, Larrerini B, Jain LC (2000) Fuzzy sets and their application to clustering and training. CRC Press, Boca Raton. 622 pp
Haldiki M, Batistakis Y, Vazirgiannis M (2002) Cluster validity methods. SIGMOD Record 31:409–445
Kaufman L, Rousseeuw PJ (2005) Finding groups in data. Wiley, New York. 368 pp
Linnaeus C (1735) Systema Naturae, 1st edn. Theodorum Haak, Leiden, p 2
Shirkhorshidi AS, Aghabozorgi A, Wah TY (2015) A comparison study on similarity and dissimilarity measures in clustering continuous data. PLoS One 10(12). https://doi.org/10.1371/journal.pone.0144059
Sokal RR, Sneath PHA (1963) Principles of numerical taxonomy. W. H. Freeman, San Francisco, p 2
Templ M, Filzmoser P, Reimann C (2008) Cluster analysis applied to regional geochemical data: problems and possibilities. Appl Geochem 23:2198–2213
Compositional Data Vera Pawlowsky-Glahn1 and Juan José Egozcue2 1 Department of Computer Science, Applied Mathematics and Statistics, U. de Girona, Girona, Spain 2 Department of Civil and Environmental Engineering, U. Politècnica de Catalunya, Barcelona, Spain
Synonyms Closed data; CoDa; Compositional data
Definitions
Definition 1 Compositional data (CoDa) are vectors with strictly positive components that carry relative information. Consequently, the quantities of interest are ratios between components. CoDa represent parts of some whole, like, e.g., the geochemical composition of a rock sample, which states the proportion of each of the different elements present in the sample. The proportion can be expressed in percentages, as is usually done for major element oxides, or in ppm or ppb, as is ordinarily the case for trace elements. The change from percentages to ppm is attained by multiplying the observed values by a constant, and it is assumed that they still carry the same compositional information. The vectors are then proportional, as they are different representations of the same composition. They are equivalent compositions and constitute an equivalence class. A typical representation is as vectors of strictly positive components with constant sum, which is very convenient from the mathematical point of view. For a graphical representation, see Fig. 1. There are many examples in the geosciences, like the mineral composition of a rock, or the geochemical composition of a sample. The representation of such data in ternary diagrams has a long tradition in this field.
Definition 2 Composition. A composition is a vector of D components taking values in the positive orthant of real space, R^D_+, that carry relative information. It is customary to call
them parts, a term suggested by Aitchison (1982) to avoid confusion with real variables. In order to represent a composition x = [x_1, x_2, ..., x_D] in the simplex, all parts are divided by their sum and multiplied by k,

$$\mathcal{C}x=\left[\frac{k\,x_1}{\sum_{i=1}^{D}x_i},\ \frac{k\,x_2}{\sum_{i=1}^{D}x_i},\ \ldots,\ \frac{k\,x_D}{\sum_{i=1}^{D}x_i}\right],\qquad k>0,\qquad(1)$$
where the components (parts) of 𝒞x add up to k. 𝒞x is called the closure of x to the constant k; it is a composition equivalent to x. Usually k = 1 or k = 100. A random composition, X = [X_1, X_2, ..., X_D], is a composition whose parts are random variables. In order to determine the sample space of the parts it is convenient to consider Y = 𝒞X. The parts of Y, Y_i, are positive and add to the closure constant k > 0. Then, the values of Y_i are in the D-part simplex, satisfying Σ_{i=1}^{D} Y_i = k. The notation in capitals, like X, is used for random compositions and the lower case, as x, for realizations of the random composition or for fixed values. The notation with capitals is also used for denoting generic parts; for instance, in a geochemical analysis, X_1 ≡ Ca, X_2 ≡ Na refers to the parts Ca and Na, respectively, in a generic sense. Although the D-part simplex is only a subset of ℝ^D, quite often it is assumed that the operations and metric of ℝ^D, the ordinary Euclidean geometry, are valid for the analysis of random compositions in the simplex. This assumption underlies almost all statistical methods and leads to spurious results when applied to CoDa.

Definition 3 Subcomposition. A subcomposition of a D-part composition X_D is a subvector of S parts, S < D, X_S = [X_1, X_2, ..., X_S], whose support is the positive orthant of real space, ℝ^S_+, carrying relative information. It is assumed that the same rules stated above for compositions hold for subcompositions. Thus, they represent parts of a (partial) whole and proportional vectors represent the same subcomposition. They are equivalence classes and any suitable representant of the class can be chosen. It is common to use a closed representation, the closure constant being usually 1, 100, or 10^6. One frequent example in the geosciences is the representation of a three-part subcomposition of a geochemical analysis in a ternary diagram.
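As a small illustration of Eq. (1), the closure operation takes a few lines of R; the example concentration values are hypothetical.

```r
# Closure of a vector of strictly positive parts to the constant k (Eq. 1)
closure <- function(x, k = 1) k * x / sum(x)

x <- c(Ca = 43.6, Mg = 44.0, Na = 57.9)   # hypothetical concentrations
closure(x, k = 100)    # the same composition expressed in percent
closure(x, k = 1e6)    # the same composition expressed in ppm
```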
Compositional Data, Fig. 1 Compositional equivalence classes in 3D. The ray from the origin through B is the class which contains the equivalent points Q and F
Introduction
Historically, CoDa have been defined as vectors of positive components constrained by a constant sum or closure constant, usually 1, 100, or 10^6. Nowadays, understanding CoDa as equivalence classes, represented by one element of that class, has broadened the concept. Available methods, designed to be scale invariant, allow us to analyze the data in such a way that the closure constant, if any, is not important. Problems with the analysis of CoDa were first detected by Karl Pearson (1897), who coined the term spurious correlation. Awareness of these problems was kept alive by many authors in the geosciences. It was John Aitchison (1982) who put forward the log-ratio approach, opening the way to deal consistently with CoDa. He set the basic framework that later allowed the definition of the algebraic-geometric structure of the sample space of CoDa. This was published simultaneously by Billheimer et al. (2001) and by Pawlowsky-Glahn and Egozcue (2001). It was the key to understanding that there is no solution to these problems under the assumption that compositional data are vectors in real space that obey the usual Euclidean geometry. It can be shown mathematically that for raw data represented in the simplex (Eq. 2), i.e., closed or normalized to a constant, the covariance matrix is necessarily singular, has some non-zero and some negative entries, and is subcompositionally incoherent. To find subcompositions for which the sign of one or more covariances changes, it suffices to take the closed representant of the subcomposition consisting of those parts which have positive covariances. This shows that covariances of compositions are subject to nonstochastic controls, and this fact leads to the conclusion that they are spurious, and thus nonsensical. Problems with raw CoDa were described extensively by Felix Chayes (1971). They appear because the data are assumed to be points in D-dimensional real space endowed with the usual Euclidean geometry. This assumption does not guarantee subcompositional coherence, and the following issues appear: (1) the mathematical necessity of at least one nonzero covariance; (2) the bias towards negative covariances; (3) the singularity of the covariance matrix; and (4) the nonstochastic control, and thus description and interpretation, of the dependence between the variables under study.
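A short numerical illustration, not taken from the entry, of the singularity mentioned above: because every row of a closed dataset sums to the same constant, each row of its covariance matrix sums to zero, which forces at least one negative entry.

```r
# Demonstration that the covariance matrix of closed data is singular:
# every row of the covariance matrix sums to (numerically) zero.
set.seed(1)
raw    <- matrix(rlnorm(300), ncol = 3)   # arbitrary positive data
closed <- raw / rowSums(raw)              # closure to k = 1
S      <- cov(closed)
rowSums(S)                                # ~0 for every row
```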
Principles
John Aitchison (1982) introduced three principles, here called Aitchison principles, which should govern any compositional analysis.
Aitchison Principle 1. Scale Invariance
Two proportional vectors give the same compositional information. They represent the same composition:

X_D = [X_1, X_2, ..., X_D]  and  k X_D = [k X_1, k X_2, ..., k X_D].

Aitchison Principle 2. Subcompositional Coherence
A result for X_i, X_j ∈ X_S will not lead to a contradiction with a result obtained for X_i, X_j ∈ X_D.

Aitchison Principle 3. Permutation Invariance
Results should be invariant if the order of the parts in the composition is changed.

The principles above were stated by Aitchison (1982) before the vector space structure of the sample space of CoDa was recognized. They were led by the idea that methods for CoDa should be consistent whichever path is followed. Considering CoDa as equivalence classes corresponds to Aitchison Principle 1, while the Aitchison geometry of the simplex makes the requirements of Aitchison Principles 2 and 3 unnecessary, as long as one works in orthonormal (Cartesian) coordinates.
The Sample Space of Compositional Data
The set of compositions represented by vectors subject to a constant sum is the D-part simplex,

$$\mathcal{S}^D=\left\{[x_1,x_2,\ldots,x_D]\in\mathbb{R}^D_+\ \middle|\ x_i>0,\ i=1,2,\ldots,D;\ \sum_{i=1}^{D}x_i=k\right\},\qquad(2)$$

with k an arbitrary constant, usually 1, 100, or 10^6. The simplex is usually taken as a representative of the sample space of CoDa. The closure operation (1) allows one to select the representative of an equivalence class in S^D. The principle of scale invariance, mentioned above, is reflected in the selection of the simplex (Eq. 2); it excludes zero or negative values. This leads to the zero problem in CoDa, which is extensively addressed in Palarea-Albaladejo and Martín-Fernández (2015). An important concept in CoDa analysis is subcompositional coherence. As mentioned above, a subcomposition is a vector that includes only some parts of the original composition. A subcomposition is in general normalized to a constant sum, like when three parts of a larger composition are plotted in a ternary diagram. Subcompositional coherence is then the requirement that
statements about parts in the original composition hold when a subcomposition is considered.

The Aitchison Geometry for CoDa
For x, y ∈ S^D and α ∈ ℝ, the following operations can be defined:

• Perturbation: x ⊕ y = 𝒞[x_1 y_1, x_2 y_2, ..., x_D y_D];
• Powering: α ⊙ x = 𝒞[x_1^α, x_2^α, ..., x_D^α];
• Inner product: $\langle x,y\rangle_a=\dfrac{1}{2D}\sum_{i=1}^{D}\sum_{j=1}^{D}\ln\dfrac{x_i}{x_j}\,\ln\dfrac{y_i}{y_j}$.

These operations generate a (D – 1)-dimensional Euclidean vector space structure known as the Aitchison geometry (Pawlowsky-Glahn and Egozcue 2001). The inner product has an associated norm,

$$\|x\|_a=\left(\frac{1}{2D}\sum_{i=1}^{D}\sum_{j=1}^{D}\left(\ln\frac{x_i}{x_j}\right)^{2}\right)^{1/2},$$

and an associated distance:

$$d_a(x,y)=\left(\frac{1}{2D}\sum_{i=1}^{D}\sum_{j=1}^{D}\left(\ln\frac{x_i}{x_j}-\ln\frac{y_i}{y_j}\right)^{2}\right)^{1/2}.$$

Perturbation, powering, and distance were already defined by Aitchison (1982). Perturbation is particularly interesting, as it can be interpreted in a very intuitive way as linear filtering. Also, any change in composition expressed as increasing or decreasing percentages of the parts is a perturbation; for instance, a decrease of 3% implies multiplication of that part by 0.97.

Linear Functionals on Compositions
In most CoDa analyses some real functions of compositions, called functionals, may be of special interest depending on the research questions. However, these functionals should be scale invariant to be consistent with the Aitchison principles. A functional f is scale invariant if, for any composition x and any scalar k, f(kx) = f(x). If the functional is not scale invariant, equivalent compositions closed to a different constant k give different results. For instance, the log ratios log(Ca/Mg) and log(Cl/(Ca + Mg)) in a water analysis do not depend on the units in which they are expressed, be it mg/L or μg/L. In many cases, a linear behavior of functionals can be a convenient condition, that is, f(α ⊙ (x ⊕ y)) = α(f(x) + f(y)). This linearity property assures that a change of units acts on the functional in the same way for mean values, and that variability only changes with the square of the scaling factor α. Changing from mg/L to atoms/L in Ca and Mg consists of dividing both concentrations in mg/L by the respective atomic weights, which are, approximately, 40. and 24., respectively. This division of both elements, and of any other in the composition, is a perturbation. If x = (..., Ca, Mg, ...) and y = 𝒞(..., 1/40., 1/24., ...), then

$$\log\frac{\text{Ca [atom/L]}}{\text{Mg [atom/L]}}=\log\frac{\text{Ca [mg/L]}}{\text{Mg [mg/L]}}+\log\frac{y_{\text{Ca}}}{y_{\text{Mg}}},\qquad y_{\text{Ca}}=\frac{1}{40.},\ \ y_{\text{Mg}}=\frac{1}{24.}.$$

Taking the values of log(Ca/Mg) in mg/L across a sample, the average of the functional values changes by adding the log ratio log(y_Ca/y_Mg), and the variance remains unaltered. This recalls that a perturbation, in this case a change of units, is a shift in the simplex. It is straightforward to check that the amalgamated log ratio log(Cl/(Ca + Mg)) is nonlinear. Contrarily, consider the functional

$$f(x)=\sum_{i=1}^{D}a_i\log x_i,\qquad \sum_{i=1}^{D}a_i=0,$$

called a log contrast. All linear scale-invariant log ratios have this form. Functionals like log(x_1/x_2), log(x_1/g_m(x)), and log(g_m(x_1, x_2)/g_m(x_3, ..., x_D)), where g_m(·) is the geometric mean of the arguments, are log contrasts. Among log contrasts, the class of balances are of the form

$$B(x_+/x_-)=\sqrt{\frac{n_+\,n_-}{n_++n_-}}\,\log\frac{g_m(x_+)}{g_m(x_-)},\qquad(3)$$

where x_+ and x_- denote two groups of parts included in x and n_+, n_- are the numbers of parts in the two groups, respectively. Sometimes the square root in (3) is suppressed; the remaining log ratio of geometric means is a non-normalized balance.
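The operations and the balance above translate directly into code. The following R sketch is written from the formulas in this section; the grouping indices passed to balance() are the caller's choice and are not prescribed by the entry.

```r
# Aitchison geometry operations and a balance (Eq. 3) for a composition with
# strictly positive parts, coded directly from the formulas above.
closure <- function(x, k = 1) k * x / sum(x)
perturb <- function(x, y) closure(x * y)          # x (+) y
power   <- function(alpha, x) closure(x^alpha)    # alpha (.) x

aitchison_inner <- function(x, y) {
  lx <- log(x); ly <- log(y); D <- length(x)
  sum(outer(lx, lx, "-") * outer(ly, ly, "-")) / (2 * D)
}
aitchison_norm <- function(x) sqrt(aitchison_inner(x, x))
aitchison_dist <- function(x, y) aitchison_norm(perturb(x, 1 / y))

# Balance between the parts indexed by ip (numerator group) and im (denominator group)
balance <- function(x, ip, im) {
  gm <- function(z) exp(mean(log(z)))
  np <- length(ip); nm <- length(im)
  sqrt(np * nm / (np + nm)) * log(gm(x[ip]) / gm(x[im]))
}
```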
The Principle of Working in Coordinates
It is well known that every (D – 1)-dimensional Euclidean space is isometric to the (D – 1)-dimensional real Euclidean vector space ℝ^(D−1). This implies that orthonormal coordinates in S^D are elements of ℝ^(D−1) that obey the usual Euclidean geometry; consequently, all available methods developed in ℝ^(D−1) under the implicit or explicit assumption that the coordinates are orthonormal hold on orthonormal coordinates in S^D. To apply this fact in practice, it suffices to realize that most statistical methods developed in ℝ^(D−1) assume Cartesian coordinates. However, some non-orthonormal coordinates have properties that have made them historically relevant.
Non-Orthogonal Log-Ratio Representations
The following two non-orthogonal representations were introduced by Aitchison (1982):

Centered Log-Ratio (clr) Representation
For x ∈ S^D and g_m(x) the geometric mean of the parts in x, it is defined as

$$\operatorname{clr}(x)=\left[\ln\frac{x_1}{g_m(x)},\ \ln\frac{x_2}{g_m(x)},\ \ldots,\ \ln\frac{x_D}{g_m(x)}\right],\qquad g_m(x)=\left(\prod_{i=1}^{D}x_i\right)^{1/D}.$$

This representation is very useful for computational issues, as it has important properties:

• For v_i = ln(x_i/g_m(x)) it holds that Σ_{i=1}^{D} v_i = 0.
• The clr is a bijection between S^D and the hyperplane ℝ^D_0 ⊂ ℝ^D, that is, the plane in ℝ^D such that, for any v ∈ ℝ^D_0, Σ_{i=1}^{D} v_i = 0.
• The clr inverse is x = clr^(−1)(v) = 𝒞 exp[v_1, v_2, ..., v_D].
• The clr is an isomorphism, i.e., clr(x ⊕ y) = clr(x) + clr(y) and clr(λ ⊙ x) = λ clr(x).
• The clr is isometric, i.e., d_a(x, y) = d_e(clr(x), clr(y)), where d_a stands for the Aitchison distance defined above, and d_e for the usual Euclidean distance.

Additive Log-Ratio (alr) Representation
For x ∈ S^D, it is defined as

$$\operatorname{alr}(x)=\left[\ln\frac{x_1}{x_D},\ \ln\frac{x_2}{x_D},\ \ldots,\ \ln\frac{x_{D-1}}{x_D}\right],$$

where the part in the denominator can be any part in the composition. This representation has mainly historical interest, as it corresponds to a representation in oblique coordinates, which are not always easy to handle in computing Aitchison distances and orthogonal projections. Both representations, clr and alr, are too easily associated with the compositional parts appearing in the numerators of the log ratios, thus suggesting a kind of identification. This association is a source of confusion and misinterpretation. In fact, the log ratios in both alr and clr involve more than a single part, thus favoring the false impression that the log-ratio CoDa approach can deal with single parts, when the relevant quantities are always scale-invariant ratios.

Orthogonal Log-Ratio Representations
They are based on Euclidean vector space theory. The orthogonal projection of a vector on an orthonormal basis of the space generates orthonormal coordinates which fully represent the vector. The same ideas apply to the simplex, where the vectors are now compositions.

Definition 4 Orthonormal basis. Compositions e_1, e_2, ..., e_(D−1) in S^D constitute an orthonormal basis if

$$\langle e_i,e_j\rangle_a=\langle \operatorname{clr}(e_i),\operatorname{clr}(e_j)\rangle=\delta_{ij},$$

where the Kronecker delta δ_ij is equal to 0 when i ≠ j and equal to 1 for i = j.

Definition 5 Basis contrast matrix. It is the (D – 1, D) matrix whose rows are the clr's of the basis:

$$C=\begin{bmatrix}\operatorname{clr}(e_1)\\ \operatorname{clr}(e_2)\\ \vdots\\ \operatorname{clr}(e_{D-1})\end{bmatrix},\qquad CC'=I_{D-1},\qquad C'C=I_D-\tfrac{1}{D}\,\mathbf{1}'_D\mathbf{1}_D.$$

Coordinates of a composition x are obtained computing the Aitchison inner product of x with each of the elements of an orthonormal basis, x*_i = ⟨x, e_i⟩_a, and the expression of the original composition in terms of orthonormal coordinates leads to x = ⊕_{i=1}^{D−1} x*_i ⊙ e_i. The function (olr) that assigns orthonormal coordinates to a composition is one to one. Originally, olr coordinates were known as isometric logratio coordinates (ilr) (Egozcue et al. 2003), but this concept was confusing, as, e.g., the clr coordinates are isometric but not orthonormal, while any orthonormal system of coordinates in S^D endowed with the Aitchison geometry is isometric (see the discussion in Egozcue and Pawlowsky-Glahn 2019a, b).

Properties of an Orthonormal Representation
The olr: S^D → ℝ^(D−1), defined through an orthonormal basis e_1, e_2, ..., e_(D−1) in S^D as olr(x) = x* = clr(x) C′, with inverse olr^(−1)(x*) = x = 𝒞(exp(x* C)), is an isometry, that is,

• olr(a ⊙ x_1 ⊕ b ⊙ x_2) = a olr(x_1) + b olr(x_2) = a x*_1 + b x*_2;
• ⟨x_1, x_2⟩_a = ⟨olr(x_1), olr(x_2)⟩_e = ⟨x*_1, x*_2⟩_e;
• ‖x‖_a = ‖olr(x)‖, d_a(x_1, x_2) = d_e(olr(x_1), olr(x_2)),

where the subscripts e indicate that the operator is the ordinary Euclidean one.
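A compact R sketch of the clr, alr, and olr representations defined above; the contrast matrix C3 below is just one valid choice for D = 3 and is not taken from the entry.

```r
# clr, alr, and olr representations; C is a contrast matrix whose rows are the
# clr's of an orthonormal basis (rows sum to zero, have unit norm, and are orthogonal).
clr     <- function(x) log(x) - mean(log(x))
clr_inv <- function(v) { y <- exp(v); y / sum(y) }
alr     <- function(x, denom = length(x)) log(x[-denom] / x[denom])

olr     <- function(x, C) as.vector(clr(x) %*% t(C))          # x* = clr(x) C'
olr_inv <- function(xstar, C) clr_inv(as.vector(xstar %*% C))

C3 <- rbind(c(1 / sqrt(2), -1 / sqrt(2),  0),
            c(1 / sqrt(6),  1 / sqrt(6), -2 / sqrt(6)))        # one basis for D = 3
x  <- c(0.2, 0.3, 0.5)
olr_inv(olr(x, C3), C3)                                        # recovers x, closed to 1
```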
There are different ways to build orthonormal coordinates within the Aitchison geometry for CoDa. One of them is based on principal component analysis of a dataset in its clr representation. The nonstandardized scores are olr coordinates and the loadings are a contrast matrix (Pawlowsky-Glahn et al. 2015). Consequently, the coordinates obtained in this way are data driven. Some further details are discussed in section “Exploratory CoDa Tools: Variation Matrix, Biplot, and CoDa-Dendrogram.”

Sequential Binary Partitions
A straightforward and intuitive way to define an orthonormal basis is through a sequential binary partition (SBP) of a generic composition. The construction is conveniently based on expert knowledge, but it can be done automatically or using specific algorithms, like the principal balances explained below. The procedure to define an SBP (Pawlowsky-Glahn et al. 2015) is best explained with an example. Consider a composition x ∈ S^5. To define an SBP and to obtain the coordinates in the corresponding orthonormal basis, proceed as illustrated in Table 1. The olr coordinates assigned to each step of the partition are balances as defined in Eq. (3). For instance, the coordinate in step 3 in Table 1 would be B(x1/(x3, x4)) following the notation in that equation.

Compositional Data, Table 1 Example of an SBP for a five-part composition. Four partition steps are necessary to complete the SBP; each one divides the active subcomposition into two groups of parts, marked with +1 or −1, respectively; nonactive parts are labeled with 0. The expression of the balance coordinate corresponding to each step is in the right column

Step  x1  x2  x3  x4  x5  Coordinate
1     +1  −1  +1  +1  −1  y1 = sqrt(3·2/(3+2)) ln[(x1 x3 x4)^(1/3) / (x2 x5)^(1/2)]
2      0  +1   0   0  −1  y2 = sqrt(1·1/(1+1)) ln(x2/x5)
3     +1   0  −1  −1   0  y3 = sqrt(1·2/(1+2)) ln[x1 / (x3 x4)^(1/2)]
4      0   0  +1  −1   0  y4 = sqrt(1·1/(1+1)) ln(x3/x4)

Zeros
Most compositional datasets report zeros in some parts and cases. These zero values are incompatible with their transformation into log ratios, since infinite values would appear in the transformed dataset. Therefore, a preprocessing of the raw data is frequently required. However, the treatment of zeros needs a careful examination of their nature, and possibly some additional assumptions and decisions which are not based on the data themselves. What follows is a short description of the procedures to treat zeros; the interested reader is referred to Palarea-Albaladejo and Martín-Fernández (2015) and references therein.

A first step is to decide the origin and nature of the zeros in the dataset. At least three classes can be distinguished: essential zeros, rounded zeros, and zeros from counts. Essential zeros are those which are due to the definition of the parts, implying that, in certain circumstances, one or more parts are completely absent. An example is land use, including for instance classes of forest and some types of agriculture. If a parcel is completely covered by forest, then there is no agriculture and each type of agriculture would be reported as 0%. These zeros are essential zeros. The treatment of essential zeros mainly consists in translating the presence-absence of zeros into a factor defining a subpopulation. Rounded zeros and zero counts can be processed in at least two ways: (A) the substitution of reported zeros by an imputed value, which is then processed as a regular compositional datum; or, alternatively, (B) the zeros are treated as observations in a model in which these zero values are acceptable. Then, the analysis of such a model, consisting in the estimation of parameters, requires knowing the likelihood of the parameters given the data, including the zeros. The B methods avoid the explicit imputation of the zero values by estimating the parameters of the model. Commonly, in A cases, a detection threshold is available or can be inferred to help the imputation. There are two classes of imputation: those where only information on the single part that contains the zero and on the corresponding detection limit is used; and those taking into account the relationship with other parts of the composition. If these imputation techniques are applied to counts, the imputation consists of a pseudo-count, frequently a value between 0 and 1. A simple case is that of data in percentages, where the detection limit is also given in percentages. When a zero is present, the multiplicative strategy of replacement consists of assigning a fraction of the detection limit to the reported zero, and then rescaling all other observed values so that the resulting composition adds to 100. A typical example of including the zeros in the estimation of parameters is the zero counts in multinomial sampling. The goal is frequently to estimate the multinomial probabilities, which are assumed positive. The multinomial likelihood
can include zeros in the observations in a natural way. A Bayesian estimation of the multinomial probabilities provides an elegant solution to the problem. This class of approach typically requires the assumption of an underlying model for the observations. The R-package zCompositions implements most of these imputation techniques (Palarea-Albaladejo and Martín-Fernández 2015, and references therein). However, the treatment of zeros in CoDa is not only a computational problem; it requires a detailed examination of the character of the zeros and even a complete model for the whole dataset.
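A minimal sketch of the multiplicative replacement strategy described above for data in percentages; the detection limits and the replacement fraction below are hypothetical choices, and the zCompositions package offers more complete implementations.

```r
# Multiplicative replacement of rounded zeros for a composition in percent:
# each zero gets a fraction of its detection limit, and the observed parts are
# rescaled so the result still adds to 100.
mult_repl <- function(x, dl, frac = 0.65, total = 100) {
  z     <- x == 0
  repl  <- frac * dl[z]
  x[!z] <- x[!z] * (total - sum(repl)) / sum(x[!z])
  x[z]  <- repl
  x
}

x  <- c(62.1, 25.4, 12.5, 0.0)     # one observation closed to 100 (hypothetical)
dl <- c(0.10, 0.10, 0.10, 0.05)    # detection limits in percent (hypothetical)
mult_repl(x, dl)
```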
Exploratory CoDa Tools: Variation Matrix, Biplot, and CoDa-Dendrogram
Following the principle of working in coordinates (section “The Principle of Working in Coordinates”), compositional datasets can be represented in coordinates, preferably orthogonal, which are real variables. As a consequence, all exploratory tools devised for real variables can be applied to compositional log-ratio coordinates. However, there are at least three exploratory techniques that, although inspired by the exploration of real variables, have characteristics that deserve particular attention. They are the variation matrix, the CoDa-biplot, and the CoDa-dendrogram. In order to illustrate the use of these tools, the Varanasi dataset described below is used.

Varanasi Dataset
The Varanasi dataset (VDS) (Olea et al. 2017) consists of concentrations of ions dissolved in ground water sampled at 95 wells in the surroundings of Varanasi (India), on the two margins of the Ganga river. The following concentrations are reported: Fe (μg/L), As (μg/L), Ca (mg/L), Mg (mg/L), Na (mg/L), K (mg/L), HCO3 (mg/L), SO4 (mg/L), Cl (mg/L), NO3 (mg/L), and F (mg/L). The data also report the location
of the sample wells. Wells numbered 1 to 44 are placed on the left margin, and the remaining wells, from 45 to 95, on the right margin of the Ganga river. The data matrix is denoted by X; its columns are denoted by X_j, j = 1, 2, ..., D (D = 11 in VDS), and the rows containing a composition are denoted x_i, where i = 1, 2, ..., n (n = 95 in VDS). A first step in the exploration of a compositional dataset is often the computation of quantiles of its columns. This step is not important in a compositional data study except for detecting errors and/or the presence of zeros, and to have a first impression of the scale and units of the parts. In VDS, there are no zeros, but the initial values of Fe and As were given in μg/L, while the remaining elements are in mg/L. Although the compositional analysis can be performed on these nonhomogeneous units, Fe and As were divided by 1000 to translate them into mg/L. Note that such a transformation is a perturbation.

Center and Variability
The center of a composition plays the role of the mean for real data. It is estimated as a perturbation-average along the data matrix,

$$\operatorname{cen}[X]=\frac{1}{n}\odot\bigoplus_{i=1}^{n}x_i,$$

where the big ⊕ is a repeated perturbation. The symbol cen denotes the sample center, and it is an estimator of the theoretical center. Table 2 shows the sample center of the VDS in mg/L. Those values are the geometric means of the columns of VDS, provided that all compositional data (rows of X) are given in homogeneous units. When all the data in the sample are closed to a constant k, the resulting geometric means by columns should be closed to the same k.

Compositional Data, Table 2 Sample center (first row, in mg/L) and normalized variation matrix [t*_ij] of the Varanasi data corresponding to the samples on the left margin of the Ganga river. Normalized variations less than 0.2 are marked with an asterisk

          Ca      Mg      Na      K      HCO3     SO4     Cl      NO3     F       Fe      As
cen X_L   43.62   43.99   57.94   4.89   328.15   36.68   71.90   9.96    0.64    0.45    2.30e-3
Ca         0.00    0.90    0.72   1.18     0.42    1.33    0.70   1.88    0.55    0.44    0.53
Mg         0.90    0.00    0.54   1.06     0.30    0.67    0.29   3.05    0.30    0.78    0.43
Na         0.72    0.54    0.00   0.92     0.36    0.58    0.35   2.41    0.49    0.58    0.60
K          1.18    1.06    0.92   0.00     0.87    1.11    0.97   2.97    0.93    1.03    0.87
HCO3       0.42    0.30    0.36   0.87     0.00    0.91    0.44   2.54   *0.11    0.47   *0.16
SO4        1.33    0.67    0.58   1.11     0.91    0.00    0.60   3.43    0.91    1.11    1.11
Cl         0.70    0.29    0.35   0.97     0.44    0.60    0.00   2.73    0.49    0.76    0.58
NO3        1.88    3.05    2.41   2.97     2.54    3.43    2.73   0.00    2.54    1.31    2.44
F          0.55    0.30    0.49   0.93    *0.11    0.91    0.49   2.54    0.00    0.47    0.29
Fe         0.44    0.78    0.58   1.03     0.47    1.11    0.76   1.31    0.47    0.00    0.48
As         0.53    0.43    0.60   0.87    *0.16    1.11    0.58   2.44    0.29    0.48    0.00

In statistical analysis of real data, variability of the sample is very often described by the sample covariance matrix or its standardization, the (Pearson) correlation matrix. When dealing with compositional data these matrices are spurious (Aitchison 1986). They are useless and confusing for further analysis. Alternatively, variability of a compositional dataset can be described by means of the variation matrix (Aitchison 1986). The variation matrix is a (D, D) matrix T whose entries are

$$t_{ij}=\operatorname{var}\left[\log\frac{X_i}{X_j}\right],\qquad i,j=1,2,\ldots,D.$$

The entries t_ii = 0, as there is no variability in log(X_i/X_i). When t_ij is small, near to zero, it suggests linear association between the parts X_i and X_j (Egozcue et al. 2018). In fact, if X_i ≃ a X_j for some a ∈ ℝ, then t_ij ≃ 0, that is, they are proportional or nearly so. Large values of t_ij point out the main sources of variability in the sample. In Aitchison (1986), the sample total variance is defined as the average entry of T, that is,

$$\operatorname{totvar}[X]=\frac{1}{2D}\sum_{i=1}^{D}\sum_{j=1}^{D}t_{ij}.$$

The total variance can also be expressed using the clr and olr representations of the sample. It can be shown that

$$\operatorname{totvar}[X]=\sum_{i=1}^{D}\operatorname{var}[\operatorname{clr}_i(X)]=\sum_{i=1}^{D-1}\operatorname{var}[\operatorname{olr}_i(X)],\qquad(4)$$

where the last term points out that the sample total variance is the trace of the sample covariance matrix of the olr coordinates, thus matching the terminology in statistical analysis of real data. The values in T strongly depend on the number of parts, D, and on the total variance. In order to interpret and compare the values in T, some normalizations have been proposed. For instance, if the total variance were spread over all non-null entries of T, the values would be 2D totvar[X]/(D(D − 1)). Comparing this value with t_ij, a normalized variation is

$$t^{*}_{ij}=\frac{(D-1)\,t_{ij}}{2\operatorname{totvar}[X]},\qquad \sum_{i=1}^{D}\sum_{j=1}^{D}t^{*}_{ij}=(D-1)D,$$

where values of t*_ij ≥ 1 suggest no linear association and t*_ij < 1 indicate possible association (Pawlowsky-Glahn et al. 2015). In practice, t*_ij > 0.2 corresponds to very poor linear association. It is worth noting that the approximate linear association between two parts depends on the subcomposition considered (Egozcue et al. 2018). Table 2 shows the sample center and the normalized variation matrix for the data from the left margin of the Ganga river. The ion bicarbonate is suggested to be linearly associated with F and As. This evidence of association is not apparent in the right margin.
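A sketch of how the variation matrix, the total variance, and the normalized variation shown in Table 2 can be computed in R, assuming a positive data matrix X with one column per part.

```r
# Variation matrix T, total variance, and normalized variation t*_ij
variation_matrix <- function(X) {
  D <- ncol(X)
  T <- matrix(0, D, D, dimnames = list(colnames(X), colnames(X)))
  for (i in 1:D) for (j in 1:D) T[i, j] <- var(log(X[, i] / X[, j]))
  T
}
total_variance       <- function(T) sum(T) / (2 * ncol(T))
normalized_variation <- function(T) (ncol(T) - 1) * T / (2 * total_variance(T))
```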
Compositional Biplot
The singular value decomposition (SVD) is an algebraic technique for the decomposition of data matrices. It is routinely used in principal component analysis. When dealing with compositional data, the SVD is applied to the centered clr(X), that is, to clr(X ⊖ cen[X]), where cen[X] is perturbation-subtracted from each row of X. The SVD results in

$$\operatorname{clr}(X\ominus\operatorname{cen}[X])=U\Lambda V^{\top},$$

where Λ = diag(l_1, l_2, ..., l_(D−1), 0) contains the singular values ordered from larger to smaller; U is an (n, D) matrix and V is a (D, D) matrix. The last singular value is 0 due to the fact that the rows of the decomposed matrix add up to 0, as pointed out above after the definition of the clr. This means that the last 0 in Λ, and the last columns of U and V, can be suppressed. After this suppression, and maintaining the same notation, Λ, U, and V are (D – 1, D – 1), (n, D – 1), and (D, D – 1) matrices, respectively (Pawlowsky-Glahn et al. 2015). The main property of V is that it is orthogonal, i.e., V^⊤V = I_(D−1). Therefore, C = V^⊤ is a contrast matrix and UΛ = olr(X ⊖ cen[X]). The columns of V (rows of C) contain the clr's of the elements of an orthonormal basis of the simplex, and the rows of UΛ are the (centered) olr coordinates of the compositional dataset. Also U^⊤U = I_(D−1) and, consequently, the singular values are proportional to the standard deviations of the olr coordinates. Furthermore, Σ_{i=1}^{D−1} l_i^2 = n·totvar[X], where the centering of X can be suppressed, since centering does not affect the total variance. These properties allow the simultaneous plotting of the centered clr variables and of the compositions in the dataset, represented by their olr coordinates. Commonly, two olr coordinates are chosen for the plot, say the first and the second coordinates. For such a projection, the proportion of total variance shown in the plot is (l_1^2 + l_2^2)/(n·totvar[X]). Superimposed on the olr coordinates, the clr's of the vectors of the basis (columns of V) can be plotted as arrows. As these vectors are unitary, the length in the plot indicates the orthogonal projection on the coordinates chosen for the biplot. This kind of plot is called a form compositional biplot. It is appropriate to look for features of the data and their (Aitchison) interdistances projected onto a two-dimensional plot. However, the most frequent biplot is the covariance compositional biplot. The compositions in the dataset are represented by the standardized coordinates in U (called scores in a standard principal component analysis); simultaneously, the arrows to the columns of ΛV are plotted, so that the lengths of the arrows are proportional to the standard deviations of the clr variables, provided the projection represents a good deal of the total variance. The main interpretative tools are the lengths and directions of the links between the heads of two arrows. These lengths, up to a good projection,
are proportional to the standard deviations of the log ratios between the parts labeling the corresponding arrows. That is, they are approximately proportional to the square root of the elements in the variation matrix. Figure 2 shows covariance biplots for the VDS. The corresponding form biplots are quite similar and are not shown. The left panel corresponds to the projection on the first and second principal coordinates, but this projection only represents 50% of the total variance. The right panel shows the projection on the first and third principal coordinates, the latter contributing an additional 15% of the total variance. The sampling points on the left margin of the river are plotted in red and those on the right margin in green. In the two projections the left margin points appear less dispersed than those of the right margin. However, the left margin sample appears mixed within the right margin in both projections. In fact, the total variance of the joint sample is 7.61, while the total variances for the left and right margin samples are, respectively, 5.14 and 8.28. In both projections shown in Fig. 2, the links between the arrows from the apexes of As, Fe, and Ca to those of Mg, K, and SO4 are approximately parallel to the first principal axis. This means that the fraction of total variance of the first principal coordinate comes from log-ratio variances between elements, one from each group. Also, this suggests that the variance of the balance B(Mg, K, SO4/As, Fe, Ca) can be similar to (but less than) the variance of the first principal coordinate. Additionally, both biplots suggest that the two margins of the river are distinguishable in mean by this first principal coordinate or by its proxy, the mentioned balance. Many observable features in the biplots are also identifiable in the variation matrices in a more quantitative way. For this reason
the CoDa-biplot can be considered one of the most powerful tools in the exploration of CoDa. More detailed explanations can be found in Pawlowsky-Glahn et al. (2015) and references therein.

Compositional Dendrogram
The principle of working on coordinates proposes the statistical analysis of olr coordinates as real coordinates. This is valid for exploratory purposes as well. Consequently, the standard description of sample means, quantiles, and variances of coordinates should be accompanied by a definition of the basis that defines the coordinates. In the case in which the basis is defined by an SBP and the coordinates are balances, the basis can be represented by a tree structure or dendrogram. An exploration based on means and variances of the balance coordinates can be shown on the dendrogram structure, thus providing a powerful interpretation of the balance coordinates. Figure 3 is a CoDa-dendrogram for the VDS. A partition of the composition has been chosen using the principal balances technique (Martín-Fernández et al. 2018), which tries to mimic principal component analysis using balances. The elements in a CoDa-dendrogram are: (1) The tree, which reproduces the SBP selected. (2) The horizontal bars, which are equally scaled in the interval (−7, 7) in Fig. 3, representing an interval for the corresponding balance in the tree. Their length in the plot is meaningless and only corresponds to the spacing of the labels. The anchoring of the vertical bars on the horizontal ones points out the mean of the balance in the mentioned scale.
Compositional Data, Fig. 2 Covariance biplots for VDS. Left margin, red points; right margin, green points. Projection on first-second (left panel) and first-third (right panel) principal coordinates
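Covariance biplots like those in Fig. 2 can be approximated with a few lines of R built on the SVD described above; X is assumed to be a positive data matrix, and the scaling of the arrows is only one reasonable choice, not the exact one used for the figure.

```r
# Covariance compositional biplot from the SVD of the centered clr of X
clr_mat <- function(X) { L <- log(X); L - rowMeans(L) }

Z <- clr_mat(X)
Z <- sweep(Z, 2, colMeans(Z))                               # centering: perturbation-subtraction of cen[X]
s <- svd(Z)                                                 # Z = U diag(d) V'
scores    <- s$u[, 1:2]                                     # standardized scores of the observations
arrows_xy <- s$v[, 1:2] %*% diag(s$d[1:2]) / sqrt(nrow(X))  # arrow lengths ~ clr standard deviations

plot(scores, asp = 1, xlab = "PC1", ylab = "PC2")
arrows(0, 0, arrows_xy[, 1], arrows_xy[, 2], length = 0.08)
text(arrows_xy, labels = colnames(X), pos = 3)
```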
Compositional Data, Fig. 3 CoDa-dendrogram for VDS. The SBP corresponds to principal balances. Black tree: joint sample; red tree: left margin sample; green tree: right margin sample
(3) The length of vertical full lines is proportional to the variance associated with the balance, thus reproducing the decomposition of the total variance by olr coordinates in Eq. (4). The CoDa-dendrogram can be completed with some boxplots following the scale of the horizontal bars (not shown in Fig. 3). When there are two or more subpopulations, like in the present case the left and right margins of the Ganga river, the trees corresponding to each subpopulation can be superimposed. In Fig. 3, the black tree corresponds to the joint sample, whereas the red and the green trees represent the data in the left and right margins, respectively. This allows the comparison of mean values and variances of the subpopulations, a kind of visual ANOVA for each coordinate. If the goal is to identify features that distinguish the left and right margins of the river, Fig. 3 reveals that the sample mean of balance B(Mg, K, SO4/As, Fe, Ca) shows the largest difference for the two margins. Simultaneously, the CoDadendrogram indicates that the variance of this balance in the right margin is larger than in the left margin, a fact also shown in Fig. 2. Also, the variance of balance B(F/HCO3) is similar in both margins, and the variance is smaller than that of other balances, thus suggesting a linear association between F and HCO3. It should be remarked that the CoDa-dendrogram is strongly dependent on the SBP selected, but is extremely useful when the goal of the exploration is clearly stated and expert knowledge about the problem is available.
Software for CoDa
Several software packages are available for the analysis of compositional data following a log-ratio approach, among others the following:

1. CoDaPack: This is a user-friendly, stand-alone freeware package developed by the Research Group on Compositional Data Analysis from the Computer Science, Applied Mathematics, and Statistics Department of the University of Girona (UdG). It can be downloaded from the web. This software was developed as support for short courses and is not a complete statistical package. Nevertheless, the most recent version allows a direct link to R, opening a wide range of opportunities (Comas-Cufí and Thió-Henestrosa 2011).
2. R-package “compositions”: This is a very complete package including most procedures available for CoDa. It is conceived for scientists familiar with the mathematical foundations of CoDa and R programming. An introduction to this package and CoDa procedures is van den Boogaart and Tolosana-Delgado (2013).
3. R-package “robCompositions”: Originally implementing robust statistical procedures in CoDa, it grew to incorporate other CoDa procedures. Many of its features are described in Filzmoser et al. (2018).
4. R-package “zCompositions”: Directed to the analysis of different kinds of zeros and missing data appearing in CoDa datasets. Useful in preprocessing of most CoDa problems (Palarea-Albaladejo and Martín-Fernández 2015).

Books on Compositional Data Analysis
The Aitchison (1986) book is the first book on compositional data analysis. It is seminal for most of the procedures in CoDa within the log-ratio approach. After the recognition of the Aitchison geometry of the simplex and of the conception of compositions as equivalence classes, a number of books covering theory, applications, and software have been published:

• Buccianti et al. (2006), made of contributed chapters. Most chapters are applications of compositional data to geological cases, but it also contains theoretical elements.
• Pawlowsky-Glahn and Buccianti (2011), made of contributed chapters. It includes chapters collecting advances in theoretical issues related to CoDa analysis and related methodologies. Also a variety of applications are collected. It represents the state of the art in 2011.
• van den Boogaart and Tolosana-Delgado (2013), mainly oriented to introducing the R-package compositions and its use, also contains valuable descriptions of the CoDa theory.
• Pawlowsky-Glahn et al. (2015), initially conceived as lecture notes, is a textbook introducing most methods dealing with CoDa.
• Filzmoser et al. (2018) is a guide through CoDa statistical applications with theoretical hints and R programming linked to the R-package robCompositions.
• Greenacre (2018) presents some CoDa methodologies illustrated with examples developed with software in R.

Conclusions
CoDa represent the paradigm of what can go wrong if a methodology is applied to data that do not comply with the assumptions on which the methodology is based. A consistent and coherent solution for CoDa came from recognizing the Euclidean-type structure of the sample space, nowadays known as the Aitchison geometry. In the framework of this geometry, coordinates can be defined that allow for the application of standard methods. It is still a very active and open field of research, as many research questions call for the development of appropriate models which comply with the log-ratio approach.
Cross-References
▶ Aitchison, John
▶ Chayes, Felix
▶ Principal Component Analysis
Bibliography
Aitchison J (1982) The statistical analysis of compositional data (with discussion). J R Stat Soc Ser B (Stat Methodol) 44(2):139–177
Aitchison J (1986) The statistical analysis of compositional data. Monographs on statistics and applied probability. Chapman & Hall, London. (Reprinted in 2003 with additional material by The Blackburn Press). 416 p
Billheimer D, Guttorp P, Fagan W (2001) Statistical interpretation of species composition. J Am Stat Assoc 96(456):1205–1214
Buccianti A, Mateu-Figueras G, Pawlowsky-Glahn V (2006) Compositional data analysis in the geosciences: from theory to practice, volume 264 of Special Publications. Geological Society, London
Chayes F (1971) Ratio correlation. University of Chicago Press, Chicago, 99 p
Comas-Cufí M, Thió-Henestrosa S (2011) Codapack 2.0: a stand-alone, multi-platform compositional software. In: Egozcue JJ, Tolosana-Delgado R, Ortego MI (eds) Proceedings of the 4th international workshop on compositional data analysis (2011). CIMNE, Barcelona. ISBN 978-84-87867-76-7
Egozcue JJ, Pawlowsky-Glahn V (2019a) Compositional data: the sample space and its structure (with discussion). Test 28(3):599–638. https://doi.org/10.1007/s11749-019-00670-6
Egozcue JJ, Pawlowsky-Glahn V (2019b) Rejoinder on: “Compositional data: the sample space and its structure”. Test 28(3):658–663. https://doi.org/10.1007/s11749-019-00674-2
Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barceló-Vidal C (2003) Isometric logratio transformations for compositional data analysis. Math Geol 35(3):279–300
Egozcue JJ, Pawlowsky-Glahn V, Gloor GB (2018) Linear association in compositional data analysis. Aust J Stat 47(1):3–31
Filzmoser P, Hron K, Templ M (2018) Applied compositional data analysis. With worked examples in R. Springer Nature, Cham, 280 pp
Greenacre M (2018) Compositional data analysis in practice. Chapman and Hall/CRC, Cham, 122 pp
Martín-Fernández JA, Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R (2018) Advances in principal balances for compositional data. Math Geosci 50(3):273–298
Olea RA, Raju NJ, Egozcue JJ, Pawlowsky-Glahn V, Singh S (2017) Advancements in hydrochemistry mapping: methods and application to groundwater arsenic and iron concentrations in Varanasi, Uttar Pradesh, India. Stochastic Environ Res Risk Assess (SERRA) 32(1):241–259
Palarea-Albaladejo J, Martín-Fernández JA (2015) zCompositions – R package for multivariate imputation of left-censored data under a compositional approach. Chemom Intell Lab Syst 143:85–96
Pawlowsky-Glahn V, Buccianti A (eds) (2011) Compositional data analysis: theory and applications. Wiley, Hoboken, 378 p
Pawlowsky-Glahn V, Egozcue JJ (2001) Geometric approach to statistical analysis on the simplex. Stochastic Environ Res Risk Assess (SERRA) 15(5):384–398
Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R (2015) Modeling and analysis of compositional data. Statistics in practice. Wiley, Chichester, 272 pp
Pearson K (1897) Mathematical contributions to the theory of evolution. On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proc R Soc Lond LX:489–502
van den Boogaart KG, Tolosana-Delgado R (2013) Analysing compositional data with R. Springer, Berlin, p 258
Computational Geoscience Eric Grunsky Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, ON, Canada
Definition
Process – A phenomenon that is defined within a coordinate space with a valid metric. Processes can be defined by the geochemical composition of a material (e.g., a mineral), combinations of minerals (rocks), or measures of rock assemblage properties including magnetic properties, electroconductivity, and density. Coherence in the form of geospatial continuity or an ordered and distinctive framework defines patterns that can be interpreted in a geochemical/geological context and subsequently tested as processes. Examining geochemical data on an element-by-element basis has been reviewed by McKinley et al. (2016). Multielement geochemical surveys provide insight into geochemical processes through the use of multivariate statistical methods.
Metric – A measure of a process within a coherent coordinate space. Different processes have different coordinate spaces and, hence, different metrics. Commonly used metrics are derived from Hilbert spaces that are intrinsically orthonormal. The most commonly known metric is the Euclidean space. Non-orthogonal bases can also be used to discover/display features of a process. Such bases occur when
compositional data are in raw form or transformed using the additive logratio (alr), centered logratio (clr), isometric logratio (ilr), or pivot coordinates (pvc) (Aitchison 1986; Egozcue et al. 2003). Despite the non-orthogonality of these bases, meaningful interpretations can be derived from the data and used to validate known processes or identify new processes (Greenacre et al. 2022). The relationships between the elements of geochemical data are controlled by “natural laws” (Aitchison 1986). In the case of inorganic geochemistry that law is stoichiometry, which governs how atoms are combined to form minerals, and thereby defines the structure within the data. In the case of compositional data, additional measures are required in order to overcome the constant sum (closure) problem. Elements or oxides of elements are generally expressed as parts per million (ppm), parts per billion (ppb), weight percent (wt%, or simply %), or some other form of “proportion.” When data are expressed as proportions, there are two important limitations: first, the data are restricted to the positive number space and must sum to a constant (e.g., 1,000,000 ppm, 100%); and second, when one value (proportion) changes, one or more of the others must change too to maintain the constant sum. This problem cannot be overcome by selecting sub-compositions so that there is no constant sum. The “constant sum,” or “closure,” problem results in unreliable statistical measures. The use of ratios between elements, oxides, or molecular components that define a composition is essential when making comparisons between elements in systems such as igneous fractionation. The use of logarithms of ratios, or simply logratios, is required when measuring moments such as variance/covariance. Depending on the data type, other metrics can be used, which can be nominal, binary, ordinal, count, time, and interval. Each of these metrics defines different ways of expressing the relationships between the variables (elements) and the observations (analyses). From a Process Discovery perspective, different metrics can assist in the identification of different processes. The following coordinate spaces are common in the field of data analytics for geochemistry:

• Principal Component Analysis (PCA) – Joliffe (1986)
• Independent Component Analysis (ICA) – Comon (1994)
• Multidimensional Scaling (MDS) – Cox (2001)
• Minimum-maximum Autocorrelation Factor Analysis (MAF) – Switzer and Green (1984)
• t-distributed Stochastic Neighbor Embedding (t-SNE) – van der Maaten and Hinton (2008)
• Uniform Manifold Approximation and Projection (UMAP) – McInnes et al. (2020)
Other metrics can be used to measure associations between variables and observations. A two-step approach is recommended for evaluating multielement geochemical data. In the first, “process discovery,” step, patterns, trends, and associations between observations (sample sites) and variables (elements) are extracted. Geospatial associations are also a significant part of process discovery. Patterns and/or processes that demonstrate geospatial coherence likely reflect an important geological/geochemical process. Following process discovery, “process validation” is the step in which the patterns or associations are statistically tested to determine whether these features are valid or merely coincidental associations. Patterns and/or associations that reveal lithological variability in surficial sediment, for instance, can be used to develop training sets from which lithology can be predicted in areas where there is uncertainty in the geological mapping and/or a paucity of outcrop. Patterns and associations that are associated with mineral deposit alteration and mineralization may be predicted in the same way. In low-density geochemical surveys, where processes such as those related to alteration and mineralization are generally under-sampled, it may be difficult to carry out the process validation phase for these processes.

Censored Data
Quality assurance and quality control of geochemical data require that rigorous procedures be established prior to the collection and subsequent analysis of geochemical data. This includes the use of certified reference standards, randomization of samples, and the application of statistical methods for testing the analytical results. Historical accounts of Thompson and Howarth plots for analytical results can be found in Thompson and Howarth (1976a, b). Geochemical data reported at less than the lower limit of detection (censored data) can bias the estimates of mean and variance; therefore, a replacement value that more accurately reflects an estimate of the true mean is preferred. Replacement values for censored geochemical data can be determined using several methods. The lrEM function from the zCompositions package in the R computing environment (Palarea-Albaladejo et al. 2014) is suitable for estimating replacement values. Values greater than the upper limit of detection (ULD) can also be adjusted, based on a linear regression model formed by the uncensored data below the ULD.
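The replacement step described above might look as follows in R; the argument names for lrEM are given as recalled from the zCompositions documentation and should be verified, and X and dl are assumed inputs.

```r
# Hedged sketch: EM-based replacement of values below the detection limit,
# assuming censored values in X are coded as 0 and dl holds one detection
# limit per column (argument names to be checked against zCompositions).
library(zCompositions)
X_repl <- lrEM(X, label = 0, dl = dl)
```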
Process Discovery
The following provides a few ways that processes can be discovered in multielement geochemical survey data.
Multivariate Data Analysis Techniques
The usefulness of multivariate data analysis methods applied to geochemical data has been well documented. An interpretation of the relationships of 30 elements is almost impossible without applying some techniques to simplify the relationships of elements and observations. Multivariate data analysis techniques such as those listed above provide numerical and graphical means by which the relationships of a large number of elements and observations can be studied (Grunsky 2010; Grunsky and Caritat 2019). These techniques typically simplify the variation and relationships of the data in a reduced number of dimensions, which can often be tied to specific geochemical/geological processes. Incorporation of the spatial association with multielement geochemistry involves the computation of auto- and cross-correlograms or co-variograms. This field of study falls into the realm of geostatistics, which is not covered in this contribution. Multivariate geostatistics, which incorporates both the spatial and inter-element relationships, has been studied by only a few. Grunsky (2012) provides details on several approaches to account for geospatial variation in multivariate data. The discovery of processes is typically carried out through the use of unsupervised methods within a multivariate framework in which patterns of the variables are detected. Unsupervised methods are also known as “unsupervised learning.” Pattern recognition depends on two essential elements: a reduction of noise in the system (signal/noise ratio) and the application of these methods at an appropriate scale of measurement (metric). The metric of measurement can be either linear or nonlinear. Linear metrics are useful for detecting patterns that are defined by linear processes such as stoichiometric control in mineralogy or linear combinations of minerals that comprise various rock types. Nonlinear processes may reflect processes that cannot be modeled using a linear metric. Geological processes such as mineral sorting through gravitational processes or chaotic processes (storm events) can be nonlinear. The scale of the metric may have an influence on the ability to determine linear/nonlinear processes. A multivariate approach is an effective way to start the process discovery phase. Linear combinations of elements that are controlled by stoichiometry may occur as strong patterns, while random patterns and/or under-sampled processes show weak or uninterpretable patterns. An essential part of the process discovery phase is a suitable choice of coordinates to overcome the problem of closure. The centered logratio (clr) transformation (Aitchison 1986) is a useful transformation for evaluating geochemical data. Discovery of persistent and meaningful patterns enables the establishment of training sets for specific models to carry out process prediction. Models can include geology, mineral occurrences/prospects/deposits, soil types, and anthropogenic domains. Statistical measures applied to geochemical data typically reveal linear relationships, which may represent the
stoichiometry of rock-forming minerals and subsequent processes that modify mineral structures, including hydrothermal alteration, weathering, and water–rock interaction. Physical processes such as gravitational sorting can effectively separate minerals according to the energy of the environment and mineral/grain density. Mineral chemistry is governed by stoichiometry, and the relationships of the elements that make up minerals are easily described within the simplex, an n-dimensional composition within the positive real number space. It has long been recognized that many geochemical processes can be clearly described using element/oxide ratios that reflect the stoichiometric balances of minerals during formation (e.g., Pearce 1968). Geochemical data, when expressed in elemental or oxide form, can be a proxy for mineralogy. If the mineralogy of a geochemical data set is known, then the proportions of these elements can be used to calculate normative minerals. Comprehending the patterns revealed in process discovery involves a wide range of graphics and tables. The relative relationships between variables and observations can be observed through the different multivariate coordinate systems generated by metrics including:

• Principal Component Analysis (PCA)
• Independent Component Analysis (ICA)
• t-distributed Stochastic Neighbor Embedding
• Minimum/Maximum Autocorrelation Factor Analysis (MAF)
• Multidimensional Scaling (MDS)
• Self-Organizing Maps (SOM)
• Neural Networks (NN)
• Random Forests (RF)
• Blind Source Separation
• Hierarchical/nonhierarchical/model-based clustering
• Fractal/Multi-fractal analysis
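As one possible starting point for process discovery (not a prescribed workflow), the first of the listed metrics can be obtained from clr-transformed data with base R alone, assuming a positive data matrix X.

```r
# PCA of clr-transformed geochemical data: screeplot and biplot
clr_mat <- function(X) { L <- log(X); L - rowMeans(L) }

pca <- prcomp(clr_mat(X))
screeplot(pca, type = "lines")   # how much "structure" is in the data
biplot(pca)                      # relative relationships of observations and elements
```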
From a process discovery perspective, patterns generated by these coordinate systems that show geospatial coherence and/or positive/negative associations between the variables may be useful in describing processes. The commonly used principal component coordinates of clr-transformed data are orthonormal (i.e., statistically independent) and can reflect linear processes associated with stoichiometric constraints.

Fractal Methods
Fractal mathematics is now commonly used in the geosciences. Cheng and Agterberg (1994) and Cheng and Zhao (2011) showed how fractal methods can be used to determine thresholds of geochemical distributions on the basis of spatial patterns of abundance. Where the concentration of a particular component per unit area satisfies a fractal or multifractal model, the area of the component follows a power law relationship with the concentration. This can be expressed mathematically as:
$$A(r\le n)\propto r^{-\alpha_1},\qquad A(r>n)\propto r^{-\alpha_2},$$
where A(r) denotes the area with concentration values greater than the contour (abundance) value r. This also implies that A(r) is a decreasing function of r. If n is considered the threshold value, then the empirical model shown above can provide a reasonable fit for some of the elements. In areas where the distribution of an element represents a continuous single process (i.e., background), the value of α remains constant. In areas where more than one process has resulted in a number of superimposed spatial distributions, there may be additional values of α defining the different processes. The use of power-spectrum methods to evaluate concentration-area plots derived from geochemical data has also been shown by Cheng et al. (2000), whereby, through the application of filters, patterns related to background and noise can be detected, thus enabling the identification of areas that are potentially related to mineralization.

Visualizing the Coordinates and Elemental Associations of a Multivariate Metric
The transformation of the logratio coordinates can be visualized in the transformed metric space and in the geographic domain. Some of the transformations permit the calculation and visualization of the variables together with the observations. An important part of determining variable significance is the measure of variance accounted for by each component (dimension).

• Principal Component Biplots
• Minimum-Maximum Autocorrelation Biplots

Exploring data geospatially involves the use of a coordinate system, in either a geodetic or an orthogonal grid/metric. Coordinate systems generated by methods such as MAF include measures of geospatial association (Mueller et al. 2020). Geospatial signatures derived from variables that have positive/negative associations over a geospatial domain enhance the geospatial aspect of multivariate associations. The process of principal component analysis involves choosing the appropriate metric (i.e., log-centered transform) from which eigenvalues/eigenvectors are computed using singular value decomposition. The resulting eigenvalues, plotted as a “screeplot,” indicate how much “structure” is in the data. Principal component biplots display the relative relationships between the observations (principal component scores) and the variables (principal component loadings). Biplots can reveal much information about the relative relationships between the variables and the sample population. The resulting principal components can also be plotted in the geospatial domain, whereby the principal component patterns can be visually assessed in terms of the geospatial relationships between geological/geochemical processes. Areas of
contiguous zones of similar principal component scores can show geospatial patterns that define the “geospatial coherence” of the data in a geographic sense.

Geospatial Coherence
Multivariate coordinate systems have coordinates based on metrics of the variables (elements) and the observations (geochemical analyses). Different coordinate systems display different relationships of the data. These coordinate systems provide insight into the discovery of processes. For geospatial geochemical data, geospatial coherence is defined as both local and regional continuity of a response, based on geostatistical methods such as semi-variograms. If a geospatial rendering of a principal component shows no spatial coherence (i.e., no structure, or a lot of “noise”), then it is likely that the image will be difficult to interpret within a geological context. The most effective way to test this is through the generation and modeling of semi-variograms that describe the spatial continuity of a specific class based on prior or posterior probabilities (PPs), which are described below. If meaningful semi-variograms can be created, then geospatial maps of PPs can be generated through interpolation using the kriging process. Maps of PPs may show low overall values but still be spatially coherent. This is also reflected in the classification accuracy matrix, which indicates the extent of classification overlap between classes. The geospatial analysis methodology described by the “gstat” package (Pebesma 2004) in R can be used to generate the geostatistical parameters and images of the PCs and PPs from kriging. If the spatial sampling density appears to be continuous, then it may be possible to carry out spatial prediction techniques such as spatial regression modeling and kriging. A major difficulty with the application of spatial statistics to regional geochemical data is that the data seldom exhibit stationarity. Stationarity means that the data have some type of location invariance, that is, the relationship between any set of points is the same regardless of geographic location. This is seldom the case in regional geochemical datasets. Thus, interpolation techniques such as kriging must be applied cautiously, particularly if the data cover several geochemical domains in which the same element has significantly different spatial characteristics. Evaluation of the variogram or of autocorrelation plots can provide insight about the spatial continuity of an element. If the autocorrelation decays to zero over a specified range, then this represents the spatial domain of a particular geological process associated with the element. Similarly, for the variogram, the range represents the spatial domain of an element, which reaches its limit when the variance reaches the “sill” value, the regional variance of the element. Theoretically, the variance should be zero at lag zero. However, typically, an element may have a significant degree of variability even at short distances from neighboring points. This variance is termed the nugget effect.
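A hedged sketch of the semi-variogram check described above, assuming the sp and gstat packages and a data frame d with planar coordinates x, y and a principal component score PC1; the function arguments follow the gstat documentation as recalled and should be verified.

```r
# Empirical semi-variogram of a principal component score and a fitted model
library(sp)
library(gstat)

coordinates(d) <- ~ x + y             # promote d to a spatial object
v  <- variogram(PC1 ~ 1, d)           # empirical semi-variogram
vm <- fit.variogram(v, vgm("Sph"))    # spherical model: nugget, sill, range
plot(v, vm)
```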
Skill, knowledge, and experience are required to effectively use geostatistical techniques. It requires considerable effort and time to effectively model and extract information from geospatial data. The benefit of these efforts is a better understanding of the spatial properties of the data, which permits a more effective predictive estimate of geochemical trends. However, these techniques must be used and interpreted with an awareness of the problems associated with interpolation and the spatial characteristics of the data.

Recognizing Rare Events – Under-Sampled Processes in Principal Component Analysis
In regional geochemical surveys, the sample design is not optimally suited for the recognition of geological processes that have small geospatial footprints, such as a mineral deposit. Thus, sites that are associated with mineral deposits are undersampled relative to the dominant processes defined by background lithologies. A principal component analysis may identify these “rare event” processes in the lesser eigenvalues/eigenvectors. Examination of biplots and geospatial plots of these lesser components can help identify mineral exploration targets or zones of contamination.
Process Validation – Testing Models Derived from Process Discovery
Process validation is the statistical measure of the likelihood of a given process as defined by discovery or by definition. When processes are defined by discovery, as described above, or by training sets defined by other means, the application of process validation provides a statistical measure of the uniqueness of the process. Using different multivariate coordinate systems (metrics), geochemical data can be classified using a variety of classification methods. The combination of coordinate systems and measures of proximity for classification results in a range of predictions. In a multivariate normal dataset, the results of the different combinations of coordinate systems and classifications can be similar. In datasets where there are multiple and distinct processes, the results may vary widely and provide different interpretations. Classification can be carried out as a binary or multi-class process. For many geoscience applications that use multielement geochemical data, multi-class classification studies are more common. Bayesian methods produce predictive probabilities that are based on prior probabilities and are used in many classification algorithms. A non-exhaustive list of classification/prediction methods includes:
• Random Forests (RF)
• Linear Discriminant Analysis (LDA)
• Quadratic Discriminant Analysis (QDA)
• Logistic Regression (LR)
• Neural Networks (NN)
• Support Vector Machines (SVM)
• Boosting
• Classification and Regression Trees (CART)
Process validation is the methodology used to verify that a geochemical composition (response) reflects one or more processes. These processes can represent lithology, mineral systems, soil development, ecosystem properties, climate, or tectonic assemblages. Validation can take the form of an estimate of likelihood that a composition can be assigned membership to one of the identified processes. This is typically done through the assignment of a class identifier or a measure of probability. Training sets contain the classes used to build a prediction model. The proportion of each class relative to the sum of all of the classes is termed the prior probability. After the application of the prediction model to the training set, and/or the test dataset, there is, for each observation, a list of proportional values for each class, which are termed the posterior probabilities (PPs). A critical part of process validation is the selection of variables that produce an effective classification. This requires the selection of variables that maximize the differences between the various classes and minimize the amount of overlap due to noise and unrecognized or under-sampled processes in the data. As stated previously, because geochemical data are compositional in nature, the variables that are selected for classification require transformation to logratio coordinates. The alr or ilr coordinates are both effective for the implementation of classification procedures. The clr-transform is not suitable because the covariance matrix of these coordinates is singular. However, analysis of variance (ANOVA) applied to clr-transformed data enables the recognition of the compositional variables (elements) that are most effective at distinguishing between the classes. Choosing an effective alr-transform (choice of a suitable denominator) or balances for the ilr-transform can be challenging and requires some knowledge and insight about the nature of the processes being investigated. The choice of a suitable divisor for the alr-transform can be based on prior knowledge of the processes that involve specific elements. ANOVA applied to the PCs derived from the clr-transform has been shown to be highly effective at discriminating between the different classes. Typically, the dominant PCs (PC1, PC2, . . .) commonly identify dominant processes and the lesser components (PCn, PCn-1, . . ., where n is the number of variables) may reflect undersampled processes or noise. The use of the dominant components can be interpreted as a technique to increase the signal/noise ratio and result in a more effective classification using only a few variables. Classification results can be expressed as direct class assignment or posterior probabilities (PPs) in the form of forced class allocation, or as class typicality. Forced
class allocation assigns a PP based on the shortest Mahalanobis distance of a compositional observation from the compositional centroid of each class. Class typicality measures the Mahalanobis distance from each class and assigns a PP based on the F-distribution. This latter approach can result in an observation having a zero PP for all classes, indicating that its composition is not similar to any of the compositions defined by the class compositional centroids. The application of a procedure such as LDA can make use of cross-validation procedures, whereby the classification of the data is repeatedly run based on random partitioning of the data into a number of equal-sized subsamples. One subsample is retained for validation and the remaining subsamples are used as training sets. Classification accuracies can be assessed through the generation of tables that show the accuracy and errors measured from the estimated classes against the initial classes in the training sets used for the classification.
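A minimal sketch of this cross-validated LDA step is given below, assuming a hypothetical data frame train in which the column class holds the deposit-model (or lithology) label and the remaining columns hold the selected logratio or principal component coordinates.

```r
# Sketch of leave-one-out cross-validated LDA with posterior probabilities (MASS).
# `train` is a hypothetical data frame: column `class` holds the class label,
# the remaining columns hold logratio (or principal component) coordinates.
library(MASS)

fit.cv <- lda(class ~ ., data = train, CV = TRUE)   # CV = TRUE gives leave-one-out predictions

# Posterior probabilities (PPs) for each observation and each class
pp <- fit.cv$posterior

# Classification accuracy table: rows = original classes, columns = predicted classes
acc.tab <- table(observed = train$class, predicted = fit.cv$class)
overall.accuracy <- sum(diag(acc.tab)) / sum(acc.tab)

# Forced allocation assigns each observation to the class with the largest PP;
# a typicality-style screen could instead flag observations whose largest PP is very small.
forced <- colnames(pp)[max.col(pp)]
```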
Class imbalance occurs when there is a disproportionate number of samples for the different classes being considered in a model. For example, in a predictive lithology study, the training set may contain a large number of observations that represent granitic rocks and only a small number of samples that reflect volcanic rocks. This class imbalance can affect the resulting model from many of the classification methods. Class imbalance can be adjusted using methods such as the Synthetic Minority Oversampling Technique (SMOTE; Hariharan et al. 2017).
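One way such an adjustment can be sketched in R is with the SMOTE implementation in the smotefamily package; the package choice, the object names X and y, and the oversampling settings below are illustrative assumptions rather than the specific procedure of Hariharan et al. (2017), and the returned slot names should be checked against that package's documentation.

```r
# Sketch of oversampling a minority class with SMOTE (smotefamily package).
# `X` is a hypothetical numeric data frame of (logratio) predictors and `y` the
# lithology label of an imbalanced training set.
library(smotefamily)

set.seed(42)
bal <- SMOTE(X = X, target = y, K = 5, dup_size = 2)  # 5 nearest neighbours, 2x synthetic copies

table(y)                # original, imbalanced class counts
table(bal$data$class)   # counts after adding synthetic minority samples
                        # (in this package the combined data carry the label in `class`)
```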
Example of Process Discovery and Process Prediction of Mineral Resource Potential Using Geochemical Survey Data Regional geochemical survey data (Fig. 1) from southern British Columbia, Canada were studied for the potential of
Computational Geoscience, Fig. 1 Location map of stream sediment sampling sites (black dots) overlying a map of the regional geology. (From Grunsky and Arne 2020, Fig. 1)
discovering/predicting specific types of mineral deposits (Grunsky and Arne 2020), based on mineral deposit models that have been developed for the province (BC Geological Survey 1996). The study integrated the regional geochemical survey with a mineral occurrence database (MINFILE) comprising active mines, past producers, developed prospects, prospects, occurrences, and anomalies (BC Geological Survey 2019), as seen in Fig. 2. Each mineral occurrence was assigned a specified mineral deposit model. The mineral occurrences are not colocated with the regional stream sediment survey data. Thus, based on the distance between a stream sediment sampling site and a mineral occurrence (Fig. 3), the attributes of the mineral occurrence were assigned to the stream sediment sample site. If the distance between the two sites exceeded 3000 m, no assignment was made and the mineral deposit model attribute was classed as “unknown.” Additionally, if the MINFILE record was identified as a “showing” and the distance from the stream sediment site to the MINFILE site was
greater than 1000 m, the stream sediment site mineral deposit model was classed as “unknown.” Some mineral deposit types, listed in Table 1, were not included due to the distance selection criteria. Mineral deposit models with fewer than ten MINFILE records were not included. Also, the mineral deposit model I05 (Polymetallic veins Ag-Pb-Zn Au) was not included because the geochemistry of the stream sediment sites associated with I05 overlaps with almost every other mineral deposit type. It should be noted that the geochemistry of the mineral deposit models based on stream sediment geochemistry is not unique and that compositional overlap is a problem when using geochemistry alone. The resulting deposit models used in the process prediction/validation part of the study are highlighted in bold in Table 1.

Process Discovery
The geochemical survey data were screened for censoring and values were imputed for element values less than the lower
Computational Geoscience, Fig. 2 Map of the MINFILE locations that are tagged with a British Columbia Mineral Deposit Model Mnemonic
Computational Geoscience, Fig. 3 Map of distances between stream sediment sample sites and the closest MINFILE occurrence
limit of detection for the specific instrumentation. Following imputation, a logcentered transformation was applied to the data. Three multivariate metrics were computed: principal component analysis (PCA), independent component analysis (ICA), and t-distributed stochastic neighbor embedding (t-SNE). Choosing the dimensionalities of the ICA and t-SNE spaces requires an iterative approach. Analysis of variance and a provisional use of random forests are useful in determining which variables are suitable for process discovery (characterization) and process prediction (validation). After a number of iterations, a nine-dimensional space for both the ICA and t-SNE metrics was considered a reasonable space to represent the variability of the geochemistry and the associated MINFILE assignments to each stream sediment site. The choice of dimensions is partly guided through the application of techniques such as analysis of variance, which provides an indication of how many variables (components) provide the best separation between the classes of attributes. These three metrics were then explored through visualization within their respective coordinate systems and a geospatial rendering of these coordinates. Principal component biplots and scatter plots of the t-SNE transform coordinates were examined for
patterns and relationships of the variables (PCA only) and the observations. Geospatial rendering, via kriging of the transformed variables (e.g., PC1, t-SNE1) yielded maps for the identification of patterns. Geospatially coherent patterns were considered to represent geochemical processes related to bedrock, alteration, mineralization, weathering, and mass transport. The application of principal component analysis yielded eigenvalues/eigenvectors that showed significant “structure” in the data. Figure 4 shows a screeplot and a table for the first ten eigenvalues, cumulative eigenvalues, and cumulative percentage eigenvalues. The first four eigenvalues represent most of the variability (56.9%) of the data, while the lesser values may represent under-sampled or random processes. Figure 5 shows a principal component biplot PC1-PC2 with the observations coded by the attributes of geological terrane, bedrock lithologies, and mineral deposit model assignment at each stream sediment site. Mean values for each of the attributes are shown by the larger symbols on separate biplots, which indicate the compositional difference of the attributes across the PC1-PC2 space.
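A minimal sketch of the logcentered (clr) transform and principal component step described above, assuming a hypothetical numeric matrix geochem of imputed element concentrations (rows are stream sediment sites, columns are elements, all values positive):

```r
# Sketch of the logcentered (clr) transform followed by PCA.
# `geochem` is a hypothetical matrix of imputed element concentrations.
clr <- function(x) {
  lx <- log(x)
  sweep(lx, 1, rowMeans(lx), "-")     # subtract each row's mean log (log geometric mean)
}

z   <- clr(geochem)
pca <- prcomp(z, center = TRUE, scale. = FALSE)

# Screeplot and cumulative percentage of variance (compare Fig. 4)
screeplot(pca, type = "lines")
eig <- pca$sdev^2
round(cumsum(eig) / sum(eig) * 100, 1)

# PC1-PC2 biplot of scores (observations) and loadings (elements), as in Fig. 5
biplot(pca, choices = c(1, 2), cex = 0.6)
```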
Computational Geoscience, Table 1 Frequency of MINFILE mineral deposit models in the study area

Mineral deposit model | Description | Frequency
C01* | Surficial placers | 118
D03* | Volcanic redbed Cu | 115
E05 | Mississippi Valley-type Pb-Zn | 2
E12 | Mississippi Valley-type Pb-Zn | 12
E13 | Irish-type carbonate-hosted Zn-Pb | 5
E15 | Blackbird sediment-hosted Cu-Co | 1
G04 | Besshi massive sulfide Cu-Zn | 34
G05 | Cyprus massive sulfide Cu (Zn) | 6
G06* | Noranda/Kuroko massive sulfide Cu-Pb-Zn | 102
G07 | Subaqueous hot spring Ag-Au | 2
H02 | Hot spring Hg | 13
H03 | Hot spring Au-Ag | 2
H04 | Epithermal Au-Ag-Cu; high sulfidation | 7
H05* | Epithermal Au-Ag; low sulfidation | 44
H08 | Alkalic intrusion-associated Au-Ag | 10
I01* | Au-quartz veins | 311
I02 | Intrusion-related Au pyrrhotite veins | 30
I05 | Polymetallic veins Ag-Pb-Zn Au | 988
I06* | Cu Ag quartz veins | 82
I08 | Silica-Hg carbonate | 4
I09 | Stibnite veins and disseminations | 24
J01 | Polymetallic manto Ag-Pb-Zn | 11
J04 | Sulfide manto Au | 2
K01* | Cu skarns | 151
K02 | Pb-Zn skarns | 28
K04* | Au skarns | 54
K05 | W skarns | 25
K07 | Mo skarns | 5
L01 | Subvolcanic Cu-Ag-Au (As-Sb) | 61
L02 | Porphyry-related Au | 7
L03* | Alkalic porphyry Cu-Au | 198
L04* | Porphyry Cu Mo Au | 384
L05* | Porphyry Mo (low F-type) | 52
Unknown |  | 1218

Models marked with an asterisk (*) were used in the classification/prediction
Figure 6 shows a scatterplot matrix of the nine independent components (ICA), coded by the attributes of lithology. There is a clear distinction between granitic and metamorphic lithologies throughout the scatterplots. The volcanic and clastic rock types are transitional between the two igneous/metamorphic lithologies. Figure 7a, b show t-SNE coordinates 3 and 9 coded by the attributes of underlying bedrock lithology and mineral
deposit model assignment at each stream sediment site. Mean values for each bedrock lithology and mineral deposit model are shown as larger symbols across the t-SNE3–t-SNE9 space. The stream sediment geochemistry PCA, ICA, and t-SNE coordinates, together with the MINFILE mineral deposit model attributes, form the basis of process discovery, from which the multielement associations and zones of geospatial coherence form a training set (provisional model) from which mineral deposit models can be predicted.

AOV
An analysis of variance (AOV) was carried out on the PCA space using the mineral deposit models tagged to each stream sediment site. One hundred stream sediment sites with mineral deposit models tagged as “unknown” were included in the analysis. Figure 8 shows a plot of ordered F-values for each principal component. Components with high F-values are better at discriminating between the different mineral deposit types, based on 41 PCs. PCs 1, 3, 8, and 13 are the dominant components for mineral deposit discrimination. In the case of the 9-dimensional t-SNE coordinates, t-SNE9, t-SNE3, t-SNE5, and t-SNE4 accounted for most of the variability of the data for the geochemistry tagged with a mineral deposit model. An AOV for the ICA coordinate space indicates that ICA2, ICA5, and ICA9 account for most of the mineral deposit class separation.

Geospatial Coherence in Process Discovery
Figure 9a, b show kriged images for principal components 1 and 3. The stream sediment sampling sites are shown on the images. The choice of these components for geospatial rendering is based on the analysis of variance of the principal components and the discrimination of the mineral deposit models assigned to the geochemical data. The kriged images of PC1 and PC3 show broad geospatially coherent regions that represent associations of elements and sampling sites as shown in the biplot of Fig. 5. Negative PC1 scores represent granitic/felsic rock types and positive scores indicate mafic and sedimentary rock types. Figure 10a, b show kriged images for t-SNE coordinates 3 and 7. The choice of displaying these maps is based on the application of random forests for the prediction of mineral deposit types. Figure 11a, b show kriged ICA coordinates 2 and 5, which show broad regional patterns that reflect underlying lithologies across the area. These two variables were determined to be the most significant for discriminating between the mineral deposit classes. The kriged images of selected coordinates of the three different metrics show zones of geospatial continuity
Quest South PCA [clr]

PC | λ | λ% | Σλ%
PC1 | 5.04 | 26.1818 | 26.1818
PC2 | 2.79 | 14.4935 | 40.6753
PC3 | 1.81 | 9.4026 | 50.0779
PC4 | 1.32 | 6.8571 | 56.9351
PC5 | 1.02 | 5.2987 | 62.2338
PC6 | 0.87 | 4.5195 | 66.7532
PC7 | 0.78 | 4.0519 | 70.8052
PC8 | 0.63 | 3.2727 | 74.0779
PC9 | 0.60 | 3.1169 | 77.1948
PC10 | 0.49 | 2.5455 | 79.7403
PC11 | 0.43 | 2.2338 | 81.9740
PC12 | 0.40 | 2.0779 | 84.0519
PC13 | 0.35 | 1.8182 | 85.8701
PC14 | 0.33 | 1.7143 | 87.5844
PC15 | 0.31 | 1.6104 | 89.1948
Computational Geoscience, Fig. 4 Screeplot of eigenvalues derived from a principal component analysis of logcentered transformed geochemical data. The chart below lists the eigenvalues, percent contribution of the variance, and the cumulative contribution of the variance for each
indicating that these variables represent geospatially coherent processes, which will be tested using methods of classification and prediction.

Process Validation
An initial step in process validation requires an investigation of the ability of the training set of stream sediment geochemistry sites, using the PCA, t-SNE, and ICA metrics and tagged with a mineral deposit model, to predict the mineral deposit type for stream sediment sites where the mineral deposit model is tagged as unknown. Part of the strategy for use of the metrics is to choose principal component, t-SNE, and ICA coordinates that account for most of the observed signals in PCA biplots and geospatial maps of the spaces. This approach increases the signal-to-noise ratio, which increases the ability to predict mineral deposit types. Stream sediment sites tagged
with a mineral deposit model, plus an additional 100 sites tagged as “unknown,” formed the training set. As described above, there are several methods to classify and predict processes. Three examples are shown here: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and random forests (RF). Because geochemical processes can be linear, nonlinear, or a combination of both, it is useful to classify and predict processes with more than one method. This approach provides a measure of the variability of the data, identifies processes whose detection depends on the method used, and gives an indirect measure of uncertainty. Table 2 and Fig. 12 show the prediction accuracies for each of the three metrics and three classification methods based on the training sets as described above. The table and figures
Computational Geoscience, Fig. 5 Three PCA biplots of PC1.v. PC2. (a) Scores coded by geological terrane. Mean values for geological terranes are shown by the terrane label placement in the biplot. (b) Scores coded by rock type. Mean values of the different rock types are shown by
the label placement in the biplot. (c) Scores coded by MINFILE mineral deposit model. Mean values for each of the mineral deposits models are shown by the label placement in the biplot
show that some of the mineral deposit models have higher prediction accuracies than others. The mineral deposit models C01 (placer gold), I01 (Au quartz veins), L03 (alkalic porphyry Cu-Au), L04 (porphyry Cu), and L05 (porphyry Mo) have the highest prediction accuracies for all three metrics. Quadratic discriminant analysis (Fig. 12b) and random forests (Fig. 12c) have higher prediction accuracies than those based
on predictions using linear discriminant analysis (Fig. 12a). Figure 12d shows the overall prediction accuracies for the three metrics and three methods. The figure shows that the method of random forests provides the highest level of prediction accuracy. Quadratic discriminant analysis outperforms linear discriminant analysis with the exception of the ICA metric.
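A sketch of the random forest variant of this prediction step is shown below, using the randomForest package; the data frame names (train, newsites) and the factor column model are assumptions for illustration, and the class votes returned with type = "prob" play the role of the posterior probabilities that are mapped in Fig. 13.

```r
# Sketch of the random forest prediction step (randomForest package).
# `train` is a hypothetical data frame with a factor column `model` (deposit model)
# and numeric columns holding the chosen PCA/ICA/t-SNE coordinates; `newsites`
# holds the same coordinate columns for sites tagged as "unknown".
library(randomForest)

set.seed(1)
rf <- randomForest(model ~ ., data = train, ntree = 500, importance = TRUE)

print(rf$confusion)        # out-of-bag error per deposit model (compare Table 2)
varImpPlot(rf)             # which coordinates drive the classification

# Class votes behave like posterior probabilities and can be kriged per class
pp         <- predict(rf, newdata = newsites, type = "prob")
pred.class <- predict(rf, newdata = newsites, type = "response")
```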
Computational Geoscience, Fig. 6 Scatterplot matrix of the ICA coordinates coded by rock type. Legend is the same as Fig. 5b
Figure 13a–f show kriged images of the posterior probabilities generated for mineral deposit predictions using random forests on the three metrics. Each map shows the MINFILE occurrences, coded as yellow crosses, designated for the mineral deposit models C01 (placer Au) and L04
(porphyry Cu-Au). Stream sediment sites that were predicted as C01 (Fig. 13a, c, e) or L04 (Fig. 13b, d, f) are shown as red dots. The kriged images reflect prediction values that range between 0 and 1. The geospatial coherence of the images and the strength of the kriged prediction surfaces
Computational Geoscience, Fig. 7 (a) Scatterplot of t-SNE3.v.t-SNE5 coded by rock type. (b) Scatterplot of t-SNE3.v.t-SNE5 coded by mineral deposit model
Computational Geoscience, Fig. 8 Plot of F-values obtained from an analysis of variance for the mineral deposit models based on the principal components
indicate that the use of the t-SNE metric results in a prediction that is better than that of the ICA or PCA metric. Similarly, the classification for individual stream sediment sites shows that fewer sites are predicted for the PCA metric, while the t-SNE
metric shows a much larger number of sites predicted for both the C01 and L04 deposit models. The increased number of predicted sites and the kriged probability surfaces for both models are partly supported by the addition of the known
Computational Geoscience, Fig. 9 (a) Kriged image of principal component 1. Broad regional domains indicating geospatial coherence are evident. The eastern part of the map shows dominantly positive PC1
scores. The central part of the map shows mainly negative PC1 scores. (b) Kriged image of PC3 showing regions of geospatial coherence
Computational Geoscience, Fig. 10 (a) Kriged image of t-SNE component 3. (b) Kriged image of t-SNE component 7. The two figures show broad regions of geospatial coherence
Computational Geoscience, Fig. 11 (a) Kriged image of ICA component 2. (b) Kriged image of ICA component 5. The two figures show broad regions of geospatial coherence
Computational Geoscience, Table 2 Prediction accuracies for three metrics and three classification methods

Method.Metric | C01 | D03 | G06 | H05 | I01 | I06 | K01 | K04 | L03 | L04 | L05 | Unknown | Overall accuracy
LDA.PCA | 22.72 | 29.41 | 18.75 | 14.81 | 36.55 | 13.33 | 2.56 | 0 | 56 | 1.49 | 37.5 | 0.57 | 16.06
LDA.ICA | 48.18 | 29.41 | 10.41 | 3.70 | 38.70 | 13.33 | 10.25 | 0 | 8 | 59.70 | 12.5 | 57 | 38.90
LDA.tSNE | 30.90 | 5.88 | 2.08 | 18.51 | 0 | 13.33 | 0 | 0 | 60 | 2.23 | 12.5 | 0 | 9.92
QDA.PCA | 33.63 | 17.64 | 16.66 | 25.92 | 36.55 | 0 | 25.64 | 9.09 | 16 | 51.49 | 0 | 49 | 34.96
QDA.ICA | 39.09 | 17.64 | 18.75 | 11.11 | 37.63 | 6.66 | 17.94 | 0 | 40 | 58.20 | 6.25 | 52 | 38.11
QDA.tSNE | 31.81 | 17.64 | 31.25 | 40.74 | 30.10 | 20 | 28.20 | 18.18 | 48 | 43.28 | 31.25 | 70 | 39.84
RF.PCA | 59.78 | 0 | 20.49 | 14.36 | 42.74 | 0 | 12.54 | 0 | 0 | 69.24 | 0 | 74.81 | 45.49
RF.ICA | 52.50 | 5.57 | 10.22 | 14.36 | 32.02 | 0 | 15.05 | 0 | 0 | 67 | 0 | 59.76 | 45.49
RF.tSNE | 53.41 | 16.83 | 49.48 | 50.94 | 44.89 | 18.98 | 32.77 | 16.92 | 27.21 | 61.76 | 29.96 | 77.82 | 45.49
MINFILE occurrences shown in yellow crosses. The areas of increased probability and/or sampling sites predicted as C01 or L04, but with few or no MINFILE occurrences may represent regions of previously unrecognized sites with increased mineral deposit potential, or they reflect compositional overlap between deposit types and the site may be mineralized but belong to a different mineral deposit class. Within the current scope and context of this study, some fundamental assumptions have been made as detailed in Grunsky and Arne (2020). The geochemical composition of the stream sediment associated with individual mineral deposit models is uniquely distinct. In some cases, this assumption is not warranted. As described above, the mineral deposit model I05 (polymetallic veins) has characteristics that overlap with many other mineral deposit types, which resulted in a significant amount of confusion when applying process prediction. As a result, mineral deposit models with a significant amount of overlap with other models should be removed.
Not all mineral deposit types can be best represented with stream sediment geochemistry. The size fraction and the analytical methods used may not extract unique information to distinguish a sub-cropping mineral deposit or distinguish between different mineral deposit types, as in the case for the polymetallic vein class (I05). The method of dissolution using aqua regia is useful for sheet silicates and sulfide minerals, but aqua-regia digestion does not dissolve many other forms of silicates. Thus, some unique geochemical aspects of specific mineral deposit types based on silicate mineral assemblages may not be recognized. The MINFILE model identification is accurate. This may not be the case for some types of mineral systems and, as a result, there will be an increase in confusion of prediction. The identification of the mineral deposit profiles, as specified in the MINFILE field “Deposit Type,” may be incorrect or inconclusive, or the locations may be inaccurate. This can lead to misclassification errors in the subsequent application of machine-learning prediction methods.
Computational Geoscience, Fig. 12 Charts that show the predictive accuracy of each mineral deposit type based on linear discriminant analysis (a), quadratic discriminant analysis (b), and random forests (c). Figure (d) shows the overall accuracy of each prediction method and metric
The location of a MINFILE site and the associated stream sediment site may not be within the same catchment area. In the example presented here, the assumption was made that the effect of catchment is not significant. If there is a requirement for the location of a MINFILE site and associated stream sediment site to be in the same catchment, the number of sites for the training set would be significantly reduced. The fact that the corresponding MINFILE site and the stream sediment sample site are not colocated means that there is always the likelihood that the stream sediment composition does not reflect the observed mineralization at the MINFILE site. The distance threshold of 3.0 km appears to work for some mineral deposit types, but not necessarily for others. Changing the threshold distance for different mineral deposit models may help in a more refined estimate of mineral resource prediction.
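The distance rule discussed above can be sketched in base R as follows; the data frame and column names are hypothetical, the coordinates are assumed to be in a projected (metric) system, and the additional 1000 m criterion for MINFILE "showings" is omitted for brevity.

```r
# Base-R sketch of the distance rule used to tag stream sediment sites with a
# MINFILE deposit model: take the nearest occurrence and accept it only within
# a cutoff (3000 m here). `sites` and `minfile` are hypothetical data frames with
# projected easting/northing columns `x`, `y`; `minfile$model` holds the deposit model.
tag_sites <- function(sites, minfile, cutoff = 3000) {
  out <- rep("unknown", nrow(sites))
  for (i in seq_len(nrow(sites))) {
    d <- sqrt((minfile$x - sites$x[i])^2 + (minfile$y - sites$y[i])^2)
    j <- which.min(d)
    if (d[j] <= cutoff) out[i] <- as.character(minfile$model[j])
  }
  out
}

sites$model <- tag_sites(sites, minfile, cutoff = 3000)
table(sites$model)   # frequency of tagged deposit models, as in Table 1
```

Re-running the tagging with a different cutoff is then a one-line change, which is one way the sensitivity to the threshold distance could be explored.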
Summary
In the realm of computational geosciences, the integration and use of geoscience data as variables (e.g., geochemistry) and attributes (e.g., geology, mineral deposit occurrences) can be studied to identify/discover processes. In geochemical data, processes are recognized through the application of multivariate methods using different metrics (three in the example shown here) that demonstrate element associations related to mineralogy and patterns of geospatial coherence that are related to primary lithology and other processes, including metamorphism, weathering, mass wasting, hydrothermal alteration, and base- and precious-metal mineralization. These patterns, once identified as processes, can be used as training sets to validate and predict these processes using other data.
Computational Geoscience, Fig. 13 Kriged images of posterior probabilities based on random forests prediction for the three metrics (PCA, ICA, t-SNE) and two mineral deposit model types (C01-placer gold), (L04-porphyry Cu-Au). (a–c) C01 predictions. (d–f) L04 predictions
Cross-References ▶ Compositional Data ▶ Data Mining ▶ Data Visualization ▶ Discriminant Function Analysis ▶ Exploration Geochemistry ▶ Exploratory Data Analysis ▶ Fractal Geometry in Geosciences ▶ Geographical Information Science ▶ Geoscience Signal Extraction ▶ Geostatistics ▶ Imputation ▶ Interpolation ▶ Machine Learning ▶ Mineral Prospectivity Analysis ▶ Minimum Maximum Autocorrelation Factors ▶ Multidimensional Scaling ▶ Multivariate Analysis ▶ Multivariate Data Analysis in Geosciences, Tools ▶ Predictive Geologic Mapping and Mineral Exploration ▶ Random Forest ▶ Spatial Analysis ▶ Stationarity ▶ Variogram
Bibliography
Aitchison J (1986) The statistical analysis of compositional data. Chapman and Hall, New York, p 416
BC Geological Survey (1996) British Columbia mineral deposit profiles. BC Geological Survey. http://cmscontent.nrs.gov.bc.ca/geoscience/PublicationCatalogue/Miscellaneous/BCGS_MP-86.pdf. Accessed Nov 2019
BC Geological Survey (2019) MINFILE BC mineral deposits database. BC Ministry of Energy, Mines and Petroleum Resources. http://MINFILE.ca. Accessed Nov 2019
Cheng Q, Agterberg FP (1994) The separation of geochemical anomalies from background by fractal methods. J Geochem Explor 51:109–130
Cheng Q, Zhao P (2011) Singularity theories and methods for characterizing mineralization processes and mapping geo-anomalies for mineral deposit prediction. Geoscience Frontiers 2(1):67–79
Cheng Q, Xu Y, Grunsky EC (2000) Integrated spatial and spectrum analysis for geochemical anomaly separation. Nat Resour Res 9(1):43–51
Comon P (1994) Independent component analysis: a new concept? Signal Process 36:287–314
Cox MAA (2001) Multidimensional scaling. Chapman and Hall, New York
Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barcelo-Vidal C (2003) Isometric logratio transformations for compositional data analysis. Math Geol 35(3):279–300
Greenacre M, Grunsky E, Bacon-Shone J, Erb I, Quinn T (2022) Aitchison's Compositional Data Analysis 40 Years On: A Reappraisal. arXiv:submit/4116835 [stat.ME] 13 Jan 2022
Grunsky EC (2010) The interpretation of geochemical survey data. Geochem Explor Environ Anal 10(1):27–74. https://doi.org/10.1144/1467-7873/09-210
Grunsky EC (2012) Editorial, special issue on spatial multivariate methods. Math Geosci 44(4):379–380
Grunsky EC, Arne D (2020) Mineral-resource prediction using advanced data analytics and machine learning of the QUEST-South stream-sediment geochemical data, southwestern British Columbia, Canada. Geochem Explor Environ Anal. https://doi.org/10.1144/geochem2020-054
Grunsky EC, de Caritat P (2019) State-of-the-art analysis of geochemical data for mineral exploration. Geochem Explor Environ Anal. Special issue from Exploration 17, October, 2017, Toronto, Canada. https://doi.org/10.1144/geochem2019-031
Hariharan S, Tirodkar S, Porwal A, Bhattacharya A, Joly A (2017) Random forest-based prospectivity modelling of greenfield terrains using sparse deposit data: an example from the Tanami Region, Western Australia. Nat Resour Res 26:489–507. https://doi.org/10.1007/s11053-017-9335-6
Jolliffe IT (1986) Principal component analysis. Springer, Berlin. https://doi.org/10.1007/b98835
McInnes L, Healy J, Melville J (2020) UMAP: Uniform Manifold Approximation and Projection for dimension reduction. arXiv:1802.03426v3 [stat.ML], 18 Sep 2020
McKinley JM, Hron K, Grunsky EC, Reimann C, de Caritat P, Filzmoser P, van den Boogaart KG, Tolosana-Delgado R (2016) The single component geochemical map: fact or fiction? J Geochem Explor 162:16–28. ISSN 0375-6742. https://doi.org/10.1016/j.gexplo.2015.12.005
Mueller U, Tolosana-Delgado R, Grunsky EC, McKinley JM (2020) Biplots for compositional data derived from generalised joint diagonalization methods. Appl Comput Geosci. https://doi.org/10.1016/j.acags.2020.100044
Palarea-Albaladejo J, Martín-Fernández JA, Buccianti A (2014) Compositional methods for estimating elemental concentrations below the limit of detection in practice using R. J Geochem Explor 141:71–77
Pearce TH (1968) A contribution to the theory of variation diagrams. Contrib Mineral Petrol 19:142–157
Pebesma EJ (2004) Multivariable geostatistics in S: the gstat package. Comput Geosci 30:683–691
Switzer P, Green A (1984) Min/max autocorrelation factors for multivariate spatial imaging, Technical report no. 6. Department of Statistics, Stanford University, Stanford
Thompson M, Howarth RJ (1976a) Duplicate analysis in practice – part 1. Theoretical approach and estimation of analytical reproducibility. Analyst 101:690–698
Thompson M, Howarth RJ (1976b) Duplicate analysis in practice – part 2. Examination of proposed methods and examples of its use. Analyst 101:699–709
van der Maaten LJP, Hinton GE (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Computer-Aided or Computer-Assisted Instruction Madhurima Panja and Uttam Kumar Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India
Definition “Computer-aided or computer-assisted instruction” (CAI) refers to instruction or remediation presented on a computer, i.e., it is an interactive technique whereby a computer is used to present the instructional material and monitor the learning that takes place. The computer has many purposes in the classroom, and it can be utilized to help a student in all areas of the curriculum. CAI refers to the use of the computer as a tool to facilitate and improve learning. Many educational computer programs are available online (through cloud services) and from computer stores and textbook companies; these tools enhance instructional qualities in several ways. Computer programs are interactive and enhance the learning process by utilizing a combination of text, graphics, vivid animation, sound, and demonstrations. Unlike a teacher-led program, computers offer a different range of activities and allow the students to progress at their own pace. Students are also provided the option of working either individually or in a group. A computer can train the student on the same topic rigorously until they master it. They also provide a facility for immediate feedback so that the students do not continue to practice the wrong skills. Computers capture the students’ attention and engage the students’ spirit of competitiveness to increase their scores.
Introduction The earliest CAI, Pressey’s multiple-choice machine (Benjamin 1988), was invented in 1925. The main purpose of this machine was to present instructions, test the user, provide feedback to the user based on their responses, and record each attempt as data. Since then, several attempts have been made by researchers to improve Pressey’s multiplechoice machine and use it in various applications. Computer-assisted or computer-aided instruction (CAI) has been a term of increasing significance during the last few decades and can also be referred to as computer-based instruction (CBI), computer-aided or computer-assisted learning (CAL). In general, CAI can be described as the set of instructions provided by a computer program that enhance the learning experience of users. However, the keyword for
understanding CAI is interaction. Computers can facilitate interaction during the learning process in several ways; for example, the user might interact with learning material that has been programmed systematically. Alternatively, there might be interaction between the user and the tutor; the computer might even host peer interaction or interaction between members of whole “virtual” learning communities. The concept of user interaction with content was initially introduced in the 1980s, and it forms the primary reason behind the rapid success of CAI. During the last few decades, the explosion of technological advancements has allowed reliable and inexpensive communication that facilitates interaction between humans via computer programs; the massive open online course (MOOC) is a popular example of CAI.
Types of CAI Programs Several types of software, viz., drill and practice, tutorial, simulations, problem-solving, gaming, and discovery are used for providing the course content in a CAI program (Blok et al. 2002). A brief description of these software is provided next: • In drill-and-practice programs, the student rehearses different elements of teaching and develops related skills. This program is highly user-friendly and is one of the most prevalent types of computer software for many years. This type of software relies heavily on positive reinforcement, i.e., a reward follows a correct response. For example, if a drill and practice program was designed to help users to learn multiplication tables, then the drill might be presented with a car race game and the refueling of the car would depend on the correct answer provided by the student on the multiplication question. Drill-andpractice software deals primarily with lower-order thinking skills and also have a tracking facility that records the progress of the user. Moreover, this software does not utilize the full power of a computer and hence it can be used continuously to provide students with practice sessions and evaluate them. • The use of “computers as tutors” is as widespread as that of drill-and-practice programs. The tutorial programs are specifically designed to teach certain concepts or skills and then assess the student’s understanding of such concepts by providing them the opportunity to practice. This type of program tries to act like a simulating teacher. The tutorial sessions are usually very interactive and involve explanations, questions, feedback, as well as correctives. Students are also provided with computerized tutorials that can be accessed multiple times until they become adept in those concepts.
• A simulation is a representation of real-world event, object or phenomenon. Simulation attracts the students’ attention by setting up reality in classroom where the learners can see the results of their action. This feature of CAI provides great advantages for the science curriculum. With the help of simulation, experiments that are difficult, time–consuming, and costly could be realized in the classroom. Simulations also help in mathematics by producing visualizations of three-dimensional objects, which is not possible otherwise. This is a very powerful application of computers, and the educational community has the potential to capitalize on this type of software. Simulation software are now supported by artificial intelligence, and virtual and augmented reality. • Problem-solving is a type of CAI program that allows the user to manipulate the variables, and feedback is provided based on these manipulations. The major difference between a problem-solving software and a simulation software is that the former does not necessarily utilize realistic scenarios like the latter one. • Gaming-based CAI program is the most stimulating use of computers. These programs are a kind of simulation that offer competitive games from academic and nonacademic background to its users. The games are based upon various themes, e.g., some games might encourage their users to win individually, or some might give opportunity to the users to work as a team and explore the environment with other team members. However, irrespective of the theme based on which the game is designed, the primary objective behind these games is to produce valuable learning environment. • Discovery is another approach used in a CAI program to engage its users for skill development. In this approach, the users are provided with a large database of information regarding a specific content area and are asked to analyze, compare, and evaluate their findings based on explorations of the database.
Features of CAI CAIs have served the purpose of enhancing students’ learning by affecting cognitive processes and increasing motivation. There are certain factors by which computer programs can facilitate this learning experience. • Personalized information – CAI captures learner’s interest in a given task by providing them with personalized information. For example, if a CAI program is allotted the task of explaining the concept of big data to the users, then it will use several animated objects to demonstrate the concepts and thereby increase learning by decreasing the
cognitive load on the learner’s memory. This will allow the student to perform search and recognition processes and make more informational relationships (Fig. 1). • Trial sessions – CAI encourages its learners to practice challenging problems. These practice sessions are intrinsically motivating and also carry other significant advantages such as personal satisfaction, challenge, relevance, and promotion of a positive perspective on a lifelong journey.
Types of Student Interactions in CAI 1. Recognition – In recognition type of interactions, the students are only required to identify whether or not a particular study material or question presented by a machine has been demonstrated previously. 2. Recall – This type of interaction requires the student to do more than just recognizing the information previously presented. The students are expected to reproduce the information precisely in a similar manner as was done previously by the machine. 3. Reconstruction – The reconstruction type of interaction does not engage the learners superficially like that of recognition and recall. In this type of interaction, the learners are required to reproduce concepts or principles that have been previously learnt. A basic difference between recall and reconstruction is that the latter does not show any relevant information except the question to the users when they are constructing the answers. 4. Intuitive understanding – The most difficult type of interaction between a CAI program and a student is intuitive understanding. These interactions often involve prolonged activity and are directed at “getting a feel” for an idea, developing sophisticated pattern-recognition skills, or developing a sense of strategy. Here, subject matter understanding must be demonstrated through experimental analysis and will be judged accordingly by experts. In this interaction, it is not possible to evaluate the students’ performance by some explicit criteria stored in the computer. The activities that are used for assessing the intuitive understanding of learners include discovering principles behind simulations, problem-solving using classical techniques, and developing a feel for diagnostic strategies. 5. Constructive understanding – Unlike the other interactions mentioned above, constructive understanding is extremely open-ended and encourages the students to “create” knowledge. The learners are usually provided with an “open” enquiry and are expected to do genuine research instead of just providing solutions to questions on the content that is already known. From the users’ point of view, she/he will be going beyond her/his knowledge and
Computer-Aided or ComputerAssisted Instruction, Fig. 1 Demonstration of a learner using a CAI program © [Sujit Kumar Chakrabarti.]
may be testing her/his own hypothesis, developing new theory, and drawing conclusions based on her/his own work.
How Is CAI Implemented? Teachers are primarily responsible to review the computer programs or games or online activities before they are introduced to the students. The mentors should understand the context of the lessons in any CAI program and then determine their usefulness. The CAI program must satisfy some basic requirements. Firstly, the program should be able to supplement the lesson and give basic skills to the learners. Secondly, the content should be presented in such a manner that the student will remain interested in the topic. However, it must be ensured that the students don’t waste a substantial amount of time with too much animation in a content. Finally, the program must be at the correct level for the class or the individual students, and should neither be too simple nor too complex. Once these requirements are fulfilled, the teachers might allow the students to operate these computer programs to enhance their learning experience.
Advantages of CAI There are several advantages of using a CAI program as listed below: 1. CAI programs are self-pacing; they allow the students to learn at their own pace, totally unaffected by the performance of others. Owing to its self-directed learning nature, the learners have the freedom to choose when, where, and what to learn. It is also very useful for slow learners. 2. Information is presented in a structured manner in CAI software, which is useful when there exists a hierarchy of facts and rules on the subject. 3. CAI requires active participation from its users, which is not necessarily true while reading a book or attending a lecture. 4. The immediate feedback mechanism of CAI software provides a clear picture to the student about their progress. Learners can identify the subject areas in which they have improved and in which they need to improve further. 5. While teaching a concept, the CAI programs enable students to experiment with different options and explore the results after any manipulations, thereby reducing the time taken to comprehend difficult concepts.
Computer-Aided or Computer-Assisted Instruction, Fig. 2 Three-dimensional surface generation in GRASS GIS with a simulation tool
6. CAI offers a wide range of experience that is otherwise not available to the student. It enables the students to understand concepts clearly using various techniques such as animation, blinking, and graphical displays. CAI software acts as multimedia that helps to understand difficult concepts through a multisensory approach. 7. CAI provides individual attention to each and every student. Additionally, a lot of practice sessions are also given to the students that can prove useful, especially for low-aptitude ones. 8. CAI programs are great motivators and encourage the students to take part in challenging activities, thus enhancing their reasoning and decision-making abilities.
Limitations of CAI
Apart from the abovementioned advantages, there are certain restrictions of CAI software as described next:
1. Although CAI programs allow experimentation through simulation, the hands-on experience is missing. These programs cannot develop manual skills like handling apparatus in a chemical experiment, working with machines in a physical laboratory, workshop experience, etc.
2. There are real costs associated with the development of CAI systems. They require dedicated time from programmers to design such software as well as sophisticated infrastructure for deployment.
3. The course content of any CAI program needs to be updated frequently; otherwise, it will become obsolete.
4. Since the objective and method of teaching decided by a teacher may be different from that of a CAI program designer, the program might fail to fulfill the expectations of the teacher.
5. Teachers might be reluctant about using a computer in teaching and may be unwilling to spend extra time on the preparation, selection, and use of CAI packages. It may be perceived as a threat to their job.
6. The overuse of multimedia in CAI software may divert the attention of the student from the course.
7. CAI packages depend upon the configuration of the hardware; previously built packages might become completely useless after a major hardware upgradation.

Application of CAI in Geospatial Analysis
In geospatial science, CAI programs are very useful in the simulation of real-world scenarios. There are many free and open-source software that provide such simulation and visualization capabilities. Geographic Resources Analysis Support System (GRASS) GIS (https://grass.osgeo.org/) is a software built for vector and raster geospatial data
management, geoprocessing, spatial modeling, and visualization released under the GNU General Public License. It has programs to render three-dimensional views from contours at a given spatial location that allow users to generate interpolated surface of the terrain and simulate the environment. Figure 2a shows contours and Fig. 2b, c shows interpolated raster (digital elevation model) of a small region from a larger terrain shown in Fig. 2d. Three-dimensional surface generation in GRASS GIS with a simulation tool is shown in Fig. 2e. The dataset used to generate the figures are from the North Carolina sample open data available for GRASS GIS users (https://grass.osgeo.org/download/data/). These types of simulations can be realized in a classroom environment without actually visiting the spatial point of interest. A fly-through can be generated and stored in the form of popular video formats for multimedia display.
Conclusion This chapter was a brief overview of the features and usage of CAI. Nowadays, with the rapid technological development, it is almost impossible to keep away from electronic devices like laptops and smartphones. However, with the help of CAI software, students can be motivated to use electronic devices for their greater good. CAI provides the learners with the freedom to experiment with different options and get instant feedback. Another crucial feature of CAI is its one-to-one interaction, which not only provides individual attention to the students but also allows them to learn at their own pace. This assistance from technology provides opportunities to the teachers to work with students who need more of their time. It is also a proven concept that the animations that a CAI package offers enable the students to learn difficult concepts quite easily and are useful in virtual laboratories. With all these great features that CAI has to offer to its users, it is always recommended to use this technology under proper supervision, otherwise, the learning becomes too mechanical. If all these factors are taken into consideration, one might appropriately declare that CAI has the potential to transform the educational system of our society (Bayraktar 2001).
Cross-References ▶ Artificial Intelligence in the Earth Sciences ▶ Big Data ▶ Cloud Computing and Cloud Service ▶ Pattern Recognition
Bibliography Bayraktar S (2001) A meta-analysis of the effectiveness of computerassisted instruction in science education. J Res Technol Educ 34(2): 173–188 Benjamin L (1988) A history of teaching machines. Am Psychol 43(9):703 Blok H, Oostdam R, Otter M, Overmaat M (2002) Computer-assisted instruction in support of beginning reading instruction: a review. Rev Educ Res 72(1):101–130
Concentration-Area Plot Behnam Sadeghi EarthByte Group, School of Geosciences, University of Sydney, Sydney, NSW, Australia Earth and Sustainability Science Research Centre, University of New South Wales, Sydney, NSW, Australia
Definition Element contour maps are representative tools to discriminate the anomalous areas from background areas using optimum thresholds. Such thresholds could be defined via the target element concentration-area (C-A) log-log plot. Such an idea was developed as the C-A fractal model, by Cheng et al. (1994), which is the first fractal/multifractal model applied in geochemical anomaly classification.
Introduction Classification models, represented as classified maps, are the main tools to discriminate anomalous areas of mineralization from background areas. To expose the subtle patterns (e.g., geochemical patterns) associated with various mineralization and lithologies, a variety of mathematical and statistical models have been developed and applied to the available data. Traditional statistical methods, advanced geostatistical models and simulations, fractal/multifractal models, machine learning, deep learning, and artificial intelligence are some of the models that have been developed (Sadeghi 2020). Fractal geometry was proposed by Mandelbrot (1983); later, in 1992, it was taken into further consideration in geosciences by Bölviken et al. – see the ▶ “Fractal Geometry in Geosciences” and ▶ “Singularity Analysis” entries for further details. Then, Cheng et al. (1994) developed the first geochemical anomaly classification fractal model, called concentration-area (C-A). The C-A model is simply based on the relation between the target element concentration and the area confined by the relevant concentration contours (Cheng et al. 1994; Zuo and
Wang 2016; Sadeghi 2020, 2021; Sadeghi and Cohen 2021). In this entry, the C-A fractal model and its log-log plot will be studied in detail. After that, several other fractal models, particularly concentration-volume (C-V) (Afzal et al. 2011) and concentration-concentration (C-C) (Sadeghi 2021) fractal models, which were developed based on the C-A model, will be discussed.
Methodology: Concentration-Area Fractal Model
The C-A fractal model, as one of the important fractal models, describes the spatial distribution of geochemical data based on the spatial relationship of geochemical concentrations to the occupied areas (Cheng et al. 1994):

A(r \le u) \propto r^{-a_1}; \quad A(r > u) \propto r^{-a_2}

where A(r) denotes the area occupied by concentrations greater than a concentration contour value r; u represents a threshold concentration contour; and a_1 and a_2 are characteristic exponents. The thresholds defined by the C-A fractal model are cross-points of lines fitted, by the Least Squares (LS) method, through log-log plots of A versus r. One question that may come to mind is about the relation between the fractal geometry, log-log plots (e.g., log-log plots of A versus r), fitting straight lines, and calculating their slopes (fractal dimensions). Such slopes represent the exponents of the power-law relation in the C-A model equation (Zuo and Wang 2016). The answer, in summary, is that if we take the logarithm (e.g., log_{10}) of both sides of the C-A fractal model equation, the result is the equation of a straight line:

Y = mX + C

where X and Y are the axis coordinates of the points generating the lines, m denotes the slope of the lines, and C is a constant. Therefore, analysts need to fit straight lines to the fractal log-log plots to find their cross-points as the desired thresholds. The main method implemented to do so is the LS fitting method (Carranza 2009), as discussed below.

The LS method, proposed by Carl Friedrich Gauss (1795) (Stigler 1981; Bretscher 1995), is a regression analysis used for both linear and nonlinear regression. It minimizes the sum of the squared differences (residuals) between the actual values and the fitted values predicted by a model. The data pairs are (x_i, y_i), i = 1, \ldots, n, where x_i is an independent variable and y_i is a dependent variable. If the model function is f(x, \beta), written for a general linear model with m parameters as f(x_i, \beta) = \sum_{j=1}^{m} X_{ij}\beta_j, the ith residual is calculated as given below (cf. Stigler 1981; Bretscher 1995; Strutz 2011):

r_i = y_i - f(x_i, \beta) = y_i - \sum_{j=1}^{m} X_{ij}\beta_j

So the sum of the squared residuals is:

S = \sum_{i=1}^{n} r_i^2

If \beta_0 is the y intercept and \beta_1 is the slope of the straight line, the model function is given by:

f(x, \beta) = \beta_0 + \beta_1 x

If the sum of the squared residuals, S, is a minimum, the LS fit is optimum. So, in order to obtain S_{min}, its gradient should be set to 0, meaning:

\frac{\partial S}{\partial \beta_j} = 2 \sum_{i=1}^{n} r_i \frac{\partial r_i}{\partial \beta_j} = 0, \quad (j = 1, \ldots, m)

Then, the derivatives are:

\frac{\partial r_i}{\partial \beta_j} = -X_{ij}

so,

\frac{\partial S}{\partial \beta_j} = -2 \sum_{i=1}^{n} \left( y_i - \sum_{k=1}^{m} X_{ik}\beta_k \right) X_{ij}, \quad (j = 1, \ldots, m)

If \hat{\beta} minimizes S, we will have:

2 \sum_{i=1}^{n} \left( y_i - \sum_{k=1}^{m} X_{ik}\hat{\beta}_k \right)(-X_{ij}) = 0, \quad (j = 1, \ldots, m)

and now the normal equations are obtained:

\sum_{i=1}^{n} \sum_{k=1}^{m} X_{ij} X_{ik} \hat{\beta}_k = \sum_{i=1}^{n} X_{ij} y_i, \quad (j = 1, 2, \ldots, m)

The matrix notation of this equation is:

X^{T} X \hat{\beta} = X^{T} y

where X^T is the matrix transpose of X. Another significant parameter in the straight-line fitting using the LS technique that should be taken into account is the determination coefficient (R^2), which is the square of the correlation coefficient (R) between predicted values and actual values; so, the range of R^2 is between 0 and 1. It is considered a key output of regression analysis. In other words, it is the proportion of the variance in the dependent variable that is predictable from the independent variable. The reason why the coefficient is important is because a linear
relation is considered strong or weak depending on whether the points on a scatterplot lie close to or far from the fitted straight line. Therefore, instead of simply stating that there is a strong or weak relationship between the variables, analysts must apply a measure of how strong or weak the association is. If R^2 = 0, the dependent variable cannot be predicted from the independent variable; if R^2 = 1, the dependent variable can be predicted from the independent variable without error; and if R^2 lies between 0 and 1, it denotes the extent to which the dependent variable is predictable. For instance, if R^2 = 0.10, then 10% of the variance in Y is predictable from X. The formula of R (whose square gives R^2) for a linear regression model with one independent variable is (Caers 2011):

$$R = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

where n is the number of samples or points on the scatter plot; x_i and y_i are the x and y values of the ith sample; $\bar{x}$ and $\bar{y}$ are the means of the x and y values; and s_x and s_y are the standard deviations of x and y.
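A minimal numerical sketch of the workflow just described may help fix ideas: compute the area A(r) above a series of concentration contours from a gridded map, fit two straight lines to the log-log plot by least squares, and take their cross-point as the threshold. The synthetic grid, the choice of exactly two line segments, and the break-point search below are illustrative assumptions, not part of the original method description.

```python
import numpy as np

# Illustrative synthetic concentration grid (an assumption, not real data)
rng = np.random.default_rng(0)
grid = rng.lognormal(mean=3.0, sigma=0.4, size=(200, 200))
grid[80:120, 80:120] = rng.lognormal(mean=4.5, sigma=0.3, size=(40, 40))  # anomalous patch
cell_area = 1.0  # area represented by one grid cell

# C-A relation: area occupied by concentrations >= contour value r
contours = np.quantile(grid, np.linspace(0.05, 0.995, 40))
areas = np.array([(grid >= r).sum() * cell_area for r in contours])
logr, logA = np.log10(contours), np.log10(areas)

def two_segment_fit(x, y):
    """Fit two straight lines by LS; pick the break minimizing the total RSS."""
    best = None
    for k in range(5, len(x) - 5):                 # candidate break positions
        m1, c1 = np.polyfit(x[:k], y[:k], 1)       # LS fit, left segment
        m2, c2 = np.polyfit(x[k:], y[k:], 1)       # LS fit, right segment
        rss = (np.sum((y[:k] - (m1 * x[:k] + c1)) ** 2)
               + np.sum((y[k:] - (m2 * x[k:] + c2)) ** 2))
        if best is None or rss < best[0]:
            best = (rss, m1, c1, m2, c2)
    _, m1, c1, m2, c2 = best
    x_cross = (c2 - c1) / (m1 - m2)                # intersection of the two fitted lines
    return m1, m2, x_cross

m1, m2, x_cross = two_segment_fit(logr, logA)
threshold = 10 ** x_cross                          # C-A threshold separating background/anomaly
r2 = np.corrcoef(logr, logA)[0, 1] ** 2            # overall coefficient of determination
print(f"slopes (fractal exponents): {-m1:.2f}, {-m2:.2f}")
print(f"C-A threshold: {threshold:.1f}, overall R^2: {r2:.2f}")
```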
Case Study: Sweden
Sadeghi and Cohen (2019) applied the C-A model to Swedish till geochemical data collected by the Geological Survey of Sweden (Andersson et al. 2014) (Fig. 1). The thresholds obtained were used to generate the classified volcanogenic massive sulfide (VMS) Cu geochemical anomaly map (Fig. 2). The main anomalies are concentrated in the Caledonides and Northern Norrbotten metallogenic provinces.
Summary of the Other Models Developed Based on the C-A Fractal Model Several other models have been developed based on the C-A model such as the 3D format of C-A, called concentration-
Concentration-Area Plot, Fig. 1 Swedish till sample locations, and the IDW interpolated map of Cu. (From Sadeghi 2020; Sadeghi and Cohen 2021)
Concentration-Area Plot, Fig. 2 VMS Cu geochemical anomalies recognized by C-A fractal modeling of till data in Sweden. (From Sadeghi and Cohen 2019)
Concentration-Area Plot, Fig. 3 (a) Highly, (b) moderately, and (c) weakly mineralized zones, in addition to (d) wall rocks, and (e) the cross-section of the mineralized zones, characterized by C-V fractal model. (From Sadeghi et al. 2012)
volume (C-V) (Afzal et al. 2011), spectrum-area (S-A) (Cheng et al. 2000) and its 3D format, called power spectrum-volume (P-V) (Afzal et al. 2012), and concentration-concentration (C-C) (Sadeghi 2021) fractal models. For the S-A and P-V fractal models, see the ▶ "Spectrum-Area Method" entry. Here, the C-V and C-C fractal models are introduced. The C-V fractal model was developed by Afzal et al. (2011) to define different mineralization zones in 3D:
$$V(r \le u) \propto r^{-a_1}; \qquad V(r > u) \propto r^{-a_2}$$

where V(r) denotes the volume occupied by concentrations greater than r. In Fig. 3, Sadeghi et al. (2012) applied the C-V model to the Fe data of the Zaghia iron ore deposit, Central Iran, to characterize different mineralized zones. Sadeghi (2020, 2021) proposed the C-C fractal model as a bivariate modification of the C-A model (Fig. 4):
Concentration-Area Plot, Fig. 4 The C-C model applied to Cyprus soil data to classify Cu and In geochemical anomalies based on each other. (From Sadeghi 2021)
$$C_2(C_1) = F\, C_1^{-D}$$

where C_1 is the target element concentration; C_2(C_1) is the cumulative concentration of the element that is highly correlated with the target element, taken for concentration values of the target element higher than or equal to C_1; F is a constant; and D is the fractal dimension. Unlike the C-A model, in the C-C model the element C_2 is not measured over the whole area; rather, it is measured in a subspace containing the subareas in which the C_1 values exceed a certain value.
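As a rough illustration of how a C-C log-log plot might be assembled, consider the sketch below. The synthetic data, the element names, and the particular pairing rule (cumulating the correlated element over the sub-sample where the target element exceeds each threshold) are assumptions chosen for demonstration and are not taken from the published procedure of Sadeghi (2021).

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic, positively correlated element concentrations (assumed data)
cu = rng.lognormal(mean=3.0, sigma=0.6, size=5000)                 # target element C1
inn = 0.02 * cu * rng.lognormal(mean=0.0, sigma=0.2, size=5000)    # correlated element

# For each threshold C1, cumulate the correlated element over the subspace
# where the target concentration is >= C1 (one plausible reading of C2(C1))
thresholds = np.quantile(cu, np.linspace(0.05, 0.99, 30))
c2_cum = np.array([inn[cu >= t].sum() for t in thresholds])

log_c1, log_c2 = np.log10(thresholds), np.log10(c2_cum)
slope, intercept = np.polyfit(log_c1, log_c2, 1)   # straight line on the log-log plot
print(f"apparent fractal dimension D ~ {-slope:.2f}")
```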
Summary and Conclusions
Fractal/multifractal models provide robust tools to characterize mineralization zones and to discriminate geochemical anomalies from the background using the obtained thresholds. The C-A fractal model was the first fractal model to be applied efficiently to geochemical anomaly classification. It works on the basis of the relation between element concentrations and the contour areas they occupy. It has been applied in various geological studies and subsequently modified into several other models, such as the C-V model in 3D, the S-A model and its 3D format, the P-V model, and, more recently, the C-C fractal model.
Cross-References ▶ Fractal Geometry in Geosciences ▶ Singularity Analysis ▶ Spectrum-Area Method
Bibliography Afzal P, Fadakar Alghalandis Y, Khakzad A, Moarefvand P, Rashidnejad Omran N (2011) Delineation of mineralization zones in porphyry Cu deposits by fractal concentration-volume modeling. J Geochem Explor 108:220–232 Afzal P, Fadakar Alghalandis Y, Moarefvand P, Rashidnejad Omran N, Asadi Haroni H (2012) Application of power-spectrum-volume fractal method for detecting hypogene, supergene enrichment, leached and barren zones in Kahang Cu porphyry deposit, Central Iran. J Geochem Explor 112:131–138 Andersson M, Carlsson M, Ladenberger A, Morris G, Sadeghi M, Uhlbäck J (2014) Geochemical atlas of Sweden. Geological Survey of Sweden (SGU), Uppsala, p 208 Bölviken B, Stokke PR, Feder J, Jössang T (1992) The fractal nature of geochemical landscapes. J Geochem Explor 43:91–109 Bretscher O (1995) Linear algebra with applications, 3rd edn. Prentice Hall, Upper Saddle River Caers JK (2011) Modeling uncertainty in earth sciences. Wiley, Hoboken Carranza EJM (2009) Geochemical anomaly and mineral prospectivity mapping in GIS. In: Handbook of exploration and environmental geochemistry, vol 11. Elsevier, Amsterdam, p 368 Cheng Q, Agterberg FP, Ballantyne SB (1994) The separation of geochemical anomalies from background by fractal methods. J Geochem Explor 51:109–130 Cheng Q, Xu Y, Grunsky E (2000) Integrated spatial and spectrum method for geochemical anomaly separation. Nat Resour Res 9:43–52 Mandelbrot BB (1983) The fractal geometry of nature, 2nd edn. Freeman, San Francisco Sadeghi B (2020) Quantification of uncertainty in geochemical anomalies in mineral exploration. PhD thesis, University of New South Wales Sadeghi B (2021) Concentration–concentration fractal modelling: a novel insight for correlation between variables in response to changes in the underlying controlling geological–geochemical processes. Ore Geol Rev. https://doi.org/10.1016/j.oregeorev.2020.103875 Sadeghi B, Cohen D (2019) Selecting the most robust geochemical classification model using the balance between the geostatistical
precision and sensitivity. In: International Association for Mathematical Geology (IAMG) conference, State College Sadeghi B, Cohen D (2021) Category-based fractal modelling: a novel model to integrate the geology into the data for more effective processing and interpretation. J Geochem Explor. https://doi.org/10.1016/j.gexplo.2021.106783 Sadeghi B, Moarefvand P, Afzal P, Yasrebi AB, Daneshvar Saein L (2012) Application of fractal models to outline mineralized zones in the Zaghia iron ore deposit, Central Iran. J Geochem Explor 122:9–19 Stigler SM (1981) Gauss and the invention of least squares. Ann Stat 9(3):465–474 Strutz T (2011) Data fitting and uncertainty. Vieweg+Teubner Verlag/Springer, Berlin Sun H (1992) A general review of volcanogenic massive sulfide deposits in China. Ore Geol Rev 7:43–71 Zuo R, Wang J (2016) Fractal/multifractal modeling of geochemical data: a review. J Geochem Explor 164:33–41
Constrained Optimization Deeksha Aggarwal and Uttam Kumar Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India
Definition
Constrained optimization is the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables. In general, the objective function is either minimized, in the case of a cost or energy function, or maximized, in the case of a reward or utility function. The constraints defined on the variables can be hard constraints, which set strict limits on the values the variables may take, or soft constraints, which penalize the objective function if certain conditions on the variables are not satisfied.
Introduction
Optimization problems arise in most real-world economic decisions, which are generally subject to one or more constraints. Some real-world examples of such economic decisions are:
1. When buying goods or services, consumers decide what to buy, constrained by the fact that the purchase must be affordable and within their budget.
2. Industries often make decisions to maximize their profit subject to limited production capacity and limited costs, while also maintaining the quality of goods or services above required standards.
Basic Nomenclature
A set S ⊆ R^n is called:
• Affine if x, y ∈ S implies θx + (1 − θ)y ∈ S for all θ ∈ R.
• Convex if x, y ∈ S implies θx + (1 − θ)y ∈ S for all θ ∈ [0, 1].
• A convex cone if x, y ∈ S implies λx + μy ∈ S for all λ, μ ≥ 0.

A function f : R^n → R is called:
• Affine if f(θx + (1 − θ)y) = θf(x) + (1 − θ)f(y) for all θ ∈ R and x, y ∈ R^n.
• Convex if f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) for all θ ∈ [0, 1] and x, y ∈ R^n.

General form: A constrained optimization problem has the following general form:

$$\min f(x) \quad \text{such that} \quad g_i(x) = c_i, \; i = 1, \ldots, n, \; c_i \in \Re \; \text{(equality constraints)}; \quad h_j(x) \le d_j, \; j = 1, \ldots, m, \; d_j \in \Re \; \text{(inequality constraints)}$$

Here, f(x) is the objective function that needs to be optimized (here, minimized) such that the hard constraints g_i and h_j are satisfied.
Convexity: This problem is called a convex problem if the objective function f is convex, the inequality constraints h_j(x) are convex, and the equality constraints g_i(x) are affine. Convex analysis is the study of the properties of convex functions and convex sets, with applications in convex minimization.
Solution methods: Several methods have been proposed to solve constrained optimization problems. Some of these methods are described below:

(i) Substitution method: For simple objective functions with two or three variables and equality constraints only, this method is a convenient way to obtain the solution. A composite function is formed by substituting the equality constraint into the objective function, as shown below. Let

$$\min f(x) = x \cdot y \quad \text{such that} \quad y = x - 11$$

Hence, substituting y in f(x) we get:

$$f(x) = x(x - 11) = x^2 - 11x$$

According to the first-order necessary condition, ∂f/∂x = 0. Therefore,

$$\frac{\partial f}{\partial x} = 2x - 11 = 0 \quad \Rightarrow \quad x = 5.5$$

(ii) Lagrange multiplier: This method is used to minimize or maximize the objective function subject to one or more equality constraints only. To solve the problem, a Lagrangian function is defined that relates the gradient of the objective function to the gradients of the equality constraints, which leads to a reformulation of the original problem. For the objective function f(x) and equality constraint g(x), the Lagrangian is

$$L(x, \lambda) = f(x) + \lambda g(x)$$

Here, λ ∈ ℜ is the Lagrange multiplier, and at an optimum x* the stationarity condition ∇_x L(x*, λ) = ∇f(x*) + λ∇g(x*) = 0 holds.
First-order condition: Consider a differentiable function f of one variable defined on an interval F = [a, e]. If an interior point x* is a local/global minimizer, then f′(x*) = 0; if the left endpoint x* = a is a local minimizer, then f′(a) ≥ 0; if the right endpoint x* = e is a local minimizer, then f′(e) ≤ 0. The first-order necessary condition (FONC) summarizes the three cases by a unified set of optimality/complementary slackness conditions: a ≤ x* ≤ e, f′(x*) = y_a − y_e, y_a ≥ 0, y_e ≥ 0, y_a(x* − a) = 0, y_e(x* − e) = 0. Note: if f′(x*) = 0, it is also necessary that f be locally convex at x* for x* to be a local minimizer.
Second-order condition: If the first-order condition is satisfied and f″(x*) > 0, that is, the function f is strictly locally convex at x*, then x* is a local minimizer.
Duality: Consider the Lagrange multiplier theorem for the objective function f(x) and equality constraint g(x), where L(x, λ) = f(x) + λg(x), and define the Lagrange dual function D(λ) = inf_{x ∈ ℜ^n} L(x, λ). A multiplier λ is dual feasible if λ ≥ 0 and D(λ) is finite. The dual function D is always concave. Weak duality holds: D(λ) ≤ f(x) for any feasible x.
(iii) Karush–Kuhn–Tucker (KKT) conditions: Constrained optimization with objective function having
both inequality and equality constraints can be solved using a more generalized method (compared with the Lagrange method) called the Karush-Kuhn-Tucker (KKT) conditions. Similar to the Lagrange method, the KKT approach solves the constrained optimization problem using the Lagrange function L(x, λ), whose optimal point is a saddle point. Some other popular optimization algorithms, such as gradient descent, simulated annealing, and local search, are based on approximation methods for finding a local or global minimum of the objective function. As one of the most popular optimization algorithms, gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable objective function. The idea is to take repeated steps in the direction opposite to the gradient of the function at the current point, because this is the direction of steepest descent. To use the gradient descent algorithm in the presence of constraints, the projected gradient descent algorithm was proposed, in which each iterate is projected back onto the feasible set defined by the constraints.
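The projected gradient idea mentioned above can be sketched in a few lines. The quadratic objective and the simple box constraint below are illustrative choices made for this sketch; they are not taken from the entry, and the projection step here is simply a clipping operation because the feasible set is a box.

```python
import numpy as np

def f_grad(v):
    x, y = v
    return np.array([2.0 * (x - 2.0), 4.0 * (y - 1.0)])   # gradient of (x-2)^2 + 2(y-1)^2

def project_box(v, lo=0.0, hi=1.0):
    return np.clip(v, lo, hi)                              # projection onto the feasible box

v = np.array([0.0, 0.0])
step = 0.1
for _ in range(200):
    v = project_box(v - step * f_grad(v))                  # gradient step, then projection

print(v)   # converges to (1.0, 1.0), the minimizer of the box-constrained problem
```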
Constrained Optimization Application in Geospatial Science Now, let us see one geospatial application of constrained optimization in land cover (LC) classification. Most land cover features occur at finer spatial scales compared to the resolution of primary remote sensing satellites. Therefore, observed data are a mixture of spectral signatures of two or more LC features resulting in mixed pixels (see Fig. 1); the mixed pixel problem has been commonly observed in hyperspectral remote sensing data. In other words, the observed data are mixture of spectral signatures of the individual objects present in mixed pixels, which could be due to two
Constrained Optimization, Fig. 1 A mixed pixel comprised of three land cover types
reasons: (1) the spatial resolution of the sensor is not high enough to separate different objects, which can then jointly occupy a single pixel, so that the resulting spectral measurement is a composite of the individual spectra; (2) mixed pixels may also result when distinct objects are combined into homogeneous (intimate) mixtures (Plaza et al. 2004). One of the solutions to the mixed pixel problem is the use of spectral unmixing techniques such as the linear mixture model (LMM), or linear unmixing, to decompose a pixel spectrum into its constituent spectra. Here, the observation vector in the low spatial resolution imagery is modeled as a linear combination of the endmembers and, given the endmembers present in the scene, the fractional abundances are extracted through ordinary least squares error approaches. In most cases, the fractional abundances are constrained to be positive and to sum to one (Heinz and Chang 2001). LMM infers a set of pure object spectral signatures (called endmembers) and the fractions of these endmembers (abundances) in each pixel of the image; that is, the objects are present with relative concentrations weighted by their corresponding abundances. The endmembers can be either derived from the image pixels using endmember extraction algorithms or obtained from an endmember spectral library available a priori. In LMM, the observation vector y for each pixel is related to the endmember signature matrix E by a linear model as

$$y = E\alpha + \eta \qquad (1)$$

where each pixel is an M-dimensional vector y whose components are the digital numbers corresponding to the M spectral bands; E = [e_1, ..., e_{n-1}, e_n, e_{n+1}, ..., e_N] is an M × N matrix, where N is the number of classes and {e_n} is a column vector representing the spectral signature of the nth target material; and η accounts for the measurement noise. For a given pixel, the abundance of the nth target material present in the pixel is denoted by α_n, and these values are the components of the N-dimensional abundance vector α. Further, the components of the noise vector η are zero-mean random variables that are independent and identically distributed (i.i.d.). Therefore, the covariance matrix of the noise vector is σ^2 I, where σ^2 is the variance and I is the M × M identity matrix. LMM renders an optimal solution in unconstrained, partially constrained, or fully constrained form. A partially constrained model imposes either the abundance nonnegativity constraint (ANC) or the abundance sum-to-one constraint (ASC), while a fully constrained model imposes both. ANC restricts the abundance values from being negative, and ASC confines the sum of the abundances of all the classes to unity. The conventional approach to extract the abundance values is to minimize ||y − Eα|| as in (2):

$$\alpha_{UCLS} = (E^{T}E)^{-1} E^{T} y \qquad (2)$$
which is termed the unconstrained least squares (UCLS) estimate of the abundance. UCLS with full additivity is a nonstatistical, nonparametric algorithm that optimizes a squared-error criterion but does not enforce the nonnegativity and unity conditions. To avoid deviation of the estimated abundance fractions, the ANC given in (3) and the ASC given in (4) are imposed on the model in fully constrained least squares (FCLS):

$$\alpha_n \ge 0 \quad \forall n : 1 \le n \le N \qquad (3)$$

$$\sum_{n=1}^{N} \alpha_n = 1 \qquad (4)$$

ANC and ASC constrain the value of the abundance in any given pixel between 0 and 1. When only ASC is imposed on the solution, the sum-to-one constrained least squares (SCLS) estimate of the abundance is

$$\alpha_{SCLS} = (E^{T}E)^{-1}\Big(E^{T}y - \mathbf{1}\,\frac{\lambda}{2}\Big), \qquad (5)$$

where

$$\lambda = \frac{2\big(\mathbf{1}^{T}(E^{T}E)^{-1}E^{T}y - 1\big)}{\mathbf{1}^{T}(E^{T}E)^{-1}\mathbf{1}}. \qquad (6)$$

The SCLS solution may have negative abundance values, but they add to unity. FCLS (Chang 2003; Heinz and Chang 2001) extends the non-negative least squares (NNLS) algorithm (Lawson and Hanson 1995) to minimize ||Eα − y|| subject to α ≥ 0 by including ASC in the signature matrix E through a new signature matrix (S_ME) defined by

$$S_{ME} = \begin{bmatrix} \theta E \\ \mathbf{1}^{T} \end{bmatrix} \qquad (7)$$

with 1 = (1 1 1 ... 1)^T (N times) and

$$s = \begin{bmatrix} \theta y \\ 1 \end{bmatrix} \qquad (8)$$

θ in (7) and (8) regulates the ASC. Using these two equations, the FCLS algorithm is directly obtained from NNLS by replacing the signature matrix E with S_ME and the pixel vector y with s. ANC is a major difficulty in solving constrained linear unmixing problems, as it forbids the use of Lagrange multipliers. One solution is the replacement of α_n ≥ 0, ∀ n : 1 ≤ n ≤ N, with the absolute ASC (AASC), Σ_{n=1}^{N} |α_n| = 1 (Chang 2003). AASC allows the use of a Lagrange multiplier along with the exclusion of negative abundance values. This leads to an optimal constrained least squares solution satisfying both ASC and AASC, known as modified FCLS (MFCLS). MFCLS utilizes the SCLS solution, and the algorithm terminates with all nonnegative components (Kumar et al. 2017).

Case Study
Problem: Solve the constrained optimization problem below by applying the KKT conditions to the objective function f(x, y) subject to the constraint functions h_1(x, y) and h_2(x, y):

$$\min f(x, y) = (x - 2)^2 + 2(y - 1)^2$$
subject to:
$$h_1(x, y): \; x + 4y \le 3 \qquad (A)$$
$$h_2(x, y): \; -x + y \le 0 \qquad (B)$$

Solution: Applying the generalized Lagrangian function:
$$L(x, y, \lambda_1, \lambda_2) = f(x, y) + \lambda_1 h_1(x, y) + \lambda_2 h_2(x, y)$$

$$L(x, y, \lambda_1, \lambda_2) = (x - 2)^2 + 2(y - 1)^2 + \lambda_1(x + 4y - 3) + \lambda_2(-x + y)$$

with λ_1 ≥ 0 and λ_2 ≥ 0. Calculating the partial derivatives and writing the KKT conditions for the optimal point:

$$\frac{\partial L}{\partial x} = 2(x - 2) + \lambda_1 - \lambda_2 = 0 \qquad (9)$$

$$\frac{\partial L}{\partial y} = 4(y - 1) + 4\lambda_1 + \lambda_2 = 0 \qquad (10)$$

$$\lambda_1 (x + 4y - 3) = 0 \qquad (11)$$

$$\lambda_2 (-x + y) = 0 \qquad (12)$$

$$x + 4y \le 3 \qquad (13)$$

$$-x + y \le 0 \qquad (14)$$

$$\lambda_1 \ge 0 \qquad (15)$$

$$\lambda_2 \ge 0 \qquad (16)$$
We need to check four possible cases as shown below:
Case I: λ_1 = 0 and λ_2 = 0. Using Eqs. (9) and (10), we have x = 2 and y = 1; these values violate the primal feasibility condition (13), since x + 4y = 6 > 3. So this is not a feasible (optimal) solution.
Case II: Primal constraint A is inactive and primal constraint B is active, that is, λ_1 = 0 and −x + y = 0. Substituting into Eqs. (9) and (10) gives 2(x − 2) − λ_2 = 0 and 4(y − 1) + λ_2 = 0. Solving these two equations, we have x = 4/3, y = 4/3, λ_2 = −4/3. The value of λ_2 violates condition (16), so this is not an optimal solution.
Case III: Primal constraint A is active and primal constraint B is inactive, that is, x + 4y = 3 and λ_2 = 0. Substituting into Eqs. (9) and (10) gives 2(x − 2) + λ_1 = 0 and 4(y − 1) + 4λ_1 = 0. Solving these two equations together with x + 4y = 3, we have x = 5/3, y = 1/3, λ_1 = 2/3. These values of x, y, λ_1, and λ_2 satisfy all conditions (9)-(16), so this case gives the optimal solution.
Case IV: Both primal constraints A and B are active, that is, x + 4y = 3 and −x + y = 0. Solving these two equations, we have x = 3/5, y = 3/5. Substituting into Eqs. (9) and (10) gives λ_1 = 22/25 and λ_2 = −48/25. The value of λ_2 violates condition (16), so this is not an optimal solution.
Hence, the only possible solution of the problem is Case III. Therefore,

Optimal solution: x = 5/3, y = 1/3
Optimal value: f(x, y) = (5/3 − 2)^2 + 2(1/3 − 1)^2 = 1
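As a quick numerical cross-check of the case study, the same optimum can be recovered with a general-purpose solver. The use of scipy.optimize with the SLSQP method and the particular starting point are illustrative choices, not part of the original text.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda v: (v[0] - 2.0) ** 2 + 2.0 * (v[1] - 1.0) ** 2

constraints = [
    {"type": "ineq", "fun": lambda v: 3.0 - v[0] - 4.0 * v[1]},  # x + 4y <= 3
    {"type": "ineq", "fun": lambda v: v[0] - v[1]},              # -x + y <= 0
]

res = minimize(f, x0=np.array([0.0, 0.0]), method="SLSQP", constraints=constraints)
print(res.x, res.fun)   # approximately [1.667, 0.333] and 1.0, matching Case III
```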
Summary and Conclusions
In this entry, we presented some basic solution methods for constrained optimization problems. Solution methods such as the substitution method, Lagrange multipliers, and the KKT conditions were discussed in detail, with a worked case study. Furthermore, global optimality conditions can be checked by the KKT conditions, and the dual problem provides a lower bound and an optimality gap.
Cross-References ▶ Convex Analysis ▶ Hyperspectral Remote Sensing ▶ Least Squares ▶ Linear Unmixing ▶ Ordinary Least Squares
Bibliography
Chang C-I (2003) Hyperspectral imaging techniques for spectral detection and classification. Kluwer Academic/Plenum Publishers, New York David Lay, Steven Lay Judi McDonald, Linear Algebra and its Applications Fletcher R (2013) Practical methods of optimization. Wiley Gilbert Strang, Linear Algebra and its Applications, 4th Edition Heinz DC, Chang C-I (2001) Fully constrained least squares linear spectral mixture analysis method for material quantification in hyperspectral imagery. IEEE Trans Geosci Remote Sens 39(3):529–545 Kennith Hoffman Ray Kunze, Linear Algebra, 2nd Edition Kumar U, Ganguly S, Nemani RR, Raja KS, Milesi C, Sinha R, Michaelis A, Votava P, Hashimoto H, Li S, Wang W, Kalia S, Gayaka S (2017) Exploring subpixel learning algorithms for estimating global land cover fractions from satellite data using high performance computing. Remote Sens 9(11):1105 Lawson L, Hanson RJ (1995) Solving least squares problems. SIAM, Philadelphia, PA Plaza J, Martinez P, Pérez R, Plaza A (2004) Nonlinear neural network mixture models for fractional abundance estimation in AVIRIS Hyperspectral Images. Proceedings of the NASA Jet Propulsion Laboratory AVIRIS Airborne Earth Science Workshop, Pasadena, California, March 31–April 2, 2004
Convex Analysis
Indu Solomon and Uttam Kumar
Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India

Definition
Convex analysis is a subfield of mathematical optimization; the word meaning of "optimal" is "best," and optimization techniques strive to find the best solution. Convex analysis studies the properties of convex functions and convex sets and aims to minimize convex functions over convex sets. It has applications in various fields such as automatic control systems, signal processing, deep learning, data analysis and model building, finance, statistics, and economics.

Introduction
In the beginning of the 1960s, the works of the mathematicians Ralph Tyrrell Rockafellar and Jean Jacques Moreau brought about great advances in the field of convex analysis. An example of optimization in the field of economics could be profit maximization or cost minimization. As system complexity increases, analytical solutions become infeasible, and the optimal solution can be reached only through convex analysis and optimization. The goal of convex analysis is to determine x*, such that f(x*) ≤ f(x) (Au 2007) for all other points x. Loss functions used in machine learning techniques are convex in nature, and hence the best model parameters are estimated through optimization techniques like gradient descent. Linear programming is a special case of convex optimization where the objective function is linear and the constraints consist of linear equalities and inequalities. An optimization problem can be defined as finding x that satisfies

$$\text{minimize} \; f_0(x) \quad \text{subject to} \quad f_i(x) \le 0, \; i = 1, \ldots, m; \qquad g_j(x) = 0, \; j = 1, \ldots, p \qquad (1)$$

where f_0 : R^n → R is the objective function to be optimized, x ∈ R^n is the n-dimensional variable, f_i(x) are inequality constraints, and g_j(x) are equality constraints of the objective function. A valid solution exists if the optimization variable x is a feasible point: x has to satisfy the m inequality constraints f_i : R^n → R and the p equality constraints g_j : R^n → R. A convex optimization problem is one that satisfies an extra requirement, that is, all functions {f_0, ..., f_m} are convex functions and {g_1, ..., g_p} are affine functions.

Affine Functions
A function f : R^n → R^m is affine if it is the sum of a linear function and a constant. It has the form f(x) = Ax + b, where A ∈ R^{m×n} and b ∈ R^m.

Affine Sets
A set C ⊆ R^n is an affine set if the line through any two points in C lies in C: for any x_1, x_2 ∈ C and θ ∈ R, θx_1 + (1 − θ)x_2 ∈ C (Boyd et al. 2004). Therefore, the set C must contain all linear combinations of points in C, provided the coefficients θ_i in the linear combination sum to one. Examples are a line, a plane, and a hyperplane.

Convex Sets
Any set C is convex if the line segment between any two points in C lies in C. For any x_1, x_2 ∈ C and any θ with 0 ≤ θ ≤ 1, the combination satisfies

$$\theta x_1 + (1 - \theta) x_2 \in C. \qquad (2)$$

Every affine set is also convex. A convex combination of points in C is the weighted average of the points, with θ_i as the weight for the point x_i. Examples of convex and non-convex sets are shown in Fig. 1.

Convex Analysis, Fig. 1 Convex sets and nonconvex sets
Convex Functions
A function f : R^n → R is convex if dom f is a convex set and if, for all x, y ∈ dom f and θ with 0 ≤ θ ≤ 1,

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y) \qquad (3)$$

where dom f is the domain of the function f, i.e., the subset of points x ∈ R^n for which f is defined (Boyd et al. 2004). The criterion in Eq. (3) is geometrically explained in Fig. 2: the function f lies below the line segment connecting any two points on the function.

Convex Analysis, Fig. 2 Convex function criteria

First-Order Conditions
If the function f is differentiable and ∇f exists at each point in dom f, then f is convex if and only if dom f is convex and

$$f(y) \ge f(x) + \nabla f(x)^{T} (y - x) \qquad (4)$$

for all x, y ∈ dom f. Figure 3 is the geometrical representation of the inequality described in Eq. (4): a convex function always lies above its tangent. Hence, the tangent to a convex function is its global underestimator, and if we have local information about the function, we can derive global information.

Convex Analysis, Fig. 3 First-order condition

Second-Order Conditions
Suppose f is twice differentiable and its Hessian (second derivative) exists at each point in dom f. Then the function f is convex if and only if dom f is convex and its Hessian is positive semi-definite for all x ∈ dom f:

$$\nabla^2 f(x) \succeq 0 \qquad (5)$$

The condition ∇^2 f(x) ⪰ 0 can be geometrically interpreted as requiring the function f(x) to have nonnegative curvature. The gradient ∇f of f(x) with x ∈ R^n is given by

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)^{T} \qquad (6)$$

The Hessian ∇^2 f of f(x) with x ∈ R^n is given by

$$\nabla^2 f = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix} \qquad (7)$$

Example 1: f = x^2 + xy + y^2

$$\nabla f = \begin{pmatrix} 2x + y \\ x + 2y \end{pmatrix}, \qquad \nabla^2 f = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$$

The eigenvalues of ∇^2 f are λ = 1, 3, and all eigenvalues are > 0. Hence, ∇^2 f is positive semi-definite and the second-order conditions of convexity are satisfied (Fletcher 2013). Figure 4 shows the 3D plot of f = x^2 + xy + y^2.

Convex Analysis, Fig. 4 3D plot of the convex function f = x^2 + xy + y^2

Example 2: f = x^2 + 3xy + y^2

$$\nabla f = \begin{pmatrix} 2x + 3y \\ 3x + 2y \end{pmatrix}, \qquad \nabla^2 f = \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}$$

The eigenvalues of ∇^2 f are λ = 5, −1, and not all eigenvalues are ≥ 0. Hence, ∇^2 f is not positive semi-definite and the second-order conditions of convexity are not satisfied. Figure 5 shows the 3D plot of f = x^2 + 3xy + y^2.

Convex Analysis, Fig. 5 3D plot of the nonconvex (saddle-shaped) function f = x^2 + 3xy + y^2
Operations that Preserve Convexity
Non-negative weighted sums: If f is a convex function and α ≥ 0, then αf is a convex function.
Sum: If f_1 and f_2 are convex functions, then their sum f_1 + f_2 is also convex. A non-negative weighted sum of convex functions is convex.
Point-wise maximum: If {f_1, f_2, ..., f_n} are convex functions, then f = max{f_1, f_2, ..., f_n} is convex.

Duality
The Lagrangian
Consider the optimization problem defined in Eq. (1). Lagrangian duality is used to modify the objective function by including the constraints. The updated objective function is a weighted sum of the constraints with the existing objective function. The Lagrangian L is defined as L : R^n × R^m × R^p → R,

$$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \qquad (8)$$

In Eq. (8), λ_i is the Lagrange multiplier associated with the ith inequality constraint f_i(x) ≤ 0, and ν_i is the Lagrange multiplier associated with the ith equality constraint h_i(x) = 0.

The Lagrange Dual Function
The Lagrange dual function g : R^m × R^p → R is the minimum value of the Lagrangian over x: for λ ∈ R^m and ν ∈ R^p,

$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) = \inf_{x \in D} \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \Big) \qquad (9)$$

The dual function is a pointwise infimum of a family of affine functions, and hence it is concave. The dual function gives a lower bound on the optimal value p*: for any λ ⪰ 0 and any ν, g(λ, ν) ≤ p*. Let x̃ be a feasible point; then f_i(x̃) ≤ 0, h_i(x̃) = 0, and λ_i ≥ 0. Mathematically,

$$\sum_{i=1}^{m} \lambda_i f_i(\tilde{x}) + \sum_{i=1}^{p} \nu_i h_i(\tilde{x}) \le 0 \qquad (10)$$

Then Eq. (9) becomes

$$L(\tilde{x}, \lambda, \nu) = f_0(\tilde{x}) + \sum_{i=1}^{m} \lambda_i f_i(\tilde{x}) + \sum_{i=1}^{p} \nu_i h_i(\tilde{x}) \le f_0(\tilde{x}) \qquad (11)$$

Therefore,

$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) \le L(\tilde{x}, \lambda, \nu) \le f_0(\tilde{x}) \qquad (12)$$

Every feasible point x̃ satisfies g(λ, ν) ≤ f_0(x̃), and hence for every feasible point g(λ, ν) ≤ p*. For a nontrivial lower bound on p*, λ ⪰ 0 and (λ, ν) ∈ dom g.

Example: Least Squares Solution
Minimize x^T x subject to Ax = b. The problem has equality constraints, and the Lagrangian is given by L(x, ν) = x^T x + ν^T(Ax − b) on the domain R^n × R^p. The dual function is given by g(ν) = inf_x L(x, ν). The optimality condition is

$$\nabla_x L(x, \nu) = 2x + A^{T}\nu = 0 \qquad (13)$$

which gives x = −(1/2)A^T ν. Hence, the dual function can be written as g(ν) = −(1/4)ν^T A A^T ν − b^T ν, and the dual function is a concave quadratic function with domain R^p. The lower-bound property of the dual function for any ν ∈ R^p gives

$$-\frac{1}{4}\nu^{T} A A^{T} \nu - b^{T}\nu \;\le\; \inf \{ x^{T} x \mid Ax = b \} \qquad (14)$$

Examples of Optimization Techniques in Geospatial Applications
Spatial data analysis and linear programming can be combined to obtain an optimal solution to optimal-location problems. An optimal geographical location for a desired building (a restaurant, mall, park, or anything of interest) can be found by setting up the mathematical model with the right optimization criteria (Guerra and Lewis 2002). A convex optimization technique was used for hyperspectral image visualization (Cui et al. 2009); maintaining the spectral distances, spectral discrimination, and interactive visualization were the three objectives of that technique, and a combination of principal component analysis and linear programming was used for mapping the spectral signatures of pixels to RGB colors. A cat swarm optimization technique was used for feature extraction from satellite images (Prabhakaran et al. 2018); the cat-optimized algorithm distinguishes the inner, outer, and extended boundaries along with the land cover. Examples of convex analysis in geospatial applications are terrain analysis (Prabhakaran et al. 2018), landslide susceptibility mapping (Pham et al. 2019), and interactive hyperspectral visualization (Cui et al. 2009).

Summary and Conclusions
In this entry, we have explored mathematical optimization and especially convex optimization. Optimization concepts like affine sets, convex sets, and convex functions were explained. The first-order and second-order optimality conditions for a convex function require that the Hessian of the function be positive semi-definite for the function to be convex. The duality concept combines the function and the constraints into a single equation with the help of the Lagrange multipliers. Optimization concepts are useful in geospatial applications, and a few examples were mentioned in this entry.

Cross-References
▶ Constrained Optimization
▶ Hyperspectral Remote Sensing
▶ Optimization in Geosciences

Bibliography
Au JKT (2007) An ab initio approach to the inverse problem-based design of photonic bandgap devices (Doctoral dissertation, California Institute of Technology) Boyd S, Boyd SP, Vandenberghe L (2004) Convex optimization. Cambridge university press Cui M, Razdan A, Hu J, Wonka P (2009) Interactive hyperspectral image visualization using convex optimization. IEEE Trans Geosci Remote Sens 47(6):1673–1684 Fletcher R (2013) Practical methods of optimization Guerra G, Lewis J (2002) Spatial optimization and gis. Locating and optimal habitat for wildlife reintroduction. McGill University Prabhakaran N, Ramakrishnan SS, Shanker NR (2018) Geospatial analysis of terrain through optimized feature extraction and regression model with preserved convex region. Multimed Tools Appl 77(24): 31855–31873 Pham BT, Prakash I, Chen W, Ly HB, Ho LS, Omidvar E et al (2019) A novel intelligence approach of a sequential minimal optimization-based support vector machine for landslide susceptibility mapping. Sustainability 11(22):6323
Copula in Earth Sciences Sandra De Iaco and Donato Posa Department of Economics, Section of Mathematics and Statistics, University of Salento, Lecce, Italy
Synonyms Multivariate distribution; Spatial dependence
Definition
A copula is a multivariate distribution, defined on the unit hypercube, which is characterized by uniform marginals. In particular, it describes the dependence structure of a multivariate distribution separately from its marginal distributions. Recalling the result of Sklar (1959), any multivariate distribution F(z_1, z_2, ..., z_n) can be written as a copula of its one-dimensional marginal distributions F_i(z_i), i = 1, 2, ..., n, that is,

$$F(z_1, \ldots, z_n) = C(F_1(z_1), F_2(z_2), \ldots, F_n(z_n)) \qquad (1)$$
Moreover, the function C in (1) is unique, if the marginal distributions are assumed to be continuous.
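A small numerical illustration of (1) is sketched below: a bivariate distribution is evaluated by feeding the marginal CDF values into a copula. The choice of a Gaussian copula, the exponential and lognormal marginals, and the parameter values are assumptions made for demonstration only.

```python
import numpy as np
from scipy import stats

rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])

def gaussian_copula_cdf(u1, u2):
    # C(u1, u2) = Phi_2(Phi^{-1}(u1), Phi^{-1}(u2); rho), the Gaussian copula
    z = np.array([stats.norm.ppf(u1), stats.norm.ppf(u2)])
    return stats.multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(z)

# Assumed marginal distributions for two regionalized variables
F1 = stats.expon(scale=2.0).cdf
F2 = stats.lognorm(s=0.5).cdf

z1, z2 = 1.5, 1.2
F_joint = gaussian_copula_cdf(F1(z1), F2(z2))   # F(z1, z2) = C(F1(z1), F2(z2)), as in (1)
print(F_joint)
```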
Introduction
Quality parameters in the Earth Sciences are characterized by significant spatial variability. In geostatistics, the evaluation of this variability is based on the use of the variogram. In addition, copulas provide a valuable means of describing the dependence structure of multivariate distributions; in particular, bi-dimensional copulas can be an alternative to the traditional measures of spatial correlation (i.e., variograms and covariance functions) in describing spatial variability. Thus, copula-based models are widely utilized in stochastic interpolation (as an alternative to kriging) or for simulation.
Measures of Spatial Variability Traditionally, the spatial variability of various variables encountered in Earth Sciences are analyzed by using geostatistical methods (Matheron 1971). For geostatistical studies, spatial observations, for a given variable Z of interest, are assumed to be a finite realization of a random function (or random field), which is usually supposed to satisfy the
hypothesis of second-order stationarity or at least the hypothesis of intrinsic stationarity. Under these conditions, the covariance function or the variogram of Z at two different locations x1 and x2 of the spatial domain, depends on the spatial separating vector h ¼ x1 x2 and can be estimated from the available sample data. Thus, the spatial variability or spatial correlation is well described by these moments, however they cannot highlight potential differences in the spatial dependence related to extreme values or to central values, since they represent the spatial correlation obtained as an integral over the whole distribution of the variable values. Regarding this aspect, it is well known that different quantiles can be characterized by a different spatial dependence structure (Journel and Alabert 1989). For example, extreme values can have a spatial dependence structure which is different from the one related to central values. Indicator variograms represent a way to assess the difference in dependence with respect to the values of the variable under study. For this reason, the indicator approach used in simulation and interpolation studies requires that variogram models are fitted for each selected threshold. However, an indicator variogram model is fitted for each threshold separately and it is not applied any stochastic model. Therefore the results might be not consistent; the monotonicity of the estimated indicator variogram (corresponding to the estimated distribution function) for the various cutoff values might not be guaranteed. With the support of the indicator variograms, Journel and Alabert (1989) pointed out that the dependence between variables often deviates significantly from the Gaussian. Another advantage in using indicator variogram emerges when data have skewed distributions, as usually happened in Earth Sciences applications. In these cases, the estimation of the variogram (which is based on the squared differences between the values measured at different locations) might be severely influenced by few anomalous squares. These problems can be mitigated by adopting the indicator approach, with the effort of computing several indicator variograms. In alternative, another technique to be used to face the same problem is to consider the ranks for interpolation, as illustrated in Journel and Deutsch (1996). In the following, the concept of copulas and its help in the study of the spatial variability of earth quality variables is underlined. In particular, it is clarified that the dependence structure between random variables can be represented through copulas with no information on the marginal distributions. In other terms, copulas is interpreted as the key measure of the dependence over the range of quantiles.
Some Fields of Application Nowadays, an emerging interest in applying copulas in the spatial context is noticeable. There are numerous applications
of copulas in finance, where it is important to study the dependence between extremes (Embrechts et al. 2002) and to adopt appropriate non-Gaussian type of dependence for the estimation of financial risks. The non-Gaussian structure of dependence can also be found in other contexts, such as in hydrology, for groundwater modeling (Gomez and Wen 1998), where the consequences of the departure from normality on some aspects regarding ground water flow and transport were analyzed. Copulas were also applied on stochastic rainfall simulation or on extreme value statistics, among other by De Michele and Salvadori (2003); Favre et al. (2004).
Basic Concepts
As already specified, copulas are multivariate distributions defined on the unit hypercube with uniform marginals. In particular, for n = 2 the definition of a bivariate copula (or bi-dimensional copula) C can be given as follows:

$$C : [0, 1]^2 \to [0, 1] \qquad (2)$$

and C has to fulfill the following properties:

I. C(u, 0) = C(0, u) = 0 and C(1, u) = C(u, 1) = u;
II. given u_1 ≤ v_1 and u_2 ≤ v_2, then C(u_1, u_2) + C(v_1, v_2) − C(u_1, v_2) − C(v_1, u_2) ≥ 0.

This last property guarantees the non-negativity of the probability associated with any rectangle in the unit square. For an n-dimensional space, the multivariate copula is a function

$$C : [0, 1]^n \to [0, 1] \qquad (3)$$

which is characterized by uniform marginals,

$$C(u^{(i)}) = u_i, \quad \text{if} \quad u^{(i)} = (1, \ldots, 1, u_i, 1, \ldots, 1),$$

and is zero if at least one of the arguments is zero, that is,

$$C(u) = 0, \quad \text{if} \quad u = (u_1, \ldots, u_i, \ldots, u_n) \quad \text{with} \quad \exists i : u_i = 0.$$

For every n-dimensional hypercube in the domain [0, 1]^n (called the unit hypercube), that is, for [u', v'] ⊆ [0, 1]^n, the associated probability has to be nonnegative:

$$V_C([u', v']) \ge 0,$$

where $V_C([u', v']) = \Delta_{u'_n}^{v'_n} \Delta_{u'_{n-1}}^{v'_{n-1}} \cdots \Delta_{u'_1}^{v'_1} C(u)$ is the n-difference of C on [u', v'], and a first-order difference is defined as $\Delta_{u'_k}^{v'_k} C(u) = C(u_1, \ldots, u_{k-1}, v'_k, u_{k+1}, \ldots, u_n) - C(u_1, \ldots, u_{k-1}, u'_k, u_{k+1}, \ldots, u_n)$. The reader may refer to Joe (1997) or Nelsen (1999) for more information.

Some Features of Copulas
If C is assumed to be an absolutely continuous function, the corresponding copula density, denoted by c(·), is expressed as

$$c(u_1, \ldots, u_n) = \frac{\partial^n C(u_1, \ldots, u_n)}{\partial u_1 \cdots \partial u_n}. \qquad (4)$$

In addition, it is worth introducing the conditional copula, which can be written as

$$C(u_1 \mid U_2 = u_2, \ldots, U_n = u_n) = \frac{1}{c(u_2, \ldots, u_n)} \, \frac{\partial^{n-1} C(u_1, \ldots, u_n)}{\partial u_2 \cdots \partial u_n} \qquad (5)$$
186
or in terms of its density cðu1 , u2 Þ ¼ cð1 u1 , u2 Þ :
ð7Þ
Note that the symmetric property is given with respect to the axis u2 ¼ 1 u1 of the unit square.
Copula Families Building new families of copulas is not an easy task. Traditional copula families include the Archimedean class (Genest and MacKay 1986) as well as the Farlie-GumbelMorgenstern class (Nelsen 1999). Moreover, the Gaussian copulas is another well-known family of copulas which is used to explain the dependence form of the multivariate normal distribution. They are frequently used in hydrology, although the term copula is not popular; they are applied for regressions, or in case of normal score transformation. In Table 1, some well-known families of bivariate copula are given together with their parameter space and the dependence structure. Note that the normal copula class presents a symmetric density, which can be properly expressed as follows: cðu1 , . . . , un Þ ¼ kn exp 0:5zT S1 I z
ð8Þ
where ui ¼ F(zi), F is the standardized normal distribution function, kn is a normalizing constant, S is the correlation matrix, and I is called identity matrix.
For n ¼ 2, S depends on the only parameter r, which is the correlation coefficient between the two variables. A graphical representation of the bivariate Gaussian copula together with the associated copula density given r ¼ 0.85 is shown in Fig. 1. Note that the symmetry of the density, as specified in (7). It is also worth highlighting that, for the user, the dependence can be better visualized through the density rather than the copula itself. Similarly to the Gaussian family, several other bivariate copula classes have multivariate extensions, which are desirable in spatial modeling. However, these families are considered not flexible enough, since they depend only on one or two parameters. Moreover, it might happen that the multivariate distribution has different pairwise dependence structures across its margins, as a consequence a single family might not be able to capture this behavior. These restrictions can be get through with vine copulas, as will be clarified in one of the next sections. The introduction of a bivariate spatial copula into a vine copula for interpolation has been described by Gräler and Pebesma (2011) and is extended in Gräler (2014), where convex combinations of bivariate copulas parametrized by distance are combined in a vine copula (also known as pair-copula construction).
Copulas for Describing Spatial Variability In spatial statistics, copulas characterize the joint multivariate distribution associated to the regionalized variables at different locations of the domain of interest. In this context, the variable at each location of the domain is assumed to have the same probability distribution. Let Z ¼ {Z(x), x D} be a random function, where D represents the spatial domain (one-, two-, or three-
Copula in Earth Sciences, Table 1 Some well-known families of copulas Name Gaussian
Copula CN(u1, u2; r) ¼ F(F1(u1), F(u2))
Parameter r
Student-t
1 CST(u1, u2; r, v) ¼ T(t1 v (u1), tv (u2))
r, v
Gumbel
CG(u1, u2; δ) ¼ exp {[( log u1)δ þ ( log u2)δ]1/δ}
δ1
Rotated Gumbel Plackett
CRG(u1, u2; δ) ¼ u1 þ u2 1 þ CG(1 u1, 1 u2; δ)
δ1
CP ðu1 , u2 ; yÞ ¼
1 ½1 þ ðy 1Þ ðu1 þ u2 Þþ 2ðy 1Þ
θ 0, θ 6¼ 1
Dependence structure Tail independence: lU ¼ lL ¼ 0 Symmetric tail dependence: l U ¼ lL ¼ p 2tvþ1 v þ 1 1 r= 1 þ r Asymmetric tail dependence: lU ¼ 2 21/δ, lL ¼ 0 Asymmetric tail dependence: lU ¼ 0, lL ¼ 2 21/δ Tail independence: lU ¼ lL ¼ 0
½1 þ ðy 1Þ ðu1 þ u2 Þ2 4yðy 1Þu1 u2 Frank
1 CF ðu1 , u2 ; yÞ ¼ log 1 þ þ exp yu1 1 exp yu2 1 = exp y 1 g y
0 2) random variables that optimally honors the bivariate marginals. For this aim, although there are different conditions that ensure the existence of such a copula (Drouet and Kotz 2001), building a multivariate copula (with n > 2), on the basis of given bivariate marginals, is not an easy task and sometimes it is not possible at all on for the incompatibility of the bivariate marginals. Rueschendorf (1985) proposed an interesting method for constructing multivariate distributions and it can be used to define copulas (Drouet and Kotz 2001). Bardossy (2006) started from the selection of a multivariate distribution, then a suitable multivariate copula was found. Indeed, a multivariate copula can be defined through a multivariate distribution, then the multivariate model is obtained through a combination with the univariate marginal of the regionalized variable. On the basis of Sklar’s theorem, given a multidimensional distribution function F, whose margins are absolutely continuous, the Eq. (1) is valid, where C (uniquely determined) is a copula function. A copula function C is determined from the function F by considering 1 1 Cðu1 , u2 , . . . , un Þ ¼ F F1 1 ðu1 Þ, F2 ðu2 Þ, . . . , Fn ðun Þ ,
ð14Þ where Fi, i ¼ 1, 2, ..., n are the corresponding univariate distribution functions of F. On the other hand, any copula C can be applied to combine a set of univariate distribution functions F1, F2, ..., Fn in order to build a multivariate distribution F. This technique is often adopted, for example, in case of multivariate normal or t copula.
Copula in Earth Sciences
189
Various well-known multivariate copula functions only depend on a restricted number of parameters which can cause that the dependence would not change with respect to the spatial distance between two points and that the third property above-mentioned would be violated. The multivariate normal copula and the Farlie-Gumbel-Morgenstern copn ula present parameters. However, the former is 2 symmetric, that is satisfies Eq. (7), the latter has the drawback of showing a decaying correlation between pairs of variables with the increase in the number of points (Drouet and Kotz 2001). This last feature represents a critical aspect especially when the two variables corresponds to very close points x1 and x2 and it is expected to have a high correlation. Thus, this multivariate copula does not respect the first two properties of the above list. Copula models can be built on the basis of Eq. (14) starting from a given multivariate distribution. Moreover, interesting constructions of multivariate copulas can be obtained through non-monotonic transformations from the Gaussian copula; indeed, as underlined in Bardossy (2006), they are asymmetric. For example, given a nonmonotonic function g(t) and an n-dimensional normal random variable Y ~ N(m, Γ), whose expected value is mT ¼ (m,...., m) and covariance matrix Γ, then V is defined for each coordinate j ¼ 1,..., n as Vj ¼ g Yj :
ð15Þ
For simplicity, all marginals are assumed to have unit variance. By using (15), the copula of the transformed multivariate distribution is not necessarily Gaussian. For example, the copula of the non-centered multivariate chi-square distribution can be easily obtained by considering the transformation function g(t) ¼ t2 and V j ¼ Y 2j . In this way, the corresponding n-dimensional distribution has identical onedimensional marginals. In particular, the marginal variables follows a w2 distribution with a degree of freedom equal to 1, under the hypothesis that the expected value is equal to zero (m ¼ 0). Thus, after getting the multivariate density function fn, the multivariate copula density can be determined through the following equality: f n ðz1 , z2 , . . . , zn Þ ¼ cðF1 ðz1 Þ, F2 ðz2 Þ, . . . , Fn ðzn ÞÞ f 1 ðz1 Þ f 2 ðz2 Þ f n ðzn Þ,
ð16Þ
where fi, i ¼ 1, 2, ..., n are the marginal density. The presence of an asymmetric dependence can be assessed by comparing the strength of the dependence of high values and the one related to low values. In particular, the following ratio of the joint probability between high values (exceeding the quantile 1 u) and the joint probability between low values (not exceeding a quantile u):
Að u Þ ¼
2u 1 þ Cð1 u, 1 uÞ Cðu, uÞ
ð17Þ
is such that A(u) > 1 if high values present a stronger dependence than the low values, while A(u) < 1 if high values present a weaker dependence than the low values; the case A(u) ≈ 1 occurs when the dependence is the same. Moreover, as highlighted in Joe (1997), the tail dependence for a given copula offers information about multivariate extreme value distributions, that is about the dependence in the upper and lower quadrants. However, the tail dependence is rarely applied in spatial statistics, since the main focus does not often regard the extremes, but the location where the observations that are greater than certain thresholds are recorded. For the spatial context, the elements of the correlation matrix Γ are the correlation function r(h), where h ¼ xi xj, i ¼ 1, 2, ..., n. Thus, for any set of points x1x2, ..., xn, the generic element of the covariance matrix associated to the Gaussian variable Y, from which the copula is generated, is Γ[ij] ¼ [r(xi xj)]. At this point, the estimation of the correlation function r(h) of Y, on the basis of the observations, has to be faced. However, this is not an easy task since on one hand Y is not an observed variable and on the other hand it cannot be obtained from the measured variable Z because the transformation function is non-monotonic. Regarding modeling, note that Γ has to be a positive semidefinite matrix, then the function r has to be a positive semidefinite function. Nowadays, the family of positive semi-definite models includes a wide range of parametric functions, among which users can choose. The maximum likelihood method as well as the rank correlations and the asymmetry of the copulas can be used for the parameters estimation of the selected correlation model. Maximum likelihood is often applied only for the bivariate marginals rather than for the n-dimensional copula (whose density requires the sum of 2n elements). Indeed, in the former case, the complexity is greatly reduced to the order of n2, since observations are paired for computation; however, the existence of independence among different pairs implicitly assumed is not supported by the model. Fitting the rank correlation is often done by graphical inspection or through least squares, analogously to the variogram fitting. If the least squares method is used, then it is necessary to calculate first the sample rank correlations corresponding to the empirical copulas and the theoretical rank correlation can be computed by using the Spearman’s rank correlation (Joe 1997) r0 ¼ 12
1
1
0
0
uv cðu, vÞdvdu 3:
ð18Þ
Thus, it is minimized, with respect to the parameters of the correlation function, the sum of the squared differences
C
190
Copula in Earth Sciences
between the rank correlations of the theoretical and the empirical copulas plus the squared difference of the asymmetries.
Note that simulating the w2 copula is an easy task, since after simulation of the normal Y, then the transformation according to the equation V j ¼ Y 2j can be performed.
Validation of the Copula Models The goodness of fit of a selected copula model can be assessed by comparing the sample bivariate copulas with respect to the fitted theoretical ones. For this aim, some statistical tests are available in the literature, such as the Kolmogorov-Smirnov distance D1 or a Cramér von Mises type distance D2: D1 ¼ sup jCðu1 , u2 Þ Cðu1 , u2 Þj; ðu1 , u2 Þ ½0, 12 , ð19Þ D2 ¼
1
1
0
0
2
Cðu1 , u2 Þ Cðu1 , u2 Þ du1 du2 :
ð20Þ
However, performing the test statistics is not straightforward since the independence of samples cannot be assumed. Alternatively, different realizations of random functions having the same multivariate copula can be simulated, so that the assessment of the significance can be obtained by using a bootstrap framework (Efron and Tibshirani 1986). Thus, an algorithm of statistical testing based on bootstrap, useful to analyze the suitability of the spatial dependence models, is described hereafter. The validation procedure consists of the following steps: 1. Computation of the sample copulas from the observations on the basis of (12). 2. Parameter estimation of the vector-dependent bivariate copula by using the available sample. 3. Comparison between the sample copula and the theoretical copula, for each distance lag, through the distances D1 and D2. These are indicated with the notation D 1 and D 2 . 4. Fixing the counter i equal to zero. 5. Simulating one realization for the multivariate copula related to the sample points. In other terms, the simulation aims to produce a realization for the defining multivariate distribution, then the corresponding multivariate copula is constructed through the formulation given in (14). 6. Calculation of the sample copulas of the simulated field. 7. Computation of the indexes D1(i) and D2(i), for these copulas. 8. Increasing the counter i by one. 9. Replication of the previous steps 5–8 until the number H of simulations is reached. 10. Ranking of the distances D1(i) and D2(i), i ¼ 1, 2, ..., H, such that Dj(1) ... Dj (H) with j ¼ 1, 2. 11. Rejection of the copula at the level a of significance for the index Dj, if Dj ði0 Þ D j Djði0 þ 1Þ and i0 > H(1 – α).
Vine Copula In spatial statistics, the interpolation approaches are based on different variants of kriging. However, the choice of assuming that the random field is Gaussian might be too limiting with respect to other more flexible probabilistic models. In this case, the theory of copula can be of some help. Gaussian multivariate distributions have two restrictions: elliptical symmetry and tail independence. This last means that joint extreme events are independent and this is not influenced by the covariance model, while the first limitation implies that pair of high values and pair of low values, at two spatial points, have the same level of dependence. In turn, this characteristic generates smooth interpolation when kriging is used. Thus, using the concept of copula and copula families, which are different from the Gaussian, can overcome these restrictions. The idea, known as the pair-copula construction, is to connect bivariate copulas to higherdimensional vine copulas, through vines (Bedford and Cooke 2002). Since the statistical applicability of vine copulas with non-Gaussian building bivariate copulas was recognized by Aas et al. (2009), vine copulas became a standard tool to describe the dependence structure of multivariate data (Aas 2016). Thus, it is worth introducing first the concept of regular vines from Kurowicka and Cooke (2006). Definition 1 (Regular vine) A collection of trees V ¼ (T1, ..., Td–1) is a regular vine on d elements if. 1. T1 is a tree with nodes N1 ¼ {1, ..., d} connected by a set of non-looping edges E1. 2. For i ¼ 2, ..., d, Ti is a connected tree with edge set Ei and node set Ni ¼ Ei, where |Ni| ¼ d (i 1) and |Ei| ¼ d i are the number of edges and nodes, respectively. 3. For i ¼ 2, ..., d 1, 8e ¼ {a,b} Ei: |a \ b| ¼ 1 two nodes a, b Ni are connected by an edge e in Ti if the corresponding edges a and b in Ti share one node (proximity condition). A tree T ¼ (N, E) is an acyclic graph, where N is its set of nodes and E is its set of edges. Acyclic means that there exits no path such that it cycles. In a connected tree we can reach each node from all other nodes on this tree. R-vine is simply a sequence of connected trees such that the edges of Ti are the nodes of Tiþ1. A traditional example of these structures are canonical vines (C-vines) and drawable vines (D-vines) (Aas et al. 2009) in Fig. 2. Every tree of a C-vine is defined by a root node, which has di incoming edges, in each tree Ti,
Copula in Earth Sciences, Fig. 2 (a) C-vine and (b) D-vine representation for d = 5, as given in Aas et al. (2009)
i ∈ {1, ..., d − 1}, whereas a D-vine is solely defined through its first tree, where each node has at most two incoming edges. After this brief review of the basic notion of a vine, it becomes easy to introduce the construction of vine copulas to be used in modeling a random field in space or space-time. In this context, the n-dimensional multivariate distribution to be modeled refers to the given random field evaluated at a set of discrete points x1, ..., xn (which can be spatial locations, temporal points, or spatio-temporal points). Following the approach of Gräler and Pebesma (2011, 2012), bivariate spatial copulas lead to purely spatial, purely temporal, or spatio-temporal multivariate copulas, in contrast with other approaches which have built spatial copulas on the basis of single-family multivariate copulas (Bárdossy 2006; Kazianka and Pilz 2011). Note that a software package, called spcopula, is also available in the R environment. The pair-copula construction is based on the idea that multivariate copulas can be approximated with nested blocks of bivariate copulas. It is worth underlining that the term "approximate" is used because not all copulas can be rebuilt as a vine copula; thus, although vine copulas are multivariate copulas, in some cases they can only approximate the target copula. As is well known, the decomposition of the multivariate copula into nested bivariate blocks is not unique, so different decompositions can be obtained from different regular vines, and the choice of the actual vine will depend on its goodness of fit to the multivariate target copula. For the specific spatial and spatio-temporal context, where there is a neighbourhood that can be used to define the decomposition and a central location with respect to which all initial dependencies can be fixed, the canonical vine copula is a reasonable choice. In Fig. 3, a five-dimensional canonical vine is illustrated, although the conditional cumulative distribution functions are reported without their arguments. The density of the respective pair-copula construction is the product of all bivariate copula densities following the decomposition structure:
c(u_0, ..., u_4) = c_{01}(u_0, u_1) c_{02}(u_0, u_2) c_{03}(u_0, u_3) c_{04}(u_0, u_4) c_{12|0}(u_{1|0}, u_{2|0}) c_{13|0}(u_{1|0}, u_{3|0}) c_{14|0}(u_{1|0}, u_{4|0}) c_{23|01}(u_{2|01}, u_{3|01}) c_{24|01}(u_{2|01}, u_{4|01}) c_{34|012}(u_{3|012}, u_{4|012}),
where the conditional cumulative distribution functions u_{k|v} = F_{k|v}(z_k) correspond to partial derivatives of the copulas involved, as specified below:

u_{k|v} = ∂C_{j,k|v∖j}(u_{j|v∖j}, u_{k|v∖j}) / ∂u_{j|v∖j},    (21)
with v the set of indices of the conditioning variables and v∖j the set of indices excluding j. It is worth recalling that the use of vine copulas is advisable not only to model non-Gaussian random fields Z (where the non-Gaussianity refers to the marginal distributions), but also to adopt a non-Gaussian dependence structure between locations. Then, since every n-dimensional neighbourhood of a random field Z can be interpreted as an n-variate distribution, the vine copulas describing these
Copula in Earth Sciences, Fig. 3 Canonical vine structure for the construction of a five-dimensional spatial random field, as given in Gräler and Pebesma (2011)
neighbourhoods need to capture changing correlations across the domain (often assuming isotropy and stationarity) in order to flexibly reflect different neighbourhood arrangements. To this aim, Gräler (2014) proposed to use, within the vine copula, blocks of bivariate copulas that are convex combinations of well-known bivariate copula families whose parameters depend on distance (in space or in space-time). This leads to spatial or spatio-temporal vine copulas which are distance-aware. Indeed, an earlier contribution by Gräler and Pebesma (2011) highlighted the potential of spatial vine copulas for heavily skewed spatial random fields, and a first attempt to extend the spatial approach to the spatio-temporal one was given in Gräler and Pebesma (2012). In this way, modeling the dependence structure, in terms of strength and shape, between points in space or in space-time is very flexible. Note that the bivariate space or space-time copulas on the first tree essentially recall a Gumbel copula, since it allows for strong dependence in the tails, unlike the Gaussian case. Moreover, considering the marginals together with this vine copula makes a local multivariate probabilistic model of the random field available for prediction at unsampled points. Unlike kriging, where each predictive distribution is always Gaussian, the conditional distributions of the vine copula can assume any functional form and can therefore provide more realistic uncertainty estimates. Modeling the marginals and the dependence structure separately offers a wide range of alternatives in the definition of a random field's distribution. Nevertheless, it is crucial to recover reliable models for both components in order to guarantee good results. Simulation from the modeled random function is also practicable and can be performed with the same package spcopula. For further details on bivariate spatial copulas, where distance is incorporated as a parameter, together with the use of many bivariate copula families, the reader can refer to Gräler (2014).
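As a rough, self-contained illustration of the distance-aware idea (and not of the spcopula package itself), the Python sketch below evaluates a convex combination of a tail-dependent Gumbel copula density and a Gaussian copula density, with a mixing weight that decays with separation distance; the exponential weight function and all parameter values are illustrative assumptions only.

```python
import numpy as np
from scipy.stats import norm

def gumbel_density(u, v, theta):
    """Bivariate Gumbel copula density (upper-tail dependent), theta >= 1."""
    lu, lv = -np.log(u), -np.log(v)
    s = lu**theta + lv**theta
    return (np.exp(-s**(1.0 / theta)) / (u * v) * (lu * lv)**(theta - 1.0)
            * s**(1.0 / theta - 2.0) * (s**(1.0 / theta) + theta - 1.0))

def gaussian_density(u, v, rho):
    """Bivariate Gaussian copula density with correlation rho."""
    x, y = norm.ppf(u), norm.ppf(v)
    return (1.0 / np.sqrt(1.0 - rho**2)
            * np.exp(-(rho**2 * (x**2 + y**2) - 2.0 * rho * x * y)
                     / (2.0 * (1.0 - rho**2))))

def spatial_pair_density(u, v, h, h_range=500.0, theta=2.0, rho=0.5):
    """Convex combination of the two families; the weight of the
    tail-dependent Gumbel member decays with the separation distance h."""
    w = np.exp(-h / h_range)          # illustrative distance-decay weight
    return w * gumbel_density(u, v, theta) + (1.0 - w) * gaussian_density(u, v, rho)

# Example: density of the pair (0.9, 0.95) at 100 m and at 1000 m separation
print(spatial_pair_density(0.9, 0.95, h=100.0), spatial_pair_density(0.9, 0.95, h=1000.0))
```

At short distances the tail-dependent member dominates, while at large distances the combination relaxes toward the Gaussian member.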
Conclusions Nowadays, there are wide areas of research that involve relevant theoretical and computational challenges in facing interpolation or prediction problems, as well as applications in various scientific fields. In particular, some significant lines of research can be found in the construction of new classes of models for spatial and spatio-temporal data, as well as in the analysis of big spatial and spatio-temporal data sets or in machine learning methods developed by computational science researchers.
Cross-References ▶ Markov Random Fields ▶ Random Function ▶ Variogram
Bibliography Aas K (2016) Pair-copula constructions for financial applications: a review. Econometrics 4:43 Aas K, Czado C, Frigessi A, Bakken H (2009) Pair-copula constructions of multiple dependence. Insurance 44:182–198 Bárdossy A (2006) Copula-based geostatistical models for groundwater quality parameters. Water Resour Res 42:W11416. (1–12) Bedford T, Cooke RM (2002) Vines – a new graphical model for dependent random variables. Ann Stat 30(4):1031–1068 De Michele C, Salvadori G (2003) A generalized Pareto intensityduration model of storm rainfall exploiting 2-Copulas. J Geophys Res 108(D2):4067 Drouet-Mari D, Kotz S (2001) Correlation and dependence. Imperial College Press, London Efron B, Tibshirani R (1986) Bootstrap method for standard errors, confidence intervals and other measures of statistical accuracy. Stat Sci 1:54–77
Embrechts PME, McNeil AJ, Straumann D (2002) Correlation and dependency in risk management: properties and pitfalls. In: Dempster M (ed) Risk management: value at risk and beyond. Cambridge University Press, Cambridge, UK, pp 176–223 Favre A-C, El Adlouni S, Perreault L, Thiémonge N, Bobée B (2004) Multivariate hydrological frequency analysis using copulas. Water Resour Res 40:W01101 Genest C, MacKay R (1986) Archimedean copulas and families of bidimensional laws for which the marginals are given. Can J Stat 14:145–159 Gomez-Hernandez J, Wen X (1998) To be or not to be multi-Gaussian? A reflection on stochastic hydrogeology. Adv Water Resour 21:47–61 Gräler B (2014) Modelling skewed spatial random fields through the spatial vine copula. Spat Stat 10:87–102 Gräler B, Pebesma EJ (2011) The pair-copula construction for spatial data: a new approach to model spatial dependency. Procedia Environ Sci 7:206–211 Gräler B, Pebesma EJ (2012) Modelling dependence in space and time with Vine Copulas, expanded abstract collection from ninth international geostatistics congress, Oslo, Norway, June 11–15, 2012. International Geostatistics Congress. http://geostats2012.nr.no/1742830.html Joe H (1997) Multivariate models and dependence concepts. CRC Press, Boca Raton, Fla Journel AG, Alabert F (1989) Non-Gaussian data expansion in the earth sciences. Terra Nova 1:123–134 Journel AG, Deutsch CV (1996) Rank order geostatistics: a proposal for a unique coding and common processing of diverse data. In: Baafi EY, Schofield NA (eds) Geostatistics Wollongong 96. Springer, New York, pp 174–187 Kazianka H, Pilz J (2011) Copula-based geostatistical modeling of continuous and discrete data including covariates. Stoch Environ Res Risk Assess 24(5):661–673 Kurowicka D, Cooke R (2006) Uncertainty analysis with high dimensional dependence modelling. John Wiley & Sons, New York Matheron G (1971) The theory of regionalized variables and its applications. École des Mines de Paris, Paris Nelsen R (1999) An introduction to Copulas. Springer, New York Rueschendorf L (1985) Construction of multivariate distributions with given marginals. Ann Inst Stat Math 37:225–233 Sklar A (1959) Fonctions de répartition à n dimensions et leur marges. Publ Inst Stat Paris 8:131–229
Correlation and Scaling Frits Agterberg Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada
Definition Correlation and scaling (or ranking and scaling, in some publications abbreviated to rascing) is a technique to order stratigraphic events, observed at different locations in a large study region, along a linear scale with variable distances between them. The stratigraphic events considered in applications are mostly biostratigraphic, primarily first or last occurrences of fossil species, but some lithostratigraphic
events can be included as well. These other events may be ash layers, gamma-ray signatures, or other lithostratigraphic events that can be found in different stratigraphic sections across the entire study region considered, which is usually a sedimentary basin.
Correlation Between Sections in Stratigraphy Correlation of strata on the basis of their fossil content is one of the earliest scientific methods applied in geology. As documented by Winchester (2001), the first geological map was published in 1815 by William Smith under the title: "A Delineation of the Strata of England and Wales, with Parts of Scotland." It was based on his discovery that strata could be correlated on the basis of their fossil content that changed with time. Shortly afterward, Charles Lyell (1833) in the first edition of his textbook Principles of Geology subdivided the Tertiary into different stages by counting how many of the fossil species they contain are still living today. Large uncertainties are commonly associated with the positioning of events in biostratigraphic sections. This subject is treated in more detail in the chapter on quantitative biostratigraphy. Here the emphasis is on how scaled optimum sequences can be used for correlation between sections. An artificial example to illustrate uncertainty associated with observed data is shown in Fig. 1. An early statistical method in quantitative stratigraphy was developed by Shaw (1964) for use in hydrocarbon exploration. In his book, Shaw illustrates his technique of "graphic correlation" on first and last occurrences (FOs and LOs) of trilobites in the Cambrian Riley Formation of Texas for which range charts had been published by Palmer (1954). Shaw's method consists of constructing a "line of correlation" on a scatter plot showing the FOs and LOs of taxa in two sections. If the quality of information is better in one section, its distance scale is made horizontal. FOs stratigraphically above and LOs below the line of correlation are moved horizontally or vertically toward the line of correlation in such a way that the ranges of the taxa become longer. The reason for this procedure is that it can be assumed that observed lowest occurrences of fossil taxa generally occur above the truly lowest occurrences, and the opposite rule applies to highest occurrences. Consequently, if the range of a taxon is observed to be longer in one section than in the other, the longer observed range is accepted as the better approximation. The objective is to find approximate locations of what are truly the first appearance datum (or FAD) and last appearance datum (LAD) for each of the taxa considered. True FADs and LADs probably remain unknown, but, by combining FOs and LOs from many stratigraphic sections, approximate FADs and LADs are obtained and the intervals between them can be plotted on a range chart.
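The geometric core of graphic correlation can be sketched in a few lines of Python. This is only an illustration of the line-of-correlation idea under invented event levels, not Shaw's full compositing procedure, which relies on iterative graphical adjustment.

```python
import numpy as np

# Levels (e.g. metres above base) of the same events observed in two sections;
# all values are invented for illustration.
events = ["FO a", "LO a", "FO b", "LO b", "FO c", "LO c"]
section_1 = np.array([2.0, 9.0, 5.0, 14.0, 8.0, 18.0])   # better-sampled section (x axis)
section_2 = np.array([1.0, 6.0, 4.0, 10.0, 5.5, 13.0])

# Least-squares line of correlation y = a + b * x (Shaw fitted this graphically)
b, a = np.polyfit(section_1, section_2, 1)

# Project an arbitrary level of section 1 into section 2
level_in_1 = 12.0
print("Equivalent level in section 2: %.1f" % (a + b * level_in_1))

# Compare each event's level in section 1 with its projection from section 2;
# in graphic correlation the longer implied range would be retained.
for name, x, y in zip(events, section_1, section_2):
    print(f"{name}: section 1 = {x}, projected from section 2 = {(y - a) / b:.1f}")
```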
Correlation and Scaling, Fig. 1 This artificial example shows three stratigraphic sections in which three fossil taxa (a, b, and c) were observed to occur. Each taxon has first and last occurrence (FO and LO) labeled 1 and 2, respectively. Consequently, there are six stratigraphic events in total. The equidistant horizontal lines represent five regularly spaced sampling levels. The discrete sampling procedure changes the positions of the FOs and LOs. Any observed range for a taxon generally is much shorter than its true range of occurrence. In this example, the observed ranges become even shorter because of the discrete sampling. FOs and LOs coincide in three places. In reality, the FO of a taxon always must occur stratigraphically below its LO. It is because of the sampling scheme that these two events can occur at exactly the same level in this example (see circles in Fig. 1). For comparison, coincident FO and LO would occur in land-based studies when only a single fossil for a taxon would be observed in a stratigraphic section. In this artificial example, it is assumed that all three taxa occur in each section. In practical applications, many taxa generally are missing from many sections
Two sections usually produce a crude approximation, but a third section can be plotted against the combination of the first two, and new inconsistencies can be eliminated as before. The process is repeated until all sections have been used. The final result or “composite standard” contains extended ranges for all taxa. Software packages in which graphic correlation has been implemented include GraphCor (Hood 1995) and STRATCOR (Gradstein 1996). The method can be adapted for constructing lines of correlation between sections (Shaw 1964). Various modifications of Shaw’s graphic correlation technique with applications in hydrocarbon exploration and to land-based sections can be found in Mann and Lane (1995). An example of Shaw’s original method will be given later in this chapter for comparison with two other methods of biostratigraphic correlation. In another type of approach, Hay (1972) used stratigraphic information on calcareous nannofossils from sections in the California Coast Ranges as an example of application of his method of ordering biostratigraphic events. Hay’s objective was to construct a so-called optimum sequence of FADs and LADs. His example will be discussed in more detail in the next section, because it inspired the “rascing” approach for ranking and scaling later advocated by Gradstein and Agterberg (1982, also see Gradstein et al. 1985) to solve the same problem. In rascing, Hay’s optimum sequence is followed by scaling that consists of estimating intervals of
variable length between the successive events. Scaling can be regarded as a refinement of ranking. The computer program RASC (for latest version, see Agterberg et al. 2013) performs rascing calculations followed by construction of lines of correlation between sections with graphical displays of results. In their chapter on quantitative biostratigraphy, Hammer and Harper (2005) present separate sections on five methods of quantitative biostratigraphy: (1) graphic correlation, (2) constrained optimization, (3) ranking and scaling (RASC), (4) unitary associations, and (5) biostratigraphy by ordination. Theory underlying each method is summarized by these authors, and worked-out examples are provided. Their book on paleontological data analysis is accompanied by the free software package PAST (available through http://www. blackwellpublishing.com/hammer) that has been under continuous development since 1998. It contains simplified versions of CONOP for constrained optimization (Sadler 2004) and RASC, as well as a comprehensive version for unitary associations (Guex 1980), a method that puts much weight on observed co-occurrences of fossil taxa in time. For comparison of RASC and unitary associations output for a practical example, see Agterberg (1990).
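Before turning to the examples, the ranking step (and the probit-based scaling described below) can be illustrated with a toy sketch. The sections and events are invented, and the scoring rule is a simplified stand-in for the RASC algorithm rather than the program itself.

```python
import numpy as np
from scipy.stats import norm
from itertools import combinations

# Each section lists stratigraphic events from bottom to top (invented data)
sections = [["A", "B", "C", "D"],
            ["B", "A", "C", "D"],
            ["A", "B", "D", "C"],
            ["A", "C", "B", "D"]]
events = sorted({e for s in sections for e in s})

# f[i][j]: number of sections in which event i is observed below event j
f = {i: {j: 0 for j in events} for i in events}
for sec in sections:
    for lo, hi in combinations(sec, 2):          # lo occurs below hi in this section
        f[lo][hi] += 1

# Ranking: order events by how often they lie below the other events (Hay-style score)
score = {i: sum(f[i][j] for j in events if j != i) for i in events}
optimum_sequence = sorted(events, key=score.get, reverse=True)
print("Optimum sequence (bottom to top):", optimum_sequence)

# Scaling: interevent distance from the probit of the relative superposition frequency
for lower, upper in zip(optimum_sequence, optimum_sequence[1:]):
    n = f[lower][upper] + f[upper][lower]
    p = f[lower][upper] / n                       # relative frequency p_ij
    p = min(max(p, 0.05), 0.95)                   # crude clipping to avoid infinite probits
    print(f"{lower} -> {upper}: D = {norm.ppf(p):.2f}")
```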
Examples of Ranking, Scaling, and Correlation The first example to be discussed is concerned with the sequencing of stratigraphic events. Nannoplankton faunizones were sampled in nine sections across California of which three are shown in Fig. 2 (Hay 1972). Stratigraphic information consisting of nine lowest occurrences (LO) and one highest occurrence (HI) of nannofossils was extracted from these sections, labeled A-I in Fig. 3. The columns on the right represent a subjective ordering of the ten events and Hay's original optimum sequence, which is an objective ordering based on frequencies of how many times every event occurs above (or below) all other events considered. Agterberg and Gradstein (1988) took Hay's approach one step further by estimating intervals between successive events in the optimum sequence. Each frequency f_ij was converted into a relative frequency p_ij = f_ij/n_ij, where n_ij is the sample size. This relative frequency was changed into a "distance" D_ij by means of the probit transformation: D_ij = Φ⁻¹(p_ij) (cf. Agterberg 2014). An example of this transformation is as follows: if two events turn out to be coeval in the optimum sequence, their interevent distance in the scaled optimum sequence becomes 0, as it should be. In Fig. 4, the optimum sequence for Hay's original example is compared with the scaled optimum sequence obtained after subjecting all
Correlation and Scaling, Fig. 2 Locations of sections in the Sullivan (1965, Table 6) database for the Eocene used by Hay (1972) for example. (Source: Agterberg 1990, Fig. 4.1)
Correlation and Scaling, Fig. 3 Hay's (1972) example. One last occurrence and nine "first" occurrences of Eocene nannofossils selected by Hay (1972) from the Sullivan Eocene database. Explanation of symbols: δ = LO, Coccolithus gammation; F = LO, Coccolithus cribellum; Θ = LO, Coccolithus solitus; V = LO, Discoaster cruciformis; < = LO, Discoaster distinctus; P = LO, Discoaster germanicus; U = LO, Discoaster minimus; W = HI, Discoaster tribrachiatus; Δ = LO, Discolithus distinctus; □ = LO, Rhabdosphaera scabrosa. See Fig. 2 for locations of the nine sections (A-I). Some LOs are for nannofossils also found in Paleocene (Sullivan 1964, Table 3). The columns on the right represent subjective ordering of the events and Hay's original optimum sequence. (Source: Agterberg 1990, Fig. 4.2)
Correlation and Scaling, Fig. 4 RASC results for Hay example of Fig. 3 (after Agterberg and Gradstein 1988); (a) ranked optimum sequence; (b) scaled optimum sequence. Clustering of events 1 to 7 in
the dendrogram (b) reflects the relatively large number of two-event inconsistencies and many coincident events near the base of most sections used. (Source: Agterberg 1990, Fig. 6.3)
superpositional frequencies to the probit transformation and representing the result in the form of a dendrogram. The approach can be taken a further step forward. In Fig. 5, which is for a single section in a larger database, scaled optimum sequence points are plotted against their depths, and a cubic spline curve was fitted to the data points. The points on the curve can be regarded as estimates of expected positions of the transformed relative frequencies. These expected positions of events in different sections can be used for correlation between sections. Figure 5 is for one of seven sections used by Shaw (1964) who illustrated his original technique of “graphic correlation” on first and last occurrences of trilobites in the Cambrian Riley Formation of Texas for which range charts had been published by Palmer (1954). Shaw (1964) extended his composite standard method to construct correlation lines between sections. Figure 6 shows correlation results for three sections
in the Riley Formation obtained by three different methods: (1) Palmer’s original biozones that were obtained by conventional subjective paleontological reasoning, (2) Shaw’s so-called Riley Composite Standard (SRC) correlation lines, and (3) Agterberg’s (1990) correlation and scaling (CASC) results in which the expected positions of events were connected by lines of correlation. Error bars in Fig. 5 are projections of single standard deviations on either side of probable positions of events obtained by the CASC method. Clearly, uncertainty in positioning of the correlation lines increases rapidly in the stratigraphically downward direction (for further explanation, see Agterberg 1990). Similar results were obtained by the three different methods used. In another application to a large database (see Fig. 7), the estimated RASC optimum sequence locations are compared with maximum deviations between observed and best-fitting spline curve for 44 offshore wells drilled along the
Correlation and Scaling, Fig. 5 RASC distance-event level plot for Morgan Creek section. Spline curve is for optimum (cross-validation) smoothing factor SF = 0.382. (Source: Agterberg 1990, Fig. 9.23)
northwestern Atlantic margin and on the Grand Banks. This example illustrates that the optimum sequence value for a stratigraphic event is the average value of a frequency distribution of observed occurrences (in this example, LOs only). Using the terminology of Shaw, the corresponding LAD occurs at a stratigraphically later position, equivalent to about 9 million years in this application. In a later hydrocarbon application, Agterberg and Liu (2008) showed that the LOs for 27 Labrador and northern Grand Banks wells satisfy a bilateral gamma distribution that plots as a straight line on a normal Q-Q plot when the depths of the LOs in any well are subjected to a square root transformation (see Fig. 8). The straight line in Fig. 8 was fitted to the data points shown, excluding those on the anomalous bulge near the center of the plot, which reflects the discrete sampling method used when the wells were being drilled. For a more detailed explanation, see Agterberg et al. (2007) and Agterberg and Liu (2008).
Summary and Conclusions First and last occurrences (FOs and LOs) of fossil species in a region are subject to relatively large uncertainties. In general, FOs are observed above their truly first occurrences, and LOs are below their truly last occurrences because the period of time that any fossil species was in existence generally remains unknown. Various methods of quantitative stratigraphy have been developed to reduce uncertainties related to true period of existence, in order to allow regional correlations between estimated positions of stratigraphic events in sections at different locations in a region. The main technique illustrated in this chapter is ranking and scaling (rascing) of observed stratigraphic events in different sections in a region or in deep boreholes in a basin drilled for hydrocarbon exploration. Estimated average FOs and LOs can be used for correlation between different stratigraphic sections. Uniquely identifiable lithostratigraphic events such as ash layers resulting from the same volcanic eruption can be used along with the biostratigraphic information.
Correlation and Scaling, Fig. 6 Biostratigraphic correlation of three sections for the Riley Formation in central Texas by means of three methods. Palmer’s (1955) original biozones and Shaw’s (1964) R.T.S. value correlation lines were superimposed on CASC (Agterberg 1990) results. Error bars are projections of single standard deviation on either side of probable positions for cumulative RASC distance values
equal to 2.0, 5.0, and 6.0, respectively. A cumulative RASC distance value is the sum of interevent distances between events at stratigraphically higher levels. Uncertainty in positioning the correlation lines increases rapidly in the stratigraphically downward direction. The example shows that the three different methods used produce similar correlations. (Source: Agterberg 1990, Fig. 9.25)
Correlation and Scaling, Fig. 7 Extended RASC ranges for Cenozoic Foraminifera in so-called Gradstein-Thomas database for offshore wells, northwestern Atlantic margin. Letters for taxon 59 on the right represent (a) estimated RASC distance, (b) mean deviation from spline curve, and (c) highest occurrence of species (i.e., maximum for deviation from spline curve). B is shown only if it differs from A. Good “markers”
such as highest occurrence of taxon 50 (Subbotina patagonica) have approximately coinciding positions for A, B, and C. Note that as a first approximation it could be assumed that the highest occurrences (c) have RASC distances which are about 1.16 units less than the average position. Such systematic difference in RASC distance is equivalent to approximately 10 million years. (Source: Agterberg 1990, Fig. 8.9)
Correlation and Scaling, Fig. 8 Normal Q-Q plot of first-order square root transformed depth differences for dataset (b). Approximate normality is demonstrated except for anomalous upward bulge in the center of
the plot, which is caused by the use of discrete sampling interval. (Source: Agterberg et al. 2013, Fig. 8b)
Cross-References ▶ Frequency Distribution ▶ Quantitative Stratigraphy ▶ Smoothing Filter
Bibliography Agterberg FP (1990) Automated stratigraphic correlation. Elsevier, Amsterdam. 424 pp Agterberg FP (2014) Geomathematics: theoretical foundations, applications and future developments. Springer, Heidelberg. 553 pp Agterberg FP, Gradstein FM (1988) Recent developments in stratigraphic correlation. Earth-Sci Rev 25:1–73 Agterberg FP, Liu G (2008) Use of quantitative stratigraphic correlation in hydrocarbon exploration. Int J Oil Gas Coal Expl 1(4):357–381 Agterberg FP, Gradstein FM, Liu G (2007) Modeling the frequency distribution of biostratigraphic interval zone thicknesses in sedimentary basins. Nat Resour Res 16(3):219–233 Agterberg FP, Gradstein FM, Cheng Q, Liu G (2013) The RASC and CASC programs for ranking, scaling and correlation of biostratigraphic events. Comput Geosci 54:279–292 Gradstein FM (1996) STRATCOR – graphic zonation and correlation software – user's guide, Version 4 Gradstein FM, Agterberg FP (1982) Models of Cenozoic foraminiferal stratigraphy – northwestern Atlantic margin. In: Cubitt JM, Reyment RA (eds) Quantitative stratigraphic correlation. Wiley, Chichester, pp 119–173 Gradstein FM, Agterberg FP, Brower JC, Schwarzacher WS (1985) Quantitative stratigraphy. UNESCO/Reidel, Paris/Dordrecht. 598 pp Gradstein FM, Kaminski MA, Agterberg FP (1999) Biostratigraphy and paleoceanography of the cretaceous seaway between Norway and Greenland. Earth-Sci Rev 46:27–98. (erratum 50:135–136) Guex J (1980) Calcul, caractérisation et identification des associations unitaires en biochronologie. Bull Soc Vaud Sci Nat 75:111–126 Hammer O, Harper D (2005) Paleontological data analysis. Blackwell, Ames. 351 pp Hay WW (1972) Probabilistic stratigraphy. Ecl Helv 75:255–266 Hood KC (1995) GraphCor – interactive graphic correlation software, Version 2.2 Lyell C (1833) Principles of geology. Murray, London Mann KO, Lane HR (1995) Graphic correlation, vol 53. SEPM (Soc Sed Geol), Tulsa Palmer AR (1954) The faunas of the Riley formation in Central Texas. J Paleolimnol 28:709–788 Sadler PM (2004) Quantitative biostratigraphy achieving finer resolution in global correlation. Annu Rev Earth Planet Sci 32:187–231
Shaw AG (1964) Time in stratigraphy. McGraw-Hill, New York. 365 pp Sullivan FR (1964) Lower tertiary nannoplankton from the California Coast ranges, 1. Paleocene. University of California Press, Berkeley Sullivan FR (1965) Lower tertiary nannoplankton from the California Coast ranges, II. Eocene. University of California Press, Berkeley Winchester S (2001) The map that changed the world. Viking/Penguin Books, London. 329 pp
Correlation Coefficient Guocheng Pan Hanking Industrial Group Limited, Shenyang, Liaoning, China
Definition Correlation is a term that describes the strength of dependency between two or more variables, which can be continuous, categorical, or discrete. The correlation coefficient is a measure of the degree to which the values of these variables vary together. The measure is generally defined in terms of linear relationships between variables, but it can be readily generalized to take into account other relationships of more complex forms. Correlation analysis provides a means of drawing inferences about the strength of the relationship between variables. It is more powerful when used jointly with other statistical tools for data analysis and pattern recognition.
Introduction The correlation coefficient is a statistical measure of the dependence or association of a pair of variables. When two sets of values move in the same direction, they are said to have a positive correlation. When they move in opposite directions, they are said to have a negative correlation. The correlation coefficient is used as a measure of the goodness of fit between a prediction equation and the data sample used to derive it. Correlation analysis answers whether a linear association between a pair of variables exists. A correlation relationship between variables can be judged either by quantitative methods or through graphical observations. Correlation analysis is meaningful only when there exists a relationship between the variables. The method can be parametric or nonparametric, depending on the form or style of the data used in the statistical analysis (Pan and Harris 1990). In a broad sense, correlation analysis includes the description of the mathematical form of the correlation, such as fitting a regression equation, testing the validity of the regression equation, and applying the regression model for statistical prediction. Correlation analysis also includes the significance test of
statistical correlation between variables. Correlation analyses are most often performed after an equation relating one variable to another has been developed (Ayyub and McCuen 2011). Correlation analysis is an extensively used technique that identifies intrinsic relationships in geoscientific, socioeconomic, and biomedical fields, among others. In recent years it has also been used broadly in machine learning and networking to identify the relevance of attributes with respect to the target class to be predicted in environmental studies (Kumar and Chong 2018). A formal correlation analysis can be carried out as follows: inspect scatter graphs of the variables, select an appropriate mathematical method, calculate the correlation coefficient, determine the degree of correlation between the variables, test the statistical significance of the correlation, and judge the nature of the correlation between the variables. It is also important to understand what correlation analysis cannot do. Correlation analysis does not provide an equation for predicting the value of a variable. It does not indicate whether a relationship is causal or physical. Correlation analysis only indicates whether the degree of common variation is statistically significant.
Scatter Graph Graphical analysis is the first effective step in examining the relationship between variables. The relationship between a pair of variables can be visualized through plotting of scatter graphs from a set of sample values. A useful intuitive indication of correlation is the extent of the point cloud around the best-fit line when the data are plotted on a scatter graph. Positive correlation is indicated by a clear pattern of points running from lower left to upper right, while negative correlation is shown by the opposite pattern. As an illustration, various degrees of variation between two variables (X and Y) are shown in Fig. 1 using a set of artificial sample data (observations) for the variables. Figure 1a represents an extreme case of perfect positive correlation, while Fig. 1b sketches the extreme case of perfect negative correlation. In Fig. 1c there is no correlation at all between the two variables. In Figs. 1d and e, the degree of correlation is moderate to strong, since in these figures Y generally increases (decreases) as X increases (decreases), though the corresponding changes are not exactly proportional for all the sample values involved. Figure 1f provides an example of weak correlation between the two variables, implying that Y cannot be predicted from X with satisfactory accuracy.
Correlation Coefficient, Fig. 1 Various degrees of correlation between the variables x and y: (a) perfect positive correlation; (b) perfect negative correlation; (c) zero correlation; (d) strong positive correlation; (e) strong negative correlation; (f) weak correlation
Random Variable A random variable, either continuous or discrete, is a variable whose value is unknown or a function that assigns values to each of an experiment's outcomes (observations). A random variable has a probability distribution that represents the likelihood that any of the possible values would occur. In probability and statistics, random variables are used to quantify outcomes of a random occurrence or event and to define probability distributions of the event. Random variables are required to be measurable and are typically real numbers. Continuous random variables can represent any value within a specified range or interval and can take on an infinite number of possible values. An example of a continuous random variable is an experiment that involves measuring the grade of gold in a mineral deposit. The random variable can be represented by a capital letter, such as Z (in units of grams per ton, g/t). Any gold grade value (for example, from drilling in the deposit), usually denoted by a lower case letter zi, is considered an outcome of this random variable. The gold deposit is called the population of the random variable Z, while the collection of gold grades from drillholes and other sampling methods is called a sample of the variable Z.
Discrete random variables take on a countable number of distinct values, which can be ranks of data values, categorical assignments such as the type of a physical object, etc. In mineral exploration, geologists often treat quantitative geological measurements as discrete geological phenomena. For instance, the size of ore deposits may be expressed in terms of qualitative categories, such as large, medium, and small deposits. Another example is the conversion of a continuous variable to a discrete variable through optimum discretization techniques (Pan and Harris 1990).
Population and Sample It is necessary to introduce some basic statistical concepts, including population and sample, prior to discussing correlation and other statistical analyses of random variables. A population is the parent of a sample. The population refers to the collection of all elements possessing common characteristics and sharing certain properties, while the sample is a finite subset of the population chosen by a sampling process. A population is finite if its total number of elements (N) is finite; otherwise it is considered infinite. A sample is a part of the population chosen at random
capable of representing the population in all its characteristics or properties. The major objective of a sample is to make statistical inferences about the population. As an example, a gold deposit can be viewed as a population of gold grade (a random variable), which includes all possible sample values within the deposit. In reality, it is almost impossible to analyze gold grade values (outcomes) at all sample points within the deposit. Instead, we often obtain a limited set of sample values designed to reasonably represent the distribution of gold grade within the deposit, for example, through grid drilling. The statistical nature of gold grade in the deposit can then be estimated by analyzing a sample of the population. Statistical characteristics of a population, such as the mean and standard deviation, are called parameters, while a measurable characteristic of a sample is called a statistic. The mean and standard deviation of a population are usually denoted by the symbols μ and σ, while the mean and standard deviation of a sample are represented by the symbols x̄ and s. The accuracy with which a sample estimates the statistical characteristics of a population depends on the sample size and the sampling method. The greater the size of a sample, the higher the level of accuracy in representing the population. Sampling is a procedure for selecting sample elements from a population. A common method is called simple random sampling, which satisfies the following conditions: (1) the population consists of a finite number N of elements, (2) each element in the population cannot be sampled more than once, and (3) all possible sample observations are equally likely to occur. Given a simple random sample, conventional statistical methods can be employed to define various statistical characteristics of a random variable, such as a confidence interval around the sample mean of gold grade in a mineral deposit.
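For instance, a confidence interval around the sample mean of a set of gold grades can be obtained as in the short Python sketch below; the grade values are invented for illustration.

```python
import numpy as np
from scipy import stats

grades = np.array([1.2, 0.8, 2.5, 1.9, 3.1, 0.6, 1.4, 2.2])  # gold grades in g/t (invented)
n = grades.size
mean, s = grades.mean(), grades.std(ddof=1)                   # sample mean and std (n - 1)

# 95% confidence interval for the population mean, based on the t distribution
t_crit = stats.t.ppf(0.975, df=n - 1)
half_width = t_crit * s / np.sqrt(n)
print(f"mean = {mean:.2f} g/t, 95% CI = ({mean - half_width:.2f}, {mean + half_width:.2f})")
```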
Basic Statistics The most commonly used statistic of a random variable x is the sample mean, which is a central tendency descriptor. For n observations of a given sample, the average value is estimated by

x̄ = (1/n) Σ_{i=1}^{n} x_i,    (1)

where x_i is a sample point and i = 1, 2, ..., n. Variance or standard deviation is another important statistic that measures dispersion around the mean. The population variance of x is defined in terms of the population mean μ and population size N:

σ² = (1/N) Σ_{i=1}^{N} (x_i − μ)²,    (2)

where σ is called the standard deviation of the population. In practice, a statistical parameter of the population is estimated by its corresponding sample statistic. Given n observations in a sample, the sample variance is estimated by:

s_x² = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)²,    (3)

where s is called the sample standard deviation. The unit of the standard deviation is always the same as the unit of the variable; for example, if the variable is measured in grams per ton (gpt), the standard deviation also has the unit of gpt. For convenience, the variance of a sample can also be calculated using the following alternative equation:

s_x² = (1/(n − 1)) [ Σ_{i=1}^{n} x_i² − (1/n) ( Σ_{i=1}^{n} x_i )² ].    (4)

In the equations of sample variance, note that n − 1 (not n) is used to compute the average deviation in order to obtain an unbiased estimate of the variance. This statistic is always nonnegative. As a generalization of the variance of a single random variable, covariance is defined as a measure of the linear association between two random variables x and y. The population covariance between random variables X and Y is defined with respect to the population means μ_x and μ_y as:

σ_xy = (1/N) Σ_{i=1}^{N} (x_i − μ_x)(y_i − μ_y),    (5)

where N is the size of the population. Sample covariance, which evaluates how a pair of variables are associated relative to their means in a sample data set, is defined as follows:

s_xy = (1/(n − 1)) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ).    (6)
A positive covariance would indicate a positive linear relationship between the variables, and a negative covariance would suggest the opposite. Another useful statistic of a random variable is called the median value xm, which is defined as the middle value when the numbers are arranged in order of magnitude. Usually, the median is used to measure the midpoint of a large set of
numbers. The median value can be determined by ranking the n values in the sample in descending order, 1 to n. If n is an odd number, the median is the value with a rank of (n + 1)/2. If n is an even number, the median equals the average of the two middle values with ranks n/2 and (n/2) + 1 (Illowsky and Dean 2018).
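The sample statistics of Eqs. (1), (3), and (6) and the median can be computed directly, as in the following Python sketch, which uses the first five gold-silver pairs of Table 1.

```python
import numpy as np

x = np.array([0.8, 2.3, 1.2, 4.5, 1.6])    # gold grades (gpt), observations 1-5 of Table 1
y = np.array([3.2, 11.4, 5.5, 17.3, 6.8])  # paired silver grades (gpt)

x_bar = x.mean()                                             # Eq. (1), sample mean
s2_x = ((x - x_bar) ** 2).sum() / (x.size - 1)               # Eq. (3), sample variance (n - 1)
s_xy = ((x - x_bar) * (y - y.mean())).sum() / (x.size - 1)   # Eq. (6), sample covariance
median = np.median(x)                                        # middle value of the ranked data

print(x_bar, s2_x, s_xy, median)
print(np.var(x, ddof=1), np.cov(x, y, ddof=1)[0, 1])         # library equivalents agree
```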
Correlation Coefficient In the geosciences, we often need to examine relationships between different variables, such as gold and silver in a mineral deposit. It is always insightful to plot the scatter graph of the sample observations as a first step, showing the visual relationship between the two variables. Figure 2 is the scatter plot of a sample of 17 observations on gold (x) and silver (y) from a mineral deposit (population) (see Table 1), which clearly reveals that gold and silver are linearly associated, or positively correlated, except for a couple of outliers. This correlation can be readily fitted by simple linear regression analysis after exclusion of the two outliers, which is not a subject of this chapter. Quantification of the linear association between the two variables is the focus of this section. While the covariance in Eq. (5) or (6) does measure the linear relationship between two random variables, it cannot be conveniently applied in practice, since the two variables of interest could have vastly different standard deviations. Intuitively, the covariance can be better interpreted if its value is standardized to a neat range, such as [−1, 1], by removing the influence of the different variances associated with the variables. The Pearson coefficient (also called the product-moment coefficient of correlation) is designed in this manner so that it can be readily utilized to quantify the strength of linear correlation between variables. The Pearson coefficient in its population form is expressed as the following standardized quantity (Pearson 1920; Chen and Yang 2018):
Correlation Coefficient, Fig. 2 Scatter graph of observations for variables gold and silver from a drilling program, showing that the two variables are highly correlated except for a couple of outlying points
ρ = σ_xy / (σ_x σ_y).    (7)
The Pearson coefficient is a type of correlation coefficient that represents the relationship between two variables that are measured on the same interval or ratio scale. The Pearson coefficient is suitable for measuring the strength of linear associations between two continuous variables. Numerically, the Pearson coefficient is represented in the same way as a correlation coefficient used in linear regression, ranging over [−1, 1]. A value of +1 is the result of a perfect positive relationship between the variables. Positive correlations indicate that observations of both variables move in the same direction. Conversely, a value of −1 represents a perfect negative relationship, indicating that the two variables move in opposite directions. A zero value means no correlation. Note that the Pearson coefficient shows correlation, not necessarily causation. In practice, the correlation coefficient r is estimated by using a sample consisting of a set of observations for the pair of variables (Rumsey 2019). The sample form of the Pearson coefficient is defined as follows:

r = S_xy / (S_x S_y) = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / √[ Σ_{i=1}^{n} (x_i − x̄)² · Σ_{i=1}^{n} (y_i − ȳ)² ].    (8)
While Eq. (8) provides the means to calculate values of the correlation coefficient, the following form is more convenient in computation:

r = [ n Σ_{i=1}^{n} x_i y_i − (Σ_{i=1}^{n} x_i)(Σ_{i=1}^{n} y_i) ] / √{ [ n Σ_{i=1}^{n} x_i² − (Σ_{i=1}^{n} x_i)² ] · [ n Σ_{i=1}^{n} y_i² − (Σ_{i=1}^{n} y_i)² ] }.    (9)
This equation is used most often in practice because it does not require prior computation of the means, and the
Correlation Coefficient, Table 1 Observations of gold and silver grades in a mineral deposit

Observation  Gold (gpt)  Silver (gpt)  Note
1            0.8          3.2          Regular sample point
2            2.3         11.4          Regular sample point
3            1.2          5.5          Regular sample point
4            4.5         17.3          Regular sample point
5            1.6          6.8          Regular sample point
6            0.5          2.0          Regular sample point
7            5.5         21.3          Regular sample point
8            5.7         24.0          Regular sample point
9            4.8         19.6          Regular sample point
10           3.5         12.6          Regular sample point
11           4.0         18.8          Regular sample point
12           2.1          7.1          Regular sample point
13           1.6          5.4          Regular sample point
14           3.3         15.3          Regular sample point
15           1.1          1.8          Regular sample point
16           9.5          6.5          Extreme point (outlier)
17           9.1          9.9          Extreme point (outlier)
computational algorithm is easily programmed for a computer (Ayyub and McCuen 2011). The formula has been devised to ensure that the value of r will lie in the range [−1, 1], with r = −1 for perfect negative correlation, r = 1 for perfect positive correlation, and r = 0 for no correlation at all. Using the example in Table 1, the correlation coefficient between gold and silver is calculated to be 0.46. In the scatter plot, two observations, (9.5, 6.5) and (9.1, 9.9), clearly deviate from the general trend of linear association between the two variables. They are called outliers, and they can seriously distort the real relationship between the variables. The correlation coefficient increases to 0.98 when these two outliers are excluded from the sample. The Pearson correlation coefficient measures the strength of the linear association between a pair of random variables without the involvement of other variables. When more than two variables are present in the system, the corresponding measure that ignores the effect of the other random variables is referred to as the partial correlation between the two random variables. The partial correlation coefficient will give misleading results if there is another random variable that is numerically related to both variables of interest.
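The calculation just described can be reproduced with a few lines of Python from the data of Table 1; the entry reports r ≈ 0.46 with the two outliers included and r ≈ 0.98 with them excluded.

```python
import numpy as np

gold   = np.array([0.8, 2.3, 1.2, 4.5, 1.6, 0.5, 5.5, 5.7, 4.8,
                   3.5, 4.0, 2.1, 1.6, 3.3, 1.1, 9.5, 9.1])
silver = np.array([3.2, 11.4, 5.5, 17.3, 6.8, 2.0, 21.3, 24.0, 19.6,
                   12.6, 18.8, 7.1, 5.4, 15.3, 1.8, 6.5, 9.9])

def pearson_r(x, y):
    """Sample Pearson coefficient, Eq. (8)."""
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

print("all 17 observations:    r = %.2f" % pearson_r(gold, silver))
print("outliers 16-17 removed: r = %.2f" % pearson_r(gold[:-2], silver[:-2]))
```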
Hypothesis Testing Equations (8) and (9) provide a tool for estimating the correlation coefficient between two variables, given a sample consisting of a set of observations. It is generally true that a higher absolute value of the correlation coefficient indicates greater correlation between the variables, and vice versa. In many cases,
a judgment or decision must be made as to whether the correlation of two variables is statistically significant, in addition to estimating the correlation coefficient. This judgment calls for the tool of hypothesis testing, which is the formal procedure for using statistical measures in the process of decision making. Hypothesis testing represents a class of statistical techniques designed to extrapolate information from samples of data in order to make inferences about populations for the purpose of decision making (Ayyub and McCuen 2011). Here the t-test is chosen for hypothesis testing on the statistical significance of the correlation coefficient. Note that use of the t-test requires that the population follows a normal distribution and that the samples are obtained by simple random sampling. If the samples differ slightly from a normal distribution, the t-test is still applicable, but its results will be somewhat inaccurate. As the deviation from normality increases, the results become less reliable. The strength of correlation is generally gauged by how far from zero the value of r lies. A correlation coefficient of zero indicates that there is no linear relationship between the two variables. The t-test aims to answer the question: "Is the population correlation equal to 0?" This question can be answered by hypothesis testing through the following steps (Ayyub and McCuen 2011): (1) formulate the hypotheses; (2) select an appropriate statistical model that identifies the test statistic; (3) specify the level of significance, which is a measure of risk; (4) collect a sample of data and compute an estimate of the test statistic; (5) define the region of rejection for the test statistic; (6) select the appropriate hypothesis. The t statistic can be used to test the significance of linear correlation when the sample size n is not very large. The t-test for the correlation coefficient focuses on the statistical significance of the correlation between two variables. Since correlation can be positive or negative, the region of rejection will consist of values in both tails. This is illustrated in Fig. 3a, where T is the test statistic, which has a continuous probability density function. In this case, the test is called a two-sided test. A one-sided hypothesis test is also sometimes needed for so-called "directional" parameters. The region of rejection for a directional test is associated with values in only the upper tail or the lower tail of the distribution, as illustrated in Fig. 3b, c. The foremost step in performing the t-test is to form the hypotheses, which are composed of statements involving either population distributions or parameters. Note that hypotheses should not be expressed in terms of sample statistics. The first hypothesis is called the null hypothesis, denoted by H0, and is formulated as an equality. The second hypothesis is called the alternative hypothesis, denoted by H1, and is formulated to indicate the opposite. The null and alternative hypotheses, which are written both mathematically and grammatically, must express mutually exclusive conditions. Thus, when a statistical analysis of sampled data suggests that the null hypothesis should be
Correlation Coefficient, Fig. 3 Sampling distribution of the t-test statistic, showing the region of rejection (cross-hatched area), region of acceptance, and critical value (t_α): (a) two-sided test, (b) lower tail one-sided test, and (c) upper tail one-sided test
rejected, the alternative hypothesis must then be accepted. The following is a formal t-test procedure:
Step 1: Propose the hypothesis H0, which implies that there is no linear correlation between the two variables when the population correlation coefficient is zero. The alternative hypothesis indicates the opposite:

H0: ρ = 0,    (10)
H1: ρ ≠ 0.    (11)

It can be shown that for n > 2 these hypotheses can be tested using a t-statistic that is given by

t = |r| √[(n − 2)/(1 − r²)],    (12)

where t is a random variable that has approximately a t distribution with (n − 2) degrees of freedom.
Step 2: Choose a level of significance α. Determine a threshold value α for the test and find the critical value t_{α/2}(n − 2) for a two-sided t-test at that threshold from the t-distribution table, which can be found in many statistics textbooks, such as Chen and Yang (2018).
Step 3: Calculate the t-statistic using Eq. (12). The calculated t value is compared against the critical value obtained from the t distribution table at α/2 for a two-sided test or α for a one-sided test.
Step 4: The null hypothesis H0 is rejected and H1 is accepted if the calculated t value is greater than the critical value from the t distribution table, implying that the correlation between the two variables is statistically significant at the significance level α; otherwise, the correlation is not statistically significant at that level.
As an illustration, assume the threshold α = 0.05 (or 5%) for a two-sided test. Using the data in Table 1 with the two outliers excluded, the t-value is calculated by Eq. (12), which gives

t = 0.98 √[(15 − 2)/(1 − 0.98²)] = 17.76 > t_{α/2} = t_{0.025} = 2.16.    (13)
The null hypothesis H0 is then rejected in this two-sided test, implying that the correlation is considered statistically significant at the significance level α = 0.05, since the computed value t = 17.76 is greater than the critical value t_{0.025} = 2.16 for n − 2 = 13 degrees of freedom.
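The same test can be written out directly in Python, with scipy supplying the critical value of the t distribution; the numbers reproduce the calculation of Eq. (13).

```python
import numpy as np
from scipy import stats

r, n, alpha = 0.98, 15, 0.05                      # sample correlation, size, significance level
t_stat = abs(r) * np.sqrt((n - 2) / (1 - r**2))   # Eq. (12)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)     # two-sided critical value

print(f"t = {t_stat:.2f}, critical value = {t_crit:.2f}")   # ~17.76 vs ~2.16
print("reject H0:", t_stat > t_crit)
```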
As stated, the t-test is suitable when the sample size is not very large. For a large sample size, other testing options are recommended, such as the z-test, the chi-square test, and the F-test.
Rank Correlation All the discussions above focus on the correlation coefficient for continuous variables. In practice, however, it is not always necessary, or even possible, when investigating correlation, to draw on the continuous measurements that have been presented so far. An alternative is to work only from the rank positions. In such cases, the data are ranked according to their importance, class, quality, or other factors, using integer numbers 1, 2, ..., n. The coefficient of rank correlation can be calculated for a pair of ranked variables x and y. While it would be possible to use the Pearson correlation coefficient r on the ranked data, there is an alternative measure specially designed for ranked data, called the rank coefficient of correlation rs. It is sometimes known as "Spearman's coefficient of rank correlation," and is given below (Spearman 1904; Graham 2017):

r_s = 1 − 6 Σ_{i=1}^{n} d_i² / [n(n² − 1)],    (14)
where n is the number of pairs (x_i, y_i) of ranked data and d_i is the difference between the ranks of corresponding values of x and y. As with the product-moment correlation coefficient, the coefficient of rank correlation rs has been devised to ensure that its value lies within the range [−1, 1]. The value of rs can be interpreted in much the same way as the values of r in the previous section: rs = 1 for perfect positive correlation, rs = −1 for perfect negative correlation, and rs = 0 for no correlation. As an illustration from a mineral exploration program, it is insightful to examine the relationship between rock type (x) and copper mineralization (y). Past exploration experience and existing geological maps provide evidence of relevancy between the two variables. Thus, rocks can be ranked according to the degree of their relevancy to copper mineralization. Table 2 shows the coded data of rock type and copper mineralization intensity from 12 drillhole intervals, which is explained in Table 3. The scatter plot in Fig. 4 reveals a clear positive correlation between copper mineralization and rock type. Using Eq. (14), the rank coefficient of correlation between copper mineralization and rock type is calculated to be 0.94 using the 12 pairs of rank observations drawn from the drillholes. The result, also shown in Table 2, clearly indicates
Correlation Coefficient, Table 2 Coded rock type and copper mineralization

Observation  Rock code  Cu mineralization  d²
1            2          1                  1
2            1          1                  0
3            3          2                  1
4            2          2                  0
5            3          1                  4
6            4          2                  4
7            1          1                  0
8            1          1                  0
9            4          3                  1
10           5          4                  1
11           3          2                  1
12           5          3                  4
Correlation Coefficient, Table 3 Code for rock type and rank of copper mineralization intensity

Rock type              Code    Mineralization intensity  Rank
Overburdens            1       Barren                    1
Dyke                   2       Weakly mineralized        2
Argillized limestones  3       Low-grade ore             3
Altered porphyry       4       High-grade ore            4
Skarn                  5
Correlation Coefficient, Fig. 4 Scatter graph between code of rock type (x) and rank of copper mineralization intensity (y). The graph clearly shows a highly correlated relationship between the two variables
that copper mineralization has a strong association with rock type. Rock type is represented by numerical code, while copper mineralization is expressed by class numbers representing intensity of mineralization. The definitions of rock type and mineralization class are given in Table 3. The statistical significance of the rank coefficient can also be judged by hypothesis testing similar to the correlation coefficient. However, testing methods for discrete data differ
from the t-test for continuous data. This is not a subject to be discussed here. The Spearman rank coefficient for correlation analysis is a nonparametric method, which has several advantages. The significance test for rank correlation does not depend on the sample distribution. Another advantage is robustness to outliers or extreme values. If the sample size is small, one big outlier can completely distort Pearson's correlation coefficient, leading to wrong conclusions. The Spearman rank correlation coefficient is less affected by outliers and its results are more robust. The rank approach provides a natural filter to suppress or remove the excessive influence or noise of outliers in the data. In addition to the Spearman rank correlation coefficient introduced here, other commonly used nonparametric correlation indexes include the contingency coefficient and the Kendall rank correlation coefficient.
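Equation (14) applied to the d² column of Table 2 reproduces the value of 0.94 quoted above, as the short Python check below shows. Note that a library routine such as scipy.stats.spearmanr applies a tie correction to the ranks and would therefore return a slightly different number.

```python
d_squared = [1, 0, 1, 0, 4, 4, 0, 0, 1, 1, 1, 4]   # d^2 column of Table 2
n = len(d_squared)

r_s = 1 - 6 * sum(d_squared) / (n * (n**2 - 1))    # Eq. (14)
print(f"Spearman rank coefficient r_s = {r_s:.2f}")  # 0.94
```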
Summary Random variation represents uncertainty. If the variation is systematically associated with one or more other variables, the uncertainty in estimating the value of a variable can be reduced by identifying the underlying relationship. Correlation and regression analyses are important statistical methods for achieving these objectives. They should be preceded by a graphical analysis to determine (1) if the relationship is linear or nonlinear, (2) if the relationship is direct or indirect, and (3) if any extreme events might dictate the relationship. The Pearson correlation coefficient is defined as a quantitative index of the degree of linear association. Caution must be exercised when correlation analysis is applied to the real world. The first caution concerns the preconditions for constructing the Pearson correlation coefficient and its t-statistic test. Appropriate use of these requires that the distribution of the variables involved be bell-shaped and that the sample be collected from a representative, randomly selected portion of the total population. The Pearson product-moment correlation coefficient only quantifies the degree of linear association between two variables. A high statistical correlation between two variables can give a misleading impression about the true nature of their relationship. A strong Pearson correlation between two variables does not prove that one has caused the other. A strong correlation merely indicates a statistical link, but there may be many reasons for this besides a cause-and-effect relationship. Variables can be associated temporally, causally, or spatially. For example, gold and silver are often temporally or spatially associated in a mineralized geological environment. High correlation coefficient values do describe a strong physical association between the two metals in this case. The cost of living and wages are a good example of a causal relationship.
The Pearson correlation coefficient can be readily distorted when outlying observations are present in the sample. A small number of extreme values can completely alter the statistical significance of linear association and create spurious correlation outcomes. Outliers or extreme values are common when sampling geoscience variables, such as copper grades in a mineral deposit. Hence, it is wise to perform visual inspection and basic statistical treatment of extreme values in the original data prior to using the Pearson correlation coefficient. An important assumption underlying the correlation coefficient and its t-test is the bell-shaped, or normal, distribution of the random variables involved. Unfortunately, most metal grades in mineral deposits are distributed in skewed forms with long tails on the upper-value side. Instead of following a normal distribution, these variables usually follow some form of the so-called lognormal distribution. In this situation, it is appropriate to use the log scale of the original data in the correlation analysis.
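As a small, hedged sketch (synthetic, lognormal-like data; not from the original entry), the following lines illustrate computing the Pearson coefficient on the log scale of skewed grade data:

```python
# Correlating skewed, lognormal-like grade variables on the log scale.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
log_cu = rng.normal(size=200)
log_au = 0.8 * log_cu + rng.normal(scale=0.6, size=200)
cu, au = np.exp(log_cu), np.exp(log_au)        # heavily right-skewed "grades"

r_raw, _ = pearsonr(cu, au)                    # dominated by the long upper tail
r_log, _ = pearsonr(np.log(cu), np.log(au))    # reflects the underlying linear association
print(r_raw, r_log)
```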
Bibliography
Aldis M, Aherne J (2021) Exploratory analysis of geochemical data and inference of soil minerals at sites across Canada. Math Geol 53:1201–1221
Ayyub BM, McCuen RH (2011) Probability, statistics, and reliability for engineers and scientists. CRC Press, 624p
Chen JH, Yang YZ (eds) (2018) Principal of statistics. Beijing Institute of Technology Press, 291p
Graham A (2017) Statistics: an introduction. Kindle Edition, 320p
Illowsky B, Dean S (2018) Introductory statistics. OpenStax, 905p
Kumar S, Chong I (2018) Correlation analysis to identify the effective data in machine learning: prediction of depressive disorder and emotion states. Int J Environ Res Public Health 15(6):2907
Pan GC, Harris DP (1990) Three nonparametric techniques for the optimum discretization of quantitative geological variables. Math Geol 22:699–722
Pearson K (1920) Notes on the history of correlations. Biometrika 13:25–45
Rumsey DJ (2019) Statistics essentials for dummies. Wiley, 174p
Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15:72–101
Coupled Modeling Mohammad Ali Aghighi and Hamid Roshan School of Minerals and Energy Resources Engineering, University of New South Wales, Sydney, NSW, Australia
Definition Coupled modeling refers to translating our understanding of a process involving two or more interacting physical or
chemical subprocesses into a set of governing equations. These partial differential equations are derived either phenomenologically or from fundamental laws. The governing equation of a subprocess is often extended to account for the effect of other interacting subprocesses through additional terms called coupling terms. Coupled models are solved analytically or numerically through one-way, iterative, or fully coupled approaches; the latter refers to the simultaneous solution of all governing equations. A proper characterization of many of the Earth's processes requires coupled modeling. For instance, the process of geothermal energy extraction involves heat transfer, fluid flow, and rock deformation subprocesses. This process requires cold water to be injected into the natural or induced fractures of a hot rock via a borehole so that the injected water picks up heat and flows back to the surface through another borehole. Pressurized by the injected water, the rock fractures open, and thus their flow characteristics alter. Heat exchange also takes place between the injected cold water and the hot rock, resulting in rock contraction/expansion as well as rising water temperature. Study of such interacting processes requires coupled modeling. A geothermal energy extraction model constitutes at least three governing equations representing rock deformation, water flow, and heat transfer. Coupled modeling provides more realistic insights into Earth processes involving significant interacting subprocesses.
Introduction
Geoscience deals with many natural and artificial processes, of which those occurring within the Earth's crust are of particular interest from an applied standpoint. Although no process in Earth takes place in isolation, some can be modeled sufficiently well in an uncoupled manner if the effect of other interacting processes can be neglected within an acceptable error range. For example, the process of groundwater flow can be treated as a pure fluid flow problem if the effect of other processes such as rock deformation on flow is negligible. Otherwise, groundwater flow is a coupled process and requires coupled modeling. This means that the governing equation of fluid flow must somehow incorporate the effect of rock deformation. Another example from an applied geoscience perspective is the process of gas production/drainage from shales. This process involves several individual processes such as gas diffusion within the shale matrix, flow of gas within fractures, and deformation of the matrix due to sorption-induced shrinkage. The interaction of these processes (Fig. 1) adds to the complexity of the problem. For example, shale matrix deformation changes the flow characteristics, which in turn affects the diffusion process within the matrix, thus deforming the shale matrix and bulk. Study of these interacting processes therefore requires coupled modeling (Aghighi et al. 2020). In order to develop a coupled model, every contributing process needs first to be described in the form of a field equation. A field equation for a specific process is derived by combining constitutive models and conservation laws. The set of field equations representing all contributing processes is then solved using analytical or numerical methods. Among the countless physical and chemical processes occurring in Earth, solid deformation and transport phenomena are of particular interest in applied geoscience. Table 1 provides more information on the simple form of these processes. Most Earth developments and transformations involve the interaction of these processes.

Coupled Modeling, Fig. 1 An example of coupled subsurface processes is shale gas production, where several interacting subprocesses of deformation, fluid flow, and diffusion type are involved
Coupled Modeling, Table 1 Solid deformation and transport processes

| Process category | Process | Physics law | Flux/strain | Potential/field | Drive or load | Material property | Equation |
| Solid deformation | Linear elasticity | Hooke's law | Strain (ε) | Displacement | Stress (σ) | Young's modulus (E) | ε = (1/E) σ |
| Transport phenomena | Fluid flow | Darcy's law | Pore fluid flux (q) | Head (h) | Hydraulic gradient | Hydraulic conductivity (K) | q = −K ∇h |
| Transport phenomena | Heat transfer | Fourier's law | Heat flux (qH) | Temperature (T) | Temperature gradient | Thermal conductivity (k) | qH = −k ∇T |
| Transport phenomena | Solute (ion) transfer | Fick's law | Solute flux (f) | Concentration (C) | Chemical potential gradient | Diffusion coefficient (D) | f = −D ∇C |
| Transport phenomena | Charge transfer | Ohm's law | Charge flux (i) | Voltage (V) | Electrical potential gradient | Electrical resistance (R) | i = −(1/R) ∇V |

Modified after Mitchell and Soga (2005)
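As a hedged numerical aside (illustrative parameter values only, not from the original entry), the transport laws in Table 1 share the same gradient-driven form and can be evaluated in a few lines:

```python
# Evaluating the gradient-driven transport laws of Table 1 for illustrative 1-D gradients.
import numpy as np

K = 1e-5       # hydraulic conductivity, m/s (illustrative)
k_T = 2.5      # thermal conductivity, W/(m K) (illustrative)
D = 1e-9       # diffusion coefficient, m^2/s (illustrative)

grad_h = np.array([0.01, 0.0, 0.0])    # hydraulic gradient, m/m
grad_T = np.array([30.0, 0.0, 0.0])    # temperature gradient, K/m
grad_C = np.array([5.0, 0.0, 0.0])     # concentration gradient, mol/m^4

q   = -K   * grad_h    # Darcy flux
q_H = -k_T * grad_T    # Fourier heat flux
f   = -D   * grad_C    # Fick solute flux
```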
When examining a geoscience-related process, it is more convenient and cost-efficient to deal with only one individual process rather than a combination of several processes; however, coupled modeling is often inevitable. Natural and man-made problems in geoscience such as plate tectonics, erosion, sedimentary deposition, groundwater flow, geothermal energy extraction, oil and gas production, CO2 capture and sequestration, storage of radioactive waste, subsidence, fault reactivation, and hydraulic fracturing inherently consist of more than one process. While the fundamentals of each individual process in physics or chemistry are reasonably well understood, an emerging challenge is to enhance our understanding of the coupled processes (Acocella 2015). This challenge cannot be tackled without studies involving coupled modeling. The following sections describe some of the most common coupled modeling cases encountered in the Earth sciences. They are explained here for relatively simple yet common situations in which materials are isotropic and homogeneous, fluids are single phase, incompressible, and Newtonian, flow is laminar, and the relationship between stress and strain is linear, as is that between the change of the fluid content of the porous medium and the pore pressure. There are numerous extensions of the following coupled models in which nonlinear relationships (e.g., plasticity and viscoelasticity), non-Newtonian flow, compressible fluids (e.g., gas), anisotropy, and heterogeneity, among other complexities, are taken into account. These models have been increasingly used in applied geosciences such as geophysics and hydrogeology as well as in Earth-related engineering disciplines such as civil, petroleum, mining, and environmental engineering.
Hydromechanical Coupling Hydromechanical coupling refers to the interactions of fluid flow and solid deformation. One of the simplest formulations of the hydromechanical coupling is the linear poroelastic
theory. This theory was developed by Biot (1941) as an extension of Terzaghi's effective stress law (Terzaghi 1923). Poroelasticity has been reformulated (Rice and Cleary 1976), extended (Skempton 1954), and used in many applications (e.g., Detournay and Cheng 1988). The linear poroelastic theory is based on the following assumptions, among others (Biot 1941): (1) there is an interconnected pore network uniformly saturated with fluid, (2) the total pore volume is small compared to the bulk volume of the rock, and (3) pore pressure and stresses can be averaged over the volume of rock. The governing equations of poroelasticity consist of two equations: a fluid flow equation incorporating the effect of solid deformation and a solid deformation equation incorporating the effect of fluid flow. These governing equations can be written in terms of different pairs of unknowns; however, it is more common to write and solve them in terms of displacement and pore pressure, representing the solid deformation and fluid flow models, respectively.

Solid Deformation Model and Its Coupled Form
The combination of Hooke's law and the equilibrium equation leads to (Jaeger et al. 2007):

(λ + G) ∇(∇·u) + G ∇²u = 0   (1)
where λ and G are Lamé's constants and u is the displacement vector. Incorporating the effect of fluid flow in Eq. 1 yields (compressive stresses are positive) (Jaeger et al. 2007):

(λ + G) ∇(∇·u) + G ∇²u + α ∇p = 0   (2)
where α is Biot's coefficient and p is the pore pressure. The term α∇p is the hydromechanical coupling term.

Fluid Flow Model and Its Coupled Form
The diffusivity equation results from substituting Darcy's law into the continuity equation (Ahmed 2010):
∂p/∂t = (k/(μ ϕ cf)) ∇²p   (3)
where t is time, k is the permeability of the porous medium, μ is the fluid viscosity, ϕ is the porosity of the solid, and cf is the fluid compressibility. Equation 3 can be extended to account for the effect of solid deformation as follows (Detournay and Cheng 1993):

∂p/∂t = (kM/μ) ∇²p + αM ∂(∇·u)/∂t   (4)

where M is Biot's modulus. The last term on the right-hand side is the coupling term representing the effect of solid deformation on fluid flow. The set of Eqs. 2 and 4 is a well-posed mathematical problem comprising four equations with four unknowns (three displacement components and the pore pressure).

Thermomechanical Coupling
Thermomechanical coupling, or thermoelasticity, considers the mutual effect of solid deformation and temperature. This coupling is very similar to the poroelastic coupling, with the temperature replacing the pore pressure. The set of governing equations of thermoelasticity is as follows (Coussy 1995):

(λ + G) ∇(∇·u) + G ∇²u + 3αT K ∇T = 0   (5)

∂T/∂t = (kT/(ρcv)) ∇²T + (3KαT T0/(ρcv)) ∂(∇·u)/∂t   (6)

where T is the difference between the current and the reference temperature, αT is the coefficient of linear thermal expansion, K is the bulk modulus, kT is the thermal conductivity, ρ is the density, and cv is the specific heat at constant strain. Equation 6 is a diffusion-type equation for temperature, obtained by introducing Fourier's law into the conservation of energy equation. Note that 3αT K ∇T is the thermoelastic coupling term. It can be shown that the effect of solid deformation on heat transfer is negligible for typical values of the parameters involved. Therefore, Eq. 6 reduces to:

∂T/∂t = (kT/(ρcv)) ∇²T   (7)

Equation 7 is independent of solid deformation and has been analytically solved for different geometries and boundary conditions (Carslaw and Jaeger 1959).

Thermo-Hydromechanical Coupling
Where change in temperature and rock deformation both affect the fluid flow in a porous medium, thermo-hydromechanical coupling is required for modeling of the system. In this case, three governing equations are needed to describe the three processes involved (Charlez 1991):

(λ + G) ∇(∇·u) + G ∇²u + α ∇p + 3αT K ∇T = 0   (8)

∂p/∂t = (kM/μ) ∇²p + αM ∂(∇·u)/∂t − αsf M ∂T/∂t   (9)

∂T/∂t = (kT/(ρcv)) ∇²T   (10)

where αsf is a thermic coefficient.

Chemo-Hydromechanical Coupling
In a chemically active saturated porous system such as shales and clay-rich rocks, three processes interact: solid deformation, fluid flow, and ion transfer. The ion transfer can alter the state of stress around boreholes, causing instabilities (Roshan and Aghighi 2011). Clays swell or shrink when they adsorb or lose water because of the differences in chemical potentials of the species in the pore fluid and the other fluids to which the rock is exposed. These developments lead to changes in permeability, hence the flow pattern, as well as the stress distribution in the formation. In an isotropic formation, where the other assumptions of linear poroelasticity also hold, the following governing equations constitute a chemo-poroelastic model (Ghassemi and Diek 2002):

(λ + G) ∇(∇·u) + G ∇²u + α ∇p − αuc ∇C = 0   (11)

∂p/∂t = (kM/μ) ∇²p + αM ∂(∇·u)/∂t − αpc ∂C/∂t − βpc ∇²C   (12)

∂C/∂t = αcc ∇²C − ∇·(J C)   (13)

where αuc, αpc, βpc, αcc, and J are chemical and flow coefficients elaborated in Roshan and Aghighi (2011). Equations 11 and 12 include the effects of the other interacting processes through coupling terms; however, Eq. 13 is independent of the two other processes (i.e., rock deformation and fluid flow) and thus can be solved using existing analytical solutions. Other cases such as chemo-thermo-hydromechanical coupling can be found in Roshan and Aghighi (2012).
Solving Coupled Models Coupled models can be solved analytically, numerically, or in a hybrid form. Analytical methods are used either to solve the set of governing equations in a closed form or simplify them for a numerical analysis. Even after being reduced in terms of spatial dimensions and simplifying boundary and initial conditions, only a small fraction of coupled models can be solved analytically. Nonetheless, analytical solutions are widely used for gaining useful preliminary insights into the problem through initial simulations and analysis. They are also employed for benchmarking and validation of numerical solutions of the real problem in more complex forms. Function transformations such as Laplace, Fourier, and Hankel transforms as well as differential operator methods are commonly used for solving coupled models analytically (Bai and Elsworth 2000). Analytical solutions are convenient and simple to use; however, numerical methods can provide solutions for complex geometries and conditions representing more realistic replications of the problem. Numerical methods discretize the space domain into elements and turn the continuous governing equations into finite equations. The finite element method (FEM) (Zienkiewicz and Taylor 2000), the finite difference method (FDM) (LeVeque 2007), and the boundary element method (BEM) (Banerjee and Butterfield 1981) are the most common numerical methods. The FEM is the most capable among the foregoing methods in terms of handling complex geometries and boundary conditions as well as heterogeneity and nonlinearity (Bai and Elsworth 2000).
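As a hedged, minimal sketch (not from the original entry) of how a finite difference discretization turns a diffusion-type governing equation such as Eq. 3 or Eq. 7 into algebraic update rules, consider an explicit 1-D scheme; a real coupled model would add the coupling terms and typically use an implicit or finite element treatment:

```python
# Explicit finite-difference solution of a 1-D diffusion-type equation dp/dt = c * d2p/dx2.
import numpy as np

c = 1.0e-2                     # diffusivity (illustrative value)
nx, dx, dt, nt = 101, 0.01, 1.0e-3, 500
assert c * dt / dx**2 <= 0.5   # stability condition of the explicit scheme

p = np.zeros(nx)
p[0] = 1.0                     # fixed boundary value (e.g., injection pressure)

for _ in range(nt):
    p_new = p.copy()
    p_new[1:-1] = p[1:-1] + c * dt / dx**2 * (p[2:] - 2.0 * p[1:-1] + p[:-2])
    p_new[-1] = p_new[-2]      # zero-gradient (no-flow) boundary at the far end
    p = p_new
```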
Coupling Approaches
Coupled models can be solved using different coupling approaches: partially coupled and fully coupled. In partial coupling, the governing equations are solved separately in a certain order, with results being passed on to the next solution at each time step. In hydromechanical coupling, for instance, the pore pressure can be evaluated first by solving the governing equation of fluid flow. The pore pressure results are then introduced into the governing equation of rock deformation (geomechanics) to calculate displacements and stresses. The procedure is repeated for the other time steps. If no iteration is involved at each time step, the method is called "one-way coupling" (Longuemare et al. 2002). In the one-way coupling approach, the effect of geomechanics on fluid flow is not taken into account and the governing equations are solved once at each time step. If the results are exchanged at each time step in an iterative manner until a convergence criterion is satisfied, the coupling is called iterative. Results of the iterative coupling approach
are obviously more accurate than those of the one-way coupling approach. Since the interactions of the subprocesses take place simultaneously, it is sensible to solve their respective equations concurrently. This is what characterizes a fully coupled approach, which provides accurate results without any need for iteration. The fully coupled approach requires less time than iterative coupling. The governing equations, however, often need to be simplified to be suitable for full coupling and to meet the requirements of the solution procedures.
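A hedged, pseudocode-style sketch of the iterative (fixed-point) coupling loop described above is given below; `solve_flow` and `solve_mechanics` are hypothetical placeholders for whatever single-physics solvers are actually being coupled:

```python
# Iterative (fixed-point) coupling of a flow solver and a mechanics solver at one time step.
# solve_flow and solve_mechanics are hypothetical placeholders for single-physics solvers.
import numpy as np

def coupled_time_step(p, u, dt, solve_flow, solve_mechanics, tol=1e-6, max_iter=50):
    """Exchange pore pressure p and displacement u until a convergence criterion is met."""
    for _ in range(max_iter):
        p_new = solve_flow(p, u, dt)           # flow equation with current deformation
        u_new = solve_mechanics(p_new, u, dt)  # mechanics equation with updated pressure
        if np.linalg.norm(p_new - p) < tol and np.linalg.norm(u_new - u) < tol:
            return p_new, u_new                # converged: accept the time step
        p, u = p_new, u_new
    return p, u                                # return the last iterate if not converged
```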
Summary
Many processes in the Earth involve two or more interacting subprocesses, and coupled modeling is required to describe them. A coupled model is a set of governing equations characterizing all contributing subprocesses. Governing equations are derived by combining constitutive and conservation laws. Some of the widely occurring coupled processes can be characterized using hydromechanical, thermomechanical, thermo-hydromechanical, and chemo-hydromechanical couplings. Coupled models are solved using analytical and numerical methods. While analytical models are more convenient to use, numerical models can handle more complex geometries and conditions. Common approaches for solving coupled models are the one-way, iterative, and fully coupled methods; in the latter, the governing equations are solved simultaneously. Solving models in a fully coupled manner has the advantage of requiring less time than the iterative approach; however, it is not applicable to all governing equations unless some simplifications are made.
Cross-References ▶ Computational Geoscience ▶ Earth System Science ▶ Flow in Porous Media ▶ Fast Fourier Transform ▶ Geomechanics ▶ Laplace Transform ▶ Mathematical Geosciences ▶ Partial Differential Equations ▶ Simulation
Bibliography
Acocella V (2015) Grand challenges in earth science: research toward a sustainable environment. Front Earth Sci 3(68). https://doi.org/10.3389/feart.2015.00068
Aghighi MA, Lv A, Roshan H (2020) Non-equilibrium thermodynamic approach to mass transport in sorptive dual continuum porous media: a theoretical foundation and numerical simulation. J Nat Gas Sci Eng:103757. https://doi.org/10.1016/j.jngse.2020.103757
Ahmed T (2010) Reservoir engineering handbook. Elsevier Science, San Diego
Bai M, Elsworth D (2000) Coupled processes in subsurface deformation, flow, and transport. American Society of Civil Engineers, Reston
Banerjee PK, Butterfield R (1981) Boundary element methods in engineering science, vol 17. McGraw-Hill, London
Biot MA (1941) General theory of three-dimensional consolidation. J Appl Phys 12(2):155–164. Retrieved from http://link.aip.org/link/?JAP/12/155/1
Carslaw H, Jaeger J (1959) Conduction of heat in solids. Clarendon Press, Oxford
Charlez PA (1991) Rock mechanics: theoretical fundamentals. Éditions Technip, Paris
Coussy O (1995) Mechanics of porous continua. Wiley
Detournay E, Cheng AHD (1988) Poroelastic response of a borehole in a non-hydrostatic stress field. Int J Rock Mech Min Sci Geomech Abstr 25:171–182
Detournay E, Cheng AHD (1993) Fundamentals of poroelasticity. In: Hudson (ed) Comprehensive rock engineering: principles, practice and projects, vol 2. Pergamon Press, Oxford, pp 113–171
Ghassemi A, Diek A (2002) Porothermoelasticity for swelling shales. J Pet Sci Eng 34(1–4):123–135. https://doi.org/10.1016/S0920-4105(02)00159-6
Jaeger JC, Cook NGW, Zimmerman R (2007) Fundamentals of rock mechanics. Wiley, Malden
LeVeque RJ (2007) Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems. SIAM, Philadelphia
Longuemare P, Mainguy M, Lemonnier P, Onaisi A, Gerard C, Koutsabeloulis N (2002) Geomechanics in reservoir simulation: overview of coupling methods and field case study. Oil Gas Sci Technol 57:471–483
Mitchell JK, Soga K (2005) Fundamentals of soil behavior, vol 3. Wiley, New York
Rice JR, Cleary MP (1976) Some basic stress diffusion solutions for fluid-saturated elastic porous media with compressible constituents. Rev Geophys 14(2):227–241. https://doi.org/10.1029/RG014i002p00227
Roshan H, Aghighi MA (2011) Chemo-poroelastic analysis of pore pressure and stress distribution around a wellbore in swelling shale: effect of undrained response and horizontal permeability anisotropy. Geomech Geoeng 7:209–218. https://doi.org/10.1080/17486025.2011.616936
Roshan H, Aghighi MA (2012) Analysis of pore pressure distribution in shale formations under hydraulic, chemical, thermal and electrical interactions. Transp Porous Media 92(1):61–81. https://doi.org/10.1007/s11242-011-9891-x
Skempton AW (1954) The pore-pressure coefficients A and B. Geotechnique 4(4):143–147. https://doi.org/10.1680/geot.1954.4.4.143
Terzaghi K (1923) Die Berechnung der Durchlässigkeitsziffer des Tones aus dem Verlauf der hydrodynamischen Spannungserscheinungen. Sitz Akad Wissen Wien Math Naturwiss Kl, Abt IIa 132:105–124
Zienkiewicz OC, Taylor RL (2000) The finite element method – basic formulation and linear problems, vol 1, 5th edn. Butterworth-Heinemann, Oxford
Cracknell, Arthur Phillip Kasturi Devi Kanniah TropicalMap Research Group, Centre for Environmental Sustainability and Water Security (IPASA), Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
Fig. 1 Arthur Phillip Cracknell (1940-2021), © Prof. Kasturi Kanniah
Biography
Arthur P. Cracknell passed away in April 2021. He was an emeritus professor at the School of Engineering, University of Dundee, UK, from September 2002 until April 2021. He was one of the pioneers in the field of remote sensing technology, and his contribution to the development and advancement of the technology is remarkable. I regard him as my mentor and a dear friend, and I am honored to write a biography about him. Professor Cracknell started remote sensing work in the late 1970s, when the field was still in its infancy in the UK. The establishment, development, and operation of the National Oceanic and Atmospheric Administration's Polar Orbiting Environmental Satellites data receiving station at Dundee University in 1978 was the catalyst for the development of remote sensing in the UK. While he was there, Professor Cracknell initiated research on processing and interpreting remote sensing data that was required by a group of environmental scientists and engineers in Dundee for their environmental consultancy work. Professor Cracknell established and led a very active research group in remote sensing at Dundee University and trained many foreign PhD students, including two of the current UTM professors in remote sensing. Most of the
research conducted by his group focused on the modeling of atmospheric transmission and Earth surface conditions, the development of software to process remotely sensed data, and the environmental application of remote sensing data. Besides being actively involved in research projects, he also initiated academic programs via summer schools and postgraduate courses. The summer schools played a great role in advancing the application of remote sensing technology within a teaching and research context in many organizations around the world. The MSc program in remote sensing, image processing, and applications produced many undergraduates and postgraduates over the years. Professor Cracknell's contribution to remote sensing knowledge dissemination through academic journals is noteworthy. In 1983, he became the chief editor of the International Journal of Remote Sensing and later the editor of Remote Sensing Letters. He published over 300 research papers and over 30 books covering physics and various remote sensing applications. These publications made significant contributions to innovation and potential future research in remote sensing, and he was recognized for these contributions through various prestigious awards. After his retirement from Dundee University, he continued as an active scientist and researcher with universities within and outside the UK. He was a senior visiting professor at UTM between 2003 and 2019. At UTM he was engaged in teaching courses to undergraduates, co-supervising PhD candidates, reviewing the curriculum of remote sensing programs, delivering talks, and assisting in research activities. I became acquainted with him in 2009, when I started to work closely with him on a project related to the carbon storage of oil palm trees, through which I had the opportunity to learn a great deal from him. I always noticed that he loved working with students and was always willing to discuss their research projects and manuscripts while imparting his knowledge and values to them. Through his academic network, he also assisted many students in securing scholarships and travel grants to present their research findings at international conferences. He helped to link UTM researchers with scholars from many world-renowned universities such as Tsinghua University in Beijing. These connections later enabled UTM to expand its research collaboration into many other areas. On behalf of UTM, I thank Professor Cracknell for all his contributions. Since my early interactions with Professor Cracknell, I have been fascinated by his passion for conducting research and scientific writing, even after his retirement. In fact, owing to his profound dedication to research, he was always willing to travel at his own expense. I consider Professor Cracknell to have been a principled, responsible, and highly motivated scholar and a gentleman. I feel very honored and thankful to have known such a great person, to have coauthored almost 20 journal articles and conference papers with him, and to have jointly supervised postgraduate students.
Acknowledgment This chapter is dedicated to honoring the late Professor Cracknell for his contribution to the development of remote sensing. He definitely deserves the honor as a remarkable scientist, educator, and individual.
Cressie, Noel A.C. Jay M. Ver Hoef Alaska Fisheries Science Center, NOAA Fisheries, Seattle, WA, USA
Fig. 1 Noel A. C. Cressie, courtesy of Prof. Cressie
Biography
Cressie is best known for his work in spatial statistics. In his widely cited, 900-page book, Statistics for Spatial Data, Cressie (1991) developed a general spatial model that unified disciplines using geostatistical data, regular and irregular lattice data, point patterns, and sets in Euclidean space. Since the 1980s, he has been a developer of statistical theory and methods, such as weighted least squares (see article on ordinary least squares) for fitting variograms (see article on variograms), and he has also written on the history of geostatistics (see articles on geostatistics, kriging). His book contains his contributions, and those of many others, in the field of spatial statistics. Since the 2000s, Cressie has been actively researching and applying spatiotemporal models using a hierarchical statistical framework. He has made groundbreaking innovations in dimension reduction to make spatial and spatiotemporal modeling computationally feasible for "big data." He has developed new classes of nonseparable spatiotemporal covariance models (Cressie and Huang 1999) (see articles on spatiotemporal analysis, spatiotemporal modeling) and used them in many applications in the geosciences. He synthesized this knowledge in the award-winning books
Statistics for Spatio-Temporal Data (Cressie and Wikle 2011) and Spatio-Temporal Statistics with R (Wikle et al. 2019). In statistical theory, Cressie is also highly regarded for his fundamental contributions to goodness-of-fit, which provided a new unified view of this classic research field (Read and Cressie 1988), and to Bayesian sequential testing (Cressie and Morgan 1993). Cressie has published 4 books and over 300 articles, chapters, and discussions in scholarly journals and edited books. Among numerous awards and fellowships, in 2009 he received from the Committee of Presidents of Statistical Societies (COPSS) one of the highest awards in statistical science, the R.A. Fisher Award and Lectureship. In 2014, Cressie was awarded the Pitman Medal by the Statistical Society of Australia for his outstanding achievements in the discipline of statistics, and in 2018, he was elected a Fellow of the Australian Academy of Science. More on Cressie's background and history may be found at https://en.wikipedia.org/wiki/Noel_Cressie and in Wikle and Ver Hoef (2019).
Cross-References ▶ Geostatistics ▶ Kriging ▶ Matheron, Georges ▶ Ordinary Least Squares ▶ Spatial Statistics ▶ Spatiotemporal Analysis ▶ Spatiotemporal Modeling ▶ Variogram ▶ Watson, Geoffrey S.
References
Cressie NAC (1991) Statistics for spatial data. John Wiley & Sons, New York, NY
Cressie N, Huang H-C (1999) Classes of nonseparable, spatio-temporal stationary covariance functions. J Am Stat Assoc 94:1330–1340. (Correction, 2001, Vol. 96, p. 784)
Cressie N, Morgan PB (1993) The VPRT: a sequential testing procedure dominating the SPRT. Economet Theor 9:431–450
Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. John Wiley & Sons, Hoboken
Read TR, Cressie NA (1988) Goodness-of-fit statistics for discrete multivariate data. Springer, New York, NY
Wikle CK, Ver Hoef JM (2019) A conversation with Noel Cressie. Stat Sci 34:349–359
Wikle CK, Zammit-Mangion A, Cressie N (2019) Spatio-temporal statistics with R. CRC/Chapman and Hall, Boca Raton
Crystallographic Preferred Orientation Helmut Schaeben Technische Universität Bergakademie Freiberg, Freiberg, Germany
Synonyms
Crystallographic texture
Definition
Crystallographic preferred orientation is a multidisciplinary topic of crystallography (solid state physics), mineralogy, mathematics, materials science, and geosciences. Its subject used to be the statistical distribution of the spatial orientations of crystallites (grains) within a polycrystalline specimen, a distinct pattern of which is referred to as texture. Since spatially resolving electron backscatter diffraction delivers digital orientation map images, contemporary texture analysis comprises the spatial distribution of grains by means of image analysis and includes intragrain orientation distributions and grain boundary phenomena.
Historical Notes
Analysis of crystallographic preferred orientation, or crystallographic texture analysis, and its application originate in materials science and were initiated by Günter Wassermann in the 1930s and largely extended together with Johanna Grewen (Wassermann and Grewen 1962). Single crystals are largely anisotropic with respect to almost all physical properties; one of the best-known examples in the geosciences is the anisotropy of thermal expansion of quartz crystals, which is about twice as large for the crystallographic a-axis as for the crystallographic c-axis. Thus, directional or tensorial physical properties of polycrystalline materials such as metals, rocks, ice, and many ceramics depend on the statistical distribution of the orientations of grains, assuming that the orientation of a grain is unique. Vice versa, a grain could be defined as a spatial domain of material supporting a unique crystallographic orientation. The first experiments to determine the texture of a specimen of polycrystalline material were done with X-ray diffraction in a texture goniometer and provided integral orientation measurements. A texture goniometer applies Bragg's law

2d sin θ = nλ, n ∈ ℕ,   (1)
to select a feasible diffracting crystallographic lattice plane with interplanar distance d when keeping the angle θ and the wavelength λ fixed. A texture goniometer then measures the diffracted intensities of the selected lattice planes with their common unit normal crystallographic direction h ∈ 𝕊² aligned with a given but variable macroscopic direction r ∈ 𝕊² referring to the specimen. These intensities are eventually modeled by a bi-directional probability density function P(h, r) of normals h of specified crystallographic lattice planes and specimen directions r, and its equal-area projections onto the unit disk are called pole figures. In 1965, Bunge and Roe simultaneously suggested (Bunge 1965; Roe 1965) a mathematical approach in terms of a harmonic (Fourier) series expansion to determine a crystallographic orientation density function f from several h-pole figures. However, both approaches were flawed by the same erroneous conclusion that the odd-order harmonic coefficients of an orientation density function are identical to 0, i.e., that the orientation density function is an even function because experimental pole figures are even, P(h, r) = P(−h, r), due to Friedel's law, which implies that the diffraction experiment (excluding anomalous scattering) itself introduces a center of symmetry whether or not it is included in the actual crystal symmetry class. Their model of the relationship between an orientation and a pole probability density function was incomplete in that it failed to account for both the direction h and its antipode −h. The mistake was abetted by their rather symbolic notation. Numerical realizations of their model in various computer codes led to occurrences of false texture components in the computed orientation density functions, which became notorious as terrifying "texture ghosts" for more than a decade. The riddle was resolved by a proper model accounting for (h, −h) by Matthies (1979). In the early 1990s, orientation imaging microscopy was developed (Adams et al. 1993; Schwartz et al. 2009) based on electron backscatter diffraction (EBSD), providing spatially resolved individual orientation measurements. The spatial reference of these measurements provides the essential prerequisite to model the geometry and topology of grains at the surface of the specimen by means of mathematical approaches and their respective assumptions. Thus, EBSD experiments open the avenue to realize Bruno Sander's vision of a comprehensive fabric analysis (Sander 1950) beyond axis distribution analysis, including grain size distribution and grain shape analysis as well as various misorientation distributions, grain boundary classification and directional grain boundary distribution, and other instructive entities.
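As a brief, hedged numerical aside (illustrative values, not from the original entry), Eq. 1 can be solved for the goniometer angle θ once d and λ are fixed:

```python
# Solving Bragg's law, 2 d sin(theta) = n * lambda, for the diffraction angle theta.
import math

d = 3.34e-10            # interplanar spacing in m (illustrative, quartz-like)
wavelength = 1.54e-10   # Cu K-alpha X-ray wavelength in m
n = 1
theta = math.asin(n * wavelength / (2.0 * d))
print(math.degrees(theta))   # first-order diffraction angle in degrees
```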
Introduction
The subject of texture analysis is crystallographic preferred orientation, i.e., patterns of crystallographic orientations in
terms of their statistical or spatial distribution within a specimen. The statistical distributions range from a uniform (or random) texture, representing the lack of any preferential pattern, to sharp textures concentrated around a preferred orientation with a deviation of a few degrees, referred to as a technical single-crystal orientation. Texture analysis is a field of multidisciplinary interaction of crystallography, mathematics, solid state physics, mineralogy, and materials science, on the one hand, and geophysics and geology, on the other hand. Analysis of crystallographic (preferred) orientation is based on integral orientation measurements with X-ray, neutron, or synchrotron diffraction, or on individual orientation measurements with electron backscatter diffraction (EBSD). Integral orientation measurements require resolving the inverse problem of estimating an orientation density function from experimental pole intensities. The key to deriving any reasonable method is to recognize that their relationship can be modeled mathematically in terms of a Funk-Radon transform. Estimation of an orientation density function from given individual orientation measurements is done with nonparametric spherical kernel density estimation. Moreover, spherical statistics applies to analyzing EBSD data. Since EBSD results in spatially referenced individual crystallographic orientations, it provides spatially resolved information that can be visualized as orientation maps. Then crystallographic grains can be modeled, and texture analysis can be extended to fabric analysis.
Definitions
An orientation is an element of the special orthogonal group SO(3) of rotations in ℝ³, thought of as the rotational configuration of one right-handed orthonormal 3-frame, i.e., an ordered basis of ℝ³, say K_S, with respect to another right-handed orthonormal 3-frame, denoted K_C. The orientation of the 3-frame K_C with respect to the 3-frame K_S is defined as the element g ∈ SO(3) such that
ð2Þ
Then the coordinate vectors h (with respect to the 3-frame K_C) and r (with respect to the 3-frame K_S) of a unique unit vector v ∈ 𝕊² are related by
ð3Þ
Spatial shifts between the 3-frames are not considered. A crystallographic orientation takes crystallographic symmetry into account. Crystallographic symmetry makes it possible to distinguish mathematically different orientations of a crystal which are physically equivalent. Representing crystallographic symmetry in terms of the point group S_point of the
crystal, its crystallographic orientation is defined as the (left) coset gS_point = {gq | q ∈ S_point} of all orientations crystallographically symmetrically equivalent to a given orientation g. A comprehensive treatise of rotations and crystallographic orientations is found in Morawiec (2004). In the case of diffraction experiments, not only the crystallographic symmetry but also the symmetry imposed by the diffraction itself (Friedel's law) has to be considered. This symmetry is generally described by the Laue group S_Laue, which is the point group of the crystal augmented by inversion. However, the cosets of equivalent orientations can be completely represented by proper rotations. Likewise, when analyzing diffraction data for preferred crystallographic orientation, it is sufficient to consider the restriction of the Laue group S_Laue to its purely rotational part. Then, two orientations g and g′ ∈ SO(3) are crystallographically symmetrically equivalent with respect to S_Laue if there is a symmetry element q ∈ S_Laue such that gq = g′. Thus, a crystallographic orientation is a left coset. The left cosets gS_Laue define classes of crystallographically symmetrically equivalent orientations. The set of all cosets is called the quotient space. Two crystallographic directions h and h′ ∈ 𝕊² are crystallographically symmetrically equivalent if a symmetry element q ∈ S_Laue exists such that qh = h′. The set of all crystallographically symmetrically equivalent directions S_Laue h ⊂ 𝕊² may be viewed as a set of positive normals sprouting from the symmetrically equivalent crystallographic lattice planes. A fundamental entity of texture analysis is the statistical distribution of crystallographic orientations by volume (as opposed to number), i.e., of the volume portion of a specimen supporting crystallographic orientations within a given (infinitesimally) small range, neglecting its spatial distribution. The statistical distribution is usually represented by an orientation probability density function f, almost always falsely referred to as an orientation distribution function since the early days of texture analysis in materials science and solid state physics, respectively, cf. (Bunge 1965; Roe 1965). Thus, an orientation density function f models the relative frequencies of crystallographic orientations within a specimen by volume. It is properly defined on the corresponding quotient space SO(3)/S_Laue, i.e.,

f(g) = f(gq), g ∈ SO(3), q ∈ S_Laue.   (4)
Crystallographic preferred orientation may also be represented by a pole density function P modeling the relative frequencies with which an element of S_Laue h ⊂ 𝕊² coincides with a given specimen direction r ∈ 𝕊², i.e., one of the crystallographically symmetrically equivalent directions S_Laue h or its antipodally symmetric direction coincides with a given specimen direction r ∈ 𝕊². Thus, a pole density function satisfies the symmetry relationships
P(h, r) = P(qh, r), (h, r) ∈ 𝕊² × 𝕊², q ∈ S_Laue, and P(h, r) = P(−h, r), i.e., it is essentially defined on (𝕊²/S_Laue) × 𝕊². Pole density functions, thought of as functions of the variable specimen direction r parametrized by the crystallographic axis h, are experimentally accessible as pole intensities from X-ray, neutron, or synchrotron diffraction for some crystallographic forms h and displayed as h-pole figures. Orientation probability density and pole density functions are related to each other by the Funk-Radon transform (Bernstein and Schaeben 2005). Once an orientation density function has been estimated from experimental h-pole intensities (or from individual orientation measurements), its corresponding fitted pole density function may be computed and displayed as a function of r parametrized by h, or vice versa; the latter are referred to as inverse pole figures, where the parameter r is usually chosen to be one of the axes of the specimen coordinate system.
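As a hedged illustration of crystallographic symmetric equivalence (a minimal sketch, not from the original entry), the following code computes the smallest rotation angle between two orientations represented as unit quaternions, minimizing over a small symmetry group; the rotational elements of point group 222 are used here purely as a simple example:

```python
# Smallest rotation angle between two orientations under a crystal symmetry group,
# with orientations and symmetry elements given as unit quaternions (w, x, y, z).
import numpy as np

def quat_mult(a, b):
    """Hamilton product of two quaternions."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

# Rotational elements of point group 222: identity and 180-degree rotations about x, y, z.
SYM = [np.array(q, float) for q in [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]]

def disorientation_angle(g1, g2):
    """Minimum rotation angle between the cosets g1*S and g2*S; since q1*q2^(-1)
    again lies in the group, a single loop over symmetry elements suffices for the angle."""
    best = np.pi
    for q in SYM:
        m = quat_mult(quat_mult(g1, q), quat_conj(g2))       # g1 q g2^(-1)
        angle = 2.0 * np.arccos(np.clip(abs(m[0]), 0.0, 1.0)) # |w| handles antipodal quaternions
        best = min(best, angle)
    return best
```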
Estimation of an Orientation Density Function An orientation density function can be estimated from integral orientation measurements as provided by X-ray, synchrotron, or neutron diffraction experiments in terms of pole figures or from individual orientation measurements as provided by spatially resolving electron backscatter diffraction (EBSD) experiments in terms of orientation maps. Once an orientation density function is estimated sufficiently well, characteristics quantifying its features such as Fourier coefficients, texture index, entropy, modes, volume portions around peaks or fibers, and others can be computed.
Estimation with Integral Orientation Measurements
Up to normalization, the pole intensities P(h, r) as displayed in pole figures are a measure of the mean density of orientations rotating the elements of S_Laue h to the macroscopic direction r. The key element of a mathematical model for crystallographic pole density functions is the Funk-Radon transform Rf of an orientation density function f defined on SO(3). Here, the Funk-Radon transform assigns to f the mean values of f along geodesics G ⊂ SO(3). Any geodesic G can be parametrized by a pair of unit vectors (h, r) ∈ 𝕊² × 𝕊² as

G(h, r) = {g ∈ SO(3) | gh = r}, (h, r) ∈ 𝕊² × 𝕊².   (5)
Since

⋃_{h ∈ 𝕊²} G(h, r) = ⋃_{r ∈ 𝕊²} G(h, r) = SO(3),   (6)
the geodesics provide a double (Hopf) fibration of SO(3). Then the Funk-Radon transform Rf of f defined on SO(3) is defined on 𝕊² × 𝕊² as

(Rf)(h, r) = (1/2π) ∫_{G(h,r)} f(g) dg.   (7)
It is a probability density function if the function f is a probability density function. The Funk-Radon transform Rf possesses a unique inverse (Helgason 1999). Thus, the initial function f can be uniquely recovered from its Funk-Radon transform Rf by its inversion, the numerical inversion being notoriously ill-posed. The question arises how a one-to-one relationship can exist between the function f defined on the three-dimensional manifold SO(3) and its Funk-Radon transform Rf defined on the four-dimensional manifold 𝕊² × 𝕊². The affirmative answer is that the Funk-Radon transform is governed by a Darboux-type differential equation, the ultrahyperbolic differential equation

(Δ_h − Δ_r) Rf(h, r) = 0,   (8)
where Δ denotes the (spherical) Laplace-Beltrami operator and the subscript denotes the variable with respect to which the Laplace-Beltrami operator is applied (Savyolova 1994). Taking means of the Funk-Radon transform, e.g.,

R̃f(h, r) = (1/2) (Rf(h, r) + Rf(−h, r)),   (9)
a one-to-one relationship between the mean R̃f and the initial function does not exist. Recovering an estimate of f from R̃f requires additional modeling assumptions which are mathematically tractable and physically reasonable, e.g., the nonnegativity of f. To account for crystal symmetry, the basic model of a crystallographic pole density function is

Xf(h, r) = (1/#S_Laue h) Σ_{n ∈ S_Laue h} R̃f(n, r),   (10)
and may be referred to as spherical crystallographic X-ray transform. Of course, it does not possess an inverse, and the inverse problem does not have a unique solution. The complete discrete mathematical model of the experimental intensities is much more involved, e.g., intensities are experimentally accessible for a few distinct crystallographic lattice planes only, effects of partial or complete superposition
of diffracted intensities from different lattice planes, e.g., the complete superposition of (hkil) and (khil) of quartz, have to be accounted for by their structure coefficients, and the experimental intensities, actually counts, are usually not normalized. Their normalization requires special attention, as the experimental intensities do not cover an entire hemisphere, a problem referred to as "incomplete pole figures." All approaches to the numerical resolution of the inverse problem involve a truncated infinite or finite series expansion of the orientation probability density function f, e.g., into generalized spherical harmonics giving rise to Fourier series expansion, into finite elements or splines, respectively, or into appropriate radial basis functions (Hielscher and Schaeben 2008), and their discretization. The Fourier approach requires truncation of the series and allows estimation of the even-order Fourier coefficients only, i.e., the kernel of the mean R̃f, Eq. 9, comprises the generalized harmonics of odd order (Matthies 1979). Thus, the nonuniqueness of the inverse problem of texture analysis can be quantified: all odd-order Fourier coefficients remain inaccessible without additional effort. Finite series expansion into nonnegative radial basis functions employs the heuristic that their linear combination is apt to approximate any nonnegative function reasonably well. It is computationally feasible only when the spherical nonequispaced fast Fourier transform (Kunis and Potts 2003; Potts et al. 2009) is applied.
Estimation with Individual Orientation Measurements An EBSD experiment yields Kikuchi patterns to be processed by methods of image analysis or image comparison to result eventually in a set of spatially referenced representatives g(x‘) ¼ g‘ SO(3), x‘ ℝ2, ‘ ¼ 1, . . ., n, of crystallographic orientations in SOð3Þ=SLaue . Then an orientation probability density function is estimated by nonparametric kernel density estimation, i.e., the superposition of a radial basis function centered at the data g‘, ‘ ¼ 1, . . ., n, f k ðgjg1 , . . . , gn Þ ¼
1 n
n ‘¼1
, k , Ck o gg 1 ‘
ð11Þ
where ðCk , k Þ is a set of nonnegative radial basis functions with shape parameter k , actually an approximate identity for k ! k0. While the choice of the radial basis function C is not generally critical in practical applications, the choice of its shape parameter k controlling the width or bandwidth, respectively, is so. If the kernel is given, heuristics can be applied to optimize its shape parameter with respect to various criteria (Hielscher 2013). Kernel density estimation
assumes independent and identically distributed random orientations with finite expectation and finite variance.
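A minimal, hedged sketch of Eq. 11 (not from the original entry), with orientations given as unit quaternions, an exponential-type kernel, and crystal symmetry ignored for brevity (in practice the rotation angle would be minimized over the elements of S_Laue and the kernel would be normalized):

```python
# Kernel density estimation of an ODF value at orientation q from measured
# orientations q_data, all given as unit quaternions of shape (4,) and (n, 4).
import numpy as np

def rotation_angles(q, q_data):
    """Rotation angles omega(g g_l^{-1}) via the quaternion inner product,
    identifying antipodal quaternions q and -q."""
    dots = np.clip(np.abs(q_data @ q), 0.0, 1.0)
    return 2.0 * np.arccos(dots)

def odf_kde(q, q_data, kappa=50.0):
    """Unnormalized estimate f_kappa(q | q_1, ..., q_n) with kernel
    C_kappa(omega) = exp(kappa * (cos(omega) - 1))."""
    omega = rotation_angles(q, q_data)
    return np.mean(np.exp(kappa * (np.cos(omega) - 1.0)))
```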
Grain Modeling with Individual Orientation Measurements
Even though the orientation probability density function and its estimate, respectively, referring to a set of crystallographic grains are basic entities of polycrystalline materials, there are many more entities, rather referring to grain boundaries or individual grains, which are instrumental to describe and quantify features of the fabric of polycrystalline materials. Modeling the geometry and topology of grains applies heuristic modeling assumptions, some kind of classification of individual orientation measurements into classes of similar orientations, a geometrical segmentation (partition) of the orientation map image, and a data model to represent the geometry and topology of the grains and their boundaries. The basic heuristic modeling assumption is the user's definition of an angular threshold for the misorientation angle of two crystallographic orientations. If the actual misorientation angle exceeds the threshold, the orientations are considered to be different, possibly stemming from different grains and indicating a grain boundary if their locations are adjacent. Straightforward thresholding of pixel-to-pixel misorientation angles does not yield unique grains. There are quite a few numerical approaches to grain modeling. They may differ by the order of procedural steps, e.g., commencing with the segmentation of the orientation map image and then proceeding to threshold-controlled amalgamation of the initial segments into crystallographic grains, or the other way round, prioritizing the clustering of similar orientations followed by geometrical modeling. An example of the former approach features the Dirichlet-Thiessen-Voronoi partition of the map image into polygonal cells centered at the locations of the measurements. This partition is based on the modeling assumption that grain boundaries are located at the bisectors of adjacent measurement locations. It immediately implies that grains are composed of adjacent Dirichlet-Thiessen-Voronoi cells (Bachmann et al. 2011). An example of the latter approach is generalized fast multiscale clustering, resulting in grains composed of adjacent pixels (McMahon et al. 2013). Both methods apply graph theory to represent the grain model.
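As a hedged toy sketch (not from the original entry) of the thresholding heuristic on a pixel grid, the flood fill below groups adjacent pixels whose misorientation angle (computed here without crystal symmetry) stays below the threshold; as noted above, such naive pixel-to-pixel thresholding can chain small misorientations across what should be distinct grains, which is why the Voronoi-based and multiscale-clustering approaches are preferred in practice:

```python
# Toy flood-fill grain segmentation of a (rows, cols, 4) array of unit quaternions.
import numpy as np
from collections import deque

def segment_grains(quats, threshold_deg=10.0):
    rows, cols, _ = quats.shape
    thr = np.deg2rad(threshold_deg)
    grain_id = -np.ones((rows, cols), dtype=int)   # -1 marks unassigned pixels
    n_grains = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if grain_id[r0, c0] >= 0:
                continue
            grain_id[r0, c0] = n_grains
            queue = deque([(r0, c0)])
            while queue:                           # grow the grain pixel by pixel
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rn, cn = r + dr, c + dc
                    if 0 <= rn < rows and 0 <= cn < cols and grain_id[rn, cn] < 0:
                        dot = np.clip(abs(np.dot(quats[r, c], quats[rn, cn])), 0.0, 1.0)
                        if 2.0 * np.arccos(dot) < thr:
                            grain_id[rn, cn] = n_grains
                            queue.append((rn, cn))
            n_grains += 1
    return grain_id, n_grains
```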
Orientation Statistics
It appears tempting to apply statistical analysis and parametric model fits to individual crystallographic orientation data. Statistics of orientations may be seen as a generalization of the statistics of directions and axes (Downs 1972). The generalization seems straightforward when orientations are thought of in terms of unit quaternions. The Bingham distribution applies to axes in 𝕊² as well as to pairs of antipodal quaternions in 𝕊³ (Kunze and Schaeben 2004). Likewise, the Fisher distribution of directions in 𝕊² can be generalized to the Fisher matrix distribution of matrices in SO(3). However, the distributions known from spherical statistics do not apply to crystallographic orientations, as they cannot represent any crystal symmetry. Symmetry could be imposed on a quaternion or matrix distribution by superposition corresponding to the crystal symmetry class, but the resulting distributions would not be exponential and would therefore lose all the distinct properties of exponential distributions like the Bingham or Fisher distribution. Nevertheless, they may be useful to distinguish patterns of preferred crystallographic orientation in terms of their parameters. The effects of mixing by superposition of a distribution are small if the distribution is highly concentrated. High concentrations can be expected if the measurements refer to an individual grain. Then inferential statistics is possible to distinguish the geometrical shape of highly concentrated clusters of crystallographic orientations (Bachmann et al. 2010). Otherwise, native models for ambiguous rotations are required (Arnold et al. 2018).
Estimation of Macroscopic Directional Properties
Given an anisotropic, antipodally symmetric property of a single crystal Ec(h) depending on the crystallographic axis h ∈ 𝕊², the corresponding macroscopic property of a specimen Em(r) depending on the macroscopic direction r ∈ 𝕊² can be computed given the orientation density function f or the even pole density function, respectively,

Em(r) = (1/(4π²)) ∫_{𝕊²} Ec(h) Xf(h, r) dh.   (12)
For a much more detailed presentation of the topic, the reader is referred to (Mainprice et al. 2011).
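A hedged sketch (not from the original entry) of the simplest discrete analogue of Eq. 12: averaging a single-crystal directional property over a set of measured orientations, each orientation g mapping the specimen direction r to the crystal direction h = g⁻¹r; volume weighting, symmetrization, and proper tensor averaging (e.g., Voigt/Reuss/Hill bounds) are deliberately omitted:

```python
# Arithmetic (Voigt-like) average of a scalar directional property over measured orientations.
import numpy as np
from scipy.spatial.transform import Rotation

def macroscopic_property(E_crystal, rotations, r):
    """E_crystal: function of a unit crystal direction h (3,); rotations: a scipy
    Rotation object holding n grain orientations; r: specimen direction (3,)."""
    h = rotations.inv().apply(r)          # crystal directions g^{-1} r, shape (n, 3)
    return np.mean([E_crystal(hi) for hi in h])

# Example: a transversely isotropic toy property depending on the angle to the crystal c-axis.
E_c = lambda h: 1.0 + 0.5 * h[2]**2
grains = Rotation.random(1000, random_state=0)    # synthetic "uniform texture"
print(macroscopic_property(E_c, grains, np.array([0.0, 0.0, 1.0])))
```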
Applications of Texture Analysis Typical Applications in General In terms of their objective, applications of texture analysis in materials science or geosciences are very different. The prevailing processes forming texture, i.e., distinct patterns of crystallographic preferred orientation, are plastic deformation, recrystallization, and phase transformations. In materials science, texture analysis typically addresses “forward” problems, like what pattern of crystallographic preferred orientation is caused by a given process, and refers to process control in the laboratory or quality control in production to guarantee a required texture and corresponding macroscopic physical properties, e.g., isotropic steel, i.e., almost perfect uniform
distribution, no crystallographic preferred orientation, or high-temperature semiconductors, i.e., an almost perfect "single crystal" crystallographic preferred orientation. In the geosciences, texture analysis is typically applied to the much more difficult "inverse" problem of identifying the process(es) of a sequence of geological processes which may have caused an observed pattern of crystallographic preferred orientation in rocks or ice. A result of geological texture analysis could be to exclude a geological process or event. In any case, geological texture analysis aims at an interpretation of the kinematics and dynamics of geological processes, contributing to a consistent reconstruction, e.g., of the geological deformation history. Geological texture analysis provides otherwise inaccessible insight and advances understanding: e.g., differences in the velocity of seismic waves along or across ocean ridges have been explained by texture changes during mantle convection (Almqvist and Mainprice 2017), varying texture may result in a seismic reflector (Dawson and Wenk 2000), and the texture of marble slabs employed as building facades or tombstone decoration is thought to influence the spectacular phenomena of bending, fracturing, spalling, and shattering of the initially intact slabs (Shushakova et al. 2013).
Texture of a Hematite Specimen from the Pau Branco Deposit, Quadrilátero Ferrífero (Iron Quadrangle), Brazil
Crystallographic Preferred Orientation, Fig. 1 (a) Raw orientation map image of 316,050 spatially referenced individual orientation measurements with EBSD (left), where pixels supporting orientation g are color coded according to g⁻¹z in (b) the z-axis inverse pole figure (right) (Schaeben et al. 2012)
Some typical steps of an EBSD data analysis have been presented for hematite from a high-grade schistose ore sample from the Pau Branco deposit, Quadrilátero Ferrífero (Iron Quadrangle), Brazil (Schaeben et al. 2012). They include estimation of an orientation probability density function and modeling of grains, and in particular their interplay. A raw orientation map image from EBSD of the Pau Branco specimen is shown in Fig. 1, where crystallographic orientations are assigned to pixels, which in turn are color-coded according to the colors defined in the inverse z-pole figure. The initially estimated orientation probability density function and its corresponding pole figures show a pronounced deviation from the familiar symmetric pattern of crystallographic preferred orientation of hematite (Fig. 2). The obvious asymmetry is caused by one single grain, which is peculiar both in size and in crystallographic orientation (Fig. 3). Excluding this particular grain from the orientation probability density estimation yields the expected symmetric pattern (Figs. 4 and 5).
Crystallographic Preferred Orientation, Fig. 2 (a) ODF of individual orientations displayed in σ-sections, revealing a conspicuous asymmetry best visible in the σ = 10° section (left); (b) corresponding PDFs for crystal forms of special interest (right). The dots mark the modal orientation g_modal and their corresponding r = g_modal h (Schaeben et al. 2012)
Crystallographic Preferred Orientation, Fig. 3 Orientation map image of spatially referenced individual orientation measurements with EBSD after corrections involving confidence index (CI), forward scatter detector signal (SEM), and grain size. The vaguely visible pattern of vertical lines may be reminiscent of specimen preparation (left) (Schaeben et al. 2012)
Crystallographic Preferred Orientation, Fig. 4 Orientation map image displaying grains with respect to an angular threshold of 10° of misorientation and their mean orientation. It should be noted that there is a conspicuous grain in terms of its blue color coding a peculiar orientation and its size in the upper left of the image, which accounts for 2.4% of the total surface area of the specimen (Schaeben et al. 2012)
Crystallographic Preferred Orientation, Fig. 5 (a) ODF of individual orientations displayed in σ-sections augmented by the mean orientation of conspicuous grain 829 as a green dot visible in the σ = 10° section (left); (b) σ-sections of a recalculated ODF excluding grain 829 and all its properties, augmented by the modal orientation visible in the σ = 10° section (right) (Schaeben et al. 2012)
Conclusions
Texture, i.e., a pattern of preferred crystallographic orientation, provides the first order relation of a microscopic anisotropic property of a single crystal to the corresponding macroscopic anisotropic property of the polycrystalline material. Thus, texture is essential in the design of high-tech materials as well as in the recognition of texture-forming processes.

Bibliography
Adams BL, Wright SI, Kunze K (1993) Metall Mater Trans A 24:819
Almqvist BSG, Mainprice D (2017) Rev Geophys 55:367
Arnold R, Jupp PE, Schaeben H (2018) J Multivar Anal 165:73
Bachmann F, Hielscher R, Jupp PE, Pantleon W, Schaeben H, Wegert E (2010) J Appl Crystallogr 43:1338
Bachmann F, Hielscher R, Schaeben H (2011) Ultramicroscopy 111:1720
Bernstein S, Schaeben H (2005) Math Methods Appl Sci 28:1269
Bunge HJ (1965) Z Metallk 56:872
Dawson PR, Wenk H-R (2000) Phil Mag A 80:573
Downs TD (1972) Biometrika 59:665
Helgason S (1999) The radon transform, 2nd edn. Birkhäuser, Boston
Hielscher R (2013) J Multivar Anal 119:119
Hielscher R, Schaeben H (2008) J Appl Crystallogr 41:1024
Kunis S, Potts D (2003) J Comput Appl Math 161:75
Kunze K, Schaeben H (2004) Math Geol 36:917
Mainprice D, Hielscher R, Schaeben H (2011) Calculating anisotropic physical properties from texture data using the MTEX open source package. In: Prior DJ, Rutter EH, Tatham DJ (eds) Deformation mechanisms, rheology and tectonics: microstructures, mechanics and anisotropy, vol 360. Geological Society, Special Publications, London, pp 175–192. https://doi.org/10.1144/SP360.10. http://sp.lyellcollection.org/content/360/1/175.full
Matthies S (1979) Phys Stat Sol A 92:K135
McMahon C, Soe B, Loeb A, Vemulkar A, Ferry M, Bassman L (2013) Ultramicroscopy 133:16
Morawiec A (2004) Orientations and rotations. Springer, Berlin
Potts D, Prestin J, Vollrath A (2009) Numer Algorithms 52:355
Roe RJ (1965) J Appl Phys 36:2024
Sander B (1950) Einführung in die Gefügekunde der Geologischen Körper. Zweiter Teil: Die Korngefüge. Springer
Savyolova TI (1994) In: Bunge HJ (ed) Proceedings of the 10th international conference on textures of materials, materials science forum, vol 157–162, p 419
Schaeben H, Balzuweit K, León-García O, Rosière CA, Siemes H (2012) Z Geol Wiss 40:307
Schwartz AJ, Kumar M, Adams BL, Field DP (2009) Electron backscatter diffraction in materials science. Springer, New York
Shushakova V, Fuller ER Jr, Heidelbach F, Mainprice D, Siegesmund S (2013) Environ Earth Sci 69:1281. https://doi.org/10.1007/s12665-013-2406-z
Wassermann G, Grewen J (1962) Texturen metallischer Werkstoffe, 2nd edn. Springer, Berlin/Heidelberg
Cumulative Probability Plot Sravan Danda1, Aditya Challa1 and B. S. Daya Sagar2 1 Computer Science and Information Systems, APPCAIR, Birla Institute of Technology and Science Pilani, Zuari Nagar, Goa, India 2 Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India
Definition
A cumulative probability plot or a cumulative distribution function (see Rohatgi and Saleh 2015) is defined on real-valued random variables. For a univariate real-valued random variable, it is given by the function F_X : ℝ → [0, 1] such that
F_X(x) = P(X ≤ x) for each x ∈ ℝ. In the case of two or more random variables, i.e., a multivariate real-valued random variable, it is given by the function F_{X_1, …, X_n} : ℝⁿ → [0, 1] such that F_{X_1, …, X_n}(x_1, …, x_n) = P(X_1 ≤ x_1, …, X_n ≤ x_n).
Illustration
As the name suggests, for every real number, a cumulative probability plot accumulates the probability that the random variable takes any value less than or equal to that real number. For example, let X be a univariate random variable that takes the values 1 with probability 1/3, 2 with probability 1/6, and 3 with probability 1/2. The cumulative distribution function of X is illustrated by Fig. 1. In the case of multivariate distributions, a similar notion holds, i.e., the accumulation occurs in each random variable separately.

Relation to Probability Density and Probability Mass Functions
In the case of discrete univariate probability distributions, the cumulative distribution function can be written as a possibly infinite summation (as the support of the probability distribution, i.e., the number of values which the random variable attains with positive probability, can be countably infinite) of probabilities or the probability mass function (see Rohatgi and Saleh 2015) as follows:

F_X(x) = Σ_{x_i ≤ x} p(x_i)  (1)

where p(x_i) denotes the probability that X = x_i and the summation is over all x_i ≤ x such that p(x_i) > 0. Similarly, the cumulative distribution function can be written as an improper integral of the probability density function (see Rohatgi and Saleh 2015) in the case of continuous univariate probability distributions:

F_X(x) = ∫_{−∞}^{x} f_X(t) dt  (2)

where f_X(·) denotes the probability density function of X. It is important to note that there exist probability distributions that are neither discrete nor continuous. Such distributions possess neither a probability mass function nor a probability density function. For example, consider a real-valued random variable Y whose cumulative distribution function is given by:

F_Y(y) = e^y/2 if y < 0, and 1 otherwise  (3)

Y is neither a discrete random variable nor a continuous random variable. Fortunately, such distributions are rarely used in applications.
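To make Eq. (1) concrete, the following Python sketch (an illustration added here, not part of the original entry; the helper name cdf_discrete is our own) evaluates the cumulative distribution function of the discrete example from the Illustration section, i.e., X taking the values 1, 2, and 3 with probabilities 1/3, 1/6, and 1/2.

```python
import numpy as np

# Discrete example from the text: P(X=1)=1/3, P(X=2)=1/6, P(X=3)=1/2
values = np.array([1.0, 2.0, 3.0])
probs = np.array([1 / 3, 1 / 6, 1 / 2])

def cdf_discrete(x, values, probs):
    """Evaluate F_X(x) = sum of p(x_i) over all x_i <= x, as in Eq. (1)."""
    return probs[values <= x].sum()

for x in [-6, 0.5, 1, 1.5, 2, 2.5, 3, 10]:
    print(f"F_X({x}) = {cdf_discrete(x, values, probs):.4f}")
# The values jump from 0 to 1/3 at x = 1, to 1/2 at x = 2, and to 1 at x = 3,
# reproducing the right-continuous step function plotted in Fig. 1.
```

The difference cdf_discrete(b, …) − cdf_discrete(a, …) also equals P(a < X ≤ b), anticipating Eq. (4) below.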
Some Important Properties
In this section, some important properties of a cumulative distribution function are mentioned. These properties hold for the cumulative distribution function of every probability distribution, irrespective of whether the probability distribution is discrete, continuous, or neither.
1. For a univariate random variable, the probability that the random variable takes values between two real numbers can be expressed as a difference of the cumulative probability plot at the end points. Formally, if a ≤ b are real numbers, we have
P(a < X ≤ b) = F_X(b) − F_X(a)  (4)
In the case of multivariate distributions, a similar property holds. Let 1 ≤ i ≤ n; then
Cumulative Probability Plot, Fig. 1 The cumulative distribution function of the discrete distribution given in the text is plotted here. The function is discontinuous at three points, viz. at x = 1, x = 2, and x = 3. The corresponding y values of the cumulative distribution function at these points are identified by marking with a star
P(X_1 ≤ x_1, …, a < X_i ≤ b, …, X_n ≤ x_n) = F_X(x_1, …, b, …, x_n) − F_X(x_1, …, a, …, x_n)  (5)
2. For a univariate random variable, the cumulative distribution function is a nondecreasing function, i.e., if b ≥ a are real numbers,
F_X(b) ≥ F_X(a)  (6)
Similarly, for a multivariate distribution, the cumulative distribution function is nondecreasing in each of its variables, i.e.,
F_X(x_1, …, b, …, x_n) ≥ F_X(x_1, …, a, …, x_n)  (7)
3. The cumulative distribution function is right continuous in each of its variables, i.e., if 1 ≤ i ≤ n, then
lim_{x→a⁺} F_X(x_1, …, x, …, x_n) = F_X(x_1, …, a, …, x_n)  (8)
4. The cumulative distribution function is bounded by 0 and 1. More specifically, let 1 ≤ i ≤ n; then the following hold:
lim_{x_1, …, x_n → +∞} F_X(x_1, …, x_n) = 1,  (9)
lim_{x_i → −∞} F_X(x_1, …, x_n) = 0  (10)
Uses in Statistics
A cumulative distribution function can be used to simulate data from continuous probability distributions. Suppose one can simulate a continuous uniform distribution on the closed interval [0, 1] using a computer, i.e., u_1, …, u_k are realizations of U[0, 1] using a pseudorandom number generation process. Let X be a continuous random variable with a theoretically known distribution. Data from X can be generated using the inverse transformation F_X^{−1}(·) on u_1, …, u_k, where F_X(·) denotes the cumulative distribution function of X. In other words, F_X^{−1}(u_1), …, F_X^{−1}(u_k) are realizations of X. Often, in the absence of reasonable assumptions on the distribution of data, nonparametric tests based on empirical estimates of the cumulative distribution function are performed to test statistical hypotheses (see Lehmann and Casella 2006; Lehmann and Romano 2006). For example, the Kolmogorov-Smirnov test (see Wasserman 2006) is widely used in applications.
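A brief, hedged sketch of the inverse-transformation idea described above (not from the source; the exponential target distribution, the rate parameter, and the use of SciPy are our own choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Step 1: u_1, ..., u_k are realizations of the uniform distribution on [0, 1].
k = 10_000
u = rng.uniform(0.0, 1.0, size=k)

# Step 2: apply the inverse CDF of the target distribution.
# For an exponential distribution with rate lam, F_X(x) = 1 - exp(-lam * x),
# so F_X^{-1}(u) = -log(1 - u) / lam.
lam = 2.0
x = -np.log(1.0 - u) / lam

print("sample mean:", x.mean())          # should be close to 1/lam = 0.5

# A nonparametric check of the kind mentioned above: the Kolmogorov-Smirnov
# test compares the empirical CDF of the sample with the theoretical CDF.
print(stats.kstest(x, "expon", args=(0, 1 / lam)))
```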
Summary
In this chapter, a cumulative probability plot or a cumulative distribution function is defined for univariate and multivariate probability distributions. These definitions are then illustrated on a discrete probability distribution with finite support. The relation of the cumulative probability plot to the probability mass function and probability density is described. This is followed by mentioning some important properties of a cumulative probability plot. The chapter is then concluded by mentioning some of its uses in statistical analyses.

Cross-References
▶ Hypothesis Testing ▶ Probability Density Function ▶ Simulation
Bibliography Lehmann EL, Casella G (2006) Theory of point estimation. Springer Science & Business Media Lehmann EL, Romano JP (2006) Testing statistical hypotheses. Springer Science & Business Media Rohatgi VK, Saleh AME (2015) An introduction to probability and statistics. Wiley Wasserman L (2006) All of nonparametric statistics. Springer Science & Business Media
Dangermond, Jack Lowell Kent Smith Emeritus, University of Redlands, Redlands, CA, USA
Fig. 1 Jack Dangermond, courtesy of J. Dangermond
Biography
Jack Dangermond was born in 1945 and grew up in Redlands, California, where his parents owned a plant nursery at which he worked from an early age. His education included a B.S., Landscape Architecture, California Polytechnic College – Pomona, 1967; M.S., Urban Planning, Institute of Technology, University of Minnesota, 1968; and M.S., Landscape Architecture, Graduate School of Design, Harvard University, 1969. He studied at Harvard's Laboratory for Computer Graphics and Spatial Analysis (LCGSA), which influenced his founding in 1969, with his wife Laura, of Environmental Systems Research Institute (now Esri) and also influenced the later development of Esri's ARC/INFO software.
The Dangermonds' vision for Esri was to develop Geographic Information Systems (GIS) software, demonstrate its usefulness by applying it to real problems, and then make it freely available. When this model proved financially infeasible, Esri was incorporated as a privately held, profit-making business; but Esri continues to be the means to realize their vision. In the 1970s, Esri developed PIOS, the Polygon Information Overlay System, and GRID for raster data, and applied these in project-focused consulting. Esri next developed ARC/INFO (1982); since its release, Esri has focused primarily on GIS software development, leading to ArcGIS (1999 to the present). In 50 years, Esri has grown from a small research institute to a company with a staff of 4000 in 11 research centers and 49 offices around the world. Jack is a leading advocate for GIS, making thousands of presentations to diverse audiences all over the world. Supporting geoscience, Esri provides research grants and inexpensive software licenses to higher education, and in 2014 donated a billion dollars of software to US schools as part of the ConnectEd initiative. Esri Press publishes GIS-related technical and educational titles. Esri offers online courses and teaching materials. More than 7000 colleges and universities use and teach ArcGIS. Jack's professional service has included: Committee on Geography, US National Academy of Sciences; National Geographic Society Board of Trustees; National Geospatial Advisory Committee, NGAC; Earth System Science and Applications Advisory Committee, NASA; Science and Technology Advisory Committee, NASA; National Steering Committee, Task Force on National Digital Cartographic Standards; and Executive Board, National Center for Geographic Information and Analysis (NCGIA). Among the recognitions of his accomplishments are Officer in the Order of Orange Nassau, Netherlands, 2019; Audubon Medal, The National Audubon Society, 2015; Alexander Graham Bell Medal, National Geographic Society, 2010; Patron's Medal, Royal Geographical Society, 2010; Carl Mannerfelt
Medal, International Cartographic Association, 2008; Fellow of ITC, The Netherlands, 2004; Brock Gold Medal for Outstanding Achievements in the Evolution of Spatial Information Sciences, International Society for Photogrammetry and Remote Sensing, 2000; Cullum Geographical Medal of Distinction for the advancement of geographical science, American Geographical Society, 1999; and James R. Anderson Medal of Honor in Applied Geography, Association of American Geographers, 1998. Jack has received 13 honorary doctorates, including: University of Massachusetts, Boston, Massachusetts, 2013; University of Minnesota, Minneapolis, 2008; California Polytechnic University – Pomona, 2005; State University of New York – Buffalo, 2005; and City University of London, 2002.
Data Acquisition Rajendra Mohan Panda1 and B. S. Daya Sagar2 1 Geosystems Research Institute, Mississippi State University, Starkville, MS, USA 2 Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India
Definition Data acquisition is a process to collect and store information for production use. Technically, data acquisition is a process of sampling signals of real-world physical phenomena and their digital transformation.
Terminology Analog-to-digital converter (ADC) prepares sensor signals in the digital form. Data loggers are independent data acquirers. Digital-to-analog (D/A) converter produces analog output signals. Electromagnetic spectrum is the wavelengths or frequencies extending from gamma to radio waves. Resolution is the smallest incremental unit of a signal in the data acquisition system. Resolution is either expressed in bits or percent of full scale, where one part in 4096 resolution represents 0.0244% of full scale. Radiometers are the instruments used to measure the intensity of electromagnetic radiation in select bands. Sample rate is the speed at which a data acquisition system gathers data. Unit: sample/s Sensors or transducers convert physical phenomena to electric signals.
Spectrometers are devices designed to detect, measure, and analyze the spectral phenomena of reflected electromagnetic radiation. Signal-molding hardware shapes sensor signals to digital forms in an analog-to-digital converter. Signal conditioning or transmission is an optimization process that modifies sensor signals to a usable form.
Introduction
Data acquisition is a process to collect and store information for production use. Data are newly collected, legacy data, shared, and purchased data. These data include sensor-generated data, empirical data, and other sources. This data acquisition concept started around the early seventies, with the development of software by the International Business Machines (IBM) company solely dedicated to data collection. The computing abilities for data acquisition have been improved immensely with better storage capacities and faster processing. Data acquisition involves signal reception and digitization for storage and analysis using computers (Meet et al. 2018). The data acquisition (DAQ) system has three components (OMEGA): Sensor → Analog-to-digital conversion → Transmission
Sensors transduce physical values to electric signals, and the signal-molding hardware converts them to digital values. These digitally modified values undergo noise correction and are optimized for various applications. Several programming languages used for obtaining such signals are Assembly, BASIC, C, C++, Fortran, Java, LabVIEW, Lisp, and Pascal. There is also stand-alone DAQ hardware that can process electric signals without a computer, e.g., data loggers and oscilloscopes. The DAQ software processes data from the DAQ hardware into outputs for meaningful uses. Data acquisition devices are equipped with signal conditioning and an analog-to-digital converter. These devices can be wired or wireless and need a computer for data transmission. The wired data acquisition process can use single-ended or differential inputs. Interface buses like RS232 and RS485 are popular DAQ devices. RS232 is suitable for small businesses but has the limitation of supporting only one device for serial communication, with a transmission distance of 50 ft. Alternatively, RS485 has options for multiple device connectivity and transmission support for a distance of 5000 ft. General Purpose Interface Bus (GPIB) or IEEE 488 (as per the ANSI protocol) are standard interface buses. The Universal Serial Bus (USB), Ethernet, and PCI are other data acquisition devices, with USB being advantageous for its higher bandwidth and power supply capability. Data acquisition devices vary depending on the data collection speed of the analog-to-digital
converter. This speed is the number of channels sampled per second and expressed as the sample rate. DAQ cards or boards increase the sample rate by directly connecting to the bus. In addition, modular data acquisition systems, designed for a high channel count, integrate and synchronize multiple sensors with multiple applications to increase the speed of data collection. The PXI is a popular modular system with tremendous flexibility. It has options of channel count and is cost-effective. This modular system is easy to use and is capable of fast data acquisition. Some data acquisition tools for specific uses by the users are WINDAQ, ActiveX, DATAQ, Flukecal, MSSATRLABS, DAQami, TracerDAQ, DASYLAB, and LabVIEW.
Remote Sensing Data Acquisition
Principles
Remote sensing is a process of data acquisition on earth or other planetary systems. It involves data capture and interpretation of the electromagnetic spectrum without making physical contact with the object. It works on the principle of the law of conservation of energy, i.e., total energy is conserved over time (Feynman 1970):
Incident energy = Reflected energy + Absorbed energy + Transmitted energy
Types
Usually, remote sensing satellites follow three orbital paths: polar, nonpolar, and geostationary (Zhu et al. 2018). Polar satellites are placed at an inclination of about 90° to the equatorial plane to procure information about the entire globe and polar regions. These are sun-synchronous satellites appearing over the same location at the same time in every cycle. They move from south to north and north to south, which explains their ascending and descending nature. Nonpolar satellites are at lower orbits, i.e., below 2000 km from the earth's surface. These satellites provide partial coverage; for example, the Global Precipitation Measurement (GPM) mission satellite covers 65° N to 65° S latitude. Geostationary satellites are placed at about 36,000 km above the earth's surface and are very powerful in weather forecasting. These satellites follow the earth's rotation at the same speed the earth rotates. For the same reason, these satellites capture the same view of a particular area of the earth with each observation in a regular fashion.
Sensors
The chief component of remote sensing data acquisition is the sensors, which can be active or passive. Passive sensors are capable of detecting solar radiation that is reflected or emitted by the object or scene being observed (NASA) and have radiometers and spectrometers operating in the visible
(400–700 nm), infrared (700–3000 nm), thermal infrared (3000 nm–1 mm), and microwave (1 mm–300 cm) portions of the electromagnetic spectrum (Elert 1998). These sensors are useful in procuring information from land and sea surface, vegetation, cloud, aerosol, and other physical phenomena. Passive sensors are not efficient in penetrating dense clouds, which active sensors can fulfill. Active sensors do not depend on sunlight for emissivity. These sensors primarily operate in the microwave portion of the electromagnetic spectrum, having radio detection and ranging (radar) sensors, altimeters, and scatterometers. Active sensors measure the vertical profiles of aerosols, forest structure, precipitation and winds, sea surface topography, and snow cover. Synthetic aperture radar (SAR) is a well-known active sensor, which provides cloud-free data. Its sensors are capable of collecting data day and night (Woldai 2004).
Resolution
The remote sensing data include multispectral, hyperspectral, optical, infrared, thermal, and microwave imageries, and their quality depends on four resolution categories: radiometric, spatial, spectral, and temporal (NASA). The satellite resolution varies with the orbits and the type of sensors. The radiometric resolution provides the energy information of a pixel in bits. The higher the bit value, the higher the pixel information, and the higher the radiometric accuracy of the sensor (Liang and Wang 2019). For example, an 8-bit resolution has an extent of digital numbers from 0 to 255 against an 11-bit from 0 to 2047, and so on. Spatial resolution represents the pixel size of a digital image, which explains the minimum area coverage of the satellite sensor on the ground. For example, a satellite sensor at a 1-km resolution has a pixel size of 1 km × 1 km. The pixel size is a function of orbital altitude, radiation angle, sensor size, and focal length of the optical system. Swath width corresponds to the total field of view on the ground. Satellites have different spatial resolutions to accomplish specific purposes (Table 1). Based on pixel size, satellite sensors have four major spatial resolution categories: (a) coarse (>1 km), (b) medium (100 m–1 km), (c) high (5–100 m), and (d) very high (<5 m).

• Rosin-Rammler-Sperling-Bennett (RRSB) distribution with scale parameter x′ and shape parameter n:

Q3(x) = 1 − exp[−(x/x′)^n]  for x ≥ 0 and n > 0.  (14)
• Gaudin-Melloy (GM) distribution with scale parameter x_max and shape parameter n:

Q3(x) = 1 − (1 − x/x_max)^n  for 0 ≤ x ≤ x_max and n > 0.  (15)

Grain Size Analysis, Table 2 Types of quantity for grain size distributions

r  Definition
0  Number
1  Length
2  Area
3  Volume or mass

Grain Size Analysis, Fig. 1 Cumulative grain size distribution function Q3(x) plotted against linear and logarithmic x-scale. Since the size scale can be in arbitrary dimensions, no explicit length unit was used for these example plots

Grain Size Analysis, Fig. 2 Histograms and corresponding density functions obtained by kernel estimators in linear and logarithmic scale
These distributions are frequently used, but experience shows that often the fit at the tails is of low quality. To overcome such shortcomings, tailor-made model adjustments are used, e.g., the three-parameter RRSB distribution function, where an additional term x_min is used.
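As a hedged illustration (not part of the original entry), the following Python sketch evaluates the GM distribution of Eq. (15) and a two-parameter RRSB-type distribution, and fits the GM parameters to a small, invented set of cumulative sieve data; the function names and data values are our own assumptions, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.optimize import curve_fit

def q3_gm(x, x_max, n):
    """Gaudin-Melloy cumulative distribution, Eq. (15)."""
    x = np.asarray(x, dtype=float)
    return 1.0 - (1.0 - np.clip(x / x_max, 0.0, 1.0)) ** n

def q3_rrsb(x, x_prime, n):
    """Two-parameter RRSB (Rosin-Rammler) cumulative distribution."""
    x = np.asarray(x, dtype=float)
    return 1.0 - np.exp(-(x / x_prime) ** n)

# Invented sieve results: mesh size (mm) and cumulative mass fraction Q3 passing.
size_mm = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 8.0])
q3_obs = np.array([0.08, 0.20, 0.42, 0.68, 0.88, 0.99])

# Fit the GM parameters (x_max, n); x_max is constrained to exceed the largest sieve.
popt, _ = curve_fit(q3_gm, size_mm, q3_obs, p0=[10.0, 2.0],
                    bounds=([size_mm.max(), 0.1], [100.0, 20.0]))
print("fitted GM parameters: x_max = %.2f mm, n = %.2f" % tuple(popt))
# q3_rrsb can be fitted in exactly the same way to compare the tail behavior.
```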
Analysis Methods
Grain sizes are measured in different approaches and methods, depending on the fineness and accessibility of the grains. The following are brief outlines of such methods; for details we refer to Allen (1990), Bernhardt (1994), and Higgins (2006).
Sieve Analysis
Sieve analysis is a popular measurement approach for loose grains in the range of 5 μm–125 mm. During sieving the material is separated into fractions, the masses of which are determined and compiled as histograms. The analysis can be done consecutively with single sieves or in parallel with a set of sieves, by hand or by machine, with different setups, mostly depending on the material properties. The loose grains
are transported through the mesh openings by inertia forces, by gravity and/or flow forces, or by hand if very large grains are sieved.
Imaging Particle Analysis
This complex of analysis methods is based on images taken with cameras, microscopes, X-ray detectors, etc. The grains can be loose, like fractured rock, as well as embedded grains, like crystals in a mineral microstructure. While microstructures are analyzed in static or quasi-static setups, for example, under a microscope or in a tomograph, images of loose grains can also be taken in a dynamic situation, like sand in a flow (Fig. 3).

Grain Size Analysis, Fig. 3 Imaging particle analysis of a static microstructure and of moving loose grains. For grain size analysis, either direct measures of the planar projections can be used or spatial length measures are calculated with stereological tools

Based on the kind of images, it can further be distinguished between approaches for the analysis of projections, sections, or spatial images like those resulting from tomography. In theory, the size resolution of imaging particle analysis is only limited by the resolution of the imaging method. For deriving grain sizes of projections or sections of original three-dimensional objects, size parameters can be calculated directly from measured parameters like the area or perimeter of the outlines. However, those are fraught with uncertainties, since the spatial character of the measured objects is only partly taken into account. To overcome this, stereological methods for either sections or projections can be applied. In this context, stereology offers a comprehensive toolbox to calculate spatial characteristics based on specific assumptions on the grain shape and orientation. In contrast, it is possible to measure spatial characteristics directly on spatial images by means of tomography without further assumptions, which is a great benefit. However, tomography is difficult when objects with similar physical properties, like crystals of the same material, are touching each other. (Since tomography methods measure the interactions between penetrating waves and the matter of the objects, and X-ray absorption and refraction are
isotropic for all materials, it is hardly possible to distinguish adjoining objects if they have similar properties (Baker et al. 2012).) Furthermore, tomography needs rather complex measurement instruments and produces huge datasets, and the evaluation of tomography data needs special statistical methods (Ohser and Schladitz 2006). As an example of the application of stereological methods, grain measurement for the quartz grains in Fig. 4 is discussed. For the analysis of such planar sections there exist some solutions in the mineral-related literature. One, see Higgins (2000, 2006), assumes that the grains are parallelepipeds or ellipsoids, whereas another, see Popov et al. (2014), assumes ellipsoids. In the planar section the areas of the section profiles are measured. By means of approximate stereological methods the grain shape is then characterized according to the assumed geometry and some linear size characteristic, which is the diameter of the volume-equivalent sphere for Popov and the maximum length or major axis of the equivalent ellipsoid in the Higgins approach. The Popov method is parametric and uses the log-normal distribution. Figure 4 shows the solutions for both approaches for the quartz grains.

Grain Size Analysis, Fig. 4 Thin section (35 mm × 31 mm) of a granite and corresponding empirical lognormal distributions for grain size of quartz

Light Scattering Methods
If a loose grain is irradiated with light, several interactions like light scattering occur. Since the intensity of the scattered light is a function of grain size, this principle can be used for grain size measurement. In this context, there exist two main measuring principles. The first is direct measurement of scattered light, which allows measurement of loose grains in the narrow size range between 0.1 μm and 50 μm. The second is measurement of the extinction caused by a loose grain in a light beam. This method of measurement has a lower limit of approximately 1 μm but is theoretically capable of measuring grains up to the size of centimeters. For both measuring principles, the grains have to be dispersed into a gas or a fluid. For measuring grain size with light scattering, a variety of solutions with different requirements and specifications are available. To measure the size of solid grains, solutions with absorbance measurement are commonly used. There, the material is dispersed in a liquid or a gas and transported through a measurement cell where the grains are measured automatically.

Sedimentation Analysis
Since the settling behavior of loose grains in a fluid largely depends on their size, the sedimentation principle provides a basis for grain size analysis. Both gravitation and centrifugal force can be used. Generally, two main approaches are distinguished. For the incremental method, the rate of change of density or concentration is used, while the cumulative approach uses the settling rate of loose grains. While measurements based on the first method are fast, an advantage of cumulative measurement is that it works also with small amounts of material, which reduces the error due to interactions between the settling grains. Sedimentation analysis allows various measurement setups of different complexity (Allen 1990; Bernhardt 1994). It is mainly used for the analysis of fine grains in the lower micron range. If sedimentation in a laminar flow field is used, the Stokes diameter dSt serves as reference size:
d_St = √[ 18 η w_s / ((ρ_g − ρ_f) g) ]  (16)

where η is the dynamic viscosity of the fluid, ρ_g the grain density, ρ_f the fluid density, g the acceleration due to gravity or the centrifugal acceleration, and w_s the settling velocity of the loose grain. This implies that the grain Reynolds number calculated for the Stokes diameter is below the value of 0.25.
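A minimal sketch (our own illustration, with invented fluid and grain values) of how Eq. (16) converts an observed settling velocity into a Stokes diameter, followed by the Reynolds-number check mentioned in the text:

```python
import math

def stokes_diameter(eta, rho_g, rho_f, g, w_s):
    """Stokes diameter of Eq. (16): d_St = sqrt(18*eta*w_s / ((rho_g - rho_f)*g))."""
    return math.sqrt(18.0 * eta * w_s / ((rho_g - rho_f) * g))

# Illustrative values (assumed, not from the source): a quartz grain settling in water.
eta = 1.0e-3     # dynamic viscosity of water, Pa s
rho_g = 2650.0   # grain density, kg/m^3
rho_f = 1000.0   # fluid density, kg/m^3
g = 9.81         # acceleration due to gravity, m/s^2
w_s = 1.0e-4     # measured settling velocity, m/s

d_st = stokes_diameter(eta, rho_g, rho_f, g, w_s)
reynolds = rho_f * w_s * d_st / eta   # grain Reynolds number
print(f"d_St = {d_st * 1e6:.1f} um, Re = {reynolds:.4f}")
# Stokes' law (laminar settling) requires the grain Reynolds number to stay below 0.25.
```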
Particle imaging and light scattering methods benefit from advances in computer technology and optoelectronics, whereas sedimentation analysis has lost some of its importance. However, it is still considered one of the most important methods of grain size analysis.

Summary
The size of grains is measured by various methods, which are adapted to the nature of the material. There is a tendency to use more and more computer techniques. Grain size is defined in different ways, depending on the material and the method of measurement. The data obtained are analyzed statistically, using various special distribution functions. The statistics are difficult for the tails of distributions, for very small or very large grains. For embedded grains stereological methods are used.
Cross-References ▶ Cumulative Probability Plot ▶ Pore Structure ▶ Stereology
Bibliography
Allen T (1990) Particle size measurement. Powder technology. Springer Netherlands, Dordrecht Baker DR, Mancini L, Polacci M, Higgins MD, Gualda G, Hill RJ, Rivers ML (2012) An introduction to the application of X-ray microtomography to the three-dimensional study of igneous rocks. Lithos 148:262–276. https://doi.org/10.1016/j.lithos.2012.06.008 Bernhardt C (1994) Particle size analysis: classification and sedimentation methods. Springer Netherlands, Dordrecht Higgins MD (2000) Measurement of crystal size distributions. Am Mineral 85:1105–1116. https://doi.org/10.1515/am-2000-8-901
Higgins MD (2006) Quantitative textural measurements in igneous and metamorphic petrology. Cambridge University Press, Cambridge Ohser J, Schladitz K (2006) 3D images of materials structures. Wiley, Weinheim Popov O, Lieberwirth H, Folgner T (2014) Properties and system parameters – quantitative characterization of rock to predict the influence of the rock on relevant product properties and system parameters – part 1: application of quantitative microstructural analysis. AT Miner Process 55:76–88 Randolph AD, Larson MA (1988) Theory of particulate processes: analysis and techniques of continuous crystallization, 2nd edn. Academic, San Diego Rumpf H, Scarlett B (1990) Particle technology. Springer Netherlands, Dordrecht
Graph Mathematical Morphology Rahisha Thottolil and Uttam Kumar Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India
Definition
Mathematical morphology provides a set of filtering and segmenting tools for image analysis. Here, an image space is considered as a graph where the connectivity of the pixels/grids is taken into account. A simple graph is a collection of vertices and edges, denoted as G = (X•, X×), where X• is a non-empty set of vertices and X× (representing the adjacency relation among the vertices) is composed of pairs of distinct vertices from X•, called edges. In other words, each element of X• is called a vertex or a node point of the graph G, and each element of X× is called an edge of the graph G. A simple graph is an undirected graph where each edge connects two different vertices and no two edges connect the same pair of vertices. Graph mathematical morphology (GMM) is a systematic theory that has been developed on these graph spaces (G). GMM extracts structural information from simple and small predefined structuring graphs by constructing morphological operators such as openings and closings. However, morphological transformations by GMM are restricted by their graph structure, i.e., the subsets of vertices, the subsets of edges, and the subgraphs of graphs.
Introduction The term morphology is widely used in the field of biology that deals with the shape and structure of plants and animals. However, in this chapter, we outline the important concept of mathematical morphology (MM) in the context of graphs that
has wide applications in digital image processing. MM was introduced in the mid-1960s by two researchers, Georges Matheron and Jean Serra (Youkana 2017), for investigating the structures present in crystals and alloys. The theoretical basis of a morphological operator was derived from set theory and integral geometry (Heijmans and Ronse 1990). Initially, MM was applied on binary, gray-scale, and color images. Luc Vincent (1988) extended the application of MM from image space to graph space. Later, MM operators were applied on multivalued data such as in remote sensing and geoscience applications, medical image analysis, and astronomy to extract spatial features. This chapter aims to present a brief introduction of classical morphology operators on two-dimensional binary and gray-scale images and to discuss graph-based mathematical morphology (GMM). In the following section, important terminologies used in morphological methods are defined. A 2D binary image is assumed to be represented in the form of a point set (vector in 2D space), where each pixel is an image point with value 0 or 1, whereas in the case of a gray-scale image, the value of an image point is an ordered set of gray levels (e.g., an 8-bit image value ranges from 0 to 255). Figure 1 (a) and (b) shows examples of discrete binary and gray-scale images, where each image pixel (point) is represented as a grid. In practice, the points belong to an area of interest (building footprint illustrated in white pixels) in Fig. 1a and are represented as a point set X. It means X = {(x0, y0), (x0, y1), ..., (xi, yi)}, and all the points (xi, yi) belong to the location where the area of interest/object is present, which is a simple morphological description of an image. Similarly, the background pixels (black) can also be represented in the form of a point set, which is the complement of X (Xc). In the case of gray-scale, it is a multidimensional (n) image and is represented as a set of points X ⊆ Eⁿ, where Eⁿ is n-dimensional Euclidean space. However, Fig. 1b illustrates the concept of Eⁿ on a 2D gray-scale image, where each grid denotes a single pixel; then (n − 1) coordinates represent the spatial domain, and the nth coordinate represents the value of the function. A morphological operation on an image involves a relation of the point set X with another small structural point set (B). Generally, B is a small subset of a binary image (called a structuring element, SE(B), defining the neighborhood of the points) that is used as a probe with well-defined shape and size to extract specific information (Heijmans et al. 1992). Here, one of the pixels represents the origin (usually the center pixel), and the size of the SE is based on the dimension of the matrix while its shape depends on the position of ones or zeros (Youkana 2017). It is to be noted that the selection of the SE is done by considering the image geometry and prior knowledge of the spatial properties of an object. Usually, flat SE are used in
case of 2D image analysis, and non-flat SE are used for higher-dimensional images. Once the image and SE(B) are represented in the form of sets, MM operators on images are basic set operations (usually nonlinear) on the point sets. In practice, SE(B) moves systematically over the input image (X), and the performance of the operation depends on the type of SE and the translation operations applied. To learn more about the theoretical aspects of classical morphology, we need to define the notion of a complete lattice, which is the fundamental source of MM operations.

Graph Mathematical Morphology, Fig. 1 Example of (a) a binary image and (b) a grayscale image
Fundamental Operators: Dilation and Erosion
The fundamental morphological operators are dilation (δ) and erosion (ε) (Heijmans and Ronse 1990). The dilation operation is implemented in terms of the union (the Minkowski addition) of two sets. We denote the dilation of a set X by an SE(B) as δ(X), as shown in Eq. (1), in the 2D space ℤ².
dB ðFÞ ¼ max b B ½Fðx þ bÞ
ð3Þ
eB ðFÞ ¼ min b B_ ½Fðx þ bÞ
ð4Þ
In case of dilation, the algorithm finds the maximum value of the original image within the scope of the SE, whereas erosion consists of minimum value of F. Erosion and dilation operations are considered as adjunction pairs of basic morphological operators (δ, ε). Most of the morphological algorithms are derived from these two basic elementary operations.
ð1Þ
Morphological Filtering: Opening and Closing
The erosion operation of a set X by SE(B) - ε(X) is defined in terms of intersection (the Minkowski subtraction) operation of two sets as given by Eq. (2), where Xb denotes the translation of X by b. eðXÞ ¼ XYB ¼ p ℤ2 =ðp þ bÞ X, b B ¼ \b B Xb
image. Dilation of an image is always commutative with SE(B) and always associative with different SE. To compensate for the distraction of object shape in image processing, we use the combination of these basic MM tools with the same SE(B). The implementation of dilation and erosion on gray-scale image (F) consists of finding the maximum and minimum of original image (F) respectively, as given by Eqs. (3) and (4).
ð2Þ
In general, after applying dilation operation on an image, the regions of an object area expand and noisy pixels are removed from the object region. In contrast, erosion is an inverse operation of dilation where the boundary of an object gets eroded. Interestingly, both the operators are dual by complementation and can be applied at various levels such as during pre-processing, analysis, and post-processing of an
Two main filtering operators derived from the composition of dilation and erosion are opening (X ∘ B) and closing (X • B). Thus, the opening of X by B is an anti-extensive (γ(x) ≤ x) filtering function. Opening and closing filters on a binary image are expressed as follows:

γ_B(X) = X ∘ B = (X ⊖ B) ⊕ B  (5)
φ_B(X) = X • B = (X ⊕ B) ⊖ B  (6)
The opening operation on an image (X) by SE(B) is defined as the erosion of X by B followed by a dilation of the results by the same SE. Likewise, the closing operation on an image X is the dilation of X by B followed by an erosion of the result by
B, which is extensive (x ≤ φ(x)). Both filters are complementary to each other. Opening operations are useful for smoothing the contour of the objects present in an image and for removing long narrow gaps. Further, closing fuses narrow gaps and removes small patches by filling contour breaks. By applying a closing filter, we can remove the internal noise present in a region, and by using an opening filter, the external noise is removed from the background region of an image. Composing closing and opening operations with increasing size of the SE is called an alternate sequential filter (ASF). These MM algorithms have practical use cases in image processing such as extracting the boundary of an object, region filling, extracting connected components, retrieving convex hull information from high-level images, thinning and thickening operations, and enhancing the object structure by skeletonization. During pre-processing, applications such as filtering of salt-and-pepper noise, shape simplification (i.e., breaking a complex structured object into a number of simple structures), extracting image components (e.g., segmentation of an image by using object shape), quantification of geometric features (area, perimeter, and axis length) of an object present in an image, and object masking are possible. Efficient filters can be obtained for image processing by various compositions and iterations of the basic morphological operators.
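As a hedged sketch (not part of the original chapter, and assuming SciPy is available), the following Python code applies the binary operators of Eqs. (1), (2), (5), and (6) to a small synthetic image and checks the noise-removal behavior described above; the image and structuring element are invented for illustration.

```python
import numpy as np
from scipy import ndimage

# A small binary image: a 5x5 square object with one internal hole and one noisy pixel.
X = np.zeros((9, 9), dtype=bool)
X[2:7, 2:7] = True
X[4, 4] = False          # internal "pepper" noise (a hole)
X[0, 8] = True           # external "salt" noise (an isolated pixel)

B = np.ones((3, 3), dtype=bool)   # 3x3 flat structuring element

dilation = ndimage.binary_dilation(X, structure=B)   # Eq. (1)
erosion = ndimage.binary_erosion(X, structure=B)     # Eq. (2)
opening = ndimage.binary_opening(X, structure=B)     # Eq. (5): erosion then dilation
closing = ndimage.binary_closing(X, structure=B)     # Eq. (6): dilation then erosion

print("isolated salt pixel removed by opening:", not opening[0, 8])
print("internal hole filled by closing:", closing[4, 4])
```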
Graph Mathematical Morphology
difference between the adjacent pixels and it is called discrete gradient. These edge-weighted graph models are useful for image segmentation and filtering. The objects that appear in an image have structural information, therefore, MM operators can be applied on more complex data structures like graphs and hypergraphs. For example, in the field of transportation, road networks are represented as a planar graph. In such cases, the intersection of roads and endpoints is represented as vertices and road segments correspond to edges. The distance between the vertices is some kind of a measure and the graph representation of the road network describes its spatial pattern. The basic concept of a graph in the form of a road network is shown in Fig. 2 (a) and (b) where vertices are represented by red dots and the edges by yellow line segments. In GMM operators, SE are extracted from the graphs using predefined probes (structuring graphs) (Heijmans et al. 1992). Similar to the SE, structuring graphs are also small as compared to the input graph. Cousty (2018) proposed a large collection of morphological operators acting on graphs such as dilation, erosion, opening and closing, granulometries, and alternate sequential filters. Here, the input operands and the result of the operators were both considered as graphs (Cousty et al. 2013).
MM Operators on Graphs: Dilation and Erosion Basic Concept of Graphs In this section, we describe how to extend the classical morphological operation from image to graph spaces. Generally, the image space is considered as a graph whose vertex set is generated from pixels, and the edges represent the local pixel adjacency relation or Euclidian distance between the pixels. In a 4-pixel adjacency graph, the weight of an edge is the distance between the neighboring pixels, and to model color images, the edge weights represent the similarity or dissimilarity between the neighboring pixels (Danda et al. 2021). Edge weight is computed by taking the absolute Euclidian Graph Mathematical Morphology, Fig. 2 (a) Undirected road network represented as a simple graph, (b) road network overlaid on Google Earth image
Here, operands are considered as a graph and the basic morphological operators are defined. Complex structures and relations with neighboring structures present in an image are represented by simple (unweighted) graphs. A dilation operator on a graph with the set of vertices X• is illustrated in Fig. 3a, b (Cousty 2018; Youkana 2017). The dilation operator from the set of vertices expands to all the neighboring vertices which are connected through edges. The dilation operator on a graph with subgraphs, as shown in Fig. 3 (c) and (d), expands the set of elements with the adjacent and connected vertices and edges.
Graph Mathematical Morphology, Fig. 3 Illustration of dilation operation on set of vertices (a–b) and subgraphs (c–d) (Cousty 2018)
The graph space is G = (G•, G×); X• denotes a subset of G•, X× a subset of G×, and g = (g•, g×) a subgraph of G from the complete lattice. Four elementary building blocks on graphs are used to derive a set of edges from a set of vertices and a set of vertices from a set of edges (Cousty 2018). The dilation and erosion operators of vertices (δ• and ε•) map from g× to g•, and the operators of edges (δ× and ε×) map from g• to g×, as illustrated in Fig. 3.

δ•(X×) = {x ∈ G• | ∃{x, y} ∈ X×}, for any X× ⊆ G×  (7)
ε•(X×) = {x ∈ G• | ∀{x, y} ∈ G×, {x, y} ∈ X×}, for any X× ⊆ G×  (8)

In Eq. 7, the dilation operator δ• maps any set of edges X× to the set of all vertices that belong to an edge of X×. In Eq. 8, the erosion operator ε• maps any set of edges X× to the set of vertices whose incident edges are completely contained in X×. Resultant graphs are shown in Fig. 4a, c. The erosion operator ε× maps any set of vertices X• to the set of all edges whose two extremities are in X• (Eq. 9). The dilation operator δ× maps any set of vertices X• to the set of all edges that have at least one extremity in X• (Eq. 10). Resultant graphs are shown in Fig. 5a, c.

ε×(X•) = {{x, y} ∈ G× | x ∈ X• and y ∈ X•}, for any X• ⊆ G•  (9)
δ×(X•) = {{x, y} ∈ G× | x ∈ X• or y ∈ X•}, for any X• ⊆ G•  (10)
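A minimal Python sketch of the four building blocks of Eqs. (7)-(10), added here for illustration (the small graph, the frozenset edge representation, and the helper names are our own assumptions, not from the source):

```python
# A graph is given by a vertex set V and an edge set E of frozensets {x, y}.
V = {0, 1, 2, 3, 4}
E = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 4)]}

def vertex_dilation(edge_subset):
    """delta_vertex: all vertices belonging to some edge of the subset (Eq. 7)."""
    return {x for e in edge_subset for x in e}

def vertex_erosion(edge_subset):
    """epsilon_vertex: vertices whose incident edges all lie in the subset (Eq. 8)."""
    return {x for x in V
            if all(e in edge_subset for e in E if x in e)}

def edge_erosion(vertex_subset):
    """epsilon_edge: edges with both extremities in the vertex subset (Eq. 9)."""
    return {e for e in E if all(x in vertex_subset for x in e)}

def edge_dilation(vertex_subset):
    """delta_edge: edges with at least one extremity in the vertex subset (Eq. 10)."""
    return {e for e in E if any(x in vertex_subset for x in e)}

X_edges = {frozenset((1, 2))}
print(vertex_dilation(X_edges))   # {1, 2}
print(edge_dilation({2}))         # the edges {1, 2} and {2, 3}
print(edge_erosion({1, 2, 3}))    # the edges {1, 2} and {2, 3}
```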
Graph Mathematical Morphology, Fig. 4 Operators acting on the lattice from g× to g• (Cousty 2018)
Graph Mathematical Morphology, Fig. 5 Operators acting on the lattice from g• to g× (Cousty 2018)
The dilation operator commutes with the union operation, and the erosion operator commutes with the intersection operation. MM operators on binary unweighted graphs commute with set operations as follows:

δ•(X× ∪ Y×) = δ•(X×) ∪ δ•(Y×), for any X×, Y× ⊆ G×  (11)
ε•(X× ∩ Y×) = ε•(X×) ∩ ε•(Y×), for any X×, Y× ⊆ G×  (12)
δ×(X• ∪ Y•) = δ×(X•) ∪ δ×(Y•), for any X•, Y• ⊆ G•  (13)
ε×(X• ∩ Y•) = ε×(X•) ∩ ε×(Y•), for any X•, Y• ⊆ G•  (14)
where δ× and δ• are called vertex-edge dilation and edge-vertex dilation, and ε× and ε• are called vertex-edge erosion and edge-vertex erosion, respectively. The elementary vertex dilation (δ = δ• ∘ δ×) and edge dilation (Δ = δ× ∘ δ•) are obtained by composition of the vertex-edge and edge-vertex dilations. Likewise, the vertex erosion (ε = ε• ∘ ε×) and edge erosion (E = ε× ∘ ε•) are obtained by composition of the vertex-edge and edge-vertex erosions. In the case of morphological analysis on
gray-scale images, intensity values are also considered; therefore, MM operators have been proposed on weighted graphs. The major applications of gray-scale morphology are contrast enhancement, texture extraction, edge detection, and thresholding. The definitions involved in gray-scale graph morphological operations are shown below. Let F• ∈ I(X•) and F× ∈ I(X×); then the MM operators are

δ•(F×)(x) = max{F×({x, y}) | {x, y} ∈ X×}, ∀x ∈ X•  (15)
ε×(F•)({x, y}) = min{F•(x), F•(y)}, ∀{x, y} ∈ X×  (16)
ε•(F×)(x) = min{F×({x, y}) | {x, y} ∈ X×}, ∀x ∈ X•  (17)
δ×(F•)({x, y}) = max{F•(x), F•(y)}, ∀{x, y} ∈ X×  (18)

Gray-scale image dilation involves assigning the maximum value over the neighborhood of the SE. In contrast, erosion involves assigning the minimum value over the neighborhood of the SE (Cousty et al. 2013).
Morphological Filters on Graphs Graph-based morphological filters such as opening and closing were obtained from the composition and iteration of dilation and erosion either by applying on set of vertices and
set of edges or subgraphs of the graph G. The number of iterations (also called the parameter size, denoted as 'l') is important in opening and closing filters. The value of l depends on the size of the features to be preserved or removed (Youkana 2017). Filtering operators depend on the order of the basic MM operators, and their complexity corresponds to the value of l and the size of the graphs. We denote opening and closing on vertices by γ (γ = δ ∘ ε) and φ (φ = ε ∘ δ), respectively. Similarly, opening and closing on edges are denoted by Γ (Γ = Δ ∘ E) and Φ (Φ = E ∘ Δ), respectively. Combining these operators gives the opening ([γ, Γ]) (Eq. 19) and closing ([φ, Φ]) (Eq. 20) on a graph G.
ð19Þ
½f, F ¼ ½fðX • Þ, FðX Þ
ð20Þ
We can deduce the operators on the subgraph of graph called half-opening and half-closing (Najman and Cousty 2014). The series of these morphological filters are called granulometries. Furthermore, an alternate sequence of filters can be derived from intermixing of opening and closing with the increasing size of structuring graph. It has been found that GMM operators are more efficient than image-based operators (Youkana 2017). However, a major drawback of graphbased image analysis is the images with a large number of pixels that have very high asymptotic complexity (Danda et al. 2021).
Application of MM and GMM Graph-based MM operations have been used in applications involving quantitative description of spatial relationship between objects in images. Heijmans and Vincent (1992) studied the heterogeneous media at a macroscopic level based on the information on their microstructure by GMM. Furthermore, GMM filtering on preprocessing of medical images with higher accuracy among existing operations were demonstrated by Cousty (2018). The usage of GMM based ASF has proved to render minimum mean square error for noise removal (Cousty et al. 2013; Youkana 2017). GMM models have also been applied successfully for various image segmentations and filtering applications including astronomical images (Danda et al. 2021) and for segmentation of 3D images from multimodal acquisition devices (Grossiord et al. 2019). However, application of GMM in geospatial data analysis is a field of ongoing research. Hence, there is a tremendous scope of applied GMM in remote sensing data analysis.
Summary The concept of morphological techniques discussed in this chapter constitutes a significant tool in the image-based geoscience data analysis. The main purpose of mathematical morphology operation is the quantitative description of spatial features of the objects, specifically, extracting meaningful information from noisy images. Dilation and erosion are the fundamental morphological operators, and all other morphological algorithms are derived from these primitive functions. In the first part, MM operators on binary and gray-scale image space were discussed. In the second part, we focused on the framework of MM operators on graph space. Finally, we summarized a few studies in the field of GMM.
Cross-References ▶ Binary Mathematical Morphology ▶ Grayscale Mathematical Morphology ▶ Mathematical Geosciences ▶ Mathematical Morphology
Bibliography
Cousty J (2018) Segmentation, hierarchy, mathematical morphology filtering, and application to image analysis. Doctoral dissertation, Université Paris-Est
Cousty J, Najman L, Dias F, Serra J (2013) Morphological filtering on graphs. Comput Vision Image Unders 117(4):370–385
Danda S, Challa A, Sagar BSD, Najman L (2021) A tutorial on applications of power watershed optimization to image processing. Eur Phys J Spec Topics 230:1–25
Grossiord E, Naegel B, Talbot H, Najman L, Passat N (2019) Shape-based analysis on component-graphs for multivalued image processing. Math Morphol Theory Appl 3(1):45–70
Heijmans HJ, Ronse C (1990) The algebraic basis of mathematical morphology I. Dilations and erosions. Comput Vision Graphics Image Proc 50(3):245–295
Heijmans HJAM, Nacken P, Toet A, Vincent L (1992) Graph morphology. J Visual Commun Image Represent 3(1):24–38
Heijmans HJ, Vincent L (1992) In: Dougherty E (ed) Mathematical morphology in image processing. Marcel Dekker, New York, pp 171–203
Najman L, Cousty J (2014) A graph-based mathematical morphology reader. Pattern Recog Lett 47:3–17
Vincent L (1988) Mathematical morphology on graphs. Proceedings SPIE 1001, Visual Communications and Image Processing '88: third in a series. https://doi.org/10.1117/12.968942
Youkana I (2017) Parallelization of morphological operators based on graphs. Doctoral dissertation, Université Mohamed Khider, Biskra
Grayscale Mathematical Morphology
Sravan Danda1, Aditya Challa1 and B. S. Daya Sagar2 1 Computer Science and Information Systems, APPCAIR, Birla Institute of Technology and Science, Pilani, Goa, India 2 Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India

Definition
Gray-scale morphology encompasses all operators from mathematical morphology (MM) on gray-scale images. MM operators are defined on complete lattices. Gray-scale images constitute a specific lattice. This allows the definition of the MM operators to be simplified. In this entry we shall give an overview of different gray-scale morphological operators. Detailed explanation of these operators can be found in their respective chapters and the books Najman and Talbot (2013), Soille (2004), and Dougherty and Lotufo (2003). A gray-scale image is represented as a function f : E → {0, 1, 2, …, 255}, where E is either ℝ² or ℤ². If the domain is ℤ², it is referred to as a discrete image.

Two Kinds of Structuring Element
Recall that the structuring element in the gray-scale case is another gray-scale image with restricted domain. The structuring element is also referred to as a structuring function in a few places. There are two kinds of structuring elements based on the values the gray-scale image can take – a flat structuring element takes only the value 0, while a non-flat structuring element can take any positive values.

Basic MM Operators on Gray-Scale Images
The following are the basic operators from mathematical morphology adapted to gray-scale images.
1. Dilation: Given a gray-scale image f and a structuring element g, the dilation is defined as
δ_g(f)(x) = (f ⊕ g)(x) = sup_{y∈E} {f(y) + g(x − y)}  (1)
2. Erosion: Given a gray-scale image f and a structuring element g, the erosion is defined as
ε_g(f)(x) = (f ⊖ g)(x) = inf_{y∈E} {f(y) − g(x − y)}  (2)
3. Opening: Given a gray-scale image f and a structuring element g, the opening is defined as
γ_g(f) = δ_g ∘ ε_g(f)  (3)
4. Closing: Given a gray-scale image f and a structuring element g, the closing is defined as
φ_g(f) = ε_g ∘ δ_g(f)  (4)

Filtering on Gray-Scale Images
A morphological filter on a gray-scale image is any operator that is increasing and idempotent and can be obtained using the dilation/erosion operators above. The opening/closing operators above form the basic filters. Several families of filters can be constructed from these operators.
1. Granulometries: The idea of a granulometry is similar to filtering out particles based on their sizes. Using the opening operator we can define it as:
Granulometry = γ_{ng} ∘ γ_{(n−1)g} ∘ ⋯ ∘ γ_{2g} ∘ γ_g  (5)
Not all structuring elements g are allowed. Usually flat structuring elements with the domain constrained to be a disk or line segment are considered.
2. Alternating Sequential Filters: For gray-scale images where noise is spread across different scales and contains both bright and dark distortions, using alternate sequences of opening and closing would help:
ASF = φ_{ng} ∘ γ_{ng} ∘ φ_{(n−1)g} ∘ γ_{(n−1)g} ∘ ⋯ ∘ φ_g ∘ γ_g  (6)
One can also start with closing instead of opening as in the above definition. On the other hand, one can define an algebraic filter, which is any operator that is increasing and idempotent. Examples of algebraic filters include Area Opening, Attribute Openings, Annular Opening, Convex Hull Closing, etc.

Other Operators on Gray-Scale Images
A host of operators can be defined on gray-scale images. A few are stated below.
1. Geodesic Dilation/Erosion: The geodesic operations force the output to belong to a mask f′. A single step of geodesic dilation is defined as
Geo Dilate^(1)(f) = δ_g(f) ∧ f′  (7)
One can suitably define geodesic erosion as well. These operators can be repeated to obtain higher order dilations or erosions.
2. Thinning/Thickening: One can define the notions of thinning/thickening by considering two structuring elements, respectively, for foreground/background pixels. See Soille (2004) for details.
3. Skeletonization involves reducing the binary image to a one-dimensional caricature to extract the most discerning features in the image.
4. Segmentation of gray-scale images involves partitioning the domain such that pixels within a component (of the partition) belong to the same object. Several approaches to achieve this have been proposed and the most commonly used one is the Watershed Transform.
One can suitably define geodesic erosion as well. These operators can be repeated to obtain higher order dilations or erosions. 2. Thinning/Thickening: One can define the notions of thinning/thickening by considering two structuring elements, respectively, for foreground/background pixels. See Soille (2004) for details. 3. Skeletonization involves reducing the binary image to a one-dimensional caricature to extract the most discerning features in the image. 4. Segmentation of gray-scale images involves partitioning the domain such that pixels within a component (of partition) belong to the same object. Several approaches to achieve this are proposed and the most commonly used one is that of Watershed Transform.
Summary
In this entry, elementary MM operators on gray-scale images, namely, gray-scale dilation, gray-scale erosion, gray-scale opening, and gray-scale closing, are defined. Some of the popularly used operators that use these four elementary operations are then briefly described. These are granulometries, alternating sequential filters, geodesic dilation/erosion, thinning/thickening, skeletonization, and the watershed transform.

Cross-References
▶ Mathematical Morphology

Bibliography
Dougherty ER, Lotufo RA (2003) Hands-on morphological image processing, vol 59. SPIE Press, Bellingham
Najman L, Talbot H (eds) (2013) Mathematical morphology: from theory to applications. Wiley. https://doi.org/10.1002/9781118600788
Soille P (2004) Morphological image analysis. Springer, Berlin/Heidelberg. https://doi.org/10.1007/978-3-662-05088-0

Griffiths, John Cedric
Donald A. Singer (retired)
U.S. Geological Survey, Cupertino, CA, USA

Fig. 1 John Cedric Griffiths

Biography
Professor John C. Griffiths was one of the founders of the International Association for Mathematical Geology. His impact on geology and other wide-ranging fields began in Llanelli, County Dyfed, Wales, in 1912. Griffiths earned B.Sc. (1933) and Ph.D. (1937) degrees in petrology from the University of Wales, a Diploma of the Imperial College in petrology from the Royal College of Science, London, and a Ph.D. degree in petrology from the University of London in 1940. His recognition of the importance of practical experience in teaching how to solve problems came partially from his 7 years (1940–1947) working for Trinidad Leaseholds Ltd. in the British West Indies. He directed more than 50 students during his following 40 years teaching in the Department of Mineralogy and Petrology of the Pennsylvania State
University. Griffiths was recognized as a leader in applying statistics to the analysis of sediments. Between 1948 and 1962 he published many papers on improving analysis of sediments. This work culminated in his 1967 book (Griffiths 1967), Scientific Method in Analysis of Sediments. From this book one can see his increasing interest in modern scientific methods for problem solving such as decision theory, operations research, systems analysis, and search theory. In 1970 with Ondrick (Griffiths and Ondrick 1970), he showed that randomly (uniformly distributed) points on a square surface could be fit with a Poisson distribution when sampled. If part of the square containing the same points was masked, only a negative binomial distribution could be fit to the points demonstrating a clustering of points. The points had not changed, only the ability to see some of them. This is not different from our perception of the clustering of mineral deposits – deposits known are mostly based on exposed rocks and exclude areas masked by younger cover. Insights like this led to his controversial proposal to grid-drill the United States to obtain unbiased data for national planning. His teaching ability was so remarkable that the International Association for Mathematical Geosciences gives an annual award in his name for outstanding teaching in the application of mathematics to the geosciences. Students who expected to learn everything Griffiths knew soon recognized that this was impossible because he was continuing to learn and share new aspects of science every day. His students
learned applied statistics of spot, stratified, and channel sampling schemes with pebble measurements in a partially layered gravel pit and in measures of the size and shape of quartz grains and point counting of minerals in thin sections. But each student was also exposed to a different set of classes throughout their university education. Classes varied for each student and ranged from industrial engineering, computer science, mineral economics, agronomy, chemistry, mathematical statistics, and geology to psychology. From these classes, students learned the commonality of problems, the major differences being each discipline's language barrier. His teaching methods were unusual and by no means obvious to others. His students were taught how to define and solve scientific problems. The teaching award given in his name by the International Association for Mathematical Geosciences was certainly appropriate.
Bibliography
Griffiths JC (1967) Scientific method in analysis of sediments. McGraw-Hill, New York, 508 p
Griffiths JC, Ondrick CW (1970) Structure by sampling in the geosciences. In: Patil GP (ed) Random counts in physical science, geoscience, and business: the Penn State statistics series. The Pennsylvania State University Press, University Park/London, pp 31–55
Harbaugh, John W. Johannes Wendebourg TotalEnergies Exploration, Paris, France
Fig. 1 John W. Harbaugh, Courtesy of family Harbaugh’s private collection
Biography John W. Harbaugh (born August 6, 1926, died July 28, 2019) was a professor of mathematical geology at Stanford University where he taught quantitative methods for 44 years from 1955–1999. He received his undergraduate degree in geology from the University of Kansas in 1949 and his PhD from the University of Wisconsin in 1955. His main contributions to the field of mathematical geology are in the area of geological computer modeling where he pioneered modern simulation techniques, and in resource estimation methods. In 1968, he cofounded the International Association of Mathematical Geology (IAMG), from which he received the Krumbein award in 1985.
As a field geologist, Harbaugh worked mainly on carbonate rocks but quickly became interested in computer applications. His earliest work was on trend surfaces, but he soon moved to computer simulation. In 1970, he published, together with Graeme Bonham-Carter, the seminal book Computer Simulation in Geology, which treats a wide variety of subjects from stratigraphic modeling to geomorphology, including numerical methods (Harbaugh and Bonham-Carter 1970). This visionary book came to fruition some 15+ years later in the form of Stanford's Geomath program, which he founded and ran from the early 1980s until 1999 and which was dedicated to the computer simulation of geological processes. The software called SEDSIM that he and his coworkers developed pioneered modern geological forward modeling techniques using novel Lagrangian numerical methods (Tetzlaff and Harbaugh 1989). Several monographs resulted from this work, including simulations of sediment transport and deposition processes and of fluid flow in sedimentary rocks. In the area of resource estimation, Harbaugh applied various statistical methods (Harbaugh et al. 1977) and worked closely with John C. Davis, which culminated in a 500+ page monograph on risk analysis in oil and gas exploration (Harbaugh et al. 1995). Harbaugh was also interested in Markov processes applied to geology, as geological processes are not purely random, and applied them to spatial representations of geology (Lin and Harbaugh 1984).
Bibliography Harbaugh JW, Bonham-Carter G (1970) Computer simulation in geology. Wiley Interscience, New York. 575 pp Harbaugh JW, Doveton JH, Davis JC (1977) Probability methods in oil exploration. Wiley-Interscience, New York. 269 pp Harbaugh JW, Davis JC, Wendebourg J (1995) Computing risk for oil prospects: principles and programs. Pergamon Press, Oxford. 464 p Lin C, Harbaugh JW (1984) Graphic display of two- and threedimensional Markov computer models in geology. Van Nostrand Reinhold, New York. 180 pp Tetzlaff DM, Harbaugh JW (1989) Simulating clastic sedimentation. Van Nostrand Reinhold, New York. 202 pp
Harff, Jan Martin Meschede Institute of Geography and Geology, University of Greifswald, Greifswald, Germany
Fig. 1 Jan Harff, 2002, on board with Professor Albrecht Penck (Photo: Foto Manthey)
Biography
Jan Harff was born in Güstrow in Mecklenburg (East Germany). After his Abitur in 1961, his first attempt to begin studying architecture at the Technical University of Berlin failed because of the construction of the Berlin Wall. After some years in the building sector, he finally began studying geology at the Humboldt University in East Berlin in 1964. The centrally controlled GDR university reform of 1968 forced him to continue his studies at the University of Greifswald in the northeasternmost part of Germany, where he finished his diploma in 1963. After this, he wanted to specialize in marine geology and tried to get a position as a doctoral student at the Institute of Marine Studies in Warnemünde. For political reasons, however, he was not allowed to participate in the offshore research project, because the former socialist regime of the GDR did not grant him permission to cross the maritime boundary (Fig. 1). Mainly because of this denial, Jan Harff started his career as a mathematical geologist, first with a PhD thesis on classification methods for geophysical borehole data. After a short time at the district office of geology in Rostock, he switched to the Central Institute of Earth Physics (ZIPE, now the German Research Centre for Geosciences, GFZ), where he was responsible for setting up a department of mathematical geology,
a branch of geosciences which developed mainly in the 1970s. The main target of his investigations was to optimize the exploration methods for oil and gas using a theoretical basin model of the North German–Polish basin. In 1991, after German reunification, Jan Harff was offered the position of head of the project group for mathematical geology at the newly formed GFZ (which replaced the ZIPE). However, after only 3 months at the GFZ, he got the chance of his life to become head of the marine geology section at the Leibniz Institute for Baltic Sea Research (IOW) in Warnemünde. With this job, his dream of marine geology came true and he could finally do what he had wanted to do at the beginning of his career. As a member of the institute, Jan Harff was able to sail on various cruises, mainly using the IOW research vessel Professor Albrecht Penck in the Baltic Sea. Moreover, participation in, and expedition leadership of, international cruises with different German and Russian research vessels was a major part of his research activities. He concentrated on coastal regions and modeled, for instance, changes in the coastlines of the Baltic Sea in response to the post-glacial uplift of Scandinavia, and tried to model their future development based on climate-change data. Additionally, he also investigated the coastal areas of Vietnam and southern China. After his retirement from the IOW in 2008, Jan Harff moved to the University of Szczecin in Poland and took over a professorship in marine geosciences. This position allowed him to continue his international research projects, in particular in the coastal region of southern China (von Storch and Dietrich 2017). Together with three colleagues from different geomarine disciplines, he edited the Encyclopedia of Marine Geosciences, which in 2017 gained the "Mary B. Ansari Best Geoscience Research Resource Work Award" of the Geoscience Information Society (GSIS). Jan Harff received several awards: in 1989 the Friedrich-Stammberger Prize for geology in the former GDR, in 1998 the Krumbein Medal of the International Association for Mathematical Geology (IAMG), in 2009 the Serge-von-Bubnoff Medal of the German Geological Society (DGG), and in 2013 the Chinese National Prize.
References
von Storch H, Dietrich R (2017) Jan Harff – zwischen Welten. http://www.hvonstorch.de/klima/Media/interviews/jan.harff.pdf. Last accessed 15 May 2020
High-Order Spatial Stochastic Models Roussos Dimitrakopoulos and Lingqing Yao COSMO – Stochastic Mine Planning Laboratory, Department of Mining and Materials Engineering, McGill University, Montreal, Canada
Definition
High-order spatial stochastic models refer to a general non-Gaussian, nonlinear geostatistical modeling framework that consistently characterizes complex spatial architectures and facilitates the spatial uncertainty quantification of pertinent attributes of the earth sciences and engineering phenomena predicted from finite measurements. High-order spatial statistics such as spatial cumulants characterize non-Gaussian spatial random fields and facilitate the consistent description of complex nonlinear directional spatial patterns and the multiple-point connectivity of extreme values. The corresponding high-order stochastic simulation approaches are data-driven and built upon random field theory, beyond both the conventional second-order and the multiple-point-statistics-based simulation frameworks.
Introduction Traditional geostatistical simulation methods utilize covariance/variogram functions to characterize the so-called second-order spatial statistics and fully define the related multi-Gaussian random field. These traditional simulation methods have two major limitations. Firstly, most of the natural attributes are not compliant with Gaussian random fields and their symmetric bell-shape probability distributions as well as their maximum entropy aspects that conflict with the structured behavior of actual geological spatial patterns. In addition, second-order spatial statistics only capture the correlations between pairs of points and hence are unable to characterize more complex spatial structures that are typical in geological phenomena, such as curvilinear features, overlaying geological events, complex nonlinear patterns, directional multiple point periodicity, and so on. An early attempt to address the above shortcomings and to eliminate Gaussian assumptions was sequential indicator simulation (Journel 1989; Journel and Alabert 1989). This was followed by the introduction of multiple-point statistics (MPS) by Prof. Andre Journel at his Stanford Centre for Reservoir Forecasting, Stanford University, as opposed to the previously used twopoint second-order statistics; this led to the development of the multiple-point simulation framework (Guardiano and Srivastava 1993; Journel 1993, 2003, 2005). The first
algorithm in this direction was SNESIM (Strébelle 2000; Strebelle 2002), a typical pixel-based MPS method, followed by several new patch-based MPS methods (Remy et al. 2009; Mariethoz et al. 2010; Mariethoz and Caers 2014; GómezHernández and Srivastava 2021). Despite improvements in terms of reproducing spatial connectivity over the secondorder stochastic simulation approaches, MPS methods require a training image (TI) as the source for the related inference (Guardiano and Srivastava 1993; Journel 1997, 2003, Journel and Zhang 2006) of multiple-point statistics. The term training image refers to an exhaustive image reflecting prior geological interpretations and acting as statistical analog to the underlying attributes of interest. This training-image driven nature generates a major limitation when MPS methods are applied in cases where statistical conflicts exist between the sample data available and the TI. This is a prominent issue when relatively reasonable sample data are available, as is typical in mining applications (Osterholt and Dimitrakopoulos 2007; Goodfellow et al. 2012). Figure 1 shows an example of variogram reproduction issues from a SNESIM application at a mineral deposit, where realizations reproduce the variogram of the training image but not that of the available data. In addition, the experimentally selected multiple-point pattern (or moment) in MPS does not consider the consistency or relations between higher and lower order spatial moments over a series of orders used or facilitates data-driven connectivity of complex spatial patterns. Furthermore, while the MPS framework is more informed than those based on twopoint or second-order statistics, the possibility of new approaches that are even more informed, given the wealth of information used in applications, remains to be addressed. In recent years, a new high-order spatial modeling framework, based on measures of high-order complexity in spatial architectures, termed spatial cumulants, has been proposed. This approach not only facilitates dropping any distributional assumptions and data transformations but also develops spatial statistical models that are data-driven while capitalizing from integrating substantially more information from the underlying geological phenomena and related data. This includes complex, diverse, nonlinear, multiple-point directional or periodic patterns, advanced data-driven analytics, and the reconstruction of consistency between lower- and higher-order spatial complexity in the data used. Cumulants are combinations of moments of statistical parameters that characterize non-Gaussian random fields. Research to date provides definitions, geological interpretations, and implementations of high-order spatial cumulants that are used, for example, in the high-dimensional space of Legendre polynomials to stochastically simulate complex, non-Gaussian, nonlinear spatial phenomena. The principal innovation, and a fundamentally challenging problem to address, is the development of mathematical models that
High-Order Spatial Stochastic Models, Fig. 1 Indicator variogram reproduction of SNESIM simulations (light gray) vs training image (dotted line) and data (solid line). (From Goodfellow et al. 2012)
move well beyond the “second-order models” to be more data-driven and consistent, as well as more informed than the previous MPS framework. The related mathematical models in turn open new avenues for the complex spatial modelling of geological uncertainty. The advantages of this work include (a) the absence of distributional assumptions and pre/post-processing steps, e.g., data normalization or training image (TI) filtering; (b) the use of high-order spatial relations present in the available data to the simulation process (data-driven, not TI-driven), adding a wealth of additional information; and (c) the generation of complex spatial patterns to reproduce any data distribution and related highorder spatial cumulants, respecting the consistency of different high-order spatial relations. These developments define a mathematically consistent and more informed alternative to MPS, with substantial additional advantages (e.g., it is datadriven and reconstructs consistently the lower-order spatial complexity in the data used) for applications, particularly in cases where reasonably sized data sets are available. Notable is that the so-termed high-order stochastic simulation framework requires no fitting of parametric statistical models; similar to the MPS framework, high-order multiple-point relations are calculated directly from the available data sources during the simulation process, making the utilization of the related algorithms straightforward for the user. At the same time, the reproduction of related high-order spatial statistics by the resulting simulated realizations can be assessed through the cumulant maps mentioned in a subsequent section.
High-Order Spatial Cumulants

Spatial cumulants, introduced in Dimitrakopoulos et al. (2010) for earth science and engineering problems, are a mathematical representation of high-order spatial statistics of a random field that quantify complex directional multiple-point statistical interactions and characterize the spatial architectures of random fields. Related work in other fields includes signal processing (Nikias and Petropulu 1993; Zhang 2005; others), astrophysics (e.g., Gaztanaga et al. 2000), and image processing (Zetzsche and Krieger 2001; Boulemnadjel et al. 2018).

Moments and cumulants are critical statistical concepts, given that the probability distribution of a random variable can be transformed accordingly as a moment-generating or cumulant-generating function. Let $(\Omega, \mathcal{I}, P)$ be a probability space; given a real-valued random variable $Z$, as a measurable function defined on the sample space $\Omega$ with a probability density function $f_Z(z)$, the $r$th ($r \geq 0$) moment of $Z$ is expressed as

$\mathrm{Mom}[Z, \ldots, Z] = E[Z^r] = \int_{-\infty}^{+\infty} z^r f_Z(z)\, dz.$  (1)

The moment-generating function (Rosenblatt 1985) of $Z$ is defined by

$M(w) = E[e^{wZ}] = \int_{-\infty}^{+\infty} e^{wz} f_Z(z)\, dz.$  (2)

The cumulant-generating function of $Z$ is defined as the logarithm of the moment-generating function $M(w)$,

$K(w) = \ln(E[e^{wZ}]).$  (3)

The $r$th moment of $Z$ can be obtained as the $r$th derivative of $M(w)$ at the origin through the Taylor expansion of the moment-generating function about the origin

$M(w) = E[e^{wZ}] = E\!\left[1 + wZ + \ldots + \frac{w^r Z^r}{r!} + \ldots\right] = \sum_{r=0}^{\infty} \frac{w^r}{r!}\, \mathrm{Mom}[Z, \ldots, Z].$  (4)

The cumulants of $Z$ are the coefficients in the Taylor expansion of the cumulant-generating function, $K(w)$, about the origin

$K(w) = \ln(E[e^{wZ}]) = \sum_{r=0}^{\infty} \frac{w^r}{r!}\, \mathrm{Cum}[Z, \ldots, Z].$  (5)

The relationship between the cumulants and moments can be further derived by extracting the coefficients of the two Taylor series (4) and (5) through differentiation.

Spatial cumulants extend the concept of cumulants to the multivariate probability distribution of a random field endowed with more complex spatial structures, similar in a way to how the covariance function extends the second-order moments to represent two-point spatial statistics. The spatial cumulants associated with any subset of random variables of the random field Z(u) are statistical functions of distance vectors among the locations of the related random variables. The spatial configuration of an arbitrary set of random variables Z(u0), Z(u1), . . ., Z(un) of the random field Z(u) can be uniquely determined by the so-called spatial template, with a center node noted as Z(u0) and the distance vectors h1, . . ., hn defined as the vectors between each node other than Z(u0) and the center node (Fig. 2). The rth spatial cumulant of random variables within a spatial template can be expressed as $c_r^z(\mathbf{h}_1, \ldots, \mathbf{h}_{r-1})$. While the relations between cumulants and moments can be readily established through the generalization of Eqs. (4) and (5) to multivariate probability distributions, the spatial cumulants can be written as a combination of spatial moments and vice versa (Smith 1995; Dimitrakopoulos et al. 2010). For a zero-mean stationary random field, the second-order spatial cumulant corresponds to the covariance and is given by

$c_2^z(\mathbf{h}) = E[Z(\mathbf{u})\, Z(\mathbf{u} + \mathbf{h})],$  (6)

the third- and fourth-order spatial cumulants are defined respectively as

$c_3^z(\mathbf{h}_1, \mathbf{h}_2) = E[Z(\mathbf{u})\, Z(\mathbf{u} + \mathbf{h}_1)\, Z(\mathbf{u} + \mathbf{h}_2)],$  (7)

and

$c_4^z(\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3) = E[Z(\mathbf{u})\, Z(\mathbf{u} + \mathbf{h}_1)\, Z(\mathbf{u} + \mathbf{h}_2)\, Z(\mathbf{u} + \mathbf{h}_3)] - c_2^z(\mathbf{h}_1)\, c_2^z(\mathbf{h}_2 - \mathbf{h}_3) - c_2^z(\mathbf{h}_2)\, c_2^z(\mathbf{h}_3 - \mathbf{h}_1) - c_2^z(\mathbf{h}_3)\, c_2^z(\mathbf{h}_1 - \mathbf{h}_2).$  (8)

High-Order Spatial Stochastic Models, Fig. 2 Spatial template of Z(u0), Z(u1), . . ., Z(un)

More specifically, spatial cumulants of order r (r ≥ 3) consist of a combination of second-order spatial statistics and other spatial moments of order no greater than r. The experimental computation of spatial cumulants is similar to that of experimental variograms, while the mathematical formulae are more sophisticated, as exemplified by Eqs. (7) and (8). The calculation of experimental spatial cumulants can be carried out either on an exhaustive training image or on irregularly distributed samples by allowing a certain tolerance in the corresponding spatial template (Fig. 3).
High-Order Spatial Stochastic Models, Fig. 3 An example of irregular template for third-order cumulant calculation (from Dimitrakopoulos et al. 2010)
High-Order Spatial Stochastic Models, Fig. 4 Interpretations of the high positive anomalies in the third-order map. (1.a) to (2.a) explains, respectively, the high anomalies in (1.b) to (2.b) from the interactions
between the black zones in the red rectangles (from Mustapha and Dimitrakopoulos 2010a, b)
A computational algorithm and related computer program for high-order spatial cumulants are available in the public domain (Mustapha and Dimitrakopoulos 2010a, b). This algorithm provides a tool to generate high-order cumulant maps as a typical application of spatial cumulants to delineate the spatial patterns of natural attributes. Figure 4 shows an example of third-order cumulant maps to characterize the interactions among attributes at multiple locations. In general, high-order spatial cumulants provide a statistical entity to capture the spatial architecture of natural attributes, including spatial directionality, periodicity, homogeneity, and connectivity, in multiple directions and among multiple points. An additional property worth noting is that the spatial cumulants of a Gaussian random field vanish for orders higher than 2, indicating that high-order spatial cumulants are essential to characterize the non-Gaussian features of natural phenomena.
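As an illustrative sketch only (it is not the public-domain program cited above), the experimental estimation of Eqs. (6) and (7) on a gridded, zero-mean training image can be written as follows; the lags, grid size, and synthetic field are arbitrary choices.

```python
# Experimental second- and third-order spatial cumulants on a regular grid (Eqs. 6-7).
import numpy as np


def c2(z, h):
    """Experimental c2(h) = E[Z(u) Z(u+h)] for a non-negative lag h = (dy, dx)."""
    dy, dx = h
    ny, nx = z.shape
    a = z[:ny - dy, :nx - dx]
    b = z[dy:, dx:]
    return float(np.mean(a * b))


def c3(z, h1, h2):
    """Experimental c3(h1, h2) = E[Z(u) Z(u+h1) Z(u+h2)] for a zero-mean field."""
    (dy1, dx1), (dy2, dx2) = h1, h2
    ny, nx = z.shape
    my, mx = max(dy1, dy2), max(dx1, dx2)
    a = z[:ny - my, :nx - mx]
    b = z[dy1:dy1 + a.shape[0], dx1:dx1 + a.shape[1]]
    c = z[dy2:dy2 + a.shape[0], dx2:dx2 + a.shape[1]]
    return float(np.mean(a * b * c))


rng = np.random.default_rng(1)
ti = rng.normal(size=(200, 200))
ti -= ti.mean()                      # work with a zero-mean field, as in Eqs. (6)-(8)

# A small third-order cumulant "map" over lags along the two grid axes
cmap = np.array([[c3(ti, (0, i), (j, 0)) for i in range(10)] for j in range(10)])
```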
High-Order Simulation Methods
A general framework for generating realizations from a random field is the sequential simulation framework (Journel 1989, 1994; Deutsch and Journel 1992; Gómez-Hernández and Srivastava 2021), which has been applied to many popular stochastic simulation methods, as well as to the high-order stochastic simulation method (hosim) presented herein. Consider a random field $Z(\mathbf{u})$, $\mathbf{u} \in \mathbb{R}^d$ ($d = 1, 2, 3$), consisting of a family of random variables $Z(\mathbf{u}_1), Z(\mathbf{u}_2), \ldots, Z(\mathbf{u}_N)$ on a finite discretized domain, with a multivariate probability density function denoted as $f(z_1, z_2, \ldots, z_N)$. If an initial data set denoted as $\Lambda_0 = \{\zeta_1, \ldots, \zeta_n\}$ is available, then the joint probability density function (PDF) can be decomposed (Johnson 1987) as
$f(z_1, z_2, \ldots, z_N \mid \Lambda_0) = f(z_1 \mid \Lambda_0)\, f(z_2 \mid \Lambda_1) \cdots f(z_N \mid \Lambda_{N-1}),$  (9)

where $\Lambda_{i+1} = \{z_i\} \cup \Lambda_i$, $i = 0, 1, \ldots, N-1$. The decomposition of the joint PDF in Eq. (9) allows the generation of a realization from the related random field by sampling from a sequence of conditional probability distributions with density $f(z_i \mid \Lambda_{i-1})$, $i = 1, \ldots, N$. In practice, the conditioning data $\Lambda_i$ are often confined to a limited neighborhood considering the computational efficiency as well as the statistical irrelevancy of the farther data due to the so-called screen effect (Dimitrakopoulos and Luo 2004). The main difference and advantage of high-order stochastic simulation from other previously developed simulation approaches, including MPS methods, is that it is not only free of distributional assumptions but also fully accounts for the high-order spatial statistics from the available data in a consistent, data-driven, and substantially more complete and informed manner.

As the related technical literature shows (Minniakhmetov et al. 2018), the joint PDF of the random field is approximated by an orthogonal polynomial expansion series. Given a complete system of orthogonal functions $\varphi_1(z), \varphi_2(z), \ldots$ defined on an interval $[a, b]$, then $f(z)$ can be approximated as

$f(z) \approx \sum_{m=0}^{o} L_m \varphi_m(z),$  (10)

where $o$ is the maximum order of the polynomials in the truncated expansion series. Similarly, an $(n+1)$-variate piecewise continuous function $f(z_0, z_1, z_2, \ldots, z_n)$ can be approximated as

$f(z_0, z_1, \ldots, z_n) \approx \sum_{m_0=0}^{o_0} \sum_{m_1=0}^{o_1} \cdots \sum_{m_n=0}^{o_n} L_{m_0, m_1, \ldots, m_n}\, \varphi_{m_0}(z_0)\, \varphi_{m_1}(z_1) \cdots \varphi_{m_n}(z_n).$  (11)

Especially, consider $f(z_0, z_1, \ldots, z_n)$ as the joint PDF of random variables $Z(\mathbf{u}_0), Z(\mathbf{u}_1), \ldots, Z(\mathbf{u}_n)$ defined on a random field $Z(\mathbf{u})$; the coefficients $L_{m_0, m_1, \ldots, m_n}$ can be derived as high-order moments according to the orthogonality property as (Minniakhmetov et al. 2018)

$L_{m_0, m_1, \ldots, m_n} = E\!\left[\varphi_{m_0}(z_0)\, \varphi_{m_1}(z_1) \cdots \varphi_{m_n}(z_n)\right] \approx \frac{1}{N_{\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n}} \sum_{k=1}^{N_{\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n}} \varphi_{m_0}(z_0^k)\, \varphi_{m_1}(z_1^k) \cdots \varphi_{m_n}(z_n^k).$  (12)

The approximation in Eq. (12) also shows the experimental computation of the coefficients $L_{m_0, m_1, \ldots, m_n}$, where $N_{\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n}$ is the number of replicates retrieved from the available data and $z_0^k, z_1^k, \ldots, z_n^k$ are the corresponding attribute values given a spatial template of distance vectors $\mathbf{h}_1, \ldots, \mathbf{h}_n$.
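The following sketch illustrates the structure of Eqs. (10) and (12) in the simplest univariate case, assuming the data have been rescaled to [−1, 1] and using orthonormal Legendre polynomials from NumPy; it is not the hosim implementation.

```python
# Univariate PDF approximation with an orthonormal Legendre expansion (cf. Eqs. 10 and 12).
import numpy as np
from numpy.polynomial import legendre


def phi(m, z):
    """Orthonormal Legendre basis function of order m on [-1, 1]."""
    pm = legendre.legval(z, [0.0] * m + [1.0])        # Legendre polynomial P_m(z)
    return np.sqrt((2 * m + 1) / 2.0) * pm


def approximate_pdf(samples, order):
    """Coefficients L_m = E[phi_m(Z)] and the truncated expansion of Eq. (10)."""
    coeffs = np.array([np.mean(phi(m, samples)) for m in range(order + 1)])

    def f_hat(z):
        return sum(c * phi(m, z) for m, c in enumerate(coeffs))

    return coeffs, f_hat


rng = np.random.default_rng(0)
raw = rng.beta(2.0, 5.0, size=5000)                               # any attribute values
z = 2.0 * (raw - raw.min()) / (raw.max() - raw.min()) - 1.0       # rescale to [-1, 1]
coeffs, f_hat = approximate_pdf(z, order=8)
```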
Note that the early approach in Mustapha and Dimitrkopoulos (2010b) uses Legendre polynomials as the orthogonal bases to develop the approximation of the joint PDF in Eq. (11). However, the numerical stability of approximation by Legendre polynomial series is influenced by the available data. Thus, Minniakhmetov et al. (2018) propose the use of Legendre-like splines to replace the Legendre polynomials as the orthogonal bases. This method is shown to perform better in the reproduction of spatial connectivity of extreme values, which are more likely to cause numerical instability in the expansion series. From the perspective of computational efficiency, a new computational model of high-order simulation has been proposed to replace the explicit calculation of spatial cumulants in the Legendre polynomial expansion series by a simple kernel-like function that can be computed in polynomial time (Yao et al. 2018). A statistical learning framework of high-order simulation is also developed based on a new proposed concept of spatial Legendre moment kernel so that the joint PDF of the random field is estimated in a stable solution domain through a learning algorithm, with possible extension to develop a TI-free high-order simulation (Yao et al. 2020; Yao et al. 2021a, 2021b). Other advancements in high-order simulation methods include block-support highorder simulation (de Carvalho et al. 2019), joint high-order simulation of multivariate attributes (Minniakhmetov and Dimitrakopoulos 2016), and high-order simulation of categorical data (Minniakhmetov and Dimitrakopoulos 2017, 2021). An algorithmic description of high-order stochastic simulation can be summarized as the following: 1. Draw a random path to visit all the nodes to be simulated. 2. Approximate the local conditional probability density function of each node based on Legendre polynomial expansion series in Eqs. (11) and (12) by computing the coefficients from the experimental high-order spatial statistics, which are calculated directly from the data and TI used. 3. Draw a random value from the local conditional probability density function for the simulated node and add it into the conditioning data. 4. Repeat from step (1) until all the nodes in the random path have been visited.
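A schematic skeleton of the four steps above is sketched below; the helper estimate_conditional_pdf is a hypothetical stand-in for the Legendre-based approximation of Eqs. (11)–(12), and the code is not the authors' implementation.

```python
# Schematic sequential simulation loop: random path, local conditional PDF, draw, condition.
import numpy as np


def sequential_simulation(grid_nodes, conditioning_data, estimate_conditional_pdf, rng):
    """Generate one realization by visiting the nodes along a random path."""
    data = dict(conditioning_data)                  # {location: value}
    path = list(grid_nodes)
    rng.shuffle(path)                               # step 1: random path
    for node in path:                               # visit every node on the path
        values, probs = estimate_conditional_pdf(node, data)    # step 2 (discretized PDF)
        probs = np.asarray(probs) / np.sum(probs)
        data[node] = rng.choice(values, p=probs)    # step 3: draw and add to conditioning data
    return data                                     # step 4: stop once all nodes are visited
```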
Examples High-order sequential simulations (hosim) have been used in different applications, including for generating realizations of mineral deposit grades (e.g., Minniakhmetov and Dimitrakopoulos 2016; Minniakhmetov et al. 2018; de Carvalho et al. 2019) for mine production planning and the heterogeneity representation for subsurface flow (e.g.,
High-Order Spatial Stochastic Models, Fig. 5 Sections of a simulated realization using hosim from the gold deposit: (a) horizontal sect. 1; (b) horizontal sect. 2; and (c) a vertical section. Colors indicate grades in g/t. (From Minniakhmetov et al. 2018)
High-Order Spatial Stochastic Models, Fig. 6 The same sections as in Fig. 5 but from simulated realization of the gold deposit using sgsim: (a) horizontal sect. 1; (b) horizontal sect. 2; and (c) a vertical section. Colors indicate grades in g/t. (From Minniakhmetov et al. 2018)
Mustapha et al. 2011; Tamayo-Mas et al. 2016). To demonstrate the main aspects of high-order spatial statistics and simulation methods discussed in the previous sections, as well as the ability of high-order sequential simulation to capture data-driven spatial complexity, some examples from an application at a typical gold deposit (Minniakhmetov et al. 2018) are included in this section. The data available in an operating gold mine cover an area around 4 km2 and include 288 exploration drillholes; a training image is available from the blasthole samples. High-order sequential simulation is used to generate realizations of the gold grades. The high-order statistics needed are directly calculated from the available data and TI employed by the hosim algorithm; this is analogous to the approach used by MPS simulation algorithms; no fitting of parametric highorder statistical models is part of the related process. Figure 5 shows cross-sections from a realization generated using highorder sequential simulation, while Fig. 6 shows, for reasons of comparison, the same sections generated using the conventional sequential Gaussian simulation (sgsim). The differences in overall patterns are clear and as expected.
With respect to the reproduction of data histograms and variograms, realizations from both hosim and sgsim show a reasonable reproduction. As expected, high-order simulations also reproduce high-order spatial data statistics. For example, Fig. 7 shows the fourth-order cumulant maps of the data, the training image used, a hosim realization, and a realization from sgsim. Last, as seen in the visualization of the crosssections in Figs. 5 and 6, hosim generates a better reproduction of the spatial connectivity of extreme values than sgsim, which is also shown in the connectivity measures for the 99th gold grade percentile in Fig. 8. Note that comparisons of simulated realizations generated with high-order sequential simulation approaches to realizations generated with the MPS simulation method filtersim (Remy et al. 2009) also show a better performance of the high-order statistics framework with respect to integrating and reproducing complex spatial patterns (e.g., Mustapha et al. 2011; Yao et al. 2018).
High-Order Spatial Stochastic Models, Fig. 7 The fourth-order spatial cumulant maps of (a) the drill-hole data, (b) the training image (TI), (c) a simulation using hosim, and (d) a simulated realization using sgsim. (From Minniakhmetov et al. 2018)
Summary and Conclusions
High-order stochastic simulation provides a general, data-driven sequential simulation framework, free of distributional assumptions, that characterizes spatially distributed and varying natural phenomena with a diversity of complex spatial textures and architectures. The fundamental concepts in high-order sequential simulation are (i) the definition of high-order spatial statistics, such as spatial cumulants, and (ii) the estimation of multivariate probability distributions of a random field based on mathematical models that incorporate the high-order spatial statistics. The introduction and use of high-order spatial statistics and, more specifically, spatial cumulants is critically important and a major advancement compared to the previously developed framework of multiple-point statistics and related simulation approaches. As noted in the introduction, spatial
cumulants characterize non-Gaussian spatial random fields and exhibit complex nonlinear directional spatial patterns and multiple point connectivity of extreme values, while making the subsequent simulation process consistent over a series of orders and facilitating a data-driven process. This consistency is not the case with MPS simulation methods, which are driven by a statistically less informed and arbitrarily selected pattern or moment from a training image. The data-driven nature of spatial cumulant inference facilitates (i) the development of high-order sequential stochastic simulation models, such as the ones mentioned above; (ii) the optimization of data processing and analytics, including the mitigation of the statistical conflicts among samples and other additional information such as the TI; and (iii) the utilization of advanced learning algorithms constrained to the high-order spatial statistics of the available data. It should be stressed that the high-order stochastic simulation framework is
High-Order Spatial Stochastic Models, Fig. 8 The connectivity along x (left subfigure) and y (right subfigure) for 99% percentile: the TI (blue line), P50 statistics for hosim simulations (solid red line), P10 and P90 statistics for hosim simulations (dashed red line), and P50 statistics for simulations using sgsim (grey line). (From Minniakhmetov et al. 2018)
nonparametric; that is, spatial cumulants or other types of high-order spatial statistics needed are calculated directly during the high-order sequential simulation process, and no parametric model fitting is introduced, thus making the use of the related algorithms very practical and user friendly. Note that computer programs both for calculating high-order cumulants and for high-order sequential simulation with the method described herein are available in the public domain (Mustapha and Dimitrakopoulos 2010a, b; Minniakhmetov et al. 2018). The practice of high-order sequential simulation is, evidently, neither more complex nor simpler than other modern geostatistical simulation frameworks.
Cross-References ▶ Geostatistics ▶ Journel, André ▶ Multiple Point Statistics ▶ Probability Density Function ▶ Random Function ▶ Random Variable ▶ Sequential Gaussian Simulation ▶ Simulation ▶ Spatial Autocorrelation ▶ Spatial Statistics
Bibliography Boulemnadjel A, Kaabache B, Kharfouchi S, Hachouf F (2018) Higherorder spatial statistics: a strong alternative. In image processing, 2018 3rd international conference on pattern analysis and intelligent systems (PAIS), pp. 1-6. https://doi.org/10.1109/PAIS.2018.8598513 de Carvalho JP, Dimitrakopoulos R, Minniakhmetov I (2019) High-order block support spatial simulation method and its application at a gold deposit. Math Geosci 51(6):793–810. https://doi.org/10.1007/ s11004-019-09784-x Deutsch CV, Journel AG (1992) GSLIB: Geostatistical software library and user’s guide. Oxford University Press, New York Dimitrakopoulos R, Luo X (2004) Generalized sequential gaussian simulation on group size n and screen-effect approximations for large field simulations. Math Geol 36(5):567–591. https://doi.org/ 10.1023/B:MATG.0000037737.11615.df
Dimitrakopoulos R, Mustapha H, Gloaguen E (2010) High-order statistics of spatial random fields: exploring spatial cumulants for modeling complex non-Gaussian and non-linear phenomena. Math Geosci 42(1):65–99. https://doi.org/10.1007/s11004-009-9258-9 Gaztanaga E, Fosalba P, Elizalde E (2000) Gravitational evolution of the large-scale probability density distribution: the Edgeworth and gamma expansions. Astrophys J 539(2):522–531 Gómez-Hernández JJ, Srivastava RM (2021) One step at a time: the origins of sequential simulation and beyond. Math Geosci 53(2): 193–209. https://doi.org/10.1007/s11004-021-09926 Goodfellow R, Albor Consuegra F, Dimitrakopoulos R, Lloyd T (2012) Quantifying multi-element and volumetric uncertainty, Coleman McCreedy deposit, Ontario. Canada Computers & Geosciences 42: 71–78. https://doi.org/10.1016/j.cageo.2012.02.018 Guardiano F, Srivastava RM (1993) Multivariate geostatistics: beyond bivariate moments. In: Soares A (ed) Geostatistics Tróia ‘92, Quantitative geology and geostatistics, vol 5. Kluwer Academic, Dordrecht, pp 133–144. https://doi.org/10.1007/978-94-011-1739-5_12 Johnson ME (1987) Multivariate generation techniques. In: Multivariate statistical simulation. Wiley, pp. 43–48. https://doi.org/10.1002/ 9781118150740.ch3 Journel A (1989) Fundamentals of Geostatistics in five lessons. American Geophysical Union, Book Series 8. https://doi.org/10.1029/ SC008 Journel AG (1993) Geostatistics: roadblocks and challenges. In: Soares A. (eds) Geostatistics Tróia ‘92. Quantitative geology and Geostatistics, vol 5. Springer. Dordrecht. https://doi.org/10.1007/ 978-94-011-1739-5_18 Journel A (1994) Modeling uncertainty: some conceptual thoughts. In: Dimitrakopoulos R (ed) Geostatistics for the next century, Quantitative geology and Geostatistics, vol 6. Springer, Dordrecht, pp 30–43. https://doi.org/10.1007/978-94-011-0824-9_5 Journel AG (1997) Deterministic geostatistics: a new visit. In: Baafi E, Schofield N (eds) Geostatistics Woolongong ‘96. Kluwer, Dordrecht, pp 213–224 Journel AG (2003) Multiple-point geostatistics: a state of the art. Report 16, Stanford Center for Reservoir Forecasting, Stanford University, Stanford Ca, USA. (available at pangea.stanford.edu) Journel AG (2005) Beyond covariance: The advent of multiple-point geostatistics. In Geostatistics Banff 2004, Springer, pp. 225–233. https://doi.org/10.1007/978-1-4020-3610-1_23 Journel AG, Alabert F (1989) Non-Gaussian data expansion in the earth sciences. Terra Nova 1(2):123–134. https://doi.org/10.1111/j.13653121.1989.tb00344.x Journel AG, Zhang T (2006) The necessity of a multiple-point prior model. Math Geo 38(5):591–610 Mariethoz G, Caers J (2014) Multiple-point geostatistics: stochastic modeling with training images. Wiley, Hoboken Mariethoz G, Renard P, Straubhaar J (2010) The direct sampling method to perform multiple-point simulation. Water Resour Res 46(W11536):10.1029/2008WR007621 Minniakhmetov I, Dimitrakopoulos R (2016) Joint high-order simulation of spatially correlated variables using high-order spatial
Hilbert Space statistics. Math Geosci 49(1):39–66. https://doi.org/10.1007/s11004016-9662-x Minniakhmetov I, Dimitrakopoulos R (2017) A high-order, data-driven framework for joint simulation of categorical variables. In: GómezHernández JJ, Rodrigo-Ilarri J, Rodrigo-Clavero ME, Cassiraga E, Vargas-Guzmán JA (eds) Geostatistics valencia 2016. Springer International Publishing, Cham, pp 287–301. https://doi.org/10.1007/ 978-3-319-46819-8_19 Minniakhmetov I, Dimitrakopoulos R (2021) High-order data-driven spatial simulation of categorical variables. Math Geosci 54(1): 23–45. https://doi.org/10.1007/s11004-021-09943-z Minniakhmetov I, Dimitrakopoulos R, Godoy M (2018) High-order spatial simulation using Legendre-like orthogonal splines. Math Geosci 50(7):753–780. https://doi.org/10.1007/s11004-018-9741-2 Mustapha H, Dimitrakopoulos R (2010a) A new approach for geological pattern recognition using high-order spatial cumulants. Comput Geosci 36(3):313–334. https://doi.org/10.1016/j.cageo.2009. 04.015. (code at: www.iamg.org/documents/oldftp/VOL36/v36-0306.zip ) Mustapha H, Dimitrakopoulos R (2010b) High-order stochastic simulation of complex spatially distributed natural phenomena. Math Geosci 42(5):457–485. https://doi.org/10.1007/s11004-010-9291-8 Mustapha H, Dimitrakopoulos R, Chatterjee S (2011) Geologic heterogeneity representation using high-order spatial cumulants for subsurface flow and transport simulations. Water Resour Res 47. https://doi. org/10.1029/2010WR009515 Nikias CL, Petropulu AP (1993) Higher-order spectra analysis : a nonlinear signal processing framework. PTR Prentice Hall, Englewood Cliffs.J Osterholt V, Dimitrakopoulos R (2007) Simulation of wireframes and geometric features with multiple-point techniques: application at Yandi iron ore deposit, Australia. In: Orebody modelling and strategic mine planning, vol 14. 2 edn. AusIMM Spectrum Series, pp 51–60 Remy N, Boucher A, Wu J (2009) Applied geostatistics with SGeMS: a user’s guide. Cambridge University Press, Cambridge, UK Rosenblatt M (1985) Stationary sequences and random fields. Birkhäuser, Boston Smith PJ (1995) A recursive formulation of the old problem of obtaining moments from cumulants and vice versa. Am Stat 49(2):217–218. https://doi.org/10.1080/00031305.1995.10476146 Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 34(1):1–21. https:// doi.org/10.1023/A:1014009426274 Strébelle S (2000) Sequential simulation drawing structures from training images. PhD thesis, Stanford University, Sanford Tamayo-Mas E, Mustapha H, Dimitrakopoulos R (2016) Testing geological heterogeneity representations for enhanced oil recovery techniques. J Pet Sci Eng 146:222–240. https://doi.org/10.1016/j.petrol. 2016.04.027 Yao L, Dimitrakopoulos R, Gamache M (2018) A new computational model of high-order stochastic simulation based on spatial Legendre moments. Math Geosci 50(8):929–960. https://doi.org/10.1007/ s11004-018-9744-z Yao L, Dimitrakopoulos R, Gamache M (2020) High-order sequential simulation via statistical learning in reproducing kernel Hilbert space. Math Geosci 52(5):693–723. https://doi.org/10.1007/ s11004-019-09843-3 Yao L, Dimitrakopoulos R, Gamache M (2021a) Learning high-order spatial statistics at multiple scales: a kernel-based stochastic simulation algorithm and its implementation. Comput Geosci:104702. 
https://doi.org/10.1016/j.cageo.2021.104702 Yao L, Dimitrakopoulos R, Gamache M (2021b) Training image free high-order stochastic simulation based on aggregated kernel statistics. Math Geosci. https://doi.org/10.1007/s11004-021-09923-3
613 Zetzsche C, Krieger G (2001) Intrinsic dimensionality: nonlinear image operators and higher-order statistics. In: Mitra SK, Sicuranza GL (eds) Nonlinear image processing. Academic Press, San Diego, pp 403–448. https://doi.org/10.1016/B978-012500451-0/50014-X Zhang F (2005) A high order cumulants based multivariate nonlinear blind source separation method. Mach Learn 61(1):105–127. https:// doi.org/10.1007/s10994-005-1506-8
Hilbert Space Daisy Arroyo Department of Statistics, Universidad de Concepción, Concepcion, Chile
Definition A Hilbert space is an inner product space that is complete with respect to the norm induced by the inner product and is commonly denoted as H. Completeness in this context means that each Cauchy sequence of elements in space converges to one element in space, in the sense that the norm of differences approaches to zero. Each Hilbert space is therefore also a Banach space (but not the reverse).
Introduction The methods of linear algebra study the concepts of spaces and apply to their transformations, operators, projectors, etc. In abstract spaces, elements are called a set of points, stochastic variables, vectors, and functions that have certain properties. In particular, the Euclidean spaces of n dimensions and the Hilbert spaces of infinite dimensions are of great importance. The Hilbert space concept is a generalization of the Euclidean space. This generalization allows algebraic and geometric notions and techniques applicable to two- and three-dimensional spaces to be extended to arbitrarydimensional spaces, including infinite-dimensional spaces. There are no restrictions on how vectors should look in Hilbert space: they can be three-dimensional vectors, functions, or infinite sequences. The Venn diagram below summarizes the different types of spaces and their relationship to the Hilbert space (Fig. 1). Remembering that: • A vector space is a set X on which two operations are defined called vector addition that takes each ordered pair (x, y) of elements of X to an element x þ y X, and scalar multiplication that takes a pair (l, x) to an element lx X, for l ℝ (or ℂ) and x X. The vector addition must satisfy the following conditions:
Hilbert Space, Fig. 1 Venn diagram with the different types of spaces
(a) x + y = y + x, for all x, y ∈ X (commutative law)
(b) x + (y + z) = (x + y) + z, for all x, y, z ∈ X (associative law)
(c) The set X contains an additive identity element, denoted by 0, such that for any vector x ∈ X, x + 0 = x.
(d) For each x ∈ X, there is an additive inverse −x ∈ X, such that x + (−x) = 0.
On the other hand, the scalar multiplication must satisfy the following conditions:
(a) λ(x + y) = λx + λy, for all x, y ∈ X, and λ ∈ ℝ (or ℂ) (distributive law)
(b) (λ + μ)x = λx + μx, for x ∈ X, and λ, μ ∈ ℝ (or ℂ)
(c) λ(μx) = (λμ)x, for x ∈ X, and λ, μ ∈ ℝ (or ℂ)
(d) For all x ∈ X, 1x = x (multiplicative identity)
• An inner product space is a vector space X along with an inner product on X, where an inner product is a function that takes each ordered pair (x, y) of elements of X to a number ⟨x, y⟩ ∈ ℝ (or ℂ) and has the following properties:
(a) ⟨x, x⟩ ≥ 0, for all x ∈ X, and ⟨x, x⟩ = 0 if and only if x = 0 (positive definiteness)
(b) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩, for all x, y, z ∈ X (additivity)
(c) ⟨λx, y⟩ = λ⟨x, y⟩, for all λ ∈ ℝ (or ℂ), and all x, y ∈ X (homogeneity)
(d) ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩, for all x, y ∈ X (conjugate symmetry)
• A normed space is a vector space X endowed with a function ‖·‖ : X → [0, ∞), called the norm on X, with the following properties:
(a) ‖λx‖ = |λ|‖x‖, for all x ∈ X, λ ∈ ℝ (homogeneity)
(b) ‖x + y‖ ≤ ‖x‖ + ‖y‖, for all x, y ∈ X (triangle inequality)
(c) ‖x‖ = 0 if and only if x = 0 (positive definiteness)
• A Banach space is a normed vector space X with the property that each Cauchy sequence $\{x_k\}_{k=1}^{\infty}$ in X converges toward some x ∈ X. In cases where the relevant vector space X is a subspace of a larger space, it is important to note that there are two additional requirements: each Cauchy sequence of elements in X must be convergent and the limit of the Cauchy sequence must belong to X. Natural examples of Banach spaces are ℝⁿ and ℂⁿ. The Banach space Lᵖ(ℝ), for 1 ≤ p < ∞, is the space of functions f for which |f|ᵖ is integrable:

$L^p(\mathbb{R}) := \left\{ f : \mathbb{R} \to \mathbb{C} \ \text{such that} \ \int_{-\infty}^{+\infty} |f(x)|^p \, dx < \infty \right\}.$
A Hilbert space is a complete inner product space. Every Hilbert space is a Banach space with respect to the norm.
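As a small numerical aside (not part of the original entry), the inner-product and norm properties listed above can be checked directly for the standard inner product on ℂⁿ; the vectors and scalar below are arbitrary illustrative choices.

```python
# Checking inner-product and induced-norm properties for the standard inner product on C^n.
import numpy as np


def inner(x, y):
    """Standard inner product <x, y> = sum_i x_i * conj(y_i)."""
    return np.sum(x * np.conj(y))


def norm(x):
    """Norm induced by the inner product: ||x|| = sqrt(<x, x>)."""
    return np.sqrt(inner(x, x).real)


x = np.array([1 + 2j, 0.5, -1j])
y = np.array([2 - 1j, 1j, 3.0])
lam = 0.7 - 0.2j

assert np.isclose(inner(lam * x, y), lam * inner(x, y))       # homogeneity
assert np.isclose(inner(x, y), np.conj(inner(y, x)))          # conjugate symmetry
assert norm(x + y) <= norm(x) + norm(y) + 1e-12               # triangle inequality
assert np.isclose(norm(lam * x), abs(lam) * norm(x))          # ||lambda x|| = |lambda| ||x||
```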
Historical Material
In the first decade of the twentieth century, the German mathematician David Hilbert first described this space in his work on integral equations and Fourier series. The definition of a Hilbert space, however, was first given by von Neumann rather than Hilbert; hence the designation "der abstrakte Hilbertsche Raum" in his famous work on unbounded Hermitian operators published in 1929. The name Hilbert space was soon adopted by others, for example, Hermann Weyl in his book The Theory of Groups and Quantum Mechanics, published in 1931. After Hilbert's research, the Austrian-German mathematician Ernst Fischer and the Hungarian mathematician Frigyes Riesz (Riesz 1907; Fischer 1907) showed that square-integrable functions could also be considered as points in a complete inner product space that is equivalent to the Hilbert space.
Examples
All finite-dimensional inner product spaces are Hilbert spaces, for example, Euclidean space with the ordinary dot product. Other examples of Hilbert spaces include spaces of square-integrable functions, spaces of sequences, Sobolev spaces consisting of generalized functions, and Hardy spaces of holomorphic functions. Infinite-dimensional examples are much more important in applications. Below are some of the most popular Hilbert spaces.

The Hilbert Space ℂⁿ
The standard inner product in this space is given by (Rudin 1986)

$\langle x, y\rangle = \sum_{i=1}^{n} x_i \overline{y_i},$

where x = (x₁, ..., xₙ) and y = (y₁, ..., yₙ), with xᵢ, yᵢ ∈ ℂ, and the norm is defined as

$\|x\| = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2}.$

This space is complete and is therefore a finite-dimensional Hilbert space.

The Hilbert Space L2(ℝᵈ)
Among the Lᵖ(ℝᵈ)-spaces, the case p = 2 is very important. The collection of square-integrable functions in ℝᵈ, L2(ℝᵈ), is formed by all complex-valued measurable functions f such that

$L^2(\mathbb{R}^d) = \left\{ f : \mathbb{R}^d \to \mathbb{C} \ \text{such that} \ \int_{\mathbb{R}^d} |f(x)|^2 \, dx < \infty \right\}.$

The inner product is defined as

$\langle f, g\rangle = \int_{\mathbb{R}^d} f(x)\, \overline{g(x)} \, dx, \quad \forall f, g \in L^2(\mathbb{R}^d),$

and the norm is defined as

$\|f\| = \left( \int_{\mathbb{R}^d} |f(x)|^2 \, dx \right)^{1/2}.$

Similarly, for any measurable subset E of ℝᵈ with m(E) > 0 (the measure of E must be positive), the space of square-integrable functions that are supported on E is a Hilbert space, denoted by L2(E).

The Hilbert Space l2(ℤ)
Defined as

$l^2(\mathbb{Z}) = \left\{ (\ldots, x_{-2}, x_{-1}, x_0, x_1, \ldots) : x_i \in \mathbb{C}, \ \sum_{n=-\infty}^{+\infty} |x_n|^2 < \infty \right\}.$

The space l2(ℤ) is a complex linear space. The inner product is given by

$\langle x, y\rangle = \sum_{n=-\infty}^{+\infty} x_n \overline{y_n},$

where x = (..., x₋₂, x₋₁, x₀, x₁, ...) and y = (..., y₋₂, y₋₁, y₀, y₁, ...), and the norm is defined as

$\|x\| = \left( \sum_{n=-\infty}^{+\infty} |x_n|^2 \right)^{1/2}.$
kf k ¼ ℝ
d
:
All infinite dimensional (separable) Hilbert spaces are l2(ℤ) (Stein and Shakarchi 2005). The space l2(ℕ) of square-summable sequences fxn g1 is defined in an n¼1 analogous way.
H
616
Hilbert Space
The Hilbert Space ℂᵐˣⁿ
This is the space of all m × n matrices with complex entries. The inner product on ℂᵐˣⁿ is defined as

$\langle A, B\rangle = \mathrm{tr}(AB^{*}) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}\, \overline{b_{ij}},$

where A = (aᵢⱼ) and B = (bᵢⱼ) are matrices, tr denotes the trace, and * denotes the Hermitian conjugate of a matrix (the complex-conjugate transpose).

Hilbert Space, Fig. 2 y is the point in P closest to x

The corresponding norm

$\|A\| = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2}$
is called the Hilbert-Schmidt norm. The properties of orthogonality and projection in Hilbert spaces, which are of great interest in applications, are briefly described below.
Orthogonality
The inner product structure of a Hilbert space allows introducing the concept of orthogonality (Rudin 1986), which makes it possible to visualize vectors and linear subspaces of a Hilbert space in a geometric way. If x, y are vectors in a Hilbert space H, then x and y are orthogonal (denoted as x ⊥ y) if ⟨x, y⟩ = 0. The subsets A and B are orthogonal (denoted as A ⊥ B) if x ⊥ y for all x ∈ A and for all y ∈ B. The orthogonal complement (denoted as A⊥) of a subset A is the set of vectors orthogonal to A: A⊥ = {x ∈ H | x ⊥ y for all y ∈ A}. The orthogonal complement of a subset of a Hilbert space is a closed linear subspace.
Projection
Let P be a closed linear subspace of a Hilbert space H; then (Fig. 2):
(a) For each x ∈ H, there is a unique closest point y ∈ P such that $\|x - y\| = \min_{z \in P} \|x - z\|.$
(b) The point y ∈ P closest to x ∈ H is the unique element of P with the property that (x − y) ⊥ P.
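A finite-dimensional illustration (not from the entry) of properties (a) and (b): the closest point of a subspace spanned by a few columns can be computed by least squares, and the residual is orthogonal to the subspace.

```python
# Orthogonal projection onto a subspace of R^5 spanned by the columns of B.
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 2))          # columns span a 2-dimensional subspace P
x = rng.normal(size=5)

# Least squares gives the coefficients of the closest point y = B c
c, *_ = np.linalg.lstsq(B, x, rcond=None)
y = B @ c

# (x - y) is orthogonal to every vector in P, i.e., to both spanning columns
assert np.allclose(B.T @ (x - y), 0.0)

# Pythagorean relation used later for kriging: ||x||^2 = ||x - y||^2 + ||y||^2
assert np.isclose(np.dot(x, x), np.dot(x - y, x - y) + np.dot(y, y))
```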
Applications
Hilbert spaces are the most important tools in the theories of partial differential equations, quantum mechanics, Fourier analysis, and ergodicity. Hilbert spaces serve to clarify and generalize the concept of Fourier expansion and certain linear transformations such as the Fourier transform. The Fourier series are a useful tool for the representation and approximation of periodic functions using trigonometric functions. Each of the following Hilbert spaces has a type of Fourier analysis associated with it:
L2([a, b])! Fourier series l2([0, n 1])! discrete Fourier transform L2(ℝ)! Fourier transform l2(ℤ)! discrete time Fourier transform
The Hilbert space L2(ℝ) plays a central role in many areas of science; for example, in signal processing, many timevarying signals are interpreted as functions in L2(ℝ), and in quantum mechanics, the term “Hilbert space” simply means L2(ℝ). For the case of L2([π, π]), the Fourier series is an infinite sum of functions sin(nπ) and cos(nπ), that is, oscillan tions with frequencies 2p , with n ¼ 0, 1, 2, . . .. In the 1920s von Neumann realized that Hilbert spaces are the natural setting for quantum mechanics, where a particle is confined to move in a straight line between two parallel walls. This particle at each instant in time t is described by an element j(∙, t) L2([0, M]), that is, a vector in the Hilbert
Hilbert Space
617
space of square-integrable, complex-valued function on the interval [0, M], where M is the distance between the walls and the function j is called the wave function of the particle. In probability theory, Hilbert spaces arise naturally. A random experiment is modeled mathematically by a space Ω (sample space) and a probability measure P on Ω. Each point o Ω corresponds to a possible outcome of the experiment. An event A is a measurable subset of Ω. The probability measure P associates each event A with a probability P(A), where 0 P(A) 1 and P(Ω) ¼ 1. A random variable X is a measurable function X : Ω ! ℂ, which maps each outcome in o to a complex number (or real number). The expected value [X] of a random variable X is the mean, or integral, of the random variable X with respect to the probability measure P, that is ½X ¼
O
XðoÞdPðoÞ:
On the other hand, a random variable X is said to be second order if [|X|2] < 1. The set of second-order random variables forms a Hilbert space with respect to the inner product hX, Yi ¼ XY : The space of second-order random variables may be identified with the space L2(Ω, P) of square-integrable functions on (Ω, P), with the inner product hX, Yi ¼
O
XðoÞYðoÞdPðoÞ:
In geostatistics, one can find the Hilbert spaces of random fields (Chilès and Delfiner 2012). Consider a family of complex-valued random variables X defined on a probability space (Ω, A, P) and having finite second-order moments (Chilès and Delfiner 2012) jXj2 ¼ jXðoÞj2 PðdoÞ < 1: These random variables constitute a Hilbert space denoted L2(Ω, A, P), with its inner product hX, Yi ¼ XY (the upper bar denotes complex conjugation) and its norm kXk ¼ jXj2 . In this Hilbert space, two random variables are orthogonal when they are uncorrelated (hX, Yi ¼ XY ¼ 0).
In a Hilbert space, the orthogonal projection of X onto a closed linear subspace K can be defined as the unique point X0 in the subspace closest to X (Chilès and Delfiner 2012; Halmos 1951): X0 ¼ arg minkX Yk , hX X0 , Yi ¼ 0, for all Y K: YK
Since X0 K, it satisfies hX X0, X0i ¼ 0, considering the triangular inequality we have kX X0 þ X0 k2 ¼ kX X0 k2 þ kX0 k2 : Or equivalently kX X0 k2 ¼ kXk2 kX0 k2 : This result is used in kriging theory.
Conclusions
Hilbert spaces are Banach spaces whose norm is derived from an inner product, which is used to introduce the notion of orthogonality. Hilbert spaces have several applications that arise naturally, mainly in physics and mathematics.
Cross-References ▶ Fast Fourier Transform ▶ Geostatistics ▶ Kriging
Bibliography Chilès JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty. Wiley Series in Probability and Statistics, 2nd edn, John Wiley & Sons, Inc., Hoboken, New Jersey, 734pp Fischer E (1907) Sur la convergence en moyenne. C R Acad Sci 144:1022–1024 Halmos PR (1951) Introduction to Hilbert space and the theory of spectral multiplicity. Chelsea, New York, 118pp Riesz F (1907) Sur les systèmes orthogonaux de fonctions. C R Acad Sci 144:615–619 Rudin W (1986) Real and complex analysis, 3rd edn. McGraw-Hill Book Co., Singapore, 483pp Stein EM, Shakarchi R (2005) Real analysis: measure theory, integration, and Hilbert spaces. Princeton University Press, Princeton, 424pp
Horton, Robert Elmer Keith Beven Lancaster University, Lancaster, UK
Fig. 1 Robert Elmer Horton (from Merrill Bernard 1945), from Bulletin of the American Meteorological Society
Biography Robert Horton is recognized as one of the foremost hydrologists of the twentieth century. He was born in 1875 in Parma, Michigan and obtained a BSc degree from Albion College, Michigan in 1897 (which later also awarded him an honorary Ph.D. in 1932). Horton’s professional work began under the direction of his uncle, George Rafter, a prominent civil engineer who had earlier worked on the Erie Canal. For better streamflow measurements, Rafter had commissioned laboratory weir studies at Cornell. Horton analyzed and summarized the results. This initial work was then considerably extended when Horton joined the US Geological Survey in 1900. The resulting publications became the standard US work on the subject. This led on to a program of extensive stream gaging on New York streams and work on baseflow which he recognized was dominated by groundwater. He also started to focus on the role of infiltration in controlling both recharge to groundwater and surface runoff. He subsequently came to view the soil surface as a “separating surface” between water that either infiltrated and would later become baseflow or return to the atmosphere as evapotranspiration, and water in excess of the infiltration capacity of the soil surface that would become surface runoff and could contribute quickly to the stream hydrograph (Horton 1933). He proposed an equation for infiltration rates at the surface that would then allow
the calculation of rates of surface runoff given rainfall inputs (Horton 1939). This was used by Horton in his later work as a hydrologic consultant, based at the Horton Hydrological Laboratory near Voorheesville, New York. It was rapidly adopted by many others as a useful tool in flood prediction to the extent that the concept became known as “Hortonian overland flow.” Horton’s view of infiltration was, however, much more sophisticated than has been commonly appreciated. He argued that infiltration capacities were controlled by surface rather than profile controls, and that these would change over time both within an event due to “extinction phenomena,” such as the displacement of soil particles blocking larger pores, and between events. He also recognized the role of air pressure, cracks and macropores, on infiltration and the variability of soil characteristics in runoff production. Interestingly, analyses suggest that he might not have seen widespread surface runoff on his own experimental catchment at the Horton Laboratory very often (Beven 2004). This did not stop the concepts getting widely used (even to the present day). Throughout his career, Horton was interested in flood generation and prediction, including publishing on the meteorological aspects of intense storms. He was one of the first to try to estimate “maximum possible rainfalls” for different places, which when combined with the infiltration theory might allow a maximum flood to be estimated. He showed how some observed events had approached these limits. In terms of mathematical geoscience, Horton published a number of interesting analytical solutions for hydrological processes during his career, but his most significant contribution appeared in the Bulletin of the Geological Society of America just 1 month before his death in 1945 (Horton 1945). This is an extensive treatise of 95 pages that deals with the implications of his surface runoff theory on the erosional development of hillslopes and drainage basins. It is a remarkable seminal achievement that later stimulated other hydrologists and geomorphologists in the “quantitative revolution” in Geography that started in the late 1960s and the analysis of landscapes as fractal surfaces in the 1980s. Horton’s importance to both hydrology and meteorology is recognized by both the Horton Medal of the American Geophysical Union and the Horton Lecture of the American Meteorological Society.
Bibliography Bernard M (1945) Robert E. Horton. Bull Am Meteorol Soc 26:242 Beven KJ (2004) Surface runoff at the Horton hydrologic laboratory (or not?). J Hydrol 293:219–234
Horton RE (1933) The role of infiltration in the hydrologic cycle. Trans Am Geophys Union 14:446–460 Horton RE (1939) Analysis of runoff-plat experiments with varying infiltration-capacity. Trans Am Geophys Union 20:693–711 Horton RE (1945) Erosional development of streams and their drainage basins: hydrophysical approach to quantitative morphology. Bull Geol Soc Am 56:275–370
Howarth, Richard J. J. M. McArthur Earth Sciences, UCL, London, WC1E 6BT, UK
Fig. 1 Richard J. Howarth, courtesy of Prof. Howarth
Biography Richard J. Howarth was born in Southport, UK, in 1941. After gaining a B.Sc. in geology in 1963 from Bristol University he stayed on to undertake a Ph.D. (completed 1966) on the Irish extension of the Cryogenian Port Askaig Tillite formation. The research involved statistical calibrations for the first XRF spectrometer to be installed in a British geology department, and the use of factor analysis for the interpretation of geochemical and other data. The period 1966–1968 was spent in
The Hague, The Netherlands, with Bataafse Petroleum Maatschappij (Shell), where he undertook statistical analysis relating the geology of the World’s sedimentary basins to their hydrocarbon production. Returning to the UK in 1968 for family reasons, he joined John Webb’s Applied Geochemistry Research Group, in the Department of Geology at Imperial College of Science & Technology, London, where he worked until 1985. His research at Imperial College was initially concerned with analytical methodologies and with mapping and interpreting regional geochemical survey data for mineral exploration, geological, and epidemiological purposes. He worked with chemist Michael Thompson to develop statistical techniques and procedures, which have now become standard, for quality assurance of analytical data used for regional geochemical mapping (e.g., Thompson and Howarth 1973). The software he developed while at Imperial underpinned production of pioneering regional geochemical atlases of Northern Ireland, of England and Wales (Webb et al. 1978), and elsewhere. He also worked with a team under George Koch Jr., of the University of Georgia, USA (1978–84), on the statistical interpretation of data from the US National Uranium Resource Evaluation Program. Much of this accumulated experience was summarized in Howarth (1983). In 1985 he joined the British Petroleum group as a senior consultant in BP Research, Sunbury, where his research and advisory duties encompassed biostratigraphy, chemostratigraphy, estimation of hydrocarbon reserves, petrophysical data analysis, laboratory quality-control, well-inflow performance monitoring, and numerous other matters. In the course of this, he collaborated with P.C. Smalley and A. Higgins and others to develop Sr-isotope stratigraphy, work later continued with others at University College London over the next 23 years (e.g., McArthur et al. 2001). Since 1993, he has held the honorary position of Professor in Mathematical Geology in the Department of Earth Sciences, University College London. During this time his interests broadened to include writings on the history of geology and geophysics (e.g., Howarth 2017). His wide reading of the literature of applied statistics enabled him to cross discipline boundaries and introduce techniques which were new to geology, for example, empirical discriminant function, nonlinear mapping, power transform, and various estimators of uncertainty. He has been awarded the Murchison Fund of The Geological Society of London (1987), William Christian Krumbein Medal of the International Association for Mathematical Geology (2000) for contributions to mathematical geology, and the Sue Tyler Friedman Medal of the Geological Society of London (2016) for contributions to the history of geology.
Bibliography Howarth RJ (ed) (1983) Handbook of exploration geochemistry. volume 2. Statistics and data analysis in geochemical prospecting. Elsevier, Amsterdam. 425 p Howarth RJ (2017) Dictionary of mathematical geosciences with historical notes. Springer International Publishing, Cham. 893 p McArthur JM, Howarth RJ, Bailey TR (2001) Strontium isotope stratigraphy: LOWESS version 3: best fit to the marine Sr-isotope curve for 0–590 Ma and accompanying look-up table for deriving numerical age. J Geol 109:155–170 Thompson M, Howarth RJ (1973) The rapid estimation and control of precision by duplicate determinations. Analyst 98:153–160 Webb JS, Thornton I, Thompson M, Howarth RJ, Lowenstein PL (1978) The Wolfson geochemical atlas of England and Wales. Clarendon Press, Oxford/London. 74 p
Hurst Exponent
Sid-Ali Ouadfeul1 and Leila Aliouane2
1 Algerian Petroleum Institute, Sonatrach, Boumerdes, Algeria
2 LABOPHT, Faculty of Hydrocarbons and Chemistry, University of Boumerdes, Boumerdes, Algeria

Definition of the Hölder Exponent
The singularity exponent at a point x0 can be expressed by the Hölder exponent, which is defined as follows: the Hölder exponent of a distribution f at the point x0 is the largest h such that f is Lipschitzian with exponent h at x0. This means that there exist a constant C and a polynomial Pn(x) of order n such that, for all x in the neighborhood of x0,

|f(x) − Pn(x − x0)| ≤ C |x − x0|^h    (1)

If h(x0) ∈ ]n, n + 1[, one can show that at the point x0 the polynomial Pn(x) corresponds to the Taylor expansion of f at x0. The exponent h(x0) measures how irregular the distribution f is at x0: the higher h(x0), the more regular f. In most cases the Hölder exponent becomes h + 1 for the primitive of f and h − 1 for the derivative.

Definition of the Hurst Exponent
If the Hölder exponent of a function f is constant per interval (hi = Hj for the interval [aj, bj], i = 0, 1, 2, . . ., Nj, j = 1, 2, 3, . . ., M), the Hj are the Hurst exponents. Hj measures the global regularity per interval. The estimated Hurst exponent H of a function f indicates whether f is an anti-correlated random walk (H < 1/2: antipersistent random walk) or a positively correlated one (H > 1/2: persistent random walk); H = 1/2 corresponds to classical Brownian motion (Audit et al. 2004).

The Multifractional Brownian Motion
Let H ∈ ]0, 1[. The fractional Brownian motion (fBm) is defined as a centered Gaussian process (B_t^H)_{t≥0} with covariance function

R_H(t, s) = E[B_t^H B_s^H] = 1/2 (t^{2H} + s^{2H} − |t − s|^{2H}).

The index H is called the Hurst exponent; it determines the main properties of the process B^H, such as self-similarity, the regularity of the sample paths, and long memory. Figure 1 shows an example of a fractional Brownian motion realization with 1024 samples and a Hurst exponent H = 0.60.

Methods of Estimation of the Hurst Exponent
Spectral Density Method
The Fourier transform of an fBm function f(t) is given by:

FT[f(t)] = ∫_{−∞}^{+∞} f(t) e^{−2πjft} dt = F(f) = A(f) + jB(f)    (2)

F(f) = |F(f)| e^{jφ(f)}    (3)

|F(f)| and φ(f) are the amplitude and phase spectra, given by:

|F(f)| = √(A(f)² + B(f)²)    (4)

φ(f) = arctg(B(f)/A(f)) if A(f) ≠ 0, and φ(f) = π/2 if A(f) = 0    (5)

|F(f)|² is the spectral density, with |F(f)|² = 1/f^β, where β is the spectral exponent; it is related to the Hurst exponent by the linear relationship β = 2H + 1. To estimate the spectral exponent, the spectral density is plotted versus the frequency on a log-log scale; a linear regression then provides an estimated value of β and, consequently, of H.
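A minimal Python sketch of this spectral estimator is given below (added here for illustration; the function name hurst_spectral and the use of an ordinary Gaussian random walk as test signal are assumptions of the example). It fits the log-log power spectrum and converts the slope to H through β = 2H + 1:

```python
import numpy as np

def hurst_spectral(signal):
    """Estimate the Hurst exponent from the slope of the log-log power spectrum."""
    n = len(signal)
    freqs = np.fft.rfftfreq(n)[1:]                 # positive frequencies (skip f = 0)
    power = np.abs(np.fft.rfft(signal))[1:] ** 2   # spectral density |F(f)|^2
    slope, _ = np.polyfit(np.log(freqs), np.log(power), 1)
    beta = -slope                                  # |F(f)|^2 ~ 1/f^beta
    return (beta - 1.0) / 2.0                      # beta = 2H + 1

# Ordinary Brownian motion (cumulative sum of white noise) should give H close to 0.5.
rng = np.random.default_rng(1)
bm = np.cumsum(rng.standard_normal(4096))
print(round(hurst_spectral(bm), 2))
```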
Hurst Exponent, Fig. 1 An example of a fractional Brownian motion realization with 1024 samples and H = 0.60 (fbm(t) versus t)

Hurst Exponent, Fig. 2 The spectral density of the fBm signal presented in Fig. 1 versus the frequency in the log-log scale; the red line is the linear regression
Figure 2 shows the spectral density versus the frequency in the log-log scale; the linear regression (red line) gives a slope β = 2.2 ± 0.05 and, consequently, H = 0.60 ± 0.025.

The Variogram
A variogram is a geostatistical method of comparing the similarity of a data value to neighboring values within a field of data (Deutsch 2003). The variogram is calculated by:
γ(h) = (1/N) Σ_{i=1}^{N} [Z(xi) − Z(xi+h)]²    (6)
where N is the number of neighboring data points within the specified lag distance being compared; Z(xi) is the physical property parameter value of the initial point.
Z(xi+h) is the parameter value of the neighboring point. The logarithm of γ(h) is then plotted versus the logarithm of the distance between the initial point and the compared points. The Hurst exponent is the slope of the straight line resulting from the linear regression of this curve (Ouadfeul and Aliouane 2011). The estimated Hurst exponent of the fBm signal presented in Fig. 1 is H = 0.59 ± 0.012, which is very close to the actual Hurst exponent.

The Detrended Fluctuation Analysis
This method for quantifying the correlation properties in nonstationary time series is based on the computation of a scaling exponent H by means of a modified root-mean-square analysis of a random walk (Peng et al. 1994). To compute H from a time series x(i) [i = 1, . . ., N], like the interval tachogram, the time series is first integrated:

y(k) = Σ_{i=1}^{k} [x(i) − M]    (7)
where M is the average value of the series x(i), and k ranges between 1 and N. Next, the integrated series y(k) is divided into boxes of equal length n and the least-squares line fitting the data in each box, yn(k), is calculated. The integrated time series is detrended by subtracting the local trend yn(k), and the root-mean-square fluctuation of the detrended series, F(n), is computed as (Peng et al. 1995):

F(n) = √( (1/N) Σ_{k=1}^{N} [y(k) − yn(k)]² )    (8)

F(n) is computed for all time scales n. Typically, F(n) increases with n, the "box size." If log F(n) increases linearly with log n, then the slope of the line relating F(n) and n in a log-log scale gives the scaling exponent α, where α = H + 1. The estimated Hurst exponent of the fBm signal presented in Fig. 1 using the DFA method (see Fig. 3) is H = 0.46 ± 0.01, which differs from the theoretical Hurst exponent, since the DFA estimator is recommended for short time series (Ouadfeul and Aliouane 2011).

Hurst Exponent, Fig. 3 DFA estimator of the Hurst exponent of the fBm signal presented in Fig. 1 (log F(n) versus log n); the red line is the linear fit

The Wavelet Transform Modulus Maxima Lines
The wavelet transform modulus maxima lines (WTMM) method is a multifractal formalism introduced by Frisch and Parisi (1985) and revisited through the continuous wavelet transform introduced by Grossmann and Morlet in 1985. The WTMM method was first introduced by Mallat and Hwang in 1992 and used for image processing. The continuous wavelet transform is a decomposition of a given signal S(t) into dilated and translated wavelets φ((t − b)/a) obtained from a mother wavelet φ(t) that must have n vanishing moments:

∫_{−∞}^{+∞} t^n φ(t) dt = 0, n < +∞    (9)

CWT(a, b) = (1/√a) ∫_{−∞}^{+∞} S(t) φ((t − b)/a) dt    (10)
where a ∈ ℝ⁺ and b ∈ ℝ. The first step of the WTMM method is to calculate the continuous wavelet transform (CWT) of a given signal and the modulus of the CWT; the next step is to determine the maxima of the continuous wavelet transform. The determination of local maxima is performed by computing the first and second derivatives of the wavelet coefficients. CWT(a, b) admits a maximum at a point b0 if it satisfies the following two conditions (Ouadfeul 2020): ∂CWT/∂b = 0 and ∂²CWT/∂b² < 0.

For large samples (n > 30), the Z-test is the alternative to the T-test. The statistic z follows a standard Gaussian distribution and is defined by replacing s by σ in Eq. 1.
Tests for Two Samples
Given the samples X = {X1, . . ., Xn} and Y = {Y1, . . ., Ym}, a variant of the T-test allows comparing the mean values of the two samples. The statistic t for this case is defined as:

t = (X̄ − Ȳ) / √(sX²/n + sY²/m)    (2)
where X̄, Ȳ, sX, and sY are the means and standard deviations computed from the samples X and Y, which comprise n and m observations, respectively. The same assumptions made for the single-sample version still hold. The statistic t follows a Student's T distribution with (n + m − 2) degrees of freedom. The critical region definition and rejection of ℋ0 are similar to the single-sample version. Analogously, when the standard deviations σX and σY of the populations from which X and Y are extracted are known, the comparison of their mean values is carried out through the Z-test for two samples. Again, the statistic of this test follows a standard Gaussian distribution and is defined by replacing sX and sY by σX and σY in Eq. 2. When the focus is to check the similarity between the variances of two populations, with supposed Gaussian distributions, the F-test may be applied. Its test statistic f, defined in Eq. 3, follows a Snedecor F-distribution with (n − 1) and (m − 1) degrees of freedom.

f = [(m − 1) Σ_{i=1}^{n} (Xi − X̄)²] / [(n − 1) Σ_{j=1}^{m} (Yj − Ȳ)²]    (3)
A non-parametric alternative to the T-test is given by the Mann-Whitney U test. In this case, supposing that the samples X and Y have at least 20 observations, the u statistic expresses the Mann-Whitney test:

u = (U − nm/2) / √(nm(n + m + 1)/12)    (4)
with U = min{UX, UY}, where UX = nm + n(n + 1)/2 − RX and UY = nm + m(m + 1)/2 − RY; RX is obtained as the sum of the ranks assigned to the observations in X after a joint ranking of the observations from both X and Y, and RY is computed analogously to RX. If the samples do not attain the minimum required size, distinct procedures, as discussed in Siegel and Castellan (1988) and Sheskin (2011), should be adopted. The statistic u follows a standard Gaussian distribution, and this distribution rules the critical region definition and ℋ0 rejection. In the case of paired comparison, the Wilcoxon test is a non-parametric procedure that can be employed to determine whether or not two samples X and Y represent the same population. When the analysis involves small samples, this test comprises: (i) calculating the difference di relative to the
scores of the pair (Xi, Yi); (ii) assigning a rank to each di regardless of its sign; (iii) attaching the sign to the ranks; and (iv) calculating the statistic r that corresponds to the sum of the ranks of the same sign. Finally, the probability of occurrence of r is obtained through a tabulated distribution (Siegel and Castellan 1988; Sheskin 2011). However, when the number of pairs is greater than 25, the test statistic w, which comes from the sum of the ranks and follows a standard Gaussian distribution, is expressed by:

w = (r − n(n + 1)/4) / √(n(n + 1)(2n + 1)/24)    (5)
where n is the number of observations in both samples X and Y. Once again, the decision to reject ℋ0 follows the same rationale as in the previous discussions.
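For reference, the two-sample procedures discussed above are available in standard statistical software; the short Python sketch below (added for illustration, with simulated samples as an assumption) applies the two-sample T-test, the Mann-Whitney U test, and the paired Wilcoxon test with SciPy. The option equal_var=False matches the form of the statistic in Eq. 2, which does not pool the two variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=40)   # sample X
y = rng.normal(loc=11.0, scale=2.0, size=35)   # sample Y

# Two-sample T-test without assuming equal variances (as in Eq. 2).
t_stat, t_p = stats.ttest_ind(x, y, equal_var=False)

# Non-parametric alternative: Mann-Whitney U test on the independent samples.
u_stat, u_p = stats.mannwhitneyu(x, y)

# Paired comparison: Wilcoxon signed-rank test on paired observations.
before = rng.normal(5.0, 1.0, size=30)
after = before + rng.normal(0.2, 0.5, size=30)
w_stat, w_p = stats.wilcoxon(before, after)

print(t_p, u_p, w_p)  # p-values; H0 is rejected when p falls below the chosen significance level
```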
Tests for Several Samples, Correlation, Concordance, and Time Series
Hypothesis tests for comparisons involving several samples are not simply performed by combining tests for two samples applied to each pair of samples. Under the assumption that the k samples are normally distributed with equal variances, the analysis of variance (ANOVA) allows comparing the means of these samples. In this test, ℋ0 is characterized as a simultaneous equivalence among the means of all samples (i.e., ℋ0: μ1 = μ2 = ⋯ = μk), and ℋ1 establishes that there is a difference between at least one pair of samples. This technique has different versions and may be applied to paired and independent samples. The statistic of the test resulting from the ANOVA follows Snedecor's F-distribution. When the data do not respect the assumptions required by the parametric approaches, the Friedman and Kruskal-Wallis tests are the alternatives to deal with paired and independent samples, respectively. Correlation coefficients are applied to measure the relationship between variables, that is, to explain how one variable behaves when the other changes. The Pearson and Spearman coefficients are, respectively, the parametric (it assumes that a bivariate Gaussian distribution expresses the joint distribution of the variables) and non-parametric ways of measuring the correlation between variables. These correlation measures range in [−1, +1], where the lower and upper bounds indicate perfect inverse and direct correlations, respectively. When these measures approach zero, it implies that there is no correlation between the variables. In addition to calculating and interpreting the correlation values, a hypothesis test can be applied to verify whether such a value differs significantly from a null coefficient (i.e., no correlation).
Correlation coefficients involving more than two variables can be computed using Kendall's coefficient. Agreement coefficients extracted from contingency tables (confusion matrices) that express the relation between expected and observed quantities according to different classes are also subject to hypothesis tests. Specifically, values of coefficients such as kappa (κ) and tau (τ) from different confusion matrices may have their significance verified through test statistics of the form (c1 − c2)/√(s1² + s2²), where c1 and c2 represent the agreement coefficients (of a common type) and s1² and s2² the variances associated with the respective coefficients. The presented statistic follows a standard Gaussian distribution. Finally, regarding the analysis of time series, identifying increasing or decreasing trends is of great importance. In this context, the Mann-Kendall test allows this verification, where the ℋ0 and ℋ1 hypotheses imply the absence and existence of a trend in the observed period, respectively.
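The sketch below (an added illustration using simulated data; SciPy is assumed to be available) shows how several of the procedures mentioned above map onto common routines: one-way ANOVA, its non-parametric Kruskal-Wallis counterpart, and the Pearson, Spearman, and Kendall correlation tests, each returning a statistic and a p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
g1 = rng.normal(0.0, 1.0, 30)
g2 = rng.normal(0.5, 1.0, 30)
g3 = rng.normal(1.0, 1.0, 30)

# One-way ANOVA (H0: all group means are equal) and its Kruskal-Wallis alternative.
f_stat, f_p = stats.f_oneway(g1, g2, g3)
h_stat, h_p = stats.kruskal(g1, g2, g3)

# Correlation coefficients with significance tests against the null of no correlation.
a = rng.normal(size=50)
b = 0.6 * a + rng.normal(scale=0.8, size=50)
r, r_p = stats.pearsonr(a, b)
rho, rho_p = stats.spearmanr(a, b)
tau, tau_p = stats.kendalltau(a, b)

print(f_p, h_p, r_p, rho_p, tau_p)
```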
Applications in Geosciences
A hypothesis test is an essential tool in data analysis processes. Regarding the geosciences, several studies employ this concept. In Luo and Yang (2017), hypothesis tests are used to verify the effectiveness of a sensor network developed to detect the concentration of water pollution. In this case, the T-test is applied to identify deviations from expected behavior and then reveal the presence of the pollutant source. In Eberhard et al. (2012), the Wilcoxon test is used to evaluate the accuracy of earthquake forecasting models. Similarly, Jena et al. (2015) employ the Z-test, among others, to evaluate precipitation and climate change models. Finally, Nascimento et al. (2019) develop a new hypothesis test to perform temporal change detection on the Earth's surface using polarimetric images obtained by remote sensing.
Summary
A hypothesis test is a fundamental statistical procedure designed to check whether statistical elements such as random variables, populations, distributions, or parameters behave according to a statistical hypothesis.
Cross-References ▶ Probability Density Function ▶ Random Variable
Bibliography Eberhard DAJ, Zechar JD, Wiemer S (2012) A prospective earthquake forecast experiment in the western pacific. Geophys J Int 190(3): 1579–1592. https://doi.org/10.1111/j.1365-246X.2012.05548.x Jena P, Azad S, Rajeevan MN (2015) Statistical selection of the optimum models in the CMIP5 dataset for climate change projections of Indian monsoon rainfall. Climate 3(4):858–875. https://doi.org/10.3390/ cli3040858 Luo X, Yang J (2017) Water pollution detection based on hypothesis testing in sensor networks. J Sens 2017:3829894. https://doi.org/10. 1155/2017/3829894 Mood AM, Boes DC, Graybill FA (1974) Introduction to the theory of statistics, 3rd edn. McGraw-Hill Nascimento ADC, Frery AC, Cintra RJ (2019) Detecting changes in fully polarimetric SAR imagery with statistical information theory. IEEE Trans Geosci Remote Sens 57(3):1380–1392. https://doi.org/ 10.1109/TGRS.2018.2866367 Sheskin DJ (2011) Handbook of parametric and nonparametric statistical procedures, 5th edn. Chapman & Hall/CRC Siegel S, Castellan N (1988) Nonparametric statistics for the Behavioral sciences. McGraw-Hill international editions. Statistics series. McGraw-Hill
Hypsometry Sebastiano Trevisani1 and Lorenzo Marchi2 1 Università Iuav di Venezia, Venice, Italy 2 Consiglio Nazionale delle Ricerche – Istituto di Ricerca per la Protezione Idrogeologica, Padova, Italy
Definition In very general terms, hypsometry is the measurement of land elevation. In geomorphology, hypsometry refers to the analysis of the cumulative distribution of the elevation, absolute or relative, in a given region. The distribution of elevation is a basic topographic feature of a territory, and its representation is useful for interpreting and characterizing geomorphic processes. Hypsometry can be computed in spatial domains of any shape and extent, including the whole earth surface; however, often the studies of hypsometry focus on drainage basins.
Hypsometric Parameters Hypsometric curves, which represent the fraction of area above a given elevation, are a widely adopted tool for the graphical representation of hypsometry. The hypsometric curve is the cumulative distribution of elevation in a defined spatial domain of analysis (Hypsometry, Fig. 1). Hypsometric curves are often presented in nondimensional form
(Hypsometry, Fig. 2), which enables the comparison between different regions or drainage basins. Nondimensional hypsometric curves are plots of the relative height (h/H) versus the relative area (a/A) of the region under investigation (Hypsometry, Fig. 1). The relative height is the ratio of the height above the lowest point in the basin at a given contour to the total relief (H = Hmax − Hmin), and the relative area is the ratio of the horizontal cross-sectional area at that elevation to the total area (i.e., the frequency of values above a given threshold of elevation h). Hypsometry requires the assessment of the areas at different elevation belts, which was done in the past through manual measurements on topographic maps with contour lines, and is now easily implemented through the analysis of digital terrain models (DTMs) or, more generally, digital elevation models (DEMs). From the hypsometric curve, multiple parameters describing the statistical distribution of elevation can be derived. One of the most popular parameters is the hypsometric integral (HI), which corresponds to the area below the nondimensional hypsometric curve, a numerical index introduced by Strahler (1952) to summarize the hypsometry of fluvial basins. From the geometrical point of view, the hypsometric integral is the ratio between the volume contained between the land surface and the base plane (Hypsometry, Fig. 1b) and the volume obtained extruding the base plane for the basin relief (Hypsometry, Fig. 1c). A simple equation (Eq. 1) proposed by Pike and Wilson (1971) permits the computation of the hypsometric integral (HI) using the mean (Hm), minimum (Hmin), and maximum (Hmax) elevation in the studied area:

HI = (Hm − Hmin) / (Hmax − Hmin)    (1)
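A minimal Python sketch of these computations is given below (added for illustration; the synthetic cone-shaped elevation grid and the function name hypsometric_curve are assumptions of the example). It derives the nondimensional hypsometric curve from a gridded DEM and then estimates the hypsometric integral both by integrating the curve and by the elevation-relief ratio of Eq. 1.

```python
import numpy as np

def hypsometric_curve(dem, n_levels=100):
    """Relative area a/A above each relative height h/H for a DEM given as a 2-D array."""
    z = dem[np.isfinite(dem)].ravel()
    h_rel = np.linspace(0.0, 1.0, n_levels)
    thresholds = z.min() + h_rel * (z.max() - z.min())
    a_rel = np.array([(z >= t).mean() for t in thresholds])
    return h_rel, a_rel

# Synthetic cone-shaped basin used only for this illustration.
y, x = np.mgrid[-1:1:200j, -1:1:200j]
dem = 1000.0 + 500.0 * np.sqrt(x**2 + y**2)

h_rel, a_rel = hypsometric_curve(dem)

# Hypsometric integral: area under the nondimensional curve (trapezoidal rule), and Eq. 1.
hi_curve = np.sum(0.5 * (h_rel[1:] + h_rel[:-1]) * (a_rel[:-1] - a_rel[1:]))
hi_eq1 = (dem.mean() - dem.min()) / (dem.max() - dem.min())
print(round(hi_curve, 3), round(hi_eq1, 3))  # the two estimates should be close
```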
The hypsometric curves and integrals are robust against different estimation methods (Singh et al. 2008). In case of derivation from a DTM, HI shows low sensitivity to variations in DTM resolution (Hurtrez et al. 1999). These features enable the comparison of hypsometric parameters computed with different methods and using topographic data of different resolutions. However, a relevant limitation of the hypsometric integral is that different hypsometric curves, which imply different distribution of areas with elevation, could have similar hypsometric integrals. In earth sciences and ecological studies, multiple indices derived from hypsometry are considered, including statistical moments of the distribution of elevation and the possibility to consider the probability density distribution. In this regard, the modeling of the experimental hypsometric curve with a theoretical function (Strahler 1952) facilitates the derivation of descriptive indices. For example, third degree polynomial
curves (Eq. 2) have been widely applied to fit the hypsometric curves in a quite wide range of morphological settings:

h/H = c1 + c2·(a/A) + c3·(a/A)² + c4·(a/A)³    (2)

where c1, c2, c3, and c4 are coefficients determined by curve fitting. However, in the presence of complex hypsometric curves, related, for example, to lithological and geomechanical heterogeneities in the basin, as in the monadnock phase, these polynomials are not sufficiently flexible (Vanderwaal and Ssegane 2013).

Hypsometry, Fig. 1 (a) sketch with the main geometric elements of hypsometry for a drainage basin (modified from Strahler 1952); (b and c) representative volumes in hypsometric integral calculation

Hypsometry, Fig. 2 Nondimensional hypsometric curves for three stages of drainage basin evolution (modified from Strahler 1952)

Relevance in Geomorphology
In his pioneering studies on hypsometry, Strahler (1952, 1957), considering also the work of Langbein et al. (1947), related the shape of the hypsometric curves and the values of hypsometric integral to the different stages of evolution of drainage basins (Hypsometry, Fig. 2). A young stage features high hypsometric integrals and is characterized by inequilibrium conditions that lead to a mature equilibrium stage. The monadnock phase with very low hypsometric integral, when it does occur, can be followed by a return to equilibrium conditions after the removal of isolated elevated areas of resistant rock. Several studies (e.g., Willgoose and Hancock 1998; Hurtrez et al. 1999; Korup et al. 2005) have provided evidence of the scale dependency of hypsometric parameters of drainage basins. The transition from diffusive to fluvial dominance (Willgoose and Hancock 1998), which affects the relative importance of river processes versus hillslopes processes (Hurtrez et al. 1999), determines the dependence of hypsometry on basin scale. The shape of the hypsometric curves (e.g., Lifton and Chase 1992; Pérez-Peña et al. 2009; Marchi et al. 2015) is directly related to multiple geomorphic processes and factors, such as geostructural setting, lithological heterogeneity, climatic factors, and biotic processes which control the morphological evolution of a basin. Given the relationships between hypsometric characteristics and geomorphic processes and factors, it is not surprising the ample literature related to the exploitation of hypsometricbased indices in multiple research contexts, not limited to the analysis of drainage basins (e.g., morphology of glaciers, planetary morphology, etc.). Moreover, hypsometry-based indices are also considered in geomorphometry as input
parameters for geocomputational approaches based on supervised (e.g., landslide susceptibility models) and/or unsupervised learning (e.g., landscape classification based on geomorphometric features).
Application in Hydrology
In hydrology, hypsometry helps in computing the spatial average precipitation in regions where the orographic effects are important. In the traditional approach, the combined use of a relationship linking precipitation and elevation and the hypsometric curve permits assessing the precipitation at different elevation belts and the mean areal precipitation in the studied area (Dingman 2002). As in other sectors of quantitative terrain analysis, in the assessment of mean areal precipitation the use of gridded elevation data permits taking into account the effect of elevation without using graphical tools such as the hypsometric curve.
Conclusions
Hypsometry describes the statistical distribution of elevations in a defined spatial domain of analysis, often represented by a drainage basin, through parameters that provide a spatial-statistical representation of a landscape. Hypsometry can thus be considered a precursor of modern geomorphometry. Hypsometry has found many applications in geomorphology: it was historically considered as an indicator of basin and landform evolution and was then related to the geomorphic processes and their controlling factors. Care should be taken when using hypsometric curves and derived parameters, such as the hypsometric integral, for comparative purposes. The dependency of hypsometry on multiple geomorphic processes and factors, and, in some conditions, on spatial scale should always be taken into account.
Cross-References ▶ Digital Elevation Model ▶ Digital Geological Mapping ▶ Earth Surface Processes ▶ Horton, Robert Elmer ▶ LiDAR ▶ Machine Learning ▶ Morphometry ▶ Quantitative Geomorphology ▶ Scaling and Scale Invariance
Bibliography Dingman SL (2002) Physical hydrology. Prentice Hall, Upper Saddle River Hurtrez JE, Sol C, Lucazeau F (1999) Effect of drainage area on hypsometry from an analysis of small-scale drainage basins in the Siwalik Hills (central Nepal). Earth Surf Process Landf 24: 799–808 Korup O, Schmidt J, McSaveney MJ (2005) Regional relief characteristics and denudation pattern of the western Southern Alps, New Zealand. Geomorphology 71:402–423. https://doi.org/10.1016/j. geomorph.2005.04.013 Langbein WB et al (1947) Topographic characteristics of drainage basins. United States Geological Survey, Water-Supply Paper 968-C, pp 125–158 Lifton NA, Chase CG (1992) Tectonic, climatic and lithologic influences on landscape fractal dimension and hypsometry: implications for landscape evolution in the San Gabriel Mountains, California. Geomorphology 5:77–114 Marchi L, Cavalli M, Trevisani S (2015) Hypsometric analysis of headwater rock basins in the Dolomites (Eastern Alps) using highresolution topography. Geogr Ann Ser B 97(2):317–335. https:// doi.org/10.1111/geoa.12067 Pérez-Peña JV, Azañón JM, Booth-Rea G, Azor A, Delgado J (2009) Differentiating geology and tectonics using a spatial autocorrelation technique for the hypsometric integral. J Geophys Res 114:F02018. https://doi.org/10.1029/2008JF001092 Pike RJ, Wilson SE (1971) Elevation-relief ratio, hypsometric integral, and geomorphic area – altitude analysis. Geol Soc Am Bull 82: 1079–1084 Singh O, Sarangi A, Sharma MC (2008) Hypsometric integral estimation methods and its relevance on erosion status of North-western Lesser Himalayan watersheds. Water Resour Manag 22:1545–1560. https:// doi.org/10.1007/s11269-008-9242-z Strahler AN (1952) Hypsometric (area-altitude) analysis of erosional topography. Bull Geol Soc Am 68:1117–1142 Strahler AN (1957) Quantitative analysis of watershed geomorphology. Trans Am Geophys Union 38(6):913–920 Vanderwaal JA, Ssegane H (2013) Do polynomials adequately describe the hypsometry of monadnock phase watersheds? J Am Water Resour Assoc 49(6):1485–1495 Willgoose G, Hancock G (1998) Revisiting the hypsometric curve as an indicator of form and process in transport-limited catchment. Earth Surf Process Landf 23:611–623
Imputation Javier Palarea-Albaladejo Biomathematics and Statistics Scotland, Edinburgh, UK
Definition In data analysis and statistical modelling, imputation refers to a procedure by which missing or defective entries in a data set are replaced by plausible estimates to generate a complete data set that is usable for further processing.
Missing Data and Nondetects in Geoscientific Studies Modern datasets generated in the geosciences ordinarily contain collections of variables referring to varied characteristics of the medium under study. The observed data are a realization of the underlaying physical and chemical processes being investigated and multivariate statistical methods are well suited for their analysis and interpretation. Thus, along with the specialized methods of geostatistics, which are mostly focused on modelling spatiotemporal phenomena, multivariate techniques for data reduction, classification, and prediction are part of the basic quantitative toolbox of the geoscientist. Regardless of how cautiously a researcher designs and executes a study, one of the most prevalent problems in empirical work relates to the presence of missing values in multivariate data sets. Missing data in general can be due to many reasons, e.g., circumstances affecting data collection, failures of the instruments, or data handling issues. These are commonly indicated in a data matrix by empty cells or using a special code or character. A prevalent type of missing data in the geosciences refers to left-censored data. For example, in spatial exploratory studies, there are frequently geographic locations where indicators of the presence
of an element of interest are found, but its precise quantities cannot be determined. This commonly corresponds to trace element concentrations falling below the limit of detection of the measuring instrument and is a form of left-censored data: the actual value is unknown, but we do know that it must be somewhere in the lower tail of the data distribution below the detection limit. Thus, they are often called less-thans or nondetects. A detection limit (DL) is considered a threshold below which measured values cannot be distinguished from a blank signal or background noise, at a specified level of confidence. Once it has been deemed impossible to recover the actual values, the practical problem is how to deal with missing data, nondetects, or often both simultaneously, to be able to carry on with data analysis and modelling. Statistical analysis based on only the available data, or after a naïve substitution of the unobserved values, can introduce important bias and achieve reduced statistical power, particularly as the number of cases increases. These solutions are unfortunately common default options in statistical software packages.
Missingness Mechanisms
Frameworks for missingness mechanisms have been developed that can assist in determining the approach to deal with missing data. Understanding the occurrence of missing elements in a data matrix X = (Xobs, Xmis) as a probabilistic phenomenon, with Xobs and Xmis referring to the observed and missing partitions, respectively, we can define a random missingness matrix M with elements 0 or 1 corresponding to observed or missing values in X, respectively. Depending on the conditional distribution of M given X, and a set of unknown parameters ξ, the missingness mechanisms are generally classified as:
© Springer Nature Switzerland AG 2023 B. S. Daya Sagar et al. (eds.), Encyclopedia of Mathematical Geosciences, Encyclopedia of Earth Sciences Series, https://doi.org/10.1007/978-3-030-85040-1
638
matrix, i.e., P[M| Xobs,Xmis;x] 5 P[M; x]. This is the simplest case, implying that missing values are a random sample of the data matrix. Basic approaches like data analysis using only the observed data can be sensible in this setting, however this is generally unrealistic. • Missing at random (MAR), when the probability a value is missing depends only on the available information, i.e., P [M| Xobs,Xmis;x] 5 P[M| Xobs; x]. This is the situation most commonly assumed in practice. The missing part can be ignored as long as the variables that affect the probability of a missing value are accounted for. Note, however, that if M depended on other variables that are not explicitly included in Xobs, then MAR would no longer be satisfied. • Missing not at random (MNAR), when the probability a value is missing depends on the value itself and hence both Xobs and Xmis are involved. This is the most complicated situation and implies a systematic difference between the observed and missing data. The case of nondetects as discussed above falls into this category.
Imputation
Advantages of this simple substitution are simplicity and the fact that no model assumptions are made. However, different studies have shown the distortion and undesirable consequences that such practice can have and advocate for methods that exploit the statistical properties of the data. Specialized maximum likelihood, robust, and nonparametric methods have been introduced for the unbiased estimation of common univariate statistics (means, standard deviations, quantiles, and so on) from left-censored data (Helsel 2012). However, these are not well suited for complex missingness patterns typical of multivariate data sets, possibly in high dimensions, which often involve varied interrelations between variables, for example spatial or temporal dependencies. Thus, prediction to recover the regional structure of an analyte or mineral resource and estimation of the overall spatial variability are common tasks in spatial modelling. These may benefit from exploiting the spatial correlation to deal with any censored data (Schelin and Sjöstedt-de Luna 2014).
Multivariate Data Imputation Given its practical relevance, the treatment of missing data is an active statistical research area with ramifications throughout varied applied disciplines (Little and Rubin 2002; Molenberghs et al. 2014). Approaches like completecase analysis (list-wise deletion) or available-case analysis (pair-wise deletion) simplify the problem by discarding data. The first leaves out all samples including any missing values. The second intents to maximize the use of the information available in each case. Thus, e.g., the calculation of means can be based on different samples depending on the missingness pattern of each variable and, analogously, correlations use data simultaneously available for each particular pair of variables. Note that the latter can lead to the technical issue of non-definite positiveness of correlation matrices, preventing the use of most multivariate techniques. As stressed above, these strategies must be used with great caution, particularly when missing data are not MCAR and there are meaningful associations between variables. Nonetheless, it is not feasible to define universal rules of thumb about when to use these methods or not, or actually any others. A fundamental consideration is of course the rate of missing data, but there are also important factors that are context specific. Hence, some testing and exploration blended with expert knowledge would better assist in making informed decisions in practice. More sophisticated methods that commonly assume a MAR mechanism include different forms of regression modelling, likelihood-based estimation methods, Bayesian data augmentation, and single and multiple imputation procedures. For the NMAR case of nondetects, a popular approach among practitioners in the earth and environmental sciences is simply substituting them by a fraction of the detection limit, typically DL/2, or simply using DL itself.
Data imputation provides an intuitive and powerful tool for the analysis of multivariate data with complex missing data patterns. The basic idea is straightforward: replacing missing values across variables by plausible values, ideally by statistically exploiting the observed data and the associations between variables, using different approaches depending on the nature of the data (van Buuren 2018; Schafer 1997). The result is a completed version of the data set that allows to carry on with the subsequent data analysis and modelling. Note that imputation can be also useful to deal with other forms of irregular data such as outliers, particularly when these are regarded as erroneous or inaccurate measurements. There exists a wide range of imputation methods of varied levels of complexity. The actual impact of any procedure on the results will depend on the characteristics of the data set and the focus of the subsequent analysis. We refer the reader to the general references provided above for an overview. One of the simplest methods is mean imputation, i.e., replacing missing values in a variable by the mean of the observed values. Note that this will tend to underestimate standard deviations and distort the correlations between variables. Model-based imputation will be generally preferred for moderate to high fractions of missing data in regular multivariate data sets, even when the imputation model only weakly agrees with the model assumed for the complete data (Schafer 1997). Unlike single imputation, multiple imputation is meant to create several versions of the imputed data set and combine estimates from them. This principally aims to incorporate the extra uncertainly derived from the own imputation process to the estimation of standard errors and, hence, to, e.g., the determination of rejection regions in statistical hypothesis
testing. Methods within the sphere of machine learning such as random forests or k-nearest neighbors (KNN) algorithms have also been formulated to serve as imputation devices. Other imputation methods particularly relevant in computationally intensive high-dimensional contexts, e.g., in high-throughput experiments, rely on low-rank matrix completion. Specialized computer routines can nowadays be found in the most popular statistical computing platforms. Within the open-source arena, we refer the reader to The Comprehensive R Archive Network (CRAN) Task View on missing data (https://cran.r-project.org/web/views/MissingData.html), which keeps a detailed account of related packages available on the R statistical computing environment.
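As an illustration of such routines in Python (an added sketch; scikit-learn is only one of several possible choices, and the small toy matrix is an assumption), the snippet below applies mean imputation, k-nearest-neighbors imputation, and an iterative model-based imputer to the same incomplete data set.

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer

X = np.array([
    [7.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [10.0, 5.0, 9.0],
    [8.0, np.nan, np.nan],
    [5.0, 3.0, 4.0],
])

mean_imp = SimpleImputer(strategy="mean").fit_transform(X)   # column-mean substitution
knn_imp = KNNImputer(n_neighbors=2).fit_transform(X)         # donor-based imputation
iter_imp = IterativeImputer(random_state=0).fit_transform(X) # regression-type, model-based

print(mean_imp, knn_imp, iter_imp, sep="\n\n")
```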
The Case of Compositional Data Compositional data (CoDa) are observations referring to fractions or parts of a whole, often represented in units such as proportions, ppm, weight %, mg/kg, and similar.
Compositional data are ubiquitous in different areas of the geosciences, including geochemistry, soil science, minerology, sedimentology, among others (Buccianti et al. 2006; Tolosana-Delgado et al. 2019). Compositions are defined on a sample space equipped with a special geometry that departs from the ordinary real space assumed by most statistical methods. Hence, statistical analysis and modelling of CoDa present some challenges and particularities that are not well addressed by conventional methods, which can lead to misleading results and interpretations. The mainstream approach to CoDa analysis involves focusing on (log-)ratios between the parts of the composition. These log-ratios carry the relevant relative information and crucially do not depend on an arbitrary scale of measurement. Note that zero values do not fit into the above conceptualization, since they are incompatible with the ratio and logarithm operations. However, they commonly occur in experimental studies involving compositions. Hence, this has been a particularly important issue in practical CoDa analysis. Zero parts in our context often concur with the
Imputation, Fig. 1 Patterns of missingness generated by simulation in illustrative geochemical data set and relative frequencies of cases
concept of nondetects as described above, i.e., small values that have been rounded off or have fallen below a limit of detection and then registered as zeros. The problem is more complex here than in standard multivariate analysis due to the fact that a single unobserved part renders the whole composition undefined. A basic property of any method to deal with zeros, nondetects, or missing data in general within this context, is that they should preserve the relative variation structure of the data. A number of specialized imputation methods have been developed in CoDa analysis following different methodological flavors. The R package zCompositions (Palarea-Albaladejo and Martín-Fernández 2015) provides an integrated framework for compositional data imputation (applicable to zeros, nondetects, missing data, or combinations of them) following the principles of the log-ratio approach. This includes a consistent treatment of closed and non-closed compositions, unique or varying detection limits, parametric and nonparametric imputation, single and multiple imputation, maximum likelihood and robust estimation, and other related tools. For illustration, we used 66 complete samples of a 14-part geochemical composition (in mg/g) from La Paloma stream (Venezuela) including the elements K, Ti, P, Ce, Ba, Li, Rb, V, Sr, Cr, B, Y, La, and Cu (the entire data set is available in zCompositions; the elements have been sorted here by decreasing mean relative abundance). The first five were considered major elements, whereas the remaining nine were assumed to be minor elements. We simulated around 15% of missing values in each of a random selection of 60% of the major elements (MCAR; 3.25% missing overall) and around 10% of nondetects in each of a random selection of 70% of the minor elements (using their 10% quantiles as DLs; 4% nondetects overall). This generated an artificial data set including 30 complete samples (45.45%); along with 18 (27.27%), 11 (16.67%), and 7 (10.61%) samples having respectively either missing, nondetect, or both types simultaneously. Figure 1 summarizes the patterns of missingness in the data using the zPatterns function in zCompositions (combining missing and nondetects and sorting patterns by decreasing frequency). The bars on the sides indicate the percentage cases, either by element (columns) or samples per pattern (rows). Imputation was conducted using the lrEMplus function in zCompositions (missing and nondetect values were coded as NA and 0, respectively, for this), with the 10% quantiles used as vector of DLs for the nondetects. This function implements an iterative model-based regression-type imputation method (more details and references can be found in the documentation accompanying the package). Basic compositional summary statistics of central tendency and variability were computed from the originally complete data and the imputed data to briefly assess the results. In particular, Table 1 shows
Imputation, Table 1 Compositional summary statistics from complete and imputed data sets: closed geometric mean (in percent units), total variance, and individual contributions to total variance (raw and percent values)

Element          Complete data                      Imputed data
                 CGM (in %)   Contrib. var. (%)     CGM (in %)   Contrib. var. (%)
K                79.883       0.181 (9.44)          79.271       0.183 (9.71)
Ti               16.547       0.068 (3.57)          17.144       0.078 (4.15)
P                1.005        0.277 (14.45)         1.034        0.275 (14.57)
Ce               0.806        0.097 (5.07)          0.804        0.099 (5.24)
Ba               0.373        0.259 (13.52)         0.370        0.261 (13.86)
Li               0.337        0.177 (9.24)          0.335        0.174 (9.24)
Rb               0.326        0.140 (7.33)          0.320        0.147 (7.81)
V                0.233        0.040 (2.08)          0.231        0.039 (2.09)
Sr               0.154        0.135 (7.04)          0.154        0.127 (6.75)
Cr               0.106        0.082 (4.28)          0.107        0.074 (3.93)
B                0.083        0.052 (2.70)          0.083        0.054 (2.84)
Y                0.071        0.114 (5.95)          0.070        0.112 (5.94)
La               0.055        0.136 (7.10)          0.056        0.092 (4.87)
Cu               0.020        0.157 (8.23)          0.020        0.170 (9.00)
Total variance                1.915 (100)                        1.885 (100)
the closed geometric mean (CGM; a.k.a. compositional center); that is, the vector of geometric means of the individual elements rescaled (closed) to be expressed in percentage units in this case. Moreover, the total variance of the data set and the relative contributions of the elements to it are reported (computed from ordinary variances of the centered log-ratio transformation of the composition; raw and percent values shown). It can be observed that the imputation method introduced negligible distortion regarding the center of the data set. The total variation was slightly underestimated, with main differences in contribution being related with the minor elements La and Cu and the major element Ti. Figure 2 shows the pair-wise log-ratio variances (variances of the log-ratios between each pair of elements; arranged by rows and columns) from the complete and imputed data (displayed in upper and lower triangle respectively). Lower values correspond to higher associations between elements in terms of proportionality. A color scale was used to facilitate
Imputation
interpretation, ranging from redder for stronger associations to greener for weaker associations. In line with the results above, the patterns of associations were very much comparable between complete and imputed data. On the one hand, the strongest associations were generally found among the minor elements, with B being often involved. Major elements Ti and Ce appeared as the most associated with minor elements. On the other hand, the weakest associations were generally found among the major elements, with the highest log-ratio variance being that between P and Ba. Finally, complete and imputed data sets were visualized in two dimensions using a compositional principal component analysis (PCA) biplot (Fig. 3). The first two principal components (PC1 and PC2) explained similarly moderate 42.35% and 43.87% of the total variability of the respective data sets. Colors and symbols were used to distinguish samples that were complete from those including either nondetects, missing values, or both simultaneously. The overall structure of the data was well preserved after imputation. The length of the links between the arrowheads corresponding to the different elements is related with the log-ratio variances between them. Thus, in agreement with the results showed in Fig. 2, the most independent elements appeared to be P and Ba, particularly with respect to K, Rb, and Li. Obviously, a main factor in
determining the distortion introduced by any imputation method in practice will be the gravity of the missingness problem. We refer the reader to the specialized literature for more detailed discussions about performance in different scenarios.
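For readers working in Python rather than R, the sketch below (added here for illustration; it is not the zCompositions workflow itself, the function names are illustrative, and the small example composition is an assumption) computes the two summary quantities reported in Table 1 — the closed geometric mean and the clr-based total variance — for a complete compositional data matrix.

```python
import numpy as np

def closed_geometric_mean(comp):
    """Geometric mean of each part, rescaled (closed) to sum to 100%."""
    gm = np.exp(np.log(comp).mean(axis=0))
    return 100.0 * gm / gm.sum()

def total_variance(comp):
    """Total variance: sum of the variances of the centered log-ratio (clr) coordinates."""
    logc = np.log(comp)
    clr = logc - logc.mean(axis=1, keepdims=True)
    return clr.var(axis=0, ddof=1).sum()

# Toy composition: rows are samples, columns are strictly positive parts (e.g., concentrations).
comp = np.array([
    [80.0, 15.0, 3.0, 2.0],
    [75.0, 18.0, 4.0, 3.0],
    [82.0, 12.0, 3.5, 2.5],
])

print(closed_geometric_mean(comp))
print(total_variance(comp))
```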
Summary Missing data and nondetects are ubiquitous in experimental studies. Ignoring them or treating them carelessly can seriously distort the results and scientific conclusions, particularly where there is a systematic difference between observed and unobserved values. In this sense, it is convenient to investigate why they occurred and make an informed decision about how they could be best handled. Important factors to consider include the severity of the problem, the size of the data set, and the distributional characteristics and nature of the variables involved. Data imputation is a popular approach, and there is a wide range of methods available under different flavors and assumptions. This includes specialized methods for data of compositional nature; that is, data representing relative amounts as frequently found across the geological and environmental sciences.
642
Independent Component Analysis
Imputation, Fig. 3 Compositional PCA biplot of original complete data set (a) and imputed data set (b) after simulating missing values and nondetects in illustrative geochemical composition data
Cross-References ▶ Compositional Data ▶ Geostatistics ▶ Multivariate Analysis ▶ Statistical Computing
Tolosana-Delgado R, Mueller U, van den Boogaart KG (2019) Geostatistics for compositional data: an overview. Math Geosci 51:485–526 van Buuren S (2018) Flexible imputation of missing data. Chapman & Hall/CRC Press, Boca Raton, Florida, USA
Independent Component Analysis Bibliography Buccianti A, Mateu-Figueras G, Pawlowsky-Glahn V (eds) (2006) Compositional data analysis in the geosciences: from theory to practice. Geological Society of London, London, UK Chilès JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty. Wiley, Hoboken, New Jersey, USA Helsel DR (2012) Statistics for censored environmental data using Minitab ® and R. Wiley, Hoboken, New Jersey, USA Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, Hoboken, New Jersey, USA Molenberghs G, Fitzmaurice G, Kenward M, Tsiatis B, Verbeke G (2014) Handbook of missing data methodology. CRC Press, Boca Raton, Florida, USA Palarea-Albaladejo J, Martín-Fernández JA (2015) zCompositions – R package for multivariate imputation of left-censored data under a compositional approach. Chemometr Intell Lab Syst 143:85–96 Schafer JL (1997) Analysis of incomplete multivariate data. Chapman & Hall, Boca Raton, Florida, USA Schelin L, Sjöstedt-de Luna S (2014) Spatial prediction in the presence of left-censoring. Comput Stat Data Anal 74:125–141
Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Definition
Independent component analysis (ICA) is a method of decomposing a mixture of signals, i.e., a measured signal, into its individual sources, and is widely understood through its "classical cocktail party" analog (Stone 2002). In a situation at a cocktail party where many people are talking and the speech is captured using a single microphone, ICA is a solution for extracting the different speech signals from the microphone output, given that the voices of different people are unrelated. This process of extraction of source signals is
called "blind source separation" (BSS). An example in the geosciences is the use of ICA for finding the different rock units in a stream sediment geochemical dataset for mineral exploration (Yang and Cheng 2015). ICA is formulated as y = Mx + n, where y is the mixed signal, x is the matrix containing vectors of component signals, M is the mixing matrix, and n is the additive noise. If the noise is Gaussian, the noise term vanishes when higher-order statistics are used in the equation. This linear equation, upon removing the noise term, corresponds to the mixed signal being a linear combination of statistically independent source signals. However, in practice, the ordinary noise-free ICA methods are modified to remove the bias owing to the noise, implementing the noisy ICA method (Hyvärinen 1999). The solution x maximizes a "contrast" function, as introduced by Gassiat in 1988 (Comon 1994), for which an "entropy" value is used. Both M and x are unknown here, and the solution for x is obtained using the Moore–Penrose pseudoinverse of the mixing matrix, i.e., x = M†y, when x has maximum entropy.
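As a minimal illustration of this formulation (not part of the original entry), the following Python sketch mixes two synthetic source signals with a hypothetical mixing matrix M and recovers independent components with the FastICA implementation in scikit-learn; the signals, matrix, and parameter choices are invented for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic non-Gaussian source signals (the "voices" at the cocktail party)
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]   # shape (n_samples, 2)

# Hypothetical mixing matrix M; the observed signals are y = M x (noise-free case)
M = np.array([[1.0, 0.5],
              [0.4, 1.2]])
mixed = sources @ M.T

# Blind source separation: estimate the independent components and the mixing matrix
ica = FastICA(n_components=2, random_state=0)
estimated_sources = ica.fit_transform(mixed)   # recovered components (up to order and scale)
estimated_mixing = ica.mixing_                 # estimate of M
```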
Overview The term independent component analysis was introduced by Herault and Jutten circa 1986 (Comon 1994), owing to its similarity with principal component analysis (PCA). A brief history of ICA and its evolution has been discussed by Comon (1994). Alternative methods for BSS are PCA, factor analysis (FA), and linear dynamical systems (LDS). In ICA, the source signals, extracted as components, are assumed to be statistically independent and non-Gaussian. Unlike ICA, the popularly used alternative methods, PCA and FA, work on the basic assumption that the source signals are uncorrelated and Gaussian. Statistical independence between two signals implies that the values of one signal have no influence or information on those of the other signal. Being uncorrelated does not imply statistical independence as the value of one signal can provide information on the other signal. Thus, uncorrelated variables can be considered to be partly independent (Hyvärinen and Oja 2000). The source signals are extracted as eigenvectors and factors, respectively, in PCA and FA. In ICA and PCA, the source signals are referred to as “components.” In PCA, the uncorrelated signals are captured using eigenvectors of the correlation matrix, which are orthogonal components, by design. Thus, PCA removes correlations, which are second-order statistical dependencies, to extract components. That also implies that PCA does not infer higher-order statistical dependencies, e.g., skew and kurtosis. ICA, on the other hand, captures higher-order dependencies. There are known to be similarities between blind deconvolution and ICA (Hyvärinen 1999). When the blind
deconvolution is applied to a signal that is independently and identically distributed (i.i.d.) over time, it relates closely to ICA. A widely used data preprocessing step for ICA is the removal of correlations in the signal, referred to as "whitening" the signal. PCA has been used for whitening signals. The extraction of higher-order dependencies is reflected in the non-orthogonality of the components in ICA. The property of statistically independent signals having maximum entropy is exploited in finding independent components. ICA has been widely developed and used in cognitive science for both biomedical data analysis and computational modeling. Solving PCA using eigenvalue decomposition (EVD) of the covariance or correlation matrix is a linear process, whereas removing higher-order dependencies in ICA leads to a multilinear EVD (De Lathauwer et al. 2000). Instead of EVD of correlation matrices, singular value decomposition (SVD) of the raw data can be implemented. There are four widely used algebraic methods using multilinear EVD for solving ICA, namely:
• By means of higher-order eigenvalue decomposition (HOEVD)
• By means of maximal diagonality (MD)
• By means of joint approximate diagonalization of eigenmatrices (JADE)
• By means of simultaneous third-order tensor diagonalization (STOTD)
The HOEVD is the least accurate method, and among the remaining three, MD is the most computationally intensive. Thus, the STOTD and JADE algorithms are the preferred methods, of which STOTD is more efficient owing to the readily available tensors that have to be diagonalized, whereas the dominant eigenspaces have to be additionally computed in the case of JADE. Overall, an ICA algorithm has two aspects, namely, (1) the objective function that maximizes entropy, etc., and (2) the optimization algorithm that influences the algorithmic properties, such as convergence speed, memory complexity, and stability (Hyvärinen 1999). In practice, using a stochastic gradient method leads to slower convergence. Hence, a fixed-point algorithm, called FastICA, was proposed by Hyvärinen in 1997 to solve the convergence issue; a brief summary is given in Hyvärinen (1999). Extending the BSS/ICA model from one-dimensional to multidimensional signals, the standard algorithms for ICA can be used for multidimensional ICA (MICA) (Cardoso 1998). This can be done by using a geometric parameterization to avoid using the matrix algebraic technique. The matrix formulation of noise-free ICA leads to two known ambiguities (Hyvärinen and Oja 2000). Firstly, the variances of the independent components cannot be
determined. Secondly, the order of the independent components cannot be determined. This is because all independent components are equally important, unlike in PCA, where the decreasing variances of the principal components provide an ordering.
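The whitening preprocessing step mentioned above can be sketched in a few lines of NumPy; this is an illustrative implementation of PCA whitening under the usual centering assumption, not code from any of the cited algorithms.

```python
import numpy as np

def whiten(X):
    """PCA whitening: rows of X are observations, columns are variables."""
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # sample covariance (second-order statistics)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalue decomposition of the covariance
    # Rotate onto the eigenvectors and rescale each direction to unit variance
    return (Xc @ eigvecs) / np.sqrt(eigvals + 1e-12)

# After whitening, the covariance of the result is approximately the identity matrix,
# so only higher-order dependencies remain for ICA to resolve.
```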
Applications
Common applications of ICA include separation of artifacts in the magnetoencephalography (MEG) data of cortical neurons, determination of hidden factors in financial data, and reducing noise in natural images (Hyvärinen and Oja 2000). ICA has also found widespread applicability in the geosciences and related areas. When used as exploratory tools for geochemical data analysis, PCA has been found to be useful in finding significant geo-objects, whereas ICA is capable of separating geological populations in the dataset (Yang and Cheng 2015). Hyperspectral images have been classified into thematic land-cover classes using ICA and extended morphological attribute profiles (EAPs) of the images (Dalla Mura et al. 2010). The morphological attribute profiles are first computed to capture the multiscale variation in geometry. They are then applied to the independent components of the image, computed using ICA. The support vector machine (SVM) classifier is then used with the EAPs for classification. ICA has been found to give more accurate classification outcomes than PCA for this application. Similarly, ICA has been useful for analyzing the variability of tropical sea surface temperature (SST) owing to its modeling of statistically independent components (Aires et al. 2000). The whitened SST has been found to be a mixture involving variabilities at several timescales in different geographical areas. Hence, ICA is well suited to identifying the contributing components, which in turn determine the influence of SST variability on the El Niño–Southern Oscillation (ENSO). This method has successfully shown the relationship between the SSTs of the northern and southern equatorial Atlantic Ocean through the "dipole," and between the SSTs of the Pacific and Atlantic Oceans through ENSO.

Future Scope
Advances in ICA since 2000 include causal relationship analysis, testing of independent components, and analysis of three-way datasets (Hyvärinen 2013). There have also been developments in solving ICA using time-frequency decompositions, modeling the distributions of the components, and non-negative models. In summary, ICA is a powerful exploratory data tool for separating statistically independent and non-Gaussian components from a mixed signal.

Cross-References ▶ Eigenvalues and Eigenvectors ▶ Principal Component Analysis ▶ Q-Mode Factor Analysis ▶ R-Mode Factor Analysis

Bibliography
Aires F, Chédin A, Nadal JP (2000) Independent component analysis of multivariate time series: application to the tropical SST variability. J Geophys Res Atmos 105(D13):17,437–17,455
Cardoso JF (1998) Multidimensional independent component analysis. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'98 (Cat. No. 98CH36181), vol 4. IEEE, pp 1941–1944
Comon P (1994) Independent component analysis, a new concept? Signal Process 36(3):287–314
Dalla Mura M, Villa A, Benediktsson JA, Chanussot J, Bruzzone L (2010) Classification of hyperspectral images by using extended morphological attribute profiles and independent component analysis. IEEE Geosci Remote Sens Lett 8(3):542–546
De Lathauwer L, De Moor B, Vandewalle J (2000) An introduction to independent component analysis. J Chemom 14(3):123–149
Hyvärinen A (1999) Survey on independent component analysis. Neural Computing Surveys 2:94–128
Hyvärinen A (2013) Independent component analysis: recent advances. Philos Trans Royal Soc A Math Phys Eng Sci 371(1984):20110534
Hyvärinen A, Oja E (2000) Independent component analysis: algorithms and applications. Neural Netw 13(4–5):411–430
Stone JV (2002) Independent component analysis: an introduction. Trends Cogn Sci 6(2):59–64
Yang J, Cheng Q (2015) A comparative study of independent component analysis with principal component analysis in geological objects identification, Part I: Simulations. J Geochem Explor 149:127–135

Induction, Deduction, and Abduction
Frits Agterberg
Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada

Definition
Induction is a method of reasoning from premises that supply some evidence without full assurance of the truth of the conclusion that is reached. It differs from deduction, which starts from premises that are correct and lead to conclusions which are certainly true. Abduction is a rarely used form of logical inference advocated by the philosopher Peirce. It starts with observations from which it seeks to draw the simplest and most likely conclusion. All three words are based on the Latin verb "ducere," meaning to lead. The prefix "de" means "from," indicating that deduction derives truth from factual
statements; “in” means “toward,” and induction results in a generalization; while “ab” means “away,” i.e., taking away the best explanation.
Introduction
Induction and deduction are philosophical concepts that are more frequently used than abduction, which often is regarded as a kind of induction or as a form of exploratory data analysis. The philosopher Charles Sanders Peirce (see Staat 1993) made a distinction between inference and decision-making under uncertainty on the one hand, and "theorizing" on the other. He called the former induction and the latter abduction. Traditionally, both induction and theorizing are prevalent in the geological sciences where, commonly, limited observations are used to construct 3-D realities, often along with their history in geologic time. This process of reconstructing geological realities is prone to uncertainties and has often resulted in fierce debates in the past. Difficulties in the application of mathematics to solve many types of geological problems stem from: (1) the nature of geological phenomena, especially the paucity of exposure of bedrock and the restriction of observations to a record of past events that has been only partially preserved in the rocks, and (2) the nature of traditional geological methods of research, which are largely nonmathematical (cf. Agterberg 1961, 1974). Induction is prevalent in geology. It could be argued that geology is a science of induction or abduction. When mathematics is used to solve geologic problems, the parameters must be defined in a manner sufficiently rigorous to permit nontrivial derivations. During the design of models, it is important to keep in mind Chamberlin's (1897) warning: "The fascinating impressiveness of rigorous mathematical analysis, with its atmosphere of precision and elegance, should not blind us to the defects of the premises that condition the whole process. There is, perhaps, no beguilement more insidious and dangerous than an elaborate and elegant mathematical process built upon unfortified premises." Problems related to mathematical applications in the geosciences will be discussed later in this entry. Special attention will be given to the application of Bayesian data analysis, which serves to change prior probabilities based on induction into posterior probabilities that incorporate evidence.
Induction Versus Deduction in Mathematical Statistics Mathematical statistics originated as a science during the seventeenth century with the definitions of “probability” in 1654, “expectation” in 1657, and other basic concepts (Hacking 2006) including the first application of a statistical
significance test by Arbuthnot (1710). The idea of inductive logic was initially invented by philosophers under the banner "equipossibility" toward the end of the seventeenth century (Hacking 2006), but the idea remained dormant until the second half of the eighteenth century, after the publication of the famous paper by Bayes (1763) that contains the general equation (known as Bayes' rule or postulate):
P(A|B) = P(A)·P(B|A) / P(B)
where A is a hypothesis (induction) and B is evidence. Proponents of Bayesian statistics included Jeffreys (1939) who, in his book "Theory of Probability," insisted that inductive probabilities can only be defined in relation to evidence. Jeffreys was immediately criticized by R.A. Fisher, who is generally regarded as the father of mathematical statistics, but who dismissed Jeffreys' book with the words: "He makes a logical mistake on the first page which invalidates all the 395 formulae in his book" (cf. Fisher-Box 1978). Jeffreys' "mistake" was to adopt Bayes' postulate. McGrayne's (2011) book "The theory that would not die" describes in detail how Bayes' rule emerged triumphant from two centuries of controversy. Bayesian data analysis is well described in Gelman et al. (2013). Geological examples using "equipossibility" and Bayes' postulate will be given later in this entry.
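For a purely numerical illustration of the rule (the probabilities are invented for the example, not taken from the entry): with P(A) = 0.1, P(B|A) = 0.8, and P(B) = 0.25, the posterior is P(A|B) = 0.1 × 0.8 / 0.25 = 0.32. The same arithmetic as a short Python sketch:

```python
def bayes_posterior(prior_a, likelihood_b_given_a, prob_b):
    """Bayes' rule: P(A|B) = P(A) * P(B|A) / P(B)."""
    return prior_a * likelihood_b_given_a / prob_b

# Hypothetical values chosen only to illustrate the formula
print(bayes_posterior(0.1, 0.8, 0.25))  # 0.32
```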
Geological Data, Concepts, and Maps The first geological map was published in 1815. This feat has been well documented by Winchester (2001) in his book: “The Map that Changed the World – William Smith and the Birth of Modern Geology.” According to the concept of stratigraphy, strata were deposited successively in the course of geologic time and they are often uniquely characterized by fossils such as ammonites that are strikingly different from age to age. It is remarkable that this fact remained virtually unknown until approximately 1800. There are several other examples of geologic concepts that became only gradually accepted more widely, although they were proposed much earlier in one form or another by individual scientists. The best-known example is plate tectonics: Alfred Wegener (1966) had demonstrated the concept of continental drift convincingly as early as 1912 but this idea only became acceptable in the early 1960s. One reason that this theory initially was rejected by most geoscientists was lack of a plausible mechanism for the movement of continents. Along similar lines, Staub (1928) had argued that the principal force that controlled mountain building in the Alps was crustal shortening between Africa and Eurasia. Figure 1 (from Agterberg 1961, Fig. 107) shows tectonic sketch maps
Induction, Deduction, and Abduction, Fig. 1 Tectonic sketch maps of the Strigno area in northern Italy and the Eastern Alps (after Agterberg 1961). Although the area sizes differ by a factor of 1,600, the patterns showing overthrust sheets are similar. Originally, gravity sliding was assumed to be the major cause of these patterns. However, after development of the theory of plate tectonics in the 1960s, it has become
apparent that plate collision was the primary driving force of Alpine orogeny. The Strigno overthrust sheets are due to Late Miocene southward movement of the Eurasian plate over the Adria microplate, and the eastern Alpine nappes were formed during the earlier (primarily Oligocene) northward movement of the African plate across the Eurasian plate. (Source: Agterberg 1961, Fig. 107)
of two areas in the eastern Alps. The area in northern Italy shown at the top of Fig. 1 is 1,600 times smaller in area than the region for the eastern Alps at the bottom. The tectonic structure of these two regions is similar. Both contain overthrust
sheets with older rocks including crystalline basement overlying much younger rocks. It is now well known that the main structure of the map at the bottom was created during the Oligocene (33.9-23.0 Ma) when the African plate moved
northward over the Eurasian plate. On the other hand, the main structure of the relatively small Strigno area was created much later, during the Late Miocene (Tortonian, 11.6–7.3 Ma), when the Eurasian plate moved southward, overriding the Adria microplate that originally was part of the African plate. Another geological idea that was conceived early on, but initially rejected as a figment of the imagination, is what later became known as the Milankovitch theory. Croll (1875) had suggested that the Pleistocene ice ages were caused by variations in the distance between the Earth and the Sun. Milankovitch commenced working on astronomical control of climate in 1913 (Schwarzacher 1993). His detailed calculations of orbital variations were published nearly 30 years later (Milankovitch 1941), showing quantitatively that the amount of solar radiation drastically changes our climate. This theory was immediately rejected by climatologists because the changes in solar radiation due to orbital variations are minuscule, and by geologists as well because, stratigraphically, their correlation with ice ages apparently was not very good. However, in the mid-1950s, new methods helped to establish the Milankovitch theory beyond any doubt. Subsequently, it resulted in the establishment of two new disciplines: cyclostratigraphy and astrochronology (Hinnov and Hilgen 2012). The latest geologic timescales of the Neogene (23.0–2.59 Ma) and Paleogene (66.0–23.0 Ma) periods are entirely based on astronomical calibrations, and "floating" astrochronologies are available for extended time intervals (multiple millions of years) extending through stages in the geologic timescale belonging to the older periods (Gradstein et al. 2020). The preceding three examples have in common that the underlying geologic processes were anticipated by individual geoscientists but remained in dispute until their existence could be proven mathematically beyond doubt.
Map-Making A field geologist is concerned with collecting numerous observations from those places where rocks are exposed at the surface. Observation is often hampered by poor exposure. In most areas, 90% or more of the bedrock surface is covered by unconsolidated overburden restricting observation to available exposures; for example, along rivers (cf. Agterberg and Robinson 1972). The existence of these exposures may be a function of the rock properties. In formerly glaciated areas, for example, the only rock that can be seen may be hard knobs of pegmatite or granite, whereas the softer rocks may never be exposed. Of course, drilling and geophysical exploration techniques provide additional information, but bore-holes are expensive and geophysics provides only partial, indirect information on rock composition, facies, age, and other properties of interest.
It can be argued that today most outcrops of bedrock in the world have been visited by competent geologists. Geologic maps at different scales are available for nearly all countries. One of the major accomplishments of stratigraphy is not only that the compositions but also the ages of rocks nearly everywhere at the Earth’s surface now are fairly well known. However, as Harrison (1963) has pointed out, although the outcrops in an area remain more or less the same during the immediate past, the geologic map constructed from them can change significantly over time when new geological concepts become available. A striking example is shown in Fig. 2. Over a 30-year period, the outcrops in this study area that were repeatedly visited by geologists had remained nearly unchanged. Discrepancies in the map patterns reflect changes in the state of geologic knowledge at different points in time. Many geologists regard mapping as a creative art. From scattered bits of evidence, one must piece together a picture at a reduced scale, which covers at least most of the surface of bedrock in an area. Usually, this cannot be done without a good understanding of the underlying geologic processes that may have been operative at different geologic times. A large amount of interpretation is involved. Many situations can only be evaluated by experts. Although most geologists agree that it is desirable to make a rigorous distinction between facts and interpretation, this is hardly possible in practice, partly because during compilation results for larger regions must be represented at scales of 1: 25,000 or 1: 250,000, or less. Numerous observable features cannot be represented adequately in these scale models. Consequently, there is often significant discrepancy of opinions among geologists that can be bewildering to scientists in other disciplines and to others including decision-makers in government and industry. Van Bemmelen (1961) has pointed out that the shortcomings of classical methods of geological observation constrain the quantification of geology. Much is left to the “feeling” and experience of the individual geologist. The results of this work, presented in the form of maps, sections, and narratives with hypothetical reconstructions of the geological evolution of a region, do not have the same exactitude as the records and accounts of geophysical and geochemical surveys, which are more readily computerized even though the results may be equally accurate in an interpretive sense. Geophysical and geochemical variables are determined by the characteristics of the bedrock geology which, in any given area, is likely to be nonuniform because of the presence of different rock types, often with abrupt changes at the contacts between them. The heterogeneous nature of the geologic framework will be reflected or masked in these other variables. Geologists, geophysicists, and geochemists produce different types of maps for the same region. Geophysical measurements are mainly indirect. They include gravity and
Induction, Deduction, and Abduction, Fig. 2 Two geological maps for the same area on the Canadian Shield (after Harrison 1963). Between 1928 and 1958 there was development of conceptual ideas about the
nature of metamorphic processes. This, in turn, resulted in geologists deriving different maps based on the same observations
seismic data, variations in the Earth’s magnetic field, and electrical conductivity. They generally are averages of specific properties of rocks over large volumes with intensities of penetration decreasing with distance and depth. Remotely sensed data are very precise and can be subjected to a variety of filtering methods. However, they are restricted to the Earth’s surface. Geochemists mainly work with element concentration values determined from chips of rocks in situ, and also from samples of water, mud, soil, or organic material.
media that can only be observed in 2-D. The geologist can look at a rock formation but not inside it. Sound geological concepts are a basic requirement for making 3-D projections. During the past two centuries, after William Smith, geologists have acquired a remarkably good capability of imagining 3-D configurations by conceptual methods. This skill was not obtained easily. Figure 3, after Nieuwenkamp (1968), shows an example of a typical geologic extrapolation problem in which the results depended strongly on initially erroneous theoretical considerations. In the Kinnekulle area in Sweden, the tops of the hills consist of basaltic rocks; sedimentary rocks are exposed on the slopes; and granites and gneisses occur in the valleys. The first two projections into depth for this area were made by a prominent geologist (Von Buch 1842). It can be assumed that today most geologists would quickly arrive at the third 3-D reconstruction (Fig. 3c) by Westergård et al. (1943).
Geological Cross-Sections The facts observed at the surface of the Earth must be correlated with one another; for example, according to a stratigraphic column. Continually, trends must be established and projected into the unknown. This is because rocks are 3-D
Induction, Deduction, and Abduction, Fig. 3 Schematic sections originally compiled by Nieuwenkamp (1968) showing that a good conceptual understanding of time-dependent geological processes is required for downward extrapolation of geological features observed at the surface of the Earth. Sections A and B are modified after von Buch
(1842) illustrating his genetic interpretation that was based on combining Abraham Werner’s theory of Neptunism with von Buch’s own firsthand knowledge of volcanoes including Mount Etna in Sicily with a basaltic magma chamber. Section C is after Westergård et al. (1943) (Source: Agterberg 1974, Fig. 1)
At the time of von Buch it was not yet common knowledge that basaltic lavas can form extensive flows within sedimentary sequences. The projections in Fig. 3a, b reflect A.G. Werner's early pan-sedimentary theory of "Neptunism," according to which all rocks on Earth were deposited in a primeval ocean. Nieuwenkamp (1968) has demonstrated that this theory was related to philosophical concepts of F.W.J. Schelling and G.W.F. Hegel. When Werner's view was criticized by other early geologists who assumed processes of change during geologic time, Hegel publicly supported Werner by comparing the structure of the Earth with that of a house: one looks at the complete house only, and it is trivial that the basement was constructed first and the roof last. Initially, Werner's conceptual model provided an appropriate and temporarily adequate classification system, although von Buch, who was a follower of Werner, rapidly ran into problems during his attempts to apply Neptunism to explain occurrences of rock formations in different parts of Europe. For the Kinnekulle area (Fig. 3), von Buch assumed that the primeval granite became active later, changing sediments into gneisses, while the primeval basalt became a source for hypothetical volcanoes. Clearly, 3-D time-dependent mapping continues to present important geoscientific challenges. It is likely that in the future a major role will be played by new methodologies initially developed in mathematical morphology (Matheron 1981), a subject currently advanced by Sagar (2018) and colleagues.
Conditional Probability and Bayes' Theorem
Suppose that p(D|B) represents the conditional probability that event D occurs given event B (e.g., a mineral deposit D occurring in a small unit cell underlain by rock type B on a geological map). This conditional probability obeys three basic rules (cf. Lindley 1987, p. 18):
1. Convexity: 0 ≤ p(D|B) ≤ 1; D occurs with certainty if B logically implies D; then p(D|B) = 1 and p(Dc|B) = 0, where Dc represents the complement of D
2. Addition: p(B∪C|D) = p(B|D) + p(C|D) − p(B∩C|D)
3. Multiplication: p(B∩C|D) = p(B|D) p(C|B∩D)
These three basic rules lead to many other rules (cf. Agterberg 2014). For example, replacement of B by B∩D in the multiplication rule gives: p(B∩C|D) = p(B|D) p(C|B∩D). Likewise, it is readily derived that: p(B∩C∩D) = p(B|D) p(C|B∩D) p(D). This leads to Bayes' theorem in odds form:
p(D|B∩C) / p(Dc|B∩C) = [p(B|C∩D) / p(B|C∩Dc)] · [p(D|C) / p(Dc|C)]
or
O(D|B∩C) = exp(W_B∩C) · O(D|C)
where O = p/(1 − p) are the odds corresponding to p = O/(1 + O), and W_B∩C is the "weight of evidence" for occurrence of the event D given B and C. Suppose that the probability p refers to occurrence of a mineral deposit D under a small area on the map (circular or square unit area). Suppose further that B represents a binary indicator pattern, and that C is the study area within which D and B have been determined. Under the assumption of equipossibility (or equiprobability), the prior probability is equal to the total number of deposits in the study area divided by the total number of unit cells in the study area. Theoretically, C is selected from an infinitely large universe (parent population) with constant probabilities for the relationship between D and B. In practical applications, only one study area is selected per problem and C can be deleted from the preceding equation. Then Bayes' theorem can be written in the form:
ln O(D|B) = W_B+ + ln O(D);  ln O(D|Bc) = W_B− + ln O(D)
for presence or absence of B, respectively. If the area of the unit cell underlain by B is small in comparison with the total area underlain by B, the odds O are approximately equal to the probability p. The weights satisfy:
W_B+ = ln[p(B∩D)/p(B∩Dc)];  W_B− = ln[p(Bc∩D)/p(Bc∩Dc)]
As an example of this type of application of Bayes' theorem, suppose that a study area C, which is a million times as large as the unit cell, contains 10 mineral deposits; 20% of C is underlain by rock type B, which contains 8 deposits. The prior probability p(D) then is equal to 0.000 01; the posterior probability for a unit cell on B is p(D|B) = 0.000 04, and the posterior probability for a unit cell not on B is p(D|Bc) = 2.5 × 10^−6. The weights of evidence are W_B+ = 0.982 and W_B− = −1.056, respectively. In this example, the two posterior probabilities can be calculated without use of Bayes' theorem. However, the weights themselves provide useful information, as will be seen in the next section. The method of weights of evidence modeling was invented by Good (1950). In its original application to mineral resource potential mapping (Agterberg 1989; Bonham-Carter 1994), extensive use was made of medical applications (Spiegelhalter 1986).

Basic Concepts and Artificial Example
Figure 4 illustrates the concept of combining two binary patterns for which it can be assumed that they are related to occurrences of mineral deposits of a given type. Figure 4a shows the locations of six hypothetical deposits, the outcrop pattern of a rock type (B) with which several of the deposits may be associated (Fig. 4b), and two lineaments that have been dilatated in Fig. 4c. Within the corridors around the lineaments, the likelihood of locating deposits may be greater than elsewhere in the study area. In Fig. 4b–d, the deposits are surrounded by a small unit area. This permits estimation of the unconditional "prior" probability P(D) that a unit area with random location in the study area contains a deposit, as well as the conditional "posterior" probabilities P(D|B), P(D|C), and P(D|B∩C) that unit areas located on the rock type, within a corridor, or both on the rock type and within a corridor contain a deposit. These probabilities are estimated by counting how many deposits occur within the areas occupied by the polygons of their patterns. The relationships between the two patterns (B and C) and the deposits (D) can be represented by Venn diagrams, as shown schematically in Fig. 5. Operations such as creating corridors around line segments on maps and measuring areas can be performed by using Geographic Information Systems (GISs). The Spatial Data Modeller developed by Raines and Bonham-Carter (2007) is an example of a system that provides tools for weights of evidence, logistic regression, fuzzy logic, and neural networks. The availability of excellent software for WofE and WLR has been a factor in promoting widespread usage of these methods. Examples of applications to mapping mineral prospectivity can be found in Carranza (2004) and Porwal et al. (2010).

Induction, Deduction, and Abduction, Fig. 4 Artificial example to illustrate the concept of combining two binary patterns related to occurrence of mineral deposits; (a) outcrop pattern of rock type, lineaments, and mineral deposits; (b) rock type and deposits dilatated to unit cells; (c) lineaments dilatated to corridors; (d) superposition of three patterns. (Source: Agterberg et al. 1990, Fig. 1)

Induction, Deduction, and Abduction, Fig. 5 Venn diagrams corresponding to areas of binary patterns of Fig. 4; (a) is for Fig. 4b; (b) is for Fig. 4c; (c) is for Fig. 4d. (Source: Agterberg et al. 1990, Fig. 2)
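The counting procedure described in this section can be sketched in Python. The function below is an illustrative sketch only: it uses the standard conditional-probability form of the weights, W+ = ln[P(B|D)/P(B|Dc)] and W− = ln[P(Bc|D)/P(Bc|Dc)], and the unit-cell counts passed to it are hypothetical rather than those of any example in this entry.

```python
import math

def wofe_weights(n_cells, n_deposits, n_cells_on_b, n_deposits_on_b):
    """Weights of evidence for one binary pattern B from unit-cell counts."""
    p_b_given_d = n_deposits_on_b / n_deposits
    p_b_given_not_d = (n_cells_on_b - n_deposits_on_b) / (n_cells - n_deposits)
    w_plus = math.log(p_b_given_d / p_b_given_not_d)
    w_minus = math.log((1 - p_b_given_d) / (1 - p_b_given_not_d))
    contrast = w_plus - w_minus          # C = W+ - W-
    prior = n_deposits / n_cells         # prior probability per unit cell
    return prior, w_plus, w_minus, contrast

# Hypothetical counts: 100,000 unit cells, 20 deposits, 30,000 cells on B, 12 deposits on B
print(wofe_weights(100_000, 20, 30_000, 12))
```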
Bayes' Rule for One or More Map Layers
When there is a single pattern B, the odds O(D|B) for occurrence of mineralization if B is present is given by the ratio of the following two expressions of Bayes' rule:
P(D|B) = P(B|D)P(D)/P(B);  P(Dc|B) = P(B|Dc)P(Dc)/P(B)
where Dc represents the complement of D. Consequently, ln O(D|B) = ln O(D) + W+, where the positive weight for presence of B is W+ = ln[P(B|D)/P(B|Dc)]. The negative weight for absence of B is W− = ln[P(Bc|D)/P(Bc|Dc)].
The result of applying Bayes' rule to a single map layer can be extended by using it as prior probability input for other map layers provided that there is approximate conditional independence (CI) of map layers. The order in which new patterns are added is immaterial. When there are two map patterns as in Fig. 5:
P(D|B∩C) = P(B∩C|D)P(D)/P(B∩C)  and  P(Dc|B∩C) = P(B∩C|Dc)P(Dc)/P(B∩C).
Conditional independence of D with respect to B and C implies:
P(B∩C|D) = P(B|D)P(C|D);  P(B∩C|Dc) = P(B|Dc)P(C|Dc).
Consequently,
P(D|B∩C) = P(D)P(B|D)P(C|D)/P(B∩C);  P(Dc|B∩C) = P(Dc)P(B|Dc)P(C|Dc)/P(B∩C).
From these two equations it follows that:
P(D|B∩C)/P(Dc|B∩C) = [P(D)P(B|D)P(C|D)] / [P(Dc)P(B|Dc)P(C|Dc)].
This expression is equivalent to ln O(D|B∩C) = ln O(D) + W1+ + W2+. The posterior logit on the left side of this equation is the sum of the prior logit and the weights of the two map layers. The posterior probability follows from the posterior logit. Similar expressions apply when either one or both patterns are absent. Cheng (2008) has pointed out that, since it is based on a ratio, the underlying assumption is somewhat weaker than assuming conditional independence of D with respect to B1 and B2. If there are p map layers, the final result is based on the prior logit plus the p weights for these map layers. A good WofE strategy is first to achieve approximate conditional independence by preprocessing. A common problem is that the final estimated probabilities usually are biased.
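Under the conditional independence assumption just described, combining binary map layers reduces to adding their weights on the logit scale. The short sketch below illustrates this; the prior probability and the two weights are hypothetical values.

```python
import math

def posterior_probability(prior_prob, weights):
    """Add the weights of the observed map layers to the prior logit and back-transform."""
    prior_logit = math.log(prior_prob / (1 - prior_prob))
    posterior_logit = prior_logit + sum(weights)
    odds = math.exp(posterior_logit)
    return odds / (1 + odds)

# Hypothetical prior probability and positive weights for two layers that are both present
print(posterior_probability(0.0001, [1.2, 0.8]))
```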
Induction, Deduction, and Abduction, Fig. 6 Surficial geology of Oak Ridge Moraine in study area according to Sharpe et al. (1997). (Source: Cheng 2004, Fig. 2)
Induction, Deduction, and Abduction, Fig. 7 Flowing wells versus distance from buffer zone around Oak Ridge Moraine in study area of Fig. 6. (Courtesy of Q. Cheng)
If there are N deposits in a study area and the sum of all estimated probabilities is written as S, WofE often results in S > N. The difference S − N can be tested for statistical significance (Agterberg and Cheng 2002). The main advantage of WofE in comparison with other methods such as weighted logistic regression is transparency, in that it is easy to compare weights with one another. On the other hand, the coefficients resulting from logistic regression generally are subject to considerable uncertainty. The contrast C = W+ − W− is the difference between the positive and negative weight for a binary map layer. It is a convenient measure for strength of spatial correlation between a point pattern and the map layer (Bonham-Carter et al. 1988; Agterberg 1989). It is somewhat similar to Yule's (1912) "measure of association" Q = (a − 1)/(a + 1) with a = exp(C). Both C and Q express strength of correlation between two binary variables that can only assume the values +1 (for presence) or −1 (for absence). Like the ordinary correlation coefficient, Q is confined to the interval [−1, 1]. If the binary variables are uncorrelated, then E(Q) = 0. If a binary map layer is used as an indicator variable, the probability of occurrence of a deposit is greater when it is present than when it is absent, and W+ is positive whereas W− is negative. Consequently, C generally is positive.
However, in a practical application, it may turn out that C is negative. This would mean that the map layer considered is not an indicator variable. In a situation of this type, one could switch presence of the map layer with its absence, so that W+ and C both become positive (and W− negative). An excellent strategy often applied in practice is to create corridors of variable width x around linear map features that are either lineaments as in Fig. 4 or contacts between different rock types (e.g., boundaries of intrusive bodies) and to maximize C(x). From dQ/da = 2/(a + 1)^2 being positive, it follows that Q(x) and C(x) reach their maximum value at the same value of x (cf. Agterberg et al. 1990). Alternative approaches to WofE as discussed here have more recently been proposed by Baddeley et al. (2020).
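Because Q = (a − 1)/(a + 1) with a = exp(C), the two measures are related by Q = tanh(C/2), which is why Q(x) and C(x) peak at the same corridor width. A short check in Python (the contrast value is arbitrary):

```python
import math

def yule_q_from_contrast(c):
    """Yule's Q from the weights-of-evidence contrast C = W+ - W-."""
    a = math.exp(c)
    return (a - 1) / (a + 1)   # equivalently math.tanh(c / 2)

print(yule_q_from_contrast(1.0))  # approximately 0.462
```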
Flowing Wells in the Greater Toronto Area
Cheng (2004) has given the following application of Weights-of-Evidence modeling. Figure 6 shows the surficial geology of the Oak Ridge Moraine (ORM) for a study on the assessment of flowing water wells in surficial geology in the Greater Toronto area,
Induction, Deduction, and Abduction, Fig. 8 Flowing wells versus distance from relatively steep slope zone in study area of Fig. 6. (Courtesy of Q. Cheng)
Ontario. The ORM is a 150 km long east-west trending belt of stratified glaciofluvial-glaciolacustrine deposits. It is 5–15 km wide and up to 150 m thick. For more detailed discussion of the geology of the ORM, see Sharpe et al. (1997). It is generally recognized that the ORM is the main source of recharge in the area. Figures 7 and 8 show flowing wells together with two map patterns related to the ORM thought to be relevant for flowing well occurrence. Cheng (2004) used Weights-of-Evidence to test the influence of the ORM on locations of flowing wells. A number of binary patterns were constructed by maximizing the contrast C as a function of distance from the ORM. C reaches its maximum at 2 km, meaning that the most strongly correlated binary pattern of the enlarged ORM assigns YES to points belonging to the ORM plus all points that occur less than 2 km from the ORM, and NO to the remainder of the study area (points more than 2 km away from the ORM); for this enlarged ORM pattern, the contrast C = W+ − W− reaches its maximum value of spatial correlation. Two other binary patterns constructed in the same way by Cheng (2004) were distance from the enlarged ORM (Fig. 7), and
distance from a relatively thick glacial drift layer (Fig. 8). The posterior probability map shown in Fig. 9 is based on the binary maps for which the spatial correlation reaches its maximum value in Figs. 7 and 8, respectively. It not only quantifies the relationship between the known artesian aquifers and these two binary map layers, it also outlines areas with no or relatively few known aquifers that have good potential for additional aquifers.
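The corridor-width selection used in this application can be sketched as a simple search over candidate buffer distances. The sketch below is illustrative only: the well and grid-cell distances are randomly generated stand-ins, not the Oak Ridge Moraine data, and the rare-event approximation P(B|Dc) ≈ P(B) is used for simplicity.

```python
import math
import numpy as np

def contrast_for_width(width_km, well_dist_km, cell_dist_km):
    """Contrast C = W+ - W- for the binary pattern 'within width_km of the feature'."""
    p_in_given_d = np.mean(well_dist_km <= width_km)   # P(B|D) estimated from the wells
    p_in = np.mean(cell_dist_km <= width_km)           # P(B), used for P(B|Dc) (rare-event approximation)
    w_plus = math.log(p_in_given_d / p_in)
    w_minus = math.log((1 - p_in_given_d) / (1 - p_in))
    return w_plus - w_minus

# Hypothetical distances (km) of flowing wells and of all grid cells to the moraine
rng = np.random.default_rng(1)
cell_d = rng.uniform(0, 20, 50_000)
well_d = np.concatenate([rng.uniform(0, 2, 120), rng.uniform(0, 20, 80)])

widths = [0.5, 1, 2, 4, 8]
best = max(widths, key=lambda w: contrast_for_width(w, well_d, cell_d))
print("Corridor width maximizing C:", best, "km")
```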
Summary and Conclusions The philosophical terms “induction” and “deduction” are more frequently used than “abduction.” Induction is a method of reasoning from premises that supply some evidence without full assurance of the truth of the conclusion that is reached. It differs from deduction that starts from premises that are correct and leads to conclusions that are certainly true. Abduction starts with observations from which it seeks to find the simplest and most likely conclusion. Originally, geology was a science of induction and abduction.
Induction, Deduction, and Abduction, Fig. 9 Posterior probability map for flowing wells based on buffer zone around Oak Ridge Moraine (Fig. 7) and steep slope zone area (Fig. 8) in study area. (Courtesy of Q. Cheng)
Mathematical statistics commenced as a science of deduction. Induction was introduced slowly, with the concept of "equiprobability" and with the increasing popularity of Bayes' rule. Until recently, there remained significant disagreement between Bayesian statisticians and those, sometimes called "frequentists," who avoided subjective notions in their statistical modeling. Bayesian data analysis aims to improve estimates of statistical parameters initially obtained by subjective reasoning. In the example of flowing wells given in this entry, the prior probabilities are based on the assumption that the wells are randomly distributed across the study area, but the posterior probabilities incorporate various types of evidence pertaining to the geographic locations of the wells.
Cross-References ▶ Bayes’s Theorem ▶ Digital Geological Mapping ▶ Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence
Bibliography
Agterberg FP (1961) Tectonics of the crystalline basement of the Dolomites in North Italy. Kemink, Utrecht, 232 pp
Agterberg FP (1974) Geomathematics. Elsevier, Amsterdam, 596 pp
Agterberg FP (1989) Computer programs for mineral exploration. Science 245:76–81
Agterberg FP (2014) Geomathematics: theoretical foundations, applications and future developments. Springer, Heidelberg, 553 pp
Agterberg FP, Cheng Q (2002) Conditional independence test for weights-of-evidence modeling. Nat Resour Res 11:249–255
Agterberg FP, Robinson SC (1972) Mathematical problems in geology. Proc 38th Sess Intern Stat Inst, Bull 38:567–596
Agterberg FP, Bonham-Carter GF, Wright DF (1990) Statistical pattern integration for mineral exploration. In: Gaal G, Merriam DF (eds) Computer applications in resource estimation, prediction and assessment for metals and petroleum. Pergamon, Oxford, pp 1–21
Arbuthnot J (1710) An argument for divine providence from the constant regularity observed in the births of both sexes. Phil Trans R Soc London 27:186–190
Baddeley A, Brown W, Milne RK, Nair G, Rakshit S, Lawrence T, Phatak A, Fu SC (2020) Optimal thresholding of predictors in mineral prospectivity analysis. Nat Resour Res. https://doi.org/10.1007/s11053-020-09769-2
Bayes T (1763) An essay towards solving a problem in the doctrine of chances. Phil Trans R Soc London 53:370–418
Bonham-Carter GF (1994) Geographic information systems for geoscientists: modelling with GIS. Pergamon, Oxford, 398 pp
Carranza EJ (2004) Weights of evidence modeling of mineral potential: a case study using small number of prospects, Abra, Philippines. Nat Resour Res 13(3):173–185
Chamberlin TC (1897) The method of multiple working hypotheses. J Geol 5:837–848
Cheng Q (2004) Application of weights of evidence method for assessment of flowing wells in the Greater Toronto Area, Canada. Nat Resour Res 13(2):77–86
Cheng Q (2008) Non-linear theory and power-law models for information integration and mineral resources quantitative assessments. In: Bonham-Carter GF, Cheng Q (eds) Progress in geomathematics. Springer, Heidelberg, pp 195–225
Croll J (1875) Climate and time in their geological relations. Appleton, New York
Fisher-Box J (1978) R.A. Fisher – The life of a scientist. Wiley, New York, 512 pp
Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis, 3rd edn. CRC Press, Boca Raton, 660 pp
Good IJ (1950) Probability and the weighing of evidence. Griffin, London
Gradstein FM, Ogg JG, Schmitz MB, Ogg GM (eds) (2020) Geologic time scale, two volumes. Elsevier, Amsterdam, 809 pp
Hacking I (2006) The emergence of probability, 2nd edn. Cambridge University Press, 209 pp
Harrison JM (1963) Nature and significance of geological maps. In: Albritton CC Jr (ed) The fabric of geology. Addison-Wesley, Cambridge, MA, pp 225–232
Hinnov LA, Hilgen FJ (2012) Cyclostratigraphy and astrochronology. In: Gradstein FM, Ogg JG, Schmitz M, Ogg G (eds) The Geologic Time Scale 2012. Elsevier, Amsterdam, pp 63–113
Jeffreys H (1939) The theory of probability, 3rd edn. Oxford University Press, 492 pp
Lindley DV (1987) The probability approach to the treatment of uncertainty in artificial intelligence and expert systems. Stat Sci 2(1):17–24
Matheron G (1981) Random sets and integral geometry. Wiley, New York
McGrayne SB (2011) The theory that would not die. Yale University Press, New Haven, 320 pp
Milankovitch M (1941) Kanon der Erdbestrahlung und seine Anwendung auf das Eiszeitproblem. R Serb Acad Spec Publ 32, Belgrade
Nieuwenkamp W (1968) Natuurfilosofie en de geologie van Leopold von Buch. K Ned Akad Wet Proc Ser B 71(4):262–278
Porwal A, González-Álvarez I, Markwitz V, McCuaig TC, Mamuse A (2010) Weights-of-evidence and logistic regression modeling of magmatic nickel sulfide prospectivity in the Yilgarn Craton, Western Australia. Ore Geol Rev 38:184–196
Sagar BSD (2018) Mathematical morphology in geosciences and GISci: an illustrative review. In: Sagar BSD, Cheng Q, Agterberg FP (eds) Handbook of mathematical geosciences. Fifty years of IAMG. Springer, Heidelberg, pp 703–740
Schwarzacher W (1993) Cyclostratigraphy and the Milankovitch theory. Elsevier, Amsterdam
Sharpe DR, Barnett PJ, Brennand TA, Finley D, Gorrel G, Russell HAJ (1997) Surficial geology of the greater Toronto and Ridges Moraine areas, compilation map sheet. Geol Surv Can Open File 3062, map 1:200,000
Spiegelhalter DJ (1986) Uncertainty in expert systems. In: Gale WA (ed) Artificial intelligence and statistics. Addison-Wesley, Reading, pp 17–55
Staat W (1993) On abduction, deduction, induction and the categories. Transact Charles S Peirce Soc 29:97–219
Staub R (1928) Bewegungsmechanismen der Erde. Borntraeger, Stuttgart
Van Bemmelen RW (1961) The scientific nature of geology. J Geol:453–465
Von Buch L (1842) Ueber Granit und Gneiss, vorzüglich in Hinsicht der äusseren Form, mit welcher diese Gebirgsarten auf Erdflache erscheinen. Abh K Akad Berlin IV/2(18840):717–738
Wegener A (1966) The origin of continents and oceans, 4th edn. Dover, New York
Westergård AH, Johansson S, Sundius N (1943) Beskrivning till Kartbladet Lidköping. Sver Geol Unders Ser Aa 182
Winchester S (2001) The map that changed the world. Viking, Penguin Books, London
Yule GU (1912) On the methods of measuring association between two attributes. J Roy Stat Soc 75:579–642
International Generic Sample Number
Sarah Ramdeen and Kerstin Lehnert (Columbia University, Lamont-Doherty Earth Observatory, Palisades, NY, USA); Jens Klump (Mineral Resources, CSIRO, Perth, WA, Australia); Lesley Wyborn (Research School of Earth Sciences, Australian National University, Canberra, ACT, Australia)
Synonyms IGSN
Definition
IGSN (International Generic Sample Number) is a persistent identifier for material samples and specimens. Material samples (also referred to as physical samples) include objects such as cores, minerals, fossil specimens, synthetic specimens, water samples, and more. The concept of the IGSN was originally developed for the earth sciences but has seen growing adoption in other domains including, but not limited to, the natural and environmental sciences, materials sciences, physical anthropology, and archaeology. Persistent identifiers are unique labels given to an object (often a digital resource) in order to facilitate unique identification. In the case of IGSN, the identifier is assigned as part of a metadata record representing a material sample. The metadata record includes provenance and other information from along the sample's lifecycle, such as where the sample was collected and who collected it, and descriptive metadata ranging from spatial coordinates, collection method, and mineral classification to identifiers for related publications. Once an IGSN is assigned, it does not change; it is persistent and globally unique. The metadata record may be updated, but the identifier will remain unchanged. IGSNs provide sample citations in the literature which can be used to unambiguously track a sample and build connections between data and analyses derived from the same samples in different applications (Conze et al. 2017). A key feature of an IGSN is that it is "resolvable": IGSNs can be used to retrieve sample metadata by using Internet protocols to obtain the digital resource the IGSN identifies.
IGSN uses the Handle system (http://www.handle.net/), the same technical infrastructure used by Digital Object Identifiers (DOIs), to resolve an IGSN to the appropriate resource location on the web. To resolve an IGSN, one appends the identifier to the URL "https://hdl.handle.net/10273/". For example, to resolve the IGSN "BFBG-154433," visit https://hdl.handle.net/10273/BFBG-154433. The resulting URL will "resolve" to the webpage hosting the metadata profile for the sample represented by the IGSN. While IGSNs function similarly to DOIs, they do have important differences. DOIs are used to identify objects like journal articles, publications, and datasets. DOIs have standard metadata requirements, which are optimized for the types of objects they support. IGSNs have metadata requirements that are tailored to material samples. They include metadata fields that document the scientific nature of the samples and their provenance. The metadata fields can be community-specific, capturing the critical metadata for a particular community or domain. IGSNs, like other persistent identifiers, are important because they help support data management and comply with the principles for making scientific data and samples FAIR: findable, accessible, interoperable, and reusable (Wilkinson et al. 2016). FAIR samples support reproducible research, enable access to publicly funded assets, and allow researchers to leverage existing research investments by way of new analytical techniques or by augmenting existing data through reanalysis. For example, the use of IGSN allows scientists to unambiguously link analytical data for individual samples which were generated in different labs and published in different articles or data systems. Ultimately these identifiers will allow researchers to link data, literature, investigators, institutions, etc., creating an "Internet of samples" (Wyborn et al. 2017). IGSNs are recommended as persistent identifiers by the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS) and by publishers such as AGU and Elsevier.
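The resolution mechanism can be exercised programmatically. The Python sketch below is illustrative: it assumes the third-party requests library and network access, builds the Handle URL described above, and follows the redirect to the allocating agent's metadata profile page for the example IGSN quoted in the text.

```python
import requests

HANDLE_RESOLVER = "https://hdl.handle.net/10273/"

def resolve_igsn(igsn):
    """Follow the Handle redirect for an IGSN and return the landing-page URL."""
    response = requests.get(HANDLE_RESOLVER + igsn, allow_redirects=True, timeout=30)
    response.raise_for_status()
    return response.url  # metadata profile page hosted by the allocating agent

print(resolve_igsn("BFBG-154433"))
```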
History and Evolution of IGSN
The concept of the IGSN was developed in 2004 at the Lamont-Doherty Earth Observatory at Columbia University with support from the National Science Foundation (NSF) (Lehnert et al. 2004). The first registration agent (also known as an allocating agent) to issue IGSNs was SESAR, the System for Earth Sample Registration (www.geosamples.org). The IGSN e.V. (www.igsn.org), the implementation organization of the IGSN, was formed in 2011. It is an international, member-based, nonprofit organization which oversees the federated IGSN registration service and operates the IGSN Central Registry. The IGSN e.V. has more than 20 members (full and affiliate). The members are defined as
organizations who operate or are developing an allocating agent. Affiliate members are organizations which support the IGSN community and use existing allocating agents for sample registration. As part of the federated IGSN system, allocating agents “allocate” IGSN identifiers and ensure the persistence of metadata records and IGSN metadata profile pages. In 2018, the Alfred P. Sloan Foundation funded the IGSN 2040 project “to re-design and mature the existing organization and technical architecture of the IGSN to create a global, scalable, and sustainable technical and organizational infrastructure for persistent unique identifiers (PID) of material samples” (IGSN e.V. 2019). The primary outcomes of the project have led to negotiations with DataCite to establish a partnership which will provide scalable and sustained IGSN registration, development of a samples “Community of Communities” to scale activities such as standards for sample identifiers, community engagement, broader adoption of IGSN, and a redesigned technical architecture which embraces modern web technologies (Lehnert et al. 2021a).
Current System Architecture of the IGSN e.V.
In simplified terms, the IGSN system architecture (Fig. 1) is made up of four roles: the IGSN Central Registry, allocating agents, end users, and metadata portals. As previously mentioned, the IGSN Central Registry is based on the Handle System. The Handle System is a distributed system for assigning identifiers (handles) for digital objects while ensuring that the identifier is unique and persistent (Arms and Ely 1995). The IGSN Central Registry is a global server which stores the identifiers and registration metadata about the objects being identified, and provides a service to resolve identifiers. The IGSN Central Registry uses a REST API (https://doidb.wdc-terra.org/igsn/static/apidoc) for its federated registration service.
International Generic Sample Number, Fig. 1 Simplified system architecture of the IGSN registration. Source: https://igsn.github.io/system/
Allocating agents are the local service providers. They provide a range of services, including registration (or minting) services, to one or more sample repositories or data centers. See Table 1 below for a list of IGSN allocating agents and the community they serve. New agents will emerge as new communities adopt IGSN. Allocating agents capture registration metadata and descriptive metadata, which make up the IGSN metadata kernel. Registration metadata are the metadata elements required for the registration and administration of IGSN (http://schema.igsn.org/registration/). Descriptive metadata are dependent on the "community of practice" of the allocating agent and capture information used to catalog samples. This metadata is used to support discovery and access to samples. Descriptive metadata are not stored by the IGSN Central Registry. Allocating agents are responsible for storing the descriptive metadata and providing access to it for harvesting using the IGSN metadata kernel or Dublin Core.
End users of IGSN range from individual researchers or research teams to curators at sample repositories or museums. They may be using IGSN in their management of material sample collections, small and large. These users make decisions about what samples to register with IGSN and create metadata records. They register their sample metadata at an allocating agent. Allocating agents ensure the persistence of the submitted metadata and, depending on the agent, may provide curatorial review of the records to ensure they adhere to community standards. Ultimately, the users are responsible for maintaining the accuracy and quality of their metadata content. End users of IGSN also include those looking for existing samples to reuse. Samples may be reused in new and different ways or to validate previous work. Rich descriptive metadata allows new users to evaluate existing samples and may help to reduce the need to conduct field research to gather new samples at the same site.
Metadata portals are sample catalogs which harvest metadata from IGSN allocating agents. These metadata portals are used to support discovery and access to material samples. Metadata portals harvest sample metadata from allocating agents using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). IGSN metadata portals serve specific communities of practice. The various sample communities require different metadata to identify and describe the scientific qualities of their research objects as defined by their community. There isn't a "one size fits all" approach to sample metadata. Example community portals can be viewed at http://igsn2.csiro.au/ and http://www.geosamples.org/.
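An OAI-PMH harvest can be sketched with a plain HTTP request. The base URL in the sketch below is a placeholder (each allocating agent publishes its own OAI-PMH endpoint); the verb and metadataPrefix parameters and the XML namespace are part of the standard OAI-PMH 2.0 protocol.

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder endpoint: substitute the OAI-PMH base URL of an actual allocating agent
OAI_BASE_URL = "https://example.org/oai"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_records(base_url, metadata_prefix="oai_dc"):
    """Issue a standard OAI-PMH ListRecords request and return the parsed XML root."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    response = requests.get(base_url, params=params, timeout=60)
    response.raise_for_status()
    return ET.fromstring(response.content)

root = list_records(OAI_BASE_URL)
print(len(root.findall(f".//{OAI_NS}record")))  # number of harvested sample records in this page
```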
International Generic Sample Number, Table 1 List of IGSN allocating agents (allocating agent name; URL; community of practice):
• Australian Research Data Commons (ARDC); https://ardc.edu.au/services/identifier/igsn/; sample holders and curators affiliated with Australian research organizations that do not offer their own IGSN minting service
• Christian Albrechts Universität Kiel; https://www.ifg.uni-kiel.de/en/; Kiel facilities and staff
• CSIRO Mineral Resources; http://www.csiro.au/en/Research/MRF; CSIRO facilities and staff
• Cyber Carothèque Nationale (CNRS); https://www.igsn.cnrs.fr/; under development
• Geoscience Australia; http://ldweb.ga.gov.au/igsn/; Australian Government Geological Surveys and other Australian state and federal government users
• German Research Centre for Geosciences GFZ, Potsdam; http://dataservices.gfz-potsdam.de/mesi/overview.php?id=38; GFZ facilities and staff
• Institut Français de Recherche pour l'Exploitation de la Mer (IFREMER); https://campagnes.flotteoceanographique.fr/search; IFREMER facilities and staff
• Korea Institute of Geoscience & Mineral Resources (KIGAM); https://data.kigam.re.kr/; Korea's geoscience research and academic sectors
• MARUM Centre for Marine Environmental Sciences, Univ. Bremen; http://www.marum.de/; MARUM facilities and staff
• System for Earth Sample Registration (SESAR); http://www.geosamples.org/; supports registration by the general public
with a prefix which includes "super-namespace" codes among the characters, representing the allocating agent where the IGSN was minted (e.g., AU for Geoscience Australia). At SESAR, IGSNs have a five-character prefix, which includes the super-namespace code IE followed by a three-character personal namespace selected by the user. IGSNs are typically less than 20 characters in length (e.g., AU999971; IEUJA0011; ICDP5054EHW1001). The remaining characters after the super-namespace and personal namespace are randomly assigned during the registration process; depending on the allocating agent, they may also be user assigned. There are some exceptions to the five-character prefix: IGSNs registered at SESAR before the founding of the IGSN e.V. are grandfathered in with three-character prefixes. These IGSN do not include the super-namespace code signifying the allocating agent where the samples were registered, because the concept of the super-namespace was added once the federated concept of the IGSN e.V. was developed. IGSN operates using the handle namespace "10273." The address of the IGSN resolver is http://hdl.handle.net/10273. As previously outlined, an IGSN can be resolved by appending the IGSN to this address, for example, http://hdl.handle.net/10273/IEUJA0011. This address then redirects to the metadata profile page (Fig. 2) maintained by the allocating agent which registered the sample.
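As a minimal illustration of the identifier syntax and handle-based resolution described above, the following Python sketch validates an IGSN string against the pattern quoted earlier and builds the corresponding resolver address; the helper name and the reading of the braces as standard regex repetition counts are assumptions, not part of the IGSN specification.

import re

# Pattern quoted in the text for the IGSN name string (braces interpreted as
# standard regex repetition counts).
IGSN_PATTERN = re.compile(r"^[A-Za-z]{2,4}[A-Za-z0-9.-]{1,71}$")
HANDLE_RESOLVER = "http://hdl.handle.net/10273"  # IGSN handle namespace

def resolver_url(igsn: str) -> str:
    """Return the Handle System URL that redirects to the sample's metadata page."""
    if not IGSN_PATTERN.match(igsn):
        raise ValueError(f"{igsn!r} does not look like a valid IGSN string")
    return f"{HANDLE_RESOLVER}/{igsn}"

if __name__ == "__main__":
    for name in ["IEUJA0011", "AU999971", "ICDP5054EHW1001"]:
        print(name, "->", resolver_url(name))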
As previously mentioned, the IGSN metadata kernel is divided into two major categories, registration metadata and descriptive metadata. Registration metadata are the metadata elements required by the IGSN e.V. as part of the minting process. Descriptive metadata are dependent on the community of practice of the allocating agent. The IGSN e.V. has a suggested list of descriptive metadata which "provide a minimum set of elements to describe a geological sample. The schema only contains elements that do not change over time, in analogy to being a 'birth certificate' of a sample" (IGSN e.V. 2016). Descriptive metadata are not prescribed by the IGSN e.V., as the allocating agents are best situated to determine what metadata are required to support their community of practice. The IGSN metadata kernel aligns with the DataCite Metadata Schema 4.0.
Future Technical Architecture of the IGSN
In order to increase sustainability and scale IGSN registration capabilities to the billions, the IGSN 2040 project has developed a new architecture for the IGSN system and services (Klump et al. 2020). The redesign involves moving from XML-based metadata schemas to web architecture principles, under which metadata would be stored using JSON-LD. The use of JSON-LD will allow allocating agents greater flexibility in supporting community-specific variations of their metadata profiles. Ultimately, the use of web architecture principles will support the scaling of publishing and harvesting of IGSNs to billions of samples, which will lead to broader services for metadata portals (referred to as metadata aggregators in the proposed architecture) and end users. These services include the support of knowledge graphs which might connect samples to other related identifiers or resources.
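To make the planned move to JSON-LD slightly more concrete, the sketch below serializes a sample description as JSON-LD using Python's standard library; the vocabulary, field names, and sample shown are purely illustrative assumptions and do not reproduce the official IGSN or DataCite schemas.

import json

# Hypothetical JSON-LD description of a registered sample; the vocabulary and
# field names below are illustrative only, not the official IGSN schema.
sample = {
    "@context": {"schema": "http://schema.org/"},
    "@type": "schema:Thing",
    "@id": "http://hdl.handle.net/10273/IEUJA0011",
    "schema:identifier": "IEUJA0011",
    "schema:name": "Example rock sample",
    "schema:description": "Hand specimen described here for illustration only.",
}

print(json.dumps(sample, indent=2))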
Adoption of IGSN As of 2021, nearly ten million samples have been registered with IGSN. IGSN has been adopted by geological surveys in
International Generic Sample Number, Fig. 2 Sample metadata profile. (Source: IEUJA0011)
I
the USA, the UK, Australia, Korea, and other organizations and universities globally. In the USA, current users include the Smithsonian Institution, the US Department of Energy, NOAA-NCEI, US core repositories, and the International Continental Drilling Program. IGSN is a recommended best practice in support of Open and FAIR samples by publishers and sample focused communities. IGSN can be used to track samples and link sample data along its lifecycle from field collection to analysis and publication and to archive and reuse. IGSN are being incorporated into data systems in order to create an efficient, collaborative pipeline (Lehnert et al. 2021b). This includes iSamples (www. isamples.org), StraboSpot (www.strabospot.org), EarthChem (www.earthchem.org), SPARROW (sparrow-data.org), and Macrostrat (www.macrostrat.org).
Moving Forward IGSN e.V. and DataCite will finalize their partnership in late 2021. As a result, the system and technical architecture of the IGSN may undergo some changes. This includes the implementation of the newly developed technical architecture built on web architecture principles. As interest and adoption of IGSN are expanding beyond the earth sciences, the IGSN e.V. will be changing the definition of the IGSN acronym. The change from geo to something more inclusive will better represent the diverse sample community around the IGSN. The change is expected to be announced in early 2022.
Summary IGSN is a persistent identifier for material samples and specimens. They are used as citations to samples which can be tracked throughout their lifecycle. IGSN are globally unique and are resolvable, meaning they use Internet protocols to direct users to a persistent webpage which hosts metadata about the sample they represent. IGSN can be obtained from one of the allocating agents listed in Table 1. Allocating agents represent different communities of practice and may have different requirements for the descriptive metadata about a sample. The IGSN e.V., the implementation office of the IGSN, supports the IGSN Central Registry. Allocating agents mint IGSN through the Central Registry and must make their sample metadata available for harvesting by metadata portals. Allocating agents may offer these services themselves. The IGSN 2040 project made recommendations to scale the IGSN towards the future. This includes a partnership with DataCite and the design of a web architecture-based system. These changes will enable broader services related to the IGSN and allow IGSN to scale towards the future.
Cross-References ▶ Data Acquisition ▶ Data Life Cycle ▶ FAIR Data Principles ▶ Geoinformatics
▶ Machine Learning ▶ Metadata ▶ Open Data
Bibliography Arms W, Ely D (1995) The handle system: a technical overview. Available via http://www.cnri.reston.va.us/home/cstr/handle-overview. html. Accessed 30 Sept 2021 Conze R, Lorenz H, Ulbricht D, Elger K, Gorgas T (2017) Utilizing the international generic sample number concept in continental scientific drilling during ICDP expedition COSC-1. Data Sci J 16:2. https:// doi.org/10.5334/dsj-2017-002 IGSN e.V. (2016) Metadata. In: Technical Documentation of the IGSN. Available via https://igsn.github.io/metadata/. Accessed 30 Sept 2021 IGSN e.V. (2019) IGSN 2040. In: IGSN. Available via https://www.igsn. org/igsn-2040/. Accessed 30 Sept 2021 Klump J, Lehnert K, Wyborn L, Ramdeen S (2020) IGSN 2040 technical steering committee meeting report. Zenodo https://doi.org/10.5281/ zenodo.3724683 Lehnert K, Goldstein S, Lenhardt W, Vinayagamoorthy S (2004) SESAR: addressing the need for unique sample identification in the Solid Earth Sciences. American Geophysical Union Fall Meeting 2004 SF32A-06 Lehnert K, Klump J, Ramdeen S, Wyborn L, HaakL (2021a) IGSN 2040 summary report: defining the future of the IGSN as a global persistent identifier for material samples. Zenodo https://doi.org/10.5281/ zenodo.5118289 Lehnert K, Quinn D, Tikoff B, Walker D, Ramdeen S, Profeta L, Peters S, Pauli J (2021b) Linking data systems into a collaborative pipeline for geochemical data from field to archive. EGU General Assembly Conference Abstracts 2021EGUGA..2313940L Wilkinson M, Dumontier M, Aalbersberg I et al (2016) The FAIR guiding principles for scientific data management and stewardship. Sci Data 3:160018. https://doi.org/10.1038/sdata.2016.18 Wyborn L, Klump J, Bastrakova I, Devaraju A, McInnes B, Cox S, Karssies L, Martin J, Ross S, Morrissey J, Fraser R (2017) Building an Internet of Samples: The Australian Contribution. EGU General Assembly Conference Abstracts 2017EGUGA..1911497W
Interpolation Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Interpolation is the methodology by which unknown values of a variable are computed within a range of discrete data points using their known values. These values, both known and unknown, are governed by the assumption of an underlying continuous function. Interpolation involves identifying or approximating this continuous function, which is then used to compute the values at desired data points.
Definition and History
For interpolation of a single variable x, the ordered set of tuples S is known, such that S = {(x_0, y_0), (x_1, y_1), ..., (x_n, y_n)}, where the x_i for i = 0, 1, ..., n are sorted in increasing order; the interpolating function, or interpolant, f(x) is then determined such that f(x_i) = y_i for all i in [0, n], i ∈ ℤ+. This is referred to as univariate interpolation. Hereafter, the "data points" (x, y) will simply be referred to as "points" for the sake of readability. Similarly, multivariate interpolation deals with interpolation of multiple variables, which can be seen as analogous to reconstruction of a continuous function in a multidimensional space. These functions are represented in the form of polynomials. Thorvald Thiele defined interpolation in his 1909 book Interpolationsrechnung as the "art of reading between the lines in a numerical table" (Meijering 2002). In science and engineering, mathematically modeling discrete values as a continuous function gives a tensor field, which is a generalization of the scalar field and vector field in its zeroth and first order, respectively. The discrete points used for determining the interpolant are obtained through sampling, observation, or experimentation. Interpolation is the process by which the continuous function of a variable, or the field, is reconstructed from its discrete values. In this regard, sampling based on the Nyquist theorem is needed to reduce the reconstruction error. Within the scope of reconstruction, there is a salient difference between approximation and interpolation. The approximation theorem states that every continuous function on a closed interval can be approximated uniformly by a polynomial to a predetermined accuracy. However, the approximating function, unlike the interpolating one, need not guarantee that the function value at the discrete points equals the known values. The history of interpolation dates back to the seventeenth century, in the age of the scientific revolution, even though there is evidence of its usage in texts from ancient Babylon and Greece, early medieval China and India, and the late medieval Arab world, Persia, and India (Meijering 2002). Interpolation theory was further developed in Western countries by scientists, notably including Copernicus, Kepler, Galileo, Newton, and Lagrange, to promote their work in astronomy and physics.
Univariate Interpolation The simplest forms of interpolation are implemented using univariate interpolants. There are several widely used interpolants – linear, piecewise linear, polynomial, trigonometric interpolants, etc. Linear interpolation is by far the most popular one used across different domains as it has the best tradeoff between accuracy and efficiency. The criteria for picking an interpolant for a specific computational application
Interpolation, Fig. 1 Piecewise linear interpolation generated by adding points progressively. The newly added points are annotated by the red circles on the visualization of linear interpolation. (Image source: The screenshots of visualization of the interpolation were generated from an online browser-based application, Online Tools, developed by Timo Denk, https://tools.timodenk.com)
depend on both the required accuracy and the efficiency, measured using the computation time. For example, computer graphics applications, including geovisualization, use linear interpolation owing to two reasons. Firstly, the prescribed accuracy for the color mapping, i.e., mapping colors to real values, is limited by the human perception of color. Secondly, the expected computational efficiency lies in creating interactive graphical user interface (GUI) applications. Linear interpolation entails finding the line equation for two given points, described by the two tuples of (x, y) values. For two discrete points (x_1, f(x_1)) and (x_2, f(x_2)), the linear interpolant is given by f(x) = [f(x_2) − f(x_1)]/(x_2 − x_1) · (x − x_1) + f(x_1). In order to reduce the reconstruction error, linear interpolation can be extended to piecewise linear interpolation, which introduces linear interpolants between consecutive tuples in the ordered list of tuples, sorted by their x-value. Figure 1 shows how the piecewise linear interpolation can be progressively generated. The degree of the interpolant can be increased by including more points to generate each interpolant, as shown in Fig. 2. In such a scenario, each linear interpolant is substituted by an interpolant of higher degree, using derivatives in the case of a quadratic function and knots in the case of a cubic spline function.
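A minimal sketch of piecewise linear interpolation in Python, using NumPy's interp; the sample points are arbitrary illustrative values.

import numpy as np

# Known data points (x must be sorted in increasing order for np.interp).
x = np.array([0.0, 1.0, 2.5, 4.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# Piecewise linear interpolation at query locations inside the data range.
xq = np.linspace(0.0, 4.0, 9)
yq = np.interp(xq, x, y)

for a, b in zip(xq, yq):
    print(f"f({a:.2f}) = {b:.3f}")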
Extending the examples so far, each segment in the piecewise interpolants can include more than two points, thus using n points in each segment to have a polynomial interpolant of degree (n – 1), at the most. The key aspect of piecewise interpolants is that the points at the end of the segment guarantee up to C0 continuity, i.e., continuity by the function value or the zeroth derivative. The interpolant type and application revolve around the mathematical function that gives the function values at the sample points. Apart from polynomials, the interpolants can be exponential, logarithmic, trigonometric, etc. There are also specific interpolation methods through the use of Gaussian processes, such as kriging, which has several variants of its own, e.g., simple, ordinary, and regression kriging (see the chapter on “Kriging” for more details).
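For the higher-degree piecewise interpolants discussed above, a cubic spline can be constructed in a few lines, assuming SciPy is available; the data here are again only illustrative.

import numpy as np
from scipy.interpolate import CubicSpline

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.sin(x)

# Cubic spline interpolant with C2 continuity at the interior knots.
cs = CubicSpline(x, y)

xq = np.array([0.5, 1.5, 2.5, 3.5])
print(cs(xq))     # interpolated values
print(cs(xq, 1))  # first derivative of the interpolant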
Multivariate Interpolation Now, interpolation can be extended from a single variable to multiple ones. Multivariate interpolation leads to higher dimensional interpolation, e.g., bilinear, and trilinear interpolants. These extensions of linear interpolants involve independent interpolation across different dimensions. This is
Interpolation, Fig. 2 Increasing the degree of interpolation in piecewise functions for the same set of points used in piecewise linear interpolation given in Fig. 1, going from linear to quadratic to cubic, from left to right. (Image source: The screenshots of visualization of the interpolation were generated from an online browser-based application, Online Tools, developed by Timo Denk, https://tools.timodenk.com)
done by adding more points that are interpolated in one dimension, but used as sample points in another dimension, with the constraint that all dimensions are used for interpolation only once. For two-dimensional (2D) and three-dimensional (3D) spatial applications, it is important to discuss bivariate and trivariate interpolation, respectively. As already discussed, interpolation involves the identification of the known values that would generate the interpolant. These known values in all applications are essentially immediate or local neighbors. Finding neighborhoods is straightforward in gridded and structured datasets as the indices, which are used to search and retrieve a point, are also helpful in finding the neighbors. For instance, in a 3D lattice structure, a grid or lattice point P(i, j, k) with indices along x, y, and z axes, respectively, has 26 local neighbors whose indices are constrained within [i – 1, i þ 1], [j – 1, j þ 1], and [k – 1, k þ 1] simultaneously. However, in the case of scattered (2D or 3D) points, there are several methods that could be used once neighbors are identified. Identifying neighbors is conventionally based on sorting the distance between the concerned point and the plausible neighboring points. Several constraints can be applied as neighborhood search criteria, e.g., spherical radius and k-nearest neighbors. Since the neighborhood is determined using the distance measure, which itself could be the
conventionally used Euclidean distance, or the Hamming, Chebyshev, and other distances, special interpolants are used for scattered points. These include Shepard's method [inverse distance weighting (IDW)] (Shepard 1968), Sibson's interpolation (natural neighbor interpolation) (Sibson 1981), and radial basis function interpolation (Hardy 1971). These interpolants have a global as well as a local variant. In the global variant, all points are considered, whereas only local neighbors are considered in the local variant. The global or local extent of the points considered is selected for computing weights for the known values. Consider the case of IDW to understand the weighting process better, where the interpolant f(x) is computed using n selected points, such that x, x_i ∈ ℝ^d (i.e., d-dimensional space) and the f(x_i) are the known values. Then, with the distance between x and y given as d(x, y) and the distance-decay parameter p ∈ ℤ+,
f(x) = f(x_k) if d(x, x_k) = 0 for some i = k; otherwise, if d(x, x_i) ≠ 0 for all i,
f(x) = [Σ_{i=1}^{n} w_i(x) f(x_i)] / [Σ_{i=1}^{n} w_i(x)], with w_i(x) = d(x, x_i)^(−p).
In a variant of IDW, the distance-decay parameter has been adaptively computed using the point pattern of the local neighborhood (Lu and Wong 2008).
Interpolation, Fig. 3 An example of a choropleth map, as generated of India, using the Orange (anaconda) Python library, an open-source data visualization and mining software. (Image courtesy: https://orangedatamining.com/widget-catalog/geo/choroplethmap/)
IDW and its adaptive variants have been routinely used for 3D geological modeling (Liu et al. 2021).
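The following sketch implements the Shepard-type weighting defined above for scattered points; the function name, the coincidence tolerance, and the example data are assumptions made for illustration.

import numpy as np

def idw(x, points, values, p=2):
    """Inverse distance weighting estimate at location x.

    points: (n, d) array of sample locations; values: (n,) array of samples;
    p: distance-decay exponent.
    """
    x = np.asarray(x, dtype=float)
    d = np.linalg.norm(points - x, axis=1)
    # If the query coincides with a sample point, return its value directly.
    hit = np.isclose(d, 0.0)
    if hit.any():
        return values[hit][0]
    w = d ** (-p)
    return np.sum(w * values) / np.sum(w)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([1.0, 2.0, 3.0, 4.0])
print(idw([0.25, 0.25], pts, vals, p=2))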
Applications There are several uses of interpolation, of which data visualization has several applications. Contour extraction is a visualization technique that entails the use of linear interpolation and its higher-dimensional variants in gridded datasets for finding contour lines in 2D and isosurfaces in 3D space. Contour lines are computed using Marching Squares and isosurfaces using the Marching Cubes method (Lorensen and Cline 1987), in 2D and 3D datasets, respectively. Contour lines or isosurfaces are defined as a locus of points all having the same function value. For example, contour tracking has been effectively used for superresolution border segmentation of remotely sensed images for geomorphologic studies (Cipolletti et al. 2012). Linear interpolation is also widely used in several visualization techniques such as streamline computation, direct volume rendering, graphical rendering techniques with shading models, etc. In visualization, linear interpolation is used in
color mapping that has varied applications, such as choropleth maps, i.e., cartographic maps with colors in regions within geopolitical or other boundaries. As shown in Fig. 3, the color legend in the choropleth map indicates how color is linearly interpolated based on the known values and discretized for the numerical bins of the values. The choice of interpolation techniques is based on performance – efficiency and accuracy. Reconstruction errors are to be studied for the latter. Apart from these metrics, the level of continuity that is needed from the interpolant, e.g., C0, C1 continuity, is determined by the application, and it governs the choice of the interpolant.
Future Scope Interpolation has been ubiquitous for mathematical modeling required for a variety of applications in geoscience. It has found its place as a key step in modern data science workflows, such as deep learning. For instance, residual learning networks (ResNets) have been used for seismic data interpolation (Wang et al. 2019).
Cross-References ▶ Data Visualization ▶ K-nearest Neighbors ▶ Kriging ▶ Neural Networks
in nondecreasing order, i.e., x_{1:n} ≤ x_{2:n} ≤ ··· ≤ x_{n:n} are the order statistics of the sample. The lower quartile is q_{1/4}(x) = x_{n/4:n}, the upper quartile is q_{3/4}(x) = x_{3n/4:n}, and the interquartile range is IQR(x) = q_{3/4}(x) − q_{1/4}(x). Sample sizes for which n/4 is not an integer require a special definition.
Usage and Interpretation Bibliography Cipolletti MP, Delrieux CA, Perillo GM, Piccolo MC (2012) Superresolution border segmentation and measurement in remote sensing images. Comput Geosci 40:87–96 Hardy RL (1971) Multiquadric equations of topography and other irregular surfaces. J Geophys Res 76(8):1905–1915 Liu Z, Zhang Z, Zhou C, Ming W, Du Z (2021) An adaptive inversedistance weighting interpolation me thod considering spatial differentiation in 3D geological modeling. Geosciences 11(2):51 Lorensen WE, Cline HE (1987) Marching cubes: a high resolution 3D surface construction algorithm. ACM SIGGRAPH Comp Graph 21(4):163–169 Lu GY, Wong DW (2008) An adaptive inverse-distance weighting spatial interpolation technique. Comput Geosci 34(9):1044–1055 Meijering E (2002) A chronology of interpolation: from ancient astronomy to modern signal and image processing. Proc IEEE 90(3): 319–342 Shepard D (1968) A two-dimensional interpolation function for irregularly-spaced data. In: Proceedings of the 1968 23rd ACM national conference, pp 517–524 Sibson R (1981) A brief description of natural neighbour interpolation (Chapter 2). In: Interpreting multivariate data, pp 21–36 Wang B, Zhang N, Lu W, Wang J (2019) Deep-learning-based seismic data interpolation: a preliminary result. Geophysics 84(1): V11–V20
The interquartile range is analogous to the standard deviation in the sense that it measures the dispersion or scale of a sample. While the standard deviation computes the spread around the mean, the interquartile range depends on the sample's order statistics. The interquartile range is, therefore, a robust measure in the sense that a few outlying observations have little or no effect on it. Such robustness justifies the use of the interquartile range in outlier identification procedures. If X ~ N(μ, σ²) denotes a random variable with mean μ ∈ ℝ and standard deviation σ > 0, then its interquartile range is 2σΦ^{−1}(3/4), where Φ is the standard normal cumulative distribution function. This value can be approximated by 1.349σ, or by 27σ/20. With this in mind, the sample standard deviation and 20/27 times the interquartile range of the sample are comparable, provided the data are outcomes of independent, identically N(μ, σ²)-distributed random variables. Comparing these two quantities is a simple way of checking this last hypothesis. The interquartile range appears in the boxplot as the distance between the hinges.
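A short numerical check of the 1.349σ relationship for Gaussian data; the seed and sample size are arbitrary.

import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=10_000)

q1, q3 = np.quantile(x, [0.25, 0.75])
iqr = q3 - q1

# For Gaussian data, IQR is about 1.349 sigma, so IQR/1.349 should be close to
# the sample standard deviation; a large discrepancy hints at heavy tails or outliers.
print("sample std  :", x.std(ddof=1))
print("IQR / 1.349 :", iqr / 1.349)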
Definitions for All Cases of n ≥ 4
Interquartile Range Alejandro C. Frery School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
Hyndman and Fan (1996) provide a detailed discussion about the ways in which order statistics can be computed. These definitions are different estimators of the quantiles of a distribution. The (population or distributional) quantile of order α ∈ (0, 1) of a distribution is
Q_α = F^{−1}(α) = inf{x ∈ ℝ : F(x) ≥ α},
Synonyms H-spread; Middle 50%; Midspread
Definition The interquartile range is a measure of dispersion or scale based on order statistics. Consider the univariate sample x ¼ (x1, x2, . . ., xn). For simplicity, assume that the sample size n is such that n/4 is an integer. Denote x ¼ ðx1:n , x2:n , . . . , xn:n Þ the sample x sorted
where F is the cumulative distribution function that characterizes the distribution of the random variable X. This definition encompasses all types of distributions. Three values of α render well-known names: Q1/4 is the lower or first quartile, Q1/2 is the median or second quartile, and Q3/4 is the upper or third quartile. The thick red line in Fig. 1 shows the cumulative distribution function of a beta random variable with both shape parameters equal to 3, i.e.,
Interquartile Range, Fig. 1 Populational and sample quantiles
F(x) = 0 if x < 0; F(x) = 30(x³/3 − x⁴/2 + x⁵/5) if 0 ≤ x ≤ 1; F(x) = 1 if x > 1.   (1)
The population (or "distributional") first quartile, median, and third quartile are, respectively, Q_{1/4} = 0.359436, Q_{1/2} = 1/2, and Q_{3/4} = 0.640564 (the first and last values were obtained numerically). The red arrows in Fig. 1 show how they stem as the inverse of F. Figure 1 also shows a classical estimator of F based on the sample x = (x_1, x_2, ..., x_n), namely, the empirical cumulative distribution function F̂(x): F̂(t) = (1/n) #{ℓ : x_ℓ ≤ t}, for every t ∈ ℝ, where we omitted the dependence of F̂ on x, and #A denotes the cardinality of the set A. This function appears as "stairs" of intervals closed to the left and open to the right, with height 1/n. This function, or any other estimator of F, is the basis for computing sample quantiles of arbitrary order. We now take a sample of size n = 12 from the Beta distribution characterized by the cumulative distribution function given in (1). This particular sample x is such that the first and third quartiles are q_{1/4} = x_{3:12} = 0.29 and q_{3/4} = x_{9:12} = 0.70 (after rounding). Black arrows show their position, and we notice that q_{1/4} < Q_{1/4} and that q_{3/4} > Q_{3/4}. In this case, we computed the sample median as the midpoint between x_{n/2:n} and x_{n/2+1:n}. After rounding, its value is q_{1/2} = 0.55 and, therefore, q_{1/2} > Q_{1/2}. Similar approaches are taken to
compute the first and third quartiles of samples of sizes n such that n/4 ∉ ℕ. Hyndman and Fan (1996) define nine ways of computing q_α(x), for α ∈ (0, 1). All of these approaches are a weighted average of consecutive order statistics:
(i)q_α(x) = (1 − (i)γ) x_{j:n} + (i)γ x_{j+1:n},   (2)
where i = 1, 2, ..., 9 is the method, (j − m)/n ≤ α ≤ (j − m + 1)/n, and (i)γ is a function of j = ⌊αn + m⌋ and g = αn + m − j. Any method should satisfy the following properties:
P1) (i)q_α(x) is continuous on α ∈ (0, 1).
P2) The usual definition of median is respected: (i)q_{1/2}(x) = x_{(n+1)/2:n} if n is odd, and (i)q_{1/2}(x) = (x_{n/2:n} + x_{n/2+1:n})/2 if n is even.
P3) The tails of the sample are treated equally: #{ℓ : x_ℓ ≤ (i)q_α(x)} = #{ℓ : x_ℓ ≥ (i)q_{1−α}(x)}.
P4) The sample and distributional quantiles are analogous, i.e., #{ℓ : x_ℓ ≤ (i)q_α(x)} ≥ αn.
P5) Where the inverse (i)q^{−1}(x) is uniquely defined, it holds that a) (i)q^{−1}(x_{1:n}) > 0 and b) (i)q^{−1}(x_{n:n}) = 1, depending on the model under consideration.
Shepard's method (Shepard 1968) is a variation on inverse power weighting, with two different weighting functions using two separate neighborhoods. The default weighting function for Shepard's method is an exponent of 2 in the inner neighborhood and an exponent of 4 in the outer neighborhood:
f = [ Σ_i Z_i / D_i² + Σ_{i=k}^{n} Z_i / D_i⁴ ] / [ Σ_i 1 / D_i² + Σ_{i=k}^{n} 1 / D_i⁴ ]
ð2Þ
Neighborhood Size The neighborhood size determines how many points are included in the inverse distance weighting. The neighborhood size can be specified in terms of its radius (in km), the number of points, or a combination of the two. If a radius is specified, the user also can specify an override in terms of a minimum and/or maximum number of points. Invoking the override option will expand or contract the circle as needed. If the user specifies the number of points, as an override of a minimum and/or maximum radius can be included. It also is possible to specify an average radius “R” based upon a specified number of points. Again, there is an override to expand or contract the neighborhood to include a minimum and/or maximum number of points. In Shepard’s method, there are two circular neighborhoods; the inner neighborhood is taken to be one-third the radius of the outer radius. Anisotropy Correction In many instances, the observation points are not uniformly spaced about the interpolation points. Several points are in a particular direction and fewer in direction orthogonal to the dominant direction. This condition is known as anisotropic condition. This situation produces a spatial bias of the estimate, as the clustered points carry an artificially large weight. The anisotropy corrector permits the weighted average to downweight clustered points that are providing redundant information. The user selects this option by setting the anisotropy factor to a positive value. A value of 1 produces its full effects, while a value of 0 produces no correction. This correction factor is defined by computing the angle between every pair of observation points in the neighborhood, relative to the observation point. Gradient Correction A disadvantage of the inverse weighted distance functions is that the function is forced to have a maximum or minimum at the data points (or on a boundary of the study region). The gradient corrector permits a nonzero gradient at the observation points. The implementation is such that the interpolated value is the sum of two values.
One of the key governing factors of Eq. 1 is the number of samples n and their spatial alignment around the point of interest. Many geometric approaches are in practice for selecting the data points. In most software implementations, a circular filter with radius R around the point of interest is used to select the sample locations; other implementations use a rectangular or square filter around the point of interest. If the values of the phenomenon have to be computed for gridded locations from sampled scatter points, such as the generation of a DEM (Digital Elevation Model) from scattered surveyed heights, then a moving circular disk or rectangle is used to compute the values at uniformly spaced grid locations.
How Inverse Distance Weighted Interpolation Works? Inverse distance weighted (IDW) interpolation explicitly makes the assumption that things that are close to one another are more alike than those that are farther apart. To predict a value for any unmeasured location, IDW uses the measured values surrounding the prediction location. The measured values closest to the prediction location have more influence on the predicted value than those farther away. IDW assumes that each measured point has a local influence that diminishes with distance. It gives greater weights to points closest to the prediction location, and the weights diminish as a function of distance, hence the name inverse distance weighted. Weights assigned to data points are illustrated in the example 1. The geometric shape and dimension of the search neighbor is an important factor in effectiveness of IDW. It is decided depending upon the spatial distribution of the sampled data locations and the characteristic of the sampled data. Some of the frequently used geometric search patterns employed in IDW are: (a) radial search with mean radius “R”, (b) spherical geometry with mean radius “R” for scattered data in 3D, (c) rectangular geometry with (Length(L) x Width(W)), (d) elliptic geometry with (semi major axis(a) and semi minor axis (b)), and (e) disk-like structure with inner radius “r” and outer radius “R” as depicted in Fig 1 a, b, c, d. The geometric structure and its dimension decide the search neighbor for inclusion and exclusion of the values of sampled points for participating in computation of unknown value using IDW. Once search neighbor is decided the next step for software implementation of IDW is to answer the question whether location (xi, yi) from among the sampled locations is within the sample neighbor? This question can be answered using computational geometric queries such as the following:
Inverse Distance Weight
(a) (b) (c) (d)
Point_inside_circle(Xi, Yi, R) Point_inside_ellipse(Xi, Yi, a, b) Point_ inside_rectangle(Xi, Yi, L, W) Point_inside_Sphere (Xi, Yi, R)
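A possible Python rendering of the membership tests listed above; the signatures are assumptions (the center of the neighborhood is passed explicitly here) and are meant as a sketch rather than the API of any particular GIS package.

import math

# Minimal versions of the neighborhood membership tests; (cx, cy) is the
# location of the point whose value is to be estimated.
def point_inside_circle(xi, yi, cx, cy, r):
    return math.hypot(xi - cx, yi - cy) <= r

def point_inside_ellipse(xi, yi, cx, cy, a, b):
    return ((xi - cx) / a) ** 2 + ((yi - cy) / b) ** 2 <= 1.0

def point_inside_rectangle(xi, yi, cx, cy, length, width):
    return abs(xi - cx) <= length / 2 and abs(yi - cy) <= width / 2

print(point_inside_circle(3.0, 4.0, 0.0, 0.0, 5.0))        # True: distance is exactly 5
print(point_inside_ellipse(3.0, 0.0, 0.0, 0.0, 2.0, 1.0))  # False: outside the ellipse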
If the Point Inside Neighbor geometry returns TRUE, then the corresponding value Vi of (Xi, Yi) is included in the IDW computation; otherwise it is discarded. Further, if the spatial extent of the physical phenomenon is spread over a large area marked by a bounding rectangle (L × W), then the values at the different locations are obtained from the scattered data by repeatedly applying the moving neighborhood operator over the entire spatial domain.
To consolidate the concept of IDW, let us analyze a real-life example. One needs to compute the depth of snow cover at a high-slope, hazard-prone location from the values of snow precipitation surveyed at a set of known locations surrounding the location in question. This scenario is depicted in Fig. 2. The field observations at five locations surrounding the point, and their distances, are tabulated in Table 1. Using IDW, the value of precipitation at the location in question is computed using Eq. 3, and Eq. 4 shows the corresponding computation with the inverse square distance weight (ISDW). IDW and ISDW are applied to the data separately to compute the snow precipitation at the location shown in Fig. 2:
(34 × 1/5 + 42 × 1/8 + 17 × 1/14 + 52 × 1/12 + 27 × 1/9) / (1/5 + 1/8 + 1/14 + 1/12 + 1/9) = 20.597589 / 0.590873 ≈ 34.86   (3)
(34 × 1/5² + 42 × 1/8² + 17 × 1/14² + 52 × 1/12² + 27 × 1/9²) / (1/5² + 1/8² + 1/14² + 1/12² + 1/9²) = 2.797387 / 0.080017 ≈ 34.96   (4)
The values computed at the location using IDW and ISDW are therefore approximately 34.86 and 34.96, respectively. One can see that the variation between the values computed by IDW and ISDW is very small, indicating the effectiveness and resilience of the method. It is therefore highly encouraged to use the IDW or ISDW method judiciously to compute an estimate of the phenomenon. Often the choice of the type and dimension of the search neighborhood for filtering the spatial locations which can participate in the computation of the unknown value using IDW is governed by many factors:
Inverse Distance Weight, Fig. 1 (a) Sampled locations within a circular neighbor, (b) elliptic neighbor for flow like phenomena, (c) circular disk neighbor, (d) spherical neighbor with undulated terrain
(a) Whether the phenomenon is random and scattered spatially.
(b) Whether the phenomenon is directional, like the flow of air or water.
(c) Whether the phenomenon is static or influenced more by local factors, such as a seizure originating in the brain.
Depending on the type of physical phenomenon and its spatial behavior, the neighborhood geometry is selected:
• For random static phenomena the neighborhood geometry is circular with radius "R."
• For air flow or fluid flow kinds of phenomena it is recommended to use the elliptic structure.
• For chemical concentration, or computation of the value of precipitation of rain or snow, a circular neighborhood geometry is more suitable.
• For computing the pressure and fault lines in mines, the slope and aspect of undulated terrain, and geological exploration, a sphere-like structure is preferable.
• To find the exact location in the brain from where seizure-like EEG signals are emanating, a spherical neighborhood geometry is more suitable.
The Behavior of Power Function
As mentioned above, weights are proportional to the inverse of the distance (between the data point and the location of prediction) raised to the power value p. As a result, as the
20
Inverse Distance Weight, Fig. 3 Relative weight of the known values decrease rapidly with increase of value of power factor “p.” This is illustrated in the plot weight-distance with varying “p”
Inverse Distance Weight, Fig. 2 Snow precipitation measured at different Locations
Inverse Distance Weight, Table 1 Precipitation measured in known locations (precipitation of snow in mm; distance from the point of unknown value; inverse of distance 1/D; square of inverse of distance 1/D²):
• 34; 5; 1/5 = 0.2; 1/5² = 0.04
• 42; 8; 1/8 = 0.125; 1/8² = 0.015625
• 17; 14; 1/14 = 0.071428; 1/14² = 0.005102
• 52; 12; 1/12 = 0.083333; 1/12² = 0.006944
• 27; 9; 1/9 = 0.111111; 1/9² = 0.012345
distance increases, the weights decrease rapidly. The rate at which the weights decrease depends on the value of p. If p = 0, there is no decrease with distance, and because each weight is the same, the prediction will be the mean of all the data values in the search neighborhood, so IDW acts like a moving average interpolator. As p increases, the weights for distant points decrease rapidly. If the p value is very high, only the immediately surrounding points will influence the prediction. The weights become asymptotic with increasing distance, making the values at faraway locations less significant in the overall computation. This behavior is depicted in the plot (Fig. 3). In most geostatistical analyses the value of p is chosen to be 2, although there is no theoretical justification to prefer this value over others, and the effect of changing p should be investigated by previewing the output and examining the cross-validation statistics.
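The estimates for the snow example in Table 1 can be recomputed for different power values with a few lines of Python; the helper name is arbitrary, and the printed values follow directly from the tabulated distances and precipitation values.

# Snow precipitation example from Table 1: values and distances to the
# prediction location; p = 1 reproduces IDW, p = 2 the inverse square variant.
values = [34, 42, 17, 52, 27]
dists = [5, 8, 14, 12, 9]

def idw_estimate(values, dists, p):
    weights = [d ** (-p) for d in dists]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

print("IDW  (p=1):", round(idw_estimate(values, dists, 1), 2))  # about 34.86 mm
print("ISDW (p=2):", round(idw_estimate(values, dists, 2), 2))  # about 34.96 mm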
The Selection of Search Neighborhood The philosophy of IDW modeling is that “things that are close to one another are more alike and related than those that are farther away.” As the locations get farther away, the measured values will have little relationship to the value of the prediction location. Also to speed us computations, one need to select a fixed set of known values from spatially close range and exclude the more distant points that will have little influence on the prediction. As a result, it is common practice to limit the number of measured values by specifying a search neighborhood. The shape of the neighborhood restricts how far and where to look for the measured values to be used in the prediction. Other neighborhood parameters restrict the locations that will be used within that shape. In the following image, five measured points (neighbors) will be used when predicting a value for the location without a measurement, the yellow point. The shape of the neighborhood is influenced by the input data and the surface one is trying to create from the data. If there are no directional influences in the data, and observation points are equally distributed in all the directions then simple circular or spherical structure is preferred depending on the dimension of the data points. However, if there is a directional influence in the data, such as a prevailing wind, one may want to adjust for it by changing the shape of the search neighborhood to an ellipse with the major axis parallel to the direction of the flow of the wind. The adjustment for this directional influence is justified because one know that locations upwind from a prediction location are going to be more similar at remote distances than locations that are perpendicular to the wind but located closer to the prediction location. Once a neighborhood shape has been specified, you can restrict which data locations within the shape should be used. One can define the maximum and minimum number of locations to use; also one can divide the neighborhood into sectors. If one divides the neighborhood into sectors, the maximum and minimum constraints will be applied to each sector.
The output surface generated using IDW is sensitive to clustering and the presence of outliers. IDW assumes that the phenomenon being modeled is driven by local variation, which can be captured (modeled) by defining an adequate search neighborhood. Since IDW does not provide prediction standard errors, justifying the use of this model may be problematic.
Applications of IDW
Given a set of known values such as elevation, rain in mm, pollution density, or noise, there are different spatial interpolation techniques available to compute the value at an unsampled location. IDW, kriging, Voronoi tessellation, and moving averages are some of the alternative tools. To estimate values at points where they are not known, IDW uses spatial autocorrelation in its computation: closer values have more effect, while values farther away have less effect.
Application Considering Spatial Autocorrelation in 3D
Spatial interpolation is a main research method in 3D geological modeling, which has important impacts on 3D geological structure model accuracy. The inverse distance weighting (IDW) method is one of the most commonly used deterministic models, and its calculation accuracy is affected by two parameters: the search radius and the inverse-distance weight power value. Nevertheless, these two parameters are usually set by humans without scientific basis. To this end, the concept of "correlation distance" has been introduced to analyze the correlations between geological borehole elevation values and to calculate the correlation distances for each stratum elevation. The correlation coefficient was defined as the ratio of the correlation distance to the sampling interval, which determined the estimation point interpolation neighborhood. The distance-decay relationship in the interpolation neighborhood was analyzed to correct the weight power value, and sampling points were selected to validate the calculations. The conclusion was that each estimation point should be given a different weight power value according to the distribution of sampling points in the interpolated neighborhood. When the spatial variability was large, the improved IDW method performed better than the general IDW and ordinary kriging methods.
Syntax or Signature of API Implementing IDW
IDW(Surveyed_Data_Set_with_Location, Location_of_Unknown_Point, power_of_IDW, Dimension, z_field, search_neighborhood_geometry, cell_size, Output_Data_set)
To execute IDW:
IDW(inPointFeatures, (Xi,Yi), zField, 2, 2, circular, 4, outDataSet)
Summary IDW uses the measured values surrounding the prediction location to compute a value for any unsampled location, based on the assumption that things that are close to one another are more alike than those that are farther apart. It selects the neighborhood points from among the domain using a neighborhood policy which defines the geometry and search distance, or number of closest points. The power of the inverse distance decides the degree of influence of the phenomena. The search geometry and the search range are some of the decision one need to take before applying IDW. One need to choose higher power settings for more localized peaks and troughs, the directional behavior of the physical phenomena often plays a crucial role in choosing the shape of the filter, upper or lower bound of sample values to compute. Some of the key features of IDW are • The predicted value is limited to the range of the values used to interpolate. Because IDW is a weighted distance average, the average cannot be greater than the highest or less than the lowest input. Therefore, it cannot create ridges or valleys if these extremes have not already been sampled. • IDW can produce a bull’s-eye effect around data locations. • Unlike other interpolation methods such as Kriging, IDW does not make explicit assumptions about the statistical properties of the input data. IDW is often used when the
input data does not meet the statistical assumptions of more advanced interpolation methods. • This method is well-suited to be used with very large input datasets. The Inverse Distance Weighting interpolation method is as flexible and simple to program. But it is often the case that other interpolation techniques like kriging can help obtain a more robust model.
Bibliography Alcaraz M, Vazquez-Sune E, Velasco V, Diviu M (2016) 3D GIS-based visualisation of geological, hydrogeological, hydrogeochemical and geothermal models. Zeitschrift Der Deutschen Gesellschaft Fur Geowissenschaften 167(4):377–388 Fisher NI, Lewis T, Embleton BJJ (1987) Statistical analysis of spherical data. Cambridge University Press, p 329 George YL, David WW (2008) An adaptive inverse-distance weighting spatial interpolation technique. Comput Geosci 34:1044–1055 Łukaszyk S (2004) A new concept of probability metric and its applications in approximation of scattered data sets. Comput Mech 33(4): 299–304. https://doi.org/10.1007/s00466-003-0532-2 Narayan P (2014) Computations in geographic information system. CRC Press. ISBN 978-1-4822-2314-9 Ozelkan E, Bagis S, Ozelkan EC, Ustundag BB, Yucel M, Ormeci C (2015) Spatial interpolation of climatic variables using land surface temperature and modified inverse distance weighting. Int J Remote Sens 36(4):1000–1025 Shepard D (1968) A two-dimensional interpolation function for irregularly-spaced data. In: Proceedings of 23rd National Conference ACM. ACM, pp 517–524
Inversion Theory R. N. Singh1 and Ajay Manglik2 1 Discipline of Earth Sciences, Indian Institute of Technology, Gandhinagar, Palaj, Gandhinagar, India 2 CSIR-National Geophysical Research Institute, Hyderabad, India
Definition Inverse theory – The mathematical theory for deducing from observations the cause(s) that produce them.
Introduction Inverse theory in general deals with deducing the cause from observations through intricate mathematical formulations. In geophysics, physical fields are recorded by various sensors located on the earth’s surface, within boreholes or above the ground by airborne and satellite observations. These fields are
responses of physical property variations within the earth, and, thus, inversion theory is used to image the earth's structure in terms of physical property variations from the observations of physical fields (Tarantola 1987; Aster et al. 2019). Inverse theory is also used in other branches of geosciences to understand geological processes, e.g., the estimation of magma source composition from inversion of trace elements (Albarede 1983; McKenzie and O'Nions 1991). The subject of inverse theory and its application in geosciences is vast, and a lot of development has taken place in this field. In this entry, we provide a brief description of inverse theory to give a flavor of the subject. In the inversion approach, first a forward problem (▶ Forward and Inverse Models) is defined that relates model parameters m to observed responses d through a functional f as d = f(m).
ð1Þ
Here, the functional f contains information about the physics of the given problem. Since in geophysical studies we aim at obtaining the unknown model parameters m from the observed responses d, in the simplest form, ignoring all complications of noise in the data and inadequacy of the data, and assuming that the exact inverse exists, the inverse problem can be posed as m = f^{-1}(d).
ð2Þ
Inverse of a Discrete Problem
In general form, the model parameters are a continuous function of the spatial coordinate system and the functional is a continuous operator. This requires solving Eqs. 1 and 2 in the continuous domain. There are methods, such as the Backus-Gilbert method, that are used to solve such inverse problems. However, a generally followed practice is to convert the problem into the discrete domain such that m = {m_1, m_2, ..., m_N}^T and d = {d_1, d_2, ..., d_M}^T, where M and N are the numbers of observations and model parameters, respectively, and then express Eq. 1 in matrix form as d = F m.
ð3Þ
In the ideal situation of noise-free data and a full-rank matrix (M = N = K, where K is the rank of the matrix), the inverse of Eq. 3 is m = F^{-1} d.
ð4Þ
This is the desired exact solution of an inverse problem. However, geophysical inverse problems are mostly nonlinear, rank-deficient, and ill-conditioned, and in practice it is difficult to get the exact solution (Menke 1984; Tarantola 1987). Therefore, a generalized inverse of the problem is obtained by
imposing some other constraints. There are two main scenarios: (i) M > N, and (ii) M < N, assuming that the rank K = min(M, N). In the first case, the number of observations is more than the number of unknown model parameters. It is called a strictly overdetermined system (Gupta 2011), for which the error of misfit is e = d − F m̂,
ð5aÞ
where m̂ is the estimated model parameter vector. The error is minimized in the least squares sense, i.e., e^T e is minimized, which yields the best-fit solution m̂ = (F^T F)^{-1} F^T d.
ð5bÞ
In the second case, the number of observations is less than the number of unknown model parameters. It is called a strictly under-determined system. Here, the norm of the model parameter vector is minimized by minimizing m^T m, which yields the best-fit solution
m̂ = F^T (F F^T)^{-1} d.   (6a)
There is another scenario, when the rank of the matrix K < min(M, N). The inverse of an overdetermined system for such a case is obtained by minimizing an objective function that is a weighted sum of e^T e and m^T m, i.e., minimizing e^T e + λ² m^T m, which yields m̂ = (F^T F + λ² I)^{-1} F^T d.
ð6bÞ
Here, I is the identity matrix and λ is the control parameter that assigns relative weight to the data error and the model norm, also known as the Marquardt parameter (Marquardt 1963). For an optimum value of λ, the L-curve is used, which is constructed on a plane with e^T e and m^T m as axes for different values of λ. The value of λ is chosen to correspond to the point of maximum curvature of this curve. This is a simple case of regularization. In the case of noisy data, if the errors in the data are given in terms of variances, then a weight matrix is introduced and the quantity minimized is e^T C_d^{-1} e,
ð7aÞ
where C_d = diag(σ²_{d1}, σ²_{d2}, ..., σ²_{dN}).   (7b)
In this case the solution of the inverse problem is
m̂ = (F^T C_d^{-1} F)^{-1} F^T C_d^{-1} d.
ð8Þ
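A small synthetic sketch of the discrete solutions in Eqs. 5b and 6b using NumPy; the matrix size, noise level, and damping value are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic overdetermined linear forward problem d = F m + noise.
F = rng.normal(size=(20, 5))
m_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
d = F @ m_true + 0.01 * rng.normal(size=20)

# Ordinary least squares, Eq. 5b: m = (F^T F)^{-1} F^T d.
m_ls = np.linalg.solve(F.T @ F, F.T @ d)

# Damped (Marquardt-regularized) solution, Eq. 6b, for a chosen lambda.
lam = 0.1
m_damped = np.linalg.solve(F.T @ F + lam**2 * np.eye(5), F.T @ d)

print("least squares:", np.round(m_ls, 3))
print("damped       :", np.round(m_damped, 3))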
Singular Value Decomposition
In the inversion approach described above, the generalized inverse is obtained by different matrix operations depending on the nature of the problem and the rank of the matrix. Singular value decomposition (SVD) (Singular Value Decomposition) does not require a priori information about the nature of the matrix and hence is very useful, especially for the inversion of singular or ill-conditioned matrices (Golub and Kahan 1965; Golub and Reinsch 1970). The technique was discovered independently by Beltrami in 1873 and Jordan in 1874 as a tool to solve square matrices and was extended to rectangular matrices by Eckart and Young in the 1930s (Stewart 1993; Manglik 2021). In SVD, a matrix F of dimension M × N and rank K is decomposed into data and parameter eigen matrices U(M × K) and V(N × K), respectively, and a diagonal matrix containing the non-zero eigenvalues S(K × K) arranged in decreasing order of magnitude, such that F = U S V^T.
ð9Þ
An inverse of this matrix can be obtained as F^{-1} = V S^{-1} U^T.
ð10Þ
The data and parameter matrices satisfy the conditions U^T U = V^T V = I, but U U^T and V V^T need not be the identity matrix for a geophysical problem. The first one is known as the information density matrix, which provides information about how the data sense different model parameters, whereas the second one is called the parameter resolution matrix, which is used to understand the resolution of the model parameters. This matrix is very useful for finding the equivalence of model parameters. The SVD approach reduces the drudgery of computation and provides ease in interpretation.
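The decomposition, generalized inverse, and resolution matrices of Eqs. 9 and 10 can be formed directly with NumPy's SVD routine; the matrix below is an arbitrary full-rank example.

import numpy as np

F = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0],
              [2.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

# Thin SVD: F = U S V^T with the singular values in decreasing order.
U, s, Vt = np.linalg.svd(F, full_matrices=False)

# Generalized inverse V S^{-1} U^T (small singular values could be truncated
# here to stabilize an ill-conditioned problem).
F_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

R = Vt.T @ Vt      # parameter resolution matrix V V^T
Nd = U @ U.T       # information density matrix U U^T

print(np.allclose(F_pinv, np.linalg.pinv(F)))  # True for this full-rank example
print(np.round(R, 3))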
Inverse of a Continuous Problem
The discrete inverse problem assumes that the model parameters are discrete in nature, i.e., that the earth's structure is composed of discrete layers/cells, each having constant values of physical properties. However, physical properties may vary continuously in the spatial domain, and observations made at discrete locations are a function of the physical property variation in the entire domain. This leads to a situation where the observed data are always fewer than the unknown model parameters (an under-determined system). Backus and Gilbert (1967, 1968, 1970) developed an elegant approach to invert this continuous problem. This method tries to invert for parameters given by continuous functions. We have in this case
∫_0^1 F_i(x) m(x) dx = d_i,  for the domain in the range 0 ≤ x ≤ 1,    (11)

where m(x) represents the continuous model parameter variation and F_i(x) is a function describing the physics of the problem. The subscript i denotes the index of the data point (i = 1, 2, ..., N). Here, we have a finite set of data and wish to estimate the unknown function m(x). To do this, we first express m at x = x_0 in terms of the Dirac delta function as

m(x_0) = ∫_0^1 δ(x − x_0) m(x) dx,    (12)

and assume that m(x_0) is a linear combination of the data with weights a_i, (i = 1, 2, ..., N), i.e.

∑_{i=1}^{N} a_i d_i = m(x_0).    (13)

Substituting Eq. 11 into Eq. 13 gives

∑_{i=1}^{N} a_i d_i = ∑_{i=1}^{N} a_i ∫_0^1 F_i(x) m(x) dx = ∫_0^1 [∑_{i=1}^{N} a_i F_i] m(x) dx.    (14)

Comparing Eq. 14 with Eq. 12, we get

∑_{i=1}^{N} a_i F_i = δ(x − x_0).    (15)

This is the ideal situation in which we get the exact solution of the model at x = x_0. However, this is mostly not possible, and the weights a_i are obtained such that ∑_{i=1}^{N} a_i F_i is as close to the Dirac delta function as possible, and we get an estimate m̂(x_0) of the true model m(x) around x = x_0. The spread of the function gives information about the resolution of the model. An example of application of the Backus-Gilbert method to magnetotelluric data can be found in Manglik and Moharir (1998).

Bayesian Inversion
In this inversion approach (▶ Bayesian Inversion), the model is considered as a random variable, unlike the methods described in previous sections that treat the model as deterministic. The model thus has a probability distribution, and we invert for a posterior distribution of the model parameters from the known prior distribution for the model and the data. The posterior distribution of the model according to the Bayes theorem is expressed as

f_{M|d} = f_{D|m} f_M(m) / f_D(d),    (16)

where f_M(m) (f_D(d)) is the probability density function (▶ Probability Density Function) for the model (data) and f_{M|d} (f_{D|m}) is the conditional probability distribution of the model given the data (or the data given the model). This can be demonstrated for a normal distribution by considering that the model follows a Gaussian distribution so that the probability f_M(m) is expressed as

f_M(m) ∝ exp[−(1/2) (m − m_prior)^T C_m^{−1} (m − m_prior)],    (17)

where m_prior and C_m are the mean of the prior distribution of the model and its covariance matrix. The conditional distribution of the data given the model is then

f_{D|m} ∝ exp[−(1/2) (Fm − d)^T C_d^{−1} (Fm − d)],    (18)

where d represents the observed data and C_d the corresponding data covariance matrix. Using Bayes theorem, we can get the posterior conditional probability density function of the model given the data as

f_{M|d} ∝ exp{−(1/2) [(m − m_prior)^T C_m^{−1} (m − m_prior) + (Fm − d)^T C_d^{−1} (Fm − d)]}.    (19)

Equation 19 can be simplified to (Tarantola 1987)

f_{M|d} ∝ exp[−(1/2) (m − m_MAP)^T C_{M'}^{−1} (m − m_MAP)],    (20)

where m_MAP is the maximum a posteriori model and the modified covariance matrix C_{M'} is given by

C_{M'} = (F^T C_d^{−1} F + C_m^{−1})^{−1}.    (21)

The maximum a posteriori model m_MAP is obtained by minimizing the following expression:

(m − m_prior)^T C_m^{−1} (m − m_prior) + (Fm − d)^T C_d^{−1} (Fm − d).    (22)

The Bayesian method is highly computationally intensive as a large number of random samples from probability distributions are generated.
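For a linear forward operator F with Gaussian prior and data errors, Eqs. 20–22 have a closed-form solution. The following sketch is a minimal numerical illustration only; the operator F, prior, and covariances are made-up values chosen for demonstration, not taken from the entry:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear forward operator F (5 data, 3 model parameters)
F = rng.normal(size=(5, 3))
m_true = np.array([1.0, -0.5, 2.0])
Cd = 0.01 * np.eye(5)                                  # data covariance
d = F @ m_true + rng.multivariate_normal(np.zeros(5), Cd)

m_prior = np.zeros(3)                                  # prior mean
Cm = 4.0 * np.eye(3)                                   # prior covariance

Cd_inv = np.linalg.inv(Cd)
Cm_inv = np.linalg.inv(Cm)

# Posterior covariance, Eq. 21: C_M' = (F^T Cd^-1 F + Cm^-1)^-1
C_post = np.linalg.inv(F.T @ Cd_inv @ F + Cm_inv)

# The minimizer of Eq. 22 in the linear Gaussian case:
# m_MAP = C_M' (F^T Cd^-1 d + Cm^-1 m_prior)
m_map = C_post @ (F.T @ Cd_inv @ d + Cm_inv @ m_prior)

print("true model   :", m_true)
print("MAP model    :", m_map)
print("posterior std:", np.sqrt(np.diag(C_post)))
```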
Inversion of a Nonlinear Problem
Geophysical inverse problems are in general nonlinear in nature. The linear inverse theory for discrete systems discussed above is applied to a nonlinear problem by quasi-linearizing the nonlinear problem through a Taylor series expansion. Here, the model is expanded around some initial model m_0, assuming that the model perturbation δm is small:

d = f(m) = f(m_0 + δm) = f(m_0) + G_0 δm + ...,    (23)

where

G_0 = ∂f/∂m |_{m=m_0}.    (24)

Ignoring second- and higher-order derivatives in Eq. 23, we get

δd = d − d_0 = G_0 δm,  with d_0 = f(m_0).    (25)

Thus, we again get a linear form of the inverse problem,

δd = G_0 δm,    (26)

which is solved for the perturbation in model parameters given the perturbation in data. Once the perturbation in model parameters is estimated, the model is updated and the process is repeated till the desired convergence is obtained. Thus, the solution is obtained iteratively. The quasi-linearization approach tries to converge to a minimum closest to the chosen initial model m_0. For a highly nonlinear problem having several minima, this approach might yield an incorrect solution corresponding to a local minimum. There are global optimization techniques such as simulated annealing and genetic algorithms (▶ Genetic Algorithm) that are used to circumvent the issue of local minima (▶ Optimization Methods). Further development has taken place in the form of application of artificial neural networks to construct the model parameter space from observed geophysical data (▶ Artificial Neural Network). Recently, data-driven approaches of machine learning (▶ Machine Learning and Geosciences) and hybrid data-driven and physics-based approaches (▶ Deep Learning) have also made inroads into the interpretation of geophysical data in terms of meaningful models of the subsurface.

Geophysical Example
We demonstrate an application of the inverse theory in developing a joint inversion (JI) algorithm for interpretation of seismic and magnetotelluric geophysical data. These two methods sense different physical properties of the earth, namely, bulk and shear moduli and Poisson's ratio through seismic velocities, and electrical resistivity, respectively. JI algorithms allow inverting such diverse and unrelated datasets together, under the assumption that the geometry of the causative structure is the same, to obtain an integrated model of the earth's structure. Here, we give the mathematical description very briefly. For the detailed formulation, one may refer to Manglik and Verma (1998) and Manglik et al. (2009), and for a field application to Manglik et al. (2011). In the forward problem of the magnetotelluric method, the apparent resistivity (ρ_a) and phase (φ_a) measured on the surface of the earth are linked to the resistivity distribution inside the earth in terms of the surface impedance Z_1 as

ρ_a(ω) = (1/(ωμ)) Z_1 Z_1*,  φ_a(ω) = tan^{−1}[Im(Z_1)/Re(Z_1)],    (27)

where w = √(iωμ), ω is the angular frequency, μ is the magnetic permeability of the medium, and "*" represents the complex conjugate. For an n-layered model (the nth layer being the half-space), the impedance of the ith layer is represented as

Z_i(ω) = f(Z_{i+1}(ω), ω, ρ_i, h_i),  with Z_n(ω) = w √ρ_n,    (28)

where ρ_i and h_i are the electrical resistivity and thickness, respectively, of the ith layer. The partial derivatives of ρ_a and φ_a with respect to the unknown layer parameters ρ_i and h_i, needed to construct the matrix G_0 (Eq. 24), can be obtained from Eqs. 27 and 28. Seismic travel time modeling involves refraction and reflection travel times. For surface seismics, the time t_i^rr taken by head waves generated by a critically refracted wave at the ith interface of an isotropic, horizontally layered earth, and recorded at an offset distance X, can be expressed as

t_i^rr = f[X, v_k (k = 1, ..., i), h_k (k = 1, ..., i − 1)],    (29)

where v_k is the seismic velocity of the kth layer. Similarly, the reflection travel time t_i^rf of a wave reflected from the ith interface and recorded at an offset distance X is related to the model parameters as

t_i^rf = f[R, v_k (k = 1, ..., i), h_k (k = 1, ..., i)].    (30)

Here, R is called the ray parameter, which is a nonlinear function of the offset distance and of the velocity and thickness of the layers. The partial derivatives of these seismic observations with respect to the model parameters can be obtained from these equations. Following the quasi-linearization approach (Eqs. 25 and 26), we can write a combined matrix for geometrical JI as (dropping δ for ease of writing expressions)

[dρ, dφ, d_rr, d_rf]^T = [[G_ρ^ρ, G_h^ρ, 0], [G_ρ^φ, G_h^φ, 0], [0, G_h^rr, G_v^rr], [0, G_h^rf, G_v^rf]] [ρ, h, v]^T,  or  d = Gm.    (31)
In this matrix, the data column vector d has the array dimension (m_a × 1), where m_a is the sum of all observations corresponding to the four datasets, and the model parameter vector m has the array dimension (3n × 1), corresponding to a layered earth model with (n − 1) layers overlying a half-space. Equation 31 is solved by SVD (Manglik 2021). A field example of application of this method is discussed by Manglik et al. (2011), who jointly inverted coincident deep crustal seismic (DCS) and magnetotelluric (MT) data at one shot-point along the Kuppam-Palani geotransect in the southern Indian shield. The geotransect was covered by different geophysical methods to delineate the crustal structure of various tectonic blocks of the Archaean to the Neoproterozoic age separated by shear zones. The study delineated a four-layer crustal model with a mid-crustal low velocity layer beneath the entire profile (Reddy et al. 2003), which is also found to be electrically conductive (Harinarayana et al. 2003). Quantification of the physical properties and thickness of such low seismic velocity and high electrical conductivity layers (LVCL) sandwiched between high velocity and resistive layers is a challenging task because of certain limitations of these methods. In seismics, the absence of refraction from the top of a LVCL and the trapping of seismic energy within the layer severely limit interpretation of seismic data for reliable estimates of the thickness and seismic velocity of the LVCL. Similarly, electrical and electromagnetic methods of geophysical exploration suffer from the well-known problem of equivalence, due to which different combinations of conductivity and thickness of a subsurface layer for a 1-D earth can
produce similar response. This leads to a range of acceptable models, all satisfying the minimum root mean square (RMS) error criterion. Manglik et al. (2011) performed JI of seismic reflection and refraction travel times (SI) and MT apparent resistivity and phase data to test the efficacy of the JI approach (Manglik and Verma 1998) in reducing the problem of equivalence and providing better constrained model of the LVCL compared to the models obtained by inversion of individual datasets. Another advantage of the JI approach is that the features of the model sensed by at least one method can be accounted for in the model and, thus, can yield an improved model of the subsurface structure. For example, a near-surface layer needed to explain the MT data was ignored by the SI method. Similarly, the Moho (a major interface between the crust and the mantle) is a prominent reflector in the SI data but may remain transparent in the MT data. Using these additional details, Manglik et al. (2011) modified the four-layer model of Reddy et al. (2003) to a six-layer model by including a thin near-surface conducting layer and a lithospheric mantle layer below the crust and jointly inverted the SI and MT data. The results are shown in Fig. 1a, b. For comparison, the results obtained by inversion of individual datasets are also shown in the figure. The JI results reveal that the LVCL is about 5 km thicker in comparison to the MT model but it is approximately the same as obtained by the SI method. However, the results depend on the choice of the initial guess due to the equivalence problem. Therefore, it is important to check for the sensitivity of the model to the choice of the starting model. A model appraisal was performed by Manglik et al. (2011) to test if JI can better constrain the range of acceptable models compared to the range of SI and MT models. This was done by a systematic scan of the model parameter space for the LVCL covering the chosen resistivity, thickness, and velocity range of 50–200 Ω.m, 4–20 km, and 4–7 km/s, respectively, and constructing the starting models (shown by open circles in Fig. 1c, d) from this model space for inversion. Individual SI and MT inversions and JI inversion were performed for each of these starting models. The final models obtained after inversion corresponding to all starting models are also shown in Fig.1c, d by diamonds for MT/SI and stars for JI. All these models fall in a narrow zone of minimum RMS error, suggesting that the parameters of LVCL suffer from the problem of equivalence. Since the parameters of LVCL converge in a long narrow zone of minimum RMS error even with JI, a cursory look at Fig. 1c, d would suggest almost no improvement. However, an analysis of the normalized RMS errors of all the final models has revealed a pattern. Here, normalization is done in such a way that the model having largest RMS error has normalized error of 1 and that having least RMS error now has a normalized error of 0. The results (Fig. 1e–g) corresponding to MT (triangles), SI (stars), and JI (diamonds)
reveal a narrowing of the zone of acceptable models if one considers the models falling within 10 percent of the least normalized error (shaded region) as acceptable. It can be seen that SI models have more scatter in the range of acceptable models. The LVCL with seismic velocity and thickness in the range of 3.25–5.75 km/s and 5.5–12.8 km, respectively, can fit the observed data equally well. MT gives a better-constrained model of the LVCL, with resistivity in the range of 106–110 Ω.m and a thickness of 9.8 km. In contrast, JI models fall in the seismic velocity range of 5.0–5.8 km/s with the minimum at 5.4 km/s, resistivity in the range of 115–140 Ω.m with the minimum at 122 Ω.m, and thickness in the range of 10–13 km with the minimum at 11.5 km. These results indicate that JI gives a well-constrained model of the LVCL and reduces the ambiguity in its parameters.

Inversion Theory, Fig. 1 (a and b) Seismic velocity and electrical resistivity models obtained by inversion of individual seismic reflection and refraction (SI) data and magnetotelluric apparent resistivity and phase (MT) data as well as their joint inversion (JI). A six-layer earth model was considered for inversion. The starting model for inversion is shown as IG. (c and d) Appraisal of the seismic velocity and electrical resistivity of the LVCL. Circles, diamonds, and stars represent the initial value, the final value by individual methods, and the final value by JI, respectively. (e–g) Normalized RMS error vs. resistivity (e), seismic velocity (f), and thickness (g) for various models used in (c and d). Models with less than 10% error (shaded region) are considered as acceptable models. Triangles, stars, and diamonds represent MT, SI, and JI solutions, respectively. [Reprinted/adapted by permission from Springer Nature: Springer Nature, Joint Inversion of Seismic and MT Data – An Example from Southern Granulite Terrain, India by A. Manglik, S.K. Verma, K. Sain et al. (2011)].
Summary
Inverse theory has been applied to interpret diverse geophysical and geochemical data to image the earth's interior and understand its processes since its use in seismology by Backus and Gilbert (1968) and Tikhonov and Arsenin (1977). The problem is essentially reduced to solving a linear algebra problem. The current endeavor is to use more data-intensive techniques such as machine learning to reduce uncertainty in the solutions.
Cross-References
▶ Artificial Neural Network
▶ Bayesian Inversion in Geoscience
▶ Deep Learning in Geoscience
▶ Forward and Inverse Stratigraphic Models
▶ Genetic Algorithms
▶ Machine Learning
▶ Optimization in Geosciences
▶ Probability Density Function

Acknowledgments AM carried out this work under the project MLP6404-28(AM) with CSIR-NGRI contribution number NGRI/Lib/2020/Pub-201.

Bibliography
Albarede F (1983) Inversion of batch melting equations and the trace element pattern of the mantle. J Geophys Res 88:10573–10583
Aster RC, Borchers B, Thurber CH (2019) Parameter estimation and inverse problems. Elsevier
Backus GE, Gilbert JF (1967) Numerical applications of a formalism for geophysical inverse problems. Geophys J R Astron Soc 13:247–273
Backus GE, Gilbert JF (1968) The resolving power of gross earth data. Geophys J R Astron Soc 16:169–205
Backus G, Gilbert JF (1970) Uniqueness in the inversion of inaccurate gross earth data. Philos Trans Roy Soc Lond A 266:123–192
Golub GH, Kahan W (1965) Calculating the singular values and pseudoinverse of a matrix. J Soc Indus Appl Math Ser B 2:205–224
Golub GH, Reinsch C (1970) Singular value decomposition and least squares solutions. Numer Math 14:403–420
Gupta PK (2011) Inverse theory, linear. In: Gupta H (ed) Encyclopedia of solid earth geophysics. Encyclopedia of earth sciences series. Springer, pp 632–639
Harinarayana T, Naganjaneyulu K, Manoj C, Patro BPK, Kareemunnisa Begum S, Murthy DN, Rao M, Kumaraswamy VTC, Virupakshi G (2003) Magnetotelluric investigations along Kuppam-Palani geotransect, South India – 2-D modeling results. In: Ramakrishnan M (ed) Tectonics of southern Granulite terrain: Kuppam-Palani geotransect. Memoir Nr 50. Geological Society of India, Bangalore, pp 107–124
Manglik A (2021) Inverse theory, singular value decomposition. In: Gupta H (ed) Encyclopedia of solid earth geophysics. Encyclopedia of earth sciences series. Springer, Cham. https://doi.org/10.1007/978-3-030-10475-7
Manglik A, Moharir PS (1998) Backus-Gilbert magnetotelluric inversion. In: Roy KK, Verma SK, Mallick K (eds) Deep electromagnetic exploration. Narosa Publishing House & Springer, pp 488–496
Manglik A, Verma SK (1998) Delineation of sediments below flood basalts by joint inversion of seismic and magnetotelluric data. Geophys Res Lett 25:4042–4045
Manglik A, Verma SK, Kumar H (2009) Detection of sub-basaltic sediments by a multi-parametric joint inversion approach. J Earth Syst Sci 118:551–562
Manglik A, Verma SK, Sain K, Harinarayana T, Vijaya Rao V (2011) Joint inversion of seismic and MT data – an example from Southern Granulite Terrain, India. In: Petrovsky E, Ivers D, Harinarayana T, Herrero-Bervera E (eds) The Earth's magnetic interior. IAGA special Sopron book series. Springer, pp 83–90
Marquardt DW (1963) An algorithm for least-squares estimation of nonlinear parameters. SIAM J Appl Math 11:431–441
McKenzie D, O'Nions RK (1991) Partial melt distributions from inversion of rare Earth element concentrations. J Petrol 32:1021–1091
Menke W (1984) Geophysical data analysis: discrete inverse theory. Academic, New York
Reddy PR, Rajendra Prasad B, Vijaya Rao V, Sain K, Prasada Rao P, Khare P, Reddy MS (2003) Deep seismic reflection and refraction/wide-angle reflection studies along Kuppam-Palani transect in the southern Granulite terrain of India. In: Ramakrishnan M (ed) Tectonics of Southern Granulite Terrain: Kuppam-Palani geotransect. Memoir Nr 50. Geological Society of India, Bangalore, pp 79–106
Stewart GW (1993) On the early history of the singular value decomposition. SIAM Rev 35:551–566
Tarantola A (1987) Inverse problem theory: methods for data fitting and model parameter estimation. Elsevier Science, Amsterdam
Tikhonov AN, Arsenin VY (1977) Solutions of ill-posed problems. V.H. Winston and Sons, New York
Inversion Theory in Geoscience

Shib Sankar Ganguli and V. P. Dimri
CSIR-National Geophysical Research Institute, Hyderabad, Telangana, India
Definition
Inverse problems are familiar to numerous disciplines of geosciences that relate physical properties characterizing a model, m, with its model parameters, m_i, to collected geoscientific observations, i.e., data, d. This demands a knowledge of the forward model competent in predicting geoscientific data for specific subsurface geological structures. The modeled data can be expressed as

d = G(m, m_i).    (1)
Here G is the forward modeling operator, which is a nonlinear operator in most geoscientific inverse problems. In practice, for an inverse method, a model of the feasible subsurface is assumed and the model response is computed, which is subsequently compared with the observed data. This procedure is repeated several times until a minimum difference between the model response and observations is obtained.
Inverse Theory with Essential Concepts
In any geoscientific inverse problem, the ultimate aim is to estimate the earth's physical properties from surface or subsurface measurements, which respond to variations in the properties of the earth. Undeniably, inverse theory is also capable of assessing the quality of the predicted model rather than just determining model parameters. It can assist in deciding which model parameters or which combinations of model parameters are the most appropriate
representatives of the earth's subsurface. Further, it can also be useful in analyzing the effect of noise on the stability of the solution. For all of these, it is essential to realize the theoretical response of an assumed earth model (i.e., simulation or forward problem), thereby establishing the mathematical relationship among discrete data and a continuously defined model. Generally speaking, a real geological scenario is approximated by considering a simple model, and relevant model parameters are inferred from the observed data. This is known as an inverse problem. Mathematically, it is also possible to formulate a forward problem as an inverse problem. The general forward and inverse problems can be expressed as:

Forward problem: model {model parameters, m_i} → data
Inverse problem: data → model {estimated model parameters, m_i}
Finding solutions to the geoscientific inverse problems has always been at the forefront of an active research area, and therefore methods such as direct inversion, linear and iterative inversions, and global optimization methods have been investigated methodically. Even though the forward problem can provide a unique solution, the inverse problem does not. Inverse problem has multiple solutions and hence a priori information becomes vital to obtain meaningful model parameters. In general, a priori information on the model parameters is characterized using a probabilistic point of view on model space. The model space is a hypothetical space with manifold space points, independent of any specific parametrization, each of which essentially signifies a plausible model of the system. Apart from the obvious goal of estimating a set of model parameters, it is equally important to estimate the formal uncertainties in the model parameters. The ultimate goal is, of course, to obtain a reasonable, valid, and acceptable solution to the inverse problem of interest. A comprehensive framework showing the real implementation of various inversion methods including the process workflow to infer earth physical properties is illustrated in Fig. 1. Each element of a typical geophysical inverse problem, central for obtaining a meaningful solution, is indicated by arrows (Fig. 1). It can be seen that the inversion process comprises inputs (e.g., geophysical data, the equations that consider governing physical laws, a priori information about the problem), implementation (e.g., forward simulation and the inversion approach), and evaluation (e.g., assessment of the model). During the implementation stage, it is ensured that the relevant components are properly defined so that a well-posed inverse problem is developed and is being solved mathematically. For an effective geophysical inversion, two abilities of the algorithm become important and these are:
(a) the ability to run a forward simulation and generate synthetic data given a physical model, and (b) the ability to appraise the model performance and update the model until the misfit function is optimized (Fig. 1). Regularization plays a critical role in the evaluation of the model's performance so that a suitable model is developed that is in good agreement with a priori information and assumed statistical distributions. Numerically, the inversion problem is solved through optimization to obtain the unknown earth parameters. During optimization, we often need to alter the model regularization or change aspects of the numerical computations until the final model is accepted based on how well it fits the observed data with low uncertainty. For more details, several excellent textbooks on geophysical inverse theory are available, and interested readers are referred to Tarantola (1987), Menke (2012), Richter (2020), and others. In any geoscientific inverse problem, the following significant issues related to the problem of finding solutions remain valid:

1. Does the solution exist for the problem?
2. If yes, then is it unique?
3. Is it stable and robust?

The existence of a solution ({m, m_i} = G^{-1} d) is usually related to the mathematical design of the inverse problem. It is important to note that, from a physical perspective dealing with geological structures, a certain solution to the problem is anticipated. However, it may also be possible that the mathematical perspective cannot provide an adequate numerical model that would fit the observations (data). There is an infinite number of possible models of a given subsurface geological condition that could match the data. In general, material properties within the subsurface are characterized by continuous functions, and the inversion method attempts to derive these functions from a set of observations at a finite number of data points. This leads to the problem of nonuniqueness in the results. Let us assume two different models, m_1 and m_2, with two different sources, S_1 and S_2, that generate the same data d; then it is difficult to distinguish these two models from the given set of observations, and the solution will be nonunique. A solution will be considered unique if these two diverse models, m_1 and m_2, produce two dissimilar data sets, d_1 and d_2 (d_1 ≠ d_2). By and large, geophysical data are invariably contaminated with some amount of noise. Hence, the last problem, dealing with the stability and robustness of the solution, is pivotal in geophysical inversion. Stability specifies how minor errors in the observations spread into the model, and robustness deals with the degree of insensitivity to a small number of large errors within the data.
Inversion Theory in Geoscience, Fig. 1 A comprehensive framework of geophysical inversion including inputs, implementation of algorithms, evaluation, and interpretation (Cockett et al. 2015).
A stable and robust solution is insensitive to small errors in the data. An inverse problem is said to be ill-posed if the solution is unstable and nonunique; however, such inverse problems can result in physically and/or mathematically
meaningful solutions if regularization techniques are applied to them (Tikhonov and Arsenin 1977; Dimri 1992). In practice, inversion methods are categorized into two broad classes: direct inversion methods and model-based
inversion methods. For the sake of completeness, these two are discussed in the subsequent parts of this chapter.
Direct Inversion Methods In the case of direct inversion, the model is derived directly from the data without any assumptions on the model parameter values. These types of inversion schemes are implemented by a mathematical operator, which is derived based on the governing physics of the forward problem and subsequently applied to the data directly. The layer-stripping method is one of the most familiar direct inversion methods used in seismology to infer the medium properties of 1D acoustic or elastic medium. This is founded on the forward problem dealing with the reflectivity concept accounting for the plane wave response of the medium at individual frequency. A synthetic seismogram can be generated by combining the responses of all the plane waves for a layered earth model and applying inverse Fourier transform to the same. In practice, seismic data is transformed from the travel-time domain to the intercept time ray parameter domain and then the effects of each layer are pulled out by means of downward continuation during the estimation of layer properties. The foundation of this technique is on the assumption that seismic data is collected in noise-free conditions and is essentially in the broadband range. Such a scenario is difficult in a real field study, therefore, information about the frequency content needs to be supplied self-sufficiently in order to obtain stable model parameters. Apart from this, another direct inversion method is Born inversion, which is based on the inverse scattering approach (popular as Born theory). The important limitation of the layer-stripping algorithm is relaxed in this case by considering the earth to be varied in 2D or 3D. This is treated as a concomitant high-priority approach and is of interest since the Born theory implicitly contains migration with a proper band-limited layer property estimate. That said, the algorithm can estimate the precise time and amplitudes of all internal multiples independent of whether the medium is acoustic or elastic, deemed to be useful in deciphering subsurface structures. One of the major limitations of direct inversion methods is that these algorithms are sensitive to noise and often it becomes problematic for the operator to infer meaningful information from the data contaminated with noise. Such issues can be controlled, to some extent, in a model-based inversion approach where uncertainties associated with data are mostly addressed.
Model-Based Inversion Methods
A model-based inversion algorithm employs a forward model to compute synthetic data, which is then compared with the
observed data, unlike the direct inversion approach. The process of generating synthetic data is repeated until a proper match between the computed and observed data is obtained. If the match between observed and computed is acceptable with minimum error, the model is accepted as the solution to the inverse problem. This repeated process, in essence, is viewed as an optimization problem in which the derived model elucidates the set of observations in a much more effective sense. Based on the approach of a search for an optimum model, optimization methods differ. The simplest and best-understood optimization method is one that can be represented with an explicit linear equation, which laid the foundation of linear discrete inverse theory. The discrete inverse theory can be expressed as

d_k = ∑_{l=1}^{M} G_kl m_l,    (2)

whereas the continuous inverse theory considers discrete data with a continuous model function, which can be represented as

d_k = ∫ G_k(x) m(x) dx.    (3)

As stated earlier, the observed data in inverse problems are always discrete. Hence, a discrete inverse problem is taken into consideration by approximating the data and model as vectors, which are represented in the following forms:

d = [d_1, d_2, d_3, ..., d_ND]^T, and m = [m(x_1), m(x_2), m(x_3), ..., m(x_NM)]^T,

where ND is the total number of data points, NM is the total number of model parameters, and T defines the transpose of a matrix. Generally speaking, the solution to the linear inverse problem is based on measures of the size or length of the misfit (objective function) between the observed and calculated data. The length, namely the Euclidean length, of the prediction error is usually defined by a "norm" and can be represented by the sum of the absolute values of the components of the error vector. Consequently, the best model is then the one that culminates in the smallest overall error e_k:

e_k = d_k^obs − d_k^pre = d_k^obs − G(m).    (4)
The most commonly used norms are characterized by the L_n norm (Menke 1984), where n is the power, and take the following form:

L_1 norm: ||e||_1 = ∑_{k=1}^{ND} |e_k|,    (5)

L_2 norm: ||e||_2 = [∑_{k=1}^{ND} |e_k|^2]^{1/2},    (6)

L_n norm: ||e||_n = [∑_{k=1}^{ND} |e_k|^n]^{1/n},  where n = 0, 1, 2, ..., ∞,    (7)

or, in vector notation, it can be expressed as:

L_n norm: ||e||_n = {[d^obs − G(m)]^T [d^obs − G(m)]}^{1/n}.    (8)
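To make the effect of the choice of norm concrete, the short sketch below compares the L1, L2, and L∞ misfits of a residual vector with and without a single outlier; it is a generic illustration of Eqs. 5–7 with made-up numbers, not an example taken from the entry:

```python
import numpy as np

# Residuals between observed and predicted data (hypothetical values)
e_clean = np.array([0.1, -0.2, 0.05, 0.15, -0.1])
e_outlier = e_clean.copy()
e_outlier[2] = 5.0   # a single large outlier

def norms(e):
    return {
        "L1":   np.sum(np.abs(e)),          # Eq. 5
        "L2":   np.sqrt(np.sum(e ** 2)),    # Eq. 6
        "Linf": np.max(np.abs(e)),          # limit of Eq. 7 as n -> infinity
    }

print("without outlier:", norms(e_clean))
print("with outlier   :", norms(e_outlier))
# The L1 misfit grows the least in relative terms, which is why L1-based
# fitting is less sensitive to outliers than L2 or Linf fitting.
```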
Different norms offer different weights to outliers. Comparatively, the higher norms provide more weight to large elements of the misfit between the observed and predicted data. To understand how the concept of measure of length can be pertinent to the solution of an inverse problem, let us consider a simple problem of fitting data to (m, d) pairs with a single outlier, as depicted in Fig. 2. We wish to obtain a norm that leads to a straight line that passes across the set of data points (m, d) so that the sum of the squared residuals (errors) is minimal. Figure 2 demonstrates three lines fitting the data points corresponding to the L1, L2, and L∞ norms. As expected, the L1 norm is least sensitive to the presence of the outlier, whereas the L∞ norm finds the curve which minimizes the largest single error between the observed and predicted values. Also, the latter is strongly influenced by the outlier and is unable to eradicate the error (residual) since that may establish a large error elsewhere. Hence, the choice of norm determines how the total error can be quantified, wherein the norm is preferred to signify the nature of noise or scatter in the observed data.

Inversion Theory in Geoscience, Fig. 2 Fitting a straight line to (m, d) data sets where the measured error is represented by L1, L2, and L∞ norms (after Menke 1984). Note the L1 norm offers a minimum weight to the outlier.

Among the various available norms, the L2 norm is the most widely applied norm in geophysical inverse problems. The preference of the L2 norm over the L1 norm is because the former weights the data outliers that lie nowhere near the average trend (Fig. 2) and obeys Gaussian statistics. Application of other kinds of norms, e.g., the L1 norm, has lately received attention in geophysical inversion (Wang et al. 2020). Generally speaking, a lower order norm is preferred for reliable estimates if the data are likely to be dispersed broadly about the average trend. A comparative analysis of these two norms is summarized in Table 1. It is evident that the L1 norm is more robust as compared to the L2 norm since the former is insensitive to a small number of big errors (outliers) in the data. This characteristic, in principle, gave new impetus to Huber (1964) to introduce the theory of robust estimation. Some interesting pieces of literature for more details about robust estimation are Hampel (1968) and Wilcox (2011). Based on the search method used to obtain the optimal solution, model-based inversion can be classified into four categories, as discussed below.

Inversion Theory in Geoscience, Table 1 Summary of comparative analysis of using L1 and L2 norms

Features             | L1 norm  | L2 norm
Quality aspect       | Robust   | Not very robust
Output               | Sparse   | Non-sparse
Accuracy             | Better   | Good
Computational time   | More     | Less
Programming used     | Linear   | Least squares

Linear Inversion Methods
As the name suggests, these types of inversion techniques presume that data and models have a linear relationship. These methods have been implemented using linear algebra
in many geophysical applications for the past several decades. Of course, these algorithms innately demand certain conditions to establish a linear relationship between data and model. For instance, in the Zoeppritz equations, the reflection coefficients are nonlinearly related to the P-wave velocity (Vp), S-wave velocity (Vs), and density. In exploration seismology, the Zoeppritz equations are a set of equations that define the segregation of incident seismic energy at a planar interface, typically a boundary between two layers of rock. These equations become helpful in the detection of petroleum reservoirs, especially during the investigation of the governing factors affecting the amplitude of the plane waves when the angle of incidence is altered. However, these equations are essentially nonlinear, and to solve this problem, researchers have derived a linearized form of these equations, which is valid for small angles of reflection. An alternative way to establish a linear relation is to consider a linear relationship between perturbations in the data and those in the model. This linear relationship remains valid as long as the perturbation of the model from the initial model is small. The linear relationship between the data perturbation and model perturbation can be accomplished by assuming that, due to a small perturbation δm to a reference model m_0, the model is affected by the forward modeling operator (G) in a way that results in the observed data in the following form:

d_obs = G(m_0 + δm), and d_syn = G(m_0).

To obtain the linearized relation, we implement a Taylor series expansion about some initial guess m_0 as

G(m_0 + δm) = G(m_0) + [∂G(m_0)/∂m]|_{m=m_0} Δm + ...    (9)

Now, neglecting the second- and higher-order terms, we get

d_obs = d_syn + [∂G(m_0)/∂m]|_{m=m_0} Δm,  or  Δd = G_0 Δm,    (10)

where Δd is the data misfit, i.e., the vector difference between the observed data and the modeled data, and G_0 is the sensitivity matrix comprising partial derivatives of the synthetic data with respect to the model parameters. Knowing G, and that its inverse operator exists, we can easily solve for the model vector by applying the inverse operator of G, i.e., G^{-1}, to the data vector. Also, note that it is possible to get an update to the model perturbation by merely applying G_0^{-1} to the data residual vector. The solution to the linear inverse problem (d = Gm) is straightforward and can be derived from the widely used least-squares inverse method, as given below:

G^T d = G^T G m.    (11a)

Now, assuming that (G^T G)^{-1} exists, we can obtain the model parameter estimates in the form of

m_est = (G^T G)^{-1} G^T d.    (11b)

Thus, from knowledge of the matrix (G^T G)^{-1} G^T, which is known as the Moore-Penrose generalized inverse (Dimri 1992), and its operation on the data, we can infer the model parameters of interest. In order to get one unique solution, one of the important criteria is that there are as many equations in the system as the number of unknown model parameters. Such a system involves a situation where the data have enough information on all the model parameters, which means the number of data points is equal to the number of model parameters to be determined, i.e., ND = NM. This is the case of an even-determined problem. A system where NM > ND, i.e., there are more model parameters than observed data, is known as an under-determined problem. The system of equations, in this case, can determine uniquely some of the model parameters but fails to offer adequate relevant information to estimate uniquely all the model parameters. Generally speaking, this would typically be the situation with almost every geoscientific inverse problem, in which we strive to infer continuous (mainly infinite) earth model parameters from a finite set of observed data. Often, the problem is reduced to an even-determined or over-determined (ND > NM) problem by simply splitting the earth model into a set of discrete layers in lieu of solving for the earth parameters as a continuous function of spatial coordinates. Another case may occur in practice, in which the observed data may encompass complete information on several model parameters and none on the others. Such a case is referred to as a mixed-determined problem, which arises in seismic tomography studies, where there may be one block that receives several rays while other blocks are devoid of a single ray coverage (completely undetermined). For an under-determined problem, the pseudo-inverse (Moore-Penrose generalized inverse) can be written as

m_est = G^T (G G^T)^{-1} d,    (11c)

and for a completely over-determined problem, the model parameters can be obtained by the weighted least squares solution, as given by

m_est = (G^T W_e G)^{-1} G^T W_e d,    (11d)

where the matrix W_e is known as the weighting factor, which denotes the relative influence of each error on the total prediction error. It is important to note that the matrix G^T G can occasionally be singular or nearly singular, so that the inversion of the matrix is not possible or is calculated imprecisely. The damped least squares method is proposed as a solution. For this, the solution to the inverse problem can be represented as

m_est = (G^T G + λI)^{-1} G^T d,    (11e)

where λ and I denote the damping parameter and identity matrix, respectively. The damping parameter (λ) plays a major role in preventing the inversion from blowing up by increasing the magnitude of the eigenvalues of the matrix (Dimri 1992; Menke 2012; Ganguli et al. 2016). The damped least squares method is analogous to the regularization method (e.g., Tikhonov regularization) that minimizes the objective function (misfit) in a least squares manner and ensures the smoothness of the model parameters. In practice, λ regulates the smoothness of the model, which is attained in an iterative sense. For a detailed study on regularization techniques, damped least squares, weighted least squares, and the Backus-Gilbert method for ill-posed inverse problems, see Menke (1984) and Dimri (1992).
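The stabilizing effect of the damping parameter in Eq. 11e can be illustrated numerically. The sketch below is a minimal demonstration assuming a nearly singular kernel G and a hypothetical damping value chosen only for this example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Nearly singular kernel: the two columns are almost linearly dependent
G = np.column_stack([np.linspace(0, 1, 20), np.linspace(0, 1, 20) + 1e-6])
m_true = np.array([2.0, -1.0])
d = G @ m_true + 0.01 * rng.normal(size=20)

# Ordinary least squares, Eq. 11b: (G^T G)^-1 G^T d  (unstable here)
m_ls = np.linalg.solve(G.T @ G, G.T @ d)

# Damped least squares, Eq. 11e: (G^T G + lambda I)^-1 G^T d
lam = 1e-3                       # hypothetical damping parameter
m_damped = np.linalg.solve(G.T @ G + lam * np.eye(2), G.T @ d)

print("condition number of G^T G:", np.linalg.cond(G.T @ G))
print("least squares estimate   :", m_ls)       # wildly oscillating values
print("damped estimate          :", m_damped)   # stable; recovers the
                                                # well-determined combination
                                                # m1 + m2 close to 1
```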
For the solution of the inverse problem of type d = Gm, there may be a situation where the least squares inversion begets numerical inaccuracies while generating the matrix G^T G. In such a scenario, it is prudent to obtain a straightforward solution, i.e., m_est = G^{-1} d. In general, it is not feasible to determine G^{-1} employing standard approaches, since G is not a square matrix in most geoscientific cases. To overcome this, a singular value decomposition (SVD) of the matrix G can be performed to compute the pseudo-inverse. SVD is an extremely effective method for factorizing the matrix G (ND × NM) into orthogonal matrices in the following manner:

G = U S V^T,    (12)

where U (ND × ND) is the left orthogonal matrix, V (NM × NM) is the right orthogonal matrix, and S is a singular value matrix consisting of singular values as diagonal elements in decreasing order. If the G matrix is singular, then some of the elements in S will also be zero. Further, these two eigenvector matrices, i.e., U and V, may each be divided into two submatrices, V = [Vp | V0] and U = [Up | U0], in which Vp and Up are allied with the singular values S_i ≠ 0, and V0 and U0 are allied with the singular values S_i = 0. The V0 and U0 vector spaces are known as "null spaces" because these are essentially the blind spots devoid of illumination by the operator G. To cite some examples, the null space in seismic deconvolution corresponds to the frequencies that lie outside the estimated wavelet bandwidth. In crosswell seismic tomography, it primarily resembles a rapid change in horizontal velocity. In the presence of a null space, the G matrix can be rewritten as

G = [Up | U0] [[Sp, 0], [0, 0]] [Vp | V0]^T.    (13)

The uniqueness of the solution can be explained by Vp and V0, while the existence of the solution is attributed to the vector spaces Up and U0. Therefore, we can write

G = Up Sp Vp^T and G^{-1} = Vp Sp^{-1} Up^T.

Finally, we have the estimated model parameters as

m_est = G^{-1} d (with d = Gm) = Vp Sp^{-1} Up^T Up Sp Vp^T m = Vp Vp^T m = Rm,    (14)

where R = Vp Vp^T is the model resolution matrix. If this matrix is an identity matrix (R = I), the solution is unique and the resolution is perfect. Otherwise, the resolution is not good, with the estimated model parameters being some weighted averages of the true model parameters. Likewise, we can also write

d_est = G m_est = Up Up^T d = Nd.    (15)

Here, N = Up Up^T is called the data resolution matrix, which measures the capability of the inverse operator to uniquely estimate the synthetic data. If N = I, the modeled data will be precisely similar to the observed data.
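The pseudo-inverse and the two resolution matrices of Eqs. 12–15 map directly onto a few numpy calls. The following sketch is a generic illustration on a small made-up rank-deficient kernel, not code from the entry; the truncation tolerance for "non-zero" singular values is an assumption:

```python
import numpy as np

# Hypothetical rank-deficient kernel: 4 data, 3 model parameters, rank 2
G = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])

U, s, Vt = np.linalg.svd(G, full_matrices=False)

# Keep only the p singular values S_i != 0  ->  Up, Sp, Vp
tol = 1e-10 * s.max()
p = int(np.sum(s > tol))
Up, Sp, Vp = U[:, :p], s[:p], Vt[:p, :].T

G_pinv = Vp @ np.diag(1.0 / Sp) @ Up.T    # pseudo-inverse Vp Sp^-1 Up^T

R = Vp @ Vp.T      # model resolution matrix (Eq. 14); R = I only if p = NM
N = Up @ Up.T      # data resolution matrix  (Eq. 15); N = I only if p = ND

m_true = np.array([1.0, 2.0, -1.0])
d = G @ m_true
m_est = G_pinv @ d
print("rank p =", p)
print("m_est      :", m_est)        # a smoothed (resolution-filtered) model
print("R @ m_true :", R @ m_true)   # equals m_est, illustrating m_est = R m
```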
Iterative Linear Inversion Methods
We have now seen that the solution of an inverse problem of the type d = Gm can be straightforwardly obtained by the method of least squares or the SVD method. However, such linear methods do not offer a solution if the linearity relation is not valid, especially for quasi-linear problems. These types of problems are solved by iterative linear methods (also known as calculus-based methods), in which the minimum of the error function is approached iteratively by consecutive linear estimates at each iteration, refined successively until the stopping criterion is satisfied (Sen and Stoffa 1995). Most of the iterative methods do not necessitate the matrix to be explicitly well-defined. This inversion scheme updates the reference model (or a priori model) to compute a new model, and the difference between the observed and synthetic data (Δd) is measured to evaluate the model performance. The process is repeated until the data misfit achieves its minimum, and the relevant equation for the method is given below (Sen and Stoffa 1995):

m_{i+1} = m_i + G^{-1} [d_obs − G(m_i)],    (16)

where G(m_i) generates the synthetic data for a model m_i to match iteratively the observed data, d_obs. This inversion method is reasonably sensitive to the initial model guess; therefore, among several minima, it prefers to choose the minimum of the error function adjacent to the starting model. In order to avoid the uncertainties due to data and prior models, both the data and model can be represented by random variables and the inversion can be solved in the context of probability theory.
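The update rule of Eq. 16 can be sketched for a toy nonlinear forward problem. In the snippet below the forward model, its Jacobian, and the starting model are all hypothetical choices made for illustration; the generalized inverse is realized as a least squares pseudo-inverse of the local Jacobian, which is one common practical choice:

```python
import numpy as np

# Toy nonlinear forward problem: d_j = m1 * exp(-m2 * x_j)
x = np.linspace(0.0, 2.0, 15)

def forward(m):
    return m[0] * np.exp(-m[1] * x)

def jacobian(m):
    # Partial derivatives of the synthetic data w.r.t. m1 and m2
    e = np.exp(-m[1] * x)
    return np.column_stack([e, -m[0] * x * e])

m_true = np.array([3.0, 1.5])
d_obs = forward(m_true)

m = np.array([1.0, 0.5])                       # hypothetical starting model m_0
for it in range(20):
    residual = d_obs - forward(m)              # d_obs - G(m_i)
    G0 = jacobian(m)                           # local sensitivity matrix
    dm, *_ = np.linalg.lstsq(G0, residual, rcond=None)
    m = m + dm                                 # Eq. 16 update
    if np.linalg.norm(dm) < 1e-8:              # convergence check
        break

print("iterations:", it + 1, " recovered model:", m)
```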
ð17Þ
The latest model parameter is thus produced by random perturbation of a definite number of model parameters in the model vector. Model response is analyzed for the new model and compared with the observed data. If the difference between the observed data and modeled data is less than the previously accepted solution, the model is accepted. Otherwise, the model is rejected and the search continues several times until convergence is met. The most popular convergence criterion, in this case, is the total number of accepted models. With the advent of computational efficiency, millions of models can be attempted to obtain a suitable solution. One of the well-known applications of the MC algorithm in geophysics is to the inversion of seismic body wave travel times generating five million earth models with the data comprising 97 eigenperiods of free oscillations, travel times of P and S waves, and mass and moment of inertia of the earth (Press 1968). Finally, out of five million models, only six models met all the constraints. Through this work, Press (1968) addressed for the first time the problem of uniqueness through direct samples drawn from the model space. There are, however, always two important issues with the MC method that need to be pondered and these are: (a) how to know whether one has tested sufficient models that represent the observations properly? and (b) how the successful models can be utilized to estimate the uncertainty in the estimation? To make this method to be more practical in use, search space can be compacted, in fact, it can be defined through a pdf (preferably Gaussian with small variance) instead of a uniform distribution. It is desirable to devise random searchbased algorithms that can sample a large portion of the model space more efficiently considering randomly selected several error functions. Due to the large degree of randomness, we can expect unbiased solutions. Also, it is equally important to avoid getting trapped in a local minimum of the error function. These types of issues can be well addressed by the directed Monte Carlo methods, e.g., simulated annealing and genetic algorithms. These two methods utilize random sampling to guide their search for models and results in a global minimum of the misfit function amidst several local minima, unlike the pure MC method. Both methods are suitable for large-scale geoscientific problems. Numerous pieces of literature are available that discuss the applications of MC methods and directed MC methods to solve various geoscientific problems (Sambridge and Drij koningen 1992; Sen and Stoffa 1995; Beaty et al. 2002) and many others.
Applications of Inverse Theory Seismic Inversion to Detect Thin Layers As an application of inverse theory in geophysics, we will provide here an example of a basis pursuit seismic inversion to detect the stratigraphic layer boundaries, especially
I
686
Initially, a set of angle-dependent wavelets were identified through the calibration of individual angle datum with well log reflectivity, as a substitute for constant wavelets. The wavelet kernel matrix for the respective angle data was developed exploiting each wavelet having a length of 200 milliseconds (ms). The derivation of the density was ignored here since the maximum angle of investigation is only 20 , which is not suitable for inferring density. After testing the BPI algorithm in one well location, it was subsequently applied to a real field 2D seismic profile. The well log velocities are blocky filtered by a 10 ms window that essentially replaces the entire values within this specific window into the mean value. The inverted velocities were found to be in good agreement with the well log measured velocities, and the bed boundaries were well resolved in the seismic (Zhang et al. 2013). This also stands valid when 1D velocities derived from the well log, BPI method, and conventional method were compared. For a proper comparison, the blocky filtered well log velocities were adapted because the well log contains high frequencies, and it is not advisable to compare the unfiltered well data with the inverted results. As a whole, the BPI algorithm could delineate the bed boundaries exceptionally well as compared to the conventional approach, offering valuable information for quantitative interpretation of elastic
properties. Of course, some discrepancies between the well log measured velocities and inversion results were detected; however, the BPI inverted results could follow the similar trend adopted by the conventional method. The BPI algorithm includes a wedge dictionary that supports an improved resolution of the single as well as double thin beds (interbedded layers) potential for hydrocarbon exploration. To get more insights on the BPI method and its detailed application for reservoir characterization, readers are referred to Zhang et al. (2013). SVD Gravity Inversion to Identify Geological Structures The second application of inverse theory in geoscience is presented here as SVD gravity inversion to decipher geological structures. The Bouguer gravity anomaly was analyzed in terms of eigenimages using SVD inversion. In practice, SVD decomposes the matrix G into a sequence of eigenimages, which appeared to be suitable for the separation of signal from background noise. In consonance with the nonlinear theory, singular geological processes are those which entail anomalous aggregates of energy release or accumulation within a restricted range of space or time. Any such anomalous energy aggregates due to geological features may be captured in a few particular eigenimages. Thus, to characterize the geological structure’s potential for hydrocarbon or mineral prospects, it is significant to delineate those specific eigenimages employing the SVD inversion method. The SVD inferred singular values possess a unique feature of signifying diverse weighting coefficients of eigenimages. The SVD of the matrix G can be expressed as (Ganguli and Dimri 2013): l
sk Uk VTk :
G¼
ð19Þ
k¼1
According to the above equation, the G matrix can be reconstructed using the sequence of eigenimages. Note that the signal (i.e., Bouguer anomaly) can be suitably reconstructed from only those few eigenimages whose eigenvalues are not zero. In order to know which eigenimages will contribute more to the Bouguer anomaly reconstruction, the following equation can be considered: Pk ¼ s2k =∑lj¼1 s2j ¼ lk=∑lj¼1 lj ,
ð20Þ
where Pk denotes the percentage of each eigenimage contributing to the reconstruction, l is the rank of the matrix, and lk represents the eigenvalues in descending order. In this study, the application of eigenimage extraction using the SVD inversion is verified over a real field Bouguer anomaly from the Jabera-Damoh region, Vindhyan Basin (India). The Jabera-Damoh region has received importance
Inversion Theory in Geoscience
687
I
Inversion Theory in Geoscience, Fig. 3 (a) the Bouguer anomaly map along with the profile under investigation (i.e., AA/, as marked by blue line); (b) the reconstructed eigenimages inferred from SVD
inversion on the Bouguer gravity data from the studied region. (Modified after Ganguli and Dimri 2013)
in recent decades due to its potential for hydrocarbon prospects; the national oil and gas company has discovered gas near a borehole at a depth of around 3 km depth. (Ganguli and Dimri 2013). A Bouguer anomaly profile, namely AA/ (marked by the blue line in Fig. 3a) from this region was considered for the analysis since it passes through one of the major anomalous parts of this region. The key objective was to identify some important singular geological processes relevant for hydrocarbon prospects in the region through eigenimage analysis. It was identified that about 85% of the total anomalous energy of this gravity signal was dominantly captured by the first eigenimage and the remaining by the second eigenimage. The regional geological structure was efficiently captured by the first eigenimage, while the second eigenimage depicts the local geological characteristics together with noise from different sources (Fig. 3b). Note that the reconstructed eigenimages for the real field gravity data are akin to those derived for a buried faulted structure. In such a case, the sedimentary basin faulted on both margins can be characterized by relatively low (negative) singular values at the center
of the second eigenimage, separating the footwall and hanging wall corresponding to higher and positive singular values (Fig. 3b). The low singular values can be attributed to accumulated sediments of a low density from the studied sedimentary basin. The structure depicted in the second eigenimage may be related to the shallow features and not due to deeper geological bodies since the contribution of anomalous signal in the second eigenimage is always less as compared to the first eigenimage. This observation is further supported by other geophysical and geological studies from this basin, suggesting the study area to be dominated by faulted basin in the crystalline basement. For detailed analysis and interpretation, we refer the interested readers to the work by Ganguli and Dimri (2013).
Summary and Conclusions
The inverse theory is crucial to geoscience studies for inferring the physical properties of the earth system. Here, we
presented an overview of inverse theory together with its fundamental concepts and two applications to demonstrate the power of inversion in solving seismic and gravity inverse problems. Most direct inversion methods are not powerful enough to invert data that are inadequate for recovering the properties of the earth or are contaminated with noise. This opened an opportunity for the application of model-based inversion techniques to derive rock properties of interest, which have gained considerable popularity. Much of the success of an inversion method lies in the data quality and the forward modeling approach. It is important to have a nominal set of model parameters that fully characterizes the system, together with the proper implementation of a priori information and a forward model that can make predictions of observable parameters based on relevant physical laws. Most forward operators are nonlinear, and the performance of the inversion algorithm can be judged by how well we explain the geoscientific observations given their uncertainty. We can get a simple solution if the inverse problem is linearized by imposing a set of limiting conditions to make the forward operator a linear one. Otherwise, iterative methods such as gradient-based methods and random sampling-based search methods can be adopted to solve geoscientific inverse problems. Several plausible (equally valid) solutions can be derived, owing to the nonuniqueness of the problem. Global optimization methods have been useful for selecting the final model that attains the global minimum of the objective (misfit) function amidst several minima. These algorithms, however, do not presume any shape of the objective function and are independent of the choice of starting models. Often, highly nonlinear problems can be solved by considering an initial model very close to the peak of the posterior probability density (PPD) while using these methods. In reality, of course, estimating the PPD is quite challenging in a large multidimensional model space. We have presented only two applications of inverse theory in geoscience, particularly in seismic and gravity studies. Nevertheless, applications and advanced proposals for the solution of inverse problems are incessantly intensifying. Physics-guided machine learning-based algorithms for solving large-scale complex systems of equations are receiving increased attention. With the advent of fast and economical computing facilities, these applications are encouraged and will continue to grow to solve various geoscientific problems.
Bibliography Beaty KS, Schmitt DR, Sacchi M (2002) Simulated annealing inversion of multimode Rayleigh wave dispersion curves for geological structure. Geophys J Int 151(2):622–631 Cockett R, Kang S, Heagy LJ, Pidlisecky A, Oldenburg DW (2015) SimPEG: an open source framework for simulation and gradient based parameter estimation in geophysical applications. Comput Geosci 85:142–154 Dimri VP (1992) Deconvolution and inverse theory: application to geophysical problems. Elsevier, Amsterdam Ganguli SS, Dimri VP (2013) Interpretation of gravity data using eigenimage with Indian case study: a SVD approach. J Appl Geophys 95: 23–35 Ganguli SS, Nayak GK, Vedanti N, Dimri VP (2016) A regularized Wiener–Hopf filter for inverting models with magnetic susceptibility. Geophys Prospect 64(2):456–468 Hampel FR (1968) Contributions to the theory of robust estimation. Unpublished dissertation, University of California, Berkeley Huber PJ (1964) A robust estimation of location parameter. Ann Math Stat 35:73–101 Koesoemadinata AP, McMechan GA (2003) Petro-seismic inversion for sandstone properties. Geophysics 68(5):1446–1761 Lodge A, Helffrich G (2009) Grid search inversion of teleseismic receiver functions. Geophys J Int 178:513–523 Menke W (1984) Geophysical data analysis: discrete inverse theory. Academic, New York Menke W (2012) Geophysical data analysis: discrete inverse theory, 3rd edn. Academic, 330 pp Press F (1968) Earth models obtained by Monte Carlo inversion. J Geophys Res 73(16):5223–5234 Richter M (2020) Inverse problems: basics, theory and applications in geophysics. Lecture notes in geosystems mathematics and computing, vol XIV, 2nd edn. Birkhäuser, Basel, 273 pp Sambridge M, Drij Koningen G (1992) Genetic algorithms in seismic waveform inversion. Geophys J Int 109:323–342 Sen MK, Stoffa PL (1995) Global optimization methods in geophysical inversion. Elsevier, Amsterdam Tarantola A (1987) Inverse problem theory, methods of data fitting and model parameter estimation. Elsevier, Amsterdam Tikhonov AN, Arsenin V (1977) Solution of ill-posed problems. Wiley, Washington, DC Wang R, Wang Y, Rao Y (2020) Seismic reflectivity inversion using an L1-norm basis-pursuit method and GPU parallelization. J Geophys Eng 17:776–782 Wilcox R (2011) Introduction to robust estimation and hypothesis testing, 3rd edn. Elsevier Zhang R, Sen MK, Srinivasan S (2013) A prestack basis pursuit seismic inversion. Geophysics 78(1):R1–R11
Cross-References
▶ Forward and Inverse Stratigraphic Models
▶ Inversion Theory
▶ Monte Carlo Method

Iterative Weighted Least Squares

Sara Taskinen and Klaus Nordhausen
Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland
Definition

Iterative (re-)weighted least squares (IWLS) is a widely used algorithm for estimating regression coefficients. In the
algorithm, weighted least squares estimates are computed at each iteration step so that weights are updated at each iteration. The algorithm can be applied to various regression problems like generalized linear regression or robust regression. In this entry, we will focus, however, on its use in robust regression.
Introduction

Consider a data set consisting of n independent and identically distributed (iid) observations (x_i^⊤, y_i), i = 1, ..., n, where x_i = (x_{i1}, ..., x_{ip})^⊤ and y_i are the observed values of the predictor variables and the response variable, respectively. The data are assumed to follow the linear regression model y_i = x_i^⊤ β + ϵ_i, where the p-vector β = (β_1, ..., β_p)^⊤ contains unknown regression coefficients, which are to be estimated based on the data. The errors ϵ_i are iid with mean zero and variance σ² and independent of x_i. The well-known least squares (LS) estimator for β is the β̂ that minimizes the sum of squared residuals Σ_{i=1}^n r_i(β)², where r_i(β) = y_i − x_i^⊤ β, or equivalently, solves Σ_{i=1}^n r_i(β) x_i = 0. To put this into matrix form, let us collect the responses and predictors into an n × 1 vector y = (y_1, ..., y_n)^⊤ and an n × p matrix X = (x_1, ..., x_n)^⊤, respectively; then the LS problem is given by

argmin_β ‖y − Xβ‖₂²,   (1)

and the estimate can be simply computed as

β̂_LS = (X^⊤ X)^{−1} X^⊤ y,

assuming that (X^⊤ X)^{−1} exists. If the assumption of constant variance of the errors ϵ_i is violated, we can use the weighted least squares method to estimate the regression coefficients β. The weighted least squares (WLS) estimate solves Σ_{i=1}^n w_i r_i(β) x_i = 0, where w_1, ..., w_n are some nonnegative, fixed weights. If we let W be an n × n diagonal matrix with weights w_1, ..., w_n on its diagonal, then the WLS estimate can be computed by applying the ordinary least squares method to W^{1/2} y and W^{1/2} X. Thus,

β̂_WLS = (X^⊤ W X)^{−1} X^⊤ W y.   (2)

If we assume that the errors ϵ_i are independent with mean zero and variance σ_i², then the WLS estimate with weights w_i = 1/σ_i² is optimal. If we further assume that the error distribution is Gaussian, then the WLS estimate is the maximum likelihood estimate. Notice that in practice the weights are often not known and need to be estimated. Some examples of weight selection are given, for example, in Montgomery et al. (2012). Notice also that linear regression with nonconstant error variance is only one application area of WLS.

Iterative Weighted Least Squares

When the weight matrix W in (2) is not fixed but may, for example, depend on the regression coefficients via the residuals, we can apply the iterative weighted least squares (IWLS) algorithm for estimating the parameters. In such a case, the regression coefficients and weights are updated alternately as follows:
1. Compute an initial regression estimate β̂_0.
2. For k = 0, 1, ..., compute the residuals r_{i,k}(β̂_k) = y_i − x_i^⊤ β̂_k and the weight matrix W_k = diag(w_{1,k}, ..., w_{n,k}), where w_{i,k} = w(r_{i,k}(β̂_k)) with some weight function w. Then update β̂_{k+1} using WLS as in (2) with weight matrix W_k, that is, β̂_{k+1} = (X^⊤ W_k X)^{−1} X^⊤ W_k y.
3. Stop when max_i |r_{i,k} − r_{i,k+1}| < ϵ, where ϵ may be fixed or related to the residual scale.
It is shown in Maronna et al. (2018) that the algorithm converges if the weight function w(x) is non-increasing for x > 0; otherwise, starting values should be selected with care.

IWLS and Robust Regression

Let us next illustrate how IWLS is used in the context of robust regression, which is widely used in geosciences as atypical observations should not have an impact on the parameter estimation and often should be detected. For a recent comparison of nonrobust regression with robust regression applied to geochemical data, see, for example, van den Boogaart et al. (2021). For a given estimate ŝ of the scale parameter σ, a robust M estimate of the regression coefficient β can be obtained by minimizing

Σ_i ρ(r_i(β)/ŝ),   (3)

where ρ is a robust, symmetric (ρ(−r) = ρ(r)) loss function with a minimum at zero (Huber 1981), or equivalently, by solving

Σ_i ψ(r_i(β)/ŝ) x_i = 0,   (4)

where ψ = ρ′. Here the scale estimate ŝ is needed in order to guarantee the scale equivariance of β̂. If the estimate is not known, it can be estimated simultaneously with β̂. For possible robust scale estimators, see, for example, Maronna et al. (2018). The IWLS algorithm can be used for estimating robust M estimates as follows. Write w(x) = ψ(x)/x and further w_i = w(r_i(β)/ŝ). Then the M estimation equation in (4) reduces to

Σ_i w_i r_i(β) x_i = 0,

and the robust M estimate of regression can be computed using the IWLS algorithm. For the comparison of robust M estimates computed using different algorithms, see Holland and Welsch (1977). For monotone ψ(x), the IWLS algorithm converges to a unique solution given any starting value. Starting values, however, affect the number of iterations and should therefore be chosen carefully. Maronna et al. (2018) advise using the least absolute value (LAV) estimate as β̂_0. If the robust scale is estimated simultaneously with the regression coefficients, it is updated at each iteration step. The median absolute deviation (MAD) estimate can then be used as a starting value for the scale. For a discussion on convergence in the case of simultaneous estimation of scale and regression coefficients, see Holland and Welsch (1977). Some widely used robust loss functions include the Huber loss function

ρ_H(x) = x²/2 if |x| ≤ c, and ρ_H(x) = c|x| − c²/2 if |x| > c,

yielding

ψ_H(x) = x if |x| ≤ c, and ψ_H(x) = c·sign(x) if |x| > c,

and the Tukey biweight function

ρ_T(x) = 1 − (1 − (x/c)²)³ if |x| ≤ c, and ρ_T(x) = 1 if |x| > c,

yielding

ψ_T(x) = x (1 − (x/c)²)² I(|x| ≤ c).

Iterative Weighted Least Squares, Fig. 1 Huber's and Tukey's biweight ρ and ψ functions with c chosen such that the efficiency at the normal model is 95%.

The tuning constant c provides a trade-off between robustness and efficiency at the normal model. Popular choices are c_H = 1.345 and c_T = 4.685, which yield an efficiency of 95% for the corresponding estimates. The two loss functions and their derivatives are shown in Fig. 1. Notice that as the Tukey biweight function is not monotone, the IWLS algorithm may converge to multiple solutions. Good starting values are thus needed in order to ensure convergence to a good solution. The asymptotic normality
of robust M regression estimates is discussed, for example, in Huber (1981).
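As a concrete illustration of the IWLS scheme for robust M estimation described above, the following minimal Python/NumPy sketch uses Huber weights with c = 1.345 and a MAD scale estimate recomputed at each iteration. The least squares starting value and the convergence test on the coefficients are simplifications (the LAV start and residual-based stopping rule discussed above are more robust choices), and the toy data are hypothetical.

import numpy as np

def huber_weights(r, c=1.345):
    # w(x) = psi(x)/x for the Huber loss: 1 if |x| <= c, c/|x| otherwise
    a = np.abs(r)
    w = np.ones_like(a)
    mask = a > c
    w[mask] = c / a[mask]
    return w

def iwls_huber(X, y, c=1.345, tol=1e-8, max_iter=100):
    """Robust M regression via IWLS with Huber weights (a sketch)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]           # LS start (simplification)
    for _ in range(max_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD scale estimate
        s = max(s, 1e-12)
        W = huber_weights(r / s, c)
        XtW = X.T * W                                     # X^T diag(W)
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)      # WLS update as in (2)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# toy usage with a gross outlier
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=50)
y[0] += 20.0
print(iwls_huber(X, y))       # close to [1, 2] despite the outlier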
Other Usages of IWLS

Generalized linear models (GLM, McCullagh and Nelder 1989), a generalization of the linear model described above, allow the response variable to come from the family of exponential distributions, to which the normal distribution also belongs. The regression coefficients in a GLM are usually estimated via the maximum likelihood method, which coincides with the LS method for normal responses. However, for other members of the exponential family, there are no closed-form expressions for the regression estimates. The standard way of obtaining the regression estimates is the Fisher scoring algorithm, which can be expressed as an IWLS problem (McCullagh and Nelder 1989). The Fisher scoring algorithm creates working responses at each iteration step. Therefore, when using IWLS for estimating the coefficients of GLMs, not only are the weights in W updated at each iteration but also the (working) responses y. How W_k and y_k are updated depends on the distribution of the response. For details, see, for example, McCullagh and Nelder (1989). The use of GLMs in soil science is, for example, discussed in Lane (2002).

The idea in robust M estimation described above is to down-weight large residuals, whereas in the LS method, where the L2 norm is used in the minimization problem (1), these get a large weight. Another way of approaching the estimation is to consider another norm, such as a general Lp norm, which then leads to

argmin_β ‖y − Xβ‖_p^p,

which, for example, for p = 1 corresponds to the least absolute value (LAV) regression. It turns out that the Lp norm minimization problem can also be solved with IWLS, using weights W_k = diag(|r_1(β̂_k)|^{p−2}, ..., |r_n(β̂_k)|^{p−2}). The uniqueness of the solution and the behavior of the algorithm depend on the choice of p. Also, residuals that are too close to zero often need to be replaced by a threshold value. For more details, see, for example, Gentle (2007) and Burrus (2012). The motivation for using the Lp norm in regression is that it may produce sparse solutions for β and thus can be used in image compression such as in hyperspectral imagery (see, e.g., Zhao et al. 2020, for details).

Summary and Conclusion

The iterative weighted least squares algorithm is a simple and powerful algorithm, which iteratively solves a least squares estimation problem. The algorithm is extensively employed in many areas of statistics such as robust regression, heteroscedastic regression, generalized linear models, and Lp norm approximations.
Cross-References ▶ Least Absolute Value ▶ Least Mean Squares ▶ Least Squares ▶ Locally Weighted Scatterplot Smoother ▶ Ordinary Least Squares ▶ Regression
References Burrus CS (2012) Iterative reweighted least squares. OpenStax CNX Gentle JE (2007) Matrix algebra. Theory, computations, and applications in statistics. Springer, New York Holland PW, Welsch RE (1977) Robust regression using iteratively reweighted least-squares. Commun Stat Theor Methods 6(9): 813–827. https://doi.org/10.1080/03610927708827533 Huber PJ (1981) Robust statistics. Wiley, New York Lane PW (2002) Generalized linear models in soil science. Eur J Soil Sci 53(2):241–251. https://doi.org/10.1046/j.1365-2389.2002.00440.x Maronna RA, Martin RD, Yohai VJ, Salibian-Barrera M (2018) Robust statistics: theory and methods (with R), 2nd edn. Wiley, Hoboken McCullagh P, Nelder J (1989) Generalized linear models, 2nd edn. Chapmann & Hall, Boca Raton Montgomery DC, Peck EA, Vining GG (2012) Introduction to linear regression analysis, 5th edn. Wiley & Sons, Hoboken van den Boogaart KG, Filzmoser P, Hron K, Templ M, TolosanaDelgado R (2021) Classical and robust regression analysis with compositional data. Math Geosci 53(3):823–858. https://doi.org/10. 1007/s11004-020-09895-w Zhao X, Li W, Zhang M, Tao R, Ma P (2020) Adaptive iterated shrinkage thresholding-based lp-norm sparse representation for hyperspectral imagery target detection. Remote Sens 12(23):3991. https://doi.org/ 10.3390/rs12233991
J
Journel, André

R. Mohan Srivastava
TriStar Gold Inc, Toronto, ON, Canada
Biography
Fig. 1 André Journel (Courtesy Mohan Srivastava)
André Journel is a geostatistician, one of the first group of students to whom Georges Matheron taught his theory of regionalized variables. Educated as a mining engineer at the École Nationale Supérieure de Géologie in Nancy in the late 1960s, he joined the geostatistics research group at Fontainebleau in 1969 and worked there for a decade as a researcher, teacher, and consultant to mining companies around the world. His book Mining Geostatistics, co-authored with Charles Huijbregts in 1978, was the first comprehensive and rigorous reference book on the subject, the “bible” for geostatistics for many who explored the mysteries of the new science of spatial interpolation grounded in statistical theory.
In 1978, he became a professor at Stanford University, where he established one of the world’s preeminent geostatistics research programs. In the early 1980s, he moved the application of geostatistics into environmental sciences and championed the theory and practice of risk-qualified mapping. He worked with the US Environmental Protection Agency to create Geo-EAS, the first widely used public-domain software toolkit for geostatistical data analysis and interpolation. In the late 1980s, he and his students at Stanford pioneered many new methods for stochastic simulation, conditioned by hard and soft data. These new algorithms were quickly adopted by the international petroleum industry as the accepted approach for modeling rock and fluid properties in oil and gas reservoirs where correctly representing spatial heterogeneity in computer models of the subsurface is critical for accurate flow modeling and where risk analysis calls for a family of alternate scenarios, all honoring the same input data and user parameter choices. The principal vehicle for André Journel’s prolific research through the last half of his active academic career was the Stanford Center for Reservoir Forecasting (SCRF), a consortium of industry, government research organizations, and software companies that he founded in 1986 and directed for more than 20 years. By the time he retired from Stanford in 2010, he had served as the advisor and mentor to more than 50 graduate students and had been the catalyst for two more major public-domain software packages: GSLIB in the 1990s and SGeMS in the early 2000s. He guided the “Applied Geostatistics Series” published by Oxford University Press from its inception in the late 1980s through the early 2000s. He has been the recipient of many awards and honors, including the Krumbein Medal from the International Association for Mathematical Geosciences (1989), the Earth Sciences Teaching Award at Stanford (1995), the Lucas Gold Medal from the Society of Petroleum Engineers (1998), and the Erasmus Award from the European Association of Geoscientists and Engineers (2012). In 1998, he was elected to the US National Academy of Engineering.
K
K-Means Clustering Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Definition

Clustering is a process of grouping n observations into k groups, where k ≤ n, and these groups are commonly referred to as clusters. k-means clustering is a method which ensures that the observations in a cluster are the closest to the representative observation of the cluster. The representative observation is given by the centroid, i.e., the mean, of the observations belonging to the cluster. This is akin to a Voronoi tessellation of the observations in a d-dimensional space, if the observations are represented using d-dimensional feature vectors. An optimal k-means clustering is one which achieves minimum intra-cluster, i.e., within-cluster, variance, which also implies maximum inter-cluster, i.e., between-cluster, variance. Given a set of observations (x_1, x_2, ..., x_n) that are distributed/partitioned into k clusters, C = {C_1, C_2, ..., C_k}, such that k ≤ n, the objective function for optimization is given by:

argmin_C Σ_{i=1}^k Σ_{x ∈ C_i} ‖x − m_i‖² = argmin_C Σ_{i=1}^k |C_i| σ²(C_i),

where m_i is the mean of the observations in the cluster C_i and σ²(C_i) is its intra-cluster variance. Intra-cluster variance is defined as the variance of the position coordinates of the observations in a cluster, i.e., given by the sum of squared Euclidean distances of all observations in the cluster with all the others within the same cluster. Similarly, inter-cluster variance refers to the sum of the distances between observations belonging to two different clusters.

The Euclidean distance, or the L2 norm of the distance vector, is conventionally used in k-means clustering. The use of Euclidean distances manifests as nearly spherical clusters. Thus, k-means clustering works best for observations which are inclined to have spherical clusters. Other distance measures, such as Mahalanobis distance and cosine similarity, can also be used (Morissette and Chartier 2013). If the distance measure used is the Mahalanobis distance, then the covariance between the vectors of observations is considered. Similarly, if the distance measure is the cosine similarity, then correlation is used as the distance measure. Thus, the characterization of the distance measure influences the clustering outcomes.
Overview

The term k-means was first coined in the context of qualitative and quantitative analysis of large-scale multivariate data (MacQueen et al. 1967). The Lloyd-Forgy algorithm (Forgy 1965; Lloyd 1982) is the classical implementation that is widely used today and is also referred to as the naïve k-means. The algorithm, in both the Lloyd-Forgy and MacQueen variants, comprises five key steps: (i) choose k, (ii) choose a distance metric, (iii) choose a method to pick the centroids of the k clusters, (iv) initialize the centroids, and (v) update the assignment of each observation to its closest centroid and update the centroids. Step (v) is implemented iteratively until convergence, which is determined by no change from the previous iteration. Apart from convergence, the termination condition could also be imposed by capping the number of iterations. The difference between the two variants is in the manner of implementation of step (v). In the MacQueen algorithm, only one observation is updated at a time, causing the centroids to be updated, whereas in the Lloyd-Forgy algorithm, the membership of all observations to the clusters is updated, and then the centroids are updated. Thus, the MacQueen algorithm is an online/incremental algorithm, and the Lloyd-Forgy algorithm is
an offline/batch algorithm. The former is efficient owing to frequent updates of the centroids, whereas the latter is better suited for analyzing large datasets. The Lloyd and Forgy algorithms themselves differ in the minor point of considering the data distribution to be discrete and continuous, respectively. The Lloyd-Forgy algorithm is considered to be analogous to a spatial partition algorithm, specifically Voronoi tessellation. The Hartigan-Wong implementation, unlike the MacQueen and Lloyd-Forgy variants, focuses on achieving the objective function for optimization using local minima as opposed to the global minimum (Hartigan and Wong 1979). It follows the aforementioned steps (i)–(v), followed by an additional step: (vi) compute the within-cluster sum of squares of errors (SSE) to locally optimize. This means that when a centroid of a cluster, say C_i, is updated in the last iteration, the within-cluster SSE is compared with the within-cluster SSE if an observation currently in C_i were to be in another cluster C_j, such that i ≠ j. It does not guarantee a global optimum and hence is a heuristic. The Hartigan-Wong algorithm, like the MacQueen one, is sensitive to the order of change of cluster membership of the observations. The pros of the basic iterative implementation of the k-means clustering algorithm include its relatively simple implementation and guaranteed convergence. Its cons, however, include the requirement of the number of clusters k as an input and the initialization of centroids prior to the iterative implementation. While the latter is conventionally done using random values, there are several methods which improve the initialization, e.g., using progressive methods, structured partitioning, a priori information, hierarchical clustering (Ward's method), etc. The k-means algorithm also suffers from the curse of dimensionality, as the ratio of the standard deviation to the mean of the distances between observations decreases with an increase in the number of dimensions. As the ratio decreases, the k-means clustering algorithm becomes increasingly ineffective in distinguishing between the observations. The curse of dimensionality can be alleviated using spectral clustering, by projecting the observations to a lower-dimensional subspace of k dimensions using principal component analysis and then performing k-means in the k-dimensional space. Owing to the popularity of the algorithm, several improvements have been implemented on the k-means algorithm. The effectiveness of the algorithm can be measured using internal and external criteria. An internal criterion is a metric that measures whether the global optimum has been reached, and an external criterion measures the similarity of the current partition with a known partition, e.g., ground truth. The Dunn index and Jaccard similarity are examples of internal and external criteria, respectively. The k-means algorithm is an unsupervised learning method, as it does not require training data for pattern matching.
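The following minimal Python/NumPy sketch of the Lloyd-Forgy (batch) variant described above illustrates the initialization and the iterative assignment/update step; the random initialization, the Euclidean distance, and the toy data are illustrative assumptions rather than recommendations.

import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Naive k-means (Lloyd's batch algorithm): a minimal sketch."""
    rng = np.random.default_rng(seed)
    # initialize centroids by picking k random observations
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each observation to its closest centroid (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # update each centroid as the mean of its members
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):   # convergence: no change
            break
        centroids = new_centroids
    return labels, centroids

# toy usage: two well-separated Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels, centroids = kmeans_lloyd(X, k=2)
print(centroids)   # approximately (0, 0) and (3, 3)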
Applications

The pros of the algorithm satisfy its two significant goals, namely, exploratory data analysis and reduction of data complexity (Morissette and Chartier 2013). Geoscientific and related datasets have the characteristics of unknowns in the data and of being large-scale and complex. Thus, k-means clustering is widely used in different geoscientific domains and for different types of data analysis. The examples in seismology, flood extent change detection from synthetic aperture radar (SAR) images, and land cover change detection (LCCD) from satellite images are covered as samples of the specific problems in different domains that use k-means clustering. The examples in clustering for machine learning and visualization tools show how k-means clustering is a tool or a method integrated with a data analysis toolkit or application. Overall, these examples demonstrate how k-means clustering is one of the standard data analysis methods used in geoscience. The k-means algorithm has been used in seismology for probabilistic seismic hazard analysis, where k-means is used to obtain a partition of earthquake hypocenters or, alternatively, a partition of fault ruptures (Weatherill and Burton 2009). The initialization of cluster centroids is done using ensemble analyses for identifying the best choices. The k-means partitions of seismicity are then used as source models, whose representation in the regional seismotectonics is studied. The assessment shows that the k-means algorithm has given clusters that provide the most appropriate spatial variation in hypocentral distribution and fault type in the region. Similarly, k-means can be applied to features representing earthquake waveforms to study temporal patterns in low-magnitude earthquakes in geothermal fields. This indicates the usefulness of the k-means algorithm in solid Earth geosciences. Change detection is done from multitemporal data, e.g., SAR and satellite images. Flood extent mapping has been done by change detection in SAR using a combination of techniques in a two-step process (Zheng et al. 2013). First, the difference image of SAR images of a region from two consecutive time-stamps is computed. After median and mean filters are applied to the difference image and the outcomes are combined, the k-means algorithm is applied as a second step to cluster changed and unchanged regions. The k-means algorithm has been found to be effective in capturing spatial variations in the difference image. A similar method can be applied for LCCD in remotely sensed images, i.e., Landsat images (Lv et al. 2019). In addition to the aforementioned two steps, there is an additional step of adaptive majority voting (AMV) in local neighborhoods to identify the class type for land cover to analyze urban expansion, urban build-up changes, deforestation, etc. Majority voting is an ensemble analysis method. Overall, such integrated methods that include
efficient methods, such as the k-means algorithm, provide flexibility of application and generality in change detection. In addition to unsupervised learning applications, the k-means algorithm has been used to reduce complexity in geoscientific data that is further visualized using appropriate metaphors (Li et al. 2015). For instance, geoscientific observation stations can be clustered to provide an overview of the data, and these clusters can be represented using simplified data visualization layouts, e.g., a radial layout.
Future Scope

Geoscience and related datasets are now classified as big data, owing to the volume, velocity, variety, value, and veracity, i.e., the five Vs. Such complex data requires high-performance computing for its analysis. Hence, graphics processing unit (GPU)-based large-scale computations are implemented, which entails hardware-accelerated implementations of the algorithms, including the k-means algorithm (Yu et al. 2019). The use of GPUs enables automatic hyperparameter tuning, which allows the scalability of k-means algorithms to large datasets and also improves the potential of the algorithm for various applications. Parallel implementations of k-means have been used with real-world geographical datasets. Given the increasing complexity of geoscientific and geographical data and metadata, there is potential for both hardware and software enhancements for the k-means algorithm. Overall, the k-means algorithm is an effective and efficient clustering algorithm that has been successfully used on geoscientific datasets.

Bibliography
Forgy EW (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21:768–769
Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. J R Stat Soc Ser C (Appl Stat) 28(1):100–108
Li J, Meng ZP, Huang ML, Zhang K (2015) An interactive radial visualization of geoscience observation data. In: Proceedings of the 8th international symposium on visual information communication and interaction, pp 93–102
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137
Lv Z, Liu T, Shi C, Benediktsson JA, Du H (2019) Novel land cover change detection method based on K-means clustering and adaptive majority voting using bitemporal remote sensing images. IEEE Access 7:34,425–34,437
MacQueen J et al (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Oakland, vol 1, pp 281–297
Morissette L, Chartier S (2013) The k-means clustering technique: general considerations and implementation in Mathematica. Tutor Quant Methods Psychol 9(1):15–24
Weatherill G, Burton PW (2009) Delineation of shallow seismic source zones using K-means cluster analysis, with application to the Aegean region. Geophys J Int 176(2):565–588
Yu T, Zhao W, Liu P, Janjic V, Yan X, Wang S, Fu H, Yang G, Thomson J (2019) Large-scale automatic K-means clustering for heterogeneous many-core supercomputer. IEEE Trans Parallel Distrib Syst 31(5):997–1008
Zheng Y, Zhang X, Hou B, Liu G (2013) Using combined difference image and k-means clustering for SAR image change detection. IEEE Geosci Remote Sens Lett 11(3):691–695

K-Medoids

Jaya Sreevalsan-Nair
Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Definition A medoid is defined as a representative item in a dataset or its subset (or cluster), which is centrally located and has the least sum of dissimilarities with other items in the group (Kaufman and Rousseeuw 1987). Medoids are identified in a dataset to implement partitioning around medoids (PAM), which is a clustering method. Since PAM is used to generate K clusters, where K is a user-defined positive integer, such that K > 1, this method is also referred to as K-medoids clustering method (Kaufman and Rousseeuw 1987). A medoid is not equivalent to other statistical descriptors of median, centroid, or mean, but is the closest to median by virtue of an item in the dataset serving as a representative, as opposed to a computed entity. A median is identified by sorting items with respect to a single variable and picking the “middle” value of the variable, whereas the medoid is the item itself. Hence, medoids can be considered to be more generalized since it represents multivariate or multidimensional space, with flexibility in identifying them, based on the usage of appropriate dissimilarity or distance measures. Given that the rest of the article impresses upon geoscientific application, “item” will be referred to as “point” hereafter.
Overview K-medoids clustering is often compared with the K-means one (see the article on “K-Means Clustering”), owing to the similarity in the algorithms. A key difference in both these partitioning algorithms is that the cluster representative or center is an actual point in the case of K-medoids method, i.e., the medoid itself (Kaufman and Rousseeuw 1987),
whereas a centroid is computed from the points present in the K-means cluster. Traditionally, by design, another important difference between the two methods existed that revolved around the constraint of using the Euclidean distance in K-means, which was generalized to any similarity measure in K-medoids. Thus, the first implementation of PAM (Kaufman and Rousseeuw 1987) focused on introducing the use of dissimilarity measures, e.g., one minus the Pearson correlation, as a governing factor in the partitioning algorithm. Over time, the Euclidean distance, one minus the absolute correlation, and one minus the cosine similarity have also become widely used dissimilarity measures (Van der Laan et al. 2003). The K-medoids clustering problem is known to be NP-hard and hence has approximate solutions (Fig. 1).

K-Medoids, Fig. 1 A graphical representation of the key difference in the cluster representative or the cluster center identified in (left) K-medoids and (right) K-means clustering algorithms. The magenta box indicates the cluster center, for a cluster of sites (blue circles) indicated by its boundary (red dotted curve). (Image courtesy: Author's own, but adapted from a diagram in the article by Jin and Han (2011) https://doi.org/10.1007/978-0-387-30164-8_426)

The implementation of the original PAM has two steps, INITIAL and SWAP. INITIAL starts with D, an n × n dissimilarity matrix for n points, and M, a set of initial seed points of size K used as medoids for the K clusters. Each point is associated with a seed point by virtue of its lowest dissimilarity with the seed point, i.e., its corresponding medoid. Then, SWAP is applied iteratively, which optimizes M by minimizing the loss function, i.e., the sum of dissimilarities between each point and its corresponding medoid. In the SWAP step, each medoid is considered for a swap of roles with a non-medoid point, and the swap is implemented only when the recomputed loss function is reduced. This greedy algorithm terminates when SWAP does not find any more changes to implement and is thus more efficient than an exhaustive search. The original PAM algorithm recommends determining a data-driven K value by maximizing the average silhouette. The silhouette is computed based on the outcome of the implementation of PAM with a random K value (Kaufman and Rousseeuw 1990). PAM has also been solved using an alternative loss function based on the maximum dissimilarity between a point and its corresponding medoid (Kaufman and Rousseeuw 1990). This methodology has been referred to as the K-center model.
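The INITIAL and SWAP steps described above can be sketched as follows in Python/NumPy. This is a simplified, greedy swap loop over a precomputed dissimilarity matrix, not the exact bookkeeping of the original PAM implementation, and the random seeding and toy data are assumptions made for illustration.

import numpy as np

def pam_sketch(D, k, max_iter=100, seed=0):
    """Naive PAM-style K-medoids on a precomputed dissimilarity matrix D (a sketch)."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(n, size=k, replace=False))   # INITIAL: random seed points

    def cost(meds):
        # total dissimilarity of each point to its nearest medoid
        return D[:, meds].min(axis=1).sum()

    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for mi in range(k):                                 # SWAP: try medoid/non-medoid swaps
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[mi] = h
                c = cost(trial)
                if c < best:                                # keep only loss-reducing swaps
                    best, medoids, improved = c, trial, True
        if not improved:                                    # greedy termination
            break
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels

# toy usage with Euclidean dissimilarities
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(4, 0.3, (30, 2))])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
medoids, labels = pam_sketch(D, k=2)
print(medoids)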
PAM has been extended to cluster large datasets using a sampling strategy. This variant of PAM is called CLARA (clustering large applications) (Kaufman and Rousseeuw 1990). CLARA involves two steps, where the first step involves finding a sample of the large dataset and implementing PAM on this sample or subset, and the second step entails adding the remaining points (outside of the sample) to the current set of medoids. After the clustering of the entire dataset is completed, the medoids can be recomputed, as required. CLARANS (clustering large applications based on randomized search) has been a further improvement over PAM and CLARA, using an abstraction of a hypergraph (Ng and Han 2002). A graph is constructed in such a way that each node is a set of K objects that can potentially be medoids, and two nodes are neighbors only when the cardinality of their set intersection is K − 1. Thus, a node is an abstraction of an instance of the partition or clustering itself. PAM is then implemented on this graph as a search for the node with the minimum cost, by going from an initial random node to a neighbor along the deepest descent in cost. CLARANS has been proven to run efficiently on large databases too. An attractive characteristic of PAM is its flexibility in the distance or similarity measures that can be used as per the semantics needed in the clustering. Medoids are specific cluster representatives and, unlike the computed cluster centers used in similar partitioning methods, are less sensitive to outliers (Van der Laan et al. 2003). The representative points used as medoids can also be used effectively in hierarchical clustering methods. This is owing to the completeness of the data in all dimensions for the medoids, just as for any other point in the data, whereas computed cluster centers usually have attributes only for the subset of dimensions used for the clustering process. While PAM does not find good clusters when the cluster sizes are unbalanced or skewed, this can be resolved by using the silhouette function as the cost function in the PAM algorithm (Van der Laan et al. 2003).
The implementation of PAM is still inefficient, with a time complexity of O(K³·n²) (Xu and Tian 2015). This is improved by computing the distance matrix only once and iteratively finding new medoids, and by running several methods to identify the initial medoids (Park and Jun 2009). This algorithm is referred to as a "K-means like" algorithm, which improves the efficiency of PAM. This faster algorithm has a time complexity of O(K(n − K)²), which is high compared to the time complexity of CLARA, which is O(K³ + nK), and to that of CLARANS, which is O(n), which also reflects its characterization as a randomized algorithm.
Applications This method is applicable to all datasets involving scattered points, which includes both geospatial locations and large collections or databases of definite objects, e.g., images, vector data (cartographic maps), and videos. One such example of clustering spatial points is that of ARGO buoys in the ocean done using an improved K-medoids method for anomaly detection in the ocean data (Jiang et al. 2019). This entails including a density criterion in identifying the initial seed points for running the K-medoids algorithm. For each cluster, the points that do not belong to the dense regions of the cluster are identified as outliers. In such an application, it has been found that the density-based approach in the K-medoids algorithm has improved the results by a higher accuracy score and a lower false detection rate. Similar applications have been found in the spatial clustering of traffic accidents. As an example of clustering of objects, different from geospatial locations, K-medoids have been used for clustering and classification of remotely sensed images of snowflakes acquired using a multi-angle snowflake camera (MASC) (Leinonen and Berne 2020). In this specific example, a generative adversarial network (GAN) is used for feature extraction from the images, and then, the feature vectors are clustered and classified using K-medoids and hierarchical clustering.
Future Scope

As observed, one of the key criticisms against the K-medoids algorithm is its high run-time complexity, which has been the scope of improvement in current and future work on this method. There are several improvements that have already been proposed. One such example is in the parallel implementation of the K-medoids algorithm, PAMAE (parallel K-medoids clustering with high accuracy and efficiency) (Song et al. 2017). PAMAE implements a two-phase method, namely, parallel seeding and parallel refinement, for improving accuracy and efficiency. These phases are implemented in the form of MapReduce, which enables parallelism. Parallel seeding involves identifying a sample of points in the large dataset and conducting a global search over this subset of the points. The parallel refinement is in MapReduce that involves performing a local search in the entire dataset. The difference between local and global searches is in finding a new medoid within a considered group vs outside of the group, respectively. This method is designed to be implemented on Spark and Hadoop, which are both horizontally scalable big data frameworks. Faster implementation of PAM has been achieved by relaxing the number and choice of swaps in the SWAP step, e.g., FastPAM (Schubert and Rousseeuw 2019). This strategy is flexible to be incorporated with variants of PAM implementations, such as the CLARA and CLARANS algorithms. FastPAM works by removing redundancy in computations for finding the best swap in PAM and swapping for multiple medoids simultaneously, owing to the independence of clusters. Another line of work on the K-medoids method is in incorporating it as a step in data science workflows, such as that for the classification of snowflake images (Leinonen and Berne 2020). Such integration is usually challenging when the method has to be coupled with another data mining method. In that regard, the flexibility built into K-medoids clustering, both in terms of the choice of dissimilarity measure and the initialization method for the seed points, is useful in coupling it with other methods, such as feature extraction, feature ranking, etc. In summary, K-medoids clustering is a partitioning algorithm that provides robust cluster representatives which have proven to be useful across several geoscientific applications. Thus, this method has been successfully used for more than three decades.

Cross-References
▶ K-Means Clustering
▶ Neural Networks

Bibliography
Jiang H, Wu Y, Lyu K, Wang H (2019) Ocean data anomaly detection algorithm based on improved K-medoids. In: 2019 Eleventh international conference on advanced computational intelligence (ICACI). IEEE, pp 196–201. https://doi.org/10.1109/ICACI.2019.8778515
Kaufman L, Rousseeuw PJ (1987) Clustering by means of medoids. In: Proceedings of the statistical data analysis based on the L1 norm conference, Neuchatel, Switzerland, pp 405–416
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
Leinonen J, Berne A (2020) Unsupervised classification of snowflake images using a generative adversarial network and K-medoids classification. Atmos Meas Tech 13(6):2949–2964. https://doi.org/10.5194/amt-13-2949-2020
Ng RT, Han J (2002) CLARANS: a method for clustering objects for spatial data mining. IEEE Trans Knowl Data Eng 14(5):1003–1016
Park HS, Jun CH (2009) A simple and fast algorithm for K-medoids clustering. Expert Syst Appl 36(2):3336–3341
Schubert E, Rousseeuw PJ (2019) Faster K-medoids clustering: improving the PAM, CLARA, and CLARANS algorithms. In: Similarity search and applications (SISAP). Springer, Cham, pp 171–187. https://doi.org/10.1007/978-3-030-32047-8_16
Song H, Lee JG, Han WS (2017) PAMAE: parallel K-medoids clustering with high accuracy and efficiency. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1087–1096. https://doi.org/10.1145/3097983.3098098
Van der Laan M, Pollard K, Bryan J (2003) A new partitioning around medoids algorithm. J Stat Comput Simul 73(8):575–584
Xu D, Tian Y (2015) A comprehensive survey of clustering algorithms. Ann Data Sci 2(2):165–193. https://doi.org/10.1007/s40745-015-0040-1
K-nearest Neighbors

Jaya Sreevalsan-Nair
Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India

Definition

The k-nearest neighbors (kNNs) generally refer to the kNN algorithm that was initially proposed as a nonparametric solution to the discrimination problem (Fix and Hodges 1951). The discrimination problem refers to that of determining if a random variable Z with an observed value z is distributed over a p-dimensional space according to parametric distributions F or G. There are three variants of the problem, where both F and G are (i) completely known; (ii) known, but parameters are unknown; and (iii) completely unknown. The seminal solution proposed for variant (iii), now known as the kNN algorithm, uses a nonparametric approach. It is given that there are m samples, {X_i | 0 ≤ i < m}, and n samples, {Y_i | 0 ≤ i < n}, of F and G, respectively. The solution says that if the k nearest neighbors of z include M Xs and N = (k − M) Ys, where k, M, N ∈ ℤ⁺, k < m, and k < n, then, using likelihood ratio discrimination, Z can be assigned to F iff M/m > c·(N/n), where c is an appropriate positive constant. This is the nonparametric counterpart of the parametric condition used to solve variants (i) and (ii), i.e., f(z)/g(z) > c, using the density functions f and g of F and G, respectively. This proposed approach is now known as majority voting. Over time, the kNN algorithm has been reframed as a classification method where, in the absence of the parametric class distributions of a set of labeled objects S, a new object X is assigned a class label, i.e., the label of the majority of the k objects from S closest to X. Thus, a majority voting approach is implemented on a predetermined set of labeled local neighbors of X, N(X), whose set cardinality is k, for determining the class label of X. The closeness between objects is defined using pairwise distance functions, e.g., the Euclidean distance. Given two objects X and Y, defined using attribute/feature vectors a ∈ ℝ^p, the Euclidean distance is given by

d(X, Y) = ‖a(X) − a(Y)‖₂ = √( Σ_{i=1}^p (a_i(X) − a_i(Y))² ).

Given N(X) ⊆ S, such that N(X) = {Y_1, ..., Y_k}, and the set of class labels C, majority voting gives the class label of X as

c(X) = argmax_{c ∈ C} Σ_{i=1}^k δ(c, c(Y_i)), where δ(c, c(Y_i)) = 1 if c = c(Y_i), and 0 otherwise.
Overview

The basic kNN algorithm using majority voting has been improved using several strategies (Jiang et al. 2007). The use of the Euclidean distance leads to inaccuracies in the presence of irrelevant features, which is referred to as the curse of dimensionality. Hence, strategies such as the use of optimal feature subsets based on feature ranking or filtering, attribute weighting using mutual information, frequency-based distance measures, etc. have been successfully used to overcome the curse of dimensionality. Adaptive or dynamic kNN involves the identification of an optimal k value for each object prior to running the kNN algorithm or other related methods. The kNN algorithm is now extended beyond classification owing to the versatility of applications of the k-nearest neighbors (kNNs) themselves. The set of kNNs of an object X, N(X), can now be defined as its local neighborhood, which can have different variants based on the choice of the distance function, including other distance measures, e.g., the Hamming and Chebyshev distances. The uses of kNNs now include feature extraction, segmentation, regression, and other data mining processes. Generalizing the use of kNNs across all these applications entails applying an appropriate (function) map on a specific property, q, of each of the local neighbors of X and a subsequent reduce operation, i.e., an accumulation function, to determine the value of q at X. Here q is generalized as a class label, the value of a feature in a hand-crafted feature vector, a global or local descriptor, etc. In the basic kNN algorithm, the map is an identity function on the class label, as q, and the majority function is used for accumulation. As an
example of generalized application, in LiDAR (light detection and ranging) point cloud processing, the position coordinates of 3D points serve as q, the covariance matrix between a 3D point and its neighbor is the map, and the matrix sum is a reduce operation, which gives the design matrix (Filin and Pfeifer 2005). This matrix serves as the local geometric descriptor of the 3D point, required for feature extraction for point cloud classification. The local neighborhood used for the LiDAR points can be of kNNs or of spherical, cuboidal, or cylindrical shape.
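A minimal Python sketch of the basic majority-voting kNN classifier defined above (Euclidean distance, identity map on the class label, majority accumulation) follows; the toy data and the choice of k are hypothetical.

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest neighbors (Euclidean distance)."""
    d = np.linalg.norm(X_train - x, axis=1)      # distances to all labeled objects
    nn = np.argsort(d)[:k]                       # indices of the k nearest neighbors
    votes = Counter(y_train[i] for i in nn)      # majority voting over their labels
    return votes.most_common(1)[0][0]

# toy usage: two labeled clusters in 2D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
print(knn_classify(X, y, np.array([2.8, 3.1]), k=5))   # expected: 1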
Applications There are several geoscientific applications for which both kNN algorithm and kNNs have been successfully used. There has been a long history of classification problems pertaining to digital maps and satellite remote sensing image data for both pixel-based and image classification, for which kNN algorithm and its variants are useful. These examples here demonstrate that the kNN algorithm works effectively on different types of remote sensed images for domain-specific classification problems. • As an example of pixel-based classification, delineation of forest and non-forest land use classes has been done using kNN algorithm to classify plot pixels on digital map data, routinely used in Forest Inventory and Analysis (FIA) monitoring (Tomppo and Katila 1991). Satellite imagery and field data from FIA have been used to improve the weighting distances used in kNN method. The field data also serves as ground truth. The classification is used for constructing maps for basal area, volume, and cover type of the forests. kNN has been widely used for forest map delineation as it is effective for mapping continuous variables, e.g., basal area, volume, and cover type. Another reason for quick adoption of the kNN algorithm by the FIA community has been its easy integration with existing systems and methods. • In another example of pixel-based classification involving hyperspectral images (HSI), kNN algorithm with guided filter has been effective in extraction of the information of the spatial context as well as denoising classification outcomes using edge-preserving filtering (Guo et al. 2018). HSI classification involves using spectral features of pixels to classify them into different classes, e.g., different types of soil types, plant species, etc. The joint representation of kNN and guided filter has been found to work the best with the full feature space, without performing any dimensionality reduction. • As an example of image classification, for that of infrared satellite images to six different cloud types, kNN classifier has been found to be more accurate than the neural network using self-organizing feature map (SOFM) classifier, when applied to different texture features (Christodoulou et al. 2003). kNN classifier with weighted averaging has
also performed better than the traditional majority voting for cloud satellite image classification.

Apart from classification, kNN regression has been widely used in geoscientific applications for prediction and estimation. As an example of regression, consider the estimation of ground vibrations in rock fragmentation. Blasting is routinely used for rock fragmentation, and estimating the blast-induced ground vibration is an important problem for studying the effect of blasting on the surrounding environment. A combination of kNNs and particle swarm optimization (PSO) has been effectively used for estimating the peak particle velocity (ppv), which is a measure of the ground vibration (Bui et al. 2019). Here, the kNNs are used as the specific observations that influence the output predictions in a regression model. The optimization of particle positions using the PSO algorithm enables improving the precision of the kNN regression. kNNs are also effective for value replacement in several geoscientific applications. An example is the use of nearest neighbors in improving the analysis of the geochemical composition of soil. For k = 1, the kNN approach has been used to replace screened values of several elements (e.g., Al, C, Ca, Na, P) at sample sites (Grunsky et al. 2018). Usually, there are observations for several elements where the values are less than the lower limit of detection.

Laplace Transform

For example, if F(t) = t for t ≥ 0, its Laplace transform is f(s) = 1/s², where s > 0, and thus the Laplace inverse of f(s) = 1/s² is F(t) = t, as shown in Fig. 1. If s ≤ 0, then the integral diverges. Thus, the Laplace transformation, denoted by the symbol ℒ, is a map applied to a function F(t) to generate a new function f(s), and the same is applicable to its inverse, ℒ⁻¹.
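The F(t) = t ↔ f(s) = 1/s² pair quoted above can be checked symbolically; the sketch below assumes the SymPy library (any computer algebra system with Laplace transform support would serve equally well).

import sympy as sp

t, s = sp.symbols('t s', positive=True)
F = t                                          # time-domain function F(t) = t
# Laplace transform: f(s) = integral from 0 to infinity of exp(-s*t) * F(t) dt
f = sp.laplace_transform(F, t, s, noconds=True)
print(f)                                       # 1/s**2
print(sp.inverse_laplace_transform(f, s, t))   # recovers t (times a Heaviside step)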
Theorem 1 If F(t) is piecewise continuous on [0, ∞) and of exponential order α, then the transform ℒ(F(t)) exists for Re(s) > α and the integral converges. For example, ℒ(e^{a·t}) = 1/(s − a), where Re(s) > α. Thus, this theorem defines the class of functions, L, with an existing Laplace transform (Schiff 1999). Some of the useful properties of the Laplace transform include:
• Linearity: If F₁ ∈ L for Re(s) > α and F₂ ∈ L for Re(s) > β, then F₁ + F₂ ∈ L for Re(s) > max(α, β), and ℒ(c₁F₁(t) + c₂F₂(t)) = c₁ℒ(F₁(t)) + c₂ℒ(F₂(t)), for constants c₁ and c₂.
• Infinite Series: The Laplace transform of an infinite series in t can be computed term by term if the series converges for t ≥ 0 with exponential order α > 0 and a constant K > 0, where the coefficients of the powers of t in the power series have absolute values bounded by |a_n| ≤ K·αⁿ/n!.
• Uniform Convergence: For functions in L, the integral in the Laplace transform converges uniformly.
• Translation Property: Also known as the shifting property; the first and second translations imply that shifting in the time domain (t) or the Laplace domain (s) provides closed-form solutions. For ℒ(F(t)) = f(s):
First translation: ℒ(e^{a·t} F(t)) = f(s − a).
Second translation: ℒ(G(t)) = e^{−a·s} f(s), where G(t) = F(t − a) if t > a, and G(t) = 0 if t < a.
• Differentiation and Integration: There exist analytical solutions for the derivatives of functions using their Laplace transforms and, similarly, for integrals using inverse Laplace transforms.
• Change of Scale Property: The analytical solution for scalar multiplication of the argument gives the following Laplace transform, i.e., if ℒ(F(t)) = f(s), then ℒ(F(a·t)) = (1/a)·f(s/a).

Laplace Transform, Fig. 1 The Laplace transform of F(t) = t and its inverse, as shown on an online tool for computing Fourier-Laplace transforms on the WIMS (WWW Interactive Multipurpose Server), authored by Gang Xiao (1999). (Image source: https://wims.univcotedazur.fr/wims)

For a more exhaustive list of properties, the textbooks on the Laplace transform (Spiegel 1965; Schiff 1999) are highly recommended references. The direct or inverse Laplace transforms of some special functions have closed-form solutions. Such special functions include the Gamma function (Γ), the Bessel function of order n (J_n), Heaviside's unit step function (U), and the Dirac delta function, or unit impulse function (δ). The Laplace transform can be seen as similar to the Fourier transform, in the context of transforming signals in the time domain into a different domain. It also overlaps with the Z-transform, where the one-sided Z-transform and the Laplace transform of an ideally sampled signal are equivalent when substituting z = e^{s·T}, where s is the Laplace variable and T = 1/f_s, with f_s the sampling frequency. Additionally, the Laplace transform can be extended and unified with the Z-transform, which has potential applications with higher-order linear dynamic equations (Bohner and Peterson 2002).
For a more exhaustive list of properties, the textbooks on Laplace theorem (Spiegel 1965; Schiff 1999) are highly recommended references. The direct or inverse Laplace transforms of some of the special functions have closed form solutions. Such special functions include Gamma function (Γ), Bessel function of order n (Jn), Heaviside’s unit step function (U), Dirac delta function, or unit impulse function (δ). Laplace transform can be seen similar to Fourier transform, in the context of transforming signals in the time domain into a different domain. It can also have overlap with Z-transform, where the one-sided Z-transform and the Laplace transform of an ideally sampled signal are equivalent, def when substituting z ¼ es:T , where s is the Laplace variable, 1 T ¼ fs , where fs is the sampling frequency. Additionally, the Laplace transform can be extended and unified with
Overview
C
the earlier solutions, which used fixed limits and precomputed tables, the new solution allowed for the application/problem to define the contour C in the integral equation.
In practice, there are several numerical methods used for Laplace inversion (Davies and Martin 1979). There are broadly five classes of methods, namely, one which (i) computes a sample, (ii) expands F(t) in exponential functions, (iii) uses Gaussian quadrature, (iv) uses bilinear transformation, and (v) uses Fourier series. In (i), there are methods such as Widder and Gaver-Stehfast. In (ii), methods using Legendre polynomials, trigonometric functions, and methods by Bellman et al. and Schapery are included. In (iii), Piessens’ Gaussian quadrature and Schmittroth’s methods are considered. In (iv), methods by LaguerreWeeks, Laguerre-Piessens-Branders, and Chebyshev are included. In (v), methods by Dubner-Abate, SilverbergDurbin, and Crump are included. In current practice, these methods are included in several mathematical packages, e.g., Mathematica.
717
denoise, i.e., eliminate high-frequency components in advection flow in atmospheric waves, such as Kelvin waves (Clancy and Lynch 2011).
Conclusions In summary, several physical phenomena require Laplace transforms and their inversions for their modeling, simulation, and analysis. Hence, it continues to be a powerful tool used in geoscientific applications, even in conjunction with modern deep learning solutions.
Cross-References ▶ Fast Fourier Transform
Applications References There are several applications where the Laplace operator is used in geoscientific modeling and analysis. As an example, in differential geometry, generalizations of Laplace operator are widely used, such as Laplace-Beltrami operator and Laplace-de Rham operator. Each generalization has applications of its own, e.g., Laplace-Beltrami operator is widely used for shape analysis in computational geometry, for any twice-differentiable real-valued function in Euclidean space. The Laplace-Beltrami operator can be applied as the divergence of covariant tensor derivative and hence can be used on second-order tensor fields, such as stress, strain, etc., commonly used in geological or geophysical applications (Ken-Ichi 1984). Yet another example is in the use of waveform inversion using Laplace-transformed wavefields to delineate structures of the Earth, in three-dimensional (3D) seismic inversion (Shin and Cha 2008). In practice, a class of applications where the Laplace operator is widely used is in hydrological flows (Chen et al. 2003). For convergent flows, say for that in sand and gravel aquifer, Laplace transformed power series is used for solving the radially scale-dependent advection-dispersion equation (Chen et al. 2003). Advection-dispersion equation is a special case of advection-dispersion-reaction (ADR) equation. ADR equation is used as a transport model, e.g., contaminant transport in groundwater, where the advection part of the PDE models the bulk movement of the solute (contaminant) in the flow, the dispersion part models the spreading of the solute, and the reaction part models changes in the solute mass owing to biotic and abiotic processes. In groundwater flow modeling, inverse Laplace transform has been routinely used for studying the late-time behavior of the hydraulic head in the flow (Mathias and Zimmerman 2003). In shallow water equations, Laplace inversion has been used in a contour integral to simulate low-frequency components and at the same time,
Bohner M, Peterson A (2002) Laplace transform and Z-transform: unification and extension. Meth Appl Anal 9(1):151–158
Chen JS, Liu CW, Hsu HT, Liao CM (2003) A Laplace transform power series solution for solute transport in a convergent flow field with scale-dependent dispersion. Water Resour Res 39(8)
Clancy C, Lynch P (2011) Laplace transform integration of the shallow-water equations. Part I: Eulerian formulation and Kelvin waves. Q J R Meteorol Soc 137(656):792–799
Davies B, Martin B (1979) Numerical inversion of the Laplace transform: a survey and comparison of methods. J Comput Phys 33(1):1–32
Deakin MA (1992) The ascendancy of the Laplace transform and how it came about. Arch Hist Exact Sci 44(3):265–286
Ken-Ichi K (1984) Distribution of directional data and fabric tensors. Int J Eng Sci 22(2):149–164
Mathias SA, Zimmerman RW (2003) Laplace transform inversion for late-time behavior of groundwater flow problems. Water Resour Res 39(10)
Schiff JL (1999) The Laplace transform: theory and applications. Springer Science & Business Media, New York
Shin C, Cha YH (2008) Waveform inversion in the Laplace domain. Geophys J Int 173(3):922–931
Spiegel MR (1965) Laplace transforms. McGraw-Hill, New York
Least Absolute Deviation A. Erhan Tercan Department of Mining Engineering, Hacettepe University, Ankara, Turkey
Synonyms
L1-norm; Least Absolute Error; Least Absolute Value; Minimum Sum of Absolute Errors
Definition
Least absolute deviation (LAD) is an optimization criterion based on minimization of the sum of absolute deviations between observed and predicted values. The following sections discuss the problem within the scope of estimation. Let x_ik be the ith observation on the kth independent variable for i = 1, ..., n and k = 1, ..., p, and let y_i be the ith value of the dependent variable. We want to find a function f such that f(x_i1, x_i2, ..., x_ip) = y_i. To attain this objective, suppose that the function f is of the linear form

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + \varepsilon_i \qquad (1)$$
where β_0, β_1, β_2, ..., β_p are parameters whose values are not known but which we would like to estimate, and ε_i is the random disturbance. We now seek estimated values of the unknown parameters, denoted by b_0, b_1, b_2, ..., b_p, that minimize the sum of the absolute values of the residuals

$$\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - \hat{y}_i| \qquad (2)$$
where ŷ_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + ... + b_p x_{ip} is the ith predicted value, n is the number of observations, and p is the number of independent variables. This type of minimization problem arises in many fields of science and engineering, including regression analysis, function approximation, signal processing, image restoration, parameter estimation, filter design, robot control, and speech enhancement. LAD estimation is commonly used in regression analysis as an alternative to least squares deviation (LSD), which minimizes the sum of squared errors. It is more attractive than LSD when the errors follow non-Gaussian distributions with longer tails.
Historical Material
Roger Joseph Boscovich introduced the LAD criterion in 1757 in an attempt to reconcile inconsistent measurements for the purpose of estimating the shape of the Earth. Surprisingly, it was nearly a half-century before Legendre published his "Principle of Least Squares" in 1805 (Farebrother 1999). Computational difficulties associated with LAD effectively prevented its use until the second half of the twentieth century. Charnes et al. (1955) indicated that solving the least absolute deviation problem is essentially equivalent to solving a linear programming problem via the simplex method. This was probably the birth of automated LAD fitting
(Bloomfield and Steiger 1983). Starting in 1987, a number of congresses were dedicated to problems related to least absolute deviation, and the last one was held in 2002 (Dodge 2002). LAD did not attract Earth scientists very much due to computational and theoretical difficulties. Dougherty and Smith (1966) proposed the use of a simple trend surface method in which the sum of absolute values of deviations is minimized. While looking for resistant and robust methods of kriging, Henley (1981) devoted his book to robust methods of spatial estimation and suggested a kriging estimator based on the minimization of absolute error. As a linear estimator, kriging minimizes the squared difference (error) between the true and estimated values and therefore is not resistant to outliers. Dowd (1984) discussed several resistant kriging methods, reviewed robust and resistant variogram estimations, and introduced alternative methods. It is well known that the traditional estimator of the variogram is neither robust nor resistant. Walker and Loftis (1997) developed a median-type estimator based on distance-weighted LAD regression, minimizing the sum of absolute errors to optimally fit a polynomial regression to the observations.
Essential Concepts and Applications in More Detail
When p = 0, Eq. 2 reduces to the minimization of $\sum_{i=1}^{n} |y_i - b_0|$, so that the optimal value of b_0 is the median of the values y_i, i = 1, ..., n. For the same estimation problem under LSD, the solution for b_0 is the arithmetic mean of the values of the dependent variable. It is easy to see that, for an even number n of values, the optimum value of b_0 for LAD lies anywhere between the two central values of y_i. Thus, the minimization of Eq. 2 with p = 0 often leads to an infinity of solutions. In contrast, LSD produces unique solutions. When p ≥ 1, there are no simple formulas for the minimization of Eq. 2 because it is non-differentiable. On the other hand, it is well known that it can be transformed into the following linear programming problem:

$$\text{Minimize } \sum_{i=1}^{n} (a_i + z_i)$$
$$\text{subject to } b_0 + b_1 x_{i1} + b_2 x_{i2} + \ldots + b_p x_{ip} + z_i - a_i = y_i, \quad a_i \ge 0, \; z_i \ge 0, \; i = 1, \ldots, n \qquad (3)$$
Least Absolute Deviation, Table 1 The assay data for 12 samples collected from the iron deposit

K2O (%):  4.50  2.11  1.44  3.30  2.99  2.32  1.55  3.21  1.10  2.40  2.65  2.41
CaO (%): 10.50  7.00  3.53  8.92  7.21  7.81 11.65  6.99  4.20  4.10  7.85  5.64

Least Absolute Deviation, Fig. 1 Scatter plot of K2O versus CaO for assay data from the iron deposit. The fitted LAD line (solid) is CaO = 1.85 K2O + 2.16, while it is CaO = 1.27 K2O + 3.94 for LSD (dashed)
where a_i and z_i are, respectively, the positive and negative deviations associated with the ith observation (Armstrong and Kung 1978). This problem can be solved by the simplex method. To illustrate the method, consider the data in Table 1, representing the variation of K2O and CaO for 12 samples from a volcanic syn-sedimentary iron deposit. Assume that K2O is the independent variable and CaO is the dependent variable. Figure 1 shows a scatter plot of K2O versus CaO together with the LAD and the LSD lines. In the plot, the solid line is the LAD fit to the data, while the dashed line is the LSD fit. The intercept and the slope for LAD are b_0 = 2.16 and b_1 = 1.85, respectively. The LAD line was computed using the algorithm given in Armstrong and Kung (1978). Note that the data contain an outlier at the point (1.55% K2O, 11.65% CaO). As long as a sample remains on the same side of the LAD line, the estimates b_0 and b_1 do not change. This is obviously not the case for LSD. The LAD criterion is thus less sensitive to outliers than LSD; the LAD estimator is called a robust estimator. Note also that the LAD line passes through two of the data points. This is always the case for the LAD algorithm applied to a nondegenerate data set with p = 1. If the data set is subject to degeneracy, then the LAD line passes through more than two data points. Birkes and Dodge (1993) suggest simple solutions for such cases.
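As an illustrative sketch (not part of the original entry), the linear program in Eq. (3) can be solved with a general-purpose LP routine; the following Python code, assuming scipy is available, applies scipy.optimize.linprog to the Table 1 assay data. The variable ordering (b_0, b_1, a_1, ..., a_n, z_1, ..., z_n) is a choice made here, not something prescribed by Armstrong and Kung (1978).

```python
# A minimal sketch of LAD fitting via the linear program in Eq. (3),
# using the 12 assay samples from Table 1; scipy is assumed available.
import numpy as np
from scipy.optimize import linprog

k2o = np.array([4.50, 2.11, 1.44, 3.30, 2.99, 2.32, 1.55, 3.21, 1.10, 2.40, 2.65, 2.41])
cao = np.array([10.50, 7.00, 3.53, 8.92, 7.21, 7.81, 11.65, 6.99, 4.20, 4.10, 7.85, 5.64])
n = len(k2o)

# Decision variables: [b0, b1, a_1..a_n, z_1..z_n]
c = np.concatenate([np.zeros(2), np.ones(2 * n)])   # minimize sum(a_i + z_i)
A_eq = np.zeros((n, 2 + 2 * n))
A_eq[:, 0] = 1.0                                    # intercept b0
A_eq[:, 1] = k2o                                    # slope term b1 * x_i
A_eq[:, 2:2 + n] = -np.eye(n)                       # -a_i
A_eq[:, 2 + n:] = np.eye(n)                         # +z_i
b_eq = cao
bounds = [(None, None), (None, None)] + [(0, None)] * (2 * n)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
b0, b1 = res.x[:2]
print(b0, b1)   # should be close to the reported LAD fit CaO = 1.85 K2O + 2.16
```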
Conclusions
Least absolute deviation is a robust criterion when the deviations follow distributions that are non-normal and subject to outliers. Since the objective function based on the LAD criterion is non-differentiable, there are no closed-form formulas for its solution; instead, iterative algorithms such as the simplex method are used. Perhaps because of these computational and theoretical difficulties, robust methods developed on the LAD criterion have not gained wide acceptance among geoscience practitioners.
Cross-References
▶ Iterative Weighted Least Squares
▶ Least Absolute Value
▶ Least Squares
▶ Ordinary Least Squares
▶ Regression
Bibliography
Armstrong RD, Kung MT (1978) Algorithm AS 132: least absolute estimates for a simple linear regression problem. J R Stat Soc Series C 27(3):363–366
Birkes D, Dodge Y (1993) Alternative methods of regression. Wiley, New York
Bloomfield P, Steiger WL (1983) Least absolute deviations: theory, applications and algorithms. Birkhäuser, Boston
Charnes A, Cooper WW, Ferguson RO (1955) Optimal estimation of executive compensation by linear programming. Manag Sci 1:138–151
Dodge Y (ed) (2002) Statistical data analysis based on the L1-norm and related methods. Birkhäuser, Basel
Dougherty EL, Smith ST (1966) The use of linear programming to filter digitized map data. Geophysics 31:253–259
Dowd PA (1984) The variogram and kriging: robust and resistant estimators. In: Verly G, David M, Journel AG, Marechal A (eds) Geostatistics for natural resources characterization. Springer, Dordrecht, pp 91–106
Farebrother RW (1999) Fitting linear relationships: a history of the calculus of observations 1750–1900. Springer, New York
Henley S (1981) Nonparametric Geostatistics. Elsevier, London
Walker DD, Loftis JC (1997) Alternative spatial estimators for groundwater and soil measurements. Ground Water 35(4):593–601
Least Absolute Value

U. Radojičić1 and Klaus Nordhausen2
1 Institute of Statistics and Mathematical Methods in Economics, TU Wien, Vienna, Austria
2 Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland

Definition
The least absolute value (LAV) method is a statistical optimization method which has, as a minimization criterion, the absolute errors between the data and the statistic of interest. In the one-sample location problem, the solution is the sample median, while in a regression context it is an alternative to least squares regression and is also known as least absolute deviations (LAD) regression, least absolute errors (LAE) regression, or L1-norm regression.

Introduction
Consider a random sample y = (y_1, ..., y_n)^⊤ of size n. When one is interested in estimating the statistic t(y), the least absolute value (LAV) method solves the given problem by minimizing the criterion

$$\sum_{i=1}^{n} |y_i - t(y)| \quad \text{or} \quad \sum_{i=1}^{n} |e_i|,$$

with e_i = y_i − t(y). Thus, it minimizes the sum of absolute deviations between the observed samples and the statistic. For example, in the one-sample location case the solution to the given problem is the median of the given sample.

LAV is, however, mostly considered in a regression framework. Let y denote the vector of responses and X = (x_1, ..., x_n)^⊤ be the matrix containing the p predictors for each response for the regression model

$$y = X\beta + \varepsilon, \qquad (1)$$

where β is the p-variate unknown vector of coefficients one needs to estimate and ε is an n-vector of random disturbances. For the estimator b of β, the residuals are defined as e = (e_1, ..., e_n)^⊤ = y − Xb. The least absolute value (LAV) estimator b of β thus minimizes

$$\sum_{i=1}^{n} |e_i|. \qquad (2)$$

An early formulation of LAV dates back to R. J. Boscovich in 1757, but the form mentioned above is due to F. Y. Edgeworth, who considered it in 1887 for the simple linear regression model. LAV regression was, however, often overshadowed by ordinary least squares regression (OLS), which is formulated as the minimization problem

$$\operatorname*{argmin}_{b} \sum_{i=1}^{n} e_i^2 \qquad (3)$$

and thus minimizes the squared deviations, which can be seen as L2-norm regression.

Motivation
There is a large variety of reasons why OLS regression is probably the most common regression method, one of which is the wide availability of efficient and easy-to-use computer routines, which have made the application of OLS rather effortless and inexpensive. Also, for independent and identically distributed (iid) disturbances with finite variances, the OLS has many optimality properties, while for iid Gaussian disturbances it corresponds to the maximum likelihood estimator of model (1). However, it is well known that the OLS suffers if the disturbances have heavy tails, and already a single outlier can ruin the estimates completely. The lack of robustness is due to the optimization criterion, which squares the deviations and therefore tries to avoid large residual values. This might even cause outliers in the data to
Least Absolute Value, Fig. 1 Mineral dataset with OLS and LAV model regression lines when modeling zinc content versus copper content. The suspicious observation is marked in red
be masked, making outlier identification based on residuals difficult. Figure 1 visualizes this problem. The data shown there come from 53 rock samples collected in Western Australia, give the copper (Cu) and zinc (Zn) contents (in p.p. million), and can be found in the R package RobStatTM (Yohai et al. 2020) as the dataset minerals. The figure shows the OLS fit and the LAV fit and reveals that the OLS fit is much more attracted by the observation marked in red, which can be considered an outlier, than the LAV fit. Figure 2 shows the corresponding residuals for both fits; the outlier is much less extreme in the OLS residuals than in the LAV residuals. To address the lack of robustness and efficiency of OLS in such situations, many robust regression methods have been suggested. See, for example, Maronna et al. (2018) for a general overview and Wilson (1978) for a detailed comparison of OLS and LAV. The general idea is that LAV might be preferable over OLS whenever the median would also be more appropriate than the mean for the data at hand.
Computation of LAV Estimate
One of the things that makes LAV regression appealing when compared to other robust alternatives to OLS is that the LAV criterion is convex. However, although the idea of LAV
regression is as intuitive as that of OLS, unlike the OLS cost function, the LAV cost function is not smooth; hence the computation of LAV is not as straightforward, and an explicit solution does not exist. Instead, an iterative approach, which is guaranteed by the convexity of the LAV cost function to converge to the (possibly not unique) global minimum, is needed to find an optimal LAV solution. The most common approach to obtain LAV estimates is based on the fact that the optimization problem can be reformulated as a linear program. Denote for that purpose ŷ_i = x_i^⊤ b as the fitted values; then the optimization problem with cost function as in Eq. (2) can be stated as the linear program

$$\min_{e_i,\, b} \; \sum_{i=1}^{n} e_i$$

under the constraints

$$e_i \ge y_i - \hat{y}_i, \qquad e_i \ge -(y_i - \hat{y}_i), \qquad i = 1, \ldots, n.$$

Thus, the constraints directly yield e_i ≥ |y_i − ŷ_i|, implying that the function is minimized for e_i = |y_i − ŷ_i|, i = 1, ..., n, and for a suitable choice of b. Hence, the LAV estimate can be obtained with one of the many techniques to solve a linear program, like, for example,
Least Absolute Value, Fig. 2 Residuals of OLS (left) and LAV (right) models of the mineral dataset when modeling zinc content versus copper content. The residual of the suspicious observation is again marked red
the well-known simplex method (Barrodale and Roberts 1974). Other approaches to obtain the LAV estimates can be based on gradient descent (Money et al. 1982) or an iterated least squares approach (Armstrong et al. 1979). A detailed list of various optimization methods can be found in Dielman (2005). A property recognized quite early for LAV regression is that there is a solution which passes through p observations. However, going through all possible hyperplanes defined by p observations to see which is the minimizer of Eq. (2) is only feasible for small sample sizes. Another property of the LAV estimator worth mentioning is that it can, like the OLS, be seen as a maximum likelihood estimator. This is the case when the disturbances in Eq. (1) follow a Laplace distribution. To conclude, while no closed-form solution for LAV regression exists, the problem is well formulated, with many algorithms available and implemented, for example, in R in the packages quantreg (Koenker 2021) and L1pack (Osorio and Wolodzko 2020). A detailed comparison of the performance of the various algorithms can be found in Dielman (2005) and the references therein.
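The entry points to R implementations; as a hedged Python analogue (not part of the original entry), median regression at quantile q = 0.5 in statsmodels gives the LAV fit. The simulated data and the injected outlier below are purely illustrative.

```python
# A minimal sketch comparing an LAV (median) fit and an OLS fit on data
# with one gross outlier; numpy and statsmodels are assumed available.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.laplace(scale=0.5, size=50)
y[0] += 25.0                              # one gross outlier in the response

X = sm.add_constant(x)                    # design matrix with intercept
lav_fit = sm.QuantReg(y, X).fit(q=0.5)    # LAV / LAD / median regression
ols_fit = sm.OLS(y, X).fit()

print(lav_fit.params)                     # close to (2.0, 0.5) despite the outlier
print(ols_fit.params)                     # noticeably pulled by the outlier
```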
Properties of LAV Estimator
In this section, we give some of the asymptotic properties of the LAV estimator b of β under certain "natural" assumptions. Let the vector of observations y follow model (1). Assume furthermore that the disturbances are independent and identically distributed (iid) with distribution function F and continuous and positive density f at the median. Furthermore, assume that $\frac{1}{n} X^\top X \to Q$ as $n \to \infty$, where Q is a positive definite, symmetric matrix. Then

$$\sqrt{n}\,(b - \beta) \to Z, \qquad Z \sim N\!\left(0,\, \omega^2 Q^{-1}\right),$$

where Z comes from a multivariate normal distribution and ω²/n is the asymptotic variance of the ordinary sample median of samples with distribution F, that is, ω² = (2 f(m))^{-2}, where m is the median of F. Moreover, under these general conditions, it has been proven that b is consistent for β. This asymptotic result clearly shows that, for disturbance distributions for which the median is a more efficient estimator of location than the mean, the LAV estimator b of β is more efficient than the corresponding OLS estimator (Bassett and Koenker 1978). Based on this key result it is straightforward to develop inferential tools for β. Let $b = (b_1^\top, b_2^\top)^\top$, where β_1 ∈ ℝ^{p_1} and β_2 ∈ ℝ^{p_2}, with p_1 + p_2 = p. Consider then tests for the nested hypothesis of the form H_0: β_2 = 0:

• Likelihood ratio test

$$LR = \frac{2\,(LAV_1 - LAV_0)}{\omega},$$
where LAV_1 and LAV_0 are the sums of absolute values of residuals obtained at the restricted and unrestricted models, respectively.

• Wald test

$$W = b_2^\top D\, b_2 / \omega^2,$$

where b_2 is the LAV estimate of β_2 and D is the appropriate diagonal block of X^⊤X.

• Score test

$$U = g_2^\top D\, g_2,$$

where g_2 is the appropriate part of the normalized gradient of the LAV cost function, evaluated at the restricted estimate, while D is as discussed in the Wald test.

All three presented test statistics have, asymptotically, a chi-squared distribution with p_2 degrees of freedom. Unlike the score test, both the likelihood ratio and the Wald tests, however, require estimation of the residual scale parameter ω. Details on the tests and the estimation of ω can be found in Dielman (2005). Unlike most other robust alternatives to OLS, the LAV estimator possesses a number of equivariance properties. It has been proven that the LAV estimator is affine equivariant, shift and scale equivariant, and also equivariant to design reparameterizations. For details, see Bassett and Koenker (1978). These properties are also shared by the OLS. LAV possesses another interesting property that is inherited from the univariate median. Namely, for regression in ℝ², the estimated LAV regression line does not change if we vary the responses, as long as they stay on the same side of the regression line. Although this is rather obvious in the case of the one-dimensional median, as nicely stated in Dasgupta and Mishra (2004), it gives us intuition into LAV's "median-type" robustness, which translates into a certain insensitivity to outliers. It is obvious that this property of LAV is not shared by OLS. However, it also makes clear that the LAV does not tolerate well outliers in the x values.
LAV Regression with Corrected Errors
Linear regression is often used to model a linear trend in data that are serially dependent. In such cases, the disturbances in the model are correlated and the key result about the limiting distribution from above cannot be directly applied. For simplicity, consider the case (x_t, y_t), t = 1, ..., n, following model (1), where the disturbances follow an auto-regressive AR(1) process:

$$\varepsilon_t = \rho\, \varepsilon_{t-1} + z_t,$$

where ρ, with |ρ| < 1, is the autocorrelation coefficient and the z_t are iid random errors. There are several two-stage procedures used to correct for the autocorrelation in the error terms; they are commonly used in OLS regression as well. The first stage is to linearly transform the data using the autocorrelation coefficient ρ, after which the regression is done on the transformed data. The goal of the first stage is to obtain transformed data for a linear model that has uncorrelated errors. One such two-stage procedure is Prais–Winsten (PW) (Prais and Winsten 1954). The PW transformation matrix for model (1) with disturbances from an auto-regressive AR(1) process with parameter ρ can be written as

$$M = \begin{pmatrix} \sqrt{1-\rho^2} & 0 & \cdots & 0 & 0 \\ -\rho & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}.$$

The first stage of the PW procedure is to pre-multiply model (1) by M, yielding

$$MY = MX\beta + M\varepsilon, \quad \text{i.e.,} \quad Y^{*} = X^{*}\beta + \varepsilon^{*}, \qquad (4)$$

where the vector of transformed responses Y* is

$$Y^{*} = MY = \left( \sqrt{1-\rho^2}\, y_1,\; y_2 - \rho y_1,\; \ldots,\; y_n - \rho y_{n-1} \right)^{\top},$$

and the matrix of transformed predictors (including a transformed intercept) is

$$X^{*} = MX = \begin{pmatrix} \sqrt{1-\rho^2} & \sqrt{1-\rho^2}\, x_{1,1} & \cdots & \sqrt{1-\rho^2}\, x_{1,p-1} \\ 1-\rho & x_{2,1} - \rho x_{1,1} & \cdots & x_{2,p-1} - \rho x_{1,p-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1-\rho & x_{n,1} - \rho x_{n-1,1} & \cdots & x_{n,p-1} - \rho x_{n-1,p-1} \end{pmatrix}.$$

In the transformed model (4), ε* is the vector of uncorrelated errors. These transformations are possible due to the equivariance properties of the LAV estimator. Other two-stage methods tackle the problem of correlated errors in a similar manner. A discussion of the performance of various methods can be found in Dielman (2005).
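A minimal sketch of the PW transformation is given below (an illustration added here, not part of the original entry). It assumes ρ is known; in practice ρ would be estimated, for example from the autocorrelation of preliminary residuals. The LAV fit on the transformed data uses statsmodels' median regression as a stand-in for the R packages cited above.

```python
# A minimal sketch of the Prais-Winsten transformation for AR(1) errors,
# followed by an LAV fit on the transformed data; rho is assumed known.
import numpy as np
import statsmodels.api as sm

def prais_winsten_matrix(n, rho):
    """Build the n x n PW transformation matrix M for AR(1) parameter rho."""
    M = np.eye(n)
    M[0, 0] = np.sqrt(1.0 - rho**2)
    for t in range(1, n):
        M[t, t - 1] = -rho
    return M

rng = np.random.default_rng(0)
n, rho = 60, 0.6
x = np.linspace(0, 10, n)
e = np.zeros(n)
for t in range(1, n):                      # simulate an AR(1) disturbance
    e[t] = rho * e[t - 1] + rng.laplace(scale=0.3)
y = 1.0 + 0.8 * x + e

M = prais_winsten_matrix(n, rho)
X = sm.add_constant(x)                     # includes the intercept column
y_star, X_star = M @ y, M @ X              # transformed responses and predictors

fit = sm.QuantReg(y_star, X_star).fit(q=0.5)   # LAV on the transformed model
print(fit.params)
```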
Other Extensions of LAV
In recent datasets the dimension p is increasing dramatically, often with p > n, and then sparse solutions b of the regression problem are required. Wang et al. (2007) suggested the LAV Lasso, which adds an L1 penalty on the β
coefficients. It is remarkable that the LAV Lasso can be reformulated as a regular LAV regression by augmenting the predictor matrix and can therefore be solved using the regular algorithms mentioned above. LAV regression is also sometimes called median regression, as when there is only an intercept, i.e., in the one-sample problem, the estimator of the responses is the median. The median gives, however, only one view of the conditional distribution y_i | x_i. Quantile regression addresses this issue and can model any quantile of y_i | x_i; median regression is therefore a special case of quantile regression. For a general overview of quantile regression, see, for example, Koenker (2005). The least absolute values methodology was originally developed in a univariate framework. However, seeing the minimization criterion as minimization of the L1-norm of the model residuals allows a natural generalization to multivariate problems. General overviews of multivariate L1-norm methods are given, for example, by Oja (2010) and Nordhausen and Oja (2011).
Summary
LAV is a general statistical approach that is, however, best known in the context of regression, where it is well developed and a serious competitor to ordinary least squares regression. Compared to OLS it is more robust and has better efficiency if the disturbances have a heavy-tailed distribution; it is therefore, for example, recommended for Cauchy-distributed disturbances and is optimal for Laplace-distributed disturbances.
Cross-References
▶ Iterative Weighted Least Squares
▶ Least Absolute Deviation
▶ Least Mean Squares
▶ Least Squares
▶ Ordinary Least Squares
▶ Regression
Bibliography
Armstrong RD, Frome EL, Kung DS (1979) A revised simplex algorithm for the absolute deviation curve fitting problem. Commun Stat Simul Comput 8(2):175–190. https://doi.org/10.1080/03610917908812113
Barrodale I, Roberts FDK (1974) Solution of an overdetermined system of equations in the l1 norm. Commun ACM 17(6):319–320. https://doi.org/10.1145/355616.361024
Bassett G, Koenker R (1978) Asymptotic theory of least absolute error regression. J Am Stat Assoc 73(363):618–622. https://doi.org/10.1080/01621459.1978.10480065
Dasgupta M, Mishra SK (2004) Least absolute deviation estimation of linear econometric models: a literature review. SSRN Electron J. https://doi.org/10.2139/ssrn.552502
Dielman TE (2005) Least absolute value regression: recent contributions. J Stat Comput Simul 75(4):263–286. https://doi.org/10.1080/0094965042000223680
Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge, UK. https://doi.org/10.1017/CBO9780511754098
Koenker R (2021) quantreg: quantile regression. https://CRAN.R-project.org/package=quantreg. R package version 5.85
Maronna RA, Martin RD, Yohai VJ, Salibian-Barrera M (2018) Robust statistics: theory and methods (with R), 2nd edn. Wiley, Hoboken
Money AH, Affleck-Graves JF, Hart ML, Barr GDI (1982) The linear regression model: Lp norm estimation and the choice of p. Commun Stat Simul Comput 11(1):89–109. https://doi.org/10.1080/03610918208812247
Nordhausen K, Oja H (2011) Multivariate L1 statistical methods: the package MNM. J Stat Softw 43:1–28. https://doi.org/10.18637/jss.v043.i05
Oja H (2010) Multivariate nonparametric methods with R: an approach based on spatial signs and ranks. Springer, New York
Osorio F, Wolodzko T (2020) L1pack: Routines for L1 estimation. http://l1pack.mat.utfsm.cl. R package version 0.38.196
Prais SJ, Winsten CB (1954) Trend estimators and serial correlation. Technical report, Cowles Commission discussion paper, Chicago
Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-lasso. J Bus Econ Stat 25:347–355. https://doi.org/10.1198/073500106000000251
Wilson HG (1978) Least squares versus minimum absolute deviations estimation in linear models. Decis Sci 9(2):322–335. https://doi.org/10.1111/j.1540-5915.1978.tb01388.x
Yohai V, Maronna R, Martin D, Brownson G, Konis K, Salibian-Barrera M (2020) RobStatTM: robust statistics: theory and methods. https://CRAN.R-project.org/package=RobStatTM. R package version 1.0.2
Least Mean Squares Mark A. Engle Department of Earth, Environmental and Resource Sciences, University of Texas at El Paso, El Paso, Texas, USA
Definition Least mean squares is a numerical algorithm that iteratively estimates ideal data-fitting parameters or weights by attempting to minimize the mean of the squared distance of the errors between each data point and the corresponding fitted estimate. The minimization of least mean squares has no analytical solution and often relies on stochastic gradient descent, optimizing the first-order derivative of the least mean squares function to incrementally adjust and update model parameters or weights towards a global minimum. Typically, least mean squares makes updates to the weights after a
Least Mean Squares
randomly chosen sample or a subsample, rather than the entire batch, making it a form of adaptive learning. The method is used across a number of Earth science applications including regression and geophysical data filtering and is the basis of early machine learning methods.
Introduction
Consider a dataset composed of a real, dependent variable (y) that is a function of n real, independent variables (x). One could assume that the relationship between y and x can be reasonably approximated by applying n weights (w) to the independent variables, such as in linear regression,

$$y = w_1 x_1 + w_2 x_2 + \ldots + w_n x_n,$$

or, in vector form, y_i = w^T x_i, where i = 1, ..., n. For a given input (y_i, x_i), the least mean squares algorithm quantifies the goodness of fit, error, or cost J(w) between y_i and the predicted estimates for a given w:

$$J(w) = \frac{1}{2} \sum_{i=1}^{m} \left( y_i - w^T x_i \right)^2.$$

The function J(w) calculates the sum of squared errors or residuals between y_i and the predicted estimates. The set of weights w that produces the smallest value of the function is considered optimal. There is no analytical solution for the minimization of J(w), so other numerical methods must be considered. While stochastic methods to find optimal solutions for w could be applied at this point, least mean squares often utilizes a gradient descent approach. If we make an initial guess (w_0) for the weights w, using either the entire dataset or a subset thereof (referred to as the training dataset), we can calculate the loss function J(w) but would not be sure how to adjust the values of w for our next guess. Instead, we can calculate the gradient (∇) of the loss function and determine the direction in which J(w) decreases (Fig. 1). With that information, we can update or generate a new guess for w (w_1), which makes a step in the direction of the steepest descent of J(w). With this new estimate for w, we can calculate a new ∇J(w), determine the direction of the steepest gradient, and take a step in this direction to generate yet another updated estimate of w (w_2).

Least Mean Squares, Fig. 1 Gradient descent method for approaching the least mean squares (smallest value of J(w)) by taking steps in the opposite direction of the gradient ∇J(w) to provide updated weights.

Thus, the general algorithm goes through the following steps:
1. Select an initial estimate for w.
2. Determine the value of the loss function J(w), given w and the complete set, a subset, or a single random sample drawn from y, x.
3. If J(w) is smaller than the convergence criterion, stop and accept the current values for w as estimates of the optimal solution. Else, move to step 4.
4. Calculate ∇J(w) using the current estimate of w (w_t), and create an updated guess (w_{t+1}), based upon a learning rate constant r:
$$w_{t+1} = w_t - r\, \nabla J(w_t).$$
5. Start back at step 2.
The gradient ∇J(w) has the form

$$\nabla J(w_t) = \left( \frac{\partial J}{\partial w_1},\; \frac{\partial J}{\partial w_2},\; \ldots,\; \frac{\partial J}{\partial w_n} \right).$$

The partial derivative of the least squares function is solvable, allowing for calculation of the elements of ∇J(w_t):

$$\frac{\partial J}{\partial w_j} = \frac{1}{2} \sum_{i=1}^{m} \frac{\partial}{\partial w_j} \left( y_i - w^T x_i \right)^2.$$

After application of the chain rule and some algebra, the equation simplifies to

$$\frac{\partial J}{\partial w_j} = -\sum_{i=1}^{m} \left( y_i - w^T x_i \right) x_{ij},$$
which represents the sum of the products of residual between yi and the predicted value based on wt and the complete set or a subset of x. Thus, while there is no analytical solution to minimize the loss least mean squares function directly, there is an analytical solution of its gradient. The learning rate r is generally chosen to be a relatively small constant. Generally, if r is chosen to be a large number, the algorithm will converge more quickly than if a small number is used because the size of the steps taken towards the steepest descent of ∇J(w) will be greater. Because the surface of least mean squares is a quadratic function, it is convex (no local minima) and has a global minimum value. However, if r is chosen to be very large, the iteration could repeatedly overshoot the global minimum leading to difficulty in convergence. There are several variations in gradient descent solutions for least mean squares minimization. In a batch method the update process is only completed after gradient ∇J(w) has been calculated for all of the samples in the complete dataset of y, x. At the other extreme, one can calculate ∇J(w) and update the weights for single, randomly selected samples of y, x. This latter method, known as the incremental or true stochastic gradient descent, provides more accurate updates than the batch method but is numerically expensive. Many authors consider the incremental stochastic gradient descent to be synonymous with the least mean squares method. Conceptually, because the gradient of the loss function is updated after
every single individual sample is drawn, the method does not necessarily follow the steepest descent and produces a more random path towards the global minimum (Fig. 2). This approach allows decisions about updating the weights to be made on the fly, which has implications for machine learning. A popular trade-off between the batch and incremental stochastic gradient descent methods is the mini-batch stochastic gradient descent, where a subset or fraction of y, x is randomly drawn and the weights w are updated only after calculating ∇J(w) for the entire subset. One can conceptualize the mini-batch stochastic gradient descent as a path towards the minimum that does not necessarily follow the steepest descent, but is not as circuitous as that of the incremental gradient descent. Of note, the example discussed above examines the linear regression between a single output or dependent variable (y) and one or more input or independent variables (x). However, the least mean squares algorithm is extremely flexible and can be applied to solving higher-order relationships and/or scenarios with multiple independent variables. As discussed below, geophysical applications of least mean squares have been successful in estimating ideal solutions for nonlinear adaptive signal processing filters. One serious consideration in the utilization of least mean squares methods is that estimates of optimal solutions can be strongly impacted by outliers and erroneous data because of the squaring of the error in the cost function. That is, highly anomalous or erroneous data can produce outsized loss values, as described in the entry on least squares. The effect of anomalous data can be further compounded when the method is solved using incremental stochastic gradient descent or mini-batch stochastic gradient descent with very small subsamples, as the impact of even a single significant outlier could cause ∇J(w) to be spurious and update the model weights to move away from a local minimum. For this reason, datasets being used with least mean squares methods should be thoroughly examined in advance for errors or exceptional univariate and multivariate outliers.
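The incremental (sample-by-sample) update described above can be written in a few lines; the following Python sketch (an assumed illustration, not part of the original entry) applies the update w ← w − r∇J(w) to one randomly drawn sample at a time for a synthetic linear problem.

```python
# A minimal sketch of incremental least mean squares (stochastic gradient
# descent over single samples); all data are synthetic.
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_weights = 200, 3
X = rng.normal(size=(n_samples, n_weights))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=n_samples)

w = np.zeros(n_weights)   # initial guess w0
r = 0.01                  # learning rate

for epoch in range(50):
    for i in rng.permutation(n_samples):   # draw samples in random order
        residual = y[i] - w @ X[i]
        grad = -residual * X[i]            # gradient of 0.5*(y_i - w^T x_i)^2
        w = w - r * grad                   # steepest-descent update

print(w)   # should approach true_w
```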
Least Mean Squares as a Geophysical Data Filter
Least Mean Squares, Fig. 2 Cartoon depicting steps taken that update weights ŵ0 and ŵ1 and corresponding reduction of J(w) during a stochastic descent to the global minimum value. Note that the path is not necessarily the steepest descent, opposite the local gradient, but is rather an adaptive approach based on incremental analysis of input data.
Least mean squares is a method for adaptive data filtering and has been heavily used in the application of electrical signal and geophysical data processing. In this case, the user is attempting to reproduce the effect of an unknown filter h which produces one or more output signals y based on n known input signals x. For example, one can consider a case where two input signals x1 and x2, such as sine functions, are fed into an unknown linear transfer function h that produces an output signal g that is further perturbed by random noise b
Least Mean Squares, Fig. 3 Schematic of an approach to utilize an adaptive least mean squares filter to emulate an unknown transfer function h. Taking a stochastic gradient descent of the least mean squares of the raw error has the effect of modifying the weights in the adaptive filter until it replicates the effect of h.
$$y_i = g_i + b,$$

where y represents the combination of the unknown filter output with random noise. In order to simulate or mimic the effect of h on x, we can create an auxiliary system which includes an adaptive filter ĥ that allows for the application of variable weights w to the input signals (Fig. 3) and produces the output ŷ:

$$\hat{y}_i = w^T x_i = \hat{h}(x_i).$$

We can make adjustments to w for sample i and quantify our accuracy in recreating the effect of h by comparing the least mean square error J(w) between the outputs of the two filters:

$$J(w) = \frac{1}{2} \sum_{i=1}^{m} \left( y_i - w^T x_i \right)^2,$$

where i = 1, ..., n. As described in the Introduction section, the method works to find optimum estimates of w from input data by determining the gradient ∇J(w) and updating the weights by moving in steps following the steepest descent of the subset of the training dataset. Assuming convergence toward the global minimum, the application of the final updated weights to the input signals should reasonably approximate the effect of the unknown transfer filter h. One significant advantage of the least mean squares filter is that it automatically adjusts the unknown filter weights incrementally as new data are added. This has allowed for their
application in the development of automated and adaptive filters such as noise cancellation and adaptive signal equalization (Widrow and Stearns 1985).
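As a hedged illustration of the system-identification setup in Fig. 3 (not part of the original entry), the following Python sketch uses the Widrow–Hoff update to adapt a short FIR filter until it approximates an assumed "unknown" transfer function; the filter length, learning rate, and noise level are arbitrary choices.

```python
# A minimal sketch of an adaptive LMS filter identifying an unknown
# 4-tap FIR system from noisy input/output data.
import numpy as np

rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)                          # input signal
h_unknown = np.array([0.6, -0.3, 0.2, 0.1])     # assumed "unknown" transfer function
y = np.convolve(x, h_unknown, mode='full')[:n] + 0.01 * rng.normal(size=n)

n_taps, mu = 4, 0.01                            # filter length and learning rate
w = np.zeros(n_taps)                            # adaptive filter weights

for t in range(n_taps, n):
    x_window = x[t - n_taps + 1:t + 1][::-1]    # most recent samples first
    e = y[t] - w @ x_window                     # raw error against desired output
    w = w + mu * e * x_window                   # Widrow-Hoff LMS update

print(w)   # should approach h_unknown
```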
ADALINE and Neural Networks
Widrow and Hoff (1960) presented the least mean squares algorithm as part of an adaptive linear (hence, "ADALINE") machine that automatically classifies input patterns, including those affected by random noise. ADALINE and the similar perceptron (Rosenblatt 1958, 1962) mark early significant advances in the development of neural networks. The basic ADALINE model consists of the following steps:
1. Activate input x_i, where i = 1, ..., n.
2. Determine ŷ_i from ŷ_i = b + w^T x_i, where b is a random bias term, using the activated input from step 1.
3. Apply the following activation function to determine the output based on ŷ_i:

$$f(\hat{y}_i) = \begin{cases} +1 & \text{if } \hat{y}_i \ge 0 \\ -1 & \text{if } \hat{y}_i < 0 \end{cases}$$

If p > 1 and the observations are in general position, meaning that any p observations give a unique determination of b, then the finite sample breakdown point of the LMS estimator is ([n/2] − p + 2)/n, where [n/2] denotes the largest integer less than or equal to n/2. When considering the limit n → ∞ (with p fixed), the asymptotic breakdown point of the LMS estimator is 50%.
Least Median of Squares, Fig. 2 Scaled residuals based on LS regression (left) and LMS regression (right) applied to mineral data. An observation with an absolute value of standardized residual larger than 2.5 is identified as an outlier and marked using filled black dots
Horizontal reference lines at −2.5 and 2.5 are also added in the figure. The plot indicates that all scaled residuals are very small, and although the LS method reveals two outliers, neither of them is classified as extremely atypical. The LMS estimator is not affected by observation 15, and the resulting fit is nearly equivalent to the LS fit obtained when observation 15 is rejected. The LMS estimates for intercept and slope are 11.79 and 0.06, respectively (as compared to the LS estimates 7.96 and 0.13). The scaled residuals in Fig. 2 indicate that the method identifies altogether eight outliers, and observation 15 clearly stands out in the residual plot. To have an idea about the model fit, Rousseeuw and Leroy (1987) define the coefficient of determination in the case of LMS as

$$R^2_{LMS} = 1 - \left( \frac{\operatorname{Median}_i |r_i|}{\operatorname{Mad}_i\, y_i} \right)^2,$$

where Mad y denotes the median absolute deviation. For our example data, R²_LMS = 0.59, which is much better than the regular coefficient of determination of the LS fit, R²_LS = 0.46. Finally, notice that in this simple example it was easy to identify an outlier just by plotting the data. However, outlier identification becomes more difficult when the number of predictors increases and cannot be performed just by plotting the data. In such a case one should proceed as guided in Rousseeuw and Leroy (1987), for example.
Summary and Conclusion
The paper introducing the LMS (Rousseeuw and Leroy 1987) is considered a seminal paper in the development of robust statistics and is, for example, appreciated in the Breakthroughs in Statistics series (Kotz and Johnson 1997). LMS demonstrates that high-breakdown regression can be performed with outliers present in the response and the predictors. However, due to its low efficiency and the development of better robust regression methods such as MM-regression (Yohai 1987), LMS is nowadays mainly used for exploratory data analysis and as an initial estimate for more sophisticated methods.
Bibliography
Donoho DL, Huber PJ (1983) The notion of breakdown point. In: Bickel PJ, Doksum KA, Hodges JL (eds) A Festschrift for Erich L. Lehmann. Wadsworth, Belmont, pp 157–184
Hettmansperger TP, Sheather SJ (1992) A cautionary note on the method of least median squares. Am Stat 46(2):79–83. https://doi.org/10.2307/2684169
Huber PJ (1973) Robust regression: asymptotics, conjectures and Monte Carlo. Ann Stat 1(5):799–821. https://doi.org/10.1214/aos/1176342503
Joss J, Marazzi A (1990) Probabilistic algorithms for least median of squares regression. Comput Stat Data Anal 9(1):123–133. https://doi.org/10.1016/0167-9473(90)90075-S
Kotz S, Johnson NL (1997) Breakthroughs in statistics, vol 3. Springer, New York
Maronna RA, Martin DR, Yohai VJ, Salibian-Barrera M (2018) Robust statistics: theory and methods (with R), 2nd edn. Wiley, Hoboken
R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79(388):871–880. https://doi.org/10.1080/01621459.1984.10477105
Rousseeuw PJ, Leroy A (1987) Robust regression and outlier detection. Wiley, Hoboken, pp 1–335
Smith RE, Campbell NA, Litchfield R (1984) Multivariate statistical techniques applied to pisolitic laterite geochemistry at Golden Grove, Western Australia. J Geochem Explor 22(1):193–216. https://doi.org/10.1016/0375-6742(84)90012-8
Souvaine DL, Steele MJ (1987) Time- and space-efficient algorithms for least median of squares regression. J Am Stat Assoc 82(399):794–801. https://doi.org/10.1080/01621459.1987.10478500
Steele JM, Steiger WL (1986) Algorithms and complexity for least median of squares regression. Discret Appl Math 14(1):93–100. https://doi.org/10.1016/0166-218X(86)90009-0
Stigler SM (1981) Gauss and the invention of least squares. Ann Stat 9(3):465–474. https://doi.org/10.1214/aos/1176345451
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York
Yohai VJ (1987) High breakdown-point and high efficiency robust estimates for regression. Ann Stat 15(2):642–656. https://doi.org/10.1214/aos/1176350366
Least Squares Mark A. Engle Department of Geological Sciences, University of Texas at El Paso, El Paso, TX, USA
Definition Cross-References ▶ Iterative Weighted Least Squares ▶ Ordinary Least Squares ▶ Regression
Least squares is a numerical method to estimate ideal solutions for model-fit parameters in overdetermined systems by minimizing the sum of squared residuals between one or more variables for each data point and the corresponding model estimate(s). The approach only
considers uncertainty or error in the variables being used in the determination of residuals (e.g., response variables), but not in the other variables (e.g., predictor variables). The method is commonly used to approximate best-fitting solutions to both linear and nonlinear regression systems, but is also applied to more general applications including machine learning.
Introduction
Consider a set of n data points composed of one or more predictor variables (x_i) and one or more response variables (y_i), where i = 1, ..., n. Myriad methods exist to determine optimal parameters for numerical models estimating values of the response variable(s) (ŷ_i) from the predictor value(s) for overfitted systems. One such method, least squares, is an efficient approach that utilizes the set of residuals or errors (e_i), defined as the difference between the measured and predicted response values, e_i = y_i − ŷ_i, but can more generally refer to any variables or set of variables that the model is attempting to fit. Regardless of the exact mathematical form of the prediction model, the least squares method evaluates candidates for the m model parameters (β_k), where k = 1, ..., m, by calculating the sum of squared residuals or errors (SS_e) for the data set:

$$SS_e = \sum_{i=1}^{n} e_i^2.$$

The set of m values for β_k that produces the lowest SS_e is considered the optimal solution. A generalized solution to the minimization of SS_e for any numerical system is to set the gradient with respect to β_k equal to zero:

$$\frac{\partial SS_e}{\partial \beta_k} = 0.$$

Linear Least Squares Regression
One of the oldest and most well utilized applications of least squares methods is linear least squares regression, which was developed by Legendre (1805) and Gauss (1809). Linear least squares is a commonly applied method to generate linear estimates of y_i in overdetermined mathematical systems, using m predictor variables. The method is pervasive throughout the earth sciences and used in a broad range of topics, from evaluating potential controls on geologic processes to calibrating instrumentation and numerical models. Values of y_i are modeled from linear combinations of the m predictor variables (x_ik) and model parameters β_k, where k = 1, ..., m, with a generalized form of

$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_m x_{im} + \varepsilon,$$

where ε represents model error or uncertainty. In the case of simple linear least squares (also called ordinary least squares), based on a single x_i, modeled values of y_i are calculated from estimates of the model parameters b_0 and b_1:

$$\hat{y}_i = b_0 + b_1 x_i,$$

and the residuals are calculated as described in the "Introduction" section. As an example, let us say we want to provide rapid screening results in the field by developing a model to predict the density of water samples (typically a laboratory measurement) from their specific conductance measurements (a field measurement) for brackish groundwater samples from the Dockum aquifer in Texas, USA (data subset from Reyes et al. 2018). There exists an analytical solution for b_0 and b_1 in simple linear least squares regression that minimizes SS_e:

$$b_1 = \frac{SS_{xy}}{SS_x}$$

and

$$b_0 = \bar{y} - b_1 \bar{x},$$

where

$$SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

and

$$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

SS_x is the sum of squared distances between the predictor values and their mean, which, if divided by the degrees of freedom of x_i, is equal to its variance. Similarly, SS_xy is the sum of squares for the x_i and y_i cross products. The sum of squares of y_i (SS_y) can be divided into the sum of squares accounted for by the model (SS_p) versus that not explained by the model, which is equal to SS_e, the sum of squares of the residuals:

$$SS_y = SS_p + SS_e,$$

and can be further explored using analysis of variance (ANOVA) methods.
Least Squares, Fig. 1 Simple linear least squares regression model to predict density from specific conductivity data for groundwater samples from the Dockum Aquifer, Texas, USA. (Data subset from Reyes et al. 2018)
It is then relatively intuitive to develop an estimator of fit quality (the coefficient of determination, or R²), calculated as the portion of SS_y explained by the model:

$$R^2 = \frac{SS_p}{SS_y}.$$

Relationships between x_i, y_i, ŷ_i, and e_i, including the least squares regression line for the above example, are illustrated in Fig. 1. In this case, SS_y = 2.05 × 10⁻⁴, SS_p = 1.42 × 10⁻⁴, and SS_e = 6.32 × 10⁻⁵, producing an R² value of 0.69, meaning that nearly 70% of the sum of squares in the dependent variable is accounted for by the simple linear least squares regression model. Many texts suggest that in order for a linear regression model to satisfy its mathematical assumptions, the residuals must be independent, constant across the range of predictor variables (homoscedastic), and normally distributed. In reality, however, the requirement to meet assumptions varies depending on the purpose of the linear least squares regression model. For instance, Helsel et al. (2020) argue that if the intended use of the model is simply to predict y_i from x_i, then the only requirements are that the relationship between the predictor variable(s) and the response variable is linear and that the data used in the analysis are representative. Assumptions that the residuals be normally distributed are reserved for instances of hypothesis testing or calculating confidence intervals. Depending on the final purpose of the model predicting density from specific conductance, additional investigation of the nature of the residuals may or may not be required. Least squares solutions are used to estimate model parameters for many types of linear regression models, including
linear regression, linear multivariate regression (many predictor variables and one response variable), and linear multivariable regression (many predictor and response variables). One documented problem with least squares models utilizing multiple predictor and/or response variables is that if sets of variables are strongly correlated (i.e., multicollinearity), the model results can be misleading. Methods such as stepwise, lasso, and ridge regression have been developed to determine the variables most important in model prediction, and subsequently remove the remaining variables, as one way to overcome multicollinearity problems. This approach has the advantage that a more parsimonious model is generated, requiring fewer model inputs than one utilizing all of the available variables. An alternative approach, utilized by partial least squares and principal component regression methods, is to extract latent components from the predictor variables. The latent components, which should have minimal correlation to one another, are used as predictor variables in the regression model instead of the original predictor variables. These methods make more efficient use of the data, while reducing multicollinearity and the number of parameters in the final model.
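As an illustration of the analytical solution above (added here, not part of the original entry, and using made-up values rather than the Reyes et al. 2018 data), the slope and intercept of a simple linear least squares fit can be computed directly from SS_x and SS_xy:

```python
# A minimal sketch of the analytical simple linear least squares solution
# b1 = SS_xy / SS_x and b0 = ybar - b1*xbar, on hypothetical values.
import numpy as np

sp_cond = np.array([2.1, 4.8, 7.5, 10.2, 13.6, 18.9])                 # hypothetical conductance
density = np.array([1.0006, 1.0021, 1.0042, 1.0060, 1.0081, 1.0112])  # hypothetical density

x_bar, y_bar = sp_cond.mean(), density.mean()
ss_x = np.sum((sp_cond - x_bar) ** 2)
ss_xy = np.sum((sp_cond - x_bar) * (density - y_bar))

b1 = ss_xy / ss_x                  # slope
b0 = y_bar - b1 * x_bar            # intercept
y_hat = b0 + b1 * sp_cond

ss_y = np.sum((density - y_bar) ** 2)
ss_e = np.sum((density - y_hat) ** 2)
r2 = 1.0 - ss_e / ss_y             # coefficient of determination
print(b0, b1, r2)
```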
Nonlinear Least Squares Regression
Linear regression methods are most useful when the relationship between the predictor variable(s) and the response variable(s): (1) follows a hyperplane and (2) can be reasonably recreated from linear combinations of the predictor variables. In some cases, the relationships between the original variables are not linear but can be approximated as such by transforming one or more variables and processing the transformed data with linear methods. However, in instances where variable transformation fails to produce linear relationships between x_i and y_i, or the residuals vary in a nonrandom way around x_i, nonlinear least squares regression methods can be attempted by utilizing a nonlinear prediction function f(x, β). In nonlinear least squares regression, there is no closed-form analytical solution, and optimal values of the regression parameters are estimated iteratively, by adjusting the m model parameters at iteration t by a known quantity (Δβ) for the next iteration (t + 1):

$$\beta_k^{t+1} = \beta_k^{t} + \Delta\beta.$$

If the model contains m predictor variables, then the goal of the iterative approach is to find model parameters β_k that produce a value of zero for the following equation:

$$\frac{\partial S}{\partial \beta_k} = -2 \sum_{i=1}^{m} r_i\, \frac{\partial f(x_i, \beta)}{\partial \beta_k} = 0.$$

The equation is numerically solved through the application of iterative convergence methods such as the Gauss-Newton and Gauss-Seidel algorithms. This generalized solution approach allows least squares model fitting to be applied to a range of other applications and models. However, unlike linear systems, solutions to nonlinear least squares models are not unique, as multiple local minima can exist. Thus, there is no guarantee that solutions are optimal.
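A hedged sketch of iterative nonlinear least squares is given below (not part of the original entry); it uses scipy's curve_fit, which applies a Levenberg–Marquardt-type algorithm rather than the plain Gauss–Newton or Gauss–Seidel schemes mentioned here, to fit an assumed exponential decay model.

```python
# A minimal sketch of iterative nonlinear least squares fitting of an
# exponential decay model; scipy is assumed available.
import numpy as np
from scipy.optimize import curve_fit

def model(x, beta0, beta1):
    """Nonlinear prediction function f(x, beta) = beta0 * exp(-beta1 * x)."""
    return beta0 * np.exp(-beta1 * x)

rng = np.random.default_rng(3)
x = np.linspace(0, 5, 40)
y = model(x, 2.5, 0.8) + rng.normal(scale=0.05, size=x.size)

# curve_fit iteratively adjusts the parameters from the initial guess,
# minimizing the sum of squared residuals.
beta_opt, beta_cov = curve_fit(model, x, y, p0=[1.0, 1.0])
print(beta_opt)   # should be close to (2.5, 0.8)
```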
Least squares methods require only the ability to measure residuals between the response variable(s) and their modeled values and one or more model parameters to adjust, and they make efficient use of the data. This makes least squares attractive for a wide range of geologic problems. Some of the earliest applications of least squares methods were fitting models to estimate the size and shape of the earth and moon and of the orbits of the planets (Nievergelt 2000), for instance. Least squares models have also made inroads as an efficient solution for machine learning techniques, particularly variations of support vector machines. Support vector machines are non-probabilistic supervised classification models that define classifier functions (typically pairs of hyperplanes) that best separate each defined group of data, or projected data, in the dataset. For a given training data set {x_k, y_k}, for k = 1, ..., l, where x_k are the real data and y_k is the group classifier (y_k ∈ {−1, +1} in the case of 2 groups), if there exists no hyperplane or set of hyperplanes that can perfectly separate the data, a slack variable (ζ_k) is added to the relationship such that

$$y_k \left[ w^T \varphi(x_k) + b \right] \ge 1 - \zeta_k,$$

where w^T is a transformed classification vector, φ(x_k) is a nonlinear transformation of x_k into a higher-level space created using the so-called kernel trick, and b is a constant. A common method to find estimates of the optimized model parameters (w^T and b), which represent the hyperplane exhibiting the largest region between the groups of samples (i.e., margin), is to construct Lagrange multipliers, producing a quadratic optimization problem. Suykens and Vandewalle (1999) reformulated the problem into a least squares model producing, after utilization of Lagrange multipliers, a linear system to be solved. Least squares methods have made inroads into other machine learning methods as well; Mao et al. (2017) applied a least squares approach to the discriminator function in generative adversarial networks to overcome the vanishing gradient problem using descent methods during learning.

Robust Least Squares Methods
One criticism of least squares methods is that calculating the square of the residuals, as opposed to other distance measurements, biases the results towards data points that are far away from the predicted values. Comparison of sum of squares versus absolute residual distances (as is used in L1 linear least squares regression, which estimates the median value of y_i for a given x_i) shows that the former heavily weights points when residuals are >1, while it gives relatively less weight when residuals are <1.
Least Squares, Fig. 2 Comparison of sum of squares, absolute, and Huber’s M-estimator distance (using a tuning parameter of 1.345) based on residuals
Some robust least squares methods adapt to outliers in both x_i and y_i and have breakdown points as high as 50%. Numerically, the objective function of least trimmed squares regression is nearly identical to that of linear least squares regression, except that: (1) the calculation is performed on a subset of the data (typically 50–75%) that exhibits the lowest minimum covariance determinant and (2) there exists no analytical solution. Comparisons between least trimmed squares and least mean squares regression show that, in instances with well-behaved data, the methods will produce nearly identical results. However, in cases where even a small proportion of outliers is present, as is common in many geological datasets, the outliers are much more likely to have a significantly different impact on the optimized model parameters of the two methods, largely due to the difference in breakdown points. With improvements in computing power and optimization procedures, and the similarity between models from traditional and robust methods, there is some logic in utilizing robust least squares methods as the default approach. Use of robust and resistant least squares methods arguably opens scientists and engineers to less potential criticism relative to the alternative approach of removing seemingly problematic points from the dataset. Work to improve robust least squares methods is continuing.
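As a hedged illustration of a robust alternative (not part of the original entry), scipy's least_squares routine accepts a Huber-type loss, which downweights large residuals in the spirit of the M-estimator shown in Fig. 2; the data, outlier, and tuning value below are arbitrary.

```python
# A minimal sketch contrasting ordinary and Huber-loss least squares line
# fits on data containing one gross outlier; scipy is assumed available.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(11)
x = np.linspace(0, 10, 30)
y = 3.0 + 0.7 * x + rng.normal(scale=0.2, size=x.size)
y[5] += 15.0                                   # a single gross outlier

def residuals(beta, x, y):
    return y - (beta[0] + beta[1] * x)

fit_ls = least_squares(residuals, x0=[0.0, 0.0], args=(x, y))            # squared loss
fit_huber = least_squares(residuals, x0=[0.0, 0.0], args=(x, y),
                          loss='huber', f_scale=1.0)                     # Huber-type loss
print(fit_ls.x, fit_huber.x)   # the Huber fit should be far less affected by y[5]
```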
Summary
Least squares methods have been applied to a wide range of applications in the earth and planetary sciences, dating back more than two centuries. The well-studied behavior of least squares models, their efficient use of the data, and their minimal data requirements make them an attractive choice for model fitting in overdetermined systems. Though mostly known for their use in linear regression models, least squares methods have also been successfully applied in nonlinear model fitting and are finding revitalization in machine learning algorithms. Users of least squares models should be aware of the heavy weight the method places on data points with large residuals and that for some applications robust alternatives are available to prevent deleterious modeling results from erroneous or problematic data.
Cross-References
▶ Iterative Weighted Least Squares
▶ Least Median of Squares
▶ Ordinary Least Squares
▶ Regression
▶ Support Vector Machines
Bibliography
Gauss CF (1809) Theoria motus corporum coelestium. Perthes, Hamburg
Helsel DR, Hirsch RM, Ryberg KR, Archfield SA, Gilroy EJ (2020) Statistical methods in water resources, Chapter 4-A3. In: Techniques and Methods. U.S. Geological Survey, Reston, 458 pp
Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35:73–101
Legendre A-M (1805) Nouvelles méthodes pour la détermination des orbites des comètes. F. Didot, Paris
Mao X, Li Q, Xie H, Lau RY, Wang Z, Smolley SP (2017) Least squares generative adversarial networks. In: Proceedings of IEEE international conference on computer vision, pp 2794–2802
Nievergelt Y (2000) A tutorial history of least squares with applications to astronomy and geodesy. J Comput Appl Math 121:37–72
Reyes FR, Engle MA, Jin L, Jacobs MA, Konter JG (2018) Hydrogeochemical controls on brackish groundwater and its suitability for use in hydraulic fracturing: the Dockum Aquifer, Midland Basin, Texas. Environ Geosci 25:37–63
Rousseeuw PJ, Leroy AM (1987) Robust regression and outlier detection. Wiley, 329 pp
Suykens J, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Processing Lett 9:293–300
LiDAR Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Definition Light Detection and Ranging (LiDAR) is a remote sensing technique which uses electromagnetic radiation, i.e., light, to measure the distance of objects of interest from the sensor. The time taken for the reflected radiation to return to the sensor gives the distance from the sensor. The reflected radiation, in comparison to the emitted radiation, loses energy during its onward and return travel. The loss in energy may be attributed to the optical properties of the material of the objects it is incident upon, as well as, the medium through which it has traveled. After normalizing for the latter, the former is useful information for semantically analyzing the acquired data, and reconstructing the region that has been scanned. LiDAR is an active remote sensing system, as it generates energy, which is emitted in the form of light from a laser sensor at a high rate. The LiDAR instrument has a transmitter to emit light pulses, and receiver to record the reflected pulses. The distance captured from the time taken by the reflected rays is corrected using the geolocation information of the sensor. This distance is the elevation in the case of airborne
L
736
sensors or aerial laser scanners, but is the distance to the sensor in the case of terrestrial or mobile sensors, mounted on static or dynamic platforms. The GPS (global positioning system) and the IMU (internal measurement unit) in the LiDAR instrument are used for obtaining accurate positional coordinates and orientation of the instrument to find absolute position coordinates of the objects of interest. Depending on the application and type of LiDAR system used, the objects of interest include objects in the scanned scene or topography, including buildings, vehicles, vegetation, and people. The advantages of LiDAR include high-density, high-accuracy discretized point observations, useful for generating digital elevation models (DEM), and diverse applications. LiDAR technology is versatile to be used for land management, hazard assessment (e.g., flooding), forest science, ocean sciences, etc. The measurements acquired from the LiDAR system are stored and managed as a point cloud, which is a set of position coordinates (x, y, z) from where the light pulses have been reflected. The LiDAR sensor captures the reflected light energy in two different ways, namely, the discrete returns and the full waveform systems. The discrete LiDAR records returns, which are points in the peaks of the waveform. Such a system can record one to four returns. However, a full waveform LiDAR system records more information through a full distribution of reflected energy. Both systems record discrete points, which are stored in .las file format. GeoTIFF formats are also used for storing LiDAR data. This data generally stores position coordinates of the three-dimensional (3D) points in the point cloud, intensity (or remission), and optionally, semantic class labels. Semantic class labels refer to the class of objects the point belong to, e.g., buildings and road in urban data.
Overview LiDAR technology is broadly categorized as airborne (aerial laser scanning (ALS)) and terrestrial (terrestrial laser scanning (TLS)) sensors, based on its data acquisition mode. Airborne LiDAR sensors are usually used for scanning larger areas, and are carried by low flying aircrafts or unmanned aerial vehicles. Terrestrial LiDAR is collected from land, from a fixed location or when mounted on land vehicles. Terrestrial LiDAR can scan 360 views. Airborne LiDAR is apt for scanning larger spatial scales, such as regions at city-level or for coastal projects, giving a bird’s eye view, in comparison to that of terrestrial LiDAR. Terrestrial laser scanning covers regions where airborne sensors cannot access, such as moving traffic, building interiors, etc. Combining airborne and terrestrial LiDAR remote sensing can provide complete details of regions of interest. Airborne LiDAR is subclassified as topographic and bathymetric, based on its applications. Topographic LiDAR
LiDAR
uses near-infrared light pulses to sense the land, and bathymetric LiDAR uses green light to penetrate through volume of water to sense the sea-floor or riverbed. The former has been originally used for studying topography or structures under dense tree cover or canopy, and in mountainous regions. The topographic LiDAR has also been used for computing accurate shorelines and flood maps. The bathymetric LiDAR has been widely used for studying underwater terrains, and varied hydrographic applications. The topographic and bathymetric maps essentially are elevation maps. Terrestrial LiDAR, used for scanning built-up area, are of two types – static and mobile LiDAR. Terrestrial LiDAR measures distance maps, which measures distances from the sensor in its 360 view. TLS scanners enable scanning above ground using fast and dense sampling. They capture data with low oblique angle of signals, and the light signals which are reflected by obstacles also include shadows in 3D point clouds. To mitigate such effects, TLS scanning is done from multiple viewpoints (Baltensweiler et al. 2017). The irregular point distribution and shadowing by obstacles introduce challenges in separation of ground and non-ground points in TLS data, more than so in ALS data.
Applications The LiDAR point clouds are used for generating digital elevation models (DEM), and TLS data is used for digital soil mapping, among other applications (Baltensweiler et al. 2017). The term “DEM” loosely include both digital surface model (DSM) and digital terrain model (DTM), of which the latter is specific to terrains (Chen et al. 2017). The spatial resolution of the DEM generated from ALS is within 0.1–1.5 m. The root mean square errors (RMSE) of the DEMs are dependent on the site/region types, and are of the order of 0.007–0.525 m and 0.286–0.306 m for TLS and ALS, respectively, considering at to uneven slope terrains. Coastal mapping is done extensively using LiDAR technology. This includes the use of terrestrial LiDAR to provide dry-beach DEM, and use of airborne LiDAR bathmetry for shoreline mapping by gathering data for regional coastal geomorphology and identifying shallow-water depth measurement (Irish and Lillycrop 1999). ALS data has been used for qualitative and quantitative channel dynamics in fluvial geomorphology, applicable to river catchment. The applications of ALS data also include ground filtering, building detection and reconstruction, tree classification, road detection, and semantic segmentation and classification (Lohani and Ghosh 2017). The use of Unmanned Aerial Vehicles (UAV), such as drones, has increased in various applications, including archeological remote sensing. Similarly, the modern usage of mobile LiDAR in sensing has been used widely in autonomous driving to support data analysis for blind-spots, lane
LiDAR
drifting, cruise control, etc. The interest in the area of vehicle LiDAR also include the development of varied computing platforms on the edge, and in the fog and the cloud. Overall the applications of LiDAR measurements are relevant based on the spatial resolution and accuracy values.
Point Cloud Analysis: With a Focus on ALS Airborne LiDAR or ALS point clouds include complete scans of top-view of sites, as shown in Fig. 1, which exploit geometric information for automated analysis. This geometric information is extracted using spatially local neighbors, as the LiDAR data preserves spatial locality. Such information is extracted using local geometric descriptors. These descriptors are defined as data that encodes the local geometric information, and are usually in the form of matrices or positive semidefinite second-order tensor fields (Sreevalsan-Nair and Kumari 2017). It is important for the matrices or second-order tensors to be positive semi-definite, as positive eigenvalues are required for further analysis. These local neighborhood are of varied shapes, depending on the application. Spherical, cylindrical, k-nearest, cubical neighborhoods are usually used. Neighborhood search is a computationally intensive process, which can be made efficient using appropriate data structures. Octrees and kd-trees are used conventionally to partition the 3D bounding box containing the point cloud. The use of efficient hierarchical data structures reduces overall time in descriptor computation and subsequent feature extraction. Semantic or object-based classification is a key data analytic process implemented on both ALS and TLS point
LiDAR, Fig. 1 (a) Point rendering of Aerial Laser Scanned (ALS) point clouds of topographic data from Vaihingen site, Germany, using data provided by ISPRS as benchmark data. The red points are linear features, delineating structures, e.g., building roofs, and localizing objects, e.g., foliage, and gray are remaining points. (b) Corresponding orthoimage.
737
clouds. Supervised learning is largely used for ALS point clouds, where the point-wise feature vectors are hand-crafted. These features include two- and three-dimensional features, height-, and eigenvalue-based features. The feature extraction involves computation of features from local geometric descriptors using multiple spatial scales to gather more relevant information, of which an optimal scale is determined (Weinmann et al. 2015). The spatial scale considered here is the size of the local neighborhood where the optimal scale is where the entropy in the geometric classification of the point cloud is a global minimum. Depending on the relevance of the features, there are up-to 21 features, computed at the optimal scale included in the feature vector (Weinmann et al. 2015). Random forest tree classifier is best suited for the semantic classification of the LiDAR point cloud. Being a supervised learning, sufficiently large training data is required for training the model to improve classification accuracy. Detection and localization/instantiation of specific classes, such as, buildings, road/ground, etc., are well-researched topics (Sampath and Shan 2009; Chen et al. 2017). The points can be geometrically classified as linear, planar, and volumetric features. They are alternatively called linetype, surface-type, and point/junction-type features. Combining semantic and geometric classification as a tuple of labels could be referred to as augmented semantic classification (Kumari and Sreevalsan-Nair 2015). The tuple of labels help in better graphical rendering of point clouds, where edges are delineated clearly. Larger sections of the point clouds tend to be surface-type features, as they include points on the exposed faces, interior to the object/instance boundaries. The normal to the surfaces can provide salient information. The normal information computed for multiple scales can be used to compute difference of normal (DoN) (Ioannou
(Image source: (a) Author’s own; (b) Dataset provider. Dataset Source: German Association of Photogrammetry, Remote Sensing and GeoInformation (DGPF) (Cramer 2010); https://www2.isprs.org/com missions/comm2/wg4/benchmark/3d-semantic-labeling/)
L
738
et al. 2012). DoN and other surface features have been effectively used for segmentation and classification. Geometric reconstruction is another significant data analytic application implemented on point clouds. Building roof extraction is an application where points classified as building points, and localized as instances are used to reconstruct regions for graphical rendering, 3D printing, computer aided design (CAD), etc. Roof reconstruction involves finding contour lines or boundaries, and planar facets. The former can be done by using break-line extraction (Sampath and Shan 2009), Marching Triangles method on a triangulated irregular network (TIN) of points, and tensor-line extraction (Sreevalsan-Nair et al. 2018). The points can be clustered effectively to improve segmentation (Sampath and Shan 2009). Once the boundaries are extracted, one can fill in the planar facets using plane equations and geometric approximation methods of plane fitting.
LiDAR
statistical analysis, and multi-scale aggregation methods. Use of the vehicle/mobile LiDAR for autonomous driving application is the latest entrant to be added to potential application of LiDAR point clouds. Geometric modeling is not directly extensible from ALS to mobile LiDAR owing to the incompleteness in scans of objects, especially of moving ones, in 360 views. Apart from bringing in variety in processing steps, efficiency is also a key requirement for edge, fog, and cloud computing that is posed to be used in autonomous driving, which is a potential use-case of interconnected network of sensors and things, referring to the internet of things (IoT). In summary, LiDAR imagery in the form of 3D point clouds is an effective remote sensing method which is useful for varied applications in geoscience and GIS (geographical information systems), owing to its accuracy and sampling.
Bibliography Future Scope Efficient and accurate DEM generation continues to be a problem of interest, specifically to improve the LiDAR point analysis. DTM generation can be improved by combining several related DTM generation methods (Chen et al. 2017), which is applicable to the larger set of DEMs. This involves combining different multi-scale aggregation algorithms, clustering, segmentation, classification, and geometric extraction methods. There are two approaches in combining methods. In the first approach, processes or components in the generation algorithms may be combined. In the second approach, the different outputs (DEMs) may be merged to give an improved DEM. Another way of improving DEMs is the multimodal method, which involves inclusion of different data sources, in addition to LiDAR data. Thus, this method addresses the deficiencies in information capture and processing in LiDAR. Spectral features from aerial images, high resolution satellite imagery, high resolution photogrammatic point clouds are some of the data sources that have been successfully combined with LiDAR to extend its applications beyond DEM or DTM generation (Chen et al. 2017). Apart from the widely used discrete return LiDAR, the full-waveform LiDAR data has demonstrated strong potential for generating DTMs. Some of the challenges faced in mobile LiDAR in data processing in the presence of high data uncertainty are handled by the use of machine learning and neural network models. However, this work requires in-depth research, as in ALS point clouds, to improve on the different classes of methodology for point cloud processing. For DTM generation using ALS, there are six different classes of methods, namely, surface-based corrections, morphological filters, TIN refinement, segmentation in conjunction with classification,
Baltensweiler A, Walthert L, Ginzler C, Sutter F, Purves RS, Hanewinkel M (2017) Terrestrial laser scanning improves digital elevation models and topsoil pH modelling in regions with complex topography and dense vegetation. Environ Model Softw 95:13–21 Chen Z, Gao B, Devereux B (2017) State-of-the-art: DTM generation using airborne LiDAR data. Sensors 17(1):150 Cramer M (2010) The DGPF-Test on Digital Airborne Camera Evaluation – Overview and Test Design. Photogrammetrie – Fernerkundung – Geoinformation (PFG) 2:73–82. https://www.dgpf.de/pfg/2010/ pfg2010_2_Cramer.pdf Ioannou Y, Taati B, Harrap R, Greenspan M (2012) Difference of normals as a multi-scale operator in unorganized point clouds. In: 2012 second international conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), IEEE, pp 501–508 Irish JL, Lillycrop WJ (1999) Scanning laser mapping of the coastal zone: the SHOALS system. ISPRS J Photogramm Remote Sens 54(2–3):123–129 Kumari B, Sreevalsan-Nair J (2015) An interactive visual analytic tool for semantic classification of 3D urban LiDAR point cloud. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems, ACM, p 73 Lohani B, Ghosh S (2017) Airborne LiDAR technology: a review of data collection and processing systems. Proc Natl Acad Sci India Sect A Phys Sci 87(4):567–579 Sampath A, Shan J (2009) Segmentation and reconstruction of polyhedral building roofs from aerial LiDAR point clouds. IEEE Trans Geosci Remote Sens 48(3):1554–1567 Sreevalsan-Nair J, Kumari B (2017) Local geometric descriptors for multi-scale probabilistic point classification of airborne LiDAR point clouds. In: Modeling, analysis, and visualization of anisotropy. Springer, New York, pp 175–200 Sreevalsan-Nair J, Jindal A, Kumari B (2018) Contour extraction in buildings in airborne LiDAR point clouds using multiscale local geometric descriptors and visual analytics. IEEE J Sel Top Appl Earth Obs Remote Sens 11(7):2320–2335 Weinmann M, Jutzi B, Hinz S, Mallet C (2015) Semantic point cloud interpretation based on optimal neighborhoods, relevant features and efficient classifiers. ISPRS J Photogramm Remote Sens 105:286–304
Linear Unmixing
739
Theory
Linear Unmixing Katherine L. Silversides Australian Centre for Field Robotics, Rio Tinto Centre for Mining Automation, The University of Sydney, Sydney, NSW, Australia
Definition
Linear mixing can be considered as a matrix problem. The first matrix, X, contains D characteristics observed in N samples. Each sample is assumed to each be a mixture of P endmembers, where the characteristics of each composite sample are a weighted average of the endmember characteristics. The second matrix, T, is a PxD matrix that describes the characteristics of each endmember. These can be used to solve X ¼ Y ∙ T,
Linear unmixing is a method of calculating the composition of a mixed sample (Weltje 1997). A geological example is when a rock sample is known to be a physical mix of certain minerals. These minerals are the endmembers. If the chemical composition of the endmembers and the sample is known, linear unmixing can be used to estimate the amount of each mineral in the sample. If the endmember minerals or their compositions are unknown, instead bilinear unmixing can be used to estimate both the endmember and sample compositions (Full 2018; Weltje 1997).
where Y ¼ (yij) is a matrix containing the weights of the jth endmember in the ith sample. Solving for Y therefore provides the composition of each sample. In linear unmixing, the endmember characteristics, and therefore T, are known. In the bilinear problem, the endmember characteristics are unknown, and more complex solutions are required to solve for Y and T simultaneously (Tolosana-Delgado et al. 2011; Full 2018).
Introduction
Mathematical Example
A linear mixture occurs when endmembers are physically mixed or a sample has a composition that can be mathematically considered to be a combination of distinct endmembers with fixed compositions (Weltje 1997). An example of a linear mixture is when red and blue grains are mixed and the result is a mix of individual red and blue grains, not purple grains. When the composition of a sample and the endmembers is known, linear unmixing can be used to calculate the percentage of endmembers present in the sample. If the composition of the endmembers is unknown, the problem becomes bilinear (or explicit) unmixing. In this case, the composition of the endmembers is also estimated at the same time (Tolosana-Delgado et al. 2011; Weltje 1997). Geological specimens or rocks can be considered to be comprised of linear mixes of endmembers, where the endmembers are a set of minerals with known chemical compositions (Weltje 1997). All of the samples are then comprised of a mixture of these endmembers. If the number of endmembers and their compositions is known, then the bulk composition of each rock can be calculated or estimated using linear unmixing (Weltje 1997). Linear mixtures are used in situations such as identifying minerals from bulk geochemistry (Renner et al. 2014; Silversides and Melkumyan 2017; Tolosana-Delgado et al. 2011; Weltje 1997) and hyperspectral data analysis (Bioucas-Dias et al. 2012; Chase and Holyer 1990; Dobigeon et al. 2009; Liu et al. 2018; van der Meer 1999).
A system containing three endmembers, which are measured using four characteristics, can be described using a 3 4 matrix. For example,
T¼
L
4
1 2
4 1
1 6 4 2
1
4 0
If there are two composite samples, they can be represented by a 4 2 matrix, for example, Y¼
1
1
4 4
2
5
3 2
These can then be used to find the weights using the equation above. Using these examples, the weights are X¼
22 28 33 27
24 40
This means that the first sample contains the endmembers in the ratio 22:28:24 and the second sample contains the endmembers in the ratio 33:27:40.
740
Applications Linear unmixing can be applied to any sequence of data that contains mixtures of materials where each mixture produces observations that are a linear combination of the endmembers. Common applications include hyperspectral analysis and determining mineralogy from bulk chemistry. These are briefly discussed below. Hyperspectral cameras or sensors can be used in geoscience applications to determine the proportion of materials such as minerals. These sensors measure electromagnetic energy across numerous wavelengths. In the geosciences, bands covering the visible, near-infrared, and shortwave infrared spectral bands are frequently used. The signal or spectrum recorded by a hyperspectral sensor is a result of the light reflected by the objects located in its field of view. Spectroscopic analysis can then be used to determine the composition of these objects. In mineral scanning, it is common for each pixel in a sensor to receive a mixed spectrum due to the mineral size being finer than the sensor resolution. If certain conditions are met, the mineral mixing is considered linear, and linear unmixing can be used to measure the abundance of each mineral present. These conditions can be summed up by each photon only interacting with a single mineral, so that the signal received is a sum of photons from individual minerals. These minerals are the endmembers, and their known spectral signatures can be used as the characteristics for linear unmixing of the hyperspectral data to find the fractional abundances of the minerals (Bioucas-Dias et al. 2012; Dobigeon et al. 2009; van der Meer 1999). Linear unmixing can also be applied to hyperspectral data in many other diverse applications, for example, mapping coral bathymetry (Liu et al. 2018) or identifying sea ice type and concentration (Chase and Holyer 1990). The chemical composition of a sample alone cannot provide all of the information that is desired for mining and processing. For example, factors such as hardness, crystallinity, and element substitution can affect the processing and upgradability of ore. While there are methods that can directly measure the minerals, such as X-ray diffraction (XRD), they are expensive and time-consuming. Therefore, they are typically not routinely performed during a mining operation. However, when the minerals present in a deposit are known, along with their typical chemical composition, linear unmixing can be used to calculate the mineral composition of a particular sample (Silversides and Melkumyan 2017; Tolosana-Delgado et al. 2011). In this situation, the characteristics are the measured chemical species, the endmembers are the pure minerals, and the samples are the geochemical assays of the mixed composition rocks. Examples of using linear unmixing to determine mineralogy include mineral
Linear Unmixing
composition in production holes in banded iron formation (BIF) hosted iron ore (Silversides and Melkumyan 2017), determining sediment mixtures in moraines in retreating glaciers (Tolosana-Delgado et al. 2011) and determining the mineral composition of deep-sea manganese nodules (Renner et al. 2014). Note that this application uses compositional data. Further information on compositional data and methods of processing is presented in other chapters.
Summary Linear unmixing involves calculating the composition of a mixed sample using the characteristics of the sample and the characteristics of the endmembers. If the characteristics of the endmembers are unknown, then bilinear unmixing is used to find these characteristics simultaneously with the proportions of the endmembers present in the mixture. Linear unmixing has a wide range of geoscientific applications including mineral identification from hyperspectral sensing and determining mineralogy from bulk chemistry.
Cross-References ▶ Compositional Data ▶ Hyperspectral Remote Sensing
Bibliography Bioucas-Dias JM, Plaza A, Dobigeon N, Parente M, Du Q, Gader P, Chanussot J (2012) Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches. IEEE J Sel Top Appl Earth Obs Remote Sens 5(2):354–379 Chase JR, Holyer RJ (1990) Estimation of sea ice type and concentration by linear unmixing of Geosat altimeter waveforms. J Geophys Res 95(C10):18015–18025 Dobigeon N, Moussaoui S, Coulon M, Tourneret J, Hero AO (2009) Joint Bayesian endmember extraction and linear unmixing for hyperspectral imagery. IEEE Trans Signal Process 57(11): 4355–4368 Full WE (2018) Linear Unmixing in the geologic sciences: more than a half-century of progress. In: Daya SB, Cheng Q, Agterberg F (eds) Handbook of mathematical geosciences. Springer, Cham Liu Y, Deng R, Li J, Qin Y, Xiong L, Chen Q, Liu X (2018) Multispectral bathymetry via linear unmixing of the benthic reflectance. IEEE J Sel Top Appl Earth Obs Remote Sens 11(11):4349–4363 Renner RM, Nath BN, Glasby GP (2014) An appraisal of an iterative construction of the endmembers controlling the composition of deepsea manganese nodules from the Central Indian Ocean Basin. J Earth Syst Sci 123:1399–1411 Silversides KL, Melkumyan A (2017) Mineralogy identification through linear unmixing of blast hole geochemistry. Appl Earth Sci 126(4): 188–194
Linearity
741
Tolosana-Delgado R, von Eynatten H, Karius V (2011) Constructing modal mineralogy from geochemical composition: a geometricBayesian approach. Comput Geosci 37:677–691 van der Meer F (1999) Image classification through spectral unmixing. In: Stein A, van der Meer F, Gorte B (eds) Spatial statistics for remote sensing. Kluwer, Dordrecht, pp 185–193 Weltje GJ (1997) End-member modeling of compositional data: numerical-statistical algorithms for solving the explicit mixing problem. Math Geol 29(4):503–549
Linearity
slope of the line 95 and the correlation coefficient between Fahrenheit and Celsius equal to one.
Linear Regression An approach to find the best linear relationship between a response (dependent variable) and a set of independent/ explanatory variables is known as linear regression. Consider the following linear regression model: Y ¼ Xb þ «
Christien Thiart Department of Statistical Sciences, University of Cape Town, Cape Town, South Africa
Definition Linearity possesses the properties of additivity and homogeneity and furthermore, from a statistical point of view, it means linear in the parameters/coefficients or weights. For clarity consider the following simple relationship between Y and X Y ¼ β0 þ β1 f (X), where β0 and β1 are known as the parameters/coefficients and f(X) is any function of X that does not depend on any of the parameters. If f(X) ¼ X then this relationship would be a straight line crossing the Y axis at β0 and the slope of the line is determined by β1, e.g., one-unit change in X will cause a change of β1 in Y.
Introduction Linearity is everywhere: conversion from one scale to another; an assumption of linear models, linear regression, logistic regression, and even in interpolation methods for spatial data. To expand our simple definition, consider the following scenarios:
Conversion Formula Linearity can be as simple as a formula for conversion from one scale to another, e.g., to convert temperature from degrees Celsius (C) to degrees Fahrenheit (F): 9 F ¼ C þ 32 5 This can easily be visualized by an XY-plot, with the Celsius temperature on the X axis and the Fahrenheit temperature on the Y axis. A straight line will show the relationship between the different measurements of temperature. It would be an exact line, with the intercept of the Y axis (F) at 32, the
where Y is a n 1 response vector, β is an r 1 vector of unknown coefficients, X is a n r matrix of nonrandom explanatory variables whose rank is r (n > r), and ε is n 1 vector of random error, with ε (t, Sε), E(ε) ¼ t and Var(ε) ¼ Sε. In the case of uncorrelated errors the expectation and variance of ε are E(ε) ¼ t ¼ 0 and Var(ε) ¼ s2I, where I is an identity matrix. Ordinary least square (OLS) produces best linear unbiased estimates (BLUE). Sometimes we also make the convenient assumption of normality, ε N(t, s2I), allowing us to investigate various hypotheses for the β’s and goodness-of-fit tests. Furthermore, under normality the conditional expectation of Y, given that X ¼ x, is linear in x: EðYjX¼ xÞ ¼ mY þ
rsY ðx mX Þ sX
where the parameters mY and mX are the population means of Y and X, r is the correlation coefficient between X and Y, and sY and sX the standard deviations. The conditional expectation is also called the regression curve of Y on x. A special case is the simple linear model where, for example, we only have one X variable: Y ¼ b0 þ b1 X This equation is linear both in the parameters (the β’s) and in the variable X. If we consider a polynomial function of the X variables, say of order s Y ¼ b0 þ b1 X þ b2 X2 þ . . . bs Xs then this equation is linear in the parameters (the β’s) but not linear with respect to the variable X. Even if at first glance an equation/line does not look linear/straight, we can transform it to obtain linearity. Consider Y ¼ eb0 þb1 X then by taking the natural log on both sides we obtain
L
742
Linearity
log Y ¼ b0 þ b1 X which is now linear. When presented with a nonlinear function, it is useful to transform the variables in order to obtain linearity. Transformations to achieve linearity are outside the scope of this article, and help is easily available in any multivariate regression book (e.g., Mosteller and Tukey 1977).
Visualizing Linearity In the case of bivariate variables it is easy to visualize the association between them with a scatterplot (XY-plot). One variable (usually the independent variable) defines the X axis and the other (dependent) variable defines the Y axis. The points on this graph represent the relationship between the two points. If the points lie on a straight line (as in our temperature example), we have perfect correlation between X and Y. In this case the correlation coefficient r XY ¼ 1, indicating a perfect positive relationship. The correlation coefficient lies between 1 and þ1. If the correlation coefficient is near 0, no linear relationship between the two variables exists. Values of the correlation coefficient near 1 are an indication of strong linear relationships. But be aware the correlation coefficient is not a “test” or an indication of linearity. It is good practice to always inspect the residuals after fitting a model. The residuals are defined as the difference between the observed value and the predicted value (Y) « ¼ Y Y ¼ Y Xb In the residual (some practitioners prefer the studentized residuals) by predicted scatterplot (Y), we see that the residuals are randomly scattered around zero, with no obvious nonrandom pattern, the so-called “null plot.” Nonlinearity, outliers, and influential observations will be easily identified for further investigation. Outliers are observations that are inconsistent with the rest of the observations, and influential points are single points far removed from the others. Both outliers and influential observation can affect the regression results (the regression coefficients [β0 s] and the correlation coefficient).
Linear Dependence Linearity can also be linked to the collinearity problem. Collinearity happens easily in observational studies, and the
X matrix representing the independent variables in the linear model can be rank-deficient. The rank of a matrix is defined as the number of linearly independent columns/rows in the matrix. No linear dependencies among the columns indicate that the matrix is of full rank or nonsingular. If the matrix is singular (some linear dependencies) we have collinearity. Collinearity leads to instability of the regression coefficients, sometimes resulting in nonsense coefficients and coefficients with very large standard deviations. The whole process is unstable and adding small random errors in Y causes large shifts in the coefficients. The regression results are unstable (Rawlings et al. 1998).
Linearity in Interpolation Methods for Spatial Data Suppose that observations are made on an attribute Z at n spatial locations s ¼ (s1, . . ., sn) in a designated geographic region D. Let Z(sg) denote the observed response at a generic location sg, where sg D Rd, and let Z(s) denote the n 1 vector of responses at the n locations. The prediction Z(s0) at location s0 in D is usually a (linear) weighted sum of the observations observed around it: N ðs0 Þ
Z ðs0 Þ ¼
li Zðsi Þ ¼ lT ZðsÞ, i¼1
where li is the weight associated with the ith observation Z(si) N ðs0 Þ with i¼1 li ¼ 1 , l ¼ l1 , . . . , lN ðs0 Þ , and N(s0) is the number of points in the search neighborhood around the prediction location s0. For inverse distance interpolation the weights will be a function of the distance between points and for kriging the weights are obtained by minimizing the mean square prediction error.
Example To illustrate the concept of linearity and the importance of visualization, we consider three small datasets. Dataset 1 is generated using a simple linear regression model: Y ¼ 5 þ 5X þ ε, where ε N(0, 1.44) is a random error term, X is a sequence of numbers evenly spaced from 0 to 10, and n ¼ 31. Dataset 2 is generated by adding a quadratic term: Y ¼ 5 þ 5X þ X2 þ ε and dataset 3 is generated using an exponential model: Y ¼ 5 exp 12 X þ e . Dataset 1: We fit a simple linear regression model to give Y ¼ 5.18 þ 4.99X, R2 ¼ 0.99 (blue line, Fig. 1, top left). For the fitted values versus residuals plot we observed a roughly
Linearity
743
Linearity, Fig. 1 Analysis of simulated datasets. Row one is dataset 1, generated from a simple linear regression model; row two is dataset 2, generated from a quadratic model; and row three is dataset 3, generated from an exponential model. First column: scatter plot of observed points; blue line is the fitted simple linear regression model, red line is a fitted
model improving on the simple regresion model (if applicable). The middle column presents fitted values versus residuals for the regression models and the last column the fitted values versus residuals for the improved model (if applicable)
centered random pattern around 0 with one possible outlier (Fig. 1, top middle). The residual plot and the R2 ¼ 0.99 are an indication of a good fit. Dataset 2: We first fit a simple linear regression model to give Y ¼ 10.9 þ 15X, R2 ¼ 0.97 (blue line, Fig. 1, row 2 left). The corresponding fitted values versus residual plot for this model (Fig. 1, row 2 middle) shows a systematic pattern; the residuals are positive for small X values and large X values and negative for X values in between. When we add a quadratic term, the resulting fitted model Y ¼ 5 þ 5.1X þ X2, R2 ¼ 0.99 (red line, Fig. 1, row2, left) produces a residual versus fitted plot that shows a centered random pattern around zero (Fig. 1, row 2 right). Note that the R2 value of 97% for the simple regression model is very high and misleading, thereby illustrating the concept that one cannot just interpret the output of a model without inspecting the residuals. Datest 3: We first fit a simple linear regression model to give Y ¼ 6.40.8X, R2 ¼ 0.41 (blue line, Fig. 1, bottom left). But if one looks at fitted values versus residuals (Fig. 1, bottom middle), we observe a nonrandom pattern. If we fit a model by taking the log of the Y’s (red line, Fig. 1, bottom left), then Y ¼ exp (2.10.49X), R2 ¼ 0.74, and the resulting plot of fitted values versus residuals (Fig. 1, bottom right) is now a random pattern around zero.
Summary Linearity is everywhere, either exactly or approximately; it can be a simple conversion formula but can also be an important assumption of various models, e.g., linear models, linear and logistic regressions. In the case of spatial data, predictions at unknown locations are linear weighted sums of the locations observed around the unknown locations. Visualization is a useful tool to judge the linearity between bivariate variables; if we have linear dependence between exploratory variables it can lead to collinearity, resulting in instability of the fitted model. Even when a function is nonlinear, for many purposes it can be useful as a start to approximate it by a linear function.
Cross-References ▶ Interpolation ▶ Inverse Distance Weight ▶ Kriging ▶ Nonlinearity ▶ Ordinary Least Squares ▶ Regression
L
744
Bibliography Mosteller F, Tukey JW (1977) Data analysis and regression: a second course in statistics. Addison-Wesley, Reading Rawlings JO, Pantula SG, Dickey DA (1998) Applied regression analysis, a research tool, 2nd edn. Springer, Cham
Local Singularity Analysis Wenlei Wang1, Shuyun Xie2 and Zhijun Chen2 1 Institute of Geomechanics, Chinese Academy of Geological Sciences, Beijing, China 2 China University of Geosciences (Wuhan), Wuhan, China
Definition Local singularity analysis as a fractal-based concept was initially proposed to investigate anomalous distributions of ore elements (Cheng 1999). Formally defined in the Dictionary of Mathematical Geosciences (Howarth 2017), the concept of singularity analysis is “In geochemical applications, the singularity also can be directly estimated from the slope of a straight line fitted to a log transformed element concentration value as a function of a log transformed size measure. Areas with low singularities can be used to delineate anomalies characterized by relatively strong enrichment of the elements considered. Local singularity maps usually differ significantly from maps with geochemical anomalies delineated by other statistical methods.” In geologic applications, local singularity analysis is an effective method of quantitatively and qualitatively characterizing irregular energy release and/or material accumulation within relatively short spatialtemporal intervals generated by nonlinear geoprocesses (Cheng 2007). In a local comparison of element concentrations (or other physical quantities) between samples and their vicinities, spatially varied physicochemical signatures can be quantified using a singularity index (i.e., a measure of the intensity of variations or changes). As an example of the investigation of local signatures (e.g., ore element concentrations) across space, spatial variations of geoanomalies associated with mineralization can be evaluated. After more than 10 years of development, local singularity analysis has been extended to characterize many nonlinear geoprocesses and extreme geoevents as part of geological, geophysical, and litho-geochemical data analyses.
Introduction In the context of earth science, geoanomalies having geologic signatures distinctive from those of their surroundings are a
Local Singularity Analysis
major focus in the study of the nature of hazards, the environment, and energy and mineral resources for the survival and development of human society. Geoanomaly analysis has become a fundamental technical route in many subdisciplines of earth science. In past centuries, both linear and nonlinear methodologies have played important roles in the locating and characterizing of geoanomalies. Statistical methods including univariate and multivariate analyses (e.g., the use of a Q-Q plot or bi-plot cluster) are broadly adopted for their effectiveness in solving problems in the statistical frequency domain rather than the spatial domain. With the development of applications in geology, a branch of statistics termed geostatistics has been proposed to predict ore grades in the mining industry and is maturely applied in various branches of geology and geography. Most observational datasets in spatial and spatial-temporal forms can be effectively interpreted in the spatial domain. In the early stage of its application in geological exploration, geostatistics was based on the hypothesis that mineralization-related geovariables (i.e., geochemical data) follow a normal or lognormal distribution (Ahrens 1953). The simplest and classical method of identifying a geoanomaly from the background is the use of an algorithm based on the mean (m) standard deviation(s) (e.g., s, 2s, or 3s) (Fig. 1). Many mineral deposits have been discovered from geoanomalies identified by geostatistical methods, especially geochemical anomalies. Besides the adoption of the above type of algorithm, other approaches including the use of multielement indices, principal component analysis, independent component analysis, multidimensional scaling, and random forests have been adopted successfully in Australia, Canada, and many other countries (Grunsky and Caritat 2020). However, with the advance of mineral exploration around the world, most easily discoverable mineral resources buried at a shallow depth have been exploited. Exploration fields now tend to be in covered areas, the deep earth, and other untraditional spaces. Geoanomalies now tend to be weak and difficult to identify from observational datasets adopting traditional geostatistical methods (Fig. 1). Fractal methods that consider both the frequency distribution and spatial self-similarity have become popular in recent decades. Against this background, local singularity analysis, a typical fractal method, has been proposed to identify weak anomalies that have not been discovered owing to the strong variance of the background and/or deep burial (Fig. 1). Local singularity analysis has been shown to be efficient in enhancing weak geoanomalies and effective in discovering causative bodies (e.g., ore bodies) of geoanomalies in covered areas and deeply buried spaces. As introduced by Cheng (2007), local variations of ore elements have been investigated by local singularity analysis and evaluated using a singularity index. Weak geochemical anomalies produced by deeply buried ore bodies have been appropriately enhanced. Meanwhile, strong anomalies identifiable using traditional methods are
Local Singularity Analysis
745
Local Singularity Analysis, Fig. 1 Schematic diagrams of the traditional method (a) and local singularity analysis (b) for the identification of a geoanomaly in a mineral district (c). In comparison with the
identification of a geoanomaly using the traditional method (d), local singularity analysis (e) is more effective in characterizing a weak geoanomaly
preserved without the loss of indicative patterns (Fig. 1). Local singularity analysis has since flourished in mineral exploration and other subdisciplines of earth science. Its application has been extended to investigate geological and geophysical data. The concept of the fault singularity has been proposed and used to characterize the development of fault systems indicative of mineralization-favoring spaces produced by faulting activity (Wang et al. 2012). In application to gravity and magnetic data, geophysical anomalies associated with mineralization have been enhanced, demonstrating that local singularity analysis allows improved and simplified high-pass filtering with the advantage of scale independence (Wang et al. 2013a). Recently, local singularity analysis has been applied to investigate earthquakes, magmatic flare-ups, continental crust evolution, and other extreme geological events (Cheng 2016). The progressive development of the theory, application and algorithms of local singularity analysis has had profound implications for the simulation, prediction, and interpretation of nonlinear geoprocesses.
CðV Þ ¼ cðV Þa=E1 ,
ð2Þ
where the constant c determines the magnitude of functions; E is the Euclidian dimension; and α is termed the singularity index. The scaling exponent α is a kernel parameter of local singularity analysis that preserves the shape of functions and changing behaviors of the ore materials at different scales of spatial-temporal intervals. Different physicochemical behaviors can be characterized on the basis of changes in the singularity index α. For α ¼ E, the ore materials follow a monofractal distribution that indicates that the mass m(V ) and density C(V ) are independent of the volume V. For α > E, there is a negative singularity that corresponds to the depletion of ore materials. This implies that ore materials follow a multifractal distribution. For α < E, there is a positive singularity that indicates that ore materials follow a multifractal distribution, implying enrichment as introduced in the definition of local singularity analysis.
Methods of Estimating the Singularity Index
Singularity Index As local singularity analysis was originally defined in the context of hydrothermal mineralization (Cheng 1999), the mass m(V ) and density C(V) of ore materials each have a power-law relationship with V: mðV Þ ¼ cðV Þa=E ,
ð1Þ
Methods of estimating the singularity index have progressively advanced over two decades. The first proposed estimation method is the square-window-based method (Cheng 2007). In its application to geochemical data in two dimensions, the general estimation steps (Fig. 2) of the method are: (1) centering on a location i of the study area and predefining a
L
746
Local Singularity Analysis
Local Singularity Analysis, Fig. 2 Schematic diagram of the square-window-based method of estimating the singularity index (after Wang et al. 2018)
set of square windows with size A(ei ei) and density C [A(ei)]; (2) plotting C[A(ei)] and A(ei ei) on a log-log plot (Eq. 3); (3) adopting the least-squares method to estimate the slope (k) of the linear relation between logC[A(ei)] and log ei; (4) calculating the singularity index αi ¼ k þ E or k þ 2; and (5) iterating these steps at each location to estimate the spatial distribution of the singularity index of the study area. Equation 3 is written as log C½Aðei Þ ¼ c þ ða EÞ logðei Þ:
ð3Þ
Selected Case Studies Furthermore, algorithms of local singularity analysis have been developed to characterize geoanomalies more precisely and appropriately. As an example, geochemical signatures inherited from multiple geoprocesses are often those of heterogeneity and anisotropy. In addition to identifying the locations of geoanomalies indicative of mineralization, the characterization of the anisotropic distributions of the related mineralization is necessary. The directional window-based method was first proposed by Li et al. (2005). In this method, a series of directional (rectangular) windows are predefined to obtain C[A(ei)] and ei (Fig. 3a). Other steps of estimating the singularity index α are similar to those in the square-windowbased method. The variation in window size only follows changes in ei, and the estimation thus becomes a onedimensional problem and α ¼ k 1. Adopting this method, the spatial variation of geochemical behaviors along the initial (i.e., northward) direction is consequently characterized. Moreover, other directional singularity indices, such as the eastward-trending window-based index (Wang et al. 2018), can be estimated. Wang et al. (2013b) proposed a geologically constrained local singularity analysis that is termed the fault-trace-
oriented singularity mapping technique and quantitatively describes interrelations between fault structures and orebearing fluids. In contrast with former window-based methods, fault traces are first divided equally into segments (Fig. 3b). Centering on one segment, a series of rectangular windows along the direction vertical to the segment is defined. Then, in steps similar to those discussed above, the fault-oriented singularity index is estimated to describe interactions between fault activity and fluid flow. A new fault property (Fig. 3c) is obtained by assigning the index to the fault segment: α > 1 indicates a negative fault, or a gradual depletion of metals approaching the fault space, whereas α < 1 indicates a positive fault, or a continuous enrichment of metals approaching the fault space. Wang et al. (2018) further developed the directional window method and proposed the concept of the anisotropic singularity. Following the directional window method, the anisotropic singularity estimation focuses more on the intensity of anisotropic variations between samples and their vicinities. Centering on one sample location, series of rectangular windows trending along all directions are predefined (Fig. 3d). Using the estimation method discussed above, the direction with the maximum slope, or |α 1|max, is determined and used to obtain the singularity index for the current location, such that the variation (accumulation or depletion) characterized by |α 1|max is greatest along the selected direction (Fig. 3e). In addition, as shown in Fig. 3f, directional patterns of element accumulation (with arrows directed from low to high) and depletion (with arrows directed from high to low) depict the anisotropy of geochemical behaviors appropriately. Cheng (2017) applied the local singularity analysis to global databases of igneous and detrital zircon U–Pb ages. Figure 4 shows the U–Pb ages on a histogram. The episodic growth of the continental crust and the development of
Local Singularity Analysis
747
Local Singularity Analysis, Fig. 3 Schematic diagrams of directional window-based (a), fault-trace-oriented (b, c), and anisotropic singularity index estimation methods (d, e, f) (after Wang et al. 2018)
Local Singularity Analysis, Fig. 4 Schematic diagram of the method of estimating the singularity index for histogram data in one dimension (a) and practical application in analyzing the global zircon U–Pb age series (Cheng 2017)
supercontinents are characterized by the age peaks. In analyzing the one-dimensional data, the estimation method (Fig. 4) was modified to have the steps of (1) choosing one bin of the histogram as the initial one-dimensional window; (2) while centering on this bin, defining a series of onedimensional windows with size ei; (3) plotting average numbers of recorded ages of bins within each window C [A(ei)] and ei on a log-log graph; and (4) estimating the slope k using the least-squares method and using k to calculate the
singularity index for the initial bin, α ¼ k þ 1. The case study well extends the application fields of local singularity analysis to lithogeochemical data, focusing on the shapes of age peaks depicted by the singularity index regardless of the bin size of the histogram (i.e., scale independency) rather than the amplitudes and periodicity of age series. The singularity indices of global data show that all age peaks have an episodic pattern with a duration of 600–800 Myr (Fig. 4) (Cheng 2017).
L
748
Conclusions After more than 20 years of development, local singularity analysis has become a broadly used geoanomaly identification method, especially in the exploration of weak geoanomalies obscured by covering materials, owing to its advantages in quantifying spatial-temporal variations of physicochemical signatures generated by nonlinear geoprocesses. Initially applied in geochemical data analysis, spatially varied ore element concentrations have been characterized using the singularity index to represent accumulation and depletion in the comparison of samples within local vicinities. In the context of two-dimensional data analysis, geological and geophysical data have been investigated to delineate mineralization and other geoanomalies related to nonlinear geoprocesses, quantitatively and qualitatively. Algorithms have been further developed to satisfy geological constraints and the anisotropic nature of geoprocesses. In recent practice, local singularity analysis was adopted to simulate and predict extreme geological events that extend the application fields in earth sciences. In more widespread and profound application, local singularity analysis is expected to be adopted for new situations of geological singular events.
Cross-References ▶ Fractal Geometry in Geosciences ▶ Nonlinearity
Locally Weighted Scatterplot Smoother Wang W, Zhao J, Cheng Q (2013a) Application of singularity index mapping technique to gravity/magnetic data analysis in southeastern Yunnan mineral district, China. J Appl Geophys 92:39–49 Wang W, Zhao J, Cheng Q (2013b) Fault trace-oriented singularity mapping technique to characterize anisotropic geochemical signatures in the Gejiu mineral district, China. J Geochem Explor 134: 27–37 Wang W, Cheng Q, Zhang S, Zhao J (2018) Anisotropic singularity: a novel way to characterize controlling effects of geological processes on mineralization. J Geochem Explor 189:32–41
Locally Weighted Scatterplot Smoother Klaus Nordhausen and Sara Taskinen Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland
Definition The locally weighted scatterplot smoother (LOWESS) uses local regression models to obtain a smooth estimator for the conditional expected value of response variable. The locality is defined based on a nearest neighbour-type approach assigning weights to the observations based on their distances from the point of interest as well as on their outlyingness which are then used in an iterative weighted least squares approach.
Introduction Bibliography Ahrens LH (1953) A fundamental law of geochemistry. Nature 172: 1148–1148 Cheng Q (1999) Multifractality and spatial statistics. Comput Geosci 25: 949–961 Cheng Q (2007) Mapping singularities with stream sediment geochemical data for prediction of undiscovered mineral deposits in Gejiu, Yunnan Province, China. Ore Geol Rev 32:314–324 Cheng Q (2016) Fractal density and singularity analysis of heat flow over ocean ridges. Sci Rep 6:1–10 Cheng Q (2017) Singularity analysis of global zircon U-Pb age series and implication of continental crust evolution. Gondwana Res 51: 51–63 Grunsky E, Caritat P (2020) State-of-the-art analysis of geochemical data for mineral exploration. Geochemistry 20(2):217–232 Howarth RJ (2017) Dictionary of mathematical geosciences. Springer, Cham Li Q, Liu S, Liang G (2005) Anisotropic singularity and application for mineral potential mapping in GIS environments. Prog Geophys 20: 1015–1020. (in Chinese with English abstract) Wang W, Zhao J, Cheng Q, Liu J (2012) Tectonic-geochemical exploration modeling for characterizing geo-anomalies in southeastern Yunnan district, China. J Geochem Explor 122:71–80
A common problem in statistics is to identify the relationship between two variables, i.e., between an explaining variable x and a response variable y based on a sample (xi, yi), i, . . ., n. Traditionally, the conditional expectation of y given x is modelled as a function of x, and we assume that the data are generated by yi ¼ gðxi Þ þ ϵi , where g( ) is a smooth function and ϵ i is some random fluctuation with E(ϵi) ¼ 0 and Var(ϵ i) ¼ s2. When visualizing a scatterplot of xi against yi, one thus wants to see a smooth relationship in the conditional mean of y as a function of x. Assuming g(x) ¼ β0 + β1x, where β0 and β1 are constants, a straight line is used to model the conditional mean of y, which is known as linear regression. However, in many practical problems, it seems an oversimplification to approximate g( ) with a linear function. The use of a polynomial of order d, i.e.,
Locally Weighted Scatterplot Smoother
749
g(x) ¼ β0 + β1x + . . . +βdxd, may also not be reasonable and may overfit the data and cause problems at the boundaries of x. In many cases, it may be reasonable to assume that locally, in a neighbourhood of a point x, a linear regression provides a good approximation for the conditional mean of y. For such a local regression, observations close to x are naturally more relevant than observations further away, thus leading to the following weighted regression problem n
argmin b0 ,b1
K i¼1
xi x ð y i b0 b 1 x i Þ 2 , h
ð1Þ
where K is a non-negative weight function called the kernel, and the parameter h, which defines locality, is called the bandwidth. The bandwidth can be chosen, for example, using cross-validation. This modelling approach is also known as local linear regression and is easily extended to the case of several predictors and to local polynomial regression. Local linear regression has a long tradition and is discussed in detail in Loader (1999), for example. The main advantages of local linear regression are, according to Cleveland and Loader (1996), that (i) it is easy to understand and interpret, (ii) it adapts well to problems at the boundaries and in regions of high curvature, and (iii) it does not require smoothness or regularity conditions. Many computational tools for local linear regression exist. In R, the method is available, for example, in the package locfit (Loader 2020).

Despite its flexibility, local linear regression as defined in (1) has some shortcomings. One disadvantage is that the method uses the quadratic loss function, which is well known to be sensitive to outliers. It is therefore natural to replace the quadratic loss function with a robust loss function and to fit locally a robust regression model instead of a least squares regression. Local robust polynomial regression of order d can then be defined as

argmin_{b_0, b_1, …, b_d} ∑_{i=1}^{n} K((x_i − x)/h) ρ(y_i − b_0 − b_1 x_i − … − b_d x_i^d),   (2)
where ρ is a loss function. The data analyst must now choose the kernel function K, the loss function ρ, the order of the polynomial d, and the bandwidth h.
Locally Weighted Scatterplot Smoother

In this context, Cleveland (1979) suggested the locally weighted scatterplot smoother (LOWESS), which uses as a kernel the tricube weight function

K(x) = (1 − |d|³)³

when |d| < 1, and 0 otherwise, where d is the distance from the neighbourhood sample points to the value x. Cleveland (1979) argues that such a kernel is beneficial as it assigns points far away a zero weight and is therefore better than kernels based on symmetric densities, which usually give all points some positive weight. Depending on the scale of x, it may then, however, be advisable to scale the observations.

For the neighbourhood structure, LOWESS suggests an adaptive nearest-neighbour method. Let 0 < f ≤ 1 be a tuning parameter and denote by r = [fn] the number of local data points to be used. At point x, those r points for which |x − x_i| is smallest form the neighbourhood. As the next step, initial fits ŷ_i at each data point are obtained using local polynomial least squares regression in order to obtain the residuals r_i = y_i − ŷ_i. These residuals are then robustly scaled using six times the median absolute deviation (mad), i.e., the median of {|r_1|, …, |r_n|}. To downweight large residuals, the biweight kernel B(t) = (1 − t²)² I_{[−1,1]}(t) is then used to assign robust weights δ_i = B(r_i/(6 mad)). Based on this initial step, the procedure is then iterated M times so that the kernel K(x) is multiplied by the robust weights δ_i. Thus, this corresponds to local iteratively weighted least squares (IWLS), where the weights depend on the distance from the point of interest and on whether the point is considered outlying.

The parameter values for LOWESS that need to be chosen are f, d, and M. Cleveland (1979) suggested choosing d from 0, 1, and 2, with a tendency to d = 1, as higher-order polynomials tend to overfit the data and the method becomes numerically unstable. It is also argued that the number of iterations can be small, say M = 2 or M = 3, and that no convergence criterion is needed. The parameter f has the most impact on the results: as its value is increased, the smoothness of the fit increases. Thus, it is a trade-off between variability and recognizing the pattern. Cleveland (1979) suggests starting with f = 0.5 and limiting the value to the range [0.2, 0.8]. For automatic procedures, Cleveland (1979) suggests first choosing the initial f which minimizes ∑_{i=1}^{n} (y_i − ŷ_{i,f})² and then choosing the final f as the minimizer of ∑_{i=1}^{n} δ_i (y_i − ŷ_{i,f})². Based on the final robust weights, one can then compute fitted values ŷ_1, …, ŷ_m for a grid of, not necessarily observed, values x_1, …, x_m covering the range of the observed x's. Using linear interpolation, these points can then be used to visualize the smooth function. For the theoretical properties of LOWESS under the assumption of iid errors following a Gaussian distribution, we refer to Cleveland (1979). The procedure is summarized in algorithmic form in Algorithm 1.

Algorithm 1: LOWESS algorithm
Input: n data points (y_i, x_i), i = 1, …, n;
1 Set f ∈ (0, 1] to specify the neighbourhood;
2 Set integer d ≥ 0 as the degree of the local polynomial;
3 Set integer M ≥ 1 for the number of iterations;
4 Compute the size of the nearest neighbourhood, r = [fn];
5 for i = 1 to n do
6   For x_i, identify the r points forming the neighbourhood and denote them x_{i_{r_l}}, where r_l ∈ {1, …, i−1, i+1, …, n} and l = 1, …, r;
7   Use these r points to fit a local polynomial regression model of order d using the kernel K(x_{i_{r_l}}) = (1 − |x_{i_{r_l}} − x_i|³)³;
8   Obtain the initial fit ŷ_i;
9 for iter = 1 to M do
10   Calculate the residuals r_i = y_i − ŷ_i and the median absolute deviation (mad) of the residuals;
11   Compute the robust weights δ_i = B(r_i/(6 mad));
12   for j = 1 to n do
13     Use the neighbourhood points x_{j_{r_l}} of x_j and their corresponding robust weights δ_{j_{r_l}} to fit a local polynomial regression model of order d using the kernel K(x_{j_{r_l}}) = δ_{j_{r_l}} (1 − |x_{j_{r_l}} − x_j|³)³;
14     Obtain the robust fit ŷ_j;
15 Predict y for any point of interest x;

In R, LOWESS with d = 1 can be applied using, for example, the function lowess.
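As a minimal sketch in base R (with simulated data, not data from this entry), the smoother can be called through the function lowess just mentioned; its argument f is the neighbourhood fraction above and iter corresponds to the number of robustness iterations M:

```r
# LOWESS (local linear fit, d = 1) on noisy data with two artificial outliers
set.seed(1)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)
y[c(30, 120)] <- y[c(30, 120)] + 3          # outliers downweighted by the robust iterations

fit <- lowess(x, y, f = 0.5, iter = 3)      # f = neighbourhood fraction, iter = M

plot(x, y, pch = 16, col = "grey")
lines(fit, lwd = 2)                          # smoothed curve from the local fits
```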
Extensions of LOWESS

Cleveland (1979) pointed out that LOWESS can be used with any kernel function, and variants also exist to speed up the computations for large data sets and to accommodate prior weights for the observations. The main extension of LOWESS, however, is suggested in Cleveland and Devlin (1988) as locally estimated scatterplot smoothing (LOESS). LOESS generalizes LOWESS to the case of several predictors by replacing the simple regression with a multiple regression. Implementations such as loess in R let the user choose the degree of the polynomial d as well as the need for robust updates.

Example

To demonstrate LOWESS, we consider average yearly temperatures (in degrees Celsius) of the German state Baden-Wuerttemberg from 1880 to 2020. The data can be obtained from the CDC portal of the Deutscher Wetterdienst. Figure 1 shows the data together with three global polynomial least squares fits (linear, quadratic, and cubic). All fits indicate an increase in average temperature but do not seem to fit the early temperature records very well.

Locally Weighted Scatterplot Smoother, Fig. 1 Yearly average temperatures (°C) for Baden-Wuerttemberg in Germany together with three global fits (linear, quadratic, and cubic)

Locally Weighted Scatterplot Smoother, Fig. 2 Yearly average temperatures (°C) for Baden-Wuerttemberg in Germany together with locally smoothed fits (robust and least squares variants for f = 0.75, d = 2; f = 0.25, d = 2; and f = 0.75, d = 1)

We then apply LOWESS via the R function loess and visualize the results in Fig. 2 for different degrees of the polynomial (d) and different parameters defining the
neighbourhood ( f ). Both the robust variant with the IWLS steps and the pure local least squares fits are illustrated. The figure clearly shows that the differences between the robust fits and the LS fits are quite small, indicating the absence of outlying observations. A quadratic polynomial seems preferable, however, especially with a large value of f, and the LOWESS fit shows that, after a long period of slow rise in average temperature, the rise has accelerated in the last 50 years. A more thorough analysis of temperature data using LOWESS is available in Wanishsakpong and Notodiputro (2018), for example.
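A sketch of how such fits can be produced with loess is given below; the data frame temp and its columns Year and Temp are assumed placeholders for the DWD annual means, whose actual file layout may differ.

```r
# Local quadratic fits, least squares vs robust (IWLS) variant;
# 'temp' is assumed to hold one row per year, ordered by Year.
fit_ls  <- loess(Temp ~ Year, data = temp, degree = 2, span = 0.75)
fit_rob <- loess(Temp ~ Year, data = temp, degree = 2, span = 0.75, family = "symmetric")

plot(temp$Year, temp$Temp, pch = 16, col = "grey",
     xlab = "Year", ylab = "Average temperature (C)")
lines(temp$Year, predict(fit_ls),  lwd = 2)         # local LS fit
lines(temp$Year, predict(fit_rob), lwd = 2, lty = 2) # robust fit
```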
Summary

LOWESS and its extension LOESS are powerful and simple regression methods that are ideal for situations where an initial model assumption is not available, as the methods do not require any special model specification. LOWESS exploits locality, which means that it requires dense observations to fully achieve its strengths. While LOWESS was originally designed for models with a single predictor, LOESS allows several predictors, meaning that the method creates smooth surfaces, which then need even denser data and a careful choice for the notion of distance.
Cross-References

▶ Smoothing Filter
▶ Splines

References

Cleveland WS (1979) Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc 74(368):829–836. https://doi.org/10.1080/01621459.1979.10481038
Cleveland WS, Devlin SJ (1988) Locally weighted regression: an approach to regression analysis by local fitting. J Am Stat Assoc 83(403):596–610. https://doi.org/10.1080/01621459.1988.10478639
Cleveland WS, Loader C (1996) Smoothing by local regression: principles and methods. In: Härdle W, Schimek MG (eds) Statistical theory and computational aspects of smoothing. Physica-Verlag HD, Heidelberg, pp 10–49. https://doi.org/10.1007/978-3-642-48425-4_2
Loader C (1999) Local regression and likelihood. Springer, New York
Loader C (2020) locfit: local regression, likelihood and density estimation. R package version 1.5-9.4. https://CRAN.R-project.org/package=locfit
Wanishsakpong W, Notodiputro KA (2018) Locally weighted scatter-plot smoothing for analysing temperature changes and patterns in Australia. Meteorol Appl 25(3):357–364. https://doi.org/10.1002/met.1702

Logistic Attractors

Balakrishnan Ashok
Centre for Complex Systems and Soft Matter Physics, International Institute of Information Technology Bangalore, Electronics City, Bangalore, Karnataka, India

Definition

The logistic map is a common starting point for modeling growth phenomena in various fields, due not only to its simple
form but also to the extensive and rich dynamics it displays. Systems in nature typically display nonlinear behavior, and the logistic map is a quadratic nonlinear equation that can be analyzed extensively, even as it shows both regular and chaotic behavior depending upon the value of the control parameter. A logistic attractor, as the name implies, is an attractor associated with the logistic map and can correspond to a fixed point, a limit cycle, or even chaotic oscillations.
Logistic Map and Logistic Function

To understand how logistic attractors appear in nature, an overview of the associated logistic map (difference equation) is required, as well as of the associated logistic differential equation and logistic function, which are discussed briefly in this section. The logistic map is the quadratic map defined as

x_{n+1} = r x_n (1 − x_n),   (1)

with 0 ≤ x_n ≤ 1 and 0 ≤ r ≤ 4. This was first discussed at length by May (1976) in the context of modeling population growth from one generation, x_n, to the next, x_{n+1}, with a growth rate r serving as the control parameter. The logistic map is the discrete analog of the logistic differential equation

dx(t)/dt = r x(t) (1 − x(t)),   (2)

which has the logistic function

x(t) = m / (1 + exp(−r (t − t_0)))   (3)

as its solution, with r termed the logistic growth rate and m the maximum value of x(t). Figure 1 shows examples of logistic functions with various values of the growth rate and maximum value.

Logistic Attractors, Fig. 1 Plot of the logistic function f(x) versus x for different values of the growth rate r and maximum value m, with midpoint x_0 = 0.6: curve (1) m = 1.0, r = 0.5; curve (2) m = 1.5, r = 0.5; curve (3) m = 1.0, r = 1.5

The logistic differential equation was introduced by Pierre François Verhulst as an improvement over the Malthusian exponential growth model for population growth (Verhulst 1838). The assumptions made in formulating the logistic model are that the population in each new generation is proportional to the current one for small populations, and that there is a maximum population level M whose attainment results in population extinction in the subsequent generation. This gives

N_{n+1} = r N_n (1 − N_n / M).   (4)

If the population N_n in generation n is taken to represent the fraction of the maximum value M, normalizing so that M = 1 and 0 ≤ N_n ≤ 1, with r as the growth rate, this has the same form as Eq. (1), which is also known as the logistic difference equation or the discrete logistic equation.
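A minimal base-R sketch of Eqs. (1) and (3); the parameter values are illustrative, not taken from the text.

```r
# Iterate the logistic map (Eq. 1) and evaluate the logistic function (Eq. 3)
logistic_map <- function(r, x0 = 0.2, n = 100) {
  x <- numeric(n)
  x[1] <- x0
  for (i in 2:n) x[i] <- r * x[i - 1] * (1 - x[i - 1])
  x
}

logistic_fun <- function(t, m = 1, r = 0.5, t0 = 0.6) m / (1 + exp(-r * (t - t0)))

plot(logistic_map(r = 2.8), type = "b", xlab = "n", ylab = "x_n")   # settles on x* = 1 - 1/r
curve(logistic_fun, from = -10, to = 10, xlab = "t", ylab = "x(t)") # S-shaped growth curve
```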
Stability and Attractors

The logistic map has several attractors, which vary depending upon the value of the control parameter. These can be fixed points, limit cycles, or chaotic attractors. Fixed points x* of the logistic map exist when

f(x*) ≡ x* = r x* (1 − x*),   (5)

which implies that x* = 0 or x* = 1 − 1/r. Regardless of the value of the control parameter, the logistic map always has a fixed point at the origin, x* = 0; x* = 1 − 1/r is a fixed point for r ≥ 1. The stability of these fixed points can be obtained from f′(x*) = r − 2 r x*, with stable fixed points resulting from |f′(x*)| < 1 and unstable fixed points from |f′(x*)| > 1. At x* = 0, f′(0) = r, which is stable for r < 1 and unstable for r > 1. At the fixed point x* = 1 − 1/r, f′(1 − 1/r) = 2 − r, so stability requires 1 < r < 3, while for r > 3 all extant fixed points are unstable. The evolution of the stability of the system as a function of the control parameter can be observed from a return map of x_{n+1} versus x_n, as shown in Fig. 2. The intersection of the diagonal with the parabola occurs at a fixed point, since x_{n+1} = x_n there. Three typical curves are shown for values of r less than, equal to, and greater than 1. As can be seen, for r < 1, the origin can be
the only fixed point, since the parabola falls below the diagonal. Increasing r makes the parabola steeper, so that the diagonal is tangent to it when r = 1. For r > 1, the second fixed point is located at the intersection of the diagonal with the parabola, with the other fixed point continuing to be the origin, which, however, loses stability. Hence, at r = 1, the fixed point x* bifurcates from the origin through a transcritical bifurcation. For 1 < r < 3, the logistic map has an attractor at x* = 1 − 1/r, which is a fixed point of the system. The system shows its first bifurcation to a period-2 cycle at r = r_1 = 3, where a flip bifurcation or period-doubling bifurcation occurs. This attractor exists for 3 ≤ r ≲ 3.449, after which a second period-doubling bifurcation occurs at r = r_2 = 3.449 with the appearance of another attractor, a period-4 cycle, followed by further period-doubling attractors at r_3 = 3.54409, r_4 = 3.5644, and so on, until r = r_∞ = 3.569946, at which point chaotic behavior sets in. The presence of these attractors can also easily be seen from the orbit diagram for the logistic map, shown in Fig. 3. As can be seen, the first period-doubling bifurcation occurs at r = 3, followed thereafter by further period-doubling bifurcations as mentioned above. After the onset of chaotic behavior, there are again windows of periodic behavior, including the period-3 window that appears at r ≈ 3.83.
Logistic Attractors, Fig. 2 Plot of x_{n+1} versus x_n for the logistic map, showing the location of the fixed points for three different r values (r = 0.5, 1, and 2, together with the diagonal x_{n+1} = x_n)
Logistic Attractors, Fig. 3 Orbit diagram for the logistic map showing period-doubling. Sections of panel (A) are shown magnified in (B–F). A period-3 cycle can be seen in panel F after r ≈ 3.83
The appearance of a period-3 cycle in a system is an indicator of the presence of chaotic behavior ("period 3 implies chaos"), as was proved by Li and Yorke in their famous paper of the same title (Li and Yorke 1975). Self-similarity can also be seen in the orbit diagram of the logistic map, with the branches seeming to repeat endlessly at finer and finer scales, as can be seen in Fig. 3. The appearance of n-cycles and attractors can also be observed by constructing a cobweb diagram for the map, which is a graphical method of locating fixed points and cycles. Cobweb diagrams illustrating the stability of the attractors of the logistic map are shown in Fig. 4 for six distinct r values, r = 1.5, 2.0, 2.8, 3.2, 3.5, and 3.9. For r = 1.5, 2.0, and 2.8, the fixed point at the origin, x* = 0.0, is repelling, while the fixed point at x* = 1 − 1/r is attracting; that is, x* = 0.33, 0.5, and 0.64 are attracting for r = 1.5, 2, and 2.8, respectively. At r = 3.2, both fixed points (at x* = 0 and x* = 0.6875) are repelling, as seen in panel (d). The attractor at r = 3.5, shown in panel (e), is a period-4 cycle, while for r = 3.9, shown in panel (f), a multitude of cycles are possible and a chaotic attractor is evident. The behavior of the logistic equation and the logistic map is extensively discussed in the dynamical systems literature, including, for example, in Strogatz (1994). In the context of geophysical applications, one can also refer to the work of Turcotte (1997) for an introduction to the logistic map and dynamical systems.
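A short sketch (base R) of how an orbit diagram like Fig. 3 can be generated by iterating the map and discarding transients; the r-grid and iteration counts are arbitrary choices.

```r
# Orbit (bifurcation) diagram for the logistic map
orbit <- function(r, n_transient = 300, n_keep = 100, x0 = 0.5) {
  x <- x0
  for (i in 1:n_transient) x <- r * x * (1 - x)   # discard transients
  out <- numeric(n_keep)
  for (i in 1:n_keep) { x <- r * x * (1 - x); out[i] <- x }
  out
}

rs <- seq(2.8, 4.0, by = 0.002)
plot(NULL, xlim = range(rs), ylim = c(0, 1), xlab = "r", ylab = "x")
for (r in rs) points(rep(r, 100), orbit(r), pch = ".", cex = 0.5)
```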
In Geophysics

The logistic equation can be extended to more generalized forms, such as

dx(t)/dt = p x(t)^q (1 − (x(t)/k)^c) t^n,   (6)

with p, q, k, c, and n being constants, allowing for the modeling of various systems showing still richer dynamics. In a recent work, Maslov and Anokhin (2012) used such a generalized logistic equation to derive the empirical Gutenberg–Richter equation relating the magnitude m of an earthquake to its frequency F (Gutenberg and Richter 1942), similar to the relation obtained earlier by Ishimoto and Iida (1939). The Gutenberg–Richter equation

log(F) = a − b m   (7)

describes the logarithmic relation between earthquake frequency and magnitude, a and b being constants.
Logistic Attractors, Fig. 4 Cobweb diagram for the logistic map for r = (a) 1.5, (b) 2.0, (c) 2.8, (d) 3.2, (e) 3.5, and (f) 3.9
Considering one special case of Eq. (6),

dx(t)/dt = r x(t) (M − x(t)) t^α,   (8)

Maslov and Anokhin show that a generalized Gutenberg–Richter formula

ln F(m) ≈ ln A − s m^{α+1} / (α + 1)   (9)

can be obtained by appropriate identification and redefinition of variables with those in the generalized logistic equation. The usual Gutenberg–Richter formula is recovered by taking α = 0, a = ln A, and b = s. A generalized logistic equation can also be used to model natural phenomena, including earthquakes and landslides, and the kinetics of such events that involve hierarchical aggregation, as proposed by Maslov and Chebotarev (2017), where the function x(t) in Eq. (6) is replaced by N(A), with A corresponding to the basic size of an element in a structure; depending on the problem being modeled, A may correspond to the magnitude of an earthquake or even to an area or mass. The function N is interpreted as a measure of the number of elements that are of size smaller than A. Equation (6) then serves to model the rate of growth of a particular phenomenon. The distributions of fore- and aftershocks of the 2014 Napa Valley earthquake, as well as of the 2004 Sumatra earthquake, were modeled using the solution of the generalized logistic equation in Maslov and Chebotarev (2017) and showed good agreement, both qualitative and quantitative, with the observed data points. Other researchers have also used generalized logistic functions to model fracture propagation and growth and to include self-similarity in the fracture process (Lyakhovsky 2001).
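A minimal numerical sketch of the growth behavior described by Eq. (8), using a simple forward-Euler scheme in base R; the parameter values and step size are illustrative assumptions, not values from the studies cited above.

```r
# Forward-Euler integration of dx/dt = r * x * (M - x) * t^alpha  (Eq. 8)
generalized_logistic <- function(r = 0.5, M = 1, alpha = 0.5,
                                 x0 = 0.01, t_max = 20, dt = 0.01) {
  t <- seq(dt, t_max, by = dt)
  x <- numeric(length(t))
  x[1] <- x0
  for (i in 2:length(t)) {
    dx <- r * x[i - 1] * (M - x[i - 1]) * t[i - 1]^alpha
    x[i] <- x[i - 1] + dt * dx
  }
  data.frame(t = t, x = x)
}

sol <- generalized_logistic()
plot(sol$t, sol$x, type = "l", xlab = "t", ylab = "x(t)")  # S-shaped growth toward M
```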
Summary

Logistic attractors are the fixed points or limit cycles of the logistic difference equation. Despite the very simple form of this unimodal map, its nonlinearity gives rise to simply periodic behavior as well as to period-doubling behavior, with the number of stable solutions doubling and eventually becoming chaotic as the value of the control parameter is changed appropriately. The logistic map, or its counterpart for continuous systems, the logistic equation, can be used to model varied growth phenomena and is useful in modeling geological phenomena such as earthquakes, fluid flows, and fracture propagation.
Bibliography

Gutenberg B, Richter CF (1942) Earthquake magnitude, intensity, energy and acceleration. Bull Seismol Soc Am 32:163–191
Ishimoto M, Iida K (1939) Observations of earthquakes registered with the microseismograph constructed recently. Bull Earthq Res Inst Univ Tokyo 17:443–478
Li TY, Yorke JA (1975) Period three implies chaos. Am Math Mon 82:985–992
Lyakhovsky V (2001) Scaling of fracture length and distributed damage. Geophys J Int 144:114–122
Maslov L, Anokhin V (2012) Derivation of the Gutenberg-Richter empirical formula from the solution of the generalized logistic equation. Nat Sci 4:648–651
Maslov LA, Chebotarev VI (2017) Modeling statistics and kinetics of the natural aggregation structures and processes with the solution of generalized logistic equation. Physica A 468:691–697
May R (1976) Simple mathematical models with very complicated dynamics. Nature 261:459–467
Strogatz S (1994) Nonlinear dynamics and chaos: with applications to physics, biology, chemistry and engineering. Perseus Books
Turcotte DL (1997) Fractals and chaos in geology and geophysics, 2nd edn. Cambridge University Press
Verhulst P-F (1838) Notice sur la loi que la population suit dans son accroissement. Correspondance mathématique et physique 10:113–121
Logistic Regression

D. Arun Kumar
Department of Electronics and Communication Engineering and Center for Research and Innovation, KSRM College of Engineering, Kadapa, Andhra Pradesh, India
Definition

Logistic regression (LR) generates the probability of an outcome for given input variables. The most commonly used LR has a binary outcome such as 1 or 0, true or false, or yes or no. In a multinomial LR model, the number of discrete outcomes is more than two. Logistic regression is used to predict the class label of an unknown sample/pattern, i.e., to assign it to a category, and is widely used in data classification applications.
Introduction

Advances in sensor technology have produced large amounts of spatial data. These data are processed to obtain information related to the ground area of interest (Bishop 1995), and various spatial data analysis techniques exist to extract information from data of different dimensions. Spatial data analysis techniques are broadly classified into data classification, data regression, and data clustering. Data classification assigns a class label to a data sample (Duda et al. 2000). Data clustering groups the data samples into individual groups without class labels. Data regression analysis predicts the numeric values of an output variable for the given input variables. Classification methods such as logistic regression, the K-nearest neighbor classifier, and the minimum distance to mean classifier are used for spatial data analysis. Regression methods such as linear regression, robust regression, probit regression, and ridge regression are used to compute the values of an output variable. Methods such as K-means clustering, density-based clustering, and hierarchical clustering are used to group data samples without class labels.

There exist various types of regression models to predict the unknown values of an output variable. Models such as linear regression and logistic regression are frequently used. The linear regression model predicts continuous values of an output variable, for example the slope value of a terrain derived from a digital elevation model (DEM) image. The predicted continuous values of the output variable are not sufficient to identify the number of classes present in the dataset; with this motivation, logistic regression is suggested for data classification. LR was first introduced in the nineteenth century to predict population growth (Bacaër 2011) and was later extended to classify the populations of various countries (Wilson and Lorenz 2015). The LR model predicts discrete values of the output variable, also called class labels, and is mathematically represented as a nonlinear equation with an activation function that provides the discrete values. There are various types of logistic regression models for data classification, such as binary logistic regression, multinomial logistic regression, and ordinal logistic regression. In the present study, a binary logistic regression model is explained with an example.
Logistic Regression Analysis

With a traditional linear regression model, the output variable is generated as continuous values (Duda et al. 2000). The continuous values produced by linear regression are limited in producing class discrimination among the samples in the dataset. This limitation is overcome by producing the probability that an input sample/pattern belongs to a given class. The functional block diagram of the logistic regression model is provided in Fig. 1. The data samples are collected to prepare the dataset, and a logistic regression model is then trained with the training samples.

Logistic Regression, Fig. 1 Logistic regression model for data analysis (Input data → Data preprocessing → Logistic regression model → Class label). The model considers input data, preprocessing of the input data, training the logistic regression model, classifying the data, and assigning the class label to the data

Let a sample/pattern P in the dataset be associated with n feature values. The patterns are represented as points in an n-dimensional feature space, with each axis named a feature axis. The linear regression model for the training dataset is mathematically given in Eq. (1):

z = a_0 + a_1 x_1 + a_2 x_2 + … + a_n x_n,   (1)
where a_0, a_1, …, a_n are coefficients or weight parameters and x_1, x_2, …, x_n are the feature values of the n features of a sample. During the training stage, the coefficients a_0, a_1, …, a_n are updated by passing each input sample/pattern and back-propagating the output error (Theodoridis and Koutroumbas 1999). The linear regression model predicts continuous numeric values of the output variable (Cramer 2002), and its limitation is that it cannot predict a discrete output value representing the corresponding class label (Lever et al. 2016). The logistic regression model therefore uses the sigmoid activation function to generate the probabilities of the input samples belonging to the respective classes in the dataset. The sigmoid function generates an output value in the range (0, 1) for all real input values (Hilbe 2009); hence, whether the input to the function is a very large negative number or a very large positive number, the output is always between 0 and 1. The predicted probabilities are used by the model to label a sample with the respective class. The output of the logistic regression model is computed according to Eq. (2):

y = 1 / (1 + e^{−z}).   (2)
The output value z is passed through the sigmoid activation function, and the final output is obtained according to Eq. (2). The value of y lies in the range (0, 1) and indicates the probability that the input sample belongs to a class (Ji and Telgarsky 2018). A plot of the sigmoid activation function is given in Fig. 2.
Logistic Regression, Fig. 2 Pictorial representation of the sigmoid activation function. The x-axis represents the input variable (z) and the y-axis the output variable (y)

Logistic Regression, Fig. 3 Cost value of the logistic regression model, reduced from its maximum value down to zero over 50 iterations
During the training phase, each sample/pattern is passed as an input and the corresponding output is computed. The output value of the logistic regression model is compared with the target value, and the output error is computed to update the coefficients. Updating the coefficients of the logistic regression model is also termed the learning process. The learning process is implemented by decreasing the cost function towards zero by training the logistic regression model over multiple iterations. The cost of the logistic regression model is given as

E_r = (1/2) ∑_{i=1}^{c} (y_i − d_i)²,   (3)
where c is the number of output variables (classes), y_i is the computed output value for the ith class, and d_i is the desired value for the ith class. A pictorial representation of the cost function over multiple iterations is given in Fig. 3. Furthermore, the logistic regression model is tested with test data, and the corresponding class label is assigned to each test pattern by applying the max operator. The labeled patterns are evaluated using various evaluation metrics, which are listed in the following section.
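A minimal sketch of fitting and applying a binary logistic regression in R with glm on synthetic two-feature data; the feature distributions, the 60/40 split, and the 0.5 threshold are illustrative assumptions and do not reproduce the Sentinel MSI example discussed below.

```r
# Binary logistic regression on synthetic two-band data (base R only)
set.seed(42)
n  <- 300
B1 <- c(rnorm(n, 0.2, 0.05), rnorm(n, 0.6, 0.05))   # band-like feature 1
B2 <- c(rnorm(n, 0.3, 0.05), rnorm(n, 0.7, 0.05))   # band-like feature 2
cls <- factor(rep(c("Forest", "Water"), each = n))

train <- sample(seq_along(cls), size = 0.6 * length(cls))   # 60/40 split
fit <- glm(cls ~ B1 + B2, family = binomial, subset = train)

test  <- data.frame(B1 = B1, B2 = B2)[-train, ]
p_hat <- predict(fit, newdata = test, type = "response")    # class probabilities
pred  <- ifelse(p_hat > 0.5, "Water", "Forest")             # threshold at 0.5
mean(pred == cls[-train])                                   # overall accuracy on the test set
```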
Evaluation of Logistic Regression Model

The samples classified by the logistic regression model in the output image are evaluated using different performance metrics, such as overall accuracy (OA), precision (P), recall (R), the kappa coefficient (KC), and the dispersion score (DS). These performance metrics are derived from the confusion matrix (CM). The CM is a table of the actual class labels versus the class labels assigned by the logistic regression model to the patterns of the different classes. OA is defined as the sum of the diagonal elements of the CM divided by the total number of samples. Precision is the ratio of relevant classified samples to the retrieved classified samples. Recall is the ratio of retrieved classified samples to relevant classified samples. The kappa coefficient is a measure of the overall performance of the model based on the individual class-wise agreement. The dispersion score measures the distribution of classified samples among the classes present in the dataset and also specifies the overlapping characteristics of the various class boundaries. A classifier with a lower DS for a particular class has a better ability to classify the samples of that class.
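A small sketch of computing the confusion-matrix-based metrics named above in R; the confusion-matrix counts are made-up numbers for illustration, and the dispersion score, which is specific to this entry, is not included.

```r
# Confusion matrix with hypothetical counts (rows = predicted, cols = actual)
cm <- matrix(c(110, 5,
                 3, 122),
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("Forest", "Water"),
                             actual    = c("Forest", "Water")))

oa        <- sum(diag(cm)) / sum(cm)        # overall accuracy
precision <- diag(cm) / rowSums(cm)         # per-class precision
recall    <- diag(cm) / colSums(cm)         # per-class recall

p_e   <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2   # expected chance agreement
kappa <- (oa - p_e) / (1 - p_e)                       # kappa coefficient
```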
Example – Logistic Regression

In the present study, a Sentinel Multispectral Imager (MSI) dataset with two land use/land cover classes, Forest and Water, is considered to fit the logistic regression model. The dataset consists of 600 pixel vectors, 300 for each class. Each pixel in the Sentinel MSI image is characterized by 12 features (the 12 spectral bands Band 1, Band 2, …, Band 12), so each pixel is represented as a point in a 12-dimensional feature space. The dataset is divided into training and testing sets, with 60% of the pixels selected for the training set and the remaining 40% used for the test set. The logistic regression model is trained with the training dataset and tested with the test dataset.

The application of the logistic regression model to the training dataset is shown in Fig. 4, where B1 and B2 are feature axes and 1 and 2 are the land use/land cover classes (class 1 is Forest and class 2 is Water). In Fig. 4, a few samples of class 1 are assigned to class 2 and a few samples of class 2 are assigned to class 1. The logistic regression model is then tested with the test dataset, and the classified pixels in the B1–B2 feature space are shown in Fig. 5. The model classified the test samples with 98% overall accuracy, and its performance according to the other metrics is similar. A few test samples are misclassified to other classes, as shown in Fig. 5. Logistic regression discriminates the samples of the various classes well in comparison with similar models such as supervised kernel-based support vector machines (SVM) and ensemble methods. The LR model is, however, limited in handling data with a complex relationship between the independent and dependent variables, and its performance is lower when the decision boundary among the classes in the dataset is nonlinear. Furthermore, the combination of LR with a dimensionality reduction technique would reduce the complexity between the independent and dependent variables and in turn produce better classification results.

Logistic Regression, Fig. 4 Logistic regression model applied to the training set. The x-axis represents the feature axis (Band B1) and the y-axis the feature axis (Band B2). The training data consist of class 1 – Forest (1) and class 2 – Water (2)

Logistic Regression, Fig. 5 Logistic regression model applied to the test set. The x-axis represents the feature axis (Band B1) and the y-axis the feature axis (Band B2). The test data consist of class 1 – Forest (1) and class 2 – Water (2)

Conclusion

In the present study, a logistic regression model for data classification is discussed. A linear regression model is used to predict continuous values of an output variable; its limitation is its inability to assign discrete classes to the pixels. The logistic regression model produces discrete classes for the pixels, unlike the continuous values produced by linear regression, an advantage that is due to the use of the sigmoid activation function. In the present study, the logistic regression model was trained and tested with a Sentinel MSI dataset and produced 98% overall accuracy. The logistic regression model can also be extended to multiclass data classification problems in higher-dimensional feature spaces.

Cross-References
▶ Geographically Weighted Regression
▶ Machine Learning
▶ Regression
▶ Spatiotemporal Weighted Regression
Bibliography

Bacaër N (2011) Verhulst and the logistic equation (1838). In: A short history of mathematical population dynamics. Springer, London, pp 35–39
Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, New York
Cramer JS (2002) The origins of logistic regression. Tinbergen Institute Working Paper
Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley Interscience Publications, New York
Hilbe JM (2009) Logistic regression models. Chapman and Hall/CRC, Boca Raton
Ji Z, Telgarsky M (2018) Risk and parameter convergence of logistic regression. In: Foundations of Machine Learning Reunion, University of California Press, Berkeley
Lever J, Krzywinski M, Altman N (2016) Logistic regression. Nat Methods 13(8):603–604
Theodoridis S, Koutroumbas K (1999) Pattern recognition and neural networks. In: Advanced course on artificial intelligence. Springer, pp 169–195
Wilson JR, Lorenz KA (2015) Short history of the logistic regression model. In: Modeling binary correlated responses using SAS, SPSS and R. ICSA book series in statistics, vol 9. Springer, Cham. https://doi.org/10.1007/978-3-319-23805-0_2
Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence

Helmut Schaeben
Technische Universität Bergakademie Freiberg, Freiberg, Germany
Definition

Logistic regression is a generalization of linear regression in case the target variable is binary or categorical. Then the error
term is not normally distributed, and a least squares approach does not apply. The coefficients are estimated in terms of maximum likelihood, leading to a system of nonlinear equations. A major application of logistic regression is the estimation of conditional probabilities updating prior unconditional probabilities. Assuming binary predictor variables that are jointly conditionally independent given the target variable, the logistic regression model simplifies considerably, such that its coefficients become independent of one another and can be estimated by mere counting. They are recognized as Turing's Bayes factors in favor of T = 1 as against T = 0 provided by B = 1 (Good 2011), the Bayes log-factors, or eventually the weights of evidence (Good 1950, 1968). Mineral prospectivity modeling, where the data and the probabilities refer to given locations, applies among other methods logistic regression and weights of evidence, even though these cannot account for spatially induced dependencies in the data and in the probabilities to be estimated.
Introduction

The very definition of probability and its estimation have kept mankind, dilettantes, and professionals busy for centuries. Logistic regression and weights of evidence are two methods to estimate the conditional probability of an event given additional information conditioning the targeted event. The practical application of the latter is hampered by the requirement that the random variables representing the conditions be jointly conditionally independent given the random variable representing the target event. In prospectivity and exploration, the geosciences are expected to predict the probability of a specified mineralization at a given location if conditions in its favor or to its disadvantage are provided at this location. Neglecting the spatial reference of both the target and the predictor variables, logistic regression and, if joint conditional independence holds, weights of evidence apply. Weights of evidence has been geologists' favorite method because of its apparent simplicity. However, the simplicity is deceiving, as the modeling assumption of joint conditional independence is involved. The choice of the predictor variables to be sampled is usually guided by geological knowledge of the processes triggering a specified mineralization. Thus a flavor of causality modeling is added to the obvious regression problem of estimating a probability. In fact, conditional independence is a probabilistic approach to causality.
Mathematical Basics

Stochastic Independence
Let (Ω, 𝒜, P) be a probability space. The events B_1, …, B_m ∈ 𝒜 are jointly independent if

P( ∩_{ℓ=1}^{k} B_{i_ℓ} ) = ∏_{ℓ=1}^{k} P(B_{i_ℓ})   (1)

for each k-tuple (i_1, …, i_k) with 1 ≤ i_1 < … < i_k ≤ m, k = 2, …, m. The random variables B_1, …, B_m defined on Ω, mapping into the sets S_ℓ of the measurable spaces (S_ℓ, 𝒮_ℓ), ℓ = 1, …, m, are jointly independent if the events B_ℓ = B_ℓ^{-1}(E_ℓ) for any E_ℓ ∈ 𝒮_ℓ, ℓ = 1, …, m, are jointly independent. If the random variables possess the probability measures P_{B_ℓ} with P_{B_ℓ}(E_ℓ) = P(B_ℓ^{-1}(E_ℓ)), ℓ = 1, …, m, then their joint independence is equivalent to

P_{B_1, …, B_m} = P_{⊗_{ℓ=1}^{m} B_ℓ} = ⊗_{ℓ=1}^{m} P_{B_ℓ},   (2)

that is, their joint probability is equal to the product of their individual probabilities.

Stochastic Conditional Independence
The random events B_1, …, B_m ∈ 𝒜 are jointly conditionally independent of the event A ∈ 𝒜 with P(A) ≠ 0 if

P( ∩_{ℓ=1}^{m} B_ℓ | A ) = ∏_{ℓ=1}^{m} P(B_ℓ | A),   (3)

that is, if the joint conditional probability of B_1, …, B_m given A agrees with the product of the individual conditional probabilities of B_ℓ, ℓ = 1, …, m, given A. Eq. (3) is equivalent to

P( B_k | ∩_{ℓ=1, ℓ≠k}^{m} B_ℓ ∩ A ) = P(B_k | A).   (4)

The random variables B_1, …, B_m are jointly conditionally independent of the random variable T defined on Ω if

P_{B_1, …, B_m | T} = P_{⊗_{ℓ=1}^{m} B_ℓ | T} = ⊗_{ℓ=1}^{m} P_{B_ℓ | T},   (5)

that is, if their joint conditional probability given T agrees with the product of their individual conditional probabilities given T.

• Independence does not imply conditional independence and vice versa.
• Conditionally independent random variables may be (significantly) correlated or not.
• Conditionally independent random variables are conditionally uncorrelated.

Conditional independence is a probabilistic approach to causality while correlation is not. A comprehensive account of conditional independence is given in Dawid (1979).
Odds, logit Transform, and Logistic Function

In stochastics, an odds (in favor) O is defined as the ratio of the probabilities of an event and its complement. The logit transform is defined as the natural logarithm of an odds. Thus, for a random indicator variable T, where T = 1 indicates presence and T = 0 indicates absence of some event, the logit transform of the probability P_T is defined as

logit P_T = ln O_T = ln ( P_T(1) / P_T(0) ) = ln ( P_T(1) / (1 − P_T(1)) ),   (6)

provided that P_T(0) ≠ 0. Odds and logit transform are depicted in Fig. 1. The logistic function is defined as

Λ(z) = 1 / (1 + exp(−z)), z ∈ ℝ.   (7)

Logit transform and logistic function are mutually inverse. The graph of the latter is shown in Figs. 1 and 2.
Estimating Probabilities

Odds, logit transform, and logistic function are instrumental in resolving the problem of estimating probabilities. Let T denote a random indicator variable, often referred to as the target variable, coded such that the value T = 1 indicates the occurrence of an appealing event or the presence of an interesting phenomenon, for example, a certain mineralization. Let B_ℓ, ℓ = 1, …, m, m ≥ 1, denote a collection of categorical or real-valued continuous random predictor variables providing favorable or unfavorable conditions for the event T = 1. What are reasonable ways to estimate P̂_{T | B_0 ∏_ℓ B_ℓ}(1 | 1, …, b_ℓ, …), that is, the conditional probability of the event T = 1 given predictors B_ℓ, ℓ = 1, …, m, and B_0 ≡ 1?
Logistic Regression

Logistic regression in its basic linear form statistically models, that is, estimates, the logit transform logit P̂_{T | ∏_{ℓ=0}^{m} B_ℓ} of the conditional probability of a random indicator target variable T being equal to 1, given categorical or real-valued continuous predictor variables B_ℓ, ℓ = 1, …, m, m ≥ 1, by a linear combination of the predictor variables plus a constant, the intercept B_0 ≡ 1; that is, with ∏_{ℓ=0}^{m} B_ℓ = (B_0, B_1, …, B_m)^T,

logit P̂_{T | ∏_ℓ B_ℓ} = β_0 + ∑_ℓ β_ℓ B_ℓ,   (8)

where the ^ denotes the estimator. In practical applications the coefficients β_ℓ of the linear combination are estimated by b_ℓ from data,

logit P̂_{T | ∏_ℓ B_ℓ}(1 | 1, …, b_ℓ, …) = β̂_0 + ∑_ℓ β̂_ℓ b_ℓ,   (9)

applying maximum likelihood estimation, and are usually determined numerically by solving the corresponding system of nonlinear equations by application of iteratively reweighted least squares. Significance tests exist to check the reliability of the model. Since logit transform and logistic function are inverse, the conditional probability of T = 1 given predictors B_0, B_1, …, B_m is given by

P̂_{T | ∏_ℓ B_ℓ}(1 | 1, …) = Λ( β_0 + ∑_ℓ β_ℓ B_ℓ ).   (10)

Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence, Fig. 1 Graphs of odds O associated to probabilities P (left), and logit transform of probabilities P. (From Schaeben (2014b))

Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence, Fig. 2 Graphs of the sigmoidal logistic function Λ(z) (left) and Λ(32z) (right), indicating its fast convergence to the Heaviside function. (From Schaeben (2014b))
Linear logistic regression is optimum, that is, exact, in the sense that the estimator agrees with the true conditional probability, if the predictors B_ℓ are indicator or categorical random variables and jointly conditionally independent given the target variable T (Schaeben 2014a). A linear logistic regression model can be generalized into nonlinear logistic regression models when interaction terms, that is, products of predictors, are included:

logit P̂_{T | ∏_ℓ B_ℓ} = β_0 + ∑_ℓ β_ℓ B_ℓ + ∑_{ℓ_i, …, ℓ_j} β_{ℓ_i, …, ℓ_j} B_{ℓ_i} ⋯ B_{ℓ_j}.   (11)

Then

P̂_{T | ∏_ℓ B_ℓ} = Λ( β_0 + ∑_ℓ β_ℓ B_ℓ + ∑_{ℓ_i, …, ℓ_j} β_{ℓ_i, …, ℓ_j} B_{ℓ_i} ⋯ B_{ℓ_j} ).   (12)

For indicator or categorical random predictor variables, proper interaction terms compensate any lack of conditional independence exactly. Then logistic regression with interaction terms is optimum, that is, the estimator agrees with the true conditional probability (Schaeben 2014a). Given m ≥ 2 predictor variables B_ℓ ≢ 1, ℓ = 1, …, m, there is a total of ∑_{ℓ=2}^{m} (m choose ℓ) = 2^m − (m + 1) possible interaction terms. To compensate a violation of conditional independence, the interaction term B_{ℓ_1} ⋯ B_{ℓ_k}, k ≤ m, is actually required if B_{ℓ_1}, …, B_{ℓ_k} are not jointly conditionally independent given T. Thus, recognizing which variables or interaction terms cause violations of conditional independence becomes essential, because the total number 2^m of all possible terms would have to be reasonably smaller than the sample size n for a practically feasible model. Logistic regression seems to have been systematically developed first by Berkson (1944); a standard reference for modern applied logistic regression is Hosmer et al. (2013), and a recent publication focused on its application to prospectivity modeling is Kost (2020). Logistic regression is a method of statistical learning, the focus of which is inference of interpretable models. Often, the major progress of understanding and knowledge takes place through a trial-and-error approach to selecting the most powerful predictor variables, or products thereof, advancing possible inference most efficiently.
(12) For indicator or categorical random predictor variables, proper interaction terms compensate any lack of conditional independence exactly. Then logistic regression with interaction terms is optimum, that is, the estimator agrees with the true conditional probability (Schaeben 2014a). Given m 2 predictor variables B‘ ≢ 1, ‘ ¼ 1, . . ., m, there m m is a total of m ‘¼2 ‘ ¼ 2 ðm þ 1Þ possible interaction terms. To compensate a violation of conditional independence the interaction term B‘1 . . . B‘k , k m, is actually required, if B‘1 , . . . , B‘k are not jointly conditionally independent given T. Thus, recognizing which variables or interaction terms cause violations of conditional independence becomes
þ ln Fm
ð13Þ
m
¼ logit PT ð1Þ þ
ln F‘
ð14Þ
, ‘ ¼ 1, . . . ,m:
(15)
‘¼1
with F‘ ¼
PB‘ j‘1 i¼0
Bi T ð:j1, . . . ,1Þ
PB‘ j‘1
Bi T ð:j1, . . . ,0Þ
i¼0
Assuming joint conditional independence of B1, . . ., Bm given T simplifies the factors F‘ to
Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence
F‘ ¼
PB‘ jT ð:j1Þ , ‘ ¼ 1, . . . , m: PB‘ jT ð:j0Þ
ð16Þ
Explicitly distinguishing the two possible outcomes for each B_ℓ and taking logarithms results in the weights of evidence

W_ℓ(i) = ln [ P_{B_ℓ | T}(i | 1) / P_{B_ℓ | T}(i | 0) ], i = 0, 1,   (17)

and their contrasts

C_ℓ = W_ℓ(1) − W_ℓ(0), ℓ = 1, …, m,   (18)

and eventually in the log-linear form of Bayes' theorem referred to as the method of "weights of evidence" in geology,

logit P_{T | ∏_ℓ B_ℓ}(1 | …) = C_0 + ∑_ℓ C_ℓ B_ℓ = C_0 + ∑_{ℓ : B_ℓ ≠ 0} C_ℓ B_ℓ   (19)

P_{T | ∏_ℓ B_ℓ}(1 | …) = Λ( C_0 + ∑_{ℓ : B_ℓ ≠ 0} C_ℓ B_ℓ )   (20)
Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence, Fig. 3 Classification induced by fitted logistic regression models. Realizations of B1, B2 are generated according to a bivariate normal distribution with mean 0 and standard deviation 1. They are plotted in the plane spanned by B1 and B2. Realizations of the indicator random variable T are generated according to the binomial distribution B(1, p) with (i) p as provided by the linear model, that is, p = Λ(2 − 2B1 + 2B2), and (ii) p as provided by the nonlinear model including the interaction term, that is, p = Λ(2 − 2B1 + 2B2 + 2B1B2), respectively. Linear logistic regression analysis applied to simulated data of model (i) yields the two classes separated by the straight line shown at the top left; linear logistic regression analysis applied to simulated data of model (ii) yields the two classes separated by the straight line shown at the top right; nonlinear logistic regression analysis including the interaction term B1B2 applied to simulated data of model (ii) yields the two classes separated by the curve shown at the bottom center.
with C_0 = logit P_T(1) + ∑_ℓ W_ℓ(0). Thus, weights of evidence is first of all a straightforward application of Bayes' theorem. In practical applications the weights, that is, the involved conditional probabilities, are estimated by relative frequencies, that is, by counting. Weights of evidence were introduced by Good (1950, 1968), and with respect to potential modeling or prospectivity modeling in geology by Bonham-Carter et al. (1989) and Agterberg et al. (1990).
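A minimal counting sketch in R for Eqs. (17) and (18), assuming two 0/1 vectors Tgt (target) and B (one binary predictor) of equal length; the small count offset eps is an assumption to avoid division by zero and is not part of the formulas above.

```r
# Estimate W(1), W(0) and the contrast C for one binary predictor by counting
weights_of_evidence <- function(B, Tgt, eps = 0.5) {
  n11 <- sum(B == 1 & Tgt == 1) + eps   # counts, lightly smoothed
  n01 <- sum(B == 0 & Tgt == 1) + eps
  n10 <- sum(B == 1 & Tgt == 0) + eps
  n00 <- sum(B == 0 & Tgt == 0) + eps
  W1 <- log((n11 / (n11 + n01)) / (n10 / (n10 + n00)))  # W(1) = ln P(B=1|T=1)/P(B=1|T=0)
  W0 <- log((n01 / (n11 + n01)) / (n00 / (n10 + n00)))  # W(0) = ln P(B=0|T=1)/P(B=0|T=0)
  c(W1 = W1, W0 = W0, contrast = W1 - W0)
}
```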
Relationship of Linear Logistic Regression and Weights of Evidence

Comparing Eq. (19) and Eq. (8) reveals, with β_0 = C_0 and β_ℓ = C_ℓ, ℓ = 1, …, m, immediately that weights of evidence is the special case of logistic regression for jointly conditionally independent random indicator predictor variables (Schaeben 2014b).

Additional light is shed on logistic regression and weights of evidence, respectively, from a classification point of view. Thresholding of estimated conditional probabilities may be applied to design a {0, 1} classification, usually with respect to the estimator of the conditional median of the target variable,

T̂_0.5 = 0 if P̂_{T | ∏_ℓ B_ℓ}(1 | …) ≤ 0.5, and 1 otherwise.   (21)

For logistic regression the locus P̂_{T | ∏_ℓ B_ℓ}(1 | …) = 0.5 is given by setting

β_0 + ∑_ℓ β_ℓ B_ℓ + ∑_{ℓ_i, …, ℓ_j} β_{ℓ_i, …, ℓ_j} B_{ℓ_i} ⋯ B_{ℓ_j} = 0.   (22)
As Fig. 3 illustrates, if all interaction terms vanish, that is, in the case of linear logistic regression, Eq. (22) represents a hyperplane in the space spanned by the contributing B_ℓ; in the general nonlinear case Eq. (22) represents a curved surface separating the two classes. The method of weights of evidence cannot include interaction terms, because along the derivation from Bayes' general theorem for several random indicator variables, Eq. (14), to weights of evidence, Eq. (17), they are made to vanish by the very assumption of joint conditional independence. Therefore the two classes of the classification induced by weights of evidence are always separated by a hyperplane. Another way of comparing methods of estimating probabilities or conditional probabilities is by precision-recall (PR) curves rather than receiver operating characteristic (ROC) curves (Davis and Goadrich 2006), because PR curves are more appropriate if the target event T = 1 is a rare event, as often experienced in practical applications.
Practical Example with Fabricated Data

One hundred triplets of realizations (b_{1i}, b_{2i}, t_i), i = 1, …, 100, of the random indicator variables (B1, B2, T) are assigned to 100 pixels arranged in a (10 × 10) array of pixels of digital map images. A specific assignment is depicted in Fig. 4 and referred to as dataset RANKIT. The assignment could be different without changing the ordinary statistics; however, spatial statistics as captured by a variogram would be different (Schaeben 2014b). Obviously, logistic regression and weights of evidence are nonspatial methods, as they are independent of any specific spatial assignment. Testing the null hypothesis of joint conditional independence with the standard log-likelihood ratio test yields
Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence, Fig. 4 Spatial distribution of two indicator predictor variables B1, B2, and the indicator target variable T of the dataset RANKIT. (From Schaeben (2014b))
Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence, Table 1 Comparison of predicted conditional probabilities P_{T|B1 B2}(1|·,·) for various methods applied to the training dataset RANKIT

P_{T|B1 B2}(1|·,·)   Elementary counting   WofE      Linear LogReg   Nonlinear LogReg
B1 = 1, B2 = 1       0.25000               0.58636   0.64055         0.25000
B1 = 1, B2 = 0       0.37500               0.26875   0.27736         0.37500
B1 = 0, B2 = 1       0.31250               0.20156   0.21486         0.31250
B1 = 0, B2 = 0       0.03125               0.06142   0.05565         0.03125
Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence, Fig. 5 Spatial distribution of predicted P(T = 1 | B1, B2) for the training dataset RANKIT according to elementary estimation (top left), logistic regression with interaction term (top right), weights-of-evidence (bottom left), and logistic regression without interaction term (bottom right). (From Schaeben (2014b))
p = 0.06443468 and suggests the decision to reject the null hypothesis (Schaeben 2014b). The fitted models of weights of evidence, linear logistic regression without interaction term, and nonlinear logistic regression with interaction term read explicitly as follows:

WofE: P̂_{T|B1 B2}(1|·,·) = Λ(−2.726 + 1.725 B1 + 1.349 B2)   (23)

linLogReg: P̂_{T|B1 B2}(1|·,·) = Λ(−2.831 + 1.874 B1 + 1.535 B2)   (24)

nonlinLogReg: P̂_{T|B1 B2}(1|·,·) = Λ(−3.434 + 2.923 B1 + 2.646 B2 − 3.233 B1 B2)   (25)

Since the mathematical modeling assumption of conditional independence is violated, only logistic regression with interaction terms yields a proper model and predicts the conditional probabilities almost exactly. The results of weights of evidence and of logistic regression with and without interaction terms applied to the fabricated dataset RANKIT are summarized in Table 1, and Fig. 5 depicts the corresponding spatial distributions. The example illustrates that applying weights of evidence although the predictor variables are not jointly conditionally independent given the target variable corrupts the estimated conditional probabilities and assigns their largest values to false locations, that is, to false pixels of the digital map image. Logistic regression is the canonical generalization of weights of evidence, as it exactly compensates lack of joint conditional independence of random indicator predictor variables by including interaction terms corresponding to the actual violations of joint conditional independence. For random indicator predictor variables, logistic regression is optimum. Thus there is no need for ad hoc recipes to fix weights of evidence corrupted by lack of joint conditional independence. Finally, logistic regression is not restricted to random indicator predictor variables; the predictors may be real-valued continuous random variables, rendering their transformation to indicator predictors by thresholding, as required by weights of evidence, dispensable.
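The effect described here can be illustrated with a small R sketch on simulated (not RANKIT) data, fitting logistic regression with and without the interaction term via glm; all parameter values below are arbitrary assumptions.

```r
# Binary predictors whose true model contains an interaction term
set.seed(7)
n  <- 5000
B1 <- rbinom(n, 1, 0.4)
B2 <- ifelse(B1 == 1, rbinom(n, 1, 0.7), rbinom(n, 1, 0.2))   # B2 depends on B1
p  <- plogis(-3 + 2 * B1 + 2 * B2 - 2.5 * B1 * B2)             # true conditional probability
T_ <- rbinom(n, 1, p)

fit_lin <- glm(T_ ~ B1 + B2, family = binomial)   # linear logistic regression
fit_int <- glm(T_ ~ B1 * B2, family = binomial)   # includes the interaction B1:B2

newd <- expand.grid(B1 = 0:1, B2 = 0:1)
cbind(newd,
      linear      = round(predict(fit_lin, newd, type = "response"), 3),
      interaction = round(predict(fit_int, newd, type = "response"), 3))
```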
applying the truncated iteratively re-weighted least squares algorithm. If the functional form is restricted to linear dependencies only, and if the predictor variables are assumed to be binary and jointly conditionally independent given the binary target variable, then logistic regression largely simplifies to the formula of Bayes’ theorem for several variables referred to as weights of evidence; its numerics is largely simplified to counting as means to estimate probabilities from empirical frequencies of events.
References Agterberg FP, Bonham-Carter GF, Wright DF (1990) Statistical pattern integration for mineral exploration. In: Gaal G (ed) Computer applications in resource exploration. Pergamon, Oxford, pp 1–22 Berkson J (1944) Application of the logistic function to bio-assay. J Am Stat Assoc 39:357 Bonham-Carter GF, Agterberg FP, Wright DF (1989) Weights of evidence modeling: a new approach to mapping mineral potential. In: BonhamCarter GF, Agterberg FP (eds) Statistical applications in the earth sciences, paper 89-9. Geological Survey of Canada, pp 171–183 Davis J, Goadrich M (2006) The relationship between precision-recall and roc curves. In: Cohen WW, Moore A (eds) Proceedings of the 23rd international conference on machine learning (ICML’06), pp 233–240 Dawid AP (1979) Conditional independence in statistical theory. J R Stat Soc B 41:1 Good IJ (1950) Probability and the weighting of evidence. Griffin, London Good IJ (1968) The estimation of probabilities: an essay on modern Bayesian methods. Research monograph no 30. The MIT Press, Cambridge Good IJ (2011) A list of properties of Bayes-Turing factors unclassified by NSA in 2011. https://www.nsa.gov/news-features/declassifieddocuments/tech-journals/assets/files/list-of-properties.pdf Hosmer DW, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, 3rd edn. Wiley, Hoboken Kost S (2020) PhD thesis, TU Bergakademie Freiberg. https://nbnresolving.de/urn:nbn:de:bsz:105-qucosa2-728751 Schaeben H (2014a) A mathematical view of weights-of-evidence, conditional independence, and logistic regression in terms of markov random fields. Math Geosci 46:691 Schaeben H (2014b) Potential modeling: conditional independence matters. Int J Geomath 5:99
Log-Likelihood Ratio Test

Alejandro C. Frery
School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
Conclusions Logistic regression provides a functional model to estimate conditional probabilities. The regression coefficients are determined according to the maximum likelihood method
Definition The log-likelihood ratio test contrasts two parametric models, of which one is a subset or restriction of the other.
Log-Likelihood Ratio Test
767
Consider the random sample $X = (X_1, X_2, \ldots, X_n)$ from the model characterized by the distribution $\mathcal{D}(\theta)$, with $\theta \in \Theta \subset \mathbb{R}^p$. One may be interested in checking the null hypothesis that the model for the data belongs to a subset $\mathcal{H}_0 : \theta \in \Theta_0 \subset \Theta$ of all the possible models, versus the alternative $\mathcal{H}_1 : \theta \in \Theta_1 = \Theta \setminus \Theta_0$. An approach consists in comparing the likelihoods of the sample under $\mathcal{H}_0$ and under the unrestricted model. The comparison can be made by computing the ratio of the likelihoods, or the logarithm of such a ratio. Adapting the definition given by Casella and Berger (2002), we will say that a likelihood ratio test for contrasting $\mathcal{H}_0 : \theta \in \Theta_0$ against $\mathcal{H}_1 : \theta \in \Theta \setminus \Theta_0$ is any test that rejects $\mathcal{H}_0$ if
$$\frac{\sup_{\theta \in \Theta} L(\theta; X)}{\sup_{\theta \in \Theta_0} L(\theta; X)} > c', \qquad (1)$$
where $L(\theta; X)$ is the likelihood of the parameter $\theta$ for the sample $X$ and $c'$ is a constant that depends on the desired significance. Assume the maximum likelihood estimator for the problem exists. The numerator in Eq. (1) is the likelihood over all possible values of the parameter, which is maximized by the maximum likelihood estimator $\hat{\theta}$. The denominator in Eq. (1) is maximized by the maximum likelihood estimator restricted to $\Theta_0$; denote it $\hat{\theta}_0$. Inequality (1) becomes
$$\frac{L(\hat{\theta}; X)}{L(\hat{\theta}_0; X)} > c'. \qquad (2)$$
Since both numerator and denominator are positive,
$$\log \frac{L(\hat{\theta}; X)}{L(\hat{\theta}_0; X)} = \log L(\hat{\theta}; X) - \log L(\hat{\theta}_0; X) = \ell(\hat{\theta}; X) - \ell(\hat{\theta}_0; X) > \log c' = c$$
is the same test, in which $\ell(\theta; X)$ is the reduced log-likelihood of $\theta$ for $X$. We will see two examples before completing the theory of likelihood ratio tests.

Testing Under the Multinomial Distribution
Consider Example 12.6.6 from Lehman and Romano (2005). The random sample $X = (X_1, X_2, \ldots, X_n)$ follows a multinomial distribution, i.e., each random variable may assume one of $k+1$ values with probability $\Pr(X = j) = p_j$, $j \in \{1, 2, \ldots, k+1\}$. The parameter space is $\Theta = \{p = (p_1, p_2, \ldots, p_k) : p_j \geq 0 \text{ and } \sum_{j=1}^{k} p_j \leq 1\}$. We want to test the null hypothesis that we know the $k$ probabilities that index the distribution, i.e., $\mathcal{H}_0 : \{p_i = \pi_i,\; i \in \{1, 2, \ldots, k\}\}$, since $p_{k+1} = 1 - \sum_{j=1}^{k} p_j$. The parameter space in the null hypothesis is the point $\Theta_0 = \pi = (\pi_1, \pi_2, \ldots, \pi_k)$, which is known. Denote by $Y_i$ the number of observations in $X$ that are equal to $i$. The likelihood of the sample is
$$L(p; X) = \frac{n!}{Y_1!\, Y_2! \cdots Y_{k+1}!}\; p_1^{Y_1} p_2^{Y_2} \cdots p_{k+1}^{Y_{k+1}},$$
and the maximum likelihood estimator in $\Theta$ is the vector of proportions
$$\hat{p} = \left(\hat{p}_1, \hat{p}_2, \ldots, \hat{p}_k, \hat{p}_{k+1}\right) = \left( \frac{Y_1}{n}, \frac{Y_2}{n}, \ldots, \frac{Y_k}{n},\; 1 - \frac{\sum_{j=1}^{k} Y_j}{n} \right).$$
The likelihood ratio test statistic is
$$\frac{L(\hat{p}; X)}{L(\pi; X)} = \left( \frac{\hat{p}_1}{\pi_1} \right)^{Y_1} \left( \frac{\hat{p}_2}{\pi_2} \right)^{Y_2} \cdots \left( \frac{\hat{p}_{k+1}}{\pi_{k+1}} \right)^{Y_{k+1}}.$$
Both numerator and denominator are positive, and the latter, being constrained, cannot be larger than the former. Taking the logarithm, we obtain the log-likelihood ratio:
$$\ell(\hat{p}; X) - \ell(\pi; X) = \sum_{j=1}^{k+1} Y_j \log \frac{\hat{p}_j}{\pi_j} = n \sum_{j=1}^{k+1} \hat{p}_j \log \frac{\hat{p}_j}{\pi_j}. \qquad (3)$$
Finally, the test statistic $R_n(X) = 2\left[\ell(\hat{p}; X) - \ell(\pi; X)\right]$ becomes useful by knowing that, under $\mathcal{H}_0$, it is asymptotically a $\chi^2_k$-distributed random variable. Notice that Eq. (3) is closely related to the Kullback-Leibler divergence between $\hat{p}$ and $\pi$. We now illustrate this example with simulations. Fixing $\pi = (0.3, 0.2, 0.1, 0.25, 0.15)$, we obtain samples of size $n \in \{50, 100, 200, 300, 400, 500, 10^3, 10^4\}$ from this distribution, then we compute $\hat{p}$ and obtain $R_n$. Since $R_n$ is asymptotically distributed as $\chi^2_4$, the asymptotic values of the
mean, (approximate) median, and quantiles of order 95% and 99% are, respectively, 4, 3.36, 9.49, and 13.28. Figure 1 shows the boxplots of these test statistics after 300 independent replications for each situation, along with these values. Notice that there is good agreement between the sample values and the asymptotic quantities.

Log-Likelihood Ratio Test, Fig. 1 $R_n$ samples boxplots under $\mathcal{H}_0$ for $n \in \{50, 100, 200, 300, 400, 500, 10^3, 10^4\}$, along with the mean (4), median, and 95% and 99% quantiles (dashed lines) of a $\chi^2_4$-distributed random variable
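A short sketch of this simulation, under the same assumptions (samples drawn under the null hypothesis, 300 replications per sample size), could look as follows; it is an illustration, not the code used for Fig. 1.

```python
# Simulate Rn = 2*[l(p_hat) - l(pi)] for multinomial samples drawn under H0
# and compare with chi-square(4) quantiles.
import numpy as np
from scipy.stats import chi2

pi = np.array([0.3, 0.2, 0.1, 0.25, 0.15])
rng = np.random.default_rng(42)

def Rn(n, pi, rng):
    counts = rng.multinomial(n, pi)
    p_hat = counts / n
    mask = counts > 0                      # treat 0*log(0) as 0
    return 2.0 * np.sum(counts[mask] * np.log(p_hat[mask] / pi[mask]))

samples = [Rn(500, pi, rng) for _ in range(300)]
print(np.mean(samples))                    # close to 4, the chi2(4) mean
print(chi2.ppf([0.95, 0.99], df=4))        # approx. 9.49 and 13.28
```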
Testing Under the Normal Distribution
Consider the situation in which $X$ is a collection of normal random variables with unknown mean $\mu \in \mathbb{R}$ and variance $\sigma^2 > 0$. The parameter space is $(\mu, \sigma^2) \in \Theta = \mathbb{R} \times \mathbb{R}_+$. The densities are of the form $f(x; \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\{-(x - \mu)^2 / (2\sigma^2)\}$, so the reduced likelihood and log-likelihood are, respectively,
$$L(\mu, \sigma^2; X) = (\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2 \right\}, \text{ and} \qquad (4)$$
$$\ell(\mu, \sigma^2; X) = -\frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2. \qquad (5)$$
The maximum likelihood estimator of $(\mu, \sigma^2)$ is, thus, the pair $(\hat{\mu}, \hat{\sigma}^2)$ given by
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{\mu})^2. \qquad (6)$$
The estimator given in Eq. (6) maximizes both Eqs. (4) and (5) in the unrestricted parameter space $\Theta$. Assume now we are interested in testing the null hypothesis $\mathcal{H}_0 : \mu = 0$. The parameter space under $\mathcal{H}_0$ is $\Theta_0 = \mathbb{R}_+$, and we now need to estimate $\sigma^2$ under the restriction $\mu = 0$, which amounts to computing
$$\hat{\sigma}^2_0 = \frac{1}{n} \sum_{i=1}^{n} X_i^2.$$
We may now compare the models with the difference of the log-likelihoods:
$$R_n(X) = \ell(\hat{\mu}, \hat{\sigma}^2; X) - \ell(0, \hat{\sigma}^2_0; X).$$
Figure 2 shows the boxplots of 300 independent replications for each situation of this test statistic. The situations here considered are $n = 100$, $\mathcal{H}_0$, i.e., $N(0, 1)$ samples, and deviations from $\mathcal{H}_0$ which consist of sampling from $N(\Delta, 1)$, with $\Delta \in \{1/10, 2/10, 3/10, 4/10, 5/10\}$. Notice that the log-likelihood ratio samples vary around zero under $\mathcal{H}_0$, while the central value of the others increases with $\Delta$.
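A minimal numerical sketch of this test statistic, under the same assumptions ($N(\Delta, 1)$ samples of size 100), is given below; the seed and sample values are arbitrary.

```python
# Rn(X) for the normal-mean test above: difference of the maximized
# log-likelihoods with and without the restriction mu = 0.
import numpy as np

def Rn(x):
    n = len(x)
    s2_hat = np.var(x)           # MLE of the variance with the mean estimated
    s2_0 = np.mean(x ** 2)       # MLE of the variance under the restriction mu = 0
    return 0.5 * n * (np.log(s2_0) - np.log(s2_hat))

rng = np.random.default_rng(1)
for delta in [0.0, 0.1, 0.3, 0.5]:
    x = rng.normal(delta, 1.0, size=100)
    print(delta, Rn(x))          # tends to grow with the deviation from H0
```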
Log-Likelihood Ratio Test, Fig. 2 $R_n$ samples boxplots under $\mathcal{H}_0$ for $\Delta \in \{0, 1/10, 2/10, 3/10, 4/10, 5/10\}$
Asymptotic Distribution
The result that makes the likelihood ratio test widely usable is the following. Consider the test statistic
$$\lambda = 2 \log \frac{L(\hat{\theta}; X)}{L(\hat{\theta}_0; X)}.$$
Wilks (1938) proved that, under mild conditions and under $\mathcal{H}_0$, this test statistic has asymptotically a $\chi^2_r$ distribution with $r$ degrees of freedom, in which $r$ is the difference of dimensions between $\Theta$ and $\Theta_0$.
Summary
Likelihood ratio tests are very general and useful for contrasting a variety of hypotheses.
Bibliography Casella G, Berger RL (2002) Statistical inference, 2nd edn. Duxbury Press, Pacific Grove Lehman EL, Romano JP (2005) Testing statistical hypothesis, 3rd edn. Springer, New York Wilks SS (1938) The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann Math Stat 9(1):60–62. https://doi. org/10.1214/aoms/1177732360
Lognormal Distribution
Glòria Mateu-Figueras1 and Ricardo A. Olea2
1 Department of Computer Science, Applied Mathematics and Statistics, University of Girona, Girona, Spain
2 Geology, Energy and Minerals Science Center, U.S. Geological Survey, Reston, VA, USA
Definition
A positive random variable $X$ is lognormally distributed with two parameters $\mu$ and $\sigma^2$ if $Y = \ln(X)$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$. The lognormal distribution is denoted by $\Lambda(\mu, \sigma^2)$ and its probability density function is (Everitt and Skrondal 2010):
$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(\ln x - \mu)^2}{2\sigma^2} \right\}, \quad 0 < x < \infty. \qquad (1)$$
Note that the sampling space of $X$ is the positive side of the real axis. By construction, the main property of the lognormal distribution is that $\ln(X)$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, and, conversely, the exponential form $\exp(Y)$ of any normal distribution is a lognormal distribution (Fig. 1). Note that in all cases the lognormal density (Fig. 1a) is unimodal and positively skewed.
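The defining property can be checked numerically with a short sketch such as the following (parameter values are arbitrary; scipy's lognorm uses shape $s = \sigma$ and scale $= e^{\mu}$):

```python
# If Y ~ N(mu, sigma^2), then X = exp(Y) follows the lognormal law Lambda(mu, sigma^2).
import numpy as np
from scipy.stats import lognorm, kstest

mu, sigma = 1.0, 0.5
rng = np.random.default_rng(0)
x = np.exp(rng.normal(mu, sigma, size=10_000))

# Kolmogorov-Smirnov test against the corresponding lognormal cdf
print(kstest(x, lognorm(s=sigma, scale=np.exp(mu)).cdf))
```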
Lognormal Distribution, Fig. 1 (a) linear scale; (b) logarithmic scale
Historical Remarks and Properties
The normal distribution has long been associated with the modeling of additive errors. The roots of the formulation of the lognormal distribution go back to 1879 and to the efforts of Francis Galton and Donald McAlister to find an adequate model for studying multiplicative errors (Galton 1879; McAlister 1879). In fact, given $X_1, X_2, \ldots, X_n$ independent and strictly positive random variables, the product
$$\prod_{j=1}^{n} X_j \qquad (2)$$
tends to a lognormal distribution as $n$ tends to infinity, as confirmed by the central limit theorem applied to the respective logarithms. Later, different properties in the theory of probability were used to explain the genesis of the lognormal distribution. The most important are the Law of Proportionate Effect and the Theory of Breakage. The Law of Proportionate Effect can be formulated as (Gibrat 1930, 1931)
$$X_j = X_{j-1}\left(1 + \varepsilon_j\right), \quad j = 1, \ldots, n, \qquad (3)$$
where $X_0, X_1, \ldots, X_n$ is a sequence of random variables and $\varepsilon_1, \ldots, \varepsilon_n$ are mutually independent and identically distributed random variables, also statistically independent of the set $\{X_j\}$. Note that, at the $j$th step, the change in the variate is a random proportion of the value $X_{j-1}$. From Eq. 3 we obtain
$$X_n = X_0 \prod_{j=1}^{n} \left(1 + \varepsilon_j\right). \qquad (4)$$
Assuming that the absolute value of $\varepsilon_j$ is small compared with 1, and using the Taylor expansion of $\ln(1 + x)$, we obtain
$$\ln(X_n) = \ln(X_0) + \sum_{j=1}^{n} \varepsilon_j. \qquad (5)$$
Finally, using the central limit theorem, we conclude that $\ln(X_n)$ is asymptotically normally distributed and, consequently, $X_n$ is asymptotically lognormally distributed. The Theory of Breakage, used in particle size statistics, is essentially an inverse application of the Theory of Proportionate Effect (Crow and Shimizu 1988). In fact, consider that $X_0$ is the initial mass of a particle subject to a breakage process, that is, successive independent subdivisions. Let $X_j$ be the mass of the particle in the $j$th step. Then, it can be proven that the distribution of $X_n$ is asymptotically lognormal. This theory is very similar to the theory of classification, where the number of items classified in some homogeneity groups can be approximately lognormally distributed. As can be observed in Fig. 1, it seems that the distribution has two shapes. For large variances the density function shows an exponential decline. Otherwise, the density function is peaked and asymmetric. In fact, the density function (Eq. 1) has two points of inflection at $\exp\left(\mu - \tfrac{3}{2}\sigma^2 \pm \tfrac{1}{2}\sigma\sqrt{\sigma^2 + 4}\right)$ (Johnson et al. 1994), but for large values of the parameter $\sigma^2$ these points are too close to the origin to be observed separately.
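The Law of Proportionate Effect (Eqs. 3-5) can be illustrated with a short simulation; the shock distribution and sizes below are arbitrary choices made for illustration.

```python
# After many small multiplicative shocks, ln(Xn) is approximately normal,
# i.e., Xn is approximately lognormal.
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(0)
n_steps, n_paths = 500, 5_000
eps = rng.uniform(-0.05, 0.05, size=(n_paths, n_steps))   # small i.i.d. shocks
Xn = 1.0 * np.prod(1.0 + eps, axis=1)                      # X0 = 1

print(normaltest(np.log(Xn)))   # log of the product is close to normal
```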
Also, the lognormal model has been applied in an empirical way to fit many variables. So, there are both theoretical and empirical arguments in support of the existence of natural attributes and phenomena following the lognormal distribution. Aitchison and Brown (1957) and Krige (1978) also made valuable contributions to the use and understanding of the lognormal distribution. Two characteristics of the lognormal distribution that have made it a useful tool in the earth sciences, economics, and population growth are the impossibility of taking negative values and its long upper tail (positive skewness). In the earth sciences, it is used to model the metal content of rock chips in a mineral deposit, the size of oil and gas pools, and in the lognormal kriging method for local estimation. The reader can find a large number of examples and references in Crow and Shimizu (1988), Aitchison and Brown (1957), or Johnson et al. (1994).
Statistics
Given $X \sim \Lambda(\mu, \sigma^2)$, the $r$-th moment of $X$ about the origin is
$$\exp\left( r\mu + \tfrac{1}{2} r^2 \sigma^2 \right), \qquad (6)$$
obtained using the properties of the moment generating function of the normal distribution, since $Y = \ln(X)$. From Eq. 6, we can obtain the mean and the variance as
$$\text{Mean} = \exp\left( \mu + \tfrac{1}{2}\sigma^2 \right) \qquad (7)$$
$$\text{Variance} = \exp\left( 2\mu + \sigma^2 \right)\left( \exp(\sigma^2) - 1 \right). \qquad (8)$$
The shape factors, the coefficient of skewness and the coefficient of kurtosis, are
$$\text{Skewness} = \left( \exp(\sigma^2) + 2 \right) \cdot \left( \exp(\sigma^2) - 1 \right)^{1/2} \qquad (9)$$
$$\text{Kurtosis} = 3\exp(2\sigma^2) + 2\exp(3\sigma^2) + \exp(4\sigma^2) - 6. \qquad (10)$$
Eq. 10 is zero when the distribution follows a normal distribution. From the expressions Eq. 9 and Eq. 10, it is clear that the skewness and the kurtosis increase with the value of $\sigma^2$. Other measures of interest are
$$\text{Median} = \exp(\mu) \qquad (11)$$
$$\text{Mode} = \exp\left( \mu - \sigma^2 \right). \qquad (12)$$
As can be observed from the expressions, the relative position of the mean (Eq. 7), median (Eq. 11), and mode (Eq. 12) is Mode < Median < Mean, emphasizing again the positive skewness of the distribution. Given $X \sim \Lambda(\mu, \sigma^2)$, the properties of the logarithmic function and the properties of the normal distribution allow proving that $1/X \sim \Lambda(-\mu, \sigma^2)$, $cX \sim \Lambda(\mu + \ln c, \sigma^2)$ for $c > 0$, and $X^a \sim \Lambda(a\mu, a^2\sigma^2)$ for $a \neq 0$.

Estimation
Given a sample from a $\Lambda(\mu, \sigma^2)$ distribution, the estimation of the parameters $\mu$ and $\sigma^2$ is relatively straightforward. As the logarithm of the variable is normally distributed, one must take logarithms of all values and proceed to estimate the parameters $\mu$ and $\sigma^2$ using the well-known estimators for the mean and the variance of any sample. The same is true if we want to obtain exact confidence intervals for the parameters $\mu$ and $\sigma^2$. The main statistics of the lognormal distribution can be obtained by applying Eqs. 7–12. Certainly, the mean and the variance of a lognormal random variable can be estimated by applying directly the conventional estimators available for samples, that is, without making any logarithmic transformation. However, it can be proved that such estimators and their confidence intervals may be inefficient and suboptimal. In the literature we can find a large number of works seeking the best estimators. Finney (1941) proposes an unbiased minimum variance estimator for the mean using the Finney function. This solution is difficult to apply because it involves an infinite series, which Clark and Harper (2000) tabulated in an effort to simplify the application. Taraldsen (2005) derives a modified maximum likelihood estimator which approximates the Finney approach. Taraldsen (2005) and Tang (2014) compare several methods to estimate the mean of the lognormal distribution, offering some numerical examples and simulations for appreciating the differences. Ginos (2009) does it for both the mean and the variance.
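The basic estimation recipe above can be sketched in a few lines; the parameter values used to generate the sample here are arbitrary.

```python
# Log-transform the sample, use the ordinary normal estimators for mu and sigma^2,
# then recover lognormal summaries via Eqs. 7, 11, and 12.
import numpy as np

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=1.2, sigma=0.4, size=2_000)

y = np.log(sample)
mu_hat = y.mean()
s2_hat = y.var(ddof=1)

mean_hat = np.exp(mu_hat + 0.5 * s2_hat)    # Eq. 7
median_hat = np.exp(mu_hat)                 # Eq. 11
mode_hat = np.exp(mu_hat - s2_hat)          # Eq. 12
print(mu_hat, s2_hat, mean_hat, median_hat, mode_hat)
```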
Other Forms of the Distribution Mateu-Figueras and Pawlowsky-Glahn (2008) have proposed to consider the multiplicative structure of the positive real line and work with the density function with respect to the resulting multiplicative measure. The result is a density different from Eq. 1 but the same law of probability is obtained. Some interesting differences are obtained in the moments providing a justification for using the geometric mean to represent the first moment. This approach was implicit in the original definition of the lognormal distribution by McAlister (1879).
Two other expanded forms of the lognormal distribution have been postulated for accommodating special needs. In the three-parameter lognormal distribution, a value $\theta$ is subtracted from $x$ in Eq. 1, so that the sampling domain is shifted, starting at $\theta$:
$$f(x) = \frac{1}{(x - \theta)\,\sigma\sqrt{2\pi}} \exp\left\{ -\frac{\left(\ln(x - \theta) - \mu\right)^2}{2\sigma^2} \right\}, \quad \theta < x < \infty. \qquad (13)$$
The three-parameter lognormal has been extensively studied by Yuan (1933). The addition of the shift parameter $\theta$ gives the model more flexibility but increases the difficulties of estimating parameters. It is an important model in many scientific disciplines. For example, in hydrology, it is used to model seasonal flow volumes, rainfall intensity or soil water retention (Singh 1998). The other expanded form is the truncated lognormal distribution (Zaninetti 2017):
$$f(x) = \frac{\sqrt{2}\,\exp\left\{ -\frac{(\ln x - \mu)^2}{2\sigma^2} \right\}}{\sqrt{\pi}\,\sigma\,x \left[ \operatorname{erf}\!\left( \frac{\ln x_u - \mu}{\sqrt{2}\,\sigma} \right) - \operatorname{erf}\!\left( \frac{\ln x_l - \mu}{\sqrt{2}\,\sigma} \right) \right]}, \quad x_l < x < x_u, \qquad (14)$$
where $x_l$ and $x_u$ are the lower and upper truncation boundaries, respectively, and erf is the function:
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} \exp(-t^2)\, dt. \qquad (15)$$
The truncated lognormal distribution has been used in the assessment of the sizes of undiscovered oil and gas accumulations (Attanasi and Charpentier 2002). If there are problems with the $(0, +\infty)$ boundaries of the lognormal distribution, in general it is better to work with other distributions with fixed and flexible boundaries, such as the beta distribution (Olea 2011).
The Multivariate Lognormal Distribution
Although not as popular as the univariate distribution, we also have the multivariate version of the lognormal distribution. Given $Y = (Y_1, \ldots, Y_n) = (\ln(X_1), \ldots, \ln(X_n))$ a multivariate normal distribution with parameter vector $\mu$ and matrix $\Sigma$, the distribution of the positive random vector $X = (X_1, \ldots, X_n)$ is multivariate lognormal and its density function is
$$f(x_1, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}\, x_1 \cdots x_n} \exp\left\{ -\frac{1}{2} (\ln x - \mu)' \Sigma^{-1} (\ln x - \mu) \right\}, \quad x \in \mathbb{R}^n_+. \qquad (16)$$
Summary and Conclusions
The lognormal distribution matches the tendency of numerous geological attributes that are positively defined and positively skewed, thus highlighting its importance in earth sciences modeling. Also, it is a distribution that has applications supported by theoretical arguments. It is a congenial model due to its relationship with the normal distribution. A large number of publications deal with the problem of directly finding consistent estimators and exact confidence intervals for the mean and the variance.
Cross-References ▶ Normal Distribution ▶ Probability Density Function ▶ Random Variable ▶ Variance
Bibliography Aitchison J, Brown JAC (1957) The lognormal distribution. Cambridge University Press, Cambridge, p 176 Attanasi ED, Charpentier RR (2002) Comparison of two probability distributions used to model sizes of undiscovered oil and gas accumulations: does the tail wag the assessment? Math Geol 34(6): 767–777 Clark I, Harper WV (2000) Practical geostatistics 2000. Ecosse North America Llc, Columbus, p 342 Crow EL, Shimizu K (1988) Lognormal distributions. Theory and Applications. Marcel Dekker, Inc, New York. 387 pp Everitt BS, Skrondal A (2010) The Cambridge dictionary of statistics, 4th edn. Cambridge University Press, Cambridge. 480 pp Finney DJ (1941) On the distribution of a variate whose logarithm is normally distributed. J R Stat Soc Ser B 7:155–161 Galton F (1879) The geometric mean, in vital and social statistics. Proc R Soc Lond 29:365–366 Gibrat R (1930) Une loi des répartitions économiques: l’effet proportionel. Bull Statist Gén Fr 19. 460 pp Gibrat R (1931) Les Inégalités Économiques. Libraire du Recuel Sirey, Paris Ginos BF (2009) Parameter estimation for the lognormal distribution. Master thesis, Brigham Young University, https://scholarsarchive. byu.edu/etd/1928 Johnson NL, Kotz S, Balakrishnan N (1994) Continuous univariate distributions, Wiley series in probability and mathematical statistics: applied probability and statistics, vol 1, 2nd edn. Wiley, New York. 756 pp Krige DG (1978) Lognormal-de Wijsian geostatistics for ore evaluation. South African Institute of Mining and Metallurgy; second revised edition, 50 pp
Mateu-Figueras G, Pawlowsky-Glahn V (2008) A critical approach to probability laws in geochemistry. Math Geosci 40(5):489–502 McAlister D (1879) The law of geometric mean. Proc R Soc Lond 29:367–376 Olea RA (2011) On the Use of the Beta Distribution in Probabilistic Resource Assessments. Nat Resour Res 20(4):377–388 Singh VP (1998) Three-Parameter Lognormal Distribution. In: Entropy-Based Parameter Estimation in Hydrology, Water Science and Technology Library, vol 30. Springer, Dordrecht, 82–107 Tang Q (2014) Comparison of different methods for estimating lognormal means. Master thesis, East Tennessee State University, https://dc.etsu.edu/cgi/viewcontent.cgi?article=3728&context=etd Taraldsen G (2005) A precise estimator for the log-normal mean. Statistical Methodology 2(2):111–120 Yuan P (1933) On the logarithmic frequency distribution and semilogarithmic correlation surface. Ann Math Stat 4:30–74 Zaninetti L (2017) A left and right truncated lognormal distribution for the stars. Adv Astrophys 2(3):197–213
Lorenz, Edward Norton Kerry Emanuel Lorenz Center, Massachusetts Institute of Technology, Cambridge, MA, USA
Fig. 1 Edward Norton Lorenz (1917-2008), courtesy of Prof. Kerry Emanuel
Biography Edward Norton Lorenz, a mathematician and meteorologist, is best known for having demonstrated that the macroscopic universe is almost certainly not predictable beyond fundamental time horizons, putting an end to the notion of a clockwork universe. His work had the most immediate application to weather prediction but has also had a strong influence on mathematics and physics.
This is a condensed version of a more extensive biography of Lorenz, which the author published with the National Academy of Sciences (Emanuel 2011)
Lorenz was born in West Hartford, Connecticut, on 23 May 1917, to Edward Henry Lorenz, a mechanical engineer, and Grace Lorenz (née Norton), a school teacher. As a boy he enjoyed solving mathematical puzzles with his father and from his mother learned to love board and card games. Lorenz entered Dartmouth College as a mathematics major in 1934. In 1938 he entered the graduate school of the mathematics department at Harvard, but his mathematical training was interrupted by World War II. He contributed to the war effort by training in meteorology at the Massachusetts Institute of Technology (MIT), and becoming a weather forecaster for the Army Air Corps (now the Air Force) operating out of Saipan, and later Okinawa. After the war, Lorenz decided to pursue academic studies of meteorology rather than mathematics, completing his Ph.D. work at MIT in 1948 and being appointed to the MIT faculty in the mid-1950s. He became interested in the problem of forecasting the weather by using a digital computer to integrate the known equations governing the behavior of fluids like the atmosphere. At the time, a group of statisticians proposed that anything that could be forecast by integrating equations could be forecast at least as well using statistical techniques. Lorenz intuitively felt that was wrong and set out to prove it. By the early 1960s he had found a set of very simple equations that demonstrated certain properties that were then unknown in mathematics, including extreme sensitivity of forecasts to the specification of the initial state of the system. These results were published in a meteorological journal (Lorenz 1963), which later made him famous as the father of chaos theory. Arguably, his most important contribution came in a paper published 6 years later (Lorenz 1969) in which he demonstrated, using a particular mathematical system representing turbulent flows, that the solutions could not be predicted beyond a certain definite time no matter how small one made the errors in the description of the initial state. When married with Heisenberg's uncertainty principle, this implies that there exist systems that fundamentally cannot be predicted beyond a certain time horizon. Lorenz died in April 2008, having made many important contributions to atmospheric science in addition to his fundamental work on chaos theory.
Bibliography Emanuel K (2011) Edward Norton Lorenz, 1917–2008. In: Biographical memoir, vol 28. National Academy of Sciences, Washington, DC Lorenz EN (1963) Deterministic nonperiodic flow. J Atmos Sci 20: 130–141 Lorenz EN (1969) The predictability of a flow which possesses many scales of motion. Tellus 21:289–307. https://doi.org/10.3402/tellusa. v21i3.10086
Loss Function Tanujit Chakraborty and Uttam Kumar Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India
Definition Over the past few decades artificial intelligence has paved the way for automating routine labor, understanding speech and images, tackling the geosciences and remote sensing problems, supporting basic scientific research through intelligent software, etc. Majority of the tasks are performed through learning from data which is popularly known as machine learning. The generalization performance of a learning method is its prediction capability on independent test data. While assessing the performance of the learning methods, one desires to achieve two separate goals: (a) model selection – estimating the performance of different models in order to choose the best one; (b) model assessment – after selecting a final model, estimating its prediction or generalization error on a new dataset (Hastie et al. 2009). Loss function acts as a building block during this learning phase and enables the algorithm to model the data with higher accuracy. Loss function (also known as cost function or an error function) specifies a penalty for an incorrect estimate from a statistical or machine learning model. Typical loss functions might specify the penalty as a function of the difference between the estimate and the true value, or simply as a binary value depending on whether the estimate is accurate within a certain range. Any loss criterion in classification actually penalizes negative margins more heavily than positive ones as positive observations have already been correctly predicted by the model.
Introduction Learning and improving upon past experiences is the key step in the arena of research of data science and artificial intelligence. In order to understand how a machine learning algorithm learns from data to predict an outcome, it is essential to understand the underlying concepts involved in training an algorithm. The data on which a learning algorithm is trained can be broadly classified as labeled and unlabeled data and the learning is termed as supervised learning and unsupervised learning accordingly. During the training phase, an algorithm explores the data and identifies patterns for determining good values for all the weights and the bias from labeled examples. In supervised learning, a machine learning algorithm builds a model by examining many examples and attempting to find a model that minimizes loss; this process is called empirical risk
minimization (Raschka and Mirjalili 2019). Loss functions are related to model accuracy. If the current output of the algorithm deviates too much from the expected ones, loss function would stump up a very large number. If they are pretty good, it will output a lower number. Gradually, with the help of some optimization functions, loss function learns to reduce the error in prediction (Bishop and Nasrabadi 2006). In Bayesian decision theory, loss function states how costly could be - the use of more than one feature, allowing more than two states of nature for generalization for a small notable expense, allowing actions other than classification facilitating rejection, etc. It is used to convert a probability determination into a decision (Duda et al. 2000).
What Is Loss Function? We shall begin with a quote by François Chollet, "A way to measure whether the algorithm is doing a good job — this is necessary to determine the distance between the algorithm's current output and its expected output. The measurement is used as a feedback signal to adjust the way the algorithm works. This adjustment step is what we call learning" (Chollet 2021). During the training phase of any supervised learning algorithm, the data-modeler computes the error function for each individual instance by calculating the difference between the actual value and the predicted value. In the next step, the loss function is evaluated as the average error over all training data points. If the model's prediction is perfect, the loss is zero; otherwise, the loss is greater. The goal of training a model is to find a set of weights and biases that have low loss, on average, across all examples. We shall visualize the loss for a regression problem where we wish to predict the price of an apartment in New York City. We consider two models and calculate the losses for each of them as depicted in Fig. 1. Here the predictions are represented by blue lines and the red arrows indicate the loss. Depending on the problem, the loss function may be directionally agnostic, i.e., it doesn't matter whether the loss is positive or negative. All that matters is the magnitude. This is not a feature of all loss functions; in fact, the loss function will vary significantly based on the domain and unique context of the problem that we are applying machine learning algorithms to. In some situations, it may be much worse to guess too high than to guess too low, and the loss function we select must reflect that. The machine learning problem of classifier design is usually studied from the perspective of probability elicitation in statistical learning. This shows that the standard approach of proceeding from the specification of a loss to the minimization of conditional risk is overly restrictive. Earlier literature on loss functions is highly biased toward convex loss functions. Recently, various new loss functions have been derived in the literature which
are not convex and are robust to the contamination of data with outliers; for details see Vortmeyer-Kley et al. (2021).

Loss Function, Fig. 1 Plot of predicted value (blue), actual output (shaded circle) and loss for each example (red arrows)
Different Kinds of Loss Functions
There is no "one-size-fits-all" loss function in machine learning algorithms. There are various factors that influence the choice of loss function – one of the major influential factors being the type of problem under consideration. Based on that, loss functions can be broadly classified as follows:
Regression Loss Function
Regression is a supervised learning method that is used to estimate the relationship between a continuous dependent variable and a set of independent variables. In this section, we will focus on some common loss functions used for the prediction of the dependent variable (Hastie et al. 2009). The mathematical expressions of these loss functions are stated using the notation where $Y_i$ indicates the target value, $h(x_i)$ gives the predicted output, and $n$ is the total number of examples.

Mean Squared Error (MSE) or L2 Loss Function
Mean Squared Error (MSE) is the workhorse of basic loss functions; it's easy to understand, implement, and generally works pretty well. To calculate MSE, we take the difference between model predictions and the ground truth, square it, and average it out across the whole dataset. The L2 loss function is computed as:
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - h(x_i) \right)^2.$$
A special case of MSE is the Root Mean Square Error (RMSE). It is computed as the square root of the MSE and is given by:
$$\text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - h(x_i) \right)^2 }.$$

Mean Absolute Error (MAE) or L1 Loss Function
The L1 loss function minimizes the absolute differences between the estimated values and the existing target values. Mathematically, the mean absolute error (MAE) is given by
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - h(x_i) \right|.$$
One of the major drawbacks of MAE is that it is a scale-dependent measure; hence one cannot obtain the relative size of the loss from this measure. To overcome this problem, the Mean Absolute Percentage Error (MAPE) was proposed to compare forecasts of different series in different scales. Mathematically, MAPE is denoted as follows:
$$\text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{Y_i - h(x_i)}{Y_i} \right|.$$
In Fig. 2 we have plotted the MAE and MSE loss values for each individual prediction in the interval (−10, 10) where the actual output is assumed to be 0.

Log-Cosh Loss
Log-cosh is another loss function used in regression tasks that is smoother than the L2 loss function (as evident from Fig. 3). It is the logarithm of the hyperbolic cosine of the prediction error. The mathematical formula is given by:
$$\text{log-cosh} = \sum_{i=1}^{n} \log\left( \cosh\left( h(x_i) - Y_i \right) \right).$$
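Plain NumPy versions of these regression losses are sketched below; the small arrays are illustrative values only.

```python
# Regression losses as defined above (y = targets, yhat = predictions h(x)).
import numpy as np

def mse(y, yhat):
    return np.mean((y - yhat) ** 2)

def rmse(y, yhat):
    return np.sqrt(mse(y, yhat))

def mae(y, yhat):
    return np.mean(np.abs(y - yhat))

def mape(y, yhat):
    return 100.0 * np.mean(np.abs((y - yhat) / y))   # assumes no zero targets

def log_cosh(y, yhat):
    return np.sum(np.log(np.cosh(yhat - y)))

y = np.array([3.0, -0.5, 2.0, 7.0])
yhat = np.array([2.5, 0.0, 2.0, 8.0])
print(mse(y, yhat), rmse(y, yhat), mae(y, yhat), mape(y, yhat), log_cosh(y, yhat))
```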
Huber Loss or Smooth Mean Absolute Error
A combination of the L1 & L2 loss functions accompanied by an additional parameter delta (δ) gives us the Huber loss. The shape of the loss function is influenced by this parameter (as demonstrated in Fig. 3), so the delta parameter is fine-tuned by the algorithm. When the predicted values are far from the original, the function has the behavior of the MAE; close to the actual values, the function behaves like the MSE. The mathematical form of the Huber loss is given by (Hastie et al. 2009):
$$L_{\delta}(y, h(x)) = \begin{cases} \dfrac{1}{2}\left( y - h(x) \right)^2 & \text{for } |y - h(x)| \leq \delta, \\[4pt] \delta\,|y - h(x)| - \dfrac{1}{2}\delta^2 & \text{otherwise.} \end{cases}$$

Loss Function, Fig. 2 Plot of MAE (left) and MSE (right) loss functions

Loss Function, Fig. 3 Plot of log-cosh (left) and Huber (right) loss functions
Quantile Loss
The functions discussed so far give us a point estimate of the loss, whereas in certain real-world prediction problems, interval estimate of the loss function can significantly improve decision-making processes. Quantile loss functions turn out to be useful when we are interested in predicting an interval instead of only point predictions. Quantile-based regression aims to estimate the conditional “quantile” (of order γ) of a response variable given certain values of predictor variables. Quantile loss is an extension of MAE (when the quantile is 50th percentile, it is MAE). It is computed as follows:
$$L_{\gamma}(y, h(x)) = \sum_{i:\, y_i < h(x_i)} (\gamma - 1) \cdot \left| y_i - h(x_i) \right| \;+\; \sum_{i:\, y_i \geq h(x_i)} \gamma \cdot \left| y_i - h(x_i) \right|.$$
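The Huber and quantile losses can be sketched as follows; the quantile (pinball) loss is written here in an equivalent max-style form, and delta, gamma, and the sample values are arbitrary.

```python
# Huber loss and quantile (pinball) loss as discussed above.
import numpy as np

def huber(y, yhat, delta=1.0):
    r = np.abs(y - yhat)
    quad = 0.5 * (y - yhat) ** 2
    lin = delta * r - 0.5 * delta ** 2
    return np.where(r <= delta, quad, lin).mean()

def quantile_loss(y, yhat, gamma=0.9):
    r = y - yhat
    # gamma*r when under-predicting (r >= 0), (gamma-1)*r (positive) otherwise
    return np.mean(np.where(r >= 0, gamma * r, (gamma - 1.0) * r))

y = np.array([3.0, -0.5, 2.0, 7.0])
yhat = np.array([2.5, 0.0, 2.0, 8.0])
print(huber(y, yhat), quantile_loss(y, yhat))
```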
Cross-Entropy Loss
For a multi-class classification problem, the (categorical) cross-entropy loss for a single observation $o$ is
$$-\sum_{c=1}^{M} y_{o,c} \log\left( p_{o,c} \right),$$
where $M$ = number of classes, log = the natural logarithm, $y$ = indicator if the class label $c$ is the correct classification for the observation $o$, and $p$ = predicted probability for observation $o$ that it belongs to class $c$. There are several types of cross entropy loss functions in the literature. Among them, binary cross entropy or log loss is
$$-\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right].$$

Exponential Loss
The exponential loss is a convex function and grows exponentially for negative values, which makes it more sensitive to outliers. The exponential loss, widely used in optimizing the AdaBoost algorithm, is defined as follows:
$$\text{Exponential loss} = \frac{1}{N} \sum_{i=1}^{N} e^{-h(x_i)\, y_i},$$
where $N$ is the number of examples, $h(x_i)$ is the prediction for the $i$th example, and $y_i$ is the target for the $i$th example.
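Minimal NumPy sketches of these classification losses are given below; the labels and probabilities are toy values, and the exponential loss assumes targets coded as -1/+1.

```python
# Categorical/binary cross-entropy and the AdaBoost-style exponential loss.
import numpy as np

def categorical_cross_entropy(y_onehot, p):
    return -np.sum(y_onehot * np.log(p), axis=1).mean()

def binary_cross_entropy(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def exponential_loss(y_pm1, scores):
    return np.mean(np.exp(-scores * y_pm1))

y_onehot = np.array([[1, 0, 0], [0, 1, 0]])
p = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(categorical_cross_entropy(y_onehot, p))
print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.6])))
print(exponential_loss(np.array([1, -1, 1]), np.array([2.0, -1.5, 0.3])))
```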
Hinge Loss
The hinge loss is used for "maximum-margin" classification, most notably for support vector machines (SVMs) (Rosasco et al. 2004). The goal is to assign different penalties to points that are not correctly predicted or are too close to the hyperplane. Its mathematical formula is:
$$\text{Hinge loss} = \max\left( 0,\; 1 - y \cdot h(x) \right),$$
where $y$ is the target vector and $h(x)$ is the predicted vector. The hinge loss function can be used for both binary classification and multi-class classification.
Kullback-Leibler Divergence Loss
Kullback–Leibler divergence, $D_{KL}$ (also called relative entropy), measures how one probability distribution is different from a reference probability distribution. Usually, $P$ represents the data and the predictions are encoded as $Q$. The Kullback–Leibler divergence is interpreted as the information gain achieved if $P$ would be used instead of $Q$ which is currently used. The relative entropy from $Q$ to $P$ is denoted by $D_{KL}(P \| Q)$ and is evaluated as
$$D_{KL}(P \| Q) = \sum_{x \in \chi} P(x) \log \frac{P(x)}{Q(x)},$$
where $P$ and $Q$ are discrete probability distributions on the probability space $\chi$. For $P$ and $Q$ being the distributions of continuous random variables, $D_{KL}(P \| Q)$ is defined as
$$D_{KL}(P \| Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)}\, dx,$$
where p and q are the probability densities of P and Q, respectively (MacKay and Mac Kay 2003). Kullback-Leibler divergence is not symmetric and does not follow triangle inequality.
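Short sketches of the hinge loss and the discrete KL divergence are given below; the vectors are toy values, and the KL helper assumes strictly positive probabilities.

```python
# Hinge loss (labels in {-1, +1}) and discrete Kullback-Leibler divergence.
import numpy as np
from scipy.stats import entropy

def hinge(y_pm1, scores):
    return np.mean(np.maximum(0.0, 1.0 - y_pm1 * scores))

def kl_divergence(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q))        # assumes strictly positive entries

p = [0.4, 0.4, 0.2]
q = [0.3, 0.5, 0.2]
print(hinge(np.array([1, -1, 1]), np.array([0.8, -0.4, 2.0])))
print(kl_divergence(p, q), entropy(p, q))   # scipy's entropy(p, q) also gives D_KL(P||Q)
```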
Loss Function in Geoscience
Geoscience is a field of great societal relevance that requires solutions to several urgent problems such as predicting the impact of climate change, measuring air pollution, predicting increased risks to infrastructures by disasters, modeling future availability and consumption of water, food, and mineral resources, and identifying factors responsible for natural calamities. As the deluge of big data continues to impact practically every commercial and scientific domain, geoscience has also witnessed a major revolution from being a data-poor field to a data-rich field. This has been possible with the advent of better sensing technologies, improvements in computational resources, and internet-based democratization of data. The growing availability of geoscience big data offers immense potential for machine learning practitioners to apply state-of-the-art methodologies to model this data (Samui 2008; Toms et al. 2020). The popular types of algorithms that are used in these applications include artificial neural networks, support vector machines, self-organizing maps, decision trees, ensemble methods such as random forest, neuro fuzzy, multivariate adaptive regression splines, etc. These models use various loss functions in their training phase; however, the most commonly used loss functions in geoscience applications incorporate the cross-entropy loss function and the Huber loss function due to higher overall accuracy and better learning efficiency.
Summary
Loss functions provide more than just a static representation of how a model is performing – they are how the algorithms fit data in the first place. Most machine learning algorithms use some sort of loss function in the process of optimization, that is, finding the best parameters (weights) for the data. Importantly, the choice of the loss function is directly related to the formulation of the problem. As such, the absolute loss in regression plays the same role as the binomial deviance in classification; it increases linearly for extreme margins. Exponential loss is more undesirable than squared-error loss. When considering robustness in spatial data analysis, squared error loss for regression and exponential loss for classification are not ideal. However, they both lead to sophisticated boosting methods in the forward stagewise additive modeling setting. One must understand that none of the loss functions can be treated as universal; choosing a loss function is a difficult proposition and depends on the choice of machine learning model for the problem at hand. We can think of selecting the algorithm as a choice about the framing of the prediction problem, and the choice of the loss function as the way to calculate the error for a given framing of the problem. A future scope of research in the area of loss functions in geoscience would be to work with the Wasserstein loss, which has received much attention in the machine learning domain over the last decade.
Data and Code
The data and code used in this study are available at https://github.com/tanujit123/loss.
Cross-References ▶ Artificial Intelligence in the Earth Sciences ▶ Big Data
▶ Cluster Analysis and Classification ▶ Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence ▶ Machine Learning ▶ Optimization in Geosciences ▶ Pattern Classification ▶ Probability Density Function ▶ Regression ▶ Support Vector Machines
Bibliography Bishop CM, Nasrabadi NM (2006) Pattern recognition and machine learning, vol 4, no 4. Springer, New York Chollet F (2021) Deep learning with Python. Simon and Schuster Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. A Wiley-Interscience Publication, New York, ISBN 9814-12-602-0
Friedman J, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann Stat 28(2):337–407 Hastie T, Tibshirani R, Friedman J, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction. Springer, New York MacKay D, Mac Kay D (2003) Information theory, inference and learning algorithms, 1st edn. Cambridge University Press, s.l. Raschka S, Mirjalili V (2019) Python machine learning: machine learning and deep learning with Python, scikit-learn, and TensorFlow 2, 3rd edn. Packt Publishing Ltd Rosasco L, De Vito E, Caponnetto A, Piana M, Verri A (2004) Are loss functions all the same? Neural Comput 16(5):1063–1076 Samui P (2008) Support vector machine applied to settlement of shallow foundations on cohesionless soils. Comput Geotech 35(3):419–427 Toms B, Barnes E, Ebert-Uphoff I (2020) Physically interpretable neural networks for the geosciences: applications to earth system variability. J Adv Model Earth Syst 12(9):e2019MS002002 Vortmeyer-Kley R, Nieters P, Pipa G (2021) A trajectory-based loss function to learn missing terms in bifurcating dynamical systems. Sci Rep 11(1):1–13
Machine Learning Feifei Pan Rensselaer Polytechnic Institute, Troy, NY, USA
Definition
Machine learning is the science which allows humans to design algorithms and teach computers to learn patterns from vast amounts of data and use the patterns to automatically make decisions or predictions. In this process, the data can be, but is not limited to, the following types: numeric values, text, graphs, photos, audio, and more. Machine learning is considered a subset of artificial intelligence. It is closely related to computational statistics, data science, and data mining and is frequently applied to other research domains such as natural language processing, computer vision, robotics, bioinformatics, and so forth.
Introduction Machine learning automates computers to make decisions or predictions without being explicitly programmed so. With the exponential growth of data, machine learning has become a crucial component behind many great inventions and services in the past decades. For instance, machine learning enables the accurate recommendation systems for online shopping websites such as Amazon and eBay, the effective search engines such as Google, Baidu, and Bing, the convenient voice assistant systems such as Siri and Alexa. Machine learning has attracted considerable attention since the past century. As early as 1950, Alan Turing raised the questions, “Can machines think?”, in his paper Computing machinery and intelligence (Turing 2009). Later in 1959, the term machine learning was first introduced by Arthur
Samuel in his paper on computer gaming, Some studies in machine learning using the game of checkers (Samuel 1959). Nilsson pioneered the research in machine learning with the book Learning Machines in the 1960s, which focused on pattern classification in discriminant machine systems. Since then, substantial efforts have been put into machine learning research. With the publication of Hinton's highly cited paper Learning representations by back-propagating errors (Rumelhart et al. 1988) in the late 1980s, machine learning research welcomed the era of deep learning. In general, a learning process in machine learning can be described as follows: (1) the model input is denoted as x; (2) the target, which can be either a decision or a prediction, is denoted as Y; (3) a dataset D is also provided, where D = {(x1, y1), (x2, y2), . . ., (xn, yn)}; note that y is not always necessary for the input data of semi-supervised and unsupervised learning models. The goal of a learning process is to learn a function g: X → Y that closely approximates the true relation between inputs and outputs f: X → Y (Abu-Mostafa et al. 2012). The exact target function f: X → Y usually cannot be learned due to the limitation of data. When encountering unseen data xi, a Yi can be automatically generated by computers using the learned function g: X → Y. In most cases, the function g: X → Y is chosen from a set of hypothesis functions based on the characteristics of the data and the properties of the learning tasks.
Type of Learning There are various types of learning designed for different types of input/output data and learning problems. The learning types can be roughly categorized into four major classes: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Some other learning types include self-learning, feature learning, association rule learning, etc.
Supervised, Unsupervised, and Semi-Supervised Learning Supervised learning is designed for labeled data, which means the input data are data pairs (xi, yi) with both input xi and ground truth output yi (usually manually labeled). The input data for a supervised model is split into two non-overlapping datasets: training data and testing data. A supervised learning model is built on training data pairs to learn the function g: X ! Y discussed in section “Introduction.” The function g is later applied to the unseen testing data for evaluation. Supervised learning includes classification and regression algorithms, such as k-nearest neighbors, support vector machines, logistic regression, naive Bayes, decision trees, Stochastic Gradient Descent, etc. Active learning and online learning are two of the main variations of supervised learning (Abu-Mostafa et al. 2012). In contrast to supervised learning, unsupervised learning (Hinton et al. 1999) involves unlabeled data. Hence, the input data for unsupervised models only contains x, and the goal of unsupervised learning is to directly learn patterns or structures among data. The primary approach of unsupervised learning is cluster analysis, an algorithm that groups unlabeled data into N clusters based on the underlying similarities. K-means, hierarchical clustering, mixture models, and density-based spatial clustering of applications with noise (DBSCAN) are well-known algorithms for clustering. The main advantage of unsupervised learning is that it requires a minimum amount of supervision and human efforts, especially for data annotation. Semi-supervised learning, similar to its literal meaning, is a hybrid of supervised and unsupervised learning. The model input consists of a small amount of labeled data and a sizable amount of unlabeled data. While unsupervised learning tends to produce lower accuracy, semi-supervised learning introduces a small amount of labeled data for model tuning, which considerably enhances the model performance without significantly increase the annotation expense. Reinforcement Learning Reinforcement learning is also built on unlabeled data but works differently from unsupervised learning. It is most similar to how humans and animals learn basic skills. Imaging a puppy is given a treat every time he/she sits after the command \sit” and is given nothing if he/she does something else. With several repetitions, the puppy gradually learns the correct action to take after the “sit” command under this reward mechanism. In reinforcement learning, the computer is trained similarly: for each input training sample, the outcome is measured accordingly, and the final goal is to maximize the reward for all samples. Reinforcement learning is particularly useful in game theory, control theory, and operations research.
Learning Models
A standard machine learning process involves model building. Commonly seen models include simple models such as linear regression, support vector machines, decision trees, and k-nearest neighbors, and more complex models such as neural networks.

Linear Models
The general idea of linear models is to learn a set of weights, also called coefficients, denoted as w = (w1, w2, w3, . . ., wn), for every feature xi of the input data X. The predicted value y is the linear combination of w and X, where y = w0 + w1 x1 + w2 x2 + . . . + wn xn, with w0 denoting a constant. The basic linear regression model aims to minimize the sum of squared errors between the predicted values and the ground truth. Mathematically speaking, the linear regression model learns a set of coefficients to minimize the cost function, denoted as:
$$\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{k} x_{ij} w_j \right)^2.$$
Lasso (least absolute shrinkage and selection operator) regression (Tibshirani 1996) is a variation of the basic linear model. It aims to minimize the sum of squared errors together with a tuning parameter, also known as the L1 regularization. The cost function of lasso regression is:
$$\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{k} x_{ij} w_j \right)^2 + \lambda \sum_{j=0}^{k} \left| w_j \right|,$$
where λ is a constant and $\sum_{j=0}^{k} |w_j|$ represents the L1 norm. With the regularization, lasso regression automatically performs feature selection and reduces overfitting. Other widely used linear models include logistic regression, multi-task lasso, elastic-net, Bayesian regression, and more.

Support Vector Machine
The Support Vector Machine (SVM) model (Cortes and Vapnik 1995) is a set of supervised algorithms that construct multiple high-dimensional hyperplanes in vector spaces for regression and classification tasks. The SVM model is very effective for high-dimensional data, especially when the datasets are relatively small. It is also very flexible: SVM's
decision functions vary with the selection of the kernel functions. Linear, polynomial, sigmoid and Radial Basis Function (RBF) are the frequently used kernel functions for SVM models. Customized kernel functions are also possible for SVM models. Decision Tree A decision tree model produces a flowchart-like, treestructured hierarchy. Each branch represents a condition or a feature, and each bottom layer leaf node represents a class or a label. Decision tree learning is widely used in classification (Classification tree) and regression (Regression tree) tasks. To enhance the accuracy of decision tree learning, Ho Tin-kam proposed the random forest, or the random decision forests model, in her work Random decision forests in 1995 (Ho 1995). Random forest is an ensemble learning method that constructing multiple decision trees and outputting based on the predictions of all decision trees. Random forest model increases the accuracy of the decision tree algorithm and reduces overfitting.
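A minimal sketch of the tree-based models discussed above, using scikit-learn on a small synthetic dataset (the data, labels, and hyperparameters are arbitrary illustration choices):

```python
# Fit a random forest on a toy classification problem and report held-out accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)   # synthetic labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(forest.score(X_test, y_test))            # accuracy on unseen data
```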
Neural Networks
Neural networks, the foundation of deep learning, are learning models that are designed to replicate the mechanism of human brains. A typical neural network model has an input layer, one or multiple connected hidden layers, and an output layer. Signals are transmitted through the neurons in each layer, and the hidden layers can be organized and connected in different fashions, as shown in Fig. 1. Neural networks can be as simple as multi-layer perceptrons and can also be as complicated as RNN, CNN, and Long Short-Term Memory (LSTM) models. Neural network models can be used in supervised, unsupervised, semi-supervised, and reinforcement learning. They have a high requirement for computational power but are famous for their portability, robustness, and high accuracy.
Issues and Limitations Overfitting is one of the most common issues for machine learning models. It occurs when a model only focuses on perfectly fitting the training data while losses generality and
M
Machine Learning, Fig. 1 Diagrams of three neural network architectures: Recursive Neural Network (RvNN), Recurrent Neural Networks (RNNs), and Convolutional Neural Networks (CNNs). It is the Fig. 1 in
A Survey on Deep Learning: Algorithms, Techniques, and Applications (Pouyanfar et al. 2018)
784
causes low performance on unseen validation data. Overfitting is very likely to increase the computational cost and lower the portability of the models. However, overfitting can be avoided through properly-designed cross-validation. Machine learning models highly rely on the quality of data. Lack of data, biased data, and inaccessible data are some of the significant issues that fail machine learning models. Training with low-quality data may cause algorithmic bias and requires additional human supervision and remedy. Recently, Xin et al. proposed the idea of Human-in-the-Loop Machine Learning in Xin et al. (2018), which integrates human efforts into the traditional machine learning process and achieves impressive results on typical iterative workflows. Computers, as one of the essential components in learning processes, can also become a barrier to the development of machine learning. Deep learning was first emerged in the early twentieth century but did not fully blossom until the late 1990s when the graphics processing units(GPU) were developed. The quality of computer hardware also largely contributes to the success of learning.
Mandelbrot, Benoit B. Samuel AL (1959) Some studies in machine learning using the game of checkers. IBM J Res Dev 3(3):210–229 Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58(1):267–288 Turing AM (2009) Computing machinery and intelligence. In: Parsing the Turing test. Springer, Dordrecht, pp 23–65 Xin D, Ma L, Liu J, Macke S, Song S, Parameswaran A (2018) Accelerating human-in-the-loop machine learning: challenges and opportunities. In: Proceedings of the second workshop on data management for end-to-end machine learning. DEEM’18, Association for Computing Machinery, New York. https://doi.org/10.1145/ 3209889.3209897
Mandelbrot, Benoit B. Katepalli R. Sreenivasan New York University, New York, NY, USA
Summary and Conclusions Machine learning offers a solution for accurate and quick data analysis and knowledge discovery in this big data era and enables the advancements in artificial intelligence. Despite the minor drawbacks mentioned in section “Issues and Limitations,” machine learning now tackles various type of data under different problem settings and has been successfully applied to solve data problems in diverse research domains.
Cross-References ▶ Big Data ▶ Deep Learning in Geoscience
Bibliography Abu-Mostafa YS, Magdon-Ismail M, Lin HT (2012) Learning from data. AMLBook, Pasadena Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3): 273–297 Hinton GE, Sejnowski TJ, Poggio TA et al (1999) Unsupervised learning: foundations of neural computation. MIT Press, Cambridge, MA Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on document analysis and recognition, vol 1, pp 278–282 Pouyanfar S, Sadiq S, Yan Y, Tian H, Tao Y, Reyes MP, Shyu ML, Chen SC, Iyengar S (2018) A survey on deep learning: algorithms, techniques, and applications. ACM Comput Surv 51(5):1–36 Rumelhart DE, Hinton GE, Williams RJ (1988) Learning representations by back-propagating errors. MIT Press, Cambridge, MA, pp 696–699
Fig. 1 Benoit B. Mandelbrot (1924–2010), courtesy of Prof. Laurent Mandelbrot, son of Benoit Mandelbrot
Biography Benoit Mandelbrot has contributed much to several branches of science, often in pioneering ways, by seeing what others could not see. I will describe two of his contributions to turbulence (but omit other topics such as vortical boundaries and the R/S analysis of hydrological data). Chapter ten of his book (Mandelbrot 1982), which he regarded as “[a plea] for a more geometric approach to turbulence and for the use of fractals,” discusses his thoughts at the time; it is fair to say,
Markov Chain Monte Carlo
however, that those views have been superseded in significant ways. One of them was his speculation, on the basis of a freehand sketch prepared by S. Corrsin (1959), that the interface bounding a dye mixed by homogeneous turbulence is a fractal. His ideas on this topic, expressed in his 1982 book (also its 1977 predecessor), were free-wheeling yet remarkable because, at that time, there was no real evidence to back up his speculations. It took me and my colleagues many years of diligent work to bring realism to them; I cite two examples that are almost three decades apart (Sreenivasan 1991; Iyer et al. 2020). A comparison of Corrsin’s schematic (see p. 54 of Benoit’s 1982 book) was accomplished by actual direct numerical simulation, finally, in Sreenivasan and Schumacher (2010). This comparison reveals both the fertility of past imagination and its limitations. The second contribution described here stemmed from the fact, well-known at the time, that turbulent energy dissipation is intermittent in amplitude, in both space and time. Benoit knew that a single fractal dimension would not suffice for describing this property, and developed further ideas (Mandelbrot 1974): while geometric objects such as interfaces can be characterized meaningfully by a single fractal dimension, others such as energy dissipation require a multitude of dimensions. Benoit’s ideas were not yet sharp and, to his later chagrin, he did not use the term “multifractal” until after it was first used by Frisch and Parisi (1985); the characterization of multifractal measures (without using the word multifractal) was the work of Halsey et al. (1986). Later, Benoit connected these developments to his earlier ideas of 1974 and gave a beautiful probabilistic interpretation of multifractals, which led him to devise concepts such as negative and latent dimensions (see Chhabra and Sreenivasan 1991). Meneveau and I (1987, 1991) – see also Sreenivasan (1991) – connected these concepts meaningfully to turbulence, made measurements of the multifractal spectrum for energy dissipation and other such quantities, and developed and measured joint multifractal measures. Benoit was my colleague at Yale for many years. My students and I benefitted greatly from discussions with him, for which I am grateful. His enthusiasm for our work was remarkably uplifting. He was well aware of being described as the “father of fractals,” and he once told me that fractals as a subject wouldn’t exist without him; he is probably right. His most memorable lines are recorded in his magnum opus, the 1982 book: “Clouds are not spheres, mountains are not cones, coastlines are not circles, and bark is not smooth, nor does lightning travel in a straight line.” Toward the end of his career, Benoit tried to recast his legacy as having brought attention to the overarching theme of “roughness” as an important fact of Nature, whereas much of the rest of science was focused on smoothness.
All in all, Benoit was a unique scientist who ventured into several areas, some of which went beyond science, and originated key concepts in each of them. He has an assured place in the annals of Science.
Bibliography Chhabra AB, Sreenivasan KR (1991) Negative dimensions: theory, computation, and experiment. Phys Rev A 43:1114(R) Corrsin S (1959) Outline of some topics in homogeneous turbulent flow. J Geophys Res 64:2134 Frisch U, Parisi G (1985) In: Ghil M, Benzi R, Parisi G (eds) Turbulence and Predictability in Geophysical Fluid Dynamics, Proc. Intern. School Phys. “E. Fermi”, p 84 Halsey TC, Jensen MH, Kadanoff LP, Procaccia I, Shraiman BI (1986) Fractal measures and their singularities: the characterization of strange sets. Phys Rev A 33:1141 Iyer KP, Schumacher J, Sreenivasan KR, Yeung PK (2020) Fractal isolevel sets in high-Reynolds-number scalar turbulence. Phys Rev Fluids 5:044501 Mandelbrot BB (1974) Intermittent turbulence in self-similar cascades: divergence of high moments and dimension of the carrier. J Fluid Mech 62:331 Mandelbrot BB (1982) The fractal geometry of nature. W.H. Freeman & Co. Meneveau C, Sreenivasan KR (1987) Simple multifractal cascade model for fully developed turbulence. Phys Rev Lett 59:1424 Meneveau C, Sreenivasan KR (1991) The multifractal nature of turbulent energy dissipation. J Fluid Mech 224:429 Sreenivasan KR (1991) Fractals and multifractals in fluid turbulence. Annu Rev Fluid Mech 23:539 Sreenivasan KR, Schumacher J (2010) Lagrangian views on turbulent mixing of passive scalars. Phil Trans Roy Soc Lond A 368:1561
Markov Chain Monte Carlo Swathi Padmanabhan and Uma Ranjan Indian Institute of Science, Bangalore, Karnataka, India
Definition
Probabilistic inference is the task of deriving the probability distribution of a set of random variables, each of which can take one of a set of values. Markov chain Monte Carlo (MCMC) is a class of methods that combines Markov chains with Monte Carlo sampling to perform this inference through efficient sampling. The method helps construct the "most likely" distribution, one that gets progressively closer to the actual or expected distribution. The Bayesian approach helps quantify the uncertainty and thus provides a way to sample the inference. Various distributions of physical parameters can be studied with MCMC, as the use of the posterior distribution maximizes the expectation of the desired
parameter. MCMC has many and varied applications, from analyzing rock physics to understand the weak points of rocks, to developing efficient deep learning algorithms that better analyze various phenomena. Examples include modelling distributions of geochronological age data and inferring sea-level and sediment-supply histories from 2D stratigraphic cross-sections (Gallagher et al. 2009).
Introduction
In the computation of Bayesian inverse problems, an important step is that of sampling the posterior distribution. The parameters of interest correspond to properties of the posterior distribution, and their direct calculation is computationally intractable. Hence, the posterior probability must be approximated by other means (Svensén and Bishop 2007). Typically, the desired calculation is a sum over a discrete distribution of many random variables, or an integral over a continuous distribution of many variables, and is difficult to evaluate. This problem exists in both schools of probability (Bayesian and frequentist); it is more prevalent in Bayesian probability, when integrating over the posterior distribution of a model. One common way to extract the inference is Monte Carlo sampling: independent samples are drawn from the probability distribution, and the process is repeated many times to approximate the desired physical quantity. A Markov chain is a systematic method of predicting the probability distribution of the next state based on the current state. Monte Carlo sampling draws the random variables independently of each other, directly from the desired probability density function, but fails at higher dimensions. Markov chain Monte Carlo (MCMC) combines both these methods and allows random sampling of high-dimensional probability distributions that takes into account the probabilistic dependence (transition probability) between samples, by constructing a Markov chain that comprises the Monte Carlo samples (Gamerman and Lopes 2006).
Historically speaking, Monte Carlo methods have existed since World War II. The development of Monte Carlo methods during the 1940s coincided with the development of the first computer, ENIAC; the computation of the Monte Carlo method was set up by von Neumann for analyzing thermonuclear and fission problems in 1947. Fermi had invented a version without computers in the 1930s, but it did not gain recognition. The first MCMC algorithm, now known as the Metropolis algorithm, was developed on the MANIAC computer under the guidance of Nicholas Metropolis and published in 1953, but more practical aspects and the integration with statistics came with the work of Hastings in the 1970s. From the 1990s to the present, the evolution of these algorithms has taken in various advances such as Gibbs sampling, perfect sampling, regeneration, and central limit theorems for convergence. Over time, the algorithms have evolved and advanced significantly along with the growth of computation, and these techniques have influenced many aspects and applications (Robert and Casella 2011).

Monte Carlo Sampling
Monte Carlo sampling estimates the desired physical quantity as the expectation of a random variable, averaged over a large number of samples drawn from a desired density function. MC methods are used to solve problems that are stochastic in nature (particle transport, telecommunication systems, population studies based on the statistical nature of survival and reproduction) as well as problems that are deterministic (evaluating integrals, solving complex linear-algebra equations, solving complex partial differential equations (PDEs)). The main advantages of using Monte Carlo sampling are to:
1. Estimate a density by collecting samples that approximate the distribution of a target function.
2. Approximate a parameter (physical quantity), such as the mean or variance of a distribution.
3. Optimize a function by locating a sample that maximizes or minimizes the target function or the desired distribution.
4. Estimate more than one physical quantity simultaneously by using more than one random variable.
The random numbers are generally taken to follow a uniform distribution or any desired probability function. The random numbers generated are independent, so there is no correlation between the samples. Sometimes pseudorandom sequences are used so that simulations can be rerun; a pseudorandom sequence is a known deterministic sequence used to draw the samples from the probability density function of the physical quantity to be predicted. This approach is generally known as the inverse distribution method and is used for Monte Carlo simulations in optical imaging (Wang and Wu 2012).
The sampling process in Monte Carlo can be explained as follows (Goodfellow et al. 2016). Consider the expectation of a function f of a random variable x, defined as an integral against the probability density (or distribution) p(x):

$q = \int f(x)\, p(x)\, dx = \mathbb{E}[f(x)]$    (1)
If Eq. 1 is the integral that defines the quantity q, then q can be approximated by drawing n samples and averaging them as (Goodfellow et al. 2016):

$\bar{q}_n = \frac{1}{n} \sum_{i=1}^{n} f(x_i)$    (2)
$\bar{q}_n$ is the average of n samples of the quantity q. The law of large numbers in statistics states that if the samples $x_i$ are i.i.d., then the average $\bar{q}_n$ converges to the expected value of the random variable, and its variance is bounded by:

$\mathrm{Var}[\bar{q}_n] = \frac{\mathrm{Var}[f(x)]}{n}$    (3)
$\mathrm{Var}[\bar{q}_n]$ is the variance of $\bar{q}_n$, f(x) is the function being averaged over the samples, and Var[f(x)] is its variance. By the central limit theorem, the distribution of $\bar{q}_n$ approaches a normal distribution with finite mean and variance; in this case the mean is q and the variance is given by Eq. 3. Hence, from these results, accurate estimates of $\bar{q}_n$ can be obtained from the corresponding cumulative distribution of the normal density function. The graphical illustration of sampling from a distribution in Fig. 1 (Wang and Wu 2012) gives a better picture of how pseudorandom numbers are generated when a specific probability density function is the goal.
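As a minimal illustrative sketch of Eqs. 1–3 (the choice f(x) = x² and a standard normal p(x) are assumptions made only for this example, so the true value of q is 1), the following Python snippet draws n independent samples and averages f over them; the scatter of the estimates shrinks as Var[f(x)]/n.

import numpy as np

rng = np.random.default_rng(seed=0)

def f(x):
    # Quantity whose expectation we want: q = E[f(x)] with the assumed f(x) = x**2.
    return x ** 2

def monte_carlo_estimate(n):
    # Draw n i.i.d. samples from p(x) = N(0, 1) and average f over them (Eq. 2).
    x = rng.standard_normal(n)
    return f(x).mean()

for n in (10, 1_000, 100_000):
    est = monte_carlo_estimate(n)
    print(f"n = {n:>6d}  estimate of E[x^2] = {est:.4f}  (true value 1.0)")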
Drawbacks
The main drawbacks of Monte Carlo sampling are:
• MC sampling fails at high dimensions because the volume of the sample space increases exponentially with the number of parameters (dimensions).
• MC draws each random sample directly from the desired base probability density function, and each sample is drawn independently of the others. This increases the complexity of the model.
Markov Chain Monte Carlo (MCMC)
The MCMC framework provides a general approach for generating Monte Carlo samples from the posterior distribution in cases where we cannot sample efficiently from the posterior directly. Such posterior distributions arise in Bayesian inference (van Ravenzwaaij et al. 2018). In MCMC, we construct an iterative process that gradually samples from distributions that are closer and closer to the posterior. The challenge is to decide how many iterations to perform before we can collect a sample as being (almost) generated from the posterior (Koller and Friedman 2009).
Markov Chain Monte Carlo, Fig. 1 Example illustration of sampling a random variable x from a desired probability density function p(x), based on a random number X generated from a uniform distribution on the interval [a, b]. The graph also shows the mapping with the corresponding cumulative distribution functions
Markov Chains
A Markov chain is a special type of stochastic process that deals with the characterization of sequences of random variables. It focuses on the dynamic and limiting behaviors of a sequence (Koller and Friedman 2009). It can also be described as a random walk, where the next state or move depends only upon the current state and the randomly chosen move. The sampling can be explained using the chain dynamics as follows. Consider a random sequence of states x0, x1, x2, . . ., governed by a random transition model. A simple Markov chain (sequence) representing the states x0, x1, x2, . . ., xk with weights w1, w2, w3, . . ., wk is shown in Fig. 2. Let the state of the process at step k be viewed as a random variable X^(k). We assume that the initial state X^(0) is distributed according to some initial state distribution P^(0)(X^(0)); subsequent state distributions are P^(1)(X^(1)), P^(2)(X^(2)), P^(3)(X^(3)), and so on, with

$P^{(k+1)}\bigl(X^{(k+1)} = x'\bigr) = \sum_{x \in \mathrm{Val}(X)} P^{(k)}\bigl(X^{(k)} = x\bigr)\, T(x \to x')$    (4)

where the probability of being in state x′ at step k + 1 is simply the sum over all possible states x at step k. Markov chains provide good insight into random-walk models, but as a standalone model, they are inefficient for reaching inferences.
Designing MCMC
For implementing Markov chain Monte Carlo, certain conditions on the process must be understood. First, the Markov chain is assumed to be stationary, or it is sampled until it reaches its stationary state. If the Markov chain arrives at its stationary (invariant) distribution, it continues to remain in that distribution for all future Monte Carlo samples. The condition for a Markov chain to be stationary is, by definition (Koller and Friedman 2009):

$\pi(X = x') = \sum_{x \in \mathrm{Val}(X)} \pi(X = x)\, T(x \to x')$    (5)

where T denotes the transition probability and π denotes the probability of the states of each sample X.

Markov Chain Monte Carlo, Fig. 2 A very simple Markov chain representing states x0, x1, x2, . . ., xk with weights w1, w2, w3, . . ., wk

An important property of the transition model of a Markov chain is that, for a stationary distribution, the transition model T can be viewed as a matrix A (also known as the adjacency matrix). The stationary Markov distribution is then an eigenvector of the matrix A corresponding to the eigenvalue 1. Representing the Markov chain sampling in a linear-algebra context, with adjacency matrix A and π as the vector of state probabilities,

$\pi A = \pi$    (6)
where this relation shows the eigenvector relationship between the states, and π is the target distribution. In MCMC, the samples are drawn from the probability distribution by building a Markov chain, and the next sample is drawn from a probability distribution that depends upon the last sample drawn. After an initial number of samples, the distribution settles, or converges, to a stationary (equilibrium) distribution that helps estimate the desired physical quantity. Hence, introducing correlation between the samples drawn in Monte Carlo enables efficient estimation of the density or quantity, rather than wandering over the whole domain.
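The following Python sketch makes Eqs. 5 and 6 concrete for an assumed three-state transition matrix (the matrix values are illustrative, not taken from the text): the stationary distribution is obtained both as the eigenvector of the transposed matrix for eigenvalue 1 and by repeatedly applying the transition model to an arbitrary starting distribution.

import numpy as np

# Assumed 3-state transition matrix A; entry A[i, j] is T(x_i -> x_j), rows sum to 1.
A = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.3, 0.6]])

# Stationary distribution from Eq. 6: pi A = pi, i.e., the eigenvector of A^T
# associated with eigenvalue 1, normalized to sum to one.
eigvals, eigvecs = np.linalg.eig(A.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()

# The same distribution from repeated application of Eq. 5, starting anywhere.
p = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    p = p @ A

print("eigenvector pi:", np.round(pi, 4))
print("iterated p    :", np.round(p, 4))
print("pi A == pi ?  :", bool(np.allclose(pi @ A, pi)))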
Markov Chain Monte Carlo Algorithms
Various algorithms exist for MCMC. The two most commonly used are:
• the Gibbs sampling algorithm
• the Metropolis-Hastings algorithm
The idea of MCMC is to construct the sequence such that, although the first sample may be generated from the prior, subsequent samples are generated from distributions that get progressively closer to the desired posterior (Koller and Friedman 2009). The initial samples are important for MCMC, as it is during these samples that the distributions are observed to judge whether the chain is converging or not. From the initial point, the samples undergo a warm-up phase in which they are chained through the Markov process until useful values are reached; generally, some of these initial samples are eliminated or discarded once useful values are attained. This is referred to as the burn-in of the Markov process and marks entry into the stationary distribution of the Markov chain (Murphy 2019). For constructing a Markov chain, the detailed balance condition is necessary: a finite-state Markov chain is reversible if there exists a unique distribution π such that, for all states x and x′:
$\pi(x)\, T(x \to x') = \pi(x')\, T(x' \to x)$    (7)
Equation 7 asserts that the probability of a transition from x to x′ is the same as the probability of a transition from x′ to x, which implies that a random transition between states x and x′ is reversible, and that π(x) forms a stationary distribution for the transition probability T (Koller and Friedman 2009).
Gibbs Sampling
Gibbs sampling is a technique used when more than one dimension has to be considered; the density functions may also be multivariate distributions. The probability of the next state is calculated from the conditional probability given the previous state, so the new states (samples) are conditioned over the values of the entire current configuration (Murphy 2019). This conditional sampling is easy and accurate when the dimensionality is low and becomes more difficult as the dimensionality increases. Gibbs sampling is an easy model to implement. The samples are acquired from the posterior distribution P(Q | E = e), where e denotes the observations and Q = X − E. The states of the Markov chain are the assignments x to X − E. Transition states are then defined so as to converge to a stationary distribution π(Q). To arrive at this condition, write a state of the Markov chain as (x_{-i}, x_i). The transition to the next state is then given by:

$T_i\bigl((x_{-i}, x_i) \to (x_{-i}, x'_i)\bigr) = P(x'_i \mid x_{-i})$    (8)

From Eq. 8 it can be seen that the transition probability for Gibbs sampling depends only on the remaining states x_{-i}; for a posterior distribution P(X | e) defined by a set of factors, this shows that P(X | e) is stationary for the process considered (Koller and Friedman 2009).
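A minimal Gibbs-sampling sketch is given below; the bivariate normal target and the correlation value are assumptions chosen only for illustration. Each coordinate is redrawn from its full conditional given the other coordinate, which is exactly the transition of Eq. 8.

import numpy as np

rng = np.random.default_rng(seed=1)
rho = 0.8                 # assumed correlation of the bivariate normal target
n_samples = 5000

x = np.zeros(2)           # arbitrary starting state
samples = np.empty((n_samples, 2))

for k in range(n_samples):
    # Full conditional of x[0] given x[1]: N(rho * x[1], 1 - rho**2)
    x[0] = rho * x[1] + np.sqrt(1.0 - rho ** 2) * rng.standard_normal()
    # Full conditional of x[1] given x[0]: N(rho * x[0], 1 - rho**2)
    x[1] = rho * x[0] + np.sqrt(1.0 - rho ** 2) * rng.standard_normal()
    samples[k] = x

# The chain reproduces the target dependence: the sample correlation is close to rho.
print("sample correlation:", round(float(np.corrcoef(samples.T)[0, 1]), 3))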
Metropolis-Hastings
The Metropolis-Hastings algorithm was published in 1970, in a paper that generalized the Metropolis algorithm in order to overcome the dimensionality drawback of MC methods. In this algorithm, the required stationary distribution is obtained by enforcing the detailed balance condition of Eq. 7 with that particular stationary distribution. The Metropolis-Hastings algorithm requires a proposal distribution, since we cannot sample directly from the required distribution, and it uses the concept of an acceptance probability to decide whether the proposed state is accepted or rejected. If we denote the acceptance probability by A(x → x′), which gives the probability that the state x′ is accepted, the proposal distribution t_q defines the transition model over the given state space. Random states are generated using the defined proposal distribution; when a new state is generated, it is either accepted or rejected based on its acceptance probability. For example, if the new state x′ satisfies the desired density function π(x) and the prior state is x, the chance of accepting x′ as the next state is higher if x′ lies closer to the region of higher density, as in Fig. 3; successive states will then cluster around the probability mass of that region. If instead the next state lies farther away, as on the right-hand side of the distribution in Fig. 3, it may or may not be accepted; the acceptance then depends on the distance between the current state and the proposed state.

Markov Chain Monte Carlo, Fig. 3 Illustration of the acceptance probability using an example distribution π(x) on the proposed states. If x′ is towards the denser part of the distribution (green point), the state is accepted; otherwise it may or may not be accepted, as for the state denoted by the blue point

Therefore, the condition for acceptance can be defined from detailed balance as:

$\pi(x)\, t_q(x \to x')\, A(x \to x') = \pi(x')\, t_q(x' \to x)\, A(x' \to x) \quad \forall\, x \neq x'$    (9)

Then the acceptance probability for a stationary distribution can be defined as:

$A(x \to x') = \min\left(1,\; \frac{\pi(x')\, t_q(x' \to x)}{\pi(x)\, t_q(x \to x')}\right)$    (10)
Or, in other words, using conditional probability, t_q can be denoted as a function, say V, and then

$A(x \to x') = \min\left(1,\; \frac{\pi(x')\, V(x \mid x')}{\pi(x)\, V(x' \mid x)}\right)$    (11)
If the conditions of Eqs. 9 and 10 are satisfied, then the process has reached its stationary state. The Gibbs sampling method is a special case of the Metropolis-Hastings method. There are cases in which it becomes difficult to use the Metropolis-Hastings algorithm as a result of nonlinearity in the models. In applications, Metropolis-Hastings is often used in conjunction with a Gibbs sampler, but when nonlinearity arises in the model it becomes difficult to sample from the full conditional distributions. In those cases, resampling techniques such as the rejection method are employed within the Metropolis-Hastings algorithm (Gamerman and Lopes 2006), especially in the context of nonlinear and non-normal models.
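The sketch below is an illustrative Metropolis-Hastings sampler implementing the acceptance rule of Eq. 10 with a symmetric Gaussian proposal, in which case the proposal terms cancel; the two-component normal-mixture target, the step size, and the burn-in length are assumptions chosen only for the example.

import numpy as np

rng = np.random.default_rng(seed=2)

def target(x):
    # Unnormalized target density pi(x): an assumed mixture of two normals.
    return np.exp(-0.5 * (x - 2.0) ** 2) + 0.5 * np.exp(-0.5 * (x + 2.0) ** 2)

n_samples, step = 20000, 1.0
x = 0.0                                  # arbitrary initial state
chain = np.empty(n_samples)

for k in range(n_samples):
    x_prop = x + step * rng.standard_normal()        # symmetric proposal t_q
    # Acceptance probability A(x -> x') = min(1, pi(x') / pi(x)); the t_q terms
    # of Eq. 10 cancel because the Gaussian proposal is symmetric.
    if rng.random() < min(1.0, target(x_prop) / target(x)):
        x = x_prop
    chain[k] = x

burn_in = 2000                           # discard warm-up (burn-in) samples
print("estimated mean of the target:", round(float(chain[burn_in:].mean()), 3))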
Summary and Conclusions
In short, Markov chain Monte Carlo (MCMC) is a method for efficiently sampling random variables in order to accurately estimate the parameters of interest. Monte Carlo sampling alone cannot provide very good estimates, as it fails particularly at higher dimensions. The power of extracting inference from Bayesian statistics is what forms the basis for MCMC: it focuses on generating samples from a posterior distribution using a reversible Markov chain whose converging, or equilibrium, distribution is the posterior distribution. The dependency created by integrating Markov chains with Monte Carlo ensures faster convergence to the true or exact solution. The conditions for a good MCMC can be stated with the help of the transition probability and the stationarity condition of the Markov chain. The two common algorithms for MCMC are Gibbs sampling and Metropolis-Hastings. Metropolis-Hastings is the more general and more commonly used algorithm, while Gibbs sampling can be considered a special case of Metropolis-Hastings; MCMC based on Metropolis-Hastings is used more because of its flexibility over other algorithms. The Gibbs algorithm is more often used when the proposal distribution for the next step is a conditional probability distribution. There are generalized algorithms for cases when the proposal distributions are symmetric in nature, like the Gaussian (Murphy 2019). For efficient convergence of the models, algorithms can be designed that alter the Markov chain sequences, the samples generated, or the transition kernels, depending on the complexity of the process under study (Gamerman and Lopes 2006). Different strategies can be adopted based on the requirements of the desired target distribution.
MCMC in Geosciences
Various geophysical problems involve inverse problems, in which an input parameter is recovered from an output solution via the inversion of an operator. MCMC is commonly used in seismological modelling, resolution analysis, and model optimization (Gallagher et al. 2009), for example, in randomly searching for Earth models consistent with a seismological model. MCMC also provides improved reliability in GIS systems: GIS problems are generally ill-posed and solved linearly, while solving nonlinear cases involves non-Gaussian changes and requires a posterior distribution. Since the complexity involved is high, MCMC improves the prediction and reliability for the nonlinear cases. In SAR (synthetic aperture radar) land applications, the acquired RAW images are composed of complex data. The speckle they contain can be removed by grouping pixels with a Markovian random field; the method involves iterative parameter estimation, in which the conditional posterior distribution of each pixel is updated at every step of the algorithm, depending only on its neighboring pixels.
Cross-References ▶ Markov Chains: Addition ▶ Monte Carlo Method
References Gallagher K, Charvin K, Nielsen S, Sambridge M, Stephenson J (2009) Markov chain Monte Carlo (MCMC) sampling methods to determine optimal models, model resolution and model choice for Earth Science problems. Marine Petrol Geol 26(4): 0264–8172 Gamerman D, Lopes HF (2006) Markov chain Monte Carlo: stochastic simulation for Bayesian inference. CRC Press, Boca Raton Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge Murphy KP (2019) Machine learning: a probabilistic perspective. MIT Press, Cambridge Robert C, Casella G (2011) A short history of Markov chain Monte Carlo: subjective recollections from incomplete data. Stat Sci Inst Math Stat 26:102–115. https://doi.org/10.1214/10-STS351 Svensén M, Bishop CM (2007) Pattern recognition and machine learning. Springer, New York Van Ravenzwaaij D, Cassey P, Brown SD (2018) A simple introduction to Markov Chain Monte–Carlo sampling. Psychonomic Bull Rev 25(1):143–154 Wang LV, Wu H-i (2012) Biomedical optics: principles and imaging. Wiley, New York
Markov Chains: Addition
Adway Mitra Centre of Excellence in Artificial Intelligence, Indian Institute of Technology Kharagpur, Kharagpur, India
Definition
Consider a system that can be described at any point of time t using a state variable X_t. This state variable may be a D-dimensional vector (where D may be 1 or higher), defined on a space 𝒳 called the state space; each individual value in the state space is called a state. This space can be either discrete or continuous, depending on the nature of the system, and time may be considered to be discrete. System dynamics refers to the process by which the state of the system changes over time, so that we have a sequence of states {X_1, X_2, . . ., X_{t-1}, X_t, X_{t+1}, . . .}. The changes of state from one time-point to the next are known as state transitions. If the state space is discrete and finite, we can look upon a Markov chain as a finite-state machine that shifts between the different states. Suppose these state variables are random variables, i.e., their values are assigned according to probability distributions. We describe such a system as a Markov chain if the state transition from X_{t-1} to X_t is governed by a probability distribution that depends only on X_{t-1}, but not on any previous state or any other external information, for any t. This is known as the Markov property. In other words,

$p(X_t \mid X_{t-1}, X_{t-2}, \ldots, X_1) = p(X_t \mid X_{t-1})$    (1)
The conditional distribution p(X_t | X_{t-1}) satisfies the relation Prob(X_t = b | X_{t-1} = a) = T_t(a, b), where a and b are two states and T_t is a function, specific to time t, that defines a probability distribution for X_t with respect to b, i.e., T_t(a, b) ≥ 0 for all a, b, and $\sum_{b \in \mathcal{X}} T_t(a, b) = 1$ (if 𝒳 is discrete) or $\int_{\mathcal{X}} T_t(a, b)\, db = 1$ (if 𝒳 is continuous). If the state space is discrete and countable, T may be looked upon as a state transition table, where each row denotes a possible value of the current state, each column denotes a possible value of the next state, and the corresponding entry indicates the probability of such a state transition. An example of such a Markov chain is given in Fig. 1. We can define various properties of Markov chains using marginal distributions of individual state variables, i.e., P(X_t) for any t, or their joint distributions, i.e., P(X_{t1}, X_{t2}, . . ., X_{tn}), where n is any integer and (t1, t2, . . ., tn) are arbitrary time-points. A Markov chain is said to be stationary or homogeneous if the following holds
$P(X_{t_1}, X_{t_2}, \ldots, X_{t_n}) = P(X_{t_1+k}, X_{t_2+k}, \ldots, X_{t_n+k}) \quad \forall\, n, k$    (2)

In a homogeneous Markov chain, T_t is independent of t. Further, by setting n = 1, it follows that the marginal distributions of all state variables of a stationary Markov chain are identical, and this distribution is called the stationary distribution of the Markov chain, denoted by π. It represents the probability distribution over the state space that the state variable can take at any time. In the case of discrete state spaces, where T is the state transition matrix, it can be shown that πT = π.

Markov Chains: Addition, Fig. 1 A three-state discrete Markov chain (states A, B, C) with its state transition table; each row gives the transition probabilities out of a state: from A: 0.6 (to A), 0.2 (to B), 0.2 (to C); from B: 0.3, 0.5, 0.2; from C: 0.4, 0.2, 0.4
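To make the definition concrete, the following sketch simulates the three-state chain using the transition probabilities read off Fig. 1 (the seed and number of steps are arbitrary) and compares the long-run state frequencies with a distribution π obtained from πT = π.

import numpy as np

rng = np.random.default_rng(seed=3)
states = ["A", "B", "C"]
# Transition table of Fig. 1: row = current state, column = next state.
T = np.array([[0.6, 0.2, 0.2],    # from A
              [0.3, 0.5, 0.2],    # from B
              [0.4, 0.2, 0.4]])   # from C

# Simulate the chain and count how often each state is visited.
n_steps = 50_000
x = 0                              # start in state A
counts = np.zeros(3)
for _ in range(n_steps):
    x = rng.choice(3, p=T[x])
    counts[x] += 1

# Stationary distribution via repeated application of pi <- pi T.
pi = np.full(3, 1.0 / 3.0)
for _ in range(500):
    pi = pi @ T

print("empirical frequencies   :", dict(zip(states, np.round(counts / n_steps, 3))))
print("stationary pi (pi T = pi):", dict(zip(states, np.round(pi, 3))))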
Introduction Markov chains are widely used in various domains of science and engineering to represent stochastic processes. They are specifically useful in applications where there is a system whose state is continuously evolving, but the evolution has a randomness associated with it. Typically, the system state cannot be directly measured. However, there are measurable quantities that are functions of the system state, and these functions may be either deterministic or stochastic. Markov chains are often used to model the system state in such situations. The main reason for choosing them are as follows: (i) They can take care of the stochasticity, and (ii) the Markov assumption is simplifying and hence computationally tractable. In most cases, the task is to solve the inverse problem, i.e., estimate the system state based on the observations, even though it is really the system state which produces the observations. A good example is in atmospheric sciences, where the state of the atmosphere cannot be measured directly, but other dependent variables such as temperature, precipitation, and wind speed can be measured. Solving the inverse problem also enables us to estimate the dynamics of state transition, which in turn enables forecasting. Markov models find applications in a very wide range of domains. They are used in natural language processing for assigning part-of-speech tags to words in a sentence; in
speech processing for identifying the speakers in a long audio stream; in activity recognition to identify a sequence of activities by a person based on readings from wearable sensors; in bioinformatics to find the most likely alignment between sequences of DNA, RNA, or proteins; and many others. Though the initial applications were mostly in the domains of computer science, signal processing, and computational biology, the past decade has seen their uses in other disciplines including earth system sciences.
Variants and Derived Concepts
Various concepts related to or derived from Markov chains are used frequently to solve problems in the geosciences. Some of the major variants of Markov chains, and concepts based on them, are discussed below.

Hidden Markov Models
Consider a sequence of observations {Y_1, . . ., Y_t, . . ., Y_T}, regarded as random variables that may take values from an observation space denoted by 𝒴. Y_t is a P-dimensional vector (P ≥ 1) that may be continuous or discrete. According to the hidden Markov model, the unobserved state variable X is a Markov chain over a discrete state space, and the observation Y_t follows a distribution over the observation space that is a function of X_t, i.e., P(Y_t = y | X_t = x) = E(x, y). Here E(x, y) is called the emission distribution; it satisfies E(x, y) ≥ 0 for all (x, y) and $\sum_{y \in \mathcal{Y}} E(x, y) = 1$ (if Y is discrete) or $\int_{\mathcal{Y}} E(x, y)\, dy = 1$ (if Y is continuous). The salient feature of this model is that the current system state depends only on the previous state, and the current observation depends only on the current state. This can be succinctly represented by a graphical model, as shown in Fig. 2. These assumptions may be simplistic, but they have an important benefit: they allow efficient and exact inference, i.e., we can exactly compute the distribution p(X_t | Y_1, . . ., Y_T) for any t, and we can also compute the most likely sequence of values of the unobserved state variable, given a sequence of observations.
Markov Chains: Addition, Fig. 2 Graphical representation of a hidden Markov model: each hidden state X(t) depends on X(t−1) through the transition model T, and each observation Y(t) depends on X(t) through the emission model E
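As an illustration of the exact inference mentioned above, the sketch below runs the standard forward (filtering) recursion for p(X_t | Y_1, . . ., Y_t) on a hypothetical two-state HMM; the transition, emission, and observation values are assumptions chosen for the example.

import numpy as np

# Hypothetical two-state HMM: transition T(a, b) and emission E(x, y) tables.
T = np.array([[0.9, 0.1],          # state 0 -> states {0, 1}
              [0.2, 0.8]])         # state 1 -> states {0, 1}
E = np.array([[0.8, 0.2],          # P(observation | state 0)
              [0.3, 0.7]])         # P(observation | state 1)
P0 = np.array([0.5, 0.5])          # initial state distribution

observations = [0, 0, 1, 1, 1]     # an assumed observation sequence

# Forward (filtering) recursion: alpha is proportional to p(X_t = x | Y_1..t).
alpha = P0 * E[:, observations[0]]
alpha /= alpha.sum()
for y in observations[1:]:
    alpha = (alpha @ T) * E[:, y]
    alpha /= alpha.sum()           # renormalize to keep a proper distribution

print("p(X_T | Y_1..T) =", np.round(alpha, 3))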
For specialized applications, various extensions of hidden Markov models are used. The most common one for earth-science applications is the non-homogeneous hidden Markov model, where the state variable X_t is driven by an additional external variable U_t in addition to its past value X_{t-1}; in other words, Prob(X_t = b | X_{t-1} = a, U_t = c) = T(a, b, c). This allows the state variable's marginal distribution to vary over time. Another well-known variant is the factorial hidden Markov model, where the state space is decomposed into smaller spaces, each represented by a separate state variable (X^1, X^2, . . ., X^K) on which individual Markov chains are defined. The observation Y is a function of all of these K variables, i.e., $\mathrm{prob}\bigl(Y_t = y \mid X^1_t = x_1, X^2_t = x_2, \ldots, X^K_t = x_K\bigr) = E(x_1, x_2, \ldots, x_K, y)$.
Markov Chain Monte Carlo
Another important concept based on Markov chains, which is frequently used in the geosciences, is Markov chain Monte Carlo (MCMC) sampling. This is essentially a sampling technique that enables us to estimate the posterior distribution of an unknown quantity X, such as the state variable. It is a Monte Carlo method of computation through random sampling, based on the law of large numbers in statistics. The idea is to consider the different possible values of the variable X as the states of a Markov chain and to estimate the posterior distribution of X as the stationary distribution of that Markov chain. To estimate this, MCMC methods aim to perform a random walk over the state space. This is possible only if there is a transition function from one state to another, and this is handled in different ways. Consider the special case where the conditional distribution of each dimension of X, conditioned on all the other dimensions, i.e., P(X_i | X_1, . . ., X_{i-1}, X_{i+1}, . . ., X_D), is known for all i ∈ {1, . . ., D}, where D is the dimensionality of the variable X. In this situation, we may use Gibbs sampling, where we choose any one dimension and sample its value from the corresponding conditional distribution, keeping the values of all other dimensions unchanged. Consider an initial value x_0 of X. Using the above procedure, we can sample a new value x_1, in which at most one dimension has a value different from x_0. This is essentially a transition of the Markov chain. Next we choose another dimension, sample its value conditioned on the rest, and hence obtain another sample x_2. This process is repeated over and over, until we have a large enough number of samples of X to effectively represent the stationary distribution of this Markov chain and hence the posterior distribution of X. In the more general case where these conditional distributions are not known, but we do know a function f(X) that is proportional to the unknown posterior distribution P(X), we may use another Markov chain Monte Carlo method known
as the Metropolis-Hastings algorithm. Here too we start with an initial estimate x_0 of X. We choose a candidate value y according to a proposal distribution Q(y | x). This value is accepted with a probability related to its likelihood relative to the current value x_0, i.e., according to the ratio f(y)/f(x_0). If accepted, we set x_1 = y and then proceed in the same way to get the next sample x_2; if y is rejected, we sample and test further values until one of them is accepted and set as x_1. Repeating this process enables us to collect a sample set of X that effectively represents the stationary distribution of this Markov chain, and hence the posterior distribution of X. In either case, while collecting the samples it is necessary to retain samples only at regular intervals, as successive samples are highly correlated with each other. There are also many other algorithms based on the concept of Markov chain Monte Carlo, such as slice sampling, as well as various hybrid methods that enable faster navigation of the state space and focus on high-density areas (samples with higher likelihood under the posterior P). For a more detailed understanding of these algorithms, the reader is referred to the excellent tutorial paper by Neal (1993).
Applications in Geosciences Now, we come to case studies of various applications of the above models and concepts in various domains of geosciences, such as climate, hydrology, geology, seismology, and remote sensing. Hydroclimatology Hidden Markov models have been used quite extensively for modeling rainfall at hourly or daily spells. The main reason for this is that rainfall usually occurs in spells that may last for a few hours (in case of convective events like thunderstorms), or may hang around for a few days (in case of depressions), and hence the rainfall occurrence and intensity in adjacent time-steps are likely to be related. However, the process of rainfall is dependent on many complex processes in the atmosphere, not all of which can be observed. Hence, it is a common practice in statistical hydro-meteorology to represent the daily/hourly weather state as a latent state variable that follows a Markov chain, and the rainfall at any time-step depends only on the weather state at the corresponding time. However, this weather state may be affected by other atmospheric variables like geopotential height, temperature, and relative humidity, which are introduced as external variables affecting the latent state, as in a non-homogeneous hidden Markov model (NHMM). This modeling style was first
introduced by Bellone et al. (2000) and has continued for the next two decades. In these models, the emission distribution of rainfall are usually hybrid: First, a Bernoulli distribution decides whether there is any rain or not, and if so, then the volume is generated by another distribution which may be gamma, exponential, log-normal, etc. In some studies, the parameters of the emission distribution too depend on the external variables. A more sophisticated application of the same idea is developed by Kwon et al. (2018), where the observations are measurements of soil moisture at multiple sites. The latent variable represents the spatial profile of soil moisture. The covariates or external variables of the NHMM are weather observations like temperature and rainfall. The challenge here is to learn the optimal number of values that the state variable can take, and the parameters related to transition and emission distributions. Successful estimation of the model enables the prediction of soil moisture based on weather forecasts. Hydrology In the domain of hydrology, hidden Markov models have been used for modeling daily streamflow sequences in many studies such as Pender et al. (2016). Here, the innovation lies in the emission distribution, which has been specifically designed to model extreme values using extreme value distributions like generalized Pareto (GP) and generalized extreme value (GEV). Some of the states signify extreme conditions of streamflow, for which these distributions are used for emission. Zhang et al. (2018) applied MCMC-based methods in the domain of hydrology, for the purpose of estimating parameters of a groundwater model. They used a Bayesian approach to quantify the uncertainties of the model parameters, conditioned on the magnetotelluric measurements for subsurface salinity of groundwater across a vertical profile. Different types of MCMC algorithms were used to estimate a posterior distribution of the model parameters. In the simple Metropolis-Hastings, a number of sample parameter values were drawn from the prior distribution (based on empirical studies) and accepted or rejected based on the ratio of likelihoods with respect to the hydrological model. In the second approach, an adaptive sampling strategy was considered to navigate the sample space more efficiently, as all accepted samples were used to progressively update the prior distribution. This ensured that samples that are closer to the highdensity regions of the posterior are picked, and hence fewer samples are rejected. Geology and Geophysics Geostatistics is an important concept in the domain of geology and geophysics which aims to estimate geological values
at any arbitrary spatial location, with the help of a geostatistical equation. The geostatistical equation has parameters that may be estimated from a few locations where the desired variable is measured, using algorithms like kriging. However, Oh and Kwon (2001) use a Bayesian approach for the estimation of these parameters based on Schlumberger inverse and well-log observations of dipole-dipole resistivity. For this, they need a posterior distribution over the parameter space. They first build a prior distribution using simulations conditioned on the observations, after which they carry out MCMC sampling of parameter values. They draw samples from the prior distribution, use them to run simulations, compute the error at the points where observations are available, and subsequently accept or discard these samples based on the error. The prior distribution gets updated as more samples are accepted. Another work, by Jasra et al. (2006), considers the use of MCMC approaches to identify the most suitable model parameters, including structural parameters such as the number of components in a mixture distribution. They analyze geochronological data to identify age groups of minerals; the geological properties of minerals from each age group are represented by a component of the mixture distribution. However, the number of mixture components is not fixed but is itself a random variable, for which a distribution is estimated using a variant of MCMC sampling called reversible-jump MCMC, which allows for the birth and death of components.
Remote Sensing
Markov chain-related concepts have been considered in the domain of remote sensing too. Harrison et al. (2012) used Markov chain Monte Carlo sampling to estimate posterior distributions on the parameters of land surface models and the uncertainty in their simulations of soil moisture, by calibrating these models against passive microwave remote sensing estimates of soil moisture. Liu et al. (2021) consider a spatiotemporal HMM for classifying the pixels of satellite image sequences according to land cover types. The spatial consistency of adjacent pixels and the dynamics of temporal changes in land cover are captured through this version of the HMM.

Summary and Conclusions
The concept of the Markov chain has been developed into models like hidden Markov models and their variants, and into algorithmic frameworks like Markov chain Monte Carlo. These models and methods find applications in many problems related to the mathematical geosciences, especially for forecasting, simulation, and uncertainty estimation. Recent challenges being explored by the research community are to include spatial and temporal characteristics of processes at multiple scales in these models and to apply them in conjunction with deep learning, the state-of-the-art approach being increasingly adopted by researchers of all scientific domains, including the geosciences.

Cross-References
▶ Bayesian Inversion in Geoscience
▶ Computational Geoscience
▶ Geostatistics
▶ High-Order Spatial Stochastic Models
▶ Interpolation
▶ Machine Learning
▶ Markov Chain Monte Carlo
▶ Markov Random Fields
▶ Remote Sensing
▶ Sequential Gaussian Simulation
▶ Spatial Data
▶ Spatial Statistics
▶ Spatiotemporal Analysis
▶ Spatiotemporal Modeling

Bibliography
Bellone E, Hughes JP, Guttorp P (2000) A hidden Markov model for downscaling synoptic atmospheric patterns to precipitation amounts. Clim Res 15(1):1–12
Harrison KW, Kumar SV, Peters-Lidard CD, Santanello JA (2012) Quantifying the change in soil moisture modeling uncertainty from remote sensing observations using Bayesian inference techniques. Water Resour Res 48(11):11514
Jasra A, Stephens DA, Gallagher K, Holmes CC (2006) Bayesian mixture modelling in geochronology via Markov chain Monte Carlo. Math Geol 38(3):269–300
Kwon M, Kwon HH, Han D (2018) A spatial downscaling of soil moisture from rainfall, temperature, and AMSR2 using a Gaussian-mixture nonstationary hidden Markov model. J Hydrol 564:1194–1207
Liu C, Song W, Lu C, Xia J (2021) Spatial-temporal hidden Markov model for land cover classification using multitemporal satellite images. IEEE Access 9:76493–76502
Neal RM (1993) Probabilistic inference using Markov chain Monte Carlo methods. Department of Computer Science, University of Toronto, Toronto
Oh SH, Kwon BD (2001) Geostatistical approach to Bayesian inversion of geophysical data: Markov chain Monte Carlo method. Earth Planets Space 53(8):777–791
Pender D, Patidar S, Pender G, Haynes H (2016) Stochastic simulation of daily streamflow sequences using a hidden Markov model. Hydrol Res 47(1):75–88
Zhang J, Man J, Lin G, Wu L, Zeng L (2018) Inverse modeling of hydrologic systems with adaptive multifidelity Markov chain Monte Carlo simulations. Water Resour Res 54(7):4867–4886
Markov Random Fields Uma Ranjan Indian Institute of Science, Bengaluru, Karnataka, India
Definition
Most physical phenomena result in structures that have short-range continuity and long-range changes. Such structures are seen in the distribution of rock structures, soil characteristics, temperature and humidity patterns, ore distributions, etc. Even within areas of uniformity, or of the same rock type, there are changes that are random and have no spatial structure. Markov random fields are an effective way of characterizing the joint probability distribution of such structures. Markov random fields are typically defined over a discrete lattice, with an associated neighborhood structure. The neighborhood structure dictates how the value at a lattice point is influenced by the values at nearby sites. Hence, a Markov random field can also be viewed as a graph whose nodes correspond to the lattice points and whose edges correspond to the neighborhood structure. Such a model, also known as a graph model, combines the power of stochastic models with the computational ease and visualization offered by graph models. MRFs are used in several areas, ranging from computer graphics to computer vision and from classification of images to generating textures. MRFs form an essential part of inverse problems, where characterizing a solution as an MRF offers attractive computational benefits in estimating joint probability distributions.
Introduction Markov random fields are mathematical models used to solve problems of estimation of quantities. Mathematical models of acquisition of images are well-studied. Digital images of terrain, rock, or earth samples are obtained through different modes of acquisition such as synthetic aperture radar, electron microscopy, or micro-CT. Each of these use certain properties of the object such as surface reflectance, scattering, or absorption to create a digital image, typically representing the spatial distribution of a certain quantity on a discrete lattice. Inverse problems in image processing are those where the properties of an object are inferred from the images. For instance, a camera may record intensities that are reflected, but these observations are modified by blurring and noise. Hence, the true reflectance is the parameter to be estimated from the noisy observations. In satellite imagery, one is interested in identifying the vegetation type at each pixel, which represents
a spatial location. In images of rocks, one is interested in identifying different segments, such as rock and pore, and sub-types of rocks. Most problems of interest in the geosciences are inverse problems, and inverse problems suffer from the problem of ill-posedness. Following Hadamard (1923), a problem is well-posed if a solution exists, is unique, and is stable (it shows gradual variations for small variations of the input). Most natural systems are smoothly varying; hence this is a reasonable condition to impose on a solution. Moreover, smooth solutions are easier to handle computationally and represent stability with respect to variations of the input, a significant advantage in engineering systems. However, a number of factors beyond our control can cause a problem to become ill-posed: limitations of acquisition systems, different types of noise, and the functions used for modeling may cause solutions to be non-smooth. One way to make a problem well-posed is through the use of additional constraints of smoothness, which force the solutions to correspond to our assumptions of a well-behaved system. This process is called regularization. Markov random fields can be represented through probability distributions. This leads to powerful frameworks such as Bayesian inference, which can be used to accurately estimate the distribution of the unknown quantities.
Bayesian Inference
One of the most popular methods to obtain regularized solutions to inverse problems is through a probabilistic framework. The variables of interest are assumed to follow a choice of probability distributions. The notion behind the use of probability distributions is the assumption that the underlying quantity is a well-posed approximation of what is actually seen or obtained from a numerical solution. This uncertainty is modeled as a random variable and forms the fundamentals of the Bayesian approach to inverse problems. The quantity that needs to be modelled is estimated as a parameter of the probability distribution. In the context of image restoration, the parameter to be estimated is the actual (de-noised) intensity value at each pixel. In an image segmentation problem, the parameter is the class to which each pixel belongs. The four aspects of Bayesian inverse solutions are:
1. A prior probability distribution
2. An estimated likelihood of the quantity of interest
3. A posterior probability, obtained as a product of the prior probability and the likelihood
4. A loss function, which penalizes an error in the estimate of the parameter of interest
The prior, likelihood and posterior probability are related through the Bayes theorem. If f is the quantity that is to be estimated and g are the observations (such as the acquired digital image), we have
$\Pr\{f \mid g\} = \Pr\{g \mid f\}\,\Pr\{f\} / \Pr\{g\}$

where Pr(·) refers to the probability function. Pr{f} is the prior, reflecting our prior knowledge of the quantity to be estimated; it represents our belief about the distribution of the quantity of interest prior to observing the actual data. The prior probability is defined on the space of values that the quantity to be estimated can take. Pr{g | f} is the likelihood that the observed data fit the prior probability. This represents a "goodness of fit" of the assumed prior to the actual observations. Since the assumed prior is a model of the acquisition system, Pr{g | f} is the likelihood function that reflects the fit of the observations to the model; it is also referred to as the degradation model. The quantity Pr{g | f} Pr{f} is proportional to the posterior probability distribution Pr{f | g}. In usual practice, the normalizing constant Pr{g} is neglected, since all observations are assumed to be equally likely; hence the product of the likelihood and the prior is often used directly as the posterior probability. The posterior probability distribution represents an update of our belief about the distribution of the quantity of interest over the lattice. Since the value to be estimated at each point is considered to be a random variable, its actual value is estimated as a parameter of the posterior distribution. This estimate depends on the loss function that is chosen. The loss function indicates the penalty of choosing a parameter value different from the true value, and different loss functions lead to different estimators. For example, minimizing the squared loss function leads to estimating the posterior mean, while minimizing the zero-one loss leads to an estimate of the mode of the posterior (also known as the maximum a posteriori (MAP) estimate). If no information about the prior is available, only the likelihood is maximized, leading to the maximum likelihood (ML) estimate. Ripley (1988) showed that the MAP estimate minimizes a variational energy that consists of two terms, one related to data fidelity and the second to smoothness; the smoothness condition acts as a regularizer, and the desired solution corresponds to the minimum of such an energy. In image processing applications, the posterior probability needs to be estimated over the entire pixel array. When the image sizes are large, as in most applications, the posterior probability is computationally intractable. However, when the posterior probability can be modeled as a combination of local interactions, its computation is simplified. This is captured in the notion of a Markov random field, where the assumption is that the information at a point is dependent only on the information in a neighborhood of a fixed size around a point.
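The toy Python sketch below shows the Bayes relation numerically and how the choice of loss function picks out different point estimates, the posterior mode (MAP, zero-one loss) versus the posterior mean (squared loss); the parameter values, prior, and likelihood numbers are assumptions made only for illustration.

import numpy as np

# A discrete parameter f with three possible values and an assumed prior Pr{f}.
f_values = np.array([0.0, 1.0, 2.0])
prior = np.array([0.5, 0.3, 0.2])

# Assumed likelihood Pr{g | f} of one observation g for each value of f.
likelihood = np.array([0.1, 0.6, 0.3])

# Bayes theorem: the posterior is proportional to likelihood * prior;
# dividing by the sum plays the role of the Pr{g} normalization.
posterior = likelihood * prior
posterior /= posterior.sum()

map_estimate = f_values[np.argmax(posterior)]         # zero-one loss -> posterior mode
mean_estimate = float((f_values * posterior).sum())   # squared loss  -> posterior mean

print("posterior:", np.round(posterior, 3))
print("MAP estimate:", map_estimate, "| posterior-mean estimate:", round(mean_estimate, 3))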
Markov Random Field
If X_s is a quantity of interest (such as intensity, color, or texture) that is to be estimated at pixel s in an image S, the Markov property states that the conditional probability of the value at pixel s, given the values over the entire lattice, is the same as its conditional probability given only its neighbors:

$\Pr\{X_s = x_s \mid X_t = x_t,\ t \in S,\ t \neq s\} = \Pr\{X_s = x_s \mid X_t = x_t,\ t \in N(s)\}$

X_s is a random variable at pixel location s, and x_s is a specific value that X_s can take. The Markov property thus states that the probability of the value at a pixel s in the image, given the value of X over the rest of the image, is the same as the probability of x_s given only its neighbors. The MRF is then a model of the quantity, together with a probability measure that represents the distribution of the quantity of interest. The probability measure is assumed to take values from a pre-determined set, x_s ∈ Ω. These values represent the digitized values of the property (such as color, texture, gradient, or more complex quantities such as the class to which the pixel belongs). The notion of a local neighborhood induces a connectivity structure. For example, in Fig. 1, the central orange pixel is connected to eight neighbors; this configuration is referred to as an eight-neighborhood structure.
Markov Random Fields, Fig. 1 Eight neighborhood
This neighborhood structure is commonly considered to be symmetric, i.e., if x is a neighbor of y, then y is a neighbor of x. Hence, an MRF can also be represented as an undirected graph (S, G), where S is the set of nodes and G is the set of edges between the nodes. The graph structure enables the use of efficient computational methods, such as graph coloring, to perform segmentation tasks on images. In Fig. 1, the blue lattice cells are the neighbors of the central orange lattice cell, and the black edges denote the connectivity structure. The nondirectional nature of the graph representation does not mean that the underlying process which gave rise to the observations is nondirectional. In some cases, the observations may correspond to an underlying drift or diffusion process that has a direction; the Markov random field in such a case represents the (final) evolutionary state of the underlying process. Hence, an MRF can also be viewed as a snapshot in time of an underlying evolutionary process. This viewpoint has been used in some cases to define hidden factors that correspond to actual physical entities and their evolution, and these factors can in turn influence the observed MRF. This formulation of the MRF, however, does not directly give us a procedure to actually construct a probability distribution which satisfies the MRF property. A practical way to construct an MRF is given by the Hammersley-Clifford theorem of MRF-Gibbs equivalence: if an MRF also satisfies the following properties, then it has a joint distribution which is a Gibbs distribution (Rangarajan and Chellappa 1995):
• Positivity: Pr{X = x} > 0
• Locality: Pr{X_s = x_s | X_t = x_t, t ∈ S, t ≠ s} = Pr{X_s = x_s | X_t = x_t, t ∈ N(s)}
• Homogeneity: Pr{X_s = x_s | X_t = x_t, t ∈ N(s)} is the same for all sites s in the lattice.
Positivity implies that every configuration has positive probability, locality is the Markovian property, and homogeneity implies that the conditional probabilities are independent of spatial location. The Gibbs distribution is a global property over the entire lattice, given by

$\Pr(X) = \frac{1}{Z} \exp\left(-\frac{1}{T} U(X)\right)$

where X is a configuration in the sample space and Z is a normalization constant given by

$Z = \sum_{X} \exp\left(-\frac{1}{T} U(X)\right)$

with the sum running over all possible configurations. The energy U(X) over the entire configuration X can be factored into energies over smaller, localized sets of nodes called cliques. A clique is defined by a neighborhood configuration in such a way that all the sites in the clique are neighbors of each other. This offers attractive possibilities for creating computationally efficient means of estimating the MRF. Figure 2 illustrates the possible cliques of two sites in a four-neighborhood and in an eight-neighborhood of a regular lattice. In the four-neighborhood case, two-site cliques can have only two orientations, while in the eight-neighborhood case there are four possible orientations (Fig. 2b shows only the unique orientations). Each clique c is associated with an energy V_c, such that the energy of the entire configuration is given by the sum of the clique potentials:

$U(X) = \sum_{c} V_c(X)$

Gibbs distributions are widely used in statistical physics and have the advantage that they correspond to the minimum value of an energy functional. Thus, the problem of obtaining the maximum a posteriori (MAP) estimate is converted into the problem of finding the minimum of an energy functional. This enables us to define the energy functional using contextual spatial knowledge for the specific problem.
Markov Random Fields, Fig. 2 Two-node cliques in different neighborhoods
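To make the clique-potential formulation concrete, the sketch below evaluates U(X) = Σ_c V_c(X) over the two-site cliques of a four-neighborhood, together with the corresponding unnormalized Gibbs weight exp(−U/T); the Potts-type potential, the two 4 × 4 labelings, and the parameter values are assumptions chosen only for illustration. The smoother labeling has the lower energy and hence the higher weight.

import numpy as np

def potts_energy(labels, beta=1.0):
    # U(X) = sum over two-site cliques of V_c(X); the assumed Potts potential
    # contributes beta whenever the two sites of a clique carry different labels.
    horizontal = labels[:, :-1] != labels[:, 1:]   # cliques along rows (4-neighborhood)
    vertical = labels[:-1, :] != labels[1:, :]     # cliques along columns
    return beta * (horizontal.sum() + vertical.sum())

# Two small 4 x 4 labelings chosen for illustration: a smooth one and a noisy one.
smooth = np.array([[0, 0, 1, 1]] * 4)
noisy = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])

T = 1.0   # temperature of the Gibbs distribution
for name, x in (("smooth", smooth), ("noisy", noisy)):
    u = potts_energy(x)
    print(f"{name}: U(X) = {u}, unnormalized Gibbs weight exp(-U/T) = {np.exp(-u / T):.2e}")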
Factor Graphs
A modification of the MRF is needed when the variables are not directly observed through a measurement but are inferred from other variables that are observed. For example, fault planes, lithospheric boundaries, or mineral distributions are not directly observed but derived from intermediate variables
such as Potts spins. These intermediate variables are known as latent variables, and they are modeled through the use of factor graphs. Factor graphs make explicit use of latent factors to represent unobserved variables. The neighborhood structure in MRFs and Gibbs fields does not specify whether the neighborhood pixels must all be considered simultaneously or pairwise; factor graphs resolve this ambiguity. In Fig. 3 (reproduced from Bagnell 2021), it can be seen that the ambiguity related to diagonal pixels is easily resolved in the factor graph representation.

Markov Random Fields, Fig. 3 MRFs vs. factor graphs. (Reproduced from Bagnell 2021)
Solutions of MRF
Finding the optimal joint distribution of an MRF which minimizes a cost function is in general intractable. The Gibbs distribution, while offering a way to compute distributions over a local neighborhood, includes a normalization constant which is NP-hard to compute, since it involves a summation over all possible clique energies. However, a number of approximations exist which sample the Gibbs distribution in a manner which does not need a normalization. An efficient way to estimate the Gibbs distribution on a discrete lattice is through the Gibbs sampler, which is a special case of a Markov chain. This is the basis of the Markov chain Monte Carlo (MCMC) method. The choice of the prior distribution also has a role in determining the complexity of the solution. In general, it is desirable that the prior and posterior probability distributions have the same functional form. This is guaranteed through the use of conjugate prior functions for the given likelihood function. Normal distributions are a popular choice to ensure conjugate priors. Some of the methods that have been used for the MAP estimate are:

Simulated Annealing (Geman and Geman 1984): The Gibbs sampler was used to obtain the global minimum of
the energy function which corresponded to the maximum a posteriori estimate. The energy functional is defined over an entire image configuration, and the Gibbs sampler traverses the state space, accepting configurations of a lower energy according to a probability dependent on the temperature. The temperature follows an annealing schedule, which allows the Gibbs sampler to converge in probability to the global optimum solution. This method, while giving very good results, is extremely slow. Some of the later modifications (Ranjan et al. 1998) have focused on reformulating the state space to accelerate convergence.

Iterated Conditional Modes (Besag 1986): An iterative algorithm was presented, where the value at a point is updated one parameter at a time, while holding the other parameters fixed. This corresponds to a local steepest descent of the conditional probability. This results in a local minimum solution of the posterior probability and has the disadvantage of being sensitive to initial conditions. However, its rate of convergence is fast.

The highest confidence first (HCF) method introduces a confidence measure (stability) for each node. The algorithm starts with a null state at each pixel and terminates at a configuration which minimizes the variational energy. At each step, the stability of a node is computed as an estimate of the local posterior function, i.e., as a combined measure of the observable evidence and a priori knowledge about the preferences of the current state over the other alternatives. Only the node with the highest stability is updated at each step. A higher stability denotes an estimate in higher consistency with the observations and a priori knowledge. In this way, the HCF algorithm makes maximal progress towards a final solution based on current knowledge about the MRF.

Belief Propagation: In belief propagation, the interactions between the nodes are modeled as messages being passed between them. The messages represent a state of "belief" of each pixel about the marginal distribution of its neighbor. The messages are updated in a single direction, and the marginal distribution is estimated from a normalized set of beliefs after the messages have converged. A number of improvements to accelerate convergence and improve memory efficiency have been proposed in recent literature.

Multidimensional dependencies and inter-class relationships have been modeled in the context of geosciences through Markov chain random fields (MCRF) (Li 2007). These are a special category of Markov random fields (MRF) where the neighborhood structure is not fixed but may vary in different directions.
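As a rough illustration of the iterated conditional modes idea described above, the sketch below restores a noisy binary image by repeatedly setting each site to the label that minimizes its local conditional energy, using a quadratic data term and an Ising smoothness prior. The weights beta and sigma and the simple thresholding initialization are illustrative choices, not those of Besag (1986).

```python
import numpy as np

def icm_denoise(observed, beta=1.5, sigma=0.6, n_sweeps=10):
    """Iterated conditional modes for a binary MRF.

    observed : noisy image with values roughly in {0, 1}.
    beta     : illustrative smoothness weight of the Ising prior.
    sigma    : assumed noise standard deviation of the data term.
    Each site is set, in turn, to the label that minimizes its local
    (conditional) energy while all other labels are held fixed.
    """
    labels = (observed > 0.5).astype(int)        # simple initialization
    rows, cols = labels.shape
    for _ in range(n_sweeps):
        for i in range(rows):
            for j in range(cols):
                best_label, best_energy = labels[i, j], np.inf
                for cand in (0, 1):
                    data = (observed[i, j] - cand) ** 2 / (2 * sigma ** 2)
                    prior = 0.0
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < rows and 0 <= nj < cols:
                            prior += -beta if labels[ni, nj] == cand else beta
                    energy = data + prior
                    if energy < best_energy:
                        best_label, best_energy = cand, energy
                labels[i, j] = best_label
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = np.zeros((32, 32), dtype=int)
    clean[8:24, 8:24] = 1                        # a square region of label 1
    noisy = clean + rng.normal(0, 0.6, clean.shape)
    restored = icm_denoise(noisy)
    print("restored foreground pixels:", int(restored.sum()))
```

Replacing the deterministic local minimization with a temperature-controlled stochastic acceptance rule would turn this sketch into a simulated-annealing scheme of the kind introduced by Geman and Geman (1984).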
Although initial use of the MRF formulation in geosciences was focused on images, it is being increasingly used in more general field settings, such as characterization using the Rayleigh-wave scattering of the Earth's microseismic field (Tsukanov and Gorbatnikov 2018). Thus, the MRF formulation and its accurate solutions are central to developing an increased understanding of the structural and environmental properties of the Earth.
Summary and Conclusions
Spatial data such as images can be modeled through Markov random fields. Obtaining an MRF that satisfies optimality conditions together with constraints that contribute to regularized solutions is a rich field of research. Many of the approaches that have been used for estimating MRFs have been presented here. Many of these estimations use stochastic optimization methods to find global minimum solutions. Recently, deep learning methods have been combined with Markov random field models to provide deterministic solutions to inverse problems. These frameworks provide the advantage of computational scaling along with providing contextual information and capturing higher-order relationships. MRFs will hence remain an active area of research for the foreseeable future.

References
Bagnell D (2021) Gibbs fields and Markov random fields. In: Statistical techniques in robotics (16-831, F10), Carnegie Mellon University. http://www.cs.cmu.edu/~16831-f14/notes/F11/16831_lecture07_bneuman.pdf. Accessed 24 Sept 2021
Besag J (1986) On the statistical analysis of dirty pictures. J Royal Stat Soc 48(3):259–302
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741
Hadamard J (1923) Lectures on Cauchy's problem in linear partial differential equations. Yale University Press, New Haven
Li W (2007) Markov chain random fields for estimation of categorical variables. Math Geol 39:321–335
Rangarajan A, Chellappa R (1995) Markov random field models in image processing. In: Arbib M (ed) The handbook of brain theory and neural networks. MIT Press, Cambridge
Ranjan US, Borkar VS, Sastry PS (1998) Edge detection through a time-homogeneous Markov system. J Indian Inst Sci 78:31–43
Ripley BD (1988) Statistical inference for spatial processes. Cambridge University Press, Cambridge
Tsukanov AA, Gorbatnikov AV (2018) Influence of embedded inhomogeneities on the spectral ratio of the horizontal components of a random field of Rayleigh waves. Acoust Phys 64:70–76

Marsily, Ghislain de
Craig T. Simmons
College of Science and Engineering & National Centre for Groundwater Research and Training, Flinders University, Adelaide, SA, Australia

Fig. 1 Professor Ghislain de Marsily. (Photo by Antoine Meyssonnier 2014)
Biography Professor Ghislain de Marsily is an internationally renowned scientist famed for his contributions to groundwater hydrology and water management. A highly active member of the French Academy of Sciences, he is currently Professor Emeritus at both the Sorbonne University (Pierre et Marie Curie) and Paris School of Mines, France.
Career Overview Graduating from the École des Mines as an engineer in 1963, de Marsily went on to complete a doctoral degree in science at Pierre et Marie Curie in 1978. He then proceeded to build a long and distinguished career in hydrology characterized by energy, innovation, and technical mastery. As a hydrologist, Professor de Marsily has researched areas including basin analysis, fluid mechanics, solute transport, waste disposal, river ecology, surface hydrology as well as hydrogeology, water resources management, and global food production. He is a pioneer in the development of stochastic hydrogeology and inverse methods and is the originator of the “hydrogeological national parks” concept to protect the world’s dwindling supplies of groundwater. He has collaborated extensively to study groundwater systems throughout the world and has been influential in exploring the hydrology of North Africa together with his African colleagues. He has also worked in India with his former students.
In addition to his personal science, de Marsily has been internationally important in his capacity to build networks and collaborations to address global issues at local levels. In 1989, he established the UMR CNRS SISYPHE, an interdisciplinary research unit involving the National Centre for Scientific Research and the University Pierre and Marie Curie, directing the initiative until 2000. From 1989 to 1999, he established and directed the PIREN-SEINE program at the National Centre for Scientific Research, studying the hydrology of the whole of the Seine catchment area. From 2000 to 2004, he was the founder and director of the Postgraduate School of Geosciences and Natural Resources at Pierre et Marie Curie. From 1994 to 2006, he was a member of the national commission which evaluated research into the management of radioactive waste in France.
Scientific Contribution Professor de Marsily has made significant and sustained contributions to mathematical geosciences and porous media flow and modeling. He is recognized as a pioneer in geostatistics, groundwater modeling, and approaches for dealing with hydrogeologic spatial heterogeneity. He worked closely with geostatistician Georges Matheron, particularly on a paper dealing with transport in porous media (Matheron and de Marsily 1980). Their paper showed, for the very first time that for the special case of a stratified porous medium with flow parallel to the bedding, that solute transport cannot, in general, be represented by the usual convection-diffusion equation, even for large time. They went on to demonstrate that when flow is not exactly parallel to the stratification, diffusive behavior will eventually occur at large times. They prompted and emphasized the need for further work on the mechanism of solute transport in porous media. This paper was an extraordinary breakthrough triggering a major line of critical scientific inquiry that is of profound importance to groundwater science and to mathematical geosciences to this day. It has altered our understanding of groundwater flow and solute transport behavior, demonstrating for the first time the ways in which diffusion and dispersion act to control those processes. The Matheron and de Marsily (1980) paper has become a seminal work in hydrogeological science. Cited 731 times (Google Scholar 25 March 2021), it remains one of the most cited papers in groundwater science to this day. de Marsily’s impact extends well beyond his own research. He has been a driving force in communicating and teaching groundwater science to students and professionals around the world. His textbook Quantitative Hydrogeology: Groundwater Hydrology for Engineers (de Marsily 1986) is a classic text in groundwater science. It is a unique book, placing great emphasis on both qualitative and quantitative hydrogeology.
As noted in its preface: “the qualitative and the quantitative aspects are not treated separately but combined and blended together, just as geology and hydrology are woven together in hydrogeology.” This was, and remains, a simple but profound observation that has done more to shape the field of hydrogeology than many research papers. It reflects de Marsily’s keen interest and success in understanding concepts and to ensure that they are treated physically, mathematically, and philosophically in a robust and rigorous manner. The textbook is still in its first and only edition and remains in print to this day, with 3161 citations (Google Scholar 25 March 2021) – and growing. Professor de Marsily remains an energetic, inspirational, and provocative scientist, a visionary thinker, and a world leader in groundwater research. He has inspired, mentored, and taught generations of researchers and students while making a remarkable impact on all aspects of hydrogeology research, education, policy, and management. Emphasizing the vital relevance of science to society, he has both made and inspired important contributions to some of the most critical environmental debates of our time, including critical planetary-scale insights into water, climate change, food and population growth, and nuclear waste disposal.
Awards and Honors Recognizing his outstanding contributions to hydrogeology, including his legendary lectures and numerous books on hydrogeology, fluid transport, and geostatistics, de Marsily has received many awards, including the O. E. Meinzer Award of the Geological Society of America, Robert E. Horton Medal of the American Geophysical Union, and the President’s Award of the International Association of Hydrogeologists. In addition to these honors, de Marsily is Chevalier de la Légion d’Honneur, a Member of the French Academy of Sciences, French Academy of Technologies, French Academy of Agriculture, and a foreign Member of the US National Academy of Engineering. In recognition of his scientific achievements and significant contributions to hydrogeology, de Marsily was awarded the degree of Doctor of Science honoris causa from Flinders University (Australia), University of Neuchâtel (Switzerland), and University of Québec (Canada).
Bibliography de Marsily G (1986) Quantitative hydrogeology. Groundwater hydrology for engineers. Academic, New York, 440p Matheron G, de Marsily G (1980) Is transport in porous media always diffusive? A counterexample. Water Resour Res 16(5):901–917
Mathematical Geosciences Qiuming Cheng School of Earth Science and Engineering, Sun Yat-Sen University, Zhuhai, China State Key lab of Geological Processes and Mineral Resources, China University of Geosciences, Beijing, China
Definition
Mathematical geosciences or geomathematics (MG or GM) is an interdisciplinary field involving the application of mathematics, statistics, and informatics in the earth sciences. The International Association for Mathematical Geosciences (IAMG) is a unique international academic society with mathematical geosciences as part of its title, and its mission is to promote, worldwide, the advancement of mathematics, statistics, and informatics in the geosciences. However, the lack of a unified definition of mathematical geosciences or geomathematics (MG or GM) as an interdisciplinary field of natural science may lead to misunderstanding of the subject, or even to its not being treated as an independent discipline. This has, to some extent, affected the development of the field of mathematical geosciences. Since the late 1990s, the author of the current chapter has served IAMG in the capacities of council member, vice president, and president. Thus, the author has witnessed and contributed to the transformation of IAMG from Mathematical Geology to Mathematical Geosciences, as well as to various updates of IAMG-associated publications, including journals and conference proceedings. In 2014, in his IAMG president column in the newsletter, the author discussed a definition and the connotations of mathematical geosciences, as well as the contributions of mathematical geoscientists and the frontiers of mathematical geosciences. Mathematical geosciences was defined as an interdisciplinary scientific subject that integrates mathematics, computer science, and earth science; it is the study of the mathematical characteristics and processes of the Earth (and other planets) and the prediction and analysis of its resources and the evaluation of its environment.
Introduction The International Association for Mathematical Geosciences (IAMG) is a unique international association with the term of Mathematical Geosciences as its title and the mission of promoting application of mathematics, statistics, and informatics in earth science. IAMG was initially established with its title of International Association for Mathematical Geology in 1968 at the 23rd International Geological Congress
(IGC) and in affiliation to the International Union of Geological Sciences (IUGS) and the International Statistical Institute (ISI). In addition to Mathematical Geology, Geomathematics and Geostatistics were also suggested to be used as its name. Some books and series of monographs published by the IAMG members in the field of mathematical geology also used these names. For example, the books authored by Frits Agterberg and published by Springer use geomathematics as book titles: the book “Geomathematics: Mathematical Background and Geo-science Applications” in 1974 and the book “Geomathematics: Theoretical Foundations, Applications and Future Developments” in 2014. Since mathematical geology could be literally interpreted as a branch of geology, some government Natural Science Foundations also group mathematical geology as a subdiscipline of geology. With the continuous development in the field, the applications of mathematical theory and methods as well as relevant software systems are no longer limited to geology. They are also widely used in other fields of geosciences including but not limited to hydrology, geophysics, and geochemistry. To serve the community better and to represent the complete connotation of the discipline, during the General Assembly of IAMG at the 33rd IGC held in Oslo in 2008, the name of the Association was changed from International Association for Mathematical Geology to International Association for Mathematical Geosciences with the same abbreviation IAMG. Accordingly, it revised the names of journals, for example, the flagship journal “Mathematical Geology” was renamed as “Mathematical Geosciences.” Thus, the three journals of IAMG that time include Mathematical Geosciences, Computers & Geosciences, and Natural Resources Research. Since then, these journals and IAMG annual conference proceedings have published works with more variety of mathematical geoscience topics. The name change of the association must be important in broadening and expanding the application fields of mathematical geosciences. One example worth mentioning is that IAMG established an affiliation with the International Union of Geodesy and Geophysics (IUGG). In this way, the international alliance to which IAMG is affiliated has expanded to include the International Union of Geological Sciences (IUGS), the International Union of Geodesy and Geophysics (IUGG), and International Statistics Institute (ISI). These three international organizations are large and active union members of the International Science Council (ISC). They cover most of the disciplines of earth sciences, mathematics, and statistics. Establishing affiliations and actively engaging in activities with these Unions have broadened the IAMG’s international collaboration which in turn enhanced its international visibility and influence in the mainstream earth science community. Now IAMG fully represents a branch of geosciences rather than only geology.
It should be noted that Mathematical Geosciences (or Geomathematics), as a relatively young earth science discipline, still has not been widely accepted by the mainstream field of geosciences and is even often ignored. On the one hand, it may be due to the organization of the association which has relatively small number of registered members. On the other hand, it may be because the society does not have a clear subject definition and connotative boundaries, especially the unclear subject boundaries with other related disciplines. Without a clear definition of the Mathematical Geosciences subject, those who indeed work in the fields of mathematical and statistical geosciences are not considered as mathematical geoscientists rather as geodesists, geophysicists, etc., and their work and academic contributions are not regarded as mathematical geosciences. For example, the section of this chapter will introduce an internationally renowned scientist Professor Dan McKenzie of Cambridge University in the United Kingdom. He is a physicist and mathematician by education and training. In the 1960s, he combined mathematical thermodynamics with geoscience and systematically established thermal structural models for describing plate tectonics and mantle dynamics. Because of his pioneering and groundbreaking work in the formation of plate tectonics theory, Professor Dan McKenzie is praised as one of the four major contributors to the development of the modern plate tectonics theory (https://geolsoc.org.uk/ Plate-Tectonics/Chap1-Pioneers-of-Plate-Tectonics/DanMcKenzie). McKenzie’s work is well known in the fields of geophysics and tectonics, and he is also considered as a geophysicist rather than a mathematical geoscientist. Although several descriptive disciplinary definitions and terminologies about mathematical geosciences have been proposed in the literature, there is still a lack of systematic, inclusive, and relatively standard disciplinary definition. For example, mathematical geosciences are often referred to simply as the application of mathematical and statistical methods in geology (or in earth science), or the application of geological data analysis and quantitative prediction model development (Howarth 2017). The IAMG website once defines the mission of IAMG as: to promote the development and application of mathematics, statistics, and informatics in earth sciences. In the author’s view to define Mathematical Geosciences (MG) as a formal subject, or to simply define it as the application of mathematics, statistics, and informatics in the Earth sciences, is a fundamental question that has a vital impact on the development of the subject. For this reason, during his presidency of IAMG, the author of current chapter attempted to propose a definition of mathematical geosciences. It was published in as president forum on Newsletter (Issue 76–79) (Cheng 2014). Here I will elaborate on the definition of MG and give several examples to demonstrate the contributions MG made to geosciences.
What Is Mathematical Geoscience (MG)? As mentioned earlier, Vistelius (1962) gave an earlier definition of mathematical geology and used it as the name of the Association. Geostatistics is a remarkable new subject of mathematical geology developed by IAMG scholars. Now the geostatistical theory and methods have been used not only in earth sciences but also in many other scientific fields. Geostatistics was once refereed as the application of statistical methods in geology or other geosciences (e.g., Merriam 1970; McCammon 1975), and in many cases this simple definition seems to be still in use. As mentioned in previous section, the term Geomathematics has also been used by several authors including Agterberg (1974), who used the term as the title of his two books (Agterberg 1974, 2014). Since the IAMG changed its name from Mathematical Geology to Mathematical Geosciences in 2008, the term Mathematical Geosciences has often appeared in IAMG’s literature including books, conference proceedings, and journals. The difference between mathematical geology and mathematical geosciences is not only in terms but also in the connotation and scope of the subject. If mathematical geology is a branch of geology, mathematical geosciences should be an interdisciplinary subject of geological sciences, and it must include mathematical geology as one of the branches of mathematical geosciences. Compared with geology, geosciences also cover other related disciplines including but not limited to geochemistry, geophysics, geobiology, and hydrology. Mathematical geosciences should be a branch of geosciences parallel to other cross-disciplines in earth sciences such as geochemistry, geophysics, and geobiology, rather than a branch of geology (Fig. 1). In the author’s opinion, this distinction is very important for the development of the subject. For example, under the concept of mathematical geology, the subject is limited to the application of mathematics, statistics, and informatics in geology, but as a subject of mathematical geosciences, similar to geochemistry and geophysics, MG serves the entire geosciences. So, what should be the definition of mathematical geosciences or geomathematics? What role should mathematical geosciences play in the earth science family? What are the major contributions of mathematical geoscientists to the development of earth science? What are the frontiers of mathematical geosciences today? In the future, how will mathematical geosciences develop in the context of the rapid development of big data, artificial intelligence, and other new mathematical theories? These questions are of interest to mathematical geoscientists who must provide answers, especially for the younger generation of students and researchers in geosciences. The following is a summary of the author’s thoughts during the past years about the above questions related to mathematical geosciences.
Mathematical Geosciences, Fig. 1 Schematic diagrams showing the relationship between natural sciences and earth science (a) and how mathematical geosciences works (b)
To give a definition of mathematical geosciences, we might as well understand the situation of other related interdisciplinary subjects, such as geochemistry, geophysics, and geobiology. According to the descriptions in Collins English Dictionary, Geophysics is referred as “a science of studying the earth’s physical properties and the physical processes acting upon, above, and within the Earth.” Similarly, Geochemistry is a science that deals with the chemical composition of and chemical changes in the solid matter of the earth or a celestial body (Unabridged dictionary). Finally, Biogeosciences is referred as an interdisciplinary field of integrating geoscience and biological science: the study of the interaction of biological and geological processes (Unabridged dictionary). The above definitions emphasize on interdisciplinarity of intersection of basic natural sciences and earth sciences. Similarly, we can define mathematical geosciences as an interdisciplinary scientific subject that integrates mathematics, computer science, and earth science, and it is the study of the mathematical characteristics and processes of the earth (and other planets) and the prediction and analysis of its resources and evaluation of environment (Cheng 2014). The core issues of the new definition are the characteristics and processes of the earth. The concepts of chemical and physical properties of the earth, and chemical, physical, and biological processes of the earth are relatively easy to understand and so do the definitions of geochemistry, geophysics, and geobiology. But what are the mathematical characteristics, properties, and processes of the Earth? And why is the prediction and evaluation of the earth’s resources and environment inherently related to mathematics? These are key questions to answer to convince that mathematical geosciences can be regarded as an independent but interdisciplinary discipline which distinguishes it from other related disciplines. So, let’s first review the main differences among mathematics, physics, and chemistry. Most people started to learn these basic natural courses in elementary or middle schools. Chemistry is mainly about studies of composition, properties, structure, and change laws of matter at the molecular and atomic microscopic level; physics is mainly concerned with studies of physical properties and processes
of matter, including the most general laws of motion and basic structure of matter; and mathematics can be referred as studies of quantity, structure, change, space, information, etc. The integrations of these basic disciplines and earth sciences form various interdisciplinary geoscience subjects such as geochemistry, geophysics, geomathematics, and geobiology. Actual integration of multiple branches of basic sciences such as chemistry with earth science has formed various subdisciplines of geochemistry, including major element geochemistry, trace element geochemistry, stable isotope geochemistry, and isotope chronology. Various analytical and testing techniques developed in the field of chemistry have been adopted into geochemistry such as mass spectrometry, fluorescence analysis, and laser ablation imaging technique. Similarly, multiple branches of physics are integrated with earth sciences to form multiple geophysical directions, such as gravity, magnetism, electricity, semiology, radioactivity, superconductivity, and other geophysical directions. Accordingly, various types of physical detection and survey technologies have been developed and applied in geosciences, including gravitational measurement, nuclear magnetic resonance, magnetic measurement, seismic measurement and electrical measurement. Similarly, various mathematical disciplines such as geometry, calculus, power spectrum analysis, morphology, topology, probability and statistics, and fuzzy mathematics can provide indispensable theories and methods for quantitative studies of relevant properties of the Earth. The characteristics and properties of the Earth that must be studied by means of mathematics include but not limited to the geometric characteristics, dynamic characteristics, the uncertainty of earth observation and measurement, the prediction error of earth events, and so on. These important issues related to the Earth are all mathematical problems in nature which cannot be studied without mathematics. Therefore, the integration of these subdisciplines of mathematics with geoscience forms different directions of mathematical geosciences. For example, the integration of probabilistic statistics and geosciences produces results in a new subject of geostatistics and the integration of calculus and geosciences creates a subject of
mathematical geodynamics. Theories and methods developed in Geometry can be used in earth science research to solve various geometric problems of the Earth, such as the shape, coordinates, angle, length, volume, quantity, distance, and direction of the Earth. Many examples can be cited to demonstrate the effective utilization of various mathematical theories and methods in solving geoscientific problems. In the next section, some selected examples will be introduced to show how mathematical geoscientists have played an important role in earth science research, what significant contributions mathematical geoscientists have made in the development and advancement of modern earth science theories and in solving applied problems related to sustainable development, and how they will continue to play an important and irreplaceable role in the innovation of earth science in the future.

Mathematical Geosciences, Fig. 2 The mathematical model of the geometric shape of the earth as an ellipsoid
Examples of Important Contributions of Mathematical Geoscientists to Earth Science
There are many examples to show how mathematical geosciences and mathematical geoscientists have made indispensable and fundamental contributions to the development of modern earth sciences. Here we name just a few.

Earth Geometry
The geometric shape of the Earth is an important property of the Earth which serves as the basis for quantitative earth science. Mathematical models of the shape of the Earth, such as the Clarke ellipsoid and the Hayford ellipsoid, are the basic geodetic models for positioning and navigation systems (such as GPS and BDS) and other types of remote sensing technology (RS) for spatial measurement and spatial analysis. The shape of the Earth is irregular and complicated, but in order to establish an Earth coordinate system, a mathematical model needs to be used to approximately represent the shape of the Earth, which is the Earth ellipsoid (spheroid) (Fig. 2). The equation of the ellipsoid in the Cartesian coordinate system is:

\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1 \quad (1)
where a and b are the equatorial radii along the x and y axes, and c is the polar radius along the z axis. These three parameters determine the shape of the ellipsoid, including the flattening of the Earth ellipsoid. To elaborate on the relationship between the ellipsoid and the geoid is beyond the scope of this chapter. The point worth mentioning is that the shape of the Earth is indeed a mathematical feature of the Earth. As a matter of fact, the first to establish the Earth ellipsoid model and name it Geodesy was the British scholar Alexander Ross Clarke (1828–1914). He was an internationally renowned mathematician and geodesist. His important contributions
include the calculation of the British Geodetic Triangulation (1858), the calculation of the shape of the Earth (1858–1880), and the publication of a monograph on Geodesy (1880). This is an excellent example of a pioneering mathematical geoscientist modeling the geometrical properties of the Earth.

Age of the Earth
The age of the Earth is a curious question in earth science. Before the invention of isotope geochemistry there was no consistent answer about the age of the Earth. One of the well-known estimates of the age of the Earth was given by William Thomson, 1st Baron Kelvin (1824–1907), a British mathematician, mathematical physicist, and engineer. Kelvin estimated the age of the Earth by investigating the Earth's cooling and making historical inferences of the Earth's age from his calculations. Kelvin calculated how long it would have taken the Earth to cool if it had begun as a molten mass. Kelvin made several assumptions about the Earth to set up his conduction model for estimating the age of the Earth. These assumptions include that the Earth's heat was originally generated by gravitational energy; that over a short period of time, say 50,000 years, the Earth cooled from a temperature of about 3700 °C to the present surface temperature of about 0 °C; that since then, the average temperature of the Earth's surface has not changed significantly; and that the interior of the Earth is solid, with a secular loss of heat from the whole Earth through heat conduction. Kelvin established a heat conduction equation from which he derived an estimate of 20–40 million years for the age of the Earth. While his estimate was proved wrong by the discovery of radioactivity and of radioactive dating technology, which yields an estimate of 4.54 billion years, his heat conduction model of the Earth based on observations and calculations was sound, and similar conduction models have been applied in other fields of geodynamics, including in
simulating heat flow processes at mid-ocean ridges, as will be introduced in a later section of the chapter. William Thomson is also well known for his work in the mathematical analysis of electricity and the formulation of the first and second laws of thermodynamics. Kelvin's work is also relevant to the topic of the current chapter because the term "Mathematical Geology" was used, perhaps for the first time in the literature, as the title of a review paper introducing Kelvin's work in Nature by Lodge (1894).

Geodynamics of Plate Tectonics
The modern theory of plate tectonics, advanced in the 1960s, has brought a profound revolution in the field of earth science, providing scientific explanations of the evolution of the Earth system and of the formation and distribution of major geological events such as earthquakes, volcanoes, magmatism, and orogeny, as well as of mineral, water, and energy resources. It is believed that the theory of plate tectonics has completely revolutionized the global view of the solid earth and promoted the formation and development of the concept of the earth system sciences. The pioneers who have made important foundational contributions to modern plate tectonics theory include, but are not limited to, Dan McKenzie of Cambridge University in the United Kingdom, Jason Morgan of Princeton University in the United States, Professor Xavier Le Pichon of France, and Professor Tuzo Wilson of the University of Toronto in Canada. Among them, Dan McKenzie is an eminent geophysicist and geomathematician. Professor McKenzie received a scholarship in mathematics and entered the natural sciences at university at an early age. He systematically studied mathematics, physics, chemistry, and geology, and graduated with a doctorate at the age of 23. He creatively applied Euler's rotation theorem and calculus to describe the geodynamics of plate tectonics and achieved remarkable results which demonstrate the essential role of mathematics in the establishment of plate tectonics theory.

Mathematical Principles of Plate Tectonics
Dan Mekenzie’s most influential result was the paper he published in collaboration with Robert Parker (Mckenzie and Parker 1967). This paper uses Euler’s rotation theorem to prove the conjugate relationship of the three sides of rigid plates: mid-ocean ridges, subduction zones, and transform faults. Euler’s rotation theorem shows that a rigid body (such as a plate) is displaced in three-dimensional space, and at least one point inside it is fixed. The displacement of a rigid body is equivalent to a rotation around a fixed axis containing the fixed point (Fig. 3). This achievement is praised by colleagues as the mathematical principle that defines plate tectonics on a sphere, the most theoretical work of the plate tectonics.
Mathematical Model of Mantle Convection
Another influential paper by McKenzie (1967) presents results on heat flow and gravity anomalies at mid-ocean ridges. This model modified the seafloor spreading model established by the American scholar Harry Hess (1960) and established the mid-ocean ridge heat flow diffusion model and the mathematical model of mantle convection. This work laid the foundation for a series of geodynamic models subsequently developed in the field of plate tectonics, including the thermal structure of plates, subducting slab dynamics, and the dynamics of basin formation. Figure 4 shows the parameters and boundary conditions of the McKenzie model. The plates on both sides of the mid-ocean ridge are regarded as rigid bodies with regular boundaries, a constant spreading speed (V), and a gradually changing thickness (a), with boundary conditions of temperature T1 on the bottom boundary and temperature T0 on the top boundary of the plate (lithosphere). Under some simplifications, the heat flux can be expressed by the following diffusion equation:

\frac{\partial\,[\rho C_P(T)\,T]}{\partial t} = \frac{\partial}{\partial z}\left[k(T)\,\frac{\partial T}{\partial z}\right] \quad (2)
where ρ is the mass density of the lithosphere, T is temperature, C_P is the specific heat capacity, t is time, k is the thermal conductivity, and z is depth. The upper left picture in Fig. 4 shows the mid-ocean ridge heat flow diffusion model and the boundary and initial conditions established by McKenzie (1967). The upper right picture in Fig. 4 shows the comparison between the actual measured heat flow results and the theoretical prediction results using the McKenzie model. The difference between the two results is relatively large. Many authors believe that the difference is due to the heat loss caused by the convection of hydrothermal fluid near the mid-ocean ridge. Starting from the complexity and fractal structure of the mid-ocean ridge lithosphere, the author of this book chapter substituted the lithospheric density in the McKenzie model with the fractal density proposed by the author and re-established the diffusion model. The re-established diffusion model significantly improved the results for the prediction of heat flow (Cheng 2016). McKenzie and many later authors established basin formation models based on plate thermal diffusion models, slab thermal structure, subduction models, etc. Cheng (2016) and a subsequent series of results have demonstrated that classical mathematical models such as thermal diffusion are effective for smooth, gradual, and linear geological processes, but they are less effective in simulating singular, abrupt, and nonlinear geological processes and geological events. For the latter, the fractal density and the principle of singularity need to be used. This will be introduced in Sect. 3 of the chapter.
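For readers who wish to experiment with Eq. (2), the following sketch integrates its constant-property simplification, ∂T/∂t = κ ∂²T/∂z² with κ = k/(ρC_P), by an explicit finite-difference scheme between a cold top boundary T0 and a hot bottom boundary T1. It is a generic illustration of plate-cooling-type diffusion models, not a reproduction of the McKenzie (1967) solution or of the fractal-density modification; all numerical values are illustrative.

```python
import numpy as np

def cool_plate(T0=0.0, T1=1300.0, thickness_km=100.0, kappa=1e-6,
               nz=101, t_total_myr=50.0):
    """Explicit finite-difference solution of dT/dt = kappa * d2T/dz2.

    Constant-property simplification of Eq. (2); T0 and T1 are the top and
    bottom boundary temperatures (deg C), kappa the thermal diffusivity
    (m^2/s). Values are illustrative, not calibrated to any real plate.
    """
    seconds_per_myr = 3.15576e13
    dz = thickness_km * 1e3 / (nz - 1)
    dt = 0.4 * dz ** 2 / kappa                  # within the explicit stability limit
    n_steps = int(t_total_myr * seconds_per_myr / dt)

    T = np.full(nz, T1)                         # start hot, as at a ridge axis
    T[0] = T0                                   # cold surface
    for _ in range(n_steps):
        T[1:-1] += kappa * dt / dz ** 2 * (T[2:] - 2 * T[1:-1] + T[:-2])
        T[0], T[-1] = T0, T1                    # re-impose boundary conditions
    surface_gradient = (T[1] - T[0]) / dz       # proxy for surface heat flow (K/m)
    return T, surface_gradient

if __name__ == "__main__":
    T, grad = cool_plate()
    print("temperature at mid-depth (deg C):", round(float(T[len(T) // 2]), 1))
    print("surface gradient (K/km):", round(grad * 1e3, 2))
```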
Mathematical Geosciences, Fig. 3 (a) Euler geometry applied to plate tectonics; (b) explanation of the conjugate relationship among the mid-ocean ridge, subduction zone, and transformation fault (adapted from McKenzie and Parker 1967) according to Euler vector theorem; (c) map showing the conjugate relationship among mid-ocean ridge, subduction zone, and transform faults in the Pacific/ North America subduction zone. (Adapted from Johnson and Embley 1990)
Mineral Crystals Mineralogy is a basic subject of geological sciences. Chemical composition and crystal structure determine the physical and chemical properties of a mineral. Crystallography is a subject of study of the crystal structure of the atomic arrangement of minerals. At the microscopic atomic level, the crystal structure of a mineral can be expressed as a crystal lattice, the smallest unit of a crystal at the molecular level. Therefore, the geometric shape and mathematical properties of the crystal lattice are important parameters for determining the properties of minerals and for minerals classification. Mathematics has played an irreplaceable role in studying the crystal lattice.
Due to the significant contributions of scientists in the field of crystals, crystallography has become a new branch of mathematics. Mathematical crystallography is still a very active field of research today (Nespolo 2008). Mathematical System of Lattice and Crystals
There are many international distinguished scientists and mathematicians who work in the field of crystallography. Here just name a few examples: German mineralogist, physicist, and mathematician Christian Samuel Weiss (1780–1856) was a pioneer who made fundamental contribution to the modern crystallography. He established crystal
Mathematical Geosciences, Fig. 4 The mid-ocean ridge heat flow model by McKenzie. (Adapted from McKenzie 1967). (a) Model boundary conditions; (b) comparison of McKenzie model theoretical curve and
observed data, showing large deviations between the two; and (c) heatflow profile showing much improved agreement with a new model (by author) using fractal density (black line)
systems and a classification scheme based on crystallographic axes (Fig. 5). He established the Weiss crystal face indices, later also called the Miller indices, and their algebraic relationship (Weiss 1820), which has been termed Weiss's zone law and is still used today (Fig. 5). Another giant scientist whom I cannot fail to mention is the French physicist and statistician Auguste Bravais (1811–1863). He was not only a great physicist but also an outstanding statistician. From 1840 he taught a course in applied mathematics to students of the Department of Astronomy in the Faculty of Natural Sciences. Bravais published a paper on statistical correlation in 1844, introducing what later became known as the famous Pearson correlation coefficient. Today the Pearson correlation coefficient is still the most used statistical method to measure correlation in the natural sciences, social sciences, and economics. Bravais also studied observation error theory and published the results of his mathematical analysis of error probability in 1846. Of
course, the most known academic contribution of Bravais is the work on crystallography, especially the lattice system established in 1848 (Fig. 5). He mathematically proved that there were 14 unique lattices in three-dimensional crystalline systems, which were later called Bravais lattices. Crystal Lattice Symmetry and the Mathematical System of Mineral Classification
The physical and chemical properties of minerals are not only dependent on their chemical compositions but also constrained by their crystal structures. Crystal symmetry refers to the property of the repetition of the same part of a unit cell. It must be achieved through a certain symmetry operation. Crystal symmetry is a primary geometric characteristic of the crystal lattice that has been found as an essential attribute that affects the physical and chemical properties of minerals, such as the optical properties of minerals. There are three types of symmetry operations: rotation, reflection, and
Mathematical Geosciences, Fig. 5 Crystallographic systems. (a) Symmetry of crystal unit cells; (b) crystal systems; (c) Weiss or Miller indices
inversion. Each kind of symmetry can be expressed by a symmetry element, and various combinations of symmetry elements form multiple, but not independent, combination types. In 1830, the German mineralogist Johann Friedrich Christian Hessel used mathematical deduction to study the symmetry operations and mathematically proved that there are 32 independent crystal symmetries. Thus, mineral crystals should be divided into 32 groups (Bradley and Cracknell 2009). However, the 32 groups of minerals had not all been observed at that time, and this was gradually confirmed later. This indeed indicates that answers to complex questions can be predicted when proper mathematical models and laws are established.
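Weiss's zone law mentioned above states that a lattice direction [uvw] lies in a crystal face (hkl) exactly when hu + kv + lw = 0, and the zone axis shared by two faces can be obtained as the cross product of their Miller indices. The short sketch below illustrates both relations; the index values are arbitrary examples.

```python
import numpy as np

def in_zone(plane_hkl, direction_uvw):
    """Weiss zone law: direction [uvw] lies in plane (hkl) iff h*u + k*v + l*w == 0."""
    return int(np.dot(plane_hkl, direction_uvw)) == 0

def zone_axis(plane1_hkl, plane2_hkl):
    """Zone axis [uvw] shared by two crystal faces, given by the cross product of
    their Miller indices and reduced to the smallest integers."""
    uvw = np.cross(plane1_hkl, plane2_hkl)
    g = np.gcd.reduce(np.abs(uvw[uvw != 0])) if np.any(uvw) else 1
    return tuple(int(x) for x in uvw // g)

if __name__ == "__main__":
    print(in_zone((1, 1, 1), (1, -1, 0)))     # True: [1 -1 0] lies in (111)
    print(zone_axis((1, 0, 0), (0, 1, 0)))    # (0, 0, 1): the c axis
    print(zone_axis((1, 1, 0), (1, -1, 0)))   # (0, 0, -1); sign depends on plane order
```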
Topology is the branch of mathematics that studies the properties of geometric objects that are preserved under continuous deformation. Simple topological relations include orientational relations (such as inside and outside, up and down, left and right), connectivity, directionality, and so on. Topological theories and methods are widely used in many fields of natural science (Richeson 2008). For example, the concept of topology has been used in geographic information systems (GIS) for organizing spatial data in topological data models and for spatial topological relationship analysis. The capability of handling topological relations for spatial information analysis distinguishes GIS from other computer graphics and image processing systems.
Sedimentology: Mathematical Index of Sediment Grain Size Classification
Sedimentology is a basic geological discipline for studying sediments, including exploring the origin, transport, deposition, and diagenetic alteration of the materials that compose sediments and sedimentary rocks. The grain size of sediments and its distribution can reflect the lithofacies and paleogeographic environment as well as other mechanisms, such as transportation, sorting, suspension, and deposition. The grain size may also affect other properties of sediments such as porosity, permeability, load intensity, and chemical reactivity. Therefore, grain size analysis is a primary task in sedimentology. According to the grain size of the sediment, sedimentary rocks are classified as mudstone (…)
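A mathematical grain-size index widely used for such classification is the Krumbein phi scale, φ = −log₂(d/d₀) with d₀ = 1 mm, under which the conventional class boundaries become integer φ values. The sketch below converts grain diameters to φ and assigns a Wentworth-style textural class; the class boundaries used are the conventional ones and are given purely for illustration.

```python
import math

def phi(diameter_mm, d0_mm=1.0):
    """Krumbein phi grain-size index: phi = -log2(d / d0)."""
    return -math.log2(diameter_mm / d0_mm)

def wentworth_class(diameter_mm):
    """Illustrative textural class from the phi value (Wentworth-style boundaries)."""
    p = phi(diameter_mm)
    if p < -1:
        return "gravel"
    if p < 4:
        return "sand"
    if p < 8:
        return "silt"
    return "clay (mud)"

if __name__ == "__main__":
    for d in (4.0, 0.5, 0.01, 0.001):          # grain diameters in mm
        print(f"d = {d} mm -> phi = {phi(d):.2f} -> {wentworth_class(d)}")
```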
Mathematical Geosciences, Fig. 6 (a) … (2s) based on normal or log-normal distributions; (b) pattern delineated according to varying elemental concentrations in a mineral district depicts self-similarity (adapted from Cheng et al. 1994); and (c) the multifractal model (adapted from Cheng 2007a, 2015a)
prediction (Wang et al. 1990), and nonlinear methods for mineral resources prediction on the basis of fractal singularity theory (Cheng 2007b, 2008). In recent years, exploration for mineral, water, and energy resources has become the forefront of geoscience due to the shift from traditional areas for shallow and low-cost resources to nontraditional regions such as deep earth, remote areas, covered areas, deep ocean, and deep space for strategic resources including critical minerals. Here are just a few examples to show the recent activities of resources assessments in nontraditional areas, and USGS
conducted a statistical assessment of oil and gas resources in the Arctic in 2009. The results of the study show that about 30% of the world’s undiscovered gas and 13% of the world’s undiscovered oil may exist in Arctic (Gautier et al. 2009). In 2018, Japanese scholars reported discoveries of considerable amounts of rare-earth elements and yttrium (REY) resources in deep sea mud in the western North Pacific Ocean (Takaya et al. 2018). Italian scientists claimed they found evidence of liquid water in a subglacial lake with a diameter of 20 km below the ice of the South Polar Layered Deposits on Mars
(Orosei et al. 2018). For the past decade, the author of this chapter and his research team have been focusing on the quantitative prediction and evaluation of mineral resources in regions covered by transported materials in several provinces of China. Figure 7 shows a successful example of mapping mineral potential for Pb/Zn/Cu and other metals in the Lanping-Jinding area of Yunnan Province. Stream sediment geochemical data (Fig. 7a) were processed using the Spectrum-Area (S-A) fractal filtering method (Cheng 2012a). Use of the S-A method could effectively separate weak anomalies from variable background signals (Cheng and Xu 1998; Cheng et al. 2000). The S-A model successfully delineated mineralization-induced anomalies by eliminating
the influence of complex background, especially the interference from basalt containing high concentrations of metals. Five copper-lead-zinc prospecting target areas with equal spatial intervals conforming to self-similar distributions have been delineated (Fig. 7b). Based on the above findings, the Geological Survey Institute affiliated with the Yunnan Geological Survey carried out further exploration and successfully discovered several mineral deposits in the target areas. The same method was further utilized by the Yunnan Geological Survey for mineral prospecting in the entire Sanjiang area, and more mineral deposits were discovered in the delineated target areas.
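The essential steps of the S-A method can be sketched generically as follows: compute the power spectrum of the gridded geochemical map, measure the area A(>S) of the Fourier plane above each spectral-density threshold S, identify straight-line segments on the log A versus log S plot, and use the corresponding thresholds to define filters that reconstruct background, anomaly, and noise components separately. The Python sketch below follows this outline with synthetic data and arbitrarily chosen thresholds; it is an illustration of the idea rather than the implementation used in the Lanping-Jinding study.

```python
import numpy as np

def spectrum_area(grid, n_thresholds=30):
    """log-log S-A relation: for each spectral-density threshold S, the 'area'
    A(>S) is the number of Fourier cells whose power spectral density exceeds S."""
    psd = np.abs(np.fft.fft2(grid)) ** 2
    thresholds = np.logspace(np.log10(psd[psd > 0].min()),
                             np.log10(psd.max()), n_thresholds)
    areas = np.array([(psd > s).sum() for s in thresholds])
    return thresholds, areas

def sa_component(grid, s_low, s_high):
    """Reconstruct the map from Fourier components whose spectral density lies in
    (s_low, s_high]; in the S-A method such bands, chosen from straight-line
    segments of the log A - log S plot, separate background, anomaly, and noise."""
    F = np.fft.fft2(grid)
    psd = np.abs(F) ** 2
    mask = (psd > s_low) & (psd <= s_high)
    return np.real(np.fft.ifft2(np.where(mask, F, 0.0)))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Synthetic 'geochemical map': smooth regional background + a local spike + noise.
    x, y = np.meshgrid(np.linspace(0, 1, 128), np.linspace(0, 1, 128))
    grid = 50 + 20 * np.sin(2 * np.pi * x) + rng.normal(0, 2, x.shape)
    grid[60:64, 60:64] += 40                   # a small local 'anomaly'
    S, A = spectrum_area(grid)
    print("first log-log S-A points:",
          np.round(np.log10(S[:3]), 2), np.round(np.log10(A[:3]), 2))
    # The thresholds below are arbitrary stand-ins for values read off fitted segments.
    psd = np.abs(np.fft.fft2(grid)) ** 2
    anomaly_map = sa_component(grid, np.percentile(psd, 90), np.percentile(psd, 99.9))
    print("range of the reconstructed component:",
          round(float(anomaly_map.min()), 1), round(float(anomaly_map.max()), 1))
```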
Mathematical Geosciences, Fig. 7 Application of spectral energy density – area method (S-A) to Zn anomaly and related background analyses in the Lanping-Jinding area in Yunnan Province. (a) Distribution of Zn anomalies; and (b) newly identified Cu-Pb-Zn prospecting areas
Frontiers of Mathematical Geosciences Some contributions of mathematical geoscientists to related fields of geosciences have been introduced in the previous section. Next, we will briefly discuss the frontiers of mathematical geosciences, aiming to answer the question are mathematical geoscientists at the earth science frontiers? To answer the question, one approach is to summarize current scientific research in mathematical geosciences, and the other is to study the frontiers of geosciences and the strategic programs of some large international scientific organizations, such as the Future Earth 2025 vision, 2021 Strategic Challenge Domains of ISC, the Resourcing Future Generations (RFG) (Lambert et al. 2013) and Deep-time Digital Earth (DDE) (Cheng et al. 2020) programs led by International Union of Geological Sciences (IUGS), the newly released “Earth in Time, A Vision for NSF Earth Sciences” 2020–2030 by National Academies of Sciences, Engineering and Medicine (2020), US Geological Survey -Century Science Strategy for 2020–2030 (USGS 2021), and the new 2050 Science Framework: Exploring Earth by Scientific Ocean Drilling proposed by the International Ocean Discovery Program (IODP) (Koppers and Coggon 2020). These strategic reports and big science programs have revealed the developing trend of geosciences in next decade from different perspectives. Furthermore, the author has maintained close communication with many international associations through international programs and meetings, including ISC-CSP, IUGG, AGU, EGU, SGA, PDAC, and CGU. Some key topics that can be extracted from above sources of information can reflect the current trends and frontiers of the earth sciences, such as data science, data analysis, computing, interdiscipline, uncertainty of measurement and prediction, geometry and dynamics of the Earth, climate change and the Arctic, Antarctic, and Tibet Plateau. It also reflects the three major challenges faced by geoscientists in understanding the interaction between the earth system and the human system, understanding the past, present, and future changes of the livable earth’s environment and predicting extreme events on the Earth: (1) The complexity of multilayer interaction of earth system; (2) the chaotic nature of earth processes and the singularity and predictability of extreme earth events; and (3) the effectiveness of observation and monitoring of multiscale nonlinear mixing processes. These challenges are closely related to mathematical geosciences. Innovations on new mathematical theories, models, and computer software are the prerequisite for dealing with these challenges. Therefore, mathematical geoscientists, confronting great challenges and opportunities, are at the earth science frontiers. Many important progresses have been made by mathematical geoscientists in the earth science frontiers and two examples will be detailed as follows: quantitative simulation of
geocomplexity and applications of big data and artificial intelligence (AI). Modelling Geocomplexity of the Earth: A New Growing Field of Mathematical Geosciences Understanding and simulating the complexity and chaotic behaviors of the earth systems is a long-term task for scientists. However, there is no unique scientific definition of complexity of the earth system. Some authors regard modeling geocomplexity as a new kind of science (Wolfarm 2002; Turcotte 2006). These authors emphasize that the geocomplexity refers to nonlinear geological properties that cannot be described or characterized by classical mathematical calculus equations. Simple geological processes can be modeled by means of classical mathematical equations with proper boundary conditions and initial conditions. The temporal and spatial distribution of geological processes can be deterministically predicted, for example, the heat flow at the plate margin over the mid-ocean ridges can be estimated by setting up a diffusion model assuming the plate near the mid-ocean ridge as a regular rigid body, its upper and lower surface temperatures are used as boundary conditions, and the mid-ocean ridge is set as the initial condition. Under several simplifications the analytical or numerical solutions of the established thermal diffusion equation can be derived to estimate the heat flux at any position of the plate. However, many complex geological processes cannot be precisely modeled by the above classical differential equations. For example, it is impossible to precisely describe turbulence by deterministic mathematical model due to statistical uncertainty. In order to study such processes and phenomena, American mathematicians and meteorologists Lorenz (1963) proposed Chaos theory and French American mathematician Mandelbrot (1967) proposed fractal theory in 1960s. For a chaotic system or chaotic process, the output of prediction model behaves very differently if a small disturbance in the initial conditions or boundary conditions. Chaotic systems often appear random and unpredictable, but in fact have regular patterns with scale-invariant properties or fractality (Valentine et al. 2002). Fractal geometry is a new branch of mathematics that describes geometry from the perspective of fractional dimensions, including new geometries between traditional points, lines, surfaces, and volumes (Lovejoy et al. 2009). Fractal geometry is irregular geometry or patterns such as Mandelbrot set and Menger sponge with scale invariant property or self-similarity. Obviously, the concept of fractal geometry has greatly generalized the scope of traditional geometry which provides new effective mathematical tools for studying geocomplexity (Mandelbrot 1967). In recent years, fractal geometry theory and fractal models have been used in almost all disciplines of earth sciences. Just to name a few examples, frequency–size power law models were utilized to
characterize earthquake magnitude-number frequency relationship (Gutenberg and Richter 1944), area-number of plates (Sornette and Pisarenko 2003), plate boundary morphology distribution (Ouillon et al. 1996), magmatic rock areafrequency distribution (Pelletier 1999), volcanic eruption interval distribution (Cannavò and Nunnari 2016), deposit size-number distribution (Agterberg 1995), mineralized alteration zone area-perimeter distribution (Cheng 1995), and geochemical anomalies concentration and area distribution (Cheng 2007b). With the development of fractal geometry and its application in various fields, the concept of fractal and fractal theory have also been further developed. For example, the proposal of multifractal theory (Mandelbrot 1977) has extended fractals from fractal geometry to self-similar distribution of measures or patterns defined on geometric supports (it can also be fractal geometry) which marks an important leap-forward development of fractal theory (Mandelbrot 1989). Multifractal model can be considered as a new distribution model unifying normal, lognormal, and Pareto distributions. Theory of multifractals provides not only a new way to describe complex spatial patterns with self-similarity but also a basis for further development of fractal calculus and fractal dynamics system (Lovejoy 1991). Since geometric set theory is the foundation of mathematical calculus including integral and differential operations. Only when the mathematical system of geometric measures and calculus operations is established can geometry play a greater role in mathematics. How to build mathematical calculus operations and analysis of other physical quantities related to fractals such as specific gravity, density, and thermal conductivity. Establish calculus based on fractal geometry is a unsolved question which has not been systematically explored in the literature. At the present time, the research progress in this area is still very slow, and more efforts especially by young scholars are encouraged. The author has been working on the above problems for a long time and first discovered that the concentration values of ore elements in mineralized rocks may follow the multifractal distribution with self-similarity between average concentration levels (density) and areas of mineralized rocks. Further a fractal model (C-A model) was derived to associate the average concentration values and the irregular areas (fractals) delineated with the concentration values around concentration anomalies. This model shows that the concentration (density) of elements on a fractal depicts power law relations between density and area with a negative fractional exponent, implying the concentration within mineralized areas depicts singularity – infinity large of concentration vs infinity small area. The author termed this type of density as “fractal density,” the density defined on fractals or a density with fractal dimension (Fig. 8a). Accordingly proper mathematical model and the local singularity calculation method have been developed to describe fractal density caused by nonlinear
geological processes (earthquakes, mineralization, magmatic events, volcanic events, etc.). Various extreme events (earthquakes, volcanoes, magmatism, mineralization, tsunamis, landslides, etc.) caused by sudden changes in geological systems were studied to explore their common physical mechanisms of formation (Fig. 8b, c) and the singularity of the system outputs. The main types of mechanism that explain most of these extreme events include, but are not limited to, phase transition mechanisms (supercritical fluids, the Moho discontinuity, subduction slab interfaces, interaction surfaces between deep magma and surface water, etc.), self-organized criticality (lithospheric delamination, rock fracture, slab fragmentation, etc.), and multiplicative cascading cyclic processes, dissipative structures, and chaotic processes (turbidity currents, magma convection, mantle flow, metamorphism, etc.) (Fig. 8b, c) (Cheng 2007b, c, 2008, 2018a). The author refers to such extreme geological events as singular geological events, or events with singularity. The fractal density concept and the local singularity analysis (LSA) method developed from it have been successfully applied to the quantitative simulation and prediction of a variety of extreme geological events (Cheng 2007b). These include the identification of geochemical anomalies caused by concealed ore bodies in covered areas and the delineation of prospecting target areas (Cheng 2007b, 2015a), a seismic probability density–magnitude model (Cheng and Sun 2018), magma flare-up events due to the collision of the Indian and Eurasian continents (Cheng 2018b), anomalous heat flow over the mid-ocean ridges (Cheng 2016), large-scale episodic magmatic activity and the prediction of the future evolution of plate tectonics (Cheng 2018b, c), and the singularity of lithospheric phase transformation (Cheng 2018a).

Big Data Analytics and Complex Artificial Intelligence
Big data and artificial intelligence (AI) have significantly influenced our lives and have gradually become research hotspots in many fields of natural science, social science, and economics. However, scientists from different fields may have different opinions about the impact of big data and AI on their research fields. Mathematical geoscientists are both contributors to the development of data science and users who take full advantage of big data and AI technology for solving practical problems. One good question for mathematical geoscientists is how to further develop big data analytics and AI technology to enhance these applications. AI and machine learning (ML) are based on mathematical models, algorithms, and computer performance. Some common ML techniques include linear regression, logistic regression, artificial neural networks, naïve Bayes, weights of evidence, random forests, AdaBoost, and principal component analysis. In addition, some traditional mathematical methods such as Monte Carlo
Mathematical Geosciences, Fig. 8 Concepts of fractal density and its mechanisms and possible geological processes that may result in fractal density. (a) Normal density and fractal density; (b) examples of geological processes that may result in fractal density; and (c) mechanisms of fractal density generation. (The figure describing self-organized criticality was adapted from Wiesenfeld et al. 1989.) (Panel (a) contrasts normal density, mass per unit volume of a 3D cube, with fractal density, mass per unit fractal measure of a 2.73D sponge; panel (b) shows mineralization, earthquakes, and heat flow; panel (c) illustrates phase transition (PT), self-organized criticality (SOC), and multiplicative cascade or chaotic processes (MCP).)
simulation and least-squares fitting are often used in ML. For example, the algorithm behind AlphaGo combined Monte Carlo simulation with value and policy networks (Silver et al. 2016). The abovementioned ML methods are common in mathematical geosciences research. Over the past few decades, these methods have demonstrated great potential for problem solving in mathematical geosciences, particularly for mineral potential mapping and natural resource assessment (Agterberg 1989; Bonham-Carter 1994). In addition, mathematical geoscientists have developed many new models, methods, and software packages that have enriched and extended the capabilities of AI. For example, various variants of the weights-of-evidence model based on the Bayesian principle have been established, ranging from the original
model that requires complete conditional independence of the predictors (naïve Bayes, or ordinary weights of evidence) (Agterberg 1989), to a new model based on predictors with weak conditional independence (AdaBoost weights of evidence) (Cheng 2008, 2012b, 2015b), and then to a modified model that does not require conditional independence at all (weighted weights of evidence) (Journel 2002; Cheng 2008; Zhang et al. 2009; Agterberg 2011). With the rapid development of deep learning, research in the field of machine learning using artificial neural network (ANN) models has increased dramatically in recent years. However, even though the current ANN can theoretically approximate any Lebesgue-integrable function according to the universal
approximation theorem, it is not capable at present of approximating complex non-integrable functions, owing to singularity, non-integrability, and non-differentiability. It often requires a large number of neurons and more layers (depth) to approximate complex functions. This of course increases the number of model parameters, which in turn requires more training data and computational power; otherwise, the results would be of low stability and even divergent, leading to lower repeatability and precision. How to develop ANN models for solving the fractal-density-related problems caused by singular extreme geological events remains a question that requires further investigation. This kind of AI should probably be defined as complex artificial intelligence or local-precision artificial intelligence.
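To make the weights-of-evidence idea mentioned above concrete, the original model reduces to simple Bayesian bookkeeping on a 2x2 contingency table between one binary evidence layer and the known deposits. The following minimal Python sketch is only an illustration; the function name and the example counts are hypothetical and are not taken from this entry.

```python
import math

def weights_of_evidence(n_dep_in, n_dep_out, n_bkg_in, n_bkg_out):
    """Compute W+, W-, and contrast C for one binary evidence layer.

    n_dep_in  : deposit cells inside the evidence pattern
    n_dep_out : deposit cells outside the pattern
    n_bkg_in  : non-deposit cells inside the pattern
    n_bkg_out : non-deposit cells outside the pattern
    """
    # Conditional probabilities of the evidence given deposit / non-deposit
    p_b_d = n_dep_in / (n_dep_in + n_dep_out)
    p_b_nd = n_bkg_in / (n_bkg_in + n_bkg_out)
    p_nb_d = n_dep_out / (n_dep_in + n_dep_out)
    p_nb_nd = n_bkg_out / (n_bkg_in + n_bkg_out)

    w_plus = math.log(p_b_d / p_b_nd)     # weight inside the pattern
    w_minus = math.log(p_nb_d / p_nb_nd)  # weight outside the pattern
    contrast = w_plus - w_minus           # strength of spatial association
    return w_plus, w_minus, contrast

# Hypothetical counts: 60 of 80 deposits fall on the anomaly, which covers
# 2,000 of 10,000 non-deposit cells.
print(weights_of_evidence(60, 20, 2000, 8000))
```

Extensions such as AdaBoost or weighted weights of evidence modify how several such layers are combined when conditional independence does not hold.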
Conclusions
Mathematical geosciences is an interdisciplinary discipline that studies earth characteristics and geological processes and provides quantitative prediction and evaluation of resources and the environment, including disasters. Together with other geoscience disciplines, MG plays an indispensable role in studying the evolution of the livable earth system, in predicting and evaluating various extreme geological events, and in serving the sustainable development of society. Mathematical geoscientists have made, and will continue to make, important contributions to the establishment of the modern earth science system and to the development of innovative theories and methods that meet the major needs of mankind. Mathematical geoscientists are already at the forefront of the earth sciences. It is necessary to publicize the scientific content of mathematical geosciences comprehensively and correctly, to motivate the interest of school and university students in integrating mathematics and geosciences, and to encourage and cultivate more young talents to devote themselves to MG. The earth science frontiers of the twenty-first century, with the support of big data and artificial intelligence, require that earth science move from simple to complex, from local to global, from description to mechanism, from qualitative to quantitative, and from modeling to prediction. Mathematical geosciences is an emerging discipline full of hope and promise.
Bibliography Agterberg F (1974) Geomathematics, mathematical background and geo-science application. Elsevier, Amsterdam Agterberg F (1989) Computer programs for mineral exploration. Science 245(4913):76–81 Agterberg F (1995) Multifractal modeling of the sizes and grades of giant and supergiant deposits. Int Geol Rev 37(1):1–8
Mathematical Geosciences Agterberg F (2011) A modified weights-of-evidence method for regional mineral resource estimation. Nat Resour Res 20(2):95–101 Agterberg F (2014) Geomathematics: theoretical foundations, applications and future developments. Springer, Cham Ahrens L (1953) A fundamental law of geochemistry. Nature 172(4390): 1148–1148 Aubrey K (1954) Frequency distribution of the concentrations of elements in rocks. Nature 174(4420):141–142 Bonham-Carter G (1994) Geographic information systems for geoscientists-modeling with GIS. Comput Methods Geosci 13:398 Bradley C, Cracknell A (2009) The mathematical theory of symmetry in solids: representation theory for point groups and space groups. Oxford University Press, Oxford Cannavò F, Nunnari G (2016) On a possible unified scaling law for volcanic eruption durations. Sci Rep-UK 6(1):22289 Cheng Q (1995) The perimeter-area fractal model and its application to geology. Math Geol 27(1):69–82 Cheng Q (2007a) Multifractal imaging filtering and decomposition methods in space, Fourier frequency, and Eigen domains. Nonlinear Proc Geophys 14(3):293–303 Cheng Q (2007b) Mapping singularities with stream sediment geochemical data for prediction of undiscovered mineral deposits in Gejiu, Yunnan Province, China. Ore Geol Rev 32(1/2):314–324 Cheng Q (2007c) Singular mineralization processes and mineral resources quantitative prediction: new theories and methods. Earth Sci Front 14(5):42–53. (In Chinese with English Abstract) Cheng Q (2008) Non-linear theory and power-law models for information integration and mineral resources quantitative assessments. Math Geosci 40(5):503–532 Cheng Q (2012a) Singularity theory and methods for mapping geochemical anomalies caused by buried sources and for predicting undiscovered mineral deposits in covered areas. J Geochem Explor 122:55–70 Cheng Q (2012b) Application of a newly developed boost Weights of Evidence model (BoostWofE) for mineral resources quantitative assessments. J Jilin Univ 42(6):1976–1985 Cheng Q (2014) Generalized binomial multiplicative cascade processes and asymmetrical multifractal distributions. Nonlinear Proc Geoph 21(2):477–487 Cheng Q (2015a) Multifractal interpolation method for spatial data with singularities. J S Afr Min Metall 115(3):235–240 Cheng Q (2015b) BoostWofE: a new sequential weights of evidence model reducing the effect of conditional dependency. Math Geosci 47(5):591–621 Cheng Q (2016) Fractal density and singularity analysis of heat flow over ocean ridges. Sci Rep-UK 6(1):19167 Cheng Q (2018a) Mathematical geosciences: local singularity analysis of nonlinear earth processes and extreme geo-events. In: Handbook of mathematical geosciences. Springer, Cham Cheng Q (2018b) Singularity analysis of magmatic flare-ups caused by India-Asia collisions. J Geochem Explor 189:25–31 Cheng Q (2018c) Extrapolations of secular trends in magmatic intensity and mantle cooling: implications for future evolution of plate tectonics. Gondwana Res 63:268–273 Cheng Q, Sun H (2018) Variation of singularity of earthquake-size distribution with respect to tectonic regime. Geosci Front 9(2): 453–458 Cheng Q, Xu Y (1998) Geophysical data processing and interpreting and for mineral potential mapping in GIS environment. In: Proceedings of the fourth annual conference of the international association for mathematical geology. De Frede Editore, Napoli, Italy, pp 394–399 Cheng Q, Agterberg F, Ballantyne S (1994) The separation of geochemical anomalies from background by fractal methods. 
J Geochem Explor 51(2):109–130
Mathematical Geosciences Cheng Q, Xu Y, Grunsky E (2000) Integrated spatial and spectrum method for geochemical anomaly separation. Nat Resour Res 9(1): 43–52 Cheng Q, Oberhänsli R, Zhao M (2020) A new international initiative for facilitating data – driven Earth science transformation. Geol Soc Lond, Spec Publ 499(1):225–240 Clarke FW (1920) The Data of Geochemistry, 4th Edition, U. S. Geological Survey, Washington, D. C. pp. 832 Gautier D, Bird K, Charpentier R, Grantz A, Houseknecht D, Klett T, Moore T, Pitman J, Schenk C, Schuenemeyer J, Sørensen K, Tennyson M, Valin Z, Wandrey C (2009) Assessment of undiscovered oil and gas in the Arctic. Science 324(5931): 1175–1179 Gutenberg B, Richter C (1944) Frequency of earthquakes in California. B Seismol Soc Am 34(4):185–188 Hess H (1960) The evolution of ocean basins. Department of Geology, Princeton University, Princeton Howarth R (2017) Dictionary of mathematical geosciences: with historical notes. Springer, Cham Johnson HP, Embley RW (1990) Axial seamount: an active ridge axis volcano on the central Juan de Fuca Ridge. J Geophys Res-Solid Earth 95:12689–12696 Journel A (2002) Combining knowledge from diverse sources: an alternative to traditional data independence hypotheses. Math Geol 34(5): 573–596 Koppers A, Coggon R (2020) Exploring earth by scientific ocean drilling: 2050 science framework. UC San Diego Library Digital Collections, La Jolla Krumbein W (1934) Size frequency distributions of sediments. J Sediment Res 4(2):65–77 Lambert I, Durrheim R, Godoy M, Kota K, Leahy P, Ludden J, Nickless E, Oberhaensli R, Wang A, Williams N (2013) Resourcing future generations: a proposed new IUGS initiative. Episodes 36(2): 82–86 Lodge O (1894) Mathematical geology. Nature 50:289–293 Lorenz E (1963) Deterministic nonperiodic flow. J Atmos Sci 20(2): 130–141 Lovejoy W (1991) A survey of algorithmic methods for partially observed Markov decision processes. Ann Oper Res 28(1): 47–65 Lovejoy S, Agterberg F, Carsteanu A, Cheng Q, Davidsen J, Gaonac’h H, Gupta V, L’Heureux I, Liu W, Morris S, Sharma S, Shcherbakov R, Tarquis A, Turcotte D, Uritsky V (2009) Nonlinear geophysics: why we need it. EOS Trans Am Geophys Union 90(48): 455–456 Mandelbrot B (1967) How long is the coast of Britain? Statistical selfsimilarity and fractional dimension. Science 156(3775):636–638 Mandelbrot B (1977) Fractals: form, chance and dimension. W.H. Freeman, San Francisco Mandelbrot B (1989) Multifractal measures, especially for the geophysicist. Fractals in geophysics. Birkhäuser, Basel, pp 5–42 Mason B (1992) Victor Moritz Goldschmidt: father of modern geochemistry. Geochemical Society, Washington, DC McCammon R (1975) Concepts in geostatistics. Springer, Cham Mckenzie D (1967) Some remarks on heat flow and gravity anomalies. J Geophys Res 72(24):6261–6273 Mckenzie D, Parker R (1967) The North Pacific: an example of tectonics on a sphere. Nature 216(5122):1276–1280 Merriam D (1970) Geostatistics: A Colloquium (Computer Applications in the Earth Sciences). Plenum Press, New York, pp. 177 National Academies of Sciences, Engineering and Medicine (2020) A vision for NSF Earth Sciences 2020–2030: Earth in time. The National Academies Press, Washington, DC
817 Nespolo M (2008) Does mathematical crystallography still have a role in the XXI century? Acta Cryst A64(1):96–111 Orosei R, Lauro S, Pettinelli E, Cicchetti A, Coradini M, Cosciotti B, Paolo F, Flamini E, Mattei E, Pajola M, Soldovieri F, Cartacci M, Cassenti F, Frigeri A, Giuppi S, Martufi R, Masdea A, Mitri G, Nenna C, Noschese R, Restano M, Seu R (2018) Radar evidence of subglacial liquid water on Mars. Science 361(6401):490–493 Ouillon G, Castaing C, Sornette D (1996) Hierarchical geometry of faulting. J Geophys Res-Solid Earth 101(B3):5477–5487 Pelletier J (1999) Statistical self-similarity of magmatism and volcanism. J Geophys Res-Solid Earth 104(B7):15425–15438 Richeson D (2008) Euler’s Gem: the polyhedron formula and the birth of topology. Princeton University Press, Princeton Silver D, Huang A, Maddison C, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S, Grewe D, Nham J, Kalchbrenner N, Sutskever I, Lillicrap T, Leach M, Kavukcuoglu K, Graepel T, Hassabis D (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489 Singer D, Menzie W (2010) Quantitative mineral resource assessments: an integrated approach. Oxford University Press, Oxford Sornette D, Pisarenko V (2003) Fractal plate tectonics. Geophys Res Lett 30(3):1105 Takaya Y, Yasukawa K, Kawasaki T, Fujinaga K, Ohta J, Usu Y, Nakamura K, Kimura J, Chang Q, Hamada M, Dodbiba G, Nozaki T, Iijima K, Morisawa T, Kuwahara T, Ishida Y, Ichimura T, Kitazume M, Fujita T, Kato Y (2018) The tremendous potential of deep-sea mud as a source of rare-earth elements. Sci Rep-UK 8:5763. https://doi.org/10.1038/s41598-018-23,948-5 Turcotte D (2006) Modeling geocomplexity: “a new kind of science”. Special papers-Geological Society of America 413. p 39 Udden J (1898) The mechanical composition of wind deposits. Lutheran Augustana Book Concern, Lindsborg United States Geological Survey (2021) Geological survey 21st-century science strategy 2020–2030. Geological Survey, Reston Valentine G, Zhang D, Robinson B (2002) Modeling complex, nonlinear geological processes. Annu Rev Earth Planet Sci 30(1):35–64 Vistelius A (1962) Problems of mathematical geology: a contribution to the history of the problem. Geol Geophys 12(7):3–9 Wang S, Cheng Q, Fan J (1990) Integrated evaluation methods of information on gold resources. Jilin Science and Technology Press, Changchun. (In Chinese) Weiss C (1820) Über die theorie des epidotsystems. Abhandlungen der Königlichen Akademie der Wissenschaften zu Berlin pp 242–269 Wentworth C (1922) A scale of grade and class terms for clastic sediments. J Geol 30(5):377–392 Wiesenfeld K, Chao T, Bak P (1989) A physicist’s sandbox. J Stat Phys 54(5):1441–1458 Wolfram S (2002) A new kind of science: Champaign. Wolfram Media, Champaign Zhang S, Cheng Q, Zhang S, Xia Q (2009) Weighted weights of evidence and stepwise weights of evidence and their application in Sn–Cu mineral potential mapping in Gejiu, Yunnan Province, China. Earth Sci-J China Univ Geosci 34:281–286. (in Chinese with English abstract) Zhao P (1998) Geological anomaly and mineral prediction: modern theory and methods for mineral resources evaluation. The Geological Publishing House, Beijing Zhao P (2002) “Three-component” quantitative resource prediction and assessments: theory and practice of digital mineral prospecting. Earth Sci-J China Univ Geosci 27(5):482–489. (In Chinese with English Abstract)
Mathematical Minerals John H. Doveton Kansas Geological Survey, University of Kansas, Lawrence, KS, USA
Definition Mathematical minerals are calculated from chemical or physical measurements as theoretical realizations of actual mineral assemblages in rocks. The concept was first introduced as “normative” minerals in igneous rocks, where oxide analyses were assigned to minerals in a protocol that follows crystallization history. While there is only a general match with observed “modal” minerals, the intent of the procedure is to create hypothetical mineral assemblages which can then be compared between igneous rocks. The concept of normative minerals was extended to sedimentary rocks. However, in this case, the mathematical procedures used attempt to match as closely as possible the volumes of mineral species observed. In the most widely used modern application, mathematical mineral compositions are estimated in subsurface rocks from downhole measurements of physical properties or elements.
Normative Minerals in Igneous Rocks The Cross-Iddings-Pirsson-Washington (CIPW) norm has been a standard method for many years to transform oxides measured in igneous rocks to theoretical “normative minerals” (Cross et al. 1902). These norms represent hypothetically the minerals that would crystallize if the igneous source was cooled at low pressure under perfect dry equilibrium conditions. The procedure is governed by simplifying assumptions regarding both the order of mineral formation and phase relationships in the melt. These constraints result in a simple model that deviates from the course of naturally occurring igneous mineral differentiation. However, norms are useful for comparison and classification of igneous rocks, particularly for glassy and finely crystalline samples. Normative minerals differ from “modal” minerals observed in the rock, whose volumes are estimated typically from pointcounting procedures applied to thin sections.
Normative Minerals in Sedimentary Rocks As applied to sedimentary rocks, the calculation of normative minerals does not generally follow a genetic model, but simply aims to estimate mineral assemblages that are a close match with those observed. The earliest methods drew on
molecular ratios to calculate the mineral components of a rock, using oxide percentages from analysis (Krumbein and Pettijohn 1938). The suite of minerals to be solved was established from rock sample observation. A stepwise process was then applied to assign the molecular ratios to the mineral set. Unique associations between oxides and certain minerals were resolved first in this procedure, and then the remainder allocated between other components. Although estimates of standard rock-forming minerals such as quartz or calcite are generally close to volumes observed visually, clay mineral estimation is often problematic when matched with results from X-ray diffraction analysis. The discrepancy is also not surprising because clay-mineral compositions are highly variable, as a result of isomorphous substitution (Imbrie and Poldervaart 1959).
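As an illustration of the stepwise allocation just described, the sketch below assigns a few oxide analyses (weight percent) to a deliberately tiny, hypothetical mineral set. The mineral choices, the order of assignment, and the assumption that carbonate CO2 is available in excess are simplifications for illustration only, not a published norm.

```python
# Molecular weights (g/mol) of the oxides used in this toy allocation
MW = {"SiO2": 60.08, "CaO": 56.08, "MgO": 40.30}

def toy_norm(oxides_wt_pct):
    """Allocate oxide wt% to a tiny mineral set: dolomite, calcite, quartz.

    Follows the stepwise logic described in the text: uniquely associated
    oxides are assigned first, the remainder allocated afterwards.
    Returns mole proportions of the normative minerals.
    """
    moles = {ox: oxides_wt_pct.get(ox, 0.0) / MW[ox] for ox in MW}

    # Step 1: MgO is assumed to reside only in dolomite, CaMg(CO3)2,
    # which consumes an equal molar amount of CaO.
    dolomite = min(moles["MgO"], moles["CaO"])
    moles["CaO"] -= dolomite

    # Step 2: remaining CaO goes to calcite, CaCO3.
    calcite = moles["CaO"]

    # Step 3: all SiO2 left over is counted as quartz.
    quartz = moles["SiO2"]

    return {"dolomite": dolomite, "calcite": calcite, "quartz": quartz}

print(toy_norm({"SiO2": 20.0, "CaO": 35.0, "MgO": 8.0}))
```

A realistic sedimentary norm would add feldspars and clays and would be calibrated against X-ray diffraction results, as noted above.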
Mineral Compositions Computed from Wireline Petrophysical Logs
The calculation of mineral volumes in the subsurface was originally a byproduct of methods to improve estimates of porosity in multimineral reservoir rocks. Lithologies composed of several minerals required several porosity logs to be run in combination in order to estimate volumetric porosity. In the simplest solution model, the proportions of multiple components together with porosity could be estimated from a set of simultaneous equations for the measured log responses. The earliest application of this technique was in Permian carbonate reservoirs in West Texas, where measurements from density, neutron, and sonic logs were used to solve for volumes of dolomite, anhydrite, gypsum, and porosity (Savre 1963). Compositional analysis by the inverse solution is simple and robust. The unity equation dictates that computed proportions must collectively sum to unity. However, individual proportions can be negative or take values greater than unity. These unreasonable numbers are not caused by estimation error, but simply reflect the location of the sample outside the bounds of the composition space as set by the endmember mineral coordinates. Such results can be used as informal feedback that may suggest the presence of unaccounted minerals or a need to adjust mineral endmember properties. By the same token, a satisfactory solution does not "prove" a composition, but is rather a solution that is consistent with the measurements. The inversion model can be written as a set of simultaneous equations where the number of knowns and unknowns are matched in a "determined system." However, when the number of logs is not sufficient for a unique solution, then the result is an underdetermined system. Alternatively, if the number of logs equals or exceeds the number of components, this becomes an overdetermined system. Both
underdetermined and overdetermined systems can be resolved by an expansion of the matrix algebra formulation of the determined system solution (Doveton 1994). Modern compositional analysis methods have advanced beyond basic inversion models, with the incorporation of tool error functions and constraints as integral components of the solution strategy. In particular, optimal solutions were set with the goal of minimizing the incoherence between log measurements and their values predicted from compositional analysis (Mayer and Sibbit 1980). Not all component log responses need to be estimated since their properties are restricted to a limited range. However, some mineral components have ambiguous properties. The most obvious example of such components is the clay minerals, whose compositions are widely variable. Calibration to clay minerals in local core is the best practice to reduce estimation uncertainties (Quirein et al. 1986).
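The inverse solution described above amounts to a small linear system in which each log contributes one equation and the unity condition adds another. A minimal sketch is given below for a hypothetical three-component system (dolomite, anhydrite, porosity) read by density and neutron logs; the endmember responses and the measured values are illustrative placeholders, not calibrated constants.

```python
import numpy as np

# Columns: dolomite, anhydrite, porosity (fluid).
# Rows: density log (g/cm3), neutron porosity log (fraction), unity equation.
response = np.array([
    [2.87, 2.98, 1.00],   # bulk density of each endmember (placeholder values)
    [0.02, -0.01, 1.00],  # apparent neutron porosity of each endmember
    [1.00, 1.00, 1.00],   # proportions must sum to one
])

measured = np.array([2.60, 0.17, 1.00])  # density, neutron, unity at one depth

# Determined system: as many equations as unknowns -> direct solve.
proportions = np.linalg.solve(response, measured)
print(dict(zip(["dolomite", "anhydrite", "porosity"], proportions.round(3))))

# With more logs than components the system is overdetermined, and a
# least-squares solution minimizes the incoherence between measured and
# reconstructed responses, e.g. np.linalg.lstsq(response, measured, rcond=None).
```

Negative proportions or values above unity from such a solve signal, as noted in the text, that the sample lies outside the composition space defined by the chosen endmembers.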
Estimation of Minerals in the Subsurface from Geochemical Logging The use of conventional logs for compositional analysis is implicit in the sense that physical properties are used to deduce mineral proportions. In contrast, elements measured by geochemical logs can be tied directly to mineral compositions. Natural gamma-ray spectroscopy measurements provide potassium, uranium, and thorium while neutron capture spectroscopy supplies silicon, calcium, iron, sulfur, titanium, and the rare earth, gadolinium. While not measured directly, magnesium is generally inferred from the photoelectric factor. Because there will generally be more minerals than elements to resolve them, the situation is mostly underdetermined. But in reality, most sedimentary rocks are dominated by only ten minerals: quartz, two carbonates, three feldspars, and four clays. Consequently, acceptable compositional solutions can be created with relatively small assemblages of minerals, as long as they have been identified appropriately and that the compositions used are consistent and reasonably accurate (Herron 1988). The solution is aided when some of the elements are associated explicitly with specific minerals. The partition of minerals between species then follows an ordered protocol of assignments. However, the application of simultaneous equations to resolve mineral compositions from elemental measurements provides a more general and effective procedure (Harvey et al. 1990). As a result of the explicit relations between elemental measures and mineral compositions, much improved mineral transforms have been developed than those of inference from mineral physical properties. The fundamental goal of an effective transform is to provide a close match between normative mineral solutions and modal mineral suites. This aim distinguishes it from the traditional norm of igneous rocks
which is based on hypothetical crystallization and only loosely tied to modal observations (Cross et al. 1902). However, a unique mineral transform from element concentrations may not always be possible in situations of compositional collinearity. If the model is exactly collinear, then there is an infinite range of solutions, creating a matrix singularity and an inversion breakdown (Harvey et al. 1990). In recent years, methods to resolve subsurface mineralogy estimation have increasingly relied on machine learning algorithms, building on pioneering work that applied simple artificial neural networks (Doveton 2014).
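The collinearity problem mentioned above can be checked directly on the matrix of assumed mineral compositions before any inversion is attempted. In the sketch below the mineral set and the element weight fractions are invented for illustration; only the rank and condition-number test itself is the point.

```python
import numpy as np

# Rows: elements (Si, Ca, Mg); columns: minerals (quartz, calcite, dolomite).
# Entries are approximate element weight fractions of each pure mineral and
# serve only to illustrate the test, not as reference values.
composition = np.array([
    [0.467, 0.000, 0.000],  # Si
    [0.000, 0.400, 0.217],  # Ca
    [0.000, 0.000, 0.132],  # Mg
])

rank = np.linalg.matrix_rank(composition)
cond = np.linalg.cond(composition)
print(rank, cond)

# rank < number of minerals signals exact collinearity (a singular matrix and
# an inversion breakdown); a very large condition number warns that the
# transform, although formally invertible, will amplify measurement noise.
```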
Summary Mathematical minerals are calculated from chemical or physical measurements of a rock sample as contrasted with actual minerals that are observed. The earliest methods were developed in igneous petrology, where “normative” minerals are computed from oxide analyses, based on a hypothetical and idealized protocol that mimics the order of mineral differentiation. These norms are only a general approximation of “modal” minerals, but are useful for classification purposes. By contrast, normative minerals in sedimentary rocks are computed to be a best match with observed mineral suites. The most common applications today occur in the subsurface, where downhole petrophysical measurements are used to estimate mineralogy by a variety of statistical procedures. The old terminology of “normative minerals” has been mostly replaced by the term of “mathematical minerals.”
Cross-References ▶ Artificial Neural Network ▶ Compositional Data ▶ Forward and Inverse Stratigraphic Models ▶ Machine Learning ▶ Porosity ▶ Statistical Rock Physics
Bibliography Cross W, Iddings JP, Pirsson LV, Washington HS (1902) A quantitative chemico-mineralogical classification and nomenclature of igneous rocks. J Geol 10:555–690 Doveton JH (1994) Geological log analysis using computer methods: computer applications in geology, No. 2. American Association of Petroleum Geologists, Tulsa, 169 pp Doveton JH (2014) Principles of mathematical petrophysics. Oxford University Press, 267 pp Harvey PK, Bristow JF, Lovell MA (1990) Mineral transforms and downhole geochemical measurements. Sci Drill 1(4):163–176
820 Herron MM (1988) Geochemical classification of terrigenous sands and shales from core or log data. J Sediment Petrol 58(5):820–829 Imbrie J, Poldervaart A (1959) Mineral compositions calculated from chemical analyses of sedimentary rocks. J Sediment Petrol 29(4): 588–595 Krumbein WC, Pettijohn FJ (1938) Manual of sedimentary petrography. Appleton-Century-Crofts, New York, 549 pp Mayer C, Sibbit A (1980) GLOBAL: a new approach to computerprocessed log interpretation. SPE Paper 9341, 12 pp Quirein J, Kimminau S, Lavigne J, Singer J, Wendel F (1986) A coherent framework for developing and applying multiple formation evaluation models, paper DD. In: 27th annual logging symposium transactions. Society of Professional Well Log Analysts, 16 pp Savre WC (1963) Determination of a more accurate porosity and mineral composition in complex lithologies with the use of the sonic, neutron, and density surveys. J Pet Technol 15:945–959
Mathematical Morphology Jean Serra Centre de morphologie mathématique, Ecole des Mines ParisTech, Paris, France
Definition
Mathematical morphology is a discipline of image analysis that was introduced in the mid-1960s. From the very start it included not only theoretical results but also research on the links between geometrical textures and physical properties of the objects under study. Initially conceived as a set theory, with deterministic and random versions, it rapidly developed its concepts in the framework of complete lattices. In geosciences and in image processing, theoretical approaches may or may not rest on the idea of reversibility. In the first case, addition is the core operation; it leads to convolution, wavelets, Fourier transformation, etc. But the visual universe can be grasped differently, by remarking that the objects are solid and not transparent. An object may cover another, or be included in it, or hit it, or be smaller than it, etc., notions which all refer to set geometry and which all imply some loss of reversibility. Indeed, the morphological operations, whose core is the order relation, always lose information, and the main issue of the method consists in managing this loss.
Introduction
We will begin with a comment. The petrographic example of Fig. 4 is directly a matter of geosciences, and the other examples can easily be transposed. The method for detecting the changes of direction in a piece of oak (Fig. 6) also applies to problems of tectonics, and the way to estimate the roughness of a road
before and after wear (Fig. 7) can serve to quantify the erosion process after an orogeny. Mathematical morphology, which covers both a theory and a practice, appeared in the mid-1960s with the pioneering works of G. Matheron and J. Serra. They aimed at the description of petrographic structures seen under the microscope (porous media, crystals, alloys, etc.), in order to link them to their physical properties (Matheron and Serra 2002). Objects were considered as sets that were described either by transforming them (e.g., granulometries) or by modeling them in a probabilistic way (e.g., Boolean closed sets). We present here only the first approach, which was extended during the 1970s to numerical functions under the impulse of medical imaging (Serra 1982). During the 1980s, as morphological operations came to act on more and more varied objects, the need for unification brought out the concept of complete lattice as a common framework (see Serra 1988 in the article "Morphological Filtering"). This algebraic structure lies at the basis of lattice theory, a branch of algebra introduced by the end of the nineteenth century by mathematicians such as E.H. Moore, to whom we owe the crucial notion of a set-theoretical closing, and G. Birkhoff, whose fundamental book has constantly been republished from 1940 until the present time. Algebraists have much studied the structure of lattices, but have nevertheless given scant attention to the concepts of complete lattice and of lattices of operators: for example, in Birkhoff's book, only one (short) chapter among 17 is devoted to them. Though more recent works written by theoretical computer scientists tackle complete lattices and basic operators such as dilations and openings, the idea of describing geometrical features by means of lattices never appears in these studies. The chief notions on which Mathematical Morphology rests, such as morphological filters, segmentation by connections, flat operators, the watershed, levelings, etc., are totally absent from them. In fact, the common logical basis of the lattice has been used in mathematical morphology as a starting point for exploring new directions. Since the 1990s, following the works by H. Heijmans, G. Matheron, Ch. Ronse, and J. Serra, among others (Serra 1988; Heijmans and Ronse 1990; Heijmans 1994; Matheron 1996), the theoretical core of mathematical morphology has been developing in an autonomous way. Here we base ourselves on these latter references. A comprehensive bibliography for general morphology is given in Najman and Talbot (2010); for older mathematical references see Kiselman (2020) and Ronse and Serra (2010). Notation: The generic symbol for a complete lattice is ℒ, with some curvilinear variants for the most usual lattices: P(E) for the family of subsets of the set E, D(E) for that of partitions of E, F for the lattice of functions, among others. More generally, curvilinear capital letters designate families of
sets, such as ℬ for that of invariants of an opening, or a connection C on P(E). Elements of a lattice are generally denoted by lowercase letters; however, we will use capital letters for the elements of the lattices P(E) and T^E (respectively of parts of E and of numerical functions E → T), in order to distinguish them from the elements of E (points) or of T (numerical values), denoted by lowercase letters. Operators acting on lattices are denoted by lowercase Greek letters, of which, in particular, δ, ε, γ, and φ for a dilation, an erosion, an opening, and a closing, respectively. There are some exceptions, for example, id for the identity.
Complete Lattices Partially Ordered Sets Order relation: Provide the set ℒ with a binary relation that satisfies the following properties: x x ðreflexivityÞ, x y and y x imply x ¼ y ðanti-symmetryÞ, x y and y z imply x z ðtransitivityÞ, for all x, y, z ℒ. Then the relation is called a partial order and the set ℒ is partially ordered (in short, p.o.) by relation . The order becomes total when: 8x, y ℒ, x y or y x:
Examples (a) The set N of all numbers of the open interval [0, 1] is totally ordered for the usual numerical order. The set of all points of Rd equipped with the relation “(x1, x2, . . ., xd) (y1, y2, . . ., yd) when xi yi for all i” is partially ordered. (b) The set P(E) of all subsets of a set E is partially ordered for the inclusion order . Let us remark that unlike the set N of the previous example, P(E) admits a greatest element, E itself, and a least one, namely the empty set 0. Duality: Denote by (ℒ, ) the set ℒ provided with the order relation , and define the relation by x y if and only if y x. This relation is an order on ℒ, and (ℒ, ) is the dual ordered set of (ℒ, ). To each definition, each statement, etc., relative to (ℒ, ), there corresponds a dual notion in (ℒ, ), which is obtained by inverting and . This duality principle, which seems to be obvious, plays a fundamental role in mathematical morphology. When an operation has been introduced, it suggests to look also at the dual version, then at the product of both, etc. (for instance: opening and closing).
Complete Lattices The theory of mathematical morphology rests on the structure of complete lattice. Let ℒ be a p.o. set, and K ℒ . An element a ℒ is called a lower bound of K if a x for all x K , whether a belongs to K or not. If the family of lower bounds of K admits a greatest element a0, it defines the infimum of K . By duality, one introduces the two notions of upper bound and supremum. The infimum (resp., the supremum) of a subset K , if it exists, is unique, and is denoted by inf K or ^K (resp., sup K or _K ). If xi ℒ for a (possibly infinite) family I of indexes i, one writes ^i I xi for ^{xi | i I}, and _i I xi for _{xi | i I}. Lattice: A p.o. set ℒ is a lattice when any nonempty finite subset of ℒ admits an infimum and a supremum. The lattice is said to be complete when this property remains true for all nonempty subsets of ℒ, whether they are finite or not. By definition, every complete lattice ℒ has a least element 0 ¼ ^ ℒ and a greatest element 1 ¼ _ ℒ. These two extreme elements are the universal bounds of ℒ. A subset M of the complete lattice ℒ constitutes a complete sublattice of ℒ when the infimum and the supremum of any family in M belong to M, and when M contains the universal bounds of ℒ. Anamorphosis: The term “anamorphosis” appears in painting during the Renaissance, to designate geometrical distortions of figures in the Euclidean plane R2 that preserve the order of P (R2), so that they permit to find back the initial image from its deformation. But this kind of operation appears in many other lattices. So, the map x 7! log x, for x 0, is an anamorphosis from Rþ ! R, which extends to numerical functions on E. Formally speaking, it can be defined as follows (Matheron 1996): Let ℒ, M be two complete lattices. A map α : ℒ ! M is an anamorphosis, when α is a bijection, and when α and its inverse α1 preserve the order, that is: 8x, y ℒ, x y if and only if aðxÞ aðyÞ
ð1Þ
The most popular morphological operators on gray tone functions are the so-called flat operators, because they commute under anamorphosis. A map ℒ ! ℒ which coincides with its own inverse (α1 ¼ α), like a reflection, is called an inversion, or an involution, of ℒ. Remarkable Elements and Families We now present some remarkable elements or particular subsets that can be found in complete lattices: Sup-generating family: Let ℒ be a complete lattice, and S ℒ a family in ℒ. The class S is sup-generating when every element a ℒ is the supremum of the elements of S smaller than itself:
M
a = ∨{x ∈ S | x ≤ a}.
(resp. infimum) the sequential monotone limit x ¼ _i I xi (resp. x ¼ ^i I xi). One then writes xi " x (resp. xi # x). An increasing map c : ℒ ! M between the complete lattices ℒ and M is "-continuous when: xi " x in ℒ ) cðxi Þ " cðxÞ in M, and #-continuous when: xi # x in ℒ ) cðxi Þ # cðxÞ in M: The continuity of an Euclidean map is the touchstone of its digitability. In R2 for example, one can cover any set X by small squares of sides ai centered at the vertices of a grid, of union Xi. If map c is increasing and #-continuous, then the digital transforms of the Xi tend toward that of the Euclidean set X as ai ! 0 (Heijmans and Serra 1992; Matheron 1996). Boolean Lattices G. Boole modeled the logic of propositions by an algebra with two binary operations, namely disjunction and conjunction, and a unary operation, namely negation, which have the usual property of distributivity. His name was thus given to the corresponding class of lattices: Boolean Lattice A distributive and complemented lattice, complete or not, is said to be Boolean. In distributive lattices, each element has at most one complement, and in the Boolean case it is unique. This results in the complementation operation on that type of lattice; just as negation in logic, the complementation is an involution: Theorem Boolean complementation In a Boolean lattice ℒ, the operation of complementation x 7! x is an involution of ℒ: 8x, y ℒ, x
¼ x and ½x y , ½x y : We then have De Morgan’s law: for any finite family xi(i I) in ℒ, we have
^ xi
iI
¼ _ x i and iI
_ xi
iI
¼ ^ x i , iI
and when the lattice ℒ is complete, these two equalities remain valid for the infinite families.
Examples of Lattices Lattices of Sets This section is devoted to a few lattices whose elements are subsets of a given arbitrary set E.
Lattices of P(E) type: The family P(E) of all subsets of E, ordered by inclusion, constitutes a complete lattice, where the supremum and the infimum are given by the union and the intersection. This lattice is distributive. Moreover, for all X P ðEÞ, the set Xc of those points of E that do not belong to X satisfies the two conditions X \ X c ¼ 0 and X [ Xc ¼ E; hence Xc turns out to be a complement of X, and it is unique. Therefore P (E) is Boolean, hence infinitely distributive. The points of E are co-prime atoms, and they form a supgenerating family of P (E). One can wonder which ones of all these nice properties suffices to characterize a lattice of P (E) type. Any finite Boolean lattice is of this type, but it is not always true in the infinite case (see the lattice of the regular closed sets described below). There exist several characterizations of the lattice P(E), we give here two only. Theorem Let ℒ be a complete lattice. Each of the following two properties is equivalent to the fact that ℒ is isomorphic to P (E) for some set E: 1. ℒ is co-prime and complemented (Matheron 1996, p. 179). 2. ℒ is Boolean and atomistic (Heijmans 1994). Moreover, the set E is uniquely determined (up to an isomorphism): by the co-primes in (1), and by the atoms in (2). Lattice of convex sets: The set-theoretical lattices are far from being reduced to P (E) and its sublattices. Here is an example where the order is still the inclusion and the infimum the intersection, but where the supremum is no longer the union. Remember that a set X Rn is convex when for all pairs of points x, y X the segment [x, y] is contained in X (the definition includes the singletons and the empty set). The class of convex sets is closed under intersection, and partially ordered for inclusion, which induces a complete lattice having the intersection for infimum. For the supremum, we must consider the intersection of all convex sets that contain X R2, that is, its convex hull co(X). The supremum of the family Xi is then given by the convex hull of their union: _ Xi ¼ co [ Xi :
iI
iI
This lattice is atomistic, but neither complemented, nor distributive. Lattice of regular closed set: A typical example of Boolean complete lattice is given by the set, ordered by inclusion, of all regular closed set of a topological space, namely of
those sets equal to the closure of their interior: F ¼ Fo . The operations of supremum, of infimum, and of complementation take here the following form: _ Fi ¼ [ F i ¼
iI
iI
∘
[ Fi
iI
, ^ Fi ¼ iI
∘
\ Fi
iI
:
This complete lattice is Boolean, and we have Fc ¼ ðF∘ Þc. Regular sets are the only ones which can correctly be approximated by regular grids of points. In geometrical modeling used for image synthesis, objects are represented by regular closed sets, and the Boolean operations on these objects take the above form. For example, in R3 a closed cube (i.e., that contains its frontier) is a regular closed set, and given two cubes that intersect on a face or on an edge, the infimum of the two will be empty, as their intersection has an empty interior. Lattices of Numerical Functions Complete chain: One calls a complete chain any complete lattice which is totally ordered, as typically the completed Euclidean line R ¼ R [ f1, þ1g , or its discrete version Z ¼ Z [ f1, þ1g. The sets Rþ and Z þ restrictions of the previous ones to numbers 0, are also complete chains. The three chains R, Rþ , and Z þ are isomorphic, but Z and Z þ are not. In R the two operations of numerical supremum and infimum are continuous for the ordering topology. When the distinction between the various types of complete chains is superfluous, one usually takes the symbol T for representing them in a generic manner. E E Lattices R and Z of numerical functions: Let E an E arbitrary set. Equip the class R of real functions F : E ! R with the partial order: F G if for all x E, F(x) G(x). This order induces a complete lattice where the supremum and infimum are given by: G ¼ _ Fi , 8x E, GðxÞ ¼ sup Fi ðxÞ, iI
iI
H ¼ ^ Fi , 8x E, H ðxÞ ¼ inf Fi ðxÞ: iI
ð2Þ
iI
This is nothing but a power of the lattice R , that is, a product of lattice R by itself |E| times, hence the expression E of power lattice, and the notation R . The pulse functions ux,t defined by: ux,t ðyÞ ¼ t if y ¼ x, ux,t ðyÞ ¼ 1 if x 6¼ y, of parameters x E and t R, are co-primes elements but E not atoms, and they constitute a sup-generating family of R ; E the same notion applies to Z by taking the pulses ux,t for
t Z. Both lattices R and Z are infinitely distributive, but not complemented. Note in passing that we must start from the completed line R , which is a complete chain (or from one of its closed sublattices Rþ , [0, 1], Z, Zþ , etc.), and not from the line R, for the latter is not a complete chain. Indeed, equation (2) implies that G(x) might be equal to +1, or that H(x) might be equal to 1, even when all Fi(x) are finite. The lattice P (E) of sets, ordered by inclusion, is isomorphic to lattice 2E of all binary functions E ! {0, 1}, equipped with the numerical ordering. Lattices of the Lipschitz functions: The class of all numerical functions is too comprehensive for modeling images of the physical world, and has too many pathological cases unsuitable for discrete approximations. On the other hand, the class of all continuous functions does not form a complete lattice: for example, the functions xn, n 0, taken between 0 and 1 are continuous, whereas y ¼ ^n 0xn is not. The upper (or lower) semi-continuous functions on Rn do form a lattice, but it is not closed under subtraction, and its supremum, namely the topological closure of the usual supremum, is not pointwise. In addition, these functions, which may have no minimum, are inadequate candidates for operations such as the watershed, for example. We must find something else. When the starting space E is metric of distance d, the functions F : E ! R that are the most convenient for morphological operators are undoubtedly the Lipschitz functions, or their extensions under anamorphoses. The Lipschitz functions of module o satisfy the inequality: 8x, y E, jFðxÞ FðyÞj o d,
ð3Þ
2. For each module o, the F o functions form a complete E
3.
4.
5.
6.
7.
8.
9.
10.
sublattice of the lattice R of all numerical functions on R, hence with pointwise sup and inf. Each F o is a compact space for both topologies of pointwise convergence, and of uniform convergence, identical here. When the space E is affine, each lattice F o is closed under any (non-necessarily flat) dilation or erosion which is invariant under translation, and these operations are continuous. For example, if E ¼ Zn or Rn, and if B stands for the unit ball, then for F F o, Beucher’s gradient 12 ½ðF BÞ ðF BÞ F o . If {K(x)|x E} is a family of variable compact structuring elements, whose Hausdorff distance h satisfies the inequality h[K(x), K(y)] d(x, y), then F o is closed under flat dilation and erosion according to the K(x), and these operations are continuous, The two above properties extend to suprema, to infima, and to finite composition products of dilation and erosion. If g(dy) is a measure such that E|g(dy)| 1, then the lattice F o is closed under convolution by g, and this operation is continuous. When E and R are sampled by means of regular grids, then the previous operations can be arbitrarily approximated by their digital versions (a consequence of the continuities). Since each F o is a compact space, one can define probabilities on it, and generalize the theory of the random sets. The finite products of lattices F o still satisfy all above properties (e.g., multispectral images).
The class F o of these functions has three basic properties; for every module o: 1. The constant functions belong to F o 2. a R and F F o imply that a + F F o 3. F F o implies that F F o
Lattice of Partitions Now here is a lattice whose objects are no longer sets of points, or numerical functions, but cuttings of space.
Conversely (Matheron 1996), every complete sublattice ℒ E of R satisfying these three properties and supF ℒ|F(y) F(x)| < + 1 for all x, y E, must be Lipschitz for some metric on E. When we state the incredible series of properties that the F o do satisfy, one wonders by which miracle they fit so well with the morphological requirements (Serra 1992; Matheron 1996):
Definition Partition Let E be an arbitrary space. A partition of E is a family P of subsets of E, called classes, which are:
Equivalently, the partition corresponds to a map D : E ! P ðEÞ that satisfies the following two conditions: • for all x E, x D(x) • for all x, y E, DðxÞ \ DðyÞ 6¼ 0 ) DðxÞ ¼ DðyÞ
1. If function F F o is finite at one point x E, then it is finite everywhere.
1. Non-empty: 0 P 2. Mutually disjoint: 8X, Y P, X 6¼ Y ) X \ Y ¼ 0 3. Whose union covers E : [ P ¼ E
D(x) is called the class of the partition in x. Partitions intervene in image segmentation (see article “Morphological Filtering”). To segment a graytone or color image, one partitions its space of definition into zones which are homogeneous in some sense. Therefore, it should be useful to handle these partitions just as we do with sets or with functions. The set D (E) of partitions of E is partially ordered by refinement: we say that partition Df is finer than partition Dg, or that Dg is coarser than Df, and we write Df Dg, when each class of Df is included in a class of Dg: Df Dg , 8x E, Df ðxÞ Dg ðxÞ: The set D(E), provided with this order is a complete lattice; the greatest element (the coarsest partition) is the universal partition D1 whose unique class is E, whereas the least element (the finest partition) is the identity partition D0 whose classes are all singletons: D1 ðxÞ ¼ E and D0 ðxÞ ¼ fxg:
The class at point x of the infimum of a family {Di|i I} of partitions is nothing but the intersection of the classes Di(x): 8x E,
^ Di ðxÞ ¼ \ Di ðxÞ:
iI
iI
The supremum of partitions Di(i I) is less straightforward. Formally, it is the finest partition D such that Di(x) D(x) for all i I and x E, see Fig. 1. The lattice D (E) of partitions of E is not distributive. However, it is complemented and atomistic, the atoms being those partitions where exactly one class is a pair {x1, x2}, all other classes being singletons. Lattices of Operators We just modeled objects and cuttings. But we can as well represent as lattices most of the families of morphological operations, which is a major achievement of Mathematical Morphology. In the set-theoretical case, for example, the family A of all maps from P (E) into itself is ordered by the following relation between elements α, β of A: ab
,
8 X P ðEÞ,
aðXÞ bðXÞ:
This induces the following complete lattice with obvious supremum: a ¼ _ ai iI
,
8X P ðEÞ,
aðXÞ ¼ [ ai ðXÞ, iI
Mathematical Morphology, Fig. 1 Supremum of two partitions
and simili modo for the infimum. The universal bounds are the constant operators X 7! E and X 7! 0. The lattice A is indeed P (E) at power P (E); it is therefore Boolean and atomistic. Duality by complementation: A new duality appears here, different from that induced by the order. As P (E) is complemented, we can associate with any element a A its dual α* for the complementation by means of the relation
8x E,
8X P ðEÞ,
a ðXÞ ¼ ½aðXc Þc :
ð4Þ
Operator α* has all properties dual (in the sense of the order) of those of α; in particular a1 a2 , a 1 a 2 ,
_ ai
iI
¼ ^ a i : iI
Note that (α ) ¼ α, that is, an operator is the dual of its dual. We can now generalize and consider the set O of all operators ℒ ! ℒ for an arbitrary complete lattice ℒ. This set is ordered like A and has a similar structure of complete lattice. We are thus able to consider operators having the various properties defined below (increasingness, extensivity, closing, etc.) on classes of objects, for example, functions and partitions, as soon as the latter are themselves structured in complete lattices. Many lattices are not isomorphic to their dual in the sense of the order, and therefore do not admit an inversion, for example: the lattice of convex sets, that of partitions, etc. It means that the duality by complementation cannot be extended to this type of lattice. In practice, in these lattices,
dual operations, for example, opening and closing, will look very dissimilar. Duality by inversion: In the lattices of numerical funcE tions E ! R , for each real constant t, the map N : R ! E R : F 7! t F is an inversion. The same applies if we take the interval [a, b] instead of R, provided that we put t ¼ a + b. That allows us to establish a duality similar to equation (4). E E For any operator b : R ! R , the dual β* defined by:
b ðFÞ ¼ N ðbðN ðFÞÞÞ
plays a role similar to α in the set-theoretical case. Increasing operator: An operator a O is increasing when: xy
)
aðxÞ aðyÞ:
The glasses of a myopic person are a typically increasing. We easily see that the increasing operators form a complete lattice O 0 , a complete sublattice of O. Extensive and anti-extensive operator: An operator a O is extensive (resp., anti-extensive) when for all x ℒ we have x α(x) (resp., x α(x)). Both families of extensive operators and of anti-extensive ones are closed under nonempty supremum and infimum, but they do not form complete sublattices of O because they do not admit the same universal bounds. For example, the least extensive operator is the identity id : x 7! x whereas the least element of O is the operator x 7! 0; Idempotent operator: An operator a O is idempotent if α2 ¼ α, that is, when xℒ
Moore Families and Closings
ð5Þ
*
8x, y ℒ,
They are also the easiest operators to characterize. We shall thus describe them in the first place. Closings in complete lattices had already been studied by algebraist mathematicians since the years 1940, like R. Baer, C. Everett, and O. Ore. These works form the basis of the theory developed subsequently in mathematical morphology.
aðaðxÞÞ ¼ aðxÞ
Idempotence occurs everywhere in physics and in geosciences, and it also appears in the daily life at all times. To look through a yellow glass (supposed to be bandpass), to empty a bottle, to be born, to die are all idempotent operations. One understands why idempotence, associated with increasingness, is at the root of morphological filtering. One can also consider the complete lattice of operators α : ℒ ! M, where ℒ and M are two distinct complete lattices; for example, ℒ ¼ TE and M ¼ P ðEÞ , and α is a binarization. Then some notions keep their meaning (e.g., increasingness), other loose it (e.g., (anti-) extensivity, idempotence). Note that the product αβ supposes that the starting lattice of α coincides with the arrival lattice of β.
Closings and Openings Openings and closings are at the basis of morphological filtering of images (see the article “Morphological filtering”).
In mathematics, one encounters numerous examples of objects closed under one operation. For example, a set X in Euclidean space is convex if it is closed under the operation joining any two points by a segment. If the set is not closed, one defines then its closing as the least closed set containing it, namely its convex hull C(X). Remark that X C(X), that for any two sets X, Y R2 if X Y, then C(X) C(Y), and finally that C(X) ¼ C[C(X)]. The convex hull operation is thus extensive, increasing, and idempotent. As seen by E. Moore in 1910, the two notions of “closed set” and of “closing” correspond to each other: a closed object is the closing of an object, while the closing of an object is the least closed object containing it. This assumes that among all closed objects containing a given object, there exists one that is the least one. In other words, the lattice structure is perfectly well-suited here. Definition Let ℒ be a complete lattice, whose least and greatest elements are 0 and 1. 1. A closing on ℒ is an operator ’ on ℒ that is extensive, increasing, and idempotent. 2. The invariance domain of an operator c on ℒ is the set ℬc ¼ fx ℒjcðxÞ ¼ xg: 3. A part M of ℒ is a Moore family when M is closed under the operation of infimum and includes 1. In fact, the notions of Moore family and of closing represent the same thing under two different angles: Theorem Closings and Moore families There is a one-toone correspondence between Moore families in ℒ and closings on ℒ: • To a Moore family M one associates the closing ’ defined by setting for every x ℒ : ’(x) is the least y M such that y x. • To a closing ’ one associates the Moore family defined by its invariance domain ℬ’ ¼ {’(x)|x ℒ}.
Mathematical Morphology
Openings One can take the dual point of view, by inverting the order and transposing supremum and infimum. The previous definitions and theorems become thus the following statements: 1. One calls an opening on ℒ an operator γ that is antiextensive, increasing, and idempotent. 2. One calls a dual Moore family of ℒ a part M closed under the operation of supremum and includes 0, As previously, here is a one-to-one correspondence between dual Moore families in ℒ and openings on ℒ: • To a dual Moore family M one associates the opening γ defined by setting for every x ℒ : γ(x) is the greatest y M such that y x. • To an opening γ one associates its invariance domain ℬγ ¼ {γ(x)|x ℒ} which is a dual Moore family.
Generation of Closings and Openings One can construct openings and closings by using their structures. Theorem Structural theorem of openings Let ℒ be a complete lattice and ℒ0 be the lattice of the increasing operations on ℒ. The set of all openings on ℒ is closed under the supremum in ℒ0. For any family {γi} of openings, the supremum _i I γi is itself an opening. The least opening is the constant operator x 7! 0, the greatest one is the identity id. (dual statements for the closings). Remark, however, that neither the product γ1γ2 of two openings nor the infimum ^i I γi openings are openings. The structural theorem orients us towards the construction of openings by suprema of a few basic ones. In practice indeed, all openings derive from two ones, namely the opening by adjunction defined by Eq. (6) and the connection opening defined by Eq. (2) of the article “Morphological Filtering.”
827
However, the algebraists have not given attention to the properties of lattices that derive from Euclidean space, such as invariance under translation, convexity, and similarity, and missed completely the Minkowski operations. In fact, it is to the school of integral geometry, which knew nothing about lattices, that one owes the primary adjunction of mathematical morphology. In 1903, Minkowski defined the addition of two Euclidean sets by the relation: X B ¼ fb þ x, x Xjb Bg ¼ [ Bx ¼ [ Xb : xX
bB
where Bx (resp. Xb) indicates the translate of B at point x (resp. X at point x). When one fixes B, the operator δB : X 7! X B, is a dilation, in the sense that it commutes with the union, and provides even the general form of dilations invariant under translation in P ðEÞ, where E ¼ Rn or Zn. The adjoint erosion: eB : X 7! X B ¼ \ Xb , b Bˇ
where Bˇ ¼ fbj b Bg is the reflected set of B, is named the Minkowski subtraction, although he himself did not envisage it. It appeared with Hadwiger in 1957, who nevertheless did not envisage, himself, the opening by adjunction. The eroded X B is the locus of those points x that Bx is included in X, and the dilate X B is the locus of those points x that Bˇ x hits set X (the geological term of “erosion” obviously has another meaning). In the set case, most of the structuring elements met in practice, like the segment, the square, the hexagon, the disc, the sphere, are symmetrical, that is, Bˇ ¼ B and the small hat can just be omitted. Figure 2 (left and center), illustrates two usual Minkowki operations. Note that the erosion by the disk centered about the origin o is antiextensive, but not that by the pair of points, that does not contain o. Figure 2 (right) introduces another type of dilation, namely by geodesic disks (Lantuéjoul and Beucher 1981). We present it in the digital case. Consider a digital metric on Zn, and let δ(x) stands for the unit ball centered at point x, then the unit geodesic dilation of set Y inside set X is defined by the relation: dX ðY Þ ¼ dðY Þ \ X
Dilation and Erosion Mikowski Operations In the same way as closings and openings, dilations and erosions represent a pair of fundamental notions in mathematical morphology. Their correspondence is intimately linked to that of Galois correspondence, which had already been studied by mathematicians under the name of adjunction.
The dilation of size n of Y inside X is then obtained by iteration: dX,n ðY Þ ¼ dð. . . dðdX ðY ÞÞ \ X . . .Þ \ X
n times
Unlike the Minkowski addition, it is not invariant under translation of Y when the background X is fixed. For n large enough, and if we assumed that set X is regular and bounded,
M
828
Mathematical Morphology
Mathematical Morphology, Fig. 2 Left: Minkowski addition and subtraction by a disk (the initial cat is the median gray image); center: erosion by a pair of points equidistant from the origin; right: in gray, geodesic dilation of the white spot for the hexagonal metric
the geodesic dilation of point p, considered as a marker, completely invades the connected component of X hit by p. _dX,n ðXÞ ¼ gp ðXÞ n
ð6Þ
The operator γp is therefore called the connection opening at point p, and is studied in the article “Morphological Filtering.” It is Matheron (1967) who introduced, in view of granulometries, the opening by adjunction: gB ¼ gB eB : X 7! ðX BÞ B
ð7Þ
by a structuring element B. The open set γB(X) is the union of those sets Bx that are included in X. The corresponding Moore family is the invariance domain ℬγ of γB, which consists of all unions of translates of B. The closing dual for complementation is: ˇ ’Bˇ ¼ eBˇ dBˇ : X 7! X Bˇ B: The notations δB, εB, γB, and ’B mean that one lies in Rn or Z and that the operators, with structuring element B, are invariant under translation. For linking openings and granulometries, Matheron bases himself on the following theorem (Matheron, 1975): n
Theorem A family{Bl| l 0} constitutes the continuous additive semi-group: B Bm ¼ Blþm , l, m 0 if and only if each Bl is homothetic with ratio A of a convex compact B. In the structure of mathematical morphology, this theorem plays the role of a corner-stone, which distributes the
load in various directions. It leads on the one hand to granulometries, but opens also the door to partial differential equations (Maragos 1996), and finally permits to decompose convex structuring elements for implementing the easily. In particular, any family {γlB| l 0} of openings by homothetic convex sets turns out to be a granulometry. Figure 3 depicts a granulometry of the black phase by discs, or equivalently increasing closings of the whites. An Example In a blast furnace, chemical reductions of the initial pellets of sinters (magnetite or hematite) result in iron, according sequences of the type hematite ! ferrite ! iron The reductions are obtained by circulation of CO through the pores. However, these overall relations do not tell us how the process occurs (Jeulin 2000). Is the whole hematite pellet transformed into a whole pellet of ferrite, and then into iron? In Fig. 4, the reduction process is caught red-handed. It is a polished section taken from the middle of the blast furnace. The three phases are imbricated. Is the ferrite phase relatively closed to hematite, or not? A convenient tool for answering the question, here, is the rectangular covariance Cij(h). Take a structuring element made of two points from h apart in the horizontal direction, Cij(h) is the proportion of cases when the left point falls in phase i and the right one in phase j (Fig. 5). The slope of covariance near the origin is proportional to the contact surface per unit volume. The two ferrite covariances are similar. Unlike, that between hematite and pores starts from more below, which means smaller contact surface. Moreover it presents a hole effect for h ¼ 50 this indicates a halo of ferrite around hematite.
Mathematical Morphology
829
Mathematical Morphology, Fig. 3 Initial set, and closings ’B (center) and ’B0 (right) with B and B0 discs, and B B0
M
Mathematical Morphology, Fig. 5 The three rectangle covariances of the sinter specimen
Mathematical Morphology, Fig. 4 Light gray: hematite; dark gray: ferrite; black: pores
Dilations and Erosions, Adjunctions The generalization of the Minkowski operations, and of their links, is straightforwards: Definition. Dilation, Erosion, and Djunction Let ℒ and M be two complete lattices (identical or distinct). 1. An operator δ : ℒ ! M is a dilation when it preserves the supremum: 8fxi ji I g ℒ,
e ^ xi iI
¼ ^ eðxi Þ; iI
in particular, for I empty, ε(1) ¼ 1. 3. Two operators δ : ℒ ! M and ε : M ! ℒ form an adjunction (ε, δ) when: 8x ℒ, 8y M,
dðxÞ y , x eðyÞ:
ð8Þ
Dilation and erosion are dual notions. It is easy to see that both are increasing operators. Conversely, every increasing operator preserving 0 (respectively, preserving 1) is an infimum (respectively, a supremum) of dilations (respectively, erosions) (Serra 1988). Theorem Adjunctions. Let ℒ and M be two complete lattices. Adjunctions constitute a bijection between dilations ℒ ! M and erosions M ! ℒ, that is:
830
Mathematical Morphology
1. Given two operators δ : ℒ ! M and ε : M ! ℒ forming an adjunction (ε, δ), δ is a dilation and ε is an erosion. 2. For every dilation δ : ℒ ! M, (resp. erosion ε) there exists a unique erosion ε : M ! ℒ (resp. dilation δ) such that (ε, δ) is an adjunction.
Composing the erosion and the dilation by a structuring element, one obtains according to the order of composition the opening γ and the closing ’ by that structuring element. That remains true in the general case. Theorem Opening and closing by adjunction Let ℒ and M be two complete lattices, and let a dilation δ : ℒ ! M and an erosion ε : M ! ℒ form an adjunction (ε, δ). Then: 1. ε ¼ εδε and δ ¼ δεδ. 2. δε is an opening on M whose invariant elements are the dilates of the initial elements ℬδε ¼ δ(ℒ). 3. εδ is a closing on ℒ and ℬεδ ¼ ε(M). The two lattices ℒ and M often are identical, and δ, ε, δε, and εδ are just operators on ℒ. General Set-Theoretical Case Let us illustrate these notions in the case of lattices of the form P ðEÞ. According to the definition of dilation and erosion, an operator is a dilation if it preserves the union, and an erosion if it preserves the intersection. One can characterize a dilation by its behavior on the points identified to singletons; for every point p, let us write δ(p) for δ({p}). Then: Theorem Let E1 and E2 be two spaces. A map d : P ðE1 Þ ! P ðE2 Þ is a dilation if and only if: 8 X P ðE1 Þ, dðXÞ ¼ [ dðxÞ: xX
ð9Þ
The adjoin erosion e : P ðE2 Þ ! P ðE1 Þ is then given by: 8 Y P ðE2 Þ, eðY Þ ¼ fx E1 jdðxÞ Y g:
ð10Þ
_ dB i ¼ d[ i I
iI
Bi
and ^ eBi ¼ e[i I Bi : iI
The rules of compositions of dilations and erosions derive from these relations, which imply: dB1 dB2 ¼ dB2 dB1 ¼ dB1 B2 eB1 eB2 ¼ eB2 eB1 ¼ eB1 B2 The opening γB ¼ δBεB admits for invariant sets the dilated by B, that is, the A B for A P ðEÞ. Similarly, the invariance domain of the closing ’B ¼ εBδB is formed of sets eroded by B, that is, the A B pour A P ðEÞ. We conclude this section by a more general result: every opening invariant under translation, in Rn or Zn, can be written as a supremum of openings of the type γB (Matheron 1975), and more generally, every opening in a complete lattice expresses itself under the form of a supremum of openings by adjunction (Heijmans and Ronse 1990). Case of the Unit Circle Morphological operations on the unit circle appear in color images processing, where the hue spans the unit circle. We will not develop morphology for color images here, and refer to Angulo (2007). Another field of applications is the processing of directions, more useful in geosciences. We give below a few indications in the 2-D case of the unit disc C. The approach extends to the unit sphere, but the formalism is less simple. The unit disc C, like the round table of King Arthur knights, has no order of importance, and no dominant position. This signifies we cannot construct a lattice on it, unless the phenomenon under study imposes its origin. However, there are three paths to bypass this interdiction, by focusing on increments, a wide class which covers residues, gradients, medians, etc. (Hanbury and Serra 2001). The points ai on the unit disc C of center O are indicated by their angle, between 0 and 2π from an arbitrary origin a0. Given two points a and a0 , we use the notation a a0 to indicate the value of the acute angle aOa0 , that is, a a0 ¼j a a0 j
if j a a0 j p
a a0 ¼ 2p j a a0 j if j a a0 j p In the case where E1 ¼ E2 ¼ E ¼ Zn or Rn, one easily sees by equation (9) that every dilation invariant under translation is of the form δB : X 7! X B, with the structuring element B ¼ δ(o), the dilation of the origin. An erosion is invariant under translation if and only if the adjoined dilation is invariant under translation, hence it will be of the form εB : X 7! X B. The dilation δB is increasing in B, but εB is decreasing in B. Moreover, one has:
Now consider the opening by adjunction γB(f) of function f by B, and indicate by {Bi, i I} the family of structuring elements which contain point x. The residue at point x can be written f ðxÞ gB ðf ÞðxÞ ¼ sup iI
inf ½f ðyÞ f ðxÞ ,
y Bi
Mathematical Morphology
831
in which there are only increments of the function f around point x. This general relation can be transposed to circular values a and gives
the dilated function δG(F). In the translation invariant case the formalism is as follows: dðF GÞðxÞ ¼ _ ½Fðx yÞ þ GðyÞ, yE
f ðxÞ gB ðf ÞðxÞ ¼ sup iI
inf ½aðxÞ aðyÞ :
y Bi
Gradients and medians on the unit disc are obtained in the same manner. An Example When sorting oak boards destined to make furniture, it is necessary to find the knots, which appear as fast changes of direction, and to measure the sizes. The average direction of the veins is calculated over small squares, and the residues of circular openings on these directions indicate the knots. Figure 6 depicts the residues found onto the original image, for the four sizes 3, 5, 7, and 9 of the structuring element. Case of Numerical Functions Let T ¼ R, Z or a closed interval of R or Z and E ¼ Rn or E ¼ Zn. A numerical function E ! T is equivalent to its subgraph, or umbra in the product space E T (Sternberg, 1986), that is to the set of all points (x, t) which are below (x, F(x)), and dilating the umbra of function F by that of function G still results into an umbra, which in turn defines
eðF GÞðxÞ ¼ ^ ½Fðx þ yÞ GðyÞ
ð11Þ
yE
An Example This example concerns the wear of road surface. Under the traffic, the bitumen of a road loses its initial roughness quality, and one wishes to measure this phenomenon (Serra 1984). A ring of 20 m emulates the road, and a massive set of four wheels emulates the truck. Two experiments, of 1.000 and then 10.000 passages of 6.5 t trucks going at 65 km/h, are made. The relief of the road is measured afterwards by a sensor which produces the two plots of Fig. 7 (left). The set under the plot is eroded by a segment of length l and direction α, and the area reduction is averaged in α. This result in the function P(l) whose second derivative f(l) is the histogram of the intercepts. It admits the following limited expansion near the origin:
Mathematical Morphology, Fig. 6 Detection of knots in an oak board as changes in direction
f ðl Þ ¼
a l EðCÞ
M
832
Mathematical Morphology
Mathematical Morphology, Fig. 7 Left: changes in the roughness of a road surface; right: beginnings of the histograms of the road intercepts
where a is a constant and C stands for the average of the average curvature, assuming that the road relief admits curvatures. By measuring f(h) in six directions and applying the above formula, we find: E(C) ¼ 22102mm2 before wear and E(C2) ¼ 102mm2 after wear (Fig. 7 (right)). In spite of the curvatures assumption, of the poor number of directions, and the passage to a second derivation, the model still provides a pertinent information. Remark the stereological meaning of the result: by means of 1-D structuring elements one reaches, by rotation averaging, an estimate of the 3-D average curvature.
geodesic formalism. For B large enough, the dilation becomes idempotent. It is called then the reconstruction opening of F inside the mask function M (Fig. 8 right). The operations based on flat structuring elements commute under anamorphosis of the gray levels, and are the only ones to do it. This remarkable property implies three very useful consequences: 1. It makes the processing independent of the creation of the image. For example, the angiograph below may, or may not, have been obtained by a camera equipped with the so called “γ correction” which takes the logarithm of the signal. A flat opening γ ensures us that
Flat Structuring Elements log½gðFÞ ¼ g½logðFÞ: The most popular numerical dilations and erosions are those obtained by level set approach, where each thresholded set is treated individually. This amounts to use “flat” structuring functions, that is, to take for G the half cylinder: x B ) G B ðxÞ ¼ 0
x B ) GB ðxÞ ¼ 1
The relations (Eq. 11) of numerical dilation and erosion take the simpler form: dðF GÞðxÞ ¼ _ ½Fðx yÞ, y B, yE
eðF GÞðxÞ ¼ ^ ½Fðx yÞ, y B yE
The numerical geodesic dilation is obtained by replacing the binary dilation by its numerical version in the binary
2. An n-bits image is transformed into an n-bits image, and in particular the indicator functions are preserved. 3. Reducing the dynamic of function F does not damage the processing. For example, in demographic data, F often p range from 1 to 105, but then F ranges from 1 to 512 and can be considered as a usual image of 9 bits. An Example In ophthalmology, the presence of aneurysms on retina is the sign of diabetes. How to extract them from an angiograph? The left part of Fig. 9 depicts the angiograph F of the retina of a diabetic patient. The aneurysms consist of small white spots, but they are blurred by several vessels of various thicknesses and by the heterogeneity of the background.
Mathematical Morphology
833
Mathematical Morphology, Fig. 8 Left: Geodesic flat dilation; right: and reconstruction opening
M
Mathematical Morphology, Fig. 9 Left: Retina image; right: residue of a flat opening by segments
The first idea here is to perform, in all directions, openings by linear structuring elements, small but longer than the diameters of the aneurysms, and to take their union γ(F) over all directions. In practice the directional average is performed in
the three directions of a hexagonal grid. The segments are included inside the long narrow tubes that are the vessels, which therefore belong to the opening and should disappear in the residue F γ(F). One can see this residue in Fig. 9
834
Mathematical Morphology
Mathematical Morphology, Fig. 10 Left: reconstruction opening of the angiograph; right: corresponding residue
(right). The former gray background is now uniformly black, which is nice, but some regions of the vessels were not caught, and are mixed with the aneurysms in the residue. This partial result suggests to replace the adjunction opening by a (still flat) reconstruction opening which invades the vessels. Figure 10 (left) depicts the reconstruction opening γrec(F). All vessels have been reconstructed, and indeed the residue F γrec(F), visible in Fig. 10 (right), exclusively shows the aneurysms.
Summary The above theory has suggested several developments, in various directions, that were not presented here. In geosciences, the potential of description of mathematical morphology is often used for generating synthetic images of reliefs, or of rivers networks (Sagar 2014). In physics, mathematical morphology leads to stochastic models for sets and functions, which serve in the relationships between textures and physical properties of solid materials (Jeulin 2000). In mathematics, two other branches are the concern of differential geometry (Maragos 1996) and of the digital version of the method (Kiselman 2020; Najman et al. 2005).
Cross-References ▶ Matheron, Georges ▶ Morphological Filtering
Bibliography Angulo J (2007) Morphological colour operators in totally ordered lattices based on distances. Application to image filtering, enhancement and analysis. Comput Vis Image Underst 107(2–3):56–73 Hanbury A, Serra J (2001) Morphological operators on the unit circle. IEEE Trans Image Process 10(12):1842–1850 Heijmans M (1994) Morphological image operators. Academic, Boston Heijmans H, Ronse C (1990) The algebraic basis of mathematical morphology: I. Dilations and erosions. Comput Vis Graph Image Process 50:245–295 Heijmans H, Serra J (1992) Convergence, continuity and iteration in mathematical morphology. J Vis Commun Image Represent 3:84–102 Jeulin D (2000) Random texture models for materials structures. Stat Comput 10:121–131 Kiselman O (2020) Elements of digital geometry, mathematical morphology, and discrete optimisation. Cambridge University Press, Cambridge Lantuéjoul C, Beucher S (1981) On the use of the geodesic metric in image analysis. J Microsc 121:39–49 Maragos P (1996) Differential morphology and image processing. IEEE Trans Image Process 5(8):922–937
Matheron, Georges Matheron G (1967) Eléments pour une théorie des milieux poreux. Masson, Paris Matheron G (1975) Random sets and integral geometry. Wiley, New York Matheron G (1996) Treillis compacts et treillis coprimaires. Tech. Rep. N-5/96/G, Ecole des Mines de Paris, Centre de Géostatistique Matheron G, Serra J (2002) The birth of mathematical morphology. In: Talbot H, Beare R (eds) Proceedings of VIth international symposium on mathematical morphology. Commonwealth Scientific and Industrial Research Organisation, Sydney, pp 1–16 Najman L, Talbot H (2010) Mathematical morphology: from theory to applications. Wiley, Hoboken Najman L, Couprie M, Bertrand G (2005) Watersheds, mosaics and the emergence paradigm. Discrete Appl Math 147(2–3):301–324, special issue on DGCI Ronse C, Serra J (2010) Algebraic foundations of morphology. In: Najman L, Talbot H (eds) Mathematical morphology. Wiley, Hoboken, pp 35–80 Sagar BSD (2014) Cartograms via mathematical morphology. Inf Vis 13(1):42–58 Serra J (1982) Image analysis and mathematical morphology. Academic, London Serra J (1984) Descriptors of flatness and roughness. J Microsc 134(3):227–243 Serra J (1992) Equicontinuous functions, a model for mathematical morphology. In: Non-linear algebra and morphological image processing, proceedings, vol 1769. SPIE, San Diego, pp 252–263 Sternberg S (1986) Grayscale morphology. Comput Graph Image Process 35:333–355
Matheron, Georges Jean Serra Centre de Morphologie Mathmatique, Ecole des Mines, Paristech, Paris, France
Biography G. Matheron, who was born in Paris in December 1930, entered the prestigious Ecole Polytechnique in 1949 and the Ecole des Mines de Paris two years later. In 1952, he landed in Algeria with wife and child for working at the Algerian Mining Survey. He took over its scientific management in 1956, then its general management (1958). Starting from the works of Krige, from South Africa, and de Wijs, from the Netherlands, he created a theory for estimating mining resources he named Geostatistics. He designed the basic mathematical notion of a variogram, which is both the key for estimation problems and excellent tool for the describing spatial variability within mineral deposits. He invented random functions with stationary increments, while being skeptical about the underlying
835
probabilistic framework. Every deposit is a unique phenomenon. Studied in itself, it does not enter the scope of probabilities. It was within that semi-random framework that G. Matheron wrote his two volumes “Trait de Gostatistique Applique” in 1962, which was based upon the Algerian experience. He defended his PhD thesis in 1963, in which the deterministic and random parts of his approach were developed successively. But G. Matheron waited 20 years before formulating, in Estimating and Choosing (Matheron 1989), which choice to make of either approach according to the context. In 1968, the Ecole des Mines de Paris gave G. Matheron the opportunity to create the Centre de Morphologie Mathmatique, in Fontainebleau. From that time onward, geostatistics gained international recognition, and requests for mining estimations arrived from the five continents. In 1971 he wrote the notes for a course on the theory of regionalized variables (Matheron 1971) which subsequently became the “bible” for ore reserve estimation in mining industry, although it was not published as a book until 2019 (Pawlowsky-Glahn and Serra 2019)! He was also asked to evaluate oil reservoirs, to map atmospheric pressures, submarine hydrography, etc. G. Matheron responded to these requests by inventing, in1969, the theory of universal kriging which does not require the stationarity constraint, and also inventing nonlinear geostatistics in 1973 for predicting statistical distributions of ore grades in mining panels (Fig. 1). The mining problem of the suitability of grade for grinding led him and Jean Serra to extend the concept of variogram to that of the hit-or-miss transformation, then to that of set opening. They called this new branch of science mathematical morphology, which became progressively independent of geostatistics. G. Matheron extracted from this newly developed morphological applications some general approaches, which could be conceptualized, such as granulometry, increasing operators, Poisson hyper-planes, and Boolean sets, and gathered them into the book entitled “Random Sets and Integral Geometry,” in 1975 (Matheron 1975). In the 1980s, G. Matheron based Mathematical Morphology on the broader level of the complete lattices, thus providing a common approach to sets, functions, and partitions, and constructed the theory of the increasing and idempotent operators that he called morphological filtering (Matheron 1988). He retired in 1995 and passed away five years later. One can find the 11,000 pages of scientific notes written by G. Matheron in the index by name of HTTP://eg.ensmp.fr/ bibliotheque
M
836
Matrix Algebra
Matheron, Georges, Fig. 1 Two photographs of Georges Matheron (courtesy of Mrs Françoise Matheron)
2 12
Bibliography Matheron G (1971) The theory of regionalised variables and its applications. Ecole des Mines de Paris, Paris. 211p Matheron G (1975) Random sets and integral geometry. Wiley, New York Matheron G (1989) Estimating and choosing – an essay on probability in practice. Springer, Berlin/New York Pawlowsky-Glahn V, Serra J (eds) (2019) Matheron’s theory of regionalized variables. Oxford University Press, Oxford
Matrix Algebra Deeksha Aggarwal and Uttam Kumar Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India
Definition Matrices are one of the most powerful tools in mathematics and geosciences with applications in Earth system science. The evolution of the concept of matrices is the result of an attempt to obtain compact and simple methods of solving system of linear equations. A matrix is an ordered rectangular array of numbers or functions. The numbers or functions are called the elements or the entries of the matrix. We denote matrices by capital letters. The following are some examples of matrices:
A¼
4 3
5 1
,B ¼
1
2
14
15
,C ¼
1
2
3
4
5
10
In the above examples, the horizontal lines of elements are said to constitute rows of the matrix and the vertical lines of elements are said to constitute columns of the matrix. Thus A has 3 rows and 2 columns, B has 2 rows and 2 columns while C has 2 rows and 3 columns. Matrix algebra as a subset of linear algebra focuses primarily on the basic concepts and solution techniques. Numerous problems that are addressed in engineering sciences and mathematics are solved by properly setting up a system of linear equations.
Introduction Order of a Matrix A matrix containing n rows and m columns is called a matrix having an order of n m (read as n by m). The total number of elements in a n m matrix is nm. In general, a matrix containing n rows and m columns is represented as
An,m ¼
a1,1
a1,2
. . . a1,m
a2,1 ⋮
a2,2 ⋮
a2,m ⋱ ⋮
an,1
an,2
an,m
nm
Matrix Algebra
837
Types of Matrices Below are the different types of matrices: Column Matrix: When a matrix has only one column, it is said to be a column matrix. For example: a1,1
an,1
A1,m ¼ ½a1,1 a1,2 . . . a1,m 1m Square Matrix: A matrix having number of rows equal to the number of columns, is said to be a square matrix (n ¼ m). The order of such a matrix is n. For example matrix A (shown below) is a square matrix of order n n: a1,1
a1,2
...
a1,n
a2,1 ⋮
a2,2 ⋮
⋱
a2,n ⋮
an,1
an,2
an,n
⋮
⋮
⋱
⋮
0
0
1
nn
a1,1 0
0 a2,2
0 0
⋮ 0
⋮ 0
⋱
⋮ an,n
a
0
0 ⋮
a ⋮
0 ⋱ ⋮
0
0
Rank of a Matrix: The rank of a matrix A, denoted by rankA, is the dimension of column space of A. Null/Zero Matrix: A null or zero matrix is a matrix having all its elements as zero. It is generally denoted by O. Scalar matrix is a null matrix when a ¼ 0. For example:
A¼ nn
Identity Matrix: An identity matrix is a square and diagonal matrix having all its diagonal elements as one. It is generally denoted by In. Scalar matrix is an identity matrix when a ¼ 1. For example:
0 0
0 0
⋮
⋮
⋱
⋮
0
0
0
nn
a1,1
a1,2
. . . a1,m
a2,1 ⋮
a2,2 ⋮
a2,m ⋱ ⋮
an,1
an,2
an,m
b1,1 b2,1
b1,2 b2,2
. . . b1,m b2,m
⋮
⋮
⋱
bn,1
bn,2
bn,m
nm
and
B¼ nn
0 0
Matrix Addition and Subtraction Consider two matrices A and B of the same order n m denoted as:
0
a
ð1Þ
This includes the case when k ¼ 0 and A0 ¼ I.
nn
Scalar Matrix: A diagonal matrix is said to be a scalar matrix when all its diagonal elements are equal. For example:
An,n ¼
0 0
Ak ¼ A: . . . A:I
O¼
Diagonal Matrix: A diagonal matrix is a square matrix having all its non-diagonal elements as zero. For example:
An,n ¼
n1
Row Matrix: When a matrix has only one row, it is said to be a row matrix. For example:
An,n ¼
0 1
Power of a Matrix: For a matrix A of order n n, we write Ak for the product of k (scalar) copies of A times the identity as given below:
a2,1 ⋮
An,1 ¼
In ¼
1 0
⋮ nm
The sum of A þ B is obtained by adding the corresponding elements of the given matrices. Moreover, the two matrices have to be of the same order and the resultant matrix C will also be of the same order n m as shown below:
M
838
C¼
Matrix Algebra
a1,1 þ b1,1
C¼AþB a1,2 þ b1,2 . . .
a1,m þ b1,m
a2,1 þ b2,1
a2,2 þ b2,2
a2,m þ b2,m
⋮
⋮
⋱
⋮
an,1 þ bn,1
an,2 þ bn,2
an,m þ bn,m
Let A be m n and let B and C have sizes for which the indicated sums and products are defined. The matrix multiplication satisfy the following properties:
nm
The difference of A B is obtained by subtracting the corresponding elements of the given matrices, the two matrices have to be of the same order and the resultant matrix C will also be of the same order n m as shown below:
C¼
a1,1 b1,1
C¼AB a1,2 b1,2 . . .
a1,m b1,m
a2,1 b2,1
a2,2 b2,2
a2,m b2,m
⋮
⋮
⋱
⋮
an,1 bn,1
an,2 bn,2
an,m bn,m
nm
The addition of matrices satisfy the following properties: (i) Commutative Law: If two matrices A and B are of same order, then A þ B ¼ B þ A in case of addition of two matrices. (ii) Associative Law: If three matrices A, B, and C are of same order, then (A þ B) þ C ¼ A þ (B þ C). (iii) Additive Identity: If two matrices A and O are of same order among which O is a null matrix, then A þ O ¼ O þ A ¼ A in case of addition of two matrices. Matrix Multiplication The multiplication or product of two matrices A of order n m and B of order s q can be obtained only if the number of columns in A are equal to the number of rows in B, i.e., when m ¼ s. The resultant matrix C obtained by multiplying A and B will be of the order n q. The elements in C will be obtained by element wise multiplication of the rows of A and columns of B and taking sum of all these products as shown below: Let A ¼ [ai, j]n m and B ¼ [bj, k]m q ( m ¼ s) where i [0, n], j [0, m], k [0, q], then the ith row of A is [ai, 1, ai, 2, . . ., ai, m] and the kth column of B is b1,k b2,k , then the element ci,k in the matrix C will be: ⋮ bm,k ci,k ¼ ai,1 b1,k þ ai,2 b2,k þ ai,m bm,k n
ci,k ¼
ai,j bj,k j¼1
C ¼ ½ci,k n q
(i) (ii) (iii) (iv) (v)
A(BC) ¼ (AB)C þ kB A(B þ C) ¼ AB þ AC (B þ C)A ¼ BA þ CA k(A þ B) ¼ kA þ kB k(AB) ¼ (kA)B ¼ A(kB), for any scalar k.
The proof of the properties are beyond the scope of this chapter. Warnings: 1. In general, AB 6¼ BA 2. If AB ¼ AC, then it is not true in general that B ¼ C. 3. If a product AB is a null matrix, one cannot conclude in general that either A ¼ 0 or B ¼ 0. Matrix Scalar Multiplication Multiplication of a matrix by a scalar k is obtained by multiplying each element of the matrix with k as shown below: if
A¼
a1,1 a2,1
a1,2 a2,2
. . . a1,m a2,m
⋮ an,1
⋮ an,2
⋱ ⋮ an,m
nm
then
kA ¼
ka1,1
ka1,2
. . . ka1,m
ka2,1 ⋮
ka2,2 ⋮
ka2,m ⋱ ⋮
kan,1
kan,2
kan,m
nm
The scalar multiplication of a matrix satisfy the following properties: (i) (ii) (iii) (iv) (v) (vi)
k(A þ B) ¼ kA þ kB kA ¼ Ak OA ¼ 0, where O is a null matrix. (k þ l)A ¼ kA þ lA, where k and l are scalars. k(A þ B) ¼ kA þ kB k(sA) ¼ (ks)A
Operations on Matrices This section discusses some of the operations that can be performed on a matrix to obtain another matrix as explained below:
Matrix Algebra
839
i. Transpose of a Matrix: For a matrix A of order n m, transpose of A is obtained by interchanging rows of A with columns of A. It is denoted by AT and will have the order as m n. For example: if
A¼
a1,1 a2,1
a1,2 a2,2
. . . a1,m a2,m
⋮
⋮
⋱
an,1
an,2
an,m
a c
⋮ A¼
nm
a
b
c d det ðDÞ ¼ ad bc
AT ¼
a1,1 a1,2
a2,1 a2,2
...
an,1 an,2
⋮ a1,m
⋮ a2,m
⋱
⋮ an,m
The determinant of a matrix satisfy the following properties: mn
The transpose of a matrix satisfy the following properties: (A þ B)T ¼ AT þ BT and (A B)T ¼ AT BT (kA)T ¼ kAT, where k is a scalar. (AB)T ¼ BTAT, where A and B are matrices. (A1)T ¼ (AT)1 (AT)T ¼ A
Note: The transpose of a product of matrices equals the product of their transposes in the reverse order. ii. Trace of a Matrix: For a square matrix A of order n n, trace of A is the sum obtained by adding its diagonal elements. It is denoted by tr(A). Trace of a matrix exist only for square matrices.
(i) det(A) remains unchanged even if its rows and columns are interchanged, i.e., det(A) ¼ det (AT). (ii) The sign of det(A) changes if any two rows or columns are interchanged. Interchanging two rows or two columns are denoted by Ri $ Rj and Ci $ Cj respectively. Hence, det(A) ¼ det (A) if Ri $ Rj or Ci $ Cj. (iii) The value of det(A) is zero if any of the two rows or columns are identical. (iv) For a matrix A, if any row or column gets multiplied by a scalar value k then det(A) ¼ kdet(A). (v) If a matrix A can be expressed as a sum of two matrices (say B and C) then det(A) ¼ det (B) þ det (C). iv. Inverse of a Matrix For a square matrix A of order n, A is said to be invertible if there exist another square matrix B of the same order n, such that below equation satisfies:
trðAÞ ¼ a1,1 þ a2,2 þ an,n The trace of a matrix satisfy the following properties: (i) (ii) (iii) (iv)
x c1 ¼ c2 y
To know whether the system of linear equations has a unique solution or not, we compute a number which defines the uniqueness of a solution and is known as the determinant (det) formulated as det(A) for a matrix A as:
then
(i) (ii) (iii) (iv) (v)
b d
tr(A þ B) ¼ tr(A) þ tr(B) and tr(A B) ¼ tr(A) tr(B) tr(kA) ¼ ktr(A), where k is a scalar. tr(AB) ¼ tr(BA) tr(AT) ¼ tr(A)
iii. Determinant of a Matrix: We have studied about matrices and arithmetic of matrices in the previous sections. In this section we learn that a system of algebraic equations can be expressed in the form of matrices as shown below: ax þ by ¼ c1 cx þ dy ¼ c2 can be expressed as:
AB ¼ BA ¼ I Then, B is called the inverse of A and is denoted by A1 (read as A inverse). Moreover, if B is the inverse of A, then A is also the inverse of B. A matrix that is not invertible is sometimes called a singular matrix, and an invertible matrix is called a nonsingular matrix. The inverse of a matrix follow the below properties: (i) Let A ¼
a
b
c
d
, if ad bc 6¼ 0, then A is invertible
d b : If ad bc ¼ 0, then A is c a not invertible. That is, a square matrix is only invertible if and only if its det(A) 6¼ 0. (ii) If matrix A of size n n is invertible, then for each b in ℜn, the equation Ax ¼ b has the unique solution x ¼ A1b. 1 and A1 ¼ adbc
M
840
Matrix Algebra
(iii) If A is an invertible matrix, then A1 is also invertible and (A1)1 ¼ A. (iv) If A and B are n n invertible matrices, then so is AB, then the inverse of AB is the product of A and B in the reverse order as: (AB)1 ¼ B1A1. (v) If A is an invertible matrix, then so is AT, then the inverse of AT is the transpose of A1 as: (AT)1 ¼ (A1)T. Note: The product of n n invertible matrices is invertible, and the inverse is the product of their inverses in the reverse order. Eigenvalues and Eigenvectors In the previous section we discussed about scalar and matrix multiplication with matrix A. In this section, we will discuss the ! results and outcome of multiplying a vector p with the matrix ! A. To define formally, let A be a matrix of order n n, p be a non-zero column vector of order n 1 and l be a scalar. Then ! p is an eigenvector of A and l is a eigenvalue of A if the below condition is satisfied: !
!
A p ¼ lp
Matrix Algebra, Table 1 Number of chairs and tables made in different factories Factory I II III
Number of chairs 23 34 30
Solution 1: The order of the matrix is given by the product of the number of rows n and number of columns m. Therefore, we need to find the real values of m and n such that mn ¼ 9. Thus, the possible order of the matrix can be: 1 9, 9 1, 3 3. Problem 2: Consider Table 1 representing the number of tables and chair made in factories I, II and III. Represent this information in terms of a 3 2 matrix. What does the element in the second row and third column represent? Solution 2: From the Table, the number of rows are three and number of columns are two. Hence, the order of the matrix is 3 2 and the matrix can be written as shown below:
!
Matrix A should be square and p should be non-zero. In order to find the eigenvectors or eigenvalues given A, we need ! ! ! to find a non-zero vector p and a scalar l such that A p ¼ lp , which can be further solved as follows: !
!
A p ¼ lp !
!
!
A p lp ¼ 0 !
!
ðA lI Þp ¼ 0
The eigenvalues and eigenvectors of a matrix satisfy the following properties for an invertible matrix A of order n:
A¼
Problem 1: If a matrix has 9 elements then what are the possible order of the matrix?
12 15
30
18
32
Problem 3: Let A, B, and C be the three matrices given below:
A¼
2 12 4 5 1
B¼
1 2 14 15 31
32
1
C¼
1
2
3
4
5
10
32
Find: (i) A þ B (ii) A 2B (iii) 3(A þ B) (iv) AC Solution 3: (i) Let D ¼ A þ B
D¼
2
12
4
5
3
1
2þ1
Case Study In this section, examples on matrix operations and arithmetic are discussed.
23 34
The element in the second row and third column represent that 15 tables are made in the factory II.
3 (i) For a triangular matrix A, the diagonal values of A are the eigenvalues of A. ! (ii) If l is an eigenvalue of A with eigenvector p , then 1l is ! an eigenvalue of A1 with eigenvector p : T (iii) Eigenvalue of A and A is same as l. (iv) Sum of eigenvalues of A is equal to the trace of A. (v) The product of eigenvalue of A is equal to the determinant of A.
Number of tables 12 15 18
D¼
1 þ
3 þ 31 (ii) Let E ¼ A - 2B
14 15 31
12 þ 2
4 þ 14 5 þ 15 1þ1
2
¼
1 3
14
18
20
34
2
23
Matrix Algebra
2 E¼
841
12
1
4
5
2 14
3
1
31 E¼
2
22
12 4
15 , E ¼
4 28
5 30
1
3 62
12
0
8
24
25
59
1
A ¼ T
F ¼ 3 18 34
33
20 , F ¼
18 3 20 3
2
34 3 42
54
60
102
6
F¼
G¼
12
4
5
3 G¼
1
4 5
10
32
23
41þ54
42þ55
4 3 þ 5 10
31þ14
32þ15
3 3 þ 1 10
50
64
126
24
33
62
7
11
19
4
5
1
3
1
1
1
12
4
1
þ1
Problem 5: Find rank of the matrix A ¼
4
5
1
2 3
2
1 4
3
0 5
Solution 5: Reduce A into its echelon form:
33
1
2
3
0 0
3 0
2 0
The above matrix is in row echelon form. Number of nonzero rows ¼ 2. Hence the rank of matrix A ¼ 2. Problem 6: Find eigenvalues of the matrix A ¼
33
Find: (i) AT (ii) tr(A) (iii) det(A) (iv) A1 Solution 4:
and find eigenvector for each eigenvalue.
3 3
Solution 6: det ðA lI Þ ¼
3
15
3
9
T
(i) A can be obtained by interchanging rows and columns as shown below:
M
33
A¼
Problem 4: Let A be a matrix as given below:
A¼
5
þ1ðð4 1Þ ð5 3ÞÞdet ðAÞ ¼ 15
2 3 þ 12 10
1
33
3 1 3 1 1 1 det ðAÞ ¼ 2ðð5 1Þ ð1 1ÞÞ 12ðð4 1Þð1 3ÞÞ
3
12
1
(iii) Determinant of matrix A is:
det ðAÞ ¼ 2
1 2
2
1
23
2 1 þ 12 4 2 2 þ 12 5
G¼
1
trðAÞ ¼ 8
14 3
(iv) Let G ¼ AC 2
3 1
trðAÞ ¼ 2 þ 5 þ 1
14
9
4 5
(ii) The trace of a matrix can be obtained by adding the diagonal elements as shown below:
(iii) From (i) D ¼ A þ B hence, let F ¼ 3D 3
2 12
det ðA lI Þ ¼
3 l
l
1
0
0
1
15
3 9l det ðA lI Þ ¼ ðð5 1Þ ð1 1ÞÞ 12ðð4 1Þ ð1 3ÞÞ þ 1ðð4 1Þ ð5 3ÞÞ det ðAÞ ¼ 15
15 9
842
Maximum Entropy Method
Conclusion Algebraic operations with matrices help in analyzing and solving equations in geocomputing. In this chapter we saw some basic operations for handling two or more matrices. Furthermore, the definitions and properties in this chapter also provide basic tools for handling many applications of linear algebra involving two or more matrices.
Cross-References ▶ Earth System Science ▶ Eigenvalues and Eigenvectors ▶ Geocomputing ▶ Multivariate Data Analysis in Geosciences, Tools
Bibliography Cheney W, Kincaid D (2009) Linear algebra: theory and applications. Aust Math Soc 110:544–550 Lay DC, McDonald J. Addison-Wesley, 2012, Linear algebra and its applications, 0321388836, 9780321388834. Sharma R, Sharma A (2017) Curriculum of Mathematic implemented by NCERT under the plan NCF-2005 at secondary level: an analytical study. Department of Education University of Calcutta, 2277, pp 18–25
Maximum Entropy Method Dionissios T. Hristopulos1 and Emmanouil A. Varouchakis2 1 School of Electrical and Computer Engineering, Technical University of Crete, Chania, Crete, Greece 2 School of Mineral Resources Engineering, Technical University of Crete, Chania, Greece
Abbreviations MEM MaxEnt BME
maximum entropy method maximum entropy Bayesian Maximum entropy
Definition The principle of maximum entropy states that the most suitable probability model for a given system maximizes the Shannon entropy subject to the constraints imposed by the data and – if available – other prior knowledge of the system. The maximum entropy distribution is the most general probability distribution function conditionally on the constraints. In the geosciences, the principle of maximum entropy is
mainly used in two ways: (1) in the maximum entropy method (MEM) for the parametric estimation of the power spectrum and (2) for constructing joint probability models suitable for spatial and spatiotemporal datasets.
Overview The concept of entropy was introduced in thermodynamics by the German physicist Rudolf Clausius in the nineteenth century. Clausius used entropy to measure the thermal energy of a machine per unit temperature which cannot be used to generate useful work. The Austrian physicist Ludwig Boltzmann used entropy in statistical mechanics to quantify the randomness (disorder) of a system. The statistical mechanics definition of entropy reflects the number of microscopic configurations which are accessible by the system. In the twentieth century, the concept of entropy was used by the American mathematician Claude Shannon (1948) to measure the average information contained in signals. Shannon’s influential paper founded the field of information theory. Consequently, the terms Shannon entropy and information entropy are used to distinguish between the entropy content of signals and the mechanistic notion of entropy used in thermodynamics and statistical mechanics. The connection between information theory and statistical mechanics was investigated in two seminal papers by the American physicist Edwin T. Jaynes (1957a, b). He showed that the formulation of statistical mechanics can be derived from the principle of maximum entropy without the need for additional assumptions. The principle of maximum entropy is instrumental in establishing this connection; it dictates that given partial knowledge of the system, the least biased estimate for the probability distribution maximizes Shannon entropy under the specified constraints. According to Jaynes, “Entropy maximization is not an application of a law of physics, but merely a method of reasoning that ensures that no arbitrary assumptions are made.” The work of Jaynes opened the door for the application of MEM to various fields of science and engineering that involved ill-posed problems characterized by incomplete information. Notable application areas include spectral analysis (Burg 1972), image restoration (Skilling and Bryan 1984), geostatistics (Christakos 1990), quantum mechanics, condensed matter physics, tomography, crystallography, chemical spectroscopy, and astronomy, among others (Skilling 2013).
Methodology According to Laplace’s principle of indifference, if prior knowledge regarding possible outcomes of an experiment is unavailable, the uniform probability distribution is the most impartial choice. MEM employs this principle by
Maximum Entropy Method
843
incorporating data-imposed constraints in the model inference process. The MEM probability model depends on the constraints used: The MEM model for a nonnegative random variable with known mean is the exponential distribution. If the constraints include the mean and the variance, the Gaussian (normal) distribution is obtained. In the case of multivariate and spatially distributed processes, the MEM model constrained on the mean and the covariance function is the joint Gaussian distribution.
Notation In the following, it is assumed that X ¼ (X1, . . ., Xn)T (T denotes the transpose) is an n-dimensional random vector defined in a probability space (Ω, F , P ), where Ω is the sample space, F is the sigma-algebra of events, and P is the probability measure. The realizations x ℝn of the random vector X can take either discrete or continuous values. The expectation of the function g(X) over the ensemble of states x is denoted by ½gðXÞ. The expectation involves the joint probability mass function (PMF) p(x) if the random variables Xi take discrete values or the joint probability density function (PDF) f (x) if the Xi are continuous. We use the trace operator, Tr, as a unifying symbol to denote summation (if the random variables Xi are discrete) or integration (if the Xi are continuous) over all probable states x.
(observations). We will denote sample averages by means of gm ðxÞ. Such sampling averages can include the mean, variance, covariance, higher-order moments, or more complicated functions of the sample values. Then, according to the principle of maximum entropy, the probability distribution should respect the constraints (the symbol denotes equivalence) ½gm ðxÞ Trx ½gm ðxÞf ðxÞ ¼ gm ðxÞ,
ð3Þ where Trx[] denotes the “summation” over all possible states x. The above equations are supplemented by the normalization constraint Trx f(x) ¼ 1 which ensures the proper normalization of the probability distribution. The maximum entropy (henceforward, MaxEnt ) distribution maximizes the entropy under the above constraints. This implies a constrained optimization problem defined by means of the following Lagrange functional M
ℒ½f ¼ S þ
lm gm ðxÞ Trx gm ðxÞf ðxÞ m¼1
þ l0 ½1 Trx f ðxÞ:
M
Assuming that the PMF p(x) (in the discrete case) or the PDF f(x) (in the continuous case) is known, Shannon’s entropy can be expressed as pðxÞ ln pðxÞ,
discrete,
ℝ
dx1 . . .
ℝ
dxn f ðxÞ ln f ðxÞ,
0¼
dℒ lm gm ð xÞ l0 : ¼ ln f ðxÞ þ 1 df m¼1
The above equation leads to the MaxEnt probability distribution
ð1Þ
xO
S¼
ð4Þ
The minimization of ℒ[f] can be performed using the calculus of variations to find the stationary point of the Lagrange functional. At the stationary point, the functional derivative δℒ/δf vanishes, i.e.,
General Formulation
S¼
m ¼ 1, . . . , M,
f ð xÞ ¼ continuous:
ð2Þ
The entropy for both the discrete and continuous cases can be expressed as S ¼ ½ln f ðxÞ, where f() here stands for the PMF in the discrete case and the PDF in the continuous case. For the sake of brevity, in the following, we use f() to denote the PDF or the PMF and refer to it as “probability distribution.” We also use the term “summation” to denote either summation (for discrete variables) or integration (for continuous variables) over all possible states x. Let fgm ðxÞgM m¼1 represent a set of M sampling functions, the averages of which can be determined from the data
1 exp Z
M
lm gm ð xÞ ,
ln Z ¼ 1 l0 ,
ð5Þ
m¼1
where the constant Z is the so-called partition function which normalizes the MaxEnt distribution. Since f(x) is normalized by construction, it follows from Eq. (5) that M
Z ¼ Trx exp
lm gm ð xÞ :
ð6Þ
m¼1
The implication of Eqs. (5) and (6) is that l0 depends on the Lagrange multipliers flm gM m¼1 . These need to be determined by solving the following system of M nonlinear constraint equations
M
844
Maximum Entropy Method
gm ðxÞ ¼ Trx gm ðxÞf ðxÞ ¼
@Z , @lm
m ¼ 1, . . . , M:
ð7Þ
In spatial and spatiotemporal problems, the constraints involve joint moments of the field (Christakos 1990, 2000; Hristopulos 2020). The constraints are expressed in terms of real-space coordinates. In the case of time series analysis, it is customary to express the MEM solution in the spectral domain (see next section).
Spectral Analysis MEM has found considerable success in geophysics as a method for estimating the power spectrum of stationary random processes (Burg 1972; Ulrych and Bishop 1975). In spectral analysis, MEM is also known as all poles and autoregressive (AR) method. For a time series with a constant time step δt, the MEM power spectral density at frequency f is given by Pð f Þ ¼
c0 1þ
M m¼1 cm
expð2pi fm dt Þ
2
,
f N f f N,
connection has been exploited in the Bayesian MaxEnt (BME) framework for spatial and spatiotemporal model construction and estimation (Christakos 1990, 2000). BME allows incorporating prior physical knowledge of the spatial or spatiotemporal process in the probability model. In spatial estimation problems, the application of the method of maximum entropy assumes that the mean, covariance, and possibly higher-order moments provide the spatial constraints for random field models. A different approach proposes local constraints that involve geometric properties, such as the square of the gradient and the linearized curvature of the random field (Hristopulos 2020). This approach leads to spatial models with sparse precision matrix structure which is computationally beneficial for the estimation and prediction of large datasets. In recent years, various generalized entropy functions have been proposed (e.g., Ŕenyi, Tsallis, Kaniadakis entropies) for applications that involve longmemory, non-ergodic, and non-Gaussian processes. In principle, it is possible to build generalized MEM distributions using extended notions of entropy (Hristopulos 2020). The mathematical tractability and the application of such generalized maximum entropy principles in geoscience are open research topics.
ð8Þ where fN ¼ 1/2δt is the Nyquist frequency (i.e., the maximum frequency which can be resolved with the time step δt) and fc m gM m¼0 are coefficients which need to be estimated from the data. The term “all-poles” method becomes obvious based on Eq. (8), since P( f ) has poles in the complex plane. The poles coincide with the zeros in the denominator of the fraction that appears in the right-hand side of Eq. (8). The connection with AR time series models of order m lies in the fact that the latter share the spectral density given by Eq. (8). The order of the AR model which is constructed based on MaxEnt is equal to the maximum lag, mδt, for which the auto-covariance function can be reliably estimated based on the available data.
Applications Maximum entropy has been extensively used in many disciplines of geoscience. Notable fields of application include geophysics, seismology and hydrology, estimation of rainfall variability and evapotranspiration, assessment of landslide susceptibility, classification of remote sensing imagery, prediction of categorical variables, investigations of mineral potential prospectivity, and applications in land use and climate models. The principle of maximum entropy is used in Bayesian inference to obtain prior distributions (Skilling 2013). This
Summary or Conclusions The notion of entropy is central in statistical mechanics and information theory. Entropy provides a measure of the information incorporated in a specific signal or dataset. The principle of maximum entropy can be used to derive probability distributions which possess the largest possible uncertainty given the constraints imposed by the data. Equivalently, maximum entropy minimizes the amount of prior information which is integrated into the probability distribution. Thus, the maximum entropy probability distributions are unbiased with respect to what is not known. The most prominent applications of maximum entropy in geoscience include the estimation of spectral density for stationary random processes and the construction of joint probability models for spatial and spatiotemporal datasets.
Cross-References ▶ Fast Fourier Transform ▶ High-Order Spatial Stochastic Models ▶ Power Spectral Density ▶ Spatial Analysis ▶ Spatial Statistics ▶ Spectral Analysis ▶ Stochastic Geometry in the Geosciences ▶ Time Series Analysis
Maximum Entropy Spectral Analysis
845
Bibliography Burg JP (1972) The relationship between maximum entropy spectra and maximum likelihood spectra. Geophysics 37(2):375–376 Christakos G (1990) A Bayesian/maximum-entropy view to the spatial estimation problem. Math Geol 22(7):763–777 Christakos G (2000) Modern spatiotemporal geostatistics, International Association for Mathematical Geology Studies in mathematical geology, vol 6. Oxford University Press, Oxford Hristopulos DT (2020) Random fields for spatial data modeling: a primer for scientists and engineers. Springer Netherlands, Dordrecht Jaynes ET (1957a) Information theory and statistical mechanics. I. Phys Rev 106(4):620–630 Jaynes ET (1957b) Information theory and statistical mechanics. II. Phys Rev 108(2):171–190 Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423 Skilling J (2013) Maximum entropy and Bayesian methods: Cambridge, England, 1988, vol 36. Springer Science & Business Media, Cham Skilling J, Bryan R (1984) Maximum entropy image reconstructiongeneral algorithm. Mon Not R Astron Soc 211:111–124 Ulrych TJ, Bishop TN (1975) Maximum entropy spectral analysis and autoregressive decomposition. Rev Geophys 13(1):183–200
of frequent interest is the estimation of its power spectrum (or spectral density) which is a function that describes the distribution of the variance (power or energy) of the time series across frequencies. In general, the power spectrum of a time series is unknown in advance and must be estimated from the experimental data. The main task of spectral analysis is thus the estimation of the power spectrum for the available data. There are many methods for the estimation of the power spectrum, like the periodogram, the Blackman–Tukey approach, multitaper estimators, singular value decomposition analysis, etc. MESA is another important spectral analysis method that must be added to the list, being a popular choice in geosciences (Ables 1974), and particularly appealing because of its high resolution and its good performance with short time series. The method was first proposed by Burg (1967).
MESA Method The maximum entropy spectral estimator (Burg 1967; Papoulis 1984) is the spectral density SðoÞ that maximizes the entropy (E):
Maximum Entropy Spectral Analysis Eulogio Pardo-Igúzquiza1 and Francisco J. Rodríguez-Tovar2 1 Instituto Geológico y Minero de España (IGME), Madrid, Spain 2 Universidad de Granada, Granada, Spain
Synonyms
E¼
spectral
estimation,
all-poles
spectral
Definition Maximum entropy spectral analysis (MESA) is the statistical estimation of the power spectrum of a stationary time series using the maximum entropy (ME) method. The resulting ME spectral estimator is parametric and equivalent to an autoregressive (AR) spectral estimator.
p
lnfSðoÞg do,
p
In geosciences, there is a plethora of time series of interest in applied geosciences. A sequence of varve thicknesses, a seismogram or the isotopic content in a stalagmite are but three common examples in cyclostratigraphy, geophysics and paleoclimatology, respectively. The time series is modeled as a stochastic process (or random function) where a topic
M
eioh SðoÞdo ¼ CðhÞ,
ð2Þ
for h [q, q þ 1, . . ., 1, 0, 1, . . ., q]. Where CðhÞ ¼ CðhÞ is the estimated covariance for lag h, i is the imaginary unit and o ¼ 2πf is the angular frequency (radians per sampling interval) while f is the frequency in cycles per sampling interval. Next, it may be shown that the ME spectral estimator has the form (Brockwell and Davis 1991): s2q
Sð o Þ ¼
General Comments
ð1Þ
over the class of all densities S(o) that satisfy the constraints given by making (theoretical) covariances equal to sample covariances (i.e. covariances estimated from the data): p
Autoregressive estimation
p
1þ
q
ak
eiko
2
,
ð3Þ
k¼1
which is equal to the AR spectral estimator of order q. Strictly speaking the AR estimator provided in Eq. (3) is the ME estimator when the random function (stochastic process) is Gaussian (Papoulis 1984). In the previous equations: s2q is the variance of a zero-mean white noise stochastic process that depends on the order q of
846
the AR process, and {ak; k ¼ 1, . . ., q} are the q autoregressive coefficients that are also dependent on the chosen order q. These q þ 1 parameters must be estimated from the experimental data in order to apply the ME spectral estimator of Eq. (3). A possibility for obtaining the estimation of the previous q þ 1 parameters is to use the Yule-Walker equations (Papoulis 1984). However, in order to avoid the inversion of large covariance matrixes, it is possible to use the DurbinLevinson algorithm (Brockwell and Davis 1991), which gives the estimates of the AR coefficients and noise variance recursively. Another popular recursive algorithm is given by Burg (1967) which uses estimates of forward and backward estimation errors and their mean square values (Papoulis 1984).
Advantages of MESA High frequency resolution. MESA is capable of resolving discrete frequencies close together even in short time series. Consistent power spectrum estimates. In this sense, MESA produces neither negative estimates nor those that do not coincide with their sample covariance values. There is no leakage of power between nearby frequencies. MESA does not experience the side lobe problem that is
Maximum Entropy Spectral Analysis, Fig. 1 Simulated data time series (Data in the figure) equal to the addition of signal (Signal in the figure) and noise (Noise in the figure). The similarity of signal and noise and how the spectral content is hidden in the data and not at all evident
Maximum Entropy Spectral Analysis
common with other spectral estimators and that is caused by the use of window functions. Efficient spectrogram analysis. MESA can be efficiently used for estimating the spectrogram (Pardo-Igúzquiza and Rodriguez-Tovar 2006) because it achieves good results with short time series.
Difficulties in the Application of MESA An important decision is the choice of an optimum autoregressive order q for the MESA estimator given in Eq. (3). Many alternatives have been proposed. Some methods are based on criteria like the Akaike information criterion or the final error prediction (Ulrych and Bishop 1975). The order is often chosen empirically with a value between N/5 and N/2, N being the number of experimental data points. 2 N/ln(2 N) is another popular empirical choice. If the chosen order is small, the variance of the estimates is small, but the bias is large, while if the chosen order is large, the bias is small, but the variance increases. An optimal choice of the order q results in problems of spectrum line splitting, frequency shifting, spurious peaks and biased power estimation. However, these problems are
are both remarkable. The number of data points is 200, which can be considered a short time series. The signal has been shifted upwards by adding a value of 15 and the noise has been shifted downwards by a value of 15 to avoid the overlap and provide a clearer representation
Maximum Entropy Spectral Analysis
847
M
Maximum Entropy Spectral Analysis, Fig. 2 (a): MESA estimated power spectrum of the observed data in Fig. 1 (200 experimental data points). (b): Achieved confidence level of the estimated power spectrum
given in A. The vertical dashed red lines give the location of the theoretical frequencies of the sinusoids of the signal
848
Maximum Entropy Spectral Analysis, Fig. 3 (a): MESA estimated power spectrum of the observed data with a simulation scheme identical to the one used in Fig. 1 but extending the length of the different time series from 200 to 2000 data points. (b): Achieved confidence level of
Maximum Entropy Spectral Analysis
the estimated power spectrum given in A. The vertical dashed red lines show the location of the theoretical frequencies of the sinusoids of the signal
Maximum Entropy Spectral Analysis
common to all spectral methods if the data are of low quality (few data points, irregular sampling, high level of noise, etc.). Another difficulty is the statistical evaluation of the reliability or uncertainty of the estimated power spectrum. This evaluation is complex and must be based on strong assumptions regarding the model that should be followed by the experimental data (Baggeroer 1976). This problem has been solved by using the computer intensive permutation test method (Pardo-Igúzquiza and Rodriguez-Tovar 2005). The permutation test is easy to apply, non-parametric but based on the computer intensive method. The original time series is ordered at random (a permutation sequence) many times (for example 5000 times) and for each disordered sequence the ME power spectrum is estimated. The statistical significance of the power spectrum estimated from the original time series may then be assessed by calculating how many times it is higher than the power spectrum at the same frequency for all permutation sequences.
Examples of Application The first example is a short time series (200 data points) that has been simulated following the model: observed data equal to signal plus noise. The signal consists of the superposition of six sinusoids of equal amplitude with frequencies forming three pairs: one in the low (0.05 and 0.06), another in the
849
middle (0.25 and 0.26) and a third pair in the high frequencies (0.42 and 0.43), where all these are given in cycles per sampling interval that is considered unity. The amplitude of each sinusoid gives a variance of the signal equal to 3. The white noise is a sequence of Gaussian numbers drawn from a Gaussian distribution with mean zero and variance 12. Thus, the signal-to-noise ratio (SNR) of variances is 3/12 ¼ 0.25. The observed data, signal and noise are represented in Fig. 1. The high similitude of observed data and noise is evident because of the low SNR and it can be correctly said that the signal is hidden in the noise, thus it is a very challenging case. The estimated power spectrum (with p ¼ 67) of the time series of observed data is shown in Fig. 2A and the achieved confidence level of the permutation test is shown in Fig. 2B. Figure 2A demonstrates how the three couples of sinusoidal components have been identified although with a frequency shift in the lowest frequency component. Figure 2B demonstrates how all the components have been identified with high statistical significance (> 99%) except the fourth component, i.e. the high frequency component in the middle frequencies. When the number of data points of those observed (signal + noise) is increased to 2000, Figs. 3A (with p ¼ 150) and 3B show how these difficulties disappear. The second example is a sequence of real data (oxygen isotope of δ18 O) measured along a drill core sample of ocean sediment (Raymo et al. 1992). The time series has been represented in Fig. 4 and it has 866 data points with a
Maximum Entropy Spectral Analysis, Fig. 4 Time series of 866 data points of oxygen isotope δ18 O (Raymo et al. 1992)
M
850
Maximum Entropy Spectral Analysis
Maximum Entropy Spectral Analysis, Fig. 5 (a): MESA estimated power spectrum of the time series shown in Fig. 4. (b): Achieved confidence level of the estimated power spectrum given in A
Maximum Likelihood
sampling interval of 3 kyrs. The power spectrum estimated by MESA with an autoregressive order p ¼ 150 is shown in Fig. 5A and the significance level achieved by the permutation test is shown in Fig. 5B. Figure 5B demonstrates how the most statistically significant frequencies correspond to periodicities of 19, 22.4, 41, 77, 97, 115 and 230 kyrs. These estimated cycles are in line with Milankovitch periodicities of precession (19 and 22.4 kyrs), obliquity (41 kyrs) and short eccentricity (97 and 115 kyrs). The 77 kyrs can be explained as a combination tone of the main periodicities and the 230 kyrs a harmonic of eccentricity.
Summary Maximum entropy spectral analysis is a high-resolution spectral estimator that has been widely used in geosciences. It is based on choosing the spectrum that corresponds to the most random (i.e. unpredictable) time series whose covariance function coincides with the known values (Burg 1975). It comes out that the solution to choosing the maximum entropy spectrum is equal to the autoregressive estimator of the power spectrum. The main problem is to select an optimal value for the autoregressive order q. There are several proposals for choosing a reasonable order of the estimator and the statistical significance of the estimated spectrum can be easily assessed by a computer intensive method like the permutation test.
851 Burg JP (1967) Maximum entropy spectral analysis. 37th Annual International Meeting of the Society for the Exploration of Geophysics. Oklahoma City, OK, pp 34–41 Burg JP (1975) Maximum entropy spectral analysis. PhD Dissertation, Stanford University, 127 p Papoulis A (1984) Probability, random variables and stochastic processes: McGraw-Hill Intern. Editions, Singapore, 576 p Pardo-Igúzquiza E, Rodriguez-Tovar FJ (2005) MAXENPER: a program for maximum entropy spectral estimation with assessment of statistical significance by the permutation test. Comput Geosci 31(5): 555–567 Pardo-Igúzquiza E, Rodriguez-Tovar FJ (2006) Maximum entropy spectral analysis of climatic time series revisited: assessing the statistical significance of estimated spectral peaks. J Geophys Res Atmos 111(D10202):1–8 Raymo ME, Hodell D, Jansen E (1992) Response of deep ocean circulation to the initiation of northern hemisphere glaciation (3–2 myr). Paleoceanography 7:645–672 Ulrych TJ, Bishop TN (1975) Maximum entropy spectral analysis and autoregressive decomposition. Rev Geophys Space Phys 13(1): 183–200
Maximum Likelihood Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Definition Cross-References ▶ Autocorrelation ▶ Bayesian Maximum Entropy ▶ Entropy ▶ Maximum Entropy Method ▶ Moving Average ▶ Power Spectral Density ▶ Signal Analysis ▶ Signal Processing in Geosciences ▶ Spectral Analysis ▶ Time Series Analysis ▶ Time Series Analysis in the Geosciences
Bibliography Ables JG (1974) Maximum entropy spectral analysis. Astron Astrophys Suppl 15:383–393 Baggeroer AB (1976) Confidence intervals for regression (MEM) spectral estimates. IEEE Trans Inf Theory 22(5):534–545 Brockwell PJ, Davis RA (1991) Times series: theory and methods, 2nd edn. Springer, New York, p 577pp
Estimating a probability distribution to fit the available observed data is an important process required by statisticians and data analysts. Such estimation is required for prediction/ forecasting, missing data analysis, learning functions with uncertain outcomes, etc. where the estimated probability distribution model of the observed data provides the representative behavior. Once a set of random variables has been identified to model specific observation data, a joint probability distribution is to be defined on the set. The joint probability density function of n independent and identically distributed (i.i.d.) observations data x1, x2,. . ., xn from this process, where the probability function is conditioned on a set of parameters θ, is given by likelihood function f ðx1 , x2 , . . . , xn Þ ¼
n
f ðxi juÞ ¼ LðujxÞ. This function is rep-
i¼1
resentative of all information of the sample population from which the observations are made, and this information is what gets used for estimation in the form of an estimator. The widely used estimators are classified based on the distribution type, as parametric, semi-parametric, and nonparametric estimation methods (Greene 2012). Maximum
M
852
Maximum Likelihood
likelihood estimation (MLE) is a classical parametric estimation method, generalized method of moments (GMM) is a semi-parametric estimation method, and kernel density estimation is a non-parametric estimation method. Parametric estimators provide the advantage of a consistent approach owing to which it can be applied to a variety of estimation problems. However, it suffers from the limitation of the robustness of underlying assumptions used, when imposing normal, logistic, or any other widely used distributions to fit the given data. The semi- and nonparametric distributions alleviate the restriction of the model but then often suffer from the issue of providing a range of probabilities as outcomes. As the name suggests, MLE maximizes the likelihood function (Fisher 1912), thus, providing the argument of the likelihood function, θ, with the most probable (likely) occurrence. This is the most intuitive method, as it stems from the joint probability density or mass function of continuous and discrete random variables, respectively. The principle of maximum likelihood states that the most optimal choice of the estimator y from a set of estimators θ is the one that makes the observed data x most probable. Thus, this principle provides a choice of an asymptotically efficient estimator for a parameter or a set of parameters. To improve the practical implementation of determining the maximum value of L(θ | x), its logarithm, referred to as log-likelihood is widely used as the likelihood function in MLE. Log-likelihood function is given as ln LðujxÞ ¼ n
ln
f ðxi juÞ
i¼1
¼
n
ln f ðxi juÞ . The use of log-likelihood
i¼1
functions is more popular due to three key reasons. Firstly, it gives an accurate outcome, as the logarithm of the function attains the maximum when the function is at its maximum value. Secondly, in terms of convenience of using loglikelihood functions instead of the likelihood functions themselves, for most distributions, and especially the widely used exponential families of distributions, the former give simpler terms for extremum computation than the latter. Thirdly, using additive models is more convenient in most mathematical operations, and thus, the summation in log-likelihood functions is more convenient than the equivalent product in the likelihood function. Thus, the MLE widely considers the logarithm function as the likelihood function.
Overview A maximum likelihood estimator is a type of extremum estimator, that optimizes a criterion function, which is either L(θ | x) or (ln L(θ | x)) in the case of MLE, and the outcome is the estimator, which is the argument θ at its extremum value.
Least-squares and GMM are other extremum estimators. MLE and least-squares estimator fall under the subclass of M-estimators, where the criteria function is a sum of terms (Greene 2012). Thus, for maximum likelihood estimation of given independent and identically distributed (i.i.d) observation data x1, x2, . . ., xn, and selected parameters θ, its estimated value is given as: y ¼ arg max
1 n
n
ln f ðxi juÞ .
i¼1
In practice, estimation properties are evaluated to determine the optimal model within a class of estimators. Maximum likelihood methods have desirable properties, such as the outcomes tend to have minimum variance, thus the smallest confidence interval, and they have approximately normal distributions and sample variances to be used for confidence bounds determination and hypothesis testing. Being an M estimator, MLE has desirable properties of consistency and asymptotic normality. Suppose θ0 is the unknown parameter of the selected model for which θ is the approximated one, consistency of the estimator implies y ! y0 , as n ! 1. Asymptotic normality of y implies that in addition to the estimator definitely converging to the unknown parameter, it does so at the rate of p1n. This implies faster convergence with larger observations/sample set, p given as n y y0 !d N 0, s2y0 , where s2y0 is called as the asymptotic variance of the estimate y (Panchenko 2006). At the same time, MLE suffers from the limitations of specificity of the likelihood equations, nontrivial numerical estimation, heavy bias in the case of small samples, and sensitivity to initial conditions (Heckert and Filliben 2003). MLE is attributed to R. A. Fisher owing to the official documented appearance of the term “maximum likelihood” in literature (Fisher 1912). However, there is evidence of the usage of MLE prior to 1912, albeit with the use of different terminology, e.g., “most probable.” Early variants of MLE include estimating using the maximum value of the posterior distribution, referred to as inverse probability, and minimum of variance of “probable errors” (Edgeworth 1908). The seminal works by both Edgeworth and Fisher have independently established the efficiency of maximum likelihood and the minimization of the asymptotic variance of M-estimators (Pratt 1976). While Edgeworth and predecessors, including Gauss, Laplace, and Hagen, used the principle of inverse probability using posterior mode as an estimator, Fisher replaced the posterior mode with the MLE, thus achieving invariance (Hald 1999). Today, the widespread use of MLE may be equally attributed to both Edgeworth and Fisher for providing proof of the optimality of y. Apart from the use of the maximum likelihood principle for estimation, it can also be effectively used for testing a model using the loglikelihood ratio statistic (Akaike 1973). Thus, these two implications of the principle can be combined to a single problem of statistical decision.
McCammon, Richard B.
Applications MLE has been used in geoscientific applications for a long time, which is evident from the fact that geodesists and civil engineers have been early adopters of the predecessors to MLE (Hald 1999). Today, MLE in itself is used in a variety of geoscientific applications, of which a few are listed here. It is one of the methods used in predicting the occurrence of mineral deposits at a regional scale, electrofacies classification in reservoirs for petroleum engineering, parameter estimation for reservoir modeling using multimodal data (geophysical logs, remote sensing images, etc.), and quantifying epistemic uncertainty in geostatistical applications pertaining to natural resources and environment (Sagar et al. 2018). The principle of maximum likelihood is also extended to semantic classification, applied to satellite images (Bolstad and Lillesand 1991). In maximum likelihood classification, the different classes, which are present in the bands of multispectral satellite images, are modeled using different normal distributions, and the labeling is done for each pixel-based on the probability it belongs to each class.
853 Pratt JW (1976) FY Edgeworth and RA Fisher on the efficiency of maximum likelihood estimation. Ann Stat 4(3):501–514 Sagar BSD, Cheng Q, Agterberg F (2018) Handbook of mathematical geosciences: fifty years of IAMG. Springer Nature, Cham, Switzerland
McCammon, Richard B. Michael E. Hohn Morgantown, WV, USA
Future Scope Fig. 1 Richard B. McCammon, courtesy of Dr. Richard McCammon
The scope of applications of utilizing maximum likelihood is continuously growing larger, owing to its popularity over several decades. Increasingly, the maximum likelihoodbased methods, such as MLE and classification, use combinations of methods, e.g., polynomial methods, to improve the efficiency of the data analysis methods used in the application. Overall, the maximum likelihood estimation is an efficient statistical modeling and inference method.
Bibliography Akaike H (1973) Information theory and an extension of the maximum likelihood principle. Proceeding of the second international symposium on information theory, pp 267–281 Bolstad P, Lillesand T (1991) Rapid maximum likelihood classification. Photogramm Eng Remote Sens 57(1):67–74 Edgeworth FY (1908) On the probable errors of frequency-constants. J R Stat Soc 71(2):381–397 Fisher RA (1912) On an absolute criterion for fitting frequency curves. Messenger Math 41:155–156 Greene WH (2012) Econometric Analysis, 7th edn. Prentice Hall, Pearson Hald A (1999) On the history of maximum likelihood in relation to inverse probability and least squares. Stat Sci 14(2):214–222 Heckert NA, Filliben JJ (2003) NIST/SEMATECH e-handbook of statistical methods; chapter 1: exploratory data analysis. https://www.itl. nist.gov/div898/handbook/eda/section3/eda3652.htm Panchenko D (2006) 18.650 Statistics for Applications. Massachusetts Institute of Technology. MIT OpenCourseWare. https://ocw.mit.edu/ courses/mathematics/18-443-statistics-for-applications-fall-2006/lec ture-notes/lecture3.pdf
M Biography Richard McCammon, founding member of the International Association for Mathematical Geology (IAMG), Krumbein Medalist, editor, longtime scientist with the US Geological Survey (USGS), and researcher in geologic resources, was born on December 3, 1932, in Indianapolis, Indiana (USA), son of Bert McCammon, a pharmacist in that city, and wife Mary. He graduated from Broad Ripple High School in 1951 and went on to college at the Massachusetts Institute of Technology, receiving a Bachelor’s degree in 1955. His initial ambition was to receive a degree in chemical engineering, but in his junior year he changed his major to geology, realizing the appeal of working outdoors. He received a Master’s degree in geology from the University of Michigan in 1956 and a Ph.D. from Indiana University in 1959. His dissertation led to an interest in probability and statistics and a postdoctoral fellowship in statistics at the University of Chicago. Following several academic positions and 7 years at Gulf Research and Development Company, Richard took up a position with the USGS in the mid-1970s, where he spent the balance of his career, achieving the position of Chief of the Branch of Resource Analysis in 1992. He conducted research in assessment of mineral resources and application of numerical methods to mineral exploration.
854
Author or coauthor of more than 100 publications and technical reports, Richard was an early user of cluster analysis and other multivariate methods for purposes of classification in the 1960s, when these methods were first being applied to geological problems. His research reports and papers at the USGS addressed numerical methods for estimating number and size of undiscovered mineral resources in a region. His activities in the IAMG include serving as Western Treasurer, 1980–1984; Secretary General, 1984–1989; President, 1989–1992; and Past President 1992–1994. He was the second Editor in Chief for Mathematical Geology (now Mathematical Geosciences), serving in that capacity from 1976–1980. He negotiated with a publisher for works too long for journals, resulting in the Monograph Series in Mathematical Geology (now Studies in Mathematical Geosciences), which he edited 1987–1998. His interest in founding a journal dedicated to the appraisal of natural resources led him to again work with a publisher to create Nonrenewable Resources (now Natural Resources Research) and serve as its first editor in chief, 1992–1998. In recognition for his research contributions, service to the profession, and service to the IAMG, Richard B. McCammon received the prestigious Krumbein Medal in 1992. Acknowledgments I obtained personal, academic, and career details from Ricardo Olea’s biography of Richard McCammon (Mathematical Geology, Vol. 27, p. 463). The IAMG website (www.iamg.org) lists his terms as an IAMG officer and editor. I would like to thank Frits Agterberg for confirming that Richard was present at the inaugural meeting in Prague, Czech Republic, in 1968; and B. S. Daya Sagar for providing the photograph.
Membership Functions Candan Gokceoglu Department of Geological Engineering, Hacettepe University, Ankara, Turkey
Definition Unlike the classical set theory, the transitions in a universe can be smooth, and the definitions of elements can be mixed and diversified with the help of membership definitions of a given set. An element defined with fuzzy sets can be members of different classes, which ensure the gradual transition between them. The fuzzy sets can unfold and model the vagueness and the uncertainty in the universe by designing various degrees of memberships at the boundaries. Hence, a function needs to be designed for defining the memberships of the elements, and
Membership Functions
these functions are defined as membership functions. Figure 1 shows a general difference between characteristic function for crisp set A and membership function for fuzzy set A.
Introduction A fuzzy set is a class of object with a continuum of grades of membership, and such a set is characterized by a membership function which assigns to each object a grade of membership ranging between 0 and 1 (Zadeh 1965). Modeling has become an important research problem in Earth Sciences due to the development of data collection and processing methods in the last few decades. However, the models utilized in Earth Sciences are extremely complex, since they deal with natural events and processes. In addition, it is clear that expressing the elements employed in a model as crisp sets would not be adequate to eliminate the uncertainty. Instead, expressing each element with a degree of membership will decrease the uncertainty and yield to more representative models. Therefore, the use of membership functions in modeling problems in Earth Sciences has a significant contribution for addressing the natural phenomena or processes properly. Mapping is an important concept in relating set-theoretic forms to function-theoretic representations of information (Ross 2004). The main characteristics of membership functions were described by Dombi (1990). According to the membership function properties described by Dombi (1990), all membership functions are continuous, and all membership functions map an interval [a, b] to [0, 1], m [a, b] ! [0, 1]. In addition, the membership functions are either increasing or decreasing monotonically or can be divided into a monotonically increasing or decreasing part (Dombi 1990). The monotonous membership functions on the whole interval are either convex functions or concave functions, or there exists a point c in the interval [a, b] such that [a, c] is convex and [c, b] is concave (called S-shaped functions) (Dombi 1990). Another property of membership functions is that monotonically increasing functions have the property u (a) ¼ 0, u (b) ¼ 1, while monotonically decreasing functions have the property u (a) ¼ 1, u (b) ¼ 0 (Dombi 1990). Finally, the linear form or linearization of the membership function is important (Dombi 1990). Figure 2 shows the general properties of a trapezoidal membership function. As can be seen from Fig. 2, the “core” of a membership function for a fuzzy set A is defined as the region of a universe that is characterized by a complete and full membership in the set A, and hence, the core comprises those elements x of the universe such that mA (x) ¼ 1. The “support” of a membership function for some fuzzy set A is defined as that region of the universe that is characterized by nonzero membership
Membership Functions
855
Membership Functions, Fig. 1 General view of a characteristic function for a crisp set and membership function for a fuzzy set. (Rearranged after Berkan and Trubatch (1997))
mðxÞ ¼ exp
xx s
2
(iii) “Generalized bell membership function” is governed by three parameters. These are a (responsible for its width), c (responsible for its center), and b (responsible for its slopes). mðxÞ ¼
Membership Functions, Fig. 2 General properties of a membership function. (Rearranged after Ross (2004))
in the set A, and it means that the support comprises those elements x of the universe such that mA (x) > 0 (Ross 2004).
Types of Membership Functions In this entry, the most common forms of membership functions are presented. The most commonly used membership functions are singleton, Gaussian, generalized bell, sigmoidal, and triangular and trapezoidal membership functions (Rutkowski 2004). Their mathematical descriptions are provided below. (i) “Singleton membership function” is defined as unity at a particular point and zero everywhere else. mðxÞ5
1 for x ¼ x 0 for x ¼ x
1 1þ
xc 2b a
(iv) “Sigmoidal membership function” is controlled by two parameters such as a (controls for its slope) at the crossover point x ¼ c. mðxÞ ¼
1 1 þ exp½aðx cÞ
(v) “Triangular membership function” is completely controlled by three parameters such as a, b, and c; and a < b < c. mðxÞ ¼ max min
xa ca , ,0 ba cb
(vi) “Trapezoidal membership function” is controlled by four parameters such as a, b, c, and d; and a < b < c < d. mðxÞ ¼ max min
xa dx , 1, ,0 ba dc
The representative views of the membership functions mentioned above are given in Fig. 3.
Use of Membership Functions in Earth Sciences (ii) “Gaussian membership function” is controlled by two parameters such as ¯x (controls the center) and s (controls the width).
When the materials are natural rock, the only known fact about the certainty is that this material can never be known
M
856
Membership Functions
Membership Functions, Fig. 3 Graphical form of the most commonly used membership functions
with certainty (Goodman 1995). This idea is valid for all natural materials, events, and processes. For this reason, classifying natural materials or explaining the natural processes involves uncertainty at different levels. Although the issue of uncertainty (historic, present, or future) is often mentioned, it is mostly as a side note, and it is still rarely used for quantitative and predictive purposes (Caers 2011). In the recent two decades, various fuzzy and neuro-fuzzy modeling studies for classification and prediction problems in Earth Sciences have been demonstrated, and all the models have membership functions. In fact, the features of membership functions fit the classification and prediction of events and processes in Earth Sciences. One of the pioneering fuzzy studies was performed by den Hartog et al. (1997) for the performance prediction of a rock-cutting trencher, in which trapezoidal membership functions were used. Subsequently, several studies in the international literature of the Earth Sciences have been published. Prediction of susceptible zones to landsliding (Akgun et al. 2012) and the prediction of rockburst (Adoko et al. 2013) can be listed as examples for such studies. Akgun et al. (2012) used trapezoidal and triangular membership functions, while Adoko et al. (2013) employed Gaussian
membership functions. The type of membership functions depends on the nature of problem, and the membership functions can be created based on expert opinion or by using datadriven approaches.
Summary or Conclusions This entry presents the definition of membership functions with examples to their types commonly used in Earth Sciences. Due to the complexity of natural events or processes, it is not an easy task to classify natural materials and define their properties, and to predict natural events by employing crisp classes and sets. Due to this reason, the solutions proposed for such problems have uncertainty. To minimize the uncertainty, fuzzy sets including membership functions have been used in real-world problems. Use of expert opinion is one of the major advantages of membership functions because representative and reliable data cannot be obtained each time to solve the problems in Earth Sciences. In addition, the membership functions provide a flexible and transparent tool for many inference systems.
Merriam, Daniel F.
857
Cross-References
Biography
▶ Fuzzy C-Means Clustering ▶ Fuzzy Set Theory in Geosciences ▶ Uncertainty Quantification
Daniel (Dan) Francis Merriam (1927–2017) is known as one of the leading pioneers of mathematical geology and one of the foremost exponents of computer applications in the geosciences. Following distinguished service with the US Navy in the Pacific in World War II, Dan’s professional development began at the University of Kansas where he received an undergraduate degree in geology in 1949. Following 2 years of field work with Union Oil, Dan took a job with the Kansas Geological Survey (KGS) at the University of Kansas and enrolled in graduate school, receiving his MS degree in geology in 1953 and his PhD in 1961. He subsequently received the M.Sc. degree (1969) and D.Sc. degree (1975) from Leicester University, England. Dan became the Survey’s Chief of Geologic Research in 1963, where his promotion of quantitative analysis of geologic data began in earnest. With Dan as editor of the Survey’s Special Distribution Series (1964–1966), numerous papers on quantitative analysis were published with computer programs in ALGOL, BALGOL, and FORTRAN II. Dan then served as editor of the KGS series Computer Contributions, an irregular publication of 50 issues (1966–1970) with computer programs for analysis of geologic data (available online at www.kgs.ku.edu). In 1969, while at the KGS, Dan also became the editor for the new Plenum Press series, Computer Applications in the Earth Sciences. During the dramatic events of the Prague Spring in 1968, Dan participated as a founding member of the International Association for Mathematical Geology (IAMG - now International Association for Mathematical Geosciences). This led to Dan’s creation of IAMG’s international journals, Mathematical Geology (now Mathematical Geosciences) in 1969 and, with Elsevier, Computers & Geosciences in 1975, both edited by Dan for many years. He also edited another IAMG journal, Natural Resources Research, starting in 1999. In addition, Dan edited 11 books and published more than 100 papers during his career. Dan was elected Secretary General of IAMG in 1972 and President in 1978. Among his many awards and distinctions, IAMG presented Dan the sixth William Christian Krumbein Medal (1981) for his scientific contributions and support of the geologic profession in general and the IAMG in particular (Davis 1982). The Geologic Society (London) presented him with the William Smith Medal in 1992. In 1971, Dan moved to Syracuse University as Chairman of the Department of Geology. With Dan’s leadership, Syracuse University became known as a leading center for quantitative studies in the geosciences. In 1981, Dan accepted a position as Chairman of the Department of Geology at Wichita State University in Kansas, where he stayed until 1991 when he rejoined the KGS in Lawrence, Kansas. Retiring in
Bibliography Adoko AC, Gokceoglu C, Wu L, Zuo QJ (2013) Knowledge-based and data-driven fuzzy modeling for rockburst prediction. Int J Rock Mech Min Sci 61:86–95 Akgun A, Sezer EA, Nefeslioglu HA, Gokceoglu C, Pradhan B (2012) An easy-to-use MATLAB program (MamLand) for the assessment of landslide susceptibility using a Mamdani fuzzy algorithm. Comput Geosci 38:23–34 Berkan RC, Trubatch SL (1997) Fuzzy system design principles. IEEE Press, New York, p 495 Caers J (2011) Modeling uncertainty in the earth sciences. Wiley, Hoboken, p 229 den Hartog MH, Babuska R, Deketh HJR, Alvarez Grima M, Verhoef PNW (1997) Knowledge-based fuzzy model for performance prediction of a rock-cutting trencher. Int J Approx Reason 16:43–66 Dombi J (1990) Membership function as an evaluation. Fuzzy Sets Syst 35(1):1–21 Goodman RE (1995) Block theory and its application. Geotechnique 45(3):383–423 Ross TJ (2004) Fuzzy logic with engineering applications, 2nd edn. Wiley, Chichester, p 652 Rutkowski L (2004) Flexible neuro-fuzzy systems: structures, learning and performance evaluation. Kluwer Academic, London, p 279 Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
Merriam, Daniel F. D. Collins Emeritus Associate Scientist, The University of Kansas, Lawrence, KS, USA
Fig. 1 Dan Merriam (© public domain)
M
858
1997, Dan remained active as an Emeritus Scientist for most of the last two decades of his life.
Bibliography Davis JC (1982) Association announcement. Math Geol 14(6):679–681 Harbaugh JW, Merriam DF (1968) Computer applications in stratigraphic analysis. Wiley, New York Merriam DF (ed) (1966) Computer applications in the earth sciences: Colloquium on classificational procedures. Kansas Geological Survey, computer contribution series 7. Available online. http://www. kgs.ku.edu/Publications/Bulletins/CC/7/CompContr7.pdf Merriam DF (ed) (1967) Computer applications in the earth sciences: Colloquium on time-series analysis. Kansas Geological Survey, computer contribution series 18. Available online. http://www.kgs.ku. edu/Publications/Bulletins/CC/18/CompContr18.pdf Merriam DF (1999) Reminiscences of the editor of the Kansas geological survey computer contributions, 1966–1970 and a byte. Comput Geosci 25:321–334
Metadata Simon J. D. Cox CSIRO, Melbourne, VIC, Australia
Definition Metadata is information about a data resource or transaction. Metadata provides contextual and summary information about a dataset or data item. Metadata may be useful in discovery, selection, management, access, and use of the described resources.
Metadata
catalogue. Metadata for books was standardized through Cataloguing-in-Publication records to be prepared by publishers, the MARC standards from Library of Congress, and in the context of library management software developed by organizations such as OCLC, alongside local and disciplinary standards. These practices had a strong influence on the development of “generic” metadata standards through the Dublin Core Metadata Initiative (Kunze and Baker 2007; DCMI Usage Board 2020), which emerged in the web era, but have been influential quite broadly and are used directly in many metadata records and systems. Dublin Core now includes nearly 80 terms, but the classic 15 terms, which are most widely used, indicate the overall scope. Dublin core element Title Description Subject Coverage (spatial and temporal scope) Language Type Identifier Creator, contributor, and publisher Date Format License, rights Relation
Key application(s) Discovery and selection Discovery and selection Discovery and selection Discovery and selection Discovery and selection, use Discovery and selection, management Management, access Management, provenance Provenance, discovery, and selection Access, use Access, use Discovery and selection, provenance
Web Pages Metadata Standards Metadata refers to information about other data, as distinct from information embodied within it. While descriptive records may be constructed for a wide variety of resources, including services, processes, and physical objects, those relating to data and datasets are often denoted metadata. Private metadata is created and stored as a routine process in data management systems. However, for external or public use, standard metadata structures and formats should be used.
Metadata also appears in web pages, where it may be used for web search indexing. Several of the most prominent search systems sponsored the development of a more comprehensive set of metadata tags in Schema.org (Guha et al. 2015). This is maintained through an open community process. Schema.org now includes hundreds of elements, though only a subset of these are used for the general-purpose indexes. However, a profile known as Science-on-Schema.org has been developed by the earth and environmental science community and is published and harvested in some specialized indexes (Jones et al. 2021).
Libraries Geospatial Libraries and archives had a long-standing practice of summary descriptions of the “information resources” which they manage, for example, in the form of index cards in a
Much geoscience data is georeferenced. Geospatial metadata standards were originally developed somewhat
Metadata
859
Metadata, Fig. 1 The DCAT metadata model, showing relationships between datasets, dataset-series, data-services, and catalog entries
independently, driven by requirements of map and image libraries, which were influenced by rapid growth of nonproprietary remote sensing data. Important general geospatial metadata standards were designed by the US Federal Geographic Data Committee (FGDC), and the Australia and New Zealand Land Information Council (ANZLIC). These were consolidated and standardized internationally as ISO 19115 (ISO/TC 211 2014, 2016, 2019). Data discovery systems based on these ISO standards are available from many statutory data providers in the Geosciences (e.g., geological surveys).
Semantic Web The W3C’s Resource Description Framework (RDF) provides a generic platform for metadata, which enables component vocabularies that have been defined for various concerns to be mixed and matched as required. For example, there are W3C-conformant vocabularies covering a number of specific concerns as follows: Vocabulary SKOS DQV
Concern Organizing keywords and classifications Data quality
ODRL
Rights and licensing
PROV-O
Provenance
Reference Isaac and Summers (2009) Albertoni and Isaac (2016) Iannella et al. (2018) Lebo et al. (2013) (continued)
Vocabulary GeoSPARQL
Concern Geospatial descriptions
OWL-Time
Temporal descriptions
Reference Perry and Herring (2012) Cox and Little (2017)
The Data Catalog Vocabulary (Albertoni et al. 2020) combines these with Dublin Core to cover most of the concerns of a comprehensive metadata record for datasets and services, including geospatial (see Fig. 1 for a summary). The threads are brought together in GeoDCAT (Perego and van Nuffelen 2020), which is effectively an RDF implementation of the ISO 19115 Geospatial metadata standard, accomplished by using elements from many of these vocabularies.
Keywords and Controlled Vocabularies Metadata for geoscience data usually includes keywords and classifiers from domain specific vocabularies, such as minerals, lithology, the geological timescale, and taxonomic references, as well as lists of procedures, sensors, software, and algorithms that may have been used in the production of a dataset (i.e., data provenance), and therefore be important in discovery and selection of datasets by potential users. Geoscience vocabularies are provided by agencies such as geological surveys. Maximum interoperability and transparency are achieved by the use of FAIR vocabularies, where each term or vocabulary item is denoted by a unique, persistent web identifier or IRI (Wilkinson et al. 2016).
M
860
Summary Metadata describes datasets in order to support a number of functions, including finding, selecting, accessing, using, and managing data. Metadata intended for public use should follow standards. Specialist geospatial metadata standards are used by many agencies and data repositories, while general metadata standards have been developed by the web community.
Cross-References ▶ FAIR Data Principles ▶ Ontology ▶ Spatial Data Infrastructure and Generalization
Bibliography Albertoni R, Isaac A (2016) Data on the web best practices: data quality vocabulary. W3C Working Group Note. World Wide Web Consortium. https://www.w3.org/TR/vocab-dqv/ Albertoni R, Browning D, Cox SJD, Gonzalez-Beltran A, Perego A, Winstanley P (2020) Data Catalog Vocabulary (DCAT) – Version 2. W3C Recommendation. World Wide Web Consortium. https:// www.w3.org/TR/vocab-dcat/ Cox SJD, Little C (2017) Time ontology in OWL. W3C Recommendation. World Wide Web Consortium. https://www.w3.org/TR/owltime/ DCMI Usage Board (2020) DCMI metadata terms. DCMI Recommendation. 20 January 2020. https://dublincore.org/specifications/ dublin-core/dcmi-terms/ Guha RV, Brickley D, Macbeth S (2015) Schema.Org: evolution of structured data on the web. ACM Queue 13(9) 28 pp Iannella R, Steidl M, Myles S, Rodriguez-Doncel V (2018) ODRL vocabulary & expression 2.2. W3C Recommendation. World Wide Web Consortium. https://www.w3.org/TR/odrl-vocab/ Isaac, Antoine, and Ed Summers. 2009. SKOS simple knowledge organization system primer. W3C Working Group Note. World Wide Web Consortium. https://www.w3.org/TR/skos-primer/. ISO/TC 211 (2014) ISO 19115-1:2014 geographic information – metadata – Part 1: Fundamentals. International Standard ISO 19115-1. Geographic Information. International Organization for Standardization, Geneva. https://www.iso.org/standard/53798.html ISO/TC 211 (2016) ISO/TS 19115-3:2016 geographic information – metadata – Part 3: XML schema implementation for fundamental concepts. International Standard, Geneva. https://www.iso.org/ standard/32579.html ISO/TC 211 (2019) ISO 19115-2:2019 geographic information – metadata – Part 2: Extensions for acquisition and processing. International Standard, Geneva. https://www.iso.org/standard/67039.html Jones M, Richard SM, Vieglais D, Shepherd A, Duerr R, Fils D, McGibbney L (2021) Science-on-Schema.Org v1.2.0. Zenodo. https://doi.org/10.5281/zenodo.4477164 Kunze J, Baker T (2007) The Dublin core metadata element set. IETF RFC 5013. Internet Engineering Task Force (IETF). https:// datatracker.ietf.org/doc/html/rfc5013
Mine Planning Lebo T, Sahoo S, McGuinness DL (2013) PROV-O: the PROVontology. W3C Recommendation. World Wide Web Consortium, Cambridge, MA. http://www.w3.org/TR/prov-o/ Perego A, van Nuffelen B (2020) GeoDCAT-AP – Version 2.0.0. Recommendation. European Commission. https://semiceu.github.io/ GeoDCAT-AP/releases/2.0.0/ Perry M, Herring J (2012) OGC GeoSPARQL – a geographic query language for RDF data. OGC Standard 11-052r4. OGC 11-052r4. Open Geospatial Consortium, Wayland. https://portal. opengeospatial.org/files/47664 Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, Blomberg N et al (2016) The FAIR guiding principles for scientific data management and stewardship. Sci Data 3(1):160018. https://doi.org/10.1038/sdata.2016.18
Mine Planning N. Caciagli Metals Exploration, BHP Ltd, Toronto, ON, Canada Barrick Gold Corp, Toronto, ON, USA
Synonyms Mine design
Definition Mine planning encompasses the design and implementation of safe, sustainable, and profitable plans for mining operations, with the aim of maximizing value.
Introduction The extraction of natural resources is fundamental to modern society. Almost everything we touch and use has some component that was mined. However, the ore bodies that are being discovered are increasingly lower grade and more difficult to mine. Additionally, the social and environmental impacts to mining need to be addressed and in many jurisdictions are becoming more regulated. As a result, the extraction and processing methods must become more intricate and the planning process and models more detailed and robust.
Inputs into Mine Planning Inputs into the mine planning process include:
Mine Planning
• Orebody characteristics contained within geological, geotechnical, and geometallurgical models • Mining method and layouts, taking into account the local mining and environmental regulations regarding safety, noise, dust, and other impacts to workers and the environment • Technical inputs such as cut-off grades and requirements for blending or stockpiling • Operational inputs such as ore loss and dilution factors, mining, and processing costs • Processing plant capacity and feed requirements • Final product requirements and specifications (e.g., penalty element content) • Commodity value based on both short- and long-range forecasting Much of the mine planning process aims to generate to a set of high-level options around these inputs, building several hypothetical models or plans to be tested to determine which plan yields the highest value (Whittle 2011). Any one of the inputs or decisions can affect all the others; for example, constraints around the processing method will affect the optimum cut-off grades, changes in cut-off grade will affect the mining schedules, and changes in schedules affect which pit shells or mine levels should be selected as mine phases. A change in the commodity price will affect everything, right back to the mine design (Whittle 2010). Additionally, mine planning has the added dimension of time and needs to consider when and how each phase or structure is implemented. Mine planning examines specific problems at different stages of the mining value chain and can be broken down into three views to better address these: • Long-term mine planning • Medium-term mine planning • Short-term mine planning Long-term mine planning aims to define the “best” mine plan over the full life of mine, subject to the constraints imposed by physical and geological conditions, as well as corporate and regulatory policies. The term “best” is typically defined as maximizing the monetary value and guarantying a safe operation but also includes quality characteristics of the product extracted and delivered (Goodfellow and Dimitrakopoulos 2017). Long-term mine planning also considers capital investment decisions, sovereign risk issues as well as market demands for the material to be mined. Medium-term mine planning focuses on issues such as mobile equipment capital investment, strategic positioning for optimization of product quality, and processing plant
861
capacity. This also considers stope access or drift development for underground mines or pit phases and pushbacks for open pit mines. Short-term mine planning focuses on production scheduling. Where to mine next week, is it accessible? Is there enough blasted inventory to feed the crusher and the plant?
Methodology and Tools Most commercial open pit mine planning tools use a NestedPit methodology based on the Lerchs and Grossmann algorithm (Lerchs and Grossmann 1965). The Lerchs-Grossmann algorithm determines the economic three-dimensional shape of the pit considering block grades, pit slopes, costs, recoveries, and metal prices and is based on the valuation of each block in the model. Also known as the Nested Pit method this procedure breaks down the problem into two parts: it determines a set of push-packs (or phases) to be used as guides to construct a production schedule and then schedules the blocks in each push-back, bench by bench. Underground mine planning has no equivalent methodology for determining an “ultimate pit limit,” rather the models or plans to be tested and assessed focus on the on the choice of “best” mining method (e.g., stoping or caving methods), development and layout of infrastructure, design of stope boundaries or envelopes, and production scheduling. For underground mining there are a few optimization algorithms developed in the 1990s and early 2000s. Alford (1995) developed the heuristic floating stope algorithm to define the optimum stope envelopes, analogous to the floating cone algorithm used in open pit optimization. Others have presented variants on the heuristic method first proposed by Alford utilizing a maximum value neighborhood (MVN) heuristic algorithm, integer programming techniques, or 3D modeling (Musingwini 2016 and references therein). Other work has focused on selecting optimal mining methods or planning mine development and layout using neural networks or genetic algorithms (Newman 2010 and references therein). However, a method to integrate the layout and development planning, sizing of stope envelopes, production scheduling, equipment selection, and utilization with the aim to enable just-in-time development analogous to open-pit planning remains elusive. It is understood that the uncertainties associated with the input parameters, especially metal grades or commodity prices, can lead to large deviations from expected budgets. However, accounting for these in mine planning remains an active area of research and development (Musingwini 2016 and references therein, Newman et al. 2010 and references therein, Davis and Newman 2008 CRC mining conference).
M
862
Mineral Prospectivity Analysis
Sources of Uncertainty
Bibliography
Orebody characterization attempts to quantify the sources of uncertainty that contribute most to the differences between planned and operational outcomes. Geological uncertainty represents the level of confidence in the mineralogical characterization, grade, continuity, as well as of the extent and position of geological units. Resource model estimates are continuous interpolations of discretely obtained data, but the models do not capture the real variability of the deposit and tend to reduce the variability of the deposit attributes (Jélvez et al. 2020; Goodfellow and Dimitrakopoulos 2017) which in turn leads to cascading impacts within the mine planning outputs. Accounting for these uncertainties allows for the development of models that are more realistic and creates the opportunity to develop options within the mine plan (Davis and Newman 2008 CRC mining conference). Jélvez et al. (2020) looked at incorporating uncertainty into the mine planning process by representing geological uncertainty as a series of stochastic simulations of the block model. This methodology developed strategies for the definition of the ultimate pit limit, pushback selection, and production scheduling. Goodfellow and Dimitrakopoulos (2017) implemented a stochastic approach to optimizing the entire mining complex incorporating the variability of materials and metal content into the simultaneous production scheduling, ore dispatching, and processing (stockpiling or blending) decision variables. In both case studies a stochastic design is better able to meet the production targets and simultaneously reduce the risk in production profiles and increase the expected NPV, highlighting the importance of characterizing uncertainty and incorporating it into mine planning work.
Conclusions Mining has become more challenging given the nature of the orebodies being discovered; health, safety, and environmental requirements; and the sophisticated extraction and processing methods available. This has demanded that the mine planning process solve increasingly complex problems and has necessitated more intricate and integrated solutions.
Cross-References ▶ Geotechnics ▶ Spatial Statistics ▶ Three-Dimensional Geologic Modeling
Bibliography
Goodfellow R, Dimitrakopoulos R (2017) Simultaneous stochastic optimization of mining complexes and mineral value chains. Math Geosci 49:341–360
Goycoolea M, Espinoza D, Moreno E, Rivera O (2015) Comparing new and traditional methodologies for production scheduling in open pit mining. In: Application of computers and operations research in the mineral industry. Proceedings of the 37th international symposium: APCOM 2015
Jélvez E, Morales N, Ortíz JM (2020) Impact of geological uncertainty at different stages of the open-pit mine production planning process. In: Proceedings of the 28th international symposium on mine planning and equipment selection 2019, pp 83–91
Lerchs H, Grossmann I (1965) Optimal design of open-pit mines. Trans CIM 68:17–24
Musingwini C (2016) Presidential address: optimization in underground mine planning: developments and opportunities. J S Afr Inst Min Metall 116(9):1–12
Newman AM, Rubio E, Caro R, Weintraub A, Eurek K (2010) A review of operations research in mine planning. Interfaces 40(3):222–245
Whittle J (1989) The facts and fallacies of open pit optimization. Whittle Programming, North Balwyn
Whittle G (2010) Enterprise optimisation. In: Mine planning and equipment selection 2010. The Australasian Institute of Mining and Metallurgy publication series no 11/2010. The Australasian Institute of Mining and Metallurgy, Melbourne
Whittle D (2011) Open-pit planning and design. In: Darling P (ed) Mining engineering handbook, 3rd edn. Society for Mining, Metallurgy, and Exploration, Englewood
Mineral Prospectivity Analysis Emmanuel John M. Carranza Department of Geology, Faculty of Natural and Agricultural Sciences, University of the Free State, Bloemfontein, Republic of South Africa
Definition The phrase “mineral prospectivity” denotes the probability or chance that mineral deposits of a certain class can be discovered in an area because the geology of this area is permissive or favorable to the formation of such a class of mineral deposits. The phrase is synonymous with “mineral favorability” or “mineral potential,” either of which denotes the possibility or likelihood that a certain class of mineral deposits exists in an area because the geology of this area is permissive or favorable to the formation of such a class of mineral deposits. However, the phrase “mineral prospectivity” is used in this encyclopedia because the term “prospectivity” relates to the search, prospecting, or exploration of mineral deposits of economic importance. Mineral prospectivity analysis, therefore, aims to predict where undiscovered mineral deposits of a certain class exist in order to guide mineral exploration.
Introduction Because the occurrence of any mineral deposit is revealed by the existence of certain pieces of geological evidence (for example, abnormal concentrations of certain metals in certain earth materials), mineral prospectivity is associated to the amount of geological evidence. Therefore, as shown in Fig. 1, mineral prospectivity analysis dwells on two assumptions. First, an area is prospective if it is typified by pieces or layers of geological evidence that are the same or similar to those in areas where mineral deposits of a certain class exist. Second, if in one area there exist more pieces or layers of geological evidence than in another area in a geologically permissive or favorable terrane, then the degree of mineral prospectivity in the first area is higher than in the second area. Mineral prospectivity analysis can be linked to every or any stage of mineral exploration, from broad regional-scale to
Mineral Prospectivity Analysis, Fig. 1 Level of mineral prospectivity is correlated with level of spatial coincidence of relevant layers of evidence. Areas with the same or similar level of mineral prospectivity as existing mineral deposits (triangles) are regarded as exploration targets
restricted local-scale, whereby it involves analysis and synthesis of pieces or layers of evidence, in map form, obtained from spatial data of different geoscientific types (e.g., geophysical, geological, and geochemical). The purpose of that is to outline exploration targets (Fig. 1) and to prioritize such targets for exploration of yet to be discovered mineral deposits of the class of interest. On a regional-scale, mineral prospectivity analysis endeavors to outline exploration targets within continental-scale geologically permissive regions; on district- to local-scales, mineral prospectivity analysis seeks to delineate exploration targets in more detail. These imply that the spatial resolution, detail, accuracy, and information content of individual sets of geoscientific data, as well as understanding of mineral deposit occurrence, employed in mineral prospectivity analysis for exploration targeting must increase progressively from regional- to local-scale.
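A toy illustration of the second assumption stated above, using made-up binary evidence layers, is sketched below: cells where more evidence layers coincide receive a higher score and become candidate exploration targets.

```python
# Toy illustration (hypothetical layers): prospectivity as the count of coincident
# binary evidence layers on a common grid.
import numpy as np

rng = np.random.default_rng(1)
shape = (5, 5)

# Three binary evidence layers, e.g. geochemical anomaly, favourable host rock,
# and proximity to faults, all mapped onto the same grid.
evidence_layers = [rng.random(shape) > 0.5 for _ in range(3)]

score = np.sum(evidence_layers, axis=0)      # number of coincident evidence layers
targets = np.argwhere(score == score.max())  # cells with the strongest coincidence

print("coincidence score:\n", score)
print("candidate target cells (row, col):\n", targets)
```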
Conceptual Modeling of Mineral Prospectivity The analysis of mineral prospectivity should relate to mineral deposits of a single class or to related mineral deposits belonging to a single mineral system. Therefore, a map of mineral prospectivity for porphyry Cu deposits is not appropriate for guiding exploration for diamond deposits, or the other way around. At every scale of exploration targeting, however, mineral prospectivity analysis adheres to definite steps commencing with the conceptualization of prospectivity for the mineral deposit class or mineral system of interest (Fig. 2). Theoretical and spatial relationships among various features, which are factors or controls on how and where mineral deposits of a certain class form, comprise the mineral prospectivity conceptual model. This model describes, in words and/or illustrations, the theoretical or hypothetical interplays among different geological processes that determine how and particularly where mineral deposits of the class of interest possibly form. Conceptualization of prospectivity for mineral deposits in an area needs a thorough review of literature on characteristics and processes of formation of mineral deposits of the class of interest, such as described in mineral deposit models (e.g., Cox and Singer 1986). A robust mineral prospectivity conceptual model is one that also adheres particularly to the mineral systems approach to exploration targeting. This approach focuses on three main themes of geological elements (or processes) that are critical to the formation of mineral deposits (Walshe et al. 2005; McCuaig et al. 2010): (a) source of metals; (b) pathways for metal-bearing fluids; and (c) depositional traps. Analysis of the spatial distribution of mineral deposits of the class of interest in an area and analysis of the spatial relationships of such mineral deposits with certain geological features in the same area can provide supplementary knowledge to the conceptualization of prospectivity for mineral deposits (Carranza 2009). The spatial (geophysical, geological, and geochemical) features of known mineral deposits of the class of interest, whether they exist in the area currently being explored or in
Mineral Prospectivity Analysis, Fig. 2 Components of mineral prospectivity analysis. Solid arrows portray input-output workflow. Dashed arrows denote support among or feedback between components
other areas with similar geological setting, comprise the criteria for recognition or modeling of prospectivity. These criteria and the mineral prospectivity conceptual model inform the structure of mineral prospectivity analysis with regard to the appropriate choice of the following:
(a) Sets of geoscientific spatial data to be employed.
(b) Evidential elements to be mapped from each set of geoscientific spatial data.
(c) Techniques for converting each map of evidential elements into a map of a prospectivity recognition criterion.
(d) Techniques for weighting each class or range of values in a map of a prospectivity recognition criterion to generate a predictor map.
(e) Techniques for combining predictor maps to generate a mineral prospectivity map.
Items (b), (c), and (d) pertain to the analysis of mineral prospectivity parameters.
Preparation of Evidence Maps A map that portrays a prospectivity recognition criterion is called an evidential map, or in Geographic Information System parlance, an evidential layer. Techniques for mapping evidential elements expressive of a criterion of mineral prospectivity are explicit to types of geoscientific spatial data (i.e., geophysical, geological, and geochemical). Concepts of mapping of anomalies from geochemical data for mineral exploration are discussed in the chapter for Exploration Geochemistry. Geological elements of specific classes of mineral deposits, such as hydrothermal alterations, are detectable and mappable by remote sensing (see chapters on Hyperspectral Remote Sensing, Mathematical Methods in Remote Sensing Data and Image Analysis, Remote Sensing). Geological elements such as structures or rock units considered as proxies of controls of mineral deposit formation can be extracted from
existing geological maps or mapped from images of suitable geophysical data (e.g., Kearey et al. 2002). Some geoscientific spatial data or some types of mapped evidential elements must be manipulated or transformed in order to portray them as evidential maps. The choice of technique to manipulate or transform specific relevant data, to generate an evidential map, depends on which criterion for recognition of prospectivity is to be represented. For example, representation of fluid pathways as a control on mineral deposit formation first needs extraction of certain structures from a spatial database, then generation of a map of distances to such structures, and finally classification of distances into ranges of proximity. This manipulation or transformation is generally known as buffering (see chapter on Geographic Information System). Similarly, representation of chemical traps as a control of mineral deposit formation may first need appropriate interpolation of metal concentration data at sample locations and then classification of the interpolated metal data into different levels of presence of chemical traps. For concepts of methods of interpolation, see chapters on Interpolation, Inverse Weighted Distance, or Kriging. Classification is also a typical operation in a Geographic Information System. The purpose of manipulation or transformation of a map of evidential elements or image of relevant spatial data to generate an evidential map is to represent the degree of presence of evidential elements per location.
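The buffering operation described above can be sketched as follows (hypothetical grid, cell size, and distance thresholds); the same classify-by-threshold step applies to interpolated geochemical surfaces.

```python
# Minimal sketch: turn mapped structures into a proximity-class evidential map,
# i.e. the "buffering" manipulation described in the text. Values are hypothetical.
import numpy as np
from scipy.ndimage import distance_transform_edt

grid = np.zeros((100, 100), dtype=bool)
grid[50, :] = True          # a mapped fault running across the area
grid[:, 20] = True          # a second structure

# Distance (in cells) from every cell to the nearest structure, scaled to metres.
cell_size = 50.0
distance_m = distance_transform_edt(~grid) * cell_size

# Classify distances into proximity ranges (closer = stronger evidence of a fluid pathway).
bins = [250.0, 500.0, 1000.0]
proximity_class = np.digitize(distance_m, bins)   # 0 = nearest class, 3 = farthest

print(np.unique(proximity_class, return_counts=True))
```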
Generation and Integration of Predictor Maps Techniques for the assignment of weights to each class or range of values in an evidential map, in order to generate a predictor map, involve either data- or knowledge-driven analysis of their spatial relationships with discovered mineral deposits of the class of interest (Carranza 2008). On the one hand, data-driven techniques quantify spatial relationships between a mineral deposit location map and each class or range of values in an evidential map; the quantified degrees of spatial relationship translate into weights for each class or range of values in an evidential map. On the other hand, knowledge-driven techniques make use of expert opinion to assign weights to each class or range of values in an evidential map. Not every technique for data-driven quantification of spatial relationships between a mineral deposit location map and an evidential map can be used for the generation and subsequent combination of predictor maps to obtain a predictive mineral prospectivity map. Data-driven techniques that only quantify spatial relationships between classes or ranges of values in an evidential map are those that are useful for providing supplementary knowledge to the conceptualization of prospectivity for mineral deposits, as mentioned above. In contrast, every technique for knowledge-driven predictor map generation can be used straight away for predictor map
combination to generate a predictive mineral prospectivity map. Therefore, mineral prospectivity analysis is either data- or knowledge-driven. The main distinction between data- and knowledge-driven mineral prospectivity analyses lies in the generation of predictor maps. Data-driven mineral prospectivity analysis uses known mineral deposits to assign (i.e., quantify) weights to each class or range of values in an evidential map. Knowledge-driven mineral prospectivity analysis uses expert opinion to assign (i.e., qualify) weights to each class or range of values in an evidential map. Data-driven mineral prospectivity analysis is suitable in moderately to well-explored geologically permissive areas (also known as “brownfields”) where several mineral deposits of the class of interest exist. Knowledge-driven mineral prospectivity analysis is suitable in less-explored geologically permissive areas (also known as “greenfields”) where no or very few mineral deposits of the class of interest are known to exist. In either case, a predictive mineral prospectivity map is generated by integrating at least two predictor maps (Fig. 1) using a mathematical (e.g., probability, likelihood, favorability, or belief) function that appropriately represents or models the interplays among the geological processes, represented by each predictor map, that lead to mineral deposit occurrence or formation, thus: predictive mineral prospectivity map = f(predictor maps). The mathematical function f used typically in mineral prospectivity analysis conveys empirical interactions among predictor maps (as proxies of factors or controls) of mineral deposit occurrence. On the one hand, knowledge-driven techniques of mineral prospectivity analysis commonly make use of logical functions (e.g., AND and/or OR operators; see Fuzzy Set Theory) for successive combination of predictor maps guided by an inference system (Fig. 3). On the other hand, data-driven techniques of mineral prospectivity analysis commonly make use of mathematical functions for instantaneous combination of predictor maps irrespective of knowledge on the interplays of geological factors or controls of mineral deposit occurrence. A few data-driven techniques use functions signifying logical operations (e.g., Dempster’s rule of integrating belief functions; see Carranza et al. 2008) for successive combination of predictor maps guided by an inference system. Similarly, a few knowledge-driven techniques use mathematical (mainly arithmetic or logical) functions for instantaneous combination of predictor maps. For mineral prospectivity analysis, the most commonly used data-driven technique is weights-of-evidence analysis, and the most commonly used knowledge-driven technique involves the application of fuzzy set theory (for details, see Bonham-Carter 1994). Most published research on mineral prospectivity analysis can be found in the following journals: Natural Resources Research, Ore Geology Reviews, Mathematical Geosciences, and Computers & Geosciences.
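A simplified sketch of the data-driven case is given below: weights of evidence for binary predictor maps, combined additively in log-odds under an assumed conditional independence. The data are synthetic, and the variance estimates and conditional-independence tests of the full weights-of-evidence method (Bonham-Carter 1994) are omitted.

```python
# Simplified weights-of-evidence sketch on synthetic data (not a full implementation).
import numpy as np

rng = np.random.default_rng(2)
n_cells = 10_000
deposits = rng.random(n_cells) < 0.01                    # known deposit cells
# Two binary evidential maps, loosely associated with the deposits.
evidence = [
    (rng.random(n_cells) < 0.2) | (deposits & (rng.random(n_cells) < 0.7)),
    (rng.random(n_cells) < 0.3) | (deposits & (rng.random(n_cells) < 0.5)),
]

def weights(e: np.ndarray, d: np.ndarray) -> tuple[float, float]:
    """W+ and W- for one binary evidential map e against deposit cells d."""
    w_plus = np.log((e & d).sum() / d.sum()) - np.log((e & ~d).sum() / (~d).sum())
    w_minus = np.log((~e & d).sum() / d.sum()) - np.log((~e & ~d).sum() / (~d).sum())
    return w_plus, w_minus

prior_logit = np.log(deposits.mean() / (1 - deposits.mean()))
posterior_logit = np.full(n_cells, prior_logit)
for e in evidence:
    w_plus, w_minus = weights(e, deposits)
    posterior_logit += np.where(e, w_plus, w_minus)      # add W+ where present, W- where absent

posterior = 1 / (1 + np.exp(-posterior_logit))           # relative prospectivity per cell
print("weights of first map (W+, W-):", weights(evidence[0], deposits))
print("top 1% prospectivity threshold:", np.quantile(posterior, 0.99))
```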
Mineral Prospectivity Analysis, Fig. 3 Example of inference system using suitable logical functions (e.g., AND and/or OR operators, etc.) for successive combination of predictor maps through knowledge-driven mineral prospectivity analysis. An inference system portrays an analyst’s understanding or an expert’s opinion of the interplays of geological factors or controls of mineral deposit occurrence
Evaluation of Results of Mineral Prospectivity Analysis Every technique of mineral prospectivity analysis produces systemic (or procedure-related) errors vis-à-vis the interplays of geological factors or controls of mineral deposit occurrence. Moreover, each set of geoscientific spatial data or each evidential map employed in mineral prospectivity analysis usually carries parametric (or data-related) errors vis-à-vis the spatial distribution of mineral deposits. Because these types of errors are unavoidable in mineral prospectivity analysis and because they spread from input data to evidential map to predictor map to final output mineral prospectivity map, it is necessary to employ strategies for predictive map evaluation and, if needed, to reduce error in each step of mineral prospectivity analysis. The ideal strategy to evaluate the predictive capacity of a mineral prospectivity map is, of course, to use it as basis for guiding mineral exploration and hope that it leads to mineral deposit discovery. However, this is not a practical strategy because it usually takes several years before a mineral deposit is discovered through grassroots exploration. A practical way to evaluate results of mineral prospectivity analysis is to answer the following question. For at least two predictive mineral prospectivity maps (say, generated by different techniques or generated by the same techniques but using different parameters), which map (a) delineates the smallest proportion of prospective areas but (b) contains the largest proportion of existing mineral deposits of the class of interest? To answer and visualize the answer to this question, one must create a so-called area-occurrence plot per mineral prospectivity map (Fig. 4) (Agterberg and Bonham-Carter
2005). This question is relevant to results of either data- or knowledge-driven mineral prospectivity analysis. For knowledge-driven mineral prospectivity analysis, the area-occurrence plot is appropriately called a prediction-rate plot because mineral deposits are not used to create predictor maps (i.e., assign weights to each class or range of values in an evidential map. Therefore, according to Fig. 4, the map based on mineral prospectivity model 1 has greater predictive capacity than the map based on mineral prospectivity model 2. For example, considering predicted prospective areas occupying 10% of a region, the predictive capacity of mineral prospectivity model 1 is about 75% whereas that of mineral prospectivity model 2 is just about 45%. For data-driven mineral prospectivity analysis in which all existing mineral deposits are used to create predictor maps, the area-occurrence plot is appropriately a success-rate plot. That is, the success-rate plot portrays only the goodness-of-fit between evidential maps and the map of mineral deposits locations used for calculating the weights for each class or range of values in each evidential map. Therefore, according to Fig. 4, the map based on mineral prospectivity model 1 has greater goodness-of-fit with the mineral deposits based on which it was created compared to the map based on mineral prospectivity model 2. To determine the predictive capacity of a data-driven mineral prospectivity model, one must answer the following question: Assume that, for a study area, we split the existing mineral deposits of the class of interest into two sets. The first set, called the training set, is used in mineral prospectivity analysis. What proportion of the second set, called the validation set (i.e., mineral deposits presumed to be undiscovered), coincides spatially with high prospectivity
Mineral Prospectivity Analysis, Fig. 4 Area-occurrence plots for evaluation of mineral prospectivity models. The procedure for creating such plots involves a series of delineation of prospective areas in a mineral prospectivity map using decreasing threshold values at narrow, say 5-percentile, intervals from the highest to the lowest prospectivity values, which yield, respectively, the lowest and the highest proportions of prospective areas. Then, the next step is to determine the cumulative increasing proportions of existing mineral deposits contained in
respective cumulative increasing prospective areas defined by decreasing prospectivity values. Finally, proportions of contained mineral deposits are plotted on the y-axis, and proportions of prospective areas are plotted on the x-axis. The gauge line represents a random spatial relationship between mineral deposits and prospectivity values in a mineral prospectivity map. For a mineral prospectivity map, the higher above its occurrence-area plot is relative to the gauge line, the stronger is its predictive capacity
values derived from the first set? This question is the principle of blind testing of data-driven mineral prospectivity analysis (cf. Fabbri and Chung 2008). It applies to evaluation of mineral prospectivity analyses using one technique or different techniques. Similarly, to answer and visualize the answer to this question, one must create a so-called area-occurrence plot per mineral prospectivity map. Moreover, the role of the first and second sets of mineral deposits can be switched to compare the predictive capacities of two mineral prospectivity models using the same technique of mineral prospectivity analysis. In this case, the area-occurrence plot is appropriately called a prediction-rate plot; therefore, according to Fig. 4, the map based on mineral prospectivity model 1 has greater predictive capacity than the map based on mineral prospectivity model 2.
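The construction of such a plot can be sketched as follows (synthetic prospectivity values and deposit locations): cells are ranked by decreasing prospectivity, and the cumulative proportion of deposits captured is tracked against the cumulative proportion of area delineated.

```python
# Minimal area-occurrence (prediction-rate) curve on synthetic values.
import numpy as np

rng = np.random.default_rng(3)
n_cells = 10_000
prospectivity = rng.random(n_cells)                    # one prospectivity value per cell
deposit_cells = rng.choice(n_cells, size=40, replace=False)

is_deposit = np.zeros(n_cells, dtype=bool)
is_deposit[deposit_cells] = True

order = np.argsort(prospectivity)[::-1]                # most prospective cells first
area_fraction = np.arange(1, n_cells + 1) / n_cells
deposit_fraction = np.cumsum(is_deposit[order]) / is_deposit.sum()

# e.g. proportion of deposits captured by the most prospective 10% of the area
idx = int(0.10 * n_cells) - 1
print(f"10% of area captures {deposit_fraction[idx]:.0%} of the deposits")
```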
spatial information for management of mineral exploration. The routines in mineral prospectivity analysis can be tedious and complex, but nowadays these are facilitated with use of a Geographic Information System. However, mineral prospectivity analysis is a typical “garbage in, garbage out” activity because, as shown in Fig. 1, if the mineral prospectivity conceptual model is wrong, then the output mineral prospectivity map is wrong. Therefore, knowledge of how and where mineral deposits form is the most critical element of mineral prospectivity analysis. It is judicious to evaluate performance of mineral prospectivity maps before they are used to guide mineral exploration.
Conclusions A predictive mineral prospectivity map is generally an empirical model, portraying the chance of finding undiscovered mineral deposits of the class of interest. This is a critical
Cross-References ▶ Exploration Geochemistry ▶ Favorability Analysis ▶ Fuzzy Set Theory in Geosciences ▶ Hyperspectral Remote Sensing ▶ Interpolation ▶ Inverse Distance Weight
▶ Kriging ▶ Object-Based Image Analysis ▶ Remote Sensing ▶ Spatial Analysis
Bibliography Agterberg FP, Bonham-Carter GF (2005) Measuring the performance of mineral-potential maps. Nat Resour Res 14(1):1–17 Bonham-Carter GF (1994) Geographic Information Systems for Geoscientists: Modelling with GIS. Pergamon, Oxford Carranza EJM (2008) Geochemical anomaly and mineral prospectivity mapping in GIS. Handbook of exploration and environmental geochemistry, vol 11. Elsevier, Amsterdam Carranza EJM (2009) Controls on mineral deposit occurrence inferred from analysis of their spatial pattern and spatial association with geological features. Ore Geol Rev 35:383–400 Carranza EJM, van Ruitenbeek FJA, Hecker C, van der Meijde M, van der Meer FD (2008) Knowledge-guided data-driven evidential belief modeling of mineral prospectivity in Cabo de Gata, SE Spain. Int J Appl Earth Obs Geoinf 10(3):374–387 Cox DP, Singer DA (eds) (1986) Mineral deposit models. U.S. geological survey bulletin 1693. United States Government Printing Office, Washington, DC Fabbri AG, Chung CJ (2008) On blind tests and spatial prediction models. Nat Resour Res 17(2):107–118 Kearey P, Brooks M, Hill I (2002) An introduction to geophysical exploration, 3rd edn. Blackwell Scientific Publications, Oxford McCuaig TC, Beresford S, Hronsky J (2010) Translating the mineral systems approach into an effective exploration targeting system. Ore Geol Rev 38(3):128–138 Walshe JL, Cooke DR, Neumayr P (2005) Five questions for fun and profit: a mineral systems perspective on metallogenic epochs, provinces and magmatic hydrothermal Cu and Au deposits. In: Mao J, Bierlein FP (eds) Mineral deposit research: Meeting the global challenge, vol 1. Springer, Heidelberg, pp 477–480
Minimum Entropy Deconvolution Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Definition In reflection seismology, the reflected seismic waves are used to study the geophysical characteristics of the earth subsurface. Thus, in a seismogram model, deconvolution is routinely used to extract the earth reflectivity signal from a noisy seismic trace. The conventional method is used to whiten the spectra of the earth reflectivity signal, i.e., to assume that the signal has a constant value as the Fourier
transform. This, however, reconstructs only a part of the signal, as the assumption also implies that the signal is a Dirac delta function δ, i.e., a spiking function. To adapt this method to the practical characteristics of the signal, which is noisy, band-limited, etc., minimum entropy deconvolution (MED) is used; it finds the smallest number of large spikes that are consistent with the data by minimizing Wiggins’ entropy measure (Wiggins 1978). MED was proposed by Wiggins in 1978 and was inspired by factor analysis techniques.
Overview The measured seismic trace $x_t$ is a convolution of the earth reflectivity $y_t$ with a wavelet function $w_t$. To obtain $y_t$, a straightforward approach is to perform a deconvolution, say using a linear scheme. In practice, however, the signal $x_t$ is contaminated with noise $n_t$, so the normal incidence seismogram model is given by $x_t = w_t * y_t + n_t$. A filter $f_t$ which upon convolution with the wavelet function gives the Dirac delta function $\delta_t$ is desired, i.e., $f_t * w_t = \delta_t$. However, the estimated filter $\hat{f}_t$ gives an averaging function $a_t$, i.e., $\hat{f}_t * w_t = a_t$. The estimated reflectivity signal is therefore $\hat{y}_t = a_t * y_t + \hat{f}_t * n_t = y_t + (a_t - \delta_t) * y_t + \hat{f}_t * n_t$. The desirable solution is one in which $a_t$ is close to $\delta_t$ and the noise contribution is relatively low. Given the band-pass nature of seismic signals, only a part of the reflectivity signal is retrieved, i.e., the band-limited signal after removing the wavelet, $\bar{y}_t = y_t * a_t$; $\bar{y}_t$ and $y_t$ are referred to as the band-limited and the full-band reflectivity signals, respectively. Estimating $y_t$ from $\bar{y}_t$ is known to be a non-unique linear inverse problem. The Fourier transform gives $\bar{Y}_\omega = A_\omega Y_\omega$ for frequency $\omega$, and there are infinitely many solutions for $y_t$ that satisfy this equation, especially at frequencies where $A_\omega = 0$. MED has been proposed as a way to reduce the non-uniqueness of the solution (Wiggins 1978) by finding those solutions which provide “minimum structure” in the reflectivity signal. The “minimum structure” is represented by “minimum entropy,” where entropy is proportional to the energy dispersal of the seismic signal; entropy is also inversely proportional to the normalized variance $V$. The difference between the traditional spiking deconvolution, i.e., predictive deconvolution, and MED is that the former whitens the signal, thus giving minimum normalized variance, whereas MED identifies isolated spikes, thus giving maximum normalized variance (Wiggins 1978).
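A small numerical illustration of the convolutional model and of the normalized variance that MED maximizes is sketched below (synthetic reflectivity and wavelet, not the published algorithm): a sparse, spiky reflectivity has a much larger normalized variance than the band-limited trace obtained from it.

```python
# Synthetic x_t = w_t * y_t + n_t and Wiggins-style varimax measure (illustrative only).
import numpy as np

rng = np.random.default_rng(4)

def varimax(y: np.ndarray) -> float:
    """Normalized variance (varimax) measure: sum(y^4) / (sum(y^2))^2, up to scaling."""
    return np.sum(y**4) / np.sum(y**2) ** 2

n = 200
reflectivity = np.zeros(n)
reflectivity[[30, 90, 150]] = [1.0, -0.7, 0.5]                    # sparse spikes
wavelet = np.exp(-0.5 * (np.arange(-10, 11) / 3.0) ** 2)          # smooth band-limiting wavelet
trace = np.convolve(reflectivity, wavelet, mode="same") + 0.02 * rng.standard_normal(n)

print("varimax of spiky reflectivity:", varimax(reflectivity))
print("varimax of band-limited trace:", varimax(trace))
```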
Inspired by factor analysis techniques, the entropy term is given in terms of V of a vector y of reflection coefficients of length N, with amplitude-normalized reflectivity measure q0i , and a monotonically increasing function F q0i :
$V(y) = \frac{1}{N\,F(N)} \sum_{i=1}^{N} q_i'\, F(q_i')$, where $q_i' = y_i^4 \Big/ \Big(\frac{1}{N}\sum_k y_k^2\Big)^2$.
In one enhancement of MED that retains the varimax norm, the iterative solution is implemented in the frequency domain by minimizing an objective function given by the sum of the amplitudes of the signal values, using a linear programming (LP) formulation (Sacchi et al. 1994). MED in the frequency domain, referred to as FMED, is an effective method for processing band-limited data; FMED additionally improves the estimation of higher frequencies compared to the original MED.
f l xnl . Maximizing V implies @V@fðyÞ ¼ 0. k
This optimization outputs a vector b, with bi ¼ Gðqi Þyi
1 N
j Gðqi Þqj
1
. The final solution is solved using
matrix system equation of f R ¼ g( f ), where R is the Toeplitz matrix of the input data, and the vector g(f) is the cross-correlation between b and x. Solving for the fixed length filter is achieved by using an iterative method, given by f(n) ¼ R1 g(f(n1)) (Sacchi et al. 1994). In the original algorithm (Wiggins 1978), the reduction is R f ¼ g, where the matrix R is referred to as an autocorrelation matrix, that is a weighted sum of the autocorrelations of the input signals, and g is the weighted sum of the cross-correlations of the filter outputs cubed with the input signals. Each iteration involves solving the system using Levinson’s algorithm and reproducing the features of the reflectivity signal. If the filter length L is chosen appropriately, the system maximizes V (Wiggins 1978). A few significant improvements to the original MED method (Wiggins 1978) include the use of D norm instead of the varimax norm (Cabrelli 1984) and implementing the iterative solution in the frequency domain instead of the signal domain (Sacchi et al. 1994). The D norm is defined as DðyÞ ¼ max kjyykkj , based on its equivalence to the geomet1kN
rical interpretation of the varimax norm (Cabrelli 1984). Using MED with D norm, referred to as MEDD, has improved the signal-to-noise ratio of the outputs, and stability
Applications MED has been popularly used for seismic signal analysis in the 1980s and 1990s (Nose-Filho et al. 2018), after which the use of MED has increasingly shifted to applications outside of geophysics, e.g., speech signals (González et al. 1995) and early fault detection (EFD) in rotating machinery (Wei et al. 2019). In applications for seismic deconvolution, the predictive convolution and MED filters work with the single-input, single-output (SISO) systems, whereas the requirements have shifted to single-input, multiple-output (SIMO) systems (Nose-Filho et al. 2018). This shift has been required owing to the use of the common shot gather (CSG), which involves multiple receivers recording different shots from a single shot of the seismic wavelet. Thus, the analysis of SIMO seismic systems has paved the way to multichannel blind deconvolution (MBD) methods. Inspired by MED, sparse MBD (SMBD) accounts for sparsity in reflectivity signals. As an example of extending MED to other forms of signals, it has been effectively used for period estimation of seismic signals, including natural, artificial, periodic, and quasiperiodic ECG and speech signals (González et al. 1995). MED has been effective and robust for individual period estimation. Gear failure diagnosis involves analysis of vibration signals, where the quality of MED to identify signals with maximum kurtosis can be effectively used to deconvolve the impulsive sources in gear faults (Endo and Randall 2007). The existing autoregressive (AR) prediction method is used for detecting localized faults in gears, but owing to the use of autocorrelations, AR method is not sensitive to phase relationships. These relationships are necessary for differentiating noise from gear impulses. Since MED uses phase information in the form of kurtosis, MED in combination with AR filtering helps in detecting spalls and tooth fillet cracks in gears. This is an example of how MED is effectively introduced in a workflow of an application, entirely different from a seismic one. MED has now been used in several such applications of combining with similar methods for EFD in rotating machinery (Wei et al. 2019). Analogous to the shift from SISO to SIMO systems in seismology, gear failure diagnosis uses
M
870
MED for solving the fault diagnosis for a single impulse as well as for an infinite impulse train (McDonald and Zhao 2017). Multi D norm, originally used in MEDD, has been used with an adjustment for EFD to generate non-iterative filters to be used on an infinite impulse train. This method is referred to as multipoint optimal MED adjusted (MOMEDA) method. MOMEDA has been further improved for complex working conditions by including product functions (Cai and Wang 2018). Thus, these examples demonstrate the potential of MED in providing stepping stones for improving analysis with increasing complexity in the problem. Another example of a combined method is where variational mode decomposition, L-Kurtosis, and MED are used in sequence for detecting mechanical faults, which have periodic impulses in the vibration signal (Liu and Xiang 2019).
Minimum Maximum Autocorrelation Factors
Minimum Maximum Autocorrelation Factors U. Mueller School of Science, Edith Cowan University, Joondalup, WA, Australia
Definition Let Z(u) 5 [Z1(u), Z2(u), . . ., ZK(u)] be a K-dimensional random field on some region R in two- or three-dimensional space with covariance matrix S0 and set of experimental semivariogram matrices {Γ(hi) : hi 5 ih0,h0 ℝ2 (ℝ3), i 5 1,2, . . .,l} where h0 ℝ2 (ℝ3) is fixed. The minimum/ maximum autocorrelation factors Y of X are given by
Future Scope 1
In the recent past, MED has been effectively used in early fault diagnosis in mechanical systems. MED thus has the potential to work on applications involving impulse signals. Starting with seismological applications, MED has proved its mettle in identifying signals with minimum entropy or minimum structure since its inception (Wiggins 1978).
Bibliography Cabrelli CA (1984) Minimum entropy deconvolution and simplicity: a noniterative algorithm. Geophysics 50(3):394–413 Cai W, Wang Z (2018) Application of an improved multipoint optimal minimum entropy deconvolution adjusted for gearbox composite fault diagnosis. Sensors 18(9):2861 Endo H, Randall R (2007) Enhancement of autoregressive model based gear tooth fault detection technique by the use of minimum entropy deconvolution filter. Mech Syst Signal Process 21(2):906–919 González G, Badra R, Medina R, Regidor J (1995) Period estimation using minimum entropy deconvolution (MED). Signal Process 41(1):91–100 Liu H, Xiang J (2019) A strategy using variational mode decomposition, L-kurtosis and minimum entropy deconvolution to detect mechanical faults. IEEE Access 7:70564–70573 McDonald GL, Zhao Q (2017) Multipoint optimal minimum entropy deconvolution and convolution fix: application to vibration fault detection. Mech Syst Signal Process 82:461–477 Nose-Filho K, Takahata AK, Lopes R, Romano JMT (2018) Improving sparse multichannel blind deconvolution with correlated seismic data: foundations and further results. IEEE Signal Process Mag 35(2):41–50 Sacchi MD, Velis DR, Cominguez AH (1994) Minimum entropy deconvolution with frequency-domain constraints. Geophysics 59(6):938–945 Wei Y, Li Y, Xu M, Huang W (2019) A review of early fault diagnosis approaches and their applications in rotating machinery. Entropy 21(4):409 Wiggins RA (1978) Minimum entropy deconvolution. Geoexploration 16(1–2):21–35
Y ðuÞ5Z ðuÞS0 2 Q where Q is the matrix of eigenvectors of the covariance 1
1
matrix I S0 2 Gðh0 ÞS0 2 corresponding to h0 arranged so that the corresponding eigenvalues are decreasing in magnitude.
Introduction Principal component analysis is a commonly used method of extracting the most relevant combinations from multivariate data with a view to maximize the explained variance (Davis 2002). However, spatial autocorrelation and spatial crosscorrelation present in the data are ignored, and noise in the data stemming from spatial autocorrelation is also not captured. The method of minimum/maximum autocorrelation factors (MAFs) was first introduced by Switzer and Green (1984) as an approach to separate noise and signal in remote sensing data. MAF makes use of global spatial statistics, most notably the experimental covariance matrix and an experimental semivariogram matrix to characterize the short-scale variability of the data. Just as principal components, the MAF factors are linear combinations of the original variables. The first MAF contains maximum autocorrelation between neighboring observations. A higher-order MAF contains maximum autocorrelation subject to the constraint that it is orthogonal to lower-order MAFs and so the factors form a family of factors of decreasing autocorrelation. MAF analysis thus provides a more satisfactory method of orthogonalizing spatial data than PCA. The method has application whenever dimension reduction becomes an issue in geospatial data, and retaining the spatial autocorrelation is desired. MAF has been seen to be more effective than PCA when combined with
Minimum Maximum Autocorrelation Factors
classification techniques such as K-NN (Drummond et al. 2010) or linear discriminant analysis (Mueller and Grunsky 2016; Grunsky et al. 2017). For example, Grunsky et al. (2017) showed that it was possible to use the first six MAF factors of the geochemical composition of the Australian surface regolith, followed by LDA to map the underlying crustal architecture, despite secondary modifications due to transport and weathering effects. It should be noted that this represents reduction in the number of variables to 6 from a total of 50 geochemical variables compared to 8 in the case of PCA and 10 if raw variables were used. Apart from dimension reduction, the most common use in the geosciences lies in the geostatistical estimation and simulation of multivariate data. The MAF method was first applied by Desbarats and Dimitrakopoulos (2000) to simulate pore size distributions. Since this application, MAF has become one of the standard tools for spatial decorrelation as a preprocessing step for geostatistical simulation. The incorporation of MAF was extended to block simulation by Boucher and Dimitrakopoulos (2012), further highlighting the usefulness of the approach for large-scale mining operations. MAF was also used for factorial kriging of geochemical data from Greenland (Nielsen et al. 2000). The authors noted that a combination of maps of MAFs 1, 2, and 3 in a color ternary image resulted in good discrimination of the major provinces of South Greenland highlighting the potential of MAF analysis to identify geochemical boundaries in a region. There are two approaches which can be used to derive MAFs, data driven or model driven (Rondon 2012). The data-driven approach is based on the sample covariance matrix and a semivariogram matrix at a suitably chosen lag spacing; in contrast, the model-driven approach is based on a linear model of coregionalization (LMC) with two model structures, and the coefficient matrices of the model are used to derive the factors. The model-based approach has been investigated extensively to determine whether extensions to more than two model structures could be used to derive factors; however, an extension to more complex LMCs is not possible (Vargas-Guzman and Dimitrakopoulos 2003; Bandarian et al. 2008), which is a consequence of the noncommutativity of square matrices. One exception is the case when the LMC consists of multiple structures whose coefficient matrices are such that all but one form a commuting family in which case a model-based approach will lead to an LMC that is diagonal (Tran et al. 2006).
Calculation of MAFs A typical workflow for the application of data-driven MAF is as follows:
1. Compute the variance-covariance matrix S0 of the centered data and determine its principal component decomposition S0 ¼ PLP T with the entries in Λ arranged in decreasing order, and the matrix P is the corresponding orthogonal matrix of eigenvectors. 2. Compute 1=2 zðui ÞS0
the
PCA
scores
zðui Þ ¼ zðui ÞPL1=2 ¼
. At this stage, the new variables have
variance 1. 3. Compute the semivariogram matrix Gðh0 Þ ¼ L1=2 P T Gðh0 Þ PL1=2 for z at the chosen lag h0, and determine its spectral decomposition again ordering the eigenvalues in descending order: Gðh0 Þ ¼ QL1 QT 4. Put yðui Þ ¼ zðui ÞQ. Thus the overall transformation is y(ui) ¼ z(ui)A with A5PL1=2 Q: The matrix A diagonalizes S0 and Γ(h0) simultaneously by congruence: AT S0 A5QT L1=2 P T S0 PL1=2 Q5I and AT Gðh0 ÞA5QT L1=2 P T Gðh0 ÞPL1=2 Q5QT Gðh0 ÞQ5L1 : In the model-driven approach, the variables are first standardized or converted to normal scores. Experimental direct and cross semivariograms are computed based on a suitably chosen lag spacing as represented by the vector h0. This step is followed by fitting an LMC composed of two structures to the experimental semivariograms: GðhÞ ¼ B1 g1 ðhÞ þ B2 g2 ðhÞ Here g1 and g2 are allowed covariance model functions, and B1 and B2 are positive semi-definite coefficient matrices. Their sum is assumed to represent the covariance matrix of the data. It is further assumed that the range of the first covariance model function is less than that of the second; in many cases, the first model is the nugget covariance. The sum of the matrices B ¼ B1 þ B2 is treated as the variancecovariance matrix of the data, and the MAF transformation is determined from B and B1 using steps 1 and 3 from the procedure above. The construction of the MAF transformation can also be seen as the solution of the generalized eigenvalue problem (Bandarian and Mueller 2008).
Irrespective of whether the data-driven or model-driven approach is adopted, the factors are ordered in decreasing continuity, in contrast to PCA, where spatial behavior is ignored. The matrix $Q$ derived in step 3 can be interpreted as containing the correlations between the PCA factors derived in step 1 and the MAF factors (Rondon 2012). The matrix $Q$ is an orthogonal matrix and thus a composition of rotations and reflections; as a consequence, PCA and MAF factors are rarely equivalent, and the proportion of variance of the original data explained via the MAF factors differs from that explained via the PCA factors. It can also be shown that the MAF factors and the raw data are related by $E\big(Y(u)^T Z(u)\big) = E\big(A^T Z(u)^T Z(u)\big) = A^T S_0 = Q^T \Lambda^{-1/2} P^T P \Lambda P^T = Q^T \Lambda^{1/2} P^T = A^{-1}$. Thus the inverse of $A$, which is required for back-transformation in the case of simulation or estimation, does not need to be computed through matrix inversion, but simply through matrix multiplication (Rondon 2012). In the case when the data are normalized, the expectation $E(Y(u)^T Z(u))$ is nothing other than the correlation matrix between the MAF factors and the raw data, which is analogous to the interpretation in the case of PCA. A guided application is provided in Rondon (2012), where the normal scores of cadmium (Cd), cobalt (Co), chromium (Cr), and nickel (Ni) metal concentrations (ppm) from the Jura data set (Atteia et al. 1994) are spatially decorrelated with the data-driven and model-driven approaches. The author provides a detailed comparison of the outputs from a simulation study, which shows that overall there is little difference in the performance of the two approaches. Since the data-driven approach does not make any assumptions about the model of continuity, and so is much less restrictive than the model-driven approach, it is the preferred method in geostatistical studies as well as exploratory work.
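A compact sketch of the data-driven workflow on synthetic data is given below. The ordering of the second eigen-decomposition is expressed here in terms of increasing semivariance (most spatially continuous factor first); conventions differ between implementations, so this is illustrative only.

```python
# Data-driven MAF sketch on a synthetic multivariate profile (illustrative, not library code).
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data: K = 3 correlated, spatially smoothed variables along a 1D profile.
n, K = 500, 3
raw = rng.standard_normal((n + 10, K)) @ rng.standard_normal((K, K))
Z = np.stack([np.convolve(raw[:, k], np.ones(10) / 10, mode="valid")[:n] for k in range(K)], axis=1)
Z = Z - Z.mean(axis=0)

# Step 1: covariance matrix and its spectral decomposition (eigenvalues descending).
S0 = np.cov(Z, rowvar=False)
lam, P = np.linalg.eigh(S0)
order = np.argsort(lam)[::-1]
lam, P = lam[order], P[:, order]

# Step 2: PCA scores scaled to unit variance.
Zstar = Z @ P @ np.diag(lam ** -0.5)

# Step 3: semivariogram matrix of the scores at lag h0 = 1 sample, and its eigenvectors.
D = Zstar[1:] - Zstar[:-1]
Gamma = 0.5 * (D.T @ D) / D.shape[0]
mu, Q = np.linalg.eigh(Gamma)
Q = Q[:, np.argsort(mu)]        # smallest semivariance = most continuous factor first

# Step 4: MAF scores and the overall transformation matrix A.
Y = Zstar @ Q
A = P @ np.diag(lam ** -0.5) @ Q
print("MAF factor variances (close to 1):", np.round(Y.var(axis=0), 2))
```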
MAF Biplots As is the case for PCA, we may use the MAF scores and loadings to construct form biplots based on selected MAF factors. The scores are given by $Y(u) = Z(u)\,A$, and the matrix $A$ is the corresponding loading matrix. The columns of $Y$ are the principal coordinates, represented as dots, and the rows of the inverse loading matrix are the analogues of the standard coordinates (represented as arrows).
MAF Analysis for Geochemical Surveys The analysis of geochemical survey data is one specific area of application for MAF. The motivation is that typically many variables need to be analyzed jointly, and for process discovery or hypothesis building, exploratory multivariate statistical methods, including PCA, are applied. However, the application of PCA ignores any spatial dependence among the variables, which might make MAF more appropriate. When analyzing geochemical survey data, attention needs to be paid to their compositional nature, and this in turn requires preprocessing of the data through an appropriate log-ratio transformation, such as an additive (alr), a centered (clr), or an isometric (ilr) log-ratio transformation. The most common choice is the clr-transformation. In this case, the covariance matrix $S_{clr}$ of the transformed data is singular, and the PCA of $S_{clr}$ can be written as $S_{clr} = [U_H, D^{-1/2}\mathbf{1}_D]\, \Lambda\, [U_H, D^{-1/2}\mathbf{1}_D]^T$, where $D$ denotes the number of clr-variables. The matrix $U = [U_H, D^{-1/2}\mathbf{1}_D]$ is orthogonal and contains the eigenvectors of $S_{clr}$. The matrix $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_{D-1}, 0)$ is the diagonal matrix of eigenvalues arranged in descending order, with the last eigenvalue equal to 0 corresponding to the eigenvector $u_D = D^{-1/2}\mathbf{1}_D$, and the columns of $U_H$ span the subspace $H$ orthogonal to $u_D$. To complete the MAF transformation, the second decomposition is based on a suitably chosen semivariogram matrix from the set $\Gamma_H(h_\ell) = U_H^T\, \Gamma_{clr}(h_\ell)\, U_H$, $\ell = 1, \ldots, L$, of positive definite $(D-1) \times (D-1)$ matrices (Mueller et al. 2020).
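A short sketch of the clr preprocessing step on a made-up four-part composition is given below; it also shows the singularity of the clr covariance matrix that motivates working in the subspace orthogonal to $u_D$.

```python
# Centred log-ratio (clr) transform of a hypothetical composition, prior to MAF.
import numpy as np

rng = np.random.default_rng(6)
parts = rng.dirichlet(alpha=[4, 3, 2, 1], size=200)      # 200 compositions summing to 1

clr = np.log(parts) - np.log(parts).mean(axis=1, keepdims=True)
S_clr = np.cov(clr, rowvar=False)

eigenvalues = np.linalg.eigvalsh(S_clr)
print("clr rows sum to ~0:", np.allclose(clr.sum(axis=1), 0.0))
print("smallest eigenvalue of S_clr (close to 0):", eigenvalues.min())
```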
Application The Tellus geochemical survey was conducted between 2004 and 2007 and covers the region of Northern Ireland, UK (GSNI 2007; Young and Donald 2013). It consists of 6862 rural soil samples (X-ray fluorescence (XRF) analyses), collected at 20-cm depth, with average spatial coverage of 1 sample site every 2 km2. Each soil sample site was assigned one of six broadly defined lithological classes (acid volcanics, felsic magmatics, basic volcanics, mafic magmatics, carbonatic, silicic clastics) as described in Tolosana-Delgado and McKinley (2016) and Mueller et al. (2020). A map of the sample locations colored by lithology is shown in Fig. 1. Nine of the 50 geochemical variables available were chosen for analysis. They represent the bulk of the variability and make up the subcomposition Al, Fe, K, Mg, Mn, Na, P, Si, and Ti. The subcomposition was closed through the addition of a rest variable, and experimental direct and cross variograms of the clr data were computed for 30 lags at a nominal spacing of 1 km. The MAF transform was based on an estimate of the
Minimum Maximum Autocorrelation Factors, Fig. 1 Map of Northern Ireland covered by broad lithological classes
Minimum Maximum Autocorrelation Factors, Fig. 2 Experimental semivariograms of the direct MAF semivariograms
Minimum Maximum Autocorrelation Factors, Fig. 3 Biplots of the first three MAFs; scores are color coded using the classification in Fig. 1
covariance matrix Sclr and the semivariogram matrix for the first lag ΓH(h1). The direct variograms (Fig. 2) of the resulting nine factors show clear destructuration with increasing factor index, with an increase in the nugget-to-sill ratio from about 0.1 for the first to 0.95 for the ninth factor and a decrease in range from 90 km to 15 km. Already for the fifth factor, the contribution of the nugget is at about 60% for the overall variability. Biplots for MAF show a ternary system of mafics, felsics, and siliciclastic material (Fig. 3). The biplots of the first and second and first and third MAFs show a separation of
Minimum Maximum Autocorrelation Factors, Fig. 4 Scree plot for factorization based on MAF and cumulative variance explained
lithologies, which is no longer the case for the biplot of second and third MAF. The MAF biplot 1–2 indicates collinearity of Mn, Fe, and Ca, which is also still apparent in the MAF1–3 biplot; in contrast, there is no evidence of separation in the biplot contrasting MAF2 and MAF3. The scree plot in Fig. 4 shows that the first factor explains approximately 36% of total variability, but in contrast to a PCA scree plot, the function showing the eigenvalues as a function of the MAF-index is not a decreasing function of the index. When compared with the standard PCA, the MAF clearly prioritizes spatial continuity (see Mueller et al. 2020). The spatial maps of the MAF factors (Fig. 5) identify the lithologies reasonably well. The scores of the first two factors (MAF1 and MAF2) can be interpreted as presenting a broad balance between mafic elements and felsic elements. This feature is related to the contrast of the Paleocene Antrim basalts in the north west with the older rocks across the remainder of Northern Ireland. The intrusive igneous granite and granodiorite are also highlighted (MAF1, MAF2, and MAF3). Carbonates and clastics of different ages are differentiated by the fourth and fifth scores (MAF4 and MAF5). The scores of the sixth, seventh, and eight factors are less clear. The maps also highlight the decrease in spatial continuity of the factors within particular MAFs 1 and 2 having broader spatial features than the remaining MAFs.
Minimum Maximum Autocorrelation Factors, Fig. 5 Maps of MAF factors; top, 1 to 3; center, 4 to 6; bottom, 7–9 showing the decrease in spatial continuity; blue and yellow colors indicate low and high values, respectively
Summary Decomposition of multivariate spatial data via MAF provides a powerful means to analyze spatial features of data and to characterize spatial scale. The resulting factors allow for univariate instead of multivariate geostatistical simulation and so provide more freedom in the choice of spatial covariance models than what is readily available in the multivariate case.
Cross-References ▶ Compositional Data ▶ Geostatistics ▶ Multivariate Analysis ▶ Principal Component Analysis ▶ Spatial Autocorrelation
Bibliography Atteia O, Dubois JP, Webster R (1994) Geostatistical analysis of soil contamination in the Swiss Jura. Environ Pollut 86:315–327 Bandarian E, Mueller U (2008) Reformulation of MAF as a generalised eigenvalue problem. Geostats 2008:1173–1178 Bandarian EM, Bloom LM, Mueller UA (2008) Direct minimum/maximum autocorrelation factors within the framework of a two structure linear model of coregionalisation. Comput Geosci 34:190–200 Boucher A, Dimitrakopoulos R (2012) Multivariate block-support simulation of the Yandi iron ore deposit, Western Australia. Math Geosci 44:449–468. https://doi.org/10.1007/s11004-012-9402-9 Davis JC (2002) Statistics and data analysis in geology, 3rd edn. Wiley, Hoboken Desbarats AJ, Dimitrakopoulos R (2000) Geostatistical simulation of regionalized pore-size distributions using min/max autocorrelation factors. Math Geol 23(8):919–941 Drummond, RD, Vidal AC, Bueno JF, Leite EP (2010) Maximum autocorrelation factors applied to electrofacies classification. In: Society of exploration geophysicists international exposition and 80th annual meeting 2010, SEG 2010 Denver, 17 October 2010–22 October 2010, 139015, pp 1560–1565 Grunsky EC, de Caritat P, Mueller UA (2017) Using Regolith geochemistry to map the major crustal blocks of the Australian continent. Gondwana Res 46:227–239 GSNI (2007) Geological survey Northern Ireland Tellus project overview. https://www.bgs.ac.uk/gsni/Tellus/index.html. Accessed 7 Mar 2017 Mueller UA, Grunsky EC (2016) Multivariate spatial analysis of Lake sediment geochemical data. Melville Peninsula, Nunavut, Canada. Appl Geochem. https://doi.org/10.1016/j.apgeochem.2016.02.007 Mueller U, Tolosana-Delgado R, Grunsky EC, McKinley JM (2020) Biplots for compositional data derived from generalised joint diagonalization methods. Appl Comput Geosci 8:100044. https://doi.org/10.1016/j.acags.2020.100044 Nielsen A, Conradsen K, Pedersen J, Steenfelt A (2000) Maximum autocorrelation factorial kriging. In: Kleingeld W, Krige D (eds) Geostats 2000 Cape Town, Vol 2: 548–558, Geostatistical Association of Southern Africa
875 Rondon O (2012) Teaching aid: minimum/maximum autocorrelation factors for joint simulation of attributes. Math Geosci 44:469–504 Switzer P, Green AA (1984) Min/max autocorrelation factors for multivariate spatial imagery. Technical report SWI NSF 06. Department of Statistics, Stanford University Tolosana-Delgado R, McKinley J (2016) Exploring the joint compositional variability of major components and trace elements in the Tellus soil geochemistry survey (Northern Ireland). Appl Geochem 75:263–276 Tran TT, Murphy M, Glacken I (2006) Semivariogram structures used in multivariate conditional simulation via minimum/maximum autocorrelation factors. In: Proceedings XI international congress. IAMG, Liège Vargas-Guzman JA, Dimitrakopoulos R (2003) Computational properties of min/max autocorrelation factors. Comput Geosci 29(6): 715–723. https://doi.org/10.1016/S0098-3004(03)00036-0 Young M, Donald A (2013) A guide to the Tellus data. Geological Survey of Northern Ireland, Belfast. 233pp
Mining Modeling Youhei Kawamura Division of Sustainable Resources Engineering, Hokkaido University, Sapporo, Hokkaido, Japan
Definition A “digital twin” is defined as a virtual representation that serves as the real-time digital counterpart of a physical object or process. Digital twin technology is the result of continuous improvements in product design and engineering. Product drawings and engineering specifications have progressed from handmade drafting to computer-aided design and then to model-based systems engineering. On the other hand, the mining modeling is a relatively universal method of mathematical model construction and application intended to aid managerial personnel at various management levels in decision-making situations, which are frequently characterized by complicated relations of a quantitative as well as logical character. The digital twin of a physical object is dependent on the digital thread – the lowest-level design and specification for a digital twin – for accuracy to be maintained. Changes to product design are implemented using engineering change orders (ECO). An ECO issued for a component item results in a new version of the item’s digital thread, and correspondingly, of the digital twin. The concept of digital twins has started attracting attention due to the expansion of the Internet of Things (IoT) and the development of augmented reality (AR) and virtual reality (VR). However, a digital twin is different from general models and simulations. The concept of reproducing and
simulating an entity in a digital space using real-world data is not novel. However, the difference between a digital twin and general simulations is that changes in the real world can be reproduced in the digital space and the two are linked to each other in real time. With the expansion of IoT, real-world data is being automatically collected in real time and immediately reflected in digital spaces through networks. Thus, the similarity between real-world objects and digital-space models can be maintained. From this perspective, a digital twin is considered to be a “dynamic” virtual model of the real world. Digital twin technology, as utilised in mining, can enable “mining modeling” for various purposes. Moreover, a digital twin for mining is a virtual representation of the physical world and is stored in a representative structure on a cloud data platform.
Introduction The idea of the “digital twin” is attracting considerable attention in the present-day mining industry and is expected to be used in various applications. Unlike older programmable logic controllers (PLCs), distributed control systems (DCSs), and mining execution systems (MESs), a digital twin leverages the latest updates in user interface (UI) and advanced visualization to give operators better recognition of conditions in the mine. The digital twin will be at the core of the interface in “smart mining,” which is rapidly gaining acceptance in the global mining industry. This industry is becoming increasingly serious regarding the environment, the economy, and safety. Visualizing the development of the mine, environmental information, the position of workers, and the operational status of trucks at the central control room on the ground surface using communication systems is advantageous for mine operations. Rio Tinto has developed RTVis (Rio Tinto Visualization), a unique 3D visualization tool that can acquire geological information of open-pit mines, the progress of excavation and blasting, and the position information of trucks in real time (Rio Tinto 2020). RTVis superimposes information on a 3D model, determines the development status in real time in the central control room, and supports decision-making. This technology is an example of the use of digital twins for open-pit mining. These are the areas that “mining modeling” has traditionally addressed. The digital twin makes mining modeling easier to apply and is expected to evolve as a field. It is important to create a 3D model that serves as the “reflection destination,” providing an interface for sensing data of a mine whose shape keeps changing over time. CAD-based BIM/CIM in civil engineering and architecture cannot capture shape changes. Currently, two main methods are used for constructing a 3D model. One involves using a laser scanner, and the other is based on photogrammetry. With the rapid progress in drone technology in recent years, it has become possible to capture images and record
videos of large-scale objects, such as open-pit mines. Presently, photogrammetry-built 3D models are being used in open-pit mining projects (Obara et al. 2018).
Structure of Digital Twin The structure of a digital twin is shown in Fig. 1. The “physical world” (mine site) is measured using a set of smart sensors, including a camera (image sensor), and the obtained data is transmitted through a wireless communication system, such as Wi-Fi or 5G, to a cloud server functioning as a data storage and sharing hub. The on-site analogue data is converted to digital data at the time of measurement by the sensors. A digital twin is created from the digital data on the cloud server. Any individual with access rights and sufficient network speed can utilize the created digital twin since it already exists in the cyber world (computational domain). Comprehensive cloud data structures require a representative visualization layer for exploring the entire mining site. Thus, digital twins can be used to build mining sites virtually and obtain detailed information regarding mining processes, assets, and key settings (both current and recommended). In summary, by leveraging digital twins to collect and store data, spatial intelligence graphs can be created to model relationships and interactions between individuals, spaces, and devices at mining sites. In addition, a digital twin with advanced simulation capabilities allows for leveraging artificial intelligence and machine learning models from historical data, thereby simplifying future predictive testing. This methodology, unlike general simulations, allows for simulating extraction of information and optimisation of processing equipment in a virtual environment Fig. 1.
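As a toy, in-memory stand-in for this pipeline (all class, field, and asset names below are hypothetical), the sketch updates a twin's state from streamed telemetry messages; a production system would use a cloud message broker and a persistent data platform rather than a Python dictionary.

```python
# Toy digital-twin state update from streamed sensor readings (illustrative only).
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AssetState:
    position: tuple[float, float, float] = (0.0, 0.0, 0.0)
    payload_t: float = 0.0
    updated: datetime | None = None

@dataclass
class MineDigitalTwin:
    assets: dict[str, AssetState] = field(default_factory=dict)

    def ingest(self, reading: dict) -> None:
        """Apply one telemetry message from the physical site to the twin's state."""
        state = self.assets.setdefault(reading["asset_id"], AssetState())
        state.position = reading["position"]
        state.payload_t = reading["payload_t"]
        state.updated = datetime.now(timezone.utc)

twin = MineDigitalTwin()
twin.ingest({"asset_id": "truck-07", "position": (512.3, 871.9, 42.0), "payload_t": 180.5})
print(twin.assets["truck-07"])
```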
Mining Modeling, Fig. 1 Structure of digital twin
Modeling Technologies for Creating Digital Twin Photogrammetry is the most effective technology for inexpensive and convenient 3D modeling of large-scale spaces. Photogrammetry uses structure from motion (SfM) and patchbased multi-view stereo (PMVS) as constituent technologies. In computer vision, methods for reconstructing an object in three dimensions on a computer based on images captured from various angles (multi-viewpoint images) have been studied extensively. SfM is the most versatile and accurate method, although several other reconstruction methods, such as using shadows in the image (shape from shading) and using focus (shape from defocus), are available. Currently, a new and effective method is being developed to reconstruct 3D shapes using SfM to obtain models closer to the real object. This method is called multi-view stereo (MVS). When combined with patch-based technology, it is called “patch-based multi-view stereo.” As mentioned earlier, it is necessary to roughly divide modelling process into two steps to create a 3D model. Three-dimensional reconstruction using a multi-view image can yield information on the depth lost by projecting from a 3D image to a 2D image (Furukawa and Ponce 2010). A 3D model of the entire mine, similar to RTVis, is needed to visualize various types of information. 3D Model Reconstruction from Multi-view Images (Structure from Motion) SfM is a method for simultaneously estimating the position and orientation of a camera and a sparse 3D shape (a collection of points, a sparse point cloud) from multiple images based on multi-view geometry. To estimate the position and orientation of the camera and the sparse point cloud from the multi-viewpoint image, mainly four steps are performed. In sequence, the feature points of the image are extracted and matched, the position and orientation of the camera are estimated, the 3D points are reconstructed via triangulation, and the camera orientation and the 3D point positions are optimized. These four steps are initially performed on two-viewpoint images as a pair. When the camera orientation and 3D points are reconstructed from the two-view images, one new image is added, and processing is performed again between the three images. When the posture estimation and restoration are completed, the images are added, and the process is repeated for all the multi-viewpoint images. Each process is explained below. • Step 1: Extraction and Matching of Feature Points in Images. To estimate the relative positions of the two cameras that have captured the two different viewpoint images, it is necessary to be able to first confirm multiple corresponding points between the images. Image feature
extraction technology (also called local image features) is used to search for corresponding feature points between two images. This technique considers a part of an image (a small set of pixels) composed of innumerable pixel values and determines characteristic points (key points) that indicate the overall tendency of that area. Algorithms such as scaled invariance feature transform (SIFT) can automatically calculate the feature points of images and associate them with each other (Lowe 2004). The imageto-image mappings that these algorithms automatically establish often contain errors. In other words, the feature points calculated between the two images are different, but they are related to each other. Random sample consensus (RANSAC) and epipolar constraints are used to eliminate such false correspondences (Isack and Boykov 2012). After performing RANSAC, only the feature points that satisfy the epipolar constraint (the corresponding points of the two images are located along a straight line) are recorded as the official corresponding points. • Step 2: Estimation of Camera Position and Orientation. When the search for the corresponding points of the two different viewpoint images is completed, the positions and orientations of the two cameras are estimated. At this stage, some correspondence is found between the two images. Therefore, the least-squares method is applied on these corresponding points to estimate the elementary matrix. Since the basic matrix contains the position and orientation of the cameras, these parameters can be estimated conveniently from more than two images. • Step 3: Reconstruction of 3DPpoints via Triangulation. Based on Steps 1 and 2, the corresponding points of the two images are searched, and the positions and orientations of the two cameras are estimated. The corresponding points satisfy an epipolar constraint as they lie on a straight line between the two images. Therefore, since one side and two angles are known, triangulation is possible, and the three-dimensional point position can be calculated. This calculation is repeated for each corresponding point to ascertain the position of the 3D point. • Step 4: Optimization of Camera Orientation and 3D Point Position. After Steps 1–3, the reconstruction of the 3D point is completed. The estimated camera position and orientation and the coordinates of the three-dimensional point overlap with each other as each process is performed. The deviation between the restored 3D point is projected onto the image plane, and the observation point on the corresponding image is called the reprojection error. This reprojection error is optimized, and the camera positions and orientation and the 3D points are estimated again Fig. 2.
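As a concrete illustration of Steps 1–3, the following is a minimal two-view sketch written with the OpenCV library (an assumption of this example; the entry does not prescribe any particular software, and the function name is purely illustrative). Feature detection, matching with a ratio test, RANSAC-based relative-pose estimation, and triangulation roughly follow the sequence described above; the optimization of Step 4 (bundle adjustment) is omitted.

```python
import cv2
import numpy as np

def two_view_reconstruction(img1, img2, K):
    """Sparse two-view reconstruction: SIFT matching, essential matrix,
    relative pose, and triangulation (Steps 1-3 of SfM)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Step 1: match feature points; the ratio test removes weak matches
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Step 2: estimate camera pose; RANSAC rejects false correspondences
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # Step 3: triangulate inlier correspondences into a sparse point cloud
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (X_h[:3] / X_h[3]).T   # N x 3 sparse 3D points
```

In a full pipeline this pairwise step is repeated as new images are added and is followed by bundle adjustment (Step 4).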
Mining Modeling, Fig. 2 Flow of structure from motion (SfM)
Patch-Based Multi-View Stereo (PMVS) MVS is a method of reconstructing the dense 3D shape of a subject using multi-view geometry under the condition that the position and orientation of the camera have already been clarified by SfM. MVS generates a dense point cloud from a sparse point cloud by estimating the pixel depth of a multiviewpoint image. Specifically, the depth of each pixel is estimated by exploring the corresponding points in the paired images on the epipolar line using the camera orientation estimated by SfM. Thus, MVS searches for the corresponding points again separately from SfM. PMVS is an advanced method developed by Furukawa et al. to improve the accuracy of conventional MVS (Furukawa and Ponce 2010). PMVS estimates the normal vector and the depth of each pixel simultaneously. First, the feature points are detected, and the initial patch is calculated from the correspondence of the characteristic image area. Thereafter, the patch already obtained is extended to nearby pixels, and the process of removing erroneous corresponding points is repeated to generate a highly accurate and dense patch.
completed. Figure 2 shows a series of steps from a multiviewpoint image, summarizing the information in Sects. 4.1, 4.2, and 4.3. The lower-left model is the final model with an image pasted on the surface via surface reconstruction Fig. 3.
Surface Reconstruction from Point Cloud The dense point cloud obtained after the processing using SfM and PMVS is automatically inferred from the color information, and the multi-viewpoint image of the object is obtained. After surface reconstruction, the 3D model is finally
Particle Size Distribution (PSD) Analysis and Blasting Optimization in Open-Pit Mine The particle size distribution (PSD) of blasted rocks in hard rock mining has significant effects on the subsequent mine-tomill process. For example, regions of fines and oversized
Examples of Cyber-Physical Implementation (Applications of Digital Twin for Mining) This section presents specific examples of cyber-physical implementation in mining, including the multiple elemental technologies shown in Fig. 2. Cyber-physical implementation connects the physical world with the cyber world and improves work efficiency using the digital twin. The development of the digital twins for mining will enable rapid implementation of new technology at the mine site through actualization of the cyber-physical platform, which can appropriately allocate the work costs among the physical and cyber worlds. Specific usage examples are introduced separately for open-pit mines and underground mines.
Mining Modeling, Fig. 3 Constructing 3D model from multi-view Images of mock piles
rocks substantially reduce the loading and hauling productivity. Given that the material transportation cost may reach 60% of the total operating costs, maintaining an appropriate rock fragment PSD is the ultimate objective of mine productivity optimization. Furthermore, the total milling process energy can be reduced by feeding rock fragments within the optimum particle size ranges. For nearly three decades, the mining industry has been reliant on conventional 2D photo-based rock fragmentation measurement methods, which analyze the PSD of rock fragments from 2D surface images of rock piles. Since the 1980s, various 2D photo-based rock fragmentation measurement systems have been introduced, such as IPACS, WipFrag, SPLIT, and PowerSieve. The drawbacks of conventional 2D photo-based rock fragmentation measurement (Con2D) methods have consistently been highlighted by researchers and practitioners. Therefore, fragmentation management is regarded as an essential task for mining engineers. To overcome the limitations of Con2D, a 3D rock fragmentation measurement system (3DFM) based on photogrammetry technology has been proposed. 3DFM extracts the particle sizes from a 3D rock pile model, which facilitates accurate PSD measurement without scale objects and eliminates the need for excessive manual editing. Furthermore, a representative fragmentation of an entire blasting shot can easily be analyzed by generating a 3D model of an entire blasted rock pile (Jang et al. 2019). In the field of computer vision, which involves equipping a computer with human-like visual function, multi-viewpoint images of an object captured from various angles are integrated to reconstruct the 3D shape of the object. Research on image-based modeling and rendering (IBMR) is being
actively conducted. The objective of our specific study is to develop a particle size distribution (PSD) estimation method that is both accurate and convenient, by applying IBMR to multi-view images of mock piles. In this system, a video or multiple photographs of the mock pile generated by blasting are captured by devices such as a drone or a smartphone. These multi-view images are uploaded to a cloud server via a network such as Wi-Fi or 5G. Then, information is passed from the physical world to the cyber world. A 3D model of the mock pile is generated from the multi-viewpoint images using the aforementioned photogrammetry technique. Thus, digital twins comprising polygons having the same scale as that of the real field are built in the cyber world. Subsequently, it becomes possible to automatically calculate the particle size of each fragmented rock using machine learning (e.g., through supervoxel processing) and to obtain an accurate particle size distribution curve. Since it is possible to create artificial intelligence (AI) with blasting design as the input and particle size distribution as the output, the subsequent blasting design can be optimized, and the new design can be adopted. Thereafter, the information is transferred to the physical world for use. The digital twin enables such blasting optimization Fig. 4. Visualization of Ground Stress Concentration in Underground Mine In future, underground mines are expected to become deeper. Therefore, the rock stress at the face will increase. To handle this situation, it is necessary to develop a solution that fully utilizes the latest technology. The main factor affecting underground work safety is rock stress. The purpose of this study was to improve the
Mining Modeling, Fig. 4 PSD analysis and blasting optimization
efficiency and safety of underground work by constructing a rock stress monitoring/visualization system based on a 3D model of an underground mine and AR technology. We investigated the technology for visualizing the stress information of the rock mass in a tunnel using the natural feature-tracking method, which is a marker-less tracking method, and a point cloud 3D model of the underground tunnel. To construct the 3D model of an underground mine using photogrammetry, numerous images are required, and the shooting time becomes excessively long. To solve these problems, the methods of constructing a three-dimensional model of the mine using a 360 camera were compared with the method of using a conventional camera. From the experimental results, it was confirmed that reconstruction using a 360 camera is superior in terms of shadow time and effective restoration rate. An optimal method for building a 3D model in an underground mine has thus been developed. With the point cloud model of the tunnel constructed using the proposed method, the visualization of rock stress by AR was examined. Initially, assuming that the rock stress distribution in the underground tunnel is divided into three regions, the point cloud model in the tunnel was edited in three colors. GPS connectivity is unavailable in underground tunnels, but
the amount of light is constant. Therefore, the natural featuretracking method was adopted to match the real world with the point cloud model as a virtual object to confirm the extraction and collation of feature points between the images of the rock wall and to estimate the camera attitude and coordinates. From this information, the absolute coordinates of the point cloud model of the mine were determined. By associating these coordinates with the relative coordinates of the camera, the point cloud models of the corresponding colors were superimposed according to the wall surface projected on the camera. An AR system based on this point cloud model can be the basis for visualizing rock stress in real time after the underground monitoring system is completed. This can enable underground workers to identify changes in stress in real time. The system is ultimately expected to provide a safer and more productive work environment for underground workers Fig. 5.
Conclusion The development of new technologies to improve efficiency and safety in mine development is accelerating. This is further
Mining Modeling, Fig. 5 AR visualisation system for rock stress in an underground mine
boosted by the inevitable fusion of mining with other disciplines. Among them, smart mining (Mining 4.0) represents a fusion of mining with information engineering, offering new possibilities in the mining industry. Furthermore, digital twins are becoming more important as the core technology for smart mining. A digital twin connects the physical and cyber worlds as an effective interface and contributes to the improvement of efficiency and safety by enabling visualization of mining operations.
References
Furukawa Y, Ponce J (2010) Accurate, dense, and robust multiview stereopsis. IEEE Trans Pattern Anal Mach Intell 32(8):1362–1376. https://doi.org/10.1109/TPAMI.2009.161
Isack H, Boykov Y (2012) Energy-based geometric multi-model fitting. Int J Comput Vis 97:123–147. https://doi.org/10.1007/s11263-011-0474-7
Jang H, Kitahara I, Kawamura Y, Endo T, Degawa R, Mazara S (2019) Development of 3D rock fragmentation measurement system using photogrammetry. Int J Min Reclam Environ 34(4):294–305. https://doi.org/10.1080/17480930.2019.1585597
Lowe DG (2004) Distinctive image features from scale-invariant key points. Int J Comput Vis 60:91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94
Obara Y, Yoshinaga T, Hamachi M (2018) Measurement accuracy and case studies of monitoring system for rock slope using drone. J MMIJ 134(12):222–231. https://doi.org/10.2473/journalofmmij.134.222
Rio Tinto (2020) Smart mining. https://www.riotinto.com/en/about/innovation/smart-mining. Accessed 2020/12/24

Modal Analysis

Frits Agterberg
Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada
Definition Modal analysis is the study of system components in the frequency domain. In the geosciences, it is especially important in igneous and sedimentary petrography. It assumes that minerals
are randomly distributed in the rocks being studied and that proportions of the rock-forming crystals (or other types of rock particles) can be determined by point counting using a special microscope or microprobe. For points forming a regular grid, the presence or absence of the rock-forming minerals is counted to measure the volume percentages of the minerals considered. If distances between points are sufficiently long, exceeding linear dimensions of the mineral grains considered, the binomial distribution and its normal approximation can be used to model these presence-absence data.
Introduction
In the geosciences, the technique of modal analysis was introduced by Chayes (1956) for widespread use in petrography; it had several precursors. Delesse (1848) introduced the principle of measuring volume percentage values of constituents of rocks by measuring the presence of every constituent at equally spaced points along lines. However, the initial method was exceedingly laborious as described by Johannsen (1917). Delesse's preferred method was to trace, on oiled paper, the outline of each constituent as shown on polished slabs. After coloring the resulting drawing, it was pasted on tinfoil and cut apart for each component, soaked off the paper, and weighed. Half a century later, Rosiwal (1898) introduced his improved method of linear measurements. Subsequently Shand (1916) described the method of employing a mechanical stage as part of the microscope that became generally used, because it requires very few measurements. Other early applications of modal analysis include Glagolev (1932) and Krumbein (1935). Chayes (1956) brought mathematical statistics to bear on how to use the stage. His technique is based on the binomial distribution, which is a discrete probability distribution with parameters n (= number of points counted) and p (= probability that a point falls on the mineral considered). It is assumed that n independent experiments are performed, each answering a yes-no question with probability p of obtaining a Boolean-valued (0 or 1) outcome. The probability of no success satisfies q = 1 − p. A single success-failure experiment is also called a Bernoulli trial.

Binomial Distribution
The probability P(k, n) of k successes in n experiments satisfies the binomial frequency distribution:

\[ P(k, n) = \binom{n}{k} p^k q^{n-k}, \quad \text{where} \quad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \]

The mean μ of this distribution satisfies μ = np, and the standard deviation is σ = √(npq). The following example illustrates the application of the binomial distribution in petrographic modal analysis. Suppose that 12% by volume of a rock consists of a given mineral "A." The method of point counting is applied to a thin section of this rock. Suppose that 100 points are counted. From p = 0.12 and n = 100, it follows that q = 0.88 and

μ = np = 12;  σ = √(npq) = √(12 × 0.88) ≈ 3.25.

If the experiment of counting 100 points would be repeated many times, the number of times (K) that the mineral "A" is counted would describe a binomial distribution with mean 12 and standard deviation 3.25. According to the so-called central-limit theorem, a binomial distribution approaches a normal distribution if n increases. Hence, we can say for a single experiment that:

P(μ − 1.96σ < K ≤ μ + 1.96σ) = 0.95, or P(5.6 < K ≤ 18.4) = 95%.

The resulting value of K is between 5.6 and 18.4 with a probability of 95%. This precision can be increased by counting more points. For example, if n = 1000, then μ = 120 and σ = 10.3, and

P(99.8 < K ≤ 140.2) = 95%.

This symmetrical probability can also be written as K = 120 ± 20.2. To compare the experiment of counting 1000 points to that for 100 points, the result must be divided by 10, giving K′ = 12 ± 2.02. This number represents the estimate of volume percentage for the mineral "A." It is 3.25/(10.3/10) ≈ 3.2 times as precise as the first estimate for 100 points only. It is noted that this increase in precision agrees with the formula σ²(x̄) = σ²(x)/n for the population variance of x̄ representing the mean of n values. The method of modal analysis has been discussed in more detail by Chayes (1956).
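A minimal sketch of the calculation above, assuming only NumPy; the counts and confidence limits reproduce the worked example for mineral "A."

```python
import numpy as np

def point_count_precision(p, n, z=1.96):
    """Normal approximation to the binomial for point counting:
    returns mean, standard deviation, and an approximate 95% interval for K."""
    q = 1.0 - p
    mu = n * p
    sigma = np.sqrt(n * p * q)
    return mu, sigma, (mu - z * sigma, mu + z * sigma)

for n in (100, 1000):
    mu, sigma, (lo, hi) = point_count_precision(0.12, n)
    # volume-percent estimate and its half-width, as in the text
    print(f"n={n}: K in ({lo:.1f}, {hi:.1f}),",
          f"vol% = {100*mu/n:.1f} +/- {100*1.96*sigma/n:.2f}")
```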
Antarctic Meteorite Case Study
Petrographic modal analysis is usually performed in conjunction with other methods such as chemical analysis of the rock considered, as seen in McKay et al. (1999), which is taken here
as an example. This particular study describes a preliminary examination of the Antarctic Meteorite LEW 87051 (total weight 0.6 g), which is porphyritic in texture, with equant, euhedral to subhedral olivine phenocrysts ~0.5 mm wide set in a fine-grained groundmass of euhedral plagioclase laths and interstitial pyroxene, Fe-rich olivine, kirschsteinite, and some minor constituents. Microprobe modal analysis was performed on these minerals by counting 721 points, with results ranging from a maximum of 34.1% plagioclase to a minimum of 4.7% kirschsteinite, plus traces of several other minerals. The information given in the preceding paragraph can be used to determine the approximate precision of the estimated volume percentage values by using the normal approximation of the binomial model as discussed in the preceding section. For very small frequencies, it would be better to approximate the binomial model by an asymmetric Poisson distribution model instead of by the normal distribution model. This is because, for very small frequencies, the binomial model approaches its Poisson distribution limit. According to Hald (1952), this limit can be set at θ < 0.1, where θ = μ/n. This criterion is satisfied for kirschsteinite in the current example (θ = 0.047), suggesting that use of the Poisson model would be better. Consulting tables published by Mantel (1962), and later reproduced by Johnson and Kotz (1969), then gives its 95% confidence interval as P(23.6 < K ≤ 48.3) = 95% instead of P(22.6 < K ≤ 45.3) = 95% as results from the normal approximation of the binomial distribution. This is only a very small improvement.
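To illustrate the comparison made above, a short sketch using scipy.stats contrasts the normal-approximation interval with a Poisson interval for a rare constituent. The counts (n = 721, θ = 0.047) follow the meteorite example; the exact tabulated limits quoted in the text may differ slightly from these computed values.

```python
from scipy import stats

n, theta = 721, 0.047           # points counted, proportion of kirschsteinite
mu = n * theta                   # expected count (~34)

# Normal approximation of the binomial
sigma = (n * theta * (1 - theta)) ** 0.5
lo_n, hi_n = mu - 1.96 * sigma, mu + 1.96 * sigma

# Poisson model with the same mean (asymmetric interval)
lo_p, hi_p = stats.poisson.interval(0.95, mu)

print(f"normal approx.: ({lo_n:.1f}, {hi_n:.1f})")
print(f"Poisson model : ({lo_p:.0f}, {hi_p:.0f})")
```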
Summary and Conclusions The purpose of petrographic modal analysis is to determine the volume percentages of rock constituents from thin sections under the microscope. The spacing between points at which the constituents are counted should be sufficiently wide to avoid counting the same grain more than once. Precision of the results can be determined by using the binomial distribution model. For rare constituents, the Poisson model can be used. The Antarctic Meteorite LEW 87051 (total weight 0.6 g) was taken for example.
Johannsen A (1917) A planimeter method for the determination of the percentage compositions of rocks. J Geol 25:276–283 Johnson NI, Kotz S (1969) Distributions in statistics: discrete distributions. Houghton Mifflin, Boston, p 328 Krumbein WC (1935) Thin-section mechanical analysis of indurated sediments. J Geol 43:482–496 Mantel N (1962) Lung cancer mortality as related to residence and smoking histories. I. White males. J Natl Cancer Inst 28:947–997 McKay G, Crozaz G, Wagstaff J, Yang SR, Lundberg L (1999) A petrographic, electron microprobe, and ion microprobe study of mini-angrite Lewis Cliff 87051. In: Abstracts of the LPSC XXI, pp 771–772 Rosiwal A (1898) Über geometrische Gesteinanalysen. Ein einfacher Weg zur ziffrmässigen Feststellung des Quantitätsverhältnisses der Mineral bestandtheile gemengter Gesteine. Verh der k-k Geol Reichsanstalt, Wien, pp 143–175 Shand J (1916) A recording micrometer for geometrical rock analysis. J Geol 24:394–401
Moments Jessica Silva Lomba1 and Maria Isabel Fraga Alves2 1 CEAUL, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal 2 CEAUL & DEIO, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
Definition Statistical moments can be introduced as features of (the probability distribution of) a random variable (RV). These numerical characteristics describe the behavior of the RV’s distribution, such as location and dispersion, playing an important role in its identification, fitting, and estimation of parameters. Theoretical (population) moments are generally defined as expectations of functions of the RV, while their empirical counterparts, the sample moments, are basically computed as averages of similar functions of observations. The most common statistical moments in Geosciences are the following: the conventional first moment – mean, the second central moment – variance, and the standardized third and fourth central moments – skewness and kurtosis. However, Probability Weighted Moments (PWM) and L-moments have also been gaining visibility in this field.
Bibliography Chayes F (1956) Petrographic modal analysis. Wiley, New York, p 113 Delesse M (1848) Procédé méchanique pour determiner la composition des roches. Ann Min XIII(4):379–388 Glagolev AA (1932) Quantitative mineralogical analysis of rocks with the microscope. Gosgeolizdat, Leningrad, pp 1–25 Hald A (1952) Statistical theory with engineering applications. Wiley, New York, p 783
Introduction The term statistical moment refers to application of the mathematical concept moment as a tool in the study of probability, or frequency, distributions. First introduced by Karl Pearson (1893) in a short study of asymmetrical frequency curves, moments became widely spread probabilistic features across
all data-based pursuits. The choice of name moment connects strongly to the physical interpretation of the moments of mass distributions in Mechanics. Broadly speaking, for a continuous real function of real variable f(x) and c, k ∈ ℝ, consider the moment integral

\[ \int_{-\infty}^{+\infty} (x - c)^k f(x)\, dx. \qquad (1) \]

This is specially interesting when k ∈ ℕ₀, as (1) then defines the kth moment of f about c. We restrict our exposition to the continuous case, for practical relevance, but discrete analogues can be found in the literature (e.g., Rohatgi and Saleh 2015). This concept is also extendable to higher dimensions (e.g., see "Multivariate Analysis of Variance"), though our main focus will be the univariate setting. Illustrating the physical interpretation, consider the geophysical study of the Earth's gravity field by Jekeli (2011), exploring the relationship between low-degree spherical harmonics of the Earth's gravitational potential and the moments of its mass density (Eq. 50 therein). It is seen that the 0th order density moment is proportional to the Earth's total mass, the first moments are proportional to its center of mass, and the second moments define the inertia tensor – which can indicate asymmetry in mass distribution. We borrow these ideas for interpretation of statistical moments. Throughout, we think of an RV X with continuous, univariate distribution and density functions F and F′ =: f (see "Probability Density Function" – PDF). Moment systems allow identification and description of the distribution of X in terms of location (analogous to center of mass) and shape (including dispersion and symmetry about the center), as functions of its parameters. Historically, moments have been recommended in the context of Geosciences since the 1900s. Since Pearson's seminal work, generalizations of the conventional moments (or simply Moments) have been suggested, addressing shortcomings of the primary system. However, the conventional remains the most widely known. We present three systems' definitions and usage, illustrated with applications in Geosciences.
Conventional Moments
Traditionally, the kth moment of X refers to the expectation of X^k,

\[ \mu'_k := E[X^k] = \int_S x^k f(x)\, dx, \qquad (2) \]

also known as raw moment, stemming from (1) with f the PDF of X, c = 0, and integrating over the variable's support S. Notice E[X^k] in (2) exists only if E|X|^k := ∫ |x|^k f(x) dx < ∞. It becomes apparent that μ'₀ = 1 gives the total probability (~ total mass). The first moment of X corresponds to the mean or expected value of the distribution, μ := E[X], if it exists, and provides a measure of location, a central value of the distribution (~ center of mass). Some distributions have all finite moments (e.g., see "Normal Distribution"), but not all moments of a distribution necessarily exist (e.g., Cauchy distribution); however, existence of the kth moment implies existence of all moments of lower order 0 < n < k. The Moment Generating Function (MGF) of X is defined as

\[ M(s) := E[e^{sX}], \qquad (3) \]

provided the expectation on the right exists around the origin, and so the moments of X appear as successive derivatives of M(s) at that point: M′(0) = E[X]; M″(0) = E[X²]; in general, M^{(k)}(0) = E[X^k], for k ≥ 1. The MGF may not exist, but existence of M(s) implies existence of μ'_k of all orders, uniquely determining F. If E|X| < ∞, useful features arise by considering the moments centered about the mean, the central moments of X,

\[ \mu_k := E[(X - \mu)^k] = \int_S (x - \mu)^k f(x)\, dx, \qquad (4) \]

as they directly inform on the distribution's shape. These can be computed from raw moments, and vice versa. The first central moment is trivially 0, but, when they exist, higher order ones are specially relevant:

– Provided E|X|² < ∞, then

\[ \sigma^2 := \mu_2 = E[X^2] - (E[X])^2 \qquad (5) \]

is the variance of X, and measures dispersion around the mean (~ moment of inertia); the square root, σ, is the standard deviation, expressed in the units of X, and its ratio by the mean results in the coefficient of variation

\[ c = \frac{\sigma}{\mu} \qquad (6) \]

(see "Variance," "Standard Deviation");
– For symmetric distributions, all odd central moments are 0.
– The third and fourth standardized central moments give dimensionless measures of asymmetry and peakedness of distributions, respectively, the coefficient of skewness
γ₁ := μ₃/σ³ and kurtosis γ₂ := μ₄/σ⁴.

This topic is comprehensively addressed in Chapter 3 of Rohatgi and Saleh (2015). The primary practical interest of these features is to compare them, in a way, to their empirical counterparts. For RV collection X₁, X₂, ..., Xₙ, the kth sample moment and sample central moment (cf. Rohatgi and Saleh 2015, Chapter 6) are defined, respectively, by

\[ m'_k := \frac{1}{n}\sum_{i=1}^{n} X_i^k \quad\text{and}\quad m_k := \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^k, \qquad (7) \]

where the common average X̄ := m'₁ denotes the sample mean. When {Xᵢ}ⁿᵢ₌₁ constitute a random sample (RS) – independent, identically distributed (i.i.d.) RVs – if μ'_k exists, the statistic m'_k is a consistent and unbiased estimator for that parameter. In fact, X̄ is μ's linear unbiased estimator of minimum variance (see "Best Linear Unbiased Estimator"). The sample central moments are generally not unbiased for their population analogues. Hence, the sample variance is commonly defined as

\[ S^2 := \frac{n}{n-1}\, m_2, \qquad (8) \]

an unbiased estimator for σ² (not so for S = √S²). Regarding higher orders, the sample coefficient of skewness and sample kurtosis, respectively,

\[ g = \frac{n^2}{(n-1)(n-2)}\,\frac{m_3}{S^3} \quad\text{and}\quad k = \frac{n^2}{(n-2)(n-3)}\left[\frac{n+1}{n-1}\,\frac{m_4}{S^4} - \frac{3(n-1)^2}{n^2}\right] + 3, \qquad (9) \]

are possibly markedly biased estimators of γ₁ and γ₂, with undesirable algebraic bounds determined by the sample size, |g| ≤ √n and k ≤ n + 3 (Hosking and Wallis 1997), and are highly outlier-sensitive. Mind that software usually computes the sample excess kurtosis, k − 3, by comparison with the Normal distribution (γ₂ = 3). Sampling distributions and asymptotic properties of these statistics have been widely studied and yield extremely useful results, like the Central Limit Theorem and Monte Carlo methods. Other functions of moments relevant in applications can be listed, especially in the context of multivariate and spatial problems. Chapter 4 of Rohatgi and Saleh (2015) includes a detailed exposition of generalizations of (2)–(4) to multiple RVs. We briefly mention a few aspects:

– The covariance between two RVs X and Y,

\[ \mathrm{cov}(X, Y) := E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y], \qquad (10) \]

(if the expectations exist) provides a measure of joint variation; particularly, cov(X, X) = σ²_X. This concept is generalized for n-dimensional vectors by the (symmetric) variance-covariance matrix, where the main diagonal shows individual variances and off-diagonal entries show pairwise covariances;
– While cov(X, Y) points if Y tends to increase or decrease when X increases (respectively, cov(X, Y) > 0 or cov(X, Y) < 0), the correlation coefficient

\[ \rho_{X,Y} = \mathrm{corr}(X, Y) := \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} \qquad (11) \]

indicates the relationship's strength. Note |ρ_{X,Y}| ≤ 1, the equality equivalent to linear association between X and Y (see "Correlation Coefficient," "Multiple Correlation Coefficient").
– If X and Y are independent, cov(X, Y) = 0 = ρ_{X,Y}, i.e., X and Y are uncorrelated; the converse is not necessarily true;
– For a bivariate sample {(Xᵢ, Yᵢ)}ⁿᵢ₌₁, these characteristics are estimated by the sample covariance and correlation coefficient, respectively,

\[ S_{X,Y} := \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \quad\text{and}\quad R_{X,Y} := \frac{S_{X,Y}}{S_X S_Y}; \qquad (12) \]

– In Geostatistics, for a stochastic process Z(s), a common generalization is the variogram

\[ 2\gamma(s_1, s_2) := E\Big[\big( (Z(s_1) - E[Z(s_1)]) - (Z(s_2) - E[Z(s_2)]) \big)^2\Big], \qquad (13) \]

representing the variance of the difference of the process at distinct points s₁, s₂. If Z(s) is stationary, the variogram is simply a function of the distance h = s₂ − s₁, since γ(s₁, s₂) = γ(0, s₂ − s₁) := γ(h). Assuming intrinsic stationarity
of Z(s), for points {sᵢ}ⁿᵢ₌₁, the prevailing empirical/experimental (semi)variogram is computed as

\[ 2\hat{\gamma}(h) := \frac{1}{n(h)}\sum_{i=1}^{n(h)} \big(Z(s_i) - Z(s_i + h)\big)^2, \qquad (14) \]
with n(h) the number of pairs of points at distance h, and several theoretical models are available to fit this estimate (see “Variogram,” “Stochastic Process,” and “Stationarity”).
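A minimal sketch of the empirical estimator (14), assuming regularly spaced one-dimensional observations of Z so that pairs at lag h can be formed by simple shifting; irregular two-dimensional data would additionally require binning pairwise distances. The data below are simulated purely for illustration.

```python
import numpy as np

def empirical_semivariogram(z, max_lag):
    """gamma_hat(h) = 1/(2 n(h)) * sum_i (Z(s_i) - Z(s_i + h))^2, h = 1..max_lag."""
    gammas = []
    for h in range(1, max_lag + 1):
        d = z[:-h] - z[h:]              # all n(h) pairs separated by lag h
        gammas.append(0.5 * np.mean(d ** 2))
    return np.array(gammas)

# toy example: a smooth spatial signal plus measurement noise
rng = np.random.default_rng(0)
z = np.cumsum(rng.normal(size=200)) + rng.normal(scale=0.5, size=200)
print(empirical_semivariogram(z, max_lag=10))
```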
Method of Moments
"Method of Moments" (MM) is a broad category including various estimation methodologies, applied in several contexts, presented in different ways, but built on the same basis: essentially, equating sample moments to their population match – a substitution principle. We focus on estimation of parameters of an assumed distribution F from a RS X₁, ..., Xₙ, although the principle can be generally applied to statistical models. Let θ = (θ₁, ..., θ_p) ∈ Θ be the parameter vector of F. Knowing the population μ'_k are functions of θ, the MM estimate θ̂ is the solution to an equation system of the type

\[ \mu'_k(\hat{\theta}) = m'_k \quad\text{for } k = 1, \ldots, m, \qquad (15) \]

with m ≥ p the smallest integer guaranteeing a determined system. In general, the MM estimator of λ = g(μ'₁, μ'₂, ..., μ'_k), a real-valued function of moments, is simply λ̂ = g(m'₁, m'₂, ..., m'_k); if g is continuous, λ̂ is a consistent estimator for λ, asymptotically normal under mild conditions (Rohatgi and Saleh 2015, Chapter 8). Hence, some raw moment equations in (15) are frequently replaced by central moment relatives. Compared to other methods (e.g., "Maximum Likelihood"), MM estimators are more straightforwardly computed, despite being usually less efficient and possibly resulting in inadmissible estimates θ̂ ∉ Θ.
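As a small worked example of (15), the following sketch fits a two-parameter gamma distribution by matching the first two sample moments; the moment relations μ = kθ and σ² = kθ² are standard, and the data here are simulated purely for illustration.

```python
import numpy as np

def gamma_method_of_moments(x):
    """Match sample mean and variance to mu = k*theta and sigma^2 = k*theta^2."""
    m1 = np.mean(x)
    m2 = np.var(x)                  # (1/n) central moment, as in (7)
    theta_hat = m2 / m1
    k_hat = m1 / theta_hat          # equivalently m1**2 / m2
    return k_hat, theta_hat

rng = np.random.default_rng(42)
sample = rng.gamma(shape=2.5, scale=1.8, size=5000)
print(gamma_method_of_moments(sample))   # close to (2.5, 1.8)
```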
Applications in Geosciences Moments and MM have been vastly applied in many fields of Geosciences. We mention a few examples: – A frequent use of the variogram is Kriging – an optimized spatial interpolation method (see “Kriging”), prolific in Geostatistics’ literature since the 1960s.
– Building on the concept of cumulant – a combination of moments, generated by the logarithm of the MGF – Dimitrakopoulos et al. (2010) suggested a modeling (and, in subsequent work, simulation) approach for nonlinear, non-Gaussian spatial data by introducing highorder spatial cumulants. These were found to efficiently capture complex spatial patterns and geological characteristics. The associated simulation methodology for spatially correlated attributes is shown to be advantageous compared to previous multiple-point (MP) approaches (see “Multiple Point Statistics”). – Osterholt and Dimitrakopoulos (2018) addressed limitations of traditional variogram-based stochastic simulation techniques in capturing non-linear geological complexities of orebodies; the authors revisit MP simulation as an algorithm for the simulation of the geology for mineral deposits, an approach based on highorder spatial statistics and a type of extension of kriging systems. This method, which allows for resource uncertainty assessment, is applied to the Yandi channel iron ore deposit. – Recently, Wu et al. (2020) studied how soil properties’ distribution characteristics influence probabilistic behavior of shallow foundations’ total and differential settlements. Focusing on the influence of the first four moment statistics, used to define the PDF of elastic modulus E, on the settlement’s distribution, the authors suggest a moment-based simulation method for modelling E as a 2-dimensional non-Gaussian homogeneous field. The study (via Monte Carlo simulations) shows that mainly skewness of E significantly influences total/differential settlements and suggests greater attention be given to identifying low-skewed situations, which may result in dangerous unexpectedly large settlements.
Probability Weighted Moments
The PWM comprise another valuable system to characterize probability distributions, whose theory parallels that of conventional moments. Introduced by Greenwood et al. (1979), the primary aim was deriving expressions for parameters of continuous distributions with explicit inverse function w(.) – the quantile function, satisfying w(F(x)) = x, x ∈ S. PWM have been extensively used in Geosciences, counteracting some conventional moments' drawbacks leading to unsatisfactory inference. See Hosking and Wallis (1997) and references therein for further details.
The PWM of X are defined as

\[ M_{p,r,s} := E[X^p (F(X))^r (1 - F(X))^s] = \int_S x^p (F(x))^r (1 - F(x))^s f(x)\, dx, \quad\text{with } p, r, s \in \mathbb{R}, \qquad (16) \]

where similarities with (2) are clear: μ'_k = M_{k,0,0}, k ∈ ℕ₀. Their usefulness stems from being translated in terms of w(u), 0 < u < 1: contrast the notable PWM

\[ \alpha_s := M_{1,0,s} = \int_0^1 w(u)\,(1-u)^s\, du, \quad s \in \mathbb{N}_0, \qquad (17) \]

\[ \beta_r := M_{1,r,0} = \int_0^1 w(u)\, u^r\, du, \quad r \in \mathbb{N}_0, \qquad (18) \]

against the redefined (2)

\[ \mu'_k = \int_0^1 (w(u))^k\, du, \quad k \in \mathbb{N}_0. \qquad (19) \]

Higher complexity of μ'_k is evident, as successively higher powers of w(.) are employed, absent from α_s and β_r. Appealing properties of PWM include the following:
– M_{p,r,s} exists for all r, s ≥ 0 if and only if E|X|^p < ∞.
– Complete determination and characterization of F by the sets {α_s} or {β_r}, if E|X| < ∞ (note the two sets of moments are functions of each other); however, interpretation as distribution's features is not apparent.
– Connection to expectations of order statistics X_{1:n} ≤ X_{2:n} ≤ ... ≤ X_{n:n}; particularly, nα_{n−1} = E[X_{1:n}] and nβ_{n−1} = E[X_{n:n}].
– Straightforward estimation – for i.i.d. {Xᵢ}ⁿᵢ₌₁, the sample PWM

\[ a_s := \frac{1}{n}\sum_{i=1}^{n} \binom{n-i}{s}\binom{n-1}{s}^{-1} X_{i:n}, \quad s = 0, 1, \ldots, n-1, \qquad (20) \]

\[ b_r := \frac{1}{n}\sum_{i=r+1}^{n} \binom{i-1}{r}\binom{n-1}{r}^{-1} X_{i:n}, \quad r = 0, 1, \ldots, n-1, \qquad (21) \]

are unbiased estimators of α_s and β_r, also asymptotically normal if μ'₂ of F exists.

Applications: PWM Methods
The "PWM method" falls in the MM category, following the same principle as presented above: matching population M_{p,r,s}, expressed in terms of the parameters θ = (θ₁, ..., θ_p) of F, with their empirical functionals, for example,

\[ \alpha_s(\theta) = a_s, \quad\text{for } s = 0, 1, \ldots, m, \ \text{with } m \ge p. \qquad (22) \]

These equations are, again, interchangeable with other convenient equalities of (functions of) PWM. PWM-type estimators have been shown to be more robust to outliers, less subject to sampling variability and estimation bias, and frequently more efficient than conventional moment estimators, potentially even more accurate than Maximum Likelihood estimators for some smaller samples. Generalized PWM (GPWM) methods have been suggested. We mention the GPWM estimators proposed by Diebolt et al. (2008) for the Generalized Extreme Value distribution (GEVd), employed in analysing annual daily precipitation maxima (see "Extrema"), as prescribed by the Extreme Value (EV) Statistics framework. PWM-GEVd estimates are valid under the shape parameter condition γ < 1 (equivalently, E|X| < ∞); the goal was to extend the method's validity. The GPWM ν_ω := E[X ω(F(X))], with convenient auxiliary functions ω(.), are written in terms of the GEVd's three parameters (θ₁, θ₂, γ). The GPWM estimators are obtained by replacing F with its empirical counterpart, the empirical quantile function. Solving the moment-equations system for three suggested ω_{a,b}(.) functions results in the desired GPWM parameter estimators. These are asymptotically normal, have broader validity domain and improved performance over PWM, especially for large values of γ (common in hydrology and climatology), while conserving conceptual simplicity and easy implementation.

L-moments
Since 1986, when J.R.M. Hosking first introduced the L-moments, a myriad of papers has been published detailing and applying this system and its properties. The fundamental information, for the purpose of this exposition, is compiled in Hosking and Wallis (1997), where L-moments are a steppingstone for several Regional Frequency Analysis (RFA) techniques. Derived as linear combinations of PWM, L-moments inherit the appealing features mentioned above. Particularly, estimation-wise, robustness to measurement errors and sample variability, while accounting for ordering in weighting observations, suggests appropriateness of L-moments for
analysis of EVs, paramount to various fields (e.g., seismology). Define the kth L-moment of X as

\[ \lambda_k := \int_0^1 w(u) \sum_{i=0}^{k-1} p^*_{k-1,i}\, u^i\, du, \qquad (23) \]

where

\[ p^*_{k,i} := (-1)^{k-i}\binom{k}{i}\binom{k+i}{i} = \frac{(-1)^{k-i}(k+i)!}{(i!)^2\,(k-i)!}. \qquad (24) \]

With respect to the PWM (17) and (18), redefine (23) as

\[ \lambda_{k+1} := (-1)^k \sum_{i=0}^{k} p^*_{k,i}\, \alpha_i = \sum_{i=0}^{k} p^*_{k,i}\, \beta_i. \qquad (25) \]

The complete L-moment set exists and uniquely determines F if E|X| < ∞, which is not true for conventional moments. L-moments resolved the PWM interpretability issue, as they meaningfully describe a distribution: explicitly, the first four L-moments are

λ₁ = α₀ = β₀
λ₂ = α₀ − 2α₁ = 2β₁ − β₀
λ₃ = α₀ − 6α₁ + 6α₂ = 6β₂ − 6β₁ + β₀
λ₄ = α₀ − 12α₁ + 30α₂ − 20α₃ = 20β₃ − 30β₂ + 12β₁ − β₀  (26)

where λ₁ ∈ ℝ and λ₂ ≥ 0 are the L-location and L-scale; for dimensionless, scale-independent measures of shape, consider the L-moment ratios, τ_r = λ_r/λ₂, r = 3, 4, ..., with highlight to the L-skewness and L-kurtosis

\[ \tau_3 = \frac{\lambda_3}{\lambda_2} \quad\text{and}\quad \tau_4 = \frac{\lambda_4}{\lambda_2}. \qquad (27) \]

Moreover, the L-CV τ = λ₂/λ₁ is akin to c in (6). L-moment ratios are easier to interpret, as they satisfy |τ_r| < 1 for finite-mean distributions, and, again, all odd-order τ_r of symmetric distributions are zero; also τ₄ is bounded relatively to τ₃ by

\[ \frac{5\tau_3^2 - 1}{4} \le \tau_4 < 1. \qquad (28) \]

The L-statistics are simply computable from linear combinations of the order statistics. Sample L-moment counterparts of (26) result from imputing the estimators (20) or (21) into the linear combinations in (25):

ℓ₁ = a₀ = b₀
ℓ₂ = a₀ − 2a₁ = 2b₁ − b₀
ℓ₃ = a₀ − 6a₁ + 6a₂ = 6b₂ − 6b₁ + b₀
ℓ₄ = a₀ − 12a₁ + 30a₂ − 20a₃ = 20b₃ − 30b₂ + 12b₁ − b₀  (29)

with corresponding sample L-skewness and sample L-kurtosis

\[ t_3 = \frac{\ell_3}{\ell_2} \quad\text{and}\quad t_4 = \frac{\ell_4}{\ell_2}. \qquad (30) \]

These natural estimators enjoy attractive properties:
– ℓ_k are unbiased and t_r are asymptotically unbiased estimators, both being asymptotically normal if μ'₂ of X exists.
– ℓ_k behave nicely under linear transformations of data.
– Contrary to g and k in (9), t₃ and t₄ are not algebraically bounded w.r.t. sample size.
– t₃ and t₄ are generally less biased than g and k.
– The joint distribution of (t₃, t₄) is near-normal, a useful feature for some RFA procedures.

Alternative, less frequent, plotting-position estimators of L-moments can be found in Hosking and Wallis (1997). Again, it is possible to establish the "L-moment method" as a MM fitting approach. It follows the substitution principle introduced above, equating the first m ≥ p L-statistics to their population parallels. For brevity, we refer to the Appendices of Hosking and Wallis (1997), showing L-moment-derived expressions for common use distributions' parameters, including the Lognormal (see "Lognormal Distribution") and GEVd, relevant in several Geosciences areas. Among the most significant contributions of L-moments is the identification of distributions given a reduced set of summary statistics. For this purpose, heavily featured in RFA, the L-moment Ratio Diagram (LMRD) was introduced. This visual tool presents the relationship between two L-moment ratios for specific distributions. Usually, L-skewness versus L-kurtosis is plotted, as shown in Fig. 1 for some families of interest; this is specially useful for skewed distributions, but higher order ratios could be chosen. A similar plot of conventional skewness versus kurtosis would be uninformative. Notice that 2-parameter distribution families correspond to a point in the LMRD (e.g., Normal), 3-parameter ones draw a curve (e.g., GEVd), different points on the curve imply different shape parameter values, and families with more parameters appear as two-dimensional regions. Marking the estimates (t₃, t₄) from a given RS on the LMRD, and evaluating its distance to the several distribution-specific curves, may point to appropriateness of fitting one distribution over another; this has multiple practical applications, as we
Moments, Fig. 1 LMRD: Shaded region – general bounds (28); key to distributions: E – Exponential; G – Gumbel; GEV – GEVd; GLO – Generalized Logistic; GP – Generalized Pareto; L – Logistic; LN – Lognormal; N – Normal; U – Uniform (cf. Hosking and Wallis 1997, Appendix A.12)
mention below. In practice, 0 ≤ t₃ ≤ 0.5 and 0 ≤ t₄ ≤ 0.4 define the commonly appropriate region of the LMRD.
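A compact sketch of the sample PWM (21) and the first four sample L-moments (29) and ratios (30), assuming only NumPy; results can be checked against dedicated packages such as lmoments3 (not used here). The Gumbel sample is simulated purely for illustration.

```python
import numpy as np
from math import comb

def sample_l_moments(x):
    """Unbiased PWM b_r (21) and sample L-moments l1..l4 (29), with t3, t4 (30)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    # b_r with 0-based index j = rank - 1, so j runs from r to n-1
    b = [sum(comb(j, r) * x[j] for j in range(r, n)) / (n * comb(n - 1, r))
         for r in range(4)]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3 / l2, l4 / l2              # l1, l2, t3, t4

rng = np.random.default_rng(1)
print(sample_l_moments(rng.gumbel(loc=10, scale=2, size=1000)))
```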
Regional Frequency Analysis and Other Applications
The aim of the RFA approach suggested by Hosking and Wallis (1997), using an index-flood procedure, is to estimate quantiles of a frequency distribution (FD) at a given site by pooling data, or summary statistics, from sites judged to have similar FD. This is potentially problematic since FDs may not be exactly identical and magnitudes may be dependent across sites. Thus, steps are taken to guarantee accuracy and robustness of the analysis:
1. Screening the data – sites where data behave differently, needing closer inspection, are detected using a discordance measure built on L-moments.
2. Identifying and testing homogeneous regions – after grouping sites where the FDs are judged as approximately the same, L-moments are used in a heterogeneity measure to test if the region is acceptably close to homogeneous.
3. Choosing the FD – a goodness-of-fit L-moment-based measure, relating to the LMRD, is used to evaluate the fitting of candidate distributions to the data.
4. Estimating the FD – a regional L-moment algorithm is suggested for estimating regional and at-site quantiles.
L-moments clearly play a pivotal role at all stages of this process.

Additionally, we briefly mention recent papers with applications of L-moment methodologies to data of interest from Geosciences:
– Lee et al. (2021) have used the L-moment method to estimate the Gumbel distribution's parameters and related probabilities, while developing a new, EV theory-based approach of temporal probability assessment of future landslide occurrence from limited rainfall and landslide records.
– Silva Lomba and Fraga Alves (2020) developed an automatic procedure, based on the LMRD's Generalized Pareto-specific curve, for estimation of a crucial entity in EV Statistics, the threshold, showcased by accurate and efficient inference regarding significant wave height data.
– Papalexiou et al. (2020) evaluated the reliability of models' simulations for reproduction of historical global temperatures by comparing, based on L-moments and the LMRD (perceived as robust criteria), the distributional shape of historical/simulated temperatures, aggregated at different time scales.
Summary As we have seen, common usefulness of moment-based methodologies includes data description, distribution identification, parameter estimation, dependence analysis, and further benefits could be mentioned (e.g., “Hypothesis Testing,” “Autocorrelation”). We have explored advantages and drawbacks of three moment systems. The presented recent
applications to Geosciences show their appropriateness for diverse endeavors. Moreover, there is leeway for generalizations of the mentioned moments to be suggested (e.g., Trimmed L-moments). Moments are, in all, essential devices in Mathematical Geosciences.
Cross-References
▶ Dimensionless Measures
▶ Expectation-Maximization Algorithm
▶ Frequency Distribution
▶ Geostatistics
▶ Monte Carlo Method
▶ Multivariate Analysis
▶ Random Variable
▶ Robust Statistics
▶ Shape
▶ Simulation
▶ Spatial Statistics
▶ Statistical Bias
▶ Statistical Computing
▶ Statistical Outliers
▶ Univariate

Acknowledgments JSL and MIFA gratefully acknowledge support from Fundação para a Ciência e a Tecnologia, I.P. through project UIDB/00006/2020 and PhD grant SFRH/BD/130764/2017 (JSL).
Bibliography Diebolt J, Guillou A, Naveau P, Ribereau P (2008) Improving probability-weighted moment methods for the generalized extreme value distribution. REVSTAT-Stat J 6(1):33–50 Dimitrakopoulos R, Mustapha H, Gloaguen E (2010) High-order statistics of spatial random fields: exploring spatial cumulants for modeling complex non-Gaussian and non-linear phenomena. Math Geosci 42(1):65–99 Greenwood JA, Landwehr JM, Matalas NC, Wallis JR (1979) Probability weighted moments: definition and relation to parameters of several distributions expressable in inverse form. Water Resour Res 15(5): 1049–1054 Hosking JRM, Wallis JR (1997) Regional frequency analysis: an approach based on L-moments. Cambridge University Press, Cambridge Jekeli C (2011) Gravity field of the Earth. In: Gupta HK (ed) Encyclopedia of solid earth geophysics. Springer Netherlands, Dordrecht, pp 471–484 Lee JH, Kim H, Park HJ, Heo JH (2021) Temporal prediction modeling for rainfall-induced shallow landslide hazards using extreme value distribution. Landslides 18:321–338 Osterholt V, Dimitrakopoulos R (2018) Simulation of orebody geology with multiple-point geostatistics – application at Yandi channel iron ore deposit, WA, and implications for resource uncertainty. In: Dimitrakopoulos R (ed) Advances in applied strategic mine planning. Springer International Publishing, Cham, pp 335–352 Papalexiou SM, Rajulapati CR, Clark MP, Lehner F (2020) Robustness of CMIP6 historical global mean temperature simulations: trends,
long-term persistence, autocorrelation, and distributional shape. Earth's Future 8(10):e2020EF001667 Pearson K (1893) Asymmetrical frequency curves. Nature 48(1252): 615–616 Rohatgi VK, Saleh AME (2015) An introduction to probability and statistics. Wiley, New Jersey Silva Lomba J, Fraga Alves MI (2020) L-moments for automatic threshold selection in extreme value analysis. Stoch Env Res Risk A 34(3): 465–491 Wu Y, Gao Y, Zhang L, Yang J (2020) How distribution characteristics of a soil property affect probabilistic foundation settlement – from the aspect of the first four statistical moments. Can Geotech J 57(4): 595–607
Monte Carlo Method Klaus Mosegaard Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark
Definition Any random process can, in principle, be represented by a computer program using pseudorandom numbers. If nearperfectly independent random numbers between 0 and 1 can be produced, appropriate functions of these numbers can simulate realizations of practically any probability distribution needed in technical and scientific applications. The development of Monte Carlo methods grew out of our ability to find those functions and algorithms. The literature on Monte Carlo methods is vast, and the methods are many. We cannot cover everything here, but the intention is to provide a brief description of some of the most important techniques and their principles. After having reviewed basic techniques for random simulation, we shall move on to three high-level categories of Monte Carlo methods: testing, inference, and optimization. We shall also present some hybrid methods, as there is often no sharp borderline between the categories. In the following, unless otherwise stated, we will consider continuous random variables over bounded (measurable) subsets of RN and their probability densities, although many of the methods described are applicable more generally.
Methods Basic Simulation Methods Exact sampling of probability distributions is an important component of all Monte Carlo algorithms. Most important are the following methods (formulated for a 1dimensional space):
Algorithm 1 The "Inverse Method" for a Continuous Random Variable with a Given, Everywhere Nonzero Probability Density
Let p be a probability density, and P its distribution function:

\[ P(x) = \int_{-\infty}^{x} p(s)\, ds, \qquad (1) \]

and let r be a random number chosen uniformly at random between 0 and 1. Then the random number x generated through the formula

\[ x = P^{-1}(r) \qquad (2) \]
has probability density p.

Algorithm 2 The Box-Muller Method for Gaussian Distributions
Let r₁ and r₂ be random numbers chosen uniformly at random between 0 and 1. Then the random numbers

\[ x_1 = \sqrt{-2 \ln r_2}\; \cos(2\pi r_1) \qquad (3) \]

\[ x_2 = \sqrt{-2 \ln r_2}\; \sin(2\pi r_1) \qquad (4) \]
are independent and Gaussian distributed with zero mean and unit variance. As building blocks for some of the more sophisticated algorithms we shall discuss in the coming sections, we need to mention the following two algorithms for sampling in high-dimensional spaces:

Algorithm 3 Sequential Simulation
Consider a probability density p over an N-dimensional space decomposed into a product of N univariate conditional probability densities:

\[ p(x_1, x_2, \ldots, x_N) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2, x_1) \cdots p(x_N \mid x_{N-1}, \ldots, x_1). \qquad (5) \]

Exact simulation of realizations from p(x₁, x₂, ..., x_N) can now be performed by first generating a realization x₁ from p(x₁), then a realization x₂ from p(x₂ | x₁), then a realization x₃ from p(x₃ | x₂, x₁), and so on until we have a realization x_N from p(x_N | x_{N−1}, ..., x₁). The point (x₁, ..., x_N) is now a realization from p(x₁, ..., x_N). The above type of algorithm is widely used for simulation of image realizations where the conditional probabilities are obtained from training images (see Strebelle 2002).
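A minimal sketch of Algorithms 1 and 2, assuming NumPy; the exponential distribution is used for the inverse method because its distribution function inverts in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Algorithm 1: inverse method for an exponential density p(x) = lam*exp(-lam*x)
# P(x) = 1 - exp(-lam*x)  =>  x = P^{-1}(r) = -ln(1 - r)/lam
lam = 2.0
r = rng.uniform(size=100_000)
x_exp = -np.log(1.0 - r) / lam

# Algorithm 2: Box-Muller transform of two uniform streams
r1 = rng.uniform(size=100_000)
r2 = rng.uniform(size=100_000)
x1 = np.sqrt(-2.0 * np.log(r2)) * np.cos(2.0 * np.pi * r1)
x2 = np.sqrt(-2.0 * np.log(r2)) * np.sin(2.0 * np.pi * r1)

print(x_exp.mean(), 1 / lam)             # both close to 0.5
print(x1.mean(), x1.std(), x2.std())     # approximately 0, 1, 1
```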
The simplest way to sample a constant probability density is through a simple application of Algorithm (3) where all 1D conditional probability densities are constant functions. However, in the Markov Chain Monte Carlo algorithms we shall describe below, we need to perform uniform sampling via random walks whose design can be adapted to our sampling problem. This can be done in the following way: Algorithm 4 Sampling a Constant Probability Density by a Random Walk Given a uniform (constant) probability density, a sequence of random numbers Un(n ¼ 1, 2, . . .) in [0,1], and a PROPOSAL (nþ1) (n) DISTRIBUTION q(x |x ) defining the probability that the random walk goes to x(nþ1), given that it starts in x(n) (possibly with q(x(nþ1)|x(n)) 6¼ q(x(n)|x(nþ1)), then the points visited by the iterative random function V:
V xðnÞ ¼
xðnþ1Þ if U min 1, x
ðnÞ
q xðnÞ jxðnþ1Þ qðxðnþ1Þ jxðnÞ Þ ,
ð6Þ
otherwise
asymptotically converge to a sample of the uniform distribution as n goes to infinity. The above algorithms (1)–(4) are easily demonstrated and straightforward to use in practice. Algorithms (1) and (2) can without difficulty be generalized to higher dimensions.
Monte Carlo Testing If we are interested in testing the validity of a parametric statistical model of, say, an inhomogeneous earth structure, we can select an appropriate function (a “statistic”) summarizing important properties of the structure, for instance, a variogram. We can now generate Monte Carlo realizations of the structure and compute values of the statistic for each realization. By comparing the distribution of the statistics with the observed (or theoretical) value of the statistics, we can decide if our choice of model is acceptable (Mrkvicka et al. 2016). The decision relies on a statistical test measuring to what extent the observed statistics coincides with high probabilities in the histogram of simulated values. The above testing approach can be seen as a tool to determine acceptable models satisfying given observations. This is related, but in principle more general, than the Monte Carlo inference methods described in the following. In these methods, the statistical model is fixed beforehand.
M
892
Monte Carlo Method
Monte Carlo Inference Inference problems are problems where we seek parameters of a given, underlying stochastic model, designed to explain observable data d. In physical sciences, technology, and economics, inference problems are often inverse problems which, in a probabilistic formulation, result in the definition of a posterior probability distribution p(x|d) describing our state of information about model parameters x after inference. Integrals of p over the parameter space can then provide estimates of event probabilities, expectations, covariances, etc. Such integrals are of the form I¼
hðxÞpðxÞdx,
ð7Þ
w
where p(x) is the posterior probability density. They can be evaluated numerically by generating a large number of random, independent realizations x(1), . . ., x(N ) from p(x). The sum 1 I N
N
Markov-Chain Monte Carlo (MCMC)
Algorithm 6 (Metropolis-Hastings). Given a (possibly un-normalized) probability density p(x) > 0, a sequence of random numbers Un(n ¼ 1, 2, . . .) in [0,1], and an iterative random function V (x) sampling a constant probability density using Algorithm (4): xðnþ1Þ ¼ V xðnÞ :
h xðnÞ :
ð8Þ
n¼1
is then an estimate of the integral. The fact that the integral (7) in high-dimensional spaces is more efficiently calculated by Monte Carlo algorithms than by any other method means that Monte Carlo integration has become important in numerical analysis. For many practical inference problems, the model space is so vast, and evaluation of the probability density p(x) is so computer intensive, that full information about p(x) remains unavailable. In this case, we need algorithms that can do with values of p(x) from one point at a time. The simplest of these algorithms is the rejection algorithm (von Neumann 1951):
The Rejection Algorithm
Algorithm 5 The Rejection Algorithm Assume that p(x) is a probability density with an upper bound M max ( p(x)). In the nth step of the algorithm, choose with uniform probability a random candidate point xc. Accept xc only with probability paccept ¼
The rejection algorithm is ideal in the sense that it generates independent realizations from bounded probability densities, but for distributions in high-dimensional spaces with vast areas of negligible probability, the number of accepted points is small. This makes the algorithm very inefficient, and for this reason random-walk-based algorithms were developed. The original version of this was the Metropolis Algorithm (Metropolis et al. 1953), but it was later improved by Hastings (1970):
pð xc Þ : M
ð9Þ
The set of thus accepted candidate points is a sample from the probability distribution p(x).
ð10Þ
Then the distribution of samples produced by the iterative random function W: xðnþ1Þ ¼ W xðnÞ
¼
V xðnÞ x
ðnÞ
if
Un min 1,
p V xðnÞ pðxðnÞ Þ
,
otherwise ð11Þ
will asymptotically converge to p(x) as n goes to infinity. An extended form of this algorithm is obtained if V (x) is sampling an arbitrary probability density r(x), in which case the algorithm will sample the product distribution p(x)r(x). This was suggested by Mosegaard and Tarantola (1995) as a way of sampling the posterior distribution in Bayesian/probabilistic inverse problems, where r(x) is the prior probability density and p(x) is the likelihood. In this way, a closed-form expression for the prior was unnecessary, allowing V to be available only as a computer algorithm. It is important to realize that the design of the proposal algorithm (4) is critical for the performance of the MetropolisHastings algorithm. The proposal distribution q(x(nþ1)|x(n)) will work efficiently only if it locally resembles the sampling distribution p(x) (except for a constant). In this way, the number of rejected moves will be minimized. In practice, any sampling with the Metropolis-Hastings algorithm needs to run for some time to reach equilibrium
Monte Carlo Method
893
sampling with statistically stationary output. The time (number of iterations) needed to reach this equilibrium is called the burn-in time, and it is usually found by experimentation. Another problem is that the algorithm is based on a random walk, and hence, sample points will not be independent when sampled closely in time. To determine the time spacing between approximately independent samples, a correlation analysis may be needed. Ways of avoiding poor sampling results were discussed by Hastings (1970). Geman and Geman (1984) introduced the Gibbs Sampler, another variant of the Metropolis algorithm where sample points were picked along coordinate axes in the parameter space according to the conditional distribution p(xj|x1, . . ., xj1, xjþ1. . ., xN), xj being the parameter to be perturbed. Although this strategy gives acceptance probabilities equal to 1, it may not be practical when calculation of the conditional probability is computationally expensive.
V xðnÞ , N ðnÞ
¼
x ,N
ðnÞ
ð13Þ
u(p) and u(q) being random vectors with generally nonuniform distributions, and where jJj is the Jacobian @ xðpÞ , uðpÞ , jJ j ¼ @ ðxðqÞ , uðqÞ Þ
Algorithm 7 Reversible-Jump Monte Carlo Let p(x, N) be a probability density over a subset of RN, augmented with its dimension parameter N. The MetropolisHastings algorithm (6) operating in the extended space with
q xðnÞ , NðnÞ jxðnþ1Þ , Nðnþ1Þ q xðnþ1Þ , Nðnþ1Þ jxðnÞ , NðnÞ
jJj
,
ð12Þ
otherwise
where the proposal distributions q(x(q)|x( p)) and q(x( p)|x(q)) are expressed through the invertible, differential mappings:
xðpÞ ¼ h0 xðqÞ , uðqÞ ,
The Metropolis-Hastings algorithm (6) can be formulated to allow variable dimension of the parameter vector x (Green 1995). This version of the algorithm is used in Bayesian inference when the number of parameters N is treated as an unknown (Sambridge et al. 2006), making it possible to work with sparse models, and thereby reducing the computational burden:
xðnþ1Þ , N ðnþ1Þ if Un min 1, ðnÞ
xðqÞ ¼ h xðpÞ , uðpÞ
Monte Carlo Inference for Sparse Models
ð14Þ
will sample the extended space according to p(x, N). This algorithm will not only sample the space of models x (with varying dimension), but also the distribution N of the dimension itself.
Sequential Monte Carlo Methods Estimating the evolution of a dynamic system is important in science and technology, for instance, in predicting the movement of an airplane in motion, or in numerical weather prediction. An important analytical tool in this field is the Kalman filter (Kalman 1960) which iteratively provides
deterministic, Bayesian, and linear updates to past predictions of system parameters. For large systems, however, the many parameters make the analytical handling of large covariance matrices computationally intractable. This led to the development of the ensemble Kalman filter (EnKF) (Evensen 1994, 2003) a Monte Carlo algorithm that avoids the covariance matrices and instead represents the Gaussian distributions of noise, prior and posterior by ensembles of realizations. A basic formulation is the following: Algorithm 8 Ensemble Kalman Filter (EnKF) Assume that data d at any time are related to the system parameters x through the linear equation d ¼ Hx. If X is an n N matrix of N realizations from the initial (prior) Gaussian distribution of system parameters, and if D is an m N matrix with N copies of data d þ em where d is the noise-free data and each em is a realization of the Gaussian noise, it can be shown that if K ¼ CHT(HCHT þ R)1, the columns of the matrix X ¼ X þ KðD HXÞ
ð15Þ
are realizations from the posterior probability density. Iterative application of these rules, where the posterior realizations in one time step are used as prior realizations in the next step, will approximately evolve the system in time according to the model equations and the uncertainties.
M
894
Monte Carlo Method
For more details about the implementation, see Evensen (2003). A more recent, nonlinear, and non-Gaussian alternative to EnKF is the Particle Filter method (Reich and Cotter (2015), Nakamura and Potthast (2015) and van Leeuwen et al. (2015)). The standard particle filter is a bootstrapping technique (see below) operating in the following way: Algorithm 9 Particle Filter Assume that data d at any time are related to the system parameters x through the equation d ¼ h(x). After n time ðnÞ
ðnÞ
ðnÞ
steps, let x1 , x2 , . . . , xN be N initial realizations (“particles”) of the system parameters, representing the prior probability density ðnÞ
p xð n Þ ¼
1 ðnÞ d xðnÞ xi : N
i¼1
ð16Þ
Assume that the system is propagated forward in time by the generally nonlinear model f : x
ðnÞ
¼f x
ðn1Þ
þe
ðnÞ
ð17Þ
where e(n) is modelization noise introduced in the n’th time step. If the likelihood function for time step n is p dðnÞ j xðnÞ ¼ pe d h xðnÞ
ð18Þ
We can use Bayes’ rule and obtain p xðnÞ jdðnÞ ¼ p dðnÞ jxðnÞ n
p xðnÞ p dðnÞ
wi ðnÞ d xðnÞ xi ðnÞ
ð19Þ
i¼1
Empirical Monte Carlo Assume that N realizations from an unknown probability distribution p(x) are available, and that we, for some reason, are unable to obtain more samples. We are interested in inferring information about a parameter of p(x), for example, the distribution of its mean. This can be accomplished by an empirical Monte Carlo technique, for example, the resampling method Bootstrapping (Efron 1993): Algorithm 10 Bootstrapping To infer information about a parameter of the distribution p(x) from a set of realizations A ¼ {x1, . . ., xN}, draw N elements from A with replacement (thereby allowing elements to be repeated) and compute for the N elements. This process is repeated many times, each time obtaining a new value of (if N is large). The normalized histogram for of all these resampling experiments will be an approximation to the distribution of . Bootstrapping can in this way be used to evaluate biases, variances, confidence intervals, prediction errors, etc.
Monte Carlo Optimization An important application of Monte Carlo algorithms is in the solution of complex optimization problems, for instance, in the location of the global minimum for an objective function E(x). One of the first examples of a Monte Carlo optimizer was the following modification of the Metropolis-Hastings algorithm (6) (Kirkpatrick et al. 1983):
Simulated Annealing
where wi ðnÞ ¼
ðnÞ
p d jxi kp
ðnÞ
dðnÞ jxk ðnÞ
:
ð20Þ
Iterative application of these rules, where the posterior realizations in one time step are used as prior realizations in the next step, will approximately evolve the system in time according to the model equations and the uncertainties. Resampling (sampling with replacement from the ensemble) is often used before each time step to obtain more equally weighted, but possibly duplicated particles.
Algorithm 11 (Simulated Annealing) Given an objective function E(x), a sequence of random numbers Un(n ¼ 1, 2. . .) in [0,1], a decreasing sequence of positive numbers (“temperature” parameters) Tn ! 0 for n ! 1, and an iterative random function V (x), sampling a constant probability density: xðnþ1Þ ¼ V xðnÞ :
ð21Þ
Then the sample points produced by the iterative random function A:
Monte Carlo Method
895
problem by skipping the forced decrease of the “temperature” parameter T. The idea is to operate on an ensemble of models, and to include T in the parameters to be sampled:
xðnþ1Þ ¼ A xðnÞ ¼
V xðnÞ ðnÞ
x
if Un min 1,
exp E V xðnÞ =Tn expðEðxðnÞ Þ=Tn Þ
,
otherwise
(22) asymptotically converge to the minimum of E(x) as n goes to infinity. This algorithm was inspired by the process of chemical annealing, where a crystalline material is slowly cooled (T ! 0) from a high temperature through its melting point, resulting in the formation of highly ordered crystals with low lattice energy E. In each step of the algorithm, thermal fluctuations in the system are simulated by the Monte Carlo algorithm (for applications, see Mosegaard and Vestergaard (1991) and Deutsch and Journel (1994)). It is seen that, for constant temperature parameter T, the above algorithm is actually a Metropolis-Hastings algorithm designed to sample the Gibbs-Boltzmann distribution (see Mosegaard and Vestergaard 1991)
Algorithm 12 (Parallel Tempering) Given an ensemble x ¼ (x1, x2, . . .xK) of K models, and their temperatures T1, T2, . . ., Tk, all distributed according to the same probability density p over the augmented spaces (xK, Tk), k ¼ 1, 2, . . ., K. The Metropolis-Hastings algorithm (6), with the special rule for V that any perturbation of T consists of swapping values of T for two random ensemble members, asymptotically samples the joint distribution ∏k p(xk, Tk), and the sample of ensemble members with Tk ¼ 1 approaches a sample from the original target distribution p(x). If positive values of T close to T ≈ 0 are allowed in this algorithm, the ensemble members with Tk ¼ T* will be located close to the maxima for p(x). In this way, parallel tempering can be used for optimization in a way similar to simulated annealing.
Other Approaches PB ð m Þ ¼
exp EðTmÞ Z ðT Þ
ð23Þ
Monte Carlo Optimization-Sampling Hybrids The use of MCMC algorithms for inference often runs into problems in high-dimensional parameter spaces. When the target probability/objective function has multiple optima, MCMC algorithms show a critical slowing-down often making the analysis computationally intractable. There exist a few methods to alleviate this problem. Common to these methods is to use a “temperature” parameter to artificially increase (or vary) the noise to broaden the sampling density to hinder entrapment in local optima during sampling.
Parallel Tempering One method is a simple modification of simulated annealing, where E(x) ¼ ln ( p(x)), and the decrease of T is stopped at T ¼ 1 (see Salamon et al. (2002)). An inherent problem with simulated annealing is, however, the risk of entrapment in local minima for E(x). The parallel tempering algorithm (Geyer 1991; Sambridge 2013), which is an improvement of an older algorithm simulated tempering (Marinari and Parisi 1992; Sambridge 2013), seeks to avoid the entrapment
Biological systems evolve by using a large population to explore many options in parallel, and this is the source of inspiration for evolutionary algorithms. An example of this is genetic algorithms (Holland 1975) where an ensemble x1, . . ., xK of K sample points is assigned a probability p(xk) of survival (“fitness”). In a simple implementation, the population is initially generated randomly, but at each iteration it is altered by the action of three operators: selection, crossover, and mutation. Selection randomly resamples the ensemble according to p(xk), thereby typically duplicating ensemble members with high fitness at the expense of members with low fitness, improving the average fitness of the ensemble. The crossover step exchanges parameter values between two random members, and the mutation step randomly changes a parameter of one member. Crossover and mutation attempts are only accepted according to predefined probabilities. Iterative application of the above three steps allows the algorithm to asymptotically converge to a population with high fitness values p(xk). Another widely used technique is the neighborhood algorithm (Sambridge 1999a, b) where sample points are iteratively generated from a neighborhood approximation to the target distribution p(x). The approximation is a piecewise constant function over Voronoi cells, centered at each of the previous sample points. The approximation to the target distribution is updated in each iteration, concentrating the sampling in multiple high-probability regions. The method will generate a distribution of points biased toward the maxima of p(x).
M
896
Neither of the two algorithms above produces samples of the probability distribution p, but if supplemented with appropriate resampling of the output, an approximate sample from p may be produced.
Summary We have given an overview of the use of Monte Carlo techniques for inference and optimization. A main theme behind the development of many methods has been to reduce the computational workload required to solve large-scale problems. Several ways of dealing with this challenge were discussed, including Markov-Chain Monte Carlo strategies, the use of sparse models, improving numerical forecast schemes through nonparametric representations of probability distributions, and even improving ideas inherited from early works on simulated annealing. Research in Monte Carlo methods is still a prolific area, and there is no doubt that many interesting developments are waiting ahead.
Cross-References ▶ Bayes’s Theorem ▶ Bayesian Inversion in Geoscience ▶ Bootstrap ▶ Computational Geoscience ▶ Ensemble Kalman Filtering ▶ Genetic Algorithms ▶ Geostatistics ▶ Inversion Theory ▶ Inversion Theory in Geoscience ▶ Markov Chain Monte Carlo ▶ Markov Chains: Addition ▶ Multiple Point Statistics ▶ Optimization in Geosciences ▶ Particle Swarm Optimization in Geosciences ▶ Realizations ▶ Sampling Importance: Resampling Algorithms ▶ Sequential Gaussian Simulation ▶ Simulated Annealing ▶ Statistical Computing ▶ Uncertainty Quantification ▶ Very Fast Simulated Reannealing
References Deutsch CV, Journel AG (1994) Application of simulated annealing to stochastic reservoir modeling. SPE Adv Technol Ser 2:222–227 Efron B (1993) An introduction to the bootstrap. Chapman & Hall\CRC Evensen G (1994) Sequential data assimilation with a nonlinear quasigeostrophic model using Monte Carlo methods to forecast error
Monte Carlo Method statistics. J Geophys Res 99(C5):10,143–10,162. https://doi.org/10. 1029/94JC00572 Evensen G (2003) The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dyn 53(4):343–367. https:// doi.org/10.1007/s10236-003-0036-9 Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6(6):721–741. https://doi.org/10.1109/TPAMI.1984. 4767596 Geyer CJ (1991) Markov chain Monte Carlo maximum likelihood. Interface Foundation of North America Green PJ (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82(4):711–732. https://doi.org/10.1093/biomet/82.4.711 Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109 Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng 82(1):35–45. https://doi.org/10.1115/1.3662552 Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680. https://doi.org/10.1126/ science.220.4598.671 Marinari E, Parisi G (1992) Simulated tempering: A new Monte Carlo scheme. Europhys Lett 19(6):451–458. https://doi.org/10.1209/ 0295-5075/19/6/002 Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E (1953) Equations of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092 Mosegaard K, Tarantola A (1995) Monte Carlo sampling of solutions to inverse problems. J Geophys Res 100:12431–12447 Mosegaard K, Vestergaard PD (1991) A simulated annealing approach to seismic model optimization with sparse prior information. Geophys Prosp 39(05):599–612 Mrkvička T, Soubeyrand S, Myllymäki M, Grabarnik P, Hahn U (2016) Monte Carlo testing in spatial statistics, with applications to spatial residuals. Spatial Stat 18:40–53, spatial Statistics Avignon: Emerging Patterns Nakamura G, Potthast R (2015) Inverse modeling. IOP Publishing, Bristol Reich S, Cotter C (2015) Probabilistic forecasting and Bayesian data assimilation. Cambridge University Press, Cambridge, UK Salamon P, Sibani P, Frost R (2002) Facts, conjectures, and improvements for simulated annealing. Society of Industrial and Applied Mathematics Sambridge M (1999a) Geophysical inversion with a neighbourhood algorithm – i. searching a parameter space. Geophys J Int 138(2): 479–494. https://doi.org/10.1046/j.1365-246X.1999.00876.x Sambridge M (1999b) Geophysical inversion with a neighbourhood algorithm – ii. appraising the ensemble. Geophys J Int 138(3): 727–746. https://doi.org/10.1046/j.1365-246x.1999.00900.x Sambridge M (2013) A parallel tempering algorithm for probabilistic sampling and multimodal optimization. Geophys J Int 196(1): 357–374. https://doi.org/10.1093/gji/ggt342 Sambridge M, Gallagher K, Jackson A, Rickwood P (2006) Transdimensional inverse problems, model comparison and the evidence. Geophys J Int 167(2):528–542. https://doi.org/10.1111/j.1365-246X. 2006.03155.x Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 34:1–21 van Leeuwen P, Cheng Y, Reich S (2015) Nonlinear data assimilation. Springer, Berlin von Neumann J (1951) Various techniques used in connection with random digits. Monte Carlo methods. Nat Bureau Standards 12: 36–38
Moran’s Index
Moran’s Index Giuseppina Giungato1 and Sabrina Maggio2 1 Department of Economics, Section of Mathematics and Statistics, University of Salento, Lecce, Italy 2 Dipartimento di Scienze dell’Economia, Università del Salento, Lecce, Italy
Synonyms Moran I; Moran index; Moran’s coefficient; Moran’s I; Moran’s ratio
Definition Moran’s index is a measure of spatial autocorrelation.
897
From the eventual absence of spatial autocorrelation in the residuals, it would be possible to deduce that (although the explanation of the model is not perfect) there would be little advantage in looking for further explicative variables, as they would be probably difficult and maybe singular to every sampling location. As mentioned above, spatial autocorrelation is about comparing two types of information: similarity of location and similarity among attributes. The computation of the spatial proximity depends on the type of objects (points, areas, lines, rasters), while similarity among attributes can be measured in ways which depend on the type of data (interval, ordinal, nominal). In the continuation, the following notation will be utilized: • n: number of sampled objects; • i, j: any two of the objects; • wij: the similarity of i’s and j’s objects, where wii ¼ 0 for all i; • zi: the value of the attribute of interest for object i; • cij: the similarity of i’s and j’s attributes.
Introduction to Spatial Correlation Measures Generally, the grade to which attributes or objects at some location on the earth’s area are analogous to other attributes or objects in neighboring places is called spatial autocorrelation. The concept which Tobler has defined as the “first law of geography: everything is related to everything else, but near things are more related than distant things” (Tobler 1970), reflected the existence of the spatial autocorrelation. As the name indicates, autocorrelation is the correlation between two variables of the same random field. If Z is the attribute observed in the plane, then the term spatial autocorrelation refers to the correlation between the same attribute at two spatial locations (Schabenberger and Gotway 2005). Spatial analysis treats two quite different types of information. A first type of information is represented by the attributes of spatial characteristics, which comprise both measures (e.g., rainfall or population) and qualitative variables (e.g., name or type of soil). The other type of information is represented by the location of each spatial feature, which can be described by different geographical references or coordinate systems or by its position on a map (Goodchild 1986). Spatial autocorrelation offers information about a phenomenon distributed in space, which can result essential for a right interpretation of the same phenomenon and which is not obtainable through other types of statistical analysis. Moreover, in searching for a specific spatial distribution, it often occurs to find a variable which explains a pattern, but not completely. In this case, in order to identify any other variables that could be useful in explaining the remaining variation, the next stage is to examine the spatial model of residuals.
In general, the measures of spatial autocorrelation compare the set of attribute similarities cij with the set of locational similarities wij, combining them into a single index of the n
n
wij cij , with properties which lead to quick
form i¼1 j¼1
interpretation. Various methods have been proposed for estimating spatial proximity in order to create a suitable matrix of wij. Since the probability that objects are connected is greater for close ones than for distant ones, the existence of a connection between objects is often considered a measure of similarity of location. For example, a simple, indirect binary indicator of spatial proximity is represented by a common boundary between areas. After obtaining the distances dij between objects i and j, in order to define the weights wij some appropriate decreasing function can be used. For example, a negative exponential wij ¼ exp(bdij) or a negative power wij ¼ dijb can be used. In both cases a large b generates a quick decline and a small b a slower one. Indeed b represents a parameter affecting the speed at which weight decreases with spatial distance. Various modes to assess the similarity of attributes have also been suggested, which are appropriate to the involved types of attributes. For nominal data the usual approach is to set cij to 1 if i and j have the same attribute value, and 0 otherwise. For ordinal data, similarity is usually based on comparing the ranks of i and j, while for interval data both the squared difference (zi zj)2 and the product (zi z¯) (zj z¯) are commonly used.
M
898
Moran’s Index
where s2 denotes the variance of the attribute z values, that is,
Moran’s Index As previously mentioned, Moran’s index (Moran 1948) is a measure of spatial autocorrelation. In the following, some contexts characterized by different types of data and objects are discussed, as well as the respective possibility to apply spatial autocorrelation measures to them.
Interval Data and Area Objects In the framework of interval data and area objects, the attribute similarity measure utilized by Moran’s index makes it equivalent to a covariance between the values of a pair of objects cij ¼ (zi z¯) (zj z¯), where z¯ denotes the mean of the attribute z values and cij measures the covariance between the value of the variable at one place and its value at another. The remaining terms in Moran’s index are designed to constrain it to a fixed range: n
s2 ¼
n
ðzi zÞ2 =ðn 1Þ . The value of c will be greater
i¼1
when large values of wij, which coincide with pairs of areas in contact, correspond to large values of cij, or large differences in attributes. As in Moran’s index, the wij terms measure the spatial proximity of i and j and can be calculated by any suitable method. In most applications Geary’s and Moran’s indices are satisfactory in the same way. However, Moran’s index has the advantage that its extremes reflect the instinctive concepts of positive and negative correlation. Conversely, the scale used by Geary’s index is not very clear as summarized in Table 1. To explain the sense of the terms, the step-by-step computation of Moran’s index can be represented in the following Example 1, using binary weights, as described above. Example 1 Given the example of data set in Fig. 1, the stepby-step calculation of Moran’s I is presented in the following.
n
z1 ¼ 1
wij cij I¼
i¼1 j¼1 n n s2
ð1Þ
n¼4
wij
i¼1 j¼1
z2 ¼ 2
w¼
z3 ¼ 2 z4 ¼ 3
0
1
1 1
1
0
0 1
1
0
0 1
1
1
1 0
where the wij terms represent the spatial proximity of i and j and s2 denotes the sample variance s2 ¼
n
n
ðzi zÞ2 =n.
z¼
i¼1
If neighboring areas tend to have similar attributes, the Moran index is positive. Conversely, if they tend to have more dissimilar attributes than might be expected, the Moran index is negative. Finally it is approximately null when attribute values are distributed randomly and independently in space. Moran’s index can be used alternatively to Geary’s index for the identical application framework (area objects and interval attributes). For a variable measured on an interval scale, in Geary’s index (Geary 1968), attribute similarity cij is calculated with the squared difference in value cij ¼ (zi zj)2. In the original paper (Geary 1954) the similarity of location was calculated in a binary way. If i and j had a common border, the weights wij could be set equal to 1. In the opposite case, the weights wij could be set equal to 0. The other terms in Geary’s index guarantee that extremes happen at definite points: n
c¼
zi =n ¼ 8=4 ¼ 2 i¼1
Moran’s Index, Table 1 Conceptual scales of spatial autocorrelation: correspondence between Geary’s and Moran’s indices Conceptual scales Similar Independent Dissimilar a
c index 0 0a I ’ 0a I < 0a
The correct expected value is 1/(n 1) rather than 0
Moran’s Index, Fig. 1 An example of data set for calculation of Moran’s coefficient
z1 = 1
z2 = 2
n
wij cij i¼1 j¼1 n n 2s2 wij i¼1 j¼1
ð2Þ
z3 = 2
z4 = 3
Moran’s Index
899
cij ¼ ðzi zÞ zj z
n
c¼
1
0 0
1
0
0 0
0
0
0 0
0
1
0 0
1
n
wij cij ¼ 2 i¼1 j¼1 n
s2 ¼
ðzi zÞ2 =n ¼ 2=4 ¼ 0:5
i¼1 n
n
wij cij I¼
i¼1 j¼1 n n s2
¼ 2=0:5 10 ¼ 0:4: wij
i¼1 j¼1
Interval Data and Other Object Types (Point, Line, and Raster) In the context of interval attributes, both Moran’s index and Geary’s index can also be used for other types of objects (point, line, and raster), as long as a proper way can be designed for assessing the geographical closeness of pairs of objects. First of all it should be noted that one manner of creating a suitable wij measure for area objects consists in replacing each area by positioned control point and calculating distances. Then, to define the weights some appropriate decreasing function can be used. For example, a negative exponential or a negative power can be used, as said earlier. This indicates a simple method for accommodating both indices to point objects. Another option would be to replace point objects with areas, using a systemic process; for example, the creation of Thiessen or Dirichlet polygons (Boots 1986). This procedure splits the total survey region into polygons. Each polygon encircles a point and envelopes the surface which is nearer to that point than any other. Relating to this it is suggested to see a powerful efficient way to creating Thiessen polygons from a point set (Brassel and Reif 1979), as well as a case showing the application of this procedure to evaluate spatial autocorrelation for point objects (Griffith 1982). As already mentioned, the weights could be defined on the basis of the existence of the common boundary or its length. In effect, areas and points can be considered replaceable for both Moran’s and Geary’s indices. When the application context is characterized by line objects, it is possible to distinguish two different situations. In the first case, the lines can represent connections between nodes; thus it is necessary to calculate the spatial
autocorrelation existing in some attribute observed at the nodal points, and the cij are measures of the similarity of the attributes of each pair of nodes, and the wij are measures of the links between them. For instance, the weight wij can be set equal to 1 if a direct link exists between i and j and 0 otherwise, or be based on the link capacity or length. In the other situation, it is necessary to determine the spatial autocorrelation existing in some attribute observed at the links (for instance, probability of motor vehicle accident, or transport cost per unit length). In this case, the weight wij represents a measure of proximity between two links and might be defined according to the spatial distance between the central points of two links, or on whether or not two links are in direct connection. Therefore Geary’s and Moran’s indices represent proper measures for the interval attributes, even in the case of line objects, provided appropriate methods can be proposed for defining the spatial proximity measures. For rasters (lattices) one simple way of defining the weights is to set wij equal to 1 if i and j have a common border and 0 in the opposite case. In some works two cells are considered to be adjacent, also when they join at a corner. When handling square rasters, the previous two cases are sometimes referred to as the Rook or 4-neighbor case and the Queen or 8-neighbor case, respectively (Lloyd 2010).
Ordinal Data Several documents have treated the identification of particular spatial autocorrelation indices for ordinal attributes, namely attributes with ordinal scale of measurement. In particular, some experts have dealt with explaining the spatial distribution of the Canadian settlement sizes (Coffey et al. 1982). They wanted to investigate whether big settlements tended to be encircled by little settlements or by other big ones, or whether the size of settlements was casually distributed. In order to make the attribute an ordinal measure restricted to integers from 1 to 5, the settlements have been assigned to one of the five dimensional classes. The authors proposed two kinds of spatial autocorrelation measures. In the first approach, the ordinal size class data are treated as having range properties, considering that the computation of Geary’s and Moran’s indices requires the difference between z values. Indeed, both the indices were calculated directly from the size classes by setting zi equal to an integer between 1 and 5. The wij terms were set equal to 1 for pairs of settlements sharing a direct road link and 0 in the opposite case. In the other approach, the set of measures is built on the recorded number of connections between settlements of different dimensional classes, and the ordinal data are treated as if they were merely nominal. For example, n23 would represent the count of direct road connections recorded between a
M
900
settlement belonging to class 2 and a settlement belonging to class 3. Then, the recorded number of every type of link was confronted with the number predicted, assuming a random distribution for settlement sizes. This type of measure is the issue of the following section. On the other hand, some works described the index more directly suitable for ordinal data (Royaltey et al. 1975). Each object is given a rank based on its ordinal value, and the cij are then built on the absolute difference between ranks for each pair. The index was implemented to a set of points, using a procedure which created the matrix of weights called adjacency matrix of the Gabriel graph, obtained setting the weights equal to 1, if no other points were inside a circumference constructed with the pair of points as diameter and 0 alternatively (Gabriel and Sokal 1969). In addition, there was a further generalization of the index (Hubert 1978), and other indices for ordinal data, ever built on ranks, were discussed (Sen and Soot 1977).
Nominal Data In the context of nominal attributes and for any type of objects (areas, points, lines, or raster) it is appropriate to think of them as a set of objects on the map colored using a limited collection of colors. If k indicates the number of possible attribute classes, the spatial distributions of the nominal data are usually referred to as k-color maps. Thus, a two nominal classes attribute could be imagined as a model of black and white, as good as a three nominal classes attribute could be viewed as a three-color distribution (red, yellow, and blue). In dealing with nominal data the ways in which they can be compared represent a strong constraint, since no measures of difference are possible. The characteristics of the nominal data permit just two possibilities: two attributes can be the same or different. In constructing indices, then, the cij can take only one of two values. In this context, many of the proposed measures are built on join count statistics, by defining wij in a binary way. Much of early work was in the context of raster data, in which if two cells have a common border, they can be considered as joined. Then the count of times that a color s cell results joined to a color t cell is defined as the join count between color s and color t (indicated with nst). Moreover, in dealing with joins between cells having the same color, it is necessary to avoid double counting. Spatial autocorrelation measures can be based on join count statistics, as these reflected the spatial arrangement of colors. When the distribution exhibits positive autocorrelation the incidence of joins of the same color will be lower than would be expected assuming a casual distribution of colors. In the same way, if the distribution exhibits negative autocorrelation the incidence of joins between different colors will be higher than expected. Join count statistics are an easy
Moran’s Index
way to measure spatial model. However, they do not represent a summary index, and they are not similar to Geary’s or Moran’s measures. The previous ideas can be extended from rasters to other types of objects, whenever analogous binary spatial proximity measures can be developed. Therefore, if two points share a Thiessen border, then they could be treated as united; if two lines share a common node, they could be considered adjacent; if two areas share a common border, they could be considered joined.
Conclusions Spatial autocorrelation represents the correlation between the same attribute at two spatial locations, and it is about comparing two types of information: similarity among attributes and similarity of location. Moran’s index is a measure of spatial autocorrelation for interval attributes and area objects. It uses the covariance between the value of the variable at one place and its value at another, as attribute similarity measure. Moran’s index can be used alternatively to Geary’s, which uses the squared difference in value, as attribute similarity measure for the same context (area objects and interval attributes). In most applications Geary’s and Moran’s indices are equally satisfactory. However, Moran’s index has the advantage that its extremes reflect the instinctive concepts of positive and negative correlations. Conversely, the scale used by Geary’s index is not very clear. In the context of interval attributes, in essence, both Moran’s index and Geary’s index can also be used for other types of objects (point, line, and raster), provided a proper way can be designed for assessing the geographical closeness of pairs of objects. Several documents have treated the identification of particular spatial autocorrelation indices for ordinal attributes, and the authors proposed two kinds of spatial autocorrelation measures. In the first approach, the ordinal size class data are treated as having range properties, considering that the computation of Geary’s and Moran’s indices requires the differences between values. In the second approach, the ordinal data are treated as if they were merely nominal. On the other hand, some works described the index more directly suitable for ordinal data. Finally, for the nominal attributes, join count statistics represent an easy way to measure spatial model. However, they do not provide a summary index, and they are not similar to Geary’s or Moran’s measures.
Cross-References ▶ Database Management System ▶ Geostatistics ▶ Random Variable
Morphological Closing
▶ Spatial Analysis ▶ Spatial Statistics
901
the binary image and B denotes the structuring element, morphological closing is defined as fB ðXÞ ¼ eB ðdB ðXÞÞ ¼ ðX BÞ B
ð1Þ
Bibliography This definition can be reduced to Boots BN (1986) Voronoi polygons. CATMOG 45. Geo Books, Norwich Brassel K, Reif D (1979) A procedure to generate Thiessen polygons. Geogr Anal 11:289–303 Coffey W, Goodchild MF, MacLean LC (1982) Randomness and order in the topology of settlement systems. Econ Geogr 58:20–28 Gabriel KR, Sokal RR (1969) A new statistical approach to geographic variation analysis. Syst Zool 18:259–278 Geary RC (1954) The contiguity ratio and statistical mapping. Inc Stat 5: 115–141 Geary RC (1968) The contiguity ratio and statistical mapping. In: Berry BJL, Marble DF (eds) Spatial analysis: a reader in statistical geography. Prentice-Hall, Englewood Cliffs, pp 461–478 Goodchild MF (1986) Spatial autocorrelation. Catmog 47. Geo Books, Norwich Griffith D (1982) Dynamic characteristics of spatial economic systems. Econ Geogr 58:177–196 Hubert LJ (1978) Nonparametric tests for patterns in geographic variation: possible generalizations. Geogr Anal 10:86–88 Lloyd C (2010) Spatial data analysis: an introduction for GIS users. Oxford University Press, Oxford Moran PAP (1948) The interpretation of statistical maps. J R Stat Soc Ser B 10:243–251 Royaltey HH, Astrachan E, Sokal RR (1975) Tests for patterns in geographic variation. Geogr Anal 7:369–395 Schabenberger O, Gotway CA (2005) Statistical methods for spatial data analysis: texts in statistical science, 1st edn. Chapman & Hall/CRC Press, Boca Raton Sen AK, Soot S (1977) Rank tests for spatial correlation. Environ Plan A 9:897–905 Tobler WR (1970) A computer movie simulating urban growth in the Detroit region. J Econ Geogr 46:234–240
Morphological Closing Aditya Challa1, Sravan Danda1 and B. S. Daya Sagar2 1 Computer Science and Information Science, APPCAIR, BITS, Pilani, India 2 Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India
fB ðXÞ ¼ \ Bx jX Bcx
ð2Þ
Equivalently, in case of gray-scale images, if f denotes the gray-scale image and g denotes the structuring element, the closing is defined as f g ð f Þ ¼ e g dg ð f Þ
ð3Þ
Illustrations Morphological closing is one of the basic operators from the field of Mathematical Morphology. Its definition can be traced to fundamental works by Jean Serra (1983, 1988) and Georges Matheron (1975). In general, morphological closing is defined as a dilation operator composed with an erosion operator. In this entry, we illustrate the closing operator using binary images and discuss various properties. We also discuss the more general algebraic closings. In simple words, (2) considers the intersection of all the translates of the structuring element whose complement would contain the original set. This is illustrated in Fig. 1. Here, the structuring element is taken to be a simple disk denoted by a dashed line in Fig. 1. All the points in the complement which do not fit within the structuring element are added to the original dataset. In Fig. 1, this corresponds to the spiked injection of the larger disk which gets filled. Although not discussed here, similar intuition follows in gray-scale images as well (Dougherty and Lotufo 2003).
Some Important Properties The properties of the closing operator are explained using the binary images. However, these properties easily extend to the
Definition Morphological closing is one of the basic operators from the field of Mathematical Morphology (MM). It is also one the simplest morphological filters which is used for constructing more complex filters. And this operator is dual to morphological opening (chapter-ref). In general, morphological closing is defined as a composition of a dilation followed by an erosion. So, in case of discrete binary images, if X denotes
Set X
Closing with B
phiB (X)$
Morphological Closing, Fig. 1 Illustration of morphological closing. The shaded area refers to the set obtained by closing
M
902
Morphological Dilation
gray-scale images as well. First, morphological closing is an increasing operator, X Y ) fB ðXÞ fB ðY Þ
Cross-References
ð4Þ
▶ Mathematical Morphology ▶ Morphological Filtering ▶ Structuring Element
ð5Þ
Bibliography
Moreover, it is also an extensive operator X fB ðXÞ Also, the opening operator is idempotent fB ðfB ðXÞÞ ¼ fB ðXÞ
ð6Þ
Algebraic Closing Morphological closing is a subset of a wider class of operators known as Algebraic Closing. An algebraic closing is defined as any operator on the lattice which is – (a) increasing, (b) extensive, and (c) idempotent. It is easy to see that morphological closing is a specific example of algebraic closing. As an example of the algebraic closings which is not a morphological closing, consider the Convex Closing – take the smallest convex set which contains the given set. In Matheron (1975), it is shown that any algebraic closing can be defined as infimum of morphological closing. Application to Filtering Recall that, in the context of Mathematical Morphology, an operator which is increasing and idempotent is referred to as a filter. As stated earlier, morphological closing is one of the simplest filters based on which complex filtering operators such as top-hat transforms, granulometries, and alternating sequential filters are constructed, for instance, Black top-hat transform which is defined as BTHðf Þ ¼ fðf Þ f
ð7Þ
Dougherty ER, Lotufo RA (2003) Hands-on morphological image processing, vol 59. SPIE Press Matheron G (1975) Random sets and integral geometry. Wiley, New York Serra J (1983) Image analysis and mathematical morphology. Academic Press Serra J (1988) Image analysis and mathematical morphology, volume 2: theoretical advances. Academic Press, New York
Morphological Dilation Sravan Danda1, Aditya Challa1 and B. S. Daya Sagar2 1 Computer Science and Information Systems, APPCAIR, Birla Institute of Technology and Science, Pilani, Goa, India 2 Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India
Definition Morphological Dilation is one of the basic operators from the field of Mathematical Morphology (MM). Depending on the domain of application the definition varies. In the standard case of discrete binary images, given a binary image X, and a structuring element B the dilation of X is defined as dB ðXÞ ¼ [ Bx xX
ð1Þ
or simply BTH ¼ f I where I is the identity operator.
Summary In this entry, morphological closing is defined for binary and gray-scale images. The practical utility of morphological closing is then illustrated using a simple binary image. The main properties of morphological closing namely increasing, extensiveness, and idempotence are then mentioned. A generalized notion of morphological closing, i.e., algebraic closing is defined. The entry is concluded by describing construction of more complex filter – black-top-hat transforms using morphological closing.
where Bx denotes the structuring element translated by x. It is also a common practice to write δB(X) ¼ X B. Observe that the definition in (1) extends to continuous binary images as well. In the case of gray-scale images, let f denote the gray-scale image and g denote the structuring function. Then, morphological dilation is defined by dg ðf ÞðxÞ ¼ sup ff ðyÞ þ gðx yÞg yE
ð2Þ
where E denotes the domain of definition. Both of the above definitions can be obtained from a general lattice-based definition of the dilation operator, as discussed in this entry.
Morphological Dilation
903
Illustrations
Binary Images
Morphological dilation is one of basic operators from the field of mathematical morphology. Its definition can be traced to fundamental works by Jean Serra (1983, 1988) and Georges Matheron (1975). Depending on the domain the definition of the dilation operator differs (see Dougherty and Lotufo (2003)). However, all these definitions can be obtained from a generic lattice-based definition. In this entry, we discuss the morphological dilation in specific case of discrete binary and gray-scale images. We then provide the generic lattice-based definition and show how this reduces to the definitions in discrete binary and grayscale images.
The procedure to construct the dilation of a discrete binary image X is illustrated in Fig. 1. 0 values are not indicated in the image. Given the discrete binary image X and the structuring element B, we proceed by translating and placing B at all possible positions of X, as shown in Fig. 1. The final dilated image is obtained by taking the union of all these translations. In mathematical terms, this corresponds to (1). One can also construct the dilation of a discrete binary image X by translating the image X for all possible values b in B, and then taking the union of these translations. In mathematical terms, this corresponds to
3 2 1 0 -1 -2
1 1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
-4 -3 -2 -1
0
1
-3 -4 2
3
M
X ⊕ B−2,0
3
3
3
2
2
2
1 0 -1
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
1 0 -1
1 1 1 1 1
1 0 -1
-2
-2
-3
-3
-3
-4
-4
-4
-2
-4 -3 -2 -1
X
0
1
2
3
-4 -3 -2 -1
0
1
2
3
1 1 1 1
1 1 1 1
1 0
Union
1
2
3
1 1 1 1
1 1 1 1 1
-4 -3 -2 -1
0
2
0 -1 -2
1 1 1 1
-3
1 1 1 1 1 1
-4 1
X ⊕ B1,−2
Morphological Dilation, Fig. 1 Illustration of morphological dilation
-1 -2
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1
-4 -3 -2 -1
0
1
2
X ⊕B
3
1
1 1 1 1 1 1
1 1 1 1
-4 0
X ⊕ B0,1
B
1 1 1 1 1 1
2
1 1 1 1
-3
-4 -3 -2 -1
3
1 1 1 1 1
2
3
3
904
Morphological Dilation
dB ð X Þ ¼ [ X b
ð3Þ
bB
Observe that, in the case of flat structuring element, we have
This is equivalent to the definition in (1).
dg ðf ÞðxÞ ¼ sup ff ðyÞ þ gðx yÞg ¼ yE
Gray-Scale Images In the case of gray-scale images, structuring element g is a function as well. It is usually the practice that g is only defined on a finite/compact subset K of the domain. If, on the finite/ compact subset, g only takes the value 0, it is referred to as the flat structuring element, else it is referred to as non-flat structuring element.
sup ff ðyÞg
ð4Þ
ðxyÞ K
where K denotes the finite/compact set on which g is defined. That is, the dilated value is taken to be the maximum over a neighborhood. This is illustrated in Fig. 2. We consider a generic gray-scale image f. In gray-scale images, it is a convention that the values in the flat structuring element are taken to be 0 over the domain of definition and empty when it is not defined. As in the case of binary images, the structuring element is translated to each position of the original image. Within each neighborhood, the maximum
34
117 112
73
3
145
35
2
51
64
111 240
1
16
87
193
36
0
58
158
89
132
-1
21
110 203 117 175 180
-2
49
194 123 227 252
-3
64
-4
68 -4
91
28
-2
3
250
160 118
202 120 181 240 32
196 104 157
48
178 194 216 162
187 127 -3
60
75
204
73
-1
0
1
47
153
84
235
61
169
135 162 2
3
3
(f ⊕ g)3,−2
34
0
158 203 203 203 202 202 240 240
111 240
202 120 181 240
1
0
0
0
1
16
87
193
36
196 104 157
0
0
0
0
0
58
158
89
132
47
153
-1
0
0
0
-1
21
110 203 117 175 180
84
235
-2
-2
49
194 123 227 252
61
169
-3
-3
64
135 162
-4
-4
68
16
87
193
36
0
58
158
89
132
-1
21
110 203 117 175 180
-2
49
194 123 227 252
-3
64
-4
68
60
32
28
48
178 194 216 162 75
204
73
-1
0
1
f
158 193 240 240 240 202 240 240
196 104 157
64
1
-2
1
35
51
111 240
-3
145 193 240 240 240 202 250 250
202 120 181 240
145
2
64
-4
145 145 240 240 240 160 250 250
2
3
2
35
51
187 127
3
3
73
145
2
91
250
160 118
250
160 118
117 112
3
3
2
3
-4
-3
-2
-1
0
1
2
-4
3
91
34
g
60
32
73 28
48
178 194 216 162
187 127 -3
117 112
-2
75
204
73
-1
0
1
47
153
84
235
61
169
-3 194 194 227 252 252 252 235 235
135 162
-4 187 187 194 216 216 216 169 169
2
3
(f ⊕ g)1,2
34
117 112
73
3
145
35
2
51
64
111 240
1
16
87
193
36
0
58
158
89
132
-1
21
110 203 117 175 180
-2
49
194 123 227 252
-3
64
-4
68 -4
91
28
32
-2
3
250
160 118
196 104 157
48
75
204
73
-1
0
1
(f ⊕ g)−2,3
Morphological Dilation, Fig. 2 Illustration of morphological dilation with gray-scale images
-1 194 203 227 252 252 252 235 235 -2 194 203 227 252 252 252 235 235
-4
-3
-2
-1
0
f ⊕g
202 120 181 240
178 194 216 162
187 127 -3
60
Maximum
47
153
84
235
61
169
135 162 2
3
1
2
3
Morphological Dilation
905
is considered, and the central value is replaced by this maximum value. Finally, the maximum over all these images is taken. At the boundaries, only the values that exist in the domain are considered. In the case of non-flat structuring element, the values of g are added to the neighborhood values and then the maximum is computed. Remark: Recall that the gray-scale images can take the values in {0, 1, 2, , 255}. However, it is possible that in definition (2) the right-hand-side value may be higher than 255. In such cases, we take any value greater than 255 to be 255.
The properties of the dilation operator are explained using the binary images. However, these concepts extend to the grayscale images as well. Firstly, morphological dilation is an increasing operator, ð5Þ
Moreover, it is also an extensive operator when the structuring element B is symmetric and the center belongs to B. X dB ð X Þ
ð6Þ
Also, dilation can be thought to be commutative d B ð X Þ ¼ d X ð BÞ
dðfxgÞ ¼ dðT x ðf0gÞÞ ¼ T x ðdðf0gÞÞ ¼ T x ðBÞ ¼ Bx
ð8Þ
Where Tx is a translation operator, which translates all elements by x, and Bx denotes the set B, which is translated by x as usual. Now, to obtain the dilation of a generic set X, we have
Some Important Properties
X Y ) dB ð X Þ dB ð Y Þ
How Does This Definition Relate to Definitions in (1) and (2)? In case of discrete binary images, the complete lattice is taken to be the power set, i.e., ℒ ¼ P ðEÞwhere E denotes the domain of definition for the binary images. The infimum/supremum of any subset is obtained by intersection/union, respectively. Assume that δ({0}) ¼ B, i.e., the set {0} maps a set B in the lattice. Using invariance w.r.t translation, we get
dðXÞ ¼ dð_x X xÞ ¼ _x X ðdðxÞÞ ¼ _x X Bx ¼ [ Bx ð9Þ xX
which is identical to the definition of dilation one has in discrete binary images. In case of discrete gray-scale images, the lattice is obtained by the set of all possible functions f: E ! {0, 1, 2, 255} where E is the domain of definition for the image. The binary images can be thought “of” as a specific case where f: E ! {0, 1} alone. Define f 1 f 2 ( f 1 ðxÞ f 2 ðxÞ for all x E
ð10Þ
The infimum of two functions is given by ð7Þ
Lattice-Based Definition Recall that a partially ordered set is a set ℒ with a binary order such that: (i) x x (reflexivity) – (ii) x y and y x implies x ¼ y (anti-symmetry), and (iii) x y and y z implies x z holds true. A partially ordered set is called a lattice when any finite subset of ℒ has an infimum and a supremum. It is called a complete lattice if any subset of ℒ has an infimum and a supremum. The dilation operator on a complete lattice is defined as an operator δ: ℒ ! ℒ which • is increasing, i.e., x y implies δ(x) δ( y), • preserves the supremum, i.e., δ(_i I xi) ¼ _i I (δ(xi)). In case the index set I is empty, we have δ(0) ¼ 0, where 0 denotes the infimum of ℒ.
ðf 1 ^ f 2 ÞðxÞ ¼ ðf 1 ðxÞ ^ f 2 ðxÞÞ
ð11Þ
and the supremum is given by ðf 1 _ f 2 ÞðxÞ ¼ ðf 1 ðxÞ _ f 2 ðxÞÞ
ð12Þ
Now, observe that if E is finite, then the set of all possible functions as above is a complete lattice, ℒ. Consider the set of functions ei defined as ev,z ðxÞ ¼
0
if x 6¼ z
v
if x ¼ z
ð13Þ
Observe that any function f can be written as f ¼ _fev,z jev,z f g
ð14Þ
So, if one defines the dilation operator on ev,z, then from the law of preserving the supremum we obtain the dilation for f.
Moreover, assuming translation invariance in the domain, one only needs to define the dilations for the functions e_{v,0}, where 0 denotes the origin. For each of these functions we assume that the dilation is obtained by

δ(e_{v,0}) = e_{v,0} + g    (15)

So, we have

δ(f) = δ(∨{e_{v,z} | e_{v,z} ≤ f})    (16)
= ∨{δ(e_{v,z}) | e_{v,z} ≤ f}    (17)
= ∨{δ(e_{v,0}) | e_{v,0} ≤ T_{−z}(f)}    (18)
= ∨{e_{v,0} + g | e_{v,0} ≤ T_{−z}(f)}    (19)
= ∨_z {T_{−z}(f) + g}    (20)

The second equality follows since dilation preserves the supremum. The third equality follows from translation invariance: if e_{v,z}(x) ≤ f(x), then e_{v,0}(x) ≤ f(x − z). Simply substituting the dilation definition (15) gives the fourth equality. The final equality follows from noting that ∨{e_{v,0} | e_{v,0} ≤ T_{−z}(f)} = T_{−z}(f). Now, for a specific x ∈ E the final equation can be written as

δ(f)(x) = ∨_z {f(x − z) + g(z)}    (21)

This is identical to the gray-scale definition we have in (2).

Summary
In this entry, morphological dilation is defined in the context of binary and gray-scale images. These definitions are then illustrated on fictitious images. Several types of structuring elements that are intrinsic to the definition of a morphological dilation are described. The main properties of morphological dilation – increasing, extensive, and commutative – are discussed. The definition of morphological dilation in lattices is then described in detail.

Cross-References
▶ Mathematical Morphology
▶ Morphological Closing
▶ Morphological Erosion
▶ Morphological Opening
▶ Structuring Element

Bibliography
Dougherty ER, Lotufo RA (2003) Hands-on morphological image processing, vol 59. SPIE Press, Bellingham
Matheron G (1975) Random sets and integral geometry. Wiley, New York
Serra J (1983) Image analysis and mathematical morphology. Academic, London
Serra J (1988) Image analysis and mathematical morphology, Theoretical advances, vol 2. Academic, New York

Morphological Erosion

Aditya Challa1, Sravan Danda2 and B. S. Daya Sagar3
1 Computer Science and Information Systems, APPCAIR, BITS Pilani KK Birla Goa Campus, Goa, India
2 Computer Science and Information Systems, APPCAIR, Birla Institute of Technology and Science Pilani, Goa, India
3 Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India

Definition
Morphological erosion is one of the four basic operators from the field of Mathematical Morphology (MM). It is the dual operator to morphological dilation. Depending on the domain of application, the definition varies. In the standard case of discrete binary images, given a binary image X and a structuring element B (cross-ref), the erosion of X is defined as

ϵ_B(X) = ⋂_{b ∈ B̆} X_b    (1)

Here B̆ denotes the reflection of B, i.e., x ∈ B̆ ⟺ −x ∈ B, and X_b denotes the image X translated by b, for each b belonging to B̆. It is also a common practice to write ϵ_B(X) = X ⊖ B. Observe that the definition in (1) extends to continuous binary images as well. In the case of gray-scale images, let f denote the gray-scale image and g denote the structuring function. Then, morphological erosion is defined by

ϵ_g(f)(x) = inf_{y ∈ E} {f(y) − g(x − y)}    (2)

where E denotes the domain of definition. These definitions can be derived from the fact that erosion is dual to dilation, using the definition of dilation.
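In practice, both definitions are available in standard image-processing libraries. The short sketch below shows typical SciPy calls corresponding to (1) and (2); it is an illustrative usage example rather than code from this entry, and the arrays and variable names are invented. For a symmetric structuring element the reflection B̆ in (1) coincides with B, so the reflection convention does not matter here.

```python
import numpy as np
from scipy import ndimage as ndi

# Binary erosion, Eq. (1): a small square object and a 3x3 cross.
X = np.zeros((7, 7), dtype=bool)
X[2:6, 2:6] = True
B = ndi.generate_binary_structure(2, 1)           # symmetric 3x3 cross

eroded_binary = ndi.binary_erosion(X, structure=B)

# Gray-scale erosion, Eq. (2): flat case (footprint only) and a nonflat
# structuring function g == 5 on a 3x3 window (SciPy subtracts the
# structure values, i.e., computes inf {f(y) - g(x - y)}).
f = np.random.randint(0, 256, (7, 7)).astype(float)
eroded_flat = ndi.grey_erosion(f, footprint=B)
eroded_nonflat = ndi.grey_erosion(f, structure=np.full((3, 3), 5.0))
```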
Illustrations

Morphological erosion is one of the basic operators from the field of mathematical morphology. Its definition can be traced to the fundamental works of Jean Serra (1983, 1988) and Georges Matheron (1975). Depending on the domain, the definition of the erosion operator differs (Dougherty and Lotufo 2003). However, all these definitions can be obtained from a generic lattice-based definition. In this entry, we discuss morphological erosion in the specific case of discrete binary and gray-scale images. We then provide the derivation of these definitions from the lattice-based definition.

Binary Images
The procedure to construct the erosion of a discrete binary image X is illustrated in Fig. 1. Zero values are not indicated in the image. Given the discrete binary image X and the structuring element B, we proceed by translating X by b for all possible values of b, as shown in Fig. 1. The final eroded image is obtained by taking the intersection of all these translations. In mathematical terms, this corresponds to (1).
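The translate-and-intersect procedure of Fig. 1 can be written in a few lines of NumPy. The following sketch implements Eq. (1) directly; the helper name, the test image, and the offsets are invented for the example, and np.roll is used only for brevity (it wraps around at the borders, so a padded implementation would be used in practice).

```python
import numpy as np

def binary_erosion(X, offsets):
    """Erosion of a binary image X by a structuring element given as a list
    of (row, col) offsets: the intersection of the translates of X by the
    reflected offsets, as in Eq. (1)."""
    out = np.ones_like(X, dtype=bool)
    for (di, dj) in offsets:
        # Translate X by the reflected offset (-di, -dj) and intersect.
        out &= np.roll(X, shift=(-di, -dj), axis=(0, 1))
    return out

# A small binary image and a cross-shaped structuring element.
X = np.zeros((7, 7), dtype=bool)
X[2:6, 2:6] = True
B = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

print(binary_erosion(X, B).astype(int))   # the 4x4 block shrinks to a 2x2 block
```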
Morphological Erosion, Fig. 1 Illustration of morphological erosion. (Panels: the binary image X, the structuring element B, the translates X_{0,1}, X_{−1,0}, and X_{1,0}, their intersection, and the result X ⊖ B; the numerical grids of the figure are not reproduced here.)

Gray-Scale Images
In the case of gray-scale images, the structuring element g is a function as well. It is usually the practice that g is defined only on a finite/compact subset K of the domain. If, on this finite/compact subset, g only takes the value 0, it is referred to as a flat structuring element; else it is referred to as a nonflat structuring element. Observe that in the case of a flat structuring element, we have

ϵ_g(f)(x) = inf_{y ∈ E} {f(y) − g(x − y)} = inf_{y : (x−y) ∈ K} {f(y)}    (3)
where K denotes the finite/compact set on which g is defined. That is, the eroded value is taken to be the minimum over a neighborhood. This is illustrated in Fig. 2. We consider a generic gray-scale image f. In gray-scale images, it is a convention that the values of a flat structuring element are taken to be 0 over its domain of definition and empty where they are not defined. As in the case of binary images, the structuring element is translated to each position of the original image. Within each neighborhood, the minimum is considered, and the central value is replaced by this minimum. At the boundaries, only the values which exist in the domain are considered. In the case of a nonflat structuring element, the values of g are subtracted from the neighborhood values and then the minimum is computed.
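The flat and nonflat cases can be made explicit with a direct, loop-based evaluation of Eq. (2)/(3). The routine below is a sketch (not code from this entry); the function name, neighborhood K, and sample data are invented, and an optimized routine such as scipy.ndimage.grey_erosion would be used for real images.

```python
import numpy as np

def grey_erosion(f, g_offsets, g_values=None):
    """Gray-scale erosion: at each pixel x, take the minimum of f(y) - g(x - y)
    over the offsets (x - y) in K; a flat g contributes zeros."""
    rows, cols = f.shape
    out = np.empty_like(f, dtype=float)
    if g_values is None:                       # flat structuring element
        g_values = [0.0] * len(g_offsets)
    for i in range(rows):
        for j in range(cols):
            vals = []
            for (di, dj), gv in zip(g_offsets, g_values):
                y = (i - di, j - dj)           # y such that x - y lies in K
                if 0 <= y[0] < rows and 0 <= y[1] < cols:
                    vals.append(f[y] - gv)     # f(y) - g(x - y)
            out[i, j] = min(vals)              # boundary: only in-domain values
    return out

f = np.random.randint(0, 256, (6, 6)).astype(float)
K = [(0, 0), (0, 1), (1, 0), (0, -1), (-1, 0)]
flat = grey_erosion(f, K)                              # minimum filter over K
nonflat = grey_erosion(f, K, g_values=[0, 5, 5, 5, 5])  # values of g subtracted
```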
Some Important Properties
The properties of the erosion operator are explained using binary images. However, these properties extend to gray-scale images as well. Most of these properties can be derived from noting that the erosion operator is dual to the dilation operator.
Morphological Erosion, Fig. 2 Illustration of erosion with gray-scale images. (Panels: the image f, the structuring element g, intermediate translated results, and the final minimum; the numerical grids of the figure are not reproduced here.)
First, morphological erosion is an increasing operator,

X ⊆ Y ⟹ ϵ_B(X) ⊆ ϵ_B(Y)    (4)

Moreover, it is also an antiextensive operator when the structuring element B is symmetric,

ϵ_B(X) ⊆ X    (5)

Lattice-Based Definition
Recall that a partially ordered set is a set ℒ with a binary order ≤ such that: (i) x ≤ x (reflexivity), (ii) x ≤ y and y ≤ x imply x = y (antisymmetry), and (iii) x ≤ y and y ≤ z imply x ≤ z (transitivity). A partially ordered set is called a lattice when any finite subset of ℒ has an infimum and a supremum. It is called a complete lattice if any subset of ℒ has an infimum and a supremum. The erosion operator on a complete lattice is defined as an operator ϵ: ℒ → ℒ which
• is increasing, i.e., x ≤ y implies ϵ(x) ≤ ϵ(y),
• preserves the infimum, i.e., ϵ(∧_{i∈I} x_i) = ∧_{i∈I} ϵ(x_i).
In case the index set I is empty, we have ϵ(m) = m, where m denotes the supremum of ℒ.

How Does This Definition Relate to Definitions in (1) and (2)?
The most direct approach to deriving equations (1) and (2) is by using the duality property, i.e.,

ϵ(x) = (δ(x^c))^c    (6)

where x^c denotes the complement of x in the lattice ℒ. Observe that the definition of the erosion can in fact be derived starting from the duality principle. In the entry on morphological dilation (chapter-ref), we derived the specific definitions from the lattice definition. Here, we shall use the duality property to arrive at definitions (1) and (2). In the case of binary images, duality is given by the complement operation. It is already known that

δ_B(X) = ⋃_{x ∈ X} B_x = ⋃_{b ∈ B} X_b    (7)

So, using duality by complementation, we have

ϵ_B(X) = (δ_{B̆}(X^c))^c = ( ⋃_{b ∈ B̆} (X^c)_b )^c    (8)
= ⋂_{b ∈ B̆} X_b    (9)

As a convention, if B is the structuring element used for dilation, then B̆, the reflection of the structuring element, is used for erosion. Accordingly, a reflection operation is embedded into the definition of the erosion to obtain (1). This is because only if the structuring element B is reflected would (δ_B, ϵ_B) form an adjunction, i.e.,

δ_B(X) ⊆ Y ⟺ X ⊆ ϵ_B(Y)    (10)

Adjunction is an important property for defining morphological opening. In the case of gray-scale images, the duality is obtained by inversion,

ϵ_g(f) = 255 − δ_g(255 − f)    (11)

Then we have,

ϵ_g(f)(x) = 255 − δ_g(255 − f)(x)    (12)
= 255 − sup_{y ∈ E} {255 − f(y) + g(x − y)}    (13)
= inf_{y ∈ E} {255 − (255 − f(y) + g(x − y))}    (14)
= inf_{y ∈ E} {f(y) − g(x − y)}    (15)

which is equivalent to the definition in (2). The lattice-based definitions are much more general, and the specific definitions can be derived from them.
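The binary duality (6)–(9) is easy to check numerically. The sketch below uses periodic translations for simplicity so that both sides are computed with exactly the same boundary convention; the helper names and the random test image are invented, not part of the entry.

```python
import numpy as np

# Check:  erosion(X, B) == complement( dilation( complement(X), reflected B ) )

def translate(X, t):
    return np.roll(X, shift=t, axis=(0, 1))      # periodic borders, fine for a demo

def dilation(X, B):                               # union of translates of X by b in B
    return np.any([translate(X, b) for b in B], axis=0)

def erosion(X, B):                                # intersection of translates by b in reflected B
    return np.all([translate(X, (-b[0], -b[1])) for b in B], axis=0)

X = np.random.rand(16, 16) > 0.6
B = [(0, 0), (0, 1), (1, 0)]
B_reflected = [(-b[0], -b[1]) for b in B]

lhs = erosion(X, B)
rhs = ~dilation(~X, B_reflected)
print(bool(np.all(lhs == rhs)))                   # expected: True
```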
Summary
In this entry, morphological erosion is defined in the context of binary and gray-scale images. These definitions are then illustrated on simple examples. Several types of structuring elements which are intrinsic to the definition of a morphological erosion are described. The main properties of morphological erosion – increasing and antiextensive – are discussed. The definition of morphological erosion in lattices is then described in detail.
Cross-References
▶ Mathematical Morphology
▶ Morphological Closing
▶ Morphological Dilation
▶ Morphological Opening
▶ Structuring Element
Bibliography
Dougherty ER, Lotufo RA (2003) Hands-on morphological image processing, vol 59. SPIE Press, Bellingham
Matheron G (1975) Random sets and integral geometry. Wiley, New York
Serra J (1983) Image analysis and mathematical morphology. Academic, London
Serra J (1988) Image analysis and mathematical morphology, Theoretical advances, vol 2. Academic, New York
Morphological Filtering

Jean Serra
Centre de morphologie mathématique, École des Mines ParisTech, Paris, France
Definition
An operation which maps a complete lattice into itself is said to be a morphological filter when it is increasing and idempotent. Morphological filtering is mainly used for binary and vector images, and for partitions. When a connection, or a connectivity, is involved, morphological filtering makes it possible to segment images, i.e., to find contours of the objects. Moreover, these filters often depend on a size parameter which orders them in a hierarchy and from which one extracts the optimal segmentation.

Morphological Filters
In physics or in chemistry, when one says that one filters a precipitate, one describes in some way an opening. However, this meaning is not the only one. In signal processing, one usually means by filter any linear operator that is invariant under translation and continuous. According to a classical result, every filter in the previous sense is expressed as the convolution product F ∗ φ of the signal F by a convolution distribution φ, and the transform of F + F′ is the sum of the transforms of F and of F′. If you listen on the radio to a duet for piano and violin, it is quite natural that the amplified sound should be the sum of the amplifications from the piano alone and from the violin alone.
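The two defining axioms – increasingness and idempotence – can be tested empirically. The sketch below (illustrative only; the operators, test harness, and data are invented) compares a gray-scale opening with a 3 × 3 mean filter: the mean filter is increasing but, unlike the opening, it is not idempotent, so it is not a morphological filter.

```python
import numpy as np
from scipy import ndimage as ndi

def is_idempotent(op, f, tol=0):
    once, twice = op(f), op(op(f))
    return bool(np.max(np.abs(once - twice)) <= tol)

def is_increasing(op, f, g):
    f2, g2 = np.minimum(f, g), np.maximum(f, g)   # force f2 <= g2
    return bool(np.all(op(f2) <= op(g2)))

rng = np.random.default_rng(0)
f = rng.integers(0, 256, (64, 64)).astype(float)
g = rng.integers(0, 256, (64, 64)).astype(float)

opening = lambda x: ndi.grey_opening(x, size=(3, 3))
mean3 = lambda x: ndi.uniform_filter(x, size=3)

print(is_increasing(opening, f, g), is_idempotent(opening, f))  # expected: True True
print(is_increasing(mean3, f, g), is_idempotent(mean3, f))      # expected: True False
```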
To the three axioms of convolution, a fourth one is in fact added: it is in practice very common to consider filters as being of the band-pass type, even if that is not quite exact. One says about a Hi-Fi amplifier that it is “low-pass up to 30,000 Hz”, of a tinted glass that it is monochromatic, etc. In so doing, one implicitly attributes to the operation of filtering the property of not being able to act by iteration: the signal that has lost its frequencies above 30,000 Hz will not be modified further if one submits it to a second amplifier identical to the first. We find again idempotence, which however is not captured by the axioms of convolution. This approach, based on linear filtering, is very well adapted to telecommunications, where the fundamental problem is the pass band of the channel, which requires modifying accordingly the frequency band of the signal. But in geosciences and in image analysis, what matters is not to recognize frequencies, but to detect, localize, and measure objects. One must thus be able to express spatial relations between objects, the first one being inclusion, which induces the lattice framework, where growth replaces the linearity of vector spaces. Since, furthermore, complete lattices lend themselves very well to taking idempotence into account, we are led to define a morphological filter as follows:

Definition Given a complete lattice ℒ, one calls a morphological filter, or more briefly a filter, any operator ψ: ℒ → ℒ that is both increasing and idempotent.

Openings and closings, which we already met in the article ▶ “Mathematical Morphology”, illustrate this notion, but one can devise many other filters. For example, we can think of these two operations as primitives and combine them serially or in parallel to create new filters. The first mode, by composition products, leads to the alternating (sequential) filters, and the second one to the lattice of filters. Remark that translation invariance is never a prerequisite; the filters may vary from place to place, according to the context.

Products of Filters
When ψ and ξ are two ordered filters, that is, ψ ≤ ξ, the inequalities

ψξ = ψψξ ≤ ψξψξ ≤ ψξξξ = ψξ

show that the product ψξ is in turn a filter. For the sake of concision, let us write ℬ_α for the invariance domain Inv(α) of a filter α. One can state the:

Matheron’s Criterion (Matheron 1988) Let ψ and ξ be two filters such that ψ ≤ ξ. Then:
1. Their products generate only the four filters ψξ, ψξψ, ξψξ, and ξψ, which are partly ordered:
ψ ≤ ψξψ ≤ ξψ, ψξ ≤ ξψξ ≤ ξ.
ξψξ is the least filter above ξψ ∨ ψξ, and ψξψ is the greatest filter below ξψ ∧ ψξ.
2. Although the product ξψ is not commutative in general, each filter ξψ or ψξ eliminates positive noise and sharp reliefs, as well as negative noise and narrow hollows.

When the two primitives are an opening γ and a closing φ, which is the most popular case, then the two products γφ and φγ have the following characteristic property:

f ≤ g ≤ f ∨ γφ(f) ⟹ γφ(g) = γφ(f)

and similarly f ∧ φγ(f) ≤ g ≤ f ⟹ φγ(g) = φγ(f). The first property ensures that any signal g between f and f ∨ γφ(f) gives the same filtered version as f itself. The second one is the dual version for φγ.

Iterations of Over- and Underfilters
We just saw how to generate filters by composition products. Is it also possible to obtain filters by suprema or infima? The supremum ∨_{i∈I} ψ_i (respectively, the infimum ∧_{i∈I} ψ_i) of a family {ψ_i | i ∈ I} of filters is visibly overpotent (respectively, underpotent), that is, its square lies above (respectively, below) itself, which makes it lose the idempotence that the ψ_i had. One can nevertheless speak of a lattice of filters (Matheron 1988). Put ∨_{i∈I} ψ_i = α. When ℒ is finite, we always reach α^{n+1} = α^n for n large enough; it suffices to iterate the overpotent operator α to obtain finally the idempotent limit F(α) = α^n, which turns out to be the least filter above all the ψ_i. When ℒ is infinite, α, α², …, α^n and ∨_{n∈N} α^n are still overpotent, but ∨_{n∈N} α^n is not necessarily idempotent; it will be if α is ↑-continuous. Dually, we put ∧_{i∈I} ψ_i = β. If the lattice ℒ is finite, or if β is ↓-continuous, then ∧_{n∈N} β^n will be the greatest filter Γ(β) below all the ψ_i. Let us also remark that in some cases one of the terms, F or Γ, is obtained directly, but not the other: for example, when the ψ_i are openings γ_i, the supremum F(∨_{i∈I} γ_i) = ∨_{i∈I} γ_i, but the infimum Γ(∧_{i∈I} γ_i) requires iterations.

Hierarchies and Semigroups
Out of regard for pedagogy, until now we have not parameterized the filters. Making them depend upon a parameter l > 0 of scale or size leads us to ponder the structure of the obtained family {ψ_l | l ≥ 0}. For example, is the product ψ_m ψ_l of two of the filters itself in the family? If increasing l corresponds to stronger and stronger simplifications, can one start from any intermediate level ψ_m(A) to reach ψ_l(A), when m < l? To answer these questions, we first examine the openings γ and the alternating filters γφ.
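Under stated assumptions (a flat 3 × 3 structuring element and SciPy's gray-scale opening/closing as the ordered pair ψ ≤ ξ), the ordering of the four products can be verified numerically. The helper names and the test image in this sketch are invented; it is an illustration of the criterion, not code from the entry.

```python
import numpy as np
from scipy import ndimage as ndi

g_open  = lambda x: ndi.grey_opening(x, size=(3, 3))   # psi (anti-extensive)
g_close = lambda x: ndi.grey_closing(x, size=(3, 3))   # xi  (extensive)

def compose(*ops):
    def run(x):
        for op in reversed(ops):       # compose(a, b)(x) = a(b(x))
            x = op(x)
        return x
    return run

rng = np.random.default_rng(1)
f = rng.integers(0, 256, (64, 64)).astype(float)

oc  = compose(g_open, g_close)                 # psi.xi
co  = compose(g_close, g_open)                 # xi.psi
oco = compose(g_open, g_close, g_open)         # psi.xi.psi
coc = compose(g_close, g_open, g_close)        # xi.psi.xi

# Expected: all True (psi <= psi.xi.psi <= {xi.psi, psi.xi} <= xi.psi.xi <= xi)
print(np.all(g_open(f) <= oco(f)), np.all(oco(f) <= co(f)), np.all(oco(f) <= oc(f)),
      np.all(co(f) <= coc(f)), np.all(oc(f) <= coc(f)), np.all(coc(f) <= g_close(f)))
```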
Granulometries
The case of openings is the simplest. Following G. Matheron, we will say that the family {γ_l | l ≥ 0} with a positive parameter generates a granulometry when:
1. For every l ≥ 0, the operator γ_l is an opening, with γ_0 = id;
2. In the composition product, the most severe opening imposes its law:

γ_m γ_l = γ_{max{l,m}}    (1)
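The absorption law (1) can be checked numerically for a simple granulometry, here openings by flat squares of increasing side (squares are convex, so the family is granulometric). The test data and helper are invented for this sketch.

```python
import numpy as np
from scipy import ndimage as ndi

def gamma(side):
    """Opening by a flat square structuring element of the given side."""
    return lambda x: ndi.grey_opening(x, size=(side, side))

rng = np.random.default_rng(2)
f = rng.integers(0, 256, (128, 128)).astype(float)

g3, g7 = gamma(3), gamma(7)
print(np.array_equal(g7(g3(f)), g7(f)),   # gamma_7 gamma_3 = gamma_7   (expected: True)
      np.array_equal(g3(g7(f)), g7(f)))   # gamma_3 gamma_7 = gamma_7   (expected: True)
```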
In terms of sifting, the grains refused by a sieve will a fortiori be refused by sieves with smaller holes. This second axiom amounts to saying that the openings decrease when the parameter increases, that is, l ≥ m ≥ 0 ⟹ γ_l ≤ γ_m, or equivalently that their invariance domains decrease, that is, l ≥ m ≥ 0 ⟹ ℬ_l ⊆ ℬ_m. By duality, one constructs in a similar way the anti-granulometry {φ_l | l ≥ 0}. The two axioms of Matheron model the sieving technique, where solid particles are classified according to a series of sieves with decreasing meshes. They describe, similarly, the sizes of the individuals in a population. But they also apply to procedures which are not based on individual particles, like the Purcell method used in the petroleum industry, where mercury is injected into specimens of porous rocks under various pressures. Or again they model the linear intercept distributions in image analysis, the disc openings, etc. The relation (1) defines a commutative semigroup, with the identity id as neutral element, called the Matheron semigroup, whose use extends well beyond the granulometric case of openings. In particular, in the Euclidean case with invariance under translation, this relation weaves a link between granulometry by adjunction, similarity, and convexity. Indeed, every lB homothetic to the compact B is open by adjunction by mB for every m ≤ l if and only if B is convex, and consequently the family {γ_{lB} | l ≥ 0} of openings by the convex sets {lB | l > 0} is granulometric. Here, the hypotheses of convexity and similarity are equivalent. On the other hand, if one does not impose similarity, then the B(l) do not need to be convex, nor even connected.

Alternating Sequential Filters
Let us proceed to the products of the type φγ, or γφ, of an opening by a closing. Parameterizing the two primitives directly leads to a rather coarse result, which one refines by replacing the primitives by a granulometry {γ_l | l ≥ 0} and an anti-granulometry {φ_l | l ≥ 0}. Let us suppose for the moment that l is a positive integer, and set:

ϖ_n = φ_n γ_n ⋯ φ_2 γ_2 φ_1 γ_1.

The operator ϖ_n is a filter, designated as alternating sequential (or ASF), and due to S.R. Sternberg. Although it
Morphological Filtering, Fig. 1 Left: initial mosaic; other views, from left to right: connected ASF of sizes l = 1, 4, 6
is constructed in view of the semigroup, it does not satisfy Eq. (1), but only the absorption law

p ≥ n ⟹ ϖ_p ϖ_n = ϖ_p,

which is sufficient to construct hierarchies of more and more severe ASFs. Nevertheless, when the γ_l and φ_l, l ≥ 0, are grain operators, that is, families of connected openings (respectively, closings) that process each grain (respectively, pore) independently from the others, then the associated ASFs ϖ_n form a Matheron semigroup (Serra 2000). An example is given in Fig. 1. The γ_i are connected openings by reconstruction from the erosion of the initial image by a disc of radius l, and the φ_i are the dual operators. Each contour is preserved or suppressed, but never deformed: the initial partition increases under the successive filters, which form a Matheron semigroup. When l is a positive real, one often subdivides the segment [0, l] into 2, 4, …, 2^k, … sections of filters

ϖ_k(l) = φ_l γ_l ⋯ φ_{i2^{−k}l} γ_{i2^{−k}l} ⋯ φ_0 γ_0, with 0 ≤ i ≤ 2^k.

When k increases, the ϖ_k(l) decrease, and ∧_k ϖ_k(l) = ϖ(l) still remains a filter (Serra 1988, p. 205). The preceding properties extend to that continuous version. Finally, what has been said here for the sequences of primitives φ_l γ_l remains true if one starts from γ_l φ_l.

Connections
In image processing, the fundamental operation associated with connectivity consists in directing a point towards a set and extracting the marked particle. The result depends on the choice of the said connectivity (e.g., 4- or 8-connectivity in a square grid), but in all cases, the particles of a set A pointed at x and at y are either identical or disjoint, for all x and y of the space. Moreover, the operation which goes from A to its connected component in x is obviously an opening. Besides, connected zones that intersect are included in a same connected component. These characteristics can be summarized in the following axiomatics (Serra 1988, p. 51). Let E be an arbitrary space:

– A connection on P(E) is a family C ⊆ P(E) that satisfies the following three conditions:
1. ∅ ∈ C.
2. ∀p ∈ E, {p} ∈ C.
3. ∀{C_i | i ∈ I} ⊆ C, ⋂_{i∈I} C_i ≠ ∅ ⟹ ⋃_{i∈I} C_i ∈ C.
An element of C is called connected.
– A system of connection openings on P(E) associates with each point p ∈ E an opening γ_p on P(E) that satisfies the following three conditions, ∀p, q ∈ E:
1. γ_p({p}) = {p}.
2. ∀X ∈ P(E), γ_p(X) ∩ γ_q(X) ≠ ∅ ⟹ γ_p(X) = γ_q(X).
3. ∀X ∈ P(E), p ∉ X ⟹ γ_p(X) = ∅.
For p ∈ X, γ_p(X) is referred to as the connected component of X marked by p. The two notions are indeed equivalent (Serra 1988, p. 52), as shown by the following theorem:

Theorem There exists a bijection between the connections on P(E) and the systems of connection openings on P(E):
– with a connection C one associates the system of connection openings (γ_p, p ∈ E) defined for all X ∈ P(E) by

γ_p(X) = ⋃{C ∈ C | p ∈ C ⊆ X};    (2)

– with a system of connection openings (γ_p, p ∈ E) one associates the connection C defined by

C = {γ_p(X) | p ∈ E, X ∈ P(E)}.    (3)
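For the usual digital connectivities, the connection opening (2) is simply the extraction of the connected component marked by p. The following sketch shows one way to compute it with SciPy's labeling routine; the helper name, the connectivity choice, and the test array are illustrative.

```python
import numpy as np
from scipy import ndimage as ndi

def gamma_p(X, p, structure=None):
    """Connection opening at point p: the connected component of X containing p,
    or the empty set if p lies outside X (default labeling: 4-connectivity in 2-D)."""
    labels, _ = ndi.label(X, structure=structure)
    lab = labels[p]
    if lab == 0:                                  # p not in X
        return np.zeros_like(X, dtype=bool)
    return labels == lab

X = np.array([[1, 1, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 0, 1],
              [1, 0, 0, 1]], dtype=bool)
print(gamma_p(X, (0, 0)).astype(int))   # the component containing (0, 0)
print(gamma_p(X, (2, 0)).astype(int))   # all zeros: (2, 0) is not in X
```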
Classically in topology, a set is connected when it cannot be partitioned into two non-empty closed (or open) regions. In topology, one also speaks of arcwise connectivity, according
Examples of Connections
We now describe two examples of connections taken among many other ones (Ronse 2008; Serra 1988). Both are often used in practice and cannot be reduced to usual connectivities.
Connection by dilation Start from a set P(E) which is already provided with a connection C, and consider an extensive dilation δ: P(E) → P(E) which preserves C, i.e., δ(C) ⊆ C; equivalently, it is required that ∀p ∈ E, p ∈ δ({p}) ∈ C. Then the inverse image C′ = δ⁻¹(C) of C by δ constitutes a second connection, richer than C, i.e., C′ ⊇ C. For all A ∈ P(E), the C-components of δ(A) are exactly the images by δ of the C′-components of A. If γ_x stands for the opening associated with C, and ν_x for that associated with C′, we have:

ν_x(A) = γ_x(δ(A)) ∩ A if x ∈ A, and ν_x(A) = ∅ otherwise.
If, in the Euclidean or digital plane, we take for example for δ the dilation by a disc of radius r, then the openings nx Morphological Filtering, Fig. 2 Left: Connection by partition: the connected component of A at point x is the union of the two pieces of particles of A \ D(x); right: connection by dilation the particles of each cluster generate a second connection
characterize the clusters of objects whose infimum of the distances between points is at most 2r (Fig. 2, right). A contrario, the same approach also extracts the isolated connected components in a set A, since they specifically satisfy the equality ν_x(A) = γ_x(A).
Connection induced by a partition Consider a given partition D, and a point x ∈ E. The operation which associates, with each A ⊆ E, the transform

γ_x(A) = D(x) ∩ A if x ∈ A, and γ_x(A) = ∅ otherwise,
to which a set A is connected when for each pair of points a, b A, one can find a continuous map c from [0, 1] into A such that c(0) ¼ a and c(1) ¼ b. This last connectivity is more restrictive than the first one, though in Rn both notions coincide on the open sets. In discrete geometry, the digital connectivities are particular cases of graph connectivity, which transposes to arcs the Euclidean arcwise definition. In Z2 one then finds, among others, the classical 4- and 8-connectivities of the square grid and the 6-connectivity of the hexagonal grid; and in Z3 the 6- and 26-connectivities of the cube, or that of the cube-octahedron. All these topological or digital connectivities satisfy the axiomatics of connections. But the latter does not presuppose any topology, or the distinction between continuous and discrete approaches. It is thus more comprehensive, and also better adapted to image processing, as it starts from one of its basic operations. The axiomatic of a connection has been relaxed by C. Ronse, by suppressing the axiom about the points (Ronse 2008), and by M. Wilkinson by extending it to ultra-connections which allow some covering (Wilkinson 2006).
is clearly an opening, and as x varies, the γx(A) and γγ(A) are identical or disjoint because they correspond to classes of partitions. The class: C ¼ fgx ðAÞjx E, A P ðEÞg is thus a connection. One sees in Fig. 2 (left) that C breaks the usual connected components, and puts together their pieces when they lie in a same class D(x). If E has been provided with a prior connection C 0 , like the usual arcwise one, then the elements of C \ C 0 are the connected components in the sense of C 0 of the intersections A \ D(x). A few properties of connections are as follows: 1. Lattice: The set of all connections on E if is a complete lattice, where the infimum of the family {C i | i I} is the intersection \i I C i , and the supremum is the least connection containing [i I C i . 2. Maximum partitioning: The openings γx of connection C partition every set A E into the set {γx(A)| x A} of its connected components; it is the coarsest partition whose all classes belong to C , and this partition increases with A: if A B, then every connected component of A is included in a unique connected component of B; 3. Increasingness: If C and C 0 are two connections on P ðEÞ, with C C 0, then the partition of A into its C 0-components is coarser than that into its C -components;
M
914
Morphological Filtering
s½F, A ¼ 1 ) s½cðFÞ, A ¼ 1:
ð4Þ
In other words, the connection generated by the pair (s, c(F)) contains that of (s, F). When, moreover, the operator c is a filter, one speaks of a connected filter. Since the partition of E into zones where c(F) is homogeneous according to s is coarser than that of E for (s, F), one proceeds from the second one to the first one by removing frontiers only. On the other hand, an operator that coarsens partitions is not necessarily connected. Some examples of connected filters are as follows: 1. E is provided with a connection C , and s(F, A) ¼ 1 for A connected and F constant over A. Every filter c that increases the flat connected zones, that is, those where F is constant, is connected. This corresponds to the usual meaning of the expression connected filter (Salembier and Serra 1995). 2. E is still provided with the connection C , and F is the lattice of binary functions E ! {0, 1}. Since this lattice is isomorphic to P ðEÞ, we develop the example in the settheoretical framework. The opening γx of the flat zones criterion applied to the set A gives simply the connected component of A that contains the point x. It is a connected filter, as well as, more generally, the opening γM ¼ _ {γx| x M}, called of marker M, where γM(A), called reconstruction opening, is the union of connected components, or grains of A, that meet M. 3. Reconstruction openings and closings extend to numerical functions through flat operators (see an example in the article ▶ “Mathematical Morphology”). The set A (respectively, M ) becomes the section of threshold t of the function F (respectively, of the marker function), one finds back criteria of the “flat zones” type. The alternating sequential filters of primitive connected openings and closings are themselves connected (see Fig. 1 above)
• A criterion s: F P ðEÞ ! {0, 1} is a binary function such that for all A P ðEÞ and all F F we have s[F, A] ¼ 1 when s is satisfied by F on A and s[F, A] ¼ 0 when not. It is supposed that for all F F the criterion is always satisfied on the empty set: s½F, 0 ¼ 1. • A criterion s is connective when: 1. it is satisfied for the singletons:
Connected Filters In what follows, E is any space and F designates a lattice of functions E ! R or Z . We have already met a connected filter, since the datum of a connection C on E is equivalent to that of the connected point openings {γx| x E}. We shall extend a property of the latter by stating that an operator c : F ! F is connected relatively to the connective criterion s when for every F F and every A E it satisfies the implication:
Connective Segmentation In image processing, a numerical or multivalued function F is said to be segmented when its space of definition E has been partitioned into homogeneous regions, in the sense of some given criterion. The operation will be meaningful if the regions are as large as possible. But do all criteria lend themselves to such a maximum cut-out? Suppose for instance that we want to partition E into (connected or not) zones, where the numerical function f under study is Lipschitz of unity parameter. We run the risk of finding three disjoint zones A, B, and C such that the criterion will be satisfied on A [ B and on A [ C, but not on B [ C. In this case, there is no largest region containing the points of A and where the criterion is realized, so that the segmentation requires a nondeterministic choice between the partitions {A [ B, C} and {A [ C, B}. This type of problem occurs typically in the “split and merge” approaches which have been used for more than 30 years in image segmentation. How to sort out the criteria in order to be sure that they generate segmentation? First, we will make precise a few notions that are needed. Let E and T two arbitrary sets, and F a family of functions from E ! T. Then:
8F F , 8x E, s½F, fxg ¼ 1;
ð5Þ
2. for all F F and all families {Ai | i I} in P ðEÞ, we have: \ Ai 6¼ 0 and ^ s½F, Ai ¼ 1
iI
4. Arcs: The set A is connected for C if and only if for all x, x0 A, one can find a component X C that contains x and x0 .
iI
¼) s½F, [i I Ai ¼ 1:
ð6Þ
• Finally, a criterion s segments the functions of F when for all F F, the family D ðE, F, sÞ of partitions of E whose all classes A satisfy s[F, A] ¼ 1, is non-empty and closed under supremum (including the empty one, D ðE, F, sÞ comprises the identity partition D0). Then the partition _D ðE, F, sÞ defines the segmentation of F according to s. We saw that each connection on E partitioned every set A E into its maximum components. Therefore, if a criterion allows us to generate a connection associated to F, then we have good reason to think that it will segment the function. This is exactly what the following result states (Serra 2006):
Morphological Filtering
Theorem of Segmentation The following three statements are equivalent: 1. The criterion s is connective 2. For each function F F, the class of sets A where s[F, A] ¼ 1 is a connection 3. Criterion s segments all functions of F . Thus, a difficult problem (how to determine whether a class of partitions is closed under supremum?) comes down to the simpler question of checking whether a criterion is connective. Since this theorem turns out to be an alternative to the variational methods in segmentation, a short comparison is instructive. 1. In the connective approach, no differential operators intervene, such as Lagrange multipliers, for example. No assumption on the continuous or discrete nature of the space E is necessary: both E and T are arbitrary. 2. Moreover, not only the definition domain E of F is segmented, but even all the subsets of this domain. Therefore, if Y and Z are two masks in E, if x Y \ Z, and if both segmentation classes Dx (Y) and Dx (Z) in x are included in Y \ Z, then Dx (Y) ¼ Dx (Z) (in a traveling, the contours are preserved). The approaches based on global optimization of an integral in E are not able to obtain such a regional result. 3. The connective criteria form a complete lattice, where the infimum of the family {si | i I} is given by the Boolean minimum, what allows to parallelize the conditions. Note that if D1 and D2 are the segmentations associated with the two connective criteria s1 and s2, then we have s1 s2 if and only if D1 D2, but the segmentation D relative to the inf s ¼ s1 ^ s2 satisfies only D D1 ^ D2. The petrographic example of Fig. 3 demonstrates an infimum of criteria.
915
Examples of Connective Segmentations In what follows, we omit to repeat, for each criterion s, that it is satisfied by the singletons and by the empty set (alternatively we could stop demanding that the singletons satisfy s, and that would lead to a partially connective criterion (Ronse 2008)). Besides, the starting space E is supposed to be provided with an initial connection C 0 , which may intervene or not in the definition of the connective criterion under study, and the arrival space T is R or Z. The various connective segmentations can be classified into two categories, according to the presence, or the absence, of particular points, namely, the seeds. We begin with the second category, that we call the simple connective criteria. Smooth Connection The criterion a for smooth connection, given by par s[F, A]¼ 1 if and only if: ∘
8x A, ∃aðxÞ > 0, BaðxÞ ðxÞ A ∘
and F is kLipschitz in BaðxÞ ðxÞ is obviously connective and induces the connection C 1 . When C 0 is the arcwise connection, the criterion C ¼ C 0 \ C 1 means that function F is k-Lipschitz along all paths included in the interior A of A. In Z2, for example, where the smallest value of a is 1, it suffices, for segmenting, to erode the functions F and –F by the cone H(k, 1) whose base is the unit square (or hexagon), height is k, and summit the origin o, and then to take the intersection of the two sets where F (resp. –F) is equal to its eroded. In the example of the micrograph of Fig. 3, one obtains for the smooth connection of slope 6 the white zones of Fig. 3(c), where the black ones indicate the isolated singletons. This connection differentiates the granular zones from those which are smoother, even when they both exhibit the same average gray value, as it often happens in electron microscopy.
Morphological Filtering, Fig. 3 From left to right: Electron micrograph of concrete, segmentation for a jump connection of value 12, segmentation by smooth connection of slope 6, infimum of the jump and smooth criteria
M
916
Morphological Filtering
Connections by Clustering on Seeds Many segmentation processes work by binding together points of the space around an initial family G0 E of seeds, which may possibly move, or whose number may vary when the process progresses during several iterations. All these processes satisfy the following property: Theorem of Seeds Given a function F F, an initial family G0 of seeds, and an aggregation process that yields the final seeds G, the criterion s obtained by: s½F, A ¼ 1 if all points of A are allocated to a same final seed g G, and s[F, A] ¼ 0 otherwise, is connective. Watershed lines: We now interpret the numerical function F as a relief in Rn or Zn. All points of the relief whose deepest descent line ends to a same minimum form an arcwise connected catchment basin, and the singletons which do not belong to any catchment basin define the watershed (Lantuéjoul and Beucher 1981). There exist several method to obtain a watershed. One can notice that for binary images, the watershed is nothing but the SKIZ of Lantuejoul, or skeleton by zones of influence. It extends to numerical functions by level set approach, the portion of watershed of level i being the geodesic SKIZ of level i inside level i – 1 (Meyer and Beucher 1990; Najman et al. 2012) (see also [Serra 1982] in the article ▶ “Mathematical Morphology”). Alternatively, one can determine the steepest descents of the points towards minima (Cousty et al. 2009; Bertrand 2005). The reference [Najman and Talbot 2010] in the article ▶ “Mathematical Morphology” comprises several chapters on the subject. Whatever the exact way, the watershed line has been introduced, the theorem of seeds shows that the involved criterion is connective. Now, among the singleton components we find of course the crest line, but also all points x of the intermediate flat zones, like stairs, stuck between an upstream and a downstream, and where F is constant on an
open set surrounding x. We then take for second connective criterion that “each point x of an intermediate flat zone goes to the same catchment basin as the point of the downstream frontier closest to x.” This criterion is applied to the set of singletons; by repeating the process as many times as needed for going up from stair to stair, one finally reduces the singleton zones to the true crest lines. Remark that the watershed of function F gives the zones of influence of the minima of F, as can be shown in Fig. 4 (first two left images), and not the contours of the objects. The latter appear as the watershed lines of the | gradient(F) | image, as one can see in Fig. 4 (last two left images). Moreover, for the sake of robustness, the gradient image is stamped here by the desired minima, which are both minima and watershed lines of F. Jump connection: It is a kind of alternative to watershed, which often leads to excellent segmentation. The jump connection works by aggregating seeds, each one being a connected basin of given height around a minimum. The space is supposed to be equipped with an initial connection C 0, and a jump value k > 0 is fixed. Around the supports M of the minima, we build the largest connected sets S1 (M ) where the level goes up from 0 to k – 1. By binding the S1 (M) one obtains the components associated to the first application of the jump criterion. A second application of the criterion, this time on the residual, gives new components S2 (M ), etc. until nk exceeds the dynamic of values of the function (Fig. 5). The family {Si(M)} is the set of classes of the final jump partition. Alternatively, one can jump up from the minima, and down from maxima, in a symmetrical manner (Serra 2006). Returning to the micrograph of Fig. 3, we observe that the smooth connection and the jump connection taken alone are not excellent, but that the infimum of the two criteria suppresses the noise, and gives a satisfactory result. Compound Segmentation Instead of parallelizing criteria, one can also take them into account successively, and segment differently various regions
Morphological Filtering, Fig. 4 Watershed of an electrophoresis and of its derivative
Morphological Filtering
917
Morphological Filtering, Fig. 5 The first two steps of a jump connection
Morphological Filtering, Fig. 6 From left to right: initial photo, segmentation of the face (color criterion), marker for the bust, final segmented silhouette. (By courtesy of C. Gomila)
of an image. For this purpose, one relaxes the first axiom of connective criteria, namely, relation (5). This condition aims at guaranteeing that every point of the space belongs to a class of the performed segmentation. By breaking away from relation, one thus renounce the covering of a set by the classes of a partition or by its connected components. This approach, formalized by C. Ronse (2008), is based on partial connections and leads to more flexible processing. In the example of Fig. 6, one tries to segment silhouettes of the person in Fig. 6 (left). A first partial segmentation, based on the hue of the skin and the hair, leads to the head contour. A second segmentation that deals only with the complement of the head, extracts next the shoulder from the marker formed by three superposed rectangles (minus the already segmented points). The union of the two segmentations leads to the mask in Fig. 6 (right). Hierarchies of Partitions From now on we concentrate upon the morphological filtering and segmentation defined on finite hierarchies of partitions in R2 or Z2, for whatever origin. They can be due to anterior connected filters, or not. Figure 7 (left), for example,
represents administrative divisions around the city of Paris, which generate a hierarchy of four partitions according to some parameter. The hierarchies of same base, same top, and same number of levels form a complete lattice for the product ordering of their partition levels. A hierarchy can be summarized in two ways, depending on whether we focus on the edges or on the classes. Both ways are equivalent but serve differently. Firstly, one can weight the edges by the level when they disappear. That condenses the hierarchy into a unique map of the so-called saliencies. The saliency maps lend themselves to morphological filtering on hierarchies. For example, the operation which suppresses all edges of saliency l, or shorter than m, and leaves unchanged the others, is a closing and it generates an anti-granuometry as l increases (see Ch.9 in the reference [Najman and Talbot 2010] of the article ▶ “Mathematical Morphology”). Alternatively one can replace the hierarchy by its dendrogram, as depicted in Fig. 7 (right), i.e., by a tree whose levels are those of the hierarchy. The top level is called the root and the lowest level the leaves. The classes, or nodes, are indicated by small discs and their ordering is described by arrows. The parameters associated with the classes are usually written
M
918
Morphological Filtering
Morphological Filtering, Fig. 7 Displays of a hierarchy
in the small discs. The dendrogram highlights the “sons” of each class S, i.e., the partial partition just below S. The relation between classes and sons directly intervenes when looking for minimal cuts. Formally speaking, a partition D is a map E ! P ðEÞ : x 7! DðxÞ that satisfies the following two conditions:
• for all x E, x D(x); • for all x, y E, D(x) \ D( y) 6¼ 0 ) D(x) ¼ D( y). D(x) is called the class of the partition in x. Partition D is finer than partition D0 when each class of D is included in a class of D0 . One writes D D0 . This relationship defines the refinement ordering, which provide partitions with the structure of a complete lattice (see article ▶ “Mathematical Morphology”). A hierarchy is an ordered sequence of partitions (e.g., Fig. 7 bottom). In the following, we start from a unique hierarchy H and consider the family ℋ of all the hierarchies whose classes are taken among those of H. Similarly, we name cut through hierarchy H any partition D whose all classes are taken among the classes of the hierarchy and we denote by D the set of all cuts of H. Finally, when the partitioning is restricted to a class S of the hierarchy H, one speaks of partial partition of support, or nod, S. Optimal Cuts Suppose we want to summarize the hierarchy into a significant partition. The information conveyed by the parameters on the classes may be pertinent for some classes at a certain level and for others at another level. In order to express this
pertinence, one classically associates an energy o with each partial partition of H. Three questions arise here, namely: 1. Most of the segmentations involve several features (color, shape, size, etc.). How to combine them in unique energy? 2. How to generate, from the set of all partial partitions, a partition that minimizes o, and which can be determined easily? 3. When one energy o depends on an integer l, i.e., o ¼ ol how to generate a sequence of optimal partitions that increase with l, which therefore should form an optimal hierarchy? These questions have been taken up by several authors, over many years, and by various methods. Morphological approach unifies them and allows to go further. We will successively answer the three above questions. Dynamic Programming The most popular energies o for hierarchical partitions derive from that of Mumford and Shah (Arbelaez et al. 2011; Guigues et al. 2006; Salembier and Garrido 2000). It is defined in each class as the sum of a fidelity term (e.g., the variance of the values) and a boundary regularization term (e.g., the length of the edges). The energy of a partial partition is the sum of those of its classes. The optimization turns out to be a trade-off between fidelity and regularization. In 1984, L. Breiman (Breiman et al. 1984) introduced a dynamic programming method to determine the cut of minimum energy in a hierarchy. The hierarchy is spanned only
Morphological Filtering
919
Morphological Filtering, Fig. 8 Breiman’s minimal cut over the hierarchy: the energy o at node S is compared to the sum of the energies of its sons TS. When o(S) Po(TS) one keeps S, when not, one replaces S by its sons
once, from the leaves to the root. The energy of each node S is compared to the sum of the energies of its children, and one keeps the partial partition which has the least energy. In case of equality one keeps S. Fig. 8 depicts a dendrogram with the energies of the classes and the minimal cut, i.e., the partition with the least total energy. However, one can wonder whether additivity is the very underlying cause of the simplifications via a unique spanning, since Soille’s constrained connectivity (Soille 2008), where the addition is replaced by the supremum, satisfies similar properties. Finally, one finds in literature a third type of energy, which holds on nodes only, and no longer on partial partitions. It appears in labeling methods (Arbelaez et al. 2011), and again, it yields optimal cuts. Is there a common denominator to all these approaches, more comprehensive than just additivity, and which explains why they always lead to unique optima? Energetic Lattices Consider a given hierarchy H, and the family D of all its cuts. An energy o is assigned to all partial partitions of H whose support S is a class of H. Let D1, D2 D , be two cuts. At point x, either the class D1(x) is the node S1 (x) support of a partial partition D2(x) of D2, or we have the inverse. We then say that cut D1 is o-smaller than D2 when o½D1 ðxÞ ¼ o½D2 ðxÞ and D2 ðxÞ ¼ S1 ðxÞ: These two conditions define an order relation over the set of the cuts D of hierarchy H, which in turn induces a complete lattice on the cuts D . Both ordering and lattice are said energetic (Kiran and Serra 2014). The structure of the cuts as an energetic lattice guarantees us that there is one and only one minimal cut, which is not useless (see Fig. 1 in (Kiran and Serra 2015)). h-Increasingness and Partition Closing The common property of all energies which make use of dynamic programming is that they all are h-increasing
(Kiran and Serra 2014). Let D1 and D2 be two partial partitions of the same support S, and D0 a partial partition of support S0 disjoint of S. An energy o over the partial partitions is said to be h-increasing when: oðD1 Þ oðD2 Þ ) oðD1 t D0 Þ oðD2 t D0 Þ where the symbol “t” stands for the concatenation of two partial partitions. When the energy o is h-increasing, and only then, each step of the dynamic programming is an increasing and extensive operation on the family ℋ of hierarchies, so that at the end, when the root is reached, we obtain the closing of H w.r.t. energy o. This closing is a hierarchy whose all levels are identical, and equal to the unique minimal cut. h-increasing energies include, of course, the classical additive energies, but also the composition laws by infimum, harmonic sum, number of classes, quadratic sum, and supremum, among others. Families of Minimal Cuts An energy oδ over the partial partitions is said inf-modular when for each partial partition D of support S we have od ðfSÞg ^od ½DðxÞ, x S: The Lagrange formalism proposes a convenient way to order a family of minimal cuts (Guigues et al. 2006; Soille 2008). Introduce a scalar Lagrange family of energies by {o(l) ¼ o’ þ loδ, l > 0} where o(l), o’, and oδ are h-increasing, and further oδ is inf-modular. These energies are defined over partial partitions, and we suppose l > 0. Given hierarchy H, the three energies o(l), o’, and oδ induces the three energetic lattices over the set D of all cuts of hierarchy H, namely, D l , D ’ , D d . Just as for the constrained minimization for functions, the energy o(l) stands for the Lagrangian of objective term o’ and constraint term oδ. The constrained Lagrangian minimization results then in the minimal cut of H w.r.t. energy o(l). This minimal
M
920
Morphological Filtering
Morphological Filtering, Fig. 9 Left: initial view; right: minimal cut for an additive energy. (By courtesy of L. Guigues)
Morphological Filtering, Fig. 10 Two cuts of the minimal hierarchy
cut increases with l for the refinement order, so that it produces a new hierarchy H (Kiran and Serra 2015). Remark that one does not change the H by iterating the process. Figs. 9 and 10 give an example of the optimal hierarchy H . It depicts the evolution of the optimal cuts when o’ and oδ are the variance of the values and the length of the edges in a partial partition, respectively. The parameter l increases when going from left to right, and the minimal cut of Fig. 9 (right) corresponds to a transitional l. We see that for the greatest l, the fields are well rendered, but the region of the village, which has a high length of edges, is swept out. Here a better technique would probably be the Ronse compound segmentation (Ronse 2008), as demonstrated in Figs. 6 and 10.
Summary The contribution of Morphological Filtering to geosciences and more generally to image processing can be contemplated
from the two points of view of theory and practice. Several notions used in qualitative description receive a precise meaning in mathematical morphology, owing to adapted axioms. Morphological filtering: turns out to be a sort of trade-off between its two axioms of increasingness, which simplifies the structures under study, and idempotence, which blocks the simplification. The axioms of a connection directly apply to the objects which are connected in the usual sense, but also to those which may seem disconnected at first glance, like the dotted lines of a trajectory. When a connection is added to the filters, then they come to segmentation methods. They extract, from numerical functions, partitions of the space into regions where these functions are homogeneous in the sense of a given connective criterion. In image processing this process is called a segmentation. When it depends on a positive parameter, segmentation results in a sequence of increasing partitions called a hierarchy. Then a closing filter on the hierarchy allows us to extract its “best” segmentation. Some of the above examples belong to petrography or to remote sensing, thus come within the domain of geosciences.
Morphological Opening
But the others, like district grouping, or zones of influence of spots, can easily be transposed to geosciences questions. Morphological Filtering intervenes, indeed, at all scales and includes hydrology, generation of maps, fractal landscape description, etc. (Sagar 2013).
Serra J (2006) A lattice approach to image segmentation. J Math Imag Vis 24(1):83–130. https://doi.org/10.1007/s10851-005-3616-0
Soille P (2008) Constrained connectivity for hierarchical image partitioning and simplification. IEEE Trans Pattern Anal Mach Intell 30(7):1132–1145
Wilkinson MH (2006) Attribute-space connectivity and connected filters. IVC 25(4):426–435
Cross-References ▶ Mathematical Morphology ▶ Matheron, Georges
Bibliography Arbelaez P, Maire M, Fowlkes C, Malik J (2011) Contour detection and hierarchical image segmentation. IEEE Trans PAMI 33(5):898–916 Bertrand G (2005) On topological watersheds. J Math Imag Vis 22(2–3):217–230 Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey Cousty J, Bertrand G, Najman L, Couprie M (2009) Watershed cuts: minimum spanning forests and the drop of water principle. IEEE Trans Pattern Analysis Machine Intel 31(8):1362–1374 Guigues L, Cocquerez JP, Men HL (2006) Scale-sets image analysis. Int J Comput Vis 68(3):289–317 Kiran BR, Serra J (2014) Global-local optimizations by hierarchical cuts and climbing energies. Pattern Recogn 47(1):12–24 Kiran BR, Serra J (2015) Braids of partitions. In: LNCS 9082 mathematical morphology and its applications to signal and image processing. Springer, Berlin Heidelberg, pp 217–228 Lantuéjoul C, Beucher S (1981) On the use of the geodesic metric in image analysis. J Microsc 121:39–49 Matheron G (1988) Chapter 6, Filters and lattices. In: Serra J (ed) Image analysis and mathematical morphology. volume 2: theoretical advances. Academic Press, London, pp 115–140 Meyer F, Beucher S (1990) Morphological segmentation. J Vis Commun Image Represent 1(1):21–46 Najman L, Talbot H (2010) Mathematical morphology: from theory to applications. Wiley, New York Najman L, Barrera J, Sagar BSD, Maragos P, Schonfeld D (eds) (2012) Filtering and segmentation with mathematical morphology. IEEE J Sel Topics Signal Process 6(7):737–738 Ronse C (2008) Partial partitions, partial connections and connective segmentation. J Math Imag Vis 32(2):97–125. https://doi.org/10. 1007/s10851-008-0090-5 Sagar BSD (2013) Mathematical morphology in geomorphology and gisci. CRC Press, Taylor and Francis Group, A Chapman and Hall Book, London Salembier P, Garrido L (2000) Binary partition tree as an efficient representation for image processing, segmentation, and information retrieval. IEEE Trans Image Process 9(4):561–576 Salembier P, Serra J (1995) Flat zones filtering, connected operators, and filters by reconstruction. IEEE Trans Image Process 4(8):1153–1160 Serra J (1982) Image analysis and mathematical morphology. Academic Press, London Serra J (ed) (1988) Image analysis and mathematical morphology. volume 2: theoretical advances. Academic Press, London Serra J (2000) Connections for sets and functions. Fundamenta Informaticae 41(1/2):147–186
Morphological Opening

Sravan Danda1, Aditya Challa1 and B. S. Daya Sagar2
1 Computer Science and Information Systems, APPCAIR, Birla Institute of Technology and Science, Pilani, Goa, India
2 Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India
Definition
Morphological opening is one of the basic operators from the field of Mathematical Morphology (MM). It is also one of the simplest morphological filters, used for constructing more complex filters. This operator is dual to Morphological Closing. In general, morphological opening is defined as a composition of an erosion followed by a dilation. So, in the case of discrete binary images, if X denotes the binary image and B denotes the structuring element, morphological opening is defined as

γ_B(X) = δ_B(ϵ_B(X)) = (X ⊖ B) ⊕ B    (1)
This definition can be reduced to

γ_B(X) = ⋃{B_x | B_x ⊆ X}    (2)
Equivalently, in the case of gray-scale images, if f denotes the gray-scale image and g denotes the structuring element, opening is defined as

γ_g(f) = δ_g(ϵ_g(f))    (3)
Illustrations Morphological opening is one of the basic operators from the field of Mathematical Morphology. Its definition can be traced to fundamental works by Jean Serra (1983, 1988) and Georges Matheron (1975). In general, morphological opening is defined as an erosion operator composed with a dilation operator. In this entry, we illustrate the opening operator using binary images, and discuss various properties. We also discuss the more general algebraic openings.
In simple words, (2) considers the union of all the translations of the structuring element which fits into the original set. This is illustrated in Fig. 1. Here, the structuring element is taken to be a simple disk denoted by a dashed line in Fig. 1. All the points that do not fit within the structuring element are removed. In Fig. 1, this corresponds to the spiked projection of the larger disk. Although not discussed here, similar intuition follows in gray-scale images as well.
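The "union of fitting translates" reading of (2) can be coded directly. The sketch below is illustrative (function name, test image, and structuring element invented) and deliberately uses plain loops to mirror the definition rather than an optimized library call.

```python
import numpy as np

def opening_by_fitting(X, offsets):
    """Opening per Eq. (2): the union of all translates of B that fit inside X."""
    rows, cols = X.shape
    out = np.zeros_like(X, dtype=bool)
    for i in range(rows):
        for j in range(cols):
            pts = [(i + di, j + dj) for di, dj in offsets]
            if all(0 <= a < rows and 0 <= b < cols and X[a, b] for a, b in pts):
                for a, b in pts:          # B translated to (i, j) fits: keep it
                    out[a, b] = True
    return out

X = np.zeros((8, 8), dtype=bool)
X[1:7, 1:7] = True
X[0, 0] = True                            # an isolated "spike"
B = [(0, 0), (0, 1), (1, 0), (1, 1)]      # 2x2 square
print(opening_by_fitting(X, B).astype(int))   # the spike at (0, 0) is removed
```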
Some Important Properties
The properties of the opening operator are explained using binary images. However, these extend to gray-scale images as well (see Dougherty and Lotufo (2003)). Firstly, morphological opening is an increasing operator,

X ⊆ Y ⟹ γ_B(X) ⊆ γ_B(Y)    (4)
Moreover, it is also an anti-extensive operator,

γ_B(X) ⊆ X    (5)
Also, the opening operator is idempotent,

γ_B(γ_B(X)) = γ_B(X)    (6)
Algebraic Openings
Set X
Opening with B
γB (X)
Morphological Opening, Fig. 1 Illustration of Morphological Opening. The shaded area refers to the set obtained by opening
(Opening) γ
Thresholding
Morphological Opening, Fig. 2 Application of Morphological Opening. Using the white top-hat transform one can remove the illumination effects. https://scikit-image.org/docs/dev/auto_examples/color_
Morphological openings are a subset of a wider class of operators known as Algebraic Openings. An algebraic opening is defined as any operator on the lattice which is: (a) increasing (b) anti-extensive, and (c) idempotent. It is easy to see that morphological opening is a specific example of algebraic openings.
(WTH) I − γ
Thresholding
exposure/plot_regional_maxima.html#sphxglr-download-auto-exam ples-color-exposure-plot-regional-maxima-py. Copyright: 2009–2022 the scikit-image team. License: BSD-3-Clause
Morphological Pruning
923
An example of algebraic opening, which is not a morphological opening, is Area Opening – remove all connected components of the image that has an area lesser than the parameter l. (Recall that in a binary image, two adjacent pixels x, y are said to be connected if they have the same value. A path between two pixels x, y is a sequence of pixels x ¼ x0, x1, , xk ¼ y such that xi and xiþ1 is connected. A subset is said to be connected if for any two pixels in the subset there exists a path between them. A connected component is a maximal subset which is connected.) In Matheron (1975) it is shown that any algebraic opening can be defined as a supremum of morphological opening.
Cross-References ▶ Mathematical Morphology ▶ Morphological Filtering ▶ Structuring Element
Bibliography Dougherty ER, Lotufo RA (2003) Hands-on morphological image processing, vol 59. SPIE Press, Bellingham Matheron G (1975) Random sets and integral geometry. AMS, Paris Serra J (1983) Image analysis and mathematical morphology. Academic, London Serra J (1988) Image analysis and mathematical morphology, Theoretical advances, vol 2. Academic, New York
Application to Filtering Recall that, in the context of mathematical morphology, an operator which is increasing and idempotent is referred to as a filter. As stated earlier, morphological opening is one of the simplest filters, from which more complex filtering operators such as top-hat transforms, granulometries, and alternating sequential filters are constructed. Here we discuss the example of the white top-hat transform, which is defined as

WTH(f) = f − γ(f)    (7)
or simply WTH = I − γ, where I is the identity operator. The effect of this operator is illustrated in Fig. 2. A simple thresholding of the original image results in dark patches, caused by the uneven illumination of the image. These patches can be removed by opening the image with a large structuring element and subtracting the result from the original image, i.e., by applying the white top-hat transform. Thresholding the filtered image then shows that the illumination effects have been filtered out.
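A minimal sketch of this workflow, in the spirit of the cited scikit-image example, is given below; the sample image, the structuring-element radius, and the threshold are illustrative assumptions rather than the parameters used in Fig. 2.

```python
import numpy as np
from skimage import data
from skimage.morphology import white_tophat, disk

image = data.moon()                           # sample gray-scale image
wth = white_tophat(image, disk(25))           # WTH = I - gamma, large disk
mask_raw = image > np.percentile(image, 90)   # thresholding the raw image
mask_wth = wth > np.percentile(wth, 90)       # thresholding the filtered image
```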
Summary In this entry, morphological opening is defined for binary and gray-scale images. The practical utility of morphological opening is illustrated using a fictitious binary image. The main properties of morphological opening, namely the increasing property, anti-extensivity, and idempotence, are then mentioned. A generalized notion of morphological opening, the algebraic opening, is defined. The entry concludes by describing the construction of a more complex filter, the white top-hat transform, using morphological opening, along with illustrations on real images.
Morphological Pruning Sin Liang Lim1 and B. S. Daya Sagar2 1 Faculty of Engineering, Multimedia University, Cyberjaya, Selangor, Malaysia 2 Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India
Definition In digital image processing, skeletonization and thinning algorithms tend to leave parasitic components, or "spurs," behind. Pruning is therefore adopted as an essential post-processing procedure to eliminate these unwanted spurs. The spurs, in this case, refer to unwanted line branches that can be removed without affecting the overall shape of the line. These parasitic components are undesirable results generated by edge-detection algorithms (e.g., in character recognition or car plate detection) or by digitization. Very often, these spurs can be treated as "noise" and thus should be cleaned up before proceeding to subsequent processing.
Introduction Pruning is an important post-processing operator that complements skeletonization and thinning algorithms because these operations tend to leave spurs that need to be eliminated (Gonzalez and Woods 2008; Shaked and Bruckstein 1998). Skeletons are concise descriptors of objects in images (Duan et al. 2008; Maragos and Schafer 1986). Skeletonization, on
the other hand, often provides a compact yet effective representation of two-dimensional and three-dimensional objects. Skeletonization is commonly applied in many low-level and high-level image-processing applications, including object representation, tracking, recognition, and compression. Furthermore, skeletonization also facilitates an efficient description of geometry, topology, scale, and other local properties related to the object (Saha et al. 2017; Baja 2006). In particular, skeletonization is used in automated character recognition, including handwritten character recognition, to extract the skeleton of each character (Gonzalez and Woods 2008). However, skeletonization will often produce not only the skeleton but also parasitic components, or "spurs." Spurs are created as a result of erosion due to nonuniformities in the strokes of the characters.
Methodology
In Gonzalez and Woods (2008), a morphological-based pruning framework was developed by assuming that the length of a parasitic component does not exceed a certain number of points. In other words, the standard pruning algorithm will eliminate all branches shorter than a specific number of pixels. For example, if a parasitic branch is shorter than three points and the thinning algorithm is repeated three times, then the parasitic branch will be removed. The framework comprises four steps: (i) thinning, (ii) finding end points, (iii) dilating end points, and (iv) union. It is highlighted that steps (ii)–(iv) are performed to ensure that the main branch of each line is preserved and not shortened by the procedure. The algorithms and the example in Fig. 1 below are adapted with permission from Digital Image Processing, fourth edition, by Gonzalez and Woods, Pearson 2018. A minimal code sketch of the four steps is given after the description of Step 4.

Step 1: Thinning

X1 = A ⊗ {B}    (1)

Thinning of set A with a sequence of structuring elements {B} is performed, as shown in Eq. 1. As an example, Fig. 1a shows the skeleton of a handprinted letter "a," denoted as set A. It is observed that the leftmost part of the character (three pixels in the vertical direction) represents the "spur" which should be removed. This leads us to assume that any branch with three or fewer pixels should be removed. Figure 1b represents the ordered list of eight structuring elements, generated by 90° rotations, each of which contains two pixels. The "×" symbol denotes a "don't care" condition. In order to remove any branch with n or fewer pixels, Step 1 has to be repeated n times. In this case, since the spur is three pixels long, Step 1 is repeated three times to generate the set X1 shown in Fig. 1c.

Step 2: Find End Points

X2 = ∪_{k=1}^{8} (X1 ⊛ Bk)    (2)

In this step, a set X2 containing all end points of X1 is constructed. In Eq. 2, the Bk denote end-point detectors, which are used to determine the end points of X1 using the hit-and-miss operation of Bk with X1. End points are the points at which the origin of one of the structuring elements is satisfied. Figure 1d shows the resultant X2, which consists of two end points.

Step 3: Dilate End Points

X3 = (X2 ⊕ H) ∩ A    (3)

The next step is to perform dilation of the end points conditioned on A, as given in Eq. 3. In particular, the end points are dilated using a 3 × 3 structuring element H containing all 1's, and the result is intersected with A. This step is executed n times in all directions for each end point, with A acting as a delimiter so that no new spurs are created. In this example, the end points are therefore dilated three times, the same number of iterations as the thinning process in Step 1. Figure 1e represents the resultant X3.

Step 4: Union of X1 and X3

X4 = X1 ∪ X3    (4)

The purpose of the last step is to restore the character to its original form but with the spurs removed. In essence, the union of X1 and X3 produces the final pruned image X4, as shown in Fig. 1f.
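The following is a minimal sketch of the four steps using SciPy's binary morphology routines; it is not the authors' implementation. The end-point detector masks, the helper names, and the default n = 3 are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage as ndi

# Two base end-point detectors (1 = foreground, 0 = background,
# -1 = "don't care"); each is rotated by 90 degrees to give eight
# detectors in total. The exact masks are assumptions for illustration.
B1 = np.array([[-1, 0, 0],
               [ 1, 1, 0],
               [-1, 0, 0]])
B2 = np.array([[ 1, 0, 0],
               [ 0, 1, 0],
               [ 0, 0, 0]])
DETECTORS = [np.rot90(B1, k) for k in range(4)] + [np.rot90(B2, k) for k in range(4)]

def hit_or_miss(x, se):
    # "Don't care" (-1) positions are excluded from both patterns.
    return ndi.binary_hit_or_miss(x, structure1=(se == 1), structure2=(se == 0))

def prune(a, n=3):
    a = a.astype(bool)
    # Step 1: thin A with the sequence of detectors, n times (Eq. 1).
    x1 = a.copy()
    for _ in range(n):
        for se in DETECTORS:
            x1 &= ~hit_or_miss(x1, se)
    # Step 2: find the end points of X1 (Eq. 2).
    x2 = np.zeros_like(x1)
    for se in DETECTORS:
        x2 |= hit_or_miss(x1, se)
    # Step 3: dilate the end points n times, conditioned on A (Eq. 3).
    h = np.ones((3, 3), bool)
    x3 = x2.copy()
    for _ in range(n):
        x3 = ndi.binary_dilation(x3, structure=h) & a
    # Step 4: union of X1 and X3 gives the pruned image (Eq. 4).
    return x1 | x3
```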
Case Studies For illustration, the ASTER Global Digital Elevation Model (ASTER GDEM) of a mountainous area in Cameron Highlands, Pahang, Malaysia, is taken as the study area for extracting the pruned image. With elevations up to 1600 m above sea level, the study area covers tea plantations, vegetable farms, and various other agricultural activities. The study area encompasses 400 km², with coordinates of 4°22′57″ N to 4°33′50″ N and 101°22′4″ E to 101°32′50″ E (Lim et al. 2016; Kalimuthu et al. 2016).
Morphological Pruning, Fig. 1 (a) Set A; (b) sequence of structuring elements B used for detecting end points; (c) X1 – after three cycles of thinning; (d) X2 – end points detected; (e) X3 – dilation of X2 based on (a); (f) X4 – final pruned image of (a)
Morphological Pruning, Fig. 2 (a) DEM of the study area in Cameron Highlands, Malaysia, (b) binary image of (a), (c) skeleton image of (b), and (d) pruned image of (c)
Figure 2a shows the DEM of the study region in Cameron Highlands with a resolution of 257 × 258 pixels. Figure 2b is the resultant image after binarization of Fig. 2a. Figure 2c shows the skeleton image of Fig. 2b. It is observed that some parasitic branches (spurs) appear at the end points of the objects. Hence, after pruning is applied, Fig. 2d shows the final pruned image, in which most of the spurs have been removed, resulting in a cleaner skeleton image.
Morphological pruning has been applied in many applications related to geoscience. For example, in Tay et al. (2006), traveltime channel networks were pruned downward from the tips of the branches to the stationary outlet. This mimics a realistic phenomenon, such as a fire ignited at all the extremities of the river sources that propagates progressively toward the outlet. The convex hulls of the corresponding traveltime-pruned networks were also computed, which leads to the derivation of a convexity measure. Based on the
Morphological Pruning, Fig. 3 (a) An example of a (nonconvex) channel network, where the source point is represented as a round dot at the bottom of the tree; (b) convex hull of (a); (c) traveltime-pruned network until it reaches the outlet; (d) union of the corresponding convex hulls of the pruned networks in (c). This figure is adapted from Tay et al. (2006)
convexity measures obtained, new power-law relations could be derived to characterize the scaling structure of the networks. The proposed methodology has been explained with the use of a simple tree-like nonconvex set, shown in Fig. 3a. Its corresponding convex hull is shown in Fig. 3b.
The traveltime network of Fig. 3a is pruned recursively toward the outlet and color-coded in Fig. 3c, while Fig. 3d illustrates the convex hulls of the traveltime channel network. Furthermore, the framework has been applied to seven subbasins of Cameron Highlands, Malaysia, and the convexity
measures acquired from the dynamically shrinking traveltime basins, which move toward their outlets, offer geophysicists and geomorphologists a better visualization and understanding of the channelization process.
Summary or Conclusions Morphological pruning is a useful post-processing technique for eliminating the unwanted parasitic components that appear after skeletonization and thinning operations. We have demonstrated the use of morphological pruning to eliminate unwanted parasitic components in a digital elevation model (DEM). From the results, it is evident that morphological pruning is able to remove the unwanted spurs and helps to generate a cleaner output with significantly less noise. This framework can be applied to extract mountain ridges or to determine the contour lines of drainage basins from DEMs. Furthermore, pruning can be adapted to applications related to geoscience, where it helps in deriving statistically significant metrics.
Cross-References ▶ Digital Elevation Model ▶ Morphological Dilation
Bibliography Baja GS (2006) Skeletonization of digital objects. Progress in Pattern Recognition, Image Analysis and Applications, pp 1–13 Duan H, Wang J, Liu X, Liu H (2008) A scheme for morphological skeleton pruning. Proceedings of 2008 IEEE International Symposium on IT in Medicine and Education Gonzalez RC, Woods RE (2008) Digital image processing, 3rd edn. Prentice Hall, Upper Saddle River. ISBN 978-0131687288 Kalimuthu H, Tan WN, Lim SL, Fauzi MFA (2016) Interpolation of low resolution Digital Elevation Models: A comparison. 2016 8th Computer Science and Electronic Engineering Conference (CEEC), 28–30 September 2016 Lim SL, Tee SH, Lim ST (2016) Morphological interpolations of low-resolution DEMs vs high-resolution DEMs: A comparative study. Progress in Computer Sciences and Information Technology International Conference, Procsit, 20–22 December 2016 Maragos P, Schafer R (1986) Morphological skeleton representation and coding of binary images. IEEE Trans Acoust Speech Signal Process 34(5):1228–1244 Saha PK, Borgefors G, Baja GS (2017) Skeletonization: theory, methods, and applications. Academic Press. ISBN 978-0-08-101291-8 Shaked D, Bruckstein AM (1998) Pruning medial axes. Computer Vision and Image Understanding 69(2):156–169 Tay LT, Sagar BSD, Chuah HT (2006) Allometric relationships between traveltime channel networks, convex hulls, and convexity measures. Water Resour Res 42(W06502)
Morphometry Sebastiano Trevisani1 and Igor V. Florinsky2 1 University IUAV of Venice, Venice, Italy 2 Institute of Mathematical Problems of Biology, Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Pushchino, Moscow Region, Russia
Synonyms Geomorphometry; Quantitative morphology; Surface analysis; Terrain analysis
Definition Morphometry is usually reckoned as the quantitative analysis of solid earth surface morphology. It is mainly focused on the analysis of topography and bathymetry; however, its related concepts and methods can be applied to the analysis of any surface, be it natural (e.g., planetary geomorphology) or anthropic (e.g., surface metrology), and at any spatial scale. Morphometric analysis is most efficiently conducted by means of geocomputational tools applied to digital representations (1D, 2D, etc.) of the earth's surface. The computational algorithms are implemented both in standalone specialized software for terrain analysis and as additional tools in Geographical Information Systems (GIS). Morphometric analysis can also be conducted in the field by means of field equipment, e.g., for the characterization of local features and for validation purposes. In a broader sense, morphometry also includes the quantitative spatial analysis of morphometric features, such as the drainage network and structural lineaments; in this context, the analysis can be conducted with geocomputational tools applied to remote sensing imagery.
Introduction Topography is one of the main factors controlling processes taking place in the near-surface layer of the planet, and it is an expression of multiple geomorphic processes and factors. Accordingly, it is not surprising that the qualitative and quantitative study of surface morphology has a long record of research in the geosciences. For example, the work of Strahler on hypsometry (see ▶ "Hypsometry") is emblematic of the use of quantitative methods for the description of surface morphology. Moreover, the heterogeneity and complexity of topographic surfaces and features have always attracted scientists; in this perspective, the development of fractal theory
(Mandelbrot 1967) is emblematic. Before the 1990s, topographic maps were the main source of quantitative information on topography. These maps were analyzed using geomorphometric techniques to calculate morphometric variables manually and produce morphometric maps. In the mid-1950s, a new research field – digital terrain modeling – emerged in photogrammetry. Within its framework, digital elevation models (DEMs), two-dimensional discrete functions of elevation, became the main source of information on topography. DEMs were used to calculate digital terrain models (DTMs), two-dimensional discrete functions of morphometric variables. Initially, digital terrain modeling was mainly applied to produce raised-relief maps using computer-controlled milling machines, and to design highways and railways. However, digital terrain modeling became widely applicable many years later, thanks to technological innovations in computing and topographic data collection. Relevant developments in the physical and mathematical theory of the topographic surface in the gravity field accompanied the technological innovations. As a result, geomorphometry evolved into the science of quantitative modeling and analysis of the topographic surface and of the study of its relationships with the other natural and artificial components of geosystems. Currently, geomorphometry is widely adopted to solve various multiscale problems of geomorphology, hydrology, remote sensing, soil science, geology, geophysics, geobotany, glaciology, oceanology, climatology, planetology, and other disciplines (e.g., Moore et al. 1991; Pike 2000; Hengl and Reuter 2009; Florinsky 2016; Wilson 2018). Geomorphometric analysis has deep interconnections with many geocomputational, statistical, and mathematical approaches; these include geostatistics, fractal analysis, spectral methods (e.g., wavelets, Fourier analysis), mathematical morphology (Sagar 2013), and many others. Moreover, considering the raster representation of elevation, morphometry has strong methodological interconnections with image analysis and pattern recognition algorithms, with various examples of applications of advanced pattern recognition approaches such as the local binary pattern or the classical gray-level co-occurrence matrix (e.g., Haralick et al. 1973). In this context, the links between topographic surface roughness (or, as a synonym, surface texture) and image texture analysis are evident (Haralick et al. 1973; Trevisani and Rocca 2015). The current popularity of morphometry is strongly related to recent technological innovations in multiple fields, including techniques for the derivation of digital terrain models, computational power, and software development. Cloud storage services and parallel computing will play a key role in the near future, given the increasing need for computational resources for storing and analyzing digital elevation data. In
fact, the quantity of topography-related data is continuously increasing for multiple reasons. First, new sensing technologies, e.g., LiDAR (Light Detection and Ranging) and SAR (Synthetic Aperture Radar), permit the efficient collection of topographic data for wide areas and at high resolution. Second, the increase in geosphere-anthroposphere interactions and the need to deal with multiple geoenvironmental issues (e.g., natural hazards) require an accurate, detailed, and often multi-temporal knowledge of surface morphology. Third, morphometric analysis can be conducted by means of a variety of algorithms, permitting a wide set of morphometric variables and local statistical indices to be derived (e.g., Pike 2000; Florinsky 2017); moreover, the various morphometric indices can be computed at multiple scales. Geomorphometric methods provide a multiscale quantitative description of solid-earth surface morphology by means of multiple derivatives and local statistical indices. These can be used directly for a visual analysis and description of the landscape, for comparing the morphometric characteristics of different study sites, and for studying the connections between morphology and geomorphic processes and factors. Some morphometric indices can also be used as parameters in physically based computational models; for example, local roughness indices can be used as an impedance factor in surface-flow models. Often, the calculated geomorphometric features are further processed by means of supervised (e.g., landslide susceptibility models, soil properties mapping) or unsupervised learning approaches (e.g., landscape classification). In this context, new approaches based on machine learning are becoming increasingly popular in geomorphometric analysis. Morphometry is inherently related to the field of remote sensing. First, digital elevation models are frequently derived by means of remote sensing technologies, using active (e.g., LiDAR, SAR) or passive (e.g., imagery) sensors mounted on a wide set of platforms (terrestrial, satellite, aerial, etc.). Second, many of the techniques adopted to analyze digital elevation models can be applied to the analysis of remote sensing imagery and vice versa. Third, relevant morphological features (e.g., structural lineaments) that can be analyzed quantitatively can be derived from remote sensing imagery. Fourth, in many applications (e.g., for studying soil properties, for mapping landslide susceptibility, or for ecological considerations), the quantitative analysis can be performed by means of an integrated use of morphometric derivatives and remote sensing-derived variables. Fifth, digital elevation models and local morphometric characteristics (e.g., slope, aspect, roughness) are important factors in the processing of remote sensing products (e.g., orthorectification of imagery, post-processing of SAR products).
Morphometrics

The digital representation of surface morphology is crucial for modern geomorphometry. The most popular and easily implemented approach is based on a 2.5D representation of topography by means of a raster coding: i.e., the topography is represented by a single-band image in which the value of a pixel is the elevation. According to this representation, the topographic surface is uniquely defined by a continuous, single-valued bivariate function

z = f(x, y),    (1)

where z is elevation, and x and y are the Cartesian coordinates. This means that caves, grottos, and similar landforms are excluded. However, alternative approaches for the digital representation of topography can be adopted, including, for example, true 3D representations (i.e., caves can be represented), 3D digital globes, triangulated irregular networks, etc. Even if there is no consensus, many researchers differentiate morphometrics (i.e., indices describing some aspects of surface morphology) into morphometric variables and local statistical indices. Morphometric variables (Fig. 1) generally imply specific mathematical requirements for the representation of the topographic surface (e.g., differentiability, see Florinsky 2017) and are defined through their relation with the gravity field (see section "Local Morphometric Variables" for the difference between flow and form attributes). Local statistical indices do not require particular mathematical properties in the topographic surface data to be analyzed and are not necessarily related to the gravity field. For example, an important set of local statistical indices is adopted to describe surface roughness (or its synonym "surface texture"), a fundamental property of surface morphology. Surface roughness is a complex and multiscale characteristic that cannot be described by a single mathematical index (Fig. 2). In this regard, there is a quite ample literature on the use and development of surface texture indices (e.g., Haralick et al. 1973; Grohmann et al. 2011; Trevisani and Rocca 2015). Given that morphometric variables represent the core of geomorphometry, we furnish a basic review of the fundamental ones. A morphometric (or topographic) variable (or attribute) is a single-valued bivariate function describing properties of the topographic surface. Here, we review fundamental topographic variables associated with the theory of the topographic surface and the concept of general geomorphometry, which is defined as "the measurement and analysis of those characteristics of landform which are applicable to any continuous rough surface. . . . General geomorphometry as a whole provides a basis for the quantitative comparison . . . of qualitatively different landscapes . . .." (Evans 1972, p. 18). There are several classifications of morphometric variables based on their intrinsic (mathematical) properties (Shary et al. 2002; Florinsky 2016). Here, the classification of Florinsky (2017) is adopted. Morphometric variables can be divided into four main classes: (1) local variables; (2) nonlocal variables; (3) two-field specific variables; and (4) combined variables. The terms "local" and "nonlocal" are used regardless of the study scale or model resolution. They are associated with the mathematical sense of a variable (cf. the definitions of a local and a nonlocal variable). Being a morphometric variable, elevation does not belong to any class listed, but all topographic attributes are derived from DEMs.

Morphometry, Fig. 1 Example of basic morphometric variables for an alpine area (Trentino, Italy). (a) Digital terrain model (pixel size 2 m); (b) shaded relief; (c) slope in degrees; (d) aspect classified in eight directions (i.e., classes 45° wide); (e) profile curvature; (f) tangential curvature

Morphometry, Fig. 2 Examples of roughness indices (see Trevisani and Rocca 2015), based on a lag of 2 pixels and a circular moving window with a radius of 3 pixels, calculated for an alpine terrain (Trentino, Italy). (a) DTM and shaded relief (pixel size 2 m); (b) isotropic roughness; (c) anisotropy in roughness (1 = maximum anisotropy); (d) landscape classified according to the ratio of isotropic roughness versus flow-directional roughness: in green, morphologies elongated in the direction of flow; in red, morphologies elongated along contour lines

Local Morphometric Variables

A local morphometric variable is a single-valued bivariate function describing the geometry of the topographic surface in the vicinity of a given point of the surface, along directions determined by one of the two pairs of mutually perpendicular normal sections (Fig. 3). A normal section is a curve formed by the intersection of a surface with a plane containing the normal to the surface at a given point. At each point of the topographic surface, an infinite number of normal sections can be constructed, but only two pairs of them are important for geomorphometry. The first pair of mutually perpendicular normal sections includes two principal sections (Fig. 3a) well known from differential geometry. These are the normal sections with extreme – maximal and minimal – bending at a given point of the surface. The second pair of mutually perpendicular normal sections (Fig. 3b) is dictated by gravity. One of these two sections includes the gravitational acceleration vector and has a common tangent line with a slope line at a given point of the topographic surface. The other section is perpendicular to the first one and tangential to a contour line at a given point of the topographic surface. Local variables are divided into two types – form and flow attributes – which are related to the two pairs of normal sections (Shary et al. 2002). Form attributes are associated with the two principal sections. These attributes are gravity field invariants. This means that they do not depend on the direction of the gravitational acceleration vector. Among these are minimal curvature (kmin), maximal curvature (kmax), mean curvature (H), the Gaussian curvature (K), unsphericity curvature (M), Laplacian (∇²), shape index (IS), curvedness (C), and some others. Flow attributes are associated with the two sections dictated by gravity. These attributes are gravity field-specific variables. Among these are slope (G), aspect (A), northwardness (AN), eastwardness (AE), horizontal (or tangential) curvature (kh), vertical (or profile) curvature (kv), difference curvature (E), horizontal excess curvature (khe), vertical excess curvature (kve), accumulation curvature (Ka), ring curvature (Kr), rotor (rot), horizontal curvature deflection (Dkh), vertical curvature deflection (Dkv), and some others. Local topographic variables are functions of the partial derivatives of elevation. In this regard, local variables can be divided into three groups: (1) first-order variables – G, A, AN, and AE – are functions of only the first derivatives; (2) second-order variables – kmin, kmax, H, K, M, ∇², IS, C, kh, kv, E, khe, kve, Ka, Kr, and rot – are functions of both the first and second derivatives; and (3) third-order variables –
Dkh and Dkv – are functions of the first, second, and third derivatives. Equations can be found elsewhere (Florinsky 2016, 2017). The partial derivatives of elevation (and thus local morphometric variables) can be estimated from DEMs by (1) several finite-difference methods using 3 × 3 or 5 × 5 moving windows and (2) analytical computations based on DEM interpolation by local splines or global approximation of a DEM by high-order orthogonal polynomials.
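As an illustration of the finite-difference approach, the sketch below estimates first-order derivatives, slope, and aspect from a gridded DEM with central differences; the aspect convention (clockwise from the top of the grid), the synthetic DEM, and the cell size are assumptions for the sketch.

```python
import numpy as np

def slope_aspect(dem, cell_size=1.0):
    # Central differences along grid rows and columns; axis 0 is taken
    # as the y (north-south) direction, axis 1 as x (east-west).
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # One common aspect convention: degrees clockwise from grid north.
    aspect = np.degrees(np.arctan2(-dz_dx, dz_dy)) % 360.0
    return slope, aspect

# Illustrative synthetic DEM: an inclined plane with a bump.
y, x = np.mgrid[0:100, 0:100]
dem = 0.05 * x + np.exp(-((x - 50) ** 2 + (y - 50) ** 2) / 200.0)
slope, aspect = slope_aspect(dem, cell_size=2.0)
```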
Morphometry, Fig. 3 Schemes for the definitions of local, nonlocal, and two-field specific variables. (a) and (b) display two pairs of mutually perpendicular normal sections at a point P of the topographic surface: (a) principal sections APA′ and BPB′; (b) sections CPC′ and DPD′ allocated by gravity. n is the external normal; g is the gravitational acceleration vector; cl is the contour line; sl is the slope line. (c) Catchment and dispersive areas, CA and DA, are the areas of figures P′AB (light gray) and P″AB (dark gray), correspondingly; b is the length of a contour line segment AB; l1, l2, l3, and l4 are the lengths of slope lines P′A, P′B, AP″, and BP″, correspondingly. (d) The position of the Sun in the sky: θ is the solar azimuth angle, c is the solar elevation angle, and N is the north direction. (From Florinsky (2017, Fig. 1))
Nonlocal Morphometric Variables A nonlocal (or regional) morphometric variable is a single-valued bivariate function describing the relative position of a given point on the topographic surface. Among nonlocal topographic variables are catchment area (CA) and dispersive area (DA). To determine nonlocal morphometric attributes, one should analyze a relatively large territory with boundaries located far away from a given point (e.g., an entire upslope portion of a watershed) (Fig. 3c). Flow routing algorithms are usually applied to estimate nonlocal variables. These algorithms determine a route along which a flow is distributed from a given point of the topographic surface to downslope points. There are several flow routing algorithms, grouped into two types: (1) eight-node single-flow direction (D8) algorithms, using one of the eight possible directions separated by 45° to model a flow from a given point, and (2) multiple-flow direction (MFD) algorithms, using flow partitioning. There are some methods combining the D8 and MFD principles.
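A minimal sketch of the D8 single-flow-direction principle is given below: each interior cell drains toward its steepest-descent neighbor. The direction encoding and the treatment of edge cells and pits are simplifying assumptions, not a full flow-routing implementation.

```python
import numpy as np

def d8_flow_direction(dem, cell_size=1.0):
    # Returns, for each interior cell, the index (0-7) of the neighbor with
    # the steepest downslope gradient; -1 marks pits and edge cells.
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               ( 0, -1),          ( 0, 1),
               ( 1, -1), ( 1, 0), ( 1, 1)]
    nrows, ncols = dem.shape
    fdir = np.full((nrows, ncols), -1, dtype=int)
    for i in range(1, nrows - 1):
        for j in range(1, ncols - 1):
            best_slope, best_k = 0.0, -1
            for k, (di, dj) in enumerate(offsets):
                dist = cell_size * (np.sqrt(2.0) if di != 0 and dj != 0 else 1.0)
                slope = (dem[i, j] - dem[i + di, j + dj]) / dist
                if slope > best_slope:
                    best_slope, best_k = slope, k
            fdir[i, j] = best_k
    return fdir
```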
Two-Field Specific Morphometric Variables A two-field specific morphometric variable is a single-valued bivariate function describing relations between the topographic surface (located in the gravity field) and other fields, in particular, solar irradiation and wind flow. Among two-field specific morphometric variables are reflectance (R) and insolation (I). These variables are functions of the first partial derivatives of elevation and of angles describing the position of the Sun in the sky (Fig. 3d). Reflectance and insolation can be derived from DEMs using methods for the calculation of local variables.

Combined Morphometric Variables
Morphometric variables can be composed from local and nonlocal variables. Such attributes consider both the local geometry of the topographic surface and the relative position of a point on the surface. Among combined morphometric variables are the topographic index (TI), stream power index (SI), and some others. Combined variables are derived from DEMs by the sequential application of methods for nonlocal and local variables, followed by a combination of the results.
Conclusions Geomorphometry is a discipline still in evolution and of growing interest in many applied and research contexts. Geomorphometry is a fundamental tool for studying interlinked geosphere-anthroposphere dynamics in the critical zone, especially when performing multitemporal morphometric analysis. Moreover, geomorphometric analysis permits the derivation of relevant geoengineering and hydrological parameters, useful for the numerical modeling of different processes such as floods, landslides, wildfires, and ecological processes. Geomorphometric indices can also be exploited as secondary variables in various prediction approaches based on geostatistics or machine learning, for example for mapping soil properties. Geomorphometry is also adopted for the characterization of landscape and of geodiversity. Future developments in mathematical and computational approaches, as well as in technology, will likely amplify the potential of morphometry.
Cross-References ▶ Digital Elevation Model ▶ Geostatistics ▶ Horton, Robert Elmer ▶ Hypsometry ▶ LiDAR ▶ Machine Learning ▶ Mathematical Morphology ▶ Pattern Analysis ▶ Quantitative Geomorphology ▶ Remote Sensing ▶ Virtual Globe

Bibliography Evans IS (1972) General geomorphometry, derivations of altitude, and descriptive statistics. In: Chorley RJ (ed) Spatial analysis in geomorphology. Methuen, London, pp 17–90 Florinsky IV (2016) Digital terrain analysis in soil science and geology, 2nd edn. Academic, Amsterdam Florinsky IV (2017) An illustrated introduction to general geomorphometry. Prog Phys Geogr 41(6):723–752 Grohmann CH, Smith MJ, Riccomini C (2011) Multiscale analysis of topographic surface roughness in the Midland Valley, Scotland. IEEE Trans Geosci Remote Sens 49(4):1200–1213 Haralick RM, Shanmugam K, Dinstein I (1973) Textural features for image classification. IEEE Trans Syst Man Cybern SMC-3(6):610–621 Hengl T, Reuter HI (2009) Geomorphometry: concepts, software, applications. Elsevier, Amsterdam Mandelbrot B (1967) How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science 156(3775):636–638 Moore ID, Grayson RB, Ladson AR (1991) Digital terrain modelling: a review of hydrological, geomorphological, and biological applications. Hydrol Process 5(1):3–30 Pike RJ (2000) Geomorphometry – diversity in quantitative surface analysis. Prog Phys Geogr 24(1):1–20 Sagar BSD (2013) Mathematical morphology in geomorphology and GISci. Chapman and Hall/CRC Shary PA, Sharaya LS, Mitusov AV (2002) Fundamental quantitative methods of land surface analysis. Geoderma 107(1–2):1–32 Trevisani S, Rocca M (2015) MAD: robust image texture analysis for applications in high resolution geomorphometry. Comput Geosci 81:78–92 Wilson JP (2018) Environmental applications of digital terrain modeling. Wiley-Blackwell, Chichester

Moving Average Frits Agterberg Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada

Definition A moving average is the mean or average of all values of a variable within a given domain that is moved across the space of study. Domain and space can be one-, two-, or three-dimensional. In Euclidian space, every moving average value is located at the center of its domain. For time series, the moving average can apply to values before the point in time at which it is located. In weighted moving average analysis, the values are assigned different weights, usually decreasing with distance away from the point of location of each estimated moving average value.

Introduction
Along with “probability,” the mean or average is probably the best-known statistical tool. The first average on record was taken by William Borough in 1581 for a number of successive compass readings (cf. Eisenhart 1963; also see Hacking 2006). The procedure of averaging numbers was regarded with suspicion for a long time. Thomas Simpson (1755) advocated the approach in a paper entitled: “On the advantage of taking the mean of a number of observations in practical astronomy,” stating: “It is well-known that the method practiced by astronomers to diminish the errors arising from the imperfections of instrument and the organs of sense by taking the mean of several observations has not so generally been received but that some persons of note have publicly
maintained that one single observation, taken with due care, was as much to be relied on, as the mean of a great number." The simple, unweighted moving average of n values xi within a given domain can be written as

x̄n = (x1 + x2 + . . . + xn)/n

Different domains can have different values of n. An example is shown in Fig. 1. It is for a long series of 2796 copper (XRF) concentration values for cutting samples taken at 2 m intervals along the so-called Main KTB borehole of the German Continental Deep Drilling Program (abbreviated to KTB). These data are in the public domain (citation: KTB, WG Geochemistry). Depths of the first and last cuttings used for this series are 8 and 5596 m, respectively. Locally, in the database, results are reported for a 1-m sampling interval; then, alternate copper values at the standard 2 m interval were included in the series used, for example. Most values are shown in Fig. 1 together with a 101-point average representing consecutive 202-m log segments of drill-core. Locally, the original copper values deviate strongly from the moving average. It can be assumed that these deviations are largely random and that the moving average represents a reproducible copper concentration value, contrary to the original XRF values that are subject to significant measurement errors.
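A centered, unweighted moving average such as the 101-point filter described above can be computed in a few lines; the synthetic series below is only an illustrative stand-in for the KTB copper values.

```python
import numpy as np

def moving_average(x, window=101):
    # Centered, equal-weight moving average; edge values are trimmed.
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Synthetic stand-in for the 2796-value copper series.
rng = np.random.default_rng(0)
series = rng.lognormal(mean=3.0, sigma=0.5, size=2796)
smoothed = moving_average(series, window=101)
```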
Moving Average, Fig. 1 Copper concentration (ppm) values from the Main KTB borehole together with mean values for 101 m long segments of drill-core. Locally, the original data deviate strongly from the moving average. (Source: Agterberg 2014, Fig. 6.22)

Weighted Moving Average Methods

In most applications of the moving average method, different weights are assigned to the observed values depending on the distance between the point at which the moving average value
is to be calculated and the point at which a measurement is to be used. In time series analysis, more weight is commonly assigned to values at points in the more immediate past. A frequently used method in time series analysis is the exponentially weighted moving average (EWMA) method, summarized by Hunter (1986) as follows: the predicted value at time i + 1 in a time series with observed values yi can be written as ŷi+1 = ŷi + λεi, where 0 < λ < 1 is a constant and εi = yi − ŷi is the observed random error at time i. For applications in Euclidian space, more weight is assigned to values at points that are closer to the points at which the values are to be estimated. A well-known method in two-dimensional space consists of assigning weights to the observations that are inversely proportional to the squared distances between the locations of the known values and the location at which the value is to be estimated. An example from Cheng and Agterberg (2009), based on arsenic concentration values in about 1000 stream sediment samples in the Gejiu area in China, is shown in Fig. 2a. This area contains large tin deposits in its southeastern part that have a long history of mining, causing extensive pollution in the environment. The stream sediment sample locations were equally spaced at approximately 2 km in the north-south and east-west directions. Every sample represented a composite of materials from the drainage basin upstream of the collection site. In this application, each square on Fig. 2a represents the moving average for a square window measuring 26 km on a side, with the influence of samples decreasing according to the square of distance. For comparison, Fig. 2b also shows results of local singularity analysis applied to the same data set. This is a
M
936 Moving Average, Fig. 2 Map patterns derived from arsenic concentration values in stream sediment samples; triangles represent large tin mines rich in other elements including arsenic as well: (a) weighted moving average with weights inversely proportional to squared distance from point at which value is plotted; (b) local singularity based on small square cells with halfwidths set equal to 1, 3, 5, . . ., 13 km; for further explanation. (See Cheng and Agterberg 2009)
Moving Average
a
.72356 54.304 119.64 184.99 250.33 315.67 381.02 446.36 511.71 577.05 642.39 707.74 773.08 838.42
b
.702 1.54 1.63 1.73 1.83 1.92 2.02 2.11 2.21 2.31 2.40 2.50
different contouring technique that places more emphasis on the strictly local environments of the points at which the average is to be computed (for details, see Cheng and Agterberg 2009). Obviously, the pattern of Fig. 2b is more advantageous when the objective is to find undiscovered mineral deposits in relatively unexplored target areas, because the areas used for the moving averaging are much larger in Fig. 2a. One of the first applications of a weighted moving average technique in the geosciences is described in detail by Krige (1968) for gold assay values in South African gold mines. In this application, the weights were determined by the method of sampling that was applied to the gold deposits with more weight assigned to nearby sampling locations. Figure 3 (after
Krige 1966) shows two kinds of results for one particular gold mine although the same moving average method was applied. The area covered in Fig. 3b is 400 times as large as the area in Fig. 3b indicating scale independence (self-similarity) of the gold distribution pattern. Other weighted moving average methods include hot spot analysis (see Getis and Ord 1992). It is widely applied by geographers to enhance two-dimensional patterns of random variables that exhibit spatial clustering. Typical examples are counts (xij) of occurrences (e.g., cases of a specific disease or accidents) for small areas (e.g., counties). Originally, the technique was based on Moran’s I statistic for spatial correlation (Moran, 1968). It led to the Getis Gi(d) statistic that satisfies:
Moving Average
937
a
b
Moving Average, Fig. 3 Typical gold inch-dwt weighted moving average trend surfaces in the Klerksdorp goldfield obtained on the basis of two-dimensional moving averages for two areas with similar average gold grades; value plotted is inch-pennyweight (1 unit ¼ 3.95 cm-
g). (a) moving averages of 100 100 ft. areas within a mined-out section of 500 500 ft; (b) moving averages of 2000 2000 ft. areas within mined-out section of 10,000 10,000 ft. (Source: Krige 1966, Fig. 3)
Gi(d) = Σj wij(d) xj / Σj xj,   j not equal to i,

where the sums run over j = 1, . . ., n, and {wij} is an asymmetric one/zero spatial weight matrix with ones for all links defined as being within distance d of a given i, all other links being zero (cf. Getis and Ord 1992, p. 190). Setting Wi = Σj wij(d), it can be shown that the expected value of Gi(d) and its variance satisfy:

E[Gi(d)] = Wi / (n − 1);   σ²[Gi(d)] = Wi (n − 1 − Wi) Yi2 / [(n − 1)² (n − 2) Yi1²],

where E denotes mathematical expectation and σ² is variance, Yi1 = Σj xj / (n − 1), and Yi2 = Σj xj² / (n − 1) − Yi1².
Summary and Conclusions A moving average is the mean or average of all values of a variable within a given domain that is moved across its
Euclidian space of study. For time series, the moving average can apply to values before the point in time at which it is located. In weighted moving average analysis, the input values are assigned different weights, usually decreasing with distance away from the point of location of each estimated moving average value. By means of practical examples, it was shown in this entry that using the moving average can help to eliminate “noise” from the data. Also, applying weights to the observed values that decrease with distance away from each point, at which the moving average is estimated, often improves results. Further improvements may be obtainable when estimated average values are restricted to strictly local environments. The three examples (Figs. 1, 2 and 3) are concerned with element concentration values across study areas that exhibit scale-independence, which is documented in more detail in the original publications.
Cross-References ▶ Local Singularity Analysis ▶ Scaling and Scale Invariance ▶ Time Series Analysis ▶ Trend Surface Analysis
M
938
Bibliography Agterberg FP (2014) Geomathematics: theoretical foundations, applications and future developments, Quantitative geology and geostatistics 18. Springer, Heidelberg, 553 pp Agterberg FP, Da Silva A-C, Gradstein FM (2020) Geomathematical and statistical procedures. In: Gradstein FM, Ogg JG, Schmitz MB, Ogg GB (eds) Geologic time scale 2020, vol 1. Elsevier, Amsterdam, pp 402–425 Cheng Q, Agterberg FP (2009) Singularity analysis of ore-mineral and toxic trace elements in stream sediments. Comput Geosci 35: 234–244 Eisenhart MA (1963) The background and evolution of the method of least squares. In: Proceedings of the 34th Sess. Int. Statist Inst, Ottawa (preprint) Getis A, Ord JK (1992) The analysis of spatial association by use of distance statistics. Geogr Anal 24:189–206 Hacking I (2006) The emergence of probability, 2nd edn. Cambridge University Press, 209 pp Hunter JS (1986) The exponentially weighted moving average. J Qual Technol 18:203–210 Krige DG (1966) A study of gold and uranium distribution pattern in the Klerksdorp goldfield. Geoexploration 4:43–53 Krige DG (1968) Two-dimensional weighted moving average trend surfaces in ore evaluation. In: Proceedings of symposium on mathematical statistics and computer applications. Southern African Institute of Mining and Metallurgy, Johannesburg, pp 13–38 Moran PAP (1968) An introduction to probability theory. Oxford University Press, 542 pp
Multidimensional Scaling Francky Fouedjio Department of Geological Sciences, Stanford University, Stanford, CA, USA
Definition Multidimensional scaling (MDS) is a family of methods used for representing (dis)similarity measurements among objects (entities) as distances between points in a low-dimensional space, each point corresponding to one object (entity). MDS is concerned with the problem of constructing a configuration of n points in a m-dimensional space using information about the (dis)similarities between n objects, in such a way that inter-point distances reflect, in some sense, the (dis)similarities. Thus, similar objects are mapped close to each other, whereas dissimilar objects are located far apart.
Introduction Multidimensional scaling (MDS) is frequently used in geosciences for exploring (dis)similarities among a set of
Multidimensional Scaling
objects (entities) such as signals, images, simulations, predictions, or samples. The term “similarity” depicts the degree of likeness between two objects, while the term “dissimilarity” indicates the degree of unlikeness. Some objects are more dissimilar (or similar) to each other than others. For instance, colors red and pink are more similar (less dissimilar) to each other than colors red and green. Equivalently, colors red and green are more dissimilar (less similar) than colors red and pink. For dissimilarity data, a higher value indicates more dissimilar objects, while for similarity data, a higher value indicates more similar objects. Dissimilarities are distancelike quantities, while similarities are inversely related to distances. Dissimilarity or similarity information about a set of objects can arise in many different ways. For instance, if one considers an ensemble of continuous geostatistical realizations, the dissimilarity between any two realizations can be defined as the square root sum of square differences between realizations, which is calculated cell-by-cell. Multidimensional scaling (MDS) aims to map objects’ relative location using data that show how these objects are dissimilar or similar. The basic idea consists of representing the objects in a low-dimensional space (usually Euclidean) so that the inter-point distances best agree with the observed (dis)similarities between the objects. In other words, MDS depicts interobject (dis)similarities by inter-point distances, each point corresponding to one object. In this way, MDS provides interpretable graphical displays (i.e., maps of objects), in which dissimilar objects are located far apart, and similar objects are mapped close to each other. In the color example, the points representing red and pink will be located closer in the space than the points representing red and green. MDS’s map facilitates our intuitive understanding of the relationships among the objects represented in the map. It helps to understand and possibly explain any structure or pattern in the data. Multidimensional scaling’s map (or spatial representation) of a set of n objects consists of a set of n m-dimensional coordinates, each one of which represents one of the n objects. The required coordinates are generally found by minimizing some measure of goodness-of-fit between the inter-point distances implied by the coordinates and the observed (dis)similarities. MDS seeks to find the best-fitting set of coordinates and the appropriate value of the number of dimensions m needed to adequately represent the observed (dis)similarities. The hope is that the number of dimensions, m, will be small, ideally two or three, so that the derived configuration of points can be easily plotted. It is important to note that the configuration of points produced by any MDS method is indeterminate with respect to translation, rotation, and reflection. There is no unique set of coordinate values that give rise to a set of distances since the distances are unchanged by shifting the whole configuration of points
Multidimensional Scaling
from one place to another or by a rotation or a reflection of the configuration. In other words, one cannot uniquely determine either the location or the orientation of the configuration of points. There are mainly two types of MDS, depending on the scale of the (dis)similarity data: metric multidimensional scaling and non-metric multidimensional scaling. The metric multidimensional scaling is adequate when dissimilarities are also distances. In contrast, non-metric multidimensional scaling is appropriate when dissimilarities are not distances. The metric MDS uses the actual values of the dissimilarities, while the non-metric MDS effectively uses only the rank order (orderings) of the dissimilarities. These two types of MDS also differ in how the agreement between fitted inter-point distances and observed dissimilarities is assessed. A complete description of the theoretical and practical aspects of MDS can be found in the following reference books: Cox and Cox (2000), Borg and Groenen (2005), and Borg et al. (2018).
939
Metric MDS Assume that there are n objects with dissimilarities δij between them, for i, j = 1, . . ., n (one dissimilarity for each pair of objects). Let Δ = [δij] be the n × n dissimilarity matrix. The metric MDS attempts to find the coordinates of the n objects in an m-dimensional Euclidean space, in the form of an n × m matrix of coordinate values X = [xkl], where each point represents one of the objects, and the inter-point distances dij(X) are such that:

dij(X) ≈ f(δij),    (1)
where f() is a continuous parametric monotonic function, such that f(δij) ¼ α þ βδij. The function f() can either be the identity function or a function that attempts to transform the dissimilarities to a distance-like form. The two main metric MDS methods are classical scaling and least squares scaling.
(Dis)Similarity Measure
Classical Scaling
Multidimensional scaling (MDS) is based on similarity or dissimilarity information between pairs of objects. The similarity between two objects is a numerical measure of the degree to which the two objects are alike. Thus, similarities are higher for pairs of objects that are more alike. In contrast, the dissimilarity between two objects is the numerical measure of the degree to which the two objects are different. Dissimilarity is lower for more similar pairs of objects. The term proximity is often used as an umbrella that encompasses both dissimilarity and similarity. Let O be the set of objects. A similarity measure s on O is defined as s: O O ! ℝ such that: (1) s(x, y) s0; (2) s(x, x) ¼ s0; and (3) s(x, y) ¼ s(y, x); if in addition, (4) s(x, y)s(y, z) [s(x, y) þ s(y, z)]s(x, z) (triangle inequality), s is called a metric similarity measure. A dissimilarity measure is a function d : O O ! ℝ such that: (1) d(x, y) 0; (2) d(x, x) ¼ 0; and (3) d(x, y) ¼ d(y, x); if in addition, (4) d(x, z) d(x, y) þ d(y, z) (triangle inequality), d is called a metric dissimilarity measure (distance measure). A (dis)similarity matrix is a matrix such that their elements (representing (dis)similarities between pairs of objects) fulfill the conditions depicted above. The input data for MDS is in the form of a dissimilarity matrix representing the dissimilarities between pairs of objects. When the similarities are available, MDS requires first to convert the similarities into dissimilarities. There are many ways of transforming similarity measures into dissimilarity measures. One way is to take d(x, y) ¼ s0 s(x, y), 8ðx, yÞ O O.
Classical scaling transforms a dissimilarity matrix into a set of coordinates such that the (Euclidean) distances derived from these coordinates approximate as closely as possible the original dissimilarities. The basic idea of classical scaling is to transform the dissimilarity matrix into a cross-product matrix and then perform its spectral decomposition, which gives a principal component analysis (PCA). Given the dissimilarity matrix Δ = [δij], classical MDS carries out the following steps: 1. Construct the matrix A = −(1/2)[δij²].
2. Form the matrix B = HAH, where H = I − n⁻¹11ᵀ is the centering matrix of size n and 1 is a column vector of n ones. 3. Perform the eigen-decomposition of B = ΓΛΓᵀ, where Λ = diag(λ1, . . ., λn) is the diagonal matrix formed from the eigenvalues of B, and Γ = [Γ1, . . ., Γn] is the corresponding matrix of eigenvectors; the eigenvalues are assumed to be labeled such that λ1 ≥ λ2 ≥ . . . ≥ λn. 4. Take the first m eigenvalues greater than 0 (Λ+) and the corresponding first m columns of Γ (Γ+). The solution of classical MDS is X = Γ+Λ+^(1/2). If the dissimilarity matrix Δ is Euclidean, i.e., the matrix B is positive semi-definite, an exact representation of the dissimilarities, that is, one for which dij = δij, can be found in a q-dimensional Euclidean space. The coordinate matrix of classical MDS is then given by X = Γq Λq^(1/2), where Λq is
M
940
Multidimensional Scaling
the matrix of the q non-zero eigenvalues and Γq is the matrix of the corresponding q eigenvectors. Using all q dimensions will lead to complete recovery of the original Euclidean dissimilarity matrix. The best-fitting m-dimensional representation is given by the m eigenvectors of B corresponding to the m largest eigenvalues. The adequacy of the m-dimensional representation can be judged based on one of the following two criteria:

P(1)m = (|λ1| + . . . + |λm|) / (|λ1| + . . . + |λn|),   P(2)m = (λ1² + . . . + λm²) / (λ1² + . . . + λn²).    (2)
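The classical scaling steps listed above can be sketched directly in NumPy; the example dissimilarity matrix below (Euclidean distances of a small point set) is an illustrative assumption.

```python
import numpy as np

def classical_mds(delta, m=2):
    n = delta.shape[0]
    a = -0.5 * delta ** 2                     # step 1: A = -(1/2)[delta_ij^2]
    h = np.eye(n) - np.ones((n, n)) / n       # centering matrix H
    b = h @ a @ h                             # step 2: B = HAH
    eigval, eigvec = np.linalg.eigh(b)        # step 3: eigen-decomposition of B
    order = np.argsort(eigval)[::-1]          # label eigenvalues in decreasing order
    eigval, eigvec = eigval[order], eigvec[:, order]
    pos = eigval[:m] > 0                      # step 4: keep positive leading eigenvalues
    # X = Gamma_+ Lambda_+^(1/2): coordinates of the n objects
    return eigvec[:, :m][:, pos] * np.sqrt(eigval[:m][pos])

# Pairwise Euclidean distances of a small point set are recovered
# (up to rotation, reflection, and translation) by the 2-D configuration.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
delta = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords = classical_mds(delta, m=2)
```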
StrainðXÞ ¼
i, j
d ij ðXÞ dij =
i, j
dij ðXÞ f dij
2
1=2
d2ij ðXÞ
=
ð4Þ
i s23, fluid-1 is the non-wetting phase and fluid-2 is the intermediate phase. The spreading phenomenon for a three-phase system is more much complex; further discussion of this subject, as well as contact line determination for a three-phase system, can be found in the cited references (Adler 1995; Blunt 2017). Darcy’s law offers a convenient way to describe flow behavior at a macroscopic level. The relative permeability Multiphase Flow, Fig. 1 Typical two-phase relative permeability functions
coefficients, in particular, are used to capture the relevant pore-scale physics; therefore, these relative permeability functions are strongly dependent on the pore structure and pore size distribution (Parker 1989). A set of connected pore bodies and throats can be represented quantitatively (or numerically) using a pore network model, which consists of a 3D network of nodes and channels with varying sizes (e.g., radii and lengths). This approach has gained significant popularity with advances in image analysis and digital rock physics. Different levels of capillary pressure are applied to the network to simulate the drainage and imbibition processes, and the macroscopic relative permeability relationships, as well as capillary pressure relationships, can be computed as a function of phase saturations and saturation history (or paths) (Blunt 2017). Although such theoretical computations using pore network models are feasible, empirical functions derived from experimental measurements involving actual rock samples are still most commonly adopted. A schematic of two-phase relative permeability functions for the wetting (krw) and non-wetting phases (krnw) are shown in Fig. 1. Residual saturation is the saturation at which the relative permeability becomes zero, as denoted by Swr or Snwr for the wetting and non-wetting phases, respectively, in Fig. 1. As a result of capillary pressure, the non-wetting phase preferentially occupies the bigger pore bodies or centers of pore throats, while the wetting phase preferentially occupies the smaller pore throats, crevices between rock grains. The trapped non-wetting phase would likely pose a bigger
obstacle to the flowing wetting phase than vice versa. As a result, k°rnw is generally higher than k°rw. In addition, the crossover between the two curves would typically occur where the wetting phase saturation is greater than 0.5 (Bear 1972).
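Empirical relative permeability functions of the kind sketched in Fig. 1 are often represented by simple power-law (Brooks-Corey type) models. The sketch below is illustrative only; the endpoint values, residual saturations, and exponents are assumptions, not values from this entry.

```python
import numpy as np

def corey_relperm(sw, swr=0.2, snwr=0.15, krw_end=0.4, krnw_end=0.9, nw=3.0, nnw=2.0):
    # Normalized (effective) wetting-phase saturation.
    se = np.clip((sw - swr) / (1.0 - swr - snwr), 0.0, 1.0)
    krw = krw_end * se ** nw              # wetting-phase relative permeability
    krnw = krnw_end * (1.0 - se) ** nnw   # non-wetting-phase relative permeability
    return krw, krnw

sw = np.linspace(0.0, 1.0, 101)
krw, krnw = corey_relperm(sw)
```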
Analytical Formulations Versus Numerical Simulations
The aforementioned governing equations (conservation of mass and Darcy's law) are solved by incorporating the relevant constitutive relationships (relative permeability and capillary pressure functions), as well as other auxiliary relationships such as a phase behavior or fluid model (e.g., an equation of state). Solutions to these problems may involve both analytical and numerical techniques. Boundary and initial conditions must be specified to complete the problem statement. Examples of boundary conditions include no-flow boundaries, constant pressure, or constant flux. Analytical solutions are popular because of their simplicity and ease of application. They are widely adopted in the areas of pressure transient analysis or well testing (Lee et al. 2003). However, many assumptions regarding the flow regimes (e.g., transient flow, boundary-dominated flow, sequential depletion) or model set-up (e.g., geometric symmetry and homogeneous properties) are often invoked in analytical models. Simulations entail numerical approximation of the solutions to the governing equations. Despite some recent developments in unstructured mesh generation and finite-element formulations for simulating multiphase flow, most existing commercial simulation packages predominantly focus on finite-difference/finite-volume schemes. A variety of structured and unstructured meshes, including Cartesian, corner point, perpendicular bisector (PEBI), and Delaunay triangular/tetrahedral grids, are available. Further details can be found in Chen et al. (2006). Meshes should conform to the domain boundaries. The overall computational requirement would increase with more complex meshes and local refinement. Upscaling, a process that aims to replace a fine-scale detailed description of reservoir properties with a coarser scale description that has equivalent properties, is often used to generate a coarser mesh (Das and Hassanizadeh 2005).
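To make the contrast between analytical and numerical treatment concrete, the following is a minimal, hedged sketch of a 1D incompressible two-phase (Buckley–Leverett-type) displacement solved with an explicit upwind finite-volume scheme. It is not a representation of the commercial simulators discussed above; the Corey parameters, viscosities, grid size, and boundary/initial conditions are all assumptions chosen for illustration.

```python
import numpy as np

def frac_flow(Sw, Swr=0.2, Snwr=0.15, mu_w=1.0, mu_nw=5.0):
    """Fractional flow of the wetting phase from Corey-type relative permeabilities."""
    Se = np.clip((Sw - Swr) / (1.0 - Swr - Snwr), 0.0, 1.0)
    mob_w = 0.3 * Se**3 / mu_w              # krw / mu_w
    mob_nw = 0.9 * (1.0 - Se)**2 / mu_nw    # krnw / mu_nw
    return mob_w / (mob_w + mob_nw + 1e-12)

def waterflood_1d(nx=100, nt=400, dt=0.002):
    """Explicit upwind solution of dSw/dt + df(Sw)/dx = 0 on the unit interval."""
    dx = 1.0 / nx
    Sw = np.full(nx, 0.2)                   # initial saturation at residual water
    for _ in range(nt):
        f = frac_flow(Sw)
        f_upwind = np.concatenate(([frac_flow(0.85)], f[:-1]))  # injection at left face
        Sw = np.clip(Sw + dt / dx * (f_upwind - f), 0.2, 0.85)  # saturation update
    return Sw
```

The sharp saturation front produced by this scheme is the numerical counterpart of the classical Buckley–Leverett analytical solution; real reservoir simulators add capillary pressure, compressibility, 3D unstructured grids, and implicit time stepping.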
Current Investigations and Knowledge Gaps
Much of the current research effort has focused on the incorporation of additional physical phenomena, for example, coupling multiphase flow with geomechanical and geochemical considerations under non-isothermal conditions. This requires incorporating additional governing and auxiliary equations, such as energy balance, linear momentum balance, and complex reaction kinetics, into the solution process. These coupled simulations are important for modeling more complex subsurface systems, including geothermal reservoirs, geologic storage of CO2, and unconventional tight/shale formations with irregular multiscale fracture networks. More sophisticated numerical approximation or discretization methods (e.g., advanced finite element methods with mixed spaces) are needed to enhance computational efficiency and better capture the complex physical processes or geometries, which often introduce additional nonlinearities among the model parameters. Another main challenge is the discrepancy between measurement and modeling scales, as well as the difficulty in representing heterogeneity (and its uncertainty) at these different scales. Detailed pore-scale models are computationally expensive and not suitable for field-level analysis. Macroscopic models are often employed. Darcy's law, in theory, is only applicable at scales greater than or equal to the representative elementary volume (REV) scale (Bear 1972). However, heterogeneity generally varies with scale. As a result, quantifying the sub-scale variability in multiphase flow and transport behavior, due to uncertain heterogeneity distribution below the modeling scale, is especially challenging (Vishal and Leung 2017).

Summary and Conclusions
Subsurface multiphase flow problems are encountered in a wide range of scientific and engineering applications. The complex heterogeneity observed in rock properties, coupled with a multitude of physical phenomena, renders these problems both challenging and interesting. Models at both the pore and macroscopic levels are commonly adopted to analyze these problems. Future directions should focus on discovering, examining, and incorporating additional physical phenomena into existing or new models.

Cross-References
▶ Geohydrology ▶ Porosity ▶ Rock Fracture Pattern and Modeling

Bibliography
Adler PM (ed) (1995) Multiphase flow in porous media. Kluwer Academic, Amsterdam
Bear J (1972) Dynamics of fluids in porous media. Elsevier, New York
Bird RB, Stewart WE, Lightfoot EN (2007) Transport phenomena, Revised 2nd edn. Wiley, New York
Blunt MJ (2017) Multiphase flow in permeable media: a pore-scale perspective. Cambridge University Press, Cambridge
Chen Z, Huan G, Ma Y (2006) Computational methods for multiphase flows in porous media, vol 2. SIAM, Philadelphia
Das DB, Hassanizadeh SM (2005) Upscaling multiphase flow in porous media. Springer, Berlin
Ertekin T, Abou-Kassem JH, King GR (2001) Basic applied reservoir simulation. Society of Petroleum Engineers, Houston
Lee J, Rollings JB, Spivey JP (2003) Pressure transient testing, SPE textbook series, vol 9. SPE, Richardson
Parker JC (1989) Multiphase flow and transport in porous media. Rev Geophys 27(3):311–328
Vishal V, Leung JY (2017) A multi-scale particle-tracking framework for dispersive solute transport modeling. Comput Geosci 22(2):485–503
Multiple Correlation Coefficient
U. Mueller
School of Science, Edith Cowan University, Joondalup, WA, Australia

Definition
Given a set of random variables {X1, X2, . . ., XK}, the multiple correlation coefficient rXi(X1,...,Xi−1,Xi+1,...,XK) of the ith random variable Xi on the remaining K − 1 variables is given by

rXi(X1,...,Xi−1,Xi+1,...,XK) = sqrt( 1 − det(C(X)) / det(Ci(X)) ),

where Ci(X) is the ith principal minor of the correlation matrix

C(X) =
| 1       rX1X2   ...   rX1XK |
| rX2X1   1       ...   rX2XK |
| ...     ...     ...   ...   |
| rXKX1   rXKX2   ...   1     |,

i.e., the matrix obtained by removing the ith row and the ith column of C(X), and rXiXj is the Pearson correlation coefficient between Xi and Xj.

Properties and Usage
The multiple correlation coefficient was first introduced by Pearson (1896), who also produced several further studies on it and related quantities such as the partial correlation coefficient (Pearson 1914). It is alternatively defined as the Pearson correlation coefficient between Xi and its best linear approximation by the remaining variables {X1, . . ., Xi−1, Xi+1, . . ., XK} (Abdi 2007). In contrast to the Pearson correlation coefficient between two random variables, which attains values between −1 and 1, the multiple correlation coefficient is nonnegative. In particular, the multiple correlation coefficient is equal to 0 when the random variables in {X1, . . ., XK} are pairwise orthogonal, i.e., when C(X) = IK×K, i.e., when rXiXj = 0 for i ≠ j. On the other hand, when the random variables in {X1, . . ., XK} form a closed composition (see ▶ "Compositional Data"), which is a common situation when considering geochemical data, then rXi(X1,...,Xi−1,Xi+1,...,XK) = 1 for all i, i = 1, . . ., K. This property reflects nothing other than the constant sum constraint of compositional data, as the correlation matrix for the independent variables has determinant 0.
The multiple correlation coefficient can be used to measure the degree of multicollinearity between variables as follows. The variance inflation factor VIFi for the ith variable (Hocking 2013) is defined as

VIFi = 1 / ( 1 − r²Xi(X1,...,Xi−1,Xi+1,...,XK) ).

Its value is nonnegative, and it increases as the magnitude of rXi(X1,...,Xi−1,Xi+1,...,XK) increases. There is a high degree of collinearity in the data if VIFi > 10 for some i, i = 1, . . ., K.
The most common application of the multiple correlation coefficient arises in the situation when there is a dependent variable Y and a set of predictor or independent variables {X1, X2, . . ., XK}. In this case, the multiple correlation coefficient may be computed from the matrix

C(Y, X) =
| rYY    rYX1    ...   rYXK  |
| rX1Y   rX1X1   ...   rX1XK |
| ...    ...     ...   ...   |
| rXKY   rXKX1   ...   rXKXK |

as

rY(X1,X2,...,XK) = sqrt( 1 − det(C(Y, X)) / det(C(X)) ),

and the multiple correlation coefficient rY(X1,X2,...,XK) as a measure of association between the dependent variable and its predictors is well defined; it does not matter whether the independent variables are random or fixed (Abdi 2007). The multiple correlation coefficient attains a particularly simple form when the independent variables are pairwise orthogonal (Abdi 2007); in that case it is given as the square root of the sum of the squared correlation coefficients:

rY(X1,X2,...,XK) = sqrt( r²X1Y + r²X2Y + . . . + r²XKY ).
This case arises, for example, when a regression is based on principal components (see ▶ “Principal Component Analysis”).
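As an illustration, the determinant-based formulas above translate directly into code. The following NumPy sketch is not from the original text; the synthetic data and variable names are assumptions made purely for the example.

```python
import numpy as np

def multiple_correlation(C, i):
    """Multiple correlation of variable i on the others, from a correlation matrix C."""
    Ci = np.delete(np.delete(C, i, axis=0), i, axis=1)   # i-th principal minor of C
    r2 = 1.0 - np.linalg.det(C) / np.linalg.det(Ci)
    return np.sqrt(max(r2, 0.0))

def vif(C, i):
    """Variance inflation factor for variable i."""
    return 1.0 / (1.0 - multiple_correlation(C, i) ** 2)

# Example with illustrative data: X is an n x K predictor matrix, y a response vector
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=200)
CYX = np.corrcoef(np.column_stack([y, X]), rowvar=False)   # C(Y, X)
CX = CYX[1:, 1:]                                           # C(X)
r_Y_X = np.sqrt(1.0 - np.linalg.det(CYX) / np.linalg.det(CX))
vifs = [vif(CX, i) for i in range(CX.shape[0])]
```

For the nearly orthogonal predictors generated here, the VIF values stay close to 1 and r_Y_X is close to the square root of the sum of the squared simple correlations, as stated above.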
In regression settings, the squared multiple correlation coefficient is interpreted as the fraction of the variation in the data explained by the model; this quantity is also known as the coefficient of determination. The coefficient of determination R²Y(X1,X2,...,XK) is commonly used as one of the criteria for determining the best model from a set of competing regression models. As R²Y(X1,X2,...,XK) is biased upward, the adjusted R² is normally considered, which is nothing other than 1 − (1 − R²Y(X1,X2,...,XK)) multiplied by (n − 1)/(n − K − 1), where n denotes the number of observations. To test whether a given value of R²Y(X1,X2,...,XK) is statistically significant, the ratio

F = [ R²Y(X1,X2,...,XK) / ( 1 − R²Y(X1,X2,...,XK) ) ] × (n − K − 1)/K

is computed, and under the assumption of normally distributed errors and independence of the errors and estimates, this ratio follows an F-distribution with K and n − K − 1 degrees of freedom (Abdi 2007). The usefulness of R²Y(X1,X2,...,XK) is context dependent: a low value which is statistically significant might be acceptable when the aim is to discover causal relationships, but even a relatively high R² might be unacceptable when accurate prediction is required (Crocker 1972). Already in 1914 Pearson cautioned about the temptation to add more variables with the aim to increase significance; this strategy will only succeed if the variables to be added have weak correlation with the other predictors and among one another and high correlation with the dependent variable (Pearson 1914). It should be noted that the multiple correlation coefficient or the coefficient of determination is rarely used on its own when judging the quality of a statistical model but typically in combination with the Akaike or Bayes information criteria (James et al. 2013).
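For illustration only (not part of the original text), the adjusted R² and the F ratio defined above can be computed as follows; the numeric values in the example are arbitrary.

```python
def adjusted_r2_and_F(R2, n, K):
    """Adjusted R^2 and the F ratio for testing R^2, as defined above."""
    adj = 1.0 - (1.0 - R2) * (n - 1) / (n - K - 1)
    F = (R2 / (1.0 - R2)) * (n - K - 1) / K
    return adj, F

# Example: R2 = 0.6 from a regression with n = 50 observations and K = 4 predictors
adj, F = adjusted_r2_and_F(0.6, 50, 4)
```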
Applications
The multiple correlation coefficient is rarely used as a measure for the goodness of fit of a regression model; it is more common to use the coefficient of determination for this purpose. Exceptions include a study by Liu et al. (2019) in the context of fully mechanized coal mining, where the height of the water-conducting fractured zone is modeled via multiple regression analysis with predictor variables thickness of coal seam, proportion of hard rock, length of panel, mined depth, and dip angle. The model is tested in detail for the Xiegou coal mine in the Shanxi Province, China. The multiple correlation coefficient has also been used to evaluate competing models in mineral prospectivity mapping, such as the case study on
iron ore presented by Mansouri et al. (2018). The general setting for these evaluations is often cross-validation. In a geostatistical context, multiple correlation is also typically used in cross-validation as one criterion in the assessment of the quality of a geostatistical model (Webster and Oliver 2007).
Summary The multiple correlation coefficient is a measure of association linked to multiple regression and is useful as a first-pass measure for assessing the quality of a regression model. In this case, it is calculated as the Pearson correlation between estimates and true values. However, it is also a valuable tool to appraise the input variables and their potential for variance inflation.
Cross-References ▶ Compositional Data ▶ Principal Component Analysis ▶ Regression
Bibliography Abdi H (2007) Multiple correlation coefficient. In: Salkind N (ed) Encyclopedia of measurement and statistics. Sage, Thousand Oaks Crocker DC (1972) Some interpretations of the multiple correlation coefficient. Am Stat 26(2):31–33 Hocking RR (2013) Methods and applications of linear models: regression and the analysis of variance, 3rd edn. Wiley, Hoboken James G, Witten D, Hastie T, Tibshirani R (2013) An introduction to statistical learning with applications in R, 7th printing. Springer, New York Liu Y, Yuan S, Yang B, Liu J, Ye Z (2019) Predicting the height of the water-conducting fractured zone using multiple regression analysis and GIS. Environ Earth Sci 78:422. https://doi.org/10.1007/s12665019-8429-3 Mansouri E, Feizi F, Jafari Rad A, Arian M (2018) Remote-sensing data processing with the multivariate regression analysis method for iron mineral resource potential mapping: a case study in the Sarvian area, Central Iran. Solid Earth 9:373–384 Pearson K (1896) Mathematical contributions to the theory of evolution.—III. Regression, heredity, and panmixia. Philos Trans R Soc London Series A 187:253–317. https://doi.org/10.1098/rsta. 1896.0007 Pearson K (1914) On certain errors with regard to multiple correlation occasionally made by those who have not adequately studied this subject. Biometrika 10(1):181–187 Pearson K (1916) On some novel properties of partial and multiple correlation coefficients in a universe of manifold characteristics. Biometrika 11(3):233–238 Webster R, Oliver MA (2007) Geostatistics for environmental scientists, 2nd edn. Wiley, Chichester
Multiple Point Statistics

Jef Caers1, Gregoire Mariethoz2 and Julian M. Ortiz3
1Department of Geological Sciences, Stanford University, Stanford, CA, USA
2Institute of Earth Surface Dynamics, University of Lausanne, Lausanne, Switzerland
3The Robert M. Buchan Department of Mining, Queen's University, Kingston, ON, Canada
Definition
Multiple-point geostatistics is the field of study that focuses on the digital representation of physical reality by reproducing high-order statistics inferred from training data, usually training images, that represent the spatial (and temporal) patterns expected in a given context. Model-based and data-driven algorithms aim to reproduce such patterns in space-time by capturing real-world features through trends, hierarchies, and local spatial variation.
Multiple-point Geostatistics: A Historical Perspective
Geostatistics aims at modeling spatio-temporal phenomena, which encompasses a large class of data types and modeling problems. From a niche statistical science in the 1970s, it has now evolved into a widely applicable and much used set of practical tools. The increased acquisition of massive spatiotemporal data and the vast increase in computational power is at the root of a renewed attraction to geostatistics. Fundamental societal problems, such as climate change, environmental destruction, and even pandemics, involve spatio-temporal data. Solutions to these problems will require ingesting vast datasets while treating them in a repeatable, coherent, and consistent mathematical framework. Such a coherent framework was first established by George Matheron (Matheron 1973), with the formulation of random function theory and the extension of random variables to space-time. Space-time is more challenging than multivariate problems since, as he discovered, mathematical models are required that address the special continuity that is space-time. While in multivariate analysis (Härdle and Simar 2015) covariance matrices are the fundamental building blocks, geostatistics requires functions, such as covariance functions, that need to be defined in a mathematically consistent way (Matheron 1973), inasmuch as the covariance matrix needs to be positive definite to be useful in multivariate statistics.
For the first three decades, geostatistics followed classical principles outlined in probability theory. Estimating spatiotemporal behaviors was cast as a least-squares problem of estimating stochastic functions. The unbiased linear solution to this least-squares formulation of the estimation problem was termed kriging (Matheron 1963). Kriging is one of the most general forms of linear regression, where all (linear) dependencies are accounted for. Today we see a resurgence of kriging in computer science and machine learning, in the form of Gaussian process regression (Rasmussen and Williams 2006). Regression does not provide an uncertainty quantification on a function. The estimated functions are smoother than the true functions because the covariance between the function and the data is not equal to the covariance of the data itself. A regression estimate is not necessarily a reflection of truth or true behavior. The quest for "truth" arrived in the form of stochastic simulation (e.g., Lantuéjoul 2013; Deutsch and Journel 1998). Instead of one best guess function, the aim is to build realistic models of the real world, where realism is, at least initially, based on modeling statistical properties of the data. In traditional geostatistics, a key statistical property is the semi-variogram, which essentially describes the average squared difference in value between any two locations in space. A marriage was found between variograms (covariances) and the multivariate Gaussian distribution. The Gaussian assumption was essentially needed because it leads to convenient analytical expressions that involve the variogram. As we now know, much of statistics in the nineteenth and twentieth century was driven by analytical closed-form expressions, simply because of a lack of computer power. Parsimonious parametric models, such as multi-Gaussian random fields, require estimating just a few parameters. Today, machine learning involves estimating possibly millions of parameters from even more data. A long-overdue makeover in geostatistics was arriving. There was a cost to analytical convenience, and that cost was physical plausibility and/or truthful behavior. Spatiotemporal models based on variance reflected the (linear) spatial correlation in the data but nothing more. The parsimonious part often led to a fairly narrow formulation of spatial uncertainty. As a result, geostatistical simulation models were often not "believed" by practitioners (Caers and Zhang 2004). This was certainly true in the natural sciences, for example, geology, where practitioners come with a background and prior knowledge about what they believe is truthful about spatial variation. In that context, spatial variation can be distinct and organized in a fashion that cannot be captured with linear models of correlations, such as variograms and Gaussian regressions. Nature is often organized in very specific geometries: forms of low-entropy type of variation and
not high-entropy such as a Gaussian distribution with given covariance. In the late 1990s, it became clear that just the measured data of the phenomenon at the study area of interest may not be sufficient and that additional prior information is required (Strebelle 2002; Guardiano and Srivastava 1993). Alternatively, so much data may be available that parametric models can no longer describe its statistical complexity. The analogy to machine learning is now clearly visible: a model should be trained with data from "something else," then applied, that is, generalized to the problem at hand. Since geostatistics deals with spatio-temporal data, the training data would need to be of that kind, somewhat exhaustive in space-time. The training image was born. The word "image" here reflects the spatial coherence of such data, just like the covariance function is mathematically consistent. Training "images" can be 4D (space-time). Training images are by definition "positive definite," simply because they exist in the real world: every single statistic extracted from a training image, such as an exhaustive experimental variogram, is by definition positive definite. With the advent of the training image, or some very large exhaustive dataset of space-time, came a decrease in the need for parsimonious parametric models. In fact, most algorithms in MPS, but not all, are decidedly nonparametric: the data drive the model. Because a large amount of data is now available, statistical properties other than the variogram became accessible, simply because of the increase in frequency of observations. MPS algorithms described in the next section rely heavily on this fact. Calculations can now be made of the probability of occurrence at some location, given any values in the neighborhood (space-time), without a probability model, simply by counting. Alternatively, comparisons can be made between some pattern of data in the study area of interest and the patterns in the training image. This comparison, in terms of "distance," led to the birth of many MPS algorithms, some of which mimic ideas in computer graphics such as texture synthesis (Mariethoz and Lefebvre 2014). Applications have always driven development in geostatistics. MPS was driven by such needs as well as by the explosion of data and computational power. While MPS focused initially on oil and gas applications (a funding reason), it has lately expanded to many fields of the Earth Sciences and beyond. The oil and gas application is worth mentioning because of the direct link between the realistic depiction of geological geometries and realism in the multiphase flow of oil/water/gas in the subsurface (Caers 2005; Pyrcz and Deutsch 2014). Real-world applications drove the development of MPS. Current needs are also driven by ease of use. Fitting higher-order parametric statistical models (Mustapha and Dimitrakopoulos 2011) is extremely challenging even for a statistical expert, let alone the thousands of possible
nonexpert users. A certain form of expertise is needed. Current work in science is moving more towards artificial intelligence, meaning procedures and processes that allow for relatively easy automation without the presence of the expert. The 1970s expert-based way of doing geostatistics is gradually diminishing simply due to the current technological revolution. MPS algorithms are now evolving to this form of self-tuning (e.g., Meerschman et al. 2013; Baninajar et al. 2019), because they rely on a relatively small number of tuning parameters that can be learned from examples. In these types of algorithms, there is no need to assess convergence in a Markov chain, or to worry about positive definite functions. This allows the domain of application to continue growing as more people discover the richness and convenience that MPS approaches bring to their real-world problems.
Methods
Simulation methods aim at generating multiple renditions of a random function, honoring conditioning data and reproducing a spatio-temporal structure, as depicted in the training information. In some cases, the goal is one of realism, where the reproduction of exact statistics of pattern configurations may not be a priority. In other cases, the goal is to match as closely as possible the multiple-point pattern frequencies, while respecting conditioning data. In order to solve this problem, various approaches have been developed over the last 30 years, from methods that attempt to solve this analytically, to methods that rely on the heuristics of a specific algorithm to define the higher-order patterns found in the resulting realizations. A review of all the methods is available in the book by Mariethoz and Caers (2015). In this section, we provide some basic definitions and then dive into a high-level description of different methods and their differences.

Definitions
MPS simulation methods aim at reproducing patterns consistent with some reference information, often a training image, thereby characterizing the spatial (or spatio-temporal) continuity of one or more variables. Patterns are specific configurations of values and locations/time (Fig. 1). Furthermore, in practice, the pattern is composed of conditioning data, which may include previously simulated nodes, and the simulated node or patch, where values need to be assigned. This is referred to as a data event (Fig. 2). In order to inform the simulated node or patch, the training image is used to identify the frequency of values taken by the simulated nodes (red nodes in Fig. 2). An exact match or a similar pattern may be sought; however, nodes within the simulated patch
Multiple Point Statistics, Fig. 1 The concept of a pattern: the pattern is defined within its domain (the grey area) by the specific data values taken by one or more variables at multiple locations and times. In the example, values can be "black" or "white," representing two categories. Locations are defined relative to a specific position in space and time (in this case, the center x, at time t)
should be matched exactly, or at least more closely, as these are key to avoiding artifacts in the simulation result. The continuity of the spatio-temporal information is inferred from the training data, by recording the occurrence/nonoccurrence of a specific data event, by sampling or scanning the training image. This provides information about possible values the simulated node or patch may take.

Simulation Methods
A large variety of approaches exists to generate simulated models accounting for multiple-point statistics. As discussed in Sect. 1, most of these methods are driven by the algorithm; however, there are a few exceptions where the random function is analytically defined (model-based approaches) and a full description of the mathematical model that characterizes the statistical and spatial frequency of patterns is available. Algorithm-based methods, on the other hand, do not explicitly define the probability model. The conditioning data and
Multiple Point Statistics, Fig. 2 The concept of data event: the data event defines the relationship between the location (in space and time) of the conditioning values and the node or patch (as in this example) to be simulated. In this case, the patch to be simulated consists of 3 by 3 pixels, from which two are already informed. In addition to these two nodes, 11 additional conditioning points are available, some of them from a previously simulated patch (at the bottom of the domain)
frequency of different patterns, inferred from the training information, impose features in the simulated models, but some features are a direct result of the algorithm.

Model-based Methods
Among model-based methods, Markov Random Fields (MRF) (Tjelmeland and Besag 1998) can be used to reproduce joint and conditional distributions of categorical variables by defining the interactions between pixels in the simulation grid. Reproduction of large-scale continuity can be achieved by imposing high-order interactions (beyond pairwise relationships), over a limited neighborhood. The conditional probability at a specific location xi conditioned to the data event, p(z(xi) | N(xi)), is modeled as an analytical function, usually an exponential function (parametric model), from which the spatial law of the random function can be fully defined (up to a normalization constant). Conditional distributions can also be written as an exponential function:

p(z(xi) | N(xi)) ∝ exp( Σ_{c: xi ∈ c} Vc(zc) ),

where zc is a clique (a set of connected nodes within the pattern) and Vc is a potential function that defines the parametric interactions between cliques. The challenge of MRF is to determine which cliques should be used and how to infer
the parameters θ of the potential functions. This is approached in theory by maximum likelihood; however, in practice it requires Markov chain Monte Carlo sampling in order to approximate the likelihood function. Markov mesh models (MMM) are a subclass of MRF that solve some of these implementation issues. They essentially work as a two- or three-dimensional time series. The conditioning path is kept constant, that is, a unilateral path is used, thus simplifying the problem and limiting the clique configurations. Furthermore, the potential functions can be parameterized by using generalized linear models, with parameters inferred by maximum likelihood estimation. A second family of methods under the model-based category builds upon the concept of moments of the random function. Instead of limiting the inference and simulation to second-order moments as in Gaussian simulation, higher-order moments are considered. These are called spatial cumulants. This allows characterizing random functions with high-order interactions that depart from Gaussianity, thus imposing particular structure and connectivity of patterns. As with traditional variograms (or covariances), inference becomes an issue, thus requiring abundant training data.

Algorithm-based Methods
Many of the algorithm-based methods work under the sequential simulation framework. This means that every new simulated node or patch is conditioned by the hard data and previously simulated nodes within a search neighborhood. The typical sequential framework is summarized in Algorithm 1.

Algorithm 1 The sequential simulation framework
Inputs:
1. Variable(s) to simulate
2. Simulation entity: pixel or patch
3. Simulation grid and search neighborhood size
4. Training data
5. Conditioning points
6. Auxiliary variables
7. Algorithm parameters
Algorithm:
1. Assign conditioning data to closest nodes in simulation grid
2. Visit nodes according to a path
2.1 Search for conditioning data in search neighborhood
2.2 Compute the conditional probability of the variable(s) in the simulation entity conditioned by the data and previously simulated values in the neighborhood (and constrained also with auxiliary variables)
2.3 Draw a value (pixel or patch) from the conditional distribution and assign to location
3. Postprocess realization
Outputs:
1. Simulated grid fully informed, honoring conditioning data and replicating patterns inferred from training data
Algorithms vary in the way they handle conditioning to data, the inference of the conditional distribution of node or patch values, and the storage of frequencies of nodes or patterns conditioned to different data events. The simplest approaches involve simulating a single node and are termed pixel-based. Nodes are visited in a random path and, at every uninformed location, a search is performed for nearby informed nodes (either conditioning data or previously simulated). A data event is then searched and retrieved from the training data.

Pixel Based Sequential Methods
The simplest method, called Extended Normal Equation Simulation (ENESIM), directly scans the training image to determine the frequency with which the uninformed node to be simulated takes different values when conditioned by the data event (Guardiano and Srivastava 1993). This rather inefficient approach of scanning the training image every time can be improved by prescanning the training image and storing the different data events in a data structure for efficient retrieval. The Single Normal Equation Simulation (SNESIM) does this by using a search tree (Strebelle 2002), while the Improved Parallel List Approach method (IMPALA) uses a list (Straubhaar et al. 2011). Direct sampling (DS) replaces the inference of the full conditional distribution by finding a single matching case in the training image and using it directly as a sample from that distribution (Mariethoz and Caers 2015). A toy sketch of this scanning-and-counting idea is given below.
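The sketch below is a deliberately brute-force, ENESIM-style toy for a binary 2D training image, written only to make the scanning-and-counting logic explicit; the neighborhood handling, path, and fallback rule are simplifying assumptions and not any of the published implementations.

```python
import numpy as np

def enesim_toy(ti, out_shape, n_neigh=4, rng=None):
    """Toy ENESIM-style pixel-based simulation of a binary (0/1) variable.

    For every node on a random path, the data event formed by the closest
    informed nodes is re-scanned over the whole training image `ti`, and the
    conditional frequencies of the central value are obtained by counting.
    """
    rng = np.random.default_rng() if rng is None else rng
    sim = np.full(out_shape, -1, dtype=int)            # -1 marks uninformed nodes
    cells = np.array(list(np.ndindex(out_shape)))
    path = cells[rng.permutation(len(cells))]          # random simulation path
    H, W = ti.shape
    for i, j in path:
        informed = np.argwhere(sim >= 0)
        if len(informed) > 0:
            dist = np.abs(informed[:, 0] - i) + np.abs(informed[:, 1] - j)
            closest = informed[np.argsort(dist)[:n_neigh]]
            offsets = [(a - i, b - j) for a, b in closest]   # data-event geometry
            values = [sim[a, b] for a, b in closest]         # data-event values
        else:
            offsets, values = [], []
        counts = np.zeros(2)
        for ci in range(H):                     # scan every training-image node
            for cj in range(W):
                match = True
                for (oi, oj), v in zip(offsets, values):
                    ni, nj = ci + oi, cj + oj
                    if not (0 <= ni < H and 0 <= nj < W) or ti[ni, nj] != v:
                        match = False
                        break
                if match:
                    counts[ti[ci, cj]] += 1
        if counts.sum() == 0:                   # no replicate found: fall back to marginal
            counts = np.bincount(ti.ravel(), minlength=2).astype(float)
        sim[i, j] = rng.choice(2, p=counts / counts.sum())
    return sim
```

Even on small grids the nested scan is costly, which is exactly the inefficiency that SNESIM's search tree and IMPALA's list are designed to remove, and that DS sidesteps by stopping at the first sufficiently similar match.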
Patch Based Sequential Methods
Other approaches seek faster results and a better reproduction of textures by simulating a complete patch instead of a single node. The process is similar to the one for pixel-based methods. The specific data events as related to the patch to be simulated are inferred from the training image and stored either individually, grouped with a clustering technique, or in some kind of data structure. Stochastic Simulation with Patterns (SIMPAT) was originally conceived to perform this task by creating a database of patterns, in which the pattern most similar to the data event is searched for (using a distance metric). This makes it computationally expensive, which can be addressed by clustering similar patterns into prototypes. This is done in several methods, where patterns are clustered after applying a dimensionality reduction technique. FILTERSIM uses filters to reduce the dimension of the feature space and then identifies the most similar prototype to the data event. Based on this, a patch is drawn from the patterns within a prototype family. DISPAT performs multidimensional scaling (MDS) and clustering is done over this reduced space. WAVESIM uses wavelet coefficients for the pattern classification task. Another variation of SIMPAT is provided by GROWTHSIM, which uses a random path that starts at conditioning data locations and grows from there.
Pixel and Patch Based Unilateral Methods
Following ideas from texture synthesis (and similar to MMM), some methods use a unilateral path, instead of proceeding in the traditional sequential fashion with a random path. PATCHWORK simulation computes transition probabilities between patches in the unilateral path from the training image and draws from them depending on the data event, discarding patches that do not honor conditioning data in the nodes of the patch to be simulated. This approach is extended in cross-correlation based simulation (CCSIM), where the difference between patterns is computed rapidly by using a convolution with a cross-correlation distance function (Tahmasebi et al. 2014). Patches are drawn from the most similar set that matches the conditioning data. The patch can be split if no match is found, reducing the size of the simulated patches, thus simplifying the problem. Image Quilting (IQ) extends the CCSIM framework by optimizing the overlapping area between patches, using a minimum boundary cut.

Optimization Based Methods
Simulated annealing (SA) and Parallel Conditional Texture Optimization (PCTO-SIM) are optimization-based approaches (Mariethoz and Caers 2015). Both methods work by starting with the simulation grid filled randomly, except at conditioning locations, where the values are preserved. In SA, an objective function is defined to compute the mismatch between the current state of the simulation and the target statistics. These statistics are related to the frequency of patterns, which are inferred from a training source. A change in the current model is proposed, either by perturbing a node or swapping pairs of nodes, and the objective function is updated. Changes are conditionally accepted following the Boltzmann distribution, that is, favorable changes, those that make the simulation closer to the target, are always accepted, while unfavorable changes are accepted with a certain probability. The probability of this acceptance is reduced as the simulation progresses. In the case of PCTO-SIM, iterations over the initial grid are applied in order to update patches that lead to the minimization of the mismatch function. This is resolved with an expectation-maximization algorithm, which is used to update patches of the simulation. SA can be extremely slow and, depending on the definition of the objective function, can fail at reproducing realistic representations of the phenomenon at the end of the simulation. More recent methods based on deep neural networks integrate the image reconstruction capabilities of deep learning methods in the spatial context.

Constraining Models to Data
Up to this point, we have discussed the simple case where conditioning data is hard data, that is, a sample value without uncertainty at the same support as the simulation pixel, and that measures the same variable being simulated.
However, many of the methods described in the previous section can be extended to deal with multivariate data, where both the simulation grid and the training image need to contain multiple correlated variables. In this case, the direct and multivariate distributions are relevant (both statistically and over space/time). The extension to the multivariate case brings inference problems due to the combinatorial nature of the problem. It becomes harder to find specific patterns and every comparison becomes more expensive. The case of dynamic data related to flow properties and geophysical information bring a different challenge. These variables are related to the simulated variables, but in an indirect manner, often through nonlinear relationships. Adhering to the relationship between the primary variable and the nonlinear information requires a different approach, often done with a Bayesian approach, requiring the definition of a likelihood function. Implementing such an approach requires specifying the prior distribution and then sampling from the posterior distribution according to Bayes’ formulation. The prior can be based on training images; however, training-image based priors need to be consistent with data. If there is inconsistency, the sampling process may not converge to the posterior distribution.
Training Images

Requirements for a Training Image
Obtaining adequate training images is one of the most fundamental requirements to be able to use MPS, and at the same time it can be the largest hurdle. MPS simulation methods extract information from the training image and store it in different forms (e.g., a search tree for SNESIM, a list for IMPALA, or even keeping the training image in its original image form for DS). Regardless of storage format, the stored training information is thereafter used equivalently as a model in the simulation process. Therefore, whichever simulation algorithm is chosen, it can only perform as well as the quality of the training image it is provided. This begs the question of defining what a good training image is. We define below three essential criteria, namely stationarity, diversity, and data compatibility.
A first requirement is that the training image should be stationary. This means that all areas of the training image represent similar statistical characteristics, otherwise it would be virtually impossible to calculate meaningful higher order frequencies or conditional probabilities. Stationarity in the traditional geostatistical sense, such as second order intrinsic stationarity, is now extended to higher order statistics. While there are ways of using nonstationary training images, those work by segmenting the nonstationary training image into several subdomains that are themselves stationary. This segmentation can take the form of well-defined zones
(de Vries et al. 2009) or continuously varying features that are described with an auxiliary variable, which defines correspondences between areas in the training image and in the simulation (Chugunova and Hu 2008). However, in all cases each segmented zone should be itself stationary, meaning that it must present a degree of recurrence of similar statistical characteristics. The second requirement is that the training image should be diverse, that is, large enough to capture the properties of interest. But here the term "large" should not be taken in terms of size (e.g., number of pixels or voxels), but in terms of information content. Specifically, one can use a relatively small training image if the goal is to simulate simple patterns. But for complex and diverse structures, the training image needs to represent the entire range of features and their variability. Essentially, the features of the training image have to be replicated sufficiently in order to represent their diversity. If those features are large, the training image has to be large enough to contain a sufficient number of them. One way of increasing the diversity of patterns in a training image is to use transform-invariant distances (Mariethoz and Kelly 2011), which allow considering not only the patterns present in a training image, but also the ensemble of their transformations (e.g., rotations, affinity transforms). Finally, the third requirement is that a training image should be compatible with the data that are available for the simulated domain. Most practical cases involve conditional simulation, and therefore, such data exist. It can be, for example, scattered point measurements in the case of simulating rainfall intensity based on in situ rain gauge measurements, line data if simulating a 3D geological reservoir with information from wells, or it can be a known secondary variable which has a given relationship with the variable to be simulated, such as in geophysics. In such cases, the statistical properties of the simulated variable should correspond to the statistics of the data, and here again such a requirement goes beyond the mean and
variance, but can include higher-order properties such as connectivity or the frequency of patterns. Validating a training image for data compatibility can be achieved in several ways. Often, one would compute the statistical properties of interest on the dataset and on a candidate training image. This can take the form of variograms (for point data), connectivity functions (if the data are spatially continuous, a requirement for computing connectivity metrics), or pattern frequencies (using specific pattern comparison algorithms that assess the frequency of spatial patterns in both training image and data). Examples of pattern comparison algorithms are given in Boisvert et al. (2010), Pérez et al. (2014), and Abdollahifard et al. (2019). To illustrate our stated requirements, Fig. 3 provides some examples of inadequate training images, which are individually commented on in the figure caption.
Multiple Point Statistics, Fig. 3 Example of inadequate training images. From left to right: (a) A very repetitive training image that is overly redundant and could be reduced to a fraction of its size without loss of information. (b) A nonstationary training image where features on the right and left are fundamentally different. Such an image could however be used by defining stationary subdomains. (c) A nonstationary
image with a continuous gradation of colors and orientations implying almost repetitivity. (d) Circular structures resulting in nonstationarity. May be usable with an auxiliary variable that defines where each feature should be mapped in the simulation, or by using rotation-invariant distances
Data-Based Training Images
There are two main ways of obtaining adequate training images for a given simulation problem. The first one is in situations where large amounts of spatially registered data are available, such that the data themselves can be used as training image. Typical data-rich applications where data-based training images are common include geophysics, remote sensing, or time series analysis. Such cases can involve processes that repeat themselves in space (e.g., geophysical cross-sections across a valley, where measured transects are assumed to be statistically similar to unmeasured transects) or repetitions in time (e.g., repeated acquisitions of remote sensing imagery, where previous acquisitions are similar to acquisitions of the same area that have not yet taken place), or even a combination of both (e.g., a spatial network of weather stations that produce several meteorological time-series that show recurrent patterns in both space and time). One issue with data-driven training images is that direct observation of natural processes often does not correspond to
the requirements outlined above. While pattern diversity can be attainable if large amounts of data are at hand (e.g., with Earth observation satellites that generate daily terabytes of spatialized imagery), stationarity and data compatibility are often more problematic. Natural processes are rarely stationary, often requiring detrending techniques to make them amenable for use as training images. The detrending can take many forms, depending on the type of nonstationarity to be addressed. One solution for large-scale trends is to separate the training image in two components: a deterministic trend and a residual, much like the traditional way of doing geostatistics. Under the assumption that such a decomposition is sufficient to describe the trend in the data, the residual component is then considered as stationary and used as training image (Rasera et al. 2020). Such a decomposition is common in time series simulation, where the trend can, for example, describe seasonal fluctuations in the mean and in the standard deviation of the data of a variable. Data compatibility can also be an issue when using databased training images. The typical recommendation is to find data that are representative of the modeled process; however, this can be difficult in practice. The training image generally does not have exactly the same statistical properties as the data. In many cases, this can be resolved by applying a transformation to the data, such as a histogram transformation, which results in the training image having the same distribution as the available data. Sometimes however the data transformation needs to be more complex. For example,
one may want to simulate the spatial features of a deltaic channelized river, but the only available training image has been acquired in high-water conditions and therefore the channels are broader than desired. In this case, a morphological transformation of the training image may be more appropriate, for example, by applying erosion or dilation operators. Interactive image transformation techniques have been proposed to either alter the proportion of categories in an image or "warp" an image to change its morphological characteristics while preserving the overall topology (Straubhaar et al. 2019). This is illustrated in Fig. 4.

Construction of Training Images
In many applications, the data are too scarce to provide data-driven training images. A typical case is the modeling of subsurface reservoirs, where the only data are often borehole cores which provide lithological information along 1D lines that represent only a very small proportion of the entire 3D reservoir volume. In these cases, using MPS requires building a training image based on whatever information is available. The common approaches to achieve this often involve using other geostatistical methods (i.e., other than MPS) that can produce a realistic representation of the phenomenon to model. Several geostatistical methods are less data-hungry than MPS and still able to depict a high level of complexity. For example, object-based approaches have been commonly used to create 3D training images in reservoir modeling.
Multiple Point Statistics, Fig. 4 Illustration of image warping. (a) Input image, (b) and (c) image after expansion and compression, (d) proportion of each category. (Figure from Straubhaar et al. 2019)
Another way of constructing training images is by using process-based models. These models use physical laws to model the process of interest, for example, sediment transport (Sun et al. 2001), terrain erosion (Braun and Willett 2013), or fluid flow in fractures (Boisvert et al. 2008). This is generally done by solving computationally demanding systems of partial differential equations. Process-based models are often deterministic, and in any case so expensive that it is often not possible to obtain more than one or a handful of realizations, let alone to condition them to available data. MPS is then the ideal tool to use these few models as training images and expand them into a larger set of realizations that are statistically representative and moreover conditioned (Comunian et al. 2014; Jha et al. 2015).
Applications
As in Part I, we will provide here more of a historical timeline of applications, partly to illustrate how MPS grew, how more applications entered the scene, and how the diversity of algorithms became linked to the diversity of applications. Without doubt, the primary application of MPS was in reservoir modeling (Oil & Gas; Caers et al. 2003; Caers and Zhang 2004). Reservoir modeling (Pyrcz and Deutsch 2014) requires building multiple realizations of several important properties such as lithology, porosity, and permeability. The purpose of these models is either appraisal or production planning. The latter requires simulation of multiphase flow. Flow simulation is sensitive not just to the magnitude of permeability but to the contrast between high and low permeability. This contrast, in the subsurface, manifests itself in specific geological shapes, such as lithological or architectural elements, or due to diagenesis. The connectivity of high (or low) permeability is poorly characterized by Gaussian distributions with given covariance (the high entropy model). Often, multiple-point geostatistics offered a compromise between Boolean and pixel-based methods. Boolean methods are more difficult to constrain to data; hence, they can serve as training images once the objects are pixelized on a grid. In terms of approach, oil and gas applications very much followed the "constructed" training image approach (see Sect. 3). Well data (considered hard data) are scarce, yet geological interpretations based on them (as well as on seismic) are very specific. Uncertainty in such interpretation was handled by proposing multiple training images. Validation of training images was often needed when integrating production data. Specific methods for training-image based inversion of flow data were developed (Caers 2003). To date, a range of algorithms are used in actual applications: SNESIM, DS, IQ. As discussed in the previous section, there was initially confusion around the need for a stationary training image. Models based on dense outcrop data or generated using
process methods are by their very definition nonstationary. Either stationary training images are used in combination with seismic data, which offers a nonstationary trend in petrophysical properties (Caers et al. 2001), or nonstationary images are accompanied by auxiliary variables (Chugunova and Hu 2008) (Fig. 5). Groundwater modeling offers similar challenges compared to oil and gas, namely, in the realistic representation of subsurface heterogeneity. A more general similarity emerges, namely, these are problems with very few specific data (hard data, wells) and very specific information on the variability of spatial geometries. A second type of application was then emerging which presents the exact opposite problem: very dense data. With very dense data, we enter the realm of geophysical and remote sensing data. Such data sources are easier to get exhaustively than drilling wells and require either passive monitoring (remote sensing, gravity) or active source monitoring (electrical resistivity, seismic). Two types of problems occur in which MPS has started to play a role: data gap filling and downscaling. Satellite data may present gaps because of cloud cover and orbital characteristics. Traditional approaches that fill gaps using kriging (Cressie et al. 2010) pose many problems. First, kriging provides only a smooth representation of actual data variability. Second, a computational problem occurs because of the need to invert very large covariance matrices. MPS provides a natural solution in that it uses the data itself to derive statistics to fill the gaps. Here, the neighboring data is the "training image," although there is no explicit construction of such an image. Gap filling is also needed with (subsurface) geophysical data. Often such data are collected along acquisition lines with certain line spacing. These line-spaced data then need to be gap-filled to get a 2D or 3D realistic representation of the subsurface. Most geophysical inversion methods smooth the line data. Downscaling constitutes a set of problems whereby coarse-scale data or models are transposed to fine-scale resolution. This problem is not uniquely defined: given an upscaling process, one can devise many downscaled models from a single coarse model. Downscaling can be addressed in two ways. First, one may have areas (or space-time blocks) with both high-resolution and low-resolution data, which can be used as training data. Alternatively, one may have constructed, using physical models, high-resolution models that are then upscaled (another physical process) into low-resolution versions. An example presented here is the downscaling of topographic data in Antarctica. Other applications worth mentioning are in mining. Mining is characterized by the availability of dense drill-hole data; hence, training images based on dense data can be constructed, then used in sparser drilled areas. MPS has also
Multiple Point Statistics, Fig. 5 One of the first real-world applications of MPS: generation of a reservoir model using a training image, constrained to well data and probability maps deduced from seismic. (From Caers et al. 2003)
applications in temporal modeling such as the modeling of rainfall time-series (Fig. 6).
Summary and Conclusion
We presented a historical overview of multiple-point geostatistics (MPS) as well as some ongoing trends in this area. Fundamental to multiple-point geostatistics is recognizing that spatial covariance or variogram-based methods fail to capture complex spatio-temporal variation. Training images, either as constructed formats or as direct real-world observations, are explicit expressions of such variations. The aim of MPS algorithms is to capture and reproduce essential higher-order statistics from training images while, at the same time, conditioning to a variety of data. Algorithms can be divided into two groups: model-based or data-driven. Model-based algorithms rely on the traditional probability theory framework, while data-driven methods share ideas with machine learning
and computer graphics. Training images are required to have certain characteristics to be usable within the defined MPS framework: stationarity, diversity, and data compatibility. Stationarity is required to be able to borrow meaningful statistical properties. Diversity is needed to have a sufficient quantity of such meaningful statistical information, and compatibility means that the statistical properties cannot contradict the actual conditioning data. The application of MPS methods and ideas continues to expand. From the initial subsurface focus, applications have grown to address the needs of today's societal challenges: the Earth's landscape, climate, and the environment. The fundamental idea of MPS and the shift it brought to geostatistics will be permanent. The algorithms, data, and scope of applications will continue to grow; for example, more recent publications have started to focus on generative models that replace training images with neural networks. An example is the Generative Adversarial Network (GAN), which can be trained once and possibly ported to a variety of applications with minimal modification.
Multiple Point Statistics, Fig. 6 Gap-filling the digital elevation in Antarctica. (a) Bed elevation model of Antarctica; (b) study area; (c) the kriging realization; and (d) kriging (e) training data (f–g) Direct sampling realizations. (From Zuo et al. 2020)
Cross-References
▶ Geostatistics ▶ High-Order Spatial Stochastic Models ▶ Markov Random Fields ▶ Pattern ▶ Pattern Analysis ▶ Pattern classification ▶ Pattern recognition ▶ Simulation ▶ Spatial statistics ▶ Spatiotemporal

Bibliography
Abdollahifard MJ, Baharvand M, Mariéthoz G (2019) Efficient training image selection for multiple-point geostatistics via analysis of contours. Comput Geosci 128:41–50
Baninajar E, Sharghi Y, Mariethoz G (2019) MPS-APO: a rapid and automatic parameter optimizer for multiple-point geostatistics. Stoch Env Res Risk A 33(11):1969–1989. https://doi.org/10.1007/s00477-019-01742-7
Boisvert JB, Leuangthong O, Ortiz JM, Deutsch CV (2008) A methodology to construct training images for vein-type deposits. Comput Geosci 34(5):491–502
Boisvert JB, Pyrcz MJ, Deutsch CV (2010) Multiple point metrics to assess categorical variable models. Nat Resour Res 19(3):165–175
Braun J, Willett SD (2013) A very efficient O(n), implicit and parallel method to solve the stream power equation governing fluvial incision and landscape evolution. Geomorphology 180–181:170–179
Caers J (2003) History matching under a training image-based geological model constraint. SPE J 8(3):218–226, SPE # 74716
Caers J (2005) Petroleum Geostatistics. Society of Petroleum Engineers, Richardson, 88 pp
Caers J, Zhang T (2004) Multiple-point geostatistics: a quantitative vehicle for integrating geologic analogs into multiple reservoir models. In: Grammer GM, Harris PM, Eberli GP (eds) Integration of outcrop and modern analog data in reservoir models, AAPG memoir 80. American Association of Petroleum Geologists, Tulsa, pp 383–394
Caers J, Avseth P, Mukerji T (2001) Geostatistical integration of rock physics, seismic amplitudes, and geologic models in North Sea turbidite systems. Lead Edge 20(3):308–312
Caers J, Strebelle S, Payrazyan K (2003) Stochastic integration of seismic data and geologic scenarios: a West Africa submarine channel saga. Lead Edge 22(3):192–196
Chugunova T, Hu L (2008) Multiple-point simulations constrained by continuous auxiliary data. Math Geosci 40(2):133–146
Comunian A, Jha S, Giambastiani B, Mariethoz G, Kelly B (2014) Training images from process-imitating methods. Math Geosci 46(2):241–260
Cressie N, Shi T, Kang EL (2010) Fixed rank filtering for spatiotemporal data. J Comput Graph Stat 19(3):724–745
de Vries L, Carrera J, Falivene O, Gratacos O, Slooten L (2009) Application of multiple point geostatistics to non-stationary images. Math Geosci 41(1):29–42
Deutsch CV, Journel AG (1998) GSLIB: geostatistical software library and user's guide. Oxford University Press, New York
Guardiano F, Srivastava M (1993) Multivariate geostatistics: beyond bivariate moments. In: Soares A (ed) Geostatistics-Troia. Kluwer Academic, Dordrecht, pp 133–144
Härdle WK, Simar L (2015) Applied multivariate statistical analysis. Springer, Berlin/Heidelberg
Jha S, Mariethoz G, Mathews G, Vial J, Kelly B (2015) A sensitivity analysis on the influence of spatial morphology on effective vertical hydraulic conductivity in channel-type formations. Groundwater
Lantuéjoul C (2013) Geostatistical simulation: models and algorithms. Springer, Berlin/Heidelberg
Mariethoz G, Caers J (2015) Multiple-point Geostatistics: stochastic modeling with training images. Wiley, Hoboken, 374 pp
Mariethoz G, Kelly BF (2011) Modeling complex geological structures with elementary training images and transform-invariant distances. Water Resour Res 47(7):1–14
Mariethoz G, Lefebvre S (2014) Bridges between multiple-point geostatistics and texture synthesis: review and guidelines for future research. Comput Geosci 66(0):66–80
Matheron G (1963) Principles of geostatistics. Econ Geol 58:1246–1266
Matheron G (1973) The intrinsic random functions and their applications. Adv Appl Probab 5(3):439–468
Meerschman E, Pirot G, Mariethoz G, Straubhaar J, Van Meirvenne M, Renard P (2013) A practical guide to performing multiple-point statistical simulations with the direct sampling algorithm. Comput Geosci 52:307–324
Mustapha H, Dimitrakopoulos R (2011) HOSIM: a high-order stochastic simulation algorithm for generating three-dimensional complex geological patterns. Comput Geosci 37(9):1242–1253
Pérez C, Mariethoz G, Ortiz JM (2014) Verifying the high-order consistency of training images with data for multiple-point geostatistics. Comput Geosci 70:190–205
Pyrcz MJ, Deutsch CV (2014) Geostatistical reservoir modeling. Oxford University Press, New York
Rasera LG, Gravey M, Lane SN, Mariethoz G (2020) Downscaling images with trends using multiple-point statistics simulation: an application to digital elevation models. Math Geosci 52(2):145–187
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. The MIT Press, Cambridge, MA
Straubhaar J, Renard P, Mariethoz G, Froidevaux R, Besson O (2011) An improved parallel multiple-point algorithm using a list approach. Math Geosci 43:305–328
Straubhaar J, Renard P, Mariethoz G, Chugunova T, Biver P (2019) Fast and interactive editing tools for spatial models. Math Geosci 51(1):109–125
Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 34(1):1–21
Sun T, Meakin P, Jøssang T (2001) A computer model for meandering rivers with multiple bed load sediment sizes 2. Computer simulations. Water Resour Res 37(8):2243–2258
Tahmasebi P, Sahimi M, Caers J (2014) MS-CCSIM: accelerating pattern-based geostatistical simulation of categorical variables using a multi-scale search in Fourier space. Comput Geosci 67(0):75–88
Tjelmeland H, Besag J (1998) Markov random fields with higher-order interactions. Scand J Stat 25:415–433
Zuo C, Yin Z, Pan Z, MacKie EJ, Caers J (2020) A tree-based direct sampling method for stochastic surface and subsurface hydrological modeling. Water Resour Res 56(2):e2019WR026130
Multiscaling
Jaya Sreevalsan-Nair
Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Definition
In several natural resource and environmental applications, e.g., mining, the available information carries a high degree of uncertainty (Sagar et al. 2018). This uncertainty can be addressed by treating the phenomena as scale dependent. Scale, in simple terms, refers to the scope of observations in the respective physical dimensions, such as spatial length (Euclidean space) and time, or related spatial dimensions, such as image space. Geological features occur at different scales in time and space, namely tectonic objects (faults, joints, etc.), sedimentary objects (bedding structures, etc.), intrusive and effusive objects (lava flows, dykes, etc.), and epigenetic objects (metamorphic units, etc.). They range in spatial scale from micrometers to kilometers, and the associated geophysical processes range in time scale from microseconds to millions of years. Multiscaling is used to capture information at different scales at which a specific type of locality can be defined, e.g., spatial locality or temporal locality. This is especially necessary to faithfully model multiresolution data, which are
generated or acquired at multiple resolutions by design. An illustrative example of multiscaling is the determination of instantaneous friction and the prediction of time to failure in the fault physics of slow earthquakes, both at the laboratory scale and at the field scale (Bergen et al. 2019). The fault generated in the laboratory is effectively a single frictional patch in a homogeneous system, whereas a field fault in the earth is an ensemble of several frictional patches occurring in heterogeneous earth materials.
Overview
The use of scale for studying physical models and simulations, and for analyzing data, is well established. This practice contrasts with the ideal formulation of scale-independent entities in mathematics, e.g., the “point” (Lindeberg 1998). In modern data-science applications, scale-space theory has been extended to nonconventional spatial domains, e.g., images. Such applications are important in the geosciences, as multiscaling applies both to the modeling and simulation of geoscientific processes and to the analysis of observed as well as simulated data of all types, including images. In image processing and computer vision, multiscaling is used to address scale-dependent variations in perspective and the noise generated during image formation. Multiple scales naturally organize data in hierarchical structures, from the coarsest to the finest scale, i.e., from the lowest to the highest resolution or scope of information, presenting the data in the form of a pyramid. Because it exposes the variability in the data, multiscaling also becomes a process by which uncertainty in the data and/or the data model can be alleviated. Uncertainty in data arises from variability in the observed process, which leads to insufficient sampling and/or data quality; this type of uncertainty is called aleatoric or irreducible uncertainty. The other type, stemming from the data model used and owing to incomplete knowledge of the phenomenon, is called epistemic or reducible uncertainty, and it may be reduced by adding more data. Generally, aleatoric uncertainty tends to be scale dependent (Sagar et al. 2018), and multiscaling may be seen as a powerful tool to resolve this type of uncertainty.
The widely used multiscaling methods define scale as a contextually appropriate parameter and focus on the best method of aggregating the information acquired at different scales. For LiDAR (light detection and ranging) point cloud analysis, the local neighborhood can be spherical (Demantké et al. 2011) or defined by the k-nearest neighbors (Hackel et al. 2016), in which case the radius of the sphere or the integer neighbor count k, respectively, is taken as the scale. For feature extraction applications, the multiscale aggregation methods include using information at an optimal scale (Demantké et al. 2011),
concatenating information from all scales into a long vector (Hackel et al. 2016), or performing an appropriate map-reduce operation on the information from the different scales, e.g., averaging (Sreevalsan-Nair and Kumari 2017). Thus, the multiscaling approaches perform filtering, concatenation, or transformation, respectively.
Applications
There exist several different multiscaling applications in geosciences. Instead of an exhaustive survey, illustrative examples of multiscaling in spatial length, time, image space, and point clouds are discussed here. Transfer functions are mathematical functions used to navigate the hierarchical structures of scales, i.e., for upscaling or downscaling. Consider the example of transfer functions used in hydropedology, a combined discipline that studies soil science (pedology) and hydrology to understand their interactions in the critical zone of the earth (Lin 2003). This involves studies at the microscopic scale of length (1 mm–1 cm), the mesoscopic scale (1 m), and the macroscopic scale (1 km). The microscopic structures include pores and aggregates, the mesoscopic ones include pedons and catenas, and the macroscopic ones include watersheds and regional and global regions. These structures also reflect varied temporal scales, going from minutes–days, to days–months, to months–years, respectively. Given that the natural physical processes involve multiple scales, process modeling, simulation, and data analysis in the discipline also tend to involve multiscale bridging. The different scales are incorporated in hierarchical structures of soil mapping and soil modeling, and these are used to generate different pedotransfer functions (PTFs). PTFs are mathematical functions or models which map the soil survey database (the domain) to the soil hydraulic properties (the co-domain), and are formulated in a nested fashion using the scales. The goal of hydropedology is to use these PTFs and other similar methods to connect pedon-scale measurements (micro/mesoscopic) to landscape-scale phenomena (macroscopic) in a three-way interconnection between soil structure, preferential flow, and water quality. Similarly, transfer functions relating recharge (input) to groundwater-level fluctuations (output) across a range of temporal scales have been used to quantify recharge in fractured crystalline rock aquifers. Hyperspectral remote sensing data of different spatial and temporal scales have been used to monitor vegetation at the laboratory (plot), field (local), and landscape (regional) levels. The different geometric resolutions in these images provide heterogeneous vegetation indices, and the information between scales is harmonized by using transfer functions.
Coupling across different scales to communicate information is another multiscaling method. Consider the interdisciplinary study of earth sciences involving the interconnections between the thermal, hydrous, mechanical, and chemical (THMC) behavior of the earth (Regenauer-Lieb et al. 2013). There exists a multiphysics model of THMC coupling across different scales, in both spatial length and time, which enables the communication needed for the transition between scales. As an example, for earth resource processes the spatial scales range from nanometers to 100 kilometers, and the time scales range from femtoseconds to 10^8 years. These are the scales involved in energy and mineral resource discovery within a geodynamic framework. One of the unifying concepts across scales is that the macroscale of one process is used as the microscale of a process at a higher level, thus facilitating multiscaling. Statistical mechanics and thermodynamics offer tools to solve such problems; e.g., macroscopic thermodynamic properties of a continuous medium are derived using stochastic modeling of multiple microscale interactions. The poor availability of order parameters, microscale intrinsic length scales, and energy-based interactions is a roadblock to using the concept as is. To overcome this, the larger-scale entities use inputs computed using percolation renormalization at the microscopic scale through coarse graining, and asymptotic homogenization at the macroscopic scale. Percolation theory has conventionally been used to determine global behavior, e.g., phase transitions, in a random heterogeneous system, e.g., randomly structured media.
It is widely known that earth observations and analysis are sensitive to the scale of data acquisition (Zhao 2001). Thus, an important application in object detection from very-high-resolution imagery entails estimation of scale parameters at different resolutions to segment the images appropriately. Segmentation provides the geometric representation of the spatial objects present in the region captured in the images. The scale parameter depends on the local variance of the object heterogeneity in the scene as well as on the rate of change of the local variance across consecutive scales. This is because the structural information of the region is derived from the pixel information, which encodes the object heterogeneity; this structural information is effectively obtained from the features extracted from the remotely sensed images.
In addition to the hierarchical organization of scales and navigation using transfer functions, scale-space filtering is a commonly used method for feature extraction using multiscaling (Lindeberg 1998). Scale-space theory states that smaller-scale structures disappear sooner than larger-scale ones when the scale-space representation is defined using smoothing. The smoothing is done using Gaussian or wavelet filters. Smaller-scale structures are also perceived as noise; thus, the scale-space representation can be used for denoising images or filtering noise from them. This also works
on the principle of persistence of significant features across multiple scales. An example application of feature extraction using multiscaling is land cover–based classification of Landsat images.
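To make the scale-space idea concrete, the following is a minimal sketch (not from the original entry) of a Gaussian scale-space pyramid for a single-band raster, using NumPy and SciPy; the synthetic image and the chosen sigma values are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def gaussian_scale_space(image, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Return a list of progressively smoothed copies of `image`.

    Larger sigma suppresses smaller-scale structure first, which is the
    behavior of the scale-space representation described above.
    """
    return [ndimage.gaussian_filter(image, sigma=s) for s in sigmas]

# Illustrative use on a synthetic raster dominated by fine-scale noise.
rng = np.random.default_rng(0)
raster = rng.normal(size=(256, 256))
pyramid = gaussian_scale_space(raster)
for s, level in zip((1.0, 2.0, 4.0, 8.0), pyramid):
    print(f"sigma={s}: std of smoothed raster = {level.std():.3f}")
```

Features that persist across several levels of such a pyramid behave as the "significant" structures referred to above, while those that vanish quickly behave like noise.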
Case Study: Feature Extraction in LiDAR Point Clouds
Consider a deeper study of how multiscaling is used for feature extraction in point clouds acquired using airborne or terrestrial LiDAR technology. The use of multiple scales in point cloud analysis is a well-established practice for addressing the uncertainty in environmental datasets. The extracted features are further used for semantic classification, i.e., object-based classification of the point clouds, which is an important data analytic step in point cloud analysis. In point clouds, spatial locality is exploited using the neighborhood defined by the scale parameter. The local neighborhood itself can be determined using different approaches: in the state of the art it has been defined using the limiting shape of the neighborhood and/or the size of the neighborhood set. A neighbor of a point p is a point which satisfies the distance criterion defining the local neighborhood of p. The kinds of local neighborhoods generally used for point cloud analysis are spherical, cylindrical, cubical, and k-nearest neighborhoods. The distance used is usually the Euclidean distance or l2 norm; in certain applications its limiting case, the Chebyshev distance or max-norm, is used. The size of the local neighborhood is then used as the scale parameter: the radius of the spherical or cylindrical neighborhood, the side length of the cubical neighborhood, or the value of k in the k-nearest neighborhood. The conventional workflow involves feature extraction at each of a set of consecutive scales, with lower and upper bounds (s_min and s_max) and a step size (Δs) of the scale parameter defined for each application. Determining the scales for effective feature extraction is in itself a difficult problem. Applications which use an optimal scale s_opt for the final feature vector are not as critically dependent on the choice of [s_min, s_max, Δs] as applications where the feature vectors at all scales are used for further processing.
Before discussing how features across different scales are aggregated meaningfully, the feature extraction at each scale must be described. For a given point and its selected neighborhood, a local geometric descriptor is computed. A widely used local geometric descriptor is the covariance matrix or tensor, C. It is a positive semidefinite second-order tensor (Sreevalsan-Nair and Kumari 2017), computed as the sum of outer products of the distance vectors between the point and each of its local neighbors.
Tensor voting is an alternative for computing local geometric descriptors, as it also generates positive semidefinite second-order tensor fields. Upon eigenvalue decomposition of C, its nonnegative eigenvalues are used as features. There is also a variant of C in which the distances of the neighbors are computed with respect to the centroid of the local neighborhood instead of the point itself; this is effectively equivalent to performing a principal component analysis of the local neighborhood of the point. Thus, the eigenvalue-based features essentially capture the shape of the local neighborhood, which in itself gives the geometric classification of the point as a linear, areal/surface, or volumetric feature (Sreevalsan-Nair and Kumari 2017). Because these features are scale dependent, the eigenvalue-based features provide a probabilistic geometric classification, and overall they are essential for interpreting the point in its spatial context. For semantic classification, height-based features of the point are as significant as the eigenvalue-based ones, bringing the recommended count of significant features to 21.
Now consider three different ways in which the feature vectors across multiple scales are used. The first, widely used, method is to determine an optimal scale at which the eigenvalue-based features give the minimum entropy (Demantké et al. 2011). The scale with the lowest entropy is the scale at which the local neighborhood has the optimal shape, i.e., the least uncertainty in realizing the geometric class of the point. A variant of the method uses a local minimum of the entropy instead of the global minimum. Overall, this way of aggregating multiple scales involves determining the optimal scale s_opt and using the feature vector computed at s_opt; it is a conservative method, as the multiscale aggregation does not involve feature aggregation. The second method concatenates the multiscale features into a long vector (Hackel et al. 2016). This method is apt for classification with both supervised learning using random forest classifiers and neural networks. The long feature vector is most effective where locally significant features are determined and used, and in that case it is equivalent to the optimal scale method. The third method performs a map-reduce operation on the multiscale features. One way is to average the features; however, the semantics of the features must be considered before performing the averaging operation (Sreevalsan-Nair and Kumari 2017). For instance, viewing the different scales as mutually exclusive events justifies the averaging operation as the computation of the likelihood of the value of the concerned feature. The feature vectors computed using each of these three methods have been found to be effective for the classification of different airborne and terrestrial laser-scanned point clouds.
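The following sketch (not from the original entry) illustrates these ideas on a synthetic point cloud: covariance eigenvalues are computed for k-nearest neighborhoods at several scales, the Shannon entropy of the normalized eigenvalues is used to pick an optimal scale per point, and the concatenation and averaging alternatives are also shown. The scale list and the entropy-based selection rule follow the description above; the specific values are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(42)
points = rng.uniform(0.0, 10.0, size=(500, 3))   # synthetic point cloud
scales = [10, 20, 40]                            # neighbor counts k used as scales
tree = cKDTree(points)

def eigen_features(neighborhood):
    """Sorted (descending), normalized eigenvalues of the 3x3 covariance."""
    centered = neighborhood - neighborhood.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood)
    evals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    return evals / evals.sum()

def shannon_entropy(evals, eps=1e-12):
    return -np.sum(evals * np.log(evals + eps))

per_scale = []                                   # one (n_points, 3) array per scale
for k in scales:
    _, idx = tree.query(points, k=k)
    per_scale.append(np.array([eigen_features(points[i]) for i in idx]))

stacked = np.stack(per_scale)                    # (n_scales, n_points, 3)
entropy = np.apply_along_axis(shannon_entropy, 2, stacked)

# Method 1: keep the features at the per-point optimal (minimum-entropy) scale.
best = entropy.argmin(axis=0)
optimal_scale_features = stacked[best, np.arange(len(points)), :]

# Method 2: concatenate features from all scales into one long vector.
concatenated = np.concatenate(per_scale, axis=1)

# Method 3: map-reduce, here a simple average across scales.
averaged = stacked.mean(axis=0)

print(optimal_scale_features.shape, concatenated.shape, averaged.shape)
```

In practice the neighborhood types, scale ranges, and additional height-based features would follow the cited works; the sketch only shows the aggregation logic.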
Future Scope
Scale selection continues to be an important problem for multiscaling. One solution, for images, uses a general methodology for spatial, temporal, and combined spatiotemporal dimensions (Lindeberg 2018). Most often, scale selection in image processing and computer vision is applied sparsely, in regions of interest (ROIs). The general methodology, on the contrary, performs dense scale selection, where local extrema, such as minimum entropy, are computed at all points in the image and at all time instances. However, mechanisms involving local extrema suffer from a strong dependency on the local order used for finding the locally dominant differential structure. This problem of phase dependency of local scales can be reduced by performing either post-smoothing or local phase compensation.
Machine learning (ML) methods have been applied to several problems in solid Earth geosciences (sEg), but with limited impact, owing to limitations in data quality. Various physical systems in sEg are complex and nonlinear, implying that the data they generate are complex, with multiresolution structures in both space and time; e.g., fluid injection and earthquakes are strongly nonstationary (Bergen et al. 2019). Hence, the need of the hour is to generate more annotated data using multiscaling, which can model multiresolution data effectively. This can remove biases toward frequent or well-characterized phenomena, thus improving the effectiveness of ML algorithms, especially those that rely on training datasets. Altogether, as these illustrative examples show, multiscaling is a powerful tool that has been used successfully in several geoscientific applications, with an equally large scope for influencing future applications.
References
Bergen KJ, Johnson PA, Maarten V, Beroza GC (2019) Machine learning for data-driven discovery in solid Earth geoscience. Science 363(6433):eaau0323
Demantké J, Mallet C, David N, Vallet B (2011) Dimensionality based scale selection in 3D LiDAR point clouds. Int Arch Photogramm Remote Sens Spat Inf Sci 38(Part 5):W12
Hackel T, Wegner JD, Schindler K (2016) Fast semantic segmentation of 3D point clouds with strongly varying density. ISPRS Ann Photogramm Remote Sens Spat Inf Sci 3(3):177–184
Lin H (2003) Hydropedology: bridging disciplines, scales, and data. Vadose Zone J 2(1):1–11
Lindeberg T (1998) Feature detection with automatic scale selection. Int J Comput Vis 30(2):79–116
Lindeberg T (2018) Dense scale selection over space, time, and space-time. SIAM J Imag Sci 11(1):407–441
Regenauer-Lieb K, Veveakis M, Poulet T, Wellmann F, Karrech A, Liu J, Hauser J, Schrank C, Gaede O, Trefry M (2013) Multiscale coupling and multiphysics approaches in earth sciences: theory. J Coupled Syst Multiscale Dyn 1(1):49–73
Sagar BSD, Cheng Q, Agterberg F (2018) Handbook of mathematical geosciences: fifty years of IAMG. Springer, Cham
Sreevalsan-Nair J, Kumari B (2017) Local geometric descriptors for multi-scale probabilistic point classification of airborne lidar point clouds. In: Modeling, analysis, and visualization of anisotropy. Springer, Cham, pp 175–200
Zhao W (2001) Multiscale analysis for characterization of remotely sensed images. PhD thesis, Louisiana State University, Baton Rouge
Multivariate Analysis
Monica Palma1 and Sabrina Maggio2
1 Università del Salento-Dip. Scienze dell'Economia, Complesso Ecotekne, Lecce, Italy
2 Dipartimento di Scienze dell'Economia, Università del Salento, Lecce, Italy
Synonyms
Multiple analysis; Multivariable statistics; Multivariate statistics
Definition
Multivariate analysis refers to a collection of methods and techniques for studying the relationships characterizing data sets that concern two or more variables of the phenomenon of interest. These techniques are usually applied in the stage of exploratory data analysis, with the objective of identifying a reduced number of variables/factors which adequately describe the main characteristics shown by the data.
Multivariate Analysis in Geosciences
In several studies in the environmental sciences, the phenomenon of interest is often the result of the simultaneous interaction of different variables observed at some locations of the survey area. In this case, the available data present a multivariate spatial structure, and the study is usually conducted on both (a) the spatial correlation which characterizes each observed variable and (b) the relationships existing among the variables. The first aspect is directly connected to the spatial structure of the data (spatially closer locations often have more similar values than observations recorded at more distant locations), while the second aspect is related not only to the spatial features but also to the nature of the phenomenon and the physical-chemical characteristics that link the variables to each other.
Often in the study of natural phenomena, such as climatic and atmospheric conditions in meteorology, or the concentrations of pollutants in the environmental sciences, the variables of interest are observed at different points of the survey area for several instants of time. In this case, the data set presents a multivariate space-time structure, and the analyses must jointly consider the space-time correlation which characterizes each variable, as well as the relationships existing among the variables. In geosciences, large data sets of multiple variables measured at several spatial or spatiotemporal points of the study area can be analyzed by using (a) classical multivariate statistical techniques or (b) multivariate geostatistical tools. Evidently, the research objectives are different even if complementary. Indeed, techniques of type (a) are usually applied in the stage of exploratory data analysis, most frequently as data reduction techniques; on the other hand, geostatistical tools are used with the aim of properly defining a model for the spatial/spatiotemporal multivariate correlation shown by the data and predicting the variables at unobserved points of the domain. It is worth pointing out that compositional data often arise in geosciences: multivariate data consisting of vectors of positive components subject to a unit-sum constraint (Aitchison 1982). In such a context, the use of traditional multivariate analysis, which has been developed for unconstrained data, is not suitable. Because of the properties of compositional data (Aitchison 1994), it is necessary to work in terms of the corresponding log-ratio vectors and consequently to apply compositional data analysis in a log-ratio space.
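As a hedged illustration of the log-ratio idea mentioned above (not part of the original entry), the following sketch applies the centered log-ratio (clr) transform to a small synthetic composition; the data values are invented for the example.

```python
import numpy as np

def clr(compositions):
    """Centered log-ratio transform of compositions (rows sum to 1)."""
    x = np.asarray(compositions, dtype=float)
    logx = np.log(x)
    # Subtract the log of the geometric mean of each row.
    return logx - logx.mean(axis=1, keepdims=True)

# Synthetic three-part compositions (e.g., sand/silt/clay fractions).
comp = np.array([[0.60, 0.30, 0.10],
                 [0.20, 0.50, 0.30],
                 [0.45, 0.45, 0.10]])
z = clr(comp)
print(z)
print("clr rows sum to ~0:", np.allclose(z.sum(axis=1), 0.0))
```

Standard multivariate techniques (e.g., PCA) can then be applied to the clr-transformed values rather than to the raw constrained proportions.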
Multivariate Statistical Techniques
The traditional multivariate statistical techniques, widely used for applications in the social sciences, are usually applied in the stage of exploratory data analysis. A non-exhaustive list of multivariate techniques includes:
• Principal component analysis (PCA), which linearly transforms the observed variables into a few uncorrelated variables that maximize the data variability
• Canonical correlation analysis (CCA), which studies the relationships between two groups of variables representing different attributes of the same phenomenon
• Correspondence analysis (CorA), which is usually applied for studying the association between categorical variables, representing the variables' categories in a low-dimensional space
• Cluster analysis (CA), which tries to identify homogeneous groups of observations such that the dissimilarity among groups is large
• Discriminant analysis (DA), which allows the identification of a linear combination of the observed variables that best separates two or more classes of observed data
Among these techniques, PCA is one of the most used, because of its computational speed as well as the simplicity of interpreting its results. Moreover, interesting insights into the exploration of the variables come from the results obtained through CCA. The fundamentals of the above-mentioned techniques of classical multivariate analysis are reported in the next sections.
Main Aspects of PCA
PCA is typically a technique for data dimensionality reduction: PCA transforms (linearly) several correlated variables into a few unrelated variables which maximize the variance characterizing the data (Hotelling 1933; Lebart et al. 1984; Jobson 1992; Jolliffe 2002). PCA can also be used to understand possible relationships between variables, as well as to pre-process the data and eliminate the redundancy caused by the variables' correlation, before variographical analysis and modeling. It is worth highlighting that in applications of PCA the data are considered apart from the spatial or spatiotemporal points at which they are collected; in this context, the sample points represent the cases.
Let X be the (N × r) data matrix with N observations recorded for r variables (obviously, r is much greater than 2). In other words, the columns of X refer to the variables under study, while the rows of X refer to the measurements of the variables at the spatial or spatiotemporal points. Through PCA, the data matrix X is linearly transformed into an (N × r) matrix Y, as follows:

Y = X Q,

where Y, usually called the score matrix, contains r orthogonal and zero-mean components, while Q, called the loading matrix, is the (r × r) matrix of the eigenvectors of the variance-covariance matrix

S = (1/N) (X − M)^T (X − M),   (1)

where the (N × r) matrix M is the matrix of the variables' mean values. Matrix Q is determined such that its first column represents the direction with the maximum variance in the data, the second column of Q represents the direction of the next largest variance, and so on. Hence, PCA corresponds simply to the eigenvalue analysis of the variance-covariance matrix S,

S = Q C Q^T, with Q^T Q = I,

where C is the diagonal matrix of the eigenvalues of S. Moreover, as S is a positive semi-definite matrix, its eigenvalues are all non-negative; thus, by arranging the eigenvalues in descending order, it is possible to select the components which correspond to the greatest eigenvalues; these components are named principal since they explain most of the total variance in the observed data. Therefore, if r′ < r is the number of chosen principal components corresponding to the highest eigenvalues, then the data matrix X is adequately approximated as

X ≈ Y Q^T,

where Y is now the (N × r′) matrix of the r′ selected components and Q is the (r × r′) matrix of the corresponding eigenvectors. Note that the variance-covariance matrix S in (1) can be used if the variables under study are expressed in the same measurement units; otherwise, when the measurement units of the variables differ in size and type, the observed variables first need to be standardized, and the standardized variables are then used to compute the correlation matrix, which is scale independent. An interesting application of PCA on a multivariate environmental data set is in De Iaco et al. (2002).
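A minimal numerical sketch of the PCA described above (not from the original entry; the data matrix is synthetic) using NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[:, 3] = 0.8 * X[:, 0] + 0.2 * rng.normal(size=200)   # induce correlation

# Center the data and form S = (1/N) (X - M)^T (X - M), as in Eq. (1).
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / X.shape[0]

# Eigen-decomposition S = Q C Q^T; sort eigenvalues in descending order.
evals, Q = np.linalg.eigh(S)
order = np.argsort(evals)[::-1]
evals, Q = evals[order], Q[:, order]

scores = Xc @ Q                                   # score matrix Y
explained = evals / evals.sum()
print("proportion of variance per component:", np.round(explained, 3))

# Keep the first r' components that explain, say, 90% of the variance.
r_prime = int(np.searchsorted(np.cumsum(explained), 0.90) + 1)
X_approx = scores[:, :r_prime] @ Q[:, :r_prime].T + X.mean(axis=0)
print("r' =", r_prime, "reconstruction RMSE:",
      np.sqrt(((X - X_approx) ** 2).mean()))
```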
Main Aspects of CCA
CCA is a classic multivariate statistical technique that studies the relationships between two groups of variables which represent different attributes of the same phenomenon, for example, air pollutants and atmospheric variables, or concentrations of minerals and soil geologic features. By performing CCA it is possible to identify pairs of factors relating the two groups of variables under study, among which the correlation is maximum (Lebart et al. 1984; Jobson 1992).
Given the (N × r) data matrix X with N measurements (matrix rows) for the r variables (matrix columns) under study, the centered matrix

Xc = X − M

is considered for CCA, where M is the matrix of the variables' mean values. According to the main features of the analyzed phenomenon, it is possible to define two different groups of, respectively, r1 and r2 variables, with r1 + r2 = r. Consequently, Xc can be obtained through the horizontal concatenation of two matrices, the (N × r1) matrix X1c and the (N × r2) matrix X2c; in this way, the r columns of Xc correspond to the r1 columns of X1c plus the r2 columns of X2c, i.e.,

Xc = [X1c | X2c].

Given the variance-covariance matrix for Xc,

S = (1/N) Xc^T Xc = (1/N) [X1c | X2c]^T [X1c | X2c] =
[ S11  S12 ]
[ S21  S22 ],

where
• S11 is the (r1 × r1) covariance matrix of the variables of the first group,
• S22 is the (r2 × r2) covariance matrix of the variables of the second group,
• S12 and S21 are, respectively, the (r1 × r2) and (r2 × r1) matrices of the covariances between the variables of the two groups, with S12^T = S21,
CCA is applied in order to identify the matrices A1 and A2 which maximize the correlation between

Y1 = X1c A1 and Y2 = X2c A2.

Note that the linear combinations Y1 and Y2 represent the canonical variates, while A1 and A2 are the canonical weights. The canonical correlation coefficient, given by

r(Y1, Y2) = (A1^T S12 A2) / sqrt((A1^T S11 A1)(A2^T S22 A2)),   (2)

measures the correlation between Y1 and Y2. Hence, finding A1 and A2 such that the coefficient (2) is maximum corresponds to solving the following problem:

max over (A1, A2) of A1^T S12 A2,

under the condition that

A1^T S11 A1 = A2^T S22 A2 = 1.

Finally, the inspection of the magnitudes and signs of A1 and A2 allows the analyst to detect the variables which maximize the correlation, as well as to gain insight into the contrasts among the variables under study. A very interesting application of CCA on a multivariate environmental data set is in De Iaco (2011).
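The following is a minimal sketch of CCA as described above (not part of the original entry), solved through a singular value decomposition of the whitened cross-covariance matrix; the two synthetic variable groups are illustrative assumptions.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

rng = np.random.default_rng(2)
n = 300
latent = rng.normal(size=n)
X1 = np.column_stack([latent + 0.5 * rng.normal(size=n),
                      rng.normal(size=n)])               # first group (r1 = 2)
X2 = np.column_stack([2.0 * latent + rng.normal(size=n),
                      rng.normal(size=n),
                      rng.normal(size=n)])               # second group (r2 = 3)

X1c, X2c = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
S11, S22, S12 = X1c.T @ X1c / n, X2c.T @ X2c / n, X1c.T @ X2c / n

# K = S11^{-1/2} S12 S22^{-1/2}: its singular values are the canonical
# correlations and its singular vectors yield the canonical weights A1, A2.
K = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
U, d, Vt = np.linalg.svd(K)
A1, A2 = inv_sqrt(S11) @ U, inv_sqrt(S22) @ Vt.T

print("canonical correlations:", np.round(d, 3))
Y1, Y2 = X1c @ A1[:, 0], X2c @ A2[:, 0]
print("correlation of the first canonical pair:",
      round(float(np.corrcoef(Y1, Y2)[0, 1]), 3))
```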
Main Aspects of CorA
The theory of CorA is discussed in several books (Benzécri 1983; Lebart et al. 1984; Greenacre 1989); therefore, only the main characteristics of this multivariate technique are reviewed here.
Let X be the observed (l × p) two-way contingency table, where each entry x_ji represents the joint absolute frequency associated with the j-th and i-th attributes of two categorical variables, with j = 1, ..., l, and i = 1, ..., p. By performing CorA, the best simultaneous representation of the rows and columns of X is determined in a low-dimensional space. Let

D_l = diag[f_j·],   (3)

and

D_p = diag[f_·i],   (4)

be the two diagonal matrices of the marginal relative frequencies, computed as

f_j· = Σ_{i=1}^{p} f_ji,   j = 1, ..., l,   (5)

and

f_·i = Σ_{j=1}^{l} f_ji,   i = 1, ..., p,   (6)

where

f_ji = x_ji / (Σ_{j=1}^{l} Σ_{i=1}^{p} x_ji),   j = 1, ..., l,  i = 1, ..., p.   (7)

Denoting by F = [f_ji] the (l × p) matrix of the relative frequencies, CorA finds a vector u which maximizes

u^T (D_l^{-1} F D_p^{-1})^T D_l (D_l^{-1} F D_p^{-1}) u,   (8)

under the constraint

u^T D_p^{-1} u = 1,   (9)

in a p-dimensional space. In the same way, CorA determines a vector v which maximizes

v^T (D_p^{-1} F^T D_l^{-1})^T D_p (D_p^{-1} F^T D_l^{-1}) v,   (10)

under the constraint

v^T D_l^{-1} v = 1,   (11)

in an l-dimensional space.
The following factors,

F = D_p^{-1} u and C = D_l^{-1} v,

define the plane onto which the rows and columns of X are projected. The CorA diagram (biplot) allows the user to find latent relationships among the categories of the observed variables (Lebart et al. 1984). An interesting application of CorA on a spatiotemporal environmental data set can be found in Palma (2015).
Main Aspects of CA
CA is a technique of multivariate analysis used for grouping a set of variables in such a way that variables which are “similar” to each other belong to the same group (called a cluster), while variables which are “not similar” belong to different groups. Similarity is meant in the sense that variables whose characteristics are homogeneous within each group and different from those of the variables belonging to other groups can be considered similar (Everitt 1974). This goal can be achieved by various algorithms which differ deeply in the way the final clusters are identified. In the literature, several clustering algorithms have been developed, which can be classified into hierarchical and non-hierarchical (or partitional) clustering algorithms.
Hierarchical clustering techniques generate the well-known dendrogram (from the Greek words “dendro,” meaning tree, and “gramma,” meaning drawing), i.e., a tree diagram which shows the clustering sequence. These methods of clustering can be:
(a) Divisive, if the algorithm starts from a single cluster and step by step splits the cluster(s) until each cluster consists of a single variable
(b) Agglomerative, if it starts with as many clusters as there are observed variables, and then step by step the most similar clusters are merged together until all the variables are assigned to a single cluster.
On the other hand, non-hierarchical clustering approaches divide the data set into a user-selected number of clusters, minimizing a certain optimization criterion (e.g., a squared error function). These techniques are therefore based on iterative algorithms which find the final clustering when a specific stopping criterion is satisfied (e.g., a maximum number of iterations fixed by the user). In Zhou et al. (2018), a good application of CA on geochemical data is discussed.
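A brief sketch of the two families of clustering algorithms described above (not from the original entry), using SciPy for agglomerative hierarchical clustering and scikit-learn for the partitional k-means; the synthetic geochemical-style data are an illustrative assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Two synthetic groups of samples (e.g., two geochemical populations).
data = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(30, 4)),
                  rng.normal(loc=4.0, scale=1.0, size=(30, 4))])

# Agglomerative hierarchical clustering (Ward linkage), cut into 2 clusters.
Z = linkage(data, method="ward")
hier_labels = fcluster(Z, t=2, criterion="maxclust")

# Partitional clustering with a user-selected number of clusters (k-means).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)

print("hierarchical cluster sizes:", np.bincount(hier_labels)[1:])
print("k-means cluster sizes:", np.bincount(kmeans_labels))
```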
Main Aspects of DA
DA, introduced by Fisher (1936) to classify some fossil remains as monkeys or humanoids, is a very useful technique of multivariate analysis. In particular, this technique allows the definition of a linear combination of the observed variables which best separates two or more classes of cases. By applying DA to a multivariate data set, a model (called the discriminant function) for the classification of the cases under study is defined. Unlike CA, this method starts from a known classification of the cases under study; subsequently, by defining a latent discriminant function which synthesizes the explanatory (quantitative) variables, new cases can be assigned to one of the groups of the primary (qualitative) variable. Hence, DA can be applied for descriptive aims, namely, to classify cases, as well as for prediction purposes, namely, to assign a new case to a specific group.
Consider an (N × r) matrix X of r ≥ 2 variables observed on N sample cases. Moreover, assume that the cases under study belong to K mutually exclusive groups defined by the K different attributes of a categorical variable. Each group is composed of N_k cases, such that Σ_{k=1}^{K} N_k = N. By applying DA to the matrix X, a linear combination Y = A^T X of the r variables is obtained such that the K groups are best separated. In other words, among all the linear combinations of the observed variables, DA identifies those characterized by the maximum between-group variance (so that the differences between groups are maximized) and the minimum within-group variance (so that the spread within the groups is minimized). Hence, the ratio of the between-group variance to the within-group variance has to be maximized with respect to the vector A. The function which maximizes this ratio is called the first discriminant function. The obtained function can then be used for predictive purposes, in order to assign new cases to the groups. Lonoce et al. (2018) proposed an interesting application of DA on a multivariate data set concerning archaeological measurements.
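A minimal sketch of Fisher's discriminant for two groups (not from the original entry), using scikit-learn's LinearDiscriminantAnalysis on synthetic data; the group means and the new case are invented for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)
# Two known groups (K = 2) measured on r = 3 variables.
group_a = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(40, 3))
group_b = rng.normal(loc=[3.0, 1.0, -2.0], scale=1.0, size=(40, 3))
X = np.vstack([group_a, group_b])
y = np.array([0] * 40 + [1] * 40)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

new_case = np.array([[2.5, 0.8, -1.5]])          # case of unknown origin
print("predicted group:", lda.predict(new_case)[0])
print("posterior probabilities:", np.round(lda.predict_proba(new_case), 3))
print("discriminant weights (vector A):", np.round(lda.coef_, 3))
```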
Multivariate Geostatistical Tools
In geostatistics, one of the main objectives is to estimate the variable of interest at unsampled points of the domain. For this aim, a model is required which adequately describes the spatial or spatiotemporal correlation of the variable under study. In the univariate context, techniques for
modeling and estimating spatial/spatiotemporal data are widely developed (Chilés and Delfiner 1999; Cressie 1993). Moreover, the existence in the literature of large classes of admissible correlation functions (covariances or variograms), from which the most suitable function for the variable under study can be selected, greatly simplifies the modeling stage. On the other hand, in multivariate geostatistics, only a few models have been proposed that can explain both the correlation which characterizes each variable (often called the direct correlation) and the one existing between the observed variables (known as the cross-correlation). Furthermore, several issues related to (a) the spatial or spatiotemporal sampling, (b) the choice of valid multivariate models, and (c) the lack of automatic fitting procedures make the modeling and estimation tasks quite complex.
In multivariate geostatistics, the concept of regionalization, first introduced by Matheron (1965) to highlight the close link between the observed values of one variable and the locations at which the values are measured, has been extended to the concept of coregionalization, referring to regionalized variables which are spatially or spatiotemporally dependent on each other. The probabilistic approach to coregionalization is based on the concept of a multivariate random function (MRF) in space or space-time.
Let {Z_i(s), s ∈ D ⊆ ℝ^d}, i = 1, ..., r, be r random functions (RFs) on the spatial domain D, where s = (s_1, s_2, ..., s_d) are the spatial coordinates (usually d ≤ 3). The vector

Z(s) = [Z_1(s), Z_2(s), ..., Z_r(s)]^T   (12)

represents a spatial MRF. Hence, the RFs Z_i(s), i = 1, ..., r, which compose the vector (12) are associated with the observed variables. In particular, the measurements z_i(s_α), α = 1, ..., N_i, i = 1, ..., r, are considered as realizations of a spatial MRF Z (Fig. 1).
Multivariate Analysis, Fig. 1 Example of a sampling scheme for two variables over a 2D spatial domain
As already pointed out, natural phenomena are usually characterized by a space-time evolution and are the result of the joint interaction of different correlated variables. Evidently, the analysis of these processes requires the measurement of two or more variables, at some locations of the survey area and for several time points. Therefore, the data set presents a multivariate space-time structure, and the analysis of its spatiotemporal distribution can be adequately performed through space-time multivariate geostatistical techniques. In this case, the MRF definition can be extended to the spatiotemporal case. In particular, let

Z_i(s, t), (s, t) ∈ D × T ⊆ ℝ^{d+1}, i = 1, ..., r,

be r spatiotemporal RFs over the domain D × T, where s ∈ D ⊆ ℝ^d (usually d ≤ 3) and t ∈ T ⊆ ℝ is the time point. Hence, the vector

Z(s, t) = [Z_1(s, t), ..., Z_r(s, t)]^T   (13)

represents a spatiotemporal MRF. Given the RFs Z_i, i = 1, ..., r, related to the r ≥ 2 spatiotemporal variables under study, the measurements z_i(s, t)_α, α = 1, ..., N_i, i = 1, ..., r, are assumed to be realizations of the spatiotemporal MRF Z (Fig. 2).
Multivariate Analysis, Fig. 2 Example of a spatiotemporal sampling scheme for two variables (2D spatial domain)
In multivariate geostatistics, the modeling procedure is essentially based on the identification of a spatial/spatiotemporal correlation model between the components of the MRF which is able to describe the overall behavior of the phenomenon under study. For multivariate spatial data, several models have been explored (Goovaerts 1997; Wackernagel 2003; Gelfand et al. 2004; Gneiting et al. 2010), while in the space-time domain, modeling approaches and methodologies have been discussed in Christakos (1992), Christakos et al. (2002), Fassó and Finazzi (2011), and De Iaco et al. (2019). However, in both the spatial and the spatiotemporal context, the linear coregionalization model (LCM) is still the most used model to explain the correlation among the RFs of a multivariate process. The computational flexibility which characterizes the LCM represents its main strength, despite some limitations of the model which will be pointed out later on.
Let Z(u) = [Z_1(u), Z_2(u), ..., Z_r(u)]^T be a vector of second-order stationary RFs, in a spatial domain when u = s ∈ D ⊆ ℝ^d or in a space-time domain when u = (s, t) ∈ D × T ⊆ ℝ^{d+1}, with d ≤ 3. Then the first- and second-order moments of Z are, respectively,

E[Z(u)] = M = [m_1, ..., m_r]^T,  ∀u ∈ D × T,   (14)

where m_i = E[Z_i(u)], i = 1, ..., r, for all points u of the domain, and

C(h) = [C_ik(h)],   (15)

with C_ii(h) = Cov{Z_i(u), Z_i(u′)} the direct covariances and C_ik(h) = Cov{Z_i(u), Z_k(u′)} the cross-covariances (i ≠ k) of the r RFs of Z; the vector h represents the separation vector between the points u and u′. In particular, h = h_s in the spatial case, where u = s and u′ = s′, while h = (h_s, h_t), with h_s = (s′ − s) and h_t = (t′ − t), in the spatiotemporal case, where u = (s, t) and u′ = (s′, t′).
Given the orthogonal vectors Y_p(u) = [Y_p^1(u), ..., Y_p^r(u)]^T, p = 1, ..., P, whose r components are second-order stationary RFs, with

E[Y_p^v(u)] = 0,  v = 1, ..., r,  p = 1, ..., P,

and

Cov{Y_p^v(u), Y_{p′}^{v′}(u′)} = c_p(h) if v = v′ and p = p′, and 0 otherwise,

by assuming the LCM, the MRF Z is defined as follows:

Z(u) = Σ_{p=1}^{P} A_p Y_p(u) + M,   (16)

where A_p, p = 1, ..., P, are the (r × r) matrices of coefficients and M is the vector defined in (14). In other words, the LCM assumes that each Z_i is described through a linear combination of r orthogonal RFs Y_p^v, v = 1, ..., r, p = 1, ..., P, at P different variability scales. Hence, in the LCM, the matrix C(h) in (15) can be expressed as a linear combination of the basic covariance functions c_p, i.e.,

C(h) = Σ_{p=1}^{P} B_p c_p(h),   (17)

where the (r × r) matrices

B_p = A_p A_p^T,  p = 1, ..., P,

must be positive-definite for the admissibility of the above model.
In spite of the computational simplicity and flexibility of the coregionalization model, in both the spatial and the spatiotemporal context (Wackernagel 2003; De Iaco et al. 2010, 2012), some limitations of the model need to be highlighted. In particular, in the coregionalization model, the cross-covariance functions satisfy the following symmetry properties:

C_ik(h) = C_ki(h),   (18)

C_ik(h) = C_ik(−h),   (19)

with i, k = 1, ..., r, i ≠ k. Evidently, in the spatiotemporal context, h = (h_s, h_t) and −h = (−h_s, −h_t). As is known, in general the cross-covariance functions are not symmetric and do not satisfy conditions (18) and (19). In some applications the correlation between two variables is asymmetric with respect to the separation vector h; therefore, assuming symmetry for the cross-covariances can produce unreliable results. In this context, it could be useful to define covariance functions which describe the direct and cross-correlation among the variables and do not satisfy the above properties. It is worth noting that Grzebyk (1993) extended the LCM to the complex domain, allowing the use of not-even cross-covariance functions, and more recently Li and Zhang (2011) discussed asymmetric cross-covariance functions.
In the coregionalization model, the cross-covariance functions satisfy not only the symmetry assumptions (viz., invariance with respect to the spatial or spatiotemporal separation
vector, as well as to the exchange of the variables) but also the non-separability condition. In other words, it is assumed that the cross-covariances C_ik, i, k = 1, ..., r, are not proportional, namely, they are not described by the following intrinsic model:

C_ik(h) = a_ik ρ(h),

where ρ(·) represents a correlation function, while a_ik, i, k = 1, ..., r, are the elements of an (r × r) positive-definite matrix. Several methodologies have been developed in the literature to assess the symmetry and separability/non-separability conditions of the cross-covariances. Therefore, before adopting the coregionalization model to synthesize the multivariate spatial or spatiotemporal correlation among the analyzed variables, it is convenient to check whether the symmetry and non-separability conditions are satisfied, as thoroughly discussed in De Iaco et al. (2019) and Cappello et al. (2021).
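To illustrate the structure of the LCM in (16)–(17), the following is a sketch added here (not part of the original entry) that builds a matrix-valued covariance C(h) from two basic exponential covariance functions and two positive semi-definite coregionalization matrices B_p = A_p A_p^T; the ranges and coefficient matrices are illustrative assumptions.

```python
import numpy as np

def exponential_cov(h, practical_range):
    """Basic covariance c_p(h) with unit sill (exponential model)."""
    return np.exp(-3.0 * np.abs(h) / practical_range)

rng = np.random.default_rng(5)
r, P = 2, 2                                   # two variables, two spatial scales

# Coefficient matrices A_p; B_p = A_p A_p^T is positive semi-definite by construction.
A = [rng.normal(size=(r, r)) for _ in range(P)]
B = [a @ a.T for a in A]
ranges = [50.0, 400.0]                        # short- and long-range structures

def lcm_covariance(h):
    """Matrix-valued covariance C(h) = sum_p B_p c_p(h)."""
    return sum(Bp * exponential_cov(h, rp) for Bp, rp in zip(B, ranges))

for h in (0.0, 100.0, 500.0):
    C_h = lcm_covariance(h)
    print(f"h = {h:5.0f}:  C11 = {C_h[0, 0]:7.3f}  C12 = {C_h[0, 1]:7.3f}  "
          f"C22 = {C_h[1, 1]:7.3f}")
```

Note that, by construction, C_12(h) = C_21(h) and C_ik(h) = C_ik(−h), which is exactly the symmetry limitation of the LCM discussed above.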
Summary
The study of a multivariate data set concerning two or more variables of the phenomenon of interest can be done by using either classical multivariate statistical techniques, mainly for exploratory purposes (detection of specific features or identification of a reduced number of hidden factors which are common to the observed variables), or multivariate geostatistical techniques, to model the spatial/spatiotemporal correlation which characterizes the analyzed data.
Cross-References
▶ Compositional Data
▶ Exploratory Data Analysis
▶ Principal Component Analysis
▶ Spatiotemporal
▶ Spatial Data
Bibliography
Aitchison J (1982) The statistical analysis of compositional data. J R Stat Soc Ser B Methodol 44(2):139–177
Aitchison J (1994) Principles of compositional data analysis. In: Anderson TW, Fang KT, Olkin I (eds) Institute of Mathematical Statistics Lecture Notes, Monograph Series, vol 24, pp 73–81
Benzécri JP (1983) Histoire et préhistoire de l'analyse des données. Dunod, Paris
Cappello C, De Iaco S, Palma M, Pellegrino D (2021) Spatio-temporal modeling of an environmental trivariate vector combining air and soil measurements from Ireland. Spat Stat 42:1–18
Chilés J, Delfiner P (1999) Geostatistics: modeling spatial uncertainty. Wiley, New York
Christakos G (1992) Random field models in earth sciences, 1st edn. Academic Press
Christakos G, Bogaert P, Serre M (2002) Temporal GIS: advanced functions for field-based applications. Springer
Cressie N (1993) Statistics for spatial data. Wiley, New York
De Iaco S (2011) A new space-time multivariate approach for environmental data analysis. J Appl Stat 38:2471–2483
De Iaco S, Maggio M, Palma M, Posa D (2012) Chapter 14: Advances in spatio-temporal modeling and prediction for environmental risk assessment. In: Haryanto B (ed) Air pollution: a comprehensive perspective. IntechOpen, Croatia, pp 365–390
De Iaco S, Myers DE, Posa D (2002) Space-time variograms and a functional form for total air pollution measurements. Comput Stat Data Anal 41(2):311–328
De Iaco S, Myers DE, Palma M, Posa D (2010) FORTRAN programs for space-time multivariate modeling and prediction. Comput Geosci 36(5):636–646
De Iaco S, Palma M, Posa D (2019) Choosing suitable linear coregionalization models for spatiotemporal data. Stoch Environ Res Risk Assess 33:1419–1434
Everitt B (1974) Cluster analysis. Social Science Research Council, Heinemann, London
Fassó A, Finazzi F (2011) Maximum likelihood estimation of the dynamic coregionalization model with heterotopic data. Environmetrics 22:735–748
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugenics 7(2):179–188
Gelfand AE, Schmidt AM, Banerjee S, Sirmans CF (2004) Nonstationary multivariate process modeling through spatially varying coregionalization. Test 13:263–312
Gneiting T, Kleiber W, Schlather M (2010) Matérn cross-covariance functions for multivariate random fields. J Am Stat Assoc 105(491):1167–1177
Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York
Greenacre MJ (1989) Theory and applications of correspondence analysis. Academic Press, London
Grzebyk M (1993) Ajustement d'une corégionalisation stationnaire. Doctoral thesis, Ecole des Mines de Paris, France
Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417–441
Jobson JD (1992) Applied multivariate data analysis, 2nd edn. Springer, New York
Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, Berlin
Lebart L, Morineau A, Warwick KM (1984) Multivariate descriptive statistical analysis. Wiley, New York
Li B, Zhang H (2011) An approach to modeling asymmetric multivariate spatial covariance structures. J Multivar Anal 102:1445–1453
Lonoce N, Palma M, Viva S, Valentino M, Vassallo S, Fabbri PF (2018) The Western (Buonfornello) necropolis (7th to 5th BC) of the Greek colony of Himera (Sicily, Italy): site-specific discriminant functions for sex determination in the common burials resulting from the battle of Himera (ca. 480 BC). Int J Osteoarchaeol 28:766–774
Matheron G (1965) La Theorie des Variables Regionalisees et ses Applications. Masson, Paris
Palma M (2015) Correspondence analysis on a space-time data set for multiple environmental variables. Int J Geosci 6:1154–1165
Wackernagel H (2003) Multivariate geostatistics: an introduction with applications. Springer, Berlin
Zhou S, Zhou K, Wang J, Yang G (2018) Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies. Front Earth Sci 12:491–505
Multivariate Data Analysis in Geosciences, Tools
John H. Schuenemeyer
Southwest Statistical Consulting, LLC, Cortez, CO, USA
Definition
Multivariate data occurs frequently in the geosciences. Graphical and statistical tools are applied to extract information from large, complex data sets. Methods are suggested for different types of problems.
Introduction
Multivariate analysis in the geosciences begins with an exploratory procedure that allows the analyst to gain a better understanding of a complex data set. Data is obtained in many ways. The preferred method would be designed experiments, often specified by a formal experimental or sampling design, since they yield more generalizable results. However, because of the cost of sampling (the need for ships, satellites, deep drilling), time sensitivity (floods and earthquakes), and accessibility of sites (access to land, harsh conditions), this may not be possible. This entry discusses procedures by categories keyed to data analytic tasks. The tasks within categories include reducing the number of variables, clustering observations, gaining insight into correlation among variables, and classifying observations. Most of the variables presented here are metric variables, i.e., variables that can be measured quantitatively; these include interval, ratio, and continuous variables. Examples are from the geosciences, broadly defined to include energy, geology, archaeology, climate, and ecology. The procedures used to understand data obtained in these disciplines overlap. Geochemical data is common in geosciences; often it is partly compositional data, which will be presented after a discussion of categories. Selected references are provided. A general reference with geoscience examples is Schuenemeyer and Drew (2011).
Categories
The categories below list specific tasks that need to be addressed while performing multivariate exploratory data analysis. Under each category, one or more approaches are presented. In the section after this, alternative approaches are presented:
1. Graphical procedures. Graphical procedures are usually the first stage in multivariate exploratory data analysis and frequently are an extension of those used in univariate or bivariate analysis. They are used in the methods described in the following categories. Boxplots are one of the most effective ways to compare distributions of several variables expressed in the same unit. Density plots are alternatives. Using three dimensions plus color, size, and shape can extend a two-dimensional scatter plot to six dimensions; in addition, such a display can be rotated. Faceting or grouping allows plotting of multiple variables on a single graph such as a histogram or boxplot.
2. Dimensionality reduction. Data sets in the earth sciences sometimes have dozens to hundreds of variables, partly due to the high initial cost of sampling and the relatively low marginal cost of sampling additional variables. In such cases, the variables can be highly correlated, meaning that the dimensionality of the system is less than the number of variables. This can result in interpretation, storage, and computational problems. Some approaches to transforming a data set in a high-dimensional space into one in a lower-dimensional space are:
(a) Principal component analysis (PCA) is an orthogonal transformation of a system of correlated variables into one of uncorrelated variables. The result is a system ordered by the percent of variability accounted for by the principal components PC1, ..., PCp, where p is the number of variables. The hope is that a significant percent of the total variability will be accounted for by m variables, where m << p. For example, we show (Schuenemeyer and Drew 2011) that 78% of the total variability of 21 variables from the Freeport coal bed was accounted for by seven variables.
(b) A second, more modern but more difficult, procedure to implement is reduction of the number of variables by machine learning. In a machine learning approach, criteria are established that variables must meet to be retained. These could include variability, distance, and passage of statistical tests.
3. Clustering of observations into groups in which the variables have similar characteristics. Cluster analysis seeks to group observations by like variables. As a result of clustering, observations in each group should be more similar than those in different groups. This reduces the number of observations, as opposed to dimensional reduction of variables. Examples of cluster analysis in earth and environmental science include:
• Identifying waste sites based on variables describing measures of toxicity.
• Defining climate zones based on measures of pressure, precipitation, and temperature.
• Identifying archaeological sites based on the chemical composition of pottery sherds.
The number of approaches to finding clusters is too numerous to describe in this entry, but here are possibilities:
(a) Hierarchical clustering is a tree-based approach divided into two types of algorithms, agglomerative and divisive. An agglomerative approach begins at the lowest level, that is, each observation is a cluster. The first step upward is to combine the two observations that are closest according to some criterion, which could be as simple as medians. The path upward continues until a stopping rule is met or a single cluster consisting of all observations is reached. The inverse of this procedure is a divisive hierarchical approach: it begins with all observations in a single cluster and then splits groups to achieve the most diversity.
(b) Clustering by partitioning algorithms is another set of procedures. Unlike hierarchical algorithms, the number of clusters is specified in advance, and observations can be moved from cluster to cluster until some measure of closure is achieved. The k-means algorithm is frequently used. Other methods include fuzzy set and method clustering.
(c) Multidimensional scaling (MDS) is a dimension-reducing visualization method that may be effective when the number of dimensions can be reduced to three or fewer. The purpose is to reduce dimensionality while approximating the original distances or similarities. Dzwinel et al. (2005) illustrate its use in investigating earthquake patterns. MDS can also be used as a nonparametric alternative to factor analysis (FA) because it does not need the assumption of multivariate normality or a correlation matrix.
(d) Correspondence analysis (CA) is similar to MDS.
There is no single best approach to clustering, as it depends on the pattern in the data set, such as an elliptical or a snake-like curvature pattern. Achieving a satisfactory partitioning of data into clusters is largely a trial-and-error procedure.
4. Gaining insight into correlation among variables. Factor analysis is a popular method used to perform this task. There are similarities to PCA, but there are important differences. PCA is a mathematical transformation; FA is a statistical model. In addition, the goal of FA is to find correlation structures, whereas the goal of PCA is to maximize the variance of the new variables. FA is used as a statistical model when the number of factors is known and specified in advance. Since this is rarely the case, FA is more often an EDA method where several estimates of the number of factors are made. Typical output includes a factor loading matrix where the columns are factors and each entry down a column represents the correlation between the factor and a variable. For example, in a two-factor solution, the variables Ca and SO4 may have a high correlation with Factor 1 and a low correlation with Factor 2. Conversely, the variables Na and Cl may have a high correlation with Factor 2 and a low correlation with Factor 1. R-mode and Q-mode are alternative forms of factor analysis.
5. Determining separation between known groups and/or classifying an object of unknown origin into one of two or more groups. For example, an archaeologist has identified two nearby sites from buried samples from different time periods. Two questions arise: (1) How do we determine the spatial overlap, if any, among sites? (2) How do we determine, with probability, to which site an observation of unknown origin belongs? We examine two approaches (a minimal sketch follows this list):
matrix in which the columns are factors and the entries down each column are the loadings (correlations) of the variables on that factor. For example, in a two-factor solution, the variables Ca and SO4 may have a high correlation with Factor 1 and a low correlation with Factor 2. Conversely, the variables Na and Cl may have a high correlation with Factor 2 and a low correlation with Factor 1. R-mode and Q-mode are alternative forms of factor analysis.
5. Determining separation between known groups and/or classifying an object of unknown origin into one of two or more groups. For example, an archaeologist has identified two nearby sites from buried samples from different time periods. Two questions arise: (1) How do we determine the spatial overlap, if any, among sites? (2) How do we determine, with an associated probability, to which site an observation of unknown origin belongs? We examine two approaches:
(a) Discriminant analysis (DA) is a procedure used to determine separation between known groups and/or to classify an observation of unknown origin into one of two or more predefined groups. Two models are linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). LDA is a model in which the population variance-covariance matrices are assumed to be the same for all sites; QDA is used when this homoscedasticity assumption does not hold. In the archaeology example, the data set consists of two sites labeled K and N. The sites were identified by samples having variables Cu and Pb.
(b) Tree-based modeling is a nonparametric alternative to LDA. Variables can be continuous, counts, or categorical. The procedure begins at the root node and splits on the variable that meets a classification rule that would make the splits at termination as pure as possible.
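As a simple illustration of two of the categories above, the sketch below applies PCA and then k-means clustering to a synthetic multi-element table with scikit-learn. The data, the number of retained components, and the number of clusters are arbitrary choices made for this illustration, not part of the original example.

```python
# Illustrative sketch: PCA for dimension reduction followed by k-means clustering.
# Synthetic data stands in for a multi-element geochemical table.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # 200 samples, 10 "element" variables
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]  # induce correlation so PCA has structure to find

Xs = StandardScaler().fit_transform(X)   # standardize before PCA
pca = PCA(n_components=3).fit(Xs)
scores = pca.transform(Xs)               # samples in the reduced space
print("variance explained:", pca.explained_variance_ratio_.round(2))

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
print("cluster sizes:", np.bincount(labels))
```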
Automating Geoscience Analysis Automated systems to understand large, complex data in many geological science disciplines are relatively new. They are discussed below. 1. Structural equation modeling (SEM). This methodology (Grace 2020) represents an approach to statistical modeling that focuses on the study of complex cause-effect hypotheses about the mechanisms operating in systems. SEM is used increasingly in ecological and environmental studies. 2. Data mining versus machine learning. Data mining (Bao et al. 2012) is a tool used to gain insight into data. The
algorithms listed in the Categories section fall under the category of data mining. Even an algorithm that implements a procedure as simple as linear regression belongs to the data mining category because it yields information from a set of linear data. A slightly more complex algorithm would be one that distinguishes between linear and quadratic data. However, we typically think of data mining as yielding information from a larger and more complex data set. A distinguishing feature between data mining and machine learning is that machine learning (Korup and Stolle 2014; Srivastava et al. 2017) can teach itself from previous analyses of data. Data mining may be a procedure used by machine learning.
3. Supervised versus unsupervised learning. Most statistical learning processes fall into two categories, supervised and unsupervised learning (James et al. 2017). A statistical learning process builds a model from data containing an input and an output. An example is regression, where a possible set of explanatory variables is the input and the response variable is the output. This is the training set. The supervised learning algorithm learns a function that can estimate the output for future data. In unsupervised learning, there is only one set of data, and the algorithm learns structure from the data alone. An example is clustering.
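The distinction can be made concrete with a minimal sketch: the same synthetic inputs are used once with a response (supervised regression) and once alone (unsupervised clustering). The data and the specific models below are illustrative assumptions, not part of the entry.

```python
# Supervised learning: inputs X and a response y form the training set.
# Unsupervised learning: only X is available and structure is learned from it.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

supervised = LinearRegression().fit(X, y)               # learns a mapping X -> y
unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)   # learns groups from X alone
print(supervised.coef_, np.bincount(unsupervised.labels_))
```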
Compositional Data Analysis (CoDa) This refers to the analysis of compositional data, also known as constant-sum data (van den Boogaart and Tolosana-Delgado 2013; Pawlowsky-Glahn et al. 2015). Such data, of which geochemical analyses are a common example, occur frequently in the geosciences. Compositional data are portions of a total. They include variables known as components that are expressed as fractions, percentages, or parts per million. These data have an induced correlation because, when there are p components, only p − 1 are independent. Because of this closure, the covariance matrix will always be singular, and the variables cannot be normally distributed. Therefore, standard statistical methods do not apply. Alternative approaches include performing a log-ratio transformation prior to analysis.
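The sketch below illustrates one such log-ratio approach. The `closure` and `ilr` helpers are names written for this illustration; the ilr formula used is one standard choice of orthonormal balance coordinates, and the three-part rows are taken from Table 1.

```python
# A minimal sketch of the closure constraint and an isometric log-ratio (ilr) transform.
import numpy as np

def closure(x):
    """Rescale each row so the parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def ilr(x):
    """Isometric log-ratio transform: D parts -> D-1 unconstrained coordinates."""
    x = closure(x)
    n, d = x.shape
    z = np.empty((n, d - 1))
    for i in range(1, d):
        gm = np.exp(np.mean(np.log(x[:, :i]), axis=1))   # geometric mean of the first i parts
        z[:, i - 1] = np.sqrt(i / (i + 1)) * np.log(gm / x[:, i])
    return z

comp = np.array([[11.1, 5.4, 6.1],
                 [9.8, 1.8, 17.4],
                 [8.9, 1.9, 13.2]])       # Cu, Pb, Zn rows from Table 1 (percent)
print(closure(comp).sum(axis=1))           # each row sums to 1: the constant-sum constraint
print(ilr(comp))                           # 2 coordinates per sample, free of that constraint
```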
Multivariate Data Analysis in Geosciences, Tools, Table 1 A subset of compositional data; units are percent

Site  Cu    Pb   Zn      Site  Cu   Pb   Zn
A     11.1  5.4  6.1     C     8.6  2.7  6.3
A     9.8   1.8  17.4    C     8.1  5.5  5.0
A     8.9   1.9  13.2    C     8.1  5.0  4.9
B     8.3   2.0  5.8     D     9.2  1.6  9.3
B     7.9   5.9  8.1     D     9.1  2.7  7.9
B     7.9   2.4  9.4     D     8.9  4.7  10.1
A Numerical Example Data in the geosciences are often compositional; however, they are not always analyzed in that manner. We present a small, constructed example: a subset of a compositional data set consisting of three variables (Cu, Pb, and Zn), 92 observations, and four sites. A sample is shown in Table 1. Typically, there would be additional compositional variables that may be of little interest to the investigator. The units here are percent, but parts per million is another common way to express compositional data. The problem is to see whether linear discriminant analysis (LDA) is a reasonable model to identify separation among the existing sites using the three variables identified above. The two procedures we consider are (1) to use the data shown above and (2) to divide each of the three elements by their sum so that the variables in each observation sum to 1. A third procedure (not shown) would be to add a residual variable to the elements equal to the sum of the omitted variables. Table 2 compares the prior (known) data with the LDA posterior results using the original data as the new data set in the LDA prediction algorithm. The prior column shows the number of observations by site. The posterior row shows the allocation made by the prediction. The diagonal shows the number of correct allocations based on the maximum probability assigned to each observation and site. For this example, the sum of the diagonal is 65; however, a typical but arbitrary measure of accuracy requires that the maximum probability be at least 0.50. There are five correct assignments with probability less than 0.50. Applying this rule yields an estimated accuracy of 0.65. Now we examine the results obtained when the data are changed to constant sum and the compositional LDA is used. The major difference from the standard LDA is that the data argument to the compositional LDA is the isometric log-ratio (ilr) transformation of the constant-sum data. Following the prediction, an inverse ilr is applied, mapping the data back to their original units (those shown in Table 1).
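A hedged sketch of the two procedures follows, using scikit-learn's LinearDiscriminantAnalysis in place of the unspecified software behind the entry's results. Only the six Table 1 rows shown above are used, so the numbers will not reproduce Tables 2 and 3; the `ilr` helper repeats one standard balance-coordinate construction.

```python
# Sketch: LDA on raw percentages versus LDA on ilr-transformed constant-sum data,
# with the "maximum probability at least 0.50" accuracy rule described in the text.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def ilr(x):
    """Isometric log-ratio transform (one standard balance basis)."""
    x = np.asarray(x, float)
    x = x / x.sum(axis=1, keepdims=True)              # closure to constant sum 1
    n, d = x.shape
    z = np.empty((n, d - 1))
    for i in range(1, d):
        gm = np.exp(np.mean(np.log(x[:, :i]), axis=1))
        z[:, i - 1] = np.sqrt(i / (i + 1)) * np.log(gm / x[:, i])
    return z

# Six rows of Table 1 (Cu, Pb, Zn in percent) and their sites; the full data set has 92 rows.
X_raw = np.array([[11.1, 5.4, 6.1], [9.8, 1.8, 17.4], [8.9, 1.9, 13.2],
                  [8.3, 2.0, 5.8], [7.9, 5.9, 8.1], [7.9, 2.4, 9.4]])
sites = np.array(["A", "A", "A", "B", "B", "B"])

def apparent_accuracy(X, y, threshold=0.50):
    """Fit LDA, predict the same rows, and count correct assignments above the threshold."""
    lda = LinearDiscriminantAnalysis().fit(X, y)
    prob = lda.predict_proba(X)
    pred = lda.classes_[prob.argmax(axis=1)]
    return np.mean((pred == y) & (prob.max(axis=1) >= threshold))

print("raw percentages:", apparent_accuracy(X_raw, sites))
print("ilr-transformed:", apparent_accuracy(ilr(X_raw), sites))
```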
Multivariate Data Analysis in Geosciences, Tools, Table 2 A comparison of prior and posterior results using the untransformed data

Site       A    B    C    D    Prior
A          9    2    0    3    14
B          0    24   1    5    30
C          0    5    5    5    15
D          2    3    1    27   33
Posterior  11   34   7    40   92

Multivariate Data Analysis in Geosciences, Tools, Table 3 A comparison of prior and posterior results using the ilr transformed data

Site       A    B    C    D    Prior
A          7    5    1    1    14
B          2    20   2    6    30
C          0    2    10   3    15
D          1    6    3    23   33
Posterior  10   33   16   33   92
The definitions are the same as those for Table 2. However, for this example, the sum of the diagonal is 60, and there are six correct assignments with probability less than 0.50. Applying this rule yields an accuracy of 0.59. There are numerous reasons for differences in accuracy between the two approaches that go far beyond the scope of this report. However, when the data are compositional, the compositional LDA is the appropriate procedure to use.
Summary and Conclusion This report presents tools for a user to gain insight into multivariate data. They are listed by category, and within each category there are brief discussions of different algorithms. Alternative tools are discussed that are useful when analyzing large, complex data sets. Finally, a brief discussion of compositional data, followed by an example, is presented because its use in the geosciences is common and such data require a nonstandard statistical approach to analysis.
Bibliography
Bao F, He X, Zhao F (2012) Applying data mining to the geosciences data. Phys Procedia 33:685–689
Dzwinel W, Yuen D, Boryczko K, Ben-Zion Y, Yoshioka S, Ito T (2005) Nonlinear multidimensional scaling and visualization of earthquake clusters over space, time and feature space. Nonlinear Process Geophys 12(1):117–128
Grace J (2020) Quantitative analysis using structural equation modeling, USGS, Wetlands and Aquatic Research Center. https://www.usgs.gov/centers/wetland-and-aquatic-research-center. Accessed 22 Nov 2020
James G, Witten D, Hastie T, Tibshirani R (2017) An introduction to statistical learning with applications in R. Springer, New York, 426 p
Korup O, Stolle A (2014) Landslide prediction from machine learning. Geol Today 30:26–33. Wiley Online Library
Pawlowsky-Glahn V, Egozcue J, Tolosana-Delgado R (2015) Modeling and analysis of compositional data. Wiley, Hoboken. 272 p
Schuenemeyer J, Drew L (2011) Statistics for earth and environmental scientists. Wiley, Hoboken. 407 p
Srivastava A, Nemani R, Steinhaeuser K (eds) (2017) Large-scale machine learning in the earth sciences. Data mining and knowledge discovery series. Chapman & Hall/CRC, Boca Raton. 226 p
van den Boogaart K, Tolosana-Delgado R (2013) Analyzing compositional data with R. Springer, Berlin. 258 p
N
Němec, Václav
Jan Harff1 and Niichi Nichiwaki2
1 Institute of Marine and Environmental Sciences, University of Szczecin, Szczecin, Poland
2 Professor Emeritus of Nara University, Nara, Japan
Fig. 1 Václav Němec, 2019 (Photo by Ekaterina Solntseva)
Biography Václav Němec is a Czech scientist born on November 2, 1929, in Prague. He has been a founding member of the International Association for Mathematical Geosciences (IAMG) since 1968 and has contributed significantly to the dissemination of scientific knowledge of mathematical geology in Eastern Europe.
He started his study of economics at the Technical University of Prague in 1948. At that time, Czechoslovakia belonged to the so-called Eastern Bloc. In 1951, for political reasons, he was not allowed to continue with the closing fourth academic year of his studies. The following 26 months, until the end of November 1953, he spent as a miner in a special Czechoslovak army unit for politically unreliable persons. From the end of 1953 to 1990, he worked for the Czechoslovak service for the geological exploration of mineral deposits. As a distance education student at Charles University in Prague, he completed his studies in the Earth sciences with a diploma in applied geophysics in 1959. He received his RNDr degree in 1967 from Charles University in economic geology with a doctoral thesis on computerized modeling of three mineral deposits exploited by the cement industry in Czechoslovakia. He received a second doctoral degree (analogous to a PhD) in 1987 from the Technical University of Košice (mining sciences) and in 1994 an engineering degree from the University of Economics in Prague as restitution for the injustice suffered in 1951. His scientific work was initially focused on mathematical modeling of mineral resources in the cement industry and the investigation of planetary tectonic fault systems (Němec 1970, 1988, 1992). A stay from 1969 to 1970 as visiting research fellow at the Kansas Geological Survey (USA) was dedicated to the latter topic. As Eastern Treasurer of the IAMG, he contributed substantially to mathematical geology in 1968–1980 and 1984–1996. In this role, he organized 19 international conferences on mathematical geology as part of the annual Mining Přibram Symposia. During the East–West separation of the world until 1990, these conferences served as the most important scientific interface for mathematical geology between the politically separated hemispheres. In addition to mathematical geology, he has devoted himself to geoethics, which he introduced in 1991 as a new topic for
the Mining Přibram Symposium, inspired by his wife Lidmila Němcová (Němec and Němcová 2000). He convened five symposia on this topic at the 30th and 34th International Geological Congresses and two outreach symposia at annual meetings of the European Geosciences Union (EGU) in Vienna (2013–2014). He lectured as an invited speaker in about 20 countries on five continents and became recognized as the "Father of Geoethics." In recognition of his scientific achievements in mathematical geology and his service to the IAMG, he received the IAMG's highest award in 1991 – the William Christian Krumbein Medal. He has been a member of the Russian Academy of Natural Sciences since 1995 and an active member or officer of various other national and international scientific and cultural NGOs. His scientific competence, his multilingual talent, and his musicality (as a pianist) make him an extraordinary ambassador between nations, not only in science but also in the field of culture.
Bibliography
Němec V (1970) The law of regular structural pattern, its applications with special regard to mathematical geology. In: Merriam DF (ed) Geostatistics – a colloquium. Plenum, New York/London, pp 63–78
Němec V (1988) Geomathematical models of ore deposits for exploitation purposes. Sci de la Terre Sér Inf 27:121–131
Němec V (1992) Possible ways to decipher spatial distribution of mineral resources. Math Geol 24/8:705–709
Němec V, Němcová L (2000) Geoethical backgrounds for geoenvironmental reclamation. In: Paithankar AG, Jha PK, Agarwal RK (eds) Geoenvironmental reclamation (International Symposium, Nagpur, India). Oxford & IBH Publ. Co., New Delhi, pp 393–396, ISBN 81-204-1457-8
Neural Networks Vladik Kreinovich Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Definition A neural network is a general term for machine learning tools that emulate how neurons work in our brains. Ideally, these tools do what we scientists are supposed to do: We feed them examples of the observed system’s behavior, and hopefully, based on these examples, the tool will predict the future behavior of similar systems. Sometimes they do predict – but in many other cases, the situation is not so simple. The goal of this entry is to explain what these tools can and cannot do – without going into too many technical details.
How Machine Learning Can Help What Are the Main Problems of Science and Engineering The main objectives of science and engineering are: • To determine the current state of the world. • To predict the future behavior of different systems and objects. • If it is possible to affect this behavior, to find the best way to do it. For example, in geosciences: • We want to find mineral deposits – oil, gas, water, etc. • We want to be able to predict potentially dangerous activities such as earthquakes and volcanic eruptions. • We want to find the best ways to extract minerals that would not lead to undesirable side effects such as pollution or triggered seismic activity. Let us describe the three classes of general problems in precise terms. To determine the current state of the world, we can use the measurement results x. Based on these measurement results, we want to describe the actual state y of the system. For example, to find the geological structure y in a certain area – e.g., the density (and/or speed of sound) values at different depths and at different locations, we can use the seismic data x, both passive (measuring seismic waves generated by actual earthquakes) and active (measuring seismic signals generated by specially set explosions or vibrations). To predict the future state y of the system – or at least the future values y of the quantities of interest – we can use the information x about the current and past state of the system. For example, by using the accurate GPS measurements x, we can find how fast the continents drift, and thus predict their future location y. To find the best control y, we can use all the known information x about the current state of the system. For example, based on our knowledge of the geological structure x of the area, we would like to find the parameters (e.g., pressure) y of the fracking technique that would leave the pollution below the desired level. Sometimes We Know the Equations, Sometimes We Do Not In some cases, we know the equations relating the available information x and the desired quantities y. In some of these cases, the relation is straightforward: For example, simple linear extrapolation formulas enable us to predict the future continent drift. In other cases, the equations are not easy to solve: For example, it is relatively easy, giving the density structure y, to describe how the seismic signals will propagate and what
signals x to expect – but to find y based on x (i.e., to solve the inverse problem) is often not easy. In most cases, however, we do not know the equations relating x and y. In this, geosciences are drastically different from physics – especially fundamental physics – where the corresponding equations are usually known (they may be difficult to solve, as in predicting chemical properties of atoms and molecules from Schroedinger’s equation, but they are known). How Machine Learning Can Help: General Case Usually, there are cases when we know both the input x and the desired output y. In other words, we know several pairs (xi, yi) corresponding to different situations. Such cases are ubiquitous for prediction problems: For example, if we are trying to predict seismic activity a week ahead, then every time a week has passed, we have a pair (xi, yi) consisting of the measurement results xi obtained before the passed week and the passed week’s seismic activity yi. Such cases are ubiquitous in control problems: Every time we do something and succeed, we get a pair (xi, yi) consisting of the previous situation xi and of the successful action. Based on such pairs, machine learning tools produce an algorithm that, given the input x, provides an estimate for the desired output y. For example, for prediction problems, we can use the current values x to get some predictions of the future values y. Producing such an algorithm usually takes time – but once the algorithm has been produced, it usually works very fast. How Machine Learning Can Help: Case When We Know Equations When we know equations, the difficulty is usually in solving the inverse problem – finding y based on x. In contrast, the “forward” problem – finding x based on y – is usually easy to solve. So, what we can do is select several realistic examples y1, . . ., yn of y, compute the corresponding x’s x1, . . ., xn, and feed the resulting pairs (x1, y1), . . ., (xn, yn) into a machine learning tool. As a result, we get an algorithm that, given x, produces y – i.e., that solves the inverse problem.
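A minimal sketch of this idea follows, with a toy forward model standing in for a real simulator; the function names and the MLP settings are illustrative assumptions rather than a recommended setup.

```python
# Sketch: when the forward problem y -> x is easy to compute, simulate many (x, y)
# pairs and train a regressor that approximates the inverse map x -> y.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def forward(y):
    """Toy forward model: from hidden parameters y, produce noisy 'observations' x."""
    return np.column_stack([np.sin(y[:, 0]) + y[:, 1],
                            y[:, 0] * y[:, 1],
                            np.cos(y[:, 1])]) + rng.normal(scale=0.01, size=(len(y), 3))

y_train = rng.uniform(-1, 1, size=(5000, 2))   # realistic example models y1, ..., yn
x_train = forward(y_train)                     # corresponding synthetic data x1, ..., xn

inverse = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(x_train, y_train)

y_true = np.array([[0.3, -0.5]])
x_obs = forward(y_true)
print("recovered y:", inverse.predict(x_obs), "true y:", y_true)
```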
Limitations There Are Limitations So far, it may have seemed that machine learning is a panacea that can solve almost all of our problems. But, of course, the reality is not always rosy. To effectively use machine learning, we need to solve three major problems:
• First, the ability of machine learning tools to process multi-D data is limited. In most cases, these tools cannot use all the measurement results that form the input xi and the output yi. So, we need to come up with a small number of informative characteristics. This selection is up to us, it is difficult, and if we do not select these characteristics correctly, we lose information – and the machine learning program will not be able to learn anything.
• Second, to make accurate predictions, we need to have a large number of pairs: thousands, sometimes millions. Sometimes we have many such pairs – e.g., when we are solving an inverse problem. But in many other important cases – e.g., in predicting volcanic eruptions or strong earthquakes – there are – thankfully – simply not that many such events. Of course, we can add to the days of observed eruptions the days when nothing happened, but then the machine learning program will simply always predict "no eruption" – and be accurate in a spectacular 99.99% of the cases!
• Third, training a machine learning program requires a lot of time, so much that often it can only be done on a high-performance computer.
The need to solve these three problems – especially the first two – severely limits the usefulness of machine learning tools. Let us explain, on a qualitative level, where these problems come from.
Machine Learning Is, in Effect, a Nonlinear Regression Machine learning is not magic. It is, in effect, a nonlinear regression: Just like linear regression enables us to find the coefficients of a linear dependence based on samples, nonlinear regression enables us to find the parameters of a nonlinear dependence. For neural networks, such parameters are known as weights. In many practical situations – especially in geosciences – linear models provide a very crude approximation. So, crudely speaking, in addition to parameters describing linear terms, we also need parameters describing quadratic terms, cubic terms, etc. The more parameters we use, the more accurately we can describe the actual dependence. This is similar to how we can approximate, e.g., the exponential function $\exp(x) = 1 + x + \frac{x^2}{2!} + \ldots$ by the first few terms of its Taylor expansion, $\exp(x) \approx 1 + x + \ldots + \frac{x^m}{m!}$ (this is, by the way, how computers actually compute exp(x)): the more terms we use, the more accurate the result.
We Cannot Use All the Information And here lies the problem. The number of nonlinear terms grows drastically with the number of inputs. For example, if the input x consists of m values x = (X1, . . ., Xm), then to describe a
generic linear dependence $Y = c_0 + \sum_{i=1}^{m} c_i X_i$, we need $m + 1$ parameters. To describe a generic quadratic dependence $Y = c_0 + \sum_{i=1}^{m} c_i X_i + \sum_{i=1}^{m}\sum_{j=1}^{m} c_{ij} X_i X_j$, we already need approximately $m^2$
parameters. This already leads to a problem. For example, suppose that we are processing images. Describing an image x means describing the intensity Xi at each of its pixels i. A typical image consists of about $1000 \times 1000 = 10^6$ pixels, so here we have $m \approx 10^6$. It is easy to store and process a million values, but finding $10^{12}$ unknown coefficients – even in the simplest case, when we have a system of $10^{12}$ linear equations with $10^{12}$ unknowns – is way beyond the abilities of modern computers. As a result, we cannot simply feed the image into a machine learning tool – and, similarly, we cannot simply feed the seismogram into this tool. We need to select a few important parameters characterizing this image (or this seismogram). And here computers are not much help; we the scientists need to do the selection. This explains the first problem. It should be mentioned that the situation is not so bad with images. Everyone knows that images can be compressed into a much smaller size without losing much information – e.g., a small-size photo on a webpage is still quite recognizable – and modern machine learning techniques use such methods automatically. However, for seismograms, no such drastic no-information-loss compressions are known.
What About Computation Time? The more parameters we need, the more computation time we need to find the values of these parameters. Even if the system is linear, to find the values of q parameters – i.e., to solve a system of q equations for q unknowns – we need time $\approx q^3$, at least as much as we need to multiply two $q \times q$ matrices A and B – where to compute each of the $q^2$ elements of the product $c_{ij} = a_{i1} b_{1j} + \ldots + a_{iq} b_{qj}$, we need q computational steps (and $q^2 \cdot q = q^3$). Even if we compress the image from a million to $m = 300$ values, if we take the simplest – quadratic – terms into account, we will need $q \approx m^2 \approx 10^5$ variables, so we need at least $q^3 \approx 10^{15}$ computational steps. On a usual gigahertz PC that performs $10^9$ operations per second, this means $10^6$ seconds – about 2 weeks. And we only took into account quadratic terms – and just as an exponential function or a sinusoid does not look like the graph of $x^2$, real-life dependencies are not quadratic either. So machine learning requires a lot of computation time – often so much time that only high-performance computers can do it. This explains the third problem.
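The arithmetic in this subsection can be reproduced in a few lines; the snippet below uses the same rough rounding as the text (q ≈ m², about 10¹⁵ steps, on the order of one to two weeks).

```python
# Back-of-the-envelope cost of fitting a quadratic model to m inputs.
m = 300                 # inputs after compressing an image to a few hundred values
q = m ** 2              # roughly the number of quadratic coefficients (~1e5 in the text)
ops = q ** 3            # naive cost of solving a q-by-q linear system
seconds = ops / 1e9     # on a machine doing ~1e9 operations per second
print(f"{q:.0e} parameters, {ops:.0e} steps, about {seconds / 86400:.0f} days")
```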
We Need a Large Number of Samples And this is not all. The more accurately we want to predict y, the more parameters we need. How many samples do we need? Crudely speaking, each pair (xi, yi) with yi = (Yi1, . . ., Yir) provides r equations Yij = f(Xi1, . . ., Xiq) for determining the unknown parameters, for some small r. So, we need approximately as many samples as parameters. In the above example of $q \approx 10^5$ unknowns, we need hundreds of thousands of examples – and usually even more. This explains the second problem. There is an additional reason why we need many pairs. The reason is that it is not enough to just train the tool; we need to test how well the trained machine learning tool works. For that purpose, the usual idea is to divide the original pairs into a training set and a testing set, train the tool on the training set only, and then test the resulting model on the testing set. How do we know that it works well? One correct prediction may be a coincidence. However, if we have several good predictions, this makes us confident that the trained model works well. So, to gain this confidence, we need a significant number of such pairs. And in many important problems, we do not have that many pairs. For example, even if a volcano has been very active and had five eruptions, there is not enough data to confirm the model.
An Additional Problem All the above arguments assumed that once we have arranged an appropriate compression, gathered millions of samples, and rented time on a high-performance computer, the machine learning tool will always succeed. Unfortunately, this is not always the case. It is known that, in general, prediction problems, inverse problems, etc. are NP-hard, which means that (unless P = NP, which most computer scientists believe not to be the case) no feasible algorithm is possible that would always solve these problems. In other words, any feasible algorithm – and the algorithms implemented in machine learning tools are feasible – will sometimes not work. The good news is that computer scientists are constantly inventing new algorithms and new tools. So if you encounter such a situation, a good idea is to team up with such a researcher. Maybe his/her new algorithm will work well when the algorithms from the software package you used did not work. This brings us to the question of what machine learning tools are available.
What Tools Are Available Traditional neural networks used three layers of neurons. Correspondingly, the number of parameters was not high, so while such neural networks can be easily implemented on a usual PC, they are not very accurate. So, if you have a few samples – not enough for more accurate training – you can use traditional neural networks and get some reasonable (but not very accurate) results. In the last decades, the most popular have been deep neural networks, which have up to several dozen layers and thus a much larger number of adjustable parameters. Often, they lead to spectacular results and accurate predictions. However, due to the high number of parameters, deep neural networks need a large (sometimes unrealistically large) number of sample pairs. For the same reason, deep neural networks require a lot of time to train – so much that this training is rarely possible on a usual computer. So, if you have a large number of samples – and you know how to compress the original information – it is a good idea to try deep learning. There are also other efficient machine learning tools, such as the support vector machine (SVM) – which used to be the most efficient tool until deep learning appeared – but these tools are outside the scope of this article.
In Case You Are Curious So how do neural networks work? An artificial neural network consists of neurons. Each neuron takes several inputs $v_1, \ldots, v_k$ and returns an output $u = s(w_0 + w_1 v_1 + \ldots + w_k v_k)$, where the $w_i$ are coefficients ("weights") that need to be determined during training, and s(z) is an appropriate nonlinear function. Traditional neural networks used the so-called sigmoid function $s(z) = 1/(1 + \exp(-z))$, while deep neural networks use the "rectified linear" function $s(z) = \max(0, z)$. Neurons usually form layers:
• Neurons from the first layer use the measurement results (or whatever we feed them as xi) as inputs.
• Neurons from the second layer use outputs of the first-layer neurons as inputs, etc.
How are the coefficients wi trained? In a nutshell, by using gradient descent – the very first optimization method that students learn in their numerical methods class. The main idea behind this method is very straightforward: to go down a mountain as fast as possible, you follow the direction in which the descent is the steepest. Neural networks use some clever algorithmic tricks that help find this steepest direction, but they do use gradient descent.
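For readers who want to see these pieces together, here is a minimal one-hidden-layer network trained by plain gradient descent with NumPy; the architecture, learning rate, and target function are arbitrary illustrative choices, not a recommended configuration.

```python
# A minimal one-hidden-layer network with ReLU activation, squared-error loss,
# and full-batch gradient descent, to make the formulas above concrete.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(256, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]              # target function to learn

W1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)   # the weights w_i
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.05

for step in range(2000):
    h = np.maximum(0.0, X @ W1 + b1)             # s(z) = max(0, z), the rectified linear unit
    pred = (h @ W2 + b2).ravel()
    err = pred - y
    # gradients of the mean squared error, obtained by the chain rule (backpropagation)
    g_pred = 2 * err[:, None] / len(X)
    gW2 = h.T @ g_pred;  gb2 = g_pred.sum(axis=0)
    g_h = g_pred @ W2.T * (h > 0)                # derivative of ReLU
    gW1 = X.T @ g_h;     gb1 = g_h.sum(axis=0)
    # gradient descent: move each weight a small step against its gradient
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

print("final mean squared error:", np.mean(err ** 2))
```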
Readers interested in technical details are welcome to read (Goodfellow et al. 2016) – at present (2021) the main textbook on deep neural networks.
Examples of Successful Applications Successful applications of traditional (three-layer) neural networks have been summarized in a widely cited book (Dowla and Rogers 1996). There have been many interesting applications since then, especially in petroleum engineering, where even a small improvement in prediction accuracy can lead to multimillion gains. The paper (Thonhauser 2015) provides a survey of such applications. For example, neural networks are efficient in classifying geological layers based on well logs (Bhatt and Helle 2002) – this is one of the cases where we have a large amount of data, sufficient to train a neural network. Applications of deep learning are a rapidly growing research area. There is a large number of papers, and there are surveys – e.g., (Bergen et al. 2019). However, new applications and new techniques appear all the time – this research area grows faster than surveys can catch up. Because of this fast growth, this part of the article is more fragmentary than comprehensive – we will just cite examples of typical applications. Probably the most active application areas involve processing images – e.g., satellite images. One of the main reasons why these applications of deep learning are most successful is that, as we have mentioned, for images we can apply efficient almost-no-loss compression and thus naturally prepare the data for neural processing. A typical recent example of such an application is (Ullo et al. 2019). One of the most interesting applications of deep learning to satellite images is the possibility to predict volcanic activity based on images produced by satellites equipped with interferometric synthetic aperture radar (InSAR) – which can detect centimeter-scale deformations of the earth's surface (Gaddes et al. 2019). Many examples of applications are related to solving inverse problems – since, as we have mentioned, in this case we can easily generate a large number of examples; see, e.g., (Araya-Polo et al. 2018), (Mosser et al. 2018). The use of state-of-the-art machine learning techniques for solving inverse problems has already led to interesting discoveries. For example, it turns out that, contrary to the previously assumed simplified models of seismic activity – according to which only usual ("fast") earthquakes release the stress – a significant portion of the stress is released through slow slip and slow earthquakes; see, e.g., (Pratt 2019). Interestingly, sometimes even for the forward problem, neural networks produce results faster than the traditional numerical techniques; see, e.g., (Moseley et al. 2018).
The papers (Magaña-Zook and Ruppert 2017) and (Linville et al. 2019) use deep learning to solve another important problem: How to distinguish earthquakes from explosions based on their seismic waves? This problem was actively developed in the past because of the need to distinguish nuclear weapons tests from earthquakes. Now, the main case study is separating small earthquakes from small explosions like quarry blasts – and in this application, there is a large number of examples, which makes neural networks very successful.
Summary In a nutshell, neural networks are interpolation and extrapolation tools. When we have a large number (xi, yi) of pairs (x, y) of related tuples of quantities, machine learning techniques – such as neural networks – produce a program that, given a generic tuple x, estimates y. In many cases, neural networks have led to successful applications in geoscience. However, neural networks are not a panacea. For a neural network application to be successful, we really need a very large number of examples – which is not always possible in geosciences; we need to compress the data without losing information – which is also often difficult in geosciences; we usually need to spend a large amount of computation time; and we need to be patient: Sometimes these techniques work, sometimes they do not. We hope that this entry will help geoscientists to select problems in which all these conditions are satisfied, and get great results by applying neural networks! (And do not hesitate to collaborate with computer scientists if it does not work the first time around.) Acknowledgments This work was supported in part by the National Science Foundation grants 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science) and HRD-1242122 (Cyber-ShARE Center of Excellence).
Nonlinear Mapping Algorithm Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge, MA Linville L, Pankow K, Draelos T (2019) Deep learning models augment analyst decisions for event discrimination. Geophys Res Lett 46(7): 3643–3651 Magaña-Zook SA, Ruppert SD (2017) Explosion monitoring with machine learning: A LSTM approach to seismic event discrimination. In: Abstract of the American Geophysical Union (AGU) Fall 2017 Meeting, New Orleans, Louisiana, December 11–15, 2017, Abstract S43A–0834 Moseley B, Markham A, Nissen-Meyer T (2018) Fast approximate simulation of seismic waves with deep learning. In: Proceedings of the 2018 conference on Neural Information Processing Systems NeurIPS’2018, Montreal, Canada, December 3–8, 2018 Mosser L, Kimman W, Dramsch J, Purves S, De la Fuente A, Ganssle G (2018) Rapid seismic domain transfer: Seismic velocity inversion and modeling using deep generative neural networks. In: Proceedings of the 80th European Association of Geoscientists and Engineers (EAGE) conference, Copenhagen, Denmark, June 11–14, 2018 Pratt SE (2019) Machine fault. Earth & Space Science News (EoS), December 2019, 28–35 Thonhauser G (2015) Application of artificial neural networks in geoscience and petroleum industry. In: Craganu C, Luchian H, Breaban ME (eds) Artificial intelligent approaches in petroleum geosciences. Springer, Cham, pp 127–166 Ullo SL, Langenkamp MS, Oikarinen TP, Del Rosso MP, Sebastianelli A, Piccirillo F Sica S (2019) Landslide geohazard assessment with convolutional neural networks using Sentinel-2 imagery data. In: Proceedings of the 2019 IEEE International Geoscience and Remote Sensing Symposium IGARSS’2019, Yokohama, Japan, July 28 – August 2, 2019, 9646–9649
Nonlinear Mapping Algorithm Jie Zhao1, Wenlei Wang2, Qiuming Cheng3 and Yunqing Shao3 1 School of the Earth Sciences and Resources, China University of Geosciences, Beijing, China 2 Institute of Geomechanics, Chinese Academy of Geological Sciences, Beijing, China 3 China University of Geosciences, Beijing, China
Definition References Araya-Polo M, Jennings J, Adler A, Dahlke T (2018) Deep-learning tomography. Lead Edge (Tulsa Okla) 37:58–66 Bergen KJ, Johnson PA, de Hoop MV, Beroza GC (2019) Machine learning for data-driven discovery in solid earth geoscience. Science 363(6433) Paper eaau0323 Bhatt A, Helle HB (2002) Determination of facies from well logs using modular neural networks. Pet Geosci 8:217–228 Dowla FU, Rogers LL (1996) Solving problems in environmental engineering and geo-sciences with artificial neural networks. MIT Press, Cambridge, MA Gaddes ME, Hooper A, Bagnard M (2019) Using machine learning to automatically detect volcanic unrest in a time series of interferograms. J Geophys Res Solid Earth 124(11):12304–12322
The nonlinear mapping algorithm, first proposed by Sammon (1969), is a nonhierarchical method of cluster analysis for geometric image dimensionality reduction (Howarth 2017). According to the algorithm, complex relationships in a high-dimensional space can be transformed into and displayed in a low-dimensional space without changing the relative relationships among internal factors. Approximate images of the relationships between high-dimensional samples can be seen intuitively in the low-dimensional space, which greatly simplifies interpretation (Sammon 1969). The nonlinear mapping algorithm has been widely applied to target classification and recognition in the research fields of geology,
chemistry, and hydrology, especially for distributions and processes with nonlinearity, for example in flood forecasting and fault-sealing discrimination (Smith 1980). The algorithm focuses only on the data, without considering types and categories, and has been used efficiently in the prediction of mineralization and natural disasters.
In the notation used below, $NF = \sum_{i=1}^{N-1}\sum_{j=i+1}^{N} d_{ij}^{*}$ is the normalizing factor, $W_{ij} = 1/d_{ij}^{*}$ is the weighting factor, and the interpoint distances in the original space $R^{p}$ are

$$d_{ij}^{*} = \sqrt{\sum_{k=1}^{p} \left(x_{ki} - x_{kj}\right)^{2}} \qquad (2)$$
Introduction dij ¼ Observational data indicative to natures of the Earth system are often multidimensional (e.g., stream sedimentary geochemical samples recording concentrations of various elements; structure data in a form of vector consisting of length, azimuth, age, strain, stress, etc.). Consequently, researchers working in the Earth sciences, especially mathematical geosciences, often needs to deal with highdimensional data and sometimes even faces issues of information redundancy. The so-called “curse of dimensionality” is a vivid description of complicated calculation and interpretation brought by high-dimensional data. It may also lead to obstacle of analytical method, for example calculating the inner product. Dimensionality reduction is a good way to solve the problems by transforming original highdimensional data into a low-dimensional “subspace” where the sample density is increased and distance calculation will become simpler and more convenient. Classic methods including principal component analysis (PCA) and linear discriminant analysis (Belhumeur et al. 1996) with systematic and theoretical foundation have been well practiced in various geological applications, which assume linear data structures are existed in the high-dimensional data. However, inherited from nonlinear geo-systems, relationships among geovariables often present nonlinearity. Utilization of linear methods to investigate these geo-variables will mislead the interpretation of nonlinear geo-processes. The proposed nonlinear mapping technique proposed by Sammon (1969) has triggered a big progress in understanding the world. Within a p-dimension space R p, suppose N samples with P variables can be denoted by Xi ¼ [xi1, xi2, . . .. . ., xip], i ¼ 1, 2, . . ., N. If the N points in the RP space are nonlinearly mapped in a lower dimensional space Rl(l < p): N samples will be Yi ¼ [yi1, yi2, . . .. . ., yil], i ¼ 1, 2, . . ., N. The distances or relations among samples should be approximate in both spaces of R p and Rl (dij and dij*). A critical variable E, termed as mapping errors, is defined by total sample distances in spaces of R p and Rl (dij and dij*) to constrained the transformation: 2 1 E¼ wij d ij dij ð1Þ NF i j
N1
l
yk1 ykj
2
ð3Þ
k¼1
The dimensionality reduction implemented by nonlinear mapping algorithm applies the variable E as the weight coefficient to minimize difference between the dij and dij*. The steepest descent method can iteratively reduce the total mapping error through adjusting coordinates of points. In general, a well mapping result will be obtained with less than a hundred iterations. When the iteration results become convergent, variations of errors should not be less than the “mapping tolerance” (often set as 0.000001). A result with less mapping errors can better preserve topological structures of samples in the original space for acquiring geometric configuration of samples in the new space.
Manifold Learning and Basic Algorithms Manifold learning method proposed by Roweis and Saul (2000) and Tenenbaum et al. (2000) has gradually become a heat point in studies of feature extraction. Manifold learning is another category of dimensionality reduction methods. Manifold is a homeomorphous space with Euclidean space at the local scale. In other words, the Euclidean distance can be used for distance measurement at the local scale. It means when the low-dimensional manifold is embedded in the highdimensional space, although the distribution of samples in the high-dimensional space seems extremely complicated, the properties of Euclidean space still locally stand true for those samples. Thus, the dimension reduction mapping relations between the high-dimensional and the low-dimensional spaces can be easily established at the local scale. Typical algorithms include spectrum-based algorithms, isometric mapping algorithm (Tenenbaum et al. 2000), local linear embedding algorithm (Rowies et al. 2000), local tangent space alignment (Zhang 2004), kernel principal component analysis (Schölkopf et al. 1997), Laplacian feature mapping (Belkin and Niyogi 2002), Hessian feature mapping (Donoho and Grimes 2003), etc. Here will introduce three famous and popular learning methods.
N
992
Isometric Mapping Algorithm (ISOMAP) The basic assumption of isometric mapping algorithm (Tenenbaum et al. 2000) is when the low-dimensional manifold is embedded in the high-dimensional space, estimated straight-line distances in the high-dimensional space are misleading, since the distances cannot find their equivalence in the embedded low-dimensional manifold. The fundamental thinking of the algorithm is to approximate geodesic distances among samples according to local neighborhood distance; meanwhile, based on systematical derivation of these distances, the problem of coordinates calculation in the embedded low-dimensional manifold can be changed to the issues of eigenvalues of matrices. The implementation of isometric mapping algorithm including three steps: (1) k neighbor (the first k data closest to the data point) or ε neighbor data (all data whose point distance is less than ε) of each sample in the highdimensional spatial data set {xi}Ni ¼ 1 need to be defined and connected to construct the weighted neighborhood figure of the high-dimensional data; (2) the shortest distances between each pair of data points within the neighborhood figure are calculated, which are further defined as an approximate geodesic estimation; and (3) the multidimensional scaling algorithm is applied to reduce the dimensionality of the original data set. The advantages of ISOMAP are obvious. With simple parameter settings, it provides a fast and globally optimal nonlinear mapping by calculating the geodesic distance, rather than the Euclidean distance between points in highdimensional space. However, searching for the shortest distances between each sample pair is quite time-consuming when the sample volume is too large. The ISOMAP algorithm has been successfully employed in many research fields.
Local Linear Embedding Algorithm (LLE) Local linear embedding proposed by Roweis and Saul (2000) is another nonlinear dimensionality reduction method based on the think of manifold learning. Suppose the data lies on or near a smooth nonlinear manifold of lower dimensionality, the main idea of the algorithm is to approximate global nonlinear structure with locally linear fits. During this processes, local linear relationships between original neighborhoods are maintained, and the intrinsic local geometric properties remain invariant to the local patches on the manifold. Its specific steps of LEE include: 1. Assign neighbors to each data points, for example using the k nearest neighbors.
Nonlinear Mapping Algorithm
2. Calculate the local linear reconstruction weight matrix W of the data from its neighbors by solving the following constraint least-square problem: 2
eðW Þ ¼
Xi i
ð4Þ
W ij Xj j
W ij ¼ 1 , and
Then, minimize ε(W) subject to (a) j
(b) Wij ¼0, if Xj is not belong to the set of neighbors of Xi. 3. Take the parameter of local linear representation as the invariant eigenvalue of high-dimensional and lowdimensional data, and calculate the unconstrained optimization problem to obtain the dimensionality reduction result Y: 2
min fðY Þ ¼ Y
Yi i
W ij Y j
ð5Þ
j
The advantage of LLE algorithm is to effectively record the inherent geometric structure of the data by utilizing the local linear relationship. Compared to the isometric mapping algorithm, it seeks for eigenvalue solving of sparse matrix and not requires iteration, which makes it more efficient in calculation. Rather than the geodesic distance relationship between sample points, the local linear embedding algorithm only maintains the local neighbor relationship, so it is not well descriptive of the manifold structure of data with equidistant characteristics. It should be noticed that this algorithm is more sensitive to noises since it assumes the samples are uniformly distributed.
Kernel Principal Component Analysis Algorithm (KPCA) Geological processes often present nonlinear characteristics. As a linear algorithm, PCA has been widely used in geosciences, but it may still fail to depict nonlinear structures embraced in geo-datasets, and unable to implement dimensionality reduction of geological characteristics, effectively. As an upgrading version, KPCA proposed by Schölkopf et al. (1997) is a nonlinear form of PCA. By introducing an integral operator kernel functions, KPCA allows to map lowdimensional space to high-dimensional space via nonlinear mapping and then executes PCA in this feature space. The intuitive idea of kernel PCA is to transform linear indivisible
Nonlinear Mapping Algorithm
993
problems in input space to linear divisible problem in feature space. Suppose we first map the data in an original space Rn into a new feature space F via nonlinear mapping function ’: ’ : Rn ! F
ð6Þ
x ! ’ðxÞ: The Euclidean distance in the new feature space F can be expressed as: d n ðx 1 y Þ
jpðxÞ rðvÞj
fðxÞ rðxÞ 2rðxÞ ’ðyÞ þ ’ðyÞ ’ðyÞ
ð9Þ
Obviously, if the kernel function K(x, y) is given, by which the data can be implicitly mapped to a high-dimensional feature space. Consequently, the data are linearly separable in the high-dimensional feature space. Since the inner product in the feature space is given by the kernel function, the calculation of which will no longer be confined by the dimensionality. The kernel function plays a vital role in the kernel method. As long as the kernel function is selected, a certain mapping is then defined. The only requirement on the kernel function is that it should meet requirements of Merce’s theorem for functional analysis. It means if the kernel function is a continuous kernel of a positive integral operator, then it performs a dot product during the mapping. Commonly used kernel functions are listed as follows, and the results will be varied according to the kernel function selected. 1. Polynomial kernel function: K ðx, yÞ ¼ ðaðx yÞ þ bÞd
d > 0, and a, b R
ð10Þ
2. Gaussian radial basis kernel function:
K ðx, yÞ ¼ exp
3. The Sigmoid function:
k x yk2 sR 2s2
ð12Þ
In comparison with other nonlinear algorithms, kernel PCA demonstrates its specific advantages: (1) no nonlinear optimization should be considered; (2) simple as PCA, its calculation is still focusing on solving eigenvalues; and (3) the number of preserved PCs is not necessary to be determined prior to the calculation. Due to the superiority of kernel PCA in feature extraction, it has been applied in various fields, e.g., face recognition, image feature extraction, and fault detection. From its research status, KPCA with mature algorithms employed for nonlinear feature extraction has been ideally applied in many fields.
ð7Þ
According to the Mercer’s theorem of functional analysis, the dot product in the original space can be represented by the Mercer kernel function in the new feature space F: K ðx, yÞ ¼ ð’ðxÞ ’ðyÞÞ ð8Þ Equations (7) and (8) can be expressed as: dH ðx, yÞ ¼ K ðx, xÞ 2K ðx, yÞ þ K ðy, yÞ
K ðx, yÞ ¼ tanhðaðx yÞ bÞa, b R
ð11Þ
Summary or Conclusions Nowadays, research topics relating to nonlinear mapping are still in development. Something worth to note are: 1. Nonlinear mapping algorithm applied to geological data has advantages in removal the information redundancy of high-dimensional data, intuitive expression of data distribution patterns, and statistical significance than that of linear mapping. However, attention should be paid to two-dimensional interpretation which has possibility of failure for cases in high dimensions. Therefore, delicate interpretation and discussion at multidimensions are more appropriate. 2. Almost all algorithms have both advantages and defects, due to their complicated theoretical foundations. Some of them had been further developed, but they may still not be able to interpret the nonlinear data comprehensively. Determination of neighborhood size and algorithm upgrading for reducing noise interference and enhancing robustness will be the potential focuses in the further.
Bibliography Belhumeur PN, Hespanha JP, Kriegman DJ (1996) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. In: European conference on computer vision. Springer, Berlin/Heidelberg, pp 43–58 Belkin M, Niyogi P (2002) Laplacian eigenmaps and spectral techniques for embedding and clustering. In: Advances eural information processing systems, pp 585–591 David LD, Carrie G (2003) Hessian eigenmaps: new locally linear embedding techniques for high dimensional data, proc. Natl Acad Sci 100:5591–5596 Donoho DL, Grimes C (2003) Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences 100(10):5591–5596 Howarth RJ (2017) Dictionary of mathematical geosciences. Springer, Cham Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
N
994 Sammon JW (1969) A nonlinear mapping for data structure analysis. IEEE Trans Comput 100:401–409 Schölkopf B, Smola A, Müller KR (1997) Kernel principal component analysis. In: International conference on artificial neural networks. Springer, Berlin/Heidelberg, pp 583–588 Smith DA (1980) Sealing and nonsealing faults in Louisiana Gulf Coast salt basin. AAPG Bull 64:145–172 Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290: 2319–2323 Zhang Z (2004) Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J Sci Comput 26:313–338
Nonlinearity
that the world is inherently nonlinear, and knowledge on nonlinearity and its causative mechanisms will significantly enhance our recognition to interactions among the lithosphere, hydrosphere, atmosphere, and biosphere that will consequently progress human capability of reasonable development, natural resource utilization, prediction and/or safeguard against serious geological disasters, and environment protection.
Introduction
Nonlinearity Wenlei Wang1, Jie Zhao2 and Qiuming Cheng2 1 Institute of Geomechanics, Chinese Academy of Geological Sciences, Beijing, China 2 China University of Geosciences (Beijing), Beijing, China
Definition Nonlinearity is a broad term descriptive of complex natures of the Earth system. These signatures are not an artifact of models, equations, and simplified experiments but have been observed and documented in many investigations and surveys as diverse variables. Formally defined in “Dictionary of Mathematical Geosciences” (Howarth 2017), the concept of nonlinearity is “The possession of a nonlinear relationship between two or more variables. In some cases, this may be a theoretical assumption which has to be confirmed by experimental data. The converse is linearity.” In order to better recognize nonlinearity, the term linearity should be referred first, which denotes possession of linear or proportional relationships among variables and can create “a straight line” in Cartesian coordinates. In contrary, nonlinearity denoting disproportional relationships among variables often creates “curves” (not a straight line) in the coordinates, e.g., parabola of a quadratic function relation. However, discrimination on linearity and nonlinearity is not that simple within the context of complex systems and complexity science. Linearity as a special case of nonlinearity merely has a proportional relationship among variables that small causes (inputs) lead to small effects (outputs), while nonlinearity often displays deviation from this simple relationship. Nonlinear relationship can be varied dramatically, and its intrinsic mechanisms are governed by the nonlinear paradigm that one cause (input) can lead to more effects (outputs) and/or one effect (output) may be led by more causes (inputs). Causative factors of nonlinearity in the Earth system are often with interrelated, interacted, and intercoupling effects, by which small variation can result disproportional and even catastrophic influence on results or other variables. Many Earth scientists even stated
In the community of the Earth sciences, one of significant missions is to understand natures of the Earth for better living and long-term development. The real world is a complicated system. For solving practical problems, mathematical models should be established and solved for simulating and interpreting natural phenomena in the system, respectively. Constrained by inadequate capability and means of cognition, before advent of nonlinearity, it was generally accepted that the nature could be explicated linearly by linear equations. In order to find the approximate solution to complex systems or geoprocesses, they were routinely simplified by eliding some secondary causes or decomposing into several simple subsystems. The transformation of a nonlinear problem into linear problems is termed as linearization, during which multiple linear approximation methods were often adopted as well. There is generally a parameter by which the nonlinear equation is expanded to obtain several or infinite linear equations. The approximate solution of nonlinear issues can consequently be achieved by solving these expanded linear equations. However, the method of linear approximation is not always effective, since development of the Earth system accompanies with not only quantitative transition but also dramatic qualitative changes. It is obvious that solution or interpretation by linear approximation works effective for linear or approximately linear phenomena, but the cases with fundamental differences from linear phenomena are often powerless. As mentioned above, nonlinearity is a ubiquitous and objective attribute in the Earth system and its compositions (subsystems). In particular, diverse geological processes including plate movement, tectono-magmatic activities, earthquakes, mineralization, etc. with dramatic variations in temperature, pressure, stress, and strain are disproportionately caused and effected by complex interaction and intercoupling mechanisms. These geological (or formation) processes essentially demonstrate nonlinearity, and it is precisely because the existence of nonlinearity prevents the practical efficiency of classic mathematical methodologies. Joint research and thinking of the Earth and nonlinear sciences have been developed for 100 years (Ghil 2019). The Earth system and its subsystems spanning wide spatial-temporal scales that generally demonstrate nonlinearity are dynamically developed and often unrepeatable,
Nonlinearity
unpredictable, and dimensionless. The Earth scientists have proposed some nonlinear ways of thinking about these nonlinear signatures. The complex Earth system pertains to geoprocesses that are believed as deterministic and can be modeled with fractal and deterministic chaos (Turcotte 1997). Progressive development in theories of chaos (Lorenz 1963), fractal (Mandelbrot 1983), and dissipative structure has profound implications for simulation, predication, and interpretation of many subsystems of the Earth (Flinders and Clemens 1996; McFadden et al. 1985; Phillips 2003).
995
curves of repeating computation will be gradually separated and become dissimilar as the computation time increases (Fig. 1). The butterfly effect is a figurative expression of this characteristic. Consequently, short-term weather condition can be predictable, but not for long-term forecast (Lorenz 1963). dx ¼ s ð y xÞ dt dy ¼ Ra x y xz dt dz ¼ xy bz dt
(1)
Chaos From observations, it can be found that occurrences of numerous natural phenomena in the deterministic Earth system with nonlinearity seem stochastic. A deterministic system is governed by rules that have no randomness and nothing stochastic. The present state can determine the future state. However, the system can sometimes have aperiodic and apparently unpredictable behaviors, and its development or evolution is not simply repetitive motion to equilibrium. Chaos is a term descriptive of evolution or development with stochastic appearance in deterministic dynamic systems. The chaos theory (Lorenz 1963; Turcotte 1997) mainly analyzes evolutive characteristics of nonlinear systems that is the evolution or development process from disorder, order, to chaos with the increase of nonlinearity. It is the science to explore their causative processes and evolutions originated from nonlinear dynamic systems which are usually described by ordinary differential equation, partial differential equation, or iterative equation. Chaos possesses both ordered periodic and disordered quasi-periodic structures that manifest the inherent randomness of nonlinear dynamic systems. The chaos theory still in development currently focuses on studying the characteristics, orderliness, and evolutionary mechanism of chaotic behaviors in nonlinear dynamic systems. Chaotic behaviors possess random appearance but are not fundamentally stochastic. During geo-processes, the evolution and development of geo-variables or components of the Earth system demonstrate both repulsion and attraction, simultaneously. The attraction causes these variables clustered in the vicinity of a collection, while the repulsion of the collection makes these variables neither too close nor separate that reach an equilibrium state. Then the repulsion may be further enhanced resulting that these variables will be attracted by other attractors. Iterations of this process constitute complex chaotic behaviors (Fig. 1). These behaviors or geo-processes are sensitive to initial conditions and small perturbations, and initial differences or causes of minor deviations tend to persist and grow over time. From the Lorenz equation (Eq. 1), even under a same initial condition, integral
where, x, y, and z are proportional to the intensity of convection motion, the temperature difference between the ascending and descending currents, and the distortion of the vertical temperature profile from linearity, respectively; s is Prandtl number; b is velocity damping constant; and Ra is Rayleigh number, a natural convection-related dimensionless number. The application of chaos theory greatly expands human’s horizon and thinking of recognition to nonlinearity of geoprocesses (Turcotte 1997). Applying to investigate the convection caused by nonuniform heating during the raising of mantle can reveal a dynamic process from horizontal melting, multilayered convection, and turbulence (chaos) indicating that the Earth is an open thermal-dynamic system. The evolution process includes numerous nonlinear dynamic mechanisms that produce irregular (anomalous) and incoherent chaotic but stable structures. In the Earth system, many other chaotic behaviors or phenomena can be found in lithosphere evolution, mantle convection, geomorphic systems, geomagnetic reversal, etc. (Turcotte 1997).
Fractal Many substances formed or produced in the nonlinear Earth system are often with irregular shapes (e.g., cloud, mountain, and coastline) and heterogeneously distributed quantities (e.g., element concentration, density, magnetism, etc.). Their irregular shapes different from continuously smooth curves and surfaces well described by classical differential geometry can be continuous but unsmooth or coarse with noninteger dimensions. Traditional mathematical methodologies like Euclidean geometry (e.g., 1, 2, and 3 dimensions for point, area, and cube, respectively) cannot measure these abnormal features. They were mentioned in classical mathematical literatures, but mostly treated as special and “weird” cases without delicate discussions. As progressive awareness of spatial structures (e.g., symmetry and self-similarity of crystal
N
996
Nonlinearity
Nonlinearity, Fig. 1 A structure in the chaos – when plotted in three dimension, the solutions to Lorenz equation fell onto a butterfly-shaped set of points
and crystal grown), it turns out that they may represent another geometric form of the real world. In mathematical sense, fractal patterns or shapes (Fig. 2) can be generated by a simple iterative equation that is a recursion-based feedback system. Using the Koch curve as an example, a straight line is taken as the initial curve. It can then be reconstructed as shown in Fig. 2 that is termed as generator with 4/3 length of the initial curve. Repeating this process for all line segments can update the curve constituted by line segments with a total length of (4/3)2. Further iteration will produce a curve with an infinite length. The curve is continuous but nowhere can be differentiable (Schroeder 1991). The fractal theory or fractal geometry proposed by Mandelbrot (1983) is to quantify objects with irregular geometry. Since the Earth substances with irregular shapes are ubiquitous in natural systems, it is so called as “geometry of nature” (Mandelbrot 1983). If a curve is equally divided into N straight-line segments with a length of r, its approximate length L(r) can be evaluated by: L(r) ¼ Nr. The estimated L(r)
will be a finite limit as the step (r) reducing to zero. For a fractal or more complex curve, the approximate length L(r) will be infinite as r goes to 0; however, there is a critical H exponent DH > 1 which can keep the length L(r) ¼ NrD as finite (Schroeder 1991). The exponent DH is called Hausdorff dimension: log N DH ¼ lim ð2Þ r!0 logð1=r Þ When the above approximation is applied to the Koch curve (Fig. 2), choosing r ¼ r0/3n for the nth generation, the number of segments N is proportional to 4n. log 4 ¼ 1:26 . . . ð3Þ DH ¼ log 3 The approximation between 1 and 2 well demonstrates that the Koch curve is more irregular and complicated than a smooth curve but does not cover an area (DH ¼ 2). The Hausdorff dimension which can be fraction values is called fractal dimension in fractal geometry. Patterns, forms, or structures of the Earth substances with Hausdorff dimensions
Nonlinearity
997
Nonlinearity, Fig. 2 A: The initial curve; B: generator for the Koch curve; C: the next stage in the construction of the Koch curve; and D: high-order approximation to the Koch curve (D). After Schroeder (1991)
N greater than their Euclidian dimensions are fractal, like the above Koch curve with a DH ¼ 1.26 between Euclidian dimensions of planar surface and volume. Moreover, fractal possesses self-similarity, and its basic structure is scale invariant (Fig. 3). It can exhibit similar patterns as changing scales (Fig. 2). Fractal geometry is the science of complex geometry with self-similar structures to investigate intrinsic mathematical laws beneath the complex surface of nature. In practice, self-similarity is often not strictly similar, but for the case that any parts possess same statistical distributions as the whole is named as statistical self-similarity. Meanwhile, fractal theory mathematically linked to chaos and other forms of nonlinearity has been used as a morphometric descriptor of many forms and patterns produced by complex nonlinear dynamics. It has been broadly applied in the Earth sciences, especially in geology and geophysics (Turcotte 1997). In addition to classic geometry, it opens up a new way to investigate complex systems in support of simulation and explanation of nonlinear geological phenomena. The early applications were mainly to discover fractal structures and measure fractal dimensions of various geological features. Nowadays, it is more focusing on
geological processes and their indicative meanings interpreted from fractal dimension and multifractal spectrum of geological phenomena and geological bodies (Cheng 2017).
Selected Case Studies The first selected case study of nonlinearity is chaotic processes discovered in magma mingling and mixing processes (Flinders and Clemens 1996). Inspired by the forming process of mixed patterns of two or more distinctly different fluids, the inherent deterministic chaos of magma systems was investigated through the mingling and mixing of enclave magmas with host magmas. During the chaotic process, migration of adjacent enclave blobs may be arbitrary, and these enclaves will experience different evolution. Data (e.g., crystal morphology, chemical features, and isotope) collected from microgranitoid enclaves within individual felsic magmas exhibits significant variations which may discover driving effects of nonlinear dynamics on the evolution of complex magma systems. Another example is the discovery of
998
Nonlinearity
Nonlinearity, Fig. 3 Singularity curve calculated from global databases of igneous and detrital zircon U-Pb ages, and the predication of the crust evolution by fractal-based models. Modified from Cheng (2017)
nonlinearity in geodynamo evidenced by palaeomagnetic data (McFadden et al. 1985). From observational palaeomagnetic data, the magnetic field in the core has a substantial and usually nonlinear effect on the velocity field, and the variation of westward drift rates can be interpreted according to nonlinear effects relating to the magnetic field intensity. The last selected case study is on the predication of the Earth evolution (Cheng 2017). As mentioned above, knowledge on nonlinearity can enhance our capability in predication and explanation. In Cheng (2017), global databases of igneous and detrital zircon U-Pb ages were utilized to investigate episodic evolution of the crust. Fractal-based methodologies were developed to analyze shapes of the age curve with scale invariant fractality and singularity of the age records. It was discovered that the age peaks follow a power law distribution, and exponents estimated from the age curve and age peaks well demonstrate an episodic evolution of the crust. The obtained descending trend (Fig. 3) may be indicative of mantle cooling. Furthermore, following the trend of mantle cooling, it was predicted that the magmatic activities may vanish in 1.45Gyr.
Summary and Conclusions Nonlinearity inherent from the origin and evolution of the Earth system is significant. After a century of efforts on understanding nonlinearity, theories and methodologies to interpret and explain nonlinear phenomena have greatly progressed. Nowadays, we do not only discover nonlinearity from phenomenological observation but also can reveal nonlinearity from intrinsic relations of data. There are effective
methods such as chaos and fractal to study evolutionary mechanism of nonlinear behaviors and irregular shapes caused by nonlinear processes. As a fundamental scientific issue, prediction of future development or evolution of the Earth system is an important goal of mathematical geosciences. The Earth system is a giant open nonlinear system whose variation is mutual transformation among nonstationary, deterministic, and random processes. For nonlinearity at different spatial-temporal scales, applying chaos, fractal, and other methods to construct new predication theory and methodology will be the main content of future research.
Bibliography Cheng Q (2017) Singularity analysis of global zircon U-Pb age series and implication of continental crust evolution. Gondwana Res 51: 51–63 Flinders J, Clemens JD (1996) Non-linear dynamics, chaos, complexity and enclaves in granitoid magmas. Trans R Soc Edinb Earth Sci 87: 217–224 Ghil M (2019) A century of nonlinearity in the geosciences. Earth Space Sci 6:1007–1042 Howarth RJ (2017) Dictionary of mathematical geosciences. Springer, Cham Lorenz EN (1963) Deterministic nonperiodic flow. J Atmos Sci 20: 130–141 Mandelbrot BB (1983) The fractal geometry of nature. Freeman, Francisco McFadden PL, Merrill RT, McElhinny MW (1985) Non-linear processes in the geodynamo: palaeomagnetic evidence. Geophys J Int 83: 111–126 Phillips JD (2003) Sources of nonlinearity and complexity in geomorphic systems. Prog Phys Geogr 27:1–23 Schroeder M (1991) Fractals, chaos, power laws: minutes from an infinite paradise. Freeman, New York Turcotte DL (1997) Fractals and chaos in geology and geophysics. New York, Cambridge
Normal Distribution
Normal Distribution Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
999
distribution through his work in 1809 (Gauss 1963). However, the history of normal distribution dates back to the eighteenth century (Fendler and Muzaffar 2008; Kim and Shevlyakov 2008; Stigler 1986), starting out as a way to represent binomial probability. If binomial variables represent a special case of the set of random variables, where all Xi have Bernoulli distribution with a parameter p, then, for large n and moderate values of p, (0.05 p 0.95), based on CLT, Binomialðn, pÞ N m ¼ np, s ¼ npð1 pÞ :
Definition Normal distributions have been widely used in natural and social sciences to approximate unknown continuous distributions of observed or simulated data. Real world data is realvalued and is handled using random variables which are variables whose values depend on outcomes of random processes. It is a parametric distribution, defined by two parameters, namely, its mean m and its standard deviation s. It is represented as N ðm, sÞ, whose probability density function is given by ðxmÞ2 1 fðx; m, sÞ ¼ p e 2s2 , 1 < x < 1, s 2p giving the characteristic bell curve shape, as shown in Fig. 1. Normal distribution is unimodal, i.e., it is a distribution with a unique mode, where mode ðN ðm, sÞÞ ¼ m. The standard normal distribution is a special case of N ð0, 1Þ, as shown in Fig. 1. A non-standard normal distribution can be converted to its standard form, using standardized values called Z-score of the random variable, given by Zi ¼ Xism . Computing probabilities for different intervals of random variables is done by converting them to their z-scores and looking up probabilities of the z-scores from the Standard Normal Distribution table. Computing probabilities of such clearly defined events correspond to direct problems. Computing the value of the random variable using its probability corresponds to inverse problems, which are also solved in the case of normal distributions by using its standardized form and the lookup table. Normal distributions are significant due to the Central Limit Theorem (CLT). For a set of independent and identically distributed (i.i.d.) random variables in real-valued data, X1, X2,. . . Xn, each with the same mean m and standard deviation s, CLT states that as n grows, i.e., n ! 1, the distribution of the average of the set X¯n converges to the normal distribution N ðm, s2 =nÞ . This implies that, for p the average value, lim Z n ¼ n Xnsm is the limiting case n!1 of the standard normal random variable.
Overview Gaussian distribution is another name for the normal one, as K. F. Gauss has been attributed the origin of the normal
In the year 1733, Abraham de Moivre had determined the normal distribution for a special case of p ¼ 0.5, as a limit of a coin-tossing distribution (Pearson et al. 1926; Stigler 1986). de Moivre had provided the first treatment of the probability integral. This was improved by P-S Laplace in 1781 by generalizing for 0 < p < 1 (Laplace 1781; Kim and Shevlyakov 2008). Gauss in 1809 popularized the derivation of the normal distribution, but not as a limiting form of the binomial distribution. He used the normal distribution to explain the distribution of errors, thus leading to the distribution becoming eponymous to him (Gauss 1963). The use of normal distribution for errors made it appropriate for modeling errors in the linear least-squares model (LS) by Gauss. Since LS is solved using a system of equations, called as normal equations, the name “normal” is associated with the distribution itself. In short, it is widely believed that Laplace, Gauss, and a few other scientists had arrived at the normal distribution through their independent work (Stigler 1986). The study of the theory of errors played a key role in developing statistical theory and improved the applicability of normal distribution in different domains. As the applications of normal distribution shifted from scientific to societal ones, its use began to popularize the notion of the average man in 1846 by Belgian scientist, Adolphe Quetelet, as a representative of the human population. The notion of average man perpetuates the deceptive characterization of the distribution to have stable homogeneity (Fendler and Muzaffar 2008). Thus, despite its varied applications in establishing the universality of the normal distribution, its overuse has slowly led to the need of being scrupulous of its correct formulation and usage across varied applications. Normality tests, including Q-Q plot and goodness-of-fit tests, are used for determining the likelihood that the given set of observations X1, X2,. . . Xn form a normal distribution. There are different types of normality. The normality tests are used to determine exact normality and approximate normality in the data. Another type of normality, asymptotic normality is a property of data, especially for a large sample, where the sample means have an approximately normal distribution, conforming to the CLT, i.e., the distribution of the z-scores converges to a standard normal one as of the sample means, X, n ! 1. Asymptotic normality is a key property of the least squares estimators of generalized space-time autoregressive
N
1000
Normal Distribution
Normal Distribution, Fig. 1 Plots for the probability density function in normal distribution with different values for mean m and standard deviation s, where the red curve is the standard normal distribution. (Image source: an online browserbased application, Desmos, https://www.desmos.com/ calculator/0x3rpqtgrx)
(STAR) models, used widely in ecology and geology (Borovkova et al. 2008). STAR models are applied in geological applications where prior information about spatial dependence, e.g., exact site locations for observations, is available. Asymptotic normality says that the estimator converges to the unknown parameter at a very fast rate of p1n , which emphasizes the quality of the estimator. STAR models and their generalized variants are used for predicting rainfall, air pollution, groundwater levels in basin models, and similar variables in geospatial models. Apart from its role in the CLT, normal distributions are indispensable in signal processing. Following are some of the significant properties of the normal distribution (Kim and Shevlyakov 2008) related to signal processing:
information and preserve variance bounded by its expected value s2 , is the normal distribution. 1
1
¼ arg max H ð f Þ ¼ N ð0, sÞ: f ð xÞ
Similarly, the lower bound of the variance of a parameter estimator gives minimum Fisher information. The solution of the distribution that minimizes Fisher information is a normal distribution. 1
For I ð f Þ ¼ 1. Two normal distributions upon convolution give another normal distribution. 2. The Fourier transformation of a normal distribution is another normal one. 3. The normal distribution maximizes entropy and minimizes Fisher information. Solving for the variational problem of maximizing Shannon entropy H( f ), the distribution f that would discard
f ðxÞ logð f ðxÞÞdx, we get f ðxÞ
For H ðf Þ ¼
1
@ logð f ðx, yÞÞ @y
2
f ðx, yÞdx, we get f ðxÞ
¼ arg min I ð f Þ ¼ N ð0, sÞ: f ð xÞ
Related Distributions The normal distribution belongs to the exponential family of probability distributions, which also include log-normal, von Mises, and others.
Normal Distribution
1001
Log-normal distribution is a close relative of normal distribution in the exponential family. It is a continuous probability distribution where the logarithm of the random variable follows a normal distribution. It is also referred to as Galton’s distribution, and is widely used in modeling data in engineering sciences, including others. If the observed values are strictly positive, then the log-normal distribution is preferred for modeling over the normal distribution, which accommodates both positive and negative values (Barnes et al. 2021). Its probability distribution function is given as: fðx; m, sÞ ¼
1 p
e
ðln xmÞ2 2s2
for m ð1, þ1Þ, xs 2p s > 0, and support x ð0, þ1Þ:
von Mises distribution is a continuous probability distribution, that can be closely approximated as a wrapped or circular normal distribution. Hence, it is also referred to as the circular normal or Tikhonov distribution. Applications of von Mises distribution include any periodic phenomena, e.g., wind direction (Barnes et al. 2021). The parameters of the von Mises distribution are m and k, where the latter is the reciprocal parameter of dispersion and is analogous to variance s2. Its probability density function, with I0(k) as the modified Bessel function of order 0, is given as: fðx; m, kÞ ¼
ek cosðxmÞ , where m ℝ, k > 0, 2pI 0 ðkÞ support x any interval of length 2p:
Statistical parametric mapping (SPM) also demonstrates varied uses of normal or Gaussian distributions in spatial data mining (McKenna 2018). Spatial fields are a common representation of continuous data in geoscientific and environmental applications, e.g., permeability, porosity, mineral content, reflection in satellite imagery, etc. Comparison of such fields over time or other characteristics (e.g., ensemble runs of simulations) require the generation of difference maps, in which anomalies are detected, for better data interpretation, using SPM. SPM involves analyzing each voxel in a volumetric grid or pixel in an image, using any univariate parametric test, and the results of the test statistics are assembled in an image, in the form of a map. Properties of Gaussian or normal distribution of the data in the fields drive the SPM techniques. The pixel values in the statistical parametric map are modeled using a two-dimensional normal/Gaussian distribution function. Usually, this involves transforming a map of t-statistics to that of z-scores. Once the SPM is generated for localized anomaly detection, several post-processing steps of smoothening using Gaussian filtering, reinflation of the variance, image segmentation, and hypothesis testing are implemented in sequence. SPM has been used for anomaly detection in the imagery in satellite images, in transmissivity in groundwater pumping, etc. This specific application places a large emphasis on the Gaussian distribution of the pixels in the difference maps. Given the multivariate nature of data involved in SPM, the modeling involves multi-Gaussian (MG) fields.
Geostatistical Applications
Future Scope
Normal distributions are widely used in geostatistical modeling. Two different examples, namely, Kriging and statistical parametric mapping (SPM), are explored here to understand the relevance of normal distributions in geoscientific applications. Data gaussianity is an essential property, where the data has the propensity for normal distributions, in geo- and spatial statistical modeling and experimental data analysis. Kriging is a widely used geostatistical methodology for interpolation that relies on the prior covariances between samples that are assumed to be normally distributed. The Kriging methodology, which is a form of Bayesian inference, involves estimating the posterior distribution by combining the Gaussian prior with a Gaussian likelihood function for each of the observed variables. This posterior distribution also tends to be normal. The covariance function is used to stipulate the properties of the normal distribution. Thus, Kriging represents the variability of the data up to its second moment, i.e., covariance. Kriging has varied interdisciplinary uses in hydrogeology, mining, petroleum geosciences (for oil production), etc.
Some of the popular geostatistical methodologies, e.g., Kriging, exploit the properties of normal distribution. Hence, to improve the applicability of such a methodology, data transformation is an option. One such method is Gaussian transformation of spatial data (Varouchakis 2021), which is needed to reduce the effect of outliers, improve stationarity of the observed data, and process the data to be apt as the input for ordinary Kriging, which is widely used. Determining such transformations is challenging, as it is highly dependent on the data and its application. The use of the modified Box-Cox technique for Gaussian transformation has been successfully used to improve the normality metrics of datasets, such as detrended rainfall data, groundwater-level data, etc. Thus, this line of work opens up research on appropriate data transformation techniques to improve normality in the data and will continue to keep normal distributions highly relevant to geostatistics. Given the uptake in the use of neural networks for data analysis, normal distributions and related distributions (e.g., log-normal, von Mises, etc.) play a crucial role in uncertainty
N
1002
quantification of data via machine learning in geosciences (Barnes et al. 2021). Uncertainty can be added to any neural network regression architecture to locally predict the parameters of a user-specified probability distribution instead of just performing value prediction. Apart from using normal distributions as user-defined ones for fitting the data, uncertainty quantification also involves adding noise which follows either normal or log-normal distribution. In summary, this article introduces the concept of normal distribution and brushes upon its relevance in geosciences.
Cross-References ▶ Kriging ▶ Lognormal Distribution ▶ Spatial Statistics ▶ Spatiotemporal Modeling ▶ Stationarity ▶ Uncertainty Quantification ▶ Variance ▶ Z-transform
Normal Distribution
Bibliography Barnes EA, Barnes RJ, Gordillo N (2021) Adding uncertainty to neural network regression tasks in the geosciences. arXiv preprint arXiv:210907250 Borovkova S, Lopuhaä HP, Ruchjana BN (2008) Consistency and asymptotic normality of least squares estimators in generalized STAR models. Statistica Neerlandica 62(4):482–508 Fendler L, Muzaffar I (2008) The history of the bell curve: sorting and the idea of normal. Educ Theory 58(1):63–82 Gauss KF (1963) Theoria Motus Corporum Celestium, Perthes, Hamburg, 1809; English translation, Theory of the motion of the heavenly bodies moving about the sun in conic sections. Dover, New York Kim K, Shevlyakov G (2008) Why Gaussianity? IEEE Signal Process Mag 25(2):102–113 Laplace PS (1781) Mémoire sur les probabilités. Mémoires de l’Academie royale des Sciences de Paris 9:384–485 McKenna SA (2018) Statistical parametric mapping for geoscience applications. In: Handbook of mathematical geosciences. Springer, Cham, pp 277–297 Pearson K, de Moivre A, Archibald R (1926) A rare pamphlet of Moivre and some of his discoveries. Isis 8(4):671–683 Stigler SM (1986) The history of statistics: the measurement of uncertainty before 1900. Harvard University Press, Cambridge Varouchakis EA (2021) Gaussian transformation methods for spatial data. Geosciences 11(5):196
O
Object Boundary Analysis Hannes Thiergärtner Department of Geosciences, Free University of Berlin, Berlin, Germany
Definition A linearly ordered set of multivariate geological objects can be segmented into quasi-homogeneous subsets significantly different from directly neighbored subsets. Rodionov (1968) developed a statistical algorithm augmenting the already known heuristic cluster-analytical models to classify ordered objects and to estimate boundaries between different subsets (1. Boundaries between ordered objects). The basic algorithm is supplemented by two algorithms that classify disordered objects into quasi-homogeneous subsets (2. Boundaries between disordered objects) and rank detected boundaries according to their importance (3. Hierarchy of boundaries).
Basic Algorithm 1. Boundaries Between Ordered Objects. A linearly ordered set of n vectors of m quantitative attributes xn ¼ {xn1, xn2,. . ., xni,. . ., xnm} is to be classified iteratively into two subsets h1 covering k vectors and h2 covering (n– k) vectors (1 k n– 1). The assumed boundary between h1 and h2 is subjected to a statistical test. The iterative procedure is starting with k ¼ 1 and ending with k ¼ (n–1). Mean value vectors are calculated for both subsets. Significance of a difference d between the mean value vectors is subjected to the alternative hypothesis H1 (d (r20) 6¼ {0; 0; . . .; 0}) and tested against the null hypothesis H0 (d (r20) ¼ {0; 0; . . .; 0}). The test criterion is
v r 20 ¼
n1 nðn kÞk
k
ðn k Þ
m
xti k
t¼1 n
i¼1
t¼1
x2ti
1 n
2
n
xti
t¼kþ1 2 n
ð1Þ
xti t¼1
where xti is the measured value of attribute i for object t. A statistically significant boundary between both subsets is given at a significance level α if v(r20) > w2 (α; m). The value v (r20) is marked. The null hypothesis is accepted if v(r20) w2 (α; m) meaning that the assumed boundary is not statistically significant. All significant boundaries are marked by their v (r20) values. Finally, a series is created of 1, 2, or at most (n– 1) boundaries each characterized by its v(r20) value. In a second step, the boundary with the maximum test criterion is selected as the most important one. It subdivides the set of ordered objects into two main subsets. Steps 1 and 2 will be repeated successively for all created subsets until new subsets cannot be established. Step 3 includes elimination of created but unnecessary boundaries which could subdivide homogeneous neighbor vectors. It is based on a homogeneity test for all obtained boundaries between neighbored subsets T1 and T2 which are not resulting from the last step. The null hypothesis that a generated boundary is unnecessary will be accepted if v(T1; T2) w2 (α; m) where vðT 1 ; T 2 Þ ¼
n1 þ n2 1 n1 n2 ð n1 þ n2 Þ m
n1
t T 2 xti
n2
t T 1 xti
2 2
i¼1 t T 1 [T 2
1 x2ti n1 þn 2
t T 1 [T 2
:
ð2Þ
xti
Acceptance of the null hypothesis means that the neighboring subsets are not significantly different from each other and that their boundary was artificially resulting from the preceding iterative procedure.
© Springer Nature Switzerland AG 2023 B. S. Daya Sagar et al. (eds.), Encyclopedia of Mathematical Geosciences, Encyclopedia of Earth Sciences Series, https://doi.org/10.1007/978-3-030-85040-1
1004
Object Boundary Analysis
2. Boundaries Between Disordered Objects. This is a similar approach, but it does not get prevalence because the power of the first algorithm regarding the neighborhood is not inserted into the model. A set of n vectors for m quantitative attributes xn ¼ {xn1, xn2,. . ., xni,. . ., xnm} which do not show any spatial order is to be classified into h subsets T1, T2,..., Ts..., Th with at least two objects each. The calculation starts with the determination of Þ all v(Ts;Tk) between the hðh1 pairs of objects: 2 vðT s , T k Þ ¼
v v2 N¼ p 1 2 v1 þ v2 m
will be applied only if two boundaries b1 and b2 are in existence. The null hypothesis H0 (b1 ¼ b2) is preferred if no other hypothesis is accepted; H1 (b1 < b2) is accepted if N < N (α); H2 (b1 > b2) is accepted if N > N (1–α); and H3 (b1 6¼ b1) is accepted if N > N 1 a2 : A w2-distributed test criterion
ns þ nk 1 ns nk ð ns þ nk Þ
w2 ¼ 2
m
ns
t Tk
xti nk
t Ts
xti
ð3Þ 2
i¼1 t T s [T k
1 x2ti ns þn k
t T s [T k
xti
where Ts and Tk are the first and second subsets with ns and nk objects (n 2) respectively, and xti is the value for attribute i for object t. Next, a test for general homogeneity of all objects is conducted. General homogeneity means that there are no significant boundaries between the multivariate described objects involved. The null hypothesis H 0 ðall ½YTs YTk ¼ f0; 0; . . . ; 0gÞ
ð4Þ
regarding all objects will be accepted if 2
min s,k vðT s , T k Þ w ða; mÞ:
ð6Þ
1 2ð2y mÞ
k
ðvi yÞ2
ð7Þ
i¼1
will be applied if g boundaries b1, b2, . . . bg exist; y is the mean value of all v-values. The null hypothesis H0 (b1 ¼ b2 ¼ . . . ¼ bg ¼ b0) that all boundaries are of equal rank is to be accepted if w2 w2 (α; k–2). Otherwise, at least one of the boundaries can be distinguished clearly from the other ones. If this is the case, all v-values will be ordered according to their decreasing size: yi1 > yi2. . . > yig. Next, boundaries between geological objects will be united within a common class of equivalent boundaries if they do not show significant differences. This procedure will be iteratively repeated until no relation v w2 can be found anymore. The end result is a hierarchical set of boundaries classified according to their range.
Applications ð5Þ
At least one object shows a significantly different attribute vector, and at least one boundary exists if the null hypothesis is rejected. ΘTs refers to the mean value vector of the subset Ts which includes ns geological objects. The procedure will be continued only if the null hypothesis is rejected. In this case, subsets will be united iteratively into larger quasihomogenous subsets. At every step, two subsets corresponding to the term min(s; k)v(Ts; Tk) form a new, larger subset without internal boundary if v w2(α; m). This new subset replaces the previous subsets Ts and Tk. The number of classes is reduced by 1. The tests are continued as before until all possible combinations of subsets have been processed. 3. Hierarchy of Boundaries. The problem is testing whether or not significant boundaries bi characterized by their v-values vi (r20) are of equal rank. There can arise four possible situations: boundaries are equal, one boundary is greater than or less than another one, or the boundaries are not equal. A normally distributed test criterion
Examples of linearly ordered sets of multivariate geological objects are borehole profiles, profile sections, terrestrial or submarine traverses, and time series as well. Neighboring geological objects not separated by significant boundaries constitute quasi-homogeneous classes. Each class can be described by the estimated mean value and variance of every attribute, by the number of objects included, by its spatial or temporal position and other related information. Spatial or temporal coordinates of the geological objects are not taken into account. Equidistance of samples or observations is not necessary, but their given linear sequence must be respected. In view of the statistical test criteria belonging to the procedure, a (0;1)-standardization of all input data is recommended so that all attributes are to receive equal weight. More subtle differences between neighbored objects can be detected if more attributes are considered. An increased statistical significance level such as α ¼ 0.10 instead of the commonly used α ¼ 0.05 diminishes the theoretical w2 values and results in a finer subdivision of the set of linearly ordered geological objects.
Object Boundary Analysis
The boundaries between obtained classes can be interpreted geologically. An application of the testing model “hierarchy of boundaries” is recommended for differentiation between more or less important boundaries. Boundaries of higher order may be tectonic (e.g., thrust faults) or stratigraphic discordances. Lower-order boundaries represent general changes in tectonically undisturbed sequences of sedimentary layers or in the evolving fossil content. A boundary in the chemical composition or in mineral content provides essential information if quasi-homogeneous blocks in mineral deposits should be distinguished and delineated for subsequent mining. The model “boundaries between disordered objects” represents an extension of the basic model “boundaries between linearly ordered objects.” The principle that a spatial neighborhood of multivariate objects is necessary does not exist anymore. Quasi-homogeneous classes can be generated even between nonneighboring objects. This statistical model competes with the better-known cluster analysis methods. However, it has the advantage of forming spatially interrelated groups of objects. An application can be recommended if a number of multivariate well logs are available for an exploratory target area or for a mineral deposit. The model then results in several closed classes both within every profile and between profiles. This gives the opportunity for crosscorrelation between geologically connected profiles. The subset boundaries can be interpreted as stratigraphic, fossiliferous, or lithologic ones, or as delineations between quasihomogeneous spatial blocks within mineral deposits. Past applications in various geological projects in Germany gave good results.
Relation to Cluster Analysis and Discriminant Analysis The segmentation of a linearly ordered set of multivariate geological objects belongs to the group of nonsupervised classification methods. It is based on a multivariate statistical comparison of means at a chosen significance level. The user cannot incorporate any other (geo)scientific information. The model developed and introduced by Rodionov respects the neighborly relations between geological objects and enables uniting neighboring objects into a common quasi-homogeneous subset even if their attribute vectors differ to some degree. Thus, it is a generalizing approach. A subdivision of the same set of objects by cluster analysis is a heuristic model based on one of several multivariate similarity or distance measures. Cluster analysis belongs also to the class of nonsupervised classification models. The user selects a suitable measure and can influence the result using experience and preknowledge. Information about the statistical
1005
uncertainty is not available. Every geological object will be classified into one of several resulting groups regardless of its spatial or temporal position. In this way, the ordered sequence of objects will be subdivided into numerous independent subsets or single objects. The approach is individualizing. Models of discriminant analysis differ clearly from the preceding models. They are supervised multivariatestatistical approaches. The boundaries between geological objects are determined a priori by geoscientific facts such as lithologic, stratigraphic, or facies allocation. This approach is applied to test the multivariate differences between predetermined classes of objects according to their statistical significance. Not yet classified objects can be assigned to existing classes at a given statistical level of significance, or they result in generating new classes of geological objects.
Example The profile of borehole Herzfelde 4/63 east of Berlin (Germany) is to be classified by its microfauna population. It shows a monotone sedimentary sequence, poor in fossils and without any index fossils (Fig. 1). The series consists of fine to medium-grained sandstone (a), ordinary sandstone (b), calcareous sandstone (c), pelitic limestone (d), clayey limestone (e), limestone breccia (f), limestone conglomerate (g), oolithic limestone (h), dolomite (i), and clayish sandstone (j). Some layers are colored (k) and the deeper limestone is pelitic (l). The sequence covers Mesozoic sediments from the Oxfordian to the Cenomanian. The series was synsedimentarily to postsedimentarily disturbed by halokinetic processes (Schudeck and Tessin 2015). The sparse fossil content (residues of ostracods, charophyta, echinoderms, snails, and other types of shells and fish) was determined in 94 samples (column 2 in Fig. 1). The segmentation of this profile by microfauna components resulted in six classes (column 3 in Fig. 1) for significance level a ¼ 0.10. The lower class corresponds to Oxfordian and Kimmeridgian strata and is concordantly overlain by Tithonian and earliest Berriasian followed by a weathered horizon (Early Berriasian). This sequence is overlain by poorly fossiliferous Late Berriasian and Cenomanian sediments (class no. 4, possibly limited by a tectonic fault?). Above an obvious discordance, class no. 5 continues the Early Berriasian sequence which is very rich in fossils. The hanging group follows discordantly and is composed of Early Berriasian and Late Berriasian containing only a few fossils. A reduction of the significance level to a ¼ 0.05 results in altogether 16 microfauna boundaries (column 4 in Fig. 1), but the entire profile is still clearly structured into coherent sections. This is important for the geological interpretation.
O
1006
Object Boundary Analysis
Object Boundary Analysis, Fig. 1 Boundaries between ordered samples in a borehole profile
Column 5 in Fig. 1 shows the resulting preliminary stratigraphic classification of the studied Mesozoic profile. The difference between the generalizing method “boundaries of ordered objects” and the individualizing cluster
analysis is presented in Fig. 2. The same data set was used for both approaches. The cluster-analytic result (column 3 in Fig. 2) shows nine classes similarly composed as the groups of the Rodionov algorithm. It shows partly also closed sequences
Object Boundary Analysis
1007
O
Object Boundary Analysis, Fig. 2 Comparison between generalized and individualized boundaries
1008
of a quasi-homogeneous composition but also embedded thin layers characterized by a different fossil content. This comparison between results is suitable to recognize the difference between generalizing and individualizing methods.
Summary Linearly ordered or not ordered sets of multivariate geological objects can be subdivided into quasi-homogeneous subsets by multivariate statistical methods developed by Rodionov (1968). The resulting boundaries can be tested for their statistical significance and can be interpreted geologically. This generalizing approach is distinguished from the individualizing cluster analysis by preferential involving neighboring objects into a common class without explicit inclusion of spatial or temporal coordinates. Application is recommended to classify borehole profiles, time series, and related ordered geoscientific objects.
Cross-References ▶ Cluster Analysis and Classification ▶ Discriminant Analysis ▶ Pattern Classification ▶ Rodionov, Dmitriy Alekseevich
Bibliography Rodionov DA (1968) Statistical methods to mark geological objects by a complex of attributes. Nedra, Moscow. (in Russian) Schudeck M, Tessin R (2015) Jurassic. In: Stackebrandt W, Franke D (eds) Geology of Brandenburg. Schweizerbart, Stuttgart, pp 217–240. (in German)
Object-Based Image Analysis D. Arun Kumar1, M. Venkatanarayana1 and V. S. S. Murthy2 1 Department of Electronics and Communication Engineering and Center for Research and Innovation, KSRM College of Engineering, Kadapa, Andhra Pradesh, India 2 KSRM College of Engineering, Kadapa, Andhra Pradesh, India
Definition Object-based image analysis (OBIA) is defined as recognizing the target objects in an image through the processes like image segmentation, image classification, evaluation, and analysis of objects. Image segmentation is the process of dividing an image into homogeneous segments by grouping
Object-Based Image Analysis
the neighboring pixels with similar characteristics like tone, texture, color, and intensity. Image classification is the process of labeling the image objects to the respective classes as they appear in the reality. Evaluation is the process of comparing the predicted object classes to the actual/true classes. The term image objects is a group of pixels with similar characteristics. A pixel is a digital number representing the spectral reflectance from a given area.
Introduction The earth resources change dynamically over the years due to various reasons like anthropogenic activities, natural processes and other activities. The consumption of resources such as water, forest, minerals, oil, and natural gases has increased from the past years (Lillesand and Kiefer 1994). Also, the land use/land cover changes are frequent due to the reasons such as overfloods, drought, cyclones, tsunamis, etc. The percentage of resource utilization by the humans has increased over the recent years (Lillesand and Kiefer 1994). Many studies were implemented scientifically to quantify the changes in the resources like forest, water, and land use/land cover changes (Richards and Jia 2006). Remote sensing is one such type of art, science, and technology of acquiring the information about the resources on the earth surface without any physical con tact with the object of interest (Campbell 1987). The images/photographs of the target of interest are obtained remotely from airborne/ spaceborne platforms (Richards and Jia 2006). The very first attempt of acquiring the aerial photographs were implemented by airborne platforms like parachutes, air crafts, etc. In the recent years, space-based platforms like satellites are used to acquire the information about the target of interest (Schowengerdt 2006). The information about the target of interest is remotely sensed and represented in the form of images (Jensen 2004). In the earlier days, the images were obtained in analog format with coarse resolution (Gonzalez and Woods 2008). Identification of object of interest in the images with coarse resolution is difficult task for the manual analysis. Advancement in the sensor technology like development of charged coupled devices generated digital images with fine spatial resolution. The digital images are more informative in comparison with analog images. A digital image consists of finite set of elements called pixels/pels (Bovik 2009). Each pixel in a digital image is indicated with digital number (DN) (Gonzalez and Woods 2008). A DN of a pixel represents the average spectral reflectance sensed by the sensor in the finite area of interest. The digital image consists of digital numbers represented in rows and columns (Bovik 2009). The digital images are processed to extract the information about the area of interest using various processing techniques like image enhancement, image transformation, and image classification (Richards and Jia 2006). The required information from the digital images is acquired with the usage of automated
Object-Based Image Analysis
methods. These automated methods are mathematically implemented using statistics, probability, set theory, graph theory, etc. (Schowengerdt 2006). The processing and analysis of images is classified as (i) manual approach and (ii) digital approach. In the manual approach expert committee will recognize the object of interest using the image interpretation keys/elements (Lillesand and Kiefer 1994). In the digital image processing and analysis approach the objects are detected and recognized using automated methods followed by the expert committee analysis. The DNs in an image are processed using two approaches, namely (i) Pixel-based image analysis and (ii) Object-based image analysis (Jensen 2004). The contents of present entry are given as pixel-based image analysis, object-based image analysis, major applications of objectbased image analysis in remote sensing, and conclusions.
1009
digital images. The object-based image analysis (OBIA) approach works well in recognizing the object of interest in both finer and coarser resolution digital images (Johnson and Ma 2020). The example of PBIA is provided in Fig. 1 and Fig. 2. In Fig. 1, there are three categories of vege tation classes. The input image is classified using PBIA approach and the classified output image is given in Fig. 2.
Pixel-Based Image Analysis The pixel-based image analysis approach (PBIA) is based on spectral signature of the area of interest. In the pixel-based image analysis the processing techniques are implemented at the finite level of the digital images called as pixels (Veljanovski et al. 2011). Each pixel in the image is processed to acquire the information of the area. Pixel-based classification is one of the key operations performed in PBIA. In pixelbased classification each pixel in the input image is labelled to the corresponding class using various classifiers (Johnson and Ma 2020). The input pixel is considered as a vector in the N-dimensional feature space and the pixels of same class will form a cluster in the N-dimensional feature space. In pixelbased classification approach each pixel is assigned with the class label. There exists various pixel-based classifiers for remote sensing image analysis (Blaschke et al. 2014). K-nearest neighbors classifier assigns the class label to the pixel based on the K-nearest neighbors in the Euclidean space. Minimum distance to mean classifier assigns the class label to the pixel based on the minimum difference between the pixel value and the class mean (Veljanovski et al. 2011). Maximum likelihood classifier assigns the class label to the pixel based on the class conditional probability. The output of pixel-based classification approach is the classified image/classified map. Furthermore, the classified map is analyzed by the expert for quantitative assessment of various classes present in the image (Johnson and Ma 2020). Pixel-based image analysis approach works well with coarse resolution image where the target object area is lesser than the spatial resolution of the sensor (Blaschke et al. 2014). This approach has limitations when the input image is with finer resolution where the target object area is larger than the spatial resolution of the sensor. Motivated with this reason, object-based image analysis approach was suggested for the
Object-Based Image Analysis, Fig. 1 Input image with vegetation classes. Reprinted from (Veljanovski et al. 2011)
O
Object-Based Image Analysis, Fig. 2 Output image of PBIA with vegetation classes represented in three symbols (Image is reprinted from (Veljanovski et al. 2011)). The input image is given in Fig. 1
1010
Object-Based Image Analysis
Object-Based Image Analysis In OBIA, the image pixels are grouped based on the similarity like color, tone, texture, and intensity characters. The group of pixels are combined to generate the image objects (Jensen 2004). The image objects represent the themes like water body, trees, buildings, grounds, etc. (Richards and Jia 2006). In OBIA, the spatial and spectral characteristics of pixels are used to recognize the target objects in the image. The steps implemented in OBIA is given in Fig 3. In the first stage of OBIA, the input image is segmented to obtain the image objects. Image segmentation is the process of dividing the input image into homogeneous regions called image segments. The major image segmentation approaches are given as (i) region based approach and (ii) boundary based approach. In the region based approach, similar type of pixels are detected using the algorithms. In boundary based approach, the algorithm searches for the discontinuity in the image. The boundary based approach is implemented using the segmentation algorithms like edge detection, point detection and line detection (Bovik 2009). The image segmentation techniques are broadly classified as threshold based method, edge-based segmentation, region based segmentation, cluster based segmentation, watershed based method, and artificial neural network based segmentation (Blaschke et al. 2014). The segmented image consists of separated image objects based on shape and size characteristics. In the second stage of OBIA, the objects in the segmented image are labeled to the respective classes by using various classification methods. The classification algorithms are categorized as (i) parametric classification and (ii) nonparametric classification (Johnson and Ma 2020). In the parametric classification, the statistical parameters like mean, median, mode, Object-Based Image Analysis, Fig. 3 Schematic flow diagram of OBIA. The flow di agram consists of steps like passing the input image, segmenting the input image to obtain image objects, classifying the image objects, and evaluation of accuracy
Object-Based Image Analysis, Fig. 4 Processing steps of OBIA. Reprinted from (Veljanovski et al. 2011)
etc. of the input pixel values are considered during the image classification. In the nonparametric classification, the classification of digital images is performed without considering the statistical parameters of the input pixel values (Veljanovski et al. 2011). The major para metric/nonparametric classifier methods include K- means, minimum distance method, maximum probability method, ISODATA, K-nearest neighbor, support vector machines, and parallel-piped method (Johnson and Ma 2020). In addition, there exists neural networks (NNs)-based classifiers such as Kohonen networks, feedforward neural networks (FNNs), recurrent NNs (RNNs), deep neural networks (DNNs), etc. The machine learning–based classifiers such as decision trees and regression trees are used to classify the image objects into respective classes. The output of image classification is a classified map with various classes of land use/land cover classes, natural resources, etc. (Blaschke et al. 2014). The processing steps of OBIA approach is given in Fig. 4. The output of OBIA approach for the input image (Fig. 1) is given in Fig. 5. The output classified image of OBIA approach is evaluated using various indices like user’s accuracy (UA), producer’s accuracy (PA), overall accuracy (OA), dispersion score (DS), and kappa coefficient (KC) (Lillesand and Kiefer 1994). The performance indices are de- rived from confusion matrix (CM). CM is a table of actual classified output classes assigned by a model. The sum of diagonal elements of the CM divide by total number of pixels is termed as OA (Campbell 1987). The results of OBIA are correlated with the ancillary data and evaluated with the field visits by selecting the locations based on random approach (Jensen 2004). Furthermore, based on the evaluation results, the OBIA is extended to various remote sensing applications.
Image
Image
Segmentation
classification
InputImage
Output Evaluation
classified map
Object-Based Image Analysis
1011
sensors like Optical Sensor Assembly, GeoEye Imaging System, Shortwave Infrared sensor, etc. (Jensen 2004). The coarse resolution images has 15 m to 75 m coverage of ground area and the fine resolution images have 0.5 m to 4.5 m coverage of ground area. The OBIA is efficiently applied on monochromatic, panchromatic, multispectral, and hyperspectral remote sensing images. OBIA has produced impressive results with the images acquired from 1–350 spectral bands (Gonzalez and Woods 2008).
Conclusions
Object-Based Image Analysis, Fig. 5 Output image of OBIA with detailed vegetation classes (Reprinted from (Veljanovski et al. 2011)). The input image is given in Fig. 1
Major Applications of Object-Based Image Analysis in Remote Sensing The applications of object-based image analysis in remote sensing is broadly classified as land use/land cover classification, forest mapping, mineral mapping, coastal zone studies, crop classification, and urban stud ies (Blaschke et al. 2014). In land use/land cover classification, the pixels in the image are classified into corresponding land use/land cover class like builtup land, urban dense, urban sparse, scrub land, and deciduous forest (Johnson and Ma 2020). OBIA is used in the forest studies for change detection and quantitative assessment of vegetation classes. OBIA is used to detect the minor and major lineaments in the remote sensing images for mineral exploration (Veljanovski et al. 2011). In coastal zone studies the OBIA is used to study the dynamics of shore line changes. The coastal zone applications also include the change detection in the vegetation like mangroves. The crop classification in remote sensing images consists of mapping and assessment of various type of crops in high-resolution images (Blaschke et al. 2014). In the urban studies, the OBIA is used to study the dynamics of urban sprawl and change detection in the city pattern (Johnson and Ma 2020). The applications of OBIA exists from coarse resolution to fine resolution remote sensing images (Schowengerdt 2006). The coarse resolution remote sensing images are acquired using the sensors like Linear Imaging Self-Scanner (LISS III), Linear Imaging Self Scanner (LISS II), Multispectral Scanner (MSS), Wide field sensor (WFS), Advanced Wide Field Sensor (AWIFS), etc. (Camp- bell 1987). The fine resolution remote sensing images are acquired from the
Advances in sensor technology have produced images with fine spatial resolution. The OBIA approach is used for better information extraction from fine resolution images in comparison with traditional methods. The traditional PBIA approach produces better results with coarse resolution images than with fine resolution remote sensing images. PBIA considers only the spectral characteristics of the object of interest during the processing of digital images. OBIA performs efficiently with both coarse and fine resolution remote sensing images, because it considers both the spatial and spectral characteristics of pixels during object recognition. These advantages allow OBIA to provide better results in comparison with PBIA. OBIA is used in various types of remote sensing applications such as land use/land cover classification, resource monitoring, and urban mapping, where the spatial resolution of the images varies from 75 m to 0.5 m and the spectral resolution varies from a single band to hundreds of bands. OBIA is used in applications related to the earth's atmosphere, surface, and subsurface. In recent years, intensive research on OBIA with machine learning has been carried out by the scientific community, and OBIA with machine learning concepts has produced better results in various remote sensing applications. The future scope of OBIA involves the inclusion of temporal and radiometric characteristics of objects, along with spatial and spectral characteristics, during the processing of digital images.
Acknowledgment The first author is thankful to K. Madan Mohan Reddy, vice-chairman, Kandula Srinivasa Reddy College of Engineering, and A. Mohan, director, Kandula Group of Institutions, for establishing the Machine Learning group at Kandula Srinivasa Reddy College of Engineering, Kadapa, Andhra Pradesh, India – 516003.
Bibliography
Blaschke T, Hay GJ, Kelly M, Lang S, Hofmann P, Addink E, Feitosa RQ, der Meer FV, der Werff HV, Coillie FV (2014) Geographic object-based image analysis towards a new paradigm. ISPRS J Photogramm Remote Sens 87:180–191
Bovik AC (2009) The essential guide to image processing. Academic Press
Campbell JB (1987) Introduction to remote sensing. The Guilford Press
Gonzalez RC, Woods RE (2008) Digital image processing, 3rd edn. Prentice Hall, New Jersey
Jensen JR (2004) Introductory digital image processing: a remote sensing perspective, 3rd edn. Prentice Hall
Johnson BA, Ma L (2020) Image segmentation and object-based image analysis for environmental monitoring: recent areas of interest, researchers' views on the future priorities. Remote Sens 12(11). https://doi.org/10.3390/rs12111772
Lillesand TM, Kiefer RW (1994) Remote sensing and image interpretation. Wiley
Richards JA, Jia X (2006) Remote sensing digital image analysis: an introduction, 4th edn. Springer
Schowengerdt RA (2006) Remote sensing: models and methods for image processing. Elsevier
Veljanovski T, Kanjir U, Ostir K (2011) Object-based image analysis of remote sensing data. GEOD VESTN 55(4):678–688
Olea, Ricardo A. C. Özgen Karacan U.S. Geological Survey, Geology, Energy and Minerals Science Center, Reston, VA, USA
Fig. 1 Ricardo A. Olea, courtesy of Ricardo A. Olea, Jr.
Biography Dr. Ricardo A. Olea received his B.Sc. degree in Mining Engineering from the University of Chile, Santiago, in 1966, and was awarded the Juan Brüggen Prize by the Instituto de Ingenieros de Minas de Chile for the best graduating Mining Engineer. Following graduation, he started his
professional career as an exploration seismologist with the National Oil Company of Chile (ENAP). Later, he pursued advanced qualifications in the United States. He received an M.Sc. degree in Computer Science from the University of Kansas in 1972 with a thesis on "Application of regionalized variable theory to automatic contouring," and a Ph.D. from the Chemical and Petroleum Engineering Department of the University of Kansas in 1982 with a dissertation titled "Systematic approach to sampling of spatial functions." He immigrated to the United States in 1985 to pursue a career as a research scientist with the Kansas Geological Survey in Lawrence, Kansas, where he worked until 2003. Before joining the USGS (United States Geological Survey) in 2006, where he currently works as a Research Mathematical Statistician, he had appointments with the Department of Environmental Sciences and Engineering of the School of Public Health at the University of North Carolina in Chapel Hill; the Marine Geology Section of the Baltic Research Institute of the University of Rostock, Warnemünde, Germany; and the Department of Petroleum Engineering at Stanford University. Besides authoring and coauthoring more than 250 publications, including 4 books, in quantitative modeling in the earth sciences, energy resources, geostatistics, compositional data modeling, well log analysis, geophysics, and geohydrology, Ricardo made exceptional contributions to the profession of mathematical geosciences and the IAMG (International Association for Mathematical Geosciences). He served as the chair of the IAMG Geostatistics Committee (1985–1989) and the IAMG Membership Committee (1989–1992), as IAMG Secretary-General (1992–1996), IAMG President (1996–2000), and IAMG Past-President (2000–2004). He still takes active roles in the IAMG and maintains affiliations with professional organizations. He currently serves as an Associate Editor of the Springer journal Stochastic Environmental Research and Risk Assessment. Ricardo was presented with the IAMG's highest award, the William Christian Krumbein Medal, in 2004, in recognition of his distinguished research, his service to the IAMG, and his service to the profession in general. Ricardo's achievements and professional accomplishments go well beyond the brief summary highlighted above. As importantly, Ricardo is a very kind and modest person; despite his distinguished career, he is always keen to highlight excellence in others rather than his own. He is an influential and well-loved mentor for many around the world. He is a great friend and an exceptional colleague to work with. Ricardo is always dependable and democratic, and he seeks a broadly based consensus before making any decisions. He always strives to learn new things and share them with his colleagues with openness. I feel privileged to have been asked to write this contribution about Ricardo in this volume.
Ontology Torsten Hahmann Spatial Informatics, School of Computing and Information Science, University of Maine, Orono, ME, USA
Definition In the mathematical geosciences the term ontology denotes an information artifact that specifies a set of terms, consisting of classes of objects, their relationships and properties, and semantic relationships between them. But even in the information and computing sciences, the word “ontology” can denote vastly different kinds of artifacts, see Fig. 1 and Table 1 for examples. They differ in purpose and scope, ranging from informal conceptual ontologies to formal ontologies and from very broad to narrowly tailored to specific domains or specialized reasoning applications. They also differ in their representation formats, which vary in formality and expressivity, ranging from concept maps and term lists to structured thesauri and further to formal languages. If the semantics of terms are expressed in a formal, that is machine-interpretable, language one speaks of a “formal ontology.” Ontologies specified in less rigorous languages are referred to as conceptual ontologies or models.
Introduction The term ontology originates from the study of the nature of being and the categorization of the things that exist in the world. This philosophical tradition – referred to by the uppercase term Ontology – dates back to Aristotle’s work
Ontology, Fig. 1 Example snippets of nonformal ontologies loosely modeled after (Hahmann and Stephen 2018; Brodaric et al. 2019)
on metaphysics. Within computing and information sciences, the term ontology is used more narrowly to refer to an artifact that results from explicitly modeling a portion of the world or, in the words of Gruber and Studer, an "explicit formalization of a [shared] conceptualization" (Staab and Studer 2009). See Jakus et al. (2013) for additional historical context. Over the last decades, ontologies of various scopes, formalities, and expressivities have emerged for a wide range of uses. The most obvious distinction between them is how they are represented, which ranges from simple term lists and graphical models to highly sophisticated logic-based languages, with the Resource Description Framework (RDF) (https://www.w3.org/TR/rdf11-concepts/), RDF Schema (RDFS) (https://www.w3.org/TR/rdf-schema/), and the Web Ontology Language (OWL2) (https://www.w3.org/TR/owl2-overview/) most widely used (Staab and Studer 2009). Regardless of their differences, all ontologies share two ingredients: (1) a set of terms – its terminology or vocabulary – and (2) a specification of how the terms are defined or related to one another as a means of specifying the terms' semantics, that is, how to interpret them. The terms encompass classes, sometimes called concepts, and properties, also referred to as relations, predicates, or roles. Mathematically, classes can be thought of as sets (unary relations), whereas properties have arities of two or greater (many representation formats, including RDFS and OWL2, only permit binary properties). As an example, an ontology may include classes such as "Town," "Well," and "GroundwaterBody" and binary relations between classes (object properties in RDFS and OWL2) such as "contains," "drawsFrom," or "connectedTo" as well as attributes (data properties in RDFS, OWL2) such as "depth," "volume," "latitude," and "longitude." By relating terms to one another via generic or specific semantic relationships or via axioms, the terms' semantics are specified. The most prevalent semantic relations are taxonomic ones (hyponymy and hypernymy) such as "is a subclass of" (short: "is-a") and "is a subproperty of." Other common semantic relations are meronymy (part of; mereology in logic) and its opposite holonymy, as well as synonymy, antonymy, and similarity. The process of developing ontologies is called ontological engineering, for which established methodologies, such as TOVE, Methontology, On-To, DILIGENT, or NeOn, are surveyed in the entry "Ontology Engineering Methodology" in Staab and Studer (2009). Development typically is a multistage process from knowledge elicitation all the way to ontology verification and validation and involves both knowledge engineers and domain experts. Various tools, including competency questions, ontology design patterns (entry ▶ "Pattern Recognition"), and formal ontological analysis help at various stages. More recently, the emphasis of ontological
engineering has shifted to maintaining and evolving existing ontologies (Kotis et al. 2020).

Ontology, Table 1 Example snippets of formal ontologies using some of the concepts from Fig. 1

RDFS:
:WaterWell rdfs:subClassOf :Well .
:OilWell rdfs:subClassOf :Well .
:drawsFrom rdfs:domain :WaterWell ; rdfs:range :GroundWaterBody .

OWL2:
:WaterWell rdf:type owl:Class ;
  rdfs:subClassOf [rdf:type owl:Restriction ; owl:onProperty :drawsFrom ; owl:someValuesFrom :GroundWaterBody] ;
  owl:disjointWith :OilWell .
:drawsWaterFrom rdf:type owl:ObjectProperty ;
  rdfs:domain :WaterWell ;
  rdfs:range :GroundWaterBody .

SWRL:
Town(?x1) ∧ WaterWell(?x2) ∧ contains(?x1,?x2) ∧ drawsFrom(?x2,?x3) → townGetsWaterFrom(?x1,?x3)

First-order logic (FOL):
∀x [WaterWell(x) → ¬OilWell(x)]
∀x, y [drawsFrom(x, y) → WaterWell(x) ∧ GroundWaterBody(y)]
∀x [WaterWell(x) → ∃y [drawsFrom(x, y) ∧ GroundWaterBody(y)]]
∀x [HydroRockBody(x) ∧ ∀y [GroundWaterBody(y) ∧ submat(y, x) → WellWaterBody(y)] → WaterWell(x)]
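As an illustrative sketch (not part of the original entry), the RDFS fragment of Table 1 can be built and queried programmatically with the Python rdflib library; the namespace URI and instance names used below are hypothetical.

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/hydro#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# Terminology: classes and a property with domain/range restrictions (cf. Table 1, RDFS)
g.add((EX.WaterWell, RDFS.subClassOf, EX.Well))
g.add((EX.OilWell, RDFS.subClassOf, EX.Well))
g.add((EX.drawsFrom, RDF.type, RDF.Property))
g.add((EX.drawsFrom, RDFS.domain, EX.WaterWell))
g.add((EX.drawsFrom, RDFS.range, EX.GroundWaterBody))

# A few instance-level assertions (hypothetical individuals)
g.add((EX.well42, RDF.type, EX.WaterWell))
g.add((EX.aquiferA, RDF.type, EX.GroundWaterBody))
g.add((EX.well42, EX.drawsFrom, EX.aquiferA))

# SPARQL query: which water wells draw from which groundwater bodies?
q = """
PREFIX ex: <http://example.org/hydro#>
SELECT ?well ?gwb WHERE {
  ?well a ex:WaterWell ;
        ex:drawsFrom ?gwb .
}
"""
for well, gwb in g.query(q):
    print(well, "drawsFrom", gwb)
```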
Types of Ontologies To provide an overview of the broader ontology landscape, we survey it along three dimensions. The first is an ontology’s purpose, that is, how it is intended to be used. The subsequent dimensions – scope and representation format – are partially correlated to the purpose (see Table 3). Categorizing Ontologies by Purpose Among the vastly different uses of ontologies of interest to the mathematical geosciences, five common purposes emerge: conceptual modeling, knowledge organization (including information retrieval), terminological disambiguation, standardization, and reasoning. Many ontologies, however, serve combinations or variations of these prototypical purposes. Conceptual modeling (see Guizzardi 2010) is the process of brainstorming a domain or application in order to identify how the concepts therein relate to one another, creating topic or concept maps wherein concepts are related via unlabeled (generic) or labeled relationships. Entity-Relationship (ER) diagrams or UML class diagrams (examples in Fig. 1) – frequently used in software and database design – are more expressive yet still predominantly visual representations. Conceptual models are often integral parts of standards. Knowledge organization and management (see also Jakus et al. 2013) encompasses various techniques to organize and help retrieve knowledge within knowledge organization systems (KOS), supporting metadata annotations, keyword-
based semantic search, and hierarchical navigation through knowledge. The utilized ontologies typically focus on classes and include formats such as controlled vocabularies and glossaries (lists of terms without definitions or relationships between them), taxonomies (i.e., classes organized using hypernyms), and thesauri (e.g., the Getty Thesaurus of Geographic Names, https://www.getty.edu/research/tools/vocabularies/tgn/, which uses taxonomic relations and generic associations). Thesauri can be encoded using the SKOS language (http://www.w3.org/TR/skos-reference) or more expressive languages, such as RDFS or OWL2. The latter two are commonly used for sharing and connecting data as knowledge graphs, which support some basic semantic reasoning. Terminological clarification and disambiguation is about precisely defining terms and, in the process, distinguishing and relating different word senses of terms (e.g., "well" for a bore hole to extract liquids/gases vs the space in a building containing a staircase) and nuances in their use (e.g., referring to a water or oil well; referring to the excavated hole only vs to the entire structure that includes container and liquid). A primary tool for disambiguation is natural language ("prose") definitions. They are core to dictionaries but also incorporated into SKOS ontologies via the "skos:definition" attribute and into WordNets (Fellbaum 2006) that additionally relate words and word senses using semantic relationships such as synonymy and hyponymy. Ontologies specified in RDFS or OWL2 can also incorporate such definitions in comments (using "rdfs:comment") but they are not formally interpreted. Gazetteers, such as the USGS's Geographic Names Information System (GNIS) (https://geonames.usgs.gov/), are ontologies that clarify and disambiguate between geographic names using feature types (e.g., a town, county, or river),
meronomic relationships (e.g., the country or state where it is located), and precise locations (e.g., latitude and longitude). Standardization is agreeing on a shared interpretation of terms for a specific domain to improve semantic interoperability. Because standards primarily guide the people developing such applications, nonformal representations consisting of conceptual models and natural language definitions of terms are most common, as exemplified by OGC's spatial data standards Simple Features (SFA) (https://www.ogc.org/standards/sfa) and GML (https://www.ogc.org/standards/gml) and their domain-specific standards such as GeoSciML (https://www.ogc.org/standards/geosciml) and WaterML (https://www.ogc.org/standards/waterml) for geologic and hydrologic data, respectively. Other standards additionally provide reference implementations or machine-interpretable versions, such as the OWL2 versions for Semantic Sensor Networks (SSN) (https://www.w3.org/TR/vocab-ssn/), PROV-O (https://www.w3.org/TR/prov-o/), and OWL-Time (https://www.w3.org/TR/owl-time/). Such formalized standards are critical for increasing information integration and exchange across geoscience applications and, specifically, for constructing spatial data infrastructures (SDI) and knowledge graphs. Reference ontologies (Menzel 2003) play important standardization roles. Foundational (or upper) reference ontologies (Guizzardi 2010) provide anchor concepts for refinement by other ontologies, while domain reference ontologies (Hahmann and Stephen 2018) provide domain-specific reference concepts. Both kinds are discussed further in the categorization by scope. Semantic reasoning is about inferring implicit knowledge and requires expressive formal ontologies – also called knowledge representation ontologies (Jakus et al. 2013) – specified using logic-based languages such as OWL2 (or other Description Logics), SWRL (https://www.w3.org/Submission/SWRL/) or first-order logic (or Common Logic) (https://standards.iso.org/ittf/PubliclyAvailableStandards/c066249_ISO_IEC_24707_2018.zip). By relating terms flexibly via logical axioms, these languages enable complex automated reasoning about the interaction of classes and relations, such as about the domain or range of objects participating in a relation, or the existence of specific relations for certain types of objects. Querying and reasoning with such ontologies is facilitated by SPARQL (https://www.w3.org/TR/rdf-sparql-query/) (for RDFS, OWL2) and GeoSPARQL (https://www.ogc.org/standards/geosparql), Description Logic reasoners, or first-order theorem provers and model finders. Categorizing Ontologies by Scope Ontologies can also be categorized by scope, that is, how broadly applicable they are, ranging from top-level ontologies, generic ontologies, domain and domain reference ontologies, to highly specialized application ontologies. With narrowing scope the ontologies' depth can increase.
Top-level (or upper or foundational) ontologies, such as BFO or DOLCE, surveyed by Mascardi et al. (2007), define high-level categories of objects according to a certain philosophical perspective. They are primarily used for reference purposes (Menzel 2003), with challenges and limitations discussed by Grüninger et al. (2014). Common distinctions are between endurants (also called continuants: entities that are entirely present at a given time), perdurants (also called occurrents: entities that are by nature defined over time, such as events or processes), and qualities (e.g., a temperature, color, or spatial location). Endurants are sub-categorized into physically present ones, including material (e.g., towns or water bodies) and immaterial ones (a hole), and nonphysical endurants such as social or mental objects. DOLCE further distinguishes abstract objects, such as a "space region", while BFO emphasizes the difference between independent and dependent objects (e.g., boundaries and holes depend on a host object). Perdurants are commonly categorized into events, processes, and states. Foundational relations include parthood (temporal or spatial), constitution, and dependence, as well as relations linking perdurants to endurants, such as DOLCE's participation relation. Ongoing work streamlines the top-level ontologies, including a new modular one (TUpper: Grüninger et al. 2014), within a single ISO standard (draft standard ISO/IEC 21838, with parts for BFO and proposed parts for DOLCE and TUpper). Generic ontologies (Grüninger et al. 2014) and mid-level ontologies encompass concepts that are used across multiple domains, such as those pertaining to space, time, processes, events, observations, or the provenance of information. These generic concepts play critical roles in the mathematical geosciences, and the ontologies are intended to be either reused by or used in conjunction with domain and application ontologies. While foundational ontologies also contain generic concepts, purpose-built generic ontologies flesh them out. For example, DOLCE and BFO include "space regions," whereas spatial ontologies such as the FOL formalizations of Simple Features and CODIB define entire hierarchies of different kinds of spatial regions based on dimension, curvature, and configuration (FOL and OWL2 axiomatizations available from http://colore.oor.net/simple_features/ and http://colore.oor.net/multidim_space_codib/). Domain ontologies cover a specific domain of interest, but vary in domain breadth. For example, GeoSciML covers all kinds of geological formations, while the WaterML parts Hy_Features and GWML2 more narrowly focus on surface and subsurface hydrology, respectively. These are typical geoscience domain ontologies in that they cover a breadth of concepts and attributes for describing data in their respective domains. In comparison, a domain reference ontology (variably also called a reference domain ontology or core ontology), such as the hydro foundational ontology (HyFO; Hahmann and Stephen 2018) or the GeoSciMLBasic module
of GeoSciML, strictly focuses on the high-level concepts needed to integrate ontologies within their domains. Application ontologies are tailored to specific software systems, for example to describe data models or to enable data retrieval. Their specific nature and ad-hoc development often prevent reuse elsewhere. Examples are ontologies specifically developed to identify rock formations for natural resource extraction, or the data model underlying the National Hydrography Dataset (https://www.usgs.gov/core-science-systems/ngp/national-hydrography/national-hydrography-dataset) that encodes the US stream network and flows. Categorizing Ontologies by Representation Format A wide spectrum of representations that differ in (1) formality and (2) expressivity is available (Table 2) for the varied uses
of ontologies (as summarized by Table 3). With respect to formality, we distinguish formal formats from non-formal ones. Expressivity is about how granularly and in how much detail the semantics of terms can be specified. Formality: The least formal, mostly conceptual, ontologies use graphical notations, such as Entity-Relationship Models or UML class diagrams (examples in Fig. 1), that lack machine-interpretability but are appropriate for communication between people. Semi-formal ontologies include thesauri (e.g., WordNets or SKOS ontologies) that utilize multiple explicit – though vaguely defined – semantic relationships. On the other end of the spectrum are formal ontologies specified using machine-interpretable languages such as RDFS, OWL2 (and other Description Logics), rule languages (SWRL), and first- and higher-order logics (examples in Table 1).
Ontology, Table 2 Common types of ontologies by representation format, in increasing order of expressivity

Nonformal ontologies
- Topic Map, Concept Map: Graphical representation of concepts with labeled (Concept Map) or unlabeled (Topic Map) relationships between the concepts
- Entity-Relationship Model: Graphical representation showing relationships between named classes of objects and named properties/relations
- Controlled Vocabulary, Term List, Glossary: A set or list of terms for a domain/application of interest, called a glossary if in alphabetic order

Semi-formal ontologies
- Dictionary: A set of terms (typically in alphabetic order) with concise natural language definitions
- Taxonomy: A set of classes related hierarchically via "is-a" (hyponymy/subclass) relationships
- Thesaurus (including SKOS and WordNet): A set of classes related via taxonomic and other semantic relationships such as synonymy or meronymy (part-of)
- UML Class Diagram: Graphical representation that enhances ER models with taxonomic and meronomic relationships

Formal ontologies
- Resource Description Framework Schema (RDFS): XML-based notation for declaring classes, properties, and individuals and relating them via hyponymy relations (subClassOf, subPropertyOf) and domain and range restrictions of properties in terms of classes. Can alternatively be serialized as triples of the form "subject predicate object" in the widely used Turtle or N-Triples notations or in JSON-LD
- Web Ontology Language (OWL2): An extension of RDFS that allows more complex class descriptions and constructs for restricting the interaction of properties and classes. For example, complex classes can be constructed using logical connectives (intersection, union, complement), existential and universal quantification, and cardinality restrictions
- Semantic Web Rule Language (SWRL): Extends OWL with rules, that is, axioms of the form "antecedent (body) → consequent (head)", where body and head are conjunctions of statements
- First-order logic (FOL; including Common Logic): Logic language that allows free-form axioms using standard logical connectives (not, and, or, if, iff) over a vocabulary of classes and relations of arbitrary arity. Common Logic provides additional constructs for, e.g., specifying axiom schemata or scoping axioms to specific classes

Ontology, Table 3 Common representation formats for ontologies (Topic Map/Concept Map, Entity-Relationship Model, Controlled Vocabulary/Glossary, Dictionary, Taxonomy, UML Class Diagram, WordNet and SKOS thesauri, RDFS, OWL2, SWRL, FOL) and their most common uses by purpose (modeling, knowledge organization systems, terminological clarification and disambiguation, standards, reasoning)
Expressivity: Non-formal ontologies range in expressivity from providing a single, unnamed generic semantic relationship as in topic maps to thesauri that incorporate multiple semantic relationships such as WordNet’s hyponymy and synonymy and SKOS’ “closeMatch” and “broaderMatch”. Formal ontologies exhibit more pronounced differences in expressivity, which gradually increases from RDFS to OWL2 and further to SWRL and first-order logic (FOL, Common Logic) and higher-order logics. Because they use a formal logic as representation language, DL and FOL ontologies are called axiomatic ontologies, whereas non-axiomatic ones are sometimes called “lightweight ontologies.” The term “axiomatic” emphasizes that they are not reliant upon predefined semantic relations but allow arbitrarily complex logical sentences to be used as axioms to constrain the semantics of terms. This vastly increases expressivity and the kind of queries that can be posed.
Summary
Ontologies are used in a multitude of ways by mathematical geosciences and related scientific communities (see also the survey on geospatial semantics by Kokla and Guilbert 2020): They help conceptualize domains, standardize terminology, enable information organization and retrieval, and support complex semantic reasoning. Ontologies also differ widely in scope and representation formats. Among the available representation formats, formal ontologies specified in languages such as RDFS, OWL2, and more expressive languages experience increased uptake by the community as they afford the greatest semantic clarity and support the overall trend toward semantically enabled, knowledge-intensive science.

Cross-References
▶ Graph Mathematical Morphology
▶ Metadata

Bibliography
Brodaric B, Hahmann T, Gruninger M (2019) Water features and their parts. Appl Ontol 14(1):1–42. https://doi.org/10.3233/AO-190205
Fellbaum C (2006) WordNet(s). In: Encyclopedia of language & linguistics, vol 13, 2nd edn. Elsevier, Amsterdam, pp 665–670
Grüninger M, Hahmann T, Katsumi M, Chui C (2014) A sideways look at upper ontologies. In: Formal ontology in information systems. IOS Press, Amsterdam, pp 9–22
Guizzardi G (2010) Theoretical foundations and engineering tools for building ontologies as reference conceptual models. Semantic Web J 1:3–10
Hahmann T, Stephen S (2018) Using a hydro reference ontology to provide improved computer-interpretable semantics for the groundwater markup language (GWML2). Int J Geogr Inf Sci 32(6):1138–1171
Jakus G, Milutinović V, Omerović S, Tomažič S (2013) Concepts, ontologies, and knowledge representation. Springer, New York
Kokla M, Guilbert E (2020) A review of geospatial semantic information modeling and elicitation approaches. ISPRS Int J Geo Inf 9(3):146
Kotis K, Vouros G, Spiliotopoulos D (2020) Ontology engineering methodologies for the evolution of living and reused ontologies: status, trends, findings and recommendations. Knowl Eng Rev 35:1–34
Mascardi V, Cordì V, Rosso P (2007) A comparison of upper ontologies. Workshop dagli Oggetti agli Agenti, Genova, Italy, Seneca Edizioni Torino, pp 55–64
Menzel C (2003) Reference ontologies – application ontologies: either/or or both/and? Proceedings of the KI2003 Workshop on Reference Ontologies and Application Ontologies, Hamburg, Germany, September 16, 2003, CEUR Workshop Proceedings 94, http://ceur-ws.org/Vol-94, CEUR-WS.org
Staab S, Studer R (2009) Handbook on ontologies, 2nd edn. Springer, Berlin
Open Data Lucia R. Profeta Lamont-Doherty Earth Observatory, Columbia University in the City of New York, New York, NY, USA
Definition The summary definition of Open Data is “data that can be freely used, modified, and shared by anyone for any purpose” (OKF 2015). The acceptance of what qualifies as open is under constant revision. The first definition in its modern use was created by the Open Knowledge Foundation in 2005 and its most recent iteration (version 2.1) was established in 2015. Changes to this definition were made using community feedback in order to ensure improved reusability, licensing, and attribution of the data. Prerequisites of Open Data The Open Definition (OKF 2015) describes in rigorous detail what elements are required, recommended, or optional to meet the criteria for Open Data. This poses conditions for the data itself (open works) and the license that will be associated with the data. There are four criteria that the data must satisfy in order to classify as open: (1) it must have an open license, or be freely available in the public domain, (2) it must be accessible at most at a one-time reproduction cost and should be downloadable from the internet, (3) it needs to be compiled in a way
that makes it machine readable, and (4) the format in which the data is shared should be nonproprietary. The license associated with Open Data must comply with the conditions of open licenses as outlined in Fig. 1. The differences between open licenses are important to understand, as they impose specific rights and responsibilities, both for data creators, as well as users. Through a general licensing process, Open Data can be freely usable without any monetary compensation to its creator. It can be redistributed or sold, added to collections, and partial or derivative data must be distributed under the same license. Moreover, there must be no discrimination against any person or group, and all uses of the data are acceptable. The aforementioned conditions can only be changed under the following circumstances, which may be imposed by the license agreement: (i) Sharing the data requires that the creators will be cited (attribution). (ii) If any modifications are made to the data they will be clearly stated, or the name of the dataset will be changed or versioned (integrity). (iii) Redistributing the work will be done with the same license type (share-alike). (iv) Copyright and license information must be preserved (notice). (v) Distribution will be done in a predefined format (source). (vi) No technical restrictions will be imposed on distribution. (vii) Additional permissions and allowed rights can be granted to the public in cases such as patents (nonaggression).
Open Data, Fig. 1 Open License Conditions based on The Open Definition v 2.1 (OKF 2015)
Principles Surrounding Open Data The Open Data ecosystem incorporates multiple participants throughout the data life cycle at many levels, such as governments, funding agencies, researchers, publishers, and end users. Some guiding principles have emerged, which aim to govern the complex rights and responsibilities of all stakeholders. The International Open Data Charter (2015) put forth six principles to inform the use of Open Data with an emphasis on how government data should be shared in order to create more transparency and international collaboration. These principles stated that government data should be "Open By Default, Timely and Comprehensive, Accessible and Usable, Comparable and Interoperable, For Improved Governance & Citizen Engagement, and For Inclusive Development and Innovation." The need for regulating how data is shared was also identified as a priority within the scientific community, which led to the emergence of the FAIR Principles (Findable, Accessible, Interoperable, Reusable; Wilkinson et al. 2016). The FAIR principles provided a general framework for how Open Data should be shared and emphasized the importance of machine readability. They also recognize the need for data repositories to capture rich metadata – information that would accompany the data in order to make it more reusable. To further define the responsibilities of data repositories, a successor to the FAIR Principles was created: the TRUST Principles (Transparency, Responsibility, User focus, Sustainability, and Technology; Lin et al. 2020). Another complement to the FAIR principles is represented by the CARE Principles (Collective Benefit, Authority to control, Responsibility, Ethics; RDA 2019), a set of recommendations which aim to recognize and protect the value of Indigenous data.
State of Open Data Since 2016, reports on the State of Open Data (Hyndman 2018) have been generated based on surveys of the scientific community. Researchers have been asked to share their views regarding their perception of Open Data. The emerging trends show that more researchers are sharing their data, but fewer of them are reusing openly available data. There has been a decline in interest among those that have not yet reused Open Data in starting to do so. Of those researchers that have shared their data openly, almost half did not know under what license their data was shared. There is an increasing desire for more guidance and regulations to be put in place. The majority of researchers that were surveyed indicated that a national mandate for making primary research openly available is necessary (Fig. 2). There are many organizations and collaborative projects that are working toward establishing more robust standards and best practices for data sharing and attribution. Initiatives such as Data Together (GO FAIR 2020) combine expertise from international organizations to ensure that the Open Data research ecosystem is governed, implemented, and maintained in a sustainable way. Linked Open Data The concept of Linked Data was introduced by Tim Berners-Lee (2006) in the W3C community as a way to regulate data sharing through the Semantic Web: Linked Data should have
a Uniform Resource Identifier (URI) that uses the HTTP protocol, links to other URIs, and it should use standardized languages such as SPARQL or the Resource Description Framework (RDF). If the Linked Data uses open sources, it becomes Linked Open Data (LOD). LOD adds value and increases the trustworthiness of data, making it very powerful. When combined with open interfaces (APIs) to facilitate automation, LOD allows for better integration, discoverability, and access to complex data. Open Data in Geosciences From a data perspective, Earth and Space Sciences are made up of disparate long tail subdomains – data is primarily shared in discrete amounts, with not enough attention being given to interoperability or reusability. Both technical infrastructure and governance initiatives regarding data stewardship in the geosciences have been gaining more momentum over the last decade. A variety of Open Data providers, tools, and platforms has emerged and is being maintained by large public funding agencies (see Ma 2018 for a review). Requirements for newly published data to be open and interoperable by default, as well as legacy data rescue efforts, are leading to large amounts of Open Data. Technological advancements such as cloud computing and high-performance computing (HPC) are enabling large scale reuse of Open Data. Software deployments such as the Pangeo Environment (Odaka et al. 2020) offer solutions for
Open Data, Fig. 2 Trends emerging from surveys of researcher attitudes toward Open Data between 2016 and 2020. (Data from Hyndman 2018)
big data geoscience research that rely on, and concurrently generate, Open Data. Public access to these types of technologies is promoting open science on scales never seen before, facilitating cross-domain research and international collaboration.
Summary or Conclusions
Open Data is the most important building block of open science and scholarship. Big data studies and cloud computing rely on having a steady supply of high quality, metadata-rich, freely accessible, and machine-readable data. This has been historically a burden on data creators, but we are witnessing a perception shift, where it is understood that this is a shared responsibility. Better practices regarding data dissemination are being implemented on every level, from education to publication. The integrity of our data is the key to our shared knowledge – there is an inherent duty to preserve it, and ensure that it is open to all.

Cross-References
▶ Data Life Cycle
▶ FAIR Data Principles

Bibliography
Berners-Lee T (2006) Linked data – design issues, World Wide Web Consortium (W3C), London 2006. https://www.w3.org/DesignIssues/LinkedData.html. Accessed 05 Nov 2020
GO FAIR (2020) Data together statement. https://www.go-fair.org/2020/03/30/data-together-statement/. Accessed 05 Nov 2020
Hyndman A (2018) State of open data. https://doi.org/10.6084/m9.figshare.c.4046897.v4. Accessed 28 Nov 2020
International Open Data Charter (2015) Principles. https://opendatacharter.net/principles/. Accessed 6 Dec 2020
Lin D, Crabtree J, Dillo I et al (2020) The TRUST principles for digital repositories. Nat Sci Data 7:144. https://doi.org/10.1038/s41597-020-0486-7
Ma X (2018) Data science for geoscience: leveraging mathematical geosciences with semantics and open data. In: Handbook of mathematical geosciences: fifty years of IAMG. Springer International Publishing, Cham, pp 687–702
Odaka TE, Banihirwe A, Eynard-Bontemps G, Ponte A, Maze G, Paul K, Baker J, Abernathey R (2020) The Pangeo ecosystem: interactive computing tools for the geosciences: benchmarking on HPC. In: Communications in computer and information science. Springer, pp 190–204
Open Knowledge Foundation – OKF (2015) Open Data Handbook. https://opendatahandbook.org/. Accessed 11 Oct 2020
Research Data Alliance International Indigenous Data Sovereignty Interest Group (2019) CARE principles for indigenous data governance. The Global Indigenous Data Alliance. GIDA-global.org
Wilkinson M, Dumontier M, Aalbersberg I et al (2016) The FAIR guiding principles for scientific data management and stewardship. Nat Sci Data 3:160018. https://doi.org/10.1038/sdata.2016.18

Optimization in Geosciences Ilyas Ahmad Huqqani and Lea Tien Tay School of Electrical and Electronics Engineering, Universiti Sains Malaysia, Penang, Malaysia

Abbreviations
ACO Ant colony optimization
ANN Artificial neural network
BBO Biogeography-based optimization
DE Differential evolution
EBF Evidential belief function
FPAB Flower pollination by artificial bee
FR Frequency ratio
GA Genetic algorithm
GIS Geographical information system
GS Geoscience
GWO Gray wolf optimization
LR Logistic regression
P Polynomial time class
PSO Particle swarm optimization
NP Nondeterministic polynomial time class
SVM Support vector machine
SVR Support vector regression
RF Random forest
WoE Weight of evidence
Definitions
Geoscience: The study of the earth is called geoscience. It investigates the processes that shape the earth's surface, the natural resources that we utilize, and how water and ecosystems are interrelated. Geoscience encompasses much more than rocks, debris, and volcanoes.
Optimization: The process of making the best or most effective use of a situation or resource.
Optimization problem: The problem of maximizing or minimizing a specific function in relation to a set, which frequently represents a set of possibilities in a given circumstance. The function makes it possible to compare the several possibilities and decide which could be the "best."
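In symbols, an optimization problem of the kind defined above can be written compactly as follows (a generic statement added for illustration, not tied to any specific geoscience application):

```latex
\underset{x \in S}{\text{minimize}} \; f(x)
\quad \text{subject to} \quad
g_i(x) \le 0, \; i = 1, \dots, m,
```

where f is the objective function, S is the set of possible solutions, and the g_i express any additional problem constraints.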
Background The world is confronted with enormous problems that have ramifications for global socioeconomic performance and quality of life. To prevent and overcome the chaos created by global change, new solutions are urgently needed. In order to achieve this aim, better solutions need to be
developed to optimize the management of natural resources, identify new ways to exploit geophysical processes and characteristics, and apply geoscience knowledge to explore new and optimized methods of utilizing terrestrial resources. One of the most challenging tasks in the modelling of geophysical and geological processes is to handle large data volumes and achieve specified objectives. Such a task can therefore be considered an optimization problem, which allows the use of several new and effective optimization techniques. This article provides insight into how optimization can be utilized in geoscience studies and how optimization and geoscience should coexist in the future.
What is Optimization? Optimization is a search process which makes an object or system as useful or effective as feasible. It refers to determining the optimal solutions to meet a given objective subject to certain conditions. The purpose of optimization is to find the best possible solutions while taking into account the problem constraints. Optimization approaches are used to address complicated scientific and engineering problems. The use of optimization techniques is vital because of the time consumption and complexity of existing methods. Various complex engineering problems that demand computational precision in a short time cannot be solved using traditional approaches. In this scenario, optimization techniques are the most effective approaches to find the best possible solution to such problems. Initially, heuristic optimization techniques explore randomly in the problem search space for the optimal solution with limited computation time and memory, without requiring any complex derivatives (Osman and Laporte 1996). The term "heuristic" originated from the Greek word "heuriskein," which means "to find." In computing, a heuristic technique works on a rule of thumb for providing a solution to the problem, without considering the repetitive application of the method. However, this method only searches for an approximate solution and does not require any mathematical convergence proof. The most common and advanced heuristic optimization technique utilized in the context of addressing search and optimization issues is the metaheuristic optimization algorithm, a term first proposed by Glover (1986). A metaheuristic denotes a technique that employs one or more heuristics and hence possesses all the aforementioned features of a heuristic method. As a result, a metaheuristic technique has several characteristic properties: it tries to discover a near-optimal solution rather than attempting to find the exact ideal solution; it typically lacks a formal demonstration of convergence to the optimal solution; and, lastly, it requires less time to compute than exhaustive searches. These approaches are iterative in nature, and frequently utilize stochastic processes to alter
one or more original candidate solutions during the search process (usually generated by random sampling of the search space). Due to the inherent practicality of many real-world optimization issues, classical optimization algorithms may not always be relevant and may not perform well in tackling such problems in a pragmatic manner. Without disregarding the importance of classical algorithms in the optimization field, many researchers and optimization practitioners have realized that metaheuristic methods are able to obtain a near-optimal solution in a computationally tractable manner. Therefore, metaheuristic techniques have become the most popular optimization techniques in recent years due to their capacity to handle a wide range of practical problems and produce authentic and reasonable solutions. The most common inspirations for metaheuristic techniques are natural, physical, or biological principles that can be imitated at a fundamental level using a variety of operators. The balance between exploration and exploitation is a recurring subject in all metaheuristics. The term "exploration" relates to how successfully operators diversify solutions in the search space. This feature provides the metaheuristic with a global search behavior. Moreover, the ability of the operators to use the knowledge gained from earlier iterations to intensify the search is referred to as exploitation. The metaheuristic gains a local search feature as a result of this intensification. Some metaheuristics are more explorative than exploitative, whereas others are the inverse. The fundamental approach of randomly selecting answers for a set number of iterations, for example, constitutes a totally exploratory search. Hill climbing, on the other hand, is an example of a totally exploitative search in which the present answer is progressively modified until it improves (a minimal code sketch contrasting the two follows the list below). Metaheuristics allow the user to balance between diversification and intensification via operator settings. Based on these characteristics, metaheuristic optimization algorithms have various advantages as compared to classical optimization algorithms:
• Metaheuristic algorithms can deal with P-class problems easily under significantly complex inputs, which may be a challenge for other conventional techniques.
• Metaheuristics are useful in solving difficult NP problems which cannot be solved by any known algorithm in an acceptable time period.
• Metaheuristics, unlike most traditional techniques, do not require gradient knowledge and may thus be utilized with non-analytic, black box, or simulation-based objective functions.
• Due to their inherent mix of stochastic and deterministic behaviors, many metaheuristic algorithms have the ability to recover from local optima.
• Metaheuristics can also handle uncertainties in objectives in a better way as compared to classical
optimization methods because of the ability to recover from local optima.
• Lastly, most metaheuristics can accommodate multiple objectives with relatively minor algorithmic modifications.
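The contrast between purely explorative random search and purely exploitative hill climbing mentioned above can be sketched in a few lines of Python; the one-dimensional objective function and the parameter values are hypothetical and serve only to illustrate the two behaviors.

```python
import random

def f(x):
    """Hypothetical objective to be minimized (minimum at x = 2)."""
    return (x - 2.0) ** 2 + 0.5

def random_search(n_iter=1000, lo=-10.0, hi=10.0):
    # Pure exploration: sample the search space uniformly at random
    best = random.uniform(lo, hi)
    for _ in range(n_iter):
        cand = random.uniform(lo, hi)
        if f(cand) < f(best):
            best = cand
    return best

def hill_climbing(x0=-8.0, step=0.1, n_iter=1000):
    # Pure exploitation: keep a perturbed candidate only if it improves the current one
    x = x0
    for _ in range(n_iter):
        cand = x + random.uniform(-step, step)
        if f(cand) < f(x):
            x = cand
    return x

random.seed(0)
print("random search :", round(random_search(), 3))
print("hill climbing :", round(hill_climbing(), 3))
```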
Classification of Metaheuristic Optimization Algorithms The most popular way of categorizing metaheuristic methods is based on the number of original solutions modified in subsequent rounds. Single-solution metaheuristics begin with a single starting solution that is iteratively modified. It is important to note that while the modification process may involve more than one solution, only one solution is employed in each subsequent iteration. On the other hand, population-based metaheuristics employ more than one starting solution. Multiple solutions are modified in each iteration, and some of them make it to the next iteration. The operators are used to modify solutions, and they frequently employ particular statistical features of the population. Metaheuristic optimization algorithms can also be classified on the basis of the domain in which they operate. These algorithms are commonly grouped under the umbrella terms "bioinspired" or "nature-inspired." They can, however, be
further subdivided into four categories (Fig. 1): evolutionary algorithms, swarm intelligence algorithms, physical phenomena algorithms, and nature-inspired algorithms. The evolutionary algorithms simulate many elements of natural evolution, such as survival of the fittest, reproduction, and genetic mutation. Swarm intelligence and nature-based algorithms replicate the group behavior and/or interactions of live species (such as ants, bees, birds, fireflies, glowworms, fish, white blood cells, bacteria, and so on) (Huqqani et al. 2022) as well as nonliving objects (like water drops, river systems, masses under gravity). The remaining metaheuristics, classified as physics-based algorithms, imitate different physical processes such as metal annealing, musical aesthetics (harmony), and so on.
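As a minimal sketch of a population-based, swarm-intelligence metaheuristic of the kind classified in Fig. 1, the following Python code implements a bare-bones particle swarm optimization on a simple two-dimensional test function; all parameter values are illustrative, not taken from any of the cited studies.

```python
import numpy as np

def sphere(x):
    """Simple test objective: minimum 0 at the origin."""
    return float(np.sum(x ** 2))

def pso(obj, dim=2, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-bound, bound, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                  # personal best positions
    pbest_val = np.array([obj(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()          # global best position
    gbest_val = pbest_val.min()

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + cognitive (personal best) + social (global best) terms
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -bound, bound)
        vals = np.array([obj(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < gbest_val:
            gbest_val = pbest_val.min()
            gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, gbest_val

best_x, best_f = pso(sphere)
print("best position:", np.round(best_x, 4), "objective:", round(best_f, 6))
```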
Practical Applications of Optimization Algorithms in Geoscience Optimization algorithms have been widely used in various practical applications related to geoscience, such as mineral and water exploration, geological and geo-structural mapping, natural hazard analysis, earthquake prediction and control, and many more. A few of these applications are discussed in this section.
Optimization in Geosciences, Fig. 1 Classification of metaheuristic optimization algorithms
Land Cover Feature Extraction The term "land cover feature extraction" refers to the process of extracting features from images of land in diverse locations. This is useful for a variety of problems, such as detecting groundwater, locating a path in mountainous terrain, and so on. For the aim of land cover feature extraction in the Alwar area of Rajasthan, methods such as fuzzy sets, rough-fuzzy tie-up, membrane computing, and BBO were used. The pixels in the image are classified into five categories: rocky, water, urban, barren, and flora. On the same Alwar dataset, hybrid versions of several methods have also been implemented. The algorithms included a hybrid of ACO-BBO, a hybrid of ACO-PSO-BBO, a hybrid of FPAB-BBO, and a hybrid of biogeography-based optimization and geoscience (BBO-GS) (Goel et al. 2013). As a result, BBO-GS has proven to be the top performer for the task among all the implemented algorithms. Groundwater Mapping Analysis One of the most important case studies of geoscience applications is groundwater analysis. Water resources, including surface water and groundwater, are vital for life on our planet and are regenerated via evaporation, precipitation, and surface runoff. According to recent climate change estimations, the water cycle will become more spatially and temporally heterogeneous, resulting in water demand exceeding supply (Intergovernmental Panel on Climate Change 2014). Groundwater is described as saturated water that fills the pore spaces between mineral grains, cavities, and fractured rocks in a rock mass. According to the World Economic Forum, scarcity of water will be a worldwide issue in the future. Currently, about 20% of water utilized by humans is sourced from groundwater, and this proportion is expected to rise over the next few decades (Biswas et al. 2020). The extensive use of groundwater in agriculture, industry, and daily life poses several challenges to its management. Generally, hydrological testing, field surveys, and geophysical techniques are commonly used in groundwater research projects. These courses of action are time-consuming, costly, and necessitate the use of skilled personnel. As a result, techniques for analyzing groundwater that can illustrate the hydrological connection with groundwater, such as the use of data on specific capacity, transmissivity, and yield, must be developed. Groundwater potential is often described as optimum zones for groundwater development or the likelihood of groundwater presence (Díaz-Alcaide and Martínez-Santos 2019). The likelihood of groundwater occurrence in a particular region is estimated using groundwater potential mapping, which entails statistical analysis of many forms of field data. Remote sensing data acquisition techniques increase spatial coverage, thus improving the variety and availability
of data. GIS technology may be utilized to analyze vast regions at lower cost. Preliminary GIS studies utilizing machine learning and statistical techniques, among other things, might be beneficial for analyzing groundwater availability based on topographical and geographical parameters. As a result, prospective groundwater wells may be plotted to aid in groundwater detection. With the rapid growth and increasing complexity of data in GIS, a trustworthy model is necessary to assist in the groundwater issue. Several models have been proposed to assess groundwater potential, including FR, WoE, and EBF models, as well as machine learning models such as ANN, RF, LR, and SVM. Some of these studies still have limitations in their predictions, which are impacted by the quality of the data collection and the internal structure of the model (Chen et al. 2019). Fadhillah et al. built groundwater potential maps using a machine learning technique based on SVR, with training data obtained from a hydraulic transmissivity dataset (Fadhillah et al. 2021). The learning process and the consistency of model outcomes can be influenced by the SVR parameter settings. As a result, defining operational parameters becomes a problem in achieving the desired outcomes. To overcome this problem, two metaheuristic optimization algorithms were applied: GWO and PSO. GWO is an optimization method that has been frequently employed in GIS applications to optimize models. Due to its outstanding features compared with other swarm intelligence methods, it has been widely adopted for a broad variety of optimization problems (Faris et al. 2018). It has relatively few parameters, and no derivative information is necessary in the initial search. Similarly, PSO is regarded as dependable in the computational process for model optimization since it provides gentle computation and rapid convergence. It is believed that by employing the optimization technique, the groundwater model's performance would increase. By adjusting the SVR machine learning parameters, the metaheuristic optimization technique has the potential to improve model prediction accuracy. Oil and Gas Reservoir Exploration In resolving geophysical optimization issues, optimization algorithms have been utilized in two ways: either by performing the optimization or by optimizing the parameters of other techniques (e.g., neural networks) used in various problems. In a study on reservoir model optimization to match previous petroleum production data, a few optimization techniques were compared to PSO (Hajizadeh et al. 2011). In a case study involving two petroleum reservoirs, ACO, DE, PSO, and the neighborhood algorithm were combined in a Bayesian framework to assess the uncertainty of each algorithm's predictions. Ahmadi predicted reservoir permeability
using a soft sensor based on a feed-forward artificial neural network, which was subsequently improved using a hybrid GA and PSO approach (Ali Ahmadi et al. 2013). A multiobjective evolutionary method discovers optimum solutions and surpasses a standard weighted-sum technique. In a study of oil and gas fields, Onwunalu and Durlofsky used PSO to find the best well type and location (Onwunalu and Durlofsky 2010). Comparisons with a GA across several runs of both algorithms reveal that PSO surpasses the GA on average, although the benefits of using PSO over GA may vary depending on the scenario. Well placement problems were also considered by Nwankwor using a hybrid PSO-DE method (Nwankwor et al. 2013). Hybridized metaheuristic optimization algorithms are potentially beneficial in reservoir engineering problems.
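The hyperparameter-tuning workflow described in the groundwater example above (an SVR whose parameters are selected by a metaheuristic) can be sketched as follows. For brevity, the sketch uses synthetic data and a simple random search over log-spaced parameters as a stand-in for PSO or GWO; it illustrates the general workflow and does not reproduce the cited studies.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for topographic/geological predictors and a transmissivity-like target
X = rng.normal(size=(200, 5))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

def fitness(params):
    """Cross-validated R^2 of an SVR for a given (C, gamma, epsilon) triple."""
    C, gamma, epsilon = params
    model = SVR(C=C, gamma=gamma, epsilon=epsilon)
    return cross_val_score(model, X, y, cv=5, scoring="r2").mean()

best_params, best_score = None, -np.inf
for _ in range(50):  # random search as a simple placeholder for a PSO/GWO search loop
    params = (10 ** rng.uniform(-1, 3),   # C
              10 ** rng.uniform(-3, 1),   # gamma
              10 ** rng.uniform(-3, 0))   # epsilon
    score = fitness(params)
    if score > best_score:
        best_params, best_score = params, score

print("best (C, gamma, epsilon):", tuple(round(p, 4) for p in best_params))
print("best cross-validated R^2:", round(best_score, 3))
```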
optimization in engineering: techniques and applications. Springer, Berlin Heidelberg, pp 209–240
Huqqani IA, Tay LT, Mohamad-Saleh J (2022) Spatial landslide susceptibility modelling using metaheuristic-based machine learning algorithms. Eng Comput
Intergovernmental Panel on Climate Change (2014) Climate change 2013 – the physical science basis: working group I contribution to the fifth assessment report of the intergovernmental panel on climate change. Cambridge University Press, Cambridge
Nwankwor E, Nagar AK, Reid DC (2013) Hybrid differential evolution and particle swarm optimization for optimal well placement. Comput Geosci 17(2):249–268
Onwunalu JE, Durlofsky LJ (2010) Application of a particle swarm optimization algorithm for determining optimum well location and type. Comput Geosci 14(1):183–198
Osman IH, Laporte G (1996) Metaheuristics: a bibliography. Ann Oper Res 63(5):511–623
Summary
Ordinary Differential Equations
In summary, there are a lot of optimization problems in geosciences, and they can be solved using metaheuristic optimization or hybridized metaheuristic optimization algorithms. Besides, optimization algorithms also play a vital role in determining the best parameters for other machine learning techniques.
R. N. Singh1 and Ajay Manglik2 1 Discipline of Earth Sciences, Indian Institute of Technology, Gandhinagar, Palaj, Gandhinagar, India 2 CSIR-National Geophysical Research Institute, Hyderabad, India
Definition Bibliography Ali Ahmadi M, Zendehboudi S, Lohi A, Elkamel A, Chatzis I (2013) Reservoir permeability prediction by neural networks combined with hybrid genetic algorithm and particle swarm optimization. Geophys Prospect 61(3):582–598 Biswas S, Mukhopadhyay BP, Bera A (2020) Delineating groundwater potential zones of agriculture dominated landscapes using GIS based AHP techniques: a case study from Uttar Dinajpur district, West Bengal. Environ Earth Sci 79(12):302 Chen W, Tsangaratos P, Ilia I, Duan Z, Chen X (2019) Groundwater spring potential mapping using population-based evolutionary algorithms and data mining methods. Sci Total Environ 684:31–49 Díaz-Alcaide S, Martínez-Santos P (2019) Review: advances in groundwater potential mapping. Hydrogeol J 27(7):2307–2324 Fadhillah MF, Lee S, Lee CW, Park YC (2021) Application of support vector regression and metaheuristic optimization algorithms for groundwater potential mapping in Gangneung-si, South Korea, Remote Sensing. 13(6) Faris H, Aljarah I, Al-Betar MA, Mirjalili S (2018) Grey wolf optimizer: a review of recent variants and applications. Neural Comput & Applic 30(2):413–435 Glover F (1986) Future paths for integer programming and links to artificial intelligence. Comput Oper Res 13(5):533–549 Goel L, Gupta D, Panchal VK (2013) Biogeography and geo-sciences based land cover feature extraction. Appl Soft Comput 13(10): 4194–4208 Hajizadeh Y, Demyanov V, Mohamed L, Christie M (2011) Comparison of evolutionary and swarm intelligence methods for history matching and uncertainty quantification in petroleum reservoir models. In: Köppen M, Schaefer G, Abraham A (eds) Intelligent computational
Ordinary differential equation – a differential equation that contains unknown functions of one independent variable and their ordinary derivatives.
Introduction

An ordinary differential equation (referred to as an ODE in the literature) is an equation that contains an unknown variable y and its ordinary derivatives with respect to time (or space) t. In general form it can be expressed as

F(t, y, y′, y″, …, y^(n)) = 0,  (1)

where y′ = dy/dt, …, y^(n) = d^n y/dt^n. Here, n represents the highest order of the derivative and hence the order of the ODE. For a unique solution of this equation, we need n conditions. When these are prescribed at a single value of t, the problem is called an initial value problem, and when they are prescribed at two values of t, it is called a two-point boundary value problem. The equation can be written as a system of first-order differential equations as dy/dt = f(y, t), where y is an n-dimensional vector. In an initial value problem, the values of y are given at t = t0 as y(t0). An initial value problem is well posed since its solution exists,
is unique, and depends continuously on the given data, provided that both f and ∂f/∂y are continuous (Coddington 1989). The literature on this subject is vast, but in this entry we focus on some ODEs used in the geosciences and show their applications in solving problems relevant to understanding the earth's evolution and processes. Geosciences deal with the structure, processes, and evolution of the earth, knowledge of which is constructed by using physical, chemical, and biological laws and by interpreting observations made above, on, and within the earth. ODEs play a very important role in improving our understanding of the earth, especially when the long-term behavior of some aspect of the earth is to be studied, e.g., the temporal evolution of the earth, because the spatial variations can be averaged to reduce a generalized model, represented by a partial differential equation (PDE) (see ▶ "Partial Differential Equations"), to an ODE. These are the so-called box models which pervade earth studies, for example in biogeochemical cycles. ODEs also arise when PDEs involved in spatiotemporal studies of the earth are decomposed into sets of ordinary differential equations. Although one can have a generalized nth-order ODE (Eq. 1), most geoscience problems can be described by first- and second-order ODEs. Therefore, we discuss linear first- and second-order ODEs in the next section. We then cover nonlinear ODEs, which are used in systems thinking. This is followed by a discussion of various solution methods. Finally, we illustrate several examples of ODEs used in the geosciences.
First- and Second-Order Linear ODEs

First-Order ODEs
We frequently encounter growth and decay problems. Such problems are solutions of first-order ODEs. In the general case, for instance in Newton's cooling problem used in the study of the cooling of the earth and of geological bodies, we get the first-order inhomogeneous equation

dy/dt = f(t) y + g(t),  (2)

where f(t) and g(t) are, in general, time-dependent coefficients. The solution of this equation is obtained by first multiplying both sides by e^(−∫f(t′)dt′) and then rearranging the equation as

d/dt [ e^(−∫f(t′)dt′) y ] = e^(−∫f(t′)dt′) g(t),  (3)

which has the general solution

y(t) = C e^(∫f(t′)dt′) + e^(∫f(t′)dt′) ∫ e^(−∫f(t″)dt″) g(t″) dt″.  (4)

The value of the constant of integration C depends on the choice of an initial condition, i.e., y|t=0.

Second-Order ODEs
Second-order ODEs are more prevalent in geosciences as most physical laws are described by second-order equations, e.g., oscillatory phenomena, for which the equations are expressed as

d²y/dt² + a dy/dt + b y = f(t),  y(0) = c,  dy/dt(t = 0) = d,  (5)

where the coefficients a and b are taken as constants and f(t) is the source function. The solution of the inhomogeneous equation comprises the general solution of the homogeneous part, also called the complementary function, and a particular solution of the equation. The solution of the homogeneous part can be written in terms of elementary functions. The solution is assumed as

y(t) = C e^(mt),  (6)

where C and m are constants. Substituting this into Eq. (5), we get the polynomial equation

m² + a m + b = 0.  (7)

This has two roots m1 and m2, which can be real or complex conjugate. Thus, the solution is written as

y(t) = C1 e^(m1 t) + C2 e^(m2 t).  (8)

When both roots are the same (m1 = m2), the solution is given as

y(t) = (C1 + C2 t) e^(m1 t).  (9)

For complex conjugate roots m1,2 = p ± iq, the solution is given by

y(t) = e^(pt) (C1 cos qt + C2 sin qt).  (10)

The particular solution is obtained by the method of trial and error in simple cases when the right-hand side is made of power, exponential, or trigonometric functions. Some general forms of the particular solution are C, Ct + D, Ct² + Dt + E, Ce^(pt), and C cos pt + D sin pt. In the general case, the method of variation of parameters is used to obtain a particular solution. The particular solution is assumed as a linear combination of solutions of the homogeneous equation,

y_P = C1(t) y1 + C2(t) y2.  (11)

This is substituted into the defining equation, and the resulting equations for C1(t) and C2(t) are solved. The values of the constants are determined by the initial conditions.
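As a small numerical illustration of the characteristic-equation recipe in Eqs. (6)–(10) (added here; the coefficient values are arbitrary), the following Python sketch finds the roots of m² + am + b = 0 with NumPy, builds the homogeneous solution, and fixes C1 and C2 from the initial conditions y(0) = c and y′(0) = d by solving a 2×2 linear system.

```python
import numpy as np

# Homogeneous problem y'' + a y' + b y = 0 with y(0) = c, y'(0) = d.
a, b = 0.5, 4.0          # damping and stiffness coefficients (arbitrary values)
c, d = 1.0, 0.0          # initial displacement and velocity

# Roots of the characteristic polynomial m**2 + a*m + b = 0 (Eq. 7).
m1, m2 = np.roots([1.0, a, b])

# Solve for C1, C2 from y(0) = C1 + C2 and y'(0) = m1*C1 + m2*C2 (Eq. 8).
M = np.array([[1.0, 1.0], [m1, m2]])
C1, C2 = np.linalg.solve(M, np.array([c, d]))

t = np.linspace(0.0, 10.0, 201)
y = (C1 * np.exp(m1 * t) + C2 * np.exp(m2 * t)).real   # imaginary parts cancel

print("characteristic roots:", m1, m2)
print("y(10) =", y[-1])
```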
Nonlinear Equations

Three such equations in this category are the logistic equation, the predator-prey equations, and the Lorenz equations. In the first two cases an analytical solution can be written, whereas in the last case no analytical solution has been found so far. The logistic equation, used in population studies, is written as

dy/dt = r y (1 − y/K),  y(0) = y0.  (12)

Here, K is called the carrying capacity, which is the maximum value of the population, and r, called the Malthusian parameter, is the maximum rate of growth. The solution of this equation is

y = K / [1 + (K/y0 − 1) e^(−rt)].  (13)

The predator (y)–prey (x) equations, also known as the Volterra-Lotka equations, are written as, p, q, r, and s being constants,

dx/dt = p x − q x y,  dy/dt = −r y + s x y.  (14)

No general analytical solution has been obtained for these equations except for special cases of the coefficients. However, an implicit solution can be obtained by combining these equations in the phase plane as dy/dx = (−ry + sxy)/(px − qxy), and the solution is

s x − r log x = p log y − q y + C,  C a constant.  (15)

This is a closed curve in the phase plane (x, y). C is determined by the initial conditions for (x, y). In this way, the nature of the solution can be obtained. The Lorenz equations, a set of three ODEs, also occur frequently in the study of chaos in geosciences. These are, σ, r, and β being constants,

dx/dt = σ(y − x),  dy/dt = x(r − z) − y,  dz/dt = xy − βz.  (16)

No exact analytical solution to these equations has been found. However, the nature of the solution can be deciphered using qualitative methods initially developed by Poincaré for studying the stability of the solar/planetary systems.

Solution Strategies

Two-Point Boundary Value Problem
These types of problems are posed as, with y a vector,

dy/dt = f(y, t),  a < t < b.  (17)
The boundary conditions at the two ends are given either as values of y or of dy/dt, written as

B(y(a), y(b)) = 0.  (18)

The process of solving a boundary value problem is to reduce it to an initial value problem. In the shooting method, we assume the missing initial value x and solve the problem as an initial value problem with y(a) = x. The solution is substituted in the boundary condition as

B(x, y(b; x)) = 0.  (19)

This is a nonlinear function, and its root can be found. Simple shooting, which is done from one boundary, can be extended to multiple shooting by dividing the whole interval into subintervals, shooting from one end of each subinterval, and using continuity of values at the nodes. Eigenvalue problems are also boundary value problems. Here, the aim is to find values of coefficients in the differential equation such that both the equation and the boundary conditions are satisfied. The problem is posed as, with y a vector,

dy/dt = λ f(y),  a < t < b;  y(a) = 0 = y(b).  (20)

Functions satisfying the boundary conditions are substituted in the equation to get an algebraic equation for the eigenvalue. The roots of this equation are the eigenvalues, and the corresponding orthogonal functions are the eigenfunctions (see ▶ "Eigenvalues and Eigenvectors"). In a simple case,

d²y/dt² = −λ y;  y(0) = 0 = y(b).  (21)

We have the solution of this equation as

y(t) = A sin(√λ t) + B cos(√λ t),  (22)

where y(0) = 0 shows that the constant B = 0. Thus, nontrivial solutions are possible if sin(√λ b) = 0. The solution of this
equation gives the eigenvalues λn = (nπ/b)², n = 1, 2, …, and the solution of the boundary value problem is yn = A sin(nπt/b). After normalization, the eigenfunctions are yn = √(2/b) sin(nπt/b).

Green's Function Approach
The Green's function is the solution of a differential equation when the inhomogeneous term is given by a Dirac delta function. Thus, for the Green's function G(t, t0) of a second-order differential equation, we have the defining equation

d²G(t, t0)/dt² + p(t) dG(t, t0)/dt + q(t) G(t, t0) = δ(t − t0),  (23)

with the boundary conditions

G(0, t0) = 0 and G(d, t0) = 0.  (24)

By integrating the defining equation around t = t0, the following two conditions should be satisfied:

G(t, t0)|t0+ − G(t, t0)|t0− = 0,  (25a)

G′(t, t0)|t0+ − G′(t, t0)|t0− = 1.  (25b)

Thus, the expression for the Green's function is obtained from two solutions of the homogeneous equation, y1 satisfying the boundary condition at t = 0 and y2 satisfying the boundary condition at t = d, as

G(t, t0) = y1(t) y2(t0)/W(t0) for 0 < t < t0,  and  G(t, t0) = y2(t) y1(t0)/W(t0) for t0 < t < d.  (26)

This method helps to solve inhomogeneous ODEs.

Transform Methods
An initial value problem for an ODE can be converted into an algebraic equation by taking the Laplace transformation (see ▶ "Laplace Transform"). After obtaining the solution of this equation, the inverse Laplace transform takes the solution from the transform domain back to the original domain. The Laplace and inverse Laplace transforms are defined as

Y(p) = L{y(t)} = ∫0^∞ e^(−pt) y(t) dt,  (27)

y(t) = L⁻¹{Y(p)} = (1/2πi) ∫(s−i∞)^(s+i∞) e^(pt) Y(p) dp,  Re(p) = s.  (28)

Applying this to a simple first-order ODE

dy/dt = −a y,  y(t = 0) = y0,  (29)

the result is

p Y − y0 = −a Y.  (30)

Solving for Y and taking the inverse Laplace transformation give

y(t) = y0 e^(−at).  (31)

This approach can be used for any order of ordinary differential equation.

Linear Systems
With the advent of computers, such equations are written in matrix form as

d/dt [y1; y2] = [0, 1; −b, −a] [y1; y2] + [0; f(t)].  (32)

This is further written as

dz/dt = M z + g.  (33)

This is an example of a linear dynamical system. The solution is written by mimicking the first-order equation as

z = C e^(Mt) + ∫ e^(M(t−t′)) g(t′) dt′.  (34)

There are a large number of ways to calculate the exponential of a matrix; one way is to use the singular value decomposition of the matrix. This matrix approach can be generalized to N equations. The constants are determined using the initial/boundary conditions (for finite intervals in t).

Numerical Solution
Ordinary differential equations, such as (y a scalar or a vector)

dy/dt = f(y, t),  y(t0) = y0,  (35)

are approximated by, e.g., the Euler method as
dy/dt ≈ (y(t + Δt) − y(t))/Δt,  (36a)

y(t + Δt) = y(t) + Δt f(y(t), t) + O(Δt²),  (36b)

or, more generally, with higher-order schemes, say the fourth-order Runge-Kutta method, as

k1 = Δt f(yn, tn),  (37a)

k2 = Δt f(yn + k1/2, tn + Δt/2),  (37b)

k3 = Δt f(yn + k2/2, tn + Δt/2),  (37c)

k4 = Δt f(yn + k3, tn + Δt),  (37d)
y(tn + Δt) = y(tn) + (k1 + 2k2 + 2k3 + k4)/6 + O(Δt⁵).  (37e)

Powerful computer codes are now available to solve systems of ODEs, both nonstiff and stiff, such as the odeint() method of the scipy.integrate Python module.

Special Functions
In simple cases, the solutions are made of elementary functions such as exponential, logarithmic, trigonometric, or hyperbolic functions. But when the coefficients of a second-order equation are functions of time, i.e.,

d²y/dt² + p(t) dy/dt + q(t) y = 0,  (38)

it is no longer possible to write the solution in terms of elementary functions. A large number of special functions have been worked out in various applications. The solutions are written in series form as

y = Σ(n=0,∞) an tⁿ.  (39)

This is substituted in the equation, and by equating the coefficients of tⁿ, the values of the an are determined. If the coefficients are singular, the expansion is taken around those points under some regularity conditions. Some frequently occurring special function equations are as follows (Arfken et al. 2015):

Bessel's equation: x²y″ + xy′ + (x² − p²)y = 0
Legendre's equation: (1 − x²)y″ − 2xy′ + p(p + 1)y = 0
Hermite equation: y″ − 2xy′ + py = 0
Gauss hypergeometric equation: x(1 − x)y″ + [c − (a + b + 1)x]y′ − aby = 0
Laguerre's equation: xy″ + (1 − x)y′ + py = 0
Chebyshev's equation: (1 − x²)y″ − xy′ + p²y = 0
Airy's equation: y″ − p²xy = 0

In the above, p, a, b, and c are constants. The values of all these functions are tabulated in the literature. These functions appear in describing wave and diffusion phenomena in the earth. Since the special functions themselves must be computed, it is now often preferable to use a numerical solution of the ordinary differential equation.
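For instance, the Lorenz system of Eq. (16) can be integrated numerically in a few lines with SciPy. The sketch below (an added illustration) uses solve_ivp, though odeint would work equally well; the classical parameter values σ = 10, r = 28, β = 8/3 and the initial state are chosen purely for demonstration.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma=10.0, r=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (Eq. 16)."""
    x, y, z = state
    return [sigma * (y - x), x * (r - z) - y, x * y - beta * z]

sol = solve_ivp(lorenz, t_span=(0.0, 40.0), y0=[1.0, 1.0, 1.0],
                dense_output=True, rtol=1e-8, atol=1e-10)

t = np.linspace(0.0, 40.0, 4001)
x, y, z = sol.sol(t)          # interpolated trajectory on a regular time grid
print("final state:", x[-1], y[-1], z[-1])
```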
Qualitative Methods
Sometimes it is not possible to find the solution of a nonlinear problem explicitly. In such cases, it is still possible to characterize the solution qualitatively. This approach was developed at the beginning of the twentieth century when Poincaré studied the problem of planetary system stability. Instead of determining the trajectory of the planetary system, he devised methods to determine whether such a system is stable or unstable. For instance, suppose we are given

dy/dt = f(y).  (40)

We can find the equilibrium points of this problem by solving f(ye) = 0. We can also find the stability of an equilibrium point ye by evaluating the Jacobian ∂f/∂y at y = ye. The nature of the eigenvalues gives the nature of the stability or instability. Such studies have been done for earth systems (Kaper and Engler 2013). This can be demonstrated by a simple case,

dy/dt = μ − y²,  (41)

which has equilibrium points at ye = ±√μ. The stability of the system is deciphered from the sign of the following expression evaluated at the equilibrium points:

df/dy = −2y at y = ye.  (42)

Thus, ye = +√μ (−√μ) is stable (unstable). This is a simple example of a bifurcation of equilibria (a saddle-node, or fold, bifurcation).
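A quick check of this stability analysis (an added illustration, with the value μ = 4 chosen arbitrarily) can be made with SymPy by computing the equilibria of f(y) = μ − y² and the sign of df/dy at each of them.

```python
import sympy as sp

y, mu = sp.symbols('y mu', real=True)
f = mu - y**2                       # right-hand side of Eq. (41)

equilibria = sp.solve(sp.Eq(f, 0), y)      # [-sqrt(mu), sqrt(mu)]
dfdy = sp.diff(f, y)                       # -2*y, cf. Eq. (42)

for ye in equilibria:
    slope = dfdy.subs(y, ye).subs(mu, 4)   # evaluate for the sample value mu = 4
    verdict = "stable" if slope < 0 else "unstable"
    print(f"y_e = {ye.subs(mu, 4)}: df/dy = {slope} -> {verdict}")
```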
Examples of ODEs in Geoscience

Geochronology
Estimation of the age of rocks is an important study area in geosciences for reconstructing geological processes in time and the evolutionary history of the earth. The problem of the decay of a radiogenic element N(t) with time t can be expressed in terms of a first-order ODE,

dN(t)/dt = −λ N(t),  (43)

where λ is the decay constant. For the prescribed initial condition N|t=0 = N0, the solution is obtained as

N(t) = N0 exp(−λt).  (44)

Since our interest is in finding the age, the above solution gives

t = −(1/λ) ln(N/N0).  (45)

Here, we need the values of two unknown parameters, λ and N0, to get the age. λ can be obtained in terms of the half-life (t1/2) of the radioactive element, defined as the time when its initial concentration N0 is reduced to half; from this we get λ = ln 2/t1/2. However, N0 is not known. To circumvent this, the concentration of the stable decay product, called the daughter product, is used, which is related to the parent element as

D = N0 − N = N(e^(λt) − 1).  (46)

Thus, the time is given in terms of the ratio D/N as

t = (1/λ) ln(D/N + 1).  (47)

Many times there is a decay series, the daughter product decaying to its own daughter products. For the case of the daughter product decaying to its subproduct with decay constant λ1, we get

dD/dt = λN − λ1 D = λN0 e^(−λt) − λ1 D.  (48)

The solution for D is

D = [λ/(λ1 − λ)] N0 (e^(−λt) − e^(−λ1 t)).  (49)

In case λ ≪ λ1, this equation reduces to the condition called secular equilibrium,

λ1 D ≈ λN.  (50)

Energy Balance at the Earth's Surface
The earth's surface receives shortwave radiation from the Sun and emits long-wavelength radiation. This balance is modified by the surface albedo, atmospheric greenhouse gases, and the planetary heat capacity. Using the blackbody radiation law, the energy balance equation at the earth's surface is written as (Kaper and Engler 2013)

C dT/dt = Q(1 − α) − σT⁴.  (51)

The surface temperature, time, heat capacity, incident solar radiation, albedo, and Stefan-Boltzmann constant are denoted respectively by T, t, C, Q, α, and σ. The albedo is also a function of temperature, so this nonlinear equation generally needs to be solved by a numerical method. However, for constant albedo and the Budyko approximation of the outgoing radiation as a linear function of T (in °C), A + BT, this reduces to the linear ODE

C dT/dt = Q(1 − α) − (A + BT).  (52)

Such an equation has been used to find changes in the earth when the solar luminosity changes due to solar processes.

Evolution of Mantle Temperature
The earth's mantle cools via loss of its initial heat and by thermal convection. The ordinary differential equation for the mantle average temperature is obtained by equating the rate of change of mantle temperature with the difference between the exponentially decaying radiogenic heat production h0 e^(−λt) and the cooling at the earth's surface q (Korenaga 2016):

CρD dT/dt = ρD h0 e^(−λt) − q(T).  (53)

Here C, D, and ρ are the heat capacity, thickness, and density of the mantle, respectively. In general, q(T) is a nonlinear function; however, it can be linearized as

q = q0 + (dq/dT)(T − T0).  (54)

Thus, we get a linear differential equation for the evolution of mantle temperature which can be easily solved.
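The nonlinear balance of Eq. (51) is easy to integrate numerically. The following sketch is an added illustration; the parameter values are round numbers of the right order of magnitude, not calibrated data. It relaxes the surface temperature toward its equilibrium with solve_ivp and compares against the analytical equilibrium temperature.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative zero-dimensional energy balance, Eq. (51):
#   C dT/dt = Q (1 - alpha) - sigma T**4
C = 2.0e8        # heat capacity per unit area (J m^-2 K^-1), illustrative
Q = 342.0        # mean incident solar radiation (W m^-2)
alpha = 0.3      # planetary albedo
sigma = 5.67e-8  # Stefan-Boltzmann constant (W m^-2 K^-4)

def rhs(t, T):
    return (Q * (1.0 - alpha) - sigma * T**4) / C

# Integrate for ~30 years starting from an arbitrary 200 K initial state.
seconds = 30 * 365.25 * 86400.0
sol = solve_ivp(rhs, (0.0, seconds), [200.0], rtol=1e-8)

T_eq = (Q * (1.0 - alpha) / sigma) ** 0.25   # analytical equilibrium
print(f"T(end) = {sol.y[0, -1]:.2f} K, equilibrium = {T_eq:.2f} K")
```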
Overland Flow on a Hillslope
First-order ODEs also describe overland flow in one horizontal direction x on a hillslope (see ▶ "Earth Surface Processes"), as given by (Anderson and Anderson 2010)

dQx/dx = R − I.  (55)

Here Qx, R, and I are the water flux, rainfall, and infiltration, respectively. Using the following relation between the water flux, the flow velocity U, and the thickness h of the overland flow,

Qx = U h,  U = (1/f) √(g h S),  (56)

where g and S are the gravitational acceleration and the hillslope, respectively, the expression for the thickness of the overland flow with distance x from the hilltop is

h(x) = [f(R − I)/√(gS)]^(2/3) x^(2/3).  (57)
Crustal Geotherm
The defining equation for the steady-state temperature distribution with depth in the continental crust is obtained from the corresponding special case of the heat diffusion equation, given by (Turcotte and Schubert 2014)

K d²T/dz² = −A(z).  (58)

Here, the temperature, depth, thermal conductivity, and radiogenic heat source concentration are denoted by T, z, K, and A(z), respectively. The boundary conditions given at the earth's surface (z = 0) and at the base of the crust (z = l) are

T|z=0 = T0 and −K dT/dz|z=l = ql,  (59)

where T0 is the surface temperature and ql is the heat flux at the base of the crust. The distribution of the radiogenic heat sources is given by

A(z) = A0 exp(−z/d),  (60)

where A0 and d are the concentration of radiogenic sources at the surface and the characteristic depth. The solution of the problem is

T = T0 + (ql/K) z + (A0 d²/K)(1 − e^(−z/d)).  (61)

This expression has been used extensively to derive crustal geotherms.

Diffusion of Magnetic Field in Crust
The magnetic field due to external electric currents diffuses inside the earth's crust, and the skin depth of the diffusion is determined by the electrical conductivity of the rocks. This distribution is governed by Maxwell's equations. For a horizontal magnetic field component b diffusing with depth z (positive downward) and having a time variation e^(iωt), a special case of Maxwell's equations is obtained as

d²b/dz² = iωμ0σ b.  (62)

Here, the electrical conductivity and magnetic permeability are denoted by σ and μ0, respectively, and ω is the angular frequency of the signal. The part of the solution which decreases with depth is given by

b = b0 e^(−(1+i)z/z0),  z0 = √(2/(ωμ0σ)),  (63)

where b0 is the amplitude of the magnetic field at the earth's surface and z0 is called the skin depth, which represents the length scale of the diffusion.

Channel Flow in Orogens
In orogens, deep rocks are sometimes exposed at the earth's surface, and channel flow is postulated to explain this. The equation for the variation of horizontal velocity u(z) with depth z in a viscous channel of viscosity μ and thickness d, due to an applied pressure gradient dp/dx, is approximated from the Navier-Stokes equation as (Turcotte and Schubert 2014)

μ d²u/dz² = dp/dx.  (64)

With the top velocity equal to u0 and the bottom velocity equal to 0, the solution of this equation is

u(z) = u0 (1 − z/d) + (1/2μ)(dp/dx)(z² − d z).  (65)

Such solutions have been used to explain the exposure of deeper rocks in the Himalayan orogen.

Hillslope Forms
Hillslopes have convex shapes, and their weathered products diffuse toward their base. In the presence of tectonic uplift U, the steady-state height z(x) of a hillslope with diffusion coefficient D, connected to a river at horizontal distance x = L from its peak at x = 0, is expressed, along with boundary conditions, as (Anderson and Anderson 2010)

d/dx (D dz/dx) + U = 0,  dz/dx(x = 0) = 0,  z(L) = 0.  (66)

The solution of the above equation is

z(x) = (U/2D)(L² − x²).  (67)

How much sediment is being delivered to the river can be determined by deriving the value of the flux at the river associated with the uplift.
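As a small numerical illustration of Eqs. (66)–(67) (added here with arbitrary parameter values), the parabolic hillslope profile and the sediment flux delivered to the river at x = L, which from the steady-state balance equals U·L, can be computed as follows.

```python
import numpy as np

U = 1.0e-4   # uplift rate (m/yr), illustrative
D = 1.0e-2   # hillslope diffusivity (m^2/yr), illustrative
L = 100.0    # hillslope length (m)

x = np.linspace(0.0, L, 101)
z = U / (2.0 * D) * (L**2 - x**2)        # steady-state profile, Eq. (67)

# Sediment flux q_s = -D dz/dx; at the river (x = L) this equals U*L.
dzdx = np.gradient(z, x)
print("crest height z(0) =", z[0], "m")
print("flux at river ~", -D * dzdx[-1], "m^2/yr (analytic U*L =", U * L, ")")
```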
Groundwater Mound
The steady-state water table h(x), with x the horizontal axis, in an aquifer of hydraulic conductivity k bounded by rivers on both sides (h(x = 0) = h1, h(x = b) = h2) under recharge I is governed by a second-order differential equation (see ▶ "Flow in Porous Media") (Anderson 2008):

d/dx (k h dh/dx) + I = 0.  (68)

The solution is easily written for constant recharge as

h = [h1² + (h2² − h1²)(x/b) + (I/k) x (b − x)]^(1/2).  (69)

This expression can be used to calculate the amount of water exchange taking place between the river and the aquifer due to changes in recharge and river levels.

Carbon Cycle
The most important question about the carbon cycle on short timescales is how much carbon is being put into the ocean. This problem can be understood by making a box model of the coupled ocean and atmosphere and using conservation of mass to find the stocks and fluxes of both boxes. In the simplest model, we have an anthropogenic input f to the atmosphere, with fluxes from the atmosphere to the ocean and from the ocean to the atmosphere given by aCA and bCO, respectively:

dCA/dt = −aCA + bCO + f,  dCO/dt = aCA − bCO.  (70)

In steady state, we have CA/CO = b/a. This system can be written in matrix form as

d/dt [CA; CO] = [−a, b; a, −b] [CA; CO] + [f(t); 0].  (71)

This matrix equation can be solved by using the linear system method.
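The two-box system of Eq. (71) is linear, so it can be solved either with the matrix-exponential formula of Eq. (34) or by direct numerical integration. The following sketch is an added illustration with arbitrary exchange coefficients; it integrates the unforced system (f = 0) from an atmosphere-heavy perturbation and checks the steady-state ratio CA/CO = b/a.

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b = 0.2, 0.05          # exchange coefficients (1/yr), illustrative values

def box_model(t, C):
    CA, CO = C            # atmospheric and oceanic carbon stocks
    return [-a * CA + b * CO, a * CA - b * CO]   # Eq. (70) with f = 0

# Start from an atmosphere-heavy perturbation and let the boxes equilibrate.
sol = solve_ivp(box_model, (0.0, 100.0), [600.0, 38000.0], rtol=1e-9)

CA, CO = sol.y[:, -1]
print("CA/CO at t = 100 yr:", CA / CO, " (steady-state b/a =", b / a, ")")
```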
Geochemical Reactions
Most common reactions, such as dissolution, hydrolysis, hydration, and oxidation, occur near the earth's surface due to its interaction with the atmosphere. On the one hand, these provide chemical elements for life processes; on the other, they can also lead to environmental hazards. In a simple case, the dissolution of carbon dioxide in water is written as

CO2(g) ⇌ CO2(aq),  (72)

CO2(aq) + H2O ⇌ H2CO3.  (73)

Such reactions can be described by a system of ordinary differential equations for the changes in the concentrations of the various species. We can write the rate equations for the dissolution of carbon dioxide in water as

dx/dt = −k1 x + k−1 y,  dy/dt = k1 x − k−1 y − k2 y + k−2 p,  (74a)

dz/dt = −k2 y + k−2 p,  dp/dt = k2 y − k−2 p,  (74b)

where x, y, z, and p are the concentrations of CO2(g), CO2(aq), H2O, and H2CO3, respectively. The rate constants of the forward and backward reactions are denoted by the k's. These equations can be written in matrix form and solved by using matrix methods.

Geophysical Systems
Geophysical problems are represented by sets of partial differential equations. These problems are reduced to sets of ODEs after discretizing the spatial domain; the method of lines is a well-developed tool for this approach. Consider a simple advection-diffusion PDE which occurs quite frequently in geoscience,

∂u/∂t = −a ∂u/∂x + b ∂²u/∂x².  (75)

The spatial derivatives are discretized over the domain (1 < i < M), which gives a set of ODEs,

dui/dt = −a (ui+1 − ui)/Δx + b (ui+1 − 2ui + ui−1)/Δx².  (76)

With the requisite boundary and initial conditions, a solution can be obtained analytically or numerically. A similar approach is used to discretize the spatial part of second-order diffusion and wave equations.
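A minimal method-of-lines sketch for Eqs. (75)–(76) is given below (an added illustration with arbitrary a, b, grid size, and a Gaussian initial condition). It uses a centered-difference variant of the advection term for numerical robustness, holds the end values fixed as simple Dirichlet boundary conditions, and integrates the resulting ODE system with solve_ivp.

```python
import numpy as np
from scipy.integrate import solve_ivp

a, b = 1.0, 0.1             # advection speed and diffusivity (illustrative)
M, L = 101, 10.0
x = np.linspace(0.0, L, M)
dx = x[1] - x[0]

def rhs(t, u):
    """Centered-difference variant of the semi-discrete system in Eq. (76)."""
    dudt = np.zeros_like(u)
    dudt[1:-1] = (-a * (u[2:] - u[:-2]) / (2.0 * dx)
                  + b * (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2)
    return dudt             # end values held fixed (simple Dirichlet conditions)

u0 = np.exp(-((x - 3.0) ** 2))        # Gaussian pulse as initial condition
sol = solve_ivp(rhs, (0.0, 4.0), u0, method="BDF", rtol=1e-6, atol=1e-9)

print("peak moved from x =", x[np.argmax(u0)],
      "to x =", x[np.argmax(sol.y[:, -1])])
```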
Summary

Ordinary differential equations (ODEs) pervade the geosciences. When studying the long-term behavior of the earth, box models are used which are described by sets of ODEs. ODEs also occur when systems thinking is applied to the earth, and when the focus is on steady-state, one-space-dimensional problems. Further, numerical solutions of PDEs are often obtained by decomposing them into sets of coupled ODEs. Both initial and boundary value problems for ODEs have been discussed, along with eigenvalue/eigenfunction methods and the
Green's function method. For variable coefficients, some solutions are written in terms of special functions, or recourse is taken to numerical methods using powerful computer programs such as the popular odeint() routine in Python. This entry has described ODEs as applicable to earth problems.
Cross-References

▶ Earth Surface Processes
▶ Eigenvalues and Eigenvectors
▶ Flow in Porous Media
▶ Laplace Transform
▶ Partial Differential Equations

Acknowledgments AM carried out this work under the project MLP6404-28 (AM) with CSIR-NGRI contribution number NGRI/Lib/2021/Pub-03.
Bibliography

Anderson RS (2008) The little book of geomorphology: exercising the principle of conservation. Electronic textbook. http://instaar.colorado.edu/~andersrs/publications.html#littlebook, p 133
Anderson RS, Anderson SP (2010) Geomorphology: the mechanics and chemistry of landscapes. Cambridge University Press, Cambridge, UK
Arfken GB, Weber HJ, Harris FE (2015) Mathematical methods for physicists. Academic Press, San Diego
Coddington EA (1989) An introduction to ordinary differential equations. Dover Publications, New York
Kaper H, Engler H (2013) Mathematics and climate. SIAM, Philadelphia
Korenaga J (2016) Can mantle convection be self-regulated? Sci Adv 2(8):e1601168. https://doi.org/10.1126/sciadv.1601168
Turcotte D, Schubert G (2014) Geodynamics, 3rd edn. Cambridge University Press, Cambridge, UK
Ordinary Least Squares

Christopher Kotsakis
Department of Geodesy and Surveying, School of Rural and Surveying Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece

Definition

Ordinary least squares. A mathematical framework for analyzing erroneous observations in order to extract information about physical systems based on simple modeling hypotheses.
Introduction

An important aspect of the geosciences is to make inferences about physical systems from imperfect data. For this purpose, it is often required to work with mathematical models linking a set of unknown parameters with other observable quantities through known equations. Least squares theory originated from the aforesaid necessity and provides a standard framework to obtain information about the physical world on the basis of inferences drawn from modeled observations in the presence of additive errors. Initially motivated by estimation problems in astronomy and geodesy (Nievergelt 2000), least squares theory was developed independently by Gauss and Legendre around 1795–1820, about 40 years after the advent of robust L1 estimation. Its historical tracing has been studied by several scientists (e.g., Seal 1967; Plackett 1972; Sheynin 1979; Stigler 1991), and it constitutes one of the most interesting disputes in the history of science. The remarkable monograph by Gauss (1823) already contains almost the complete modern theory of least squares, including elements of the theory of probability distributions, the definition and properties of the Gaussian distribution, and a discussion of the Gauss-Markov theorem on the statistical properties of the least squares estimator. In this entry, we give a brief overview of ordinary least squares (OLS), which is the simplest manifestation of least squares theory for analyzing noisy data with deterministic parametric models.
General Setting of the Problem

Consider a known vector l = [l1, l2, …, ln]ᵀ of n observations. From physical, mathematical, or empirical considerations, it is expected that these observations can be explained by a smaller set of m parameters x = [x1, x2, …, xm]ᵀ through a known predictive model

f : ℝᵐ → ℝⁿ (n > m),  x ↦ f(x),  (1)

which is represented by a nonlinear over-determined system of n equations with m unknowns,

l = f(x).  (2)
The fundamental problem to be discussed herein is the inference of the model parameters from the information contained in the observations through the inversion of the above system (Menke 2015). In most applications of the physical sciences and engineering, such a system does not have an exact unique solution; that is, no parameter vector x ∈ ℝᵐ exists that can reproduce the given data l ∈ ℝⁿ. This setback originates from the unavoidable presence of errors in the redundant observations (and perhaps from hidden deficiencies in the model choice itself). Hence, to a certain degree of simplification, the above problem may be viewed as a quest for a unique solution among the infinite ones that satisfy the under-determined system of observation equations

l = f(x) + v,  (3)

where the residual vector v contains the unknown observation noise and other possible errors due to imperfect data modeling. In this context, OLS is an optimal method for solving the above system in order to obtain a best estimate of the model parameters based on algebraic or statistical principles.
OLS: Algebraic Aspects

From an algebraic perspective, the rationale of OLS is to provide an estimator of the model parameters, denoted hereafter by x̂_OLS, that minimizes the Euclidean (squared) distance between the observations and their model-based prediction (Menke 1989; Rao and Toutenburg 1999):

x̂_OLS = arg min(x ∈ ℝᵐ) ‖l − f(x)‖² = arg min(x ∈ ℝᵐ) (l − f(x))ᵀ(l − f(x)).  (4)
The above solution does not rely on any stochastic or statistical considerations about the observation errors. It is merely a best-fitting solution in the sense of minimizing the residual vector in Eq. (3), or more precisely the sum of the squared values of its elements (hence the term "ordinary least squares").

Model Linearization
To explicitly compute x̂_OLS in practice, one first has to linearize the predictive model around known approximate values xo for the parameters and then apply the least squares criterion to the linearized version in order to solve for the first-order corrections δx = x − xo. Of course, this step needs to be properly iterated (i.e., the solution of each step is used as the initial approximation in the next step) until sufficient convergence is reached in the estimated parameters. As an alternative, instead of applying the OLS principle to the linearized model, one may directly solve the nonlinear optimization problem of Eq. (4) by using some well-known iterative technique (e.g., the Gauss-Newton method or the steepest descent gradient method). We prefer to follow the first approach here, since it is the one that is commonly used in least squares problems with nonlinear parametric models. For more details, see, e.g., Sen and Srivastava (1990, pp. 298–316).

In the following, it is assumed that the linearization of the predictive model l = f(x) was applied beforehand. Moreover, for the sake of notational simplicity, we shall retain the same symbols on both sides of the linearized model, which is expressed hereafter as l = A x instead of the lengthier form l − f(xo) = A (x − xo). The Jacobian matrix A = ∂f/∂x, known as the design matrix, is computed with the help of the approximate parameter values, and it needs to be updated during the iterative implementation of OLS estimation in nonlinear models.

Estimation of Model Parameters
Based on the previous linear(-ized) framework, the objective function to be minimized in the optimality criterion of Eq. (4) takes the form

‖l − f(x)‖² = (l − A x)ᵀ(l − A x) = lᵀl − 2lᵀA x + xᵀAᵀA x.  (5)

A necessary condition for minimizing the last expression is the vanishing of its partial derivative,

∂‖l − f(x)‖²/∂x = −2lᵀA + 2xᵀAᵀA = 0,  (6)

which leads to the so-called normal equations

AᵀA x = Aᵀl  ⇔  N x = b,  (7)
where N = AᵀA is called the normal matrix. If the design matrix A has full rank, then the normal matrix is invertible and the normal equations admit a unique solution, which is the OLS estimator of the model parameters (Koch 1999; Rao and Toutenburg 1999):

x̂_OLS = N⁻¹ b = (AᵀA)⁻¹ Aᵀ l.  (8)
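A compact numerical illustration of Eq. (8) is given below (added as a sketch; the straight-line model and synthetic data are arbitrary). It forms the normal equations explicitly and also checks the result against NumPy's least squares solver, which is the numerically preferable route in practice.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic observations from a straight-line model with additive noise.
n = 50
t = np.linspace(0.0, 10.0, n)
x_true = np.array([2.0, -0.5])               # intercept and slope
A = np.column_stack([np.ones(n), t])         # design matrix (n x 2)
l = A @ x_true + 0.3 * rng.normal(size=n)    # noisy observation vector

# OLS estimate via the normal equations, Eq. (8).
N = A.T @ A
b = A.T @ l
x_ols = np.linalg.solve(N, b)

# Cross-check with the (numerically more stable) lstsq solver.
x_lstsq, *_ = np.linalg.lstsq(A, l, rcond=None)

print("normal equations:", x_ols)
print("lstsq           :", x_lstsq)
```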
Model-Based Prediction
In practice, the user's interest may be confined to x̂_OLS, or it may extend to its predictive ability for other quantities that depend on the model parameters; such quantities are called response variables in the terminology of regression analysis (Draper and Smith 1998; Sen and Srivastava 1990). In fact, the observations that were used to estimate the model parameters can be predicted anew as follows:

l̂_OLS = A x̂_OLS = A (AᵀA)⁻¹ Aᵀ l = H l,  (9)

where H = A(AᵀA)⁻¹Aᵀ is an important matrix, called the hat matrix, which executes the orthogonal projection from the observation space ℝⁿ onto the column space of the design matrix. The latter is denoted as Im{A}, and it corresponds to the particular subspace of ℝⁿ spanning all possible predictions of the observables via the adopted model, that is, Im{A} = {A x, x ∈ ℝᵐ}. The hat matrix is symmetric and idempotent, while its trace is equal to the number of model parameters (tr H = m). The predicted vector in Eq. (9) upgrades the original observations in the sense of reducing (not eliminating) their random errors to get new values that are compatible with the parametric model; this replacement of l by l̂_OLS is generally termed least squares adjustment (Teunissen 2000).

Residuals
The OLS residuals reflect the difference between the original observations and their model-based predictions, and they are given by the equation

v̂_OLS = l − A x̂_OLS = l − A (AᵀA)⁻¹ Aᵀ l = (I − H) l,  (10)

where I is the n × n unit matrix and I − H corresponds to the orthogonal projector from the observation space ℝⁿ onto the orthogonal complement of Im{A}. This matrix is also symmetric and idempotent, while its trace is equal to the degrees of freedom of the estimation problem at hand (tr(I − H) = n − m). The OLS residuals give a useful metric for the effectiveness of the solution, but they cannot be exploited in more comprehensive quality analyses or statistical testing procedures without the aid of auxiliary hypotheses for the observation errors.
which is attributed to measurement and/or modeling errors. This decomposition is algebraically expressed as l ¼ ðI H Þ l þ H l ¼ v OLS þ l whereas the orthogonality condition l
OLS
OLS T
ð11Þ
v OLS ¼ 0 is
easily verified from the properties of the hat matrix. The above splitting allows a simple geometrical interpretation of OLS which is depicted in Fig. 1.
OLS: Statistical Aspects The algebraic formulation of the OLS method does not account for the nature of the residuals in the augmented system of observation equations. Its justification is based on the presumption that their values should be small (hence the reasoning of kvk ! min), a fact that implies the availability of a “good” dataset and the usage of a “correct” parametric model. The switch to a statistical formulation allows us to formalize those two assumptions while it offers the necessary framework to test their validity in practice and to assess the quality of the OLS solution. Gauss-Markov Conditions In the statistical context of OLS, the residuals in Eq. (3) are modeled as zero-mean random variables with common variance and zero covariances (Rao and Toutenburg 1999): Efvg ¼ 0, Dfvg ¼ E vvT ¼ s2 I
ð12Þ
where E{} is the expectation operator from probability theory, D{v} is the covariance matrix (also called dispersion matrix) of the residual vector, and s2 is the common variance
l
A2
Im{A}
vˆOLS = l – Axˆ OLS xˆ 2 A2
● Axˆ OLS
xˆ1 A1
Ordinary Least Squares, Fig. 1 The OLS solution is an orthogonal projection from the “observation space” ℝn onto the “model space” Im {A}. The columns of the design matrix form a (non-orthogonal) basis in the model space while the estimated parameters are the “coordinates” of
A1
the model-predicted observables with respect to that basis. For easier visualization, it is assumed that the model contains only two parameters which correspond to columns A1 and A2 of the design matrix
Ordinary Least Squares
1035
of its elements. The above assumptions may also be expressed in the observation domain: Efl g ¼ A x,
ð13Þ
¼ s2 I
Dfl g ¼ E ðl A xÞ ðl A xÞT
and they are known as Gauss-Markov (G-M) conditions which theoretically assure the statistical goodness of the OLS solution (Koch 1999). In simple words, these conditions dictate that the observations are free of any systematic errors, outliers, or blunders, and they are influenced only by (uncorrelated) random errors at the same accuracy level. Moreover, in the absence of such errors, the observations are expected to conform to a linear model which is specified through a full-rank design matrix of error-free elements (A) and a parameter vector with fixed but unknown values (x) that need to be estimated from a given ensemble of noisy observations. Statistical Optimality of OLS Solution If the G-M conditions hold true, then the OLS solution admits optimal properties that justify its use in real-life applications. Specifically, it becomes a linear unbiased estimator of the model parameters, that is E x OLS
¼E
AT A
¼ AT A
1
¼ AT A
1
1
usually amplified by a factor of 3 or 4 in order to increase their confidence level. In a nutshell, the OLS solution minimizes the propagated observation errors to the estimated model parameters. The estimator x OLS is the best linear unbiased estimator which offers optimal accuracy (minimum variance) in the presence of the G-M conditions. What is more, the statistical optimality extends to all model-based predictions that will use the OLS estimator to infer other quantities in a linear(-ized) context, that is, if y ¼ qTx, then its estimate y OLS ¼ qT x OLS is unbiased and it has minimum variance among any other linear unbiased estimator of y from the same data. For more details, see Koch (1999) and the references given therein. If the G-M conditions are invalid in practice, then the OLS solution does not necessarily become useless, yet its results lose their statistical optimality and may be affected by hidden biases. In such cases, the user needs to make appropriate changes to the data and/or the parametric model which would cause approximate compliance with the G-M conditions. Other Statistical Properties Based on Eqs. (9, 10), and after considering the properties of the hat matrix, one easily obtains the covariance matrix of the model-predicted observations
AT l
A T Ef l g
D l
ð14Þ
OLS
¼ DfH l g ¼ H D fl g H T
AT A x
¼s H
¼ x
¼ A A T
1
¼s A A 2
1
AT A
T
A Dfl g 1
¼ s2 AT A
D v OLS
AT l
T
T
T
A A T
A A A A
1
1
T
A
1
T
1
of s (A A) , and it gives a measure of its likely offset from the true value of the respective parameter. For a more realistic assessment of the estimation accuracy, the standard errors are
ð17Þ
2
¼ s ðI H Þ
(15)
and it has the smallest trace among any other linear unbiased estimator of x from the same data l (Rao and Toutenburg 1999; Koch 1999). The square roots of its diagonal elements reflect the (internal) accuracy of the estimated model parameters – also called standard errors – and they are used to describe the statistical goodness of the OLS solution at the parameter level. For example, the standard error of xi denoted by sx is the square root of the ith diagonal element i
¼ DfðI H Þ l g ¼ ðI H Þ Dfl g ðI H ÞT
T
In contrast to D{l}, the covariance matrices D v OLS and D l
2
O
and also the covariance matrix of the OLS residuals
whereas its covariance (dispersion) matrix is D x^ OLS ¼ D
ð16Þ
2
OLS
are always singular, and their diagonal elements
have a key role in the statistical testing of observations regarding the presence of outliers. The last two equations imply the decomposition Dfl g ¼ D v OLS þ D l
OLS
ð18Þ
which is a manifestation of the orthogonality property of OLS in the stochastic domain. The model-predicted observations and the estimated residuals are actually uncorrelated with each other (Draper and Smith 1998)
1036
Ordinary Least Squares
E v OLS l
OLS T
¼0
ð19Þ
and, therefore, the sum of their covariance matrices will be equal to the covariance matrix of the observations. Note that the expected values of these quantities shall comply with the G-M conditions, that is, E l
OLS
¼ A x and E v
OLS
The above estimate (often called residual mean square – RMS) is useful if the data noise level is unknown, as it allows us to replace s2 with s2 in order to infer the statistical accuracy of the results, e.g.
The evaluation of model fit is an important task that often needs to be performed in the context of OLS estimation. In general, a successful solution is characterized by small residuals which indicate a good fit of the observations to the parametric model (or vice versa). A number of relevant measures to quantify the goodness of fit of the solution are briefly described below. For a more detailed discussion, see the respective chapters in Draper and Smith (1998) and Sen and Srivastava (1990). Residual Sum of Squares (RSS) It corresponds to the (squared) Euclidean length of the OLS residual vector. Its value may be computed by three equivalent expressions as follows: ð20Þ
The disadvantage of this quantity is its dependence on the units in which the observations are expressed, thus making it difficult to exploit the RSS for objective assessment of model fit performance. If the above value is divided by the degrees of freedom of the problem at hand, then we obtain an unbiased estimate of the variance of the (true) residuals in the respective model s2 ¼
RSS , E s2 ¼ s2 nm
ð22Þ
Coefficient of Multiple Determination This is a unitless measure which is commonly used to evaluate the model fit in OLS problems (yet it requires caution when comparing different models with the same or several datasets). It is defined as n
v OLS ¼ l T ðI H Þ l ¼ l T v OLS
1
The knowledge of noise level is not required in the numerical computation of the estimated parameters, but it is necessary for computing their associated covariance matrix. Obviously, for s2 to be a meaningful estimate of the noise variance, it needs to be ensured that no “problematic” observations (in violation of the G-M conditions) were used in the OLS solution.
Goodness of Fit
T
¼ s2 AT A
! D x OLS
¼ 0.
OLS and Maximum Likelihood Estimation The OLS estimator does not require the probability distribution function of the observation errors. In fact, the only necessary stochastic elements that need to be specified through the G-M conditions are the first- and second-order moments of the observation vector l. However, in case of normally distributed errors, the OLS estimator becomes equivalent with the maximum likelihood estimator which is inherently related to Bayesian inference under the hypothesis of uniform prior distribution for the model parameters (Rao and Toutenburg 1999).
RSS ¼ v OLS
1
¼ s2 AT A
D x OLS
ð21Þ
2
R ¼
li
i¼1 n
OLS
2
l
li l
¼
2
l
OLS
l
ll
T
l T
OLS
l
ll
i¼1
¼1
RSS ll
T
ll
ð23Þ
where l denotes the mean value of the observations and l is an auxiliary vector whose elements are all equal to l. The value of R2 ranges between 0 and 1, and it reflects the squared correlation coefficient between the observations and their model-predicted part. A perfect fit to the available data would give R2 ¼ 1 (since RSS ¼ 0) which is a highly unlikely event in practice. In general, the closer its value gets to 1, the better is the model fit to the noisy observations. The previous definition of R2 relies on the assumption that the parametric model includes a constant term, which means that the design matrix A should contain a column full of ones (a constant term in a linear model is usually called the “intercept” in statistical literature). As a matter of fact, an alternative interpretation of R2 is that it measures the usefulness of the extra terms other than the intercept in the parametric model (Draper and Smith 1998). Its value is not comparable between entirely different models, but it can be used to compare the fitting performance of nested models. As a result, parametric models which contain an intercept cannot be compared with those without intercept on the basis of R2. Sometimes it is preferable to use an adjusted R2, denoted by R2a , which is given by the formula (Sen and Srivastava 1990)
Ordinary Least Squares n
OLS
R2a ¼ 1 i¼1 n
li
1037 2
l =ð n m Þ 2
li l =ðn 1Þ
¼ 1 1 R2
n1 nm
i¼1
ð24Þ The above coefficient accounts for the sample size (i.e., number of available observations) since it is often felt that small sample sizes tend to inflate the value of R2. This revised metric can be used to compare different parametric models that may be fitted not only to a specific dataset but also to two, or more, different sets of observations. Note that R2a may take negative values, and it practically serves as a gross indicator for the model fit in OLS problems.
observation has much higher leverage than the others, then it is poorly predicted by the model parameters, and its prediction is likely to absorb the largest part of a hidden outlier or blunder. The sum of the leverages is equal to tr H ¼ m which means that their average value is H ii ¼ m=n. Therefore, an alarm for very influential observations which may be outliers could be set if H ii > 2m=n or maybe H ii > 3m=n (Draper and Smith 1998). High-leverage observations usually originate from poor experimental design, and they pose a threat to the integrity of OLS solutions. For the purpose of detecting bad observations that do not comply with the G-M conditions, more valuable than the ordinary residuals v OLS are the so-called standardized or (internally) studentized residuals given as ei ¼
Residual Analysis
viOLS
1=2
var viOLS
The observations to be analyzed by least squares methods may be affected by outliers or blunders due to improper conditions in data collection or other external disturbances. Depending on the magnitude of such errors and the importance of the affected observables, the OLS solution may be seriously harmed even by a single bad observation. The analysis of residuals aims to identify bad observations which should be excluded from the estimation procedure as they violate the first of the G-M conditions in Eq. (13). One of the most important tips to remember here is the following: the OLS residuals do not correspond to the true observation errors but they are related to them through the orthogonal projection v OLS ¼ ðI H Þ v
ð25Þ
Although the above equation does not have practical value, it shows that the OLS residuals always give a “blurred” snapshot of the true observation errors. In particular, a gross error in a single observation is partly “transferred” to the residuals of other observations, thus causing difficulties to isolate problematic observations solely on the basis of v OLS . Of special importance in residual analysis are the diagonal elements {Hii} of the hat matrix H, which are linked with the variances of the model-predicted observations as follows var li
OLS
¼ s2 H ii
ð26Þ
and they represent the so-called leverages of the respective observables (Belsley et al. 1980). Their values are always positive numbers between 0 and 1, and their importance should be mainly considered in a relative context: if an
v OLS ¼ pi s 1 H ii
ð27Þ
whose variance is always equal to 1. A modified version of the above residuals e0i ¼
ei
p
nm1 n m e2i
ð28Þ
known as (externally) studentized residuals is mostly used in practice (Draper and Smith 1998). These residuals follow the t-distribution with n m 1 degrees of freedom under the normality assumption for the observation errors, and their values are used by most software packages as test statistics in single outlier detection.
Summary Least squares theory carries a long history dating back to the late eighteenth century, and it offers a powerful framework for analyzing noisy data in a broad range of applications in physical sciences and engineering. The estimation problems that can be treated by the least squares methodology involve a variety of modeling setups with assorted complexity levels, which extend well beyond the simple OLS case that was covered in this entry. Most of these problems are of high relevance to various branches of mathematical geosciences, including the inversion of rank-deficient linear models with or without the presence of prior information, the least squares approximation of continuous signals from discrete observations, the handling of correlated data noise with regular or singular covariance matrix, the recursive estimation of timevarying parameters in dynamic systems (Kalman filtering), and the parameter estimation in the so-called errors-in-
O
1038
variables models. On the other hand, the OLS methodology and its associated diagnostic tools remain an attractive option for handling parameter estimation and model-fitting problems with homogeneous datasets, having as major advantage the straightforward and easy-to-implement mathematical framework that was outlined in this entry.
Cross-References ▶ Best Linear Unbiased Estimation ▶ Iterative Weighted Least Squares ▶ Least Median of Squares ▶ Least Squares ▶ Maximum Likelihood ▶ Multiple Correlation Coefficient ▶ Multivariate Analysis ▶ Normal Distribution ▶ Optimization in Geosciences ▶ Random Variable ▶ Rao, C. R. ▶ Regression ▶ Statistical Outliers ▶ Statistical Quality Control
Ordinary Least Squares
Bibliography Belsley DA, Kuh E, Welsch RE (1980) Regression diagnostics. Wiley Draper NR, Smith H (1998) Applied regression analysis. Wiley Gauss CF (1823) Theoria combinationis observationum erroribus minimis obnoxiae: pars prior, pars posterior. Commentat Soc Regiae Sci Gott Recent 5:1823 Koch K-R (1999) Parameter estimation and hypothesis testing in linear models. Springer, Berlin/Heidelberg Menke W (1989) Geophysical data analysis: discrete inverse theory. Academic, San Diego Menke W (2015) Review of the generalized least squares method. Surv Geophys 36:1–25 Nievergelt Y (2000) A tutorial history of least squares with applications to astronomy and geodesy. J Comput Appl Math 121:37–72 Plackett RL (1972) The discovery of the method of least squares. Biometrika 59(2):239–251 Rao CR, Toutenburg H (1999) Linear models, least squares and alternatives. Springer, New York Seal HL (1967) The historical development of the Gauss linear model. Biometrika 54(1–2):1–24 Sen A, Srivastava M (1990) Regression analysis: theory, methods and applications. Springer, New York Sheynin OB (1979) C.F. Gauss and the theory of errors. Arch Hist Exact Sci 20:21–72 Stigler SM (1991) Gauss and the invention of least squares. Ann Stat 9: 465–474 Teunissen PJG (2000) Adjustment theory, an introduction. Delft Academic Press
P
Partial Differential Equations R. N. Singh1 and Ajay Manglik2 1 Discipline of Earth Sciences, Indian Institute of Technology, Gandhinagar, Palaj, Gandhinagar, India 2 CSIR-National Geophysical Research Institute, Hyderabad, India
Laplacian is written as ∇2 fxx þ fyy þ fzz in the Cartesian coordinate system. 2. Diffusion equation governing transport of heat, chemical concentration, low-frequency electromagnetism, stress diffusion in earthquake generation, groundwater flow, geomorphology, and many more, written as ∇2f ¼ @f/@t. 3. Wave phenomenon governing propagation of acoustic, elastic, and water waves due to sudden disturbances in the Earth system, written as ∇2f ¼ @ 2f/@t2.
Definition Partial differential equation: A differential equation that contains unknown functions of two or more independent variables and their partial derivatives
Introduction Geosciences use physical laws to derive information about the properties and processes in Earth based on observations of physical fields. Physical laws are those which govern mechanical, thermal, electromagnetic, and chemical behaviors of continuous media. These consist of conservation laws and constitutive relations. This leads to laws as a set of partial differential equations (PDEs) (Officer 1974; Kennett and Bunge 2008). PDEs in general form are written as a function of space, time, function, and its derivatives, i.e., f x, y, z, t, f, fx , fy , fz, , fxx , fyy , fzz ¼ 0, where fx
@f : @x ð1Þ
Most of the physicochemical processes in Earth are governed by second-order PDEs. Three classic examples of linear homogeneous PDEs, elliptic, parabolic, and hyperbolic types, respectively, are (Morse and Feshbach 1953): 1. Potential field equation governing static gravity, magnetic, and electrostatic fields, written as ∇2f ¼ 0, where the
For time harmonic waves or fields, diffusion and wave equations are reduced to the Helmholtz equation ∇2f ¼ k2f. PDEs describing mantle and core convection involved in quantifying plate tectonic processes and origin of the magnetic field are also reduced to the above elliptic, parabolic, and hyperbolic equations under various approximations of flow conditions likely to be met in Earth. For various problems in geoscience, these equations are written in the Cartesian, cylindrical, or spherical coordinates. The above equations are supplemented by boundary and initial conditions to solve problems related to a vast area of geosciences. There are three kinds of boundary conditions, namely, the Dirichlet, the Neumann, and the Robin boundary conditions. These are: • Dirichlet condition: f is prescribed on the boundary. • Neumann condition: ∇f is prescribed on the boundary. • Robin condition: af þ b∇f is prescribed on the boundary. Mathematical investigations of the PDEs reveal that not all solutions of these equations may be useful for interpreting observations. Hadamard (1902) proposed a set of criteria which a well-posed problem should fulfill. These are as follows: (i) the solution should exist, (ii) the solution should be unique, and (ii) the solution should depend on the given data continuously, i.e., it should be stable. The equations should have proper initial and boundary conditions. Like for heat equation, we should have one initial condition along with
1040
Partial Differential Equations
boundary conditions. And for second order wave equation we should have two initial conditions along with boundary conditions. Those problems which are not well posed in the sense of Hadamard criteria are called ill-posed problems. In some cases, solutions may not exist, for some it may not be unique, and for some others it may be unstable with respect to changes in given data. There are regularization methods to make such problems useful for interpreting data. The basic strategy to solve PDEs is to split the problem into a series of initial value problems in ordinary differential equations (ODEs) (▶ Ordinary Differential Equations) or system of linear algebraic equations. For simple geometries in linear problems, the solution can be obtained by using wellknown method of separation of variables, method of Green’s function, and integral transform methods. For nonlinear and highly heterogenous problems, numerical methods such as finite difference or finite elements are used which involve ultimately solving a set of algebraic equations. These methods are briefly outlined below.
Method of Separation of Variables

This method is discussed here for the potential field equation in a two-dimensional Cartesian coordinate system describing the temperature distribution T(x, z) in the crust. The domain extends in the horizontal (x) and vertical (z) directions as −l ≤ x ≤ l and 0 ≤ z ≤ d, respectively, with z positive downward. The PDE is written as

∂²T/∂x² + ∂²T/∂z² = 0,   (2)

with the boundary conditions

T(x, 0) = 0, T(x, d) = f(x),   (3a)

∂T/∂x(−l, z) = 0, ∂T/∂x(l, z) = 0.   (3b)

Assuming that the variables are separable, the solution can be written as T(x, z) = X(x)Z(z), which transforms Eq. 2 into a set of two ODEs,

(1/X) d²X/dx² = −(1/Z) d²Z/dz² = −λ²,   (4)

with the respective boundary conditions

Z(0) = 0, dX/dx(−l) = 0 = dX/dx(l).   (5)

The solutions of this eigenvalue problem for X are the orthogonal functions X_n(x) = cos(λ_n x), λ_n = nπ/l, and the solution for Z satisfying the boundary condition at z = 0 is Z = sinh(nπz/l). Therefore, the solution for T can be expressed as

T(x, z) = Σ_{n=0}^{∞} c_n cos(nπx/l) sinh(nπz/l).   (6)

Applying the boundary condition at z = d yields

f(x) = Σ_{n=0}^{∞} c_n cos(nπx/l) sinh(nπd/l).   (7)

Using the orthogonality property of cos(nπx/l), we get the expression for c_n as

c_n = [2/(l sinh(nπd/l))] ∫_0^l f(x) cos(nπx/l) dx.   (8)

Thus, for a given function f(x), the solution is obtained by evaluating this integral.

This method can also be applied to the time-dependent heat conduction problem (κ denotes diffusivity) describing the cooling of a hot geological body, posed as

∂T/∂t − κ ∂²T/∂z² = 0, 0 ≤ z < d, 0 ≤ t < ∞.   (9)

The initial and boundary conditions are

T(z, 0) = f(z),   (10a)

T(0, t) = 0 = T(d, t).   (10b)

Using the method of separation of variables, we get two ordinary differential equations,

dT(t)/dt = −κλ² T(t), d²Z(z)/dz² = −λ² Z(z).   (11)

The differential equation for Z is posed as an eigenvalue problem with the solution Z_n = sin(λ_n z), where λ_n = nπ/d. Thus, the solution is

T(z, t) = Σ_{n=1}^{∞} C_n sin(nπz/d) e^{−λ_n² κ t}.   (12)

The constants C_n are obtained by using the initial condition, which gives

T(z, 0) = f(z) = Σ_{n=1}^{∞} C_n sin(nπz/d).   (13)

Using the orthogonality of the eigenfunctions, we get the expression for the constants C_n as

C_n = (2/d) ∫_0^d f(z) sin(nπz/d) dz.   (14)

For a given f(z), these constants can be evaluated. Thus, the solution is given by

T(z, t) = (2/d) Σ_{n=1}^{∞} sin(nπz/d) [∫_0^d f(z′) sin(nπz′/d) dz′] e^{−λ_n² κ t}.   (15)

We have presented solutions in the Cartesian geometry. However, many problems in seismology and geomagnetism need a spherical model of Earth. In such cases, the governing equation is written in the spherical coordinate system, and the separated ODEs make use of special functions, such as spherical harmonics for the latitude and longitude dependencies and Bessel functions to represent the radial variations.
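As an illustration of Eqs. (12) and (14), the short Python sketch below (not part of the original entry; the initial profile f(z), the slab thickness d, and the diffusivity κ are assumed values) evaluates the truncated series numerically:

```python
# A minimal numerical sketch (not from the original text) evaluating the truncated
# series solution (12) with coefficients (14) for the cooling problem (9)-(10).
import numpy as np

d, kappa, N = 1.0, 1e-2, 50
f = lambda z: 4.0 * z * (d - z) / d**2          # assumed initial temperature, zero at z = 0 and z = d

zq = (np.arange(2000) + 0.5) * d / 2000          # quadrature points (midpoint rule)
dzq = d / 2000

def C(n):
    """Coefficient (14): (2/d) * integral_0^d f(z) sin(n*pi*z/d) dz."""
    return (2.0 / d) * np.sum(f(zq) * np.sin(n * np.pi * zq / d)) * dzq

def T(z, t):
    """Truncated series (12): sum_n C_n sin(n*pi*z/d) exp(-(n*pi/d)**2 * kappa * t)."""
    out = np.zeros_like(z)
    for n in range(1, N + 1):
        lam = n * np.pi / d
        out += C(n) * np.sin(lam * z) * np.exp(-lam**2 * kappa * t)
    return out

z = np.linspace(0.0, d, 11)
print(T(z, 0.0))   # should be close to f(z)
print(T(z, 5.0))   # decayed profile at a later time
```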
Method of Eigenvalue-Eigenfunction Expansion

This method is also useful for solving the wave equation problem posed as (here, u denotes displacement)

∂²u/∂z² = (1/c²) ∂²u/∂t², 0 < t < ∞, 0 ≤ z ≤ d,   (16a)

u(0, t) = 0 = u(d, t),   (16b)

u(z, 0) = f(z), ∂u/∂t(z, 0) = g(z).   (16c)

Using the method of separation of variables, u = T(t)Z(z), the two ordinary differential equations are

d²T(t)/dt² = −λ²c² T(t), d²Z(z)/dz² = −λ² Z(z), Z(0) = 0 = Z(d).   (17)

The differential equation for Z poses an eigenvalue problem with the solution Z_n = sin(λ_n z), where λ_n = nπ/d. The solution of the PDE (Eq. 16a) is

u(z, t) = Σ_{n=1}^{∞} sin(nπz/d) [C_n sin(nπct/d) + D_n cos(nπct/d)].   (18)

The constants C_n, D_n are found by using the orthogonality property of the eigenfunctions, i.e.,

C_n = [2/(nπc)] ∫_0^d g(z) sin(nπz/d) dz,   (19)

D_n = (2/d) ∫_0^d f(z) sin(nπz/d) dz.   (20)

This method can be used to solve more general heat/chemical/stress diffusion problems with inhomogeneous source and boundary conditions, such as

∂T/∂t = κ ∂²T/∂z² + g(z, t), 0 < t < ∞, 0 ≤ z ≤ d,   (21a)

T(0, t) = a(t), T(d, t) = b(t), T(z, 0) = f(z).   (21b)

The solution is written in two parts such that the boundary conditions become homogeneous, i.e.,

T(z, t) = u(z, t) + v(z, t).   (22)

The function v(z, t) is chosen as v(z, t) = a(t) + z(b(t) − a(t))/d, so that the equation for u(z, t) becomes

∂u/∂t = κ ∂²u/∂z² + g′(z, t), 0 < t < ∞, 0 ≤ z ≤ d,   (23)

where g′(z, t) = g(z, t) − ∂v/∂t. The boundary and initial conditions for u(z, t) become u(0, t) = 0 = u(d, t) and u(z, 0) = f(z) − v(z, 0), respectively. The solution for u(z, t) is thus obtained by using the following eigenvalues and eigenfunctions (▶ Eigenvalues and Eigenvectors):

λ_n = nπ/d, u_n = sin(λ_n z).   (24)

The method described for the wave equation in the previous section can be followed to get the solution

T(z, t) = v(z, t) + Σ_{n=1}^{∞} [ ∫_0^t e^{−κλ_n²(t−τ)} g′_n(τ) dτ + e^{−κλ_n² t} T_n(0) ] sin(λ_n z),   (25)

where g′_n(τ) is the n-th sine-series coefficient of g′(z, τ).
Here, T_n(0) is given by

T_n(0) = (2/d) ∫_0^d [f(z) − v(z, 0)] sin(λ_n z) dz.   (26)

The above formulation used the Dirichlet boundary condition; a similar approach can be used to find the solution for the Neumann and the Robin boundary conditions. The above formulation also helps in solving the following more general advection-diffusion equation, which occurs in several transport processes such as heat transfer, groundwater flow (▶ Flow in Porous Media), and dispersion of chemical substances in the subsurface:

∂T/∂t + v ∂T/∂z = κ ∂²T/∂z²,   (27a)

T(0, t) = 0, T(d, t) = 0, T(z, 0) = f(z).   (27b)

We apply the following transformation to the variable T(z, t),

T(z, t) = T′(z, t) e^{az + bt},   (28)

to get

∂T′/∂t + (b + va − a²κ) T′ + (v − 2aκ) ∂T′/∂z = κ ∂²T′/∂z².   (29)

We require b + va − a²κ = 0 and v − 2aκ = 0, which gives a = v/2κ and b = −v²/4κ. With this, the advection-diffusion equation and the initial/boundary conditions reduce to

∂T′/∂t = κ ∂²T′/∂z²,   (30a)

T′(0, t) = 0, T′(d, t) = 0, T′(z, 0) = f(z) e^{−vz/2κ}.   (30b)

This can then be solved by using the eigenvalue-eigenfunction expansion method discussed above.

Method of Integral Transforms

Integral transforms reduce PDEs to ODEs. The choice of integral transform depends on the nature of the boundary conditions. A popular method for solving time-dependent problems is the Laplace transform method (▶ Laplace Transform). The Laplace transform with respect to time, 0 < t < ∞, is defined as

T̄(z, p) = L{T(z, t)} = ∫_0^∞ e^{−pt} T(z, t) dt,   (31)

with the inverse Laplace transform given by

T(z, t) = L⁻¹{T̄(z, p)} = (1/2πi) ∫_{s−i∞}^{s+i∞} e^{pt} T̄(z, p) dp, Re(p) = s.   (32)

For a large number of functions, Laplace transform pairs have been obtained over the years and are well tabulated. We illustrate this method by estimating the temperature in a half-space for a prescribed initial condition, posed as

∂T/∂t − κ ∂²T/∂z² = 0, 0 < z < ∞, 0 < t < ∞,   (33a)

T(z, 0) = A, T(0, t) = 0.   (33b)

Taking the Laplace transform of Eq. 33a, we get

p T̄(z, p) − A = κ d²T̄(z, p)/dz²,   (34)

the solution of which is

T̄(z, p) = C e^{z√(p/κ)} + D e^{−z√(p/κ)} + A/p.   (35)

As the solution needs to remain finite as z tends to infinity, it takes the form

T̄(z, p) = D e^{−z√(p/κ)} + A/p.   (36)

The value of D is found by using the surface boundary condition as D = −A/p. Thus, the solution in the transform domain is

T̄(z, p) = (A/p)(1 − e^{−z√(p/κ)}).   (37)

The solution in the time domain is obtained by performing the inverse Laplace transformation:

T(z, t) = A [1 − erfc(z/√(4κt))].   (38)
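The closed-form result (38) can be checked numerically; the short sketch below (not from the original entry; A and κ are assumed values) verifies that it satisfies the diffusion equation (33a) to finite-difference accuracy:

```python
# Numerical check of the half-space solution (38): T(z,t) = A*(1 - erfc(z/sqrt(4*kappa*t))).
# The residual of dT/dt - kappa*d2T/dz2, estimated with central differences,
# should be close to zero away from the boundaries.  A and kappa are assumed.
import numpy as np
from scipy.special import erfc

A, kappa = 100.0, 1e-6
T = lambda z, t: A * (1.0 - erfc(z / np.sqrt(4.0 * kappa * t)))

z, t = 0.35, 2.0e4
dz, dt = 1e-3, 1.0

dT_dt = (T(z, t + dt) - T(z, t - dt)) / (2.0 * dt)
d2T_dz2 = (T(z + dz, t) - 2.0 * T(z, t) + T(z - dz, t)) / dz**2

print("residual:", dT_dt - kappa * d2T_dz2)   # ~0 => (38) solves (33a)
```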
Similarly, the Laplace transform can be used to solve a first-order PDE occurring in geomorphology, with h representing the elevation along a river long profile under uplift denoted by b (▶ Earth-Surface Processes):

a ∂h/∂x + ∂h/∂t = b, 0 < t < ∞, 0 < x < ∞,   (39a)

h(x, 0) = 0, h(0, t) = 0.   (39b)

Taking the Laplace transform of this equation with respect to time, we get

a dh̄(x, p)/dx + p h̄(x, p) = b/p.   (40)

The solution of this equation is

h̄ = b/p² + C e^{−px/a}.   (41)

Using the boundary condition h̄(0, p) = 0, we get C = −b/p², and the solution in the transform domain is

h̄(x, p) = (b/p²)(1 − e^{−px/a}).   (42)

Taking the inverse Laplace transform, we get

h(x, t) = b [t − (t − x/a) H(t − x/a)],   (43)

where H denotes the Heaviside step function.
There is a large class of integral transforms and their inverses, and these have been tabulated. Some of these transforms are (Carslaw and Jaeger 1959):

Fourier (−∞ < x < ∞): F(k) = (1/√(2π)) ∫_{−∞}^{∞} f(x) e^{−ikx} dx

Inverse Fourier: f(x) = (1/√(2π)) ∫_{−∞}^{∞} F(k) e^{ikx} dk

Fourier sine (0 < x < ∞): F(k) = √(2/π) ∫_0^∞ f(x) sin(kx) dx; Dirichlet boundary condition

Inverse Fourier sine: f(x) = √(2/π) ∫_0^∞ F(k) sin(kx) dk

Fourier cosine (0 < x < ∞): F(k) = √(2/π) ∫_0^∞ f(x) cos(kx) dx; Neumann boundary condition

Inverse Fourier cosine: f(x) = √(2/π) ∫_0^∞ F(k) cos(kx) dk

Finite sine (0 < x < L): F_n = (2/L) ∫_0^L f(x) sin(nπx/L) dx; Dirichlet boundary condition

Inverse finite sine (0 < x < L): f(x) = Σ_{n=1}^{∞} F_n sin(nπx/L)

Finite cosine (0 < x < L): F_n = (2/L) ∫_0^L f(x) cos(nπx/L) dx; Neumann boundary condition

Inverse finite cosine (0 < x < L): f(x) = F_0/2 + Σ_{n=1}^{∞} F_n cos(nπx/L)

These transforms are helpful in solving PDEs with the applicable boundary conditions.

Green's Function Approach

Green's functions arise in geoscience problems when attention is focused on the generation of fields by sources. For example, in the case of earthquakes, the generated displacement field u, expressed in terms of a potential φ as u = ∇φ, due to seismic sources f(r, t), can be obtained by solving (Aki and Richards 2002)

∇²φ − (1/α²) ∂²φ/∂t² = f(r, t).   (44)

The Green's function G(r, t, r₀, t₀) corresponding to the above problem is given by the solution of

∇²G − (1/α²) ∂²G/∂t² = δ(r − r₀) δ(t − t₀),   (45)

representing a point, instantaneous source. Taking the Fourier transform (▶ Fourier Transform) of this equation in time, we get

(∇² + k²) Ḡ = δ(r − r₀) e^{iωt₀}, with k = ω/α.   (46)

The solution of this equation, which decays to zero as r → ∞, is

Ḡ = −[e^{ik|r − r₀|}/(4π|r − r₀|)] e^{iωt₀}.   (47)

Taking the inverse Fourier transform, we get

G = −δ(t − t₀ − |r − r₀|/α) / (4π|r − r₀|).   (48)

Knowing the Green's function, the solution of Eq. 44 is obtained by integrating the Green's function against the source over volume and time, i.e.,
φ(r, t) = ∫∫ G(r, r₀, t, t₀) f(r₀, t₀) dr₀ dt₀.   (49)

Thus, the displacement field can be constructed for any given source.

Method of Characteristics for Wave Equations

The method of characteristics also reduces a PDE to a set of ODEs. This is demonstrated by the following equation describing erosion due to stream power in geomorphology (▶ Earth-Surface Processes) in the presence of tectonic uplift:

a ∂h/∂x + b ∂h/∂t = c, h(x, 0) = f(x).   (50)

This PDE is equivalent to the following three ODEs, called the characteristic and compatibility equations:

dx/ds = a, dt/ds = b, dh/ds = c.   (51)

The initial conditions for these equations at s = 0 are, with ξ being the parameter along the x-axis,

x = x₀(ξ), t = t₀(ξ), h = h₀(ξ).   (52)

We also have, at s = 0, t = 0, x = ξ, h = f(ξ). Thus, the solutions for the characteristic curves are, a, b, and c being constants,

x = as + ξ, t = bs, h = cs + f(ξ).   (53)

Transforming back to the original variables, we get the solution

h(x, t) = ct/b + f(x − at/b).   (54)

This can be extended to initial and boundary value problems.
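A quick numerical sanity check of the characteristic solution (54) can be done by inserting it back into the PDE (50); the sketch below (not from the original entry; a, b, c, and the profile f are assumed) evaluates the residual with finite differences:

```python
# Verify that h(x,t) = c*t/b + f(x - a*t/b) from Eq. (54) satisfies
# a*dh/dx + b*dh/dt = c for an arbitrary smooth initial profile f.
# The constants a, b, c and the profile f are illustrative assumptions.
import numpy as np

a, b, c = 2.0, 1.0, 0.5
f = lambda x: np.exp(-x**2)                    # assumed initial river profile h(x, 0)
h = lambda x, t: c * t / b + f(x - a * t / b)  # characteristic solution (54)

x, t, eps = 1.3, 0.7, 1e-5
dh_dx = (h(x + eps, t) - h(x - eps, t)) / (2 * eps)
dh_dt = (h(x, t + eps) - h(x, t - eps)) / (2 * eps)

print("a*h_x + b*h_t =", a * dh_dx + b * dh_dt, " (should be close to c =", c, ")")
```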
Ray Method in Wave Propagation

Seismological knowledge is built on using solutions of the wave equation to interpret earthquake data. The most commonly used solution is the ray solution of the wave equation (Aki and Richards 2002). For time-harmonic waves (φ = φ⁰ e^{iωt}), the wave equation reduces to

∇²φ⁰ = [(iω)²/α²] φ⁰.   (55)

High-frequency waves propagate along rays. In this case, the solution is sought in the following form:

φ⁰ = A(x) e^{iωB(x)}.   (56)

Substituting in the wave equation, we get

∇²A − ω²A|∇B|² + i(2ω∇A·∇B + ωA∇²B) = −(ω²/α²) A.   (57)

By separating the real and imaginary parts, we get the following two equations:

∇²A − ω²A|∇B|² + (ω²/α²) A = 0,   (58)

2ω∇A·∇B + ωA∇²B = 0.   (59)

The real part is rewritten as

(1/(Aω²)) ∇²A − |∇B|² + 1/α² = 0.   (60)

In the high-frequency limit, the first term can be neglected to obtain the eikonal or ray equation:

|∇B|² = 1/α² = n².   (61)

The reciprocal of the velocity, denoted by n, is called the slowness. The wavefront is given by B = constant, and the ray path is given by ∇B. This is a first-order hyperbolic equation and can be solved by using the method of characteristics. A simple solution is

A = 1, B = nx.   (62)

Thus, we get the plane wave solution

φ = e^{iω(nx + t)}.   (63)

The imaginary part gives the transport equation,

2∇A·∇B + A∇²B = 0.   (64)

This equation is used to find the amplitude of the propagating waves.
Similarity Solutions

It is possible to transform a PDE into an ODE by symmetry transformations. Following a dilation symmetry transformation, one obtains the so-called similarity solution:

x′ = x/L, t′ = t/L^b.   (65)

For the heat diffusion problem

∂T/∂t − κ ∂²T/∂z² = 0, 0 < z < ∞, 0 < t < ∞,   (66a)

T(z, 0) = T_m, T(0, t) = T_0,   (66b)

a similarity parameter η = z/(2√(κt)) is defined such that the diffusion equation reduces to

d²T/dη² + 2η dT/dη = 0.   (67)

To set up the boundary conditions, the temperature is also made nondimensional by using T′ = (T − T_m)/(T_0 − T_m), such that

T′(0) = 1, T′(∞) = 0.   (68)

The solution is obtained in terms of the error function as

T′ = 1 − erf(η), where erf(η) = (2/√π) ∫_0^η e^{−y²} dy.   (69)
This solution is used in quantifying the heat flow, bathymetry, seismicity, and many other aspects of structure and evolution of the oceanic lithosphere (Turcotte and Schubert 2002).
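As a numerical cross-check of the similarity reduction, the ODE (67) can be integrated directly and compared against the closed form (69); the sketch below (not from the original entry) does this with a standard ODE solver:

```python
# Integrate the similarity ODE (67), T'' + 2*eta*T' = 0, with T(0) = 1 and the
# slope at eta = 0 taken from the analytic solution (69), then compare with 1 - erf(eta).
import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import erf

def rhs(eta, y):
    T, dT = y
    return [dT, -2.0 * eta * dT]

# For T' = 1 - erf(eta), the initial slope is -2/sqrt(pi).
sol = solve_ivp(rhs, (0.0, 4.0), [1.0, -2.0 / np.sqrt(np.pi)],
                dense_output=True, rtol=1e-9, atol=1e-12)

eta = np.linspace(0.0, 4.0, 5)
print("numerical :", sol.sol(eta)[0])
print("1 - erf   :", 1.0 - erf(eta))
```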
Numerical Methods

Very few PDEs representing real Earth situations have analytical solutions for the given geometry and distribution of properties. In these cases, recourse is taken to numerical methods (▶ Computational Geosciences). Numerical methods aim at approximating PDEs by a system of linear algebraic equations which can be solved iteratively. There are two main methods for solving PDEs numerically: finite differences and finite elements. In the finite difference method, the derivatives occurring in the equations are approximated by difference forms involving values of the variables at discrete nodes. This results in a set of algebraic equations. In the finite element method, the whole geometry is divided into finite elements; such elements could be finite intervals in 1D, triangles in 2D, or tetrahedra in 3D. Within the elements, continuous variation of the variables is assumed in terms of their values at the nodes of the elements. The PDE is written in variational form over the domain, and the equations in terms of nodal values are derived by integrating the variational equation over the elements. This again results in a set of algebraic equations.

Finite Difference Method
Semi-Discrete Method
We illustrate this method on the following heat diffusion problem:

∂T/∂t − κ ∂²T/∂z² = g(z), 0 < z < d, 0 < t < ∞,   (70a)

T(z, 0) = f(z), T(0, t) = 0 = T(d, t).   (70b)

We discretize the z-axis as z_i = iΔz, i = 0, 1, 2, ..., n + 1; Δz = d/(n + 1). On this grid, the second space derivative in the equation has the discrete form

∂²T/∂z² ≈ [T(z_{i+1}, t) − 2T(z_i, t) + T(z_{i−1}, t)] / (Δz)².   (71)

Thus, the governing equation is approximated as

dT_i/dt = κ [T_{i+1} − 2T_i + T_{i−1}] / (Δz)² + g(z_i), i = 1, 2, 3, ..., n,   (72a)

T_i = T(z_i, t), T_0 = 0 = T_{n+1}, T_i(0) = f(z_i).   (72b)

These equations can then be integrated as a coupled system of first-order ODEs (the method of lines), as sketched below.
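A minimal method-of-lines sketch of Eqs. (72a)-(72b) is given below (not part of the original entry; d, κ, g, and f are assumed for illustration); the semi-discrete system is handed to a standard ODE integrator:

```python
# Semi-discrete (method-of-lines) solution of dT_i/dt = kappa*(T_{i+1}-2T_i+T_{i-1})/dz^2 + g(z_i)
# with T_0 = T_{n+1} = 0 and T_i(0) = f(z_i), following Eqs. (72a)-(72b).
import numpy as np
from scipy.integrate import solve_ivp

d, kappa, n = 1.0, 1e-2, 99
dz = d / (n + 1)
z = np.linspace(dz, d - dz, n)          # interior nodes z_1 ... z_n

g = lambda z: np.zeros_like(z)          # assumed source term
f = lambda z: np.sin(np.pi * z / d)     # assumed initial condition, zero at both ends

def rhs(t, T):
    Tfull = np.concatenate(([0.0], T, [0.0]))            # boundary values T_0 = T_{n+1} = 0
    lap = (Tfull[2:] - 2.0 * Tfull[1:-1] + Tfull[:-2]) / dz**2
    return kappa * lap + g(z)

sol = solve_ivp(rhs, (0.0, 10.0), f(z), method="BDF", rtol=1e-8)
print("max T at t = 10:", sol.y[:, -1].max())            # decays like exp(-kappa*pi**2*t/d**2)
```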
Semi-Discrete Collocation Method
In this method, the spatial part of the solution is written in terms of basis functions (▶ Radial Basis Functions) φ_j(z), j = 1, ..., n, as

T(z, t) ≈ Σ_{j=1}^{n} a_j(t) φ_j(z).   (73)

We substitute this expression into the governing equation and collocate at the nodes z_i to get

Σ_{j=1}^{n} a′_j(t) φ_j(z_i) = κ Σ_{j=1}^{n} a_j(t) φ″_j(z_i) + g(z_i).   (74)

This can be written in matrix form as

a′ = κ M⁻¹ N a + M⁻¹ b.   (75)

Here a = [a_1, a_2, ..., a_n]ᵀ; M_ij = φ_j(z_i); N_ij = φ″_j(z_i); b_i = g(z_i). This is a system of ODEs which can be solved by standard ODE solution methods.
Fully Discrete Methods

Here, both the space and time domains are discretized. The time is discretized as t_k = kΔt, k = 1, 2, .... The fully discrete form of the governing equation is

[T(z_i, t_{k+1}) − T(z_i, t_k)] / Δt = κ [T(z_{i+1}, t_k) − 2T(z_i, t_k) + T(z_{i−1}, t_k)] / (Δz)² + g(z_i),   (76a)

T(z_i, t_{k+1}) = T(z_i, t_k) + [κΔt/(Δz)²] [T(z_{i+1}, t_k) − 2T(z_i, t_k) + T(z_{i−1}, t_k)] + g(z_i)Δt.   (76b)

The initial and boundary conditions are

T(z_i, 0) = f(z_i), T(z_0, t_k) = 0 = T(z_{n+1}, t_k).   (77)

This problem can be solved iteratively; this is called the explicit method. The space and time steps must be chosen so that the solution is stable, which requires meeting the Courant-Friedrichs-Lewy (CFL) criterion:

κΔt/(Δz)² ≤ 1/2.   (78)
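The explicit update (76b) together with the stability bound (78) can be coded in a few lines; the sketch below (not from the original entry; parameter values are assumed) chooses Δt from the CFL criterion:

```python
# Explicit (FTCS) time stepping of Eq. (76b), with the time step chosen to
# satisfy the CFL stability criterion (78): kappa*dt/dz**2 <= 1/2.
import numpy as np

d, kappa, n = 1.0, 1e-2, 99
dz = d / (n + 1)
dt = 0.4 * dz**2 / kappa                 # safety factor 0.4 < 1/2

z = np.linspace(0.0, d, n + 2)
T = np.sin(np.pi * z / d)                # assumed initial condition f(z)
gsrc = np.zeros_like(z)                  # assumed source g(z)

for k in range(2000):
    T[1:-1] += kappa * dt / dz**2 * (T[2:] - 2.0 * T[1:-1] + T[:-2]) + gsrc[1:-1] * dt
    T[0] = T[-1] = 0.0                   # Dirichlet boundaries, Eq. (77)

print("max T after", 2000 * dt, "time units:", T.max())
```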
Finite Element Method

The most useful approach in FEM uses a weak formulation of the problem. Weak solutions are applicable to geophysical situations for which classical solutions, having smoothness and other regularity conditions, are not available. In this case, the PDE is reduced to a set of ODEs. The governing equation is multiplied by a test function and integrated over the whole domain. Thus, for the one-dimensional heat diffusion equation we get

∫_0^d v(z) [∂T/∂t − κ ∂²T/∂z²] dz = ∫_0^d v(z) g(z) dz.   (79)

The second term on the left-hand side is integrated by parts, and using the boundary condition on the test function, we get

∫_0^d v(z) ∂T/∂t dz + κ ∫_0^d (∂v/∂z)(∂T/∂z) dz = ∫_0^d v(z) g(z) dz.   (80)

To proceed further and obtain a set of ODEs, the solution is written in terms of orthogonal basis functions as

T(z, t) ≈ Σ_{j=1}^{n} a_j(t) φ_j(z).   (81)

The test function is also taken as

v(z) = φ_i(z).   (82)

Substituting both into the integral expressions, we get

∫_0^d φ_i(z) (∂/∂t)[Σ_{j=1}^{n} a_j(t) φ_j(z)] dz + κ ∫_0^d (∂φ_i/∂z) (∂/∂z)[Σ_{j=1}^{n} a_j(t) φ_j(z)] dz = ∫_0^d φ_i(z) g(z) dz.   (83)

We denote

M_ij = ∫_0^d φ_i(z) φ_j(z) dz,   (84a)

N_ij = ∫_0^d (∂φ_i/∂z)(∂φ_j/∂z) dz,   (84b)

g_i = ∫_0^d φ_i(z) g(z) dz.   (84c)

We then get

M da/dt + κ N a = g.   (85)

The initial condition is obtained by using the orthogonality properties of φ_j,

T(z, 0) = f(z) = Σ_{j=1}^{n} a_j(0) φ_j(z).   (86)

These ODEs are solved to obtain the solution of the PDE.
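For concreteness, the sketch below (not part of the original entry) assembles the matrices (84a)-(84c) for piecewise-linear "hat" basis functions on a uniform grid, using the standard closed-form element integrals; the resulting system (85) can then be integrated like the semi-discrete system above:

```python
# Assembly of the FEM system (85), M da/dt + kappa*N a = g, for piecewise-linear
# hat functions phi_i on a uniform grid with homogeneous Dirichlet boundaries.
# For hat functions the integrals (84a)-(84b) have the standard closed forms
# used below; the load (84c) is approximated by a lumped (one-point) rule.
import numpy as np

d, kappa, n = 1.0, 1e-2, 99              # domain size, diffusivity, interior nodes
h = d / (n + 1)
z = np.linspace(h, d - h, n)             # interior node positions
g = lambda z: np.ones_like(z)            # assumed source term g(z)

# Mass matrix M_ij = int phi_i phi_j dz (tridiagonal: 2h/3 on the diagonal, h/6 off)
M = (2.0 * h / 3.0) * np.eye(n) + (h / 6.0) * (np.eye(n, k=1) + np.eye(n, k=-1))
# Stiffness matrix N_ij = int phi_i' phi_j' dz (tridiagonal: 2/h on the diagonal, -1/h off)
N = (2.0 / h) * np.eye(n) - (1.0 / h) * (np.eye(n, k=1) + np.eye(n, k=-1))
# Lumped load vector g_i ~ h * g(z_i)
gv = h * g(z)

# Steady state of (85) (da/dt = 0): kappa * N a = g; compare with the exact parabola
a_steady = np.linalg.solve(kappa * N, gv)
exact = z * (d - z) / (2.0 * kappa)
print("max error vs exact steady state:", np.abs(a_steady - exact).max())
```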
Deep Learning Method

PDEs are also used in deep learning algorithms. Space-time observations are fitted to PDEs with unknown coefficients, which are then determined by using an optimization process based on neural networks (▶ Neural Networks). This helps in understanding the nature of the physical processes involved in
generating the observations. Deep learning methods (▶ Machine Learning and Geosciences) are also used to find the solution of PDEs by approximating the unknown functions by neural networks. In one such method, called the deep Galerkin method (Raissi 2018), the solution, which in the FEM is approximated by certain basis functions, is instead approximated by a neural network. This approach becomes advantageous when grids and basis functions become very large in multidimensional geometries.
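As a rough illustration of this idea (not from the original entry, and only loosely patterned on physics-informed and deep Galerkin approaches), the sketch below trains a small network T(z, t) so that the residual of the heat equation, together with initial and boundary terms, is minimized; the architecture, sampling scheme, and loss weights are arbitrary choices:

```python
# Minimal sketch of a neural-network PDE solver for dT/dt - kappa*d2T/dz2 = 0
# on 0 < z < 1 with T(z,0) = sin(pi*z) and T(0,t) = T(1,t) = 0.
# Architecture, sampling, and loss weights are illustrative assumptions.
import torch

torch.manual_seed(0)
kappa = 1e-2
net = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def residual(z, t):
    """PDE residual dT/dt - kappa * d2T/dz2 via automatic differentiation."""
    T = net(torch.cat([z, t], dim=1))
    dT_dt = torch.autograd.grad(T, t, torch.ones_like(T), create_graph=True)[0]
    dT_dz = torch.autograd.grad(T, z, torch.ones_like(T), create_graph=True)[0]
    d2T_dz2 = torch.autograd.grad(dT_dz, z, torch.ones_like(dT_dz), create_graph=True)[0]
    return dT_dt - kappa * d2T_dz2

for step in range(2000):
    z = torch.rand(256, 1, requires_grad=True)
    t = torch.rand(256, 1, requires_grad=True)
    z0, t0 = torch.rand(64, 1), torch.zeros(64, 1)                 # initial-condition points
    zb, tb = torch.randint(0, 2, (64, 1)).float(), torch.rand(64, 1)   # boundary points z = 0, 1
    loss = (residual(z, t) ** 2).mean() \
         + ((net(torch.cat([z0, t0], 1)) - torch.sin(torch.pi * z0)) ** 2).mean() \
         + (net(torch.cat([zb, tb], 1)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("final loss:", float(loss))
```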
Summary

Most processes in geosciences vary with space and time. These processes are governed by conservation laws and constitutive relationships for the various materials occurring in Earth. Combining these, we get laws of continuum physics written as partial differential equations, which are in general of second order. The general strategy for solving these equations is to convert them into a set of ordinary differential equations by using the method of separation of variables, transform methods, or numerical methods. We have presented these developments as applied to the study of processes related to the Earth.
Cross-References

▶ Computational Geoscience
▶ Earth Surface Processes
▶ Eigenvalues and Eigenvectors
▶ Flow in Porous Media
▶ Fast Fourier Transform
▶ Laplace Transform
▶ Machine Learning
▶ Neural Networks
▶ Ordinary Differential Equations
▶ Radial Basis Functions

Acknowledgments AM carried out this work under the project MLP6404-28(AM) with CSIR-NGRI contribution number NGRI/Lib/2021/Pub-02.
Bibliography

Aki K, Richards PG (2002) Quantitative seismology, 2nd edn. University Science Books
Anderson RS, Anderson SP (2010) Geomorphology: the mechanics and chemistry of landscapes. Cambridge University Press
Carslaw HS, Jaeger JC (1959) Conduction of heat in solids, 2nd edn. Oxford University Press
Hadamard J (1902) Sur les problèmes aux dérivées partielles et leur signification physique. Princeton University Bulletin, pp 49–52
Kennett BLN, Bunge H-P (2008) Geophysical continua – deformation in the Earth's interior. Cambridge University Press
Morse PM, Feshbach H (1953) Methods of theoretical physics, vol 1. McGraw-Hill
Officer CB (1974) Introduction to theoretical geophysics. Springer
Raissi M (2018) Deep hidden physics models: deep learning of nonlinear partial differential equations. J Mach Learn Res 19:1–24
Turcotte DL, Schubert G (2002) Geodynamics, 2nd edn. Cambridge University Press
Particle Swarm Optimization in Geosciences Joseph Awange1, Béla Paláncz2 and Lajos Völgyesi2 1 School of Earth and Planetary Sciences, Discipline of Spatial Sciences, Curtin University, Perth, WA, Australia 2 Department of Geodesy and Surveying, Budapest University of Technology and Economics, Budapest, Hungary
Definition

One of the nature-inspired meta-heuristic global optimization methods, particle swarm optimization (PSO), is introduced here and its application in geosciences presented. The basic algorithm is discussed with an illustrative example and compared with other global methods such as simulated annealing, differential evolution, and the Nelder-Mead method.
Global Optimization

In global optimization problems, where an objective function f(x) having many local minima is considered, the concern is to find the global minimizer (also known as the global solution or global optimum) x* and the corresponding value f*. Global optimization problems arise frequently in engineering, decision-making, optimal control, etc. (Awange et al. 2018). There exist two huge but almost completely disjoint communities (i.e., they have different journals, different conferences, different test functions, etc.) solving these problems: (i) a broad community of practitioners using stochastic nature-inspired meta-heuristics and (ii) academicians studying deterministic mathematical programming methods. To solve global optimization problems where evaluation of the objective function is an expensive operation, one needs to construct an algorithm that is able to stop after a fixed number of evaluations M of f(x), from which the lowest obtained value of f(x) is used. For global optimization, unlike local optimization, it is not the dimension of the problem that is most important but rather the number of allowed function evaluations (often called the budget). In other words, when one has the possibility to evaluate f(x)
M times (these evaluations are hereinafter called trials), in a global optimization problem of dimension 5, 10, or 100, the quality of the solution found after M evaluations is crucial and not the dimensionality of f(x). This happens because it is not possible to adequately explore the multidimensional search region D at this limited budget of expensive evaluations of f(x). For instance, if D ⊂ R²⁰ is a hypercube, then it has 2²⁰ vertices. This means that one million trials are not sufficient even to evaluate f(x) at all vertices of D, let alone to explore the whole region D well. Thus, the global optimization problem is frequently NP-hard. Meta-heuristic algorithms widely used to solve real-life global optimization problems have a number of attractive properties that have ensured their success among engineers and practitioners. First, they have limpid nature-inspired interpretations explaining how these algorithms simulate the behavior of populations of individuals. Other reasons that have led to a wide spread of meta-heuristics are the following: they do not require a high level of mathematical preparation to understand them, their implementations are usually simple, and many codes are freely available. Finally, they do not need a lot of memory, as they work at each moment with only a limited population of points in the search domain. On the flip side, meta-heuristics have some drawbacks, which include a usually high number of parameters to tune and the absence of rigorously proven global convergence conditions ensuring that the sequences of trial points generated by these methods always converge to the global solution x*. In fact, the populations used by these methods can degenerate prematurely, returning only a locally optimal solution instead of a global one, or even a nonlocally optimal point if it has been obtained at one of the last evaluations of f(x) and the budget of M evaluations has not allowed the method to proceed with an improvement of the obtained solution.
Nature-Inspired Global Optimization Meta-heuristic algorithms are based on natural phenomenon such as particle swarm optimization simulating fish schools or bird groups; firefly algorithm simulating the flashing behavior of the fireflies; artificial bee or ant colony representing a colony of bees or ants in searching the food sources; differential evolution and genetic algorithms simulating the evolution on a phenotype and genotype level, respectively; harmony search method that is inspired by the underlying principles of the musicians’ improvisation of the harmony; black hole algorithm that is motivated by the black hole phenomenon, namely, if a star gets too close to the black hole, it will be swallowed by the black hole and is gone forever; immunized evolutionary programming where adaptive mutation and selection operations are based on adjustment of artificial immune system; and cuckoo search based on
the brood parasitism of some cuckoo species, along with Levy flights random walks, just to mention some of them.
Particle Swarm Optimization

The idea of particle swarm optimization (PSO) was inspired by the social behavior of big groups of animals, like the flocking and schooling patterns of birds and fish, and was suggested in 1995 by Russell Eberhart and James Kennedy (see Yang 2010). Let us imagine a flock of birds circling over an area where they can smell a hidden source of food. The one who is closest to the food chirps the loudest, and the other birds swing around in its direction. If any of the other circling birds comes closer to the target than the first, it chirps louder and the others veer over toward it. This tightening pattern continues until one of the birds happens to land upon the food. So the motion of the individuals of the group is influenced by the motion of their neighborhood (Fig. 1). In the search space, we define discrete points as individuals (particles) which are characterized by their position vector, determining their data (fitness) values to be maximized (similar to a fitness function), their velocity vectors, indicating how much the data value can change, and a personal best value, indicating the closest the particle's data has ever come to the optimal (target) value. Let us consider a population of N individuals, where the location vectors are p_i, the objective function to be maximized is F, and the fitness of an individual is F(p_i). These values measure the attraction of the individual, namely, the higher its fitness value, the more followers in its neighborhood will follow its motion. The local velocity v_i of the individual determines its new location after a time step Δt. The velocity value is calculated according to how far an individual's data is from the target; the further it is, the larger the velocity value. In the bird example, the individuals furthest from the food would make an effort to keep up with the others by flying faster toward the best bird (the individual having the highest fitness value in the population).
Particle Swarm Optimization in Geosciences, Fig. 1 Birds are swarming
Each individual’s personal best value only indicates the closest the particle’s data has ever come to the target since the algorithm started. The best bird value only changes when any particle’s personal best value comes closer to the target than best bird value. Through each iteration of the algorithm, best bird value gradually moves closer and closer to the target until one of the particles reaches the target.
Definitions and Basic Algorithm

We can give the following somewhat simplified basic algorithm of the PSO (a minimal implementation sketch follows the list):

1. Generate the position and velocity vectors of each particle randomly: P = {p_0, ..., p_i, ..., p_N} and V = {v_0, ..., v_i, ..., v_N}.
2. Compute the fitness of the particles: F(P) = {F(p_0), ..., F(p_i), ..., F(p_N)}.
3. Store and continuously update the position of each particle where its fitness has the highest value: P^best = {p_0^best, ..., p_i^best, ..., p_N^best}.
4. Store and continuously update the position of the best particle where its fitness has the highest value (global best): p_g^best = arg max_{p ∈ P} F(p).
5. Modify the velocity vector of each particle considering its personal best and the global best: v_i^new = v_i + φ1 (p_i^best − p_i) + φ2 (p_g^best − p_i) for 1 ≤ i ≤ N. Remark: the coefficient φ1 ensures escaping from a local optimum, since the motion of a particle does not follow the boss blindly!
6. Update the positions (Δt = 1): p_i^new = p_i + v_i^new for 1 ≤ i ≤ N.
7. Stop the algorithm if there is no improvement in the objective function during a certain number of iterations or the number of iterations exceeds the limit.
8. Otherwise, go back to step 2 and compute the new fitness values at the new positions.
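The following Python sketch (not from the original entry) implements the simplified update rules of steps 1-8 for a generic objective function; the default swarm size, iteration count, and exploitation rates mirror the parameter values listed in the illustrative example below, but are otherwise arbitrary:

```python
# Minimal particle swarm optimizer following the simplified steps 1-8 above.
# The velocity update uses v + phi1*(p_best - p) + phi2*(g_best - p), as in step 5.
import numpy as np

def pso(F, bounds, N=100, M=300, phi1=0.2, phi2=2.0, vmax=0.1, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    p = rng.uniform(lo, hi, size=(N, lo.size))              # step 1: positions
    v = rng.uniform(-vmax, vmax, size=(N, lo.size))         # step 1: velocities
    fit = np.array([F(x) for x in p])                       # step 2
    pbest, fbest = p.copy(), fit.copy()                     # step 3
    g = pbest[fbest.argmax()]                                # step 4 (maximization)
    for _ in range(M):
        v = v + phi1 * (pbest - p) + phi2 * (g - p)          # step 5
        p = np.clip(p + v, lo, hi)                           # step 6
        fit = np.array([F(x) for x in p])                    # step 8: re-evaluate
        better = fit > fbest
        pbest[better], fbest[better] = p[better], fit[better]
        g = pbest[fbest.argmax()]
    return g, fbest.max()

# Example with an assumed multimodal test function (not the one from this entry):
F = lambda x: np.sin(x[0]) * np.cos(x[1]) + 0.1 * x[0]
print(pso(F, ([-20, -20], [20, 20])))
```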
Illustrative Example

Let us demonstrate the algorithm on a 2D problem. Suppose that the local geoid can be described by the following function (see Fig. 2):

f(x, y) = 50 sin(√((11 + x)² + (3 + y)²)) / ((11 + x)² + (3 + y)²)
        + 50 sin(√((11 + x)² + (9 + y)²)) / ((11 + x)² + (9 + y)²)
        + 50 sin(√((6 + x)² + (9 + y)²)) / ((6 + x)² + (9 + y)²).

The parameters to be initialized are

N = 100 (number of individuals),
l = {{−20, 20}, {−20, 20}} (location ranges),
m = {{−0.1, 0.1}, {−0.1, 0.1}} (velocity ranges),
M = 300 (number of iterations),
φ1 = 0.2 (local exploitation rate),
φ2 = 2.0 (global exploitation rate).

Particle Swarm Optimization in Geosciences, Fig. 2 The 2D objective function
Figure 3 shows the distribution of the individuals after 300 iterations, while Table 1 shows the results of different global methods.
Particle Swarm Optimization in Geosciences, Fig. 3 The population distribution after 300 iterations

Particle Swarm Optimization in Geosciences, Table 1 The results of different global methods

Method                  xopt       yopt       fopt
Particle swarm          −6.01717   −9.06022   10.8406
Simulated annealing     −6.01717   −9.06022   10.8406
Differential evolution  −11.0469   −9.07418   5.3015
Nelder-Mead             −11.0469   −9.07418   5.3015

Variants of PSO

Although particle swarm optimization (PSO) has demonstrated competitive performance in solving global optimization problems, it exhibits some limitations when dealing with optimization problems of high dimensionality and complex landscape. Numerous variants of even the basic PSO algorithm are possible in an attempt to improve optimization performance. There are certain research trends: one is to build hybrid optimization methods using PSO combined with other optimizers such as the genetic algorithm. Another trend is to try to alleviate premature convergence (i.e., optimization stagnation), e.g., by reversing or perturbing the movement of the PSO particles. A further approach to dealing with premature convergence is the use of multiple swarms (multi-swarm optimization). Another school of thought is that PSO should be simplified as much as possible without impairing its performance, so that its parameters are easier to fine-tune and perform more consistently across different optimization problems.
Initialization of velocities may require extra inputs; however, a PSO variant has been proposed that does not use velocity at all. As the PSO equations given above work on real numbers, a commonly used method to solve discrete problems is to map the discrete search space to a continuous domain, to apply a classical PSO, and then to remap the result, for example, by simply using rounded values. In general, an important variant and strategy of global optimization is to employ a global method to find the close neighborhood of the global optimum and then apply a local method to improve the result (Paláncz 2021).
PSO Applications in Geosciences In this section, some case studies are introduced to illustrate the applicability of PSO algorithm in geosciences. Particle Swarm Optimization for GNSS Network Design The global navigation satellite systems (GNSS) are increasingly becoming the official tool for establishing geodetic networks. In order to meet the established aims of a geodetic network, it has to be optimized, depending on some design criteria (Grafarend and Sansò 1985). Optimization of a GNSS network can be carried out by selecting baseline vectors from all of the probable baseline vectors that can be measured in a GNSS network. Classically, a GNSS network can be optimized using the trial and error method or analytical methods such as linear or nonlinear programming or in some cases by generalized or iterative generalized inverses. Optimization problems may also be solved by intelligent optimization techniques such as genetic algorithms (GAs), simulated annealing (SA), and particle swarm optimization (PSO) algorithms. The efficiency and the applicability of PSO were demonstrated using a GNSS network, which has been solved previously using a classical method. The result shows that the PSO is effective, improving efficiency by 19.2% over the classical method (Doma 2013). GNSS Positioning Using PSO-RBF Estimation Model Positioning solutions need to be more accurate, precise, and obtainable at minimal effort. The most frequently used method nowadays employs a GNSS receiver, sometimes supported by other sensors. Generally, GNSS suffer from signal perturbations (e.g., from the atmosphere, nearby structures) that produce biases on the measured pseudo-ranges. With a view to optimize the use of the satellite signals received, a positioning algorithm with pseudo-range error modeling with the contribution of an appropriate filtering process includes, e.g., extended Kalman filter (EKF) and Rao-Blackwellized filtering (RBWF), which are among the most widely used algorithms to predict errors and to filter
the high frequency noise. A new method of estimating the pseudo-range errors based on the PSO-RBF model is suggested by Jgouta and Nsiri (2017), which achieves an optimal training criterion. The PSO is used to optimize the parameters of neural networks with radial basis function (RBF) in their work. This model offers appropriate method to predict the GNSS corrections for accurate positioning, since it reduces the positioning errors at high velocities by more than 50% compared to the RBWF or EKF methods (Jgouta and Nsiri 2017). Using PSO to Establish a Local Geometric Geoid Model There exist a number of methods for approximating local geoid surfaces (i.e., equipotential surfaces approximating mean sea level) and studies carried out to determine local geoids. In Kao et al. (2017), the PSO method as a tool for modeling local geoid is presented and analyzed. The ellipsoidal heights (h), derived from GNSS observations, and known orthometric heights (H ) from first-order leveling from benchmarks were first used to create local geometric geoid model. The PSO method was then used to convert ellipsoidal heights (h) into orthometric heights (H ). The resulting values were compared to those obtained from spirit leveling and GNSS methods. The adopted PSO method improves the fitting of local geometric geoid by quadratic surface fitting method, which agrees with the known orthometric heights within 1.02 cm (Kao et al. 2017). Application of PSO for Inversion of Residual Gravity Anomalies over Geological Bodies with Idealized Geometries A global particle swarm optimization (GPSO) technique was developed and applied by Singh and Biswas (2016) to the inversion of residual gravity anomalies caused by buried bodies with simple geometry (spheres, horizontal, and vertical cylinders). Inversion parameters, such as density contrast of geometries, radius of body, depth of body, location of anomaly, and shape factor, were optimized. The GPSO algorithm was tested on noise-free synthetic data, synthetic data with 10% Gaussian noise, and five field examples from different parts of the world. This study shows that the GPSO method is able to determine all the model parameters accurately even when shape factor is allowed to change in the optimization problem. However, the shape was fixed a priori in order to obtain the most consistent appraisal of various model parameters. For synthetic data without noise or with 10% Gaussian noise, estimates of different parameters were very close to the actual model parameters. For the field examples, the inversion results showed excellent agreement with results from previous studies that used other inverse techniques. The computation time for the GPSO procedure was very short (less than 1 s) for a swarm size of less than 50. The advantage of the GPSO method is that it is extremely fast and
does not require assumptions about the shape of the source of the residual gravity anomaly (Singh and Biswas 2016). Application of a PSO Algorithm for Determining Optimum Well Location and Type Determining the optimum type and location of new wells is an essential component in the efficient development of oil and gas fields. The optimization problem is, however, demanding due to the potentially high dimension of the search space and the computational requirements associated with function evaluations, which, in this case, entail full reservoir simulations. In Onwunalu and Durlofsky (2010), the particle swarm optimization (PSO) algorithm is applied to determine optimal well type and location. Four example cases are considered that involve vertical, deviated, and dual-lateral wells and optimization over single and multiple reservoir realizations. For each case, both the PSO algorithm and the widely used genetic algorithm (GA) are applied to maximize net present value. Multiple runs of both algorithms are performed, and the results are averaged in order to achieve meaningful comparisons. It is shown that, on average, PSO outperforms GA in all cases considered, though the relative advantages of PSO vary from case to case (Onwunalu and Durlofsky 2010). Introducing PSO to Invert Refraction Seismic Data Seismic refraction method is a powerful geophysical technique in near surface study. In order to achieve reliable results, processing of refraction seismic data in particular inversion stage should be done accurately. In Poormirzaee et al. (2014), refraction travel times’ inversion is considered using PSO algorithm. This algorithm, being a meta-heuristic optimization method, is used in many fields for geophysical data inversion, showing that the algorithm is powerful, fast, and easy. For efficiency evaluation, different synthetic models are inverted. Finally, PSO inversion code is investigated in a case study at a part of Tabriz City in northwest of Iran for hazard assessment, where the field dataset are inverted using the PSO code. The obtained model was compared to the geological information of the study area. The results emphasize the reliability of the PSO code to invert refraction seismic data with an acceptable misfit and convergence speed (Poormirzaee et al. 2014). PSO Algorithm to Solve Geophysical Inverse Problems: Application to a 1D-DC Resistivity Case The performance of the algorithms was first checked by Fernández-Martínez et al. (2010) using synthetic functions showing a degree of ill-posedness similar to that found in many geophysical inverse problems having their global minimum located on a very narrow flat valley or surrounded by multiple local minima. Finally, they present the application of these PSO algorithms to the analysis and solution of an inverse problem associated with seawater intrusion in a
coastal aquifer in southern Spain. PSO family members were successfully compared to other well-known global optimization algorithms (binary genetic algorithms and simulated annealing) in terms of their respective convergence curves and the seawater intrusion depth posterior histograms (Fernández-Martínez et al. 2010). Anomaly Shape Inversion via Model Reduction and PSO Most of the inverse problems in geophysical exploration consist of detecting, locating, and outlining the shape of geophysical anomalous bodies imbedded into a quasihomogeneous background by analyzing their effects on the geophysical signature. The inversion algorithm that is currently in use creates very fine mesh in the model space to approximate the shapes and the values of the anomalous bodies and the geophysical structure of the geological background. This approach results in discrete inverse problems with a huge uncertainty space, and the common way of stabilizing the inversion consists of introducing a reference model (through prior information) to define the set of correctness of geophysical models. A different way of dealing with the high underdetermined character of these kinds of problems consists of solving the inverse problem using a low dimensional parameterization that provides an approximate solution of the anomaly via particle swarm optimization (PSO), e.g., Fernández-Muñiz et al. (2020). This method has been designed for anomaly detection in geological setups that correspond with this kind of problem. These authors show its application to synthetic and real cases in gravimetric inversion performing at the same time uncertainty analysis of the solution. The two different parameterizations for the geophysical anomalies (polygons and ellipses) show that similar results are obtained. This method outperforms the common least-squares method with regularization (Fernández-Muñiz et al. 2020). One-Dimensional Forward Modeling in Direct Current (DC) Resistivity One-dimensional forward modeling in direct current (DC) resistivity is actually computationally inexpensive, allowing the use of global optimization methods (GOMs) to solve 1.5 D inverse problems with flexibility in constraint incorporation. GOMs can support computational environments for quantitative interpretation in which the comparison of solutions incorporating different constraints is a way to infer characteristics of the actual subsurface resistivity distribution. To this end, the chosen GOM must be robust to changes in the cost function and also be computationally efficient. The performance of the classic versions of the simulated annealing (SA), genetic algorithm (GA), and particle swarm optimization (PSO) methods for solving the 1.5 D DC resistivity inverse problem was compared using synthetic and field
data by Barboza et al. (2018). The main results are as follows: (1) All methods reproduce synthetic models quite well; (2) PSO and GA are comparatively more robust to changes in the cost function than SA; (3) PSO first and GA second present the best computational performances, requiring less forwarding modeling than SA; and (4) GA gives higher performance than PSO and SA with respect to the final attained value of the cost function and its standard deviation. To put them into effective operation, the methods can be classified from easy to difficult in the order PSO, GA, and SA as a consequence of robustness to changes in the cost function and of the underlying simplicity of the associated equations. To exemplify a quantitative interpretation using GOMs, these solutions were compared with least-absolute and least-squares norms of the discrepancies derived from the lateral continuity constraints of the log-resistivity and layer depth as a manner of detecting faults. GOMs additionally provided the important benefit of furnishing not only the best solution but also a set of suboptimal quasi-solutions from which uncertainty analyses can be performed (Barboza et al. 2018).
Summary

This study investigated the particle swarm method, its variants, and their applications in geosciences, i.e., optimization of GNSS networks, approximation of a local geoid surface, inversion of residual gravity anomalies, determination of optimal well locations, inversion of refraction seismic data, and the solution of geophysical inverse problems, as well as modeling of direct current resistivity.
Bibliography

Awange JL, Paláncz B, Lewis RH, Völgyesi L (2018) Mathematical geosciences: hybrid symbolic-numeric methods. Springer International Publishing, Cham, p. 596, ISBN: 978-3-319-67370-7
Barboza FM, Medeiros WE, Santana JM (2018) Customizing constraint incorporation in direct current resistivity inverse problems: a comparison among three global optimization methods. Geophysics 83:E409–E422
Doma MI (2013) Particle swarm optimization in comparison with classical optimization for GPS network design. J Geodetic Sci 3:250–257. https://doi.org/10.2478/jogs-2013-0030
Fernández-Martínez JL, Luis J, Gonzalo E, Fernandez P, Kuzma H, Omar C (2010) PSO: a powerful algorithm to solve geophysical inverse problems: application to a 1D-DC resistivity case. J Appl Geophys 71:13–25. https://doi.org/10.1016/j.jappgeo.2010.02.001
Fernández-Muñiz Z, Pallero JL, Fernández-Martínez JL (2020) Anomaly shape inversion via model reduction and PSO. Comput Geosci 140. https://doi.org/10.1016/j.cageo.2020.104492
Grafarend EW, Sansò F (1985) Optimization and design of geodetic networks. Springer, Berlin/Heidelberg. https://doi.org/10.1007/9783-642-70659-2
Jgouta M, Nsiri B (2017) GNSS positioning performance analysis using PSO-RBF estimation model. Transp Telecommun 18(2):146–154. https://doi.org/10.1515/ttj-2017-0014
Kao S, Ning F, Chen CN, Chen CL (2017) Using particle swarm optimization to establish a local geometric geoid model. Boletim de Ciências Geodésicas 23. https://doi.org/10.1590/s1982-21702017000200021
Onwunalu JE, Durlofsky LJ (2010) Application of a particle swarm optimization algorithm for determining optimum well location and type. Comput Geosci 14:183–198. https://doi.org/10.1007/s10596009-9142-1
Paláncz B (2021) A variant of the black hole optimization and its application to nonlinear regression. https://doi.org/10.13140/RG.2.2.28735.43680. https://www.researchgate.net/project/MathematicalGeosciences-A-hybrid-algebraic-numerical-solution
Poormirzaee R, Moghadam H, Zarean A (2014) Introducing Particle Swarm Optimization (PSO) to invert refraction seismic data. In: Conference Proceedings, near surface geoscience 2014 – 20th European meeting of environmental and engineering geophysics, Sep 2014, vol 2014, pp 1–5. https://doi.org/10.3997/2214-4609.20141978
Singh A, Biswas A (2016) Application of global particle swarm optimization for inversion of residual gravity anomalies over geological bodies with idealized geometries. Nat Resour Res 25:297–314. https://doi.org/10.1007/s11053-015-9285-9
Yang X (2010) Nature-inspired metaheuristic algorithms, 2nd edn. Luniver Press, Frome
Pattern Uttam Kumar Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India
1053
same behavior differently and represent it by different patterns. It is also apparent that one observer may use different patterns to describe the same behavior.
Introduction Geographic knowledge discovery finds novel and useful geospatial knowledge hidden in Big Data. This is a human-centered process involving many activities such as data cleaning, data reduction, and data mining through scalable techniques to extract patterns to form knowledge (Shekhar and Xiong 2008). In fact, the basis of data mining is to discover patterns occurring in the database, such as associations, classification models, sequential patterns, etc. (Han and Kamber 2006). Areas of machine learning such as recognition, classification, clustering, prediction, association mining, sequential pattern mining, etc. constitute pattern discovery. Recognition and classification are complementary functions because classification is concerned with establishing criteria that can be used to identify or distinguish patterns appearing in the data. These criteria can depend on representatives of each class to numeric parameters for measurement, to syntactical descriptions of key features. Recognition is the process by which tools are subsequently used to find a particular pattern within the data (Russ 2002). Clustering of data has wider applications in many domains, for example, spatial data clustering of remotely sensed images from space-borne satellites to identify various land use classes (see Fig. 1).
P
Definition Spatial data analysis and spatial data mining encompass techniques to find pattern, detect anomalies and test hypotheses from spatial data (Shekhar and Xiong 2008). Pattern denotes the way in which we see and interpret a behavior and present them, where the term behavior refers to a particular, objectively existing configuration of characteristics (Andrienko and Andrienko 2006). A pattern is a construct reflecting essential features of a behavior in a parsimonious manner, i.e., in a substantially shorter and simpler way than specifying every reference and the equivalent characteristics. In the context of data mining, a pattern is an expression E in a language L describing facts in a subset FE of a set of facts F (dataset) so that E is simpler than the enumeration of all facts in FE (Andrienko and Andrienko 2006). So, a pattern may reflect a combination of qualities, acts, tendencies, etc., forming a consistent or characteristic arrangement. A pattern results from observation or analysis, an image of a behavior showing how an observer might understand it, which is often subjective. Therefore, different observers understand the
Pattern, Fig. 1 Clustering of pixels for different land use classes in multispectral data having 5 spectral bands
1054
For efficient data mining, the evaluation of pattern interestingness is made as deep as possible into the mining process so as to confine the search to only the interesting patterns. Often, domain knowledge are used to guide the discovery process to facilitate concise pattern discovery at various levels of abstraction. Integrity constraints and business rules help in evaluating the interestingness of the discovered patterns (Han and Kamber 2006). Data pattern can be mined from many different kinds of database such as data warehouse, relational database, transactional database, object-relational database, spatial database, time-series database, sequence database, text database, multimedia database, data streams, World Wide Web, etc. Patterns such as frequent itemsets, subsequences and substructures occur frequently in the data. For example, in a transactional dataset of a supermarket, frequent itemset refers to a set of items that frequently appear together (example, milk and bread), whereas a frequent occurring subsequence that customers may tend to purchase from an electronic retail shop are computer and printer followed by ink cartridges, which represents a sequential pattern. A substructure can refer to different structural forms such as graph, trees, lattice, etc. Mining these frequent patterns leads to the discovery of interesting associations and correlations within data (Han and Kamber 2006). In the context of spatial data, a tessellation (or tiling) is often used as a way of partition of space into mutually exclusive cells that together make up the complete study space. These are used for vector-based representation to characterize geographic fields and objects through a regular pattern. Examples of two regular tessellations are shown in Fig. 2 that represent square and hexagonal cells. It is fascinating to see that in all regular tessellations, cells are of the same shape and size, and that the field attribute value assigned to a cell is associated with the entire area occupied by the cell. Square cell tessellation pattern is common because of their ease in geo-referencing. It is interesting to notice how tessellation represents different patterns for building topology (By 2004). Pattern can be described by terms like radial, checkboard, etc. In the context of terrain data, pattern refers to the spatial arrangement of objects and implies the characteristic Pattern, Fig. 2 Tessellation: pattern in a square and hexagonal cell representation
Pattern
repetition of certain forms or relationships. For example, land use types have specific characteristic patterns such as irrigation types, different types of housing in urban areas, rivers with tributaries, etc. as observed from satellite images. Therefore, pattern is normally used as an interpretation element in image analysis (Kerle et al. 2004).
Types of Patterns Pattern can be expressed mathematically, through attributes or numbers such as pattern observed in the recorded summer temperature across a city. A pattern may be broadly classified into the following types: • Association pattern – it means description of a set of references as a unified whole on the basis of similarity of their features (Andrienko and Andrienko 2006). This unification is based on identical or similar characteristics where the attribute values correspond to the references. For example, cities can be grouped together as a cluster based on their population, where population is a close characteristic. This idea can be extended to account for clustering based on multiple attributes such as a cluster of cities with similar male and similar female proportions or low male and high female proportions or high male and low female proportions, etc. Association patterns are supplemented with summary characteristics of a combination of references from the individual characteristics. Numeric characteristics are summarized by aggregation of individual attributes into a single feature such as mean and variance, etc. • Arrangement pattern – this defines the way characteristics are arranged in certain order for which time can be a reference to signify increasing or decreasing trends (Andrienko and Andrienko 2006). Pattern may also mean stability. The pattern ordering may be random such as cities with increasing population or regular such as a chessboard with black and white boxes. Spatial trends occur in spatial locations that do not exhibit natural
Pattern
•
•
•
•
ordering, however, some parameters like directions and distances in space can be used for ordering. Differentiation pattern – here, references are united because they are different from the characteristics of other remaining references/neighborhood that may form another pattern (Andrienko and Andrienko 2006). For example, grouping of regions that have relatively higher traffic than the neighborhood in the northern part of a city. Likewise, similar regions with equal traffic density may also occur in other parts (south, east, west, etc.) of the city. In such examples, references are done on some characteristics along with differentiation operation. Differentiation is possible even in the absence of common references, where the reference may have different characteristics from the remaining reference set. Outlier (local or global) fall in this category, where local outliers differ significantly from their neighborhood and global outliers have references that differ from the entire reference set. A subset of references can also be differentiated from the remaining references because of higher variability of characteristics in one subset than in other references. Compound pattern – this type of pattern emerges when the summarization of data involves division of a reference set into parts and association of elements on the basis of similar features, resulting in a compound pattern that comprises several subpatterns (Andrienko and Andrienko 2006). In other words, when the distribution does not clearly show a spatial pattern or trend, it can be described as a combination of two or more patterns called compound pattern. For example, “spatial autocorrelation states that locations that are close are more likely to have similar values than locations that are far apart” involves distance as a reference set because the characteristics change gradually, i.e., the neighboring objects differ less than the distant objects. In such cases, it makes sense to describe a spatial trend as a compound pattern (combination of two patterns). Distribution summary pattern – this defines the notion of the distribution (variance) of characteristics over a reference set (Andrienko and Andrienko 2006). Statistical mean, median, quartiles, quantiles, percentiles and variance in different types of distributions are often used to summarize the data. For spatially referenced data, center, entropy, etc. are used to summarize the homogeneity or heterogeneity. Patterns in spatio-temporal data – spatio-temporal data refers to data that are varying in both space and time domain, i.e., data point moving over space and time, for example, traffic flow at different times on the road.
A spatio-temporal pattern characterizes the spatial relationships among a collection of spatial entities and the behavior of such relationships over time, and is therefore often
multifaceted. Spatial relationships are complex and diverse: for any pair of spatial entities, a variety of spatial relations exist, such as directional, distance-based, and topological relations, which are often application specific. Evolving spatial clusters generate patterns that contain a large number of spatial entities that are similar to one another within a neighborhood. Spatio-temporal patterns therefore involve a large number of spatial and temporal relations, while spatial clusters may be restricted to relationships based on distances between entities. Spatio-temporal pattern studies are widely used in traffic analysis, security systems, epidemic studies and disease control, etc. (Shekhar and Xiong 2008). Handling spatio-temporal datasets still remains a challenge. In this context, movement patterns in spatio-temporal data refer to events expressed by a set of entities, such as the movement of humans, traffic jams, the migration of birds, and bird flock movement. In these cases, a pattern starts and ends at certain times and may be constrained to a subset of space. In geoscience, pattern is used with various meanings, conceptualized as the salient movement of events in the geospatial context. Application areas of movement patterns in spatio-temporal data include animal behavior and human movement (for example, tracking humans through mobile phones, cameras, etc.) for traffic management, surveillance and security, military applications, sports action analysis, etc. The challenge rests in relating movement patterns to the underlying space to answer where, when, and why the entities move the way they do. Therefore, patterns need to be conceptualized together with their surroundings (Shekhar and Xiong 2008). A few interesting points naturally emerge from the pattern analysis of Big Data. Not all patterns are interesting, and only a small fraction of them are of interest to a user. So, what makes a pattern interesting? A pattern is interesting if it is easily understood, valid on new datasets with some degree of certainty, useful, and novel. A pattern is also interesting if it validates a hypothesis that the user sought to confirm, because an interesting pattern characterizes information that can turn into knowledge. There are several objective measures of pattern interestingness that are based on the structure of the discovered patterns. For example, objective measures for association rules are support and confidence: support represents the percentage of transactions from a database that the given rule satisfies, while confidence assesses the degree of certainty of the detected association (Han and Kamber 2006).
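As a small illustration of these two measures (the toy transaction database and item names below are invented for the example and do not come from the entry), the support and confidence of a rule such as {A} ⇒ {B} can be computed directly:

```python
# Hypothetical example: support and confidence of the rule {A} => {B}
# over a toy transaction database.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(antecedent, consequent, db):
    """Certainty of antecedent => consequent: support(both) / support(antecedent)."""
    return support(antecedent | consequent, db) / support(antecedent, db)

rule_support = support({"A", "B"}, transactions)          # 3/5 = 0.6
rule_confidence = confidence({"A"}, {"B"}, transactions)  # 3/4 = 0.75
print(rule_support, rule_confidence)
```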
Pattern in Geoscience There have been several studies that use artificial intelligence-based techniques to reduce the multitude of phenomena in the natural environment into discrete, manageable classes. This is because humans are very good at detecting patterns,
but relatively poor at distinguishing subtle brightness differences, such as differences in pixel values (reflectance) in multispectral imagery. In general, pattern recognition is the act of taking in raw data and making an action based on the category of the pattern (Duda et al. 2000). In spatial data analysis, pattern recognition or classification is a widely used method by which labels are attached to data (pixels) in view of their spectral characteristics (of the remote-sensing data). This labeling is implemented by a classification algorithm, such as a maximum likelihood classifier, random forest, or support vector machine, trained beforehand to recognize pixels with similar spectral patterns and to derive land use classes. This method of land use detection or recognition has proved to be efficient, objective, and reproducible. Fig. 3 shows the spatio-temporal pattern of land use, highlighting the phenomenon of urban sprawl of Bangalore City, India, derived from time-series Landsat data using a machine-learning approach. The patterns of land use classes such as built-up (including buildings, concrete surfaces, paved roads, parking lots, etc.), vegetation (gardens, lawns, parks, forest, agriculture/horticulture farms, etc.), water bodies (lakes, ponds, etc.), and an "others" class (including fallow land, barren land, rocks, and playgrounds with soil) are clearly distinguishable. The time-series plot shows the trend/pattern of the growth of urban areas and the reduction in the area of vegetation and water bodies, as depicted in Fig. 4. Note that the detailed land use statistics and accuracy assessment of the
time-series-classified images have not been discussed here. Such geospatial analysis is very useful for understanding the past and present state of an area and helps to model the trend in order to predict future urban growth in various directions and at various distances from the city center. Pattern analysis has been used in many studies, such as discovering climate change patterns using spatio-temporal datasets, analyzing temporal trends in precipitation and temperature, assessing the spatial distribution of global vegetation cover, multivariate spatial clustering and geovisualization, discovering pseudopatterns, etc. (Miller and Han 2009). There are other applications, including spatio-temporal video data mining to analyze abnormal activity, target analysis, content-based video retrieval under spatio-temporal distribution, urban sprawl pattern analysis, urban temperature analysis (Li et al. 2015), spatial point pattern analysis (Cressie 2015), etc.
Summary Pattern refers to the spatial arrangement of objects and implies the characteristic repetition of certain forms or relationships. Here, different types of patterns were broadly discussed. As such, there are no rules for selecting a pattern for a given problem. Discovering an interesting and useful pattern is subjective, so patterns symbolizing important characteristics need to be explored. Common properties for a set of
Pattern, Fig. 3 Spatio-temporal pattern of the urban sprawl of Bangalore City, India
Pattern, Fig. 4 Trend of the spatio-temporal change in land use classes in Bangalore City, India
references are easy to find; however, distinguishing properties are rather difficult to discover. Patterns need to be distinctive and relevant to the goals of the investigation related to an application. Aggregation at an appropriate level helps to avoid uninteresting results from summary characteristics. The degree of variation of the characteristics pertaining to the references, along with the outliers, has to be considered. Finally, the pattern depends on the purpose of the analysis, whether a precise specification of a subset is required or an allusion is sufficient.
Cross-References ▶ Big Data ▶ Cluster Analysis and Classification ▶ Machine Learning ▶ Pattern Classification ▶ Support Vector Machines
Bibliography Andrienko N, Andrienko G (2006) Exploratory analysis of spatial and temporal data a systematic approach. Springer, Berlin. https://link. springer.com/book/10.1007/3-540-31190-4 Cressie NAC (2015) Statistics for spatial data. Wiley, Hoboken de By RA (2004) Principles of geographic information systems. ITC Educational Textbook Series: The Netherlands Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. A Wiley-Interscience Publication, New York. ISBN 9814-12-602-0 Han J, Kamber M (2006) Data mining concepts and techniques. Elsevier, San Francisco Kerle N, Janseen LLF, Huurneman GC (2004) Principles of remote sensing. ITC Educational Textbook Series: The Netherlands Li D, Wang S, Li D (2015) Spatial data mining theory and application. Springer, Berlin, Heidelberg Miller HJ, Han J (2009) Geographic data mining and knowledge discovery, 2nd edn. CRC Press, Boca Raton Russ JC (2002) The image processing handbook, 4th edn. CRC Press, Boca Raton Shekhar S, Xiong H (2008) Encyclopedia of GIS. Springer, New York
Pattern Analysis Sanchari Thakur1, LakshmiKanthan Muralikrishnan2, Bijal Chudasama3 and Alok Porwal4 1 University of Trento, Trento, Italy 2 Cybertech Systems and Software Limited, Thane, Maharashtra, India 3 Geological Survey of Finland, Espoo, Finland 4 Center of Studies in Resources Engineering, IIT Bombay, Mumbai, India
Synonyms Pattern recognition; Spatial association
Definition Pattern refers to the rules governing the arrangement of physical events or occurrences in space and time. Depending on the dynamic or static nature of the events, patterns can be spatial, temporal, or spatio-temporal. Patterns can be identified from geoscientific datasets by visualization and mathematical techniques collectively known as pattern analysis. These techniques are chosen depending on the geospatial representation of the events, e.g., as discrete points, lines, and polygons, or as continuous field values (rasters or images). Pattern analysis of a single type of data (univariate) can indicate whether the events tend to occur close to (clustered) or away from (dispersed) each other, while multivariate pattern analysis can also reveal how the attributes of the events are interrelated.
Pattern Analysis in Geosciences Geological entities (events) are represented as points, lines, polygonal vectors, and rasters defined by their spatial
coordinates and associated with single or multiple attributes. For instance, at regional scale, mineral deposits are considered as points, lineaments as lines, lithological units as polygons, and surface elevation as raster. The underlying processes and rules governing the occurrence of these geological events result in distinguishable patterns of interrelationships between them. These patterns can be identified by well-established methods. Statistical pattern analysis techniques test the hypothesis that the observed events are a realization of a random process. Several instances of the process are generated by Monte Carlo simulations. If the observed pattern occurs within the simulation envelope, it represents complete spatial randomness (CSR). Deviations from randomness are indications of positive association (or clustering, i.e., events tend to occur close to each other) or negative association (or dispersion, i.e., events tend to occur away from each other) (Fig. 1). In this chapter, we review some of the pattern analysis approaches with examples of their application in geosciences.
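The Monte Carlo test against complete spatial randomness described above can be sketched in a few lines. The following is a simplified illustration, not code from the entry: it assumes a unit-square observation window and uses the mean nearest-neighbour distance as the summary statistic.

```python
# Minimal sketch of a Monte Carlo CSR test with a simulation envelope.
import numpy as np
from scipy.spatial import cKDTree

def mean_nn_distance(points):
    d, _ = cKDTree(points).query(points, k=2)   # k=2: skip each point's zero self-distance
    return d[:, 1].mean()

rng = np.random.default_rng(0)
observed = rng.uniform(0, 1, size=(100, 2))     # stand-in for an observed point pattern

stat_obs = mean_nn_distance(observed)
sims = [mean_nn_distance(rng.uniform(0, 1, size=observed.shape)) for _ in range(199)]
lo, hi = np.quantile(sims, [0.025, 0.975])      # Monte Carlo simulation envelope

if lo <= stat_obs <= hi:
    print("consistent with CSR")
elif stat_obs < lo:
    print("shorter nearest-neighbour distances than CSR: clustering")
else:
    print("longer nearest-neighbour distances than CSR: dispersion")
```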
Pattern Analysis in Univariate Data

A univariate point dataset is defined within a spatio-temporal framework as the presence or absence of an event at a given location and/or time. A spatial point pattern (SPP) X consists of two main components – a set of points with well-defined coordinates xi ∈ R2 (2D cartographic coordinates) and an observation window, that is, a delimiting boundary within which the analysis is restricted. An SPP can be extended to include the time of occurrence of each event to generate a spatio-temporal point pattern (STPP) {(xi, ti)}. In this case, the observation window is bounded in space, x ∈ [x1, x2], and in time, t ∈ [t1, t2]. There are several graphical and statistical methods for SPP analysis (Baddeley 2015). Graphical methods visually and quantitatively highlight the regions of heterogeneity in the PP. The simplest approach divides the observation window into equal and contiguous blocks of any size and shape (quadrats). Clustering is inferred in a quadrat if it has a higher number of points in relation to the other quadrats. Alternatively, in kernel smoothed density function (KSDF) techniques, the PP is convolved with a kernel (such as a Gaussian) to estimate the probability of occurrence (intensity) of points at every location in the observation window. A constant intensity throughout the window implies a homogeneous point process. Spatially varying intensity could indicate either inhomogeneity or clustering. Such inferences can be associated with a level of statistical significance by hypothesis testing methods. For example, a chi-square goodness-of-fit test compares the observed quadrat counts to those expected from a homogeneous Poisson point process to infer CSR. However, distance-based techniques are more widely used for statistical SPP analysis. These are based on estimating the frequency distribution of the distance r between the closest pairs of points (Baddeley 2015; Diggle 2013; Lisitsin 2015). The inter-event distance calculator, the G-function, is given by:

G(r) = \frac{\sum_{i=1}^{N} \operatorname{count}\left(\min_{j \neq i} \lVert x_i - x_j \rVert < r\right)}{N} \qquad (1)

Here, the numerator calculates the number of points xi whose nearest neighbor xj is closer than the distance r, and N is the total number of points. The point-to-event distance calculator, the F-function, is given by:

F(r) = \frac{\sum_{i=1}^{M} \operatorname{count}\left(\min_{j=1,\ldots,N} \lVert u_i - x_j \rVert < r\right)}{M} \qquad (2)

It calculates the distances to the nearest event xj from M arbitrary points ui selected in the empty space. The F-function provides a measure of the shapes of the empty spaces between the point patterns and is particularly useful when the
Pattern Analysis, Fig. 1 Different categories of spatial patterns, where the events are represented as points
observation window is a concave polygon. The inter-event and point-to-event functions are combined into another summary statistic, the J-function, given by:

J(r) = \frac{1 - G(r)}{1 - F(r)} \qquad (3)

J(r) = 1 represents CSR. If the empty spaces are large (i.e., F(r) is small) and the points are close to each other (i.e., G(r) is large), then J(r) < 1 and the PP is clustered. Conversely, J(r) > 1 indicates dispersion. G, F, and J are first-order distance functions, in that only the nearest points are considered. There are well-established multi-distance functions that consider multiple points at different distances from each other, such as the K-function (Baddeley 2015). These methods can be used for hypothesis testing by appropriately selecting the point process, such as a homogeneous Poisson point process (for constant point intensity), an inhomogeneous Poisson point process (for variable but predictable intensity), or a doubly stochastic Poisson process, or Cox process (if the intensity is also random) (see ▶ "Point Pattern Statistics"). In general, border correction is necessary to account for censoring errors caused by restricting the PP analysis to within the boundary of an observation window. Similar to SPP analysis, STPP analysis aims at understanding the interaction among events (Diggle 2013) in terms of the degree of clustering, but in both space and time. Thus, SPP analysis methods can be extended to STPP analysis by adding the temporal coordinate. Statistical STPP analysis methods include the use of Knox's statistic, scan statistics, tests related to the nearest-neighbor distribution, and second-order properties of the point process expressed through Ripley's K-function for homogeneous and inhomogeneous spatio-temporal point processes. An application of STPP analysis is the modeling of earthquake events, which are known to trigger aftershocks close to the epicenter and occurrence time of a high-magnitude earthquake. Shi et al. (2009) used STPP analysis to study the sequences of aftershocks of the Wenchuan earthquake of 2008. They found the aftershock events to be clustered within 60 km and 260 hours from the high-magnitude earthquake. Besides points, univariate analysis applied to single-band rasters identifies relationships between neighboring pixels (e.g., texture analysis, hydrological and terrain modeling) using image processing techniques.
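The distance functions above can be estimated empirically in a few lines. The sketch below is an illustration with synthetic points in a unit window, ignoring border correction; none of the names or values come from the entry.

```python
# Minimal empirical G-, F-, and J-function estimates for a 2D point pattern.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(0, 1, size=(200, 2))      # hypothetical SPP in a unit window
dummies = rng.uniform(0, 1, size=(500, 2))     # arbitrary empty-space test points u_i

tree = cKDTree(points)

# Nearest-neighbour distances between events (for G) and from dummy points to events (for F).
d_nn, _ = tree.query(points, k=2)              # k=2: the first neighbour is the point itself
nn_event = d_nn[:, 1]
nn_empty, _ = tree.query(dummies, k=1)

r = np.linspace(0.0, 0.15, 50)
G = np.array([(nn_event < ri).mean() for ri in r])   # inter-event distance distribution
F = np.array([(nn_empty < ri).mean() for ri in r])   # empty-space (point-to-event) distribution
with np.errstate(divide="ignore", invalid="ignore"):
    J = (1.0 - G) / (1.0 - F)                        # J < 1 suggests clustering, J > 1 dispersion

print(J[:10])
```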
Patterns in Multivariate Data Different geological processes are intricately linked to one another. Therefore, it seldom suffices to independently analyze only the coordinates of events that are also associated with other geological information. If each point xi in an
SPP/STPP is associated with a mark m, the set of the point–mark combinations {(x, m)} describes a marked point pattern (MPP) (Ho and Stoyan 2008). In geoscience applications, the marks are typically real-valued attributes. In the previous example of earthquake analysis, if each earthquake event is also associated with its magnitude, it describes a marked STPP. The mark can be modeled as a spatially continuous random field Z. However, this model assumes marks and points to be mutually independent, thus inhibiting the pattern analysis of correlations between them. In contrast, density-dependent marked Cox processes allow the modeling of interrelationships between the points and the marks (Stoyan et al. 1995; Ho and Stoyan 2008). Several MPP distance functions are extensions of SPP analysis, for example, the normalized mean product of the marks of a pair of points separated by a distance r. Common MPP processes include the log Gaussian Cox process, the intensity-marked Cox process, and the geostatistical model for preferential sampling (Ho and Stoyan 2008). A special case of MPPs is when the marks are categorical. Consider n point datasets X1, X2, ..., Xn, each containing objects having the same categorical mark. Irrespective of the mark, any two points in the union of the n datasets are defined as neighbors if the distance between them is less than a given distance r0. A co-location pattern is defined by an undirected connected graph with points representing nodes and edges representing the neighborhood relationships between them. Patterns are mined from the graph based on spatio-temporal topological constraints and by defining proximity rules that associate the objects of Xi with those of Xj, i ≠ j. These association rules are defined using interestingness measures such as participation ratio, prevalence, and confidence (Mamoulis 2008). An application of co-location analysis is in determining the association of water reservoirs with the incidence of diseases. Patterns between vector geometries can be identified using probabilistic approaches such as the Weights of Evidence (WofE) model, based on the Bayesian method (Agterberg et al. 1993). It involves quantification of the spatial association (in terms of weights) between the targeted geometry (hypothesis) and the evidential geometry (evidence). Figure 2 shows an application of WofE to mineral prospectivity modeling, where the hypothesis to be predicted is the occurrence of mineral deposits (points) and the evidential features are faults (lines). Here, the spatial association between deposits and the presence/absence of faults is assessed by the estimation of posterior probabilities. Patterns between points and their enclosing polygons can be identified using the SPP analysis techniques by defining the polygons as observation windows. Mamuse et al. (2010) applied distance-based PP analysis to the exploration of nickel sulfide deposits within prospective komatiites in the Kalgoorlie terrane. Considering the terrane as the observation window, the nickel sulfide deposits and komatiite bodies
Pattern Analysis, Fig. 2 WofE method for identifying distance of spatial association between points and lines by calculating contrast, that is, difference between the probability of deposits within the feature and probability of deposits outside the feature
(represented as points) were found to be clustered. However, considering the komatiites as observation windows, the deposits were found to be randomly distributed or dispersed. This observation was used to estimate the number of deposits in a komatiite body and to identify the spatial controls on nickel sulfide mineralization. Pattern analysis of multivariate rasters includes evaluation of the correlation and covariance matrices, and cluster analysis. In these methods, the geoscientific information from co-located pixels of each raster band is concatenated to form a feature-vector representing the set of attributes at a given location (corresponding to the pixel). The superset of all feature-vectors extracted from the full raster is then analyzed in the feature-space (where the support axes represent the different variables rather than the spatio-temporal coordinates in the original dataspace) to identify patterns of interrelationships between the different variables. A diagnostic step for advanced multivariate analyses involves computation of a correlation matrix, which estimates the pair-wise correlation coefficients between the variables or a covariance matrix, which characterizes the monotonic trend between the variables (positive covariance indicates variations in the same direction, while negative covariance indicates inverse directional variations). These matrices are precursors for advanced multivariate analysis such as principal component analysis (PCA), which is based on eigenvalue decomposition of the covariance matrix. PCA is used for identifying geochemical patterns of anomalous elemental concentrations or alteration patterns indicating mineralization or bedrock lithology. Graphs of the PC loadings indicate that elements (variables) located in proximity in the PC plots are closely related in the geochemical domain (e.g., Filzmoser et al. 2009). Clustering (e.g., k-means, Isodata) algorithms are also used for finding patterns in the feature-space. Its simplicity lies in the calculation of Euclidean distances between the feature-vectors and grouping them into clusters by their proximity. The identified clusters are mapped back to the corresponding geospatial domains. The actual calculations
are iterative for the identification of cluster centroids and grouping of pixels until the centroids have stabilized or the maximum number of iterations has been reached (see ▶ “Pattern Recognition”).
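The multivariate raster workflow just outlined can be sketched as follows. This is a simplified illustration with a synthetic multiband raster; the band count, number of components, and number of clusters are arbitrary assumptions, not values from the entry.

```python
# Stack co-located pixels into feature vectors, inspect the correlation matrix,
# compute PCA loadings, and cluster in feature space, then map clusters back.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
bands, rows, cols = 6, 100, 100
raster = rng.normal(size=(bands, rows, cols))        # synthetic multiband raster

X = raster.reshape(bands, -1).T                      # one feature vector per pixel
corr = np.corrcoef(X, rowvar=False)                  # pair-wise correlation matrix

pca = PCA(n_components=3)
scores = pca.fit_transform(X)                        # principal component scores
loadings = pca.components_                           # eigenvectors of the covariance matrix

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
cluster_map = labels.reshape(rows, cols)             # clusters mapped back to the spatial domain
```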
Summary and Conclusions We have seen different types of spatial and spatio-temporal patterns and the techniques for analyzing them as univariate or multivariate data by graphical methods, pattern mining, statistical testing, and probabilistic modeling. Pattern analysis supports the understanding of the underlying physical processes that cause the occurrence of the events at a particular location and/or time. This capability finds interesting applications in geosciences such as (1) interpolation, that is, estimating the values of spatio-temporal variables where observed data is not available; (2) forecasting, that is, predicting occurrence of events; (3) modeling, that is, identifying the underlying processes responsible for the occurrence of the pattern; and (4) simulations, that is, virtual regeneration of physical reality. Apart from the methods described in this chapter, there are several supervised, semi-supervised, and unsupervised techniques for identifying patterns in images (either single or multiband) and applying them to the automatic classification of geological entities. Recently, pattern analysis has been extended to pattern recognition which incorporates advanced machine learning techniques (such as artificial neural networks, self-organizing maps, convolutional networks), which enable mineralization-related pattern identification from input raster data (see ▶ “Machine Learning”).
Cross-References ▶ Machine Learning ▶ Mineral Prospectivity Analysis
▶ Multiple Point Statistics ▶ Pattern ▶ Pattern Classification ▶ Pattern Recognition ▶ Point Pattern Statistics ▶ Predictive Geologic Mapping and Mineral Exploration ▶ Spatial Analysis ▶ Spatial Autocorrelation ▶ Spatial Statistics
Bibliography Agterberg FP, Bonham-Carter GF, Cheng QM, Wright DF (1993) Weights of evidence modelling and weighted logistic regression for mineral potential mapping. Comput Geol 25:13–32 Baddeley A (2015) Spatial point patterns: methodology and applications with R. CRC Press Diggle PJ (2013) Statistical analysis of spatial and spatio-temporal point patterns. CRC Press Filzmoser P, Hron K, Reimann C (2009) Principal component analysis for compositional data with outliers. Environmetrics 20(6):621–632 Ho LP, Stoyan D (2008) Modelling marked point patterns by intensitymarked Cox processes. Stat Prob Lett 78(10):1194–1199 Lisitsin V (2015) Spatial data analysis of mineral deposit point patterns: applications to exploration targeting. Ore Geol Rev 71:861–881 Mamoulis N (2008) Co-location patterns, algorithms. In: Shekhar S, Xiong H (eds) Encyclopedia of GIS. Springer, Boston Mamuse A, Porwal A, Kreuzer O, Beresford S (2010) Spatial statistical analysis of the distribution of komatiite-hosted nickel sulfide deposits in the Kalgoorlie terrane, western Australia: clustered or not? Econ Geol 105(1):229–242 Shi P, Liu J, Yang Z (2009) Spatio-temporal point pattern analysis on Wenchuan strong earthquake. Earthq Sci 22(3):231–237 Stoyan D, Kendall WS, Mecke J (1995) Stochastic geometry and its applications. Wiley, New York
Pattern Classification Katherine L. Silversides Australian Centre for Field Robotics, Rio Tinto Centre for Mining Automation, The University of Sydney, Camperdown, NSW, Australia
Definition Pattern classification is the automated grouping or labeling of samples based on features or patterns in a dataset (Bishop 2006; Duda et al. 2001).
Introduction Pattern classification involves using the features present in a dataset to automatically label each test point using a machine
learning method. It is a supervised learning method, where a training dataset is used to train the model. The training dataset contains both the input data and the corresponding label for each data point. The trained model is then used to classify the test dataset using the input data to predict the corresponding label (Bishop 2006; Duda et al. 2001). Pattern classification can be performed on any dataset that can be divided into a set number of classes, each with distinct features in the data. In the geosciences this can include lithofacies classification (Bhattacharya et al. 2016), material classification (Banchs and Rondon 2007), geological mapping (Cracknell et al. 2014), wireline log interpretation (Song et al. 2020), anomaly detection (Gonbadi et al. 2015), and lithology classification (Bressan et al. 2020). Many different data types can be used for the classification including geophysical logs (Bhattacharya et al. 2016; Bressan et al. 2020; Cracknell et al. 2014; Song et al. 2020), geochemical samples (Cracknell et al. 2014; Gonbadi et al. 2015), airborne geophysical data (Cracknell et al. 2014), and seismic surveys (Banchs and Rondon 2007). Machine learning methods that can be used for classification include support vector machine (SVM) (Bhattacharya et al. 2016; Bressan et al. 2020; Gonbadi et al. 2015), neural networks (NN) (Banchs and Rondon 2007; Bhattacharya et al. 2016; Song et al. 2020), self-organizing maps (SOM) (Bhattacharya et al. 2016; Cracknell et al. 2014), random forests (RF) (Bressan et al. 2020; Cracknell et al. 2014; Gonbadi et al. 2015), boosting (Gonbadi et al. 2015), and decision trees (Bressan et al. 2020). The best method to apply varies between different applications. NN, SVM, and RF and examples of their application are discussed below. SOMs were designed as an unsupervised learning method; however supervised learning versions have been developed for classification tasks (Bhattacharya et al. 2016; Cracknell et al. 2014; Lau et al. 2006). For more details on SOMs, please refer to the “SOM” entry in this encyclopedia.
Neural Networks Neural networks (NN) are a popular method for data classification that was designed to mimic the way that the human brain processes information. NN typically contain at least three layers. The first is an input layer that takes the input data and passes it to the hidden layer. The hidden layer learns the relationships between the input variables, then passes weighted data patterns to the output function. This output layer then determines the output, which for classification problems is a label. Multiple hidden layers can be used for complicated problems. In training, a NN is initially assigned random weight coefficients for the hidden layer. Training data is then repeatedly fed through the network and the weight coefficients are modified until the difference between the
outputs and the desired classifications are minimized. Iterative back-propagation is often used for this learning step. When designing a NN there are numerous parameters that can be altered, such as the number of hidden layers, the number of nodes in each hidden layer, the learning rate, the damping coefficient, and the number of iterations used for learning (Bishop 2006; Bhattacharya et al. 2016; Duda et al. 2001). Many different variations of NN exist, with differences in how the layers are built and in the method used for training being common variations. More information can be found in the ▶ "Artificial Neural Network" entry. Bhattacharya et al. (2016) apply several different machine learning methods to classify multiple mudstone lithofacies from each other and calcareous siltstone and limestone lithofacies from petrophysical well logs. One of the methods applied is a standard NN using a single hidden layer. For this case, the authors found that the best results were achieved using an SVM. Banchs and Rondon (2007) apply a Hopfield neural network (HNN) to several example cases, classifying areas of high and low porosity, or areas of sand or shale, using seismic attributes as the input. In a HNN the nodes in the hidden layer are binary instead of weighted, so each node is either on or off. Song et al. (2020) classified shapes in wireline logs using recurrent neural networks (RNN). This involved identifying four basic log shapes in spontaneous potential log segments. RNNs use previous outputs as part of the input to recognize trends and larger features. This model used six layers including a layer of Long Short Term Memory (LSTM) units. The RNN model was found to outperform a standard NN as well as other machine learning techniques for this problem.
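As a minimal, hypothetical illustration of this supervised workflow (the synthetic features, hidden-layer size, and iteration count below are arbitrary and not taken from the cited studies), a small feed-forward network can be trained as follows:

```python
# Hypothetical example: a small feed-forward NN classifier on synthetic data.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(600, 5))                    # five synthetic "log" features per sample
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # two arbitrary lithology-like classes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# One hidden layer; weights start random and are updated iteratively by back-propagation.
nn = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
nn.fit(X_train, y_train)
print("test accuracy:", nn.score(X_test, y_test))
```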
Support Vector Machine Support Vector Machine (SVM) is a popular machine learning method that maps the input data to a higher-dimensional space to increase the distance between data points from different classes. A hyperplane can then be used to divide the points into classes. Support vectors are the points that are on the edges of the classes, closest to the hyperplane. The distance between the support vectors on either side of the hyperplane represents the margin. A model with larger margins is considered to be more robust, and less likely to misclassify edge points (Bhattacharya et al. 2016; Bressan et al. 2020; Schölkopf et al. 2000). A kernel function is used to describe the mapping between the input space and the new higher-dimensional space. Possible kernels include linear, polynomial, radial basis function (RBF), and multilayer perceptron. The classifier is then fitted to the training data so as to maximize the margin. While ideally there should be no misclassifications in the training data, perfect separation may be impossible due to overlapping
classes. In this case a small number of errors may be tolerated. While SVM is a binary classifier, it can be used for multi-class problems using either a one-vs-all or a pairwise one-vs-one method (Bhattacharya et al. 2016; Bressan et al. 2020). More information can be found in the ▶ "Support Vector Machines" entry. Bhattacharya et al. (2016) used SVM with an RBF kernel to classify multiple mudstone lithofacies from each other and calcareous siltstone and limestone lithofacies from petrophysical well logs. The authors found that in this case SVM performed better than both NN and SOM. Bressan et al. (2020) used SVM to classify sedimentary well logs into four lithology groups based on seven geophysical logs. The authors found that the SVM method used produced poor results, and suggested that this was due to the log data not containing a highly defined pattern and to insufficient data for proper training. In comparison, RF performed well on this data. Gonbadi et al. (2015) applied SVM with an RBF kernel to identify geochemical anomalies in a porphyry copper region in order to target new exploration drilling. Despite testing two different feature selection methods, the authors found that a boosting method (AdaBoost) was better suited to this problem.
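A hypothetical sketch of this approach is given below; the synthetic features, the arbitrary four-class labeling rule, the scaling step, and the kernel settings are illustrative assumptions rather than values from the cited studies.

```python
# Hypothetical example: an RBF-kernel SVM for a multi-class, lithology-style problem.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(900, 7))                              # seven synthetic log-like features
y = np.digitize(X[:, 0] + X[:, 1], bins=[-1.0, 0.0, 1.0])  # four arbitrary classes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Scaling matters for RBF kernels; SVC handles multi-class via pairwise one-vs-one internally.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```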
Random Forest Random forest (RF) is a pattern classification method that is based on an ensemble of decision trees (Breiman 2001). Each individual decision tree is created by independently splitting the data using sequential queries at each node. This splitting divides the tree into branches, with leaves (each resembling a class) at the end of each sub-branch. Each decision tree has a different set of rules. A classification prediction can then be calculated by averaging the results from all of these trees, combining many weak classifiers into a single strong classifier. As more trees are included in the RF the generalization error converges to a limit without overfitting. The generalization error depends on both the strength of the trees and the correlation between the trees (Breiman 2001; Gonbadi et al. 2015). The generalization error and variance of the RF can be reduced using bagging and random selection of features. Bagging involves generating a random individual subset, with replacement, of the dataset to train each decision tree. The samples not used in the training are then used to estimate the generalization error. For each of these trees a set number of features are randomly chosen to use in the decision nodes. As this method does not overfit, it has been shown to perform well in classifications involving multisource and highdimensional data (Cracknell et al. 2014; Gonbadi et al. 2015). More information can be found in the ▶ “Random Forest” and ▶ “Decision Trees” entries.
Cracknell et al. (2014) used RF to classify geophysical and geochemical data into 21 discrete lithological units. The data were from an area with known volcanic-hosted massive sulfide (VHMS) deposits where field mapping is difficult due to poor outcrops and thick vegetation. Comparing the RF map to the geological interpretation demonstrated that the RF map not only performed well but also included additional details about the spatial distributions of key lithologies. Bressan et al. (2020) applied RF to seven geophysical logs to classify sedimentary well logs into four lithology groups. Compared to SVM, decision tree, and multilayer perceptron (MLP) methods, RF performed well on this data. In two of the three scenarios considered, RF obtained the best results. RF also performed well in all scenarios for cross-validation. Gonbadi et al. (2015) applied an RF with 350 trees to identify geochemical anomalies and select the best targets for new exploration drilling in a porphyry copper region. RF was chosen because it does not assume that the variables have a normal distribution. However, the authors found that a boosting method (AdaBoost) was better suited to this problem.
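The bagging and out-of-bag mechanics described above can be illustrated with a short, hypothetical sketch on synthetic data; only the tree count of 350 echoes the study quoted above, and the remaining settings are arbitrary.

```python
# Hypothetical example: a random forest with out-of-bag (OOB) error estimation
# and feature importances on synthetic multisource data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 10))                      # e.g., geophysical + geochemical features
y = (X[:, 0] - X[:, 3] + 0.5 * rng.normal(size=1000) > 0).astype(int)

rf = RandomForestClassifier(
    n_estimators=350,        # number of trees; 350 mirrors the count quoted above
    max_features="sqrt",     # random feature subset considered at each split
    oob_score=True,          # samples left out of each bootstrap estimate generalization error
    random_state=0,
)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)
print("feature importances:", rf.feature_importances_.round(3))
```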
Summary Pattern classification involves using a machine learning method to automatically group or label samples based on features or patterns in a dataset. It is a supervised learning method, requiring a training dataset that contains both the input data and the corresponding label for each data point. The trained model can then be used to predict the classification of new samples. Many different classification methods can be used. Popular methods include NN, SVM, and RF. Pattern classification has been applied to problems such as lithology or material classification, geological mapping, wireline log interpretation, and anomaly detection.
Cross-References ▶ Artificial Neural Network ▶ Decision Tree ▶ Random Forest ▶ Self-Organizing Maps ▶ Support Vector Machines
1063 studies from the Bakken and Mahantango-Marcellus Shale, USA. J Nat Gas Sci Eng 33:1119–1133 Bishop CM (2006) Pattern recognition and machine learning. Springer, Berlin, 710p Breiman L (2001) Random forests. Mach Learn 45:5–32 Bressan TS, de Souza MK, Girelli TJ, Chemale F Jr (2020) Evaluation of machine learning methods for lithology classification using geophysical data. Comput Geosci 139:104475 Cracknell MJ, Reading AM, McNeill AW (2014) Mapping geology and volcanic-hosted massive sulfide alteration in the Hellyer–Mt Charter region, Tasmania, using Random Forests™ and self-organising maps. Aust J Earth Sci 61(2):287–304 Duda R, Hart P, Stork D (2001) Pattern classification, 2nd edn. Wiley, New York, 654p Gonbadi AM, Tabatabaei SH, Carranza EJM (2015) Supervised geochemical anomaly detection by pattern recognition. J Geochem Explor 157:81–91 Lau KW, Yin H, Hubbard S (2006) Kernel self-organising maps for classification. Neurocomputing 69(16–18):2033–2040 Schölkopf B, Smola AJ, Williamson RC, Bartlett PL (2000) New support vector algorithms. Neural Comput 12(5):1207–1245 Song S, Hou J, Dou L, Song Z, Sun S (2020) Geologist-level wireline log shape identification with recurrent neural networks. Comput Geosci 134:104313
Pattern Recognition Rogério G. Negri São Paulo State University (UNESP), Institute of Science and Technology (ICT), São José dos Campos, São Paulo, Brazil
Definition Pattern recognition comprises a field of study regarding how machines and algorithms may recognize patterns and structures in data sets and make decisions about them. Pattern recognition embraces a research area that focuses on problems of classification and object description. With its multidisciplinary aspects, pattern recognition interfaces with statistics, engineering, artificial intelligence, computer science, data mining, image and signal processing, etc. Typical examples of applications are automatic character recognition, medical diagnosis, bank customer profile monitoring, and even more recent issues such as recommendation systems and face recognition.
Process Overview Bibliography Banchs R, Rondon O (2007) Seismic pattern recognition based upon Hopfield neural networks. Explor Geophys 38:220–224 Bhattacharya S, Carr TR, Pal M (2016) Comparison of supervised and unsupervised approaches for mudstone lithofacies classification: case
A pattern recognition process is not limited to applying methods to data sets; it also includes distinct stages, from problem formulation, data collection, and representation to classification, assessment, and interpretation. Figure 1 depicts a common pattern recognition workflow.
Pattern Recognition, Fig. 1 Pattern recognition workflow overview: sensor data → features (extraction, selection) → classification (model selection, parametrization) → results (decision, conclusion) → assessment
In a broad view, a "pattern" is usually represented by a vector whose component values stand for specific attributes previously measured by a sensor. Waves captured from an acoustic system, the number of financial transactions in a period, the spectral responses registered by a digital camera, and the climatic variables at a specific instant are examples of patterns in practical situations. Based on the observed data, and aiming to simplify or identify non-obvious information about them, each of these data items is associated with a specific class or group/cluster. This pattern-to-class association process is called classification, which demands the modeling (i.e., choosing a model and its respective parameterization) of a rule able to perform such a task. Intermediate parts of this process include data preprocessing and the assessment of the classification results. Computing additional values from the objects to provide a better description constitutes a stage known as "feature extraction." Of all the available information, it is essential to use only the meaningful attributes in the pattern recognition process, which favors error reduction and accuracy improvement. To this end, a careful "feature selection" is usually performed. Also, computation of the error/accuracy levels should be carried out systematically and persistently in an "assessment" stage.
Learning Paradigms According to the previous discussion, a pattern recognition process relies on modeling a decision rule that can classify patterns with respect to classes or clusters. The learning paradigm is directly related to how this modeling is carried out. Among the different paradigms in the literature, supervised, unsupervised, and semi-supervised learning are the most common; other paradigms, for example reinforcement, evolutionary, and instance-based learning, are also worth mentioning. The main characteristic that distinguishes supervised and unsupervised learning is how the classification rules are modeled. While supervised learning builds a model using information from a set of ground-truth examples for which the expected response (i.e., the class) is known, unsupervised learning is based on similarities identified over the set of patterns without information about presumed classes. Examples whose response/class is known are called "labeled patterns." The set of labeled patterns required by supervised methods is called the "training set." When the classes comprised by the classification are defined in advance, supervised learning is preferable. However, insufficient availability of labeled data may make supervised training unfeasible. In this case, unsupervised learning arises as an alternative. Nonetheless, it should be stressed that results based on such a learning paradigm are restricted to groups/clusters of similar patterns, without assigning them to a class. These limitations became a starting point for the development of the semi-supervised paradigm, usually defined as a middle ground between supervised and unsupervised learning, since it simultaneously uses labeled and unlabeled patterns to model the classification rules (Chapelle et al. 2006).
Approaches Beyond the learning paradigm, pattern recognition methods are also characterized by the approach used to define the decision rules. Examples of approaches include template matching, statistical models (parametric or non-parametric), the construction of decision surfaces based on geometric/structural criteria, and neural networks. Template matching involves comparing and identifying similarities between patterns taken as references (i.e., the templates) and other patterns whose class is unknown (Pavlidis 2013). This process should be independent of scale, rotation, and translation changes. The statistical approach establishes decision regions using probability distributions representing each class included in the classification problem (Webb and Copsey 2011). Bayes decision theory is an essential mathematical foundation for defining the rules and making decisions. Naive Bayes, maximum likelihood classification, and K-nearest neighbors are examples of statistical methods. Methods with a geometric/structural approach are characterized by decision rules expressed by separating hyperplanes or partitions in the attribute space (Bishop 2006). Support vector machines, decision trees, graph-based models, and the K-means algorithm follow the geometric/structural approach. Lastly, methods based on the neural network approach simulate an arrangement of neurons that adapt to solve complex tasks with nonlinear inputs and outputs (Haykin 2009). Among the many neural networks proposed in the literature, the multilayer perceptron, self-organizing maps, and convolutional neural networks are popular.
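As a small, hypothetical illustration of the template-matching approach (a one-dimensional signal with an embedded reference waveform; all names and values below are invented for the example), a template can be slid along a signal and scored by normalized correlation:

```python
# Sketch of template matching: a reference pattern is compared with every window
# of a signal using a normalized correlation score.
import numpy as np

def normalized_cross_correlation(signal, template):
    """Correlation coefficient of the template with every window of the signal."""
    n = len(template)
    t = (template - template.mean()) / (template.std() + 1e-12)
    scores = []
    for start in range(len(signal) - n + 1):
        w = signal[start:start + n]
        w = (w - w.mean()) / (w.std() + 1e-12)
        scores.append(float(np.dot(w, t) / n))
    return np.array(scores)

rng = np.random.default_rng(0)
template = np.sin(np.linspace(0, np.pi, 25))            # hypothetical reference waveform
signal = rng.normal(scale=0.3, size=300)
signal[120:145] += template                             # embed the pattern at position 120

scores = normalized_cross_correlation(signal, template)
print("best match at index:", int(scores.argmax()))     # expected to be near 120
```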
In addition, it is worth highlighting the possibility of combining the cited approaches into an ensemble scheme (Theodoridis and Koutroumbas 2008). Methods like AdaBoost and random forest are examples of ensembles widely adopted.
Applications in the Geosciences Several studies and applications in geosciences employ pattern recognition concepts. Examples include seismic analysis, meteorology, land use and land cover mapping, and natural disaster prevention. Curilem et al. (2016) use pattern recognition methods to extract features from seismic signals and build a system for volcanic events classification. Chattopadhyay et al. (2020) demonstrate that the integration of clustering and neural network concepts provides a potential method for weather pattern forecasting. Extreme rainfall forecasting using pattern recognition is another topic of relevance discussed in Nayak and Ghosh (2013). Distinctly, as proposed in Wang et al. (2015), an approach based on the support vector machine method allows reconstructing the historical data of weather type, which comprises an essential component for photovoltaic power forecasting. Mapping the land use and land cover is another topic of extreme relevance in several applications related to the geosciences (Giri 2012).
1065 Curilem M, Huenupan F, Beltrn D, San-Martin C, Fuentealba G, Franco L, Cardona C, Acua G, Chacn M, Khan MS, Becerra Yoma N (2016) Pattern recognition applied to seismic signals of Llaima Volcano (Chile): an evaluation of station-dependent classifiers. J Volcanol Geotherm Res 315:15–27. https://doi.org/10.1016/j. jvolgeores.2016.02.006 Giri CP (2012) Remote sensing of land use and land cover: principles and applications. Remote sensing applications series. CRC Press Haykin S (2009) Neural networks and learning machines, vol 10. Prentice Hall Nayak MA, Ghosh S (2013) Prediction of extreme rainfall event using weather pattern recognition and support vector machine classifier. Theor Appl Climatol 114:583603. https://doi.org/10.1007/s00704013-0867-3 Pavlidis T (2013) Structural pattern recognition. Springer series in electronics and photonics. Springer, Berlin/Heidelberg Theodoridis S, Koutroumbas K (2008) Pattern recognition, 4th edn. Academic, San Diego Wang F, Zhen Z, Mi Z, Sun H, Su S, Yang G (2015) Solar irradiance feature extraction and support vector machines based weather status pattern recognition model for short-term photovoltaic power forecasting. Energ Buildings 86:427–438. https://doi.org/10.1016/j. enbuild.2014.10.002 Webb AR, Copsey KD (2011) Statistical pattern recognition, 3rd edn. Wiley. https://doi.org/10.1002/9781119952954
Pawlowsky-Glahn, Vera Ricardo A. Olea Geology, Energy and Minerals Science Center, U.S. Geological Survey, Reston, VA, USA
Summary Pattern recognition comprises a research area focused on developing and applying mathematical and computational concepts to solve data classification problems. The amount and variety of pattern recognition applications in geosciences are vast.
Cross-References ▶ Pattern ▶ Pattern Analysis ▶ Pattern Classification
Fig.1 Vera Pawlowsky-Glahn, courtesy of Prof. Juan José Egozcue
Bibliography Bishop CM (2006) Pattern recognition and machine learning (information science and statistics). Springer, Berlin/Heidelberg Chapelle O, Scholkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge, MA Chattopadhyay A, Hassanzadeh P, Pasha S (2020) Predicting clustered weather patterns: a test case for applications of convolutional neural networks to spatio-temporal climate data. Sci Rep 10(1317). https:// doi.org/10.1038/s41598-020-57897-9
Biography Vera met her professional destiny the day she had a meeting with her adviser Wolfdietrich Skala to choose the topic of her dissertation. He offered her a job as a collaborator on a compositional data analysis project, but she wanted to work on geostatistics. Vera first proved that spurious correlation was a
problem in both disciplines. The rest is history. Vera started a long spatiotemporal trek that took her to the top specialist on each discipline: Georges Matheron in France and Aitchison in Hong Kong. Progress toward the solution to the mapping of compositional data came in several installments. First was her dissertation (Pawlowsky 1986), then a translation with a few new developments (Pawlowsky-Glahn and Olea 2004) and finally the complete solution to the problem via the formulation of the Aitchison geometry (PawlowskyGlahn and Egozcue 2001). Over the years, in cooperation with multiple scientists around the world, Vera expanded her interest to all aspects of compositional modeling, writing more than 100 papers. The main accomplishments of her career are summarized in Pawlowsky-Glahn et al. (2015). Without Vera, Aitchison’s ideas may have been lost or, at best, it would have taken much longer to receive the acceptance that they have today. The compositional data specialists around the world prepared a Festschrift (Filzmoser et al, 2021) in appreciation for Vera’s contribution to the field. Upon finishing graduate school, Vera spent her career with the Polytechnical University in Barcelona (1986–2000), serving as Associate Professor, and with the University of Girona (2000–2018), where she reached the position of Chair of the Computer Sciences, Applied Mathematics and Statistics Department. Vera has been quite active in the International Association for Mathematical Geosciences (IAMG), organizing the third IAMG conference in 1997, serving as chair of several committees, convener of numerous sessions, the 2007 Distinguished Lecturer, and the 11th President (2008–2012). IAMG presented her with the 2006 Krumbein Medal for her valuable contributions to science, the profession and IAMG, and also the 2008 Cedric Griffiths Teaching Award. She was instrumental in starting the Compositional Data Association in 2015, serving as the founding president (2015–2017). Vera was born in Barcelona on 25 September 1951, the seventh of nine children. As a Latvian-Russian-German descendant, she graduated from high school from the Deutsche Schule of Barcelona in 1970. She attended the University of Barcelona, enrolling in the Mathematics Department, not in the Biology Department as she had originally planned, because there was no waiting queue to enroll in Mathematics. She earned the equivalent of a master’s degree in mathematics in 1982. After that, she decided to pursue graduate studies in Germany, where she received a PhD in natural sciences from the Free University of Berlin in 1986. Vera’s partner is Juan José Egozcue and she is the mother of zoo, aquarium, and wildlife veterinarian Tania Monreal-Pawlowsky.
Bibliography Filzmoser P, Hron K, Martín-Fernández JA, Palarea-Albaladejo J, 2021. Advances in compositional data analysis: Festschrift in honour of Vera Pawowsky-Glahn. Springer. Pawlowsky V (1986) Räumliche Strukturanalyse und Schätzung Ortsabhängiger Kompositionen mit Anwendungsbeispielen aus der Geologic. Dissertation, Free University of Berlin, 170 pp Pawlowsky-Glahn V, Egozcue JJ (2001) Geometric approach to statistical analysis on the simplex. Stoch Env Res Risk A 15(5):384–398 Pawlowsky-Glahn V, Olea RA (2004) Geostatistical analysis of compositional data. Oxford University Press, Oxford, 181 pp Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R (2015) Modeling and analysis of compositional data. Wiley, Chichester, 247 pp
Pengda, Zhao Qiuming Cheng1 and Frits Agterberg2 1 State Key Lab of Geological Processes and Mineral resources, China University of Geosciences, Beijing, China 2 Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada
Fig. 1 photographed at the cultural party celebrating the 60th anniversary of the founding of China University of Geosciences. https://en. wikipedia.org/wiki/Zhao_Pengda (CC BY-SA 3.0)
Biography
Bibliography
Professor Zhao Pengda was born in Qingyuan, Liao Ning Province in 1931. He graduated from the Geological Department of Beijing University in 1952, whereupon he was appointed as assistant to teach in the newly founded Beijing Geological Institute. In 1954, he was sent to Moscow, USSR as a postgraduate student, and studied methods of prospecting and exploration for mineral deposits under Professor A. A. Yakren at the Moscow Geological Surveys College obtaining his Ph.D. in 1958. The title of his dissertation was: “The geological characteristics and methods of exploration for stockwork type tin and tungsten ore deposits.” In 1960, he became Associate Professor at the Beijing Geological Institute, where he attained the rank of full professor in 1980. Meanwhile, the Institute was moved to Wuhan and named Wuhan College of Geology in 1975. Appointed its President in 1983, Professor Zhao retained this post following reorganization of the college in 1987 to create the China University of Geosciences (CUG). Later during his reign, CUG opened its second campus in Beijing. Professor Zhao has been researching the use of mathematical and statistical methods in mineral exploration since 1956. In 1976, he began applying mathematical models to predict and assess mineral resources, specifically iron and copper deposits in a Mesozoic volcanic basin in southern China. This resulted in the book “Statistical Prediction for Mineral Deposits” (Zhao et al. 1983). In total, he has published three books and over 120 papers. His later research includes work on expert systems, statistical theory and methods of prediction, and integrated modeling of geochemical anomalies and ranking by favorability. He has taught courses in mathematical geology, statistical analysis of geoexploration data, and statistical prediction of mineral deposits. In 1990, Professor Zhao organized and chaired the international workshop: “Statistical Prediction and Assessment for Mineral Deposits,” held in Wuhan and attended by 14 experts from outside China including Dr. Richard McCammon, then President of the International Association of Mathematical Geosciences (IAMG). As Father of Mathematical Geology in China, Professor Zhao has been a member of the IAMG for many years, furthering the field of mathematical geology through teaching and publications (e.g., Zhao 1992). In 1993 he was awarded the William Christian Krumbein Medal, which is the highest award the IAMG can bestow upon one of its members. In 1993, Professor Zhao also became Academician of the Chinese Academy where he chaired its geoscience section for many years. After his retirement as President of the China University of Geosciences he remains actively involved in research and supervision of Ph. D. students. In May 2020, CUG organized Professor Zhao’s 90th Birthday Celebration.
Zhao P (1992) Theories, principles and methods for the statistical prediction of mineral deposits. Math Geol 24:589–592 Zhao P, Hu W, Li Z (1983) Statistical prediction of mineral deposits. Geological Publishing House, Beijing. (in Chinese)
Plurigaussian Simulations Nasser Madani School of Mining and Geosciences, Nazarbayev University, Nur-Sultan city, Kazakhstan
Definition Plurigaussian simulation is a stochastic technique for spatial modeling of categorical variables. An important characteristic of this technique is that the relationship and contact between categories can be identified by domain expert and imposed into the process of modeling.
Introduction Categorical variables in geoscience-related fields are usually divided into two types. The first type, nominal variables, are categories without any ordering; examples include lithology, facies, rock types, alterations, permeable/impermeable, etc. The second type, ordinal variables, are categories with an intrinsic ordering or ranking; examples include the lithofacies sequence in a sedimentary environment, classes of mineral grades and geochemical anomalies (e.g., poor, medium, rich), the intensity of minerals in a rock sample quantified on a scale (e.g., none, little, moderate, much, very much), clarity and color grades of particular minerals and stones, etc. Let us denote the categorical variables as geo-domains throughout this text. Three-dimensional spatial modeling of these geo-domains can be implemented by either deterministic or stochastic approaches. In the former, the layout of the boundary between two adjacent geo-domains is specified based on geological knowledge, and one obtains only a single scenario for the position of this boundary. In the latter, a stochastic process determines the probable area of the boundaries among the geo-domains. In this context, several geostatistical simulation techniques have been developed over the last decades. Among others, plurigaussian simulations are of paramount importance for modeling geo-domains where particular restrictions exist on the contact boundaries. For instance, in the modeling of a mineral deposit, from the
geological perspectives, it is possible that some of geodomains never touch each other, meaning that there is no contact between them. This characteristic, known as forbidden contact can be modeled in resulting maps/scenarios by plurigaussian simulations.
only in contact with geo-domain B and the latter is only in contact with geo-domains C and D. As a result, the configuration of the underlying flag that should be identified based on the geological interpretation of the domain identifies the permissible and forbidden contacts between geo-domains whether or not to comply with chronological ordering.
History
Step 2: Estimating the Truncation Thresholds Splitting the Gaussian maps, given the preidentified flag, to obtain the categorical maps requires determining the corresponding thresholds. These numerical values can be obtained based on the proportion of each geo-domain. For N Gaussian random fields belonging to a vector random field Z ¼ {Z(x) : x Rd} with N components, and M geo-domains, M – 1 thresholds can be identified. Let’s denote g(.) as the joint probability density function of the Gaussian random fields. For the purpose of computing the probability of presenting the i-th geo-domain at a particular sample location x, one requires resolving the following formula:
Theoretical background of plurigaussian simulations was first introduced by Galli et al. (1994). In the past almost 26 years since the initiation of the first idea, many scholars used this technique for modeling the categorical variables in reservoir characterization, ore body modeling, rock fractures, and hydrogeology (Armstrong et al. 2011).
Essential Concepts Plurigaussian simulations are natural extension of truncated Gaussian simulation. Matheron et al. (1987) first introduced the truncated Gaussian simulation for the purpose of simulating first the lithofacies in an oil reservoir and then to simulate the porosity and permeability within the built lithofacies. This methodology is developed for the cases when the geodomains follow a sequential ordering. Plurigaussian simulation, however, is developed to model more complicated types of contacts that exist among the underlying geo-domains. For instance, Fig. 1 shows an example resulting map of truncated and plurigaussian simulations for modeling four geo-domains. Figure 1a shows that there is a sequence from geo-domain A to geo-domain D, which is obtained from the former method while Fig. 1b illustrates that the geo-domain A only touches geo-domain B and never is in contact with geodomains C and D. The rationale behind the truncation Gaussian and plurigaussian simulation is that the former considers one Gaussian random field (Fig. 1c), and the latter employs two or more Gaussian random fields (Fig. 1c and d) that are simulated entire area under consideration, and then a truncation rule (Fig. 1e and f) is used to turn these Gaussian values into geo-domains. Hence, the actual application of the plurigaussian simulations requires defining five main steps: Step 1: Choosing the Truncation Rule As shown in Fig. 1, the Gaussian random fields (Fig. 1c and d) need to be truncated in order to obtain the categorical maps (Fig. 1a and b). This needs definition of a truncation rule known as “flag,” where one can identify how the geo-domains are in content together. For instance, in Fig. 1e, the flag can represent the sequential contact between the geo-domains A to D and Fig. 1f depicts the flag that geo-domain A is
Pi ð xÞ ¼
gðz1 , . . . , zN Þ dz1 . . . dzN
ð1Þ
Di
Where Pi is the proportion and Di is the segment of the flag linked to the ith geo-domain (a subset of RN defined by the thresholds). This formula may be resolved numerically or by trial and error. Step 3: Inferring the Variogram of the Gaussian Random Fields Variogram inference of the underlying Gaussian random field (s) is a key factor in plurigaussian simulation. Based on the mathematical relationship between the indicator variograms and the variograms of the Gaussian random field(s), one may infer a proper theoretical model for the variogram of interest by deriving its properties. The experimental indicator variogram can be computed experimentally through the recorded geo-domains at sample locations. For any separation vector h, the indicator cross variogram between two geodomains (with indices i and j) is deduced by related noncentered covariance (Armstrong et al. 2011): gij ðhÞ ¼ Cij ð0Þ
1 C ðhÞ þ Cij ðhÞ 2 ij
Cij ðhÞ ¼ prob GðxÞ Di , Gðx þ hÞ Dj
ð2Þ ð3Þ
If Di and Dj are rectangular parallelepipeds of RN and the components of the vector random field G are independent, then the second member of Eq. (3) is a function of the direct covariances or variograms of the components of G and can be computed by utilizing Hermite polynomials (Emery 2007).
Plurigaussian Simulations
1069
P
Plurigaussian Simulations, Fig. 1 Schematic representation of truncated and plurigaussian simulations techniques
Step 4: Generating Gaussian Random Fields at Sampling Points Once the variogram of Gaussian random fields and the corresponding threshold(s) are obtained, the categorical variable at sampling points should be converted to the Gaussian values. This step generates the Gaussian values that first respects the variogram fitted in step 3 and also falls in the proper intervals imposed by flag and the truncation thresholds. For this purpose, a method called Gibbs sampler (Casella and George 1992) is used to generate such Gaussian values. The common procedure is as follows: Initialization In each sample point uα, simulate a vector with N components uα in Di(α), where i(α) is the index of the geo-domain given at uα.
Iteration
(I) Select a data location uα, regularly or randomly. (II) Compute the distribution of G(uα) conditional to the other data {G(uα) : β 6¼ α}. In the stationary case, this is a Gaussian distribution, with mean equal to the simple kriging estimation of G(uα) and covariance matrix equals to the covariance matrix of the simple kriging errors. (III) Simulate a vector Zα according to the previous conditional distribution. (IV) If Zα is in compliant to the geo-domain dominating at uα (Zα Di(α)), substitute the existing value of G(uα) by Zα. (V) Go back to a) and loop many times.
1070
The Gibbs sampler presented is reversible, aperiodic, and irreducible. In a nutshell, the distribution of simulated values through the sample points converges to the conditional distribution of the target Gaussian random fields if the number of iterations increases significantly. Step 5: Simulating Values at Target Grid Nodes In this step, the simulation should be implemented to simulate the Gaussian random fields at the grid nodes, conditionally to the values acquired in the Step 4 and the same variogram model inferred in Step 3. For this, any algorithm can be applied for simulation of the Gaussian random fields: Cholesky decomposition, sequential Gaussian simulation, discrete spectral simulation, continuous spectral simulation, and turning bands (Chilès and Delfiner 2012, and references therein). Then, the simulated Gaussian values at target grid nodes should be converted back into the geo-domains using the preidentified flag.
Applications in More Detail To illustrate the applicability of the plurigaussian simulations in a practical example, a small case study is presented in this section. This exemplar is inspired from a siliciclastic platform, for which it is composed of Sand, Shale Sand (ShSand), Shale (Sh), Limestone (Lst), and argillite limestone (argLst). For this case study, 100 sample points are irregularly available in an area and each facies is quantified at each sample point (Fig. 2). Based on the geological interpretation, Sand is in touch with ShSand and ShSand itself is in contact with Sh, whereas Sh is in contact with both Lst and argLst. This gradual change of facies from Sand to Lst and argLst is compatible with the sedimentology of the geological setting for this platform. All these knowledge contribute to identifying a flag as identified in Fig. 2. The variogram model of Gaussian random fields is obtained through an iteration over experimental variogram of indicators. Therefore, one cubic variogram with range of 100 m is inferred for both Gaussian random fields. The simulation results (two scenarios) are also shown in this figure. In total, 100 scenarios are generated and the probability map of each facies is calculated. These maps are applicable for quantification of uncertainty. In the end, one can obtain the most probable map as illustrated.
Current Investigations There are several investigations of plurigaussian simulations, however, in this section; we only highlight some important developments of this simulation algorithm. Standard method of plurigaussian simulation (Emery 2007) generally is restricted to model few geo-domains and the Gaussian
Plurigaussian Simulations
random fields. Madani and Emery (2017) provided a generalization of the conventional plurigaussian simulation that allows the number of Gaussian random fields to be increased, appropriate for the chronological ordering of geological settings in such a way that younger geo-domains cross-cut the older ones. In such a case, all the geo-domains must be in contact altogether where modeling forbidden contact is not feasible. Maleki et al. (2016) introduced another exercise of this method, for which the geo-domains are, in the same manner, ordered hierarchically, but the forbidden contacts can also be dictated into the process of modeling. All these techniques are on the basis of stationary phenomena or homogeneous variability. Nevertheless, in the case of heterogeneous characteristics of geo-domains, one needs to renovate the stationary assumptions (Madani and Emery 2017) or utilize secondary information in replicating the possible trend for the corresponding geo-domains entire the region. For instance, an accessible information may be the seismic data. Another generalization of plurigaussian simulation is related to modeling the cyclic and rhythmic facies architectures (Le Blévec et al. 2018). The developed plurigaussian model takes into account the dampened hole-effect variogram model to delineate the rhythmicity in the vertical direction and a different stationary variogram model in horizontal plane. Plurigaussian simulations can also be integrated with multiGaussian simulation for establishing a modeling technique between geo-domains and continuous variables. Emery and Silva (2009) developed a joint simulation algorithm, where one categorical variable and one continuous variable can be modeled altogether given that the variability of the latter across the boundary of former is soft and there is a good association between these two variables. In this case, it is possible to consider several permissible and forbidden contacts among the geo-domains.
Controversies and Gaps in Current Knowledge Despite the rapid development of plurigaussian simulations in the last decades, there remain some avenues for further research for modeling the geo-domains by this technique. Deterministic Evaluation of Truncation Threshold One of the main concern in plurigaussian simulations is identification of truncation thresholds based on the available data at sampling points. There are some approaches to define the declustering proportions to make it representative; however, fitting variogram step, Gibbs sampler and final truncation maps highly depend on the predetermined truncation threshold. Therefore, misspecification of these values leads to producing a huge bias in the results. Incorporation of uncertainty for truncation threshold in the process of modeling needs investigation.
Plurigaussian Simulations
1071
P
Plurigaussian Simulations, Fig. 2 Application of plurigaussian simulation for modeling the facies in a siliciclastic platform
Variogram Analysis In the case of having complicated contact relationship between geo-domains, manual variogram fitting is cumbersome. This is more difficult when the aim is to use joint simulation for incorporation of continuous variable as proposed in Emery and Silva (2009). An automatic technique or
semiautomatic technique may facilitate this issue. Another issue is related to lack of data. Whenever some geo-domains are scarce, modeling the variogram and inference its corresponding Gaussian variogram is not trivial. More work is needed to solve the issue of variogram analysis in the presence of scarce data.
1072
Heterogeneity Characteristics of Geo-Domains Nonstationary representation of geo-domains in a geological complex motivates one using the proportion curves (Armstrong et al., 2011). A common exemplar comes up when the proportions of the geo-domain of interest change vertically and laterally. Spatially varying proportions may be connected with spatially varying thresholds. Taking the advantage of spatially varying geo-domain proportions is disputing for the variogram inference and the computation of spatially varying truncation thresholds itself. More research is required to solve the issue of nonstationary phenomena of geo-domains without considering the spatially varying proportion curves. Convergence of Gibbs Sampler In the case of simulating plenty of sample points, a moving neighborhood often is used for solving the kriging system, based on the screen-effect approximation. However, this means that the Markov Chain may no longer converge to the required distribution or does not converge at all. More investigation is required to solve the issue of computational resources for using unique neighborhood in Gibbs sampler instead of moving neighborhood in the case when the data are numerous. Complex Geological Features Plurigaussian simulations still are suboptimal for modeling some complex geological features such as connectivity constrains, long-range geological structures, and conifom shapes (e.g., intrusive bodies, metamorphic geological process, infiltration by gravity, or deposition). Cosimulation Plurigaussian simulations still are restricted to modeling of one categorical variable. However, this variable may contain only a few geo-domains. More investigation is required for establishing a cosimulation algorithm for modeling two or more categorical variables. For instance, sometimes, rock types and alteration show good association in a deposit. The idea of further research can use of a cosimulation algorithm to reproduce such an association. Concerning this, one possible solution can be multivariate categorical modeling with hierarchical truncated simulation (Silva and Deutsch 2019). Joint Simulation Using plurigaussian simulations for joint simulation as proposed by Emery and Silva (2009) is restricted to only one categorical variable and one continuous variable. More investigation is needed for developing an approach for joint simulation of more than one continuous variable, while the categories remain to either one or several variables. Another avenue of research is related to the hierarchical simulation
Plurigaussian Simulations
algorithm for identifying truncation rule while there exist one or more geo-domains in this joint technique.
Summary and Conclusion Plurigaussian simulations are advanced techniques for modeling the geo-domains with complexity in contact relationship. Flexibility of identifying the truncation rule (flag) and incorporation of geological perspectives lead to producing more realistic results compared to other simulation techniques. In this regard, particularly, forbidden contacts can be imposed into the process of modeling in the resulting maps. For implementation of this algorithm, five main steps were discussed: (1) choosing the truncated rule (flag), (2) estimating the truncation threshold(s), (3) inferring the variogram of the underlying Gaussian random fields, (4) Generating Gaussian random fields at sampling points, (5) simulating values at target grid nodes. Plurigaussian simulations can be implemented in commercial software such as ISATIS and in Petrel through PGS plug-in that is available on Schlumberger’s Ocean Store. An open access MATLAB code is also available in IAMG website (iamg.org) attached to the work of Emery (2007). Besides many interesting solutions for modeling the complex geological features in this technique, there still exist several opportunities for further improvement. Few of them were discussed accordingly.
Bibliography Armstrong M, Galli A, Beucher H, Le Loc’h G, Renard D, Doligez B, Eschard R, Geffroy F (2011) Plurigaussian simulations in geosciences. Springer, Berlin Blévec TL, Dubrule O, John CM, Hampson GJ (2018) Geostatistical modelling of cyclic and rhythmic facies architectures. Math Geosci 50(6):609–637. https://doi.org/10.1007/s11004-018-9737-y Casella G, George EI (1992) Explaining the Gibbs sampler. Am Stat 46(3):167–174 Chilès JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty. Wiley, New York Emery X (2007) Simulation of geological domains using the plurigaussian model: new developments and computer programs. Comput Geosci 33(9):1189–1201 Emery X, Silva DA (2009) Conditional co-simulation of continuous and categorical variables for geostatistical applications. Comput Geosci 35(6):1234–1246 Galli A, Beucher H, Le Loc’h G, Doligez B (1994) HeresimGroup. The pros and cons of the truncated gaussian method. Geostatistical Simulations, Kluwer, pp 217–233 Madani N, Emery X (2017) Plurigaussian modeling of geological domains based on the truncation of non-stationary Gaussian random fields. Stoch Environ Res Risk Assess 31:893–913 Maleki M, Emery X, Cáceres A, Ribeiro D, Cunha E (2016) Quantifying the uncertainty in the spatial layout of rock type domains in an iron ore deposit. Comput Geosci 20:1013–1028. https://doi.org/10.1007/ s10596-016-9574-3
Point Pattern Statistics
1073
Dietrich Stoyan Institut für Stochastik, TU Bergakademie Freiberg, Freiberg, Germany
pattern can be considered as a part of a large pattern with similar random fluctuations everywhere in space. However, if one only aims consideration of short-range properties, clever choice of the window of observation gives sense to application of stationary methods, which is justified by Georges Matheron’s idea of “local ergodicity.” Standard references to point processes and their statistics are Baddeley, Rubak and Turner (2016), and Illian et al. (2008). All formulas mentioned in this article can be found in these books. For statistical analyses of point patterns, the statistical R package spatstat is recommended. In the present short text, the theory for spatio-temporal point patterns is not given, as it is applicable to the concrete example considered, where the points are centers of sinkholes (see Fig. 1).
Synonyms
Point Process Theory
Point process statistics
General Many geometrical structures in the geosciences can be described by point patterns, where randomly distributed objects are described by points. This is done in the two- and three-dimensional case, while the present text considers mainly the 2D case. Sometimes the objects are additionally characterized by marks (e.g., numbers or vectors), which leads to marked point patterns. If the marks are bodies, one speaks of germ-grain processes or object-models. Clearly, a point pattern is not a regionalized variable or random field as studied in classical geostatistics, since it is given only in the isolated points that constitute it; also with real-valued marks,
Matheron G, Beucher H, de Fouquet C, Galli A, Guerillot D, Ravenne C (1987) Conditional simulation of the geometry of fluvio deltaic reservoirs. In: SPE 1987 annual technical conference and exhibition. SPE, Dallas, pp 591–599 Silva D, Deutsch CV (2019) Multivariate categorical modeling with hierarchical truncated pluri-gaussian simulation. Math Geosci 51(1):527–552
Point Pattern Statistics
Definition In the geosciences irregular, random point patterns are studied, in 3D and 2D space. The “points” usually stand for objects of the same type, reducing the information on the objects to their locations; often the points are midpoints or small objects of the same size as the data grid cell. Additional information can be attached by so-called “marks,” which can be real numbers (characterizing perhaps size or type) or geometrical objects (pores or geological bodies). The mathematical models of point patterns are called point processes.
P Overview In geoscientific applications, the points may be positions of volcanoes, small deposits of ore, sinkhole centers, trap sites of precious stone placer deposits, centers of pores or crystals, etc. The corresponding patterns occur on all scales, from microscopic to plate-tectonic scale. The goals of point pattern statistics are for a concrete pattern: • • • • •
Classification of its type (clustering or regular) Understanding interaction between the points Determination of range of correlation Detection of correlation to covariables Fitting a model that helps to understand the process that formed the pattern
Typical point patterns in the geosciences are clustered and cannot be considered as stationary, a standard assumption of point process statistics. “Stationarity” means that a given
Point Pattern Statistics, Fig. 1 Point pattern of 134 sinkhole centers in the region of the city Tampa. This sinkhole pattern from January 2020 in a rectangle of 19.0 km 11.5 km in the region of the city of Tampa is a small subpattern of the large and spatially inhomogeneous pattern of all sinkholes in Florida, located at the South-West edge of a large cluster of sinkholes. It was selected as a nearly homogeneous pattern, with the aim to study possible short-range interaction of sinkholes in a relatively homogeneous geological situation with reduced influence of covariables such as population density (and thus of observation intensity). (Source: Florida Geological Survey publications, http://fldeploc.dep.state.fl.us/ geodb_query/PublndexQuery.asp)
1074
Point Pattern Statistics
it is not a regionalized variable. However, it is possible to transform a point pattern to such a variable by smoothing, see below. The corresponding stochastic models are called point processes and marked point processes. It is always assumed that the structures are locally finite, that the number of points in bounded sets is finite. The corresponding mathematical theory is presented in Chiu et al. (2013) and Illian et al. (2008). It studies the probability laws of point processes, the interactions of points, and the description by statistical summary characteristics and many point process models. The present text concentrates on the facts necessary for first statistical steps. Statistical methods for stationary point processes can be applied for the investigation of short-range interaction of the points, for the study of relationships at distances in the order of first neighbor distances. For this purpose, the window of observation W of observation is adapted to the point pattern so that it looks homogeneously distributed within W. For example, it may make sense to consider points in the interior of a single cluster, while the analysis of a total single cluster with much empty space around is questionable, see Illian et al. (2008, p. 264 and 268). Intensity Function and Intensity As the first step of a statistical analysis of a point pattern its intensity function, the spatially variable point density, l(x), should be determined, where x is a deterministic point that varies in the window of observation W. Mathematically l(x) yields the mean of the number of points N(A) in a set A by integration, EðN ðAÞÞ ¼ lðxÞdx:
ð1Þ
A
The intensity function can be estimated by smoothing (see below). Figure 2 shows the estimated intensity for the sinkhole pattern, with a simple parametric form. In geological applications also relations to covariables help in the description of point density, for example, the distance to faults or relations to curvature or strain, see Eq. (7). In the stationary case, l(x) is constant, called intensity and denoted by l. Equation (1) simplifies to EðN ðAÞÞ ¼ l vðAÞ,
ð2Þ
Point Pattern Statistics, Fig. 2 Empirical intensity function of the sinkhole pattern. According to the positions of the sinkholes at the SouthWest edge of a large cluster of sinkholes, the isolines of l(x) were chosen as lines in SE-NW direction, that is, l(x) is estimated in a parametrized way, not by smoothing. The intensity is measured in number per km2
Pair Correlation Function and Further Summary Characteristics The variability of point patterns is characterized by various summary characteristics. In general, the most powerful is the pair correlation function g(r). It plays the role of Ripley’s K-function K(r) in the older literature and may be seen as an analogue of the variogram of geostatistics. A short explanation is as follows. The term lK(r) can be interpreted as the mean number of points in a disc of radius r centered at a typical point of the pattern, not counting the disc center. Then the pair correlation function is given by gð r Þ ¼
dK ðr Þ =ð2prÞ for r 0: dr
ð4Þ
Note that the intensity l is eliminated. In the context of g(r), the variable r should be interpreted as inter-point distance. Always g(r) is nonnegative. If the point distribution is completely random, then gðr Þ 1:
ð5Þ
For large r the function g(r) always takes on the value 1. If there is a finite distance rcorr with gðr Þ ¼ 1 for r r corr then rcorr is called range of correlation. This means that there are no correlations between point positions at distances larger than rcorr. And if there is a minimum inter-point distance r0, sometimes called hard-core distance, then
where v(A) denotes the area of A. Clearly, l is estimated statistically by
gðr Þ ¼ 0 for r r 0 :
l ¼ N ðAÞ=vðAÞ:
The estimation of the pair correlation function is described in Baddeley et al. (2016) and Illian et al. (2008). Corresponding to the nature of g(r) estimators of l are used
ð3Þ
Point Pattern Statistics
for normalization. It should be determined only for large point patterns of (say) more than 100 points. The interpretation of empirical pair correlation functions is an art, which needs experience (Illian et al. 2008). If the pattern analyzed deviates from the behavior expected for a sample of a stationary point process, the estimated g(r) can for large r significantly deviate from the theoretical value 1. In such cases, the estimation may be improved by using ideas of the statistics for second-order intensity-reweighted stationary point processes. Then the estimators of intensity l are replaced by estimators of the intensity function l(x) (Baddeley et al. 2016). Figure 3 shows the empirical pair correlation function g(r) for the sinkhole pattern of Fig. 1 obtained by the stationary and nonstationary approach. The curves have the typical form of pair correlation functions for cluster point processes. They suggest a cluster diameter between 0.2 and 0.5 km. There are
1075
nine point pairs of an inter-point distance smaller than 40 m. (This is very well related to a localized intense rain event.) The closest has a distance of 13 m, and the corresponding events happened in 6 days distance. Another pair of distance 40 m appeared in 2 days. Thus, there is strong clustering in space and time. The large values of g(r) for small r indicate clustering also in this selected subpattern. The pattern may consist of many micro clusters, and these microclusters seem to be randomly distributed; they often are simply close point pairs. Any prediction based only on the pattern is difficult. Point process statistics has many other summary characteristics, which may be considered as belonging to the class of multiple-point statistics. The most popular ones are: • Distribution function D(r) of the distance to the nearest neighbor from a typical point of the process • Spherical contact distribution function Hs(r), that is, distribution function of the distance from a random test point to the nearest point of the pattern Many others can be found in Baddeley et al. (2016) and Illian et al. (2008). For the determination of directionalities in point patterns, for example, in the context of strain analysis, the Fry method is used (Rajala et al. 2018). For the sinkhole pattern, Fig. 4 shows two summary characteristics, which aim to show gaps or regions of high point density: • The 4-neighborhood graph, see Illian et al. (2008, p. 258) • The Allard-Fraley estimate of regions of high point density, see Baddeley et al. (2016, p. 192).
Point Pattern Statistics, Fig. 3 Empirical pair correlation functions. Empirical pair correlation functions for the pattern of sinkhole centers in Fig. 1, estimate ignoring nonstationarity (—) and estimate with intensityreweighting (- - -) in comparison to the value of 1 for the CSR case (. . .). It is typical that the curve for the stationary approach is above that for the nonstationary approach. See the text for more information
P Marked Point Processes To each point xn of a point process, a further datum mn, called mark, can be added, such as a real number (perhaps tonnage
Point Pattern Statistics, Fig. 4 4-neighborhood graph and high-intensity regions. (left) Each sinkhole center is connected with its four nearest neighbors, which shows gaps in the point pattern. (right) High-intensity regions
1076
of the ore deposit at xn), a surface (perhaps a fault centered at xn), or a three-dimensional object centered at xn, see the article Stochastic Geometry in the Geosciences. Stationarity of marked point processes means invariance of the distribution under translations which shift the points but retain the marks, that is, [x, m] ! [x + δ, m]. The intensity of a stationary marked point process is the intensity of the point process with the marks stripped off. The marks are characterized by the mark distribution, which in the case of real-valued marks is simply given by a distribution function. The corresponding mean is called the mean mark. For the case of real-valued marks, stochastic models are available in the literature, for which formulas exist for model characteristics. Two simple cases are (a) independent marks and (b) marks that come from an independent random field {Z(x)}, that is, the mark of the point xn is Z(xn). In the statistics of marked point processes summary characteristics called mark-correlation functions are of value, that is, functions that describe the spatial variability of the marks relative to the points. An example for the application of these characteristics is Ketcham et al. (2005), which studies threedimensional porphyroblastic textures. Figure 5 shows a volume rendering of a slice of the material obtained using computed tomography. For statistical analysis, such samples were converted to marked point patterns, where crystal centers ¼ points and crystal volumes ¼ marks. Point process statistics for these data suggests distributional order or regularity at length scales less than the mean nearest-neighbor distance and clustering at great length scales. Crystals close together tend to be small, even after
Point Pattern Statistics, Fig. 5 A porphyroblastic structure. Garnet crystals (red) and higher-density trace phases (such as zircon and iron oxides) (yellow) in a sample of garnet amphibolite schist. The width of the object shown is 25.5 mm and the thickness 3.1 mm. (Data courtesy of Richard Ketcham)
Point Pattern Statistics
accounting for the physical limitation that one crystal cannot nucleate within another. In another microstructure application, the points could be pore centers and the marks pore volumes.
Point Process Models The Homogeneous Poisson Process, CSR The theory of point processes has a standard “null model”: the homogeneous Poisson process. In its context statisticians often speak about complete spatial randomness (CSR). Between the points of a homogeneous Poisson process, there is no interaction, and it can be assumed that the points take their locations independently. The point numbers in disjoint sets are independent; this holds true not only for two sets but for any number of sets. The homogeneous Poisson process is a stationary and isotropic point process. Its only parameter is the intensity l. The name “Poisson process” comes from the fact that the number N(A) of points in any set A has a Poisson distribution with mean m ¼ l v(A), which means PðN ðAÞ ¼ iÞ ¼
mi expðmÞ for i ¼ 0, 1, . . . : i!
ð6Þ
For the Poisson process, there exist many formulas for its distributional characteristics (Baddeley et al. 2016; Illian et al. 2008). In particular, the pair correlation function g(r) has the simple form of Eq. (5). When a statistician is able to prove by a test that a given point pattern can be seen as CSR, that is, as a sample of a homogeneous Poisson process, then any search for interaction between the points can stop, and any prediction is senseless. (That’s why the term “null model” is used.) Therefore, there are many tests of the CSR hypothesis, see Baddeley et al. (2016), Illian et al. (2008), and Myllymäki et al. (2017). Many of these consider an empirical summary characteristic for the pattern and analogous characteristics obtained by simulating CSR patterns, rejecting the CSR hypothesis if the empirical characteristic differs significantly from the simulated ones. For the pattern of sinkholes, the test using the pair correlation function leads to rejection of the CSR hypothesis. The values of g(r) for r between 0.2 and 0.4 km are too large. For the homogeneous Poisson process, the nearest neighbor distance distribution function D(r) and the spherical contact distribution function Hs(r) coincide and are equal to 1 – exp(lr2). The Inhomogeneous Poisson Process and the Cox Process The inhomogeneous Poisson process shares the most properties with the homogeneous one, only there is a spatially variable intensity function l(x). Care is necessary to
Point Pattern Statistics
1077
understand the main point: Of course, every point pattern has a randomly fluctuating point density, even a pattern for a stationary point process. However, in the case of the inhomogeneous Poisson process, the point density fluctuates in a predictable sense: If one could observe more than one sample, one would observe similar point densities in the same regions, for example, high point density along a geological fault; nevertheless the particular point locations may be fully random. The point numbers in disjoint sets are independent and the number N(A) of points in any set A has a Poisson distribution, the mean of which depends both on size and location of A. It is given by Eq. (1). An example of application of an inhomogeneous Poisson process is the study of spatial correlation between gold deposits and geological faults in Baddeley (2018), where an intensity of the form lðxÞ ¼ expða þ bd ðxÞÞ
ð7Þ
was used, where d(x) is the nearest distance from x to a fault. The inhomogeneous Poisson process can be made still more random: also the intensity function may be random. The resulting point processes are called Cox processes (after the English statistician David R. Cox) or “doubly stochastic Poisson processes” because of the two stochastic elements in the model. In this case, the intensity function is a random field: if this field is stationary then the Cox process is stationary too. Of course, this field has to be positive, as an intensity function. Therefore, a Gaussian random field is not suitable in this context. A natural solution is to start from a Gaussian field {Z(x)} and then to use the exponential l(x) ¼ exp(Z(x)). This leads to so-called log-Gaussian Cox processes, which have been widely used in the modeling of environmental heterogeneity. Because of their construction, they can be seen as a link between point process statistics and geostatistics. They are simple enough to derive formulas for their statistical characteristics. For example, intensity and pair correlation function of a stationary and isotropic log-Gaussian Cox process are given by 1 l ¼ exp mZ þ s2Z 2
ð8Þ
gðr Þ ¼ expðkZ ðr ÞÞ for r 0,
ð9Þ
and
where mz is the mean, s2Z the variance, and kZ(r) the covariance function of the Gaussian field Z(x).
Cluster and Regular Point Processes Some Cox processes are cluster point processes. Typical examples are Poisson cluster processes. These can be thought to be constructed starting from a homogeneous Poisson process. The points of this process, sometimes called “mother points,” are cluster centers (not points of the final process), and around each of these points “daughter points” are randomly scattered. The union of all daughter points is the cluster process. Formulas for its distributional characteristics can be found in Baddeley et al. (2016), and Illian et al. (2008). Cluster point patterns appear frequently in the geosciences. A famous case are spatio-temporal patterns of epicenters of earthquakes, which are clustered in space and time (Ogata 2017). Of course, also the sinkhole pattern can be considered as a sample of a cluster process. The pair correlation function of a cluster point process may have a pole at r ¼ 0. The existence of a pole may indicate a fractal behavior of the structure related to the points (Agterberg 2014). However, poles can appear also in situations without any fractal structure, for example, when daughter points are scattered on segments, perhaps close to faults. While the simulation of cluster processes is easy, their statistics present a challenge (Baddeley et al. 2016; Illian et al. 2008). Another class of point processes are regular point processes, for example, “hard-core processes” with a positive minimum inter-point distance. Such a model would be used for the porphyroblastic structure in Fig. 5. “Gibbs processes” as mentioned below in the context of Markov Chain Monte Carlo are a popular class of such models.
P Tools of Point Pattern Statistics Smoothing A regionalized variable is a quantity Z(x) defined for every location x of space. Examples are porosity at x (either 0 or 1) or ore content at x (a nonnegative number, being the ore content of a sample (ball or cube) centered at x). The statistics of regionalized variables are statistics of random fields and sometimes this just goes under the name geostatistics. Point processes are not regionalized variables, but sometimes it makes sense to regionalize such structures, in order then to apply the strong methods of geostatistics. A natural approach is smoothing. For example, consider the number of points Z(x) of a planar point process in the disc b(x, R) divided by πR2 for every point x of the space for some chosen radius R. For inhomogeneous point processes, this leads to estimates of the intensity function l(x). A problem is then the choice of the smoothing parameter R.
1078
Simulation Techniques Point processes are simulated in order to visualize the structure of some models, to generate samples for the investigation of statistical procedures, so-called test-patterns, or to demonstrate uncertainty. Here some important techniques are described, limited to the stationary case. Simple Simulations
The simulation of many point processes models is a simple task. Here are two pseudocodes for the first steps, where random is a random number uniform on [0, 1]: Simulation of a random point x ¼ (x, ) in a rectangle of side lengths a and b. 1. x ¼ random a 2. ¼ random b 3. print x ¼ (x, ) Simulation of a sample of a homogeneous Poisson process of intensity l in a rectangle. 1. generate a Poisson random number n of mean l a b 2. for i ¼ 1 to n generate random points xi in the rectangle 3. print (x1,. . ., xn) An elegant form of simulation of a Poisson random number is described in Illian et al. (2008, p. 71). In spatstat, a Poisson process pattern can be generated by the function rpoispp. For more general models, care is necessary with the boundaries, for example, when simulating cluster processes: then mother points outside the window must not be ignored. Here the technique of so-called “exact simulation” can be used. Markov Chain Monte Carlo
In particular, point processes with inhibition between the points, that is, with some degree of regularity, are simulated by the Markov chain Monte Carlo technique. A simple example is the simulation of the hard-core Gibbs process. It goes for a sample of n points in a window of simulation W as follows, if the hard-core or minimum distance between the points is R: Place n points within W so that the distance between all point pairs is not smaller than R. Take randomly one of the points and give it a new position chosen randomly in the part of W where the hard-core condition is not violated. Continue this procedure many times, until the simulation process comes into a stationary state; in the beginning, there is some burn-in period. This method can be generalized both to the cases of a random number of points and forms of
Point Pattern Statistics
interaction between the points more subtle than the simple hard interaction. The sequence of point patterns in W after each step is a Markov chain in the state space of point patterns. Reconstruction
Sometimes no model is available and one only needs patterns with similar statistical properties as a given empirical pattern, for example as training images. Then the method of reconstruction is recommended, a method in the spirit of multipoint statistics (Illian et al. 2008). Its first step is the determination of some statistical summary characteristics such as pair correlation function g(r) plus(!) distribution function D(r) or Hs(r) for the original pattern. Then a new point pattern is constructed that has similar statistical summary characteristics. The construction starts with a CSR pattern of the same point number as the empirical pattern. Then one of its points is chosen randomly and replaced by a random point in W. If the new pattern has summary characteristics closer to the characteristics of the empirical pattern, the new point is accepted and a new trial is made. If not, a new point is chosen, etc. This iteration is repeated for many steps, where the determination of the summary characteristics of the whole pattern must be well organized. The minimization of the discrepancy may be improved by so-called simulated annealing. Also conditional simulation is possible, which means here simulation of a point pattern in some region given a subpoint pattern.
Summary Point patterns make sense everywhere in geostatistics as representatives of locations of groups of objects irregularly distributed in space; marked point patterns stand for object systems. In statistical studies of microstructures, there are good chances to apply the powerful methods for stationary point processes. However, in larger scales big problems appear with instationarity and anisotropy. Here further research should change the situation, with adapted finite models and statistics.
Cross-References ▶ Geostatistics ▶ Multiple Point Statistics ▶ Simulated Annealing ▶ Simulation ▶ Spatiotemporal ▶ Stochastic Geometry in the Geosciences
Polarimetric SAR Data Adaptive Filtering
Bibliography Agterberg F (2014) Geomathematics: theoretical foundations, applications and future developments. Springer, Cham/Heidelberg/New York/Dordrecht/London Baddeley A (2018) A statistical commentary on mineral prospectivity analysis. In: Sagar BSD, Cheng Q, Agterberg F (eds) Handbook of mathematical geosciences. Springer, Cham, pp 25–65 Baddeley A, Rubak E, Turner R (2016) Spatial point patterns. Methodology and applications with R. CRC Press, Boca Raton Chiu SN, Stoyan D, Kendall WS, Mecke J (2013) Stochastic geometry and its applications. Wiley, Chichester Illian J, Penttinen A, Stoyan H, Stoyan D (2008) Statistical analysis and modelling of spatial point patterns. Wiley, Chichester Ketcham RA, Meth C, Hirsch DM, Carlson WD (2005) Improved methods for quantitative analysis of three-dimensional porphyroblastic textures. Geosphere 1:42–59 Myllymäki M, Mrkvicka T, Grabarnik P, Seijo H, Hahn U (2017) Global envelope tests for spatial processes. J R Stat Soc Ser B 79:381–404 Ogata Y (2017) Statistics of earthquake activity: models and methods for earthquake predictability studies. Annu Rev Earth Planet Sci 45:497–527 Rajala T, Redenbach C, Särkkä A, Sormani M (2018) A review on anisotropy analysis of spatial point patterns. Spat Stat 28:141–168
Polarimetric SAR Data Adaptive Filtering Mohamed Yahia1,2, Tarig Ali1,3, Md Maruf Mortula3 and Riadh Abdelfattah4,5 1 GIS and Mapping Laboratory, American University of Sharjah United Arab Emirates, Sharjah, UAE 2 Laboratoire de recherche Modélisation analyse et commande de systèmes- MACS, Ecole Nationale d’Ingénieurs de Gabes – ENIG, Université de Gabes Tunisia, Zrig Eddakhlania, Tunisia 3 Department of Civil Engineering, American University of Sharjah, Sharjah, UAE 4 COSIM Lab, Higher School of Communications of Tunis, Université of Carthage, Carthage, Tunisia 5 Departement of ITI, Télécom Bretagne, Institut de Télécom, Brest, France
Definition Remote sensing systems represent nowadays an important source of information for the study of the Earth’s surface. Due to their operation in all weather and all-time conditions, synthetic aperture radar (SAR) and polarimetric SAR (PoSAR) sensors are broadly applied in geoscience and remote sensing applications. Nevertheless, due to the coherent nature of the scattering mechanisms, SAR images are contaminated by the speckle noise. The speckle noise reduces the accuracy of post-processing as well as human
1079
interpretation of the images. Hence, speckle filtering forms an important preprocessing task before the extraction of the useful information from SAR images.
Introduction Adaptive filters and particularly the minimum mean square error (MMSE) filters have been extensively applied in PolSAR to reduce speckle. Most MMSE-based PoLSAR filters have mainly concentrated on the collection of homogeneous pixels. In the refined Lee filter (Lee et al. 1999), the homogeneous pixels are chosen from eight edge aligned windows. In the Lee sigma filter (Lee et al. 2015), similar pixels are chosen from the sigma range. In Lee et al. (2006), the scattering model-based technique chose pixels of the same scattering media. In Wang et al. (2016), similar pixels are chosen using both intensity information and scattering mechanism. Based on a linear regression of means and variances of the filtered pixels, an infinite number of looks prediction (INLP) technique has been applied to improve the filtering performance of MMSE-based SAR image filtering (Yahia et al. 2020a). In Yahia et al. (2017, 2020b), an iterative MMMSE (IMMSE)-based filter is introduced. The IMMSE is initialized by a polarimetric filter ensuring a high speckle reduction level. Then, to improve the spatial detail preservation, the IMMSE filter is implemented for few iterations.
SAR Polarimetry PolSAR data can be expressed using the scattering matrix S (Lee et al. 1999) S¼
Shh
Shv
Svh
Svv
ð1Þ
The indices v and h represent the vertical and horizontal polarization, respectively. Svh ¼ Shv for reciprocal backscattering. PolSAR data can be expressed also in a vector form O ¼ Shh
p
2Shv Svv
t
ð2Þ
where “t” represents the transpose operator. The total power of a pixel i.e., span is expressed as (Lee et al. 1999) span ¼ jShh j2 þ 2jShv j2 þ jSvv j2
ð3Þ
The covariance matrix C is expressed as (Lee et al. 1999) C ¼ EðO O t Þ
ð4Þ
P
1080
Polarimetric SAR Data Adaptive Filtering
Polarimetric SAR Data Adaptive Filtering, Fig. 1 Span images: (a) original image, (b) Sigma filter, (c) INLP filter. A and B zones are used to compute ENL (22) and EPD (23), respectively
covariance matrix should be maintained (i.e., XðiÞ ¼ CðiÞ). where “*” is the complex conjugation and E(.) is the mathematical expectation.
SAR images are affected by the multiplicative (Lee et al. 1999) yði, jÞ ¼ xði, jÞ nði, jÞ,
Adaptive MMSE PolSAR Filtering The PolSAR speckle filtering constraints are (Lee et al. 2015): 1. Conserve the polarimetric information. 2. Minimize the speckle noise in homogeneous zones (i.e., average all pixels XðiÞ ¼ CðiÞ). Where XðiÞ represents the filtered covariance matrix. 3. Preserve the spatial details. The value of the original
ð5Þ
n is the speckle noise with standard deviation sv and unit mean. x is the noise free pixel. For convenience, the (i,j) index is eliminated. The MMSE-based filtered pixel x is (Lee et al. 1999) x ¼ y þ d ðvarðyÞÞðy yÞ where
ð6Þ
Polarimetric SAR Data Adaptive Filtering
1081
b ¼ varðxÞ=varðyÞ,
ð7Þ
b¼x
ð8Þ
Equation (14) shows that the filtered pixel x is linearly related to its variance varðxÞ. This rule is applied to estimate the INLP filtered pixel (i.e., the parameter b) (Yahia et al. 2017). The linear regression in (x,y) coordinates gives (Vaseghi 2000):
Then b ¼ varðyÞ y2 s2n = 1 þ s2n varðyÞ
y and var( y) are computed adaptively using a local square window W. In the refined (Lee et al. 1999) and sigma (Lee et al. 2015) filters, the MMSE estimate of the covariance matrix (i.e., C) is C ¼ C þ b C C
ð9Þ
where b given in (8) is estimated from the span image (3). Figure 1 displays the original and the filtered span images of Les-Landes PolSAR data, respectively. It can be observed that the sigma filter performed high spatial detail preservation level. However; the speckle reduction is low. The speckle reduction level could be increased by increasing the size of the window size but the spatial detail preservation will be decreased. To increase the performance of the MMSE-based filters, the MMSE formulation has been improved as follows Infinite Number of Looks Prediction INLP Filter By combining Eqs. (5) and (6) (Yahia et al. 2017). varðxÞ ðy yÞ ¼ y ¼ x, x¼yþ varðyÞ varðxÞ ¼ E
varðxÞ yþ ðy yÞ y varðyÞ varðxÞ varðyÞ
¼
varðxÞ ¼
ð10Þ
and M
a¼
ðxi yi Mx yÞ=MV x ,
ð18Þ
i¼1
where Vx and x are the variance and the mean of the vector x, respectively. In (18), the parameters xi and yi represent the samples xi and varðxi Þ , respectively. To conserve PolSAR information, it has been suggested that all the elements of the covariance matrices must be filtered equally as the multi-look process (Lee et al. 2015). However, it is observed in (18) that the filtered pixel b is not linearly dependent on yi due to the square values of yi. Hence, (17) can be written as: b¼y 1þ
x2 Vx
x MV x
M
xi yi ¼ P y þ Q yt ,
ð19Þ
i¼1
where
2
P ¼ 1 þ x2 =V x
Eð y y Þ 2 ,
ð11Þ
ð12Þ
then, ðy xÞ varðxÞ: varðxÞ
ð13Þ
Finally, x ¼ a varðxÞ þ b,
ð14Þ
where
ð20Þ
and Q is a vector enclosing the samples Qi ¼ ðx=MV x Þxi :
ð15Þ
ð21Þ
To make b linearly dependent on yi, the parameters P and Q must have the same values for all covariance matrix elements. Hence, the parameters P and Q were estimated from the span image and introduced in (19) to filter all the elements of the covariance matrix similarly. The real and imaginary coefficients of the off-diagonal elements of the covariance matrix were processed independently. To assess the performance of the studied speckle filtering techniques quantitatively in terms of speckle reduction level, the equivalent number of looks is selected: ENL ¼ x=varðxÞ
a ¼ ðy xÞ=varðxÞ, and
ð17Þ
2
varðxÞ varðxÞ ¼ dvarðxÞ, varðyÞ
x¼xþ
b ¼ y a x,
ð16Þ
ð22Þ
To assess the ability of the filters in terms of spatial detail preservation, the Edge Preservation Degree (EPD) is
P
1082
Polarimetric SAR Data Adaptive Filtering
employed (Feng et al. 2011). The EPD in horizontal direction is:
EPDH ¼
m, n m, n
jxðm, nÞ=xðm, n þ 1Þj jyðm, nÞ=yðm, n þ 1Þj
:
ð23Þ
Table 1 shows the quantitative filtering results. The INLP filter ensured higher speckle reduction level (i.e., ENL ¼ 31 > 25) and better spatial detail preservation (EPDH-INLP ¼ 0.84126 > EPDH-sigma ¼ 0.8398) than the 7 7 sigma. Figures 1b and c display the filtered sigma and INLP span images, respectively. Quantitative outcomes are verified visually where the INLP filter ensured better spatial detail preservation than sigma filter. The spatial resolution is significantly increased by the improved INLP filter. The spatial details in Figs. 1c are more contrasted than the one in Figs. 1b. In addition to the ability of the adaptive MMSE filtering technique to reduce speckle, the conventional adaptive filters such as refined Lee, Lee sigma, are the most computationally efficient methods among other better speckle filtering techniques for huge datasets. In fact, the computational complexity of the PolSAR filter constitutes currently a great concern for practical implementations since for many of the new SAR systems such as TerraSAR-X, Cosmo-SkyMed, and Gaofen-3, the image sizes are huge (i.e., bigger than 10,000 10,000 pixels). Unlike the scalar MMSE, in PolSAR MMSE-based filters, the span is used to estimate the weight b. However, it has been Polarimetric SAR Data Adaptive Filtering, Table 1 Quantitative results Sigma 7 7 INLP (L ¼ 40)
ENL A zone 25 31
EPDH 0.8398 0.84126
EPDV 0.8183 0.8214
Polarimetric SAR Data Adaptive Filtering, Fig. 2 ENL of the span
demonstrated in Yahia et al. (2020c) that the span is a nonlinear filter and the span statistics are function of the entropy H. Figure 2 displays the ENL of the span image. It can be observed that the statistics of the span change within different extended homogenous areas. As result, for optimal application of the adaptive MMSE PolSAR filters, sn in (8) must be adapted to the statistics of the span image.
Summary and Conclusions In this study, the adaptive MMSE PolSAR filtering is investigated. The sigma Lee PolSAR filter is particularly implemented. Results show that this filter was able to preserve spatial details with low ability to reduce speckle. To improve the adaptive MMSE filtering performance, the INLP filter is introduced. The filtered pixel was derived using a linear regression of means and variances of the filtered pixels. The linear regression rule has been adapted to PolSAR data in order to preserve the polarimetric information. Results show that the INLP outperformed sigma filter in terms of spatial detail preservation and speckle reduction.
Bibliography Feng H, Hou B, Gong M (2011) SAR image despeckling based on local homogeneous-region segmentation by using pixel-relativity measurement. IEEE Trans Geosci Remote Sens 49(7):2724–2737 Lee JS, Grunes MR, De Grandi G (1999) Polarimetric SAR speckle filtering and its implication for classification. IEEE Trans Geosci Remote Sens 37(5):2363–2373 Lee JS, Grunes MR, Schuler DL, Pottier E, Ferro-Famil L (2006) Scattering-model-based speckle filtering of polarimetric SAR data. IEEE Trans Geosci Remote Sens 44(1):176–187 Lee JS, Ainsworth TL, Wang Y, Chen KS (2015) Polarimetric SAR speckle filtering and the extended sigma filter. IEEE Trans Geosci Remote Sens 53(3):1150–1160 Vaseghi SV (2000) Advanced digital signal processing and noise reduction, 4th edn. Wiley, New York, pp 227–254 Wang Y, Yang J, Li J (2016) Similarity-intensity joint PolSAR speckle filtering. In: International Conference on Radar Yahia M, Hamrouni T, Abdelfatah R (2017) Infinite number of looks prediction in SAR filtering by linear regression. IEEE Geosci Remote Sens Lett 14(12):2205–2209 Yahia M, Ali T, Mortula MM, Abdelfattah R, El Mahdi S, Arampola NS (2020a) Enhancement of SAR speckle denoising using the improved IMMSE filter. IEEE J Select Topics Appl Earth Obs Remote Sens 13: 859–871 Yahia M, Ali T, Mortula MM, Abdelfattah R, El Mahdy S (2020b) Polarimetric SAR speckle reduction by hybrid iterative filtering. IEEE Access 8(1) Yahia M, Ali T, Mortula MM (2020c) Span Statistics And Their Impacts On Polsar applications. IEEE Geosci Remote Sens Lett. https://doi. org/10.1109/LGRS.2020.3039109
Pore Structure
Pore Structure Weiren Lin and Nana Kamiya Earth and Resource System Laboratory, Graduate School of Engineering, Kyoto University, Kyoto, Japan
Synonyms
Interstice; Void

Definition
A pore is a small open space or passageway existing between solid materials in a rock, also called a void or an interstice. Generally, pores form primarily at the same time as the host rock and/or are introduced secondarily by mechanical failure and/or chemical action after rock formation. From their morphological features, pores can be classified into isolated pores and connected pores. Usually, the pore structure is described in terms of the geometrical features of pore populations rather than of an individual pore, including their shape, size, volume, connectivity, and tortuosity.

Introduction
Invariably, pores exist in earth materials from soils in the shallow subsurface to rocks in the crust and at least in the upper mantle. In this chapter, pores and pore structures in rocks are the focus, which will hopefully benefit areas related to the solid earth in geoscience and geoengineering. Generally, pores in rocks are filled by groundwater at depths below the groundwater level. In rare cases, however, the pores are partially or fully filled by oil (petroleum) and/or natural gas or solid gas hydrate. More importantly, the connected pores function as fluid flow paths, which allow material circulation at local and global scales and affect Earth systems. In addition, pore fluid pressure, especially overpressure in seismogenic zones, possibly influences various phenomena in and around plate interfaces, such as great earthquakes and plate tectonics. Therefore, pore structure is one of the most fundamental and important characteristics of rocks in many areas of study, e.g., geology, geophysics, geochemistry, earth resources, and groundwater hydrology. Pore structures in different rock types depend on their origin and their host rocks. Generally, pores in sedimentary rocks exist in the matrix and have a relatively small aspect ratio (the ratio of their length to width). Most pores in volcanic rocks usually exhibit spherical shapes because they were formed from volcanic gas. On the other hand, pores in crystalline rocks, including plutonic rocks and metamorphic rocks, have larger aspect ratios, e.g., >100, and are generally called fractures.

Visual Observation of Pore Structure
To reveal the pore structure, visual observation at various scales is the most fundamental approach and has been performed by various methods. In this section, a very popular sedimentary rock called Berea sandstone is used as an example to show several typical images of the pore structure of a rock. This Paleozoic sandstone, quarried for more than 200 years in Ohio, USA, has been used widely as a building material and grindstone (Hannibal 2020). Moreover, Berea sandstone, with its homogeneous fabric and fine-to-medium grain sizes, has been used worldwide as a rock testing material for laboratory hydrological and mechanical experiments and for the visual observation of pore structures. Figure 1a is an image taken by an iPhone (model SE) camera, in which mineral grains are visible but no pores can be identified; this indicates that the pore structure is hard to see with the naked eye. Several holes (the pores) can be seen in images taken by a stereomicroscope, but the pore structure cannot be clearly identified, probably because of low contrast and the presence of many semitransparent minerals, such as quartz (Fig. 1b). Figure 1c is a polarized optical microscope photo (open nicol) of a thin section of ~30 μm in thickness. The thin section was prepared after a fluidic resin dyed blue was impregnated into the pores and then fixed thermally (Chen et al. 2018). Therefore, mineral grains and the blue-colored pores can be easily identified, and the pore structure is clearly visible. From this two-dimensional image, the pores can be considered to connect with each other in the three-dimensional domain, i.e., a well-connected pore network exists throughout Berea sandstone. Figure 1d and e shows X-ray CT (computed tomography) images at different scales. Since the 1990s, medical X-ray CT with ~200-μm resolution has been utilized for rock core samples in the geosciences (Cnudde and Boone 2013). The important advantages of X-ray CT are as follows: (i) the internal structure of a rock sample can be imaged without cutting, i.e., nondestructive observation; (ii) the three-dimensional structure can be reconstructed; and (iii) its images are composed of digital data and are suitable for numerical simulation and mathematical geosciences. Following the application of medical X-ray CT, micro-CT (high-resolution CT) with a typical resolution of ~10 μm (Fig. 1d and e) and, recently, nano-CT with submicrometer resolution were developed and applied. From such X-ray CT images, three-dimensional pore network models can be built and then used
Pore Structure, Fig. 1 Images of Berea sandstone with ~20% porosity. (a) A normal optical photo taken by an iPhone (model SE) camera. (b) A stereomicroscope image; arrows show the presence of holes (the pores). (c) A thin-section image taken by a polarized optical microscope under open nicol conditions; the blue color shows pores into which blue resin was impregnated and fixed. (d) An X-ray CT image at a resolution of ~22 μm/pixel of a cylindrical sample; the dark gray-black spots show the pores. (e) An X-ray CT image at a resolution of ~8 μm/pixel of a rock tip (courtesy of Kazuya Ishitsuka). (f) A SEM image of the sandstone sample surface ground with fine-grit sandpapers (average particle size 20
Porosity, Table 1 Engineering classification of reservoir quality: by porosity (%), classes range from Negligible through Poor, Fair, and Good to Very good; by permeability (mD), classes range from Poor/tight through Fair, Moderate, and Good to Very good (>250 mD)
deposition diagenesis and catagenesis processes for clastic sediments; dolomitization of carbonates; and solution, fracturing, and faulting of carbonate, igneous, and metamorphic rocks. The main types of secondary porosity are: (i) solution porosity; (ii) dolomitization porosity; and (iii) fracture porosity.

Engineering Classification of Reservoir Porosity
An example of an engineering classification of porosity used in hydrocarbon reservoir characterization is shown in Table 1. It should be noted that engineering criteria change with time, depending on reservoir type and technological advancement; for example, more than 50 years ago a permeability of 50 mD indicated a tight reservoir (Tiab and Donaldson 2015), and while a 5% porosity is negligible for a sandstone reservoir, it is the maximum one can expect from a fractured granite basement reservoir.
Related Parameters

Void Ratio
In soil mechanics a variant of porosity named the void ratio (e) is commonly used instead, defined as the ratio between the volume of voids and the volume of the solid matrix:

$$e = \frac{V_p}{V_m} = \frac{V_p}{V_t - V_p} = \frac{\phi}{1-\phi} = \frac{n}{1-n} \qquad (3)$$

where e is the void ratio; ϕ is porosity; n also denotes porosity, as commonly used in soil mechanics and geotechnical engineering; and V_t, V_p, V_m are the same as in Eq. 1a, b (Fig. 1a).

Tortuosity
Tortuosity (τ) of a porous material is the ratio of the actual flow path length (L_a) to the straight-line distance between the ends of the flow path (L), as defined in Eq. 4 and shown in Fig. 4 with a dye tracer dispersion. Tortuosity can be of different types, i.e., geometric, hydraulic, electrical, and diffusive. It is a useful parameter for studying the structure of porous media, their electrical and hydraulic conductivity, and the travel time and length for tracer dispersion (Ghanbarian et al. 2013):

$$\tau = \frac{L_a}{L} \qquad (4)$$
Porosity, Fig. 4 Definition of tortuosity
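As a minimal numerical sketch of Eqs. 3 and 4, the short Python fragment below converts between porosity and void ratio and computes tortuosity from an assumed flow-path length; the sample values are hypothetical and only illustrate the relations, they are not data from this entry.

```python
def porosity_from_volumes(v_pore, v_total):
    """Porosity as pore volume over total volume."""
    return v_pore / v_total

def void_ratio(phi):
    """Void ratio e = phi / (1 - phi), as in Eq. 3."""
    return phi / (1.0 - phi)

def porosity_from_void_ratio(e):
    """Inverse relation phi = e / (1 + e)."""
    return e / (1.0 + e)

def tortuosity(l_actual, l_straight):
    """Tortuosity tau = La / L, as in Eq. 4."""
    return l_actual / l_straight

# Hypothetical sample: 20% porosity and a flow path 1.4 times the straight-line distance
phi = porosity_from_volumes(v_pore=20.0, v_total=100.0)
e = void_ratio(phi)                      # 0.25
tau = tortuosity(l_actual=1.4, l_straight=1.0)
print(f"phi = {phi:.2f}, e = {e:.2f}, tau = {tau:.2f}")
```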
Permeability and Porosity-Permeability Relationship (Poro-perm)
Permeability (k) is defined as the ability of a geomaterial (soils, rocks) to let fluids flow through it under a hydraulic/pressure gradient or loading. In 1856 Henry Darcy, a French engineer, based on laboratory experiments of water flow through sand core samples, developed an equation (Eq. 5) that has become a valuable mathematical tool to determine hydraulic conductivity for groundwater flow or permeability for hydrocarbon flow:

$$v = \frac{q}{A_c} = \frac{k}{\mu}\,\frac{dp}{dl} \qquad (5)$$
where v is the velocity of fluid flow (cm/s); q is the flow rate (cm³/s); k is the permeability of the rock in darcy (D), 1 D = 0.986923 μm²; A_c is the cross-section of the core sample (cm²); μ is the viscosity of the fluid in centipoise (cP); and dp/dl is the pressure gradient in the direction of flow (atm/cm). Porosity and permeability are a couple of the most closely related petrophysical parameters, with the latter being more difficult to test and estimate than the former. That is why one always tries to establish empirical relationships between these two parameters based on core measurements. One early and common method is to plot porosity (on a linear scale) against permeability (on a logarithmic scale) and observe a curve-fitting trend called poro-perm, which can be
used to predict permeability based on porosity values. For a non-linear porosity-permeability relationship, considering the rock heterogeneity, the Kozeny-Carman equation is a good mathematical model for permeability prediction, widely used in petroleum engineering (Tiab and Donaldson 2015), namely:

$$k = \frac{\phi^{3}}{K_{ps}\,\tau\,S_{Vgr}^{2}\,(1-\phi)^{2}} \qquad (6a)$$

$$k = \frac{\phi^{3}}{5\,S_{Vgr}^{2}\,(1-\phi)^{2}} \qquad (6b)$$
where k is permeability (D); K_ps is the pore factor; τ is the tortuosity; S_Vgr is the specific surface area per unit grain volume (cm⁻¹); and ϕ is the effective porosity. The product K_ps·τ is often taken equal to a numeric value, say 5 for many porous rocks as shown in Eq. 6b, but it can vary from 2 to 5 and up, depending on the case.
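As a worked sketch of Eq. 6b and of the poro-perm trend described above, the Python fragment below fits a log-linear poro-perm trend to a few hypothetical core measurements and computes a simplified Kozeny-Carman permeability; all numerical values (porosities, permeabilities, specific surface area) are illustrative assumptions, not data from this entry.

```python
import numpy as np

def kozeny_carman(phi, s_vgr, kps_tau=5.0):
    """Simplified Kozeny-Carman permeability (Eq. 6b when kps_tau = 5).

    phi    : effective porosity (fraction)
    s_vgr  : specific surface area per unit grain volume (1/cm)
    returns: permeability in cm^2 (1 darcy = 9.869e-9 cm^2)
    """
    return phi**3 / (kps_tau * s_vgr**2 * (1.0 - phi)**2)

# Hypothetical core data: porosity (fraction) versus permeability (mD)
phi_core = np.array([0.08, 0.12, 0.16, 0.20, 0.24])
k_core_md = np.array([0.5, 4.0, 30.0, 180.0, 900.0])

# Poro-perm trend: fit log10(k) = a*phi + b (porosity linear, permeability logarithmic)
a, b = np.polyfit(phi_core, np.log10(k_core_md), 1)
k_fit_md = 10 ** (a * 0.18 + b)          # predicted permeability at 18% porosity
print(f"poro-perm fit: log10(k) = {a:.1f}*phi + {b:.1f}; k(0.18) ~ {k_fit_md:.0f} mD")

# Kozeny-Carman estimate for the same porosity, assuming S_Vgr = 1.5e3 1/cm
k_cm2 = kozeny_carman(phi=0.18, s_vgr=1.5e3)
print(f"Kozeny-Carman: k ~ {1e3 * k_cm2 / 9.869e-9:.0f} mD")
```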
Methods to Determine Porosity
Porosity can be directly measured on soil and rock samples in the laboratory by various methods such as buoyancy, helium porosimetry, fluid saturation, mercury porosimetry (Glover 2020; Ulusay 2015), thin-section analysis, and digital rock physics analysis, or it can be indirectly determined by analysis of well log curves, e.g., density, sonic, neutron, resistivity, and NMR logs (Tiab and Donaldson 2015). In geotechnical engineering, the porosity and void ratio of fine-grained soils like clays are commonly determined using the popular one-dimensional consolidation or oedometer test (ASTM D2435 2020).
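The density-log route mentioned above can be illustrated with the standard density-porosity transform ϕ = (ρ_ma − ρ_b)/(ρ_ma − ρ_f), which is common petrophysical practice rather than a formula spelled out in this entry; the matrix and fluid densities in the sketch below are assumed values.

```python
def density_log_porosity(rho_bulk, rho_matrix=2.65, rho_fluid=1.0):
    """Standard density-log porosity transform: phi = (rho_ma - rho_b) / (rho_ma - rho_f).

    Defaults assume a quartz matrix (2.65 g/cm^3) and fresh-water pore fluid (1.0 g/cm^3).
    """
    return (rho_matrix - rho_bulk) / (rho_matrix - rho_fluid)

# Hypothetical bulk-density log reading of 2.32 g/cm^3 gives ~20% porosity
print(f"phi ~ {density_log_porosity(2.32):.2f}")
```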
Summary or Conclusions
Porosity is a fundamental property of soils and rocks, with many important applications in earth science, hydrogeology, geotechnical engineering, environmental engineering, and petroleum geoscience and engineering. Extensive studies have been devoted to porosity since the early days of science and engineering. The porosity of some clean sands can be calculated from idealized spherical packings, but with time porosity is reduced by geological processes occurring after rock formation, such as cementation, solution, infilling of the pores, and fracturing. Consequently, for rock characterization the porosity concept has had to diversify, and several classifications have been proposed, e.g., geological and engineering classifications. There are two approaches to determining porosity: directly and indirectly. The former is conducted on core samples in the laboratory using various techniques, while the latter is predominant in the oil and gas industry and is based on analysis of well log data.

Cross-References
▶ Porous Medium

Bibliography
ASTM D2435 (2020) Standard test methods for one-dimensional consolidation properties of soils using incremental loading. American Society for Testing and Materials (ASTM)
Fowler A, Yang X (1998) Fast and slow compaction in sedimentary basins. Appl Math 59(1):365–385
Ghanbarian B, Hunt AG, Ewing RP, Sahimi M (2013) Tortuosity in porous media: a critical review. Soil Sci Soc Am J 77(5):1461
Giao PH, Trang PH, Hien DH, Ngoc PQ (2021) Construction and application of an adapted rock physics template (ARPT) for characterizing a deep and strongly cemented gas sand in the Nam Con Son Basin, Vietnam. J Nat Gas Sci Eng 94:104117. https://doi.org/10.1016/j.jngse.2021.104117
Glover PWJ (2020) Petrophysics MSc course notes, 376 p. Department of Geology and Petroleum Geology, University of Aberdeen, UK
Graton LC, Fraser HJ (1935) Systematic packing of spheres, with particular relation to porosity and permeability. J Geol 43:785–909
Nur A, Mavko G, Dvorkin J, Galmudi D (1998) Critical porosity: a key to relating physical properties to porosity in rocks. Lead Edge 17(3):357–362
Raymer LL, Hunt ER, Gardner JS (1980) An improved sonic transit time-to-porosity transform. SPWLA 21st annual logging symposium, 8–11 July 1980
Tiab D, Donaldson EC (2015) Petrophysics: theory and practice of measuring reservoir rock and fluid transport properties, 4th edn, 854 p. Gulf Professional Publishing, an imprint of Elsevier, USA and UK
Ulusay R (ed) (2015) The ISRM suggested methods for rock characterization, testing and monitoring: 2007–2014, 292 p. Springer International Publishing, Switzerland
Porous Medium
Jennifer McKinley
Geography, School of Natural and Built Environment, Queen's University, Belfast, UK
Definition
A porous medium can be defined as a material that contains spaces, which, when continuous in some way, enable the movement of fluids, air, and materials of different chemical and physical properties. The ability of a porous medium to permit the movement of fluids, air, or a mixture of different fluids is related to porosity and the variability of pore
characteristics in that the presence of interconnected pore spaces, and thus permeability properties of the material, enables the movement of fluids and accompanying materials.
Characteristics of Porous Medium
The characteristics of a porous medium are a result of variability in porosity and permeability, and the spatial continuity of these properties (McKinley and Warke 2006). Porosity as defined by Tucker (1991) is a measure of the amount of pore space in a porous medium and can be characterized as total or absolute porosity and effective porosity. Total or absolute porosity refers to the total void space within a porous medium, including void space within grains. Effective porosity can be described as the interconnected pore volume in a porous medium (McKinley and Warke 2006). Effective porosity is more closely related to permeability, which can be defined as the ability of a porous medium to transmit fluids. Permeability is related to the shape and size of pores or voids and of the pore connections (throats), and also to the properties of the fluids involved (i.e., capillary forces, viscosity, and pressure gradient). Permeability as calculated from Darcy's law, in simple terms, is a measure of how easily a fluid of a certain viscosity flows through a porous medium under a pressure gradient (Allen et al. 1988).
Depositional and Diagenetic Characteristics of Geological Porous Media
Geological materials exhibit inherited characteristics from their depositional, compaction, and cementation or crystallization history (McKinley and Warke 2006). This results in individual pores or voids which vary in size, shape, and arrangement, directing the movement of fluids, air, and accompanying materials along preferred pathways and at differential rates. Natural porous media such as rocks seldom retain their original porosity (McKinley and Warke 2006). The key factors that control porosity and permeability in geological porous media are primary depositional characteristics (including fabric features) and diagenetic features such as cements and salts (Worden 1998). Primary depositional processes produce fabric characteristics, which in turn may be further modified by processes such as compaction and cementation. This results in primary and secondary porosity. Primary porosity is developed as a sediment is deposited and includes inter- and intraparticle/granular porosity (Tucker 1991). Secondary porosity of a porous medium develops during diagenesis by dissolution or removal of soluble material and through tectonic movements producing fracturing. Fractures or vugs within geological materials may contribute substantially to the flow capacity (i.e., permeability properties) but contribute little to the absolute porosity of a porous medium. Secondary diagenetic precipitation, in the form of cements and salts, has the potential to seal fractures or vugs and reduce the permeability properties of a porous medium. The predominant cements, which affect sandstone porous media, comprise carbonates, clay minerals, and quartz cements. Aspects of carbonate, quartz, and clay cementation in sandstones have been comprehensively covered in three special publications (Morad 1998; Worden and Morad 2000, 2001). The variability of porosity in limestone porous media tends to be more erratic in type and distribution than for sandstones (Tucker 1991). Based on the seminal classification scheme of Choquette and Pray (1970), porosity types in carbonate porous media can be defined as fabric selective, where pores are defined by the fabric (grains and matrix) of the limestone (e.g., intercrystalline), and non-fabric selective, where porosity cuts across the rock fabric (e.g., fracture porosity). Stylolites in carbonate porous media can act as conduits for fluid movement or, conversely, can produce a reduction in the porosity of the porous medium through the accumulation of clays and insoluble residue. Porosity in crystalline porous media, including igneous and metamorphic rocks, occurs generally as a result of fracturing, granular decomposition, or dissolution, and may be accentuated by mineral alignment or banding.
Heterogeneity of Porous Medium
No naturally occurring porous medium is homogeneous, but some are relatively more homogeneous than others. The problem of nonuniformity or heterogeneity of a porous medium is inherent even at the pore scale, since individual pores vary in size, shape, and arrangement. Small-scale features such as pore geometry or laminae also affect the heterogeneity of a porous medium. By definition, a reservoir porous medium, such as sandstone, must contain pores or voids to enable the storage of oil, gas, and water, and these must be connected in some way to permit their movement. Fluids of differing physical properties (oil, gas, and water) move through heterogeneous porous media via preferred pathways and at differential rates. Primarily this is due to the variability in porosity, permeability, the poro-perm relationship, viscosity, tortuosity, interfacial tension (wettability), and spatial continuity of the porous medium. Knowledge of how the petrophysical properties of porosity and permeability vary throughout materials, and identifying the processes responsible for their reduction, is important in accurate reservoir modeling of porous media.
The challenge to studies of porous media description, and intrinsically of heterogeneity, can be summarized as to:
• Describe the geology with as much detail and as realistically as possible and in a quantitative approach.
• Assess the spatial distribution of heterogeneities including cements, salts, contaminants, etc., and their effect on fluid flow.
• Subdivide the porous medium into hydraulic or flow units (units that under a given driving force behave differently and in which the variation of properties is less than in the hydraulic or flow units above and below).
In addition, studies must go beyond data capture to using this information in the development of mathematical simulation models that depict porous medium behavior accurately and effectively, all of which combines to give a greater understanding of fluid flow processes in a porous medium to optimize recovery and aquifer capabilities. The issues which arise in porous medium description are well documented. The recurrent themes are:
• The detail required to adequately quantify the lateral variability of physical properties far surpasses the detail of sampling (interpolation, extrapolation, and informed interpretation are necessary).
• Many heterogeneities have a spatial distribution much less than the average well or sample spacing.
• Heterogeneity at small scale is homogenized, simplified, or smoothed over within mathematical modeling.
• Averaging of heterogeneity data for modeling purposes can lead to underestimation of their effect on porosity and permeability.
Conclusions
A porous medium contains spaces or voids that, when connected in some way, permit the movement of fluids, air, and associated materials. Porous media tend to be heterogeneous within a larger ordered framework, as a function of scale. At the pore and microscale, the sorting and packing of grains can be markedly variable. Random variations or heterogeneous elements at the pore scale may be sufficiently small to be considered homogeneous at a larger scale, e.g., the lamina, stratum, or bedding scale. Porous media, therefore, may appear to be both heterogeneous and homogeneous, depending on the scale of measurement being considered. In view of this, it is essential that the scale of measurement be clearly stated. The importance of depositional history in naturally occurring porous media and of diagenetic heterogeneities, including the influence of cementation and the diagenetic growth of minerals on porous media, needs to be fully assessed and recognized in any modeling approach.
Cross-References
▶ Porosity
Bibliography
Allen D, Coates G, Ayoub J, Carroll J (1988) Probing for permeability: an introduction to measurements with a Gulf Coast case study. Tech Rev 36(1):6–20
Choquette PW, Pray LC (1970) Geologic nomenclature and classification of porosity in sedimentary carbonates. AAPG Bull 54:207–250
McKinley JM, Warke P (2006) Controls on permeability: implications for stone weathering. Geol Soc Lond Spec Publ 271:225–236. https://doi.org/10.1144/GSL.SP.2007.271.01.22
Morad S (ed) (1998) Carbonate cementation in sandstones. International Association of Sedimentologists, Special Publication 26. Blackwell Science, Oxford
Tucker ME (1991) Sedimentary petrology: an introduction. Blackwell Scientific Publications, Oxford
Worden RH (1998) Dolomite cement distribution in a sandstone from core and wireline data: the Triassic fluvial Chaunoy Formation, Paris Basin. Geological Society, London, Special Publications 136(1):197–211. https://doi.org/10.1144/gsl.sp.1998.136.01.17
Worden RH, Morad S (eds) (2000) Quartz cementation in sandstones. International Association of Sedimentologists, Special Publication, vol 29. Blackwell Science, Oxford
Worden RH, Morad S (eds) (2001) Clay cementation in sandstones. International Association of Sedimentologists, Special Publication, vol 34. Blackwell Science, Oxford
Power Spectral Density
Abhey Ram Bansal¹ and V. P. Dimri²
¹Gravity and Magnetic Group, CSIR-National Geophysical Research Institute, Hyderabad, India
²CSIR-National Geophysical Research Institute, Hyderabad, Telangana, India
Definition
The power spectral density is used in many branches of the earth sciences to carry out various mathematical operations on geophysical data in order to understand earth processes and earth structure. The power spectral density is a representation of the power of a signal in the frequency domain. The representation of data in the frequency domain has found many applications.
Introduction
The geophysical measurements carried out in space/time represent the variations of physical measures in space/time. Exploration geophysics aims to find information about the subsurface distribution of physical properties from surface/air/ship/satellite measurements. The distribution of physical properties in the subsurface is further interpreted to find anomalous bodies containing hydrocarbons, minerals, or water, or to understand the crustal structure of the Earth. These measurements are often carried out at equal intervals along roads (profile mode) or in a grid pattern (2-D survey), depending on the study's aim. The study of variation with space and time is carried out to understand the subcrustal structure or the nature of seismic sources. The transformation from the space/time domain to the frequency domain is often useful, since many mathematical operations become easy to carry out in the frequency domain and are further useful for finding the physical properties of the subsurface of the Earth. A space/time series can be decomposed into a sum/integral of harmonic waves of different frequencies, known as Fourier analysis or the Fourier transform (FT). To carry out the FT, the time/space series should be of limited length, periodic, or satisfy the condition of absolute integrability, which can be defined as follows: if x(t) is a continuous function of space/time, then the integral of the square of the time/space series is finite (Bath 1974):

$$\int_{-\infty}^{\infty} |x(t)|^{2}\, dt < \infty \qquad (1)$$

A continuous time/space series can be described by the FT:

$$X(k) = \int_{-\infty}^{\infty} x(t)\, e^{-jkt}\, dt \qquad (2)$$

where k is the wavenumber = 2πf and f = frequency. The advantage of the FT is that we can carry out some operations in the frequency domain, which can then be changed back to the space/time series using the inverse Fourier transformation:

$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(k)\, e^{jkt}\, dk \qquad (3)$$

If x(t) is not a continuous function but has a finite number of values (x₀, x₁, x₂, ..., x_{N−1}) at times i = 0 to i = N − 1 with sampling interval Δt, then the Fourier transform X_m at the discrete frequencies f_m = m/(NΔt) is called the discrete Fourier transform (DFT):

$$X_m = \frac{1}{N} \sum_{i=0}^{N-1} x_i\, e^{-j(2\pi m i/N)} \quad \text{for } m = 0, \ldots, N-1 \qquad (4)$$

The quantity X_m is a complex number having real and imaginary components. Most of the time, we use the power spectral density P(k), which is defined as the square of the amplitude of the DFT:

$$P(k) = \frac{1}{N} \sum_{m=0}^{N-1} |X_m|^{2} \qquad (5)$$

The calculation of power spectral density from the data using the DFT is called the periodogram method. Cooley and Tukey (1965) introduced a fast algorithm to calculate the Fourier transform, called the fast Fourier transform (FFT). The other popular method, known as the Blackman and Tukey method (Blackman and Tukey 1958), calculates the power spectral density from the autocorrelation function based on the Wiener-Khintchine theorem. Blackman and Tukey's approach is also useful for estimating the power spectral density of a random distribution when the inequality condition of Eq. (1) is not met. Other high-resolution methods, e.g., the Maximum Entropy Method (MEM) (Burg 1967), the Maximum Likelihood Method (MLM) (Capon et al. 1967), and the Multi-Taper Method (MTM) (Thomson 1982), can also be applied to calculate the power spectral density (Dimri 1992).
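As a minimal numerical sketch of the periodogram method described above (Eqs. 4 and 5), the Python fragment below computes a power spectral density estimate with numpy's FFT; the synthetic white-noise series, its length, and the sampling interval are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dt = 1024, 1.0                        # number of samples and sampling interval (assumed)
x = rng.standard_normal(n)               # synthetic white-noise space/time series

# Discrete Fourier transform (Eq. 4 style) via the FFT of Cooley and Tukey
X = np.fft.rfft(x)
f = np.fft.rfftfreq(n, d=dt)             # frequencies f_m = m / (N * dt)

# Periodogram estimate of the power spectral density, one value per frequency
psd = (np.abs(X) ** 2) / n

# The log-log slope of PSD versus frequency is near zero for white noise
slope, _ = np.polyfit(np.log10(f[1:]), np.log10(psd[1:]), 1)
print(f"log-log slope ~ {slope:.2f} (near zero for white noise)")
```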
Application of Power Spectral Density (PSD)

Correlation of Space/Time Series
The power spectral density is also useful for understanding the behavior of a time/space series. Random and uncorrelated time/space series are known as white noises. White noises contain all frequencies, i.e., the slope of a straight line fitted to the plot of the log of frequency/wavenumber versus the log of power spectral density equals zero. The autocorrelation of a white noise series has a finite value only at zero lag and is zero at all other lags. However, time/space series are often not white noises but are correlated, and are then known as scaling noises. The power spectral density of scaling noises is frequency-dependent. The slope of a straight line fitted to the log of frequency/wavenumber versus the log of power spectral density defines the series' correlation: the higher the value of the slope, the higher the correlation of the time/space series. Recent measurements from boreholes have shown that the distribution of physical properties in the subsurface is scaling (Bansal and Dimri 2014). The white noise assumption is far from reality and was adopted because of mathematical simplicity and the lack of detailed information about the distribution of physical properties within the crust. Scaling noises find many applications in the interpretation of geophysical data, e.g., depth estimation from gravity and magnetic data, regional-residual separation of gravity and magnetic measurements, inversion of geophysical data, predictive deconvolution, ore deposits, and earthquake distribution in space and time (Turcotte 1997). The power spectral density of scaling noises is expressed as P(k) ∝ k^(−β), where P(k), k, and β are the power spectral density, wavenumber, and scaling exponent, respectively. A scaling exponent equal to zero represents white noise. The scaling exponent can be further related to the fractal dimension (D), valid for β between 1 and 3:

$$\beta = 5 - 2D \qquad (8)$$

The scaling noises can be further related to fractional Gaussian and Brownian walks. The scaling noises are also useful for defining a self-similar or self-affine fractal (Turcotte 1997).

Depth Estimation from Gravity and Magnetic Data
The estimation of the depth of anomalous sources from gravity/magnetic measurements by spectral methods is expressed, for a white noise distribution of sources, as (Spector and Grant 1970):

$$P(k) = A\,e^{-2dk} \qquad (9)$$

The white noise distribution of sources leads to an overestimation of the depths of anomalous sources (Bansal and Dimri 2001, 2014). A scaling noise distribution of sources results in the modification of the above equation as:

$$P(k) = B\,k^{-\beta}\,e^{-2kd} \qquad (10)$$

Depth estimates for the scaling distribution of sources are found to be close to reality. The scaling distribution of sources has found many applications in estimating the depth of anomalous sources from gravity and magnetic measurements in different parts of the world.
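A compact illustration of the white-noise case of Eq. 9: since ln P(k) = ln A − 2dk, the source depth d can be read from the slope of ln P(k) versus wavenumber. The spectrum and wavenumber band in the sketch below are assumed values used only to demonstrate the fit, not data from the text.

```python
import numpy as np

def depth_from_spectrum(k, power):
    """Estimate source depth from ln P(k) = ln A - 2 d k (Eq. 9): d = -slope / 2."""
    slope, _intercept = np.polyfit(k, np.log(power), 1)
    return -slope / 2.0

# Hypothetical radially averaged power spectrum of a gravity/magnetic grid
k = np.linspace(0.05, 0.5, 10)           # wavenumber band (rad/km), assumed
true_depth = 4.0                         # km, used only to synthesize the example
power = 12.0 * np.exp(-2.0 * true_depth * k)

print(f"estimated depth ~ {depth_from_spectrum(k, power):.2f} km")
```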
Estimation of Density/Susceptibility Distribution from Gravity and Magnetic Observations
The Fourier analysis of space/time series is useful for solving mathematical operations such as convolution/deconvolution. For a linear system, the convolution operator is used to find an output y(t) from the input signal x(t) and the earth's response h(t). The convolution operator for a discrete case is defined as:

$$y_t = \sum_i x_i\, h_{t-i} \qquad (6)$$

These convolution operators exist in almost all geophysical studies. We want to study the output obtained by sending a known input, e.g., estimating the gravity/magnetic potential from the density distribution and a structural function. In the frequency domain, the convolution of gravity/magnetic potentials (Eq. 6) is converted into a multiplication using the well-known convolution theorem:

$$U_g(f) = r(f)\, R(f) \qquad (7)$$

where U_g(f) is the Fourier transform of the gravity/magnetic potential, r(f) is the Fourier transform of the density/susceptibility distribution, and R(f) is the Fourier transform of the geometric function. In other words, the Fourier transform of the gravity/magnetic potential is a combination of two multiplicative factors: (1) a function of depth and thickness, and (2) a function of the distribution of density/susceptibility (Blakely 1995).
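The multiplicative form behind Eqs. 6 and 7 can be checked numerically: the DFT of a (circular) convolution equals the product of the DFTs. The short sketch below uses two hypothetical sequences purely to illustrate this property.

```python
import numpy as np

x = np.array([1.0, 2.0, 0.0, -1.0])      # hypothetical input (e.g., density contrast samples)
h = np.array([0.5, 0.25, 0.0, 0.0])      # hypothetical earth response

# Circular convolution y_t = sum_i x_i * h_{t-i}, evaluated directly (Eq. 6 style)
n = len(x)
y = np.array([sum(x[i] * h[(t - i) % n] for i in range(n)) for t in range(n)])

# Convolution theorem: FFT(y) equals FFT(x) * FFT(h), i.e., multiplication in the frequency domain
lhs = np.fft.fft(y)
rhs = np.fft.fft(x) * np.fft.fft(h)
print(np.allclose(lhs, rhs))             # True
```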
Upward and Downward Continuation
Gravity and magnetic measurements are carried out on the Earth's surface, on ships at sea, in airborne surveys over sea and land, or at satellite elevation, i.e., at different heights. To bring these measurements to the same elevation, upward continuation of the measured data is frequently carried out. Another advantage of upward continuation is that it attenuates near-surface anomalies and enhances the more regional anomalies at the expense of the near-surface ones. Downward continuation is carried out to enhance the effect of shallower bodies, in contrast to upward continuation. Upward continuation is a smoothing operation, whereas downward continuation is an unsmoothing operation (Blakely 1995); a small amount of noise in the data causes large variations during downward continuation. The computation of upward continuation is complicated in the space domain, whereas in the frequency domain it becomes a multiplication of the Fourier transform of the measured field by an exponential term. The field can then be brought back to the space domain by taking the inverse transform.
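A minimal sketch of upward continuation for a single profile, assuming the standard wavenumber-domain operator exp(−Δz·|k|) described by Blakely (1995); the profile shape, station spacing, and continuation height are illustrative assumptions.

```python
import numpy as np

def upward_continue(profile, dx, dz):
    """Upward-continue a 1-D potential-field profile by dz using exp(-dz * |k|)."""
    spectrum = np.fft.rfft(profile)
    k = 2.0 * np.pi * np.fft.rfftfreq(len(profile), d=dx)   # wavenumber (rad per distance unit)
    return np.fft.irfft(spectrum * np.exp(-dz * np.abs(k)), n=len(profile))

# Hypothetical anomaly profile sampled every 0.5 km, continued upward by 2 km
x = np.arange(0.0, 50.0, 0.5)
profile = 10.0 / (1.0 + ((x - 25.0) / 3.0) ** 2)            # smooth synthetic anomaly
smoothed = upward_continue(profile, dx=0.5, dz=2.0)
print(profile.max(), smoothed.max())     # continuation attenuates and smooths the peak
```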
Summary and Conclusions
The power spectral density has found wide application in understanding the correlation of time series, estimating the density/susceptibility distribution within the crust, estimating depths from gravity and magnetic data, and carrying out the downward and upward continuation of potential field data.
Bibliography
Bansal AR, Dimri VP (2001) Depth estimation from the scaling power spectral density of nonstationary gravity profile. Pure Appl Geophys 158:799–812. https://doi.org/10.1007/PL00001204
Bansal AR, Dimri VP (2014) Modelling of magnetic data for scaling geology. Geophys Prospect 62(2):385–396
Bath M (1974) Spectral analysis in geophysics. Elsevier Scientific, Amsterdam, p 563
Blackman RB, Tukey JW (1958) The measurement of power spectra from the point of view of communication engineering. Dover Publications, New York, p 190
Blakely RJ (1995) Potential theory in gravity and magnetic applications. Cambridge University Press
Burg JP (1967) Maximum entropy spectral analysis. Presented at the 37th annual international SEG meeting, November 1, Oklahoma City
Capon J, Greenfield RJ, Kolker RJ (1967) Multidimensional maximum-likelihood processing of a large aperture seismic array. Proc IEEE 55:192–211
Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19:297–301
Dimri VP (1992) Deconvolution and inverse theory: application to geophysical problems. Elsevier, Amsterdam
Spector A, Grant FS (1970) Statistical model for interpreting aeromagnetic data. Geophysics 35(2):293–302
Thomson DJ (1982) Spectrum estimation and harmonic analysis. Proc IEEE 70:1055–1096
Turcotte DL (1997) Fractals and chaos in geology and geophysics, 2nd edn. Cambridge University Press
Predictive Geologic Mapping and Mineral Exploration
Frits Agterberg
Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada
Definition During the nineteenth century and most of the twentieth century, the geology map was the most important tool for mineral exploration. More recently, the emphasis to find new mineral deposits has shifted to the use of geophysical and geochemical methods. Nevertheless, the geologic map augmented by remote sensing at the surface of the Earth and the use of exploratory boreholes remains indispensable for target selection in mineral exploration. In this chapter, the emphasis will be on lesser known but potentially powerful new methods of predictive geologic mapping for mineral exploration, because the subject is also covered in other chapters.
Introduction
The first geologic map was published in 1815. This feat has been well documented by Winchester (2001) in his book The Map that Changed the World – William Smith and the Birth of Modern Geology. According to the concept of stratigraphy, strata were deposited successively in the course of geologic time, and they are often uniquely characterized by fossils such as ammonites that are strikingly different from age to age. Additionally, there are different types of igneous and metamorphic rocks. Geologic maps turned out to be essential tools for mineral exploration. For example, during the nineteenth century, coal became important in England and other countries. Its occurrence is largely restricted to the Carboniferous Period, which occurs in many different countries and is readily recognizable in outcrops at the surface of the Earth, although it covers only a small portion of the total surface area in the regions where it occurs. Today different countries in the world have fairly complete geological maps synchronized according to international standards (cf. portal.onegeology.org). It is remarkable that the usefulness of geologic maps remained virtually unknown until approximately 1800. There are many other examples of geologic concepts that became only gradually accepted more widely, although they usually had been proposed much earlier in one form or another by individual scientists. The best-known example is plate tectonics: Alfred Wegener (1966) had demonstrated the concept of continental drift fairly convincingly as early as 1912, but this idea only became acceptable in the early 1960s. One reason that this theory initially was rejected by most geoscientists was the lack of a plausible mechanism for the movement of continents. It is important to keep in mind that geologic maps are two-dimensional and our ability to extrapolate downward from the surface of the Earth to create three-dimensional realities is severely limited. Rocks were formed at different times by means of different processes. Thus, a single internationally accepted geologic time scale is important (cf. Gradstein et al. 2020) in addition to sound geoscientific theory. The following two examples illustrate some of the difficulties originally and currently experienced in creating 3D geologic realities by downward extrapolation using data observed at the surface of the Earth. Harrison (1963) has pointed out that, although the outcrops in an area remain more or less the same in the immediate past, a geologic map constructed from them can change significantly over time when new geoscientific concepts become available. A striking example is shown in Fig. 1. Over a 30-year period, the exposed bedrock in this study area on the Canadian Shield that was repeatedly visited by geologists had remained nearly unchanged. However, the geologic maps and their legends created by different geologists are entirely different. These discrepancies reflect
Predictive Geologic Mapping and Mineral Exploration, Fig. 1 Two geologic maps for the same area on the Canadian Shield published at different times. As pointed out by Harrison (1963), between 1928 (a) and 1958 (b) there was development of conceptual ideas about
the nature of metamorphic processes that resulted in geologists constructing different maps based on the same observations (Source: Agterberg 2014, Fig. 1)
changes in the state of geologic knowledge at different points in time. In another example (Fig. 2), according to Nieuwenkamp (1968), a typical downward extrapolation problem is shown with results strongly depending on initially erroneous theoretical considerations. In the Kinnehulle area in Sweden, the tops of the hills consist of basaltic rocks; sedimentary rocks are exposed on the slopes and granites and gneisses in the valleys. The first projections into depth for this
area were made by a prominent geologist (Von Buch 1842). It can be assumed that today most geologists would quickly arrive at the third 3D reconstruction (Fig. 2c) by Westergård et al. (1943). At the time of von Buch, it was not yet common knowledge that basaltic flows can form extensive flat plates within sedimentary sequences. The projections in Fig. 2a and b reflect A.G. Werner’s pansedimentary theory of “Neptunism” according to which all
Predictive Geologic Mapping and Mineral Exploration, Fig. 2 Geological observations made at the surface of the Earth have to be extrapolated downwards in order to obtain a full 3-dimensional representation of reality. This can be an error-prone process, as pointed out by Nieuwenkamp (1968). Sections a and b for an area in Sweden are modified after von Buch (1842), illustrating his genetic interpretation that
was based on combining Werner’s original theory of Neptunism with von Buch’s own firsthand knowledge of volcanoes including Mount Etna in Sicily with a basaltic magma chamber. He was not aware of the fact that basaltic flows can be widespread covering large areas, the genetic model used by Westergård et al. (1943) for constructing Section c (Source: Agterberg 1974, Fig. 1)
rocks on Earth were deposited in a primeval ocean. Nieuwenkamp (1968) demonstrated that this theory was related to the philosophical concepts of F.W.J. Schelling and G.W.F. Hegel. When Werner’s view was criticized by other early geologists who assumed processes of change took place during geologic time, Hegel publicly supported Werner by comparing the structure of the Earth with that of a house. One looks at the complete house only, and it is trivial that the basement was constructed first and the roof last. Initially, Werner’s conceptual model provided an appropriate and temporarily adequate classification system, although von Buch, who was a student of Werner, rapidly ran into problems during his attempts to apply Neptunism to explain the occurrences of rock formations in different parts of Europe. For the Kinnehulle area (Fig. 2), von Buch assumed that the primeval granite became active later changing sediments into gneisses, while the primeval basalt became a source for hypothetical volcanoes. Probably the most important usage of geologic maps is that they can be used to delineate target areas for the discovery of new mineral deposits by drilling boreholes in places with a relatively high probability of success in such discovery. In addition to the geologic maps, frequent use is made of remote sensing and geophysical and geochemical data. These types of information also can be used to estimate the mineral potential of larger areas, i.e., total number of undiscovered mineral deposits of different types. In the following sections, the
topics to be discussed are (1) the use of multivariate statistical methods to estimate the probability of occurrence of mineral deposits; (2) use of jackknife method to reduce bias instead of defining “control” areas; (3) weights-of-evidence (WofE) method; and (4) singularity analysis. Topics (3) and (4) are well covered in other chapters and will only be briefly discussed in this chapter.
Mineral Potential Maps
An early example of regional resource prediction is shown in Fig. 3 (from Agterberg and David 1979). Amounts of copper in existing mines and prospects had been related to lithological and geophysical data in the Abitibi area on the Canadian Shield to construct copper and zinc potential maps (Agterberg et al. 1972). During the 1970s there was extensive mineral exploration for these metals in this region that resulted in the discovery of seven new large copper deposits shown in black on Fig. 3. This example illustrates some of the advantages and drawbacks of the application of statistical techniques to predict regional mineral potential from geoscience map data. The later discoveries in the Abitibi area fit in with the prognostic contour pattern that was based on the earlier (pre-1968) discovered copper deposits (open circles in Fig. 3). The prognostic contours in Fig. 3 are for the estimated number of (10 × 10 km) "control cells" with known copper
Predictive Geologic Mapping and Mineral Exploration, Fig. 3 Copper potential map, Abitibi area on the Canadian Shield. Contours and deposit locations are from Agterberg et al. (1972). Contour
value represents expected number of (10 km × 10 km) cells per (40 km × 40 km) unit area containing one or more copper ore deposits (Source: Agterberg and David 1979)
deposits within 16 times larger (40 × 40 km) cells. When the contour map was constructed in 1972, there were 27 control cells. The probability that any (10 km × 10 km) cell in the study area would contain one or more mineable copper deposits was assumed to satisfy a Bernoulli random variable with parameter p. Any contour value on the map can therefore be regarded as the mean x̄ = np of a binomial distribution with variance np(1 − p), where n = 16. The corresponding amount of copper would be the sum of x values drawn at random from the exceedingly skewed size-frequency distribution for amounts of copper in the 27 control cells. The resulting uncertainty for most contour values therefore was, and remains, exceedingly large. The probabilities p for the 644 (10 km × 10 km) cells used for the example of Fig. 3 were estimated by stepwise multiple regression (Draper and Smith 1966). The value of the dependent variable was set equal to the logarithm (base 10) of the total amount of copper per control cell for the 27 control cells, and equal to zero for the (644 − 27 =) 617 cells without copper deposits known in 1968. A total of 55 geological and geophysical variables were used as independent variables in this application, of which 26 were retained after using the stepwise regression method. Figure 4 illustrates that simple binary combinations of the independent variables used can already show a pattern that is positively correlated with the pattern of the control cells. Initial estimates of the dependent variable were biased because of probable occurrences of undiscovered deposits in many of the (10 km × 10 km) cells that had been less explored. This bias was corrected by multiplying all initial
estimates by a factor F = 2.35 representing the sum of all observed values divided by the sum of the corresponding initial estimates within a well-explored area consisting of 50 cells within the Timmins and Noranda-Rouyn mining camps jointly used as a "control area." The total amount of mined or mineable copper contained in the entire study area (used to construct Fig. 3) amounted to 3.12 million tons. Multiplication of this amount by F = 2.35 gave 7.33 million tons as an estimate of the total amount of copper in the entire study area. After 9 years of intensive exploration subsequent to the construction of the contours in Fig. 3, 5.23 million tons of this hypothetical total had been discovered (cf. Agterberg and David 1979). The metal potential maps published by Agterberg et al. (1972) created a significant amount of discussion in the scientific literature of which three publications will be briefly discussed here. Tukey (1972) suggested that, since probabilities cannot be negative or greater than 1, logistic regression should be considered as an alternative to using the general linear model in studies of this type. This suggestion was followed up in Agterberg (1984) for (A) copper in volcanogenic massive sulfide deposits and (B) nickel in magmatic nickel-copper deposits in a larger part of the Canadian Shield that includes the area used for Fig. 3 (see Fig. 5). The two types of copper deposits (A and B) were analyzed separately because they have different origins (cf. Pirajno 2010). Within the smaller Abitibi area, the pattern in Fig. 5a resembles that of the original copper potential map (Fig. 3). This is because the number of volcanogenic massive sulfide deposits significantly exceeds the number of magmatic nickel-copper deposits in this region. The general linear model failed to
Predictive Geologic Mapping and Mineral Exploration, Fig. 4 The Abitibi area on the Canadian Shield in the provinces of Ontario and Quebec had been subdivided into 643 (10 km × 10 km) cells. Top diagram (a) shows 27 copper-deposit rich cells with estimated tonnages of copper (rounded values of logarithm (base 10)). Bottom
diagram (b) shows cells with co-occurrence of acid volcanics and Bouguer anomaly above average. The two patterns are strongly correlated indicating that copper deposits tend to occur in places where acid volcanics occur on top of basic volcanics with relatively high specific gravity (Source: Agterberg 1974, Fig. 119)
produce meaningful mineral potential contours for the relatively few magmatic nickel-copper deposits that are closely associated with mafic and ultramafic intrusions on the Canadian Shield (Agterberg 1974). This was because relatively many estimated probabilities fell outside the [0, 1] interval. Thus, for relatively rare types of deposits, logistic regression produces better results than ordinary multiple regression. Wellmer (1983) pointed out that one of the seven new discoveries shown in Fig. 3 is the Timmins nickel-copper deposit with 3,500,000 tons of ore. It is a magmatic nickel-copper deposit, not a volcanogenic massive sulfide deposit; it was discovered in 1976 and occurs on the westernmost peak in Fig. 3, which was delineated in 1972, relatively far away from any earlier discovered copper deposits in the Abitibi area. Since this peak is not clearly delineated in Fig. 5b obtained by logistic regression in 1974, this indicates that the general linear model locally can produce better results than logistic regression. Its final result can be regarded simply as the sum of 27 separate copper potential evaluations each based on
information for a single control cell only. This property of the general linear model is important because it allows the estimation of probable occurrences of mineral deposits that are of different genetic types. Finally, in another discussion paper, Tukey (1984) pointed out that on the peaks of maximal copper potential in Fig. 3, the known copper deposits tend to occur on the flanks of the peaks instead of in their centers. This observation was confirmed by a statistical significance test (Agterberg 1984), but a geoscientific explanation of this feature has not yet been found.
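As a sketch of the logistic-regression alternative suggested by Tukey (1972), the fragment below fits cell probabilities to two binary map layers. The cell data are synthetic, and the use of scikit-learn is an assumption of this illustration, not part of the original Abitibi study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_cells = 644                                        # number of cells, as in the Abitibi example

# Two hypothetical binary explanatory layers per cell (e.g., acid volcanics, Bouguer anomaly)
X = rng.integers(0, 2, size=(n_cells, 2)).astype(float)

# Synthetic "deposit present" indicator, more likely where both layers are present
logit = -4.0 + 1.5 * X[:, 0] + 1.2 * X[:, 1]
y = rng.random(n_cells) < 1.0 / (1.0 + np.exp(-logit))

model = LogisticRegression().fit(X, y)
p = model.predict_proba(X)[:, 1]                     # probabilities stay inside [0, 1]
print(f"max cell probability: {p.max():.2f}")
```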
Bias and the Jackknife
In order to compare various geoscientific trend surface and kriging applications with one another, Agterberg (1970) had randomly divided the input data set for an example study area into three subsets: two of these subsets were used for control, and results derived for the two control sets were
then applied to a third "blind" subset in order to see how well results for the control subsets could predict the values in the third subset. In his comments on this approach, Tukey (1970) stated that this form of cross-validation could indeed be used, but that a better technique would be the then newly proposed jackknife method. The following brief explanation of this method is based on Efron (1982). In geoscientific mineral potential applications, it can help to eliminate bias due to great variations in the regional intensity of exploration, a problem that in the previous section was solved by defining a control area consisting of well-explored mining camps (bias factor F = 2.35 in the Abitibi area). For cross-validation it has become common practice to leave out one data point (or a small number of data points) at a time and fit the model using the remaining points to see how well the reduced data set does at the excluded point (or set of points). The average of all prediction errors then provides the cross-validated measure of the prediction error. Cross-validation, the jackknife, and the bootstrap are three closely related techniques. Efron (1982, Chapter 7) discusses their relationships in a regression context, pointing out that, although the three methods are close in theory, they generally yield different results in practical applications. The following application was previously described in Agterberg (2021).

Predictive Geologic Mapping and Mineral Exploration, Fig. 5 Occurrences of (a) copper-zinc deposits and (b) magmatic nickel-copper deposits in (40 km × 40 km) cells on part of the Precambrian Canadian Shield. The underlying probabilities for (10 km × 10 km) cells were estimated by logistic instead of linear regression for an area larger than that used for Fig. 4. Note the similarity of the pattern for the central part of (a) with the pattern of Fig. 4. Logistic regression gave better results than linear regression for the relatively few nickel-copper deposits of (b) (Source: Agterberg 1984, Fig. 3)

For a set of n independent and identically distributed (iid) data, the standard deviation of the sample mean $\bar{x}$ satisfies

$$s(\bar{x}) = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}{n(n-1)}}.$$

This result cannot be extended to other estimators such as the median. However, the jackknife and bootstrap can be used to make this type of extension. Suppose $\bar{x}_i = \frac{n\bar{x} - x_i}{n-1}$ represents the sample average of the same data set but with the data point $x_i$ deleted. Let $\bar{x}_{JK}$ represent the mean of the n new values $\bar{x}_i$. The jackknife estimate of the standard deviation then is

$$s_{JK} = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\left(\bar{x}_i - \bar{x}_{JK}\right)^{2}},$$

and it is easy to show that $\bar{x}_{JK} = \bar{x}$ and $s_{JK} = s(\bar{x})$. The jackknife was originally invented by Quenouille (1949) under another name and with the purpose of obtaining a nonparametric estimate of the bias associated with some types of estimators. Bias can be formally defined as $\mathrm{BIAS} = E_F[\hat{\theta}] - \theta(F)$, where $E_F$ denotes expectation under the assumption that the n quantities were drawn from an unknown probability distribution F; $\hat{\theta} = \theta(\hat{F})$ is the estimate of a parameter of interest, with $\hat{F}$ representing the empirical probability distribution. Quenouille's bias estimate (cf. Efron 1982, p. 5) is based on sequentially deleting values $x_i$ from a sample of n values to generate different empirical probability distributions $\hat{F}_i$, each based on (n − 1) values, resulting in the estimates $\hat{\theta}_i = \theta(\hat{F}_i)$. Writing $\hat{\theta}_{(\cdot)} = \sum_{i=1}^{n}\hat{\theta}_i / n$, Quenouille's bias estimate then becomes $(n-1)\left(\hat{\theta}_{(\cdot)} - \hat{\theta}\right)$, and the bias-corrected "jackknifed estimate" of θ is $\tilde{\theta} = n\hat{\theta} - (n-1)\hat{\theta}_{(\cdot)}$. This estimate is either unbiased or less biased than $\hat{\theta}$.
Predictive Geologic Mapping and Mineral Exploration
in it. This 50-cell "control area" can be used to estimate the number of undiscovered deposits in the 50-cell "target area" representing the other half of the study area, which is relatively unexplored. Suppose further that the mineral deposits of interest are contained within a "favorable" rock type that occurs in 5 cells of the control area as well as in 5 cells of the target area. Because 2 of these 5 cells in the control area contain a known ore deposit of interest, it is reasonable to assume that the target area would contain 2 undiscovered deposits as well. Thus, the entire study area probably contains 4 deposits. Obviously, it would not be good to assume that it contains only two deposits, which would constitute a severely biased estimate. Application of the "leave-one-out" jackknife method to the 50-cell study area results in two biased estimates that are both equal to a single deposit. This also translates into the biased prediction of two deposits in the entire study area. By using the jackknife theory summarized in the preceding section, it is quickly shown that the jackknifed bias in this biased estimate also is equal to two deposits. The use of the jackknife for the
study area, therefore, results in four deposits of which the two undiscovered deposits would occur within the cells with favorable rock type in the target area. A second, more realistic example is shown in Fig. 6. The top diagram (Fig. 6a) is for the same study area that was used for Fig. 3, but the contour values in Fig. 6 are for smaller (30 km × 30 km) squares. The bottom diagram (Fig. 6b) shows a result obtained by using the jackknife method that produces nearly the same pattern. For both Figs. 3 and 6a, the contour values were obtained by multiple regression in which the dependent variable was the amount of copper per cell and the explanatory variables consisted of lithological composition and geophysical data as discussed previously. For the application of the jackknife (Agterberg 1973), the 35 control cells in the study area were divided into 7 groups each consisting of 5 cells, and these groups were deleted successively to obtain the 7 biased estimates required, with the final jackknife estimate shown in Fig. 6b, which is almost the same as Fig. 6a. The fact that two different approaches produced similar results indicates that both
Predictive Geologic Mapping and Mineral Exploration, Fig. 6 (a and b) Comparison of copper potential maps for (30 km × 30 km) cells in the Abitibi area derived by (a) ordinary multiple regression using a "control area" with F = 1.828, and (b) the jackknife method without the assumption of the existence of a "control area" (Source: Agterberg 1973, Fig. 4)
Predictive Geologic Mapping and Mineral Exploration, Table 1 Comparison of ten (10 km × 10 km) copper cell probabilities in the Abitibi area (for location coordinates, see Agterberg et al. 1972)

Location  p(original)  s.d.(1)  p(jackknife)  s.d.(2)
32/62     0.45         0.50     0.44          0.08
16/58     0.33         0.47     0.32          0.04
17/58     0.39         0.49     0.38          0.05
18/58     0.01         0.10     0.00          0.06
16/59     0.35         0.48     0.36          0.04
17/59     0.33         0.47     0.33          0.14
18/59     0.37         0.48     0.40          0.13
16/60     0.43         0.50     0.47          0.04
17/60     0.06         0.24     0.00          0.06
18/60     0.03         0.17     0.06          0.08
Predictive Geologic Mapping and Mineral Exploration, Fig. 7 Expected total amounts of copper for ore-rich cells predicted in Fig. 6a. Contour values are logarithms (base 10) for (30 km × 30 km) unit cells (Source: Agterberg 1973, Fig. 5a)
methods of prediction of undiscovered copper resources are probably valid. The two methods also yield similar estimates for the probabilities of cells as is illustrated in Table 1 (see Agterberg 2014, for the exact locations of these cells). Standard deviations equal to {p (1 – p)}^0.5 estimated by the jackknife method are shown in the last column of Table 1. It is noted that one of the estimated jackknife probabilities is negative, although it is not significantly less than 0. Problems of this type can be avoided by using logistic regression. However, the general linear model used to estimate probabilities in this kind of application can have relative advantages (cf. Agterberg 2014).
Estimating Total Amounts of Metal from Mineral Potential Maps
In a study described in Agterberg (1973), the Abitibi study area contained 35 (10 km × 10 km) cells with one or more large copper deposits. Two types of multiple regressions were carried out with the same explanatory variables. First the
dependent variable was set equal to 1 in the 35 control cells, and then it was set equal to a logarithmic measure (base 10) of short tons of copper per control cell. Suppose that estimated values for the first regression are written as Pi and those for the second regression as Yi. Both sets of values were added for overlapping square blocks of cells to obtain estimates of expected values for (30 km × 30 km) unit cells. In Fig. 7 the ratio Yi* = Yi/Pi is shown as a pattern superimposed on the pattern for the Pi values only. The values of Yi* cannot be estimated when Yi and Pi are both close to zero. Little is known about the precision of Yi* for Pi < 0.5. These values (Yi*) should be transformed into estimated amounts of copper per cell, here written as Xi. Because of the extreme positive skewness of the size-frequency distribution for amounts of copper per cell (Xi), antilogs (base 10) of the values of Yi* as observed in control cells only were multiplied by the constant $c = \bar{X}_i / \overline{10^{Y_i}}$ in order to reduce bias under the assumption of approximate lognormality. The pattern of Fig. 7 is useful as a suggested outline of subareas where the largest volcanogenic massive sulfide deposits are more likely to occur.
Predictive Geologic Mapping and Mineral Exploration
Weights-of-Evidence Modeling Weights-of-evidence modeling is a technique to predict the occurrence of discrete events such as mineral deposits, landslides, or earthquakes from digital geoscience map data. The statistical approach in this method was originally based on the approach taken by Spiegelhalter and Knill-Jones (1984) in their medical GLADYS expert system (cf. Agterberg 1989). WofE became widely used by mineral exploration companies and other organizations after the publication of BonhamCarter (1994)’s book entitled Geographic information systems for geoscientists: modeling with GIS. Since the approach is covered in other chapters, the discussion in this section will be restricted to basic principles and a single example of an early application to gold deposits in Nova Scotia (BonhamCarter et al. 1988). Suppose that occurrences of mineralization, coded as 1 for presence and 0 for absence in a map pattern called D, are to be related to a single map layer that is available in digital form. When there is a single map pattern B, the odds O(D|B) for the occurrence of mineralization if B is present is given by the ratio of the following two expressions of Bayes’ rule: PðBjDÞPðDÞ ÞPðDÞ PðDjBÞ ¼ PðBjD ; P DjB ¼ where the set D PðBÞ PðBÞ represents the complement of D. Consequently, ln O(D| B) ¼ ln O(D) þ W+where the positive weight for Þ presence of B is W þ ¼ ln PPðBjD . The negative weight for ðBjDÞ P BjD ð Þ the absence of B is W ¼ ln P BjD . When there are two map ð
Þ
ÞPðDÞ patterns: PðDjB \ CÞ ¼ PðB\CjD ; P DjB \ C ¼ PðB\CÞ PðB\CjDÞPðDÞ . Conditional independence of D with respect to PðB\CÞ
B and C implies: P(B \ C| D) ¼ P(B| D)P(C| D); P B \ CjD ¼ P BjD P CjD . ÞPðCjDÞ Consequently, PðDjB \ CÞ ¼ PðDÞ PðBjD ; P DjB \ C ¼ PðB\CÞ PðBjDÞPðCjDÞ P D . PðB\CÞ
From PðDjB\CÞ PðDjB\CÞ
these ¼
two
PðDÞPðBjDÞPðCjDÞ . PðDÞPðBjDÞPðCjDÞ
equations, it follows that This expression is equivalent to
þ ln OðDjB \ CÞ ¼ ln OðDÞ þ W þ 1 þ W 2 . The posterior logit on the left of this expression is the sum of the prior logit and the weights of the two map layers. The approach is only valid if the map layers are wholly or conditionally independent, and this hypothesis should be tested in applications. The approach can be extended to include other map layers. Various conditional independence tests have been developed (see, e.g., Agterberg 2014). Approximate variances of the weights can be obtained from:
s2 ðW þ Þ ¼
1 1 1 1 þ ; s2 ðW Þ ¼ þ nðbd Þ n bd n bd n bd
where the denominators of the terms on the right sides represent numbers n(..) of unit cells with map layer and
1103
deposit present or absent. Spiegelhalter and Knill-Jones (1984) had used similar asymptotic expressions for variances of weights in their GLADYS expert system. Using p(..) to denote probabilities, the following contingency table can be created: P¼
pðbd Þ p bd
p bd p bd
n bd =n n bd =n
nðbd Þ=n =n n bd
If there is no spatial correlation between deposit points and map pattern: P¼
pðbÞ pðd Þ pðbÞ p p b pðd Þ p b p
d d
where p(b) is the proportion of study area occupied by map layer B and p(d ) is the total number of deposits divided by total area, and with similar explanations for the other proportions. A chi-square test for goodness of fit can be applied to test the hypothesis that there is no spatial correlation between deposit points and map using pattern. It is equivalent to using the z-test to be described in the next paragraph. The variance of the “contrast” C ¼ is: s2 ðCÞ ¼
1 1 1 1 þ þ : þ nðbd Þ n bd n bd n bd
One of the first WofE applications is 68 gold deposits in a study area of 2,591 km2 in Nova Scotia (Bonham-Carter et al. 1988). Weights, contrasts, and their standard deviations for nine map patterns used in Bonham-Carter et al. (1988) for this same study area are shown in Table 2. The last column in this table is the “studentized” value of C, for testing the hypothesis that C ¼ 0. Values greater than 1.96 indicate that this hypothesis can be rejected at a level of significance α ¼ 0.05. For more information on the validity of the conditional independence tests, see Agterberg (2014). For an alternative approach to solve the same type of problem, see Baddeley et al. (2021). Figure 8 shows the posterior probability map using the weights of Table 2. Some of the areas with the largest posterior probabilities contain known gold deposits but others do not. These other areas are of interest for further exploration, because they may contain undiscovered gold deposits (cf. Bonham-Carter et al. 1990). Gold production and reserved figures were available for the 68 gold deposits used as input for this study. Figure 9 is a plot of posterior probability against cumulative area showing gold mines that were producing in 1990 as circles whose radii reflect magnitudes of production in 1990. There appears to be a positive correlation between production and posterior probability.
P
1104
Predictive Geologic Mapping and Mineral Exploration
Predictive Geologic Mapping and Mineral Exploration, Table 2 Weights, contrasts and their standard deviations for predictor map shown in Fig. 8 (Source: Bonham-Carter et al. 1990, Table 5.1) Goldenville Fm Anticline axes Au, biogeochem. Lake sed. Signature Golden-Hal contact Granite contact NW lineaments Halifax Fm. Devonian Granite
W+ 0.3085 0.5452 0.9045 1.0047 0.3683 0.3419 0.0185 1.2406 1.7360
s(W) 0.1280 0.1443 0.2100 0.3263 0.1744 0.2932 0.2453 0.5793 0.7086
W 1.4689 0.7735 0.2812 0.1037 0.2685 0.0562 0.0062 0.1204 0.1528
s(W) 0.4484 0.2370 0.1521 0.1327 0.1730 0.1351 0.1417 0.1257 0.1248
C 1.7774 1.3187 1.1856 1.1084 0.6368 0.3981 0.0247 1.4610 1.8888
s(C) 0.4663 0.2775 0.2593 0.3523 0.2457 0.3228 0.2833 0.5928 0.7195
C/s(c) 3.8117 4.7521 4.5725 3.1462 2.5918 1.2332 0.0872 2.4646 2.6253
Predictive Geologic Mapping and Mineral Exploration, Fig. 8 Posterior probability map corresponding to Table 1 for gold deposits in Meguma Terrane, eastern mainland Nova Scotia (Source: Bonham-Carter et al. 1990, Fig. 1)
Local Singularity Analysis This last section is devoted to a relatively new method called “local singularity analysis” developed by Cheng (1999, 2005, 2008) for geochemical or other data systematically collected across large study areas. Local singularity analysis, which has become important for helping to predict occurrences of hidden ore deposits, is based on the multifractal equation x ¼ c∙ ϵ α-E where x represents the element concentration value, c is a constant, α is the singularity, ϵ is a
normalized distance measure such as square grid cell edge, and E is the Euclidian dimension. The theory of singularity is given in other chapters. Here it will be assumed that E ¼ 2 and that chemical element concentration values are available for small square grid cells of which the spatial autocorrelation function (or variogram) is largely determined by its behavior near the origin, which is difficult to establish by earlier statistical or geostatistical techniques. For a relatively simple discussion of singularity analysis, also see de Mulder et al. (2016).
Predictive Geologic Mapping and Mineral Exploration
Predictive Geologic Mapping and Mineral Exploration, Fig. 9 Posterior probability as shown in Fig. 8 plotted against cumulative area, with producing gold mines shown as circles with radii reflecting magnitudes of reported production (Source: Bonham-Carter et al. 1990, Fig. 4)
In Cheng’s (1999, 2005) original approach, geochemical or other data collected at sampling points within a study area are subjected to two treatments. The first of these is to construct a contour map by any of the methods such as kriging or inverse distance weighting techniques generally used for this purpose. Secondly, the same data are subjected to local singularity mapping. The local singularity α then is used to enhance the contour map by multiplication of the contour value by the factor ϵ α-2 where ϵ < 1 represents a length measure. In Cheng’s (2005) application to predictive mapping, the factor ϵ α-2 is greater than 2 in places where there has been local element enrichment or by a factor less than 2 where there has been local depletion. Local singularity mapping can be useful for the detection of geochemical anomalies characterized by local enrichment even if contour maps for representing average variability are not constructed (cf. Cheng and Agterberg 2009; Zuo et al. 2009).
Gejiu Mineral District and Northwestern Zhejiang Province Examples Where the landscape permits this, stream sediments are the preferred sampling medium for reconnaissance geochemical
1105
surveys concerned with mineral exploration (Plant and Hale 1994). During the1980s and1990s, government-sponsored reconnaissance surveys covering large parts of Austria, the Canadian Cordillera, China, Germany, South Africa, the UK, and the USA were based on stream sediments (Darnley et al. 1995). These large-scale national projects, which were part of an international geochemical mapping project (Darnley 1995), generated vast amounts of data and continue to be a rich source of information. Cheng and Agterberg (2009) applied singularity analysis to data from about 7800 stream sediment samples collected as part of the Chinese regional geochemistry reconnaissance project (Xie et al. 1997). For illustration, about 1000 stream sediment tin concentration values from the Gejiu area in Yunnan Province were used. The study in this first example has an area of about 4000 km2 and contains 11 large tin deposits. Several of these, including the Laochang and Kafang deposits, which are tin-producing mines with copper extracted as a by-product. These hydrothermal mineral deposits also are enriched in other chemical elements including silver, arsenic, gold, cadmium, cobalt, iron, nickel, lead, and zinc. The application to be described here is restricted to arsenic which is a highly toxic element. Water pollution due to high arsenic, lead, and cadmium concentration values is considered to present one of the most serious health problems especially in underdeveloped areas where mining is the primary industry such as in parts of the Gejiu area. Knowledge of the characteristics of the spatial distribution of ore elements and associated toxic elements in surface media is helpful for the planning of mineral exploration as well as environmental protection strategies. The Gejiu mineral district (Fig. 10) is located along the suture zone of the Indian Plate and Euro-Asian plates on the southwestern edge of the China subplate, approximately 200 km south of Kunming, capital of Yunnan Province, China. The Gejiu Batholith with outcrop area of about 450 km2 is believed to have played an important role in the genesis of the tin deposits (Yu 2002). The ore deposits are concentrated along intersections of NNE-SSW and E-W trending faults. Stream sediment sample locations in the Gejiu area are equally spaced at approximately2 km in the north-south and east-west directions. Every sample represents a composite of materials from the drainage basin upstream of the collection site (Plant and Hale 1994). Regional trends were captured in a moving average map of tin concentration values from within square cells measuring 26 km on aside. Several parameters have to be set for use of the inverse distance weighted moving average method (Cheng 2003). In the current application, each square represented the moving average for a square window measuring 26 km on a side with an influence of
P
1106
Predictive Geologic Mapping and Mineral Exploration
higher local As singularity values are more evenly distributed across the Gejiu area than the As concentration values themselves. Similar results were obtained for other elements including tin (see Cheng and Agterberg 2009). Clearly, Fig. 10b provides a better guideline for finding unknown large tin deposits than Fig. 10a. On the other hand, Fig. 10a more clearly shows the area in which the streams are highly polluted because of mining pollution over an extended period of time. The second example shown in Figs. 11 and 12 (after Xiao et al. 2012) is for lead in northwestern Zhejiang Province, China, that in recent years has become recognized as an important polymetallic mineralization area with the discovery of several moderate to large Ag and Pb-Zn deposits mainly concentrated along the northwestern edge of the study area (Fig. 11), which has been relatively well explored. The pattern of local singularities based on Pb in stream sediment samples is spatially correlated with the known mineral deposits. It can be assumed that similar anomalies in parts of the less explored parts of the area provide new targets for further mineral exploration (Fig. 12). By combining singularities for different chemical elements with one another, spatial correlation between anomalies and mineral deposits can be further increased (Xiao et al. 2012). Predictive Geologic Mapping and Mineral Exploration, Fig. 10 Map patterns derived from arsenic concentration values in 1,000 stream sediment samples in Gejiu Mineral District, Yunnan Province, China. Large tin deposits are shown as white triangles. Pattern (a) shows distribution of arsenic concentration values using inverse distance weighted moving average; pattern (b) shows distribution of local tin singularities (α). The 3 tin deposits outside the highly polluted mining area shown in pattern (a) are recent discoveries. All 11 large tin deposits occur in places where local anomaly in pattern (b) is less than 2 (Source: Cheng and Agterberg 2009, Fig. 1)
samples decreasing with distance according to a power-law function with exponent set equal to 2. Original sample locations were 2 km apart both in the north-south and eastwest directions. The resulting map (Fig. 10a) shows a large anomaly in the eastern part of the Gejiu area surrounding the most large tin deposits including the mines. The three large tin deposits in the central part of the Gejiu area are recent discoveries that have not yet been mined. To illustrate in more detail how singularities were estimated, see Cheng and Agterberg (2009). The singularities were estimated by fitting straight lines on log-log plots of either the concentration value (x) versus ϵ using x ¼ c∙ ϵ α-2 or amount of metal m ¼ x∙ ϵ 2 versus ϵ using m ¼ c∙ ϵ α. Both methods yielded similar estimates of the singularity α. Figure 10b shows results using the first method only. The main difference between the patterns of Fig. 10a and b is that lower and
Summary and Conclusions Geologic maps augmented by remote sensing and geophysical and geochemical data provide the main inputs for decision-making in mineral exploration concerned with where to drill boreholes to find hidden ore bodies. In this chapter, the history of predictive geologic mapping with downward extrapolations from the Earth’s surface was briefly reviewed. Special attention was paid to the first copper potential map for the Abitibi area on the Canadian Shield published in 1972 for copper-producing mines and prospects publicly available in 1968 using geological maps and geophysical data. During the next 10 years, this area was subjected to intensive prospecting for copper-containing mineral deposits permitting an evaluation of the original copper potential maps. Various statistical methods including logistic regression and the jackknife were used for assessing the uncertainties associated with predicting locations and amounts of copper in square cells of different sizes. In the final sections of this chapter, the relatively well-known method of weights of evidence (WofE) and the more recently developed method of local singularity analysis were briefly discussed with examples of the application using geologic maps and geochemical data.
Predictive Geologic Mapping and Mineral Exploration
1107
Predictive Geologic Mapping and Mineral Exploration, Fig. 11 Geological map of northwestern Zhejiang Province, China (Source: Xiao et al. 2012, Fig. 1)
Cross-References ▶ Digital Geological Mapping ▶ Earth Surface Processes ▶ Fuzzy Inference Systems for Mineral Exploration ▶ Local Singularity Analysis ▶ Logistic Regression ▶ Logistic Regression, Weights of Evidence, and the Modeling Assumption of Conditional Independence ▶ Multivariate Analysis
Bibliography
Predictive Geologic Mapping and Mineral Exploration, Fig. 12 Raster map for Fig. 11 showing target areas for prospecting for Ag and Pb-Zn deposits delineated by comprehensive singularity anomaly method (cell size is 2 km 2 km). Known Ag and Pb-Zn deposits are indicated by stars (Source: Xiao et al. 2012)
Agterberg FP (1970) Autocorrelation functions in Geology. In: Merriam DF (ed) Geostatistics. Plenum, New York, pp 113–142 Agterberg FP (1973) Probabilistic models to evaluate regional mineral potential. In: Proceedings of the symposium on mathematical methods in the geosciences held at Přibram, Czechoslovakia, pp 3–38 Agterberg FP (1974) Geomathematics. Elsevier, Amsterdam, 596 p Agterberg FP (1984) Reply to comments by Professor John W. Tukey. Math Geol 16:595–600 Agterberg FP (1989) Computer programs for mineral exploration. Science 245:76–81 Agterberg FP (2014) Geomathematics: theoretical foundations, applications and future developments. Springer, Heidelberg, 553 p Agterberg FP (2021) Aspects of regional and worldwide mineral resource production. https://doi.org/10.1007/s12583-020-1397-4 Agterberg FP, David M (1979) Statistical exploration. In: Weiss A (ed) Computer methods for the 80’s. Society of Mining Engineers, New York, pp 90–115
P
1108 Agterberg FP, Chung CF, Fabbri AG, Kelly AM, Springer J (1972) Geomathematical evaluation of copper and zinc potential in the Abitibi area on the Canadian Shield. Geological Survey of Canada, Paper 71–11 Baddeley A, Brown W, Milne R, Nair G, Rakshit S, Lawrence T, Phatek A, Fu SC et al (2021) Optimum thresholding of predictors in mineral prospectively analysis. Nat Res 30:923–969 Bonham-Carter GF (1994) Geographic information systems for geoscientists: modelling with GIS. Pergamon, Oxford, p 398 Bonham-Carter GF, Agterberg FP, Wright DF (1988) Integration of geological data sets for gold exploration in Nova Scotia. Photogramm Eng Remote Sens 54:1585–1592 Bonham-Carter GF, Agterberg FP, Wright DF (1990) Weights of evidence modelling: a new approach to mapping mineral potential. Geol Surv Can Pap 89-9:171–183 Cheng, Q. (1999). Multifractality and spatial statistics. Comput Geosc 10:1–13 Cheng Q (2003) GeoData Analysis System (GeoDAS) for mineral exploration and environmental assessment, user’s guide. York University, Toronto, 268 p Cheng Q (2005) A new model for incorporating spatial association and singularity in interpolation of exploratory data. In: Leuangthong D, Deutsch CV (eds) Geostatistics Banff 2004. Springer, Dordrecht, pp 1017–1025 Cheng Q (2008) Non-linear theory and power-law models for information integration and mineral resources quantitative assessments. In: Bonham-Carter GF, Cheng Q (eds) Progress in geomathematics. Springer, Heidelberg, pp 195–225 Cheng Q, Agterberg FP (2009) Singularity analysis of ore-mineral and toxic trace elements in stream sediments. Comput Geosci 35: 234–244 Darnley AG (1995) International geochemical mapping – a review. J Geochem Explor 55:5–10 Darnley AG, Garrett RG, Hall GEM (1995) A global geochemical database for environmental and resource management: recommendations for international geochemical mapping. Final Rep IGCP Project 259. UNESCO, PA de Mulder E, Chen Q, Agterberg F, Goncalves M (2016) New and gamechanging developments in geochemical exploration. Episodes, pp 70–71 Draper NR, Smith H (1966) Applied regression analysis. Wiley, New York, 280 p Efron B (1982) The Jackknife, the Bootstrap and other resampling plans. SIAM, Philadelphia, 93 p Gradstein FM, Ogg J, Schmitt MD, Ogg G (eds) (2020) Geological time scale 2020, vol 1 & 2. Elsevier, Amsterdam, 1357 p Harrison JM (1963) Nature and significance of geological maps. In: Albritton CC Jr (ed) The fabric of geology. Addison-Wesley, Cambridge, MA, pp 225–232 Nieuwenkamp W (1968) Natuurfilosofie en de geologie van Leopold von Buch. K. Ned Akad Wet Proc Ser B 71(4):262–278 Pirajno F (2010) Hydrothermal processes and mineral systems. Springer, Heidelberg, 1250 p Plant J, Hale M (eds) (1994) Handbook of exploration geochemistry 6. Elsevier, Amsterdam Quenouille M (1949) Approximate tests of correlation in time series. J R Stat Soc Ser B 27:395–449 Spiegelhalter DJ, Knill-Jones RP (1984) Statistical and knowledgebased approaches to clinical decision-support systems, with an application in gastroenterology. J R Stat Soc A 147(1):35–77 Tukey JW (1970) Some further inputs. In: Merriam DF (ed) Geostatistics. Plenum, New York, pp 163–174 Tukey JW (1972) Discussion of paper by F. P. Agterberg and S. C. Robinson. Intern Stat Inst Bull 38:596
Principal Component Analysis Tukey JW (1984) Comments on “use of spatial analysis in mineral resource evaluation”. Math Geol 16:591–594 Von Buch L (1842) Ueber Granit und Gneiss, vorzűglich in Hinsicht der äusseren Form, mit welcher diese Gebirgsarten auf Erdflache erscheinen. Abh K Akad Berlin IV/2(18840):717–738 Wegener A (1966) The origin of continents and oceans, 4th edn. Dover, New York Wellmer FW (1983) Neue Entwicklungen in der Exploration (II), Kosten, Erlöse, Technologien, Erzmetall 36(3):124–131 Westergård AH, Johansson S, Sundius N (1943) Beskrivning till Kartbladet Lidköping. Sver Geol Unders Ser Aa 182 Winchester S (2001) The map that changed the world. Viking, Penguin Books, London, 412 p Xiao F, Chen J, Zhang Z, Wang C, Wu G, Agterberg FP (2012) Singularity mapping and spatially weighted principal component analysis to identify geochemical anomalies associated with Ag and Pb-Zn polymetallic mineralization in Northwest Zhejiang, China. J Geochem Explor 122:90–100 Xie X, Mu X, Ren T (1997) Geochemical mapping in China. J Geochem Explor 60:99–113 Yu C (2002) Complexity of earth systems – fundamental issues of earth sciences. J China Univ Geosci 27:509–519. (in Chinese; English abstract) Zuo R, Cheng Q, Agterberg FP, Xia Q (2009) Application of singularity mapping technique to identify local anomalies using stream sediment geochemical data, a case study from Gangdese, Tibet, western China. J Geochem Explor 101:225–235
Principal Component Analysis Alessandra Menafoglio MOX-Department of Mathematics, Politecnico di Milano, Milano, Italy
Synonyms Karhunen-Loève decomposition
decomposition;
Principal
orthogonal
Definition Principal component analysis (PCA) is a statistical technique aimed to explore and reduce the dimensionality of a multivariate dataset. It is based on the key idea of finding a representation of the p-dimensional data through a smaller set of k < p new variables, defined as linear combinations of the original variables. These are obtained as to explain at best the data variability. PCA was first introduced by Pearson (1901); nowadays its formulation has been developed for varied types of Euclidean data, including compositional data and functional data. Extensions to non-Euclidean data are available as well.
Principal Component Analysis
1109
PCA as an Optimization Problem It is given a dataset made of n observations of p variables, organized in a data matrix ¼ xij in ℝn,p, whose columns are denoted by x j ¼ (x1j, . . ., xnj). It is assumed that the variables have null sample mean ( nj¼1 xij ¼ 0 ); otherwise, the data can be centered with respect to their sample mean prior to apply the PCA. The goal of the PCA is to define a new set of uncorrelated variables z1, . . ., zp, named principal components (PCs), linearly related with the original variables x 1, . . ., x p, and suitable to effectively represent the data through a reduced representation based on k < p PCs only – namely, through a new data matrix ℤ ¼ [z1, . . ., zk] in ℝn,k. The generic principal component z j ¼ (z1j, . . ., znj)T is thus expressed as zij ¼ xj1 xi1 þ xj2 xi2 þ . . . þ xjp xip ,
ð1Þ
where the zij is named principal component score (or simply score) and represents the i-th observation of the j-th principal component, and the weights xji can be organized in a vector j j ¼ (xj1, . . ., xjp)T, which is named loading vector. As exemplified in Fig. 1, the vector j j identifies a direction in ℝp along which the data are projected (obtaining the scores zij); hence, j j is conventionally set to be a unit vector ( jj
2
¼
p
‘¼1
xj‘2 ¼ 1 , with || || the Euclidean norm). The
PCA thus ultimately identifies k directions in ℝp, which together form a new representation space for the data.
Two requirements underlie the identification of the loading vectors j j: (1) the directions are set to be orthogonal to each other; and (2) the variability expressed by the data when projected along these directions is maximized. This leads to find the j-th loading vector as the solution of the optimization problem n
arg minp jj ℝ
z2ij
subject to
jj ¼ 1,
i¼1
jj , j ‘ ,
ð2Þ
¼ 0, ‘ < j,
where the last orthogonality constraint hj j, j ‘i ¼ 0 (the symbol h , i denoting the Euclidean inner product) ensures the first requirement, and it is only imposed for j > 1 (i.e., not on the first PC). It can be proved that the optimization problem has a unique solution, which is found from the eigen-decomposition of the 1 covariance matrix of the data ¼ n1 T , or, analogously, from the singular value decomposition of the data matrix (see Johnson and Wichern (2002)). Here, the variances of the PCs, Var(z j), are found as the (ordered) eigenvalues of , while the loading vectors are the (ordered) eigenvectors of (equivalent to the right singular vectors of ). A strong linear correlation among variables reflects on the presence of PCs associated with very low variability. In this case, the last PCs have sample variance approaching 0. In case of perfect linear relation among variables, only a set of k < p PCs are associated to non-null variance (i.e., non-null eigenvalues of ). In this case, the data matrix , as well as the covariance matrix , is not of full rank.
P
Span(PC1) : subspace of best approximation
2
2
3
3
PCA: new set of coordinates maximizing the explained variance
1
1
ξ1
−1
0
x2
0 −3
−3
−2
−2
−1
x2
ξ2
−4
−2
0
2
4
x1
Principal Component Analysis, Fig. 1 Left: Finding a new system of coordinates through PCA to represent a set of bivariate data. Data are reported as gray symbols; directions of the PCs are depicted as straight lines; arrows indicate the loading vectors. The first PC is colored in blue, the second PC in red. Right: Interpretation of PCA as space of best
−4
−2
0 x1
2
4
approximation, for a bivariate dataset (gray symbols). The direction of the first PC (span{x1}) is depicted as a blue straight line; projections of the data as blue symbols; reconstruction errors are depicted as red segments. The MSE1 is the sum of the squares of the lengths of the red segments
1110
Principal Component Analysis
Alternative Geometrical Interpretation The PCA can be alternatively interpreted as a method to build nested spaces of best approximation for the data. Indeed, one can prove that the linear space generated by the loading vectors of the first k PCs, span{j 1, . . ., j k}, is the k-dimensional space of best approximation for the data, in the sense of minimizing the mean square error of approximation n
2
k
MSE k ¼
xi i¼1
xi, , jj jj
,
ð3Þ
j¼1
where xi denotes the i-th row of the data matrix , and k
xi, , jj jj is the projection of xi on span{j 1, . . ., j k}.
j¼1
In other terms, having at one’s disposal k dimensions to reconstruct the dataset, the first k PCs offer the coordinate system minimizing the reconstruction error (in terms of Euclidean distance MSEk). Note that, in general, the best reconstruction of dimension 0, in the sense of the MSE, is the sample mean x ¼ 1n
n
xi: . This alternative viewpoint to i¼1
PCA is often taken to generalize PCA to non-Euclidean spaces (e.g., in the principal geodesic analysis, Fletcher et al. (2004)).
Dimensionality Reduction The PCA can be used to perform a dimensionality reduction of the dataset, by retaining only k out of the p PCs, and then using the reduced data matrix ℤ ¼ [z 1, . . ., z k] in ℝn,k for further statistical processing. This is typically done by looking for an elbow in the scree plot, which depicts, as a function of k, the cumulative variance along the first k PCs, compared to the total variance of the data Vartot ¼ 1n
p
n
j¼1 i¼1
x2ij
(an example is given in Fig. 2b). Note that, by construction, the variance of the j-th PCs, Var(z j), is larger than the variances of successive PCs, i.e., Var(z j) Var(z ‘), ‘ > j, and that Vartot ¼
p
Var z j . A full representation of the total vari-
j¼1
ance Vartot can only be attained by retaining all the p PCs; however, in many cases, a set of k < p PCs may be sufficient to explain a large amount of the data variability, particularly if the original variables are highly correlated. The number k of PCs to retain can then be set by thresholding the proportion of variance explained by the PCs, typical thresholds being 80%, 90%, or 95%. Interpretation of the principal components is key for their use in practice. For this purpose, a useful representation is provided by the biplot (Gabriel 1971), which displays the
projection of the original variables in the Cartesian space defined by the first and second PCs (see Johnson and Wichern (2002)). An example of biplot corresponding to the PCA of a multivariate dataset with p ¼ 5 is given in Fig. 2c. The dimensionality reduction offered by PCA is useful to face the curse of dimensionality affecting data analyses as the number p of measured variables increases. In particular, if p > n (a.k.a. large p, small n problem) several statistical techniques cannot be directly applied. In these cases, PCA can be used prior to further statistical processing, to single out a reduced data matrix ℤ of dimension k n. These include, e.g., regression analysis in the presence of multicollinearity (principal component regression), where PCA is applied to the set of predictors. Other techniques which may befit from dimensionality reduction to stabilize estimations or speed up computations are, in a spatial context, co-kriging and stochastic co-simulation (Chiles and Delfiner 2009).
Extensions Several extensions of PCA to linear and nonlinear spaces are available. These include, among others, the principal component analysis for compositional datasets based on the Aitchison geometry (Pawlowsky-Glahn et al. 2015); the functional principal component analysis for functional data (e.g., curves or images) when these are embedded in Hilbert spaces (Ramsay and Silverman 2005); and the principal geodesic analysis for shape data (Fletcher et al. 2004). Different representations can be used to perform dimensionality reduction of a multivariate dataset, while attaining slightly different maximization tasks. Among these, we mention independent component analysis (ICA; see Hyvrinen (2013)), which identifies independent variables; Fisher’s canonical decomposition (see Johnson and Wichern (2002)), which identifies uncorrelated directions of maximum discrimination among groups in a classification setting; and multidimensional scaling (MDS; see Johnson and Wichern (2002)), which identifies a best approximation space in the sense of respecting the mutual distances among data points. PCA, however, unlike these alternatives, returns a bi-orthogonal decomposition, in the sense that the loading vectors are orthogonal, and the scores are uncorrelated, by virtue of the Karhunen-Loeve theorem. PCA is part of a broader class of statistical methods for the identification of latent factors, named factor analysis (see Johnson and Wichern (2002)).
Summary Principal component analysis (PCA) allows one to reduce the dimensionality of a multivariate dataset, by finding directions of maximum variability of the data or, equivalently, nested
Principal Component Analysis
1111 Scatterplot of the original variables
a
6
7
8
9
3
4
5
6
7
5
7
5
8
9
1
3
x1
8
9
5
6
7
x2
6
7
5
6
7
x3
4
6
3
4
5
x4
0
2
x5
1
3
5
7
5
6
7
Scree plot
9
0
c
2
4
6
Biplot
0.2
0.4
PC2
0.6
0.8
4
0.0
Cumulative explained variance
1.0
b
8
2
x1
0
x2 x3 x4
−2
x5
−4 0
1
2
3
4
5
Number of retained PC
Principal Component Analysis, Fig. 2 Example of PCA of a multivariate dataset ( p ¼ 5). (a) Scatterplot of the original variables, showing significant correlation among variables. (b) Scree plot. Typical thresholds set at 80% and 90% of explain variability (red horizontal lines)
−6
−4
−2
0
2
4
PC1
suggest to select k ¼ 1. (c) Biplot. Data are reported in the coordinate system of the first and second PC; red arrows indicate the directions of the original variables
P
1112
spaces of best approximation. The representation offered by PCA can be useful for further statistical processing, to face the curse of dimensionality affecting data analysis as the number p of measured variables increases. The geometrical perspective upon which the PCA is rooted enables its generalization to complex situations, pertaining constrained data (compositional, directional, spherical data) or infinitedimensional data (functional, distributional data).
Cross-References ▶ Compositional Data ▶ Correlation Coefficient ▶ Eigenvalues and Eigenvectors ▶ Exploratory Data Analysis ▶ Independent Component Analysis ▶ Multidimensional Scaling ▶ Q-Mode Factor Analysis ▶ R-Mode Factor Analysis ▶ Variance
Bibliography Chiles J-P, Delfiner P (2009) Geostatistics: modeling spatial uncertainty, vol 497. Wiley Fletcher P, Lu C, Pizer S, Joshi S (2004) Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans Med Imaging 23:995–1005 Gabriel KR (1971) The biplot graphic display of matrices with application to principal component analysis. Biometrika 58(3):453–467 Hyvärinen A (2013) Independent component analysis: recent advances. Philos Trans R Soc A Math Phys Eng Sci 371(1984):20110534 Johnson R, Wichern D (2002) Applied multivariate statistical analysis, 5th edn. Prentice Hall Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R (2015) Modeling and analysis of compositional data. Wiley, Chichester Pearson KF (1901) LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci 2(11):559–572 Ramsay J, Silverman BW (2005) Functional data analysis. Springer, New York
Probability Density Function
Definition This chapter aims to furnish the reader with knowledge on the probability density function (pdf). It is assumed as a prerequisite a course in elementary calculus and set theories. Probability Measure First consider the definition of two concepts: a s-field of a set and a probability measure. Let Ω be the set of all outcomes formulated from a real random phenomenon, e.g., measurement of vegetation index from remote sensing radar (Maus et al. 2019) and description of intensities in synthetic aperture radar imagery (Ferreira and Nascimento 2020; Yue et al. 2021). A nonempty collection of subsets drawn from Ω, say A , is said to be a s-field of Ω if A satisfies two conditions (Hoel et al. 1973): (i) If A A, then Ac A. (ii) If Ai A for i ¼ 1, 2, . . ., then [1 i¼1 Ai A and c A A, where A is the complement of a set \1 i i¼1 A (i.e., the elements not in A) and [ and \ represent the union and intersection operations, respectively. (iii) A probability measure, say Pr( ), on a s-field of subsets A of a set Ω is a real-valued function having domain A satisfying the following properties (Hoel et al. 1973): (iv) Pr(Ω) ¼ 1. (v) Pr(A) 0 for all A A. (vi) If {An; n ¼ 1, 2, . . .} are mutually disjoint sets in A , then Pr [1 n¼1 An ¼
1
PrðAn Þ: i¼1
From the previous system, various properties can be derived, e.g.: (vii) (Boole’s inequality) If {Ai; i ¼ 1, 2, . . ., n} are any sets, then n
Pr [ni¼1 An
PrðAn Þ: i¼1
Probability Density Function Abraão D. C. Nascimento Universidade Federal de Pernambuco, Recife, Brazil
Synonyms Density; Joint probability density function
(viii)
PrðA1 \ A2 \ \ An Þ ¼ PrðA1 ÞPrðA2 j A1 ÞPrðA3 j A1 \ A2 Þ
PrðAn j A1 \ \ An1 Þ,
8A1 , . . . , An A, and 8n ¼ 1, 2, . . ., where Pr(Ai|B) ≔ Pr (Ai \ B)/Pr(B) for Pr(B) > 0 is the conditional probability, one of the most important concepts of probability theory. From the latter, one can define the notion of independence of two sets that says: two sets A and B A are independent if Pr(A \ B) ¼ Pr(A) Pr(B).
Probability Density Function
1113
(ix) (Bayes’ formula) Let {Ai; i ¼ 1, 2, . . .} be a sequence of events that represent a partition of Ω, then PrðAi jBÞ ¼
PrðAi ÞPrðBjAi Þ , Aj Pr BjAj
8B A:
1 j¼1 Pr
The jcdf is continuous from the right for xi, i.e., if xi1 # xim for m ! 1, FX x1 , . . . , xim , . . . , xp # FX x1 , . . . , xi1 , . . . , xp
for
m ! 1: From on now, a probability space, denoted by (Ω, A, Pr), is a system that approaches a set of events Ω, a s-field of subsets A, and a probability measure Pr : A 7! ð0, 1Þ.
Further lim
8i, xi !1
Random Variables and Vectors A mapping X : O 7! X ℝ is said to be a random variable (RV) if: (x) It is a real-valued function defined on a sample space on a collection of subsets having defined probability. (xi) For every Borel set B ℝ, fw O : XðwÞ Bg A, where Borel set represents any set in a topological space which is obtained from union, intersection, and complement operations of open or closed sets. Such sets are crucial in the measure theory because any measure formulated on the open or closed sets of a pre-specified space needs to be defined on all Borel sets of such space. ⊤
As an extension of X, X ¼ X1 , . . . , Xp : O 7! X ℝp is a random vector if Xi is random variable for i ¼ 1, . . . , p, where ( )T is the transposition operator. In practical terms, an RV represents a numerical value defined from the outcome on the random experiment. Mathematically, it requires Pr(X x) for all x ℝ to be welldefined or, equivalently, {w Ω : X(w) x} belongs to A . Thus, it is necessary to define the cumulative distribution function (cdf) of an RV. Let X be an RV defined on ðO, A, PrÞ, its cdf, say FX: X 7! ½0, 1, is given by FX ðxÞ ¼ PrðX xÞ: The cdf is nondecreasing and continuous from the right, satisfying limx ! 1F(x) ¼ 0 and limx ! 1F(x) ¼ 1. The cdf plays a crucial role to compute probability of events, e.g., Pr(a < X b) ¼ FX(b) FX(a) for a b. For random vectors, the joint cumulative distribution function (jcdf) of X, say FX : X 7! ½0, 1, is given by FX x1 , . . . , xp ¼ Pr X1 x1 , . . . , Xp xp : The jcdf is non-decreasing for xi (for all i ¼ 1, . . . , p) in (x1, . . . , xi, . . . , xp), i.e., if xi1 < xi2 , then FX x1 , . . . , xim , . . . , xp FX x1 , . . . , xi2 , . . . , xp :
FX x 1 , . . . , x i , . . . , x p ¼ 0
lim FX x1 , . . . , xi , . . . , xp ¼ 1:
and
8i, xi !1
Probability Density Function The nature of the under study phenomenon determines the kind of variable to be used, discrete, or continuous. Let X be a real-valued RV on ðO, A, PrÞ, then (xii) X is a discrete RV if it has as both domain Ω and range X a finite or countably infinite subset {x1, x2, . . .} ℝ such that fw O : XðwÞ ¼ xi g A for all i ¼ 1, 2 . . . . (xiii) X is a continuous RV if it has X ℝ such that Pr({w Ω : X(w) ¼ x}) ¼ 0 for 1 < x < 1. From now on, this discussion will focus on the continuous case. The cdf of a continuous RV is often defined in terms of its probability density function (pdf). The pdf of X is a nonnegative function, say fX: X 7! ½0, 1Þ, such that 1 1
f X ðxÞdx ¼
X
P
f X ðxÞdx ¼ 1:
Note that FX(x) and Pr(X B) can be rewritten as FX ð x Þ ¼
x 1
f X ðtÞdt and PrðX BÞ ¼
f X ðtÞdt, B
respectively. The pdf can also be deduced from fX(x) ¼ dFX(x)/dx. For random vectors, the joint probability density function (jpdf) of X is a nonnegative function, say fX: X 7! ½0, 1Þ, such that 1 1
...
1 1
f X ðt Þdt1 . . . dtp ¼
X
f X ðt Þdt1 dtp :
Note that FX(x) and Pr(X B) for B X can be explicited as
1114
Probability Density Function
FX ð xÞ ¼
x1 1
PrðX BÞ ¼ . . .
B
xp 1
f X ðt Þdt1 . . . dtp and
drawn from X N(m, s2); then the appropriate test statistic (called t statistic) for ℋ01 is St ¼
f X ðt Þdt1 . . . dtp :
Analogous to the univariate counterpart, the jpdf arises from fX(x) ¼ dpFX(x)/dx1. . .dxp.
Some Statistical and Computational Essays Now we are in a position of illustrating the pdf shape of some univariate and multivariate distributions. Hundreds of continuous univariate distributions have been extensively used for modeling phenomena in various areas of science (cf. Cordeiro et al. 2020). The most employed one is the normal (or Gaussian) distribution, having its univariate version support in ℝ and as parameters the mean m ℝ and standard deviate s > 0 and its p variate counterpart domain in ℝp and parameters, the mean vector m ℝp and covariance matrix S ≽ 0, where ≽0 represents the nonnegative definite condition. Their pdfs are presented in Table 1. This case in denoted as X Np(m, S) for the multivariate case and X N(m, s2) for the univariate. An alternative distribution to the Np law is the t-student distribution. Similar to the normal, this law is symmetric having pdf with heavy tails driven by the parameter n called degree of freedom (df), and it is denoted as X tn. The smaller the parameter n is, the heavier the tail is. Figure 1a illustrates the relation between N(m, s2) and tn distributions. Note that the latter tends to the former when increasing n and the t2 law has the most heavy tail. From the relation between these two distributions, the most used t and Hotelling’s T2 tests for the null hypotheses ℋ01: m ¼ [m0 ℝ] and ℋ02: m ¼ [m0 ℝp] arise from, respectively: • (Univariate case: the t test) Let X1, . . ., Xn be an independent and identically distributed (iid) sample of size n
X m0 Xm p ℋ01 tn1 or St ¼ p tn1 , S= n S= n
where X ¼ n1
n i¼1
is the sample mean, S2 ¼
Xi
2
is the sample variance, and ðn 1Þ1 ni¼1 Xi X “ ℋ01” represents a variable distributed as a distribution law under the hypothesis ℋ01. As a decision rule, we reject ℋ01 when jStj is large. • (Multivariate case: the Hotelling’s T2 test) Let X1, . . ., Xn be an iid sample of size n obtained from X Np(m, S); then the appropriate test statistic for ℋ02 (named Hotelling’s T2 statistic) is T 2 ¼ n X m0
⊤ 1
S
X m0
ðn 1Þp , F ðn pÞ p, np
where X ¼ n1 ni¼1 X i , S ¼ ðn 1Þ1 ni¼1 X i X X i X ⊤ and F p, np represents the F-Snedecor with degree of freedom p and n p (which is defined as the ratio of two independent random variables following the chi-square distribution divided by their dfs, termed w2k in Table 1). Thus, if T2 is too large, we have evidence to reject H02.
Emery (2008) showed that t and Hotelling’s T2 tests are crucial to check the fit quality in geoscience applications. The Np distribution (analogous to the t-student law) is symmetric. Figure 2a displays two groups of level curves for this law. To extend the Np distribution, Azzalini and Valle (1996) pioneered the skew-normal (SN) distribution having pdf given in Table 1. Figure 2b illustrates that the SN level curves can describe the skewness multivariate database, while Fig. 1b confirms this property for a univariate perspective.
Probability Density Function, Table 1 Some pdfs of random variables and vectors. f( ) and F( ) are the standard normal pdf and cdf. fp( ) is the p-variate standard normal pdf. j j and ( )1 are the matrix determinant and inverse operations Distributions Np(m, S)
pdfs
jpdfs
Skew normal
xm 2 df d
tn-student
Gð Þ p npGð2nÞ
X 2k
1 xk=21 2k=2 Gðk=2Þ
2 1=2
ð2ps Þ nþ1 2
exp
2 12 xm s
F a xm d 2
1 þ tn
nþ1 2
expðx=2Þ
j2pSj1=2 exp 12 ðx mÞT S1 ðx mÞ 2fp(x; Ω)F(α⊤x) GððnþpÞ=2Þ Gðn=2Þnp=2 pp=2 jSj1=2
ðx1 g1 Þa1 1 a
ba a1 GðaÞ x
expðbxÞ
ðnþpÞ=2
b p Pki¼1 Gðai Þ
Γ(α, β)
1 þ 1n ðx mÞ⊤ S1 ðx mÞ
ðx2 x1 g2 Þa2 1 xp xp1 gp
exp{[xp (γ1 þ þ γp)]/β}, xi 1 þ γi < xi (for i ¼ 2, . . ., p, )γ1 < x1
ap 1
Probability Density Function
0.5 0.3
0.4
SN(0,1,3) SN(0,1,−3) SN(0,1,1) SN(0,1,−1)
0.1
0.1
0.2
N(0,1) t2 t3 t10
0.2
Skew Normal Density
0.3
0.6
0.4
b
0.0
0.0
Normal and t−Student Densities
a
1115
−4
−2
0
2
4
−4
−2
Support
0
2
4
Support
Probability Density Function, Fig. 1 Curves for normal, t-student, and SN pdfs
b
−3
−4
−2
−2
−1
0
0
1
2
2
4
3
a
−4
−2
0
2
4
−3
−2
−1
0
1
2
3
Probability Density Function, Fig. 2 Curves for multivariate normal and SN jpdfs
Finally, one of the most used positive distributions is the gamma law that has two parameters: one of shape, α > 0, and another scale, β > 0. This case is denoted by Γ(α, β) and its pdf is presented in Table 1. The Γ distribution is known as the mother law for describing the speckle noise in synthetic aperture radar system (Ferreira and Nascimento 2020). Mathai and Moschopoulos (1992) proposed a multivariate gamma version having jpdf given in Table 1.
geoscience data. Particularly, Ma (2009) showed evidence that the previous axioms iii, iv, and v are crucial to deal with depositional facies analysis, while (Tolosana-Delgado and van den Boogaart 2013) addressed the importance of them to analyze mineral and geochemical elements. Finally, Ma (2019) have pointed out that the use of (a) probability dilemmas and the conditional probability concept and (b) pdfs and jpdfs in the probabilistic mixture are vital to understanding the physical conditions and statistical properties of geoscience phenomena.
Probabilistic Approach in Geoscience Problems Many geoscience applications can be understood as random experiments, e.g., on resource characterization (Ma 2019) and measurements of features obtained from remote sensing data (Maus et al. 2019) or SAR imagery (Yue et al. 2021). Thus, it is required to use a probability approach and pdf to describe
Conclusion In this chapter, we have presented the main concepts for understanding the probability density function (pdf). Some particular jpdfs and pdfs were approached and related to t and
P
1116
Hotelling’s T2 statistical tests. The latter are important tools in geoscience applications. In summary, the probability approach consists of a base to deal with several geoscience problems, like resource evaluation and mapping by remote sensing sources.
Bibliography Azzalini A, Valle AD (1996) The multivariate skew-normal distribution. Biometrika 83(4):715–726 Cordeiro GM, Silva RB, Nascimento ADC (2020) Recent advances in lifetime and reliability models, 1st edn. Bentham Science Publishers, The Netherlands Emery X (2008) Statistical tests for validating geostatistical simulation algorithms. Comput Geosci 34(11):1610–1620 Ferreira JA, Nascimento ADC (2020) Shannon entropy for the G 0I model: a new segmentation approach. IEEE J Sel Top Appl Earth Obs Remote Sens 13:2547–2553 Hoel PG, Port SC, Stone CJ (1973) Introduction to Probability Theory, Houghton Mifflin, Boston Ma YZ (2009) Propensity and probability in depositional facies analysis and modeling. Math Geosci 41:737–760 Ma YZ (2019) Quantitative geosciences: data analytics, geostatistics, reservoir characterization and modeling, 1st edn. Springer, Switzerland Mathai AM, Moschopoulos PG (1992) A form of multivariate gamma distribution. Ann Inst Stat Math 44(1):97–106 Maus V, Câmara G, Appel M, Pebesma E (2019) dtwSat: time-weighted dynamic time warping for satellite image time series analysis in R. J Stat Softw 88(5):1–31 Tolosana-Delgado R, van den Boogaart KG (2013) Joint consistent mapping of high dimensional geochemical surveys. Math Geosci 45:983–1004 Yue D-X, Xu F, Frery AC, Jin Y-Q (2021) Synthetic aperture radar image statistical modeling: part one-single-pixel statistical models. IEEE Geosci Remote Sens Mag 9(1):82–114
Proximity Regression Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Definition Geological features include mineral deposits, intrusions, faults, etc. It is known that the distance to the geological feature, i.e., distance to mineralization, is influenced by several geochemical variables or entities. These variables include the trace elements and the major oxides. The goal is to identify multi-element signatures (Bonham-Carter and Grunsky 2018) in the proximity to the geological features, and locate other regions with similar signatures. To
Proximity Regression
accomplish such data analysis, linear regression models using proximity are used. Proximity is defined as a function related to distance, just as similarity, where proximity is inversely proportional to distance (Bonham-Carter and Grunsky 2018). The rate of decay of proximity with distance can be considered to be a constant. Thus, by integrating the rate value, we get proximity to be an exponential function of distance. While this can be used for modeling proximity, say at half distance, it can also be predicted using known values of geochemical variables. Such a prediction is done using a linear regression model. Thus, proximity regression is the linear regression model of predicting proximity, as the response or dependent variable, using the geochemical variables as the predictors or the independent variables (Bonham-Carter and Grunsky 2018). For proximity Y, expressed as a real value in [0,1], the corresponding distance from feature Z ranges from infinity to zero. Using constant function, dY dZ ¼ a, which upon integration gives: Y(Z ) ¼ Y(0)eαz. At half distance Z 0:5 , YYðZð00:5Þ Þ ¼ 0:5 , thus a ¼ Zln0:50:5 ) Y ðZÞ ¼ exp
ln 0:5 Z0:5
Z .
Overview A dispersion halo is a physical phenomenon that is observed in the country rocks, where a region around a mineral deposit exhibits higher metal values compared to those of the deposit and considerably higher than background values. Usually geochemical sampling and testing are used to delineate the dispersion halo. For computations, proximity is one such measure used for modeling the dispersion halo. While distance to the geological feature is the direct observation, proximity is preferred over distance to be regressed on the geochemical variables for modeling the dispersion halo around a mineral deposit. This is because of the decay of the halo following an exponential function or a power law. Hence, using a measure, such as proximity, that is modeled as a similar function of distance in a linear regression model, improves its prediction. Overall, proximity is modeled as an inversely proportional function of distance (Bonham-Carter and Grunsky 2018). For using proximity regression, proximity values in close range to the selected feature are used. For dependent variables, the centered logratio (CLR) transformed values of the element values are used in the form of a matrix X of size m n for m geochemical variables and n samples or observations. Thus, the column vector of response variables of size n, used in a linear regression model, is given by Y ¼ Xβ þ ε, where β is the coefficient (column) vector of size n, and ε is the error (column) vector of size n. This overdetermined equation, as
Proximity Regression
m n, is solved using the least squares method or the conjugate gradient method. For studying geochemistry of soil samples, compositional data analysis is routinely used, owing to the constant sum constraint (Aitchison 1982). A linear transformation of the Aitchison geometry or simplex, which is the representation of the data points in the compositional data, is used to convert the data to real space. The centered logratio (CLR) is one such linear transformation that is effectively used for the geochemical variables, to avoid the closure problem (Bonham-Carter and Grunsky 2018). The closure problem arises from the standard multivariate analysis to compositional data, where there is a negative bias in the product-moment correlation between the constituents of a geochemical composition (Pawlowsky-Glahn et al. 2007). CLR transformation is also used in applications involving principal component analysis for dimensionality reduction (Pawlowsky-Glahn et al. 2007). CLR transformation involves dividing samples by the geometric mean of its values, and then using its logarithm values. While all components are treated symmetrically in CLR transformation, the transformed data may exhibit collinearity, which is not applicable for methods using full rank data matrices (Filzmoser et al. 2009). In cases of collinearity in the CLR data, the isometric logratio transformation (ILR) is alternatively used (Egozcue et al. 2003; Filzmoser et al. 2009).
1117
deposit, for training. In addition to identifying these regions as “bulls-eye,” the method has enabled in identifying the predictors or independent variables that identify Croxall property based on the training on the Canagau Mines deposit.
Future Scope Prospectivity mapping can be considered to be an organic successor to proximity regression–based methods. This can be applied by constructing the proximity measure to the nearest mineral deposit. The problem can be then reframed as predicting the proximity to the nearest mineral occurrence, instead of the occurrence of a mineral deposit (BonhamCarter and Grunsky 2018). Different from other spatial regression methods, where proximity and its analogues are used as independent variables, proximity regression is one of the methods where proximity is the response variable. The methodology for prediction can itself be improved through machine learning using neural networks. Alternative to linear regression, logistic regression can be used to strictly constrain the range of proximity to (0,1) (Bonham-Carter and Grunsky 2018). ILR can also be considered as an alternative to CLR for transforming the independent variables (Filzmoser et al. 2009). Overall, proximity regression is a multivariate predictive method used in multivariate compositional data.
Applications Bibliography Proximity regression has been applied in the study of lithogeochemistry of volcanic rocks in the Ben Nevis township, Ontario, Canada. Usually auto- and cross-correlations of the geochemical variables are useful in computations of spatial factors for factor analysis methods used for distinguishing lithogeochemical trends from geological processes (Grunsky and Agterberg 1988). It is observed that the anisotropically distributed variables are delineated better than isotropically distributed ones, when using the spatial factor maps. Such a multivariate analysis helps in identifying the dispersion halo that reflects mineralization. Expanding this and other similar works to identifying specific mineral occurrences in Canagau Mines deposit and Croxall property in the Abitibi Greenstone Belt, proximity regression has been used (Bonham-Carter and Grunsky 2018). This method also predicts the Croxall property using the Canagau Mines deposit, using 26 geochemical variables, and 278 samples within 3 km of Canagau Mines
Aitchison J (1982) The statistical analysis of compositional data. J R Stat Soc Ser B Methodol 44(2):139–160 Bonham-Carter G, Grunsky E (2018) Two ideas for analysis of multivariate geochemical survey data: proximity regression and principal component residuals. In: Handbook of mathematical geosciences. Springer, Cham, pp 447–465 Egozcue JJ, Pawlowsky-Glahn V, Mateu-Figueras G, Barcelo-Vidal C (2003) Isometric logratio transformations for compositional data analysis. Math Geol 35(3):279–300 Filzmoser P, Hron K, Reimann C (2009) Principal component analysis for compositional data with outliers. Environmetrics 20(6):621–632 Grunsky E, Agterberg F (1988) Spatial and multivariate analysis of geochemical data from metavolcanic rocks in the Ben Nevis area, Ontario. Math Geol 20(7):825–861 Pawlowsky-Glahn V, Egozcue JJ, Tolosana Delgado R (2007) Lecture notes on compositional data analysis. https://dugi-doc.udg.edu/ bitstream/handle/10256/297/?sequence=1. Accessed online last on May 14, 2022
P
Q
Q-Mode Factor Analysis Norman MacLeod Department of Earth Sciences and Engineering, Nanjing University, Nanjing, Jiangsu, China
Definition

Q-Mode Factor Analysis – a statistical modelling procedure whose purpose is to find a small set of latent linear variables that estimate the influence of the factors controlling the structure of case similarities/dissimilarities in a multivariate dataset and are optimized to preserve as much of the intercase similarity or dissimilarity structure as appropriate given an estimate of the number of causal influences.

Introduction

Whereas a linear, multivariate data-analysis procedure whose purpose was to summarize the covariance/correlation structure among a set of variables (R-mode) was first introduced by Karl Pearson (1901), the development of linear procedures whose purpose was to summarize the similarity/dissimilarity structure among sets of cases (e.g., objects, samples or events) (Q-mode) is of more recent vintage. This type of multivariate data analysis was introduced into the earth-science literature by John Imbrie (1963) based on previous applications in the fields of psychology and biology. Both principal component analysis (PCA) and component-based, R-mode factor analysis (FA) can be used to create mathematical ordination spaces into which cases described by the variables can be projected. Cases that plot close to each other in such spaces can be regarded as having similar variable values and cases that plot at some distance from others can be regarded as having different variable values. But in R-mode analyses the spaces created by these methods are referenced to the structure of covariance or correlation relations among the variables, not similarity relations among the cases themselves. The Q-mode methods of principal coordinates analysis (PCoord) and component-based Q-mode factor analysis (QFA) were created to redress this deficiency.

Estimating Similarity

Just as the core of PCA and FA lies in the manner in which pairwise structural relations among variables are portrayed, the core of QFA lies in the manner in which pairwise structural relations among cases are portrayed. But while in R-mode methods this usually comes down to a choice between two quite distinct alternatives – covariance or correlation – in Q-mode methods a wide variety of similarity and dissimilarity indices are available. Moreover, across the earth sciences it is often the case that variable values are reported as proportions or percentages, a practice that adds complexity to QFA calculations (Table 1). The similarity coefficients or indices used in QFA can be classified into several types: binary association indices, distance indices, and proportional similarity indices. Binary association indices, such as the coefficients of Jaccard, Dice, Simpson, and Otsuka (Table 1; see Cheetham and Hazel 2012 for a review), are used exclusively for presence-absence datasets and have been applied most extensively in the context of Q-mode cluster analysis. They do, however, produce "similarity" matrices that can be operated on by any number of multivariate data-analysis procedures, including QFA. Owing to the count of either mutual presences, or mutual absences, appearing in the numerators of these ratios, the selection of particular binary association indices will focus the analysis on the structure of similarity associations, dissimilarity associations, or some combination of both (Table 2).
Q-Mode Factor Analysis, Table 1 A selection of six Q-mode binary similarity coefficients

S_Jaccard = C / (N1 + N2 − C)
S_SimpleMatching = (C + A) / (Nt + A)
S_Dice = 2C / (N1 + N2)
S_Kulczynski2 = C(N1 + N2) / (2 N1 N2)
S_Simpson = C / N1
S_Otsuka = C / √(N1 N2)

C = present in both cases, A = absent in both cases, N1 = total present in case 1, N2 = total present in case 2, Nt = total present in cases 1 and 2
Q-Mode Factor Analysis, Table 2 A selection of five Q-mode distance (dissimilarity) coefficients

Euclidean distance: d_i,j = √( Σ_k=1..p (x_i,k − x_j,k)² )
Squared Euclidean distance: d²_i,j = Σ_k=1..p (x_i,k − x_j,k)²
Manhattan distance: D_i,j = Σ_k=1..p |x_i,k − x_j,k|
Mahalanobis distance: D_i,j = (x_i − x_j)ᵀ S⁻¹ (x_i − x_j)
Gower distance: g_i,j = (1/p) Σ_k=1..p |x_i,k − x_j,k| / range_k

x_i = the ith case, x_j = the jth case, p = number of variables, S = covariance/correlation matrix for the p variables
Distance-based indices are the most readily understandable family of Q-mode similarity coefficients because they represent case similarity and difference as a spatial construct. Euclidean, squared Euclidean, Manhattan, and Mahalanobis distances (Table 2) are all examples of distance indices that can be used to express the structure of relations between cases by summarizing pairwise differences across all variables. Because the trace of a pairwise, n by n distance matrix will always be occupied by zeros (since the distance between any object, sample or event and itself is 0.0), distance-based indices are often said to record "dissimilarity". Distance indices are typically applied to sets of interval or ratio data values and in situations where it is desirable to retain the magnitude or range of each variable's values in the analysis. However, if a dataset contains a broad mixture of variable types, or if the data analyst wishes to minimize the influence of different variables' numerical ranges, Gower's distance (Gower 1971) may be used. When using Gower's distance it should be kept in mind that atypical variable values can inflate a variable's range and so have a disproportionate effect on the resulting pairwise similarity estimate. Ideally, outliers should be removed prior to the calculation of Gower distances and the number of different variable types balanced insofar as possible. Weights may also be employed to control the relative influence of different variables on the overall measure of similarity though, as pointed out by Gower (1971), justification of an appropriate set of weights in the context of particular investigations is usually quite difficult. Here, the temptation to devise a weighting scheme that will either produce a particular ordination or speed calculations will be ever-present. Such temptations should be resisted. The final type of similarity index often employed in QFA is Imbrie and Purdy's (1962) index of proportional similarity:
cos θ_i,j = Σ_k=1..p (x_i,k x_j,k) / √( Σ_k=1..p x_i,k² · Σ_k=1..p x_j,k² )   (1)
In this equation x_i is the ith case, x_j is the jth case, and p is the number of variables. This is by far the similarity index most commonly employed in QFA and often constitutes the default index in public-domain and commercial statistics and data-analysis software packages. The proportional similarity index is often referred to as the "cosine θ" index because it conceptualizes cases as vectors in the variable space and represents similarity as the angular difference between all pairwise combinations of case vectors, taking no account of these vectors' lengths. Accordingly, the cosine θ index is a bounded index, constrained to vary between 0 and 1. The cosine θ, or proportional similarity, index is also a true similarity index insofar as the trace of the resulting matrix is occupied by 1s. If the variables used to compute the cosine θ index are standardized prior to similarity estimation the resulting values will be identical to those estimated from a calculation of the "correlation" across variables.
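As a concrete illustration, the following Python sketch computes a cosine θ (proportional similarity) matrix for a small cases-by-variables array; the data values and function name are illustrative only and are not taken from the trilobite example discussed below.

```python
import numpy as np

def cosine_theta_matrix(X):
    """Proportional (cosine theta) similarity between all pairs of cases.

    X is an n-cases by p-variables array of measurements.
    Returns an n by n matrix with 1s on the trace.
    """
    X = np.asarray(X, dtype=float)
    norms = np.sqrt((X ** 2).sum(axis=1))      # vector length of each case
    S = (X @ X.T) / np.outer(norms, norms)     # cross-products over products of lengths
    np.fill_diagonal(S, 1.0)                   # guard against rounding error on the trace
    return S

# Illustrative data: 4 cases measured on 3 variables
X = np.array([[4.0, 2.0, 1.0],
              [8.0, 4.1, 2.1],
              [1.0, 5.0, 6.0],
              [0.9, 5.2, 5.8]])
print(np.round(cosine_theta_matrix(X), 3))
```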
The Method

Once a similarity/distance matrix has been obtained, the QFA procedure conforms closely to that of FA. The linear model used in QFA derives from Spearman's (1904) original formulation and separates the observed n by n similarity or distance structure among cases, calculated across all variables, into a set of a priori designated generalized factors with a residual, specific factor (e) associated with each of the n cases:
X_1 = a_1,1 F_1 + a_1,2 F_2 + … + a_1,m F_m + e_1
X_2 = a_2,1 F_1 + a_2,2 F_2 + … + a_2,m F_m + e_2
X_3 = a_3,1 F_1 + a_3,2 F_2 + … + a_3,m F_m + e_3
⋮
X_j = a_j,1 F_1 + a_j,2 F_2 + … + a_j,m F_m + e_j   (2)

In this expression a_i is a standardized score for the ith case, F is the "factor" value for that case across all measured or observed variables, and e_i is the part of the distance or similarity not accounted for by the factor structure. Q-mode factor analysis attempts to find a set of m linear factors (F, where m < p) that underlie the sample's similarity or distance structure and, hopefully, provide a good estimate of the similarity/distance structure of the larger population from which the sample was drawn.
Choosing and Extracting the Factors

In a manner analogous to its R-mode counterpart, QFA begins with a principal coordinate analysis of a properly selected basis matrix whose structure reflects those aspects of between-case similarity, distance or association most appropriate for the investigation at hand. The decision of which similarity, distance or association index to use is not a trivial one as it will influence all aspects of the subsequent analysis. In addition, the number of factors the investigator wishes to construct their model around also needs to be specified at the outset of analysis. The guidance for factor number selection offered in the presentation of R-mode FA can also be applied to QFA. As always, there is no substitute for having a detailed understanding of the system being modeled. If there is uncertainty regarding the number of factors controlling the dataset's distance/similarity structure, QFA may proceed, but its results should be interpreted with caution. Like R-mode FA, QFA scales the loadings of the retained principal coordinates by their eigenvalues, according to the following expression:

a_i,m = √λ_m · b_i,m   (3)
Here, λ_m is the eigenvalue associated with the mth principal coordinate, with m being set to the number of retained factors, and b_i,m the eigenvector loading of the ith case on the mth principal coordinate (= factor). This scaling, along with the reduction in the number of factors being considered, represents a basic difference between PCoord and QFA. Next, the "communality" values of the scaled factors are calculated as the sum of squares of the factor loading values
(a) across all factors. This summarizes the proportion of similarity, distance or association provided by each case to each factor. The quantity 1 minus the sum of communality values for each case then expresses the proportion of the case's distance, similarity, or association structure attributable to the error term (e) of the factor model. If the correct number of factors has been chosen, all the summed, squared factor values should exhibit a high communality for each case with a small residual error. Once the QFA factor equations have been determined, the factor loading values constitute the positions of objects, samples or events projected into the space formed from the eigenvalue-scaled orthogonal factors. These projection scores (X) may then be tabulated, plotted, inspected and interpreted in the manner normal for PCoord ordinations. Table 3 and Fig. 1 compare and contrast results of a PCoord and a two-factor QFA solution for the trilobite data listed in MacLeod (2006). Note that, despite the cosine θ basis for this analysis being a square 20 × 20 matrix, only three principal coordinates can be extracted from this small, demonstration dataset. This limitation stems from the fact that the data from which the similarity matrix was calculated included only three variables: body length, glabella length and glabella width. For this simple dataset both the first principal component and first factor would be interpreted as being consistent with generalized size variation owing to their uniformly positive loading coefficients. Similarly, for both analyses the second component/factor would be interpreted as reflecting localized shape variation owing to the contrast between signs among the different taxa. Figure 1 illustrates the distribution of these trilobite body shapes in the ordination space formed by the first two principal coordinate axes and the two retained factor axes. While the relative positions of the projected data points are similar, and discontinuities in the gross form distributions are evident in both plots, the scaling of the PCoord and FA axes differs, reflecting the additional eigenvalue-based scaling calculation that is part of component-based QFA. Irrespective of this correspondence, an important change has taken place in the ability of the QFA vectors to represent the structure of the original cosine θ similarity matrix (Table 4). From these results it is evident that the two-factor QFA has been much more successful in reproducing the entire structure of the original cosine θ similarity matrix than principal coordinate analysis. If all 20 principal coordinates had been used in these calculations the trace of the original similarity matrix would be reproduced exactly, but none of the off-diagonal elements.
Q-Mode Factor Analysis, Table 3 Principal coordinate loadings, Q-mode factor loadings, and factor communalities for the example trilobite morphometric variables. The first three principal coordinates carry eigenvalues of 19.865 (99.33%), 0.108 (0.54%), and 0.027 (0.13%); the two retained factors carry the first two of these eigenvalues, and the communalities of the 20 trilobite taxa all lie between 0.995 and 1.000. (Full table of loadings not reproduced here.)
Q-Mode Factor Analysis, Fig. 1 Ordination spaces formed by the first two principal coordinates (A) and a two-factor extraction (B) from the three-variable trilobite dataset. Note the similarity of these results in terms of the relative positions of forms projected into the PCoord and factor spaces and the difference in terms of the axis scales.
This simple demonstration illustrates how the eigenvalue-scaling operation that stands at the core of both FA and QFA renders these approaches to data analysis more information-rich than their PCA and PCoord counterparts, especially if a dramatic reduction in causal-factor dimensionality is required. The validity of applying either the FA or QFA models to a dataset is predicated, however, on the dataset exhibiting a factor-based variation/similarity structure.
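The extraction, scaling, and reproduction steps described above can be sketched in a few lines. This is a minimal illustration of component-based QFA applied to a precomputed case-by-case similarity matrix; the function and variable names are assumptions, and the routine is not the one used to generate Tables 3 and 4.

```python
import numpy as np

def qfa_from_similarity(S, n_factors):
    """Component-based Q-mode factor analysis of a case-by-case similarity matrix S."""
    eigvals, eigvecs = np.linalg.eigh(S)            # symmetric eigen-decomposition
    order = np.argsort(eigvals)[::-1]               # largest eigenvalues first
    lam = np.clip(eigvals[order][:n_factors], 0.0, None)
    B = eigvecs[:, order][:, :n_factors]            # principal coordinate loadings b
    A = B * np.sqrt(lam)                            # factor loadings a = sqrt(lambda) * b (Eq. 3)
    communality = (A ** 2).sum(axis=1)              # sum of squared loadings per case
    S_reproduced = A @ A.T                          # similarity structure implied by the factors
    return A, communality, S_reproduced

# Example usage with the cosine theta matrix from the earlier sketch:
# A, comm, S_hat = qfa_from_similarity(cosine_theta_matrix(X), n_factors=2)
```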
Aside from the use of different indices to quantify the structure of pairwise relations among cases, QFA also differs from R-mode FA in a number of other, more subtle ways. For example, it is standard practice to mean center each variable prior to analysis in PCA and FA so the ordination space created as a result of those procedures will be centered on the origin of the coordinate system within which the ordination results will be displayed. This operation is typically not performed in QFA (or PCoord) for either the set of variables
or the set of cases. In this way, the ordination actually expresses the orientation of eigenvectors all of whose tails emanate from the coordinate system origin, irrespective of whether the shafts or bodies of the eigenvectors are drawn in the Q-mode ordination space plots. It would also make little
sense to mean center the case rows across a set of variables of very different types and magnitude ranges. When QFA ordinations are examined, the loadings (= ordination scores) on the first factor (F1) will often all exhibit comparably high positive values.
Q-Mode Factor Analysis, Table 4 Original cosine θ similarity matrix (upper), cosine θ matrix reproduced on the basis of the first two principal coordinates (middle), and cosine θ matrix reproduced by the two-factor FA solution (lower). Note the bottom two matrices are based on the component/loading values shown in Table 3. (Matrices not reproduced here.)
This is usually a reflection of the "size" or combined magnitudes of all the variables included in the analysis. Since there often seems to be little difference among the cases along F1 it is often regarded as an irrelevant or "nuisance" factor, especially when compared to the oftentimes more informative distinctions between cases revealed in the plots of higher-level QFA ordination spaces. Nonetheless, as uniformly high correlations among variables are not characteristic of all earth science datasets, it is always a good idea to inspect the F1 vs F2 and (if necessary) F1 vs F3 ordination plots to determine whether any useful insight can be gained into the structure of between-case similarity/distance relations from an interpretation of F1. In some instances the variables being analyzed might be expressed as either compositional proportions or percentages such that the sum of each row is constrained to add up to 1.0 or 100.0, respectively. Such datasets are said to have been artificially "closed". In such cases, application of a centered log-ratio transformation prior to analysis is usually recommended in order to mitigate the effects of the closure constraint.
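A minimal sketch of the centered log-ratio transformation mentioned above, assuming strictly positive compositional rows; the example values are invented.

```python
import numpy as np

def clr(X):
    """Centered log-ratio transform of closed (constant-sum) compositional rows.

    Each row must contain strictly positive parts; zeros need to be
    replaced (e.g., by a small detection-limit value) beforehand.
    """
    X = np.asarray(X, dtype=float)
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)  # subtract the row geometric-mean log

# Illustrative compositions (percentages summing to 100)
comp = np.array([[60.0, 30.0, 10.0],
                 [45.0, 45.0, 10.0]])
print(np.round(clr(comp), 3))
```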
Factor Rotation

As with FA, QFA factor axes may be rotated relative to the variable axes in order to improve their interpretability. Axis rotation is less commonly applied in QFA because the emphasis is often on identifying the structure of similarity relations
among cases rather than interpretation of the factors in terms of the original variables. Nonetheless, such interpretations are possible and factor-axis rotations often make this interpretation easier. Operationally this amounts to adjusting the set of factor loadings, rigidly in a geometric sense, until all variables exhibit loadings at or as close to 0.0 or 1.0 across all factors as possible. Neuhaus and Wrigley's (1954) quartimax and Kaiser's (1958) varimax orthogonal axis-rotation procedures can be used to perform this operation. If even more simplification is desired, the constraint of the factor axes maintaining orthogonal relations can be relaxed using procedures such as Carroll's (1953) quartimin or Imbrie's (1963) oblique axis rotation, in which case the factor axes will coincide with the orientation of the extreme variable vectors. Figure 2 illustrates the unrotated and varimax-rotated solutions for the example trilobite dataset.
Conclusion

Q-mode factor analysis is often used to array cases along a spectrum defined by opposing case extrema. In some instances the purpose is to identify these extreme end-members while, in others, it is to gauge which of these spectral end-members particular cases are closest (= most similar) to. Such analyses can also be accomplished, to some extent, by Q-mode cluster analysis. However, in most
Q-Mode Factor Analysis, Fig. 2 Original (a) and rotated (b) variable-axis orientations with respect to the fixed, orthogonal factor axes for the three-variable trilobite data. The rotation of these variables is centered on Factor 1 owing to their high mutual correlations (reflected in the high Factor 1 eigenvalue, see Table 3). More complex datasets will typically exhibit larger variable-vector rotations.
cases cluster analysis represents the structure of between-case similarity relations as a hierarchy despite the fact that the generative processes that give rise to between-case structural relations might not be organized hierarchically. For systems in which similarity-distance relations are known, or suspected, not to be structured hierarchically, QFA may be the more appropriate conceptual model to employ, in addition to one that will minimize the distortions that often accompany attempts to portray non-hierarchically structured data using hierarchical models. Q-mode factor analysis has proven especially attractive to geochemists, mineralogists, petrologists, hydrologists, paleoecologists, and biostratigraphers who often deal with compositional data and find themselves in need of a procedure to non-hierarchically define both end-members and gradient orderings of cases in which a number of influences are mixed in varying proportions. In perhaps one of the more scientifically significant applications of the method, Imbrie and Kipp (1971) employed QFA to develop a set of linear transfer functions whereby sea surface temperatures for ancient ocean basins could be inferred from the relative abundances of Cenozoic planktonic foraminifer species. Sepkoski (1981) also used QFA to summarize the diversity history of marine life across the Phanerozoic and used the results generated through its application to recognize the three great evolutionary faunas that, together, constitute fundamental macroevolutionary historio-structural units of life on Earth. These are but two prominent examples of QFA's potential for
making important data analysis-based advances in the study of earth history.
Cross-References ▶ Cluster Analysis and Classification ▶ Correlation and Scaling ▶ Eigenvalues and Eigenvectors ▶ Grain Size Analysis ▶ Inverse Distance Weight ▶ Multivariate Data Analysis in Geosciences, Tools ▶ Principal Component Analysis ▶ R-Mode Factor Analysis ▶ Shape ▶ Statistical Outliers ▶ Variance
Bibliography
Carroll JB (1953) An analytical solution for approximating simple structure in factor analysis. Psychometrika 18:23–38
Cheetham AH, Hazel JE (2012) Binary (presence-absence) similarity coefficients. J Paleontol 43:1130–1136
Davis JC (2002) Statistics and data analysis in geology, 3rd edn. Wiley, New York
Gould SJ (1967) Evolutionary patterns in pelycosaurian reptiles: a factor analytic study. Evolution 21:385–401
Gower JC (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857–871
Imbrie J (1963) Factor and vector analysis programs for analyzing geologic data. Office of Naval Research, Technical Branch, Report 6:83
Imbrie J, Kipp NG (1971) A new micropaleontological method for quantitative paleoclimatology: application to a Pleistocene Caribbean core. In: Turekian KK (ed) The Late Cenozoic Glacial Ages. Yale University Press, New Haven, pp 72–181
Imbrie J, Purdy E (1962) Classification of modern Bahamian carbonate sediments. AAPG Mem 7:253–272
Kaiser HF (1958) The varimax criterion for analytic rotations in factor analysis. Psychometrika 23:187–200
MacLeod N (2006) Minding your Rs and Qs. Palaeontological Assoc Newsl 61:42–60
Neuhaus JO, Wrigley C (1954) The quartimax method: an analytical approach to orthogonal simple structure. Br J Stat Psychol 7:81–91
Pearson K (1901) On lines and planes of closest fit to a system of points in space. Philos Mag 2:557–572
Sepkoski JJ (1981) A factor analytic description of the Phanerozoic marine fossil record. Paleobiology 7:36–53
Spearman C (1904) "General intelligence", objectively determined and measured. Am J Psychol 15:201–293
Quality Assurance
N. Caciagli
Metals Exploration, BHP Ltd, Toronto, ON, Canada
Barrick Gold Corp, Toronto, ON, Canada
Synonyms
QAQC; Quality Control; Quality Management; Quality Systems
Definition

Quality assurance is the act of giving confidence or the state of being certain that a product or service is fit for use or conforms to specifications. The ISO 9000 standard, clause 3.2.11, defines quality assurance as "a part of quality management focused on providing confidence that quality requirements will be fulfilled" (ISO 9000). Quality assurance (QA) encompasses all the planned and systematic activities implemented to provide confidence that a product or service will fulfill requirements for quality. Quality control (QC) typically refers to the operational techniques and activities used to fulfill requirements for quality (ASQ/ANSI 2008). Often "quality assurance" and "quality control" are used interchangeably, referring to all the actions performed to ensure the quality of a product, service, or process.
Why QAQC

Quality assurance can be viewed as proactive and process oriented and is focused on the actions that produce the product or ensure the conditions a service must provide. In the geosciences, quality assurance considers the requirements of the data and often deals with the selection of appropriate analytical methods, standards to monitor and document the QC of the analyses, and regular check sampling by a third-party analytical laboratory, including a description of the pass/fail criteria and the actions taken to address results that are outside of the pass/fail limits (CIM 2018). Securities regulators and financial institutions now demand that full QAQC procedures be used for all resource programs. There are various regulations and codes depending on where in the world the company has its primary stock exchange listing, such as SAMREC (South Africa), JORC (Australia), and NI 43-101 (Canada). These codes are all based on the standard definitions and globally consistent reporting requirements as determined by the Council of Mining and Metallurgical Institutions (CMMI) and the Committee for Mineral Reserves International Reporting Standards (CRIRSCO). These guidelines and standards are published and adopted by the relevant professional bodies around the world. They detail the mandatory requirements for disclosure of exploration results, reserves, resources, and mineral properties, and require a summary of the nature and extent of all QC procedures employed, check assays, and other analytical and testing procedures utilized, including the results and corrective actions taken; in other words, a Quality Assurance Program. The purpose of these international reporting codes is to ensure that fraudulent information relating to mineral properties is not published and to reassure investors that projects have been assessed in a scientific manner. The impetus for developing these codes came from the Busang or Bre-X scandal (Indonesia) in 1997 (Wilton 2017). Bre-X was a Canadian mining company based in Calgary, Alberta. In October 1995, the company announced the discovery of one of the biggest gold deposits in the world. The estimates for this deposit climbed from 2 million ounces to a peak of 70 million ounces in 1997, with speculation of up to 200 million ounces, sending its stock price soaring from a penny share to a peak of CAD$286.50 a share. In March 1997 the project geologist, Michael de Guzman, fell to his death from a helicopter over the Indonesian jungle within days of Freeport-McMoRan, a potential Busang project partner, announcing that its due diligence revealed only insignificant amounts of gold at the property. An independent third party, Strathcona Minerals, was brought in to make its own analysis. On May 4th they published their results: the grade and tonnage of the deposit was
unsubstantiated and not based on scientific principles. The reporting was fraudulent, and the shares became worthless. It wiped out billions of dollars for many investors, which included major Canadian pension plans as well as individual investors who had committed their entire retirement funds, their children’s college funds, and even mortgaged their homes. It turned out to be the biggest mining scandal of the century (Wilton 2017).
QA, Accuracy, Standards, and Control Limits

Accuracy is a qualitative term referring to whether there is agreement between a measurement made and its accepted or reference value. Accuracy is determined using standards, also known as certified reference materials (CRMs) or standard reference materials (SRMs). Within the geosciences, CRMs and SRMs are manufactured materials, either from natural ores or blends of barren material and ore concentrates, with a known concentration of the elements of interest. The "known" concentration is determined from a round robin involving replicate analyses of multiple aliquots from various labs. This consensus approach results in a recommended value determined from the means of accepted replicate values as well as confidence limits obtained by calculation of the variance of the recommended value (mean of means). These confidence limits are key in determining the upper control limit (UCL) and lower control limit (LCL) for the process control charts. The selection of appropriate standards and blanks, the accuracy and tolerance requirements that the analysis of those standards and blanks must fulfill, and the precision of the replicate analyses are determined by the QA program. A standard must be of a material as similar as possible to the samples being assayed to match the ore body mineralogy, ideally custom manufactured from in-house material, and
have a value that sits within the range used for classifying the material (i.e., typical waste values, ore-waste boundary, typical ore values). Given the deposit characteristics (i.e., oxide, sulfide, or various mineralization styles) and material processing requirements (i.e., leach, milling, etc.), several standards may be needed; a minimum of three is typically recommended. Statistics and QA intersect around the description of pass/fail criteria of the quality controls that are put in place as part of the QA program to ensure accuracy and precision. Shewhart control charts (Fig. 1) are used to track the variation in repeated measurements of standard reference materials that is due to a defect or error in the process (i.e., contamination, equipment calibration, deviation from established operating procedures). Definition of control limits, or the natural boundaries of an analysis within specified confidence levels, depends on the detection of outliers. When dealing with natural materials, which are expected to be inherently variable, the determination of control limits and detection of outliers in these materials can be a rather circular process. Additionally, identifying an observation as an outlier depends on the underlying distribution of the data, which is often assumed rather than known a priori. An additional complication faced in the geosciences is that precision is a function of concentration and detection limit and decreases as concentrations approach the detection limit of a given method. This will also impact what control limits can realistically be achieved for concentrations approaching the detection limits.
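As a hedged illustration of how control limits might be derived from replicate analyses of a reference material, the sketch below uses the common mean ± 2s (warning) and mean ± 3s (control) convention; the multipliers, function name, and data are assumptions rather than prescriptions from the text.

```python
import numpy as np

def control_limits(crm_values, warn_k=2.0, control_k=3.0):
    """Warning and control limits for a Shewhart-style chart of CRM assays.

    crm_values : repeated analyses of one certified reference material.
    The +/-2s and +/-3s multipliers are a common convention, not a rule.
    """
    x = np.asarray(crm_values, dtype=float)
    mean, sd = x.mean(), x.std(ddof=1)
    return {
        "mean": mean,
        "UWL": mean + warn_k * sd, "LWL": mean - warn_k * sd,
        "UCL": mean + control_k * sd, "LCL": mean - control_k * sd,
    }

# Illustrative gold CRM results (g/t) and a check of two new assays
limits = control_limits([2.31, 2.28, 2.35, 2.30, 2.27, 2.33, 2.29])
failures = [v for v in (2.31, 2.45) if not (limits["LCL"] <= v <= limits["UCL"])]
print(limits, failures)
```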
Quality Assurance, Fig. 1 Example of a Shewhart control chart for a standard reference material used to monitor variation in the analysis

QA, Precision and Duplicates

Precision is defined as the amount of variation that exists in the values of multiple measurements of the same material or parameter and is often expressed as the percent relative variation at the two-standard-deviation (95%) confidence level. In this definition, the larger the precision number, the less precise the sampling and analysis. Precision is measured utilizing multiple analyses of standard reference material. Often precision is expressed as percent relative standard deviation (RSD), which is determined from the standard deviation of multiple analyses and the mean value of those analyses. Typically, a geological analysis involves on the order of 1% carryover of elements from one sample to the next, and this carryover will be proportionally larger in a small blank, making the blank test ineffective. SRM or pulp blanks are effectively another type of SRM with very low values of the element in question. These are inserted into the work order and used to assess contamination in the laboratory during sample analysis. Blanks can be monitored on a chart like a Shewhart control chart, although there is no lower control limit and the upper control limit is defined as a multiple of the analytical detection limit, usually 5 times the analytical limit of detection for the method used.
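The two monitoring calculations just described reduce to simple arithmetic; the sketch below computes precision as percent relative standard deviation from replicate analyses and applies a blank check against an upper limit of five times the detection limit (all values are invented for illustration).

```python
import numpy as np

def percent_rsd(values):
    """Precision as percent relative standard deviation of replicate analyses."""
    x = np.asarray(values, dtype=float)
    return 100.0 * x.std(ddof=1) / x.mean()

def blank_ok(blank_value, detection_limit, multiplier=5.0):
    """Pass/fail for a pulp blank: upper control limit = multiplier x detection limit."""
    return blank_value <= multiplier * detection_limit

print(round(percent_rsd([1.02, 0.98, 1.01, 0.97, 1.03]), 2))   # percent RSD of replicates
print(blank_ok(blank_value=0.02, detection_limit=0.01))        # True: below 5 x LOD
```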
Check or Umpire Assays

Check assays or umpire samples are pulverized samples (pulps) sent to a secondary lab and used to define inter-laboratory precision and bias. These should be selected to cover a range of concentrations representing the range of values encountered, with particular focus on values representing decision points (e.g., mining cut-offs, ore routing, or dispatching triggers). It is expected that the primary lab and the secondary or umpire lab have less than 5% bias. Inter-laboratory bias is identified by plotting the primary lab and secondary lab pairs on an XY scatter plot. However, the data from the primary and the secondary laboratory are independent of each other, and an ordinary least squares (OLS) regression will not correctly describe the relationship between the data because it assumes the X-measurement is without error. The reduced major axis regression line usually provides the most useful description of the relationship between the X and Y axes. It assumes that both data sets are equally error prone and calculates the regression slope as the ratio of two standard deviations (Sokal and Rohlf 1995).
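Because the reduced major axis slope is simply the ratio of the two standard deviations, signed by the correlation, it can be computed directly; the sketch below (variable names and data assumed) fits paired primary and umpire assays in this way.

```python
import numpy as np

def rma_fit(primary, umpire):
    """Reduced major axis regression for paired primary vs umpire assays.

    The slope is the ratio of standard deviations, signed by the correlation
    (Sokal and Rohlf 1995); both variables are treated as error-prone.
    """
    x = np.asarray(primary, dtype=float)
    y = np.asarray(umpire, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Illustrative check-assay pairs (g/t); a slope near 1 suggests little inter-laboratory bias
slope, intercept = rma_fit([0.5, 1.2, 2.4, 3.1, 4.8], [0.52, 1.15, 2.50, 3.00, 4.90])
print(round(slope, 3), round(intercept, 3))
```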
Quality Control
Quality Control, Fig. 2 A paired duplicate chart with control lines demonstrating the required precision (±10%) for the stage of duplicates
QC Elements to Measure and Monitor Precision

Sample duplicates are created at different stages of the workflow to monitor variability and detect errors related to geology, sampling, sample preparation and subsampling, and analysis. By inserting duplicates at these various points, it is possible to evaluate how much variability is contributed by these different processes.

Field Duplicates or Check Samples
Check samples or field duplicates are used to provide information on the reproducibility (or confidence) of the sampling, sample prep, and assaying program. They are another sample taken at the sampling site or, in the case of diamond drill core, a split (one half or one quarter) of the core. Field duplicate samples reflect the total variability or precision of the material being sampled, the sampling methodology, sample preparation, and analysis. The largest source of variability in field duplicates is related to geological heterogeneity and consistency in sampling; as such, field duplicates do not have pass/fail criteria,
rather, they serve to monitor the adequacy of the sampling protocols.

Coarse Duplicates or Prep Duplicates
Coarse duplicates or prep duplicates reflect the variability introduced from the sample preparation procedures as well as the analytical precision. These duplicates are splits of a sample made by the lab after the first comminution stage, typically the primary crushing. The largest source of error and variability in these duplicates is related to the subsampling process, that is, the crushing, the splitting of the crushed material, the pulverizing of that split, and the sampling of the resulting pulps. At each step, biases can be introduced by poor methodology, or the comminution and sampling stages may be insufficient to produce a representative subsample of the original sample; as such, coarse duplicates usually do not have pass/fail criteria, rather they serve to monitor the adequacy of the sampling, comminution, and subsampling protocols.
Quality Control, Fig. 3 MPD plot with control lines demonstrating the required precision (±10%) for the stage of duplicates. The limit of detection (LOD) is shown, as is the limit of quantification (LOQ). Any samples that are less than the LOQ are not reliable because the signal-to-noise ratio of the analysis is too low. The samples greater than the LOQ suggest that the sample prep is not adequate (the sample is not sufficiently homogenized) or that there are problems with the analysis. In this case, the issue was determined to be the former.
Analytical Duplicates or Pulp Duplicates
Pulp duplicates are used to provide information regarding the analytical variation which exists in the analysis. These duplicates are splits or second aliquots of the final product of the sample preparation process, usually pulverization for rock and soil, that are analyzed side by side with the first. These duplicates will have pass/fail criteria, typically ±10%, as they reflect the precision of the analysis and are used to monitor the quality of the analytical data and processes that are within the lab's control.
Analysis of QC Duplicates
Various graphical and statistical tools can be utilized to visualize and determine the quality of the data. A duplicate chart plots paired samples on an XY chart (Fig. 2). Control limits reflecting the required precision are also plotted. Typically, the acceptable tolerance increases from ±10% for analytical duplicates, to ±20% for coarse duplicates, and ±30% for field duplicates, reflecting the increasing contribution of the subsampling (coarse duplicates) and geological variability (field duplicates) to the paired samples. A mean percent difference (MPD) chart plots the mean concentration of the duplicate pair against the derived percent difference between the duplicate pair (Bland and Altman 1986). Control limits reflecting the required precision are also plotted (Fig. 3). These plots provide a way of comparing the agreement between pairs and determining whether the degree of agreement is within the required precision. A relative difference plot or half absolute relative difference (HARD) plot ranks relative difference against percentile and is an effective tool only when a large number of pairs exists. Data will be deemed acceptable if a certain percentage of the data falls within specific tolerances; depending on the commodity or the mineralization system
Quality Control, Fig. 4 HARD plot for gold assay data from pulp duplicates. From this plot, it can be determined that 90% of the samples have greater than 20% variability, which suggests an unsatisfactory analysis or sample prep for this project
being investigated, this may be something like 90% of the data must have ±10% precision (Fig. 4). All the duplicate pairs can also be used in a Thompson-Howarth analysis (Thompson and Howarth 1973) to determine the effective precision as a function of concentration (see Quality Assurance, Fig. 2). Separate curves expressing precision as a function of concentration can be constructed from each duplicate set to visualize precision at each stage, from sampling (field duplicates) to sample prep (coarse or prep duplicates) to analysis (pulp duplicates). This will show where, and at what concentration, the largest variability in the sampling and analytical protocol lies.
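The duplicate-pair statistics plotted in Figs. 2, 3, and 4 can be computed directly from the paired assays; the sketch below derives the mean-percent-difference and HARD values and the fraction of pairs within a chosen tolerance (names, data, and the 10% threshold are illustrative).

```python
import numpy as np

def duplicate_stats(orig, dup, tolerance_pct=10.0):
    """Pair statistics for duplicate assays.

    Returns the signed percent difference relative to the pair mean (as used
    in an MPD plot), the half absolute relative difference (HARD, %), and the
    fraction of pairs whose HARD falls within the stated tolerance.
    """
    a = np.asarray(orig, dtype=float)
    b = np.asarray(dup, dtype=float)
    pair_mean = (a + b) / 2.0
    mpd = 100.0 * (a - b) / pair_mean
    hard = 100.0 * np.abs(a - b) / (a + b)
    within = np.mean(hard <= tolerance_pct)
    return mpd, hard, within

mpd, hard, frac = duplicate_stats([1.10, 2.40, 0.85], [1.05, 2.55, 0.90])
print(np.round(hard, 1), f"{100 * frac:.0f}% of pairs within tolerance")
```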
Challenges in QC Programs

Working with natural systems that have inherent variability always brings challenges. When dealing with trace element concentrations, the "nugget effect" or inability to start with a
representative sample can swamp out any variability in the analytical process, making it difficult to monitor whether the analysis is in control or not. To be effective, QC data must be reviewed not only on a "batch" basis but also over longer timescales, such as monthly, quarterly, or at project milestones. Data trends such as drifts, shifts, bias, regular failures, and chronic sample prep issues (e.g., contamination) can only be fully realized when compiling data and reviewing longer time intervals.
Summary

Quality control refers to the procedures and tools used to monitor the requirements for quality by testing a sample of the output against the required tolerances or specifications as determined by the quality assurance requisites. Quality control can be viewed as the tools put in place for the measuring and detection of quality.
Cross-References ▶ Lognormal Distribution ▶ Howarth, Richard J. ▶ Standard Deviation ▶ Statistical Outliers ▶ Statistical Quality Control
Bibliography
ASQ/ANSI (2008) A3534-2, Statistics, "Vocabulary and Symbols" Statistical Quality Control. American Society of Quality Control. Available at: https://asq.org/quality-resources/quality-glossary. Accessed 20 Apr 2021
Bland J, Altman D (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 327(8476):307–310
CIM (2018) CIM mineral exploration best practice guidelines. Canadian Institute of Mining, Metallurgy and Petroleum
Sokal RR, Rohlf FJ (1995) Section 14.3 "Model II regression". In: Biometry, 3rd edn. W. H. Freeman, New York, pp 541–549
Thompson M, Howarth RJ (1973) The rapid estimation and control of precision by duplicate determinations. Analyst 98:153–160
Wilton S (2017) Bre-X: the real story and scandal that inspired the movie Gold. Calgary Herald. https://calgaryherald.com/news/local-news/brex-the-real-story-and-scandal-that-inspired-the-movie-gold. Accessed 27 Nov 2020
Quantitative Geomorphology
Vikrant Jain1, Shantamoy Guha1 and B. S. Daya Sagar2
1 Discipline of Earth Sciences, IIT Gandhinagar, Gandhinagar, India
2 Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India
Definition Quantitative geomorphology is a part of geomorphology that explains landscape patterns, geomorphologic phenomena, processes, and cause–effect relationships for landforms on the Earth and Earth-like planetary surfaces in a firm quantitative manner. It demands frameworks to acquire spatiotemporal geomorphologic data, retrieve geomorphologic information, geomorphologic reasoning, and model simulation of geomorphologic phenomena and processes. Quantitative geomorphology is progressing in two dimensions. The structural dimension aims to characterize and explain landscape patterns, while the functional dimension focuses on processes, rates, and analyses of cause–effect relationships using fundamental concepts from physics. The approaches of
1135
geometric, topological, and morphological relevance that show promise in the structural dimension of quantitative geomorphology include classical morphometry, hypsometry, and modern approaches such as fractal geometry, mathematical morphology, and geostatistics. Classical approaches employ source data such as river networks and elevation contours, especially digital elevation models (DEMs), to derive morphometric quantities. Development in the functional dimension of quantitative geomorphology helped to integrate (1) processes at different scales and (2) feedback mechanisms between different landforms (geomorphic systems). Further, the cause–effect relationship in a quantitative framework also leads to developing prediction capability, and hence, providing an important tool to discuss the future of geomorphic systems.
Scope of Quantitative Geomorphology Geomorphologists have long ventured to identify landscape patterns and understand the events that cause changes in the landscape. The first set of queries related to landscape patterns started in 1950–1960, while the second dimension initiated in the late twentieth century was focused on process modeling and quantitative analysis of cause–effect relationship (Piégay 2019). Quantitative analysis of landscape patterns can be termed as the structural dimension of quantitative geomorphology that aims to quantitatively characterize landform or landscape features. However, current progress in quantitative geomorphology is in the area of process modeling and quantification of a cause–effect relationship, which defines the functional dimension of quantitative geomorphology. The growth in the functional dimension has seen the application of various mathematical tools to study various earth surface processes. Quantitative data on processes such as tectonics, climate, sea-level change, or human disturbances are essential to acquire an in-depth understanding of the cause–effect relationship and identify the causes. The geomorphic processes can be broadly classified into three main categories, e.g., erosion, transportation, and deposition. The primary agents that carry out these processes to modify the Earth’s surface are water, air, and ice . The prominent sources of energy to carry out the geomorphic processes are solar energy and gravitational forces. Solar energy drives other secondary forces like wind energy and generates water cycle. This wind drives the drag force to move sediments in the arid region and further generates waves that drive the geomorphic work in the coastal region. Gravitational force acts on the slope of the terrain in fluvial, glacial, and hillslope processes (Fig. 1). The geomorphic systems are analyzed through a functional approach (through quantification of driving and resisting forces in a geomorphic system) or through an evolutional approach to assess the landform or landscape trajectory with
Q
1136
Quantitative Geomorphology
Quantitative Geomorphology, Fig. 1 Generalized diagram of different forms of energy and resultant geomorphic processes
time. Functional approaches were built on the observation and measurement of landforms and fluxes, while evolutional approaches were based on qualitative analysis or quantitative modeling approaches. Quantitative geomorphology has grown significantly in the last two decades with the advent of advanced techniques of remote sensing, especially digital elevation data, chronological methods (e.g., 14C and OSL for the depositional environment; Cosmogenic nuclide e.g. 3He, 10 Be, 26Al for erosion rates and burial dating), and measurement of isotopes in water and sediments. Further, instrumentation procedures have led to in situ measurements of landform features and fluxes, while the experimental setup in the lab has generated huge quantitative datasets, which has led to several modeling approaches. Chronological tools provide a platform to analyze the process interaction, (dis)connectivity of different landscape drivers by various agents, integration of scales, and emergence of new geomorphic concepts like dis- or nonequilibrium landforms, nonlinearity, and complexity. Key aspects of quantitative geomorphology are discussed further.
Physical and Chemical Processes on the Earth’s Surface Rock Weathering Weathering is an in situ process that mechanically fractures a rock body or chemically alters it with the help of water. A rock
mass undergoes weathering process and disintegrates into fractured, weathered, and subsequently mobilized by various geomorphic agents. The weathering processes can be divided into two main categories, i.e., physical weathering and chemical weathering. Physical Weathering
Physical weathering is prominently a mechanical process where a whole rock body is extensively fractured and disseminate into smaller chunks of rocks. The fractured rock mass is further favorable for chemical disintegration and formation of saprolite. A rock body expands due to the heating process that inherently depends on the mineral content. If the stress due to expansion (or sometimes contraction) exceeds the strength of the rock, then it usually fractures and subsequently disintegrates. This expansion and contraction process due to temperature change is expressed by the volumetric thermal expansion coefficient, which is defined as a ¼ 1=V @V @T CP , where V is the volume, T is temperature, and CP indicates constant pressure. The stress s generated due to the strain is s¼
aEDT 1#
ð1Þ
where E is the Young’s modulus, ΔT is the temperature difference, and # is the Poisson’s ratio. The rate of physical weathering is significantly important in arid or glacial regions due to the importance of expansion–
Quantitative Geomorphology
1137
contraction of rock mass or frost cracking. The degree of weathering and generation of the loose sediment dictates the rate of sediment transport. Frost weathering is important as it enhances the efficiency of the rockfall in the alpine landscape (Draebing and Krautblatter 2019), whereas diurnal heating and salt weathering generate a significant amount of weathering material in the arid or semi-arid region (McFadden et al. 2005).
consistently higher in tropical watersheds (Gaillardet et al. 1999). Chemical weathering index or indices of alteration are the quantitative representation of bulk major element oxide chemistry into a single value. Examples of common weathering indices are CIA (Nesbitt and Young 1982), R (Ruxton 1968), WIP (Parker 1970), PIA (Fedo et al. 1996), and CIW (Harnois 1988).
Chemical Weathering
Hillslope Processes Hillslopes constitute the basic element of any landscape where it generates and transport a huge portion of sediment to either fluvial or glacial system. Hillslopes are either bare bedrock or soil-mantled, and the form of the hillslope depends on the characteristics of the hillslope material. Most soilmantled hillslopes are convex upward in nature, and diffusive hillslope processes are the most dominant mode of erosion. Erosion of a soil-mantled hillslope can be modeled with a simple two-dimensional diffusion equation (Fernandes and Dietrich 1997):
The top surface of the Earth is constantly exposed to the atmosphere where it interacts primarily with air and water. This interaction leads to a change in the composition of the rock mass as chemicals present in the water interact with thermodynamically unstable minerals. This process is termed chemical weathering in which the minerals come to an equilibrium with the prevailing condition on the Earth’s surface. Apart from the alteration of the chemical composition of the rock, chemical weathering alters the physical property of the rock mass by lowering the strength of the material. The rate at which this alteration takes place is termed the chemical weathering rate. The most common process of chemical weathering is the conversion of the mineral state to form ion or dispersed colloidal molecular units. The simplest process of chemical alteration is the dissolution of the carbonate rock by water and atmospheric CO2: H2 OðaqÞ þ CO2 ðgÞ þ CaCO3 ðsÞ $ Caþþ ðaqÞ þ 2HCO 3 ðaqÞ þ Al2 Si2 O5 ðHOÞ4 ðsÞ ð2Þ CaAl2 Si2 O8 þ 2CO2 þ 3H2 OðaqÞ ! Ca2þþ ðaqÞ þ 2HCO 3 ðaqÞ þ Al2 Si2 O5 ðOHÞ4 ðsÞ ð3Þ Equations 2 and 3 show the carbonate and silicate weathering reactions, respectively. Atmospheric CO2 uptake during rock weathering is an important process that leads to global change in temperature. Previous research has shown a significant correlation between the silicate weathering rates and the CO2 consumption rates (Gaillardet et al. 1999; Hartmann et al. 2009). Therefore, the quantification of silicate weathering rates is essential to understanding the carbon sequestration processes on a global scale. Carbon dioxide consumption rate corresponds to the flux of cations from the silicate weathering: CCR ¼ Q=A:
ðNaþ þKþ þMgþ þCa2þ Þ
ð4Þ
where CCR is the carbon dioxide consumption rate, Q is the discharge in m3s1, and A is the surface area of the same watershed in km2. The silicate weathering rate and CCR are
@z @2z @2z ¼D þ @t @x2 @y2
ð5Þ
where z is the elevation, D is the diffusion coefficient, x and y are the spatial coordinates, and t is the time. However, the soil-mantled hillslopes are mostly dominated by creep, which is a slow transport process. A generalized equation for creep is ϵ_ ¼ Asn
ð6Þ
where ϵ_ is the strain rate, and s is the tensile stress. Further, A ¼ A0
D0 mb Q exp kT kT
ð7Þ
where A0 is a dimensionless, experimental constant, m is the shear modulus, b is the Burgers’ vector, k is the Boltzmann’s constant, T is the absolute temperature, D0 is the diffusion coefficient, and Q is the diffusion activation energy. Apart from the slow diffusive hillslope processes, landslides are a very common case in the geomorphic system where the rapid downslope movement of materials takes place due to a sudden breach in the threshold condition. In the case of a planar segment of the hillslope, the weight of the overburdened material acts vertically and tries to shear the material away from the hillslope. The driving force on a section with width L and height h is Fd ¼ rðhdydx: cos yÞg sin y
ð8Þ
Q
1138
Quantitative Geomorphology
where r is the bulk density of the material, g is the gravitational constant, and θ is the slope of the planar segment. The resisting forces are essentially frictional forces due to contact with the overburdened material on the surface. The Frictional forces depends on the characteristics of grains of the material as well as of the surface. The resisting force is Fr ¼ ½rðhdydx: cos yÞg cos y tan f
ð9Þ
where tanf is the friction coefficient. Therefore, the stress balance for a planar segment at the failure can be represented as a “factor of safety,” which is FS ¼
½rb gh cos y cos y rw gd tan f þ C rb gh sin y cos y
ð10Þ
where d is water table height above the failure plane, rw is the density of water, and C is the cohesion of the material that restricts the failure. This simplistic equation of factor of safety provides quantitative information regarding the propensity of a hillslope for failure.
Geomorphic Agents and Their Actions Fluvial Systems Rivers are the most efficient conduit of water, sediment, and nutrients that shape the Earth’s surface and sustain aquatic life (Fig. 2). A river generally occupies a valley, which may be a bedrock valley (bedrock rivers) with less amount of sediments on the channel or the river may flow over its alluvium (alluvial channels). A major shift in river science studies took place by Quantitative Geomorphology, Fig. 2 Fluvial landscape and associated processes and landforms. The equation numbers correspond to the different processes. (Modified after Tucker and Slingerland 1994)
incorporating understanding from various disciplines, including hydrology, hydraulic engineering, physics, mathematics, water and sediment chemistry, ecology, sedimentary geology, and different approaches of numerical modeling. These have provided many datasets to further understand the driving mechanisms that cause changes in the processes and patterns. Water flows under the gravitational force and exerts driving forces for river processes. Furthermore, sediment supply in the channel and bedrock or alluvial characteristics (channel roughness) governs resisting forces for the erosion process. Erosion in the fluvial system requires power exerted by the flowing water. This power is termed stream power, which is defined as the rate of energy conversion from potential to kinetic energy as water moves downstream (Bagnold 1966). It is expressed as O ¼ rgQs
ð11Þ
where Ω is the stream power, r is the density of water (1000 kg/m3), g is the acceleration due to gravity (9.8 m/s2), Q is discharge (m3/s), and s is the channel slope. However, the rate of energy expenditure on the river bed inherently depends on the channel width. Therefore, unit stream power is a more prominent measure of geomorphic work in a channel reach. Channel reach generally defines river characteristics for a length of around 8–10 times of channel width, which represents uniform characteristics of river channel. Unit stream power is expressed as o ¼ O=w
ð12Þ
Quantitative Geomorphology
1139
where w is the channel width. Empirical studies have shown that unit stream power is an effective metric to understand the sediment transport and erosion processes in a river channel. It also elucidates the significance of the channel width on channel processes (Yanites et al. 2010). The water column exerts shear stress on the channel bed. Bed shear stress is calculated as t ¼ rgds
ð13Þ
where d is the average depth of the channel. Furthermore, the shear stress and unit stream power are related as follows: o ¼ tV
rate is presented by Meyer-Peter and Müller (1948), which is as follows:
Sb / 8
3= 2
3= 2
t 0:047
(17)
where kr is the roughness coefficient, k0r is the roughness coefficient based on the grains, and t is the dimensionless mobility parameter. Suspended sediment load is primarily modeled by empirical power equations, e.g., sediment rating curves (Asselman 2000). However, the continuity equation is also utilized to estimate suspended sediment load and can be expressed as
ð14Þ
where V is the average velocity of the water. Hence, expressions of driving force vary with scale. Stream power, unit stream power, and shear stress represent the average value of driving force for a given channel length, cross section, or a given point in a cross section, respectively.
kr k 0r
@ ðAS Þ @ ðQS Þ þ awoðS S Þ ¼ 0 @t @x
ð18Þ
where Q is the flow discharge, A is the flow area, S is the sediment concentration, S is the averaged sediment-carrying capacity, α is the adjustment coefficient for a cross section, and w is the water surface width (Zhang et al. 2014).
Water Flow and Sediment Scale Processes
Transportation of water is a classical fluid mechanics problem that is the basis of deriving the equations of vertical velocity profiles. These approaches can also be compared with traditional empirical equations (e.g., Manning’s, Chezy, and Darcy– Weisbach equations). Navier–Stokes equation is used to solve the flow condition in open-channel flow. However, with a few assumptions (i.e., flow is steady, horizontally uniform, and flow is driven by gravity) the equation can be simplified as gx þ #
@2u ¼0 @z2
ð15Þ
where gx ¼ g sin θ, u is the horizontal velocity of water, z is the water depth, and # is the viscosity of water (Anderson and Anderson 2010). Apart from this methodology, mean velocity is usually estimated with an empirical formula where Manning’s equation is the most accepted. k 2 1=2 v ¼ Rh =3S n
ð16Þ
where v is the mean cross-sectional velocity, Rh is the hydraulic radius, S is the channel slope, n is the Manning’s roughness coefficient, and k is the unit factor (1 for SI units and 1.49 for English units). Sediment is an important element in the fluvial system that shapes landforms, helps in enhancing nutrients, and sometimes causes havoc to human society. Sediment is usually transported either as bedload or suspended load. The most prominent method to empirically estimate bedload transport
Reach Scale
River morphology is important for understanding the connection between a channel and its watershed, and it also defines habitat for riverine ecology. It is therefore important to understand and quantify the causative factors of river morphology. Reach-scale studies of river morphology primarily focus on the effect of physical factors on water and sediment transport in a river system. Unit stream power is the most effective measure of the driving force for understanding morphological processes (Eq. 12). A river reach can be classified into distinct categories; the most dominant patterns are braided, meandering, and anabranching. A braided river consists of a multiple-channel network, usually separated by large amounts of bedload deposited in the channel. A meandering channel, on the contrary, is a single-thread channel with multiple bends along its path. Anabranching channels are also multithread channels but are fundamentally stable. Van den Berg (1995) suggested an empirical equation to separate single-thread and multithread channels, based on a substantial dataset from natural channels:

\omega = 900 \, D_{50}^{0.42}   (19)
where ω is the unit stream power at the transition between single-thread and multithread channels, and D_{50} is the median grain size. The sinuosity values also suggest that a higher gradient with higher unit stream power results in a transition from a single-thread to a multithread channel (especially braided channels). Sediment characteristics govern the resisting force. Chang (1985) expressed grain size as
an important factor that leads to morphological changes in the natural river. The regression of S/√D versus Q was used to understand the morphological characteristics of alluvial rivers. The erosion process in an alluvial channel is usually modeled with Exner's equation:

\frac{\partial z}{\partial t} = -\frac{1}{\rho_b} \left( \frac{\partial Q}{\partial x} + \frac{\partial S}{\partial t} \right)   (20)
where Q is the sediment transport rate as bedload, S is the suspended sediment, and ρ_b is the porosity factor (Anderson and Anderson 2010). In the case of bedrock rivers, the erosion processes at the reach scale are modeled using an excess shear stress model:

E = K (\tau - \tau_c)^{a}   (21)
where τ is the bed shear stress, τ_c is the critical shear stress, and K and a are parameters. Several studies have shown that Eq. 21 is a nonlinear relation that depends on the local slope and water discharge (Tucker and Hancock 2010).

Basin Scale
Apart from reach-scale analysis, an understanding of topographic and network characteristics at the basin scale is important for quantifying topography. Morphometric analysis is perhaps the most established method to quantify, describe, and analyze landforms at different scales of study. For basin-scale studies, researchers have used morphometric analysis for several decades. These morphometric parameters are primarily linear or areal (Table 1).
The hypsometric curve and the hypsometric integral are also important for studying drainage basin characteristics. Hypsometric analysis describes the distribution of surface area with elevation. Hypsometric curves have been widely utilized to describe the flatness of a drainage basin surface. The hypsometric integral is the integral of the hypsometric curve and is estimated using the following expression:

\frac{V}{HA} = \int_0^1 x \, dy   (22)
where V is the total volume, HA is the entire volume of the reference area, x = a/A (a is the area within a band of elevation; A is the total area of the basin), and y = h/H (h is the relative elevation; H is the absolute elevation). This integral primarily describes the nature of the dissection of a landmass within a drainage basin (Strahler 1952). The river long profile is a fundamental feature for assessing the spatial variability of fluvial processes. The shape of the long profile reflects the evolutionary trajectory of a fluvial system (Fryirs and Brierley 2012), and modeling the long profile has therefore remained a fundamental question. Various functions, such as logarithmic, power, and exponential functions, have been used to express the shape of long profiles. Jain et al. (2006) observed that the shape of a natural river long profile can be better represented by the sum of two exponential functions:

Z = \alpha_1 e^{\beta_f L} + \alpha_2 e^{\beta_s L}   (23)

where Z is the elevation of the channel long profile at any point, L is the length from the channel head to that point along the long profile, and α and β are the constants and coefficients, respectively, of the exponential functions (Sonam and Jain 2018).
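The hypsometric integral of Eq. 22 is straightforward to estimate numerically from basin elevation data. The sketch below is a minimal illustration assuming the elevations of all DEM cells within the basin are available as a flat array; the function name and the number of bands are arbitrary choices.

```python
# Illustrative sketch (assumed inputs): estimating the hypsometric integral (Eq. 22)
# from a basin's elevation values, e.g., DEM cells clipped to the watershed.
import numpy as np

def hypsometric_integral(elevations, n_bands=50):
    """Approximate HI = integral of x dy, with x = a/A (relative area above a contour)
    and y = h/H (relative elevation)."""
    z = np.asarray(elevations, dtype=float)
    zmin, zmax = z.min(), z.max()
    y = np.linspace(0.0, 1.0, n_bands)                 # relative elevation levels
    contours = zmin + y * (zmax - zmin)
    x = np.array([(z >= c).mean() for c in contours])  # relative area above each level
    # trapezoidal integration of x over y
    return float(np.sum(0.5 * (x[1:] + x[:-1]) * np.diff(y)))

# Example with synthetic elevations: a uniformly distributed surface gives HI near 0.5
rng = np.random.default_rng(0)
dem_values = rng.uniform(200.0, 800.0, size=10_000)
print(round(hypsometric_integral(dem_values), 2))
```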
Quantitative Geomorphology, Table 1 A list of morphometric parameters used to assess drainage basin characteristics

Linear parameters:
- Stream order: method to assign a numeric order to links in a stream network
- Stream number (Nu): number of stream segments of order u
- Stream length (Lu) (km): length of streams of a particular order
- Stream length ratio (Lur): Lu/Lu+1
- Bifurcation ratio (Rb): Nu/Nu+1
- Basin length (Lb) (km): the linear distance between the source and the mouth of a river in a basin
- Basin perimeter (P) (km): length of the basin boundary

Areal parameters:
- Form factor ratio (Rf): A/Lb^2
- Elongation ratio (Re): 1.128 √A / Lb
- Circularity ratio (Rc): 4πA/P^2
- Drainage density (Dd): Ls/A
- Drainage texture (T): Nu/P
- Stream frequency (Fs): Nu/A
- Compactness coefficient (Cc): 2 √(A/π) / Lb

Ls is the total stream length, Lu is the length of a particular stream order, Nu is the number of streams of a particular stream order, A is the drainage basin area, Lb is the length of the drainage basin, and P is the drainage basin perimeter
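Several of the areal parameters in Table 1 reduce to simple ratios once A, P, Lb, Ls, and Nu are known. The following sketch computes a few of them for a hypothetical basin; all numerical values are invented for illustration.

```python
# Illustrative sketch (assumed values): a few areal morphometric parameters from Table 1.
import math

A  = 120.0   # drainage basin area (km^2)
P  = 65.0    # basin perimeter (km)
Lb = 18.0    # basin length (km)
Ls = 260.0   # total stream length (km)
Nu = 310     # total number of streams

form_factor      = A / Lb**2                     # Rf = A / Lb^2
elongation_ratio = 1.128 * math.sqrt(A) / Lb     # Re
circularity      = 4.0 * math.pi * A / P**2      # Rc
drainage_density = Ls / A                        # Dd (km per km^2)
stream_frequency = Nu / A                        # Fs (streams per km^2)

print(round(form_factor, 2), round(elongation_ratio, 2), round(circularity, 2),
      round(drainage_density, 2), round(stream_frequency, 2))
```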
The morphometric analysis has been effectively used to evaluate geological or tectonic controls in a drainage basin (Różycka et al. 2021). Differential geometry has been effectively utilized to understand tectonic and landslide processes (Jordan 2003). Recently, the application of fractals to morphometric data has also highlighted the role of lithology in landscape characteristics (Sahoo et al. 2020). Morphometric characteristics define the physical characteristics of a river basin and can also be utilized to understand its hydrologic response using the geomorphologic instantaneous unit hydrograph (GIUH). The GIUH is the probability density function for the time of arrival of a raindrop randomly placed at any point in the watershed (Rodríguez-Iturbe and Valdés 1979; Jain and Sinha 2003). The general equation of the GIUH for an Nth-order stream can be derived by applying a semi-Markov process to the drainage network and is written as

\frac{d\theta_{N+2}(t)}{dt} = \sum_{i=1}^{N} \theta_i(0) \, \frac{d\phi_{i(N+2)}(t)}{dt}   (24)
where θ_{N+2}(t) is the probability that the drop is found in state (N + 2) at time t, with N + 2 representing the final state at the basin outlet; θ_i(0) is the probability that the transport of the drop starts in state i; and φ_{i(N+2)}(t) is the interval transition probability from state i to state N + 2.
The GIUH is an efficient model for predicting direct runoff based on Horton's stream-order ratios and kinematic wave theory. Additionally, the model can also depict the role of geomorphic characteristics in flood hazard. The length ratio and the length of the highest-order channel have proved to be the most effective controls on the hydrologic response of a drainage basin in the Himalayan setting (Jain and Sinha 2006). Knowledge of the hydrologic response of a drainage basin subsequently helps to better model the response of an ungauged drainage basin.

Aeolian Systems

Aeolian processes in arid regions are primarily controlled by the action of wind, which is essentially a form of incoming solar energy expressed as a pressure gradient (Fig. 3). Apart from solar energy, wind is also generated by the Earth's rotation and surface irregularities. Erosion and sediment transport processes in arid regions are primarily quantified by wind velocity and shear stress. Sediment is first entrained by the shear stress of the wind overcoming the particle weight and cohesion. Bagnold (1941) first presented a simplified model for the critical wind shear for a particular grain size:

u_{ct} = A \sqrt{\frac{(\sigma - \rho)}{\rho} \, g \, d}   (25)
where u_{ct} is the critical shear velocity, A is a constant that depends on the grain Reynolds number, σ is the particle density, ρ is the wind density, g is the acceleration due to gravity, and d is the grain diameter.
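A minimal sketch of Eq. 25 is given below, assuming a typical value of A ≈ 0.1, quartz-density grains, and an air density of 1.2 kg/m³; these values are illustrative assumptions, not prescribed by the entry.

```python
# Illustrative sketch of Eq. 25 (Bagnold 1941): threshold shear velocity for sand
# entrainment by wind, with assumed values for the constant A and the densities.
import math

def threshold_shear_velocity(d, A=0.1, sigma=2650.0, rho=1.2, g=9.81):
    """u_ct = A * sqrt(((sigma - rho) / rho) * g * d), with d in metres."""
    return A * math.sqrt((sigma - rho) / rho * g * d)

for d_mm in (0.1, 0.25, 0.5, 1.0):
    u = threshold_shear_velocity(d_mm / 1000.0)
    print(f"d = {d_mm} mm -> u_ct ~ {u:.2f} m/s")
```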
Quantitative Geomorphology, Fig. 3 Aeolian landscape consisting of major landforms associated with processes. The equation numbers correspond to the different processes
After the selective entrainment, sand grains are transported either via saltation or suspension. A general model for the transportation of sediment is therefore a function of grain size, though a general model of sediment transport should incorporate a range of grain sizes and variable wind speeds. For a given wind shear velocity, larger grains are transported via saltating trajectories and finer grains follow suspension trajectories. Four main forces act on a sand particle: (1) the body force due to gravity, (2) the aerodynamic drag force, (3) the aerodynamic lift force, and (4) the Magnus lift due to the spin of the particle (Anderson and Hallet 1986). Empirical models are also used to predict wind-driven transportation processes. One such model is the Rodok model, a fully empirical model particularly used to model the movement of fine particles; it is applicable whenever the wind velocity exceeds the threshold of sediment entrainment (Leenders et al. 2011). The model is given as

F_e = a_1 e^{b_1 u}   (26)
where F_e is the flux of sediment, u is the shear velocity of the wind, and a_1 and b_1 are empirical constants. Further, physics-based equations have also been used to understand sediment flux in arid and semi-arid regions. A dynamic mass balance approach is applied to the sediment transport problem. The model predicts an increase in flux when the horizontal wind velocity exceeds the threshold for entrainment of sand grains. Experimental studies have shown that the dynamic mass balance approach provides a better means of quantifying sediment flux (Mayaud et al. 2017). Furthermore, this model captures the temporal variation of sediment transport mechanisms. The dynamic mass balance model is given as

\frac{dQ_{ip}}{dt} = H \, a_1 (u - u_t)^2 - \frac{Q_{ip}}{b_1 (d/D) + c_1 u_t u}   (27)
where Q_{ip} is the predicted mass flux for the given time interval, u is the mean horizontal wind velocity over that interval, u_t is the horizontal wind velocity threshold, d is the grain diameter (mm), D is a reference grain diameter, a_1 is a dimensional constant, and b_1 and c_1 are dimensionless constants. The integrated wind erosion modeling system (IWEMS), on the other hand, provides quantitative prediction of wind erosion from local to global scales (Lu and Shao 2001). This model incorporates physics-based as well as empirical equations covering processes from particle entrainment to deposition. Researchers have observed that
these integrated models provide a better means of simulating the effect of dust storms over large areas (Lu and Shao 2001). Grains that are entrained and transported are eventually deposited when the gravitational force acting on them overcomes the shear stress. However, the rate of settling of particles in a fluid (water or air) also depends on the density and viscosity of the transporting agent, following Stokes' law. Furthermore, vegetation also plays a major role in sediment deposition by disturbing the wind velocity profile and acting as a roughness element on the surface.

Glacial Systems

Glaciers are accumulated bodies of ice that move downslope under gravity due to the weight of the ice and the slope of the surface (Fig. 4). Glacial systems have been used as markers of climate change because the change in the mass of ice in a glacier reflects temperature change (Vuille et al. 2008). Depending on the thermal regime, glaciers can be categorized into two types: (1) cold-based glaciers and (2) warm-based glaciers. Cold-based glaciers have a basal zone entirely below the pressure melting point and are therefore also termed "dry-based glaciers"; the temperature at the base is subzero. Warm-based, or temperate, glaciers are characterized by basal temperatures at or above the melting point. Warm-based glaciers slide over a thin sheet of water and also deform internally under gravity, whereas cold-based glaciers move through internal deformation only. Basal sliding velocity is modeled as a function of basal shear stress τ_b (the driving force) and effective normal stress N (the resisting force):

U_b = k \frac{\tau_b^{m}}{N^{p}}   (28)
where m and p are positive constants, k is a sliding parameter, and N is the effective normal stress, i.e., the difference between the ice overburden pressure and the basal water pressure (Raymond and Harrison 1987). Empirical data also indicate that erosion rates are positively correlated with basal sliding velocity across the globe (Cook et al. 2020). The glacial erosion rate is generally assumed to be proportional to the sliding velocity of the ice in the glacial valley. The bedrock surface erosion rate E is estimated as

E = e_c \, T \, U_b^{\,e_v}   (29)
where e_c is an erosion rate constant, T is the time-step size, U_b is the sliding velocity, and e_v is an erosion exponent (Harbor 1992).
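The sketch below strings Eqs. 28 and 29 together for a single time step, with k, m, p, e_c, and e_v set to arbitrary illustrative values; it is meant only to show how the sliding velocity feeds the erosion-rate estimate, not as a calibrated glacier model.

```python
# Illustrative sketch of Eqs. 28-29 with assumed parameter values: basal sliding
# velocity from basal shear stress and effective normal stress, and the resulting
# bedrock erosion increment over one time step.
def sliding_velocity(tau_b, N, k=1e-10, m=3.0, p=1.0):
    """U_b = k * tau_b^m / N^p (units depend on the chosen k)."""
    return k * tau_b**m / N**p

def erosion_increment(U_b, e_c=1e-4, T=1.0, e_v=1.0):
    """E = e_c * T * U_b^e_v (erosion accumulated over a time step of length T)."""
    return e_c * T * U_b**e_v

tau_b = 1.0e5   # basal shear stress (Pa), about 1 bar
N     = 5.0e5   # effective normal stress (Pa)
U_b = sliding_velocity(tau_b, N)
print(U_b, erosion_increment(U_b))
```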
Quantitative Geomorphology, Fig. 4 Glacial landscape and processes. The equation numbers correspond to the different processes
The flow of ice behaves as a non-Newtonian fluid in which strain rate and shear stress are nonlinearly related:

\frac{du}{dz} = a \left( \frac{\tau}{\tau_0} \right)^{n}   (30)
where u is the fluid velocity, z is the distance along the profile, τ is the shear stress, τ_0 is a reference or yield stress, and a and n are rheological parameters. The values of a and n may be constant in some cases but in practice depend on the temperature or density of the flow (Pelletier 2008). Sediment transport in the glacial system happens via (1) debris-rich ice transport or (2) transport by meltwater. However, most sediment transport occurs by meltwater and is characterized as glacio-fluvial sediment transport (Delaney et al. 2018). That study utilized an empirical power-law function between bedload and meltwater discharge. Additionally, a substantial mass of sediment is also transported supraglacially or within the ice mass; these piles of sediment are only visible below the equilibrium line. One of the primary sources of sediment in glacial streams is the debris transported by the ice and by subglacial rivers. Deposits in glaciated regions consist of poorly sorted grains, primarily called "tills." A distinctive depositional feature in the glacial valley is the esker, a landform built where subglacial tunnels swiftly transport water and coarse sediment. The equation for the potential field, i.e., the total head, is described as follows (Anderson and Anderson 2010):

\phi = \rho_w g z + \rho_i g \left( H - (z - z_b) \right)   (31)
where z is elevation, zb is the elevation of the bed, and H is the thickness of the ice. Water is transported along a line perpendicular to the line of equipotential:
\frac{d\phi}{dx} = \rho_w g \frac{dz}{dx} + \rho_i g \frac{d\left( H - (z - z_b) \right)}{dx} = 0   (32)
The expression for the potential gradient along the interface of ice and rock is

\left( \frac{d\phi}{dx} \right)_{bed} = (\rho_w - \rho_i) \, g \, \frac{dz_b}{dx} + \rho_i \, g \, \frac{dz_s}{dx}   (33)
where z_s is the elevation of the ice surface. The bed slope can be written as a function of the slope of the ice surface as

\frac{dz_b}{dx} < -11 \, \frac{dz_s}{dx}   (34)
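Equations 33 and 34 can be checked numerically: the sketch below evaluates the bed potential gradient for a few adverse bed slopes and reports whether subglacial water would continue flowing down-glacier, using standard densities for water and ice; the slope values are invented for illustration.

```python
# Illustrative sketch of Eqs. 33-34 (assumed slopes): hydraulic potential gradient at
# the glacier bed and the direction of subglacial water flow.
rho_w, rho_i, g = 1000.0, 917.0, 9.81

def bed_potential_gradient(dzb_dx, dzs_dx):
    """d(phi)/dx at the bed = (rho_w - rho_i) g dzb/dx + rho_i g dzs/dx (Eq. 33)."""
    return (rho_w - rho_i) * g * dzb_dx + rho_i * g * dzs_dx

# Water flows toward decreasing potential; an adverse (uphill) bed slope can be
# overcome as long as dzb/dx < -11 * dzs/dx (Eq. 34).
dzs_dx = -0.01                       # ice surface sloping down-glacier
for dzb_dx in (0.05, 0.10, 0.12):    # increasingly steep adverse bed slopes
    grad = bed_potential_gradient(dzb_dx, dzs_dx)
    print(dzb_dx, "water continues down-glacier" if grad < 0 else "water ponds or reverses")
```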
Overall, the slope of the glacier, temperature dynamics, and glacio-fluvial interactions govern the depositional characteristics.

Coastal Processes

Coastal regions are among the most dynamic landscapes. Coastal landscapes are prominently shaped by oceanic surface waves (driven by wind energy) and tides (driven by the gravitational attraction of the Moon and the Sun). Changes in climatic conditions leave their imprint on coastal geomorphology through changes in the geomorphic processes driven by these agents. Contemporary as well as ancient depositional features record periods of sea-level change, a marker of climate change. Wave energy is the most important geomorphic agent in the coastal system. Surface waves originate when wind energy is transferred to the ocean surface; strong windstorms are therefore a causative factor of higher-amplitude waves.
The wave energy transported toward the coastal region in the form of oscillatory waves leads to substantial modification of the physical features of the beach area. The energy per unit length of wave, or the energy density, may be written as

E = \frac{\rho g H^2}{8}   (35)
where H is the wave height, ρ is the density of ocean water, and g is the acceleration due to gravity. The mean potential energy is

PE = \frac{\rho g h^2}{2} + \frac{\rho g H^2}{16}   (36)
Here, the first term is the mean potential energy associated with the water column height, and the second term represents the potential energy associated with the wave (Holthuijsen 2010). The flux of energy, or the power of the wave, is the product of the energy density and the group wave speed:

\omega = \frac{\rho g^{3/2}}{8} H^2 h^{1/2}   (37)
Wave energy is further transformed into currents, which aid in transporting and depositing sediments in the beach area. Coastal barriers are generally formed by shore-parallel transport of sand and gravel. The most widely used empirical equation for longshore drift is (Komar 1998)

Q_{long} = 1.1 \, \rho g^{3/2} H_{br}^{5/2} \sin\alpha_{br} \cos\alpha_{br}   (38)
where H_{br} is the breaker height, g is the acceleration due to gravity, ρ is the density of water, Q_{long} is the sediment transport rate, and α_{br} is the angle between the breaker line and the local shoreline. Although this mathematical model provides a proper justification of the physics of grain movement, real-time estimation of sediment transport is extremely difficult; it is usually achieved by measuring the amount of loose sediment trapped behind man-made structures. Deltas are another important landform, forming where a river enters the sea, and are particularly important because of their diverse flora and fauna. The formation of a delta can be examined using the continuity equation for sediment (Eq. 20). Although this simplified formulation provides meaningful results, the shape and sedimentology of a delta depend largely on the water density contrast, the shape of the continental shelf, and the amount of sediment in the channel. Therefore, the modeling of deltas remains an important problem to date (Takagi et al. 2019).
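As an illustration of Eqs. 35 and 38, the sketch below computes the wave energy density and a Komar-type longshore transport rate for an assumed breaker height and approach angle; the units of the transport rate follow from the inputs, and all values are illustrative.

```python
# Illustrative sketch of Eqs. 35 and 38 with assumed values: wave energy density and
# a Komar-type longshore sediment transport rate.
import math

rho, g = 1025.0, 9.81          # seawater density (kg/m^3), gravity (m/s^2)

def wave_energy_density(H):
    """E = rho * g * H^2 / 8 (energy per unit sea-surface area), Eq. 35."""
    return rho * g * H**2 / 8.0

def longshore_transport(H_br, alpha_br_deg):
    """Q_long = 1.1 * rho * g^(3/2) * H_br^(5/2) * sin(a) * cos(a), Eq. 38."""
    a = math.radians(alpha_br_deg)
    return 1.1 * rho * g**1.5 * H_br**2.5 * math.sin(a) * math.cos(a)

print(round(wave_energy_density(2.0)))          # ~5030 for a 2 m wave
print(f"{longshore_transport(1.5, 10.0):.3g}")  # transport rate for a 1.5 m breaker at 10 degrees
```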
Landscape Patterns and Process Interactions

Mathematical Morphology in Quantitative Geomorphology

Terrestrial surfaces of Earth and Earth-like planets exhibit variations across spatiotemporal scales. Understanding the organizational complexity of such terrestrial surfaces and associated features across spatial and temporal scales is of central interest to quantitative geomorphologists. Quantitative geomorphology, which gained wide attention from classical geomorphologists, was popularized through Horton's and Strahler's contributions to morphometric and hypsometric analyses, respectively, carried out on two fundamental topological quantities: river networks and elevation contours. The main source of these two quantities was topographic maps. However, the data, in particular DEMs, available at multiple spatial scales, offer numerous advantages over topographic maps but also pose challenges to quantitative geomorphologists. DEM data have rich geometric, morphological, and topological (GMT) relevance immensely useful in quantitative geomorphology. To develop a cogent spatiotemporal model, well-analyzed and well-reasoned information retrieved from spatiotemporal data are important ingredients (Sagar and Serra 2010). Such models are essential for understanding the dynamical behavior of terrestrial surficial processes in a firm quantitative manner. Three ways of understanding the complexity involved in the spatiotemporal behavior of terrestrial surfaces and associated phenomena include considering (1) topographic depressions and their relationships with the rest of the surface, (2) unique topological networks, and (3) terrestrial surfaces. These three features of geomorphologic relevance, represented in mathematical terms as functions, sets, and skeletons, are respectively surfaces, planes, and networks. In the context of geomorphology, examples of such functions, sets, and networks include DEMs, water bodies, and river networks (Sagar 2020). Mathematical morphology (Serra 1982; Matheron 1975) offers an approach to deal with all these intertwined topics. With the advent of powerful computers with high-resolution graphics facilities and the availability of DEMs, a new set of mathematical approaches has offered quantitative geomorphologists fresh insights. One such mathematical approach is mathematical morphology (Serra 1982), originally popular in shape and image analysis studies. Mathematical morphology offers numerous operators and transformations – named after terms stemming from the literature of the Earth Sciences (Beucher 1999) – to deal with retrieval of information from DEMs, quantitative characterization of DEMs, quantitative reasoning about the information retrieved from DEMs, modeling and simulation of various surficial processes that involve DEMs, and spatiotemporal visualization of various surficial phenomena and processes.
Initial impetus for quantitative geomorphology through the applications of mathematical morphology was given by Sagar (1996, 2013). For the retrieval of GMT quantities from the DEMs, quantitate analysis of DEMs and GMT quantities thus retrieved from the DEMs, quantitative spatiotemporal reasoning of those retrieved quantities, modeling and simulation, as well as visualization of spatiotemporal behavior of geomorphologically relevant phenomena and processes, mathematical morphology offers numerous operators, transformations, algorithms, and frameworks. Some of the transformations, to name a few, include • Skeletonization to extract topological quantities such as the ridge, the valley connectivity networks (Sagar et al. 2000, 2003), mountain objects (Dinesh et al. 2007), and the hierarchical watersheds (Chockalingam and Sagar 2003) from the DEMs that would be eventually used in the terrestrial surface characterization. • Morphological distances in classification and clustering of geomorphologic units such as basins, sub-basins, and watersheds represented in planar form (Vardhan et al. 2013) and terrestrial surficial data represented in DEM form (Sagar and Lim 2015). • Multiscale morphological operations, skeletonization by zones of influence (SKIZ), morphological pruning, thinning, thickening, and Hit-Or-Miss Transformation were employed in deriving a host of allometric power-law relationships in channel networks, water bodies, watersheds, and several other geomorphologic phenomena (Tay et al. 2005a, b, c; Sagar and Tien 2004; Sagar and Chockalingam 2004; Nagajothi et al. 2021). The characterization via a set of derived power-law relationships highlighted the evidence of self-organization via scaling laws – in networks, hierarchically decomposed subwatersheds, and water bodies and their zones of influence, which evidently belong to different universality classes – which possess excellent agreement with geomorphologic laws such as Horton’s laws, Hurst exponents, Hack’s exponent, and other power-laws given in non-geoscientific context (Sagar 1996, 1999, 2000a; Sagar et al. 1998a, b, 1999). This aspect is further extended based on intuitive arguments that these universal scaling laws possess limited utility in exploring possibilities to relate them with geomorphologic processes. These arguments formed the basis to provide alternative methods that yield scaleinvariant but shape-dependent power laws. • Morphological shape decompositions in the characterization of hillslope region by deriving scale-invariant but shape-dependent measures (Sagar and Chockalingam 2004; Chockalingam and Sagar 2005) to quantitatively characterize the spatiotemporal terrestrial complexity that explains the commonly sharing physical mechanisms involved in terrestrial phenomena and processes.
• Granulometries in surficial roughness characterization of the basins and watersheds partitioned from the DEMs (Tay et al. 2005c, 2007; Nagajothi and Sagar 2019), • Multiscale openings and closings in the modeling of the geomorphologic processes, in discrete space, under the perturbations caused due to cascade forces (flood–drought, expansion–contraction, uplift–erosion, protruding–flattening, and shortening–amplification) in a nonlinear fashion mimicking the realistic situations (Sagar et al. 1998b). Morphological modeling of the geomorphologic phenomena and processes provides unique contributions to simulations of geomorphologic networks (Sagar et al. 1998a, 2001), terrestrial surfaces (Sagar and Murthy 2000), water bodies and sand dunes (Sagar 2000b, 2001) laws of geomorphic structures under the perturbations created through an interplay between numerical simulations and graphic analysis (Sagar et al. 1998b), and understanding spatial and/or temporal behaviors of certain evolving and dynamic geomorphic phenomena (Sagar 2005). • Morphological interpolations, which provide an effective way to transform sparse data and/or information into dense data and/or information (Sagar 2010; Sagar and Lim 2015; Challa et al. 2018), in spatiotemporal visualizations of (geomorphologic) phenomena. While developing spatiotemporal geomorphologic models that provide a better understanding of the behavioral patterns of the geo(morpho)logic phenomena and processes, the availability of the relevant data at the time intervals as close as possible is important. Mathematical morphology – A robust mathematical theory offers a plethora of operations and transformations that provide the opportunity to transform the way quantitative geomorphologists examine terrestrial surfaces. Applications of numerous mathematical morphological transformations, which are new to the geomorphologic community, provide insights for quantitative geomorphologists. For the illustrations appropriate for section “Mathematical Morphology in Quantitative Geomorphology”, the reader may refer to the relevant cross-referenced chapters. Landscape Evolution Models (LEMs) The geomorphic studies explain the origin and dynamics of the topography over the Earth’s surface. Various physical and chemical processes, along with actions of different geomorphic agents at long time scales, lead to variability across the landscape. Tools in mathematical morphology are usually utilized to quantitatively characterize a landscape and its trajectory with time (Fig. 2). Further, to explain the timedependent evolution of the topography, there is a need to model the evolution processes mathematically. LEMs integrate the dynamics of the different landforms to develop the evolutionary trajectory of landscape with the help of various
equations corresponding to different processes at the landform scale. LEMs consider spatial locations as grids or TINs, and the output from the equations flows across the grid to develop a feedback mechanism in a landscape. A LEM not only explains the temporal variation of a landscape but, more generally, represents a process–response system arising from the interaction of landforms and formative processes. A LEM aims to describe the changes in the aggregate of landform shape, size, and relief from the smallest to the largest spatial scale. Integration of scales is a major challenge in the geomorphic system. The mathematical expressions for a landscape and its constituent landforms help to integrate the processes at different scales in a given landscape and provide an evolutionary trajectory of the landscape or explain major landscape patterns. The landform-scale equations are mostly derived from the functional approach, while their integration leads to the evolutionary trajectory of the landscape. Hence, LEMs not only integrate processes at different scales but also help to integrate the two approaches of geomorphic studies, i.e., the functional and evolutionary approaches. The governing equations for landscape evolution are too complex to obtain coupled closed-form solutions, and therefore numerical solution methods are necessary. LEMs generally require the following components: (1) continuity of mass, (2) geomorphic transport equations for the movement of water and sediment, and (3) equations for erosion, transport, and deposition by geomorphic agents (Tucker and Hancock 2010). Continuity of mass is expressed as

\frac{\partial (\rho_s \eta)}{\partial t} = \rho_{sc} B - \nabla \cdot (\rho_s \mathbf{q}_s)   (39)

where η is the height of the land surface relative to a datum, t is time, q_s is a vector depicting the transport rate, B is a source term (e.g., uplift or subsidence), ∇· is the divergence operator, ρ_s is the average density of rock/sediment, and ρ_{sc} is the density of the source materials. Geomorphic transport functions are the main component of any numerical LEM. The theory of landscape evolution therefore requires complete knowledge of weathering, biological activities, and the action of other geomorphic agents. Anderson (2002) presented a simple approach:

\frac{dz_b}{dt} = \min\left( P_{s0} + bH, \; P_{s1} \exp(-H/H^{*}) \right)   (40)

where dz_b/dt is the rate of weathering, "min" means "take the minimum of," P_{s0} is the production rate of bare bedrock, b scales the increase in production rate with increasing regolith thickness, P_{s1} defines the intercept of the exponential decay, and H* scales the exponential decline of the weathering rate for large regolith thicknesses. This equation shows that if the thickness of regolith decreases below a critical thickness, the rate of production also decreases. Although this exponential decay rule successfully predicts weathering processes, there is a further need to include the physiochemical processes of weathering to establish mathematical models of landscape evolution. The movement of water and sediment is another problem in landscape evolution modeling. In low-slope regions, soil creep is a primary process that transports sediment downslope. A simple linear or nonlinear creep equation (Eq. 6) is utilized to model this transport. Although such an equation can explain the transport of material, it does not explain the fast transport on higher slopes, which requires a nonlinear transport function of the form

\mathbf{q}_s = \frac{K_D \nabla z}{1 - \left( |\nabla z| / S_c \right)^{a}}   (41)

where S_c is the threshold gradient that represents the point of failure (Howard 1994; Roering et al. 1999). Rivers are another important feature of a landscape at larger spatial scales. The erosion driven by the water in a river depends on its ability to transport large material and erode its bed. The detachment-limited model describes river processes in which the eroded sediment is efficiently transported downstream. The 1D detachment-limited erosion model is expressed as

\frac{\partial h}{\partial t} = U - K A^{m} \left( \frac{\partial h}{\partial x} \right)^{n}   (42)

where h is elevation and U is the uplift rate. The term K A^{m} (\partial h/\partial x)^{n} represents channel incision as a function of the driving force (Eqs. 11–13), where the basin area A is used as a proxy for discharge and K is the erodibility parameter. The rate of erosion per unit time E depends on the excess bed shear stress (Eq. 21), and bed shear stress is also responsible for sediment transport processes (Eq. 17). The overall 1D model for landscape evolution, combining the detachment-limited stream power model with a linear hillslope diffusion model, can be expressed as

\frac{\partial h}{\partial t} = U - K A^{m} \left( \frac{\partial h}{\partial x} \right)^{n} + D \frac{\partial^2 h}{\partial x^2}   (43)

where D is the diffusion coefficient.
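A minimal, explicit finite-difference sketch of Eq. 43 (with n = 1) is given below. It is not Landlab; the parameter values, the fixed-baselevel boundary, and the crude drainage-area proxy are assumptions made purely for illustration.

```python
# Minimal sketch (not Landlab): explicit finite-difference update of Eq. 43 with n = 1,
# dh/dt = U - K A^m |dh/dx| + D d2h/dx2, on a 1D profile from the outlet to the divide.
# Parameter values and the simple area proxy below are assumptions for illustration only.
import numpy as np

nx, dx = 200, 100.0                   # number of nodes and node spacing (m)
dt, nsteps = 50.0, 2000               # explicit time step (yr) and number of steps
U, K, m, D = 1e-3, 1e-5, 0.5, 0.01    # uplift rate, erodibility, area exponent, diffusivity

x = np.arange(nx) * dx                # distance from the outlet (node 0) to the divide
A = (x[-1] - x) + dx                  # crude drainage-area proxy, largest at the outlet
h = np.zeros(nx)                      # initial flat surface; outlet held at baselevel

for _ in range(nsteps):
    S = np.zeros(nx)
    S[1:] = (h[1:] - h[:-1]) / dx                     # downstream slope (upwind difference)
    curv = np.zeros(nx)
    curv[1:-1] = (h[2:] - 2.0 * h[1:-1] + h[:-2]) / dx**2
    dhdt = U - K * A**m * np.abs(S) + D * curv        # Eq. 43 with n = 1
    h += dt * dhdt
    h[0] = 0.0                                        # fixed baselevel at the outlet

print(f"relief after {nsteps * dt:.0f} yr: {h.max():.1f} m")
```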
Isostatic rebound is another important component of a LEM. Isostatic rebound can be associated with surface erosion, whereby the topography is relaxed due to the removal of the surface load:

D \left( \frac{\partial^4 w}{\partial x^4} + 2 \frac{\partial^4 w}{\partial x^2 \partial y^2} + \frac{\partial^4 w}{\partial y^4} \right) = \Delta\rho \, g \, w + \rho_s \, g \, \Delta h   (44)
where w is the surface displacement due to isostatic adjustment owing to an increment of erosion Δh, ρ_s is the density of rocks at the surface, Δρ is the density contrast between the surface and the asthenosphere, and D is the flexural rigidity. Studies have shown that, even without a tectonic component, a landscape can be elevated by several hundred meters through isostatic adjustment due to denudational unloading (Gilchrist and Summerfield 1990). These governing equations are usually solved with different numerical methods. Because they are coupled equations, they can be solved simply with an explicit finite-difference scheme; however, research has shown that implicit methods are also very efficient and do not require repeated checks on the stability of the solution (Braun and Willett 2013; Fagherazzi et al. 2002). Currently, Landlab is one of the most widely used Python-based numerical LEMs and solves various problems of surface, subsurface, and biological processes (Barnhart et al. 2020). The Landlab numerical model contains separate process components that can be integrated for problems at high spatial and temporal resolution. Large spatiotemporal-scale models aided by the availability of big data are the current trend in this field of research. Although highly complex models are necessary for obtaining significant results, they are not sufficient to model highly complex natural processes; therefore, all models carry uncertainty. The important step in overcoming this issue is to focus on building a model that is as close to reality as possible. However, an increase in complexity does not necessarily provide better or desired results; rather, it can attenuate important model outcomes.

Geomorphic Connectivity

In recent times, connectivity has been recognized as an emergent property of geomorphic systems (Wohl 2017). Any geomorphic system is composed of different landforms, and there exists a hierarchical link between the operative geomorphic processes (Brierley et al. 2006). The term connectivity can be defined as the efficiency of water, sediment, and nutrient transport from one component to another. Therefore, the study of geomorphic connectivity provides an opportunity to understand the interrelationships and interdependencies of the components (landforms) of a geomorphic system (landscape). Wohl et al. (2019) describe three types of geomorphic connectivity: (1) sediment connectivity, (2) landscape connectivity, and (3) hydrologic connectivity. The nature of connectivity can be structure-based (structural connectivity) or process-based (functional connectivity) (Jain and Tandon 2010; Wainwright et al. 2011).
Quantification and prediction of connectivity have remained major research questions for the last couple of decades. A major focus has been on quantifying the functional connectivity of sediment through the different elements of a landscape (e.g., from hillslopes to rivers). The connectivity index (IC), based on an empirical approach, is a common tool to quantitatively express the nature of connectivity in a landscape. A commonly used IC for a river basin is expressed as (after Borselli et al. 2008)

IC = \log_{10} \left( \frac{D_{up}}{D_{dn}} \right)   (45)
where D_{up} and D_{dn} are the upslope and downslope components of connectivity, respectively. The upslope component of sediment connectivity is the potential for downslope movement of the available sediment and is estimated as

D_{up} = \bar{W} \bar{S} \sqrt{A}   (46)
where \bar{W} is the average weighting factor of the contributing area in the upslope, \bar{S} is the average slope of the upslope area, and A is the contributing area. D_{dn} is the component that considers the flow path length for a particle to reach the immediate sink and is estimated as

D_{dn} = \sum_i \frac{d_i}{W_i S_i}   (47)
where d_i is the length of the flow path along the i-th cell following the steepest downslope direction, and W_i and S_i are the weighting factor and gradient of the i-th cell, respectively. Cavalli et al. (2013) suggested further modifications to the slope factor, the upstream contributing area, and the weighting factors, and showed that these simple methods can help in understanding sediment dynamics in very complex morphodynamic systems. They also suggested that sediment delivery pathways, the amount of vegetation, and channel bed morphology are other important factors for understanding connectivity. Recently, the application of graph theory to quantify connectivity has yielded new insights into the sediment mobilization process. Heckmann and Schwanghart (2013) pioneered an approach that describes the geomorphic system as a network consisting of nodes (sediment sources) and edges (links between geomorphic processes). A graph with n nodes can be represented as an n × n adjacency matrix A. Depending on the topological and spatial characteristics, fluxes are associated with the edges that transport sediment from upstream to downstream. The betweenness centrality index B_i measures the extent to which a node i lies between other nodes:
B_i = \sum_{j \ne k} \frac{n_{ijk}}{n_{jk}}   (48)
where n_{ijk} is the number of edges or paths between node j and node k that pass through node i, and n_{jk} is the total number of edges or paths between node j and node k. The Shimbel index (Shi) considers the distances between nodes and indicates whether the location of a node changes the total possible paths within the network:

Shi_i = \sum_j d_{ij}   (49)
If the Shimbel index is high, the node contributes to creating long paths within the network; if the Shimbel index is low, the compactness of the network is maximized by the node (Cossart and Fressard 2017). For the local scale, a network structural connectivity index has been developed based on the potential fluxes and the accessibility of flux transport (Cossart and Fressard 2017):

NSC_i = \frac{F_i}{Shi_i}, \quad \text{with} \quad F_i = \sum \frac{F_{ijo}}{F_{jo}}   (50)

Quantitative Geomorphology, Fig. 5 Schematic diagram showing major components of a critical zone observatory. (Modified after Riebe et al. 2017)
where F_{ijo} and F_{jo} are calculated from the reconstruction of sediment pathways throughout the cascade. These methodologies have successfully described the scale-dependent sediment mobilization and delivery behavior in a catchment. Additionally, because graph theory also considers the structural connectivity between the nodes (sediment sources), hotspots of geomorphic work can be identified.

Critical Zones

The topmost layer of the Earth's surface, from the bedrock to the lower atmosphere, is termed the "critical zone" (Fig. 5). The term was introduced by the National Science Foundation (NRC 2001) and defined as ". . . the heterogeneous, near-surface environment in which complex interactions involving rock, soil, water, air and living organisms regulate the natural habitat and determine availability of life-sustaining resources" (NRC 2001). This zone therefore connects the atmosphere, lithosphere, hydrosphere, and biosphere through various physiochemical processes. The experimental setups used to study this thin layer are known worldwide as critical zone observatories (CZOs). Recently, several CZOs have been set up around the world to understand the
coupled processes and their outcomes in the natural world (Goddéris and Brantley 2013). Mathematical models of the critical zone are needed to quantify the rates of weathering and soil production in order to address soil sustainability. Therefore, integrated models have been established to link soil formation, nutrient transport, and the transport of water (Giannakis et al. 2017). The water flow is expressed by the Richards equation:

\frac{\partial \theta}{\partial t} = \frac{\partial}{\partial z} \left( k \frac{\partial h}{\partial z} \right) + S   (51)
where θ is the volumetric water content, t is time, z is the vertical coordinate, k is the unsaturated hydraulic conductivity, h is the pressure head, and S is the source of water. A knowledge-based reactive transport model is generally used to understand the time-varying water saturation (Aguilera et al. 2005). A 1D continuum representation of coupled mass transport and chemical reactions can be expressed mathematically as

\xi \frac{\partial C_j}{\partial t} = \left[ \frac{\partial}{\partial x} \left( D \, \xi \, \frac{\partial C_j}{\partial x} \right) - \frac{\partial (\nu \, \xi \, C_j)}{\partial x} \right] + \sum_{k=1}^{N_r} \lambda_{k,j} \, s_k, \qquad j = 1, \ldots, m   (52)

where t is time and x is the position along the 1D spatial domain. The two terms in the square bracket are part of the transport operator (T), whereas the last term is the sum of all transformation processes. j denotes a particular species and C_j is the concentration of that species; ξ, ν, and D are variables that depend on the environmental system; s_k represents the rate of the k-th kinetic reaction; and λ_{k,j} is the stoichiometric coefficient of species j. Although the critical zone is the fragile skin of the Earth, groundwater is an important factor that sustains its flora and fauna. If the groundwater table rises, it causes an increase in soil moisture and evapotranspiration and further affects the regional climate in the long term. Climate models without surface water–groundwater interactions have been used to assess the interactions between climate, soil, and vegetation dynamics; the results show that vegetation properties can alter groundwater table dynamics (Leung et al. 2011). Numerical models are also utilized to understand surface water–groundwater interaction. MODFLOW's streamflow equations are utilized to assess the connectivity between surface water and groundwater (Brunner et al. 2010). The exchange of volumetric flux between surface water and groundwater is estimated by
Q_{SG} = \frac{K_c L w}{h_c} (h_{riv} - h) = c_{riv} (h_{riv} - h)   (53)
where Kc is the hydraulic conductivity of the clogging layer, L is the length of the river within a cell, w is the width of the river, hc [L] is the thickness of the clogging layer, hriv is the hydraulic head of the river, h is the groundwater head, and criv is the conductance of the clogging layer (McDonald and Harbaugh 1988). Critical zone science has brought new opportunities and challenges for quantitative geomorphologists. Understanding the feedback mechanism between different processes using large datasets requires mathematical models. Handling multiple datasets from various sources and assimilation in a single generic modeling framework is complex. It is expected that the assimilation of large datasets with mathematical modeling will yield new insight into process interactions.
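The exchange flux of Eq. 53 is a simple conductance relation; the sketch below evaluates it for an assumed river segment, with the sign convention that a positive flux denotes leakage from the river to the aquifer. All geometry and conductivity values are invented for illustration.

```python
# Illustrative sketch of Eq. 53 with assumed values: volumetric exchange flux between
# a river and the underlying aquifer through a clogging layer.
def river_aquifer_exchange(K_c, L, w, h_c, h_riv, h):
    """Q_SG = (K_c * L * w / h_c) * (h_riv - h); here positive means the river loses water."""
    c_riv = K_c * L * w / h_c          # conductance of the clogging layer
    return c_riv * (h_riv - h)

# A 200 m river segment, 15 m wide, 0.5 m thick clogging layer with K_c = 0.1 m/day
Q = river_aquifer_exchange(K_c=0.1, L=200.0, w=15.0, h_c=0.5, h_riv=102.3, h=101.8)
print(f"exchange flux = {Q:.0f} m^3/day")
```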
Conclusions

Individual geomorphic agent-based processes responsible for landscape evolution have been studied for many decades, beginning in the early twentieth century. Although these studies developed a process-based understanding, current research questions are advancing toward integrating processes at different scales across different landforms or components. Quantitative geomorphology deals with this integration of different physiochemical processes to elucidate landscape dynamics in a more holistic way. With the advent of large field datasets for different processes, comparison of model output with real-world scenarios is now easier to achieve. Furthermore, an interdisciplinary approach has brought several physics-based and mathematical tools to the study of the geomorphic system. The use of coupled differential equations (ODEs and PDEs), statistical techniques, and many other mathematical tools has helped to better explain geomorphologic processes at different spatial and temporal scales. Furthermore, the use of long-term LEMs helps to assess the relative contributions of different geomorphic, tectonic, and climatic agents. Such quantitative approaches also help to integrate the functional and evolutionary branches of geomorphology. Quantitative geomorphology offers several approaches to derive indices of relevance to the geometry, topology, and morphology of terrestrial surfaces. Such indices would be of immense use in developing parameter-specific models through which one can make predictions of the dynamical behavior of terrestrial surfaces. What is more interesting is to develop approaches to construct "geomorphologic attractors" that eventually provide short- and long-term behavioral predictions.
Cross-References

▶ Digital Elevation Model
▶ Earth Surface Processes
▶ Mathematical Morphology
▶ Morphometry
▶ Scaling and Scale Invariance
Bibliography Aguilera DR, Jourabchi P, Spiteri C, Regnier P (2005) A knowledgebased reactive transport approach for the simulation of biogeochemical dynamics in earth systems. Geochem Geophys Geosyst 6(7) Anderson RS (2002) Modeling the tor-dotted crests, bedrock edges, and parabolic profiles of high alpine surfaces of the Wind River Range. Wyoming Geomorphol 46(1–2):35–58 Anderson RS, Anderson SP (2010) Geomorphology: the mechanics and chemistry of landscapes. Cambridge University Press Anderson RS, Hallet B (1986) Sediment transport by wind: toward a general model. Geol Soc Am Bull 97(5):523–535 Asselman NEM (2000) Fitting and interpretation of sediment rating curves. J Hydrol 234:228–248 Bagnold RA (1941) The physics of blown sand and desert dunes. Methuen, London Bagnold RA (1966) An approach to the sediment transport problem from general physics. US Government Printing Office Barnhart KR, Hutton EW, Tucker GE, Gasparini NM, Istanbulluoglu E, Hobley DE, Lyons NJ, Mouchene M, Nudurupati SS, Adams JM, Bandaragoda C (2020) Landlab v2. 0: a software package for earth surface dynamics. Earth surface. Dynamics 8(2):379–397 Beucher S (1999) Mathematical morphology and geology: when image analysis uses the vocabulary of earth science: a review of some applications. Proceedings of geovision—99. University of Liege, Belgium, pp 6–7 Borselli L, Cassi P, Torri D (2008) Prolegomena to sediment and flow connectivity in the landscape: a GIS and field numerical assessment. Catena 75(3):268–277 Braun J, Willett SD (2013) A very efficient O (n), implicit and parallel method to solve the stream power equation governing fluvial incision and landscape evolution. Geomorphology 180:170–179 Brierley G, Fryirs K, Jain V (2006) Landscape connectivity: the geographic basis of geomorphic applications. Area 38(2):165–174 Brunner P, Simmons CT, Cook PG, Therrien R (2010) Modeling surface water-groundwater interaction with MODFLOW: some considerations. Groundwater 48(2):174–180 Cavalli M, Trevisani S, Comiti F, Marchi L (2013) Geomorphometric assessment of spatial sediment connectivity in small Alpine catchments. Geomorphology 188:31–41 Challa A, Danda S, Sagar BSD, Najman L (2018) Some properties of interpolations using mathematical morphology. IEEE Trans Image Process 27(4):2038–2048 Chang HH (1985) River morphology and thresholds. J Hydraul Eng 111(3):503–519 Chockalingam L, Sagar BSD (2003) Mapping of sub-watersheds from digital elevation model: a morphological approach. Int J Pattern Recognit Artif Intell 17(02):269–274 Chockalingam L, Sagar BSD (2005) Morphometry of network and nonnetwork space of basins. J Geophys Res Solid Earth 110(B8) Cook SJ, Swift DA, Kirkbride MP, Knight PG, Waller RI (2020) The empirical basis for modelling glacial erosion rates. Nat Commun 11(1):1–7
Quantitative Geomorphology Cossart É, Fressard M (2017) Assessment of structural sediment connectivity within catchments: insights from graph theory. Earth Surf Dyn 5(2):253–268 Delaney I, Bauder A, Werder MA, Farinotti D (2018) Regional and annual variability in subglacial sediment transport by water for two glaciers in the Swiss Alps. Front Earth Sci 6:175 Dinesh S, Radhakrishnan P, Sagar BSD (2007) Morphological segmentation of physiographic features from DEM. Int J Remote Sens 28(15):3379–3394 Draebing D, Krautblatter M (2019) The efficacy of frost weathering processes in alpine rockwalls. Geophys Res Lett 46(12): 6516–6524 Fagherazzi S, Howard AD, Wiberg PL (2002) An implicit finite difference method for drainage basin evolution. Water Resour Res 38(7): 21–21 Fedo CM, Eriksson KA, Krogstad EJ (1996) Geochemistry of shales from the Archean (f3.0 Ga) Buhwa Greenstone Belt, Zimbabwe: implications for provenance and source-area weathering. Geochim Cosmochim Acta 60:1751–1763 Fernandes NF, Dietrich WE (1997) Hillslope evolution by diffusive processes: the timescale for equilibrium adjustments. Water Resour Res 33(6):1307–1318 Fryirs KA, Brierley GJ (2012) Geomorphic analysis of river systems: an approach to reading the landscape. Wiley, Chichester, p 362 Gaillardet J, Dupré B, Louvat P, Allegre CJ (1999) Global silicate weathering and CO2 consumption rates deduced from the chemistry of large rivers. Chem Geol 159(1–4):3–30 Giannakis GV, Nikolaidis NP, Valstar J, Rowe EC, Moirogiorgou K, Kotronakis M, Paranychianakis NV, Rousseva S, Stamati FE, Banwart SA (2017) Integrated critical zone model (1D-ICZ): a tool for dynamic simulation of soil functions and soil structure. Adv Agron 142:277–314 Gilchrist AR, Summerfield MA (1990) Differential denudation and flexural isostasy in formation of rifted-margin upwarps. Nature 346(6286):739–742 Goddéris Y, Brantley SL (2013) Earthcasting the future Critical Zone. Elementa: Science of the Anthropocene, 1 Harbor JM (1992) Numerical modeling of the development of U-shaped valleys by glacial erosion. Geol Soc Am Bull 104(10):1364–1375 Harnois L (1988) The CIW index: a new chemical index of weathering. Sediment Geol 55:319–322 Hartmann J, Jansen N, Dürr HH, Kempe S, Köhler P (2009) Global CO2-consumption by chemical weathering: what is the contribution of highly active weathering regions? Glob Planet Chang 69(4): 185–194 Heckmann T, Schwanghart W (2013) Geomorphic coupling and sediment connectivity in an alpine catchment—exploring sediment cascades using graph theory. Geomorphology 182:89–103 Holthuijsen LH (2010) Waves in oceanic and coastal waters. Cambridge University Press Howard AD (1994) A detachment-limited model of drainage basin evolution. Water Resour Res 30(7):2261–2285 Jain V, Preston N, Fryirs K, Brierley G (2006) Comparative assessment of three approaches for deriving stream power plots along long profiles in the upper Hunter River catchment, New South Wales, Australia. Geomorphol 74(1–4):297–317 Jain V, Sinha R (2003) Derivation of unit hydrograph from GIUH analysis for a Himalayan river. Water Resour Manag 17(5):355–376 Jain V, Sinha R (2006) Evaluation of geomorphic control on flood hazard through geomorphic instantaneous unit hydrograph. Curr Sci 85(11): 1596–1600 Jain V, Tandon SK (2010) Conceptual assessment of (dis) connectivity and its application to the Ganga River dispersal system. Geomorphology 118(3–4):349–358
Quantitative Geomorphology Jordan G (2003) Morphometric analysis and tectonic interpretation of digital terrain data: a case study. Earth Surface Proc Landforms: J Br Geomorphol Res Group 28(8):807–822 Komar PD (1998) Beach processes and sedimentation, 2nd edn. Prentice-Hall, England Cliffs Leenders JK, Sterk G, Van Boxel JH (2011) Modelling wind-blown sediment transport around single vegetation elements. Earth Surface Proc Landform 36(9):1218–1229 Leung LR, Huang M, Qian Y, Liang X (2011) Climate–soil–vegetation control on groundwater table dynamics and its feedbacks in a climate model. Clim Dyn 36(1):57–81 Lu H, Shao Y (2001) Toward quantitative prediction of dust storms: an integrated wind erosion modelling system and its applications. Environ Model Softw 16(3):233–249 Matheron G (1975) Random sets and integral geometry. Wiley, New York, p 468 Mayaud JR, Bailey RM, Wiggs GF, Weaver CM (2017) Modelling aeolian sand transport using a dynamic mass balancing approach. Geomorphology 280:108–121 McDonald MG, Harbaugh AW (1988) A modular, threedimensional finite-difference ground-water flow model. USGS, Reston McFadden LD, Eppes MC, Gillespie AR, Hallet B (2005) Physical weathering in arid landscapes due to diurnal variation in the direction of solar heating. Geol Soc Am Bull 117(1–2):161–173 Meyer-Peter E, Müller R (1948) Formulas for bed-load transport. Proceedings of the second Meeting of the International Association for Hydraulic Structures Research, pp 39–64 Nagajothi K, Sagar BSD (2019) Classification of geophysical basins derived from SRTM and Cartosat DEMs via directional granulometries. IEEE J Select Topics Appl Earth Observ Remote Sensing 12(12):5259–5267 Nagajothi K, Rajasekhara HM, Sagar BSD (2021) Universal fractal scaling laws for surface water bodies and their zones of influence. IEEE Geosci Remote Sens Lett 18(5):781–785 Nesbitt HW, Young GM (1982) Early Proterozoic climates and plate motions inferred from major element chemistry of lutites. Nature 299:715–717 NRC (2001) Basic research opportunities in earth science. National Academies Press, Washington, DC Parker A (1970) An index of weathering for silicate rocks. Geol Mag 107:501–504 Pelletier JD (2008) Quantitative modeling of earth surface processes. Cambridge University Press Piégay H (2019) Quantitative geomorphology. In: International encyclopedia of geography: people, the earth, environment and technology: people, the earth, environment and technology, pp 1–3. https://doi. org/10.1002/9781118786352.wbieg0417.pub2 Raymond CF, Harrison WD (1987) Fit of ice motion models to observations from Variegated Glacier, Alaska. In: Waddington ED, Walder JS (eds) The physical basis of ice sheet modelling, vol 170. International Association of Hydrologie Sciences Publication, pp 153–166 Riebe CS, Hahm WJ, Brantley SL (2017) Controls on deep critical zone architecture: a historical review and four testable hypotheses. Earth Surf Process Landf 42(1):128–156 Rodríguez-Iturbe I, Valdés JB (1979) The geomorphologic structure of hydrologic response. Water Resour Res 15(6):1409–1420 Roering JJ, Kirchner JW, Dietrich WE (1999) Evidence for nonlinear, diffusive sediment transport on hillslopes and implications for landscape morphology. Water Resour Res 35(3):853–870 Różycka M, Jancewicz K, Migoń P, Szymanowski M (2021) Tectonic versus rock-controlled mountain fronts–Geomorphometric and geostatistical approach (Sowie Mts., Central Europe). Geomorphology 373:107485 Ruxton BP (1968) Measures of the degree of chemical weathering of rocks. 
J Geol 76:518–527
1151 Sagar BSD (1996) Fractal relations of a morphological skeleton. Chaos, Solitons Fractals 7(11):1871–1879 Sagar BSD (1999) Estimation of number-area-frequency dimensions of surface water bodies. Int J Remote Sens 20(13):2491–2496 Sagar BSD (2000a) Letter to editor’s fractal relation of medial axis length to the water body area. Discret Dyn Nat Soc 4(2):97–97 Sagar BSD (2000b) Multi-fractal-interslipface angle curves of a morphologically simulated sand dune. Discret Dyn Nat Soc 5(1):71–74 Sagar BSD (2001) Generation of self organized critical connectivity network map (SOCCNM) of randomly situated surface water bodies. Letters to Editor, Discrete Dynamics in Nature and Society 6(3): 225–228 Sagar BSD (2005) Discrete simulations of spatio-temporal dynamics of small water bodies under varied streamflow discharges. Nonlinear Process Geophys 12(1):31–40 Sagar BSD (2010) Visualization of spatiotemporal behavior of discrete maps via generation of recursive median elements. IEEE Trans Pattern Anal Mach Intell 32(2):378–384 Sagar BSD (2013) Mathematical morphology in geomorphology and GISci. Chapman & Hall, Taylor & Francis Group, p 546 Sagar BSD (2020) Digital elevation models: an important source of data for geoscientists [education]. IEEE Geosci Remote Sensing Magazine 8(4):138–142 Sagar BSD, Chockalingam L (2004) Fractal dimension of non-network space of a catchment basin. Geophys Res Lett 31(12):L12502 Sagar BSD, Lim SL (2015) Ranks for pairs of spatial fields via metric based on grayscale morphological distances. IEEE Trans Image Process 24(3):908–918 Sagar BSD, Murthy KSR (2000) Generation of fractal landscape using nonlinear mathematical morphological transformations. Fractals 8(3):267–272 Sagar BSD, Serra J (2010) Spatial information retrieval, analysis, reasoning and modelling. Int J Remote Sensing 31(22):5747–5750 Sagar BSD, Tien TL (2004) Allometric power-law relationships in a Hortinian fractal digital elevation model. Geophys Res Lett 31(6): L06501. https://doi.org/10.1029/2003GL019093 Sagar BSD, Omoregie C, Rao BSP (1998a) Morphometric relations of fractal-skeletal based channel network model. Discret Dyn Nat Soc 2(2):77–92 Sagar BSD, Venu M, Gandhi G, Srinivas D (1998b) Morphological description and interrelationship between force and structure: a scope to geomorphic evolution process modelling. Int J Remote Sens 19(7):1341–1358 Sagar BSD, Venu M, Murthy KSR (1999) Do skeletal network derived from water bodies follow Horton’s laws? J Math Geol 31(2):143–154 Sagar BSD, Venu M, Srinivas D (2000) Morphological operators to extract channel networks from digital elevation models. Int J Remote Sens 21(1):21–30 Sagar BSD, Srinivas D, Rao BSP (2001) Fractal skeletal based channel networks in a triangular initiator basin. Fractals 9(4):429–437 Sagar BSD, Murthy MBR, Rao CB, Raj B (2003) Morphological approach to extract ridge-valley connectivity networks from Digital Elevation Models (DEMs). Int J Remote Sens 24(3):573–581 Sahoo R, Singh RN, Jain V (2020) Process inference from topographic fractal characteristics in the tectonically active Northwest Himalaya, India. Earth Surf Proc Landforms 45(14):3572–3591 Serra J (1982) Image analysis and mathematical morphology. Academic Press, London, p 610 Sonam, Jain V (2018) Geomorphic effectiveness of a long profile shape and the role of inherent geological controls in the Himalayan hinterland area of the Ganga River basin, India. Geomorphology 304: 15–29 Strahler AN (1952) Hypsometric (area-altitude) analysis of erosional topography. 
Geol Soc Am Bull 63(11):1117–1142
1152 Takagi H, Quan NH, Anh LT, Thao ND, Tri VPD, Anh TT (2019) Practical modelling of tidal propagation under fluvial interaction in the Mekong Delta. Int J River Basin Manag 17(3):377–387 Tay LT, Sagar BSD, Chuah HT (2005a) Analysis of geophysical networks derived from multiscale digital elevation models: a morphological approach. IEEE Geosci Remote Sens Lett 2(4): 399–403 Tay LT, Sagar BSD, Chuah HT (2005b) Allometric relationships between travel-time channel networks, convex hulls, and convexity measures. Water Resour Res 46(2) Tay LT, Sagar BSD, Chuah HT (2005c) Derivation of terrain roughness indicators via Granulometries. Int J Remote Sens 26(18):3901–3910 Tay LT, Sagar BSD, Chuah HT (2007) Granulometric analysis of basinwise DEMs: a comparative study. Int J Remote Sens 28(15): 3363–3378 Tucker GE, Hancock GR (2010) Modelling landscape evolution. Earth Surf Process Landf 35(1):28–50 Tucker GE, Slingerland RL (1994) Erosional dynamics, flexural isostasy, and long-lived escarpments: a numerical modeling study. J Geophys Res: Solid Earth 99(B6):12229–12243 Van den Berg JH (1995) Prediction of alluvial channel pattern of perennial rivers. Geomorphology 12(4):259–279 Vardhan SA, Sagar BSD, Rajesh N, Rajashekara HM (2013) Automatic detection of orientation of mapped units via directional granulometric analysis. IEEE Geosci Remote Sens Lett 10(6):1449–1453 Vuille M, Francou B, Wagnon P, Juen I, Kaser G, Mark BG, Bradley RS (2008) Climate change and tropical Andean glaciers: past, present and future. Earth Sci Rev 89(3–4):79–96 Wainwright J, Turnbull L, Ibrahim TG, Lexartza-Artza I, Thornton SF, Brazier RE (2011) Linking environmental regimes, space and time: interpretations of structural and functional connectivity. Geomorphology 126(3–4):387–404 Wohl E (2017) Connectivity in rivers. Prog Phys Geogr 41(3): 345–362 Wohl E, Brierley G, Cadol D, Coulthard TJ, Covino T, Fryirs KA, Grant G, Hilton RG, Lane SN, Magilligan FJ, Meitzen KM (2019) Connectivity as an emergent property of geomorphic systems. Earth Surf Process Landf 44(1):4–26 Yanites BJ, Tucker GE, Mueller KJ, Chen YG, Wilcox T, Huang SY, Shi KW (2010) Incision and channel morphology across active structures along the Peikang River, central Taiwan: implications for the importance of channel width. Bulletin 122(7–8): 1192–1208 Zhang W, Jia Q, Chen X (2014) Numerical simulation of flow and suspended sediment transport in the distributary channel networks. J Appl Math:2014
Quantitative Stratigraphy

Felix Gradstein1 and Frits Agterberg2
1 University of Oslo, Oslo, Norway
2 Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada
Quantitative stratigraphy uses relatively simple or complex mathematical-statistical methods to calculate stratigraphic models that with a minimum of data provide a maximum of predictive potency, and include formulation of confidence limits. (F. P. Agterberg 1990)
Definition
The RASC method for RAnking and SCaling of biostratigraphic events was developed by the authors and associates to calculate large zonations with estimates of uncertainty. The purpose of ranking is to create an optimum sequence of fossil events observed in many different wells or sections subject to stratigraphic inconsistencies in the direction of the arrow of time. These inconsistencies, which result in crossovers of lines of correlation between sections, are due to various sampling errors and other sources of uncertainty, including reworking and misclassification. They can be resolved by statistical averaging combined with stratigraphic reasoning. Subsequent scaling of the events may be carried out by estimating intervals between successive events along a relative timescale. This results in the scaled optimum sequence. Either the ranked optimum sequence or the scaled optimum sequence may be used for biozonation. The observed positions in the sections of different biostratigraphic events can have different degrees of precision. These differences may be evaluated through analysis of variance. Other types of stratigraphic events, including log markers, can be integrated and incorporated with the biostratigraphic events. Stratigraphically, the (scaled) optimum sequence represents the average order in which events occur in a sedimentary basin, which makes it realistic for practical correlations of strata.
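To make the ranking idea concrete, the following is a toy sketch only, not the RASC algorithm or its scoring: pairwise superpositional relationships are tallied across wells and events are ordered by their net tendency to lie above the others. The well contents and event names are invented for the example.

```python
from itertools import combinations

wells = {                      # events listed from top (youngest) to bottom
    "well_1": ["A", "B", "C", "D"],
    "well_2": ["B", "A", "C", "D"],
    "well_3": ["A", "C", "B", "D"],
}

events = sorted({e for seq in wells.values() for e in seq})
above = {(a, b): 0 for a in events for b in events if a != b}
for seq in wells.values():
    for i, j in combinations(range(len(seq)), 2):
        above[(seq[i], seq[j])] += 1   # seq[i] observed above seq[j]

# Rank events by the net number of times they lie above the others
score = {e: sum(above[(e, o)] - above[(o, e)] for o in events if o != e) for e in events}
optimum_sequence = sorted(events, key=score.get, reverse=True)
print(optimum_sequence)        # ['A', 'B', 'C', 'D'] for the invented wells
```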
Introduction
The purpose of this contribution is to provide a succinct description of RASC using the geomathematical method outlined in Agterberg and Gradstein (1999) and the Cretaceous microfossil data set of Gradstein et al. (1999) for illustration. The Cretaceous data set contains 1753 records of occurrences of 517 stratigraphic microfossil events in 31 exploration wells, offshore Norway. Quantitative stratigraphy is also concerned with the correlation of lithologies or logs in different wells or outcrop sections. The emphasis in this entry is on biostratigraphic correlation, although lithologies with lithostratigraphic signals can be incorporated into quantitative methods such as RASC. Modern biostratigraphy must cope with occurrence data (events) from hundreds of fossil taxa, in thousands of samples derived from many wells or sections in a sedimentary basin. New tools in stratigraphy, using quantitative methods, make it easier to objectively build zonations integrating different microfossil groups. In addition, individual sections may be tested for “stratigraphic normality.” A prime method is called RASC, for RAnking and SCaling of fossil events, and it is the subject of this study. RAnking and SCaling is a probabilistic method, available with Version 20. The method uses fossil event order in wells
or outcrop sections to construct a most likely and average sequence of events. This optimal order is scaled in relative time using crossover frequencies of all event pairs. The method provides detailed stratigraphic error analysis and several correlation options, the latter executed in the program called CASC. RASC with CASC operates on all wells simultaneously, is very fast, handles large and complex data sets, and is relatively insensitive to noise. Literature on the method is extensive, and there are many applications, particularly in petroleum basins (see later). Before we outline the method itself, we briefly deal with the properties of fossil data and the data input. A paleontological record is the position of a fossil taxon in a rock sequence. The stratigraphic range of a fossil is a composite of all its records from oldest to youngest in a stratigraphic sense. The endpoints of this range are biostratigraphic events, which include the first occurrence, or appearance in time, and the last occurrence, or disappearance from the geologic record. A biostratigraphic event is the presence of a taxon in its time context, derived from its position in a rock sequence. The fossil events are the result of the continuing evolutionary trends of Life on Earth; they differ from physical events in that they are unique and nonrecurrent, and that their order is irreversible. Often, first and last occurrences of fossil taxa are relatively poorly defined records, based on few specimens in scattered samples. Particularly with time-wise scattered last occurrences, one may be suspicious that reworking has locally extended the record, a reason why it is useful to distinguish between the last occurrence (LO) and the last common or last consistent occurrence (LCO), where consistency refers to occurrence in consecutive samples in a well or outcrop section. The shortest spacing in relative time between successive fossil events is called resolution. The greater the probability that such events follow each other in time, the greater the likelihood that correlation of the event record models isochrons. Most industrial data sets make use of sets of LO and LCO events. In an attempt to increase resolution in basin stratigraphy, particularly when many sidewall cores in wells are available, up to five events may be recognized along the stratigraphic range of a fossil taxon. Such events include the last stratigraphic occurrence “top” or LO event, the last common or last consistent occurrence LCO event, the last abundant occurrence LAO event, the first common or consistent occurrence FCO event, and the first occurrence FO event (Fig. 1). Unfortunately, such practice may not yield the desired increase in biostratigraphic resolution, because of poor event traceability. Event traceability is illustrated in Fig. 2, where cumulative event distribution is plotted against the number of wells for seven exploration and Deep Sea Drilling Project data sets with different microfossil records studied by the senior author. The events stem from dinoflagellates, foraminifers, and relatively few miscellaneous microfossils. The curves are asymptotic, showing an inverse relation between event
distribution and the number of wells. None of the events occur in all wells; clearly, far fewer events occur in five or six wells than in one or two wells, and hence the cumulative frequency drops quite dramatically with a small increase in the number of wells. Obviously, the majority of fossil events have poor traceability, which is true for most data sets, either from wells or from outcrops (Gradstein and Agterberg 1998). Microfossil groups with higher local species diversity, on average, have less event traceability. Data sets with above-average traceability of events are those where one or more dedicated observers have spent above-average time examining the fossil record, verifying taxonomic consistency between wells or outcrop sections, and searching for “missing” data. In general, routine examination of wells by consultants yields only half or much less of the taxa and events that may be detected with a slightly more dedicated approach. There are other reasons than lack of analytical detail why event traceability is relatively low. For example, lateral variations in sedimentation rate change the diversity and relative abundance of taxa in coeval samples between wells, particularly if sampling is not exhaustive, as with well cuttings or sidewall cores. It is difficult to understand why fossil events might be locally missing. Since chances of detection depend on many factors, stratigraphical, mechanical, and statistical in nature (Fig. 3), increasing sampling and studying more than one microfossil group in detail is beneficial. Although rarely admitted or clarified, biostratigraphy relies almost as much on the absence as on the presence of certain markers. This remark is particularly tailored to microfossils, which generally are widespread and relatively abundant, and harbor stratigraphically useful events. Only if nonexistence of events is recognized in many well-sampled sections may absences be construed as affirmative for stratigraphic interpretations. If few samples are available over long stratigraphic intervals, the chance to find long-ranging taxa considerably exceeds the chance to find short-ranging forms, unless the latter are abundant. In actual practice, so-called index fossils have a short stratigraphic range and generally are less common, hence easily escape detection, a reason why their absence should be used with caution (Fig. 4).
Data Selection, Run Parameters, and Unique Events
It will be obvious to the user of RASC that a set of wells or outcrop sections to be zoned should have maximal stratigraphic overlap between sections and be devoid of major breaks or disconformities, etc. Since RASC is a statistical method, it is desirable to have sufficient sections (usually 10 or more) and sufficient events for calculation. Hence, events to be zoned should occur in at least 4, 5, 6, 7, or more sections (threshold Kc), with event pairs obviously set
Quantitative Stratigraphy, Fig. 1 Events recognized along the stratigraphic range of a fossil taxon. Such events include last stratigraphic occurrence “top” or LO event, last common or last consistent occurrence LCO event, last abundant occurrence LAO event, first common or consistent occurrence FCO event, and first occurrence FO event. Unfortunately, such practice may not yield the desired increase in biostratigraphic resolution sought after, for reason of poor event traceability. Event traceability is illustrated in Fig. 3.
to occur in fewer sections, e.g., 3 (threshold Mc). Index fossils, occurring in fewer than threshold Kc sections, may be introduced in the Optimum Sequence by the Unique Event routine. After calculation of the Optimum Sequence, assigned unique events are looked up in the single or few wells/sections where they occur. Their Optimum Sequence position is determined from the position of their nearest (section) neighbors above and below, known in the calculated optimum sequence. In the Optimum Sequence, these Unique Events have two stars.
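As a concrete illustration of the occurrence thresholds just described (a hypothetical sketch, not RASC's input handling), events can be screened by the number of sections in which they occur; the section contents and the value of K_c below are invented.

```python
from collections import Counter

sections = {
    "sec_1": {"A", "B", "C"},
    "sec_2": {"A", "B", "D"},
    "sec_3": {"A", "C", "D"},
    "sec_4": {"A", "B"},
}
K_c = 3  # minimum number of sections for an event to enter the optimum sequence

counts = Counter(e for evs in sections.values() for e in evs)
ranked_events = {e for e, n in counts.items() if n >= K_c}   # used in ranking
unique_events = {e for e, n in counts.items() if n < K_c}    # inserted afterwards
print(ranked_events, unique_events)   # e.g. {'A', 'B'} and {'C', 'D'}
```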
generate the input data for RASC, and also provides a printable file for input data curation. The basic input for RASC consists of a dictionary of event names (max. 999 events per dictionary, with fossil names of max. 45 characters, incl. spacing) and a well data file consisting of multiple records of the events in all wells considered (max. 150 wells, but

Radial Basis Functions

Definition
The RBF interpolant s(x) of scalar values f(x_j) known at the scattered points X = {x_1, . . ., x_N} takes the form

s(x) = \sum_{j=1}^{N} w_j \Phi(x, x_j) + p(x) \qquad (1)

where p(x) is an l-degree d-variate polynomial

p(x) = \sum_{k=1}^{Q} c_k p_k(x), \qquad Q = \frac{(l + d)!}{l!\, d!} \qquad (2)

Coefficients w_j and c_k are determined from the linear system

\begin{bmatrix} A & P^{T} \\ P & 0 \end{bmatrix} \begin{bmatrix} w \\ c \end{bmatrix} = \begin{bmatrix} f \\ 0 \end{bmatrix} \qquad (3)

RBFs can be split into two categories: strictly positive definite (SPD) and conditionally positive definite (CPD). Commonly used SPD and CPD RBFs are listed in Tables 1 and 2, respectively. Refer to Wendland (2005) for detailed analysis and properties of these functions. For SPD RBFs, A is always positive definite, whereas conditionally positive definite (CPD) RBFs of order m require the orthogonality constraints

\sum_{j=1}^{N} w_j p_k(x_j) = 0, \qquad 1 \le k \le Q \qquad (4)

to ensure positive definiteness, where p(x) is at most an (m − 1)-degree polynomial. Furthermore, for SPD RBFs, low-degree polynomials are not required; however, they can be useful for extrapolation. Notations used in this definition are given in Table 3.
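As a concrete illustration of Eqs. 1-4, the following minimal NumPy sketch assembles and solves the block system of Eq. 3 for a Gaussian SPD RBF (Table 1) with a 1-degree polynomial. The function names, shape parameter, and synthetic 2D data are illustrative assumptions, not part of the entry.

```python
import numpy as np

def gaussian_rbf(r, eps=1.0):
    # phi(r) = exp(-(eps * r)^2), the Gaussian SPD RBF of Table 1
    return np.exp(-(eps * r) ** 2)

def fit_rbf(points, values, eps=1.0):
    """Assemble and solve the block system of Eq. 3 for w and c."""
    n = len(points)
    r = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    A = gaussian_rbf(r, eps)                      # kernel matrix A
    P = np.hstack([points, np.ones((n, 1))]).T    # 1-degree monomials (x, y, ..., 1)
    q = P.shape[0]
    # The lower block row P w = 0 enforces the orthogonality constraints of Eq. 4
    lhs = np.block([[A, P.T], [P, np.zeros((q, q))]])
    rhs = np.concatenate([values, np.zeros(q)])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n:]                       # w (RBF) and c (polynomial) coefficients

def evaluate(x, points, w, c, eps=1.0):
    """s(x) = sum_j w_j * phi(||x - x_j||) + p(x), as in Eq. 1."""
    r = np.linalg.norm(points - x, axis=-1)
    return gaussian_rbf(r, eps) @ w + np.append(x, 1.0) @ c

# Illustrative 2D data: scattered samples of f(x, y) = sin(x) + y
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 3.0, size=(50, 2))
vals = np.sin(pts[:, 0]) + pts[:, 1]
w, c = fit_rbf(pts, vals, eps=2.0)
print(evaluate(np.array([1.5, 1.0]), pts, w, c, eps=2.0))  # close to sin(1.5) + 1.0
```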
Introduction
Scattered data approximation methods utilizing radial basis functions originally appeared in spatially dependent geoscience applications such as geodesy, hydrology, and geophysics in the early 1970s. A thorough review of these early works, using multiquadric and inverse multiquadric RBFs, is found in Hardy (1990). Beyond the 1980s, a myriad of uses for RBFs were found in a wide range of applications including artificial intelligence, solving PDEs, optimization, finance, signal processing, and 3D modeling. Their success can be
Radial Basis Functions, Table 1 Common SPD RBFs

Strictly positive definite (SPD) RBF | f(r)
Gaussian | e^{-(εr)^2}
Inverse multiquadric | (ε^2 + r^2)^{k/2}, k < 0
Wendland (C2) | (1 - r_c)^4 (4r_c + 1) for r_c < 1; 0 for r_c ≥ 1
Wendland (C4) | (1 - r_c)^6 (35r_c^2 + 18r_c + 3) for r_c < 1; 0 for r_c ≥ 1
Matérn (C0) | e^{-εr}
Matérn (C2) | e^{-εr} (1 + εr)
Matérn (C4) | e^{-εr} (3 + 3εr + (εr)^2)

Smaller shape parameters ε correspond to flatter functions as r increases. For Wendland’s functions, the variable r_c = r/R, where R is a cutoff radius. For values r_c > 1, the function is zero.

Radial Basis Functions, Table 2 Common CPD RBFs and their conditional order m

Conditionally positive definite (CPD) RBF | f(r) | Conditional order m
Multiquadric | (ε^2 + r^2)^{k/2}, k > 0, k ∉ 2ℕ | ⌈k/2⌉
Radial powers | r^k, k > 0, k ∉ 2ℕ | ⌈k/2⌉
Thin-plate splines | r^{2k} log r, k > 0, k ∈ ℕ | k + 1

Radial powers and thin-plate splines are known as polyharmonic splines.

Radial Basis Functions, Table 3 Notations and associated descriptions used in the RBF definition

Notation | Description
d | Dimensionality
x | d-dimensional point, e.g., 3D point x = (x, y, z)
X | Set of scattered points {x_1, . . ., x_N}. Scattered points are irregularly distributed
N | Number of data points in point set X
x_j | j-th data point from X
f(x) | Unknown scalar value at point x
s(x) | Interpolated scalar value at point x
Φ(x, x_j) | Kernel function. Two points (x and x_j) are input, and the output is a scalar
r | Euclidean distance between two points
f(r) | Radial basis function
l | Degree of polynomial
p(x) | l-degree polynomial, e.g., 1-degree trivariate (3D): p(x) = c_1 x + c_2 y + c_3 z + c_4
p_k(x) | k-th monomial of the polynomial, e.g., a 1-degree trivariate polynomial has 4 monomials: (x, y, z, 1); a 2-degree trivariate polynomial has 10 monomials: (x^2, y^2, z^2, xy, xz, yz, x, y, z, 1)
Q | Number of monomials for the polynomial
w | RBF coefficients w = (w_1, . . ., w_N)
c | Polynomial coefficients c = (c_1, . . ., c_Q)
A | Symmetric kernel matrix generated by the radial kernel Φ(x, x_j) using all point pairs of point set X
P | Polynomial matrix containing all monomials for point set X
w^T A w | Quadratic form associated with A. Interpolant smoothness is measured in terms of the quadratic form: ||s||^2 = w^T A w, where ||s|| is the interpolant’s norm. Positive definiteness of the quadratic form (e.g., w^T A w > 0) ensures the smoothest possible interpolation
m | Conditional order of CPD RBF
ε | SPD RBF shape parameter
R | Cutoff radius for Wendland’s RBFs
r_c | Wendland’s RBF variable
attributed to the robustness, versatility, and simplicity of the mathematical models that use them. Important historical milestones beyond Hardy’s pioneering work will be given. The variational approach
developed by Duchon (1976) demonstrated that polyharmonic splines, like thin plate splines, minimize the bending energy (e.g., aggregated squared curvature). These characteristics enabled modeling of complex surface
geometries with arbitrary genus topology in addition to desirable surface smoothness and hole-filling properties. In the mid-1980s, Micchelli (1986) proved that the interpolation matrix is always invertible and laid a strong foundation for future work. By the late 1990s and early 2000s, these theoretical foundations, coupled with dramatic increases in computing power and RBF-based optimization techniques, led to widespread use of implicit modeling of complex 3D objects from large datasets (Carr et al. 2001). The most common use of RBFs in geoscience is with implicit modeling. Interestingly, the geostatistical method of interpolation known as kriging, which takes a stochastic point of view of data observations as realizations of random variables belonging to a random field, has been proven mathematically equivalent to RBF interpolation (Dubrule 1984; Scheuerer et al. 2013). Although these connections have been known by geostatisticians for some time, they have only recently been recognized by the numerical analysis community. RBF interpolation can be viewed as a subset of kriging with a chosen fixed covariance function and polynomial trend degree. In kriging, covariance functions are estimated from data.
RBF Implicit Modeling
Implicit modeling is a method by which iso-contours, implicitly defined as some level set of a reconstructed scalar function, are extracted from a discretized domain (e.g., triangulated meshes, voxel grids). Iso-contours, such as curves (2D) and surfaces (3D), require interpolants to be first evaluated throughout the discretized domain. Iso-contour extraction from evaluated domains is performed using computer graphics algorithms such as marching cubes (Lorensen and Cline 1987) (for 3D applications). In geoscience, RBF-based implicit modeling methods are routinely applied to geologically derived structures and properties (Cowan et al. 2002) and provide a fast, automatic, and reproducible approach. Figure 1 illustrates RBF implicit modeling for 3D structural geology and ore grade estimation applications. Information regarding the geological setting and data used for these examples can be found in de Kemp et al. (2016). Implicit modeling greatly improved upon the shortcomings of the previously widely used approach for these applications, called explicit modeling. Explicit modeling approaches required manual wireframe construction with Bézier and NURBS objects by a modeler using CAD tools. Although explicit modeling offered great precision and flexibility, it suffered from lengthy, nonreproducible, user-dependent wireframe creation, as well as being difficult to update with newly acquired data – a common occurrence in geoscience. Many geoscience applications suffer from limited data observations and give rise to nonunique models. There is an infinite set of solutions describing the given observations. In
underdetermined settings, domain experts would like to test possible scenarios and hypotheses explaining physical phenomena apparent from data observations. Implicit modeling provides a framework in which many of these scenarios can be quickly tested and compared. Furthermore, recent developments in implicit model uncertainty characterization using Bayesian inference can quantify the degree to which scenarios are more likely (de la Varga and Wellmann 2016). These considerations make implicit modeling particularly advantageous in geoscience applications. For structural RBF implicit modeling, geologically derived surfaces, such as interfaces between stratigraphic layers, are commonly implicitly defined as the zero-level set of the reconstructed scalar function (e.g., f(x) = 0). In order to obtain nontrivial solutions to Eq. 3, additional points, called offset points, are generated by projecting known surface points attributed with a planar orientation (e.g., a normal), either measured or derived. The scalar-value constraints for the offset points are set to their projection distances: offset points above the surface are assigned positive distances, whereas points below are assigned negative distances. Geometrical artifacts in surfaces modeled using offset points can occur when modeling highly variant structures, since, depending on the chosen projection distance, points can be projected on the wrong side of the surface. Structural field observation datasets predominantly contain orientation measurements that do not directly sample the surface of interest. Since off-surface orientation measurements influence the geometry of the surface of interest, with closer measurements having greater impact than more distant ones, the offset point approach to implicit modeling for such applications is far from optimal. Problems arising from the offset points were resolved by extensions to the basic RBF interpolation method explained in the next section.
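The offset-point construction described above can be sketched as follows. This is an illustrative NumPy helper, not the implementation used in the cited work; the function name, the single projection distance d, and the assumption of unit normals are choices made for the example.

```python
import numpy as np

def offset_constraints(surface_pts, unit_normals, d=1.0):
    # Project each on-surface point a distance d along its unit normal,
    # to both sides of the surface.
    above = surface_pts + d * unit_normals
    below = surface_pts - d * unit_normals
    pts = np.vstack([surface_pts, above, below])
    vals = np.concatenate([
        np.zeros(len(surface_pts)),        # on the interface: s(x) = 0 (zero-level set)
        np.full(len(surface_pts), +d),     # projected above: positive distance
        np.full(len(surface_pts), -d),     # projected below: negative distance
    ])
    return pts, vals
```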
RBF Methods and Enhancements for Geoscience
RBF interpolants of the form of Eq. 1 can be expanded to include derivative constraints, inequality constraints, surface points from multiple distinct surfaces, and fault constraints, further enhancing modeling capacity for geoscience applications. Supplementary to these additional constraints, important RBF-based approximation methods will be described that may prove helpful for noisy datasets where exact interpolation should be avoided.

Derivative and Inequality Constraints
Incorporation of both derivative and inequality constraints into RBF interpolation can be achieved using generalized interpolation (Hillier et al. 2014), which employs a set of linearly independent functionals operating on RBFs. Linear functionals vary depending on constraint type. Derivative constraints such as gradients and directional derivatives
Radial Basis Functions, Fig. 1 Examples of RBF implicit modeling in geoscience: (a) overturned stratigraphic contact surface modeled using structural field observations: strike/dip orientations (red/blue colored tablets indicating stratigraphic top/bottom, respectively, of contact) and green contact points from map traces and drill core; (b) ore grade modeling using metal assay data from drill core point samples (colored points). Iso-surfaces represent low, medium, and high ore grade
allow orientation data to be directly included into interpolation, overcoming challenges associated with the offset point approach for influencing modeled orientation. In addition, these constraints enhance continuity and extension of sampled structural features. Several scalar field interpolations using derivative constraints for orientation data (strike/dip with polarity) are shown in Fig. 2. Inequality constraints provide a means for imposing lower or upper bounds, as well as both, on data values. These constraints can also be leveraged for approximation purposes rather than exact interpolation (Fig. 2b). Furthermore, for structural modeling, inequality constraints can be used for rock unit observations where only their relative positions (e.g., above/below) to modeled surfaces are known. Determining optimal interpolants constrained by inequalities requires minimization of the interpolant’s norm via quadratic optimization. It is important to note that generalized RBF interpolants including geological interfaces, planar orientations, and tangent constraints (e.g., directional derivatives) are nearly identical to the interpolant from co-kriging in the dual form
(Lajaunie et al. 1997), differing only by the adoption of interface increments, which is not surprising given the mathematical equivalence of RBF interpolation and kriging mentioned previously. Interface increments are constraints that permit the inclusion of multiple distinct surfaces simultaneously. To do so, a set of independent pairs of points for each distinct surface is constructed and added to the interpolant. The interpolation condition for these constraints is that the scalar field difference for each pair equals zero (e.g., points on the same iso-surface have equal scalar field values). Interface increments are particularly useful when modeling conformal stratigraphic geological volumes of rock and can also easily be implemented in the RBF interpolation using the same methodology.

Discontinuities
Discontinuous features can be incorporated into reconstructed scalar fields using RBF-based modeling. For 3D geological implicit modeling, discontinuous shifts in modeled rock volumes caused by geological faulting can be incorporated using
Radial Basis Functions, Fig. 2 Effect of different RBFs, methods, and parameters on scalar field interpolations for a synthetic dataset consisting of interface points (white circles) and strike/dip measurements with polarity (oriented line with arrow). Curve (black) represents the zero-valued iso-contour associated with the modeled interface. (a and b) cubic radial power RBF that is fitted exactly and approximately (smoothed), respectively. (c and d) multiquadric RBF interpolations using different shape parameter ε values. (e and f) Wendland (C2) RBF interpolations for two cutoff radii R
the three-step procedure developed by Calcagno et al. (2008). First, fault surface geometries, each represented by its own scalar field, are modeled using corresponding fault constraints (e.g., fault locations, orientations). Second, descriptions of each fault's termination characteristics (e.g., infinite vs. finite) and, if applicable, of fault relationships (e.g., a fault network) are developed using fault-fault relations. The fault network relationships characterize the volumes of each fault's scalar field representation. Finally, to incorporate the sharp shifts in rock volumes across faults, discontinuous polynomial functions are added to the linear system Eq. 3 in a manner respecting the fault network relationships.

Approximation Methods
Noisy and/or highly varying datasets can be problematic for exact RBF interpolation. In these settings, geometrical artifacts and physically implausible solutions can be produced. Approximation methods are suggested for datasets possessing these attributes. There are four available RBF approximation methods with varying degrees of implementation and computational complexity: spline smoothing (Carr et al. 2001), use of inequality constraints, convolving an interpolant with a smoothing kernel (Carr et al. 2003), and greedy algorithms (Carr et al. 2001). Spline smoothing simply requires user-specified values to be added to the diagonal elements of matrix A in Eq. 3. Larger values produce larger amounts of smoothing. However, choosing values to add that are physically meaningful can be difficult, and obtaining the desired amount of smoothing requires a trial-and-error approach. As previously mentioned, inequality constraints can be useful for approximation purposes (Fig. 2b). RBF interpolants are optimal in the sense that they have minimum norms, where smaller interpolant norms correspond to smoother solutions. Since inequality constraints expand the range of possible solutions, interpolants that incorporate them will be smoother compared to exact interpolation. The disadvantage of this approach is the additional computational complexity resulting from the quadratic optimization needed to obtain the minimum-norm interpolants. Convolution-based smoothing acts as a low-pass filter and is achieved by replacing the interpolant's RBF with its associated smoothing kernel. A key advantage of this method is that the interpolant is only fitted once, and interpolant evaluations can use associated RBF smoothing kernels with varying degrees of smoothing using the same fitted coefficients. Greedy algorithms begin by fitting the interpolant using a small random subset of the data. Next, residuals measuring the difference between data and interpolant values are computed, by interpolant evaluation, on the remaining points not included in the initial subset. Points corresponding to residuals beyond a user-defined threshold are added to the interpolant, which is refitted. The algorithm terminates when all residuals are
below the user-defined threshold. The disadvantage with this approach is the sensitivity to data outliers.
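A minimal sketch of the spline-smoothing option described above (illustrative only): the user-specified value is added to the diagonal of the kernel matrix A of Eq. 3 before solving. The Gaussian kernel, function names, and default values are assumptions made for the example.

```python
import numpy as np

def gaussian_rbf(r, eps=1.0):
    return np.exp(-(eps * r) ** 2)

def fit_rbf_smoothed(points, values, eps=1.0, smooth=1e-2):
    n = len(points)
    r = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    A = gaussian_rbf(r, eps) + smooth * np.eye(n)   # larger `smooth` -> smoother interpolant
    P = np.hstack([points, np.ones((n, 1))]).T      # 1-degree polynomial monomials
    q = P.shape[0]
    lhs = np.block([[A, P.T], [P, np.zeros((q, q))]])
    rhs = np.concatenate([values, np.zeros(q)])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n:]                         # RBF and polynomial coefficients
```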
RBF Selection
RBF interpolation requires a specific RBF to be chosen for the interpolant (Eq. 1). Depending on data sampling, application, and model complexity (e.g., highly deformed geology), some RBFs may perform better than others. However, there is currently a knowledge gap in determining the best RBF as a function of these factors for geoscience applications, which presents opportunities for future research. Nevertheless, awareness of the following two observations can be useful: (1) interpolation differences between RBFs increase as data sparsity increases; (2) the CPD RBFs listed in Table 2 generally perform well for surface modeling. The effect of using different RBFs (radial power, multiquadric, and Wendland) and associated parameters from Tables 1 and 2 on scalar field interpolations for a synthetic dataset is illustrated in Fig. 2. The interpolations presented in Fig. 2a, b using a cubic radial power RBF can be advantageous since they are parameter-free. For the multiquadric, smaller shape parameter values ε (Fig. 2c) correspond to flatter functions, while larger values (Fig. 2d) correspond to smoother functions. For Wendland's compactly supported functions, the cutoff radius parameter R specifies the range of influence of sampled point data. Smaller R values give more locally based interpolations (Fig. 2e), whereas larger R values give more global ones (Fig. 2f). The most common technique used for determining the optimal RBF and associated shape parameters (if applicable) given a dataset is leave-one-out cross-validation (LOOCV). This technique typically uses either error estimates or stability metrics of RBF interpolants. Although these metrics are useful for studying mathematical properties of the interpolation, they are insufficient for determining which RBF and which parameters are best for producing physically plausible models in many geoscience applications. Ideally, determining the best RBF for geoscience applications requires the development of application-specific metrics computed on models produced by evaluating RBF interpolants over a discretized domain. The purpose of model metrics is to provide comparative measures of how much better one model is than another. Model metrics can include various geometrical measures of smoothness and shape, topological measures computed from the Morse-Smale complex (Edelsbrunner et al. 2003), volume/area of regions bounded by function value ranges, and model uncertainty as measured by entropy from an ensemble of models (de la Varga and Wellmann 2016). Combining model uncertainty with application-specific model metrics capable of measuring the performance and plausibility of relevant model features is recommended.
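The LOOCV idea can be sketched as follows, here using SciPy's RBFInterpolator as a stand-in interpolant. The multiquadric kernel, candidate ε values, error metric (RMSE), and synthetic data are illustrative choices only.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 3.0, size=(40, 2))
vals = np.sin(pts[:, 0]) + pts[:, 1]

def loocv_rmse(eps):
    errs = []
    for i in range(len(pts)):
        mask = np.arange(len(pts)) != i          # leave point i out
        f = RBFInterpolator(pts[mask], vals[mask], kernel="multiquadric", epsilon=eps)
        errs.append(f(pts[i:i + 1])[0] - vals[i])
    return float(np.sqrt(np.mean(np.square(errs))))

best_eps = min([0.5, 1.0, 2.0, 4.0], key=loocv_rmse)
print("best epsilon by LOOCV:", best_eps)
```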
Software Implementations
There are numerous commercial and open source softwareimplementing RBF interpolation and associated methodologies. However, there are no implementations which offer comprehensive support for all possible optimizations and constraint types. Furthermore, some implementations obscure access and restrict fine tuning of various RBF parameters which can be beneficial for modeling. Access to specific RBF methods and control over their parameterization may require using open source code – albeit at the expense of ease of use. Additionally, there are potential synergies with combining RBF interpolation with other geomodeling tools which enable incorporation of additional geological information provided by interpretation, knowledge, and multidisciplinary datasets.
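As one example of an open-source implementation (an illustration, not an endorsement of any particular package), SciPy's RBFInterpolator fits a polyharmonic interpolant with optional smoothing and can then be evaluated over a discretized domain; the kernel, smoothing value, and synthetic data below are arbitrary.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
obs = rng.uniform(0.0, 1.0, size=(100, 2))           # scattered observation points
vals = np.sin(4 * obs[:, 0]) * obs[:, 1]              # observed scalar values

interp = RBFInterpolator(obs, vals, kernel="thin_plate_spline", smoothing=1e-3)

# Evaluate over a regular 50 x 50 grid, as would be done before iso-contour extraction
grid = np.stack(np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50)), axis=-1).reshape(-1, 2)
field = interp(grid).reshape(50, 50)
```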
Conclusions
RBFs remain popular in geoscience for scattered data approximation in spatially dependent applications, particularly for implicit modeling. Although there has been widespread success in their use, there are limitations depending on the application. Notwithstanding these limitations, continued development of RBF interpolation and its novel uses provides current and future geoscience applications with increasingly effective tools.
Cross-References ▶ Interpolation ▶ Kriging ▶ Three-Dimensional Geologic Modeling ▶ Topology in Geosciences
Limitations
Although there has been great success with using RBF interpolation for spatially dependent geoscience applications, limitations exist, particularly with respect to geological modeling of structurally complex features. In these settings, physically implausible solutions can be produced. The predominant issue is limited control over global interpolation properties. A priori knowledge quite often available describing global properties such as topology, volume/area, and structural feature thickness cannot be incorporated as constraints, the reason being that many global properties can only be computed after interpolants have been fitted. In addition, there are scalability problems. While RBF-based numerical optimizations have enabled interpolants of the form of Eq. 1 to be fitted to up to millions of data points, many of the extensions previously described and their kriging counterparts have yet to be accommodated by these methods.
Recent Research Potential for modeling capacity enhancement for geoscience applications using RBF interpolation may exist in leveraging recent research from the numerical analysis community. In particular, the variably scaled kernels introduced by Bozzini et al. (2015) transform the interpolation problem to a higher dimensional space permitting locally varying shape parameters for SPD RBF functions. The concept could provide a mechanism for incorporating locally varying anisotropy. For discontinuous applications, the work on variably scaled discontinuous kernels by De Marchi et al. (2020) would be of interest.
Acknowledgments Many thanks to Eric de Kemp for valuable feedback on this entry and many fruitful discussions involving the use of RBFs for constrained 3D geological modeling. SKUA-GOCAD® software was provided through the RING (Research for Integrative Numerical Geology) research consortia and Emerson. NRCan contribution number 20200068.
Bibliography
Bozzini M, Lenarduzzi L, Rossini M, Schaback R (2015) Interpolation with variably scaled kernels. IMA J Numer Anal 35:199–219 Calcagno P, Chiles JP, Courrioux G, Guillen A (2008) Geological modelling from field data and geological knowledge. Part I. Modelling method coupling 3D potential-field interpolation and geological rules. Phys Earth Planet Inter 171(1-4):147–157 Carr JC, Beatson RK, Cherrie JB, Mitchell TJ, Fright WR, McCallum BC, Evans TR (2001) Reconstruction and representation of 3D objects with radial basis functions. In: ACM SIGGRAPH 2001, Computer graphics proceedings. ACM Press, New York, pp 67–76 Carr JC, Beatson RK, McCallum BC, Fright WR, McLennan TJ, Mitchell TJ (2003) Smooth surface reconstruction from noisy range data. In: Proceedings of the 1st international conference on computer graphics and interactive techniques in Australasia and South East Asia, pp 119 ff Cowan EJ, Beatson RK, Fright WR, McLennan TJ, Mitchell TJ (2002) Rapid geological modelling. In: Applied structural geology for mineral exploration and mining, international symposium, pp 23–25 de Kemp EA, Schetselaar EM, Hillier MJ, Lydon JW, Ransom PW (2016) Assessing the workflow for regional-scale 3D geologic modeling: an example from the Sullivan time horizon, Purcell Anticlinorium East Kootenay Region, Southeastern British Columbia. Interpretation 4(3):SM33–SM50 de la Varga M, Wellmann JF (2016) Structural geologic modeling as an inference problem: a Bayesian perspective. Interpretation 4(3):SM1–SM16 De Marchi S, Marchetti F, Perracchione E (2020) Jumping with variably scaled discontinuous kernels (VSDKs). BIT Numer Math 60:441–463. https://doi.org/10.1007/s10543-019-00786-z
Dubrule O (1984) Comparing splines and kriging. Comput Geosci 10(2-3):327–338 Duchon J (1976) Interpolation des fonctions de deux variables suivant le principe de la flexion des plaques minces. RAIRO Analyse Numérique 10(12):5–12 Edelsbrunner H, Harer J, Natarajan V, Pascucci V (2003) Morse-Smale complexes for piecewise linear 3-manifolds. In: Proceedings of the 19th annual symposium on Computational Geometry (SCG ’03). ACM, New York, pp 361–370. https://doi.org/10.1145/777792.777846 Hardy RL (1990) Theory and applications of the multiquadric-biharmonic method: 20 years of discovery 1968–1988. Comput Math Appl 19(8-9):163–208 Hillier MJ, Schetselaar EM, de Kemp EA, Perron G (2014) Three-dimensional modelling of geological surfaces using generalized interpolation with radial basis functions. Math Geosci 46(8):931–953 Lajaunie C, Courrioux G, Manuel L (1997) Foliation fields and 3D cartography in geology: principles of a method based on potential interpolation. Math Geol 29(4):571–584 Lorensen WE, Cline HE (1987) Marching cubes: a high resolution 3D surface construction algorithm. Comput Graph 21(4):163–169 Micchelli CA (1986) Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constr Approx 2:11–22 Scheuerer M, Schaback R, Schlather M (2013) Interpolation of spatial data – a stochastic or a deterministic problem? Eur J Appl Math 24(4):601–629 Wendland H (2005) Scattered data approximation. Cambridge University Press, New York
Random Forest

Emil D. Attanasi1 and Timothy C. Coburn2
1 US Geological Survey, Reston, VA, USA
2 Department of Systems Engineering, Colorado State University, Fort Collins, CO, USA

Definition
The random forest algorithm formalized by Breiman (2001) is a supervised learning method applied to predict the class for a classification problem or the mean value of a target variable of a regression problem. The fundamental elements of the algorithm consist of “classification and regression trees” (CART) that are applicable for modeling nonlinear relationships, particularly where the prerequisites of standard parametric regression, such as explicit functional expressions, independence of regressors, homoscedasticity, and normality of errors, cannot be met. Individual “trees,” sometimes referred to as decision trees, are built through the process of recursive partitioning of the predictor variable data space leading to a schematic of “nodes” and “branches” that depicts the relationships among the variables and their hierarchical importance. This iterative process splits the data into branches and then continues splitting each partition into smaller groups as the partitions become more refined and the data more homogeneous within the partitions. The splitting rules may be set to maximize local-node homogeneity or optimize a criterion related to the predictive value of the target (dependent) variable. The random forest algorithm constructs an ensemble of such decision trees using a training data set. This ensemble (“forest”) is instantiated with values of predictor variables to establish the class of the target variable (in the case of classification) or the value of the target variable (in the case of regression). The target variable represents the quantity to be determined and the predictors are variables whose known values are thought to be related to the target. The use of an ensemble approach, rather than a single tree, improves the predictive accuracy outside the training data set due to averaging the individual tree predictions and assures stability and robustness of the predictions because the ensemble encompasses more possibilities in the training data set (Hastie et al. 2009). In supervised learning, training data are the data used to train an algorithm to predict the target variable. If a separate validation data set is required, the training data may be a random subsample of the available data.

Description of Random Forest Algorithm
The random forest algorithm constructs an ensemble of trees in the following way. Each individual decision tree is developed from a bootstrap sample randomly selected with replacement from the training data set and subsequently built by recursively partitioning the predictor space. The recursive partitions are constructed by sampling a subset of the available predictors and then optimal splits in the predictor data space are identified that provide the greatest reductions in node impurity. The node impurity is a measure of the homogeneity of the target values at the node. For classification trees the standard algorithm minimizes Gini impurity, which is interpreted as the probability of misclassifying an observation. For regression trees, partitions are made to minimize the error variance of the target variables associated with the predictors. Bootstrap sampling of the training data to construct each tree, plus sampling the set of predictor variables prior to each partition, is intended to quantify uncertainty and reduce the prediction variance at the expense of introducing slight bias. In the case of classification, the random forest ensemble prediction consists of the “majority vote” among all class assignments determined for the individual trees; and in the regression case, prediction consists of the mean of the predicted values of the target variable across all individual trees. For each tree, the part of the training data that is not included in the bootstrap sample used to construct the tree is known as the out-of-bag (OOB) sample. These data are used by the
algorithm to incorporate a validation step in the training procedure (Breiman 2001). The OOB error estimate is comparable to the error calculated with standard cross-validation procedures. The OOB error is an estimate of the random forest’s “generalization error,” which is defined as the prediction error expected for samples outside the training set. Hyperparameters are used to control the learning process in the machine learning algorithms so that the predictive model minimizes the estimated generalization error. For random forests, they include the number of trees, the number of predictors sampled, the bootstrap sampling fraction of the training data, and various controls on tree complexity, such as the minimum number of observations to be used at a terminal node, the required observations for a partition (node size), and the maximum number of terminal nodes. A smaller subset of the predictors leads to lower correlation among trees. A smaller minimum node size leads to deeper trees with more splits. A smaller training sample fraction also leads to reduced correspondence among trees in the forest.
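As an illustration of these hyperparameters and of OOB validation, the following is a hedged scikit-learn sketch (one common open-source implementation, not the authors' own setup); the synthetic predictors, target, and parameter values are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                          # six candidate predictors
y = X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=500)

rf = RandomForestRegressor(
    n_estimators=500,      # number of trees
    max_features=2,        # predictors sampled at each split
    min_samples_leaf=5,    # minimum observations at a terminal node
    max_samples=0.8,       # bootstrap sampling fraction of the training data
    oob_score=True,        # validate on the out-of-bag samples
    random_state=0,
).fit(X, y)

print("OOB R^2:", rf.oob_score_)   # OOB estimate of generalization performance
```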
Model Interpretation Model interpretation centers on the relative importance of the predictor variables and the responses of the target variable to incremental predictor variable changes. Two metrics for assessing the relative impact of predictors are permutation importance and Gini importance. Permutation importance ranks the predictor variables based on degradation of the model’s predictive performance when their values in the OOB sample are randomly and successively permuted. Predictor variable importance rank is directly proportional to the degree of the loss in predictive accuracy for the OOB samples when predictor values are successively permuted. Alternatively, Gini importance counts the number of times a predictor is used in partitioning the predictor space weighted by the fraction of samples it splits (which approximates the probability of reaching that node) averaged over all trees of the ensemble. Gini importance rank corresponds to the relative magnitude of the calculated Gini importance. Because this traditional computational scheme for computing Gini importance produced biased results in favor of predictors with many possible split points, it was modified by Nembrini et al. (2018) and the revised version was shown to be unbiased. Other interpretive tools include permutation tests (Altmann et al. 2010) used to calculate p-values associated with the predictor variables. Innovative methods to compute prediction intervals (Zhang et al. 2019) and ways to generalize the algorithm to provide rigorous causal inference interpretations to the relationship between predictors and the target variable (Athey et al. 2019) are areas of recent research. Because of their breadth, tree-based predictive models can capture the interaction effects of predictors. While identifying
and recognizing the nature of such interactions can be difficult, the use of various visual displays can be helpful. For example, the marginal effect that one predictor or two features have on the target variable can be shown by partial dependence plots (PDPs). By averaging predicted target values across trees, when marginal changes are made in one predictor, and other predictors are set at their average values, PDPs show whether the relationship between the target and a feature is linear, monotonic (i.e., entirely nonincreasing or nondecreasing), or more intricate. Another visual diagnostic is the individual conditional expectation plot (ICE). Whereas the PDPs show the predicted target values averaged across all trees, the ICE plot displays the individual tree values of the target relative to marginal changes in a predictor, thereby capturing the diversity of the predicted responses. Additional methods to identify, analyze, and interpret effects of predictor interactions are being pursued by researchers in a variety of other sciences, particularly medicine. Analytical advantages include the nonparametric nature of the algorithm, its superior predictive performance, its capability to determine variable importance, built-in procedures to facilitate model validation using OOB observations, and the ability to compute variable proximities that can be used for data clustering to further improve prediction performance (Tyralis et al. 2019). Most implementations of the algorithm also provide imputation options to allow use of incomplete observations. The analyst should be aware of the following limitations when applying the original random forest algorithm. First, the results are not as readily interpreted as those derived from a single regression or classification tree. Second, the reliability of the variable importance metrics is affected by correlation among predictors. Third, predictions cannot be reliably extrapolated beyond the range of the training data. Fourth, modifications are required to adequately model data sets in which the number of observations of the response variable belonging to one class is substantially different from the numbers of observations associated with the other classes (e.g., in the case of rare events such as some mineral occurrences). Finally, it is worth noting that the original algorithm is not suitable for causal inference. Further, variations of the random forest algorithm (e.g., extremely randomized forests or conditional inference random forests) do not consistently provide better performance (Tyralis et al. 2019).
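A minimal sketch of the interpretation tools named above, again using scikit-learn (the synthetic data, the n_repeats value, and the chosen features are illustrative, and the partial dependence display assumes matplotlib is available):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance, PartialDependenceDisplay

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=500)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Permutation importance: rank predictors by the loss in accuracy when each is permuted
imp = permutation_importance(rf, X, y, n_repeats=20, random_state=0)
for j in np.argsort(imp.importances_mean)[::-1]:
    print(f"predictor {j}: {imp.importances_mean[j]:.3f}")

# Partial dependence (PDP) and individual conditional expectation (ICE) curves
PartialDependenceDisplay.from_estimator(rf, X, features=[0, 1], kind="both")
```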
Variations and Software Tyralis et al. (2019) have published one of the most extensive reviews of the literature relating to random forests, all in the context of water and hydrogeological research. They list and reference 33 variations of the basic algorithm, with extremely randomized trees, quantile regression trees, and trees
associated with conditional inference being among those most frequently applied. Additional references for variations that lead to Bayesian, neural, dynamic, and causal random forests are also provided. Computer software for modeling random forests is readily available in a number of open source domains and in many commercial packages. Tyralis et al. (2019) list 55 packages related to random forests available in the R language, which is a free software environment for statistical computing and graphics. Python, also an open source language, has a random forest implementation in the Scikit-Learn package, and commercial software providers, such as Mathworks® and SAS®, commonly have packages for computing random forests (any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the US Government).
Earth Science Applications Numerous applications of the random forest algorithm and its variations have been published in the quantitative earth science literature. For example, applications of the algorithm to remote sensing are reviewed by Belgiu and Drǎguţ (2016). Another broad class of applications relate to predictive mapping at the regional level utilizing the classification function of the random forest algorithm. For example, mapping situations that use data generated from satellite images combined with various types of geophysical measurements (either from airborne sensors or field samples) can be used to classify rock lithology. In this case, the multiple classification scheme works on a pixel basis to generate the map image. Similar approaches have been applied to data on physiography, rainfall, and soil type to map areas having a propensity for landslides; and groundwater resources have been mapped using physiographic data, satellite imagery, and observed occurrences of springs (Tyralis et al. 2019 and references therein). Other mapping applications include those involving land use and wetlands (Belgiu and Drǎguţ 2016). During the last 50 years, geologists have developed an expert knowledge base identifying evidentiary data of individual classes of mineral occurrences. This information, along with spatial geologic data and occurrence/nonoccurrence of minerals, can be addressed with the random forest algorithm to generate favorability maps for use in mineral prospecting (e.g., Carranza and Laborte 2016). Such applications, which include copper, gold, and polymetallic minerals, require adjustments to be made to the input data because the identified potentially marketable concentrations of mineral occurrences are quite rare. Laboratory applications of the random forest algorithm include identification and location of organic matter and
pores in scanning electron microscope (SEM) images of organic-rich shales. Similar image processing techniques can be used to analyze petrographic thin sections to predict (classify) rock type and mineral composition (Xie et al. 2020). In the oil and gas industry, the random forest algorithm has been applied to well log images and digitally processed well logs to identify lithology and rock properties that become the basis for calculating oil and gas reservoir parameters. The algorithm is routinely used in field operations to classify rock lithology from well measurements taken while drilling in real time to identify the target formation as well as to predict rates of drilling penetration (Xie et al. 2020). Other uses include prediction of individual well flow rates and expected ultimate oil recovery by well. These latter applications of the regression tree function of random forests naturally lead to identifying the drivers of well productivity utilizing the variable importance function of random forests. As noted in Tyralis et al. (2019), water science applications largely involve categorization of stream flow volumes and water quality features, as well as the analysis of predictor importance. Uses of the random forest algorithm to address environmental quality and ecological viability include mapping contaminants in estuarine systems, modeling nitrate concentrations in water wells, and prediction of algae blooms in tropical lakes. In these applications the algorithm is used to identify predictors that drive the models of such systems. In the atmospheric sciences, the use of random forests has been shown to improve severe weather forecasts when 2 and 3 days out from the event.
Summary The random forest algorithm is broadly employed across multiple disciplines, with numerous variations having been developed for a wide variety of contexts and scenarios. Its use has become almost ubiquitous in the earth sciences, with applications ranging from research and planning to optimization of field and industrial operations, resource exploration and development, environmental monitoring, and beyond. Given the ready availability of software, execution of the basic algorithm is relatively straightforward, with computations and setup requiring minimal assumptions. Applications often encompass large data bases and thousands of predictors, and its superior predictive performance is well established in the earth sciences as well as in other fields such as the medical sciences. While not without limitations, the methodology underlying the algorithm is still an active field of research for generalization and statistical interpretation (Zhang et al. 2019; Athey et al. 2019).
Cross-References ▶ Artificial Intelligence in the Earth Sciences ▶ Decision Tree ▶ Machine Learning ▶ Predictive Geologic Mapping and Mineral Exploration
Bibliography
Altmann A, Tolosi L, Sander O, Lengauer T (2010) Permutation importance: a corrected feature importance measure. Bioinformatics 26(10):1340–1347. https://doi.org/10.1093/bioinformatics/btq134 Athey S, Tibshirani J, Wager S (2019) Generalized random forests. Ann Stat 47:1148–1178 Belgiu M, Drǎguţ L (2016) Random forest in remote sensing: a review of applications and future directions. ISPRS J Photogramm Remote Sens 114:24–31. https://doi.org/10.1016/j.isprsjprs.2016.01.011 Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324 Carranza EJM, Laborte AG (2016) Data-driven predictive modeling of mineral prospectivity using random forests: a case study in Catanduanes Island (Philippines). Nat Resour Res 25(1):35–50 Hastie T, Tibshirani R, Friedman J (2009) Elements of statistical learning – data mining, inference, and prediction, 2nd edn. Springer, Berlin Nembrini S, König I, Wright M (2018) The revival of the Gini importance? Bioinformatics 34(21). https://doi.org/10.1093/bioinformatics/bty373 Tyralis H, Papacharalampous G, Langousis A (2019) A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water 11(5):910. https://doi.org/10.3390/w11050910 Xie Y, Zhu C, Hu R, Zhu Z (2020) Lithology identification with extremely randomized trees. Math Geosci, published online 12 August 2020. https://doi.org/10.1007/s11004-020-09885-y Zhang H, Zimmerman J, Nettleton D, Nordman DJ (2019) Random forest prediction intervals. Am Stat 74(4):392–406. https://doi.org/10.1080/00031305.2019.1585288
Random Function

J. Jaime Gómez-Hernández
Research Institute of Water and Environmental Engineering, Universitat Politècnica de València, Valencia, Spain

Synonyms
Stochastic process

Definition
A random function Z(x), also known as a stochastic process, can be defined as a rule that assigns a realization to the outcome of an experiment

Z(x) \equiv \{ z(x, \theta) \}, \qquad \forall \theta \in \Omega, \; x \in D \qquad (1)

where D is the domain within which the realization is defined (it could be finite or infinite, discrete or continuous); in geosciences, D will be the one-, two-, or three-dimensional portion of the space where the variable of interest is being studied, or, on many occasions, the point set of one-, two-, or three-dimensional locations centered at the voxels discretizing the study area; Ω is the sample space containing all outcomes of the experiment; x is a point in the domain D; and θ an outcome of the experiment. For the remainder and without loss of generality, x will be regarded as a spatial coordinate.

Realization Versus Random Variable
A random function as defined above can also be analyzed in terms of random variables. When the experiment outcome is fixed at θ0, the function z(x, θ0) is a realization of the random function within the study area. But when the location is fixed at x0, the function z(x0, θ) corresponds to a random variable. To alleviate the notation, θ is dropped hereafter, and then a lower case z will represent realizations (and the values they can take) and an upper case Z a random variable or the random function itself. The random function can then be characterized by all the n-variate distributions

F(x_1, x_2, \ldots, x_n; z_1, z_2, \ldots, z_n) = \mathrm{Prob}(Z(x_1) \le z_1, Z(x_2) \le z_2, \ldots, Z(x_n) \le z_n) \qquad (2)

which are defined for any set of locations {x1, x2, . . ., xn} and any set of values {z1, z2, . . ., zn}. To better understand what a random function is, it is convenient to consider this duality between realizations and random variables (Fig. 1). When considered as a collection of realizations, one needs to know the rule that associates a realization to the outcome of an experiment. When considered as a collection of random variables, one needs to know all its multivariate distributions.

The Random Function Model in Geosciences
Geoscience variables generally display patterns of spatial variability that cannot be modeled with deterministic models, and, at the same time, they display spatial patterns that denote that their distribution in space is not totally random. The random function model is a model suited for the modeling of these variables. A decision is taken that the specific, and unknown save for a few measurement locations, spatial distribution of the variable of interest corresponds with one of
C xi , x j ¼ E
Z(x) x2
ð5Þ Variogram
x3
Like the covariance, the variogram is also computed between two locations
x4 x 5
x1 x2
z ii
, 8xi , xj D:
x4 x 5
x1 zi
Z ðxi Þ mðxi Þ Z xj m xj
1 g xi , xj ¼ E 2
x3
Z ðxi Þ Z xj
2
, 8xi , xj D:
ð6Þ
Stationary and Ergodicity
x4 x 5
x1 zθ
x2
x3
Random Function, Fig. 1 Conceptualization of a random function as a collection of realizations, Z (x) {zi, zii, . . ., zθ}. At any given location x, the collection of z values through the realizations corresponds to a random variable
the realizations of a random function. Then, a random function is chosen and it is analyzed to build an understanding about the values of the variable at unsampled locations. The difficulty in adopting this model lies in how the random function is chosen, and then, how it is characterized, whether through the rule that associates realizations to experiment outcomes or through the multivariate distributions.

Some Random Function Summary Statistics

There are some summary statistics commonly used in the characterization of a random function. The first- and second-order ones are given next.

Expected Value
The expected value of a random function is a function of x with the expected values of each random variable

\[ E\{Z(x)\} = m(x). \tag{3} \]
Variance
Likewise, the variance of a random function is a function of x with the variance of each random variable

\[ \mathrm{Var}\{Z(x)\} = \sigma^2(x) = E\left\{ \left( Z(x) - m(x) \right)^2 \right\}. \tag{4} \]
Covariance
The covariance is computed between two locations and it corresponds to the covariance between the two random variables at those locations

\[ C(x_i, x_j) = E\left\{ \left( Z(x_i) - m(x_i) \right) \left( Z(x_j) - m(x_j) \right) \right\}, \quad \forall x_i, x_j \in D. \tag{5} \]

Variogram

Like the covariance, the variogram is also computed between two locations

\[ \gamma(x_i, x_j) = \tfrac{1}{2} E\left\{ \left( Z(x_i) - Z(x_j) \right)^2 \right\}, \quad \forall x_i, x_j \in D. \tag{6} \]

Stationarity and Ergodicity
Refer back to Fig. 1: adopting a random function model implies the acceptance that one of the realizations {z_i, z_ii, . . ., z_θ} is the reality. Some data may have been sampled, but it is evident that, from these data, it is impossible to derive any of the common statistics described in the previous section, much less any of the n-variate distributions in Eq. (2). At x1, there is only one sample of the random variable Z(x1), from which it is impossible to derive m(x1), and the pair of values z(x2) and z(x3) are not sufficient to compute the covariance C(x2, x3) or the variogram γ(x2, x3). For these reasons, the random function models used in the geosciences are not as generic as their definition implies. It is typical to use random functions that are both stationary and ergodic.

Stationarity
A random function is said to be stationary of the first order if its expected value is constant

\[ E\{Z(x)\} = m(x) = m. \tag{7} \]
A random function is said to be stationary of the second order if its covariance depends only on the separation vector between the two locations and not on the actual locations. In Fig. 1, the pairs of points z(x2) and z(x3), and z(x4) and z(x5) are separated by the same vector; if the random function is stationary of the second order, the covariance would be the same for both pairs

\[ C(x_i, x_j) = C(x_i - x_j). \tag{8} \]
When the separation vector is the null vector, the covariance becomes the variance of the random variable and, as a consequence of the previous equation, it will be constant

\[ C(0) = \sigma^2. \tag{9} \]
Likewise, the variogram of a second-order stationary random function depends only on the separation vector

\[ \gamma(x_i, x_j) = \gamma(x_i - x_j), \tag{10} \]

and the following two relationships hold
\[ C(x_i - x_j) = \sigma^2 - \gamma(x_i - x_j), \tag{11} \]

\[ \gamma(x_i - x_j) = \sigma^2 - C(x_i - x_j). \tag{12} \]
Ergodicity
A stationary random function defined on an infinite domain D is said to be ergodic on the mean when the constant expected value of all the random variables coincides with the mean computed on any realization

\[ E\{ Z(x_j) \} = \frac{1}{|D|} \int_D z_\theta(x)\, dx = m, \quad \forall x_j \in D,\ \forall \theta \in \Omega. \tag{13} \]
Likewise, it is ergodic on the covariance when the expected value computed through the realizations can be replaced by the integral on any realization

\[ C(x_i - x_j) = C(h) = E\{ (Z(x_i) - m)(Z(x_i + h) - m) \} = \frac{1}{|D|} \int_D \left( z_\theta(x) - m \right) \left( z_\theta(x + h) - m \right) dx, \quad \forall x_i, x_j \in D,\ \forall \theta \in \Omega. \tag{14} \]

The Decision of Stationarity and Ergodicity

Given that, in practice, only limited information about one of the realizations of the random function is available, it seems convenient to use stationary and ergodic random function models since, in that case, the mean and covariance of the random function could be replaced by approximate estimates of the spatial integrals computed from the sample data. But the use of a stationary and ergodic covariance is only a matter of convenience, nothing that can be proved or disproved on the basis of sparse information from a given realization. On certain occasions, the modeler may choose a nonstationary random function, for instance one with a space-varying expected value, but such a decision must be taken from ancillary data, like geological or process information, but never on a statistical analysis of a single sample from multiple random variables.

Convenient Random Function Models

There are some random function models that are commonly used in mathematical geosciences for their convenience in specifying all n-variate distributions in Eq. (2).

The Multi-Gaussian Random Function Model

The multi-Gaussian random function model is probably the most commonly used because its multivariate distribution only depends on the mean and the covariance; all remaining higher-order moments are functions of these two (Anderson 1984).
A stationary and ergodic multi-Gaussian random function has the advantage that mean and covariance can be estimated from the mean and covariance derived from the data using Eqs. (13) and (14).

The Multi-φ-Gaussian Random Function Model

The properties of the multi-Gaussian random function model extend to any monotonic transformation Y(x) = φ(Z(x)). The new random function Y(x) is the collection of realizations {y_θ = φ(z_θ)} and has the same property of being fully characterized just by the mean and the covariance, as long as the function φ is monotonic. A typical example is the multi-log-Gaussian distribution used frequently to model permeabilities.

The Indicator Random Function

One of the problems associated with the adoption of any Gaussian-related random function has to do with its convenience of being fully characterized by the mean and the covariance. The full dependency of all higher-order moments on the first two moments makes it impossible to control the higher-order moments, and, particularly, the multi-Gaussian models are characterized by little spatial connectivity for the extreme values (Gómez-Hernández and Wen 1998). Extreme-value continuity is fundamental in geological settings; in petroleum engineering, shale barriers or flow channels can depict much higher continuity in space than intermediate values, and such a feature cannot be reproduced by a Gaussian random function. The continuity of extreme values is explicitly handled by the indicator formalism. Consider that the range of z values is binned into K classes and then K indicator variables are derived for any z value

\[ i_k(x; z) = \begin{cases} 1, & \text{if } z \in [z_{k1}, z_{k2}[ \\ 0, & \text{if not} \end{cases}, \qquad k = 1, \ldots, K, \tag{15} \]
where [zk1, zk2[ are the limits of class k. As a result, now there are K random functions Ik (x; z) the continuity of which can be controlled independently of each other and can be used to build non-Gaussian realizations of Z(x).
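A small Python sketch of the indicator coding in Eq. (15) (illustrative only; the class limits and data values below are made up):

```python
import numpy as np

def indicator_code(z, class_limits):
    """Return the K indicator variables of Eq. (15) for each value in z.

    class_limits is a list of (z_k1, z_k2) pairs defining half-open classes
    [z_k1, z_k2).  The output has shape (len(z), K).
    """
    z = np.asarray(z, dtype=float)
    return np.column_stack([(z >= lo) & (z < hi) for lo, hi in class_limits]).astype(int)

# Hypothetical permeability-like values and three classes (low, medium, high)
z = np.array([0.3, 1.2, 5.0, 0.8, 9.7])
limits = [(0.0, 1.0), (1.0, 5.0), (5.0, np.inf)]
print(indicator_code(z, limits))
```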
Algorithmically Defined Random Functions

According to Eq. (1), the set of realizations generated by current stochastic simulation computer codes defines a random function, as long as an unequivocal rule identifying each realization is defined. Such a rule is the algorithm behind the computer code. The experiment consists of drawing a random seed number to prime the algorithm. To each seed corresponds a unique realization.
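For instance, the following sketch (a toy generator written for this discussion, not any of the codes cited below) maps each seed to a unique unconditional realization of a stationary multi-Gaussian random function on a small grid:

```python
import numpy as np

def realization(seed, n=100, sill=1.0, a=15.0):
    """One realization of a stationary multi-Gaussian random function on a
    regular 1-D grid, obtained by Cholesky decomposition of an exponential
    covariance matrix; the seed plays the role of the experiment outcome."""
    rng = np.random.default_rng(seed)
    x = np.arange(n)
    C = sill * np.exp(-np.abs(x[:, None] - x[None, :]) / a)
    L = np.linalg.cholesky(C + 1e-10 * np.eye(n))
    return L @ rng.standard_normal(n)

z_a = realization(seed=1)   # outcome theta = 1
z_b = realization(seed=2)   # outcome theta = 2, a different realization
print(z_a[:5], z_b[:5])
```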
In the beginning, computer simulation codes were built to generate realizations drawn from a specific random function model. This is the case for any sequential Gaussian simulation code (see, for instance, Deutsch and Journel 1992; Gómez-Hernández and Journel 1993; Gómez-Hernández and Srivastava 2021), which, by construction, will generate realizations drawn from a multi-Gaussian random function. Likewise, any sequential indicator simulation code (Gómez-Hernández and Srivastava 1990; Deutsch and Journel 1992) generates realizations from an indicator random function. But soon after these codes came into general use, practitioners demanded codes that generated realizations with other characteristics. For example, there was an interest in generating realizations from a stationary random function with univariate distributions far from a Gaussian bell and with a given covariance. This could not be achieved with any Gaussian simulator. To solve this problem, direct sequential simulation was proposed (Soares 2001); this algorithm generates realizations with the desired univariate distribution and covariance; however, the other moments or n-variate distributions are left unspecified, although they could be derived from the statistical analysis of a sufficiently large collection of realizations. The concept of an algorithmically defined random function was born, and the need to have an analytical expression for the n-variate distributions of Eq. (2) disappeared. The most recent multipoint geostatistics (Mariethoz and Caers 2014; Strebelle 2002) would be a paradigmatic example of these kinds of random functions defined by a collection of realizations, which are built on the basis of a training image and an algorithm to extract high-order statistics from the training image to infuse them onto the realizations.
Summary and Conclusions

The random function model is often used in the geosciences to model the spatial variability of properties that display a heterogeneity that can neither be described deterministically nor regarded as completely random. From a practitioner's point of view, the use of such a model is a decision that cannot be proved or disproved from the data. The random function model is the foundation of geostatistics and is behind standard algorithms such as kriging or sequential simulation.
Cross-References

▶ Kriging
▶ Monte Carlo Method
▶ Multivariate Analysis
▶ Random Variable
▶ Realizations
▶ Simulation
▶ Spatial Statistics
Bibliography

Anderson TW (1984) Multivariate statistical analysis. Wiley, New York
Deutsch CV, Journel AG (1992) GSLIB, geostatistical software library and user's guide. Oxford University Press, New York
Gómez-Hernández JJ, Journel AG (1993) Joint sequential simulation of multi-Gaussian fields. In: Soares A (ed) Geostatistics Tróia '92, vol 1. Kluwer Academic, Dordrecht, pp 85–94
Gómez-Hernández JJ, Srivastava RM (1990) ISIM3D: an ANSI-C three-dimensional multiple indicator conditional simulation program. Comput Geosci 16(4):395–440
Gómez-Hernández JJ, Srivastava RM (2021) One step at a time: the origins of sequential simulation and beyond. Math Geosci 53(2):193–209
Gómez-Hernández JJ, Wen XH (1998) To be or not to be multi-Gaussian: a reflection in stochastic hydrogeology. Adv Water Resour 21(1):47–61
Mariethoz G, Caers J (2014) Multiple-point geostatistics: stochastic modeling with training images. Wiley
Soares A (2001) Direct sequential simulation and cosimulation. Math Geol 33(8):911–926
Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 34(1):1–21
Random Variable Ricardo A. Olea Geology, Energy and Minerals Science Center, U.S. Geological Survey, Reston, VA, USA
Background

When Galileo dropped objects from the leaning tower of Pisa, he noted that, for a given height and meteorological condition, the falling time was the same. This is one example of a situation in which the result of the experiment is unique and predictable. These types of observations are called deterministic experiments. Now flip a coin. The result is not unique anymore. More than one outcome is possible; the result is now uncertain. These experiments go by the name of random experiments. Statistics uses the concept of probability to measure the chances of different outcomes in a random experiment. By convention, probabilities are always positive numbers including zero. There are three fundamental probability axioms (Hogg et al. 2018):

• The probability of an impossible outcome is 0.
• The maximum value of 1 denotes outcomes absolutely certain to occur.
• If two outcomes of the same experiment cannot occur simultaneously, their probabilities are additive.
Probabilities can also be multiplied by a factor of 100 and reported as percentages. A set containing all possible outcomes is called sample space.
Definition

A random variable is a function that assigns a value in a sample space to an element of an arbitrary set (James 1992; Pawlowsky-Glahn et al. 2015). It is a model for a random experiment: the arbitrary set is an abstraction of the experimental conditions, the values taken by the random variable are in the sample space, and the function itself models the assignment of outcomes, thus also describing its frequency of appearance. In simpler terms, for the purpose of this presentation, a random variable is a function that assigns to each of the outcomes of a random experiment a value with a certain probability. A random variable also goes by stochastic variable and aleatory variable. Random variables are usually annotated as Roman capital letters, such as X or Y. The characteristics of the sample space may define a variety of random variables. When the sample space comprises the real numbers and the scale is assumed absolute, differences are computed by ordinary subtraction, and averages are computed using the ordinary sum. However, there is a need for alternative sample spaces. For instance, a set of functions of time; a set of bounded regions on a plane; the points on a simplex; points on a sphere; positive real numbers. They characterize the respective random variables as stochastic processes, random sets, random compositions, random directions, and ratio scale variables. These types of random variables deserve separate development of their theories. A random variate or realization is the result of randomly drawing one outcome from a random variable.

Discrete and Continuous Random Variables

In order to fully characterize a random variable, a description of the probability of the outcomes is necessary. The nature of the sample space determines the kind of description. In terms of the type of the sample space, the random variables can be discrete or continuous. If one can count the number of possible outcomes, then the random variable is discrete. Outcomes from games of chance such as flipping coins, playing cards, or casting dice are examples of discrete random variable sample spaces. Fig. 1 shows two examples that provide chances in random experiments with only two possible outcomes per trial. The case when the chance of each outcome is the same can be associated with the cumulative number of heads expected after flipping a coin multiple times (Fig. 1a). When the value of the parameter is 0.1, it gives the expected number of producing wells when drilling wildcats in different plays (Fig. 1b). In the discrete case, a probability mass function assigns the probabilities to each outcome. A continuous sample space has different mathematical characteristics. For this presentation, we center the attention on random variables whose sample space is identified with the real numbers. Commonly, the description of the probability in this continuous case is determined based on a probability
Random Variable, Fig. 1 Examples of discrete random variables when repeating a random experiment with only two possible outcomes 12 times: (a) number of heads or tails in flipping a fair coin; (b) number of new reservoirs successfully discovered when drilling 12 wildcats at different plays when considering that the probability of success per well is 10%
density function, f(x). If X is a univariate random variable, the probability of X to be in an interval (a, b) is given by

\[ \Pr[a \le X \le b] = \int_a^b f(x)\, dx. \tag{1} \]
With this kind of definition, a single point is always assigned a null probability. It should be remarked that the probability density function does not always exist, but when it does, it is always nonnegative. When the limits of integration in Eq. 1 are (−∞, ∞), the result is the area under the curve, which must be 1 to agree with the axioms in the Background section.

Random Variable, Fig. 2 Examples of normal distributions. Parameter μ determines the center of the distribution, and σ the spread and height of the peak
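A brief Python sketch of Eq. (1), purely illustrative, using a standard normal density like one of the Fig. 2 curves as the assumed f(x):

```python
from scipy import stats
from scipy.integrate import quad

# Assume X is normal with mean 0 and standard deviation 1
f = stats.norm(loc=0.0, scale=1.0).pdf

a, b = -1.0, 2.0
prob, _ = quad(f, a, b)        # Pr[a <= X <= b] from Eq. (1)
point, _ = quad(f, 1.0, 1.0)   # a single point gets null probability
print(round(prob, 4), point)   # ~0.8186 and 0.0
```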
Analytical and Numerical Random Variables

The results in Fig. 1 can be obtained by actually running experiments or from mathematical analysis. In the latter case, the result is a probability mass function that in this case goes by the name of binomial distribution (Forbes et al. 2011). The binomial distribution is an example of a discrete analytical probabilistic model:

\[ f_b(x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n;\ 0 < p < 1, \tag{2} \]

where n is a positive integer. The normal or Gaussian distribution is one of the most widely used continuous analytical models because many attributes either follow or approximate a normal distribution (Papoulis 2002):

\[ f_N(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2 \right], \quad 0 < \sigma, \tag{3} \]
where μ and σ are two parameters allowing the same expression to take different shapes, such as those in Fig. 2. The distribution is always symmetrical with respect to parameter μ, which coincides with the mean of the distribution. Parameter σ controls the spread of the distribution and coincides with the value of its standard deviation. The domain of the normal distribution is (−∞, ∞), which sometimes is a problem in the geosciences because no attribute takes infinite values, and some do not take negative values. The advent of computers is increasingly popularizing the generation of random variables by general purpose methods such as the Monte Carlo method or the bootstrap method. Figure 3 is an example of a repeated realization of a random variable – a random sample for short – describing the volume
Random Variable, Fig. 3 Example of a numerical random variable showing all possible values of the true gas volume in a gas reservoir
of natural gas to be produced from a petroleum reservoir. In this case, the modeling was based on estimated ultimate recovery of natural gas from several wells (Olea et al. 2010). In this situation, the outcomes are all the possible values that the true volume could have. Note that the possible values are not equally likely. Relative to the coin example in Fig. 1a, the interpretation of the random variable is different now. By no means is the random variable implying that the gas volume in the one reservoir can take different values. In this case, there is just one unique true value, say, 125.2 billion cubic feet, but that value is unknown. The whole purpose of using a random variable is to learn what to expect about the magnitude of the true value.
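The discrete model of Eq. (2) behind Fig. 1b can be evaluated with a few lines of Python (a sketch; the well count and success probability simply mirror the figure's description):

```python
from scipy import stats

n, p = 12, 0.10                      # 12 wildcats, 10% chance of success per well
wells = stats.binom(n, p)

for x in range(5):
    # Probability of discovering exactly x new reservoirs, Eq. (2)
    print(x, round(wells.pmf(x), 4))

print("expected discoveries:", n * p)
```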
Random Variable, Fig. 4 Example of numerical approximation of the oil volume random variable: (a–e) input random variables, all following shifted and scaled beta distributions; (f) resulting oil volume using 5000 draws per attribute
Combination of Random Variables
Even in the simplest case of a real sample space, the arithmetic of random variables can be complex, so the analytical results are quite limited. The simplest cases for operation, for instance the sum or product of real random variables, are those in which the random variables involved are independent. Two random variables are independent when their joint probability density function is the product of their individual probability density functions. This means that the result of one of the random variables does not influence the realizations of the other one. A well-known classical example is that of two independent normal random variables X and Y. In such a case, their sum follows another normal distribution with a mean equal to the sum of the means, and the variance is equal to the sum of the variance of X plus the variance of Y. The advent of computers has radically increased the capabilities to work with combinations of random variables. Today, the Monte Carlo methods can be used to consider unlimited combinations of random variables. For example, suppose one were interested in a probabilistic assessment of the unknown volume of oil in place within the drainage area of an oil well. Let H be the random variable representing the net formation thickness, A the drainage area, P the porosity, W the water saturation, and B the conversion factor to standard conditions. The volume V of the oil in place is given by (Terry and Rogers 2014):
\[ V = 0.0001\, \frac{H \cdot A \cdot P \cdot (100 - W)}{B}. \tag{4} \]
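A hedged Python sketch of the Monte Carlo propagation described next (the input distributions below are simple placeholders, not the shifted and scaled beta models actually used for Fig. 4):

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 5000

# Placeholder input distributions for the five attributes of Eq. (4)
H = rng.normal(16.0, 2.0, n_draws)          # net thickness, m
A = rng.normal(43000.0, 4000.0, n_draws)    # drainage area, m2
P = rng.uniform(8.0, 12.0, n_draws)         # porosity, %
W = rng.uniform(25.0, 40.0, n_draws)        # water saturation, %
B = np.full(n_draws, 1.2)                   # conversion factor

V = 0.0001 * H * A * P * (100.0 - W) / B    # Eq. (4), one draw per row

print("P5, P50, P95 of V:", np.percentile(V, [5, 50, 95]).round(1))
```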
Equation (4) is complex enough that it cannot be solved analytically regardless of the form of the five random variables. However, if the modeler can provide distributions for all five random variables, then it is possible to obtain a numerical distribution for the random variable V by repeatedly drawing random variates from those distributions by applying the Monte Carlo method. At least 1000 iterations are recommended for stable results. In the example, for the first set of draws the values were 16.353 m (H), 43,038 m² (A), 9.912% (P), 31.149% (W), and 1.2 (B). The oil volume for those values is 40.026 thousand m³ (V). The result of performing the draws and calculations another 4999 times is presented in Fig. 4. Both α and β are shape parameters of the beta distribution, whose domain can be shifted and linearly scaled to vary within any finite boundaries (a, b) (Olea 2011):

\[ f_\beta(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\, du}, \quad 0 < x < 1;\ \alpha, \beta > 0. \tag{5} \]

Summary and Conclusions

A random variable is a convenient concept to link outcomes of random experiments to elements in another set. Random variables play a fundamental role in probabilistic modeling of uncertain attributes, from simple experiments like flipping coins, to more complex computer-aided calculations, such as determining the volume of crude oil resources in a reservoir.
Cross-References

▶ Bootstrap
▶ Compositional Data
▶ Monte Carlo Method
▶ Probability Density Function
▶ Univariate
Bibliography

Forbes C, Evans M, Hastings N, Peacock B (2011) Statistical distributions, 4th edn. Wiley, 321 pp
Hogg RV, McKean JW, Craig AT (2018) Introduction to mathematical statistics, 8th edn. Pearson, London, 768 pp
James RC (1992) Mathematics dictionary, 5th edn. Springer, 560 pp
Olea RA (2011) On the use of the beta distribution in probabilistic resource assessments. Nat Resour Res 20(4):377–388
Olea RA, Cook TA, Coleman JL (2010) Methodology for the assessment of unconventional (continuous) resources with an application to the Greater Natural Buttes gas field, Utah. Nat Resour Res 19(4):237–251
Papoulis A (2002) Probability, random variables and stochastic processes, 4th edn. McGraw Hill Europe, 852 pp
Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R (2015) Modeling and analysis of compositional data. Wiley, 247 pp
Terry RE, Rogers JB (2014) Applied petroleum reservoir engineering, 3rd edn. Pearson, 503 pp
Rank Score Test Alejandro C. Frery School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
Synonyms
Rank test
Definition
Rank score tests use the index of sorted observations to make inference, rather than the observations themselves. There are several rank tests in the literature. They are all based on the index which sorts the observations, rather than on the values. With this approach, rank tests gain robustness. We refer the reader to the book by Hollander et al. (2013) for a comprehensive account of nonparametric techniques. Girón et al. (2012) used such tests for edge detection in SAR imagery. Consider the two-sample case in which we want to compare x = (x1, x2, ..., xm) and y = (y1, y2, ..., yn). Denote xy = (x1, x2, ..., xm, y1, y2, ..., yn) the joint sample. Unless explicitly stated, we will assume that all observations are different. Denote by Rx and Ry the vectors of sizes m and n which correspond to the ranks of x and y in the joint sample, respectively. The literature is not unanimous in the names for the following tests. We will follow the denomination presented by Hollander et al. (2013).

Wilcoxon Rank Sum Test

This is a generic test for the hypothesis that x and y are the result of drawing observations from the same distribution, i.e., H0: F_X(t) = F_Y(t) for every t ∈ ℝ. Typical alternatives are of the form H1: F_Y(t) = F_X(t − Δ). These alternatives are called "location-shift model," and describe a "treatment effect" measured by Δ = E(Y) − E(X). With this, the null hypothesis reduces to H0: Δ = 0. The Wilcoxon test uses the sum of the ranks of the second sample, \(\sum_{i=1}^{n} R_y(i)\). It is possible to compute its expected value and variance, with which we form the test statistic:

\[ W = \frac{\sum_{i=1}^{n} R_y(i) - n(m+n+1)/2}{\sqrt{m n (m+n+1)/12}}. \]

At significance level α, the decision rules are:

• Reject H0 in favor of H1: Δ > 0 if W ≥ z_α;
• Reject H0 in favor of H1: Δ < 0 if W ≤ −z_α;
• Reject H0 in favor of H1: Δ ≠ 0 if |W| ≥ z_{α/2},

(in which z_α is the α quantile of the standard Normal distribution)

Mann-Whitney Test

A different approach for the same tests presented is based on the Mann-Whitney statistic:

\[ U = \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbf{1}_{x_i < y_j}. \]
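A short Python sketch of these two statistics (illustrative data; the standardized W and the count U are computed directly from the definitions above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, size=20)        # first sample, size m
y = rng.normal(0.5, 1.0, size=25)        # second sample, size n (shifted by Delta = 0.5)
m, n = len(x), len(y)

ranks = stats.rankdata(np.concatenate([x, y]))   # ranks in the joint sample xy
Ry = ranks[m:]                                   # ranks of the second sample

# Standardized Wilcoxon rank-sum statistic
W = (Ry.sum() - n * (m + n + 1) / 2) / np.sqrt(m * n * (m + n + 1) / 12)

# Mann-Whitney statistic: number of pairs with x_i < y_j
U = int(np.sum(x[:, None] < y[None, :]))

z_alpha = stats.norm.ppf(0.95)                   # one-sided 5% critical value
print("W =", round(W, 3), " reject Delta=0 in favor of Delta>0:", W >= z_alpha)
print("U =", U, "out of", m * n, "pairs")
```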
Cross-References

▶ Interquartile Range
▶ Univariate
Bibliography

Girón E, Frery AC, Cribari-Neto F (2012) Nonparametric edge detection in speckled imagery. Math Comput Simul 82:2182–2198. https://doi.org/10.1016/j.matcom.2012.04.013
Hollander M, Wolfe DA, Chicken E (2013) Nonparametric statistical methods. Wiley. https://www.ebook.de/de/product/20514944/hollander_chicken_wolfe_nonparametric_statistical_meth.html
Rao, C. R. B. L. S. Prakasa Rao CR Rao Advanced Institute of Mathematics, Statistics and Computer Science, Hyderabad, India
Fig. 1 C. R. Rao, courtesy of Dr. Tejaswini Rao, daughter of C. R. Rao
Biography Statistical methods are widely employed in all areas of activities of society including industry and government. Professor Calyampudi Radhakrishna Rao has made distinct and extensive contributions to several branches of the subject of statistics and its applications leading to efficient methods of statistical analysis. These include the theory of estimation, multivariate analysis, characterization problems, combinatorics, and the matrix algebra. In the area of estimation, Rao established that, under specific conditions, the variance of an unbiased estimator of a parameter is not less than the reciprocal of the Fisher information contained in the sample. This result became known as the Cramer-Rao inequality. He has also shown that one can possibly improve an unbiased estimator by a process known as Rao-Blackwellization; Rao developed tests known as the Score and Lagrangian multiplier
tests based on large samples for testing simple and composite hypotheses under general conditions. Based on the properties of suitable sample statistics, Rao obtained a number of characterizations of the normal, Poisson, Cauchy, stable, exponential, geometric, and other distributions. General classes of distributions known as weighted distributions were introduced in his work on discrete distributions arising out of the methods of ascertainment. The problem of defining a generalized inverse of a singular square or rectangular matrix was studied by Rao and he developed the calculus of generalized inverses of matrices with applications to unified theory of linear estimation of parameters in a general Gauss-Markov model. In multivariate analysis, one has to deal with extraction of information from a large number of measurements made on each sample unit. Not all measurements carry independent information. It is possible that a subset of measurements may lead to procedures which are more efficient than using the whole set of measurements. Rao developed a test to ascertain whether or not the information contained in a subset is the same as that given in the complete set. He also developed a method for studying clustering and other interrelationships among individuals or populations. Using general diversity measures applicable to both qualitative and quantitative data, the method of analysis of diversity was developed by Rao for which he introduced the concept of quadratic entropy in the analysis of diversity. Combinatorial arrangements known as orthogonal arrays were introduced by Rao for use in the design of experiments. These arrangements are widely used in multifactorial experiments to determine the optimum combinations of factors to solve industrial problems. These have also applications in coding theory. An important result of practical interest resulting from this novel approach is the Hamming-Rao bound associated with orthogonal arrays. Calyampudi Radhakrishna Rao was born on September 10, 1920, in Huvvina Hadagall in the state of Karnataka in India. After finishing his high school and basic undergraduate education, he went to Andhra University in Visakhapatnam, a coastal city in the state Andhra Pradesh where he obtained his B.A.(Hons) at the age of 19. After deciding to pursue a research career in mathematics, Rao went to Calcutta where he joined the Indian Statistical Institute (ISI) for training in the subject of statistics. At the same time, he completed a master’s degree program in statistics at Calcutta University. Later Rao obtained his Ph.D. degree at Cambridge University in the UK after which he became a full professor at ISI at the young age of 29. Upon retiring from the ISI in 1980, he moved to the USA where he worked for another 40 years and currently is a research professor at the University of Buffalo in New York State. He received several awards including the Bhatnagar award and the India Science award from the Government of India, the National Medal of Science from USA, and was
elected as fellow of several academies of Science in India and abroad. In total he received 39 honorary doctorates from universities in India and other countries. Several students received their Ph.D. degrees under his guidance. C. R. Rao is among the worldwide leaders in statistical science over the last several decades. He celebrated his 100th birthday on September 10, 2020.
Rao, S. V. L. N. B. S. Daya Sagar Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India
Fig. 1 S.V. L. N Rao (1928–1999). (Courtesy of “Dr. Ramana Sonty, second son of SVLN Rao”)
Biography Professor S. V. L. N. Rao was born on May 11, 1928, in India. He studied at Andhra University and obtained his B.Sc. (Hons) and M.Sc. degrees by 1950. He started his career in 1952, as an instructor in geology at the Indian Institute of Technology (IIT) Kharagpur. By 1975, he rose to the position of professor at the same institute. He obtained his Ph.D. degree in 1959. During 1985–1987, he served as Head of the Department of Geology and Geophysics. His role in the establishment of the Remote Sensing Service Centre at the IIT campus with the funding from the Department of Science and Technology (DST) was instrumental. Rao, earlier worked on geochemistry problems, and as a postdoctoral fellow at Kingston, Canada, he learned the use of computers while he pursued crystallography studies.
From the late 1970s, his research interests have shifted to the applications of geostatistics and mathematical morphology in satellite and geological data analysis (Rao and Srivastava 1969; Rao and Prasad 1982). As a researcher and visiting professor, he visited several countries including Canada, Hungary, Germany, France, Norway, the UK, and the USA. He served as an adviser and consultant to provide solutions in mineral exploration and mine planning work for National Mineral Development Corporation and Hindustan Zinc Limited. On the invitation of the then Vice-Chancellor of Andhra University, he joined the Andhra University, where he headed the Center for Remote Sensing and Information Systems as a Director (1988–1992) and also served as Professor of Computer Science and Systems Engineering. He has guided about 40 Ph.D. students in geophysics, geology, remote sensing, mathematics, and computer science during his four decades of teaching and research. He played an instrumental role in starting an M.Tech course in remote sensing in 1988 as the Founding Director of the Center for Remote Sensing and Information Systems. From several conversations, it was obvious that he benefited from his collaboration with Late Professor Barth of Chicago University, Late Professor Berry, the then President of American Mineralogical Society, Prof. Jean Serra, Centre de Morphologie Mathematique, and Professor Jef Johnson of Open University. In 1984, he was invited to Paris School of Mines, where he worked with Jean Serra on optical microscopy, and he was involved in developing approaches for the quantitative treatment of polarized microscopy in geology. While he worked with Jef Johnson, Prof. Rao proposed the principle of dilational similarity. His colleagues and students remember Professor Rao as a distinguished geochemist, geostatistician, and computer scientist, and as a selfless friendly thesis adviser. He was an associate editor for the Mathematical Geology that released a special issue in his memory (Sagar 2001).
Cross-References

▶ Mathematical Morphology
Bibliography

Rao SVLN, Prasad J (1982) Definition of kriging in terms of fuzzy logic. Math Geol 14(1):37–42
Rao SVLN, Srivastava GS (1969) Comparison of regression surfaces in geologic studies. Trans Kansas Acad Sci (1903-) 72(1):91–97
Sagar BSD (ed) (2001) Special issue 'In memory of the Late Professor SVLN Rao'. J Math Geosci 33(3):245–396
Realizations C. Özgen Karacan U.S. Geological Survey, Geology, Energy and Minerals Science Center, Reston, VA, USA
Definition

In statistics, a realization is an observed value of a random variable (Gubner 2006). In mathematical geology, the most important realizations are those in the form of maps of spatially correlated regionalized variables. Spatial description of random variables within complex domains and making certain decisions about those require complete knowledge of the attribute of interest at each point in space. However, it is virtually impossible to sample from every location within the domain to gain a complete spatial understanding of the random variables with certainty at different scales. Therefore, limited sampling leaves us with incomplete information, which is the source of uncertainty. Understanding the uncertainty and quantifying it are essential to minimize the risks of decision making. Geostatistical simulation techniques aim to quantify spatial uncertainty of random variables by numerically reproducing the reality, which we have limited knowledge of, in a discretized (node) domain. The process creates multiple alternative maps of the random variable, which are called realizations. These are numerical representations of reality constructed by obeying the constraints of spatial statistics of the sampled data. Realizations are discretized maps explaining the uncertainty of the property of interest.
Methods of Generating Realizations

Realizations can be generated through interpolation methods (e.g., kriging), or simulation methods that rely on two-point statistics (e.g., sequential Gaussian simulation, sequential indicator simulation), or multiple point statistics (e.g., FILTERSIM and SNESIM), which utilize a training image (TI) (Remy et al. 2009). The choice of the most appropriate method may depend on the objectives and the theoretical and practical aspects of the work. For instance, interpolation methods tend to be practical, but they generate only one realization where spatial variations are smoothed along with a kriging variance surface. The drawback is that quantifying joint uncertainty of the model at unsampled locations is not possible. Also, the local variations are suppressed through smoothing, which may not be a realistic representation of reality. Simulation methods (either relying on a variogram or a TI) are more commonly used to generate realizations since these
Realizations
methods can generate multiple realizations through a stochastic process. Depending on the needs, data limitations, spatial complexity, and the objectives of the work, either a variogram-based method or a multiple point simulation algorithm along with a TI as a conceptual model to define local patterns can be used to generate realizations. The realizations generated through multiple point simulations generally exhibit local patterns similar to those conceptualized in the TI (Pyrcz and Deutsch 2014). Simulations may or may not be conditioned to the data in sampling locations, depending on the approach. Conditional simulations honor the data at sampling locations while estimated values vary outside the sampled locations constrained by spatial continuity, trend, and secondary information (Pyrcz and Deutsch 2014). On the other hand, in realizations generated using the unconditional approach, locations of high- and low-estimated values do not necessarily correspond to the actual locations of the high- and low- sampled data as they are not preserved at their locations. Therefore, conditional simulation is preferred more often to generate realizations.
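As a minimal illustration of the stochastic process that produces multiple realizations (a numpy sketch assuming a small 1-D grid and an exponential covariance; it is not any of the cited simulation algorithms), each call with a different seed yields one equally plausible map:

```python
import numpy as np

def gaussian_realizations(n_real, n_nodes=200, sill=1.0, corr_len=20.0, seed=0):
    """Generate unconditional realizations of a stationary Gaussian variable
    on a regular 1-D grid by Cholesky factorization of the covariance matrix."""
    rng = np.random.default_rng(seed)
    x = np.arange(n_nodes)
    C = sill * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
    L = np.linalg.cholesky(C + 1e-10 * np.eye(n_nodes))
    return L @ rng.standard_normal((n_nodes, n_real))   # one column per realization

reals = gaussian_realizations(n_real=100)
print(reals.shape)            # (200, 100): 200 nodes, 100 realizations
```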
The Concept of "Equiprobable"

Multiple realizations generated by repeating the entire simulation process are used to assess the joint uncertainty of the model at unsampled locations. Each realization obeys the input statistics and trends, as well as the data at sampling locations. Since each of these realizations is generated by repeating the same stochastic process during simulations and is constrained with the same data and spatial correlations, they are equally likely to occur and equally likely to be the representation of reality. Therefore, they are often called "equiprobable." However, equal probability refers and applies to generation and representativeness of realizations when they are considered individually. In reality, when all realizations that represent the same attribute are evaluated with a criterion that is important for the goals of the project, some realizations are more similar to others and that class has a higher probability within the entire population of realizations. Therefore, the term equiprobable is true for individual realizations in the sense that each of them is one of the outcomes of the simulation process with equal probability. However, they may not have the same probability when they are ranked.
Number of Required Realizations, Screening, and Ranking

With the vast number of realizations that can be generated through stochastic simulations, it is important to determine the optimum number of realizations needed as well as the
minimum screening and ranking criteria. Simulation algorithms can generate tens or hundreds of realizations depending on the need. However, as the number of nodes and the number of realizations increase, computational time and data storage requirements increase and may become physical limitations. For the purposes of generating a realization population for ranking and for practical purposes of spatial uncertainty assessments, usually 100 realizations can be considered optimal. This number of realizations forms a robust population for visualization, uncertainty assessment, and ranking for use in other models. There are a set of criteria discussed by Leuangthong et al. (2004) for screening and acceptance of realizations. If these criteria are not met within acceptable range, realizations cannot be considered as representations of reality as they are intended to. According to these authors, realizations should reproduce (1) data values at their location – a crossplot of the data and the simulated values at the data locations should be generated to see if the data follows the 1:1 line; (2) the histogram of the attribute of interest and basic statistics – A “Q-Q” plot of realization data versus the data at sampling locations can be utilized to compare the two distributions, which should follow the 1:1 line. Alternatively, global reproduction of the histogram can be examined on a few randomly selected realizations. The key features to note are the histogram shape, the range of the simulated values, and the summary statistics, such as the mean, median, and the variance; (3) the spatial continuity of the data – the variogram should be calculated for multiple realizations and compared to the input variogram model in the same directions. The model variogram should be reproduced within acceptable ergodic fluctuations. Ordering and ranking of the realizations, for both visualization and use in other models, can be achieved by capturing
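A compact Python sketch of the ranking step (hedged: the 2-D realizations and the gas-in-place proxy below are synthetic stand-ins, not the coal data of Fig. 1):

```python
import numpy as np

rng = np.random.default_rng(1)
n_real, ny, nx = 100, 50, 80
# Stand-in stack of 100 realizations of gas content on a 50 x 80 grid
reals = rng.lognormal(mean=0.0, sigma=0.4, size=(n_real, ny, nx))

# Rank realizations by a scalar transfer function: total gas in place per realization
gip = reals.sum(axis=(1, 2))
order = np.argsort(gip)

p5, p50, p95 = (order[int(round(q * (n_real - 1)))] for q in (0.05, 0.50, 0.95))
print("P5, P50, P95 realization indices:", p5, p50, p95)
print("their GIP values:", gip[[p5, p50, p95]].round(1))
```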
the similarities and differences between them. De Barros and Deutsch (2017) argue that calculating the Euclidean distance between the realizations can be an effective approach, as well as other distance measures including the use of connectivity and other flow-related properties, or weighting the realizations according to calculated resources, such as gas quantity. For example, Fig. 1 shows a ranking of 100 total gas in place (GIP) realizations of a coal-bearing strata and the realization corresponding to P50 of the probability distribution.
Further Evaluation of Realizations for Mechanistic Model Uncertainty

Generation of realizations is rarely the only and the final objective of geostatistical simulations, unless realizations are for resource assessments (e.g., gold quantity or coal tonnage). For example, in petroleum engineering studies, the final product can be an uncertainty model defining reservoir architecture to aid in making optimum decisions for given injection/production scenarios. Therefore, once realizations are compiled and pass the minimum screening and acceptance criteria, they may be ranked or subjected to post-processing for use in mechanistic models, such as reservoir simulations (e.g., realizations of porosity, permeability, or gas in place) for conducting sensitivity analysis of reservoir response. When utilizing realizations in flow models, for example, either all realizations, or ranked realizations with defined marginal probabilities (P5, P50, and P95), or the E-type models or local percentile models can be used (Karacan and Olea 2015; Thanh et al. 2020). An E-type model, which is the local average of all realizations at each node, provides an indication of the most likely value of each reservoir property at a specific node. Similarly, local percentile models can be calculated
R
P95 GIP (MMscf)
0.18
8000
1.10
0.14 0.12
Nothing (ft)
Re lat ive fre que ncy
0.16
0.1 0.08 0.06
0.90
6000
0.70
4000
0.50 2000
0.30
0.04 0
0.02 0 4300
4400
4500
4600
4700
4800
4900
0
2000
4000
6000
8000 10000 12000 14000 16000
0.10
Easting (ft)
5000
Gas in place (GIP) MMscf
Realizations, Fig. 1 Histogram of total gas in place (GIP) realizations generated using sequential Gaussian simulation for ranking, and the P50 realization
1198
Realizations
from the realizations and are good indicators of areas with definitive high or low values (Fig. 2). The advantage of sequentially using all realizations of properties, on the other hand, is the ability to fully assess uncertainty and to investigate the sensitivity of reservoir response to all equiprobable representations of reality. If this is not possible, or practical,
E-type models or ranked realizations can also be used to define model architecture and assess uncertainty of reservoir behavior. Historical oil or gas production data can be extremely useful for further assessing realizations to refine the architecture of the reservoir and to assess its uncertainty through
Prob. GIP > 1.133 MMscf
Realizations, Fig. 2 Local probabilities calculated for GIP value >1.133 (mean) using 100 realizations for the same coalbearing strata from Fig. 1
1
Nothing (ft)
8000
0.9 0.8
6000
2000
0.7 0.6 0.5 0.4 0.3 0.2
0
0.1 0
4000
0
2000
4000
Realizations, Fig. 3 Comparison of predicted gas production rates of a coalbed methane well with the measured rates using 100 realizations generated for different properties (white lines). The results with realization number 69, the E-type model, and the average of the best
6000
8000
10000 12000
Easting (ft)
14000 16000
realizations (BR) are marked. Permeability realizations (in md) corresponding to those production profiles are shown on the right-hand side. (Modified from Karacan and Olea 2015)
Reduced Major Axis Regression
history matching. If production data are available, the simulation results generated either using all or ranked realizations, such as E-type or different percentiles, can be compared to well-production data. For instance, if a certain combination of realizations does not produce similar production values, then these realizations can be removed from the set as possible options, thereby reducing model uncertainty (Pyrcz and Deutsch 2014). Figure 3 shows a comparison of predicted coalbed methane gas production with the measured production when all 100 realizations of different properties are used in the model. Three predictions are shown on Fig. 3 (1) Predictions provided by the E-type model; (2) the realization that gives the minimum error for all wells (realization number 69); (3) the average of the realizations that minimize the error for each of the wells (best realizations). The permeability realizations corresponding to those three gas production profiles are also shown. E-type models generate reasonable production results. However, using all realizations allows for fully assessing uncertainty and for selecting individual models that best describe the reservoir architecture and contained resources (Karacan and Olea 2015). Therefore, although each realization is an equiprobable representation of reality at the geostatistical simulation level, the realizations are not equiprobable when it comes to defining the architecture of mechanistic models, as shown by the reservoir response (Fig. 3).
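A schematic Python sketch of this post-processing idea (entirely synthetic numbers; the simulate function is a toy stand-in for a reservoir simulator, not an actual flow model):

```python
import numpy as np

rng = np.random.default_rng(3)
n_real, n_nodes, n_t = 100, 500, 24

perm_reals = rng.lognormal(mean=2.0, sigma=0.5, size=(n_real, n_nodes))  # toy permeability maps
etype = perm_reals.mean(axis=0)                                          # E-type model (local average)

def simulate(perm, t):
    """Toy stand-in for a reservoir simulator: declining rate scaled by mean permeability."""
    return perm.mean() * np.exp(-0.05 * t)

t = np.arange(n_t)
measured = simulate(etype, t) * (1 + 0.1 * rng.standard_normal(n_t))     # pseudo production history

# History-match screening: keep realizations whose predicted rates are closest to the data
rmse = np.array([np.sqrt(np.mean((simulate(p, t) - measured) ** 2)) for p in perm_reals])
best = np.argsort(rmse)[:10]
etype_rmse = np.sqrt(np.mean((simulate(etype, t) - measured) ** 2))
print("best-matching realization:", best[0], " E-type RMSE:", round(etype_rmse, 3))
```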
Summary and Conclusions

Geostatistical simulation methods generate a model of uncertainty with multiple sets of spatially distributed possible values. One set of possible outcomes is referred to as a realization. Realizations can be generated using variogram-based or TI-based methods using either conditional or unconditional approaches. Since simulations use a stochastic approach, they can generate a range of tens to hundreds of realizations, depending on the objective and physical limitations of the modeled system. However, regardless of the number of realizations, they must pass certain screening and acceptance criteria constrained by the data at sampled locations to be considered as a representation of reality and deemed useful for further post-processing or analysis. Realizations can be ranked using different distance criteria. Ranked or post-processed realizations can then be used for many different purposes spanning from resource assessments to building mechanistic models for assessing uncertainties.
Cross-References

▶ Random Variable
▶ Simulation
▶ Variogram
Bibliography

De Barros G, Deutsch CV (2017) Optimal ordering of realizations for visualization and presentation. Comput Geosci 103:51–58
Gubner JA (2006) Probability and random processes for electrical and computer engineers. Cambridge University Press, 363 pp
Karacan CÖ, Olea RA (2015) Stochastic reservoir simulation for the modeling of uncertainty in coal seam degasification. Fuel 148:87–97
Leuangthong O, McLennan JA, Deutsch CV (2004) Minimum acceptance criteria for geostatistical realizations. Nat Resour Res 13:131–141
Pyrcz MJ, Deutsch CV (2014) Geostatistical reservoir modeling. Oxford University Press, New York, 433 pp
Remy N, Boucher A, Wu J (2009) Applied geostatistics with SGeMS. Cambridge University Press, Cambridge, UK, 264 pp
Thanh HV, Sugai Y, Nguele R, Sasaki K (2020) Robust optimization of CO2 sequestration through a water alternating gas process under geological uncertainties in Cuu Long Basin, Vietnam. J Nat Gas Sci Eng 76:103208
Reduced Major Axis Regression Muhammad Bilal1, Md. Arfan Ali1, Janet E. Nichol2, Zhongfeng Qiu1, Alaa Mhawish1 and Khaled Mohamed Khedher3,4 1 School of Marine Sciences, Nanjing University of Information Science and Technology, Nanjing, China 2 Department of Geography, School of Global Studies, University of Sussex, Brighton, UK 3 Department of Civil Engineering, College of Engineering, King Khalid University, Abha, Saudi Arabia 4 Department of Civil Engineering, High Institute of Technological Studies, Mrezgua University Campus, Nabeul, Tunisia
Definition

Ordinary least squares (OLS) regression assumes that the independent variable on the X-axis is measured without uncertainty or error, and the dependent variable on the Y-axis is measured with error. However, reduced major axis (RMA) regression assumes errors in both the dependent as well as in the independent variables and minimizes the sum of the diagonal (vertical and horizontal) distances of the data points from the fitted line (Harper 2016). In geosciences, the measurements are usually exposed to several types of error that vary in magnitude and source. For example, remotely sensed aerosol optical depth (AOD) measured with either ground instruments (such as Sunphotometers) or satellite-based sensors has errors of different magnitudes, although errors are higher in satellite-based than ground-based sensors. However, irrespective of the magnitude of the error in the independent variable X-axis (ground-based measurements) and dependent variable Y-axis (satellite-based measurements), the errors in both variables
should be considered in regression analysis. For modeling the relationship between dependent and independent variables both having an error, RMA regression is preferable to OLS regression, which assumes that the X-axis variable is measured without error (Sokal and Rohlf 1981). Moreover, if the error rate in the independent variable surpasses one-third of the error rate in the dependent variable (which is common in geosciences), then RMA regression should be considered (McArdle 1988). Firstly, RMA normalizes the variables and then fits a line of slope (β), which is the geometric mean of the slopes of the lines regressing X on Y and Y on X (Sokal and Rohlf 1981). The RMA slope can be positive or negative, depending on the sign of the correlation (r). However, the RMA slope is unaffected by the decreasing value of r.
Introduction

Data representation, whether of geophysical observations, numerical models, or laboratory results, using the best fit line is a conventional method in the scientific and other fields. In this regard, the most common statistical methods such as ordinary least squares (OLS) regression and reduced major axis (RMA) regression are used to demonstrate a relationship between two variables (dependent and independent). However, in geosciences, both the dependent and independent variables are assumed to have errors. Specifically, remote sensing data provided by both satellite-based and ground-based instruments have errors due to instrument calibration, algorithm uncertainties, and changing environmental conditions, etc. Therefore, to define the best fit line between remote sensing data used in geosciences applications, RMA is recommended rather than OLS (Bilal et al. 2019, 2021). The utility of RMA is demonstrated in a case study, where both RMA and OLS regression were applied to aerosol remote sensing data obtained from the MODIS (Moderate Resolution Imaging Spectroradiometer) operational Level 2 aerosol products and ground-based AERONET (Aerosol Robotic Network) Sunphotometer Version 3 Level 2.0 measurements. The remotely sensed MODIS aerosol data have pre-calculated expected errors depending on the inversion method, and similarly, the AERONET measurements have some uncertainties. The AERONET measurements are considered as standard (independent variables) for the validation of satellite-based aerosol remote sensing data (dependent variable). Therefore, the use of RMA is more suitable and recommended, as both the dependent variable (satellite remote sensing data) and independent variable (AERONET Sunphotometer measurements) have errors.

Methods

Ordinary Least Squares (OLS) Regression

Ordinary Least Squares (OLS) regression analysis is based on the assumption that the X (independent) variable is measured without an error or uncertainty and the Y (dependent) variable has an error or uncertainty. However, both X and Y variables (data) may have errors, especially remote sensing data in the geosciences. Generally, in OLS regression, the errors in the independent variable are ignored, but these errors should be considered for remote sensing data in the geosciences (Curran and Hay 1986). The OLS regression equation can be expressed as Eq. (1):

\[ Y = \alpha_{(OLS)} + \beta_{(OLS)}\, X + \varepsilon, \tag{1} \]
where β(OLS) is the slope of OLS, α(OLS) is the intercept of OLS, and ε is the error. The β(OLS) and α(OLS) are calculated using Eqs. (2) and (3), respectively (Harper 2016):

\[ \beta_{(OLS)} = \frac{S_{XY}}{S_{XX}}, \tag{2} \]

\[ \alpha_{(OLS)} = \bar{Y} - \beta_{(OLS)}\, \bar{X}, \tag{3} \]
where X̄ is the mean of the X variable and Ȳ is the mean of the Y variable. The S_XY and S_XX can be calculated using Eqs. (4) and (5), respectively:

\[ S_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}), \tag{4} \]

\[ S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2. \tag{5} \]
Reduced Major Axis (RMA) Regression

Reduced major axis (RMA) regression is based on the assumption that both the X (independent) and Y (dependent) variables are measured with error or uncertainty. RMA has many potential applications in numerous disciplines, including remote sensing applications in geosciences. For validation of remote sensing observations against reference (in situ) data, where both observations and reference data have errors, RMA is a better choice compared to OLS (Curran and Hay 1986). Several studies in the fields of remote sensing applications in geosciences (Drury et al. 2008;
Li et al. 2019; Lin et al. 2014) have used RMA regression to define the best fit line between dependent and independent variables. The slope (β(RMA)) and intercept (α(RMA)) of the best fit line of the RMA regression method are calculated using Eqs. (6) and (7), respectively (Harper 2016):

\[ \beta_{(RMA)} = \frac{S_{XY}/|r|}{S_{XX}}, \tag{6} \]

\[ \alpha_{(RMA)} = \bar{Y} - \beta_{(RMA)}\, \bar{X}, \tag{7} \]
where |r| is the absolute value of the Pearson correlation coefficient (r), and the slope (β(RMA)) calculated using Eq. (6) can be positive or negative. The slope (β(RMA)) can also be calculated using Eq. (8), where the positive or negative sign of the slope depends on the sign of the Pearson correlation coefficient (r):

\[ \beta_{(RMA)} = \pm\frac{s_Y}{s_X}, \tag{8} \]
where sY is the standard deviation of the Y (dependent) variable and sX is the standard deviation of the X (independent) variable. Evaluation Method To report the performance and errors in the RMA and OLS regression models, the mean bias error (MBE; Eq. 9), the root-mean-squared difference (RMSD; Eq. 10), and the mean systematic error (MSE; Eq. 11) are used. The MSE is considered as a significant method to show the difference between the trend-line of the independent (X) and dependent (Y) data, where a small value of MSE indicates a good trend and a large value of MSE denotes a trend has an error. MBE ¼
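A small numpy sketch of Eqs. (1)–(8) (synthetic data with error in both variables; the variable names are illustrative, not the MODIS/AERONET data of the case study):

```python
import numpy as np

rng = np.random.default_rng(5)
true = rng.uniform(0.05, 1.0, 200)                    # "true" values of the attribute
x = true + rng.normal(0.0, 0.08, true.size)           # ground-based measurement with error
y = true + rng.normal(0.0, 0.10, true.size)           # satellite retrieval with larger error

# OLS slope and intercept, Eqs. (2)-(5)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
sxx = np.sum((x - x.mean()) ** 2)
b_ols = sxy / sxx
a_ols = y.mean() - b_ols * x.mean()

# RMA slope and intercept, Eqs. (6)-(8)
r = np.corrcoef(x, y)[0, 1]
b_rma = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)    # equivalently (sxy / |r|) / sxx
a_rma = y.mean() - b_rma * x.mean()

print("OLS:", round(b_ols, 3), round(a_ols, 3))
print("RMA:", round(b_rma, 3), round(a_rma, 3))       # RMA counteracts the attenuation of
                                                      # the OLS slope caused by error in x
```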
1 n
1 n
Y i Xi
i¼1
1 n
RMSD ¼
MSE ¼
n
n i¼1 n i¼1
ð9Þ
Y i Xi
Y i Xi
2
2
ð10Þ
ð11Þ
Where Y is the predicted value obtained from the RMA and OLS regression models.
Results and Discussion This study used remotely sensed aerosol optical depth (AOD) data from both satellite and ground-based Sunphotometer to demonstrate the performance of both the RMA and OLS models. Figure 1a shows the scatterplot and linear fit between MODIS AOD retrievals against AERONET Sunphotometer AOD measurements (reference data) calculated with both RMA (red line) and OLS (orange line) methods. The results show a better slope ~ 1.01 for RMA compared to the ~0.77 slope for OLS, which indicates a significant underestimation by 23%. However, no significant over- and underestimations were observed in the slope calculated by RMA. The results also showed a smaller MSE of 0.004 and RMSD of 0.063 for the RMA model compared to the OLS model (MSE ¼ 0.006 and RMSD ¼ 0.079). However, both the regression models showed the same MBE of 0.063. To further demonstrate the performance of RMA vs. OLS, satellite-based rainfall retrievals were validated against ground-based rainfall measurements (Fig. 1b). A significant difference was also observed in slope and intercept calculated by RMA compared to those calculated by OLS with small errors (RMSD, and MSE). These results suggest the better performance of RMA regression, especially in terms of slope calculation, compared to OLS, and recommend the use of the RMA regression model for geoscience research applications.
Summary and Conclusions In geosciences, satellite remote sensing data are usually required to validate against ground-based instrumental measurements. However, both datasets depend on instrumental calibration, which may introduce errors in the datasets. Therefore, the use of appropriate regression analysis is required to model the relationship between dependent and independent variables, which can consider errors in both variables. However, in general, the OLS regression method is most commonly used in geosciences to model the relationship between two variables, and it is assumed that the independent variable is measured without error, although this rarely occurs. On the other hand, the RAM method models the relationship between two variables considering errors in both the dependent as well as independent variables. Therefore, RMA regression should be considered more suitable for satellite remote sensing applications. To demonstrate the performance of RMA against OLS, remotely sensed satellite-based aerosol and rainfall data were validated against ground-based measurements. The results showed that the RMA regression is better able to model the relationship between the variables
R
1202
Reduced Major Axis Regression
Reduced Major Axis Regression, Fig. 1 Reduced major axis (RMA) regression vs. ordinary least squares (OLS) regression. (a) Validation of MODIS AOD retrievals against AERONET AOD measurements and (b) Validation of satellitebased rainfall retrievals against ground-based rainfall measurements. Where the dashed black line represents the 1:1 line, the red line represents the line of the reduced major axis (RMA) regression, and the orange line represents the line of the ordinary least squares (OLS) regression
compared to OLS regression in terms of slope, small MBE, RMSD, and MSE. This study recommends that selecting a proper regression method is essential for modeling relationships in geoscience data and that RMA regression is more suitable than OLS. Acknowledgments The authors would like to acknowledge NASA’s Level-1 and Atmosphere Archive and Distribution System (LAADS) Distributed Active Archive Center (DAAC) (https://ladsweb.modaps. eosdis.nasa.gov/) for MODIS data and Principal Investigators of AERONET sites for AOD measurements. This research was funded by the National Key Research and Development Program of China (2016YFC1400901), the Jiangsu Provincial Department of Education for the Special Project of Jiangsu Distinguished Professor (R2018T22), the Deanship of Scientific Research at King Khalid University (RGP.1/ 372/42), and the Startup Foundation for Introduction Talent of NUIST (2017r107).
Bibliography
Bilal M, Nazeer M, Nichol JE, Bleiweiss MP, Qiu Z, Jäkel E, Campbell JR, Atique L, Huang X, Lolli S (2019) A simplified and robust Surface Reflectance Estimation Method (SREM) for use over diverse land surfaces using multi-sensor data. Remote Sens 11:55
Bilal M, Mhawish A, Nichol JE, Qiu Z, Nazeer M, Ali MA, de Leeuw G, Levy RC, Wang Y, Chen Y, Wang L, Shi Y, Bleiweiss MP, Mazhar U, Atique L, Ke S (2021) Air pollution scenario over Pakistan: characterization and ranking of extremely polluted cities using long-term concentrations of aerosols and trace gases. Remote Sens Environ 264:112617
Curran P, Hay A (1986) The importance of measurement error for certain procedures in remote sensing at optical wavelengths. Photogramm Eng Remote Sens 52:229–241
Drury E, Jacob DJ, Wang J, Spurr RJD, Chance K (2008) Improved algorithm for MODIS satellite retrievals of aerosol optical depths over western North America. J Geophys Res 113:D16204
Harper WV (2016) Reduced major axis regression. In: Wiley StatsRef: statistics reference online. Wiley Online Library, Hoboken
Li Z, Roy DP, Zhang HK, Vermote EF, Huang H (2019) Evaluation of Landsat-8 and Sentinel-2A aerosol optical depth retrievals across Chinese cities and implications for medium spatial resolution urban aerosol monitoring. Remote Sens 11:122
Lin J, van Donkelaar A, Xin J, Che H, Wang Y (2014) Clear-sky aerosol optical depth over East China estimated from visibility measurements and chemical transport modeling. Atmos Environ 95:258–267
McArdle BH (1988) The structural relationship: regression in biology. Can J Zool 66:2329–2339
Sokal RR, Rohlf FJ (1981) Biometry, 2nd edn. Freeman, New York
Regression
D. Arun Kumar¹, G. Hemalatha² and M. Venkatanarayana¹
¹ Department of Electronics and Communication Engineering and Center for Research and Innovation, KSRM College of Engineering, Kadapa, Andhra Pradesh, India
² Department of Electronics and Communication Engineering, KSRM College of Engineering, Kadapa, Andhra Pradesh, India
Definition
Regression analysis is a statistical approach for finding the mathematical relationship between input (independent) variables and output (dependent) variables using linear or nonlinear equations. The regression model (i.e., the fitted linear or nonlinear equation) is used to predict the values of the output variable for given values of the input variables.
Introduction
The Earth's resources, such as vegetation, minerals, and water, are monitored using remote sensing. Remote sensing data analysis is based on statistical methods such as regression, classification, and clustering (Richards and Jia 2006). Regression (also referred to as prediction) obtains output values of themes such as temperature, slope, pressure, and elevation for the corresponding input values (Jensen 2004). One of the major applications of regression analysis is the prediction of weather parameters such as pressure, temperature, and humidity from sensor data. Data classification assigns the pixels of an image to classes such as land use/land cover or weather categories. Clustering is an unsupervised data analysis approach in which the data points (the pixels of the image) are grouped into clusters without class labels (Theodoridis and Koutroumbas 1999); the clustered data points are then assigned class labels using ancillary data and ground truth.
The information related to Earth resources is represented in the form of digital images (Matus-Hernández et al. 2018). A digital image is mathematically defined as a 2D function and consists of finite elements called pixels (Lillesand and Kiefer 1994). The digital images used in remote sensing range from monochromatic to multispectral and hyperspectral images (Campbell 1987). In multispectral and hyperspectral images, the pixels are also described as pixel vectors or patterns (Bishop et al. 1995). A pattern is an entity associated with, and characterized by, features (Duda et al. 2000); the pixel vector is a point in N-dimensional feature space. Regression analysis predicts the output values of unknown patterns based on the existing output values of the patterns in the dataset (Doan and Kalita 2015) and is implemented by fitting a linear or nonlinear equation to the training data (Matus-Hernández et al. 2018). Various regression methods exist for remote sensing applications; they are broadly classified as linear and nonlinear approaches. The different methods of regression analysis are discussed in section "Type of Regression Analysis in Remote Sensing." The evaluation methods for regression analysis are given in section "Evaluation of Regression Model." The results of a linear regression model for an example dataset are provided in section "Example: Linear Regression." The conclusions of the regression analysis are given in section "Conclusions."
Type of Regression Analysis in Remote Sensing
The functional block diagram of the regression model is shown in Fig. 1. The input pixels/patterns are used to prepare the dataset required to fit the regression model. The data are preprocessed, and the dataset is then divided into a training set, a test set, and a validation set. The training set is used to fit the equation of the regression model. Regression models are classified as linear or nonlinear based on the relationship between the input variables and the output variable: a linear equation is used to predict the values of the output variable if the relationship is linear. The equation of a linear regression model with a single input variable is given as

$$y = a_0 + a_1 x \qquad (1)$$
where y is the output variable, x is a single input variable, and a₀ and a₁ are coefficients. Similarly, a nonlinear equation is used to fit the regression model if the relationship between the input feature and the output feature is nonlinear.
Regression, Fig. 1 Regression model for remote sensing data analysis. The model considers input data, preprocessing of input data, fitting the linear or nonlinear equation, classifying/clustering the data, assigning the class label, and obtaining the classified map
In nonlinear regression analysis, a polynomial equation is used to fit the relationship between the dependent (output) variable and the independent (input) variable. The nth-order polynomial equation of the nonlinear single-input regression model is given as

$$y = a_0 + a_1 x + a_2 x^{2} + \cdots + a_n x^{n} \qquad (2)$$
where a₀, a₁, a₂, …, aₙ are coefficients. Regression models are also classified based on the number of input variables (features) on which the output variable depends: single-output single-input, single-output multi-input, and multi-input multi-output regression models. The single-output single-input regression model predicts the values of an output variable that depends on a single input variable; its equation is given in Eq. 1. The single-output multi-input regression model predicts the values of an output variable that depends on multiple inputs, and its equation is given as

$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \qquad (3)$$
where a₀, a₁, a₂, …, aₙ are coefficients and x₁, x₂, …, xₙ are the input variables (features). Models such as neural networks are used for multi-input multi-output regression analysis. The regression model is tested on the test dataset by passing each pattern/pixel vector through the model and predicting the values of the output variable, and it is validated using the validation dataset. The predicted values of the patterns/pixels are then classified using supervised or unsupervised classification techniques. The major supervised classification techniques include k-NN, minimum distance to mean, neural networks, decision trees, and random forests. The unsupervised techniques include clustering methods such as k-means clustering, hierarchical clustering, and density-based clustering (DBSCAN). The patterns are labeled with the corresponding class labels to obtain the output classified map.
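As a minimal illustration of Eqs. 1 and 3, the sketch below fits a single-input and a multi-input linear regression model with NumPy least squares; the data are synthetic stand-ins for the pixel/pattern values described above, and the variable names are illustrative only.

```python
import numpy as np

# Single-input linear model (Eq. 1): y = a0 + a1 * x
rng = np.random.default_rng(1)
x = rng.uniform(0, 255, 100)                      # e.g., pixel values (digital numbers)
y = 5.0 + 0.8 * x + rng.normal(0, 4, 100)         # synthetic output variable
a1, a0 = np.polyfit(x, y, 1)                      # coefficients, highest degree first

# Multi-input linear model (Eq. 3): y = a0 + a1*x1 + a2*x2 + a3*x3
X = rng.uniform(0, 1, (100, 3))                   # three input features
y_multi = 1.0 + X @ np.array([0.5, -0.2, 2.0]) + rng.normal(0, 0.05, 100)
A = np.column_stack([np.ones(len(X)), X])         # design matrix with intercept column
coef, *_ = np.linalg.lstsq(A, y_multi, rcond=None)
print(a0, a1, coef)
```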
Evaluation of Regression Model
The performance of the regression model is evaluated using three important metrics: R square, the root mean square error (RMSE), and the mean absolute error (MAE). R square is the square of the correlation coefficient (R) and measures the variability in the dependent (output) variable explained by the model. The root mean square error is the square root of the mean square error, where the mean square error is the sum of the squared differences between the predicted and actual output values divided by the total number of samples. The RMSE is calculated as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - y_i'\right)^{2}} \qquad (4)$$

where N is the total number of patterns/pixels, y_i is the actual value of the output variable for the ith pattern, and y_i' is the corresponding predicted value. The mean absolute error (MAE) is the sum of the absolute errors divided by the total number of pixels:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - y_i'\right| \qquad (5)$$
The values of these evaluation metrics for the regression model lie between 0 and 1, where 0 indicates the smallest error and 1 indicates the maximum error.
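A brief sketch of the three metrics is given below. Note that the R-square formula used here (one minus the ratio of residual to total sum of squares) coincides with the squared correlation coefficient only for least-squares fits with an intercept; this is an assumption of the sketch.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R square, RMSE (Eq. 4), and MAE (Eq. 5) for a fitted regression model."""
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    mae = np.mean(np.abs(resid))
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r_square = 1.0 - ss_res / ss_tot
    return r_square, rmse, mae
```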
Example: Linear Regression
In the present study, two datasets were considered to demonstrate model fitting for linear regression and polynomial regression. The first dataset consists of pixel values and the corresponding ground elevation (m). It was prepared from an SRTM DEM image with a spatial resolution of 90 m and divided into a training set and a test set; the training set is used to fit the linear regression model, and the test set is used to test the performance of the model. The linear regression model fitted to the training dataset is shown in Fig. 2, and the relationship between the predicted values and the input pixel values of the test dataset is shown in Fig. 3. The second dataset consists of pixel values and the corresponding slope values and was generated from a raster slope map using QGIS. The polynomial regression model for the second dataset is shown in Fig. 4, where the x-axis represents the digital number (DN), or pixel value, and the y-axis represents the slope value; a fourth-order polynomial was used to fit this model. The performance of the regression models was evaluated using three metrics, RMSE, MAE, and R square, and the values of the three metrics were obtained between 0 and 1 for the two datasets.

Regression, Fig. 2 Linear regression model to predict elevation values for the given pixel values (training set). The pixel values and the corresponding elevation values are collected from the SRTM digital elevation model (DEM) image
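A brief sketch of fitting the fourth-order polynomial model mentioned above is shown below; the pixel-value/slope data are synthetic stand-ins, since the original SRTM- and QGIS-derived datasets are not reproduced in this entry.

```python
import numpy as np

# Synthetic stand-in for the pixel-value (DN) vs. slope dataset described above
rng = np.random.default_rng(3)
dn = rng.uniform(0, 255, 300)                            # digital numbers (pixel values)
slope = 5.0 + 25.0 * (dn / 255.0) ** 4 + rng.normal(0, 1.0, 300)

coeffs = np.polyfit(dn, slope, deg=4)                    # fourth-order polynomial (Eq. 2)
predicted = np.poly1d(coeffs)(dn)
rmse = np.sqrt(np.mean((slope - predicted) ** 2))
print(coeffs, rmse)
```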
Conclusions
In the present study, the basic regression models for data prediction were presented with example datasets. Emphasis was placed on regression models such as linear, polynomial, single-output single-input, and single-output multi-input models. Two datasets were considered as examples to demonstrate linear and polynomial regression analysis for both training and test cases. The models were evaluated using the metrics R square, RMSE, and MAE, and the values obtained lay between 0 and 1. Furthermore, there exist advanced regression models, such as logistic regression, robust regression, probit regression, and ridge regression, to predict the values of the output variable.

Regression, Fig. 3 Linear regression model to predict elevation values for the given pixel values (test set). The pixel values and corresponding elevation values are collected from the SRTM digital elevation model (DEM) image

Regression, Fig. 4 Polynomial regression model to predict slope values for the given pixel values. The pixel values and corresponding slope values are collected from a raster slope map generated using QGIS

Acknowledgments The author is thankful to K. Madan Mohan Reddy, vice chairman, Kandula Srinivasa Reddy College of Engineering, and A. Mohan, director, Kandula Group of Institutions, for establishing the Machine Learning group at Kandula Srinivasa Reddy College of Engineering, Kadapa, Andhra Pradesh, India – 516003.
Bibliography
Bishop CM et al (1995) Neural networks for pattern recognition. Oxford University Press, Oxford
Campbell JB (1987) Introduction to remote sensing. The Guilford Press, New York
Doan T, Kalita J (2015) Selecting machine learning algorithms using regression models. In: 2015 IEEE international conference on data mining workshop (ICDMW), IEEE, pp 1498–1505
Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley Interscience Publications, New York
Jensen JR (2004) Introductory digital image processing: a remote sensing perspective, 3rd edn. Prentice Hall, Upper Saddle River
Lillesand TM, Kiefer RW (1994) Remote sensing and image interpretation. John Wiley and Sons, Inc., Chichester
Matus-Hernández MÁ, Hernández-Saavedra NY, Martínez-Rincón RO (2018) Predictive performance of regression models to estimate chlorophyll-a concentration based on Landsat imagery. PLoS One 13(10):e0205682. https://doi.org/10.1371/journal.pone.0205682
Richards JA, Jia X (2006) Remote sensing digital image analysis: an introduction, 4th edn. Springer Verlag, Berlin, Heidelberg
Theodoridis S, Koutroumbas K (1999) Pattern recognition and neural networks. In: Advanced course on artificial intelligence. Springer, pp 169–195. Pattern Recognition, 4th edn. Academic Press, Inc., USA
Remote Sensing
Ranganath Navalgund¹ and Raghavendra P. Singh²
¹ Indian Space Research Organisation (ISRO), Bengaluru, India
² Space Applications Centre (ISRO), Ahmedabad, India

Definition
Remote sensing (RS) is the science of making inferences from measurements made at a distance, without any physical contact with the objects under study. It refers to the identification of Earth features by detecting electromagnetic (EM) radiation that is reflected and/or emitted by the surface at various wavelengths. It is based on the scientific rationale that every object reflects/scatters a portion of the electromagnetic energy incident on it, the reflection depending on the physical and chemical properties of the material. The object also emits radiation depending upon its temperature and emissivity.

Introduction
Visual perception of objects by the human eye is an example of remote sensing; however, it is confined to the visible portion of the EM spectrum. Modern remote sensors are capable of detecting electromagnetic radiation extending from gamma rays, X-rays, and ultraviolet to the infrared and microwave regions. While, in general, the medium of interaction for RS is electromagnetic radiation, gravity and acoustics are also employed in some specific circumstances; acoustic RS is, however, very limited because it requires a physical medium for transmission. Aerial remote sensing started way back in 1858 from balloon-based photography of Paris and later through rocket-propelled camera systems by the Germans in 1891. The first image of the Earth from space was taken by the Explorer 6 satellite of the United States in August 1959. Systematic Earth observation started with the launch of the Television Infrared Observation Satellite (TIROS-1) on April 1, 1960, designed primarily for meteorological observations. Remote sensing as we understand it today gained momentum after the launch of the Earth Resources Technology Satellite-1 (ERTS, later renamed LANDSAT 1) in 1972 by the National Aeronautics and Space Administration, USA. The basic objective of Earth remote sensing is to obtain an inventory of natural resources in a spatial format at different time intervals, to ascertain their condition, to observe the dynamics of the atmosphere and oceans, and to help understand various physical processes in an integrated manner. Subsequent to the launch of LANDSAT-1, many space agencies of different countries and some private players have successfully launched a number of RS satellites.

Physical Basis and Signatures
Study of the interaction of EM radiation with matter at different wavelength regions is paramount to remote sensing. Electromagnetic radiation covers a large spectrum from the very short wavelength gamma rays (

Rescaled Range Analysis

$$R(N) = \max_{1\le t\le N} X(t,N) - \min_{1\le t\le N} X(t,N) \qquad (\text{Eq. } 1)$$

R(N) is called the range of the random variables X_t, t = 1, 2, …, N; it depends on N and on the standard deviation $S(N) = \left[\frac{1}{N}\sum_{t=1}^{N}\left(X_t - \bar{X}_N\right)^{2}\right]^{1/2}$. To eliminate this dependence, Hurst
(1951) introduced the rescaled range

$$R^{*}(N) = \frac{R(N)}{S(N)} \qquad (\text{Eq. } 2)$$

To understand how R*(N) changes with N, Hurst considered tossing k coins N times and defined the random variable X_t as the difference between the number of heads and tails in the t-th experiment. For this process

$$R(N) = \left(\frac{\pi}{2}kN\right)^{1/2} - 1; \quad S = k^{1/2}; \quad \text{that is,}\quad R^{*}(N) = \frac{R(N)}{S(N)} \propto \left(\frac{\pi}{2}N\right)^{1/2} \qquad (\text{Eqs. } 3\text{a, b, c; Korvin 1992: 329})$$

Hurst analyzed some 120 geophysical and economic time series (river discharges, rainfall, water level, temperature, wheat prices, etc.; see Korvin 1992, and for modern examples Hamed 2007) and found that R/S approximately scales as

$$R^{*}(N) = \frac{R(N)}{S(N)} \propto N^{H} \qquad (\text{Eq. } 4)$$

where the exponent H always lay in the range H = 0.73 ± 0.09, that is, it differed significantly from the theoretically expected H = 1/2. Later, the exponent H was called the Hurst exponent, the non-trivial scaling with exponents H ≠ 1/2 became known as the Hurst phenomenon or Hurst effect, and the determination of H is the task of Rescaled Range (R/S) analysis.
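The coin-tossing scaling of Eqs. 3-4 can be checked numerically. The following minimal sketch (plain NumPy, synthetic ±1 sequences) compares the average rescaled range with the asymptote (πN/2)^1/2; finite-sample values fall somewhat below that asymptote, as the finite-sample corrections discussed below suggest.

```python
import numpy as np

def rescaled_range(x):
    """R/S of one record: range of the cumulative mean-adjusted sums divided by the std."""
    y = x - x.mean()
    z = np.cumsum(y)
    return (z.max() - z.min()) / x.std()

rng = np.random.default_rng(0)
for n in (64, 256, 1024, 4096):
    rs = np.mean([rescaled_range(rng.choice([-1.0, 1.0], size=n)) for _ in range(200)])
    print(n, round(rs, 2), round(np.sqrt(np.pi * n / 2), 2))   # empirical R/S vs. sqrt(pi*N/2)
```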
Random Processes with Hurst Exponent H ≠ 1/2
As seen (Eq. 3c), the expected rescaled range of a sequence {X_t}, t = 1, …, N, of N ≫ 1 random coin tosses grows as

$$E\left[R^{*}(N)\right] = E\!\left[\frac{R(N)}{S(N)}\right] \propto \left(\frac{\pi}{2}N\right)^{1/2} \qquad (\text{Eq. } 4)$$

and this also holds asymptotically for any sequence of identically and independently distributed (iid) normal variables, $E\!\left[\frac{R}{S}\right] = \sqrt{\frac{\pi}{2}}\,N^{0.5}$ (Korvin 1992: 329). Anis and Lloyd (1976) expressed the exact expected value of the rescaled range for iid normal variables as

$$E\!\left[\frac{R(n)}{S(n)}\right] =
\begin{cases}
\dfrac{\Gamma\!\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n}{2}\right)}\displaystyle\sum_{i=1}^{n-1}\sqrt{\dfrac{n-i}{i}} & \text{for } n \le 340\\[2ex]
\dfrac{1}{\sqrt{n\pi/2}}\displaystyle\sum_{i=1}^{n-1}\sqrt{\dfrac{n-i}{i}} & \text{for } n > 340
\end{cases} \qquad (\text{Eq. } 5\text{a, b})$$

where Γ(x) is the gamma function. Hamed (2007) noted that substituting x = n/2, a = −1/2, b = 0 into the known relation $\lim_{x\to\infty} x^{b-a}\,\frac{\Gamma(x+a)}{\Gamma(x+b)} = 1$ yields $\Gamma\!\left(\frac{n-1}{2}\right)\big/\Gamma\!\left(\frac{n}{2}\right) \approx \left(\frac{2}{n}\right)^{1/2}$ (Eq. 6a) for the first factor on the right-hand side of Eq. (5a), while for the sum the expression

$$\sum_{r=1}^{N-1}\sqrt{\frac{N-r}{r}} \approx \frac{\pi N}{2} \ \text{ for } N \gg 1, \qquad \sum_{r=1}^{N-1}\sqrt{\frac{N-r}{r}} \approx \frac{\pi N}{2} - 1.46\,N^{0.5} \ \text{ for small } N \qquad (\text{Eq. } 6\text{b})$$

gives a better approximation for small N. Substituting Eqs. (6a, b) into Eq. (5), Hamed obtained the improved estimate $E\!\left[\frac{R}{S}\right] \approx \sqrt{\frac{\pi}{2}}\,n^{0.5} - 1.164$ (Eq. 7). There have been many speculations about what kind of processes exhibit the anomalous scaling R/S ∝ N^H with H ≠ 1/2 (Korvin 1992). A feasible model is the fractional Brownian noise (fBn) model of Mandelbrot and Van Ness (1968). They first defined fractional Brownian motion (fBm) of Hurst exponent H, which is a non-stationary self-affine random process

$$B_H(t) = \frac{1}{\Gamma(H+0.5)}\left\{\int_{-\infty}^{0}\left(|t-\lambda|^{H-0.5} - |\lambda|^{H-0.5}\right)\mathrm{d}B(\lambda) + \int_{0}^{t}|t-\lambda|^{H-0.5}\,\mathrm{d}B(\lambda)\right\} \qquad (\text{Eq. } 8\text{a})$$

where B(t) is a Gaussian process with zero mean and variance σ², and proved that the continuous-time process B_H(t) satisfies: (i) its increments are stationary; (ii) B_H(t) is Gaussian, ⟨B_H(t)⟩ = 0, ⟨[B_H(1)]²⟩ = σ² (Eq. 8b); (iii) B_H(t) is statistically invariant (self-affine) under the transformation {t → λt; B_H → λ^H B_H} (Eq. 8c). Self-affinity implies that B_H(λt) − B_H(0) = λ^H {B_H(t) − B_H(0)} (Eq. 8d). (The process B₀.₅(t) is the ordinary Brownian motion.) It is easy to prove that ⟨[B_H(t + T) − B_H(t)]²⟩ ∝ σ²T^{2H} (Eq. 9). Next, they constructed the process X_t^{(H)} from the increments of B_H(t), as X_t^{(H)} = B_H(t) − B_H(t − 1) (Eq. 10), and called it fractional Brownian noise (fBn) with exponent H (Mandelbrot and Van Ness 1968); it is stationary, of zero mean and variance σ². By simple algebra (using Eqs. 8 and 9) the correlation between X_1^{(H)} and X_K^{(H)} is

$$\left\langle X_1^{(H)}\,X_K^{(H)}\right\rangle \propto \sigma^{2} H(2H-1)\,K^{2H-2} \quad \text{if } K \gg 1 \text{ and } H \ne 1/2 \qquad (\text{Eq. } 11\text{a})$$

and $\left\langle X_1^{(H)}\,X_K^{(H)}\right\rangle = 0$ for all K ≠ 1 if H = 1/2 (Eq. 11b). In Eq. (11a) the correlations are always positive, and fall off as a power of K. Equation (11b) (for H = 1/2) describes the uncorrelated white noise. Let us prove that {X_t^{(H)}} = X_t satisfies Hurst's law, R(N)/S(N) ∝ N^H.
If for a process {X_i} we denote by X*(t) the partial sum $X^{*}(t) = \sum_{i=1}^{t} X_i$, then the range of {X_i} can be re-written in terms of X*(t) as

$$R(N) = \max_{0\le t\le N}\left[X^{*}(t) - \frac{t}{N}X^{*}(N)\right] - \min_{0\le t\le N}\left[X^{*}(t) - \frac{t}{N}X^{*}(N)\right] \qquad (\text{Eq. } 12)$$

In particular, for the process X_t^{(H)}, Eq. (10) implies that X*^{(H)}(t) = B_H(t) (Eq. 13). By the affine property, Eq. (8), changing the variable t to τ = t/N in Eq. (12), we obtain

$$R(N) = \max_{0\le \tau\le 1}\left[N^{H}B_H(\tau) - \tau N^{H}B_H(1)\right] - \min_{0\le \tau\le 1}\left[N^{H}B_H(\tau) - \tau N^{H}B_H(1)\right] = \mathrm{const}\cdot N^{H},$$

and because S(N) ∝ σ for the fBn {X_t}, Hurst's rule R(N)/S(N) ∝ N^H (Eq. 14) holds indeed.
The Hurst Exponent H as an Indicator of Long-Term Behavior
Let Z(x) be an fBm function which is only known at the two endpoints x = 0 and x = L of an interval [0, L], and suppose we need to estimate Z(x) inside the interval 0 ≤ x ≤ L ("interpolation") and for values L ≤ x ("extrapolation," "prediction") using the optimal unbiased linear combination Z(x) = λ₁Z(0) + λ₂Z(L) (Mandelbrot and Van Ness 1968; Hewett 1986; Korvin 1992: 336–337). The coefficients λ₁ and λ₂ are found from the system of equations

$$\begin{aligned}
\lambda_2\,\gamma(0,L) + \mu &= \gamma(0,x)\\
\lambda_1\,\gamma(0,L) + \mu &= \gamma(L,x)\\
\lambda_1 + \lambda_2 &= 1
\end{aligned} \qquad (\text{Eq. } 15)$$

where μ is a Lagrange parameter and $\gamma(x,y) = \frac{1}{2}\left\langle|Z(x)-Z(y)|^{2}\right\rangle$ is the semivariogram. By Eq. (9), for an fBm with Hurst exponent H and variance σ² we have $\gamma(x,y) = \frac{1}{2}\sigma^{2}|x-y|^{2H}$ (Eq. 16); the system of Eq. (15) is easily solved and yields the interpolation-extrapolation formula

$$Z(x) = \frac{Z(L)}{2}\left(1 - |1-\chi|^{2H} + \chi^{2H}\right) = Z(L)\cdot Q(\chi), \quad \text{where } \chi = \frac{x}{L} \qquad (\text{Eq. } 17)$$

The extrapolation-interpolation function Q(χ) is shown in Fig. 1 for different values of H. We have Q(0) = 0, Q(1) = 1, and for 0 ≤ χ ≤ 1 the function is almost linear. For χ > 1, note that |1 − χ| = (χ − 1) and $\chi^{2H} - (\chi-1)^{2H} \approx \frac{\mathrm{d}}{\mathrm{d}\chi}\chi^{2H} \propto \chi^{2H-1}$, that is, for x > L the function Z(x) is expected to continue as $Z(x) \propto Z(L)\cdot\left(\frac{x}{L}\right)^{2H-1}$ (Eq. 18). For H = 1/2 the best prediction is the last functional value; for H > 1/2 the trend over the interval will be continued ("persistence"); for H < 1/2 the trend will be reversed and the predicted value tends to the mean over the interval [0, L] ("antipersistence").
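A small sketch of Eqs. 17-18 is given below. It assumes Z(0) = 0, as in the fBm convention used above, and the function names are illustrative only.

```python
import numpy as np

def q_factor(chi, H):
    """Interpolation-extrapolation factor Q(chi), chi = x / L (Eq. 17)."""
    chi = np.asarray(chi, dtype=float)
    return 0.5 * (1.0 - np.abs(1.0 - chi) ** (2 * H) + chi ** (2 * H))

def predict_z(x, L, z_L, H):
    """Best linear unbiased estimate of Z(x), assuming Z(0) = 0 and Z(L) = z_L."""
    return z_L * q_factor(x / L, H)

# Persistence (H > 0.5) vs. antipersistence (H < 0.5) beyond the known interval [0, L]
for H in (0.3, 0.5, 0.8):
    print(H, np.round(q_factor([0.5, 1.0, 2.0, 5.0], H), 3))
```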
Traditional R/S Analysis
Rescaled Range (R/S) analysis is used to find H for a time series. It was invented and widely applied by Hurst (1951) and, in its classic form, consists of the following steps (Korvin 1992). (i) A time series of length N ≫ 1 is divided into a great number of shorter sub-intervals of respective lengths n = n₁, n₂, n₃, … where n₁ = N > n₂ > n₃ > … (Eq. 19a). One way to do this is to select, in turn, n = N, N/2, N/4, …, or, more generally, n = N, N/n, N/n², … (Eq. 19b); (ii) for a partial time series X of length n, {x₁, x₂, …, xₙ}, we calculate its mean $m = \frac{1}{n}\sum_{i=1}^{n} x_i$ and (iii) subtract it from all data to get the mean-adjusted values yᵢ = xᵢ − m (i = 1, 2, …, n); (iv) we compute the cumulative sums $Z_t = \sum_{i=1}^{t} y_i$ (t = 1, 2, …, n); (v) calculate the range R(n) = max(Z₁, Z₂, …, Zₙ) − min(Z₁, Z₂, …, Zₙ); (vi) determine the standard deviation $S(n) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^{2}}$; (vii) the process is repeated for each sub-period having the same length n, yielding the average rescaled range R(n)/S(n) over all sub-intervals of length n; (viii) the Hurst exponent is estimated by fitting the power law

$$\frac{R(n)}{S(n)} = C\cdot n^{H} \qquad (\text{Eq. } 20)$$

to the data. Least squares regression on the logarithms of each side of Eq. (20) yields

$$\log\frac{R(n)}{S(n)} \approx \log C + H\log n \qquad (\text{Eq. } 21\text{a})$$

where the sign "≈" of approximate equality refers to the problematic issues and inevitable errors of logarithmic fitting. (ix) If the lengths of the sub-intervals are powers of some integer n, as in Eq. (19b), it is recommended (Krištoufek 2010) to use the base-n logarithm, that is,

$$\log_{n}\frac{R(n)}{S(n)} \approx \log_{n} C + H\log_{n} n \qquad (\text{Eq. } 21\text{b})$$
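A minimal NumPy implementation of the classic procedure above (non-overlapping sub-periods, repeatedly halved lengths, log-log fit) might look as follows; it is a sketch for illustration, not an optimized or bias-corrected estimator.

```python
import numpy as np

def hurst_rs(x, min_n=8):
    """Classic R/S analysis of a time series; returns the estimated Hurst exponent H."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    ns, rs_means = [], []
    n = N
    while n >= min_n:
        rs_vals = []
        for start in range(0, N - n + 1, n):           # non-overlapping sub-periods
            seg = x[start:start + n]
            y = seg - seg.mean()                        # mean-adjusted values
            z = np.cumsum(y)                            # cumulative sums
            r = z.max() - z.min()                       # range
            s = seg.std()                               # standard deviation
            if s > 0:
                rs_vals.append(r / s)
        ns.append(n)
        rs_means.append(np.mean(rs_vals))
        n //= 2                                         # n = N, N/2, N/4, ...
    H, logC = np.polyfit(np.log(ns), np.log(rs_means), 1)   # fit of Eq. 21a
    return H

rng = np.random.default_rng(0)
print(hurst_rs(rng.standard_normal(4096)))   # white noise: H should be near 0.5 (biased slightly high)
```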
Recent Improvements and Variants
Rescaled Range Analysis, Fig. 1 The interpolation-extrapolation function Q(χ) = Q(x/L) for different values of H. (From Korvin (1992: 337, Fig. 5.3, after Hewett 1986))
Detrended Hurst Analysis
Detrended Hurst analysis, also called Detrended Fluctuation Analysis (DFA) (Bassingthwaighte and Raymond 1994; Krištoufek 2010), uses a different measure of dispersion – squared fluctuations around the trend of the signal, rather than around the local mean, as in ordinary R/S analysis. Because the sub-periods are de-trended, the method works for non-stationary time series, contrary to R/S. The procedure is labeled DFA-0, DFA-1, or DFA-2 according to the order of the polynomials fitting the trend; the polynomials are seldom higher than second order (Krištoufek 2010). The procedure starts as in R/S: the whole series is divided into sub-periods of length n (with increasing lengths n); then for each sub-period Pᵢ = {x₁, …, xₙ} a polynomial fit $P^{(i)}_{n,l}(t)$ is established (where l is the order of the polynomial and n the length of the sub-period). The de-trended signal

$$Y^{(i)}_{n,l}(t) = x_t - P^{(i)}_{n,l}(t), \quad t = 1, 2, \ldots, n \qquad (\text{Eq. } 22)$$

has a fluctuation

$$F^{2}_{(i)}(n,l) = \frac{1}{n}\sum_{t=1}^{n}\left[Y^{(i)}_{n,l}(t)\right]^{2} \qquad (\text{Eq. } 23)$$

We do not divide the fluctuations by the variance but take their average over all sub-periods Pᵢ of length n, in order to get F²_DFA(n, l), which scales as

$$F^{2}_{\mathrm{DFA}}(n,l) \sim n^{2H(l)} \qquad (\text{Eq. } 24)$$

The argument "l" in the estimated H indicates that it might depend on the degree of the fitting polynomial. Finally, an ordinary least squares regression on the logarithms of both sides of Eq. (24) yields H(l) as the slope of the straight line

$$\log F_{\mathrm{DFA}}(n,l) \approx \log c + H(l)\log n \qquad (\text{Eq. } 25)$$

The Structure Function Method
The Structure Function Method utilizes the fact that the qth-order structure functions S_q(τ) = ⟨|W(tᵢ + τ) − W(tᵢ)|^q⟩ of processes W(t) showing the Hurst effect scale as S_q(τ) ~ C_q · τ^{qH(q)} (Eq. 26). The exponent H(q) is determined by plotting S_q(τ) vs. τ on a log-log plot and finding the slope qH(q). Gilmore et al. (2002), in their analysis of plasma turbulence, eliminated the effect of q on H(q) by using q = 0.5, 1.0, 1.5, …, 5.0 and averaging the H(q) exponents.

A Modified Scaling Law
Hamed (2007) noted that traditional R/S analysis gives biased estimates of the Hurst exponent for finite samples, and computed more precisely the exact expression of Anis and Lloyd (1976) for the expected rescaled range of independent normal summands (Eq. 5a, b). He derived, instead of the customary asymptotic formula $E\!\left[\frac{R}{S}\right] = \sqrt{\frac{\pi}{2}}\,N^{0.5}$, the improved scaling law $E\!\left[\frac{R}{S}\right] \approx \sqrt{\frac{\pi}{2}}\,n^{0.5} - 1.164$. Inspired by this equation, he proposed that the R(n)/S(n) data should be non-linearly fitted as

$$E\!\left[\frac{R(n)}{S(n)}\right] = a\,n^{H} + b \qquad (\text{Eq. } 27)$$

He applied this to a large set of data and found: (a) for the annual mean temperature data from 56 Historical Climatological Network (HCN) stations in the Midwest USA, the proposed estimator gave an average estimated Hurst coefficient of 0.645 compared with 0.725 for the old method (−11% change); (b) for annual rainfall data from 60 HCN stations in the Midwest USA it gave H = 0.568 instead of 0.650 (−13% change); (c) for annual river flow data from 49 U.S. Geological Survey (USGS) river stations in the Midwest USA, H = 0.631 as compared to 0.621 (+2% change); and (d) for tree-ring data from
88 locations over the world, H = 0.596 increased to 0.639 (+7% change).

Anis-Lloyd Corrected R/S Hurst Exponent
Anis and Lloyd (1976) derived different theoretical expressions for the expected rescaled range of independent normal summands for small and large values of n (Eq. 5a, b). The Anis-Lloyd corrected R/S Hurst exponent is calculated as 0.5 plus the slope of $\frac{R(n)}{S(n)} - E\!\left[\frac{R(n)}{S(n)}\right]$. A slight modification of the original Anis-Lloyd formula by a factor (n − 1/2)/n,

$$E\!\left[\frac{R(n)}{S(n)}\right] =
\begin{cases}
\dfrac{n-\tfrac12}{n}\cdot\dfrac{\Gamma\!\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n}{2}\right)}\displaystyle\sum_{i=1}^{n-1}\sqrt{\dfrac{n-i}{i}} & \text{for } n \le 340\\[2ex]
\dfrac{n-\tfrac12}{n}\cdot\dfrac{1}{\sqrt{n\pi/2}}\displaystyle\sum_{i=1}^{n-1}\sqrt{\dfrac{n-i}{i}} & \text{for } n > 340
\end{cases} \qquad (\text{Eq. } 28)$$
gives further improvements for small n (Weron 2002).

Estimating H from the Power Spectrum
The method (Weron 2002) utilizes the fact that (by Eq. 11) an fBn with exponent H has a power spectrum G(f) ∝ f^{−2H+1} = f^{−β} (Eq. 29) with β = 2H − 1 (Eq. 30) (see Korvin 1992: 341, Eq. 5.1.2.2), where f is frequency. If P(f) is the power spectrum of a fractal process, its Hurst exponent H is obtained from the log P(f) versus log f plot: a straight line with a negative slope (−β) is obtained for a self-affine process, and the Hurst exponent is found as H = (β + 1)/2 (Eq. 31). A nice example is Hewett's study (1986; Korvin 1992: 333, 335), who analyzed the porosity values derived from a gamma-gamma ("density") log through 1000 feet of sandstone (Fig. 2a). R/S analysis of the porosity log resulted in H = 0.85. As an independent check, Hewett computed the power spectrum of the log and established a fall-off with frequency ∝ f^{−β}, with β = 0.71 ≈ 2H − 1. Note that the relation (Eq. 31) had been rigorously proved for fBn processes only (Mandelbrot and Van Ness 1968), and is not necessarily valid for an arbitrary time series.
Rescaled Range Analysis, Fig. 2 (a) Density-derived porosity log through a sandstone formation, depth in feet; (b) its R/S analysis, dashed line shows the H = 1/2 case. (From Korvin (1992: 333, Fig. 5.2 d1 and d2, after Hewett 1986))
Periodogram Regression
This is another spectral method, which estimates the so-called fractional differencing parameter d from the periodogram and then obtains H from the relation H = d + 0.5 (Eq. 32), established for a fractional Gaussian noise with Hurst exponent H (Weron 2002). For the data {X₁, X₂, …, X_L} the periodogram (the analogue of the spectral density for sampled data) is computed as

$$P_L(\omega_k) = \frac{1}{L}\left|\sum_{t=1}^{L} x_t\,e^{2\pi j (t-1)\omega_k}\right|^{2}; \quad j = \sqrt{-1};\ \omega_k = \frac{k}{L};\ k = 1, \ldots, L/2 \qquad (\text{Eq. } 33)$$

then we run a linear regression

$$\log\left[P_L(\omega_k)\right] \approx a - d\cdot\log\left\{4\sin^{2}(\omega_k/2)\right\} \qquad (\text{Eq. } 34)$$

at sufficiently low frequencies ω_k, k = 1, 2, …, K ≤ L/2 (Eq. 35), for which the logarithmic slope is approximately linear. The choice of K in Eq. (35) is crucial; in practice L^{0.2} ≤ K ≤ L^{0.5}. Apparently, this is the only method with known asymptotic properties for finding H (Weron 2002: 289).
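A sketch of this periodogram-regression (GPH-type) estimator is given below. It assumes the common convention of angular Fourier frequencies ω_k = 2πk/L when forming 4sin²(ω_k/2), and it fixes K = L^0.5 low frequencies; both choices are assumptions within the ranges discussed above.

```python
import numpy as np

def hurst_periodogram(x, power=0.5):
    """Periodogram regression: estimate d from the low-frequency periodogram, then H = d + 0.5."""
    x = np.asarray(x, dtype=float)
    L = len(x)
    K = int(L ** power)                               # number of low frequencies used
    k = np.arange(1, K + 1)
    w = 2.0 * np.pi * k / L                           # angular Fourier frequencies (assumption)
    fft_vals = np.fft.fft(x - x.mean())
    P = (np.abs(fft_vals[1:K + 1]) ** 2) / L          # periodogram at the K lowest frequencies
    slope, _ = np.polyfit(np.log(4.0 * np.sin(w / 2.0) ** 2), np.log(P), 1)
    d = -slope                                        # regression of Eq. 34
    return d + 0.5                                    # Eq. 32

rng = np.random.default_rng(1)
print(hurst_periodogram(rng.standard_normal(4096)))   # white noise: expect H close to 0.5
```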
Practical Issues
There is a consensus among practitioners (Bassingthwaighte and Raymond 1994; Gilmore et al. 2002; Hamed 2007; Krištoufek 2010) that if the data are of short duration, contain gaps or bursts of noise, or are non-stationary, then the calculation of H by R/S analysis is not reliable. When tested on computer-generated fBn signals of known H, it has been found that R/S gives biased estimates of H (too low for H > 0.72 and too high for H < 0.72) and that the method has poor convergence properties: it requires about 2000 points for 5% accuracy and 200 points for 10% accuracy (Bassingthwaighte and Raymond 1994). According to Gilmore et al. (2002), the R/S method accurately determines H in plasma fluctuation data provided a long enough record is used, but the lags must be greater than five to ten times the autocorrelation time τ_L; they claim the structure function only requires lags of about one autocorrelation time τ_L. (For a random process X(t) the autocorrelation time is the lag beyond which the autocorrelation function falls off to (1/e) times its peak value, |R_XX(τ)| ≤ (1/e) R_XX(0) for all τ ≥ τ_L.) Is it important to remove local trends in R/S analysis? Hurst (1951) himself applied trend correction. Bassingthwaighte and Raymond (1994), who tested the performance of R/S analysis on fractional Brownian noise of known H, reported that the locally trend-corrected method gives better estimates of H when H ≥ 0.5, that is, for persistent signals with positive correlations between neighboring values.
Concluding Remarks
Since it was introduced by Hurst (1951), Rescaled Range (R/S) analysis has been used to find long-term dependence in hydrologic and geophysical records and other real-world data. R/S analysis provides the Hurst exponent, which characterizes long-term correlations in stochastic processes. There are persistent (H > 0.5) and antipersistent (H < 0.5) processes; for white noise H = 0.5. Natural processes with Hurst exponent H can be modeled by fractional Brownian noise (fBn) of Hurst exponent H. There are many variants of R/S analysis; here we discussed (i) traditional R/S analysis; (ii) Detrended Fluctuation Analysis (DFA); (iii) a modified scaling law; (iv) the Anis-Lloyd corrected R/S Hurst exponent; (v) estimating H from the power spectrum; and (vi) periodogram regression. Finding their relative merits, speeding up their convergence, and determining their asymptotic properties requires future research.
Cross-References ▶ Allometric Power Laws ▶ Autocorrelation ▶ Computational Geoscience ▶ Constrained Optimization ▶ Exploratory Data Analysis ▶ Geocomputing ▶ Geomathematics ▶ Geostatistics ▶ Hurst Exponent ▶ Interpolation ▶ Kriging ▶ Mandelbrot, Benoit B. ▶ Mathematical Geosciences ▶ Porosity ▶ Scaling and Scale Invariance ▶ Signal Analysis ▶ Stationarity ▶ Time Series Analysis ▶ Time Series Analysis in the Geosciences
Acknowledgments Dedicated to the memory of our Master, Benoit Mandelbrot (1924–2010), whose book has changed my life.

Bibliography
Anis AA, Lloyd EH (1976) The expected value of the adjusted rescaled range of independent normal summands. Biometrika 63(1):111–116
Bassingthwaighte JB, Raymond GM (1994) Evaluating rescaled range analysis for time series. Ann Biomed Eng 22:432–444
Gilmore M, Yu CX, Rhodes TL, Peebles WA (2002) Investigation of rescaled range analysis, the Hurst exponent, and long-time correlations in plasma turbulence. Phys Plasmas 9(4):1312–1317
Hamed KH (2007) Improved finite-sample Hurst exponent estimates using rescaled range analysis. Water Resour Res 43:W04413
Hewett TA (1986) Fractal distributions of reservoir heterogeneity and their influence on fluid transport. In: SPE annual technical conference and exhibition, New Orleans, October 1986. Paper number: SPE-15386
Hurst HE (1951) Long-term storage capacity of reservoirs. Trans Am Soc Civ Eng 116:770–799
Korvin G (1992) Fractal models in the Earth sciences. Elsevier, Amsterdam
Krištoufek L (2010) Rescaled range analysis and detrended fluctuation analysis: finite sample properties and confidence intervals. Czech Econ Rev 4:236–250
Mandelbrot BB, Van Ness JW (1968) Fractional Brownian motions, fractional noises and applications. SIAM Rev 10(4):422–437
Weron R (2002) Estimating long-range dependence: finite sample properties and confidence intervals. Phys A 312:285–299

Reyment, Richard A.
Peter Bengtson
Institute of Earth Sciences, Heidelberg University, Heidelberg, Germany

Reyment, Richard A., Fig. 1 Richard A. Reyment (Courtesy of Professor Reyment's daughter Britt-Louise Kinnefors)
Richard Arthur Reyment (1926–2016) was born in Coburg, a suburb of Melbourne, Australia, in a family with roots in England, Ireland, Sweden, and Spain. After obtaining his B.Sc. from the University of Melbourne in 1948, he emigrated to Sweden. He started his scientific career as a geologist at the Geological Survey of Nigeria (1950–1956), followed by a senior lectureship at Stockholm University, Sweden (1956–1962). He was then appointed Professor of Petroleum Geology at the University of Ibadan, Nigeria (1963–1965), Associate Professor of Biometry at Stockholm University (1965–1967), and finally Professor and Chair of Historical Geology and Palaeontology at Uppsala University, Sweden (1967–1991). After retirement, he continued his research as Emeritus Professor at the Swedish Museum of Natural History in Stockholm. In 1955 Reyment obtained his M.Sc. from the University of Melbourne on the subject of Cretaceous mollusks of Nigeria and shortly thereafter his Ph.D. from Stockholm University on the stratigraphy and palaeontology of the Cretaceous of Nigeria and the Cameroons. During his time as senior
lecturer at Stockholm University, his research focused mainly on micropalaeontology. In 1958/59 the Finnish palaeontologist Björn Kurtén visited Stockholm and introduced him to the use of statistical methods. This marked the beginning of what was to become his major field of research, biometric applications in palaeontology and related sciences. In 1960/ 61 he spent a period as research associate in the Soviet Union, where he was influenced by Andrei Borisovich Vistelius, generally regarded as the founder of the science of mathematical geology. In 1966/67 he spent a period at the Kansas Geological Survey (University of Kansas), USA, an institution with a strong record in quantitative geology, for example, through Robert R. Sokal, Daniel F. Merriam, F. James Rohlf, and Roger L. Kaesler. In the mid-1960s Reyment headed the Subcommittee for the Application of Biometrics in Paleontology of the International Paleontological Union, and from 1968 to 1974 he was a member of the IUGS Committee on Storage, Automatic Processing and Retrieval of Geological Data (COGEODATA). At the 1968 International Geological Congress in Prague, the International Association for Mathematical Geology (IAMG) was born, with Reyment as prime mover. He was the first Secretary General of the Association 1968–1972 and President 1972–1976. Together with Daniel F. Merriam he founded the journal Computers & Geosciences, launched in 1975. In 1967 Reyment was appointed to the Chair of Historical Geology and Palaeontology at Uppsala University. Besides teaching and supervising students, he continued his research and carried out extensive fieldwork. Important achievements during his Uppsala years were the IGCP Project “MidCretaceous Events” and the launch of the journal Cretaceous Research. Reyment was a highly versatile scientist. He published articles and books on several fossil groups (mainly ammonites, ostracods, and foraminifers), regional geology, stratigraphy, biogeography, dispersal of organisms, palaeoecology, evolution, sexual dimorphism, actuopalaeontology, sedimentology, random events in time (volcanic eruptions and earthquakes), geomagnetism, geochemistry, and history of geology. The majority of his publications concerned the Jurassic, Cretaceous, or Tertiary. Other publications dealt with mathematical methods and topics outside the geosciences, such as biology, biochemistry, genetics, serological analysis, linguistics, Romanis, and Spanish Moors. Remarkably enough, he also found time for his many hobbies and interests, such as music (clarinet and oboe), squash and tennis, family genealogy, history, and, not least, languages. He spoke fluent English, Swedish, French, German, and Spanish and had a good reading knowledge of Latin, Portuguese, Russian, and Afrikaans. He also studied Japanese, Yoruba, Finnish, and Romani.
Reyment was an influential pioneer in the application of mathematical methods in the geosciences. His impact is reflected in the numerous citations in the Dictionary of Mathematical Geosciences (Howarth 2017), found under the topics asymmetry analysis, biostratigraphic zonation, biplot, canonical correlation, cross-validation, discriminant analysis, eigenanalysis, heteroscedacity, kurtosis, latent root, morphometrics, multivariate normal distribution, pattern classification, peakedness, point event, point process, positive definite matrix, principal component analysis, quantitative paleoecology, robust estimate, scores, serial correlation coefficient, singular value, slotting, statistical zap, survival function, and zap. For additional information about Richard Reyment’s activities and achievements, see Birks (1985), Merriam (2004), Merriam and Howarth (2004), Howarth (2017), and Sagar et al. (2018), and Bengtson (2018) for a general biography including a list of his publications.
Bibliography
Bengtson P (2018) Richard A. Reyment (1926–2016) – Ammonitologist sensu latissimo and founder of Cretaceous Research. Cretaceous Res 88:5–35
Birks HJB (1985) Recent and possible future mathematical developments in quantitative palaeoecology. Palaeogeogr Palaeoclimatol Palaeoecol 50:107–147
Howarth RJ (2017) Dictionary of mathematical geosciences. Springer, Cham. xvi + 893 pp
Merriam DF (2004) Richard Arthur Reyment: father of the International Association for Mathematical Geology. Earth Sci Hist 23:365–373
Merriam DF, Howarth RJ (2004) Pioneers in mathematical geology. Earth Sci Hist 23:314–324
Sagar BSD, Cheng Q, Agterberg F (eds) (2018) Handbook of mathematical geosciences. Springer, Cham. xxviii + 914 pp
R-Mode Factor Analysis
Norman MacLeod
Department of Earth Sciences and Engineering, Nanjing University, Nanjing, Jiangsu, China
Definition
R-Mode Factor Analysis – a statistical modeling procedure whose purpose is to find a small set of latent linear variables that estimate the influence of the factors controlling the inter-variable covariance or correlation structure in a multivariate dataset and are optimized to preserve as much of that structure as appropriate given an estimate of the number of causal influences.
Introduction
Factor analysis (FA) is often regarded as a simple variant of principal components analysis (PCA). Indeed, in some quarters, FA has a somewhat nefarious reputation due to the nature of its statistical model and due to its overenthusiastic use in the field of psychology, especially the branch that deals with intelligence testing (see Gould 1981).

Historical Background
Around 1900 Charles Spearman noticed that scores on different academic subject tests often exhibited an intriguing regularity. Consider the following correlation matrix of test scores (Table 1).

R-Mode Factor Analysis, Table 1 Correlation matrix of academic test scores that exhibit a classic factor structure. (From Manly 1994)

              Classics  French  English  Mathematics  Music
Classics      1.00      0.83    0.78     0.70         0.63
French        0.83      1.00    0.67     0.67         0.57
English       0.78      0.67    1.00     0.64         0.51
Mathematics   0.70      0.67    0.64     1.00         0.51
Music         0.63      0.57    0.51     0.51         1.00

If the diagonal values are ignored, in many cases the ratio between paired variable values approximates a constant. Thus, for English and Mathematics:

0.78/0.70 = 1.114   0.67/0.67 = 1.000   0.51/0.51 = 1.000

and for French and Music:

0.83/0.63 = 1.317   0.67/0.51 = 1.314   0.67/0.51 = 1.314

Spearman (1904) suggested the most appropriate model for such data had the following generalized form:

$$X_j = a_{j,1} F_1 + e_j \qquad (1)$$

In this expression, a_j was a standardized score for the jth test, F was a "factor" value for the individual taking the battery of tests, and e_j was the part of X_j specific to each test. Because Spearman was dealing with the matrix of correlations, both X and F were standardized, with means of zero and standard deviations of one. This fact was taken advantage of in Spearman's other FA insight, that a_j² is the proportion of the variance accounted for by the factor. Using these relations Spearman proposed that mental abilities could be deduced from the scores on a battery of different tests and were composed of two parts, one (F) that was common to all tests and so reflected the innate abilities – or "general intelligence" – and one (e) that was specific to each test and so could be interpreted as evidence of subject-specific aptitude, culture-based bias in the test, or some combination of the two.¹ Generalizing the relations presented above, the model underlying FA has the following form:

$$\begin{aligned}
X_1 &= a_{1,1}F_1 + a_{1,2}F_2 + \cdots + a_{1,m}F_m + e_1\\
X_2 &= a_{2,1}F_1 + a_{2,2}F_2 + \cdots + a_{2,m}F_m + e_2\\
X_3 &= a_{3,1}F_1 + a_{3,2}F_2 + \cdots + a_{3,m}F_m + e_3\\
&\ \ \vdots\\
X_j &= a_{j,1}F_1 + a_{j,2}F_2 + \cdots + a_{j,m}F_m + e_j
\end{aligned} \qquad (2)$$

In this expression, a_j is a standardized value of the jth variable, F is the "factor" value for that variable across all measured or observed cases, and e_j is the part of the correlation not accounted for by the factor structure. R-mode factor analysis attempts to find a set of m linear factors (F, where m < j) that underlie the expression of the sample's covariance or correlation structure, hopefully one that provides a good estimate of the structure of the larger population from which the sample was drawn. Note the similarity between this model and the multiple regression analysis model. The creators of factor analysis viewed it originally as a type of multiple regression analysis. But unlike a standard regression problem, in which a set of observations (x) is related to an underlying general model (y) by specifying a rate of change (the slope β), one cannot observe any aspect of the factor(s) (F) directly. As a result, their structure must be inferred from the raw data, which represent an unknown mixture of common and unique variances. Standard PCA also cannot resolve this problem. The PCA model is simply a transformation that re-expresses the observed data in a form that maximizes variance on the various eigenvectors and ensures their orientational independence from one another. This approach focuses on accounting for variance and keeps extracting vectors until the remaining unaccounted, or residual, variance is either exhausted or drops below some arbitrarily determined threshold. Factor analysis does not try to account for the variable variance values exhibited by a sample, but instead focuses on recovering the structure of the covariances or correlations among the variables. In other words, PCA attempts to reduce the residual values of the diagonal or trace of the covariance or correlation matrix, whereas factor analysis attempts to model the covariances or correlations in such a way as to minimize the residual values of the matrix's off-diagonal elements. Among the many myths about FA is the idea that it only differs from PCA in the sense that the number of factors extracted from the covariance or correlation matrix in FA is less than the number of variables, whereas in PCA the number of components extracted equals the number of variables. Nothing could be further from the truth. The principal use of PCA is to reduce the dimensionality of a dataset from p variables to m components, where m < p. Factor analysis attempts to infer both the number and the effects of the influences responsible for the variance structure. The erroneous view of equivalence in the relation between PCA and FA has been perpetuated by the innumerable over-generalized and/or superficial descriptions of both techniques, especially in the user's guides of commercial multivariate data-analysis software packages, in which these methods are often discussed together owing to the use of eigenanalysis (or singular value decomposition) in their calculation.

¹ This seemingly straightforward interpretation of Spearman's work later became highly controversial when estimates of a person's general intelligence were used to direct or limit their life opportunities and when the culturally biased character of many standard "intelligence" tests was recognized (see Gould 1981 for a review).
Choosing and Extracting the Factors
There are a number of ways to perform a classic R-mode FA. The method presented below would not be accepted by many as "true" FA, especially as it is practiced in the social sciences. However, I will recount a basic version of the component-based procedure referred to as FA by most earth scientists (e.g., Jöreskog et al. 1976; Reyment and Jöreskog 1993; Manly 1994; Davis 2002). Factor analysis begins with a PCA. In addition to a set of data, FA assumes the researcher has a level of understanding of the processes responsible for the data's variance structure sufficient to estimate how many independent factors are involved. This number is required as a controlling parameter. In Spearman's case, he suspected a one-factor solution, the one-factor model quantifying his "general intelligence" index with the remaining test-specific variation being subsumed into the error term. For morphological data (see MacLeod
2005 for an example) where distance measurements have been collected from a set of fossil morphologies, the generalized factors might refer to "size" and "shape." Regardless of the number of factors specified, all applications of FA assume the existence of both "generalized effect" factors and a specific-effect deviation. It is often the case that one has little idea beforehand how many factors are either present, or of interest, in the data. Fortunately, FA, like PCA, can be used in an exploratory mode. There are a number of FA rules of thumb that can be of assistance in making decisions regarding how many factor axes to specify. If you are working with a correlation matrix, one of the most popular and easiest to remember is to set the number of factors equal to the number of PC eigenvectors that contain more information than the raw variables. This makes sense in that these axes are easy to identify (all will have eigenvalues >1.0) and there seems little logical reason to consider composite variables that contain less information than their raw counterparts. Another popular rule of thumb is the so-called scree test, in which the magnitudes of the eigenvalues are plotted in rank order and the decision is made on the basis of where the slope of the resulting curve breaks from being dominantly vertical to dominantly horizontal. The other quick and easy test to apply when considering how many factors to extract, and indeed whether factor analysis is appropriate, is Spearman's constant proportion test (see above). If a large number of the ratios between the non-diagonal row values of the correlation matrix do approximate a constant, the data exhibit the classic structure assumed by the factor analysis model and the appropriate number of factors will be the number of approximately constant values. Regardless, it is important to remember that, if there is uncertainty regarding the number of factors controlling the dataset's variance structure, FA may proceed but should be interpreted with caution. Unlike classical PCA, factor analysis scales the loadings of the retained components by their eigenvalues, according to the following expression:

$$a_{j,m} = \sqrt{\lambda_m}\; b_{j,m} \qquad (3)$$

Here, λ_m is the eigenvalue associated with the mth principal component, with m being set to the number of retained factors, and b_{j,m} is the eigenvector loading of the jth variable on the mth principal component. This scaling, along with the reduction in the number of components (= factors) being considered, represents a basic computational difference between PCA and FA.
R-Mode Factor Analysis, Table 2 Principal component loadings, factor loadings, and factor communalities for the example trilobite morphometric variables

Parameter/variable   PC 1            PC 2           PC 3           Factor 1        Factor 2       Communality
Eigenvalue           2.776 (92.54%)  0.142 (4.72%)  0.082 (2.74%)  2.776 (92.54%)  0.142 (4.72%)
Body length          0.573           0.757          0.314          0.954           0.285          0.992
Glabella length      0.583           0.108          0.805          0.972           0.041          0.947
Glabella width       0.576           0.644          0.504          0.959           0.242          0.979
Next, the "communality" values of the m scaled factors are calculated as the sums of squares of the factor loading values; these summarize the proportion of variance contributed by each variable to the retained factors. The quantity "1 minus the communality of each variable" then expresses the proportion of the variable's variance attributable to the error term (e) of the factor model. If the correct number of factors has been chosen, all variables should exhibit a high communality and a low residual error. Once the factor equations have been determined, the original data can be projected into the factor space. The resulting factor scores ($\hat{X}$) may then be tabulated, plotted, inspected, and interpreted in the manner normal for PCA ordinations. Table 2 and Fig. 1 compare and contrast the results of a PCA and a two-factor FA for the trilobite data listed in MacLeod (2005). For this simple dataset, both the first principal component and the first factor would be interpreted as being consistent with generalized (allometric) size variation owing to their uniformly positive loading coefficients of varying magnitudes. Similarly, for both analyses, the second component/factor would be interpreted as a localized shape factor owing to the contrast between body length and the glabellar variables. While the relative positions of the projected data points are also similar, and discontinuities in the gross form distributions are evident in both plots, the scaling of the PCA and FA axes differs, reflecting the additional eigenvalue-based scaling that is part of component-based FA. Nevertheless, an important change has taken place in the ability of the FA vectors to
R-Mode Factor Analysis, Fig. 1 Ordination spaces formed by the first two principal components (a) and a two-factor extraction (b) from the three-variable trilobite dataset. Note the similarity of these results in
terms of the relative positions of forms projected into the PCA and factor spaces and the difference in terms of the axis scales
R-Mode Factor Analysis, Table 3 Original correlation matrix, correlation matrix reproduced on the basis of the first two principal components, and correlation matrix reproduced by the two-factor FA solution. Note the bottom two matrices are based on the component/loading values shown in Table 2

                                      Body length   Glabella length   Glabella width
Original correlation matrix
  Body length                         1.000         0.895             0.859
  Glabella length                     0.895         1.000             0.909
  Glabella width                      0.859         0.909             1.000
Reproduced correlation matrix (PCA)
  Body length                         0.902         0.253             0.158
  Glabella length                     0.253         0.352             0.405
  Glabella width                      0.158         0.405             0.746
Reproduced correlation matrix (FA)
  Body length                         0.992         0.916             0.847
  Glabella length                     0.916         0.947             0.943
  Glabella width                      0.847         0.943             0.979
R-Mode Factor Analysis, Fig. 2 Original (a) and rotated (b) variable-axis orientations with respect to the fixed, orthogonal factor axis orientation for the three-variable trilobite data. The rotation of these variables is centered on Factor 1 owing to their high mutual correlations. A summary of the (minor) projected data displacements arising as a result of varimax axis rotation (c). More complex datasets will typically exhibit larger variable-vector rotations
represent the structure of the original correlation matrix (Table 3). Here it is evident that the two-factor PCA result focuses on reproducing the values along the trace of the original covariance/correlation matrix, whereas the FA result focuses on reproducing the entire structure of the covariance/correlation
matrix. If all three principal components had been used in these calculations, only the trace of the original correlation matrix would be reproduced exactly, whereas if a three-factor solution had been calculated, the original correlation matrix would have been reproduced exactly (except for rounding error). In this sense then, FA represents a more information-
rich analysis than PCA, especially if a dramatic reduction in dimensionality is required. The validity of applying the FA model to a dataset is predicated, however, on the dataset exhibiting a factor-based variance structure.
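The reproduced correlations in Table 3 can be checked directly from the factor loadings: for an m-factor model the implied correlation matrix is the loadings matrix times its transpose. In the sketch below the signs of the Factor 2 loadings, which are not shown in Table 2, are an assumption chosen so that body length contrasts with the glabellar variables, as described in the text.

```python
import numpy as np

# Two-factor loadings from Table 2; the Factor 2 signs are assumed (contrast between
# body length and the glabellar variables), since Table 2 prints unsigned values.
A = np.array([[0.954,  0.285],
              [0.972, -0.041],
              [0.959, -0.242]])

R_hat = A @ A.T                      # correlations implied by the two-factor model
print(np.round(R_hat, 3))            # off-diagonal values approximate the FA block of Table 3
print(np.round(np.diag(R_hat), 3))   # diagonal values approximate the communalities
```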
Factor Rotation
In addition to the scaling of the factor loadings, the other aspect of a classic component-based FA that differs from standard PCA is factor rotation. As can be seen from Fig. 1, the factor-scaling calculation does not alter the orientation of the factor axes with respect to the original variable axes. However, since the goal of a FA is to "look beyond" the data to hand in order to discover the structure of the influences responsible for observed patterns of variation, strict geometric conformance between the original data and the factor model is regarded by many factor analysts as being beside the point. Thus, a justification is provided to alter the orientations of the factor axes relative to the original variable axes (or vice versa) in order to make them easier to interpret in terms of the original variables. Two general approaches to factor rotation are popular: orthogonal (in which the orthogonality of the factor-axes system is maintained) and oblique (in which the factor axis-orthogonality constraint is relaxed). In both cases, the goal of factor rotation is to align the factor axes with the covariance/correlation structure of the variable set to the maximum extent possible. Orthogonal axis rotation (usually accomplished using Kaiser's (1958) varimax procedure) performs this operation by iteratively changing the orientation of the factor-axis system two factors at a time until the squares of the variances are either maximized or minimized across all factors. Operationally this amounts to adjusting the set of factor loadings, rigidly in a geometric sense, until all variables exhibit loadings at, or as close as possible to, 0.0 or 1.0. Figure 2 illustrates the unrotated and varimax-rotated solutions for the example trilobite dataset. Obviously, in this simple example, the variable vectors were already quite well aligned with factor 1; however, a small adjustment was necessary to achieve a fully optimized result. In higher-dimensional datasets, the angle of varimax-optimized factor rotation can be quite large (see Davis 2002 for an example). A final word about the calculation of the factor scores that project the original data into the space of the factor axes. The PCA method accomplishes this by postmultiplying the original data values (X) by the component loadings (A). The factor model differs from the principal component model, however, in that the former subdivides the variance structure into common and unique factors (see above). Thus, true factor scores ($\hat{X}$)
cannot be calculated in the same way as principal component scores because the data matrix contains contributions from both the common and unique factors. In order to obtain the true factor scores, the raw (or standardized) data must be normalized for the unique factor's influence. Fortunately, this can be accomplished with minimal effort by postmultiplying the data matrix by the inverse of the covariance/correlation matrix ($S^{-1}$), prior to the factor loading postmultiplication:

$$\hat{X} = X\,S^{-1} A_F \qquad (4)$$
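As an illustration of the two score calculations just described, the following minimal Python/NumPy sketch contrasts principal component scores (data postmultiplied by the loadings) with factor scores obtained by first postmultiplying by the inverse of the covariance/correlation matrix. The array names X, A, and A_F are illustrative placeholders, not notation taken from the original sources.

```python
import numpy as np

def pca_scores(X, A):
    """Principal component scores: the (standardized) data matrix X
    postmultiplied by the component loadings A."""
    return X @ A

def factor_scores(X, A_F):
    """Factor scores as described in the text: postmultiply X by the inverse
    of its covariance/correlation matrix S, then by the factor loadings A_F."""
    S = np.cov(X, rowvar=False)          # use np.corrcoef for standardized data
    return X @ np.linalg.inv(S) @ A_F
```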
Conclusion
R-mode factor analysis has a long and distinguished history in the earth sciences and has played a key data-analytic role in advancing many earth-science research programs (e.g., Jöreskog et al. 1976; Sepkoski 1981; Reyment and Jöreskog 1993). Recently, Bookstein (2017) has advocated its use in the context of general morphometric analysis as a more path-analytic approach to understanding the influences that underlie shape variation. It is to be hoped that this and (perhaps) other applications of FA in novel data-analytic contexts will lead to the wider use of FA in geological, oceanographic, biological, meteorological, and cryological contexts.
Cross-References ▶ Correlation and Scaling ▶ Eigenvalues and Eigenvectors ▶ Morphometry ▶ Principal Component Analysis ▶ Regression ▶ Shape ▶ Variance
References
Bookstein FL (2017) A method of factor analysis for shape coordinates in physical anthropology. J Phys Anthropol 164:221–245
Davis JC (2002) Statistics and data analysis in geology, 3rd edn. Wiley, New York
Gould SJ (1981) The mismeasure of man. W. W. Norton, New York
Jöreskog KG, Klovan JE, Reyment RA (1976) Geological factor analysis. Elsevier, Amsterdam
Kaiser HF (1958) The varimax criterion for analytic rotations in factor analysis. Psychometrika 23:187–200
MacLeod N (2005) Factor analysis. Palaeontol Assoc Newsl 60:38–51
Manly BFJ (1994) Multivariate statistical methods: a primer. Chapman & Hall, Bury St. Edmonds, Suffolk
Reyment RA, Jöreskog KG (1993) Applied factor analysis in the natural sciences. Cambridge University Press, Cambridge
Sepkoski JJ (1981) A factor analytic description of the Phanerozoic marine fossil record. Paleobiology 7:36–53
Spearman C (1904) "General intelligence", objectively determined and measured. Am J Psychol 15:201–293
Robust Statistics

Peter Filzmoser
Institute of Statistics & Mathematical Methods in Economics, Vienna University of Technology, Vienna, Austria
Definition Robust statistics is concerned with the development of statistical estimators that are robust against certain model deviations, caused, for example, by outliers.
Introduction
Data analysis and robust statistics have a strong historical link, because many questions regarding specific features in the data structure are connected to the outlier problem. Outliers have always been viewed as something atypical, not represented in the data majority, and for that reason they are usually of special interest to the analyst. Robust statistics, in general, is concerned with the development of statistical estimators that are robust against certain model deviations, such as deviations from the normal distribution, which is frequently assumed in estimation theory. Robust estimators are supposed to still deliver reliable results in case of such deviations, e.g., caused by outliers, and various tools that allow one to quantify the robustness of an estimator have been developed (Maronna et al. 2019). Probably the most prominent tool is the breakdown point (BP), which refers to the smallest fraction of observations that could be replaced by any arbitrary values in order to make the estimator completely meaningless. For example, for estimating the univariate location, the commonly used arithmetic mean has a BP of zero, because already replacing a single observation by an arbitrary value can drive the estimator beyond all bounds. The median, on the other hand, achieves the maximum BP of 0.5. Under normality, however, the median comes with a lower statistical efficiency, and thus often a compromise between (positive) BP and reasonable efficiency needs to be made (e.g., trimmed mean or, better, M estimator – see below).
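The breakdown-point behavior of the mean and the median can be seen directly in a small simulation; the following Python/NumPy sketch (with arbitrary illustrative numbers) contaminates a single observation and compares the two location estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=1.0, size=100)   # clean sample
x_contaminated = x.copy()
x_contaminated[0] = 1e6                          # replace one observation by an arbitrary value

print(np.mean(x), np.mean(x_contaminated))       # the mean is driven far away (BP = 0)
print(np.median(x), np.median(x_contaminated))   # the median barely changes (BP = 0.5)
```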
This entry aims to explain some important tools and concepts in robust statistics from a practical perspective. Robust regression and covariance estimation, as well as principal component analysis (PCA), will be the focus. The interested reader can find many more details and background on robust statistics in the book by Maronna et al. (2019).
Regression
Many text books on robustness motivate the use of robust regression routines by presenting some x- and y-data which follow a linear trend – with the exception of some single outliers. Those outliers, appropriately placed, are able to completely spoil the least-squares (LS) regression line but do not affect the fit from a robust regression. While such examples may be inspiring, one would always ask why the outliers are not simply removed, as they are clearly visible, and a subsequent LS regression would be useful for predicting y from x. The problem is that in more complex situations, especially in higher dimension, outlier detection is not at all trivial. Fortunately, the outlier detection step is not necessary in practice, because robust regression methods will automatically downweight outliers, that is, observations which are inconsistent with the linear model, and this also works in the higher-dimensional situation.

Example Data
In order to be more specific, a data set from the GEMAS project (Reimann et al. 2014), a geochemical mapping project covering most European countries, is considered in the following. Similar to van den Boogaart et al. (2020), a regression of pH on the composition of the major oxides Al2O3, CaO, Fe2O3, K2O, MgO, MnO, Na2O, P2O5, SiO2, TiO2, and LOI is of interest. Here, only the data from Poland are used, as they are relatively homogeneous concerning the soil type. In other countries, the pH values heavily depend on whether the soils are on karstic landscapes or on felsic rocks, for instance.

Regression Model
The response, here pH, is denoted as variable y, while the explanatory variables are x1 to xD. In total, there are n = 129 samples taken from Poland, which lead to the observations yi, and xi1, . . ., xiD, for i = 1, . . ., n. Since the explanatory variables are compositional data, it is advisable to first transform them into the usual Euclidean geometry for which the traditional regression methods have been designed. One option is to use pivot coordinates (Filzmoser et al. 2018), which results in the variables z1, . . ., zD−1, with the corresponding observations zi = (zi1, . . ., zi,D−1)′. The linear regression model is

$$y_i = \beta_0 + \sum_{j=1}^{D-1} z_{ij}\,\beta_j + \varepsilon_i, \qquad (1)$$
with the error terms ε_i, which are supposed to be independent and N(0, σ²) distributed, where σ² is the error variance. For pivot coordinates, the particular interest is in the coefficient β1 associated with variable z1, which contains all relative information of x1 to the remaining parts in the composition. A reordering of the composition, with a different part in the first position, followed by computing pivot coordinates allows for an interpretation of the coefficient related to another (first) part (Filzmoser et al. 2018). For a given estimator $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_{D-1})'$, one can compute the fitted values $\hat{y}_i = (1, \mathbf{z}_i')\,\hat{\beta}$ and the residuals $r_i = r_i(\hat{\beta}) = y_i - \hat{y}_i$, for i = 1, . . ., n, which refer to the discrepancy of the observed responses to the fitted ones.

Least-Squares (LS) Estimator
The LS estimator minimizes the sum of the squared residuals and is thus defined as

$$\hat{\beta}_{LS} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{n} r_i^2(\beta). \qquad (2)$$
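To make the compositional setup concrete, the following Python/NumPy sketch computes one common form of pivot coordinates and the LS estimate of Eq. (2). It is an illustrative sketch under stated assumptions (strictly positive parts, intercept included), not code from the GEMAS study.

```python
import numpy as np

def pivot_coordinates(X):
    """Map an n x D matrix of strictly positive compositional parts to
    n x (D-1) pivot coordinates (one common form, cf. Filzmoser et al. 2018)."""
    n, D = X.shape
    Z = np.empty((n, D - 1))
    for j in range(D - 1):
        rest = X[:, j + 1:]                                   # parts j+2, ..., D (0-based slice)
        gmean = np.exp(np.mean(np.log(rest), axis=1))         # geometric mean of the remaining parts
        Z[:, j] = np.sqrt((D - j - 1) / (D - j)) * np.log(X[:, j] / gmean)
    return Z

def ls_fit(Z, y):
    """Ordinary least-squares estimate of (b0, b1, ..., b_{D-1}) as in Eq. (2)."""
    Zc = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(Zc, y, rcond=None)
    return beta
```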
Outliers in the context of regression may refer to atypical values in the response, or to unusual observations in the space of the explanatory variables. The former are called vertical outliers, and the latter leverage points, where a distinction between good (values along the regression hyperplane) and
bad leverage points (values far away from the regression hyperplane) is made. Vertical outliers as well as bad leverage points lead to large residuals, and their square can heavily dominate the minimization problem (Eq. 2). For this reason, $\hat{\beta}_{LS}$ is sensitive to outliers and has a BP of zero, since even a single observation could "attract" the regression hyperplane and thus make the estimator useless for the data majority.

LS Regression for the Example
For the considered example data set, a reliable outlier detection based on visual inspection is practically impossible. Visual diagnostics is thus done only after model fitting. Figure 1 (left) shows the response values yi versus the fitted values ŷi from LS regression. The solid line indicates equality of response and fitted values, and the dashed lines refer to the outlier cutoff values for the residuals, which are typically considered as $r_i(\hat{\beta}_{LS})/\hat{\sigma}_{LS} = \pm 2.5$, where $\hat{\sigma}^2_{LS}$ is the estimated residual variance from the LS estimator. According to these cutoff values, there are no outlying residuals, and the QQ-plot (Fig. 1, right) even confirms approximate normality of the residuals. The model does not fit well, because for small values of pH it leads to an overestimation, and for bigger values to an underestimation. At this stage, however, it is unclear if a nonlinear model would be more appropriate, if outliers could have affected the LS estimator, or if the explanatory variables are simply not useful for modeling the response.

Robust Regression
Robustifications of the LS estimator are the M estimators for regression, which minimize a function, say ρ, of the residuals.
Robust Statistics, Fig. 1 LS regression of pH on the composition of the major oxides. Left plot: pH versus fitted values of pH; right plot: normal QQ-plot for the LS residuals
Since this would depend on the residual scale, the M estimator is defined as

$$\hat{\beta}_{M} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{n} \rho\!\left(\frac{r_i(\beta)}{\hat{\sigma}}\right), \qquad (3)$$

where $\hat{\sigma}$ is a robust scale estimator of the residuals (Maronna et al. 2019). If ρ is the square function, thus ρ(r) = r², one would again obtain the LS estimator. In order to achieve robustness, the function ρ thus needs to be chosen adequately: For small (absolute) residuals, it should be approximately quadratic to obtain high efficiency, while for large (absolute) residuals it needs to increase more slowly than the square. Various proposals are available in the literature, and they also affect the robustness properties of the resulting estimator. It also matters how $\hat{\sigma}$ is derived. The so-called MM estimator combines a highly robust scale estimator with an M estimator, and it achieves a BP of 0.5, with a high (tunable) efficiency (Maronna et al. 2019). Another robust regression estimator which is probably easier to understand and still widely applied is the LTS (least trimmed squares) estimator, defined as

$$\hat{\beta}_{LTS} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{h} r_{(i)}^2(\beta), \qquad (4)$$

where $r_{(1)}^2 \leq \ldots \leq r_{(h)}^2 \leq \ldots \leq r_{(n)}^2$ are the ordered values of the squared residuals and h is an integer between n/2 and n. The maximum BP of 0.5 is attained for $h = \left[\frac{n+D}{2}\right]$, where [a] is the integer part of a (Maronna et al. 2019). Based on the residuals from the LTS estimator, it is possible to robustly estimate the residual variance by

$$\hat{\sigma}_{LTS}^2 = c\,\frac{1}{h} \sum_{i=1}^{h} r_{(i)}^2\!\left(\hat{\beta}_{LTS}\right), \qquad (5)$$

where c is a constant for consistency under normality. Similarly to the above, residuals would be considered as outlying if

$$\left| r_i\!\left(\hat{\beta}_{LTS}\right) / \hat{\sigma}_{LTS} \right| > 2.5. \qquad (6)$$
In contrast to the outlier diagnostics for LS regression, both the residuals and the residual variance are estimated robustly within LTS regression, and thus this diagnostic tool is reliable and useful in the presence of outliers. Since the statistical efficiency of the LTS estimator is low, it is common to add a reweighting step. This consists of a weighted LS estimator, with weights 0/1, where zero is assigned to residuals for which (Eq. 6) is valid, and one otherwise.

Robust Regression for the Example
Figure 2 presents the resulting plots for LTS regression, analogously to Fig. 1 for LS regression. The dashed lines in the left plot correspond to the outlier cutoff values according to (Eq. 6), and here indeed, several points are identified as outliers (red symbols +). The model still has a problem with accommodating pH values in the range from about 6 to 7, but it fits much better for lower pH values
compared to the LS model. Also the QQ-plot confirms approximate normality of the (nonoutlying) residuals. The superiority of the LTS model is also reflected in the MSE (mean-squared error), which is 0.22 (outliers not included). The MSE for the LS model is 0.61 (no outliers identified). A further interesting outcome is the inference table, which reveals the explanatory variables that are significant in the model. Accordingly, for LTS regression, the pivot coordinates for Al2O3, CaO, Fe2O3, K2O, Na2O, and LOI are significant. In contrast, for the LS model, only MnO and CaO resulted in significance, and thus the interpretation would be quite different. In practice, there is often the desire to continue by removing the outliers and then applying LS regression again. However, this is precisely what the reweighted LS step after LTS regression does. Other robust regression estimators such as MM regression also determine weights for a weighted LS regression, but the weights can be in the whole interval [0,1], corresponding to the outlyingness of the residuals.

Robust Statistics, Fig. 2 LTS regression of pH on the composition of the major oxides. Left plot: pH versus fitted values of pH; right plot: normal QQ-plot for the LTS residuals. Red points with + are outliers
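The outlier flagging of Eq. (6) and the subsequent 0/1 reweighting can be sketched in a few lines of Python/NumPy. The robust coefficient vector is assumed to come from an existing robust fit such as LTS or MM (e.g., from the R package robustbase), and the consistency factor c is simply set to 1 here; all names and parameter choices are illustrative.

```python
import numpy as np

def lts_outlier_flags_and_reweight(Z, y, beta_robust, c=1.0, cutoff=2.5):
    """Sketch of Eqs. (5)-(6) and the 0/1 reweighting step described above.
    beta_robust (intercept first) is assumed to come from a robust fit."""
    n, d = len(y), Z.shape[1]
    h = (n + d + 1) // 2                               # a value in [n/2, n], close to the (n+D)/2 rule
    Zc = np.column_stack([np.ones(n), Z])
    r = y - Zc @ beta_robust                           # residuals from the robust fit
    sigma = np.sqrt(c * np.mean(np.sort(r**2)[:h]))    # Eq. (5): trimmed residual scale
    w = (np.abs(r / sigma) <= cutoff).astype(float)    # Eq. (6): 0/1 outlier weights
    sw = np.sqrt(w)
    beta_rw, *_ = np.linalg.lstsq(Zc * sw[:, None], y * sw, rcond=None)
    return beta_rw, w
```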
Principal Component Analysis (PCA)
PCA is a dimension reduction method and is widely used for data exploration. The projection onto the first two principal components (PCs) often allows one to inspect the essential information contained in the data. However, in the presence of outliers, classical PCs can be strongly determined by the outliers themselves, since PCs are defined as directions maximizing the (classical) variance. Usually, the PC directions are computed as the eigenvectors of the covariance matrix, which first needs to be estimated.
Covariance Estimation
Assume (D − 1)-dimensional observations z1, . . ., zn. Then the classical (empirical) covariance matrix is

$$S = \frac{1}{n-1} \sum_{i=1}^{n} (z_i - \bar{z})(z_i - \bar{z})', \qquad (7)$$

with the arithmetic mean $\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i$. This covariance estimator is sensitive to outliers, and there are many options for a more robust estimation. One well-known estimator is the MCD (minimum covariance determinant) estimator, which is given by the mean and covariance of the subset of those h points (with n/2 ≤ h ≤ n) which yield the smallest determinant of the empirical covariance matrix (Maronna et al. 2019). Similar to LTS regression, the choice of h determines the BP of the estimator. Higher efficiency of the estimator is achieved by a reweighting step (see Maronna et al. 2019).

PCA for the Example
Consider again the oxide composition from the GEMAS data set, which has been used above in a regression model to explain the response pH. With robust regression, several outliers in the residuals have been identified, and these are shown in the plots in Fig. 3 by red symbols +. These plots are so-called CoDa-biplots, and the compositional parts are transformed from pivot coordinates to centered logratio coefficients for a meaningful interpretation (Filzmoser et al. 2018). The left biplot is for classical PCA, based on an eigen-decomposition of the sample covariance matrix, and the right biplot for robust PCA, using the MCD estimator as a basis for the calculations.
Robust Statistics, Fig. 3 CoDa-biplots for classical (left) and robust (right) PCA of the oxide composition. Red symbols + are residual outliers from robust regression, see Fig. 2
The classical biplot reveals that the direction of the second component PC2 seems to be mainly determined by outliers, and these outliers appear in the part CaO. Note that the pivot coordinate for CaO was significant in the LS regression model, and this is very likely just an artifact of the outliers. The biplot for robust PCA is not affected by the outliers, and thus it better reflects the relationships in the oxide composition.
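A minimal Python/NumPy sketch of the two biplot bases is given below: classical PCA uses the eigen-decomposition of the sample covariance matrix of Eq. (7), while the "robust" variant shown here only mimics the idea of the MCD by computing the covariance on a trimmed subset. It is a crude illustrative stand-in, not the MCD algorithm itself, and the function names are assumptions for the example.

```python
import numpy as np

def pca_from_cov(S, k=2):
    """First k eigenvalues and eigenvectors (loadings) of a covariance matrix."""
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]               # eigenvalues in decreasing order
    return vals[order][:k], vecs[:, order[:k]]

def trimmed_cov(Z, keep=0.75):
    """Crude robust covariance stand-in (NOT the MCD itself): keep the
    observations closest to the coordinate-wise median and compute the
    classical covariance, Eq. (7), on that subset only."""
    med = np.median(Z, axis=0)
    d = np.sum((Z - med) ** 2, axis=1)           # squared distance to the median
    idx = np.argsort(d)[: int(keep * len(Z))]
    return np.cov(Z[idx], rowvar=False)
```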
Summary
Robust statistical methods are supposed to give reliable results even if strict model assumptions that are required for the classical methods are violated to some extent. This means, however, that for the data majority the model assumptions need to be fulfilled. An example is multivariate outlier detection, where outlier cutoff values are usually derived from distributional assumptions valid for the data majority. There are also other approaches, in particular for outlier detection, which are not based on distributional assumptions, typically originating from the machine learning community. Methods from both disciplines can be very useful, but this also depends on the problem setting. For a recent overview in this context, see Zimek and Filzmoser (2018). Nowadays, it is easy to compare the results of a classical and a robust statistical method, because software is freely available. For example, many robust methods are implemented in the package robustbase (Maechler et al. 2020) of the software environment R (R Development Core Team 2020), and the authors did their best to guarantee a similar input and output structure for the classical and robust methods. If the results of both analyses differ (clearly), one should go into detail to see why this happened. The field of robust statistics is still very active in developing new methods and technologies. The "big data era" has not only led to bigger data sets, but also to more complex data structures, which brings in new challenges and requirements for the analysis methods. As an example, in the high-dimensional data setting it is desirable that not the complete information of an observation is discarded, but rather that only deviating entries of the observation are downweighted. For recent references to such approaches, see Filzmoser and Gregorich (2020).
Cross-References ▶ Compositional Data ▶ Exploratory Data Analysis ▶ Least Median of Squares ▶ Least Squares ▶ Multivariate Analysis
▶ Ordinary Least Squares ▶ Principal Component Analysis ▶ Regression ▶ Statistical Outliers
Bibliography
Filzmoser P, Gregorich M (2020) Multivariate outlier detection in applied data analysis: global, local, compositional and cellwise outliers. Math Geosci 52(8):1049
Filzmoser P, Hron K, Templ M (2018) Applied compositional data analysis: with worked examples in R. Springer, Cham
Maechler M, Rousseeuw P, Croux C, Todorov V, Ruckstuhl A, Salibian-Barrera M, Verbeke T, Koller M, Conceicao ELT, Anna di Palma M (2020) Robustbase: basic robust statistics. R Package Version 0.93-6
Maronna R, Martin R, Yohai V, Salibián-Barrera M (2019) Robust statistics: theory and methods (with R). John Wiley & Sons, New York
R Development Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3900051-07-0
Reimann C, Birke M, Demetriades A, Filzmoser P, O'Connor P (eds) (2014) Chemistry of Europe's agricultural soils – part a: methodology and interpretation of the GEMAS data set. Geologisches Jahrbuch (Reihe B 102). Schweizerbarth, Hannover
van den Boogaart K, Filzmoser P, Hron K, Templ M, Tolosana-Delgado R (2020) Classical and robust regression analysis with compositional data. Math Geosci 1–36
Zimek A, Filzmoser P (2018) There and back again: outlier detection between statistical reasoning and data mining algorithms. WIRES Data Min Knowl Discov 8(6):e1280
Rock Fracture Pattern and Modeling

Katsuaki Koike¹ and Jin Wu²
¹Department of Urban Management, Graduate School of Engineering, Kyoto University, Kyoto, Japan
²China Railway Design Corporation, Tianjin, China
Definition Fracture is a general term for any type of rock discontinuity (e.g., crack, fissure, joint, fault) and covers a wide size range (mm to km scale). Fractures are generated by the destruction of rock mass in a compressive or tensile stress field. Fractures are mechanically classified into the following categories: mode I, extension or opening; mode II, shear; and mode III, a mixture of modes I and II. The degree of fracture development is essential to engineering practices because it strongly affects the hydraulic and mechanical properties of rock mass including their magnitude and anisotropic behavior. Fractures are generally targeted as representative elements of a fracture pattern to characterize their development, length, orientation,
density, connectivity, aperture, and geometry (Fig. 1). Using these elements, a spatial pattern of the fracture distribution can be computationally modeled following several statistical assumptions, for which discrete fracture network (DFN) modeling is the most representative approach (e.g., Lei et al. 2017). Such computational modeling is indispensable, because the fracture distribution in rock mass is difficult to predict by 1D and 2D surveys using borehole and rock surface data, respectively. Constructed fracture models can be linked to fluid flow modeling, permeability assessments, and deformation simulations of rock mass.
Fracture Pattern Quantification Methods

Length Distribution
The true length of a fracture is impossible to measure because only a part of a fracture system appears on the rock mass surface. The trace length on the surfaces of an outcrop, tunnel, dam, or drift is therefore typically used to characterize the fracture length distribution. This distribution can be approximated using a negative power law, lognormal distribution, or negative exponential distribution (Bonnet et al. 2001). For example, the distribution of fracture length l following a negative power law with scaling exponent a and density constant α is formulated as:

$$n(l) = \alpha \cdot l^{-a} \qquad (1)$$
where n(l)dl is the number of fractures per unit volume in the length range [l, l + dl]. Because this law signifies the absence of a characteristic length, the definition of a total range between the minimum and maximum lengths [lmin, lmax] is necessary for practical use. Upon increasing the a value, the n(l) inclination becomes steeper and the ratio of short to long fractures increases, which is demonstrated by a comparison of the simulated fracture distributions in a cubic domain of 100-m side lengths with a = 2.5, 3.0, and 3.5 and a length range of [1 m, 1000 m] (Fig. 2a). Long and short fractures are dominantly distributed in models with a = 2.5 and 3.5, respectively.

Rock Fracture Pattern and Modeling, Fig. 1 Representative elements for characterizing rock-fracture patterns

Orientation Distribution
Fracture orientation is defined by the strike and dip or dip and dip direction. The distribution of fracture orientation is typically approximated using a uniform, Gaussian, or Fisher distribution by considering the statistical property of the orientations, such as the dispersion and shape. Among them, the Fisher distribution (Fisher 1993), which is expressed by the following probability density distribution f(θ), has been widely adopted owing to its adequate fitness to many case studies (e.g., Hyman et al. 2016).

$$f(\theta) = \frac{k \sin\theta\, e^{k \cos\theta}}{e^{k} - e^{-k}} \qquad (2)$$
where θ (°) represents the deviation angle from the mean vector of the dip direction and k is the Fisher constant that expresses the dispersion of direction clusters. Fracture orientations are scattered with small k values and become concentrated with increasing k values, as demonstrated by comparing two DFN models with k = 10 and 100 in Fig. 2b. The greatest advantage of the Fisher distribution is the correct modeling of preferential orientation upon suitably setting the angle dispersion around the mean orientation.

Rock Fracture Pattern and Modeling, Fig. 2 DFN models with different distributions of fracture length and orientation and densities in a calculation domain of 100 × 100 × 100 m. (a) DFN models following a power law of length distribution with three scaling exponents a, length range [1 m, 1000 m], and P32 = 0.1 m²/m³. (b) DFN models with two dispersion parameters k with a mean dip angle of 60° and azimuth angle of 60° clockwise from the north. (c) DFN models with three fracture densities P32 = 0.05, 0.1, and 0.2 m²/m³

Density
Fracture density is typically quantified using scale-independent indices, P10, P21, and P32 (Dershowitz and Herda 1992). P10 (m⁻¹) is the number of fractures per unit length intersected by a 1D investigation line using a scanline or borehole. P21 (m/m²) is the total length of fracture traces per unit surface area by 2D investigation using an outcrop or tunnel wall. P32 (m²/m³) is the volumetric fracture intensity for a 3D density value expressed by the total surface area of fractures per unit volume. P32 is experimentally derivable from P10 or P21 because it cannot be directly measured owing to the lack of access to the interior sections of rock mass. For example, a 3D DFN model can be generated with a P10 value along a borehole or different P10 values along several boreholes by setting the fracture size and orientation distribution. The total fracture area over the calculation domain is computed, and P32 can therefore be quantified. More fractures are distributed and naturally connected with increasing P32 values, as shown in Fig. 2c.
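The length, orientation, and density descriptions above can be combined into a very small DFN-style sampler. The Python/NumPy sketch below draws lengths from the truncated power law of Eq. (1) and deviation angles from the Fisher distribution of Eq. (2) by inverse-CDF sampling, and then evaluates P32 for disk-shaped fractures. The domain size, the parameter values, and the treatment of "length" as a disk diameter are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_powerlaw_lengths(n, a, lmin, lmax):
    """Inverse-CDF sampling of fracture lengths from n(l) ~ l^(-a) on [lmin, lmax] (a != 1)."""
    u = rng.uniform(size=n)
    return (lmin**(1 - a) + u * (lmax**(1 - a) - lmin**(1 - a)))**(1 / (1 - a))

def sample_fisher_deviation(n, k):
    """Deviation angles (radians) from the mean orientation for a Fisher
    distribution with concentration k, via the inverse CDF of cos(theta)."""
    u = rng.uniform(size=n)
    cos_theta = 1.0 + np.log(u + (1 - u) * np.exp(-2 * k)) / k
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))

# Illustrative P32 for disk-shaped fractures in a cubic domain of side L:
L = 100.0
lengths = sample_powerlaw_lengths(5000, a=3.0, lmin=1.0, lmax=1000.0)
angles = sample_fisher_deviation(5000, k=10.0)
areas = np.pi * (lengths / 2.0) ** 2        # "length" treated as a disk diameter (assumption)
P32 = areas.sum() / L**3                    # total fracture area per unit volume (m^2/m^3)
```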
Connectivity
The connectivity of fractures strongly affects the mechanical and hydraulic properties of rock mass. The most common method for connectivity measurements is percolation theory, which quantifies connectivity by the number of intersections per fracture using a percolation parameter p (Lei and Wang 2016). Connectivity increases with increasing p. For a 3D domain, p can be defined as the sum of π²(each fracture area/π)^1.5 per unit volume (Itasca 2019).

Aperture
The aperture is the opening width of a fracture, which is measured along the perpendicular direction of the fracture surface. The real aperture distribution is difficult to specify because natural fractures are irregularly shaped with variable apertures rather than simple planes, and aperture data can be obtained only from outcropped fractures and borehole TV observations of a rock wall. Gaussian, log-normal, negative power law, and uniform distributions have therefore been assumed for the aperture distribution (Lei et al. 2017), and one must be selected depending on the statistical properties of the measured data and rock-mass conditions.
Rock Fracture Pattern and Modeling, Fig. 3 Simulated fractures using GEOFRAC for a granitic area (modified from Koike et al. (2015)). (a) Perspective view of the locations of 2309 observed fractures from 26 deep boreholes, perspective views showing the distribution pattern of
the simulated continuous fractures viewed from (b) above, (c) the west, and (d) the lower hemisphere projection of Schmidt’s net of a pole density distribution of the fractures. Each fracture is randomly and semi-transparently colored to discriminate the different fractures
The hydraulic conductivity (or permeability) of rock mass is strongly controlled by the fracture aperture because open fractures act as essential pathways of fluid flow. An open fracture is partly filled with minerals (e.g., quartz, calcite), which causes a rough fracture surface, impermeable portions in the pathway, and a highly variable aperture. The hydraulic aperture, which is the quantity that must be considered when characterizing hydraulic properties and fluid flow, is therefore considerably smaller than the aperture in general.

Geometry
Because real fracture shapes in a rock mass are impossible to specify, they must be approximated as simple regular shapes, such as planar polygons or disks (Jing and Stephansson 2007). However, complicated shapes can form by connecting several disks that are located close to one another with similar orientations and enveloping them (Koike et al. 2012). Fractures are modeled as a smooth parallel plate for the simplest laminar fluid. For this simplification, the cubic law is adopted in which the flow rate through the fracture is proportional to the cube of the hydraulic aperture (Witherspoon et al. 1980). For actual rock conditions, the fracture-surface roughness must be considered because roughness significantly affects the hydromechanical behavior of the fractured rock mass (Brown 1987). The joint roughness coefficient (JRC) is a representative index for the roughness measure, which can be related to the nonlinear shear strength criterion of fracture with normal stress in engineering (Barton 1976). Because JRC is known to vary with scale, self-affine or self-similar fractal models have been used for a more accurate expression of the fracture surface (e.g., Dauskardt et al. 1990; Maximo and de Lacer 2012).
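The parallel-plate simplification mentioned above can be written down directly. The following sketch uses the standard form of the cubic law (flow per unit fracture width proportional to the cube of the hydraulic aperture); the viscosity value and pressure gradient are chosen purely for illustration.

```python
def cubic_law_flow(b, dp_dx, mu=1.0e-3):
    """Volumetric flow rate per unit fracture width [m^2/s] for hydraulic
    aperture b [m], pressure gradient dp_dx [Pa/m], and viscosity mu [Pa s]."""
    return -(b**3 / (12.0 * mu)) * dp_dx

# Doubling the aperture increases the flow rate by a factor of 8:
print(cubic_law_flow(2e-4, -1000.0) / cubic_law_flow(1e-4, -1000.0))  # -> 8.0
```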
Rock Fracture Pattern and Modeling, Fig. 4 Fracture propagation in rock mass around a tunnel with the excavation progress simulated by the hybrid FEMDEM

Fracture Modeling
The rock fracture pattern includes the size, orientation, distribution density, aperture in some cases, and intersection, and is typically modeled using a DFN. Numerous application software programs are available for DFN modeling (e.g.,
FLAC3D, PFC3D), as demonstrated by the example in Fig. 2 generated by FLAC3D using different geometric parameters. These applications conventionally define the probability distributions of the fracture geometric parameters and shapes as polygons or disks for modeling. These parameters are randomly selected from the probability distributions and assigned to each fracture. Geostatistical methods have also been developed for DFN modeling to honor the location and orientation data of measured fractures (e.g., Dowd et al. 2007). GEOFRAC (GEOstatistical FRACture simulation method; Koike et al. 2015) is one such method that considers the spatial correlations of fracture orientations and densities and forms an arbitrary fracture shape by connecting several disks, as described above (Koike et al. 2012). Regional fracture patterns can be revealed using GEOFRAC from a borehole fracture dataset that show clear dominant orientations, continuous fracture distributions, and concentrated zones of parallel and intersecting fractures (Fig. 3). Continuum- and discontinuum-based methods have been developed to model fracture initiation and propagation under tectonic stress conditions (Lisjak and Grasselli 2014; Wolper et al. 2019). The former methods are based on the mechanics of plasticity and continuum damage by assuming continuous displacement over the calculation domain, and typically use the finite element method (FEM) for the simulation. In contrast, the latter methods assume that fractures are represented by the discontinuous displacement field, and typically use the discrete element method (DEM), discontinuous deformation analysis (DDA) method, and hybrid finite-DEM (FEMDEM) for the simulation. Figure 4 shows an example of fracture propagation around a tunnel during the excavation progress, as simulated by the hybrid FEMDEM. The change in permeability and extension of the damage zone is predictable with the fracture development. Machine learning has recently been adopted to predict fracture propagation by considering the fracture geometry and mechanics (e.g., Hsu et al. 2020).
Summary
The rock fracture pattern (i.e., fracture distribution configuration) is a control factor of the hydraulic and mechanical properties of rock mass, and is typically characterized by the fracture length, orientation, density, connectivity, aperture, and geometry. The length, orientation, and aperture distributions are approximated using the most suitable statistical and power-law distributions of the fracture survey data. The volumetric fracture intensity, P32 (m²/m³), is indispensable for the 3D density to realistically model the fracture distribution in rock mass. Although this density cannot be directly measured, it is experimentally derivable from the 1D intensity P10 or 2D intensity P21. Percolation theory is most commonly used for connectivity measurements. The fracture shape is simply approximated as a planar polygon or disk, and self-affine or self-similar fractal models are reasonable for accurately expressing the fracture surface. Considering the fracture pattern, the spatial fracture distribution is typically modeled by a DFN that uses a conventional probability distribution of the geometric parameters and the fracture shape as a polygon or disk. DFN modeling is a type of nonconditional simulation. In contrast, geostatistical methods have also been developed to honor the fracture location and orientation data obtained by measurements, which includes the incorporation of spatial correlations of fracture orientations and densities into the DFN model and the formation of an arbitrary fracture shape by connecting several disks. In addition to the above static fracture modeling, dynamic fracture modeling using continuum- and discontinuum-based methods is essential for modeling fracture initiation and propagation in a tectonic stress field. Changes in the permeability and extension of the damage zone with the fracture development are predictable by dynamic modeling using a method such as hybrid FEMDEM.
Cross-References ▶ Geostatistics ▶ Kriging ▶ Machine Learning ▶ Statistical Rock Physics ▶ Stochastic Geometry in the Geosciences
Bibliography Barton N (1976) The shear strength of rock and rock joints. Int J Rock Mech Min Sci Geomech Abstr 13:255–279. https://doi.org/10.1016/ 0148-9062(76)90003-6 Bonnet E, Bour O, Odling NE, Davy P, Main I, Cowie P, Berkowitz B (2001) Scaling of fracture systems in geological media. Rev Geophys 39:347–383. https://doi.org/10.1029/1999RG000074 Brown SR (1987) Fluid flow through rock joints: the effect of surface roughness. J Geophys Res 92:1337–1347. https://doi.org/10.1029/ JB092IB02P01337 Dauskardt RH, Haubensak F, Ritchie RO (1990) On the interpretation of the fractal character of fracture surfaces. Acta Metall Mater 38: 143–159. https://doi.org/10.1016/0956-7151(90)90043-G Dershowitz WS, Herda HH (1992) Interpretation of fracture spacing and intensity. Paper presented at the 33rd U.S. Symposium on Rock Mechanics (USRMS), Santa Fe, New Mexico, June 1992 Dowd PA, Xu C, Mardia KV, Fowell RJ (2007) A comparison of methods for the stochastic simulation of rock fractures. Math Geol 39:698–714. https://doi.org/10.1007/s11004-007-9116-6 Fisher NI (1993) Statistical analysis of circular data. Cambridge University Press. https://doi.org/10.1017/CBO9780511564345 Hsu YC, Yu CH, Buehler MJ (2020) Using deep learning to predict fracture patterns in crystalline solids. Matter 3:197–211. https://doi. org/10.1016/J.MATT.2020.04.019 Hyman JD, Aldrich G, Viswanathan H, Makedonska N, Karra S (2016) Fracture size and transmissivity correlations: implications for transport simulations in sparse three-dimensional discrete fracture networks following a truncated power law distribution of fracture size. Water Resour Res 52:6472–6489. https://doi.org/10.1002/ 2016WR018806 Itasca Consulting Group Inc (2019) FLAC3D user manual, Version 6.0 Jing L, Stephansson O (2007) The basics of fracture system characterization – field mapping and stochastic simulations. Dev Geotech Eng 85:147–177. https://doi.org/10.1016/S0165-1250(07)85005-X Koike K, Liu C, Sanga T (2012) Incorporation of fracture directions into 3D geostatistical methods for a rock fracture system. Environ Earth Sci 66:1403–1414. https://doi.org/10.1007/s12665011-1350-z Koike K, Kubo T, Liu C, Masoud A, Amano K, Kurihara A, Matsuoka T, Lanyon B (2015) 3D geostatistical modeling of fracture system in a granitic massif to characterize hydraulic properties and fracture distribution. Tectonophysics 660:1–16. https://doi.org/10.1016/J. TECTO.2015.06.008 Lei Q, Wang X (2016) Tectonic interpretation of the connectivity of a multiscale fracture system in limestone. Geophys Res Lett 43: 1551–1558. https://doi.org/10.1002/2015GL067277 Lei Q, Latham JP, Tsang CF (2017) The use of discrete fracture networks for modelling coupled geomechanical and hydrological behaviour of fractured rocks. Comput Geotech 85:151–176. https://doi.org/10. 1016/J.COMPGEO.2016.12.024 Lisjak A, Grasselli G (2014) A review of discrete modeling techniques for fracturing processes in discontinuous rock masses. J Rock Mech
Geotech Eng 6:301–314. https://doi.org/10.1016/J.JRMGE.2013. 12.007 Maximo L, de Lacer LA (2012) Fractal fracture mechanics applied to materials engineering. In: Applied fracture mechanics. https://doi. org/10.5772/52511 Witherspoon PA, Wang JSY, Iwai K, Gale JE (1980) Validity of cubic law for fluid flow in a deformable rock fracture. Water Resour Res 16: 1016–1024. https://doi.org/10.1029/WR016I006P01016 Wolper J, Fang Y, Li M, Lu J, Gao M, Jiang C (2019) CD-MPM: continuum damage material point methods for dynamic fracture animation. ACM Trans Graph 38:1–15. https://doi.org/10.1145/ 3306346.3322949
Rodionov, Dmitriy Alekseevich

Hannes Thiergärtner
Department of Geosciences, Free University of Berlin, Berlin, Germany
Fig. 1 D. A. Rodionov. (Photo: Thiergärtner 1984)
Biography
Dmitriy Alekseevich Rodionov was the leading Russian scientist in applications of mathematical geosciences between 1970 and 1990 and one of the 20 founding members of the International Association for Mathematical Geology (IAMG). Born in Saratov (Russia) on October 9, 1931, he studied geochemistry at the State University in Saratov before he started a career as an exploratory geologist. Participating in a
Rodriguez-Iturbe, Ignacio
field campaign, he befriended the mathematician Prof. Yu. V. Prochorov, who inspired many young geologists to use mathematics. Rodionov obtained his doctorate in the field of statistical treatment of geochemical and mineralogical data. He quickly developed into the leading expert in applied mathematical geosciences in the former Soviet Union when he worked in the Moscow Institute for Mineralogy, Geochemistry, and Crystal Chemistry of Rare Elements (IMGRE) of the Academy of Science of the USSR. On the occasion of the 23rd International Geological Congress 1968 in Prague, Prof. Rodionov was one of the founding members of the IAMG and was elected as one of its Councillors. Because of the former political situation, he could not really take part in IAMG activities until 1990, but he was always in close contact with colleagues from other eastern European countries. He moved to the Moscow All-Union Institute for the Economy of Mineral Resources (VIEMS) in 1969, where he worked until 1981 as the leading specialist for the development of mathematical techniques and data processing in the field of economic geology. Under his leadership, numerous mathematical approaches were introduced for geological exploration and documented in several monographs and other publications in the Russian language. He was also one of the Russian scientists who regularly participated in the biennial scientific meetings at Přibram (then Czechoslovakia), where scientists from both political blocs could meet and exchange scientific ideas. Between 1981 and 1992, Rodionov was the director of the Laboratory for Mathematical Geology of the Institute for Ore Deposit Geology, Mineralogy, Petrology, and Geochemistry (IGEM) of the USSR Academy of Sciences in Moscow. During the last few years before his demise on October 2, 1994, he worked again at VIEMS. The IAMG honored his lifetime contributions in 1994 with the William Christian Krumbein medal. Former friends remember Dmitriy Alekseevich as a wonderful personality. All his life he remained an agile geoscientist, always connected with the science, helpful towards colleagues, and open-minded toward any new development. Last but not least, Dmitriy Alekseevich inspired everybody he worked with.
Cross-References ▶ Object Boundary Analysis
Bibliography Kogan RI, Belov Yu, Rodionov DA (1983) Statistical rank criteria in geology. Nedra, Moscow. (in Russian) Rodionov DA (1964) Distribution functions of the content of chemical elements and minerals in eruptive rocks. Nauka, Moscow. (in Russian)
Rodionov DA (1968) Statistical methods to mark geological objects by a complex of attributes. Nedra, Moscow. (in Russian) Rodionov DA (1981) Statistical solutions in geology. Nedra, Moscow. (in Russian) Rodionov DA, Zabelina TM, Rodionova MK (1973) Semi-quantitative analysis in biostratigraphy and palaeoecology. Nedra, Moscow. (in Russian) Rodionov DA, Kogan RI, Belov Yu (1979) Statistical methods classifying geological objects. VIEMS publishers, Moscow. (in Russian) Rodionov DA, Kogan RI, Golubeva VA (1987) Handbook for mathematical methods applied in geology. Nedra, Moscow. (in Russian)
Rodriguez-Iturbe, Ignacio

B. S. Daya Sagar
Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore, Karnataka, India
Fig. 1 Professor Ignacio Rodriguez-Iturbe (1942–2022) (Courtesy of Ignacio Rodriguez-Iturbe)
Biography
Ignacio Rodriguez-Iturbe is Distinguished University Professor and TEES Eminent Professor at the Departments of Ocean Engineering, Civil Engineering, and Biological and Agricultural Engineering, Texas A&M University. He is also James S. McDonnell Distinguished University Professor Emeritus at Princeton University. Born on 8 March 1942 in Caracas, Venezuela, and educated as a civil engineer at Universidad del Zulia in Maracaibo, he finished his Ph.D. at Colorado State University (1967) under Professor Vujica Yevjevich. After teaching in Venezuela and the United States (MIT, Iowa, Texas A&M), he was in Princeton during 2000–2018 as the James
S. McDonnell Distinguished University Professor (now Emeritus) and Professor of Civil and Environmental Engineering. He is a member of numerous academies including the National Academy of Sciences, the National Academy of Engineering, the American Academy of Arts and Sciences, the Spanish Royal Society of Sciences, the Vatican Academy of Sciences, the Venezuelan National Academy of Engineering, the Mexican Academy of Engineering, the Water Academy (Sweden), the Third World Academy of Sciences, and several others. He is the recipient of the Stockholm Water Prize (2002), the Mexico International Prize for Science and Technology (1994), the Venezuela National Science Prize (1991), the Prince Sultan Bin Abdulaziz International Water Prize (2010), the Venezuela National Engineering Research Prize (1998), the E.O. Wilson Biotechnology Pioneer Award (2009), and several other international prizes and awards. In the United States, he received from the American Geophysical Union, the Bowie Medal (2009), the Horton Medal (1998), the Macelwane Medal (1977), the Hydrologic Sciences Award (1974), and the Langbein Lecture Award (2000). He is a Fellow of the AGU and the American Meteorological Society where he received the Horton Lecture Award (1995). From the American Society of Civil Engineers he received the Huber Research Prize (1975) and the V.T.Chow Award (2001). He has Honorary Doctor’s Degrees from the University of Genova (Italy, 1992), Universidad del Zulia (Venezuela, 2003), and Universidad de Cantabria (Spain, 2011) as well as Teaching Awards in the MIT Civil and Environmental Engineering Department (1974) and Universidad del Zulia (1970). Among many other awards, he has received the Hydrology Days Award (2002) and Distinguished Alumnus Award (1994) from Colorado State University; Peter S.Eagleson Lecturer (Consortium of Universities for Hydrologic Sciences, 2010); Enrico Marchi Memorial Lecture (Italy, 2011); Chester Kisiel Memorial Lecture (Arizona, 1991); Landolt Lecture (EPFL, Switzerland, 2009); Moore Lecture (University of Virginia, 1999); Evnin Lecture (Princeton, 2000); Smallwood Distinguished Lecture (University of Florida, 2010); Distinguished Lecturer (College of Engineering, Texas A&M, 2003); Miller Research Professor (University of California, Berkeley, 2004). He is also the namesake of the “Ignacio Rodriguez-Iturbe Prize” awarded annually for the best paper in the journal “Ecohydrology.” He is the author of four books and editor of several others. His book on “Fractal River Basins: Chance and Self-Organization” coauthored with Andrea Rinaldo is a mathematically fascinating book and presents several cases supporting the self-organized nature of river basins and related terrestrial phenomena and processes (Rodríguez-Iturbe and Rinaldo 2001). He has contributed near 300 papers in international journals and many more in Proceedings and Chapters in
books. His research interests are broadly in Surface Water Hydrology with special emphasis in Ecohydrology, Hydrogeomorphology, Hydrometeorology, and the International Trade of Virtual Water.
Bibliography Rodríguez-Iturbe I, Rinaldo A (2001) Fractal river basins: chance and self-organization. Cambridge University Press, Cambridge, p 570
Root Mean Square

Tianfeng Chai
Cooperative Institute for Satellite Earth System Studies (CISESS), University of Maryland, College Park, MD, USA
NOAA/Air Resources Laboratory (ARL), College Park, MD, USA
Definition Root mean square (RMS), also called the quadratic mean, is the square root of the mean square of a set of numbers.
Introduction
Unlike the arithmetic mean, which allows positive and negative numbers to offset each other, RMS has a non-negative contribution from each. While RMS has many uses, such as RMS voltage in electrical engineering and RMS speed for gas molecules in statistical mechanics, in geoscience it is almost exclusively associated with root mean square error (RMSE) or root mean square deviation (RMSD). Model simulation results, estimates with empirical formulas, and some measurements in question often need to be evaluated using more reliable measurements or theoretical predictions. Those quantities that need to be evaluated are first paired with the assumed "truth," that is, more accurate measurements or predictions. Then the pairwise differences are obtained as a set of deviation or error samples, $d_i$, i = 1, . . ., n. RMSD (or RMSE) is then defined as:

$$\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2} \qquad (1)$$
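Eq. (1) translates directly into code. The short Python/NumPy sketch below computes RMSE and, for later comparison, MAE from paired predictions and observations; the function and variable names are illustrative.

```python
import numpy as np

def rmse(pred, obs):
    d = np.asarray(pred) - np.asarray(obs)   # pairwise differences d_i
    return np.sqrt(np.mean(d**2))            # Eq. (1)

def mae(pred, obs):
    d = np.asarray(pred) - np.asarray(obs)
    return np.mean(np.abs(d))                # mean absolute error, discussed below
```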
While the terms of RMSD and RMSE are interchangeable in practice, RMSE appears more frequently than RMSD in geoscience literature. Hereafter we will only use the term RMSE. Note that RMSE is scale dependent, and normalized
RMSE is sometimes employed to avoid the dependence on units.
Root Mean Square, Table 1 RMSEs of randomly generated pseudo-errors which follow a Gaussian distribution with a zero mean and unit variance. For seven different sample sizes, n = 4, 10, 100, 1000, 10,000, 100,000, and 1,000,000, five sets of errors are generated with different random seeds

n (error sample size)   RMSE Set #1   Set #2   Set #3   Set #4   Set #5
4                       0.9185        0.6543   1.4805   1.0253   0.7906
10                      0.8047        1.1016   0.8329   0.9506   1.0099
100                     1.0511        1.0293   1.0305   0.9961   1.0401
1000                    1.0362        0.9772   1.0107   1.0004   1.0029
10,000                  0.9974        0.9841   1.0142   0.9956   1.0023
100,000                 1.0034        0.9998   1.0038   1.0028   1.0015
1,000,000               0.9996        1.0007   1.0003   0.9993   0.9996

RMSE and MAE
While RMSE is often used to measure the overall magnitude of error samples, it is often compared with the mean absolute error (MAE), which is a simple arithmetic mean of the absolute values of the errors. Clearly, RMSE and MAE measure the magnitude of error samples differently. The larger errors contribute to RMSE much more than the smaller errors due to the squaring operation. RMSE is suitable to describe a set of error samples that follow a normal (or Gaussian) distribution with a zero mean. If the error samples have a non-zero mean, the mean error (or the mean bias) is often removed before the standard deviation of the errors is calculated. In such a case, it is the standard error (SE) that should be used. Note that the degree of freedom is reduced by one when calculating the mean bias-corrected sample variance. In the geoscience literature, unbiased RMSE is often used without correcting for the change in the degrees of freedom. However, the difference between unbiased RMSE and SE is minimal for the typically large counts of error samples in model evaluation studies. For error samples from a Gaussian distribution population, calculating the mean and SE of the errors will allow one to reconstruct the actual distribution if there are enough samples. Table 1 shows RMSEs calculated for randomly generated pseudo-errors which follow a Gaussian distribution with a zero mean and unit standard deviation, similar to those shown in Chai and Draxler (2014). As the sample size reaches 100 or above, using the calculated RMSEs one can reconstruct the error distribution close to its "truth" or "exact solution," with its standard deviation within 5% of its truth (i.e., 1), as shown in Table 1. For errors following a normal distribution with a non-zero mean, the error in the estimate of the mean is proportional to $1/\sqrt{n}$. With enough observations
for error samples, the error distribution can be reliably constructed using the estimated mean and SE. When there are not enough error samples, presenting the values of the errors themselves (e.g., in tables) is probably more appropriate than calculating RMSE or any other statistics. Although sometimes error samples do not exactly follow a Gaussian distribution, the error distributions in most applications are often close to normal distributions. It has been observed that many physical quantities that are affected by many independent processes typically follow Gaussian distributions. This phenomenon is commonly explained by the central limit theorem, which states that the properly normalized sum of independent random variables tends toward a normal distribution even if the original variables themselves are not normally distributed. There are also other explanations for the ubiquity of normal distributions (e.g., Lyon 2014). Note that the focus here is on model evaluation studies, where error samples are calculated as the differences between predictions and observations. Unlike some physical quantities, which are often non-negative and may not directly follow a normal distribution without a mathematical transformation, the error samples in model evaluation studies usually closely resemble normal distributions. In model evaluation studies, both the root mean square error (RMSE) and the mean absolute error (MAE) are regularly employed to measure the overall error magnitude. In the literature, RMSE is sometimes considered to be ambiguous as a metric for average model performance (e.g., Willmott and Matsuura 2005; Willmott et al. 2009). However, such a suggestion is probably misleading. Chai and Draxler (2014) show that some of the arguments to avoid RMSE are indeed mistaken. For instance, RMSE actually satisfies the triangle inequality requirement for a distance metric, contrary to what was claimed by Willmott et al. (2009). Furthermore, the absolute value operation in calculating MAE makes computing the gradient or sensitivity of MAE with respect to certain model parameters difficult if not impossible. In fact, it is not wise to avoid either RMSE or MAE without considering the specifics of the application. The best statistics metric should be based on the actual distribution of the errors in order to provide a better performance measure. The MAE is suitable to describe uniformly distributed errors, while RMSE is good to measure errors following a normal distribution. Also note that when both RMSE and MAE are calculated, RMSE is by definition never smaller than MAE. As any single statistical metric condenses a set of error values into a single number, it provides only one projection of the samples and, therefore, emphasizes only a certain aspect of the error characteristics. Thus, a combination of metrics, including but certainly not limited to RMSE and MAE, is often required to better assess model performance. For errors sampled from an unknown distribution, providing more statistical moments of the
model errors, such as mean, variance, skewness, and flatness, will help depict a better picture of the error variation.
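An experiment in the spirit of Table 1 can be reproduced with a few lines of Python/NumPy. The exact values depend on the random seed and will not match the table, but the RMSE of standard-normal pseudo-errors converges toward 1 as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(1)
for n in [4, 10, 100, 1000, 10_000, 100_000, 1_000_000]:
    d = rng.standard_normal(n)           # pseudo-errors ~ N(0, 1)
    print(n, np.sqrt(np.mean(d**2)))     # RMSE approaches 1 as n grows
```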
Usage of RMSE
It is well known that RMSE is sensitive to outliers. In fact, the existence of outliers and their probability of occurrence are well described by the normal distribution underlying the use of the RMSE. Chai and Draxler (2014) showed that with a large number of error samples that include some outliers, one can still closely reconstruct the original normal distribution. In addition, sensitivity to larger errors makes RMSE more discriminating as a model evaluation metric. This aspect is often more desirable when evaluating models. In the data assimilation field, the sum of squared errors is often formed as the main component of the cost function to be minimized by adjusting model parameters. In such applications, penalizing the outliers associated with large discrepancies between model and observations through the defined least-square terms proves to be very effective in improving model performance. In practice, it might be necessary to throw out the outliers that are several orders of magnitude larger than the other samples when calculating the RMSE, especially when the number of samples is limited.
Summary
Root mean square errors (RMSEs) are often used for model evaluation studies in geoscience. The arguments in the literature for choosing MAE over RMSE are mistaken. Compared with MAE, RMSE is suitable to measure errors which follow a normal distribution. As a more discriminating metric because of its sensitivity to large errors, RMSE is probably more desirable in model evaluations. In addition, RMS is closely related to the cost function formulation in the data assimilation field.
References
Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? – arguments against avoiding RMSE in the literature. Geosci Model Dev 7:1247–1250. https://doi.org/10.5194/gmd-7-1247-2014
Lyon A (2014) Why are normal distributions normal? Br J Philos Sci 65(3):621–649. https://doi.org/10.1093/bjps/axs046
Willmott C, Matsuura K (2005) Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim Res 30:79–82
Willmott CJ, Matsuura K, Robeson SM (2009) Ambiguities inherent in sums-of-squares-based error statistics. Atmos Environ 43:749–752
Sampling Importance: Resampling Algorithms

Anindita Dasgupta and Uttam Kumar
Spatial Computing Laboratory, Center for Data Sciences, International Institute of Information Technology Bangalore (IIITB), Bangalore, Karnataka, India
Definition
Sampling is a technique from which information about the entire population can be inferred. In the case of remote sensing (RS) and geographic information systems (GIS), training and test data are created during the early part of a project using random sampling, while post-classification stratified random sampling is employed for the different land-cover classes behaving as strata. RS images may have distortions that may be corrected using georectification. Georectification or georeferencing is the process of establishing a mathematical relationship between the addresses of pixels in an image and the corresponding coordinates of those pixels on another image, a map, or the ground. Georeferenced imagery is used to extract distance, polygon area, and direction information accurately. Georeferencing indicates that resampling of an image requires matching not only to a reference image but also to reference points that correspond to specific known locations on the ground. Resampling is related to but distinct from georeferencing. It is the application of interpolation to bring an image into registration with another image or a planimetrically correct map. The three methods used for resampling are nearest-neighbor, bilinear interpolation, and cubic convolution. The nearest-neighbor approach uses the value of the closest input pixel for the output pixel value. Bilinear interpolation uses the weighted average of the nearest four pixels to the output pixel. The cubic convolution approach uses the weighted average of the nearest 16 pixels to the output pixel.
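The resampling rules named above can be illustrated with a short Python/NumPy sketch for a single output location: nearest-neighbor takes the closest input pixel, and bilinear interpolation forms a distance-weighted average of the four surrounding pixels (cubic convolution, which uses the nearest 16 pixels, is omitted for brevity). Boundary handling is also omitted, and the function names are illustrative.

```python
import numpy as np

def nearest_neighbor(img, r, c):
    """Value of the closest input pixel to the (possibly fractional) position (r, c)."""
    return img[int(round(r)), int(round(c))]

def bilinear(img, r, c):
    """Weighted average of the four input pixels surrounding position (r, c)."""
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    dr, dc = r - r0, c - c0
    return ((1 - dr) * (1 - dc) * img[r0, c0] +
            (1 - dr) * dc       * img[r0, c0 + 1] +
            dr       * (1 - dc) * img[r0 + 1, c0] +
            dr       * dc       * img[r0 + 1, c0 + 1])
```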
Introduction Studying an entire population is both time-consuming and resource-intensive. Therefore, it is advisable to take a sample from the population of interest, and on the basis of this sample, information about the entire population is inferred. A sample is a portion, piece, or segment that is representative of a whole population. Conclusions about an entire population are drawn based on the sample information through statistical inference. A sample generally provides an accurate picture of the population from which it is drawn. It is random; each individual in the population has an equal chance of being selected (Groves et al. 2011). Different sampling schemes are explained next. Simple random sampling: In this technique, units are independently selected one at a time until the desired sample size is achieved. Each study unit in the finite population has an equal chance of being included in the sample. This method of sampling has the advantage of representativeness, freedom from bias, and ease of sampling and analysis. The disadvantages of this method are errors in sampling, and additional time and labor requirements. Systematic sampling: This method has evenly distributed samples over the entire population. When N units exist in the population and n units are to be selected, then sampling interval R ¼ N/n. The advantage is that it is spatially well distributed with small standard errors. The disadvantages of this method are that it is less flexible to increase or decrease the sampling size and is not applicable for fragmented strata. Stratified sampling: It involves grouping the population of interest into strata to estimate the characteristics of each stratum and improve the precision of an estimate for the entire population (Cochran 2007). Stratified random sampling is useful for data collection if the population is heterogeneous, as the entire population is divided into a number of homogeneous groups known as strata. The sampling error depends on the population variance within stratum but not between the strata (Singh and Masuku 2014). The advantage is that it
© Springer Nature Switzerland AG 2023 B. S. Daya Sagar et al. (eds.), Encyclopedia of Mathematical Geosciences, Encyclopedia of Earth Sciences Series, https://doi.org/10.1007/978-3-030-85040-1
1240
allows specifying the sample size within each stratum and different sampling design for each stratum; stratification may increase precision. The major drawbacks are that it yields large standard error if the sample size selected is not appropriate. Moreover, it is not effective if all the variables are equally important. Cluster sampling: It involves grouping of the spatial units or objects sampled into clusters. All observations in the selected clusters are included in the sample. This method can reduce the time and expense of sampling by reducing travel distance. The disadvantage is that clustering can yield higher sampling error. Further, it can be difficult to select representative clusters. Multistage random sampling: In this sampling, the region is separated into different subsets that are randomly selected (first stage), and then the selected subsets are randomly sampled (second stage). This is similar to stratified random sampling, except that with stratified random sampling, each stratum is sampled. However, there is stronger clustering here than simple random sampling.
Sampling Importance: Resampling Algorithms
variability in the attributes being measured (Miaoulis and Michener 1976). • Level of precision: Sampling error, also known as the level of precision, is the range in which the true value of the population is estimated (Singh, 2014). This range is expressed in percentage (e.g., 5%). • Confidence interval: In a normal distribution, approximately 95% of the sample values are within two standard deviations of the true population value. This confidence interval is also known as the risk of error in statistical hypothesis testing. • Degree of variability: When the variables exist with more homogeneous population, the sample size required is smaller. If the population is heterogeneous, larger sample size is required to obtain a given level of precision (Singh, 2014). In case of RS data, Fitzpatrick Lins (1981) suggested that sample size N to be used to access accuracy of land-cover classified map should be determined from the formula of binomial probability theory:
Sampling in Geospatial Data Analysis Let’s understand sampling in the context of geospatial data analysis in RS applications. In case of land-cover classification from remotely sensed data, training and test reference data are randomly sampled. When the sample size is sufficient, simple random sampling provides adequate estimate of the population. While using stratified sampling, the caution is not to overestimate the population parameter. Stratified sampling may be preferred as minimum number of samples is taken from each stratum. For example, if we consider land-cover classes such as built-up and agriculture, then samples have to be taken from both the classes. Test sample can be collected using Global Positioning System (GPS) to identify ground control points (GCPs). During early part of the project, random sampling can be employed to collect data followed by post-classification random sampling within the strata. Sample Size Sample size is an important part of any investigation or research project where the study seeks to make inferences about the population from a sample. When the population is heterogeneous, stratified sampling can be used and different sample sizes are collected for each population. Census method is a technique of statistical enumeration where all members of the population are studied and the sample size is equal to the population size. Three criteria determine the appropriate sample size: level of precision, level of confidence or risk, and degree of
N¼
Z2 ðpÞðqÞ E2
ð1Þ
where p is the expected percent accuracy, q ¼ (100 - p), E is the allowable error, and Z ¼ 2 from standard normal deviation of 1.96 from 95% two-side confidence interval. For example, if the expected accuracy is 85% and allowable error is 5%, then N¼
22 ð85Þð15Þ ¼ 204 52
Ground Control Points (GCPs) Remotely sensed imagery exhibits internal and external geometric error, and these random and systematic distortions are corrected by analyzing well-distributed Ground Control Points (GCPs) occurring in an image. Hence, remotely sensed data should be preprocessed to remove geometric distortion so that individual picture elements (pixels) are in their proper planimetric (x, y) map locations. GCP is a location on the surface of the Earth (e.g., a road intersection, rail-road crossings, towers, buildings, etc.) that can be identified on the remotely sensed imagery and located accurately on a map (Jenson, 1986). The locations of the output pixels are derived from locational information provided by GCPs, places on the input image that can be located with precision on the ground and on planimetrically correct maps. The locations of these points establish the geometry of the output image and its
Sampling Importance: Resampling Algorithms
1241
relationship to the input image. Thus, this first step establishes the framework of pixel positions for the output image using the GCPs.
Georectification or Georeferencing Georectification allows RS-derived information to be related to the other thematic information in GIS or spatial decision support systems (SDSS). It is the process of establishing mathematical relationship between the addresses of pixels in an image with the corresponding coordinates of those pixels on another image or map or ground. In the correction process, numerous GCPs are located in terms of their image coordinates (column and row numbers) on the distorted image and in terms of their ground coordinates. This involves fitting the coordinate system of an image to that of the second image of the same area. The transformation of remotely sensed images so that it has a scale and projections of a map is called geometric correction (Lillesand, 2015). Geometrically corrected imagery can be used to extract distance, polygon area, and direction information accurately. In other words, rectification involves identification of geometric relationship between input pixel location (column and row) and associated map coordinates of the same point (x,y). During rectification, GCPs are selected and polynomial equations are fitted using least-squares technique. Least-squares regression is used to determine coefficients for two coordinate transformation equations that can be used to interrelate the geometrically correct (map) coordinates and the distorted image coordinates (Jenson, 1986). The correct image coordinates for any map position can be precisely estimated with the help of the coefficients from these equations: x ¼ f 1 ðX, Y Þ y ¼ f 2 ðX, Y Þ where (x, y) is the correct map coordinates (column, row), (X, Y) is the distorted image coordinates, and f1 and f2 are the transformation functions. GCPs should be uniformly spread across the image and present in a minimum number, depending on the type of transformation to be used. Conformal, affine, projective, and polynomial transformations are frequently used in RS image rectification. Higher-order transformation may be required such as projective and polynomial transformations to geocode an image and make it fit with another image. Depending upon the distortions in the imagery, the number of GCPs used, and their locations relative to one another, complex polynomial equations are also used. The degree of complexity of the polynomial is expressed as the order of the polynomial. The order is the highest exponent used in the polynomial.
x 0 ¼ a0 þ a1 X þ a2 Y
ð2Þ
y 0 ¼ b0 þ b1 X þ b2 Y
ð3Þ
where x0 y0 are the rectified positions (in output image), and x, y are the positions in the input distorted image. When the six coefficients a0, a1, a2, b0, b1 and b2 are known, then it is possible to transfer the pixel values from the original to the rectified output image. However, before applying rectification to the entire dataset, it is important to determine how well the six coefficients derived from the leastsquares regression of the initial GCPs account for the geometric distortion in the input image. The method used involves computing root mean square error (RMSE) for each of the GCPs. RMSE is the distance between the input (source or measured) location of a GCP and the retransformed (or computed) location for the same GCP (Jenson, 1986). RMS error is computed with a Euclidean distance measure, and it is a way to measure distortion for each control point. RMS error ¼
x0 xorig
2
þ y0 yorig
2
1=2
ð4Þ
where xorig and yorig are the original column and row coordinates of the GCP in the image, and x’ and y’ are the computed coordinate in the distorted image. The square root of the standard deviation represents measure of accuracy. After each computation of a transformation, the total RMSE reveals that a given set of control points exceeds the threshold. The GCP that has the highest individual error is deleted from the analysis, and the coefficients and RMSE for all the points are recomputed.
Image Resampling It is the application of interpolation to bring an image into registration with another image or a planimetrically correct map. Resampling is related but distinct from georeferencing. Georeferencing indicates that resampling of an image requires matching not only to a reference image but also to reference points that correspond to specific known locations on the ground (Campbell and Wynne 2011). Interpolation is applied to bring an image into registration with another image or a planimetrically correct map. Pixel brightness values (BVs) must be determined. There is no direct one-to-one relationship between the movement of input pixel values to output pixel locations. A pixel in the rectified output image often requires a value from the input pixel grid that does not fall neatly on a row-and-column coordinate. When this occurs, there must be some mechanism for determining the BV to be assigned to the output-rectified
S
1242
Sampling Importance: Resampling Algorithms
pixel. This process is called intensity interpolation. Various popular resampling techniques used in RS applications are mentioned below. • Nearest-Neighbor The nearest-neighbor approach uses the value of the closest input pixel for the output pixel value. The pixel value occupying the closest image file coordinate to the estimated coordinate will be used for the output pixel value in the georeferenced image. Let’s try to understand this with the help of Fig. 1. The green grid is considered to be the output image to be created. To determine the value of central pixel (highlighted in light green box), in the nearest-neighbor approach, the value of the nearest original pixel is assigned, i.e., the value of black pixel in this example. In other words, the value of each output pixel is assigned on the basis of digital number closest to the pixel in the input matrix. This method has the advantage of simplicity and ability to preserve original input values as the output values. While resampling methods tend to average surrounding values, this may be an important consideration when discriminating between vegetation types or locating boundaries. This is easily computed and faster than other techniques. The disadvantage is that image has a rough appearance relative to the original unrectified data due to “stair-stepped” effect (see Fig. 2) (Bakx et al. 2012).
• Bilinear Interpolation This method uses the weighted average of the nearest four pixels to the output pixel. Weighted mean is calculated for the four nearest pixels in the original image (here, it is represented as dark gray and black pixels) (see Fig. 1). 4
BVwt¼ k¼1 4 k¼1
Zk D2 k
ð5Þ
1 D2 k
where Zk are the surrounding 4 data point values, and D2k are the distances squared from the point in question (x’ y’) to the data points. The advantage of this technique is that the stair-step effect caused by the nearest-neighbor approach is reduced and this image looks smooth. The disadvantage is that original data is altered and contrast is reduced by averaging neighboring values together. It is computationally more expensive than nearest-neighbor. • Cubic Convolution Cubic convolution approach uses the weighted average of the nearest 16 pixels to the output pixel. In Fig. 1, it is represented as a black and gray pixel in the input image.
Sampling Importance: Resampling Algorithms, Fig. 1 Principle of resampling using nearest-neighbor, bilinear interpolation, and cubic convolution. (Bakx et al. 2012)
Sampling Importance: Resampling Algorithms, Fig. 2 Nearest-neighbor, bilinear interpolation, and cubic convolution. (Bakx et al. 2012)
Original
Nearest Neighbour
Bilinear Interpolation
Bicubic Interpolation
Sampling Importance: Resampling Algorithms
1243
Sampling Importance: Resampling Algorithms, Fig. 3 Resampled images
The output is similar to bilinear interpolation, but the smoothing effect caused by averaging of surrounding input pixel values is more dramatic. 16
Zk D2 k BVwt¼ k¼1 16 1 D2 k¼1 k
ð6Þ
where Zk are the surrounding 16 data point values, and D2k are the distances squared from the point in question (x’ y’) to the data points. The advantage is that the stair-step effect caused by the nearest-neighbor approach is reduced and the image looks much smoother (Fig. 2). The disadvantages of this method are same as that of the bilinear interpolation. It is computationally more expensive than nearest-neighbor or bilinear interpolation.
Case Study of Resampling in Remote Sensing Applications
stepped effect is reduced and images are subsequently smoothened with bilinear and bicubic interpolation.
Conclusions Resampling is the application of interpolation technique to bring an image into registration with another image or a planimetrically correct map. Resampling techniques used for georegistration are nearest-neighbor, bilinear interpolation and cubic convolution. After applying these techniques, the result obtained was observed. Nearest-neighbor was easily computed and faster than other techniques. However, the stair-stepped effect of nearest-neighbor was prominent, while the same effect reduced and images were subsequently smoothened in bilinear and bicubic interpolation approaches. However, in bilinear interpolation and cubic convolution, the original image values got altered and the contrast reduced by averaging neighboring values. These two techniques are computationally more expensive than nearest-neighbor.
Cross-References In the following experiment, digital elevation model (DEM) of a part of Bangalore City, India, at a spatial resolution of 150 m is considered. The stair-stepped effect of nearestneighbor is visible in Fig. 3b. It is distinctly seen within the red rectangular bonding box. Similarly, in Fig. 3c, d, the stair-
▶ Cluster Analysis and Classification. ▶ Geographical Information Science. ▶ Remote Sensing.
S
1244
Bibliography Bakx JPG, Janssen L, Schetselaar EM, Tempfli K, Tolpekin VA (2012) Image analysis. In: The core of GIScience: a systems-based approach. University of Twente, Faculty of Geo-Information Science and Earth Observation (ITC), pp 205–226 Campbell JB, Wynne RH (2011) Introduction to remote sensing. Guilford Press Cochran WG (2007) Sampling techniques. Wiley Fitzpatrick-Lins K (1981) Comparison of sampling procedures and data analysis for a land-use and land-cover map. Photogram Eng Remote Sensing 47(3):343–351 Groves RM, Fowler FJ Jr, Couper MP, Lepkowski JM, Singer E, Tourangeau R (2011) Survey methodology, vol 561. Wiley, New York Jensen JR (1986) Introductory digital image processing: a remote sensing perspective. Univ. of South Carolina, Columbus Lillesand T, Kiefer RW, Chipman J (2015) Remote sensing and image interpretation. Wiley, New York Miaoulis G, Michener RD (1976) An introduction to sampling. Iowa: endall/Hunt Publishing Company, Dubuque Singh AS, Masuku MB (2014) Sampling techniques & determination of sample size in applied statistics research: an overview. Int J Econ Comm Manage 2(11):1–22
Scaling and Scale Invariance S. Lovejoy Physics, McGill University, Montreal, Canada
Definition Scaling is a scale symmetry that relates small and large scales of a system in scale-free, power law manner: without a characteristic scale. Because it is a symmetry, in systems with structures/fluctuations over wide ranges of scale, it is the simplest such relationship. In one dimension, the notion of scale is usually taken as the length of an interval, and in two or higher dimensions, the Euclidean distance is often used. However, the latter is isotropic: It is restricted to “self-similar” fractals, multifractals. Most geosystems are stratified in the vertical and have other anisotropies so that the scale notion must be broadened. “Generalized Scale Invariance” does this with two elements: a definition of the unit scale (all the unit vectors) defining the scales of the nonunit vectors with an operator that changes scale by a given scale ratio. This scale-changing operator forms a group whose generator is a scale-invariant exponent. Systems that are symmetric with respect to these scale changes may be geometric sets of points (fractals) or fields (multifractals). The relationship between scales is generally (but not necessarily) statistical and involves additional scaling relationships and corresponding scale-invariant exponents. These are often fractal dimensions, or codimensions (sets),
Scaling and Scale Invariance
or corresponding (multifractals).
dimension,
codimension
functions
Scaling Sets: Fractal Dimensions and Codimensions Geosystems typically display structures spanning huge ranges of scale in space, in time, and in space-time. The relationship between small and large, fast and slow processes is fundamental for characterizing the dynamical regimes – for example, weather, macroweather, and climate – as well as for developing corresponding mathematical and numerical models. Since scaling can be formulated as a symmetry principle, the simplest assumption about such relationships is that they are connected in a scaling manner governed by scaleinvariant exponents. In general terms, a system is scaling if there exists a “scale free” (power law) relationship (possibly deterministic, but usually statistical) between fast and slow (time) or small and large (space, space-time). If the system is a geometric set of points – such as the set of meteorological measuring stations (Lovejoy et al. 1986) – then the set is a fractal set and the number of points n in a circle radius L varies as: nðLÞ / LD
ð1Þ
where the (scale invariant) exponent D is the fractal dimension (Mandelbrot 1977). Consider instead, the density of points r(L ): rðLÞ / nðLÞ=Ld Lc ; c ¼ d D
ð2Þ
where d is the dimension of space (in this example, stations on the surface of the Earth, d ¼ 2 and empirically, D ≈ 1.75). The exponent of r characterizes the sparseness of the set; its exponent c is its fractal codimension. For geometric sets of points, it is the difference between the dimension of the embedding space and the fractal dimension of the set. More generally, codimensions characterize probabilities and therefore statistics; they are usually more useful than fractal dimensions (see below). The distinction is important for considering stochastic processes that are typically defined on infinite dimensional probability spaces so that d, D!1 even though c remains finite. Notice that whereas n(L ) and r(L ) are scaling, their exponents D and c are scale invariant; the two notions are closely linked. Scaling Functions, Fluctuations, and the Fluctuation Exponent H Geophysically interesting systems are typically not sets of points but rather scaling fields such as the rock density or temperature f(r, t) at the space-time point (r, t). (We will generally use the term “scaling fields,” but for multifractals, the more precise notion is of singular densities of multifractal
Scaling and Scale Invariance
1245
measures). In such a system, some aspect – most often a suitably defined fluctuation – Δf – has statistics whose small and large scales are related by a scale-changing operation that involves only the scale ratios: Over the scaling range, the system has no characteristic size. In one dimension – temporal series f(t) or spatial transects, f(x) – scaling can be expressed as: d
Df ðDtÞ ¼ jDt DtH
ð3Þ
where Δf is a fluctuation, Δt is the time interval over which the fluctuation occurs, H is the fluctuation exponent, and jΔt is a random variable that itself generally depends on the scale (for d transects, replace Δt by the spatial interval Δx). The sign ¼ is in the sense of random variables; this means that the random fluctuation Δf(Δt) has the same probability distribution as the random variable jΔtΔtH. In one dimension, the notion of scale can be taken to be the absolute temporal or spatial interval Δt, Δx; for processes in two or higher dimensional spaces, the definition of scale itself is nontrivial and we need Generalized Scale Invariance (GSI) discussed later. Although the general framework for defining fluctuations is wavelets, the most familiar fluctuations are either Δf(Δt) taken as a scale Δt difference: Δf(Δt) ¼ f(t) f(t Δt), or a scale Δt anomaly f 0Dt which is the average over the interval Δt of the series f 0 ¼ f f whose overall mean f has been removed. We suppress the t dependence of Δf since we consider the case where the statistics are independent of time or space: They are respectively statistically stationarity or statistically homogeneous. Physically, this is the assumption that the underlying physical processes are the same at all times and everywhere in space. Difference fluctuations are typically useful when average fluctuations increase with scale (1 > H > 0), whereas anomaly fluctuations are useful when they decrease with scale 1 < H < 0. A simple type of fluctuation that covers both ranges (i. e., 1 < H < 1) is the Haar fluctuation (Lovejoy and Schertzer 2012) that appears to be adequate for almost all geoprocesses. To see that fluctuations obey the scaling property eq. 3, consider the simplest case where j is a random variable independent of resolution. This “simple scaling” (Lamperti 1962) is respected, for example, by regular Brownian motion in which case H ¼ 1/2, ΔT(Δt) is a difference and j is a Gaussian random variable. The extensions to cases 0 < H < 1 are called “fractional Brownian motions” (fBm, Kolmogorov 1940) and to the first differences (increments) of fBm, “fractional Gaussian noises” (fGn) with 1 < H < 0 (Mandelbrot and Van Ness 1968). In fGn, Δf(Δt) must be taken as an anomaly fluctuation f 0 Dt , Haar fluctuation, or other appropriate fluctuation definition. Although Gaussian statistics are commonly assumed, in scaling processes, they are in fact exceptional. In the more usual case, at fixed scales, the “tails” (extremes) of the
probability distribution of j is also a power law. Scaling in probability space means that for large enough thresholds s, the probability of a random j exceeding s is Pr(j > s) ≈ s-qD. qD is a critical exponent since statistical moments of order q > qD diverge (the notation “Pr” indicates probabiilty, “< >” indicates statistical averaging, and the “D” subscript is because the value of the critical exponent generally depends on the dimension of space over which the process is averaged). Power law probabilities (with possibly any qD > 0) are a generic property of multifractal processes; more classically, they are also a property of stable Levy variables with Levy index 0 < α < 2 (for which qD ¼ α; the exceptional α ¼ 2 case is the Gaussian, all of whose moments converge so that qD ¼ 1). Probability distributions with power law extremes can give rise to events that are far larger than those predicted from Gaussian models, indeed they can be so strong that they have sometimes been termed “black swans” (Adapted from Taleb 2010 who originally termed such extreme events “grey swans”). Empirical values of qD in the range 3–7 for geofields ranging from the wind, precipitation, temperature, and seismicity have been reported in dozens of papers, (many are reviewed in ch.5 of Lovejoy and Schertzer 2013 that also includes the theory). To see why Eq. 3 has the scaling property, consider the simplest case where the fluctuations in a temporal series f(t) follow eq. 1, but with j a random variable independent of d resolution: jDt ¼ j. If we denote l > 1 as a scale ratio, it is easy to see that the statistics of large-scale (Δf(lΔt)) and small-scale (Δf(Δt)) fluctuations are related by a power law: d
Df ðlDtÞ ¼ lH Df ðDtÞ
ð4Þ
Equation 4 directly shows the scaling property relating the fluctuations differing by a scale ratio l. Since the scaling exponent H is the same at all scales, it is scale invariant. Here and below, we treat time series and spatial sections (transects) without distinction, their scales are simply absolute differences in time Δt, or in space Δx. This ignores the sometimes important difference that – due to causality – the sign of Δt (i.e., whether the interval is forward or backward in time) can be important; whereas in space one usually assumes left-right symmetry (the sign of Δx is not important), here we treat both as absolute intervals (see Marsan et al. 1996). Equation 4 relates the probabilities of small and large fluctuations; it is usually easier to deal with the deterministic equalities that follow by taking qth order statistical moments of Eq. 3 and then averaging over a statistical ensemble (indicated by “< >”): hðDf ðDtÞÞq i ¼ jqDt DtqH
ð5Þ
The equality is now a usual, deterministic one. Eq. 5 is the general case where the resolution Δt is important for the
S
1246
Scaling and Scale Invariance
statistics of j. In general, jΔt is a random function or field averaged at resolution Δt; if it is scaling, its statistics will follow: jql
¼l
K ðqÞ
;
ð6Þ
l ¼ t=Dt 1
where t is the “outer” (largest) scale of the scaling regime satisfied by the equation, and l is a convenient dimensionless scale ratio. K(q) is a convex (K00 (q) 0) exponent function; since the mean fluctuation is independent of scale (hjli ¼ lK(1) ¼ const), we have K(1) ¼ 0. Equation 6 describes the statistical behavior of cascade processes; these are the generic multifractal processes. Since large q moments are dominated by large, extreme values, and small q moments by common, typical values, K(q) 6¼ 0 implies a different scaling exponent for each statistical moment q: “multiscaling.” Fluctuations of various amplitudes change scale with different exponents, and each realization of the process is a multifractal: The values that exceed fixed thresholds are fractal sets. Increasing the threshold defines sparser and sparser exceedance sets (see the “spikes” in Figs. 1a and 2); the fractal dimensions of these sets decrease as the threshold is increased; this is discussed further in the next section. In general, K(q) 6¼ 0 is associated with intermittency, a topic we treat in more detail in Sect. 2. Structure Functions and Spectra Returning to the characterization of scaling series and transects via their fluctuations, we can combine eqs. 5, 6, to obtain:
xðqÞ ¼ qH K ðqÞ
ð7Þ
where Sq is the qth order (“generalized”) structure function and x(q) is its exponent. The structure functions are scaling since the small and large scales are related by a power law: Sq ðlDtÞ ¼ lxðqÞ Sq ðDtÞ
ð8Þ
When K(q) 6¼ 0, different moments change differently with scale so that, for example, the root mean square fluctuation has statistical moment x(2)/2 which can readily be smaller than the exponent of the first-order moment: x (1) x(2)/ 2 ¼ K(2)/2 > 0 (since K(1) ¼ 0 and for all q, K00 (q) > 0). This is shown graphically in Fig. 1a, b. A useful way of quantifying this deviation is the derivative at the mean: C1 ¼ K0 (1) where C1 0 is the codimension of the mean fluctuation (see below). Typical values of C1 are ≈ 0.05–0.15 for turbulent quantities (wind, temperature, humidity, and pressure (Lovejoy and Schertzer 2013) as well as topography, susceptibility, and rock density (Lovejoy and Schertzer 2007). It is significant that the Martian atmosphere has nearly identical exponents (H, C1) to Earth (wind, temperature, and pressure, (Chen et al. 2016), and Martian topography is similarly close to Earth’s (Landais et al. 2019). Precipitation and seismicity are notably much more extreme with C1 ≈ 0.4 (Lovejoy et al. 2012), ≈1.3 (Hooge et al. 1994). We could note that whereas the moments are “scaling,” the exponent functions such as K(q), x(q), etc. are “scale invariant.”
Low intermi ency (low spikiness)
Temperature changes (oC)
Scaling and Scale Invariance, Fig. 1a A comparison of temporal and spatial macroweather series at 2 resolution. The top are the absolute first differences of a temperature time series at monthly resolution (from 80 E, 10 N, 1880–1996, annual cycle removed, displaced by 4C for clarity), and the bottom is the series of absolute first differences (“spikes”) of a spatial latitudinal transect (annually averaged, 1990 from 60 N) as a function of longitude. One can see that while the top is noisy, it is not very “spikey”
Sq ðDtÞ ¼ hðDf ðDtÞÞq i ¼ jqDt DtqH / DtxðqÞ ;
1880
Time
High intermi ency (high spikiness)
0o
Space
360o
1996
Scaling and Scale Invariance
1247
Scaling and Scale Invariance, Fig. 1b The first order and RMS Haar fluctuations of the series transect in Fig. 1a. One can see that in the spikey transect (space, top), the first order and RMS statistics converge at large lags (Δx), and the rate of the converge is quantified by the intermittency parameter C1. The time series (bottom) is less spikey, and it converges very little and has low C1 (see Fig. 1a, top). The break in the scaling at ≈ 20 years is due to the dominance of anthropogenic effects at longer time scales. Quantitatively, the intermittency parameters near the mean are C1 ≈ 0.12 (space), C1 ≈ 0.01 (time)
Equations 7 and 8 are the statistics of (real space) fluctuations; however, it is common to analyze data with Fourier techniques. These yield the power spectrum: EðoÞ /
f ðoÞ
2
ð9Þ
where f ðoÞ is the Fourier Transform of f(t), and o is the frequency. Due to “Tauberian theorems” (e.g., Feller 1971), power laws in real space are transformed into power laws in Fourier space, hence for scaling processes: EðoÞ ob
ð10Þ
where β is the spectral exponent. Due to the WienerKhintchin theorem, the spectrum is the Fourier transform of the autocorrelation function, itself closely related to S2(Δt). We therefore obtain the general relation: b ¼ 1 þ xð2Þ ¼ 1 þ 2H K ð2Þ
ð11Þ
(Technical note: Scaling is generally only followed over a finite range of scales, and unless there are cut-offs, there are generally low- or high-frequency divergences. However, if the scaling holds over wide enough ranges, then scaling in real space does imply scaling of the spectrum and vice versa, and if the scaling holds over a wide enough range, then eq. 11 relates their exponents).
Returning now to the case of simple scaling, we find hjqi ¼ Bq where Bq is a constant independent of scale (Δt); hence we have K(q) ¼ 0 and Sq(Δt) / ΔtqH so that: xðqÞ ¼ qH
ð12Þ
i.e., x(q) is a linear function of q. This “linear scaling” with β ¼ 1 þ 2H is also sometimes called “simple scaling.” Linear scaling arises from scaling linear transformations of noises; the general linear scaling transformation is a power law filter (multiplication of the Fourier Transform by o-H) which is equivalently a fractional integral (H > 0) or fractional derivative (H < 0). Fractional integrals of order H þ 1/2 of Gaussian white noises yield fBm (1 > H > 0) and fGn (1 < H < 0). The analogous Levy motions and noises are obtained by the filtering of independent Levy noises (in this case, x(q) is only linear for q < α < 2; for q > α, the moments diverge so that both x(q) and Sq ! 1). The more general “nonlinear scaling” case where K(q) is nonzero is associated with fractional integrals or derivatives of order H of scaling multiplicative (not additive) random processes (cascades, multifractals). These fractional integrals (H > 0) or derivatives (H < 0) filter the Fourier Transform of j by o-H; this adds the extra term qH in the structure function exponent: x(q) ¼ qH - K(q). In the literature, the notation “H” is not used consistently. It was introduced in honor of Edwin Hurst a pioneer of long memory processes sometimes called “Hurst phenomena” (Hurst 1951). Hurst introduced the rescaled range exponent notably in the study of Nile river streamflow records. To
S
1248
explain Hurst’s findings, Mandelbrot and Van Ness [1968] developed Gaussian scaling models (fGn, fBm) and introduced the symbol “H.” At first, this represented a “Hurst exponent,” and they showed that for fGn processes, it was equal to Hurst’s exponent. However, by the time of the landmark “Fractal Geometry of Nature” (Mandelbrot 1982), the usage was shifting from a scaling exponent to a model specification: the “Hurst parameter.” In this new usage, the symbol H was used for both fGn and its integral fBm, even though the fBm-scaling exponent is larger by one. To avoid confusion, we will call it HM. Subsequently, a mathematical literature has developed using HM with 0 < HM < 1 to parametrize both the process (fGn) and its increments (fGn). However, also in the early 1980s (Grassberger and Procaccia 1983; Hentschel and Procaccia 1983; Schertzer and Lovejoy 1983), much more general scaling processes with an infinite hierarchy of exponents – multifractals – were discovered clearly showing that a single exponent was not enough. Schertzer and Lovejoy (1987) showed that it was nevertheless possible to keep H in the role of a mean fluctuation exponent (originally termed a cascade “conservation exponent”). This is the sense of the H exponent discussed here. As described above, using appropriate definitions of fluctuations (i.e., by the use of an appropriate wavelet), H can take on any real value. When the definition is applied to fBm, it yields the standard fBm value H ¼ HM, yet when applied to fGn, it yields H ¼ HM-1.
Intermittency Spikes Often, scaling systems are modeled by linear stochastic processes, leading to linear exponent x(q) ¼ qH, i.e., K(q) is neglected; the processes are sometimes called “monofractal processes,” the most common examples being fBm, fGn. Nonzero K(q) is associated with the physical phenomenon of “spikiness” or intermittency: Compare the spatial spiky and temporal nonspiky series in Fig. 1a and their structure functions in Fig. 1b. Classically, intermittency was first identified in laboratory flows as “spottiness” (Batchelor and Townsend 1949), in the atmosphere by the concentration of most of atmospheric fluxes in tiny, sparse (fractal) regions. In solid Earth geophysics, fields such as concentrations of ore are similarly sparse and scaling, a fact that notably prompted (de Wijs 1951) to independently develop a multiplicative (cascade) model for the distribution of ore. In the 1980s, de Wijs’ model was rediscovered in statistical physics as the multifractal “p model” (see also Agterberg 2005). Early quantitative intermittency definitions were developed originally for fields (space). These are of the “on-off” type: When the temperature, wind, or other field exceeds a threshold, then it is “on,” i.e., in a special state – perhaps of
Scaling and Scale Invariance
strong/violent activity. At a specific measurement resolution, the on-off intermittency can be defined as the fraction of space that the field is “on” (where it exceeds the threshold). In a scaling system, for any threshold the “on” region will be a fractal set (sparseness characterized by c, eq. 2) and threshold by exponent γ. The resulting function c(γ) describes the intermittency over all scales and all intensities (thresholds) and is related to K(q) as described below. In scaling time series, the same intermittency definition applies (note that other definitions are sometimes used in deterministic chaos). Sometimes, the intermittency is obvious, but sometimes it is hidden and underestimated or overlooked; let us discuss some examples. In order to vividly display the nonclassical intermittency, it suffices to introduce a seemingly trivial tool – a “spike plot.” A spike plot is the series (or transect) of the absolute first differences Df normalized by their means, Df =Df (for a series f ). Fig. 2 shows examples from the main atmospheric scaling regimes (temperatures, the middle figure corresponds to the macroweather data analyzed in Fig. 1a, 1b). Fig. 3 shows examples from solid earth geophysics. The resolutions have been chosen to be within the corresponding scaling regimes. In Fig. 2, one immediately notices that with a single exception – macroweather in time – all of the regimes are highly “spikey,” exceeding the maximum expected for Gaussian processes by a large margin (the solid horizontal line). Indeed, for the five other plots, the maxima corresponds to (Gaussian) probabilities p < 109 (the top dashed line), and four of the six to p < 1020. The exception – macroweather in time – is the only process that is close to Gaussian behavior, but even macroweather is highly non-Gaussian in space (bottom, middle). In Fig. 3, we show corresponding examples from the KTB borehole of density and susceptibility of rocks. For the susceptibility, the intermittency is so extreme as to be noticeable in the original series (upper right), but less so for the rock density (upper left). In both cases, the spike plots (bottom row) are strongly non-Gaussian with extremes corresponding to Gaussian probabilities of 1015, 10162, respectively. Codimensions and Singularities In order to understand the spike plots, recall that if a process is scaling, we have eq. 1 where jΔt is the (normalized) flux driving the process. The normalized spikes DT=DT can thus be interpreted as estimates of the nondimensional, normalized driving fluxes: DT ðDtÞ=DT ðDtÞ ¼ jDt,un =jDt,un ¼ jDt
ð13Þ
(where jΔt,un is the raw, unnormalized flux, the overbar indicates an average over the series). The interpretation in terms of fluxes comes from turbulence theory and is routinely used to quantify turbulence. In the weather regime in
Scaling and Scale Invariance
1249
' T 14 / 'T
Time
14
14
Weather
12
Macroweather
12
8
8
8 6
6
6
4
4
4
2
2
2
0
0
50 100 150 200 250 300 350
50 100 150 200 250 300 350
' T / 'T
14
14
12
12
12
10
10
10
8
8
8
6
6
6
4
4
4
2
2
2
Space
14
0
50 100 150 200 250 300 350
0
50 100 150 200 250 300 350
Scaling and Scale Invariance, Fig. 2 Temperature spike plots for weather, macroweather, climate (left to right), and time and space (top and bottom rows). The solid horizontal black line indicates the expected maximum for a Gaussian process with the same number of points (360 for each with the exception of the lower right which had only 180 points); the dashed lines are the corresponding probability levels p ¼ 106, p ¼ 109 for Gaussian processes; and two of the spikes exceed
Scaling and Scale Invariance, Fig. 3 Density (left), magnetic susceptibility (right), nondimensionalized by their means (top), and corresponding spike plots (normalized absolute first differences, bottom). The data are from the first 2795 points of the KTB borehole, each point at a 2 m interval; the horizontal axis indicates the number of points from the surface. In the spike plots, the solid line indicates the maximum expected for a Gaussian process, the dashed lines corresponding to (Gaussian) probability levels of 109, 1012. The extreme spikes correspond to Gaussian probabilities of ≈1015, 10162, respectively
Climate
10
10
10
12
0
50 100 150 200 250 300 350
0
50 100 150 200 250 300 350
14; p < 1077. The upper left is Montreal at 1 hour resolution; upper middle Montreal at 4 month resolution; upper right, paleotemperatures from Greenland ice cores (GRIP) at 240 year resolution; lower left aircraft at 280 m resolution; lower middle one monthly resolution temperatures at 1o resolution in space; lower right 140 year resolution in time, 2o in space at 45oN. Reproduced from [Lovejoy, 2018]
S
1250
Scaling and Scale Invariance
respectively time and space, the squares and cubes of the wind spikes are estimates of the turbulent energy fluxes. This spikiness is because most of the dynamically important events are sparse, hierarchically clustered, occurring mostly in storms and the center of storms. The spikes visually display the intermittent nature of the process. As long as H < 1 (true for nearly all geo-processes), the differencing that yields the spikes acts as a high pass filter, and the spikes are dominated by the high frequencies. The Gaussian fGn, fBm processes – or nonscaling processes such as autoregressive and moving average processes (and the hybrid fractional versions of these) – will have spikes that look like the macroweather spikes: In Figs. 2 and 3, they will be roughly bounded by the (Gaussian) solid horizontal lines. To go beyond Gaussians to the general multifractal scaling case, each spike is considered to be a singularity of order γ: lg ¼ jDf j=jDf j
ð14Þ
l is the scale ratio: l ¼ (the length of the series) / (the resolution of the series) ¼ the number of pixels; in Fig. 3, l ¼ 2795. The most extreme spike (jDf j=jDf j) therefore has a probability ≈1/2795. For Gaussian processes, spikes with this probability have jDf j=jDf j ¼ 4.47; this is shown by the solid lines in Fig. 3; the line therefore corresponds to g ¼ log jDf j=jDf j = logl log 4:47= log 2795 0:19 . For comparison, the actual maxima from the spike plots are γmax ¼ log(34.5)/log(2795) ¼ 0.45 and log(11.6)/ log(2795) ¼ 0.30 (susceptibility, density, respectively). These values are close to those predicted by multifractal models for these processes (for more details, see Lovejoy 2018). To understand the spike probabilities, recall that the statistics of jl were defined above by the moments, and K(q). However, this is equivalent to specifying them via the corresponding (multiscaling) probability distributions: Prðjl > sÞ lcðgÞ ;
g¼
log s logl
ð15Þ
where “≈” indicates equality to within an unimportant prefactor and c(γ) is the codimension function that specifies how the probabilities change with scale (Eq. 2 with l / L1 and Pr / r). In scaling systems, the relationship between probabilities and moments is between the corresponding scaling exponents c(γ), K(q). It turns out that it is given by the simple Legendre transformation: cðgÞ ¼ max ðqg K ðqÞÞ q
K ðqÞ ¼ max ðqg cðgÞÞ
ð16Þ
g
(Parisi and Frisch 1985). The Legendre relations are generally well behaved since K and c are convex, although
discontinuities in the first- and second-order derivatives can occur notably due to the divergence of high-order moments q > qD (“multifractal phase transitions”). The Legendre transformation also allows us to interpret c(γ) as the fractal codimension of the set of points characterized by the singular behavior lγ. Returning to the spike plots, using Eq. 15 and the estimate Eq. 14 for the singularities, we can write the probability distribution of the spikes as: Pr jDf j=jDf j > s lcðgÞ ;
g¼
log s logl
ð17Þ
Pr jDf j=jDf j > s is the probability that a randomly chosen spike jDf j=jDf j exceeds a fixed threshold s (it is equal to one minus the more usual cumulative distribution function). c(γ) characterizes sparseness because it quantifies how the probabilities of spikes of different amplitudes change with resolution l (for example, when they are smoothed out by averaging). The larger c(γ), the sparser the set of spikes that exceed the threshold s ¼ lγ. A series is intermittent whenever it has spikes with c > 0. Gaussian series are not intermittent since c(γ) ¼ 0 for all the spikes. If there were no constraints on the system beyond scaling, an empirical specification would require the (unmanageable) determination of the entire scale-invariant functions c(γ) or K(q): the equivalent of an infinite number of parameters. Fortunately, it turns out that stable, attractive “universal” multifractal processes exist that require only two parameters: K ð qÞ ¼
C1 ðqa qÞ ða 1Þ
ð18Þ
where 0 α 2 is the Levy index that characterizes the degree of multifractality (Schertzer and Lovejoy 1987). Notice that for any α, the intermittency parameter C1 ¼ K0 (1) and in addition α ¼ K00 (1)/K0 (1). Together, α and C1 thus characterize the tangent and curvature of K near the mean (q ¼ 1). For the universal multifractal probability exponent, the Legendre transformation of eq. 18 yields: cðgÞ ¼ C1
g 1 þ C1 a0 a
a0
ð19Þ
where the auxiliary variable α’ is defined by: 1/α0 þ 1/α ¼ 1. This means that processes, whose moments have exponents K(q) given by eq. 7, have probabilities with exponents c(γ) given by eq. 15. Empirically, most geofields have α in the range 1.5–1.9, not far from the “log-normal” multifractal (α ¼ 2). The c(γ), K(q) codimension multifractal formalism is appropriate for stochastic multifractals (Schertzer and
Scaling and Scale Invariance
Lovejoy 1987). Another commonly used multifractal formalism (often used in solid earth geophysics (Cheng and Agterberg 1996) is the dimension formalism of (Halsey et al. 1986) that was developed for characterizing phase spaces in deterministic chaos with notation fd(αd), αd, td (q). In a finite dimensional space of dimension d, fd(αd) ¼ d - c(γ) where αd ¼ d - γ, and td(q) ¼ (q-1)d - K(q) where the subscript “d” (the dimension of the space) has been added to emphasize that unlike c, γ, K, in the dimension formalism, the basic quantities depend on both the statistics as well as d. For example, whereas c, γ, and K are the same for 1-D transects, 2-D sections, or 3-D spaces, fd, αd, and td are different for each subspace.
Scaling in Two or Higher Dimensional Spaces: Generalized Scale Invariance Scale Functions and Scale-Changing Operators: From Self-Similarity to Self-Affinity If we only consider scaling in 1-D (series and transects), the notion of scale itself can be taken simply as an interval (space, Δx) or lag (time, Δt), and large scales are simply obtained from small ones by multiplying by their scale ratio l. More general geoscience series and transects are only 1-D subspaces of (r, t) space-time geoprocesses. In both the atmosphere and the solid earth, an obvious issue arises due to vertical/horizontal stratification: We do not expect the scaling relationship between small and large to be the same in horizontal and in vertical directions. Unsurprisingly, the corresponding transects generally have different exponents (x(q), H, K(q), c(γ)). In general, the degree of stratification of structures systematically changes with scale. To deal with this fundamental issue, we need an anisotropic definition of the notion of scale itself. The simplest scaling stratification is called “self-affinity”: The squashing is along orthogonal directions whose directions are the same everywhere in space, for example, along the x and z axes in an x-z space, e.g., a vertical section of the atmosphere or solid earth. More generally, even horizontal sections will not be self-similar: As the scale changes, structures will be both squashed and rotated with scale. A final complication is that the anisotropy can depend not only on scale but also on position. Both cases can be dealt with by using the formalism of Generalized Scale Invariance (GSI; Schertzer and Lovejoy 1985b), corresponding respectively to linear (scale only) and nonlinear GSI (scale and position) (Lovejoy and Schertzer 2013, ch. 7; Lovejoy 2019, ch. 3). The problem is to define the notion of scale in a system where there is no characteristic size. Often, the simplest (but usually unrealistic) “self-similar” system is simply assumed without question: The notion of scale is taken to be isotropic. In this case, it is sufficient to define the scale of a
1251
vector r by the usual vector norm (the length of the vector r 1=2 denoted by jr j ¼ ðx2 þ z2 Þ . jr jsatisfies the following elementary scaling rule: l1 r ¼ l1 jr j
ð20Þ
where again, l is a scale ratio. When l >1, this equation says that the scale (here, length) of the reduced, shrunken vector l1r is simply reduced by the factor l1, a statement that holds for any orientation of r. To generalize this, we introduce a more general scale function kr k as well as a more general scale-changing operator Tl; together they satisfy the analogous equation: kT l r k ¼ l1 kr k
ð21Þ
For the system to be scaling, a reduction by scale ratio l1 followed by a reduction l2 should be equal to first reduction by l2 and then by l1, and both should be equivalent to a single reduction by factor l ¼ l1 l2. The scale-changing operator therefore satisfies group properties, so Tl is a one-parameter Lie group with generator G: T l ¼ lG
ð22Þ
When G is the identity operator (I), then T l r ¼ lI r ¼ l1 Ir ¼ l1 r so that the scale reduction is the same in all directions (an isotropic reduction): l1 r ¼ l1 kr k. However, a scale function that is symmetric with respect to such isotropic changes is not necessarily equal to the usual norm jr j since the vectors with unit scale (i.e., those that satisfy kr k ¼ 1) may be any (nonconstant, hence anisotropic) function of the polar angle – they are not necessarily circles (2D) or spheres (3D). Indeed, in order to complete the scale function definition, we must specify all the vectors whose scale is unity – the “unit ball.” If in addition to G ¼ I, the unit scale is a circle (sphere), then the two conditions imply kr k ¼ jr j and we recover Eq. 20. In the more general – but still linear case where G is a linear operator (a matrix) – Tl depends on scale but is independent of location; more generally – nonlinear GSI – G also depends on location and scale; Figs. 4, 5, 6a, 6b and 7 give some examples of scale functions, and Figs. 8a, 8b, 9 and 10 show some of the corresponding multifractal cloud, magnetization, and topography simulations. Scaling Stratification in the Earth and Atmosphere GSI is exploited in modeling and analyzing the earth’s density and magnetic susceptibility fields (the review Lovejoy and Schertzer 2007), and in many atmospheric fields (wind, temperature, humidity, precipitation, cloud density, and aerosol concentrations; see the monograph Lovejoy and Schertzer 2013).
S
1252
Scaling and Scale Invariance
Scaling and Scale Invariance, Fig. 4 A series of ellipses each separated by a factor of 1.26 in scale, red indicating the unit scale (here, a circle, thick lines). Upper left to lower right, Hz increasing from 2/5, 3/5, 4/5 (top), 1, 6/5, 7/5 (bottom, left to right). Note that when Hz > 1, the stratification at large scales is in the vertical rather than the horizontal direction (this is required for modeling the earth’s geological strata). Reproduced from Lovejoy (2019)
Scaling and Scale Invariance, Fig. 6a The theoretical shapes of average vertical cross sections using the empirical parameters estimated from CloudSat – derived mean parameters: Hz ¼ 5/9, with sphero-scales 1 km (top), 100 m (middle), 10 m (bottom), roughly corresponding to the geometric mean and one-standard-deviation fluctuations. In each of the three, the distance from left to right horizontally is 100 km, from top to bottom vertically is 20 km. It uses the canonical scale function. The top figure in particular shows that structures 100 km wide will be about 10 km thick whenever the sphero-scale is somewhat larger than average (Lovejoy et al. 2009)
Hz characterizes the degree of stratification (see below), and ls is the “sphero-scale,” so-called because it defines the scale at which horizontal and vertical extents of structures are equal (although they are generally not exactly circular): kðls , 0Þk ¼ kð0, ls Þk ¼ ls
ð24Þ
It can be seen by inspection that k(x, z)ksatisfies: Scaling and Scale Invariance, Fig. 5 Blowups and reductions by factors of 1.26 starting at circles (thick lines). The upper left shows the isotropic case, the upper right shows the self-affine (pure stratification case), the lower left example is stratified but along oblique directions, and the lower right example has structures that rotate continuously with scale while becoming increasingly stratified. The matrices used are: ,
–
, and
,
(upper left to
lower right). Reproduced from Lovejoy (2019)
To give the idea, we can define the “canonical” scale function for the simplest stratified system representing a vertical (x,z) section in the atmosphere or solid earth: kðx, zÞk ¼ ls
¼ ls
x z , signðzÞ ls ls x ls
2
þ
z ls
1=Hz
2=H z 1=2
ð23Þ
kT l ðx, zÞk ¼ l1 kðx, zÞk;
T l ¼ lG ;
G¼
1
0
0
Hz ð25Þ
(note that matrix exponentiation is simple only for diagonal l1 0 matrices – here T l ¼ – but when G is not 0 lHz diagonal, it can be calculated by expanding the series: lG ¼ eG log l ¼ 1 G log l þ (G log l)2/2 . . .). Notice that in this case, the ratios of the horizontal/vertical statistical exponents (i.e., x(q), H, K(q), and c(γ)) are equal to Hz. We could also note that linear transects taken any direction other than horizontal or vertical will have two scaling regimes (with a break near the sphero-scale). However, the break is spurious; it is a consequence of using the wrong notion of scale. Equipped with a scale function, the general anisotropic generalization of the 1-D scaling law (Eq. 1) may now be expressed by using the scale kDr k:
Scaling and Scale Invariance
1253
Scaling and Scale Invariance, Fig. 6b Vertical cross-section of the magnetization scale function assuming Hz ¼ 2 and a spheroscale of 40,000 km. The scale is in kilometers and the aspect ratio is ¼ (Lovejoy et al. 2001)
Scaling and Scale Invariance, Fig. 7 The unity of geoscience illustrated via a comparison of typical (average) vertical sections in the atmosphere (top), and in the solid earth (bottom), an aspect ratio of 1/5 was used. The key points are as follows: Hz < 1, ls small (atmosphere, stratification increasing at larger scales) and Hz > 1, ls large (solid earth, stratification increasing at smaller scales). The sphero-scale (ls) varies in space and in time (see Fig. 6a, 6b) d
Df ðDr Þ ¼ jkDrk kDr k
H
ð26Þ
This shows that the full scaling model or full characterization of scaling requires the specification of the notion of scale via the scale-invariant generator G and unit ball (hence the scale function), the fluctuation exponent H, as well the statistics of jkDrk specified via K(q), c(γ) or – for universal multifractals, C1, α. In many practical cases – e.g., vertical stratification – the direction of the anisotropy is fairly obvious, but in horizontal sections, where there can easily be significant rotation of structures with scale, the empirical determination of G and the scale function is a difficult, generally unsolved problem. In the atmosphere, it appears that the dynamics are dominated by Kolmogorov scaling in the horizontal (Hh ¼ 1/3) and Bolgiano-Obhukhov scaling in the vertical (Hv ¼ 3/5) so that Hz ¼ Hh/Hv ¼ 5/9 ¼ 0.555... Assuming that the horizontal directions have the same scaling, then typical structures of size LxL in the horizontal have vertical extents of LHz, hence their volumes are LDel with “elliptical dimension” Del ¼ 2 þ Hz ¼ 2.555. . .; the “23/9D model” (Schertzer and Lovejoy 1985a). This model is very close to the empirical data, and it contradicts the “standard model” that is based on isotropic symmetries and that attempts to combine a small-
scale isotropic 3D regime and a large-scale isotropic (flat) 2D regime with a transition supposedly near the atmospheric scale height of 10 km. The requisite transition has never been observed, and claims of large-scale 2D “geostrophic” turbulence have been shown to be spurious (reviewed in ch. 2 of Lovejoy and Schertzer 2013). Since Hz < 1, the atmosphere becomes increasingly stratified at large scales. In solid earth applications, it was found that Hz ≈ 2–3 (rock density, susceptibility, and hydraulic conductivity (Lovejoy and Schertzer 2007)) i.e., Hz > 1 implying on the contrary that the earth’s strata are typically more horizontally stratified at the smallest scales (Figs. 6a, 6b and 7).
Conclusions Geosystems typically involve structures spanning wide ranges of scale: from the size of the planet down to millimetric dissipations scales (the atmosphere) or micrometric scales (solid earth). Classical approaches stem from the outdated belief that complex behavior requires complicated models. The result is a complicated “scalebound” paradigm involving hierarchies of phenomenological models / mechanisms each spanning small ranges of scale. For example, conventionally, the atmosphere was already divided into synoptic, meso- and microscales when Orlanski (1975) proposed further divisions implying new mechanisms every factor of two or so in scale. Ironically, this still popular paradigm arose at the same time that scale-free numerical weather and climate models were being developed that were based on the scaling dynamical equations and now known to display scaling meteorological fields. At the same time as the scale-bound paradigm was ossifying, the “Fractal Geometry of Nature” (Mandelbrot 1977, 1982) proposed an opposite approach based on sets that respected an isotropic scaling symmetry: self-similar fractals. These have scale-free power law number-size relationships whose exponents are scale-invariant geo-applications already included topography and clouds. However, geosystems are rarely geometric sets of points but rather fields (i.e., with values, e.g., temperature, rock density) varying in space and
S
1254
Scaling and Scale Invariance, Fig. 8a Simulations of the liquid water density field. The top is a horizontal section, to the right the corresponding central horizontal cross section of the scale function. The stratification is determined by Hz ¼ 0.555 with ls ¼ 32. The bottom
Scaling and Scale Invariance, Fig. 8b The top is the visible radiation field (corresponding to Fig. 8a), looking up (sun at 45 from the right); the bottom is a side radiation field (one of the 512x64 pixel sides) (Lovejoy and Schertzer 2013)
Scaling and Scale Invariance
shows side views. There is also horizontal anisotropy with rotation. Statistical (multifractal) parameters: α ¼ 1.8, C1 ¼ 0.1, and H ¼ 0.333, on a 512x512x64 grid (Lovejoy and Schertzer 2013)
Scaling and Scale Invariance, Fig. 9 Numerical simulation of a multifractal rock magnetization field (vertical, component Mz) with parameters deduced from rock samples and near surface magnetic anomalies (Pecknold et al. 2001). The simulation was horizontally isotropic with vertical stratification specified by Hz ¼ 1.7, ls ¼ 2500 km; the region is 32x32x16 km, resolution 0.25 km. The statistical parameters specifying the horizontal multifractal statistics are H ¼ 0.2, α ¼1.98, and C1 ¼ 0.08. This is a reasonably realistic crustal section, although the spheroscale was taken to be a bit too small in order that strata may be easily visible. Notice that the structures become more stratified at smaller scales. The direction of M is assumed to be fixed in the z direction
Scaling and Scale Invariance
1255
Scaling and Scale Invariance, Fig. 10 Comparison of isotropic versus anisotropic simulations for three different scaling models. Top row shows the scale functions. From left to right, we change the type of anisotropy: The left column is self-similar (isotropic) while the middle and right columns are anisotropic and symmetric with respect to 0:8 0:5 G¼ . The middle column has unit ball circular at 0:05 1:2 1 pixel, while for the right one the unit ball is also anisotropic. The second, third, and fourth rows show the corresponding fBm (with H ¼ 0.7), the analogous fractional Levy motion (fLm α ¼ 1.8, H ¼ 0.7), and multifractal (α ¼1.8, C1 ¼ 0.12, H ¼ 0.7) simulations. We note that in the case of fBm, one mainly
perceives textures; there are no very extreme mountains or other morphologies evident. One can see that the fLm is too extreme; the shape of the singularity (particularly visible in the far right) is quite visible in the highest mountain shapes. The multifractal simulations are more realistic in that there is a more subtle hierarchy of mountains. When the contour lines of the scale functions are close, the scale kr k changes rapidly over short distances. For a given order of singularity γ, lγ will therefore be larger. This explains the strong variability depending on direction (middle bottom row) and on shape of unit ball (right bottom row). Indeed, spectral exponents will be different along the different eigenvectors of G (Lovejoy and Schertzer 2007).
in time. In addition, these fields are typically highly anisotropic notably with highly stratified vertical sections. In the 1980s, the necessary generalizations to scaling fields (multifractals) and to anisotropic scaling (Generalized Scale Invariance, GSI) were achieved.
GSI clarifies the significance of scaling in geoscience since it shows that scaling is a rather general symmetry principle: It is thus the simplest relation between scales. Just as the classical symmetries (temporal, spatial invariance, and directional invariance) are equivalent (Noether’s theorem) to conservation laws (energy, momentum, and angular momentum), the
S
1256
(nonclassical) scaling symmetry conserves the scaling functions G, K, and c. Symmetries are fundamental since they embody the simplest possible assumption, model: Under a system change, there is an invariant. In physics, initial assumptions about a system are that it respects symmetries. Symmetry breaking is only introduced on the basis of strong evidence or theoretical justification: in the case of scale symmetries, they are broken by the introduction. of characteristic space or time scales. Theoretically, scaling is a unifying geophysical principle that has already shown its realism in numerous geoapplications both on Earth and Mars proving the pertinence of anisotropic scaling models and analyses. It is unfortunate that in geoscience, scale-bound models and analyses continue to be justified on superficial phenomenological grounds. Geoapplications of scaling are therefore still in their infancy.
References Agterberg F (2005) New applications of the model of de Wisj in regional geochemistry. Math Geol. https://doi.org/10.1007/s11004-00696063-7 Batchelor GK, Townsend AA (1949) The nature of turbulent motion at large wavenumbers. Proc R Soc Lond A 199:238 Chen W, Lovejoy S, Muller JP (2016) Mars’ atmosphere: the sister planet, our statistical twin. J Geophys Res Atmos 121. https://doi. org/10.1002/2016JD025211 Cheng Q, Agterberg F (1996) Multifractal modelling and spatial statistics. Math Geol 28:1–16 de Wijs HJ (1951) Statistics of ore distribution, part I. Geol Mijnb 13: 365–375 Feller W (1971) An introduction to probability theory and its applications, vol 2. Wiley, p 669 Grassberger P, Procaccia I (1983) Measuring the strangeness of strange atractors. Physica 9D:189–208 Halsey TC, Jensen MH, Kadanoff LP, Procaccia I, Shraiman B (1986) Fractal measures and their singularities: the characterization of strange sets. Phys Rev A 33:1141–1151 Hentschel HGE, Procaccia I (1983) The infinite number of generalized dimensions of fractals and strange attractors. Physica D 8:435–444 Hooge C, Lovejoy S, Pecknold S, Malouin JF, Schertzer D (1994) Universal multifractals in seismicity. Fractals 2:445–449 Hurst HE (1951) Long-term storage capacity of reservoirs. Trans Am Soc Civ Eng 116:770–808 Kolmogorov AN (1940) Wienershe spiralen und einige andere interessante kurven in Hilbertschen Raum. Doklady Academii Nauk SSSR 26:115–118 Lamperti J (1962) Semi-stable stochastic processes. Trans Am Math Soc 104:62–78 Landais F, Schmidt F, Lovejoy S (2019) Topography of (exo)planets. MNRAS 484:787–793. https://doi.org/10.1093/mnras/sty3253 Lovejoy S (2018) The spectra, intermittency and extremes of weather, macroweather and climate. Nat Sci Rep 8:1–13. https://doi.org/10. 1038/s41598-018-30829-4 Lovejoy S (2019) Weather, macroweather and climate: our random yet predictable atmosphere. Oxford University Press, p 334 Lovejoy S, Schertzer D (2007) Scaling and multifractal fields in the solid earth and topography. Nonlin Processes Geophys 14:1–38
Scattergram Lovejoy S, Schertzer D (2012) Haar wavelets, fluctuations and structure functions: convenient choices for geophysics. Nonlinear Proc Geophys 19:1–14. https://doi.org/10.5194/npg-19-1-2012 Lovejoy S, Schertzer D (2013) The weather and climate: emergent Laws and Multifractal cascades, 496 pp. Cambridge University Press Lovejoy S, Schertzer D, Ladoy P (1986) Fractal characterisation of inhomogeneous measuring networks. Nature 319:43–44 Lovejoy S, Pecknold S, Schertzer D (2001) Stratified multifractal magnetization and surface geomagnetic fields, part 1: spectral analysis and modelling. Geophys J Inter 145:112–126 Lovejoy S, Pinel J, Schertzer D (2012) The global space-time Cascade structure of precipitation: satellites, gridded gauges and reanalyses. Adv Water Resour 45:37–50. https://doi.org/10.1016/j.advwatres. 2012.03.024 Lovejoy S, Tuck AF, Schertzer D, Hovde SJ (2009) Reinterpreting aircraft measurements in anisotropic scaling turbulence. Atmos Chem Phys 9: 1–19. Mandelbrot BB (1977) Fractals, form, chance and dimension. Freeman Mandelbrot BB (1982) The fractal geometry of nature. Freeman Mandelbrot BB, Van Ness JW (1968) Fractional Brownian motions, fractional noises and applications. SIAM Rev 10:422–450 Marsan D, Schertzer D, Lovejoy S (1996) Causal space-time multifractal processes: predictability and forecasting of rain fields. J Geophy Res 31D:26333–326346 Orlanski I (1975) A rational subdivision of scales for atmospheric processes. Bull Amer Met Soc 56:527–530 Parisi G, Frisch U (1985) A multifractal model of intermittency. In: Ghil M, Benzi R, Parisi G (eds) Turbulence and predictability in geophysical fluid dynamics and climate dynamics. North Holland, pp 84–88 Pecknold S, Lovejoy S, Schertzer D (2001) Stratified multifractal magnetization and surface geomagnetic fields, part 2: multifractal analysis and simulation. Geophys Inter J 145:127–144 Schertzer D, Lovejoy S (1983) On the dimension of atmospheric motions, paper presented at IUTAM Symp. In: On turbulence and chaotic phenomena in fluids, Kyoto, Japan Schertzer D, Lovejoy S (1985a) The dimension and intermittency of atmospheric dynamics. In: Bradbury LJS, Durst F, Launder BE, Schmidt FW, Whitelaw JH (eds) Turbulent shear flow. SpringerVerlag, pp 7–33 Schertzer D, Lovejoy S (1985b) Generalised scale invariance in turbulent phenomena. Physico-Chemical Hydrodynamics Journal 6: 623–635 Schertzer D, Lovejoy S (1987) Physical modeling and analysis of rain and clouds by anisotropic scaling of multiplicative processes. J Geophys Res 92:9693–9714 Taleb NN (2010) The Black Swan: the impact of the highly improbable. Random House, p 437pp
Scattergram Richard J. Howarth Department of Earth Sciences, University College London (UCL), London, UK
Synonyms Crossplot; Dotplot; Pointplot; Scattergraph; Scatterplot; xy-plot
Scattergram
Definition Scattergram is a contraction of scatter diagram, a term which also occurs in the literature. It is a bivariate graph in which pairs of values of two variates (x, y) are plotted as points, using coordinates based on their numerical values on two orthogonal axes in which the x-axis is conventionally horizontal and the y-axis vertical. If one variate is known to be dependent in some way on the other, the y-axis is assigned to the dependent variate. The axes are scaled with regard to a continuous range of the possible, or observed, values of a variate, and each may use either arithmetic or logarithmic scaling. Figure 1 is an early example from the geological literature. The synonyms crossplot, dotplot, pointplot, scattergraph, and scatterplot are sometimes written as two separate words which may, or may not, be hyphenated, but such forms are much less frequent; xy-plot is also occasionally used. Since the 1990s, scatterplot has become established as the most frequently used term (e.g., in Reimann et al. (2008), which, despite its general title, deals largely with regional geochemical data). The terms scattergram, scatter plot, and scattergraph were all in use by the early 1920s, crossplot was in use by 1940, and dotplot was introduced in the 1960s. Apart from “xy-plot,” hyphenation is uncommon today. Data on the annual frequency of usage of these terms were retrieved, ignoring case, from the Google Books 2012 English corpus, which comprises all the words extracted by optical character recognition from the scanned text of some eight million “books” (which include bound journal volumes) using the ngramr package of Carmody (2015). Words occurring less than 40 times in the corpus are not recorded in the publicly available database (Michel et al. 2011), so in some
Scattergram, Fig. 1 Scattergram, redrawn after Krumbein and Pettijohn (1938, Fig. 89), showing the relationship between average roundness and geometric mean size of a sample of beach pebbles
1257
cases it may be impossible to find when the actual “first” use of a given word took place. The raw frequency counts for each term were normalized relative to the average annual frequency of occurrence of a set of “neutral” English “words with little or no specific meaning,” the, of, and, in, a, is, was, not, and other, and are shown in Fig. 2 as the number of occurrences per million words, as advocated by Younes and Reips (2019). The time trends were obtained by smoothing, using robust locally weighted regression with a 7-year moving window. On account of the large variation in the frequency of usage of different terms, the time trends are shown as a semilogarithmic chart, in which an exponential increase in usage with time becomes linear (the term scatterplot exhibits this type of behavior). Early examples of usage in the Earth sciences include scatter diagram by Krumbein and Pettijohn (1938), scattergram by Burma (1948), and scatterplot by Mason and Folk (1958). Plots in which the x-axis corresponds to time or to an ordered list of abbreviations for chemical element names, etc., are usually referred to by other titles (such as a time-series plot, spider plot, and so on). Scattergrams have long been used in igneous and sedimentary petrology to show compositional trends or to assign samples to a class using one of many existing classification schemes (e.g., the (Na2O þ K2O) vs. SiO2 “total alkali silica” or “TAS diagram,” for volcanic and plutonic rocks of Cox et al. (1979); see also Janoušek et al. (2016)), for plotting isotopic ratios (e.g., 87Sr/86Sr vs. 206Pb/204Pb), and so on. In some cases, the values of an associated categorical attribute, such as geographical location or class affinity, or the value of a third continuous variate, may be shown by means of discrete symbols of different shapes, proportional symbol sizes, or colored symbols of the same or different shapes. If there is too large a number of datapoints of the same type to distinguish them individually, then the value of a third
S
1258
Scattergram
Scattergram, Fig. 2 Frequency of usage of the term scattergram and its most common synonyms (1900–2000) in the Google Books 2012 English corpus
variate may be shown using isolines. Spatial point density within a plot may be shown in a similar way. Correlation between the x- and y-variates in a scattergram is visually assessed by the closeness of fit of the datapoints to a line, or curve. If a formal fit by a function is required, then it should be accomplished using an appropriate linear or nonlinear regression algorithm. In doing so, it should be noted whether measurement errors (or similar) are associated with x, y, or both, and an appropriate regression method applied. It should also be borne in mind that correlation is inherent in the so-called “closed” percentaged and other constant-sum data. Specialized techniques for dealing with such “compositional data” exist (e.g., Buccianti et al. 2006; Tsagris et al. 2020). Authors, referees, and editors should all bear in mind that because of the expense of color-printed figures in journals (often charged to the author), some illustrations may appear in the printed, as opposed to online, version of an article using monochrome tones of gray. Care should therefore be taken to ensure that the essential information to be conveyed by an illustration, which may appear evident enough when viewed in color on a computer screen or in a color print, can also be distinguished in a monochrome version. Some “temperature” color scales used to depict the value of a third variate often use pastel colors which can be difficult to distinguish individually. A check should also be made for use of colors which those who are color-blind cannot distinguish (e.g., using https://www. color-blindness.com/coblis-color-blindness-simulator/). Ensuring clear distinction may require submission of two differently drawn versions of the same illustration for publication. Scatterplots may also be used as visual goodness-of-fit tests when the observed frequency distribution for a set of data is fitted by a theoretical model, e.g., in quantile-quantile plots, percentile-percentile plots, cumulative probability
plots, and also to find possible outliers using chi-square plots (see Reimann et al. 2008 for discussion).
Summary A scattergram has long been one of the most commonly used graphical displays in the geological literature, providing a useful visual synopsis of the relationship between two (or three) variates which the eye can easily assimilate. However, some thought is required to ensure that its design is such that its message can be most effectively conveyed in all circumstances.
Cross-References ▶ Compositional Data ▶ Correlation Coefficient ▶ Data Visualization ▶ Exploratory Data Analysis ▶ Krumbein, William Christian ▶ Locally Weighted Scatterplot Smoother ▶ Ordinary Least Squares ▶ Reduced Major Axis Regression ▶ Regression ▶ Total Alkali-Silica Diagram
Bibliography Buccianti A, Mateu-Figueras G, Pawlowsky-Glahn V (eds) (2006) Compositional data analysis in the geosciences: from theory to practice.
Schuenemeyer, John H. Geological society special publication, vol 264. The Geological Society, London Burma BH (1948) Studies in quantitative paleontology: I. some aspects of the theory and practice of quantitative invertebrate paleontology. J Paleontol 22:725–761 Carmody S (2015) Package ‘ngramr.’ https://github.com/seancarmody/ ngramr. Accessed 9 June 2020 Cox KG, Bell JD, Pankhurst RJ (1979) The interpretation of igneous rocks. Allen & Unwin, London Janoušek V, Moyen J-F, Martin H, Erban V, Farrow CM (2016) Geochemical modelling of igneous processes – principles and recipes in R language. Bringing the power of R to a geochemical community. Springer, Heidelberg Krumbein WC, Pettijohn FJ (1938) Manual of sedimentary petrography. Appleton-Century, New York Mason CC, Folk RL (1958) Differentiation of beach, dune and aeolian flat environments by size analysis, Mustang Island, Texas. J Sediment Petrol 28:211–226 Michel J-B, Shen YK, Aiden AP, Veres A, Gray MK, The Google Books Team, Pickett JP, Hoiberg D, Clancy D, Norvig P, Orwant J, Pinker S, Nowak MA, Aiden EL (2011) Quantitative analysis of culture using millions of digitized books. Science 331:176–182 Reimann C, Filzmoser P, Garrett RG, Dutter R (2008) Statistical data analysis explained. Applied environmental statistics with R. Wiley, Chichester Tsagris M, Athineou G, Alenazi A (2020) Package ‘Compositional.’ https://cran.r-project.org/web/packages/Compositional/Compositional. pdf. Accessed 14 June 2020 Younes N, Reips U-D (2019) Guideline for improving the reliability of Google Ngram studies: evidence from religious terms. PLoS One 14(3):e0213554. https://doi.org/10.1371/journal.pone.0213554. Accessed 4 Apr 2019
Schuenemeyer, John H. Donald Gautier DonGautier L.L.C., Palo Alto, CA, USA
Fig. 1 Jack Schuenemeyer (Courtesy of J. Schuenemeyer)
1259
Biography John H. (Jack) Schuenemeyer is an American mathematical statistician known for his work in mass balance stochastic analysis, discovery process modeling, probabilistic resource assessment, uncertainty evaluation, and statistical education. He is emeritus professor of mathematical sciences at the University of Delaware, an elected member of the International Statistics Institute, Fellow of the American Statistical Association, Distinguished Lecturer for the International Association of Mathematical Geosciences, and recipient of the John Cedric Griffiths Award for excellence in teaching mathematical geology. Schuenemeyer was born in St. Louis, Missouri, on October 19, 1938, to descendants of German immigrants. He grew up in the St. Louis area with a love of baseball and mathematics. Two years at Washington University showed that an engineering career was not for him, so he joined the US Air Force. While stationed at Lowry Air Force Base, near Denver, Colorado, he met his future wife, Judy, with whom he would ultimately have three children. Upon return to civilian life, he was hired by RCA in New Jersey as a computer programmer. He then returned to Denver where he worked for the US Bureau of Mines and enrolled in the University of Colorado, earning B.S. and M.S. degrees in Applied Mathematics. In 1972, he was accepted to the University of Georgia, where Prof. Ralph Bargmann supervised his doctoral work in multivariate statistics; he received his Ph. D. in 1975. The following year, Dr. Schuenemeyer joined the University of Delaware faculty, initially as an assistant professor and then as full professor of mathematics with joint appointments in Geology and Geography. At Delaware, he taught graduate and undergraduate courses in statistics and supervised numerous graduate students, while simultaneously conducting research in applied statistics. An important part of his research was a fruitful collaboration with the US Geological Survey, for whom he developed quantitative techniques for resource evaluation and provided methodological leadership in spatial analysis, nonparametric regression analysis, discovery process modeling, and dependency evaluation. In 1998, he retired from the university to work full time at the USGS as a research statistician, before moving to Cortez, Colorado, to found Southwest Statistical Consulting L.L.C. Schuenemeyer is a devoted advocate of high-quality education for all members of society, including the financially disadvantaged. For many years, he has volunteered his expertise as a member and president of his local school board in Cortez where he lives with his wife, Judy, a retired attorney.
S
1260
Schwarzacher, Walther Jennifer McKinley Geography, School of Natural and Built Environment, Queen’s University, Belfast, UK
Fig.1 Professor Walther Schwarzacher. (Taken by Jennifer McKinley, 23 April, 2015)
Biography Professor Walther Schwarzacher was born in 1925 in Graz, Austria. He completed both his undergraduate studies and his doctoral dissertation in a total of 4 years at Innsbruck University, Austria, under the supervision of the distinguished geologist Bruno Sander. Walther was awarded a British Council Scholarship to Cambridge University, where he became a member of the University Natural Science Club and participated in an expedition to Spitsbergen. In recognition of Walther’s respect for his academic supervisor, Bruno Sander, he was proud to name a glacier in Spitsbergen after him. Walther made many lifelong friends during his professional life. His advisor at Cambridge, Percival Allen, suggested that he should consider an open position in the Geology Department of Queen’s University Belfast, UK. Walther joined Queen’s University Belfast as an assistant lecturer in 1949. He was promoted to lecturer, reader, and eventually professor when he was appointed to a personal chair in mathematical geosciences in 1977. In 1967/1968, Walther was distinguished visiting lecturer in the newly formed mathematical geology section at the Geological Survey of Kansas. During this year, he applied statistical time series analysis to local Pennsylvanian limestone-shale sequences and performed experiments simulating the Kansas cyclothems. Walther also spent sabbaticals at the Christian Albrechts University Kiel.
Schwarzacher, Walther
His former head of the school at Queen’s University Belfast, Professor Julian Orford, recounts “Walther had a strong belief that geology had a story that could be explained by maths – in particular cycles and rhythms in sediment deposition and planetary motions that such cycles might be associated with.” After 65 years including 25 years as emeritus professor, Walther retired but continued to inspire many through his pioneering research on Markov processes. Walther was one of the first researchers to find evidence for Milankovitch cycles, periodic variations in the Earth’s orbit, and in the thickness of sedimentary facies. He wrote two influential books in the field of sedimentology and many international publications and chapters. The first book Sedimentation Models and Quantitative Stratigraphy (dedicated to A.B.. Vistelius and W.C. Krumbein) was published by Elsevier in 1975. In it he proposed stochastic models for sedimentary processes introducing applications of Markov chains and the use of semiMarkov processes. The second book Cyclostratigraphy and the Milankovitch Theory which was published in 1993 contains his original evidence for Milankovitch cycles in the thicknesses of coupled limestone and marl beds. Professor Walther Schwarzacher was a founding member of the International Association for Mathematical Geosciences (IAMG). From 1980 to 1986, Walther was a lecturer in the “Flying Circus” offering short courses in nine countries on four continents under the auspices of the Quantitative Stratigraphy Project of the UNESCO-sponsored International Geological Correlation Program. This included field trips in Brazil, India, and China, accompanied by his wife June. He was extremely proud to have published in the first ever edition of Mathematical Geology (now Mathematical Geosciences) the first and flagship journal of the Association. The IAMG recognized his scientific contributions through the award in 1977 of the William Christian Krumbein Medal, the second Krumbein Medal to be awarded. This is the highest award given by the Association in recognition of outstanding career achievement and distinction in the application of mathematics or informatics in the Earth sciences, service to the IAMG, and support to professions involved in the Earth sciences. This fully reflects Walther’s dedication to advancing mathematical geoscience in quantitative sedimentology and stratigraphy. I knew Walther for many years, as one of my lecturers in the former Geology Department at Queen’s University Belfast and later as a colleague in Queen’s and the IAMG. I was pleased that his role in establishing mathematical geoscience and as a founding member of the IAMG was acknowledged through recognition as an honorary IAMG member. Walther’s research continues to inspire researchers to use a quantitative cyclo-stratigraphic approach. Acknowledgments I would like to thank Walther’s wife, June, family, and friends for the information and inspirational stories for this biography.
Sedimentation and Diagenesis
Bibliography Schwarzacher W (1942) Zur Morphologie des Wallersees. Arch Hydrobiol 42:372–376 Schwarzacher W (1966) Sedimentation in subsiding basins. Nature 210(5043):1349 Schwarzacher W (1969) The use of Markov chains in the study of sedimentary cycles. J Int Assoc Math Geol 1(1):17–39 Schwarzacher W (1975) Sedimentation models and quantitative stratigraphy. Developments in Sedimentology, vol 19. 382pp, Elsevier Scientific Publishing, Amsterdam, Oxford, New York. ISBN 0 444 41302 2 Schwarzacher W (1993) Cyclostratigraphy and the Milankovitch theory, vol 52. ELSEVIER Amsterdam, London, New York, Tokyo Schwarzacher W (2007) Sedimentary cycles and stratigraphy. Stratigraphy 4(1):77–80
Sedimentation and Diagenesis Nana Kamiya1 and Weiren Lin2 1 Laboratory of Earth System Science, School of Science and Engineering, Doshisha University, Kyotanabe, Japan 2 Earth and Resource System Laboratory, Graduate School of Engineering, Kyoto University, Kyoto, Japan
Synonyms Sedimentation: Deposition, Accumulation. Diagenesis: Lithification.
1261
Diagenesis is a gradual process that constitutes the first stage of burial after sedimentation. Variations in temperature and pressure with burial change the physical and chemical properties of the sediment. Dehydration, by increasing the overburden pressure, and the dissolution of minerals caused by an increase in the contact surface, leads to decreased porosity and increased density, which is the process of lithification.
Introduction On the surface of the solid Earth, particles of the crust are mobile. Crustal surface features, such as mountains, are decomposed physically and chemically by weathering. The eroded particles are transported to the ocean and/or lake by fluids, wind, glaciers, and tides. Transported particles deposit at the bottom of the water and/or on the surface of the ground. This deposition process is called sedimentation. The sediments change their properties during burial. As the sediments are buried deeper, the sedimentary rocks notably change. On the surface of Earth’s crust, weathering, erosion, transportation, deposition, burying, and metamorphism occur in that order. In this process, deposition and lithification during burial are “sedimentation” and “diagenesis,” respectively. These two processes are complicated and interact with each other. Because sedimentation and diagenesis are related to the generation of natural disasters and natural resources, these phenomena are also relevant to human society. Therefore, understanding the processes of sedimentation and diagenesis is important.
Definition Sedimentation and diagenesis are the act and process of forming sedimentary rocks distributed on the surface of Earth’s crust. During sedimentation, fluid sediment composed of particles flows, deposits, and accumulates with varying degrees of sorting. The accumulating sediments forms layers and becomes buried. After sedimentation, the process gradually progresses to diagenesis. Conditions such as temperature, pressure, and the chemical environment change with burial, in which sediment changes into solid rock. Sedimentation is a phenomenon that occurs on the surface of Earth’s crust, in which terrigenous clastic sediments, which are particle grains from rocks formed by weathering and/or erosion sink, accumulate and pile onto the surface and/or bottoms of water bodies by fluid motion. The origin of sedimentary particles can be divided into two types: clastic and biogenic. Gravel, sand, and mud are particles derived from sedimentary, igneous, and metamorphic rocks. Biogenic sediment includes skeletal remains of mollusks, echinoderms, foraminifera, diatoms, coral, etc.
Sedimentation Processes and Sedimentary Structures When a particle becomes gravitationally unstable due to the flow of air and/or water, the particle begins to move. When the flow speed decreases or the particle reaches a gravitationally stable position, the particles stop and deposit. Terrigenous clastic sediment generated in mountainous areas are transported along rivers and are deposited in lakes and the ocean. Lakes and the ocean are stable (low potential) environments for transported sediment particles. Large-grained gravels are deposited in estuaries and coasts with strong currents. On the other hand, fine silt and clay particles are carried offshore and deposited where the flow is calm. The settling velocity of sedimentary particles depends on the density of the fluid and the particle size. In the case of quartz with a perfect spherical shape, when the particle Reynolds number is less than 1 (Re < 1), the settling velocity depends on Stokes law:
S
1262
Sedimentation and Diagenesis
v¼
ðs rÞg 2 d 18
where d, s, r, , and g are the radius of the particle, density of the particle, density of the fluid, viscosity coefficient, and gravitational acceleration, respectively. On the other hand, when the Reynolds number for the particle is more than 1 (Re > 1), the settling velocity depends on the impact law (Newton’s law): v2 ¼
4ðs rÞg d 3CD r
where CD is drag coefficient. Figure 1 shows the relationship between the radius of the sphere and the settling velocity of quartz with a perfect spherical shape. The settling velocity of quartz particles whose radius is less than 100 mm mostly follow Stokes law, which is less than 1.0 cm/s. Large particles of quartz with radii greater than 2000 mm have settling velocities closer to the impact law, although there can be some differences.
Sphere diameter in milimeters 0.01
0.1
1
10
ST OK
ES
Gibbs et al.
Sedimentary conditions such as the sedimentation rate, particle size, fluid density, etc. are recorded in sedimentary rocks as sedimentary structures (Fig. 2). Figure 2a shows the gradual distribution of particles. Large particles are distributed at the bottom and small particles are distributed at the top, which is called graded bedding. Graded bedding shows that the sorting of the particles depends on density differences. Figure 2b depicts a flame structure. When heavier sediments are layered on top of unconsolidated sediments immediately after deposition, the upper sediments sink due to their weight, creating a sedimentary structure in which the lower sediments flow upwards. In Fig. 2b, the white layer is composed of pumice, and the black layer includes scoria. The densities of pumice and scoria are generally ~1.0 g/cm3 and ~2.0 g/cm3, respectively. After the sedimentation of pumice, scoria was deposited on the pumice layer. At that time, a part of the pumice layer rose up to the surface because there was a density difference at the boundary of the two layers. When particles are deposited via fluid currents, winds and waves, the surface of sediment is called a bedform and is related to the condition of the sediments. When the current speed increases at the same depth, a bedform composed of medium sand changes from flat, to ripple, dune, transition, plane bed, standing wave, and antidune. Evidence of the current is recorded in the sediments and/or rocks as a sediment structure called a lamination. Figure 2c shows a stripe pattern indicative of lamination. This pattern would typically have formed in a slow fluid flow and indicates that the sedimentary surface had previously been rippled.
10 Settling velocity in cm/sec
Human Society and Sedimentation RUBEY ct
a mp
I
1
0.1
JANKE 0.01 10
SILT
SAND 100
GRAVEL 1000
10,000
Sphere diameter in micros
Sedimentation and Diagenesis, Fig. 1 Settling velocity versus sphere diameter of quartz particles. (After Gibbs et al. 1971)
Turbidity currents are one type of sedimentary gravity flows. Particles of sand and mud from land enter seawater, form high-density current flows, and are deposited. These current flows erode the seafloor surface. Terrestrial sediment includes organic matter such as plants and organisms, which become natural resources; therefore, turbidites play an important role in transporting organic sediments to the seafloor. On the other hand, turbidites can be generated by earthquakes (seismoturbidites) and tsunami deposits and are thus important in the study of disasters. Because seismoturbidites are evidence of past earthquake occurrences, recurrence intervals of earthquakes can be estimated by analyzing the distribution of seismoturbidites (cf. Adams 1990). Tsunami deposits are important records of large earthquakes in the past. Recently, new technology has been used in the field of sedimentology to understand the complex condition and structure of sedimentary flows. Mitra et al. (2021) reconstructed the flow conditions of tsunami deposits using a deep neural network inverse model and obtained a simulation result that was consistent with the observed values of the flow conditions, such as the
Sedimentation and Diagenesis
1263
Sedimentation and Diagenesis, Fig. 2 Sedimentary structures in rocks. (a) Graded bedding. Large particles are distributed at the bottom and small particles are distributed at the top. (b) Flame structure. The white layer is composed of pumice, and the black layer includes scoria.
A part of the pumice layer rose up to the surface because there was a density difference at the boundary of the two layers. (c) Lamination. A stripe pattern would typically have formed in a slow fluid flow and indicates that the sedimentary surface had previously been rippled
maximum inundation distance, flow velocity, and maximum flow depth. Their results indicate that the deep neural network inverse model has the potential to estimate the physical characteristics of modern tsunamis.
Representative authigenic minerals are clay minerals such as kaolinite, smectite, montmorillonite, and illite. • Organic maturation: Organic matter included in the sediments changes properties with increasing temperature and pressure due to burial.
Diagenesis
Consolidation, cementation, recrystallization, replacement, differential solution, and authigenesis make sediments harder and transform sediments into rock (lithification). On the other hand, organic maturation is the reaction of organic matter included in the sediment, which is involved in the generation process of natural resources such as natural gas and oil. The generation of natural resources from organic matter is controlled by the environment conditions, especially temperature, and pressure.
As sediment is buried deep underground, the temperature and pressure rise; therefore, its physical and chemical properties change continuously over time. In the process of diagenesis, the following phenomena occur as the process generates rock from sediments. • Consolidation: The porosity of the sediments decreases, and the density increases. The consolidation typically progresses with dehydration. • Cementation: The constituent particles become bound to each other, and the pore space is filled by minerals. Representative filled minerals are carbonate and silicate minerals. • Replacement: Primary minerals are replaced by secondary minerals. Calcareous sediment replaces dolomite during diagenesis (dolomitization). • Differential solution: Differential solutions are caused by differences in constituent minerals. The phenomenon in which one of them dissolves discriminatively due to the pressure by the contact part of minerals and/or gravel is called the pressure solution. • Recrystallization: The existing minerals grow crystals. • Authigenesis: This process is a self-sustaining action of new minerals. The generation of authigenic minerals relates to the temperature and chemical state in diagenesis.
Interaction of Sedimentation and Diagenesis Generally, diagenesis is a process that occurs after sedimentation. However, there is a reverse situation in which diagenesis generates sediments. For example, consolidation, which is one of the diagenetic processes, decreases the pore volume due to a decrease in porosity with dehydration. The progress of consolidation is governed by the difference between the overburden pressure and the pore pressure. Immediately after deposition, the permeability is high because it is in an unconsolidated state. Therefore, dehydration easily occurs and consolidation progresses. However, in the case of low permeability, for instance, due to a high sedimentation rate, pore water cannot escape to the surface, and the pore pressure increases, in which consolidation is decelerated. The generated high pore pressure reduces the stability of the ground. As a result, the seafloor collapses, which forms mass transport
S
1264
depositions (MTDs). In fact, MTDs distributed in the high pore pressure zone are confirmed. Kamiya et al. (2018) calculated the level of consolidation of formations including MTDs in the Boso Peninsula, Central Japan. As a result, the consolidation level (consolidation yield stress) of the layers including MTDs is less than that of other formations, which indicates that the high pore pressure prevented consolidation and generated mass transport. Thus, sedimentation and diagenesis can interact. The surface layer of the solid Earth is undergoing complex and diverse dynamic actions.
Self-Organizing Maps Gibbs JR, Matthews DM, Link AD (1971) The relationship between sphere size and settling velocity. J Sediment Petrol 41(1):7–18. https://doi.org/10.1306/74D721D0-2B21-11D7-8648000102C1865D Kamiya N, Utsunomiya M, Yamamoto Y, Fukuoka J, Zhang F, Lin W (2018) Formation of excess fluid pressure, sediment fluidization and ass-transport deposit in the Plio-Pleistocene Boso forearc basin, Central Japan. Geological society. London Special publications 477: 255–264. https://doi.org/10.1144/SP477.20 Mitra R, Narusee H, Fujino S (2021) Reconstruction of flow conditions from 2004 Indian Ocean tsunami deposits at the Phra thong island using a deep neural network inverse model. Natsural Hazards and Earth System Sciences 21:1677–1683. https://doi.org/10.5194/ nhess-21-1667-2021
Summary During sedimentation, different sedimentary structures are created depending on what, where, and how sediments are deposited. After deposition, the alteration process of minerals and organic matter differs depending on the environment in which they are buried. Through such a process of deposition and diagenesis, the complex surface material of the Earth is produced. In addition to research that explores past environments through observational analyses, numerical analyses based on probability theory are also being conducted. From the perspective of disaster prevention and the production of natural resources such as minerals and petroleum, research on Earth’s surface will continue to be important.
Cross-References ▶ Earth Surface Processes ▶ Earth System Science ▶ Exploration Geochemistry ▶ Flow in Porous Media ▶ Geohydrology ▶ Geomechanics ▶ Geotechnics ▶ Geothermal Energy ▶ Grain Size Analysis ▶ Mathematical Minerals ▶ Mine Planning ▶ Mineral Prospectivity Analysis ▶ Porosity ▶ Porous Medium ▶ Predictive Geologic Mapping and Mineral Exploration ▶ Turbulence
Bibliography Adams J (1990) Paleoseis,icity of the Cascadia Subduction Zone: Evidence from turbidites of the Oregon-Washington Marg Vol. 9, Issue. 4, pp. 569–583. https://doi.org/10.1029/TC009i004p00569
Self-Organizing Maps Sid-Ali Ouadfeul1, Leila Aliouane2 and Mohamed Zinelabidine Doghmane3 1 Algerian Petroleum Institute, Sonatrach, Boumerdes, Algeria 2 LABOPHT, Faculty of Hydrocarbons and Chemistry, University of Boumerdes, Boumerdes, Algeria 3 Department of Geophysics, FSTGAT, University of Science and Technology Houari Boumediene, Algiers, Algeria
Definition of the Biological and Formal Neurons In the brain, neurons are linked together through axons and dendrites; Fig. 1 is a synthetic presentation of the human neuron. As a first approach, we can consider that these links are conductors and can thus convey messages from one neuron to another. The dendrites are the inputs of a neuron and its axon are the outputs (Mejia 1992). A neuron emits an electrical signal based on coming signals from other neurons. We observe, in fact, at the level of a neuron, a summation of signals received over time and the neuron in turn emits an electrical signal when the sum exceeds certain threshold. A formal neuron is a mathematical and computational presentation of a biological neuron (McCulloch and Pitts 1943); it is a simple unit able to carrying out some elementary calculations. A large number of these units are then connected together, and an attempt is made to determine the computing power of the network thus obtained. The formal neuron is therefore a mathematical modeling that mimics the functioning of the biological neuron, in particular the summation of inputs, at a biological level, synapses do not all have the same “value” (the connections between the neurons being more or less strong). McCulloch and Pitts (1943) and Rosenblatt (1958) created an algorithm that weights the sum of its inputs by synaptic weights (weighting coefficients). In addition, the 1 s and 1 s at the input are used
Self-Organizing Maps
1265
Self-Organizing Maps, Fig. 1 Biological neuron (McCulloch and Pitts 1943)
Self-Organizing Maps, Fig. 2 Formal neuron (McCulloch and Pitts 1943)
Neuron j Input X ji Xj1 Neuron j
Wj1 Xj2
Wj2 Neuron j
Xjn-1
Output: Sj
Wjn-1 Wjn
Xjn Weights
Weighted Sum Bias
The activation function bj
there to represent an excitatory or inhibitory synapse, Fig. 2 is a graphical presentation of the formal neuron. The selforganizing map represents a specific type of artificial neural network where its main utility is to map to study and map the distribution of data in a large-dimension, thus, its application can provide many computational advantages in Geosciences even its learning process is unsupervised.
Overview of Artificial Neuron Characteristics A connectionist network is composed of neurons, which interact to give to the network its global behavior (Tian et al. 2021). In connectionist models, these neurons are elementary processors whose definition is made in analogy with nerve cells (Le Cun 1985). These basic units receive signals from outside or from other neurons in the network, they calculate a function, usually simple, of these signals and in turn send signals to one or more other neurons or to the outside. A neuron is characterized by three parameters: its state, its connections with other neurons, and its transition function (Anderson and Rosenfeld 1988). The State An artificial neuron is an element that has an internal state, in fact, it receives signals that eventually allow it to change its state (MacLeod 2021). We will denote Si the set of possible states of a neuron. It could be for example {0, 1}, where 0 will be interpreted as the inactive state and 1 the active state (Tian et al. 2021).
Self-Organizing Maps, Fig. 3 Presentation of a neuron (Mejia 1992)
The state Si of the neuron is a function of the inputs S1, . . ., Sn. The neuron produces an output, which will be transmitted to the connected neurons. To calculate the state of a neuron we must therefore consider the connections between this neuron and other neurons (Fig. 3). Connections Between Neurons or Architecture A connection is an established link between two neurons; connections are also called synapses, in analogy with the name of the connectors of real neurons. A connection between two neurons has an associated numerical value called the connection weight. The connection weight Wij between two neurons j and i can take discrete values in Z or continuous in R. The information passing through the connection will be affected by the value of the corresponding weight. A connection with a weight Wij ¼ 0 is equivalent to no connection.
S
1266
Self-Organizing Maps
The Transition Function We are interested here in neurons that calculate their state from the information they receive. We will use the following notation below: S: The set of possible states of neurons Xi: The state of a neuron i, where Xi S. Ai: The activity of neuron i Wij: The weight of the connection between neurons j and i The activity of a neuron is calculated according to the states of the neurons in its neighborhood and the weights of their connections, according to the following formula: Ai ¼
W ij Xj
Networks with Local Connections A neuron is not necessarily connected to all the neurons of the previous layer (Fig. 5) Recurrent Connection Networks We always have a layered structure, but with returns or possible connections between neurons of the same layer (Fig. 6) Networks with Complete Connections All neurons are interconnected (Fig. 7) (Karacan 2021).
ð1Þ
j
The state of a neuron i is a function of the states of neurons j, of its neighborhood, and of the weights of the connections Wij, this function is called the transition function. These elements are assembled according to certain architecture of a network. This architecture defines a composition of elementary functions that can be used in several ways during the operation of the network; this is called dynamic of a network (Mejia 1992).
Overview of Topologies of Neural Networks There are several topologies of neural networks: Multilayer Networks They are organized in layers. Each neuron generally takes as input all the neurons of the lower layer (Fig. 4). They do not have cycles or intra-class connections. We then define an input layer, an output layer, and n hidden layers (Mejia 1992).
Self-Organizing Maps, Fig. 4 Multilayer perceptron with one hidden layer
Self-Organizing Maps, Fig. 5 Neural network with connections
Self-Organizing Maps, Fig. 6 Neural network with recurrent connections
Self-Organizing Maps
1267
learns by detecting the regularities in the structure of the input patterns and produces the most satisfactory output. Reinforced Learning
It is used when the feedback on the quality of performance is provided, the desired output is not completely specified by a teacher. So learning is less directed than supervised learning. Unlike unsupervised learning where no feedback is given, the reinforced Learning Network can use the reinforcement signal to find the most needed desirable weights.
Self-Organizing Maps, Fig. 7 Neural network with complete connections
Definition of Learning and Overview of Modes One of the most interesting characteristics of neural networks is their ability to learn. Learning will allow the network to modify its internal structure (synaptic weights) to adapt to its environment (Rumelhart and Mc Clelland 1986) In the context of neural networks, a network is defined by its connection graph and the activation function of each neuron. Each choice of synaptic coefficients (weights of connection) corresponds to a system, the learning operation is the seeking of the best weights that solve a given problem. To be able to evaluate a particular system, we make a series of experiments to observe the behavior of the network. An experiment is the presentation of the input to the system; the response is collected at the output. The network is evaluated by the value of an error function; a learning problem is then to find a network that minimizes this function. Learning Modes We distinguish three learning modes: unsupervised and reinforced learning.
supervised,
Supervised Learning
In this mode, a teacher who is perfectly familiar with the desired or correct output guides the network by teaching it the right result at each step. The supervised learning consists to compare the obtained result by the artificial network with the desired output, then to adjust the weights of the connections to minimize the difference between the calculated and the desired output (MacLeod 2021). Unsupervised Learning
In the unsupervised learning, the network modifies its parameters taking into account only local information. This learning doesn’t need any predetermined desired outputs. Networks using this technique are called self-dynamic networks and are considered to be regularity detectors because the network
Concept and Construction of SelfOrganizing Maps Self-adaptive or self-organizing map is a class of artificial neural network based on unsupervised learning mode (Tian et al. 2021). It is often referred to by the English term SelfOrganizing Map (SOM), or even Teuvo Kohonen’s map named after the statistician who developed the concept in 1992. They used a map in the space to study the distribution of data in a large-dimension. In practice, this mapping can be used to perform discretization, vector quantization, or classification tasks (Silversides 2021). These intelligent data representation structures are inspired, like many other creations of artificial intelligence, from human biology (MacLeod 2021). They involve reproducing the principle of the vertebrate brain: stimuli of the same nature excite a very specific region of the brain. Neurons are organized in the cortex to interpret every kind of imaginable stimuli. In the same way, the selfadaptive map is deployed in order to represent a set of data (Wen et al. 2021), and each neuron represents a very particular group of data according to the common characteristic that bring them together. It allows a multi-dimensional visualization of crossed data. The map performs vector quantization of the data space, it means discrediting the space and dividing it into zones. A significant characteristic called the referent vector is assigned to each zone. The space V is divided into several zones and Wr presents a referent vector associated with a small area of the space Vr and M, r(2,3) presents its associated neuron in the grid A (Fig. 8). Each area can be easily addressed by the indexes of the neurons in the grid (Kohonen 1992 1998). From an architectural point of view, a Kohonen’s selforganizing maps is made up of a grid (most often onedimensional or two-dimensional). In each node of the grid there is a neuron. Each neuron is linked to a referent vector, responsible for an area in the data space (called the input space) (see Fig. 8). In a self-organizing map, the referent vectors provide a discrete representation of the input space. They are positioned in such a way that they retain the topological shape of the entry space. By keeping the neighborhood
S
1268
Self-Organizing Maps
relations in the grid, they allow an easy indexing (via the coordinates in the grid). This is useful in various fields, such as classification of textures, data interpolation, visualization of multidimensional data (Silversides 2021). Let A the rectangular neural grid of a self-organizing map. A neuron map assigns to each input vector V a neuron designated by its position vector, such that the reference vector Wr is closest to V. Mathematically, we express this association by a function (Kohonen 1992):
A rectangular grid
(1,1) r (2,3)
V rM
wr
V Space to degitize Self-Organizing Maps, Fig. 8 Self-Organizing maps architecture
Self-Organizing Maps, Fig. 9 Self-organization algorithm for the Kohonen model
’m : V ! A
ð2Þ
r ¼ fm ðvÞ ¼ arg min r A kv wr k
ð3Þ
This function allows us to define the applications of the card. For the quantifier vector, we approximate each point in the input space by the closest referent vector by: W r ¼ ’m 1 ð’w ðvÞÞ
ð4Þ
For the classifier we use the function r ¼ fw(v). Each neuron in the grid is assigned a label corresponding to a class. All the points of the entry space which are projected on the same neuron belong to the same class. The same class can be associated with several neurons. The learning algorithm is based on the concept of the winner neuron, after a random initialization of the values of each neuron, the data are submitted one by one to the selfadaptive map (Agterberg and Cheng 2022). Depending on the values of the neurons, there is one that will respond best to the stimulus (the one whose value will be closest to the data presented). Then this neuron will be rewarded with a change in value so that it responds even better to another stimulus of the same nature as the previous one. In the same way, the neurons neighboring the winner are also somewhat rewarded with a gain multiplying factor of less than one. Thus, it is the entire region of the map around the winning neuron that specializes. At the end of the algorithm, when the neurons no longer move, or very little, at each iteration, the selforganizing map covers the entire topology of the data (see Fig. 9)
Selection of Winner neuron
S
(A) Grid v
ws D ws Selection of referent neurons
D wr wr V (Space to desctize)
Self-Organizing Maps
1269
The mapping of the input space is carried out by adapting the referent vectors Wr. The adaptation is made by a learning algorithm whose power lies in the competition between neurons and in the importance given to the notion of neighborhood. A random sequence of input vectors is presented during training. With each vector, a new adaptation cycle is started. For each vector V in the sequence, we determine the winning neuron, i.e., the neuron whose referent vector approaches V as closely as possible:
$$S = \varphi_m(v) = \arg\min_{r \in A} \| v - W_r^t \| \qquad (5)$$
The winning neuron S and its neighbors (defined by a neighborhood membership function) move their referent vectors toward the input vector:
$$W_r^{t+1} = W_r^t + \Delta W_r^t \qquad (6)$$
$$\Delta W_r^t = \varepsilon \cdot h \cdot \left(v - W_r^t\right) \qquad (7)$$
where ε = ε(t) represents the learning coefficient and h = h(r, s, t) the function which defines membership in the neighborhood (Sreevalsan-Nair 2021). The learning coefficient defines the amplitude of the overall displacement of the map. The notion of neighborhood is inspired by the cortex, where neurons are connected to each other. This is the topology of the map. The shape of the map defines the neighborhoods of the neurons and therefore the links between neurons. The neighborhood function describes how the neurons in the vicinity of the winner are drawn into the corrective movement (Sreevalsan-Nair 2021). In general, we use:
$$h(r, s, t) = \exp\left(-\frac{\| r - s \|^2}{2\sigma^2(t)}\right) \qquad (8)$$
where σ is called the neighborhood coefficient. Its role is to determine a neighborhood radius around the winning neuron. The neighborhood function h forces neurons in the neighborhood of S to move their referent vectors closer to the input vector V (Sreevalsan-Nair 2021). The farther a neuron is from the winner in the grid, the smaller its displacement. The correction of the referent vectors is weighted by the distances in the grid. This reveals, in the input space, the order relations in the grid. During learning, the map described by the referent vectors of the network evolves from a random state to a state of stability in which it describes the topology of the input space while respecting the order relations in the grid (Kohonen 1998).
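The competitive update described by Eqs. (5), (6), (7), and (8) can be condensed into a short array-programming loop. The following Python sketch is only a minimal illustration, not the implementation used in the application below; the function name, the linear decay schedules chosen for ε(t) and σ(t), and the random initialization are assumptions introduced for the example.

```python
import numpy as np

def som_train(V, grid_shape=(10, 10), n_iter=2000, eps0=0.5, sigma0=3.0, seed=0):
    """Minimal SOM training loop following Eqs. (5)-(8).

    V          : (n_samples, n_features) data matrix (the input space).
    grid_shape : shape of the rectangular neuron grid A.
    Returns W  : (rows, cols, n_features) array of referent vectors.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    W = rng.uniform(V.min(0), V.max(0), size=(rows, cols, V.shape[1]))  # random init
    # Grid coordinates of every neuron, used for the neighborhood term ||r - s||.
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    for t in range(n_iter):
        v = V[rng.integers(len(V))]                    # random input vector
        eps = eps0 * (1.0 - t / n_iter)                # decaying learning coefficient (assumed schedule)
        sigma = sigma0 * (1.0 - t / n_iter) + 1e-3     # decaying neighborhood radius (assumed schedule)
        # Eq. (5): winning neuron S = arg min ||v - W_r||
        dist = np.linalg.norm(W - v, axis=-1)
        s = np.unravel_index(np.argmin(dist), grid_shape)
        # Eq. (8): Gaussian neighborhood h(r, s, t)
        d_grid = np.linalg.norm(coords - np.array(s), axis=-1)
        h = np.exp(-d_grid**2 / (2.0 * sigma**2))
        # Eqs. (6)-(7): move referent vectors toward v
        W += eps * h[..., None] * (v - W)
    return W
```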
Application to Petroleum Geosciences
In petroleum exploration and production, the prediction of lithofacies is an important issue: lithofacies is the first property to identify in petroleum reservoir characterization, in order to determine the layer limits of the formations crossed by a well and the geological rock types, using the petrophysical recordings made along the well (well-log data) together with core analysis. The recovery of core is an expensive process and is not always continuous (Chen and Zhang 2021). Here, we show the efficiency of self-organizing maps in predicting lithofacies from raw well-log data of two boreholes, named Well01 and Well02, located in the Algerian Sahara. The raw well-log data are: natural Gamma ray, natural Gamma ray spectroscopy (Thorium, Potassium, and Uranium concentrations), bulk density, neutron porosity, photoelectric absorption coefficient, and slowness of the P wave. Figure 10 presents these raw log data versus depth; only the depth interval [3410 m, 3505 m] is investigated, with a sampling interval of half a foot, together with the lithofacies obtained using the natural Gamma ray log (see track 02). In this case we take the following criteria: 0 < Gamma ray
Ramp Function
The ramp function is mathematically defined by Eq. (4) and graphically represented in Fig. 1c.
$$r(t) = t \cdot u(t) = \int_{-\infty}^{t} u(\tau)\, d\tau \qquad (4)$$

Rectangular Function
The rectangular function is mathematically defined by Eq. (5) and graphically represented in Fig. 1d. It is also called the window function because it serves as a basic windowing function.
$$\mathrm{rect}\left(\frac{t}{T}\right) = \begin{cases} 1 & \text{for } |t| < \dfrac{T}{2} \\ 0 & \text{for } |t| > \dfrac{T}{2} \end{cases} \qquad (5)$$

Dirac Pulse Function
The Dirac function can be seen as a rectangular function with a width T that tends towards 0 and whose area is equal to 1. It is mathematically defined by Eq. (6) and graphically schematized in Fig. 1e.
$$\delta(t) = \begin{cases} +\infty & \text{for } t = 0 \\ 0 & \text{for } t \neq 0 \end{cases} \qquad (6)$$
Figure 1e shows the schema of the function but does not represent it exactly, because it contains an infinite term; the arrow in this figure indicates the pulse, whose surface should be equal to 1. It should also be noticed that the Dirac function is the derivative of the step function, δ(t) = du(t)/dt. The importance of this function is that it is used to sample other, more complex signals; the passage from one state to another in a given signal is seen as a pulse (Dirac function) if the rise time is negligible in comparison to the overall duration of the original signal (Wang et al. 2012).

Dirac Comb Function
The Dirac comb function is a periodic succession of Dirac impulses; its mathematical presentation is given by Eq. (7), and its graphical appearance is highlighted in Fig. 1f.
$$\delta_T(t) = \sum_{k=-\infty}^{+\infty} \delta(t - kT) \qquad (7)$$
T is the period of the comb. This sequence is sometimes referred to as a pulse train or a sampling function because this type of signal is mainly used in sampling.

Cardinal Sinus Function
This function plays an important role in signal analysis due to its mathematical properties; it is defined mathematically by Eq. (8) and highlighted graphically in Fig. 1g.
$$\mathrm{sinc}(t) = \frac{\sin(\pi t)}{\pi t} \qquad (8)$$
One of the most important properties of the cardinal sinus function is expressed by
$$\int_{-\infty}^{+\infty} \mathrm{sinc}(t)\, dt = 1, \qquad \int_{-\infty}^{+\infty} \mathrm{sinc}^2(t)\, dt = 1$$

Frequency Representation
The signals are generally presented as a function of time because our perception of the physical world is based on taking time as the most important variable for all phenomena. However, in many cases this presentation cannot provide a deep analysis, especially for semi-periodic and quasi-periodic phenomena that are complex in time (Benedetto 2010). Therefore, it becomes essential to transfer these signals into domains other than time. The knowledge of the spectral properties of a signal from its energy is essential (Huang et al. 2015). Thus, the representation of the signal in the frequency domain can be very efficient for characterizing energy phenomena. This transformation
Signal Analysis, Fig. 1 Most important functions for signal analysis: (a) sign function, (b) step function, (c) ramp function, (d) rectangular function, (e) Dirac pulse function, (f) Dirac comb function, (g) cardinal sinus function, (h) window function for band-pass filter
of a signal from the time domain to the frequency domain has many advantages; for example, it can help in filtering the noise from the original signal, since in seismic data the noise is generally characterized by high-frequency dynamics (Liu et al. 2016). The Fourier series was first proposed in order to decompose perfectly periodic signals into sinusoids so that this passage can be made easily (Chakraborty and Okaya 1995). Later on, the Fourier transform was introduced as the generalization of the Fourier series to all nonperiodic functions.
Fourier Series
The Fourier series principle is based on decomposing the periodic signal into a sum of sinusoids. Thus, it permits an easy transformation from the time domain to the frequency domain, with the necessary condition that the decomposed signal should have bounded variations. Let us consider the periodic bounded signal s(t); its Fourier series can be written as:
Signal Analysis, Fig. 2 Examples of seismic data analysis: (a) original seismic trace responses of earth, (b) comparison between original and noise traces, (c) one trace extracted from a, (d) one trace extracted from noise data, (e) comparison between original and noised wavelet, (f) Fourier transform of the wavelet in e, (g) fast Fourier transform of the wavelet in e, (h) example of filtered and stacked wavelet, (i) complete seismic trace
$$s(t) = S_0 + \sum_{n=1}^{\infty} \left[ A_n \cos(n\omega_0 t) + B_n \sin(n\omega_0 t) \right] \qquad (9)$$
with
$$\omega_0 = \frac{2\pi}{T_0}, \quad S_0 = \frac{1}{T_0}\int_{(T_0)} s(t)\, dt, \quad A_n = \frac{2}{T_0}\int_{(T_0)} s(t)\cos(n\omega_0 t)\, dt, \quad B_n = \frac{2}{T_0}\int_{(T_0)} s(t)\sin(n\omega_0 t)\, dt$$
where ω0 is the fundamental pulsation, nω0 is the harmonic of rank n, and S0 is the average value of the signal s(t). Other, complex presentations of the Fourier series can be found in the literature too. The main limitation of the Fourier series is that it applies only to periodic functions whose variables are bounded. For that reason, the Fourier transform was developed from the Fourier series by supposing that all nonperiodic functions are periodic functions with a period that tends to infinity, as explained in section "The Fourier Transform".
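As a small numerical illustration of Eq. (9), the coefficients S0, An, and Bn of a sampled periodic signal can be approximated by discretizing the integrals above. The square-wave test signal, the period T0, and the number of harmonics in this Python sketch are arbitrary assumptions.

```python
import numpy as np

# Numerical estimate of the Fourier series coefficients in Eq. (9) for one
# period of a sampled square wave (the test signal itself is an assumption).
T0 = 1.0                                       # fundamental period (s)
t = np.linspace(0.0, T0, 1000, endpoint=False)
dt = t[1] - t[0]
s = np.sign(np.sin(2 * np.pi * t / T0))        # square wave with period T0
w0 = 2 * np.pi / T0                            # fundamental pulsation

S0 = np.mean(s)                                # average value of the signal
A = [2 / T0 * np.sum(s * np.cos(n * w0 * t)) * dt for n in range(1, 6)]
B = [2 / T0 * np.sum(s * np.sin(n * w0 * t)) * dt for n in range(1, 6)]
# For a square wave the odd-rank B_n dominate (B_1 ~ 4/pi, B_3 ~ 4/(3*pi), ...)
print(np.round(B, 3))
```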
The Fourier Transform
The Fourier transform is the generalization of the Fourier series decomposition to all deterministic signals; thus, it provides the spectral representation of all signals over the entire frequency range (Huang et al. 2015). In other terms, it
demonstrates the variation of the amplitude (or energy) of the signal as a function of its frequency. To explain that in detail, let s(t) now be a deterministic signal; its Fourier transform is a complex function of frequency, given by Eq. (10).
$$S(f) = FT[s(t)] = \int_{-\infty}^{+\infty} s(t)\, e^{-j2\pi f t}\, dt \qquad (10)$$
It has been proven that if the transform given by Eq. (10) exists, then the inverse Fourier transform can be obtained by Eq. (11). The modulus of the Fourier transform of a signal is called the spectrum.
$$s(t) = FT^{-1}[S(f)] = \int_{-\infty}^{+\infty} S(f)\, e^{+j2\pi f t}\, df \qquad (11)$$
Table 2 summarizes some of the most useful properties of Fourier transform for signal analysis in earth sciences in general and for seismic processing in particular (Robinson 1981).
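In practice, the transform pair of Eqs. (10) and (11) is evaluated on sampled data. The short Python sketch below, with an assumed sampling rate and a synthetic two-tone signal, computes the modulus of S(f) (the spectrum) with NumPy's FFT routines; it is an illustration, not part of the original text.

```python
import numpy as np

# Amplitude spectrum of a sampled signal via the discrete analogue of Eq. (10).
# The 25 Hz and 60 Hz components and the sampling rate are arbitrary assumptions.
fs = 500.0                                   # sampling frequency (Hz)
t = np.arange(0, 2.0, 1.0 / fs)              # 2 s of data
s = np.sin(2 * np.pi * 25 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)

S = np.fft.rfft(s)                           # one-sided Fourier transform
f = np.fft.rfftfreq(len(s), d=1.0 / fs)      # corresponding frequency axis
amplitude = 2.0 * np.abs(S) / len(s)         # modulus of S(f), scaled to signal units

peaks = f[np.argsort(amplitude)[-2:]]        # the two dominant frequencies
print(sorted(peaks))                         # -> approximately [25.0, 60.0]
```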
Convolution and Deconvolution Principle and the Fourier Transform
The convolution product of a given signal s(t) by another signal h(t) is given by Eq. (12).
$$s(t) * h(t) = \int_{-\infty}^{+\infty} s(k)\, h(t - k)\, dk \qquad (12)$$
The main utility of the convolution is that it is used to extract the transfer function of systems, since the output of the convolution of a direct impulse input with the transfer function of the system is the transfer function itself; this is called the impulse response of the system. The value of the output signal at time t is thus obtained by summing the past values of the excitation signal, weighted by the system response. It is proven that FT[a(t) * b(t)] = A(f) · B(f).

Filtering Concept
Filtering is a kind of signal analysis in which the frequency spectrum and the phase of the signal are changed according to our needs; the temporal form of the signal is consequently changed (Huang et al. 2015). The changes can be either eliminating (or at least weakening) the unwanted frequencies of the noise or isolating the useful frequencies of the main signal (Li et al. 2018). Hence, an ideal filter can be described by zero attenuation in the frequency band that we want to keep and an infinite attenuation in the band that we want to eliminate. In practice, it is impossible to obtain such a perfect filter, but the best approach is to keep the attenuation lower than a given value Amax in the bandwidth and greater than Amin in the attenuated band. The bandwidth for filters of seismic data is defined based on regional studies, and it generally lies between 20 and 80 Hz (Chakraborty and Okaya 1995). Based on the filter template, which defines the attenuation and the prohibited zones, four types of filters can be constructed: (1) low-pass filter, (2) high-pass filter, (3) band-pass filter, and (4) band-cut filter. In sizing the filter window, only a few analytical functions whose characteristics are suitable for the realization of the template can be used. Examples of such functions, which set the physical properties of the filter, are Butterworth, Tchebycheff, Bessel, and Cauer.
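The band-pass template described above (for example, the 20 to 80 Hz band typical of seismic data) can be realized with one of the analytical functions just listed. The following sketch uses a Butterworth design from SciPy, assuming that library is available; the sampling rate, filter order, and synthetic trace are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Band-pass filtering of a noisy trace in the 20-80 Hz band discussed above.
fs = 500.0                                  # assumed sampling frequency (Hz)
t = np.arange(0, 1.0, 1.0 / fs)
trace = np.sin(2 * np.pi * 40 * t) + 0.8 * np.random.randn(t.size)  # signal + noise

b, a = butter(N=4, Wn=[20.0, 80.0], btype="bandpass", fs=fs)  # filter template
filtered = filtfilt(b, a, trace)            # zero-phase application of the filter
```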
Modulation Concept
The principle of modulating a signal is mainly used for signal transmission; it is based on adapting the signal to be transmitted to the transmission channel. Modulation of the signal amplitude requires a frequency transposition of the signal, which is achieved by a simple multiplication, and the bandwidth necessary to transmit a signal of frequency f is 2f. In order to retrieve the original signal, a demodulation process must be involved.
Signal Analysis, Table 2 Properties of the Fourier transform

Property     | s(t)                    | S(f)
Linearity    | a · s(t) + b · r(t)     | a · S(f) + b · R(f)
Translation  | s(t − t0)               | e^{−2jπf t0} S(f)
             | e^{+2jπf0 t} s(t)       | S(f − f0)
Inflection   | s(−t)                   | S(−f)
Derivation   | d^n s(t)/dt^n           | (2jπf)^n S(f)
Dilatation   | s(a · t) with a ≠ 0     | (1/|a|) S(f/a)
Convolution  | s(t) * r(t)             | S(f) · R(f)
             | s(t) · r(t)             | S(f) * R(f)
Duality      | S(t)                    | s(−f)
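The convolution row of Table 2 can be verified numerically: the Fourier transform of a time-domain convolution equals the product of the individual spectra. The sketch below is an assumed illustration using random test sequences, with zero-padding to the linear-convolution length to avoid circular wrap-around.

```python
import numpy as np

# Numerical check of the convolution row of Table 2: FT[a * b] = A(f) . B(f).
rng = np.random.default_rng(1)
a, b = rng.standard_normal(64), rng.standard_normal(64)

n = a.size + b.size - 1                      # length of the linear convolution
lhs = np.fft.fft(np.convolve(a, b), n)       # FT of the time-domain convolution
rhs = np.fft.fft(a, n) * np.fft.fft(b, n)    # product of the individual spectra
print(np.allclose(lhs, rhs))                 # -> True
```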
Digitization
Since the outside world is analog by default, the need for analog-to-digital conversion keeps growing because computation is performed in the digital world. The conversion from analog to digital is essentially based on three main steps. The first step is the sampling process, which converts the analog signal into a discrete one; the second step is quantization, which associates an amplitude value with each time step; and the last step is coding, where each value is associated with its
appropriate code. These steps are gathered in one process, named the sampling-holding process, as detailed in the next subsection.

Sampling-Holding
In practice, the analog-to-digital conversion is not done instantly; there should be a blocking time of the analog signal for a period of T to have an error-free conversion. The sample-holding is therefore used to achieve this step, where the blocking effect is generally modeled by a window function shifted by a period of τ/2. The mathematical function of the sampling and holding process is given by Eq. (13), and its graphical presentation is given in Fig. 1g.
$$y(t) = \sum_{k=-\infty}^{+\infty} \mathrm{rect}\left(\frac{t - \tau/2 - kT_e}{\tau}\right) = \mathrm{rect}\left(\frac{t - \tau/2}{\tau}\right) * \sum_{k=-\infty}^{+\infty} \delta(t - kT_e) \qquad (13)$$
The process of sampling-and-blocking can therefore be seen as multiplying the signal s(t) by y(t). The Fourier transform of the sampled signal can therefore be given by Eq. (14).
$$S_e(f) = \frac{\tau}{T_e} \sum_{k=-\infty}^{+\infty} \mathrm{sinc}(\tau f) \cdot S(f - k f_e) \cdot e^{-j\pi f \tau} \qquad (14)$$

Quantification
The quantization principle consists of assigning to any real value x another value xq that belongs to a finite set of values according to a certain law: rounding down, rounding to the nearest value, etc. The gap between successive values xq is called the quantization step. Rounding off the starting value inevitably results in a quantization error called quantization noise (Li et al. 2018).

Coding
Coding consists of associating a code composed of a set of binary elements with each discrete value. The most common codes are the natural binary code, the shifted binary code, the DCB (binary-coded decimal) code, and the Gray code.

FFT: Fast Fourier Transform
In order to be able to apply the Fourier transform to large data sets, an algorithm is used to obtain the transformation in a much shorter time under pre-defined conditions. This algorithm is called the fast Fourier transform. The transformation in the discrete domain is given by Eq. (15).
$$X[k] = \sum_{n=0}^{N-1} x[n] \cdot W_N^{nk}, \quad \text{with } W_N = e^{-\frac{2j\pi}{N}}, \quad W_N^{2nk} = e^{-\frac{2j\pi nk}{N/2}} = W_{N/2}^{nk}, \quad W_N^{nk+N/2} = e^{-\frac{2j\pi (nk+N/2)}{N}} = -W_N^{nk} \qquad (15)$$
The necessary condition to use the FFT is that the number of samples can be written as a power of 2: N = 2^m; then, by separating the odd and the even indices, the repeated calculation operations are separated as follows: x1[n] = x[2n] and x2[n] = x[2n + 1]. By exploiting this property, we finally obtain the FFT as given by Eq. (16). The number of operations is reduced from N² to N·log2(N).
$$X[k] = X_1[k] + W_N^k \cdot X_2[k], \qquad X\left[k + \frac{N}{2}\right] = X_1[k] - W_N^k \cdot X_2[k], \qquad \text{for } 0 \le k \le \frac{N}{2} - 1 \qquad (16)$$

Example from Geophysics
In the example given in Fig. 2, we show a seismic signal and some signal analysis techniques that can be applied to it in order to extract the desired information; the signal represents a seismic trace that shows a typical response (reflection) of earth rocks to an artificially transmitted energy signal (Efi and Kumar 1994).
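Returning to the FFT, the splitting of Eq. (16) can be transcribed almost literally into a recursive routine. The following Python sketch (an illustration only; the function name is an assumption) reproduces the decimation-in-time decomposition for N = 2^m samples and checks it against a reference FFT.

```python
import numpy as np

def fft_radix2(x):
    """Recursive decimation-in-time FFT, a direct transcription of Eq. (16).
    The length of x must be a power of two (N = 2^m)."""
    x = np.asarray(x, dtype=complex)
    N = x.size
    if N == 1:
        return x
    X1 = fft_radix2(x[0::2])                 # even-indexed samples x1[n] = x[2n]
    X2 = fft_radix2(x[1::2])                 # odd-indexed samples  x2[n] = x[2n+1]
    k = np.arange(N // 2)
    W = np.exp(-2j * np.pi * k / N)          # twiddle factors W_N^k
    return np.concatenate([X1 + W * X2,      # X[k]       = X1[k] + W_N^k X2[k]
                           X1 - W * X2])     # X[k + N/2] = X1[k] - W_N^k X2[k]

x = np.random.randn(256)
print(np.allclose(fft_radix2(x), np.fft.fft(x)))   # -> True
```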
Conclusions In this chapter, we have briefly reviewed the theory of signal analysis from mathematical perspectives, where definitions and most important functions and their mathematical properties have been given in order to allow beginner students to construct a general idea about this topic. Moreover, examples of signal analysis applications in geophysics have been demonstrated; they will permit future earth scientists in general and geophysicists in particular to link the theoretical information delivered in this chapter with practical applications in their field of study.
Cross-References
▶ Compositional Data
▶ Fast Fourier Transform
▶ Inversion Theory
▶ Mathematical Geosciences
▶ Power Spectral Density
▶ Quality Control
▶ Random Variable
▶ Spatial Analysis
▶ Spectral Analysis
▶ Time Series Analysis
▶ Wavelets in Geosciences
Bibliography
Benedetto A (2010) Water content evaluation in unsaturated soil using GPR signal analysis in the frequency domain. J Appl Geophys 71(1):26–35. https://doi.org/10.1016/j.jappgeo.2010.03.001
Chakraborty A, Okaya D (1995) Frequency-time decomposition of seismic data using wavelet-based methods. Geophysics 60(6). https://doi.org/10.1190/1.1443922
Dimri V (2005) Fractals in geophysics and seismology: an introduction. In: Dimri VP (ed) Fractal behaviour of the earth system. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-26536-8_1
Efi F-G, Kumar P (1994) Wavelet analysis in geophysics: an introduction. Wavelet Anal Appl 4:1–43. https://doi.org/10.1016/B978-0-08-052087-2.50007-4
Herrera RH, Han J, van der Baan M (2013) Applications of the synchrosqueezing transform in seismic time-frequency analysis. Geophysics 79(3). https://doi.org/10.1190/geo2013-0204.1
Huang W, Wang R, Yuan Y, Gan S, Chen Y (2015) Signal extraction using randomized-order multichannel singular spectrum analysis. Geophysics 82(2). https://doi.org/10.1190/geo2015-0708.1
Li F, Zhang B, Verma S, Marfurt KJ (2018) Seismic signal denoising using thresholded variational mode decomposition. Explor Geophys 49(4). https://doi.org/10.1071/EG17004
Liu W, Cao S, Chen Y (2016) Seismic time–frequency analysis via empirical wavelet transform. IEEE Geosci Remote Sens Lett 13(1):28–32. https://doi.org/10.1109/LGRS.2015.2493198
Robinson EA (1981) Signal processing in geophysics. In: Bjørnø L (ed) Underwater acoustics and signal processing. NATO advanced study institutes series (series C — mathematical and physical sciences), vol 66. Springer, Dordrecht. https://doi.org/10.1007/978-94-009-8447-9_56
Wang T, Zhang M, Yu Q, Zhang H (2012) Comparing the applications of EMD and EEMD on time–frequency analysis of seismic signal. J Appl Geophys 83:29–34. https://doi.org/10.1016/j.jappgeo.2012.05.002. ISSN 0926-9851
Signal Processing in Geosciences

E. Chandrasekhar1 and Rizwan Ahmed Ansari2
1Department of Earth Sciences, Indian Institute of Technology Bombay, Mumbai, India
2Department of Electrical Engineering, Veermata Jijabai Technological Institute, Mumbai, India
Definition Geophysical signal processing is an important branch of geophysics that deals with various signals and systems. It serves as a backbone for efficient analysis and interpretation of the complex geophysical data. Concepts of signal processing are encountered at various stages, right from the data
acquisition to processing to interpretation. Over the years, geophysical signal analysis techniques have steadily progressed from the use of basic integral transforms to present-day nonlinear signal processing techniques, such as wavelet analysis, fractal and multifractal analysis, and the empirical mode decomposition technique, which have found their way into the geosciences, paving the way for effectively unravelling the hidden information in nonlinear and nonstationary geophysical data. The use of these novel signal analysis techniques in geophysical data analysis is essential for a better understanding of the dynamics of the systems that generate such complex geophysical data.
Introduction The word “signal” can be best defined as the time or space variation of any physical quantity. While geophysical data constitute time signals (e.g., radiometric data, ionospheric total electron content (TEC) data), and space signals (e.g., well-log data), some are also classed into spatiotemporal signals, which vary as a function of both time and space (e.g., Earth’s magnetic and gravitational field). Also, there are some geophysical signals, which are causal (e.g., seismic data, well-log data, etc.) and noncausal (e.g., Earth’s upper atmospheric data, magnetic field, gravity field, to name a few) (Causal signals are those, which do not occur before the onset of any excitation. Noncausal signals do not require any excitation to occur. They are continuously generated naturally, since time immemorial.). If the statistical properties, such as mean, variance, etc. of a signal, do not vary as a function of space/time, they are defined as stationary signals. Otherwise, they are defined as nonstationary signals. Most geophysical signals are nonstationary in nature and a very few, namely, geomagnetic solar quiet day (Sq) variations, and the TEC data, may be called as quasi-stationary, as they generally exhibit some known periodicity in their temporal behavior. Initially, while the statistical methods helped to understand the behavior of geophysical signals by facilitating to estimate the trends and cyclicities present in the data, the well-known mathematical transforms, such as the Fourier transform and Hilbert transform, to name a few, have found their role in solving the potential field problems, thereby significantly improving the understanding and interpretation of the anomalous in situ geophysical signals. However, since all the above techniques are most effective in analyzing stationary or quasistationary signals, they lack in extracting some important information, such as time-frequency localization, from the nonlinear geophysical signals. Before discussing about nonlinearity in the signals, it would be more prudent, if we understand the difference between linearity and nonlinearity in signals, in the first place. If any observational data (y) can
be modeled in the form of, say, y = mx + c, where x is any variable, then that data exhibits linear behavior. On the other hand, if any observational data best fits a model having exponents, say, y = x^m (m ≠ 1), then such data exhibits nonlinear behavior. Most observational geophysical data, such as gravity data, magnetic data, seismic and seismological data, well-log data, geomagnetic data, ionospheric data, and weather data, to name a few, are nonlinear in nature. Nonlinear signals exhibit a power-law behavior with time, and thus, to correctly understand their behavior with time, suitable nonlinear mathematical techniques need to be employed. Basically, the aim is to correctly understand the dynamics of the nonlinear system that is generating such nonlinear signals. For example, the seismological data generated at different places on the Earth are different, which is a result of the different subsurface dynamics associated with them (Telesca et al. 2004). Therefore, to correctly understand such nonlinear dynamics of the subsurface, responsible for generating such diverse signals, we must employ nonlinear signal analysis techniques. With the advent of wavelet analysis and other nonlinear and data-adaptive techniques, such as fractal and multifractal analyses and the empirical mode decomposition technique, significant improvements have been made in the analysis of geophysical signals. In this chapter, we first discuss the theory behind the basic integral transforms (referred to here as linear signal processing techniques) and their applications in geophysics. We next discuss the theory and geophysical application of the continuous and discrete wavelet transforms, the fractal and multifractal analysis techniques, and the fully data-adaptive empirical mode decomposition (EMD) technique. Finally, we discuss how the EMD technique facilitates a quantitative estimate of the degree of subsurface heterogeneity, using well-log data of two different wells. For all the above techniques, adequate references are provided for the benefit of the reader to further explore and understand the geophysical applications of all these techniques.
Linear Signal Processing Techniques in Geophysics In this section, we briefly define the basic integral transforms like the Fourier transform, the Z-transform, and the Hilbert transform and discuss their applications in geophysical signal analysis. Although there are many other integral transforms in the literature, it is not possible to discuss all of them here, and thus we stick only to the well-known above three transforms, which have seen many applications in geophysics.
The Fourier Transform
If a time-varying signal is denoted as f(t), then its Fourier transform, F(ω), is mathematically expressed as
$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt \qquad (1)$$
Equation (1) explains that the time signal, treated with a complex exponential function and summed over the entire time signal, will result in the frequency content of that signal (if the kernel is a negative exponential function in the Fourier transform, it should be positive in the inverse Fourier transform, and vice versa). The greatness of this transformation lies in the fact that the Fourier transform is the only efficient technique that can delineate all the frequencies present in the signal in the entire frequency range −∞ to +∞. F(ω) is called the spectrum of f(t). F(ω) is a complex function, implying that the Fourier transform of a real function will give not only the amplitude spectrum
$$|F(\omega)| = A(\omega) = \sqrt{R(\omega)^2 + I(\omega)^2}$$
but also the phase spectrum
$$\phi(\omega) = \tan^{-1}\left(\frac{I(\omega)}{R(\omega)}\right)$$
where R(ω) and I(ω), respectively, denote the real and imaginary parts of F(ω). Accordingly, Eq. (1) can also be written as F(ω) = |F(ω)| e^{iφ(ω)}. By applying a reverse operation, known as the inverse Fourier transform, it is also possible to get back the original signal, using the formula
$$f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} F(\omega)\, e^{i\omega t}\, d\omega \qquad (2)$$
All types of signals, which satisfy the Dirichlet’s conditions (see Hayes 1999), are Fourier transformable. Several properties of the Fourier transform can be found in Haykins (2006). Equation (1) can be applied to a variety of time, space, or spatiotemporal signals in geophysics to determine the parameters of the subsurface anomalous bodies, such as their depth of location, width, etc. Application of the Fourier Transform to Determine the Subsurface Anomaly Parameters
As an example of the application of the Fourier transform in geophysics, we describe below a methodology to determine the parameters of a horizontal cylinder buried in the subsurface, by analyzing the gravity effect over it, using the Fourier transform. Figure 1 shows the schematic of the horizontal cylinder, having a radius R with depth to the center of the body as h, in the cartesian coordinate system.
Signal Processing in Geosciences, Fig. 1 A schematic representation of the buried horizontal cylinder (see text for the definitions of various parameters)

The gravity anomaly over a horizontal cylinder, as a function of horizontal distance, x, is mathematically expressed as
$$g(x) = \frac{2\pi\gamma\sigma R^2 h}{x^2 + h^2} \qquad (3)$$
where γ is the universal gravitational constant, σ refers to the density contrast, and x designates the distance on the surface of the earth along a profile, over which the gravity measurements are made. From Eq. (1), the Fourier transform of Eq. (3) is given by
$$G(\omega) = 2\pi\gamma\sigma R^2 h \int_{-\infty}^{\infty} \frac{e^{-i\omega x}}{x^2 + h^2}\, dx \qquad (4)$$
Since the integrand in Eq. (4) has poles at x = ±ih, it needs to be solved using complex integration. Accordingly, considering the path of integration around x = +ih, the contour integration of Eq. (4) yields
$$\int_{-\infty}^{\infty} \frac{e^{-i\omega x}}{x^2 + h^2}\, dx = \frac{\pi}{h}\, e^{-\omega h} \qquad (5)$$
Substituting Eq. (5) in Eq. (4), we get
$$G(\omega) = 2\pi^2 R^2 \gamma\sigma\, e^{-\omega h} \qquad (6)$$
Equation (6) at ω = 0 gives
$$G(0) = 2\pi^2 R^2 \gamma\sigma \qquad (7)$$
From Eq. (7) we have
$$R = \frac{1}{\pi}\sqrt{\frac{G(0)}{2\gamma\sigma}} \qquad (8)$$
Similarly, Eq. (3) at x = 0 gives
$$g(0) = \frac{2\pi\gamma\sigma R^2}{h} = \frac{G(0)}{\pi h} \qquad (9)$$
Therefore, from Eq. (9), we have
$$h = \frac{G(0)}{\pi g(0)} \qquad (10)$$
Equations (8) and (10) explain the role of the Fourier transform in conveniently delineating the subsurface parameters R and h of the buried anomalous body from its gravity effect. Similar examples of delineating various other parameters of various subsurface anomalous bodies using their respective geophysical anomalous effects can be found in Odegard and Berg (1965), Sharma and Geldart (1968), and Bhimasankaram et al. (1977a, b, 1978).
ð11Þ
Y ðzÞ ¼ yn zn . . . þ y3 z3 þ y2 z2 þ y1 z1 þ y0 þ y1 z þ y2 z2 þ y3 z3 þ . . . yn zn
ð12Þ
where the coefficients of z represent the amplitudes of the signal at different times. Both in Eqs. 11 and 12, the z can be visualized as a unit delay operator. That means, the z operated on the sample at time t is actually the value of the signal after t units of time are elapsed. Accordingly, we can define z½yn ¼ yn1 :Similarly, z2 ½yn ¼ yn2 ; z3 ½yn ¼ yn3 , etc:
Substituting Eq. (5) in Eq. (4), we get GðoÞ ¼ 2p2 R2 gseoh
ð10Þ
For noncausal signals, Eq. (11) can be written as
Since the integrand in Eq. (4) has poles at x ¼ ih, it needs to be solved using complex integration. Accordingly, considering the path of integration around x ¼ þ ih, the contour integration of Eq. (4) yields 1
Gð0Þ pgð0Þ
ð7Þ
Mathematically, the unit delay operator, z, is expressed as z ¼ eio. While the magnitude of this unit delay operator and all the powers of z is always 1, graphically, a single unit delay represents the time taken by the signal in completing one circle in the anticlockwise direction in the argand diagram, when o goes from 0 to 2π. Similarly, a two-unit delay (i.e., z2or e2io) represents the time taken by the signal in completing the path 0 to 2π twice in the anticlockwise
S
1300
Signal Processing in Geosciences
direction, and a three-unit delay (i.e., z3or e3io) represents the time taken by the signal in completing the path 0 to 2π thrice in the anticlockwise direction and so on. This way the timedelayed representation of the signal at different delayed times can be truly expressed in z-transform notation, in the form of a polynomial in z. Relation Between the Fourier Transform and Z-Transform
Equation (11) can be written as n1
Y ðzÞ ¼
y t zt
ð13Þ
1 yðtÞ ¼ 2p þ y2 e
p
ð. . . þ y2 e2io þ y1 eio þ y0 þ y1 eio p
2io
þ . . .Þeiot do ð18Þ
Invoking the Dirac delta function, the left-hand side of Eq. (18) can be written as yðtÞ ¼
yl dðt lÞ
ð19Þ
l
t¼0
If we substitute, z ¼ eio in the above equation, we get n1
Y ðoÞ ¼
yt eiot
ð14Þ
Therefore, by virtue of Eq. (17), all terms in Eq. (18) would vanish except the term with the coefficient of z with power zero, actually contributing to the integral. Thus, with the help of Eq. (19), we can write Eq. (18) as
t¼0
The analog form of Eq. (14) with extended limits, written as 1
Y ðo Þ ¼
yðtÞe dt iot
ð15Þ
1
defines the Fourier transform. An interesting question that arises as a consequence is: If the simple process of attaching powers of z to the discrete time signal defines the Z-transform, then the inverse Z-transform should amount to determining the coefficients of z. Let’s look into this. From Eq. (15), the inverse Fourier transform is written as
yðtÞ ¼
1 2p
1
Y ðoÞeiot do
ð16Þ
1
We know that the integration of eino (or zn) over the integral π o < π is nonzero for n ¼ 0 and is equal to 0 for all n 6¼ 0 i.e. 1 2p
p
e p
ino
1 do ¼ 2p ¼
p
ðcos no þ i sin noÞdo p 1 for n¼0 0 for n6¼0
ð17Þ
Therefore, accordingly, changing the limits of integration and in terms of discretized function, Eq. (16) can be written as
1 yt ¼ y 2p t
p
do ¼ yt
ð20Þ
p
This proves that the inverse Z-transform facilitates to identify the coefficients of powers of z. Z-transform has a lot of applications in designing the digital filters and in determining their stability by calculating the transfer functions and poles and zeros of digital filters (see Hayes 1999; Haykins 2006). Thus, the Z-transform has indirect application in geophysics, in the sense that they will be effectively used in the filtering and windowing operations of geophysical signals. The Hilbert Transform We earlier have seen that the Fourier transform operating on a given signal changes the time domain signal to frequency domain, thereby facilitating to determine its frequency content. The Hilbert transform on the other hand, while keeping the time signal in the same time domain, will change its phase content. In other words, Hilbert transform facilitates to characterize the signal based on its phase information. The Hilbert transform is a linear time-invariant filter, whose transfer function results in changing the phase of each frequency in the input signals by an amount of π/2 radians. Let us understand the concept of Hilbert transform further. Let us examine the spectral amplitudes of cosine and sine waves, as shown in Fig. 2. It can be seen from Fig. 2a that the spectral amplitude for cosine wave is symmetric and will appear only on the real plane for both positive and negative frequencies. In contrast, for sine wave (Fig. 2b), the spectral amplitude is asymmetric and will appear only on the imaginary plane for both positive and negative frequencies. Accordingly, the phase is +90 for the positive frequencies
Signal Processing in Geosciences
1301
Signal Processing in Geosciences, Fig. 2 Schematic representation of spectral amplitudes of (a) cosine and (b) sine waves and (c) and their rotational transformation. “A” designates the amplitude of cosine and sine waves
and 90 for the negative frequencies. From Fig. 2c, we can easily observe that if we rotate the negative frequency components of the cosine by +90 and the positive frequency components by 90 , we can transform a cosine function into a sine function. This rotational transformation is called Hilbert transformation. In other words, the Hilbert transform operation involves rotation of all negative frequency components of a signal by a þ90 phase shift and all positive frequency components by a 90 phase shift (see Fig. 2c). Thus, it is equivalent to multiplying the negative phasor by +i and the positive phasor by i in the frequency domain. It is important to note here that the Hilbert transform affects only the phase of the signal and has no effect on the amplitude. If g(t) is a real-valued time signal and gðf Þ denotes its Hilbert transform in the frequency domain, then gðf Þ is defined as
where 1/πt defines the inverse Fourier transform of the function i sgn ( f ). Eq. (23) represents the time domain form of Hilbert transform. The Hilbert transform has a lot of applications in geophysics, particularly in solving the potential field problems involving delineation of subsurface anomaly parameters from the measured gravity or magnetic field in any area (Mohan et al. 1982; Sundararajan 1983; Sundararajan and Srinivas 2010). Another Definition of Hilbert Transform
From Equation (1), we can have FðoÞ ¼ RðoÞ iI ðoÞ
FðoÞ ¼ RðoÞ þ iI ðoÞ
and 1
f ðtÞ cos otdt and I ðoÞ
Rð o Þ ¼
where,
1
gð f Þ ¼
i gðf Þ þi gðf Þ
f >0 f 0 indicates the scale (or dilation parameter) and t indicates the translation parameter. s is analogous to frequency, in the sense that larger scales (low frequencies) provide overall information of the signal and smaller scales (high frequencies) provide detailed information of the signal. The fc scale-frequency relation is given by f ¼ sDt , where fc indicates the central frequency of the wavelet and Δt refers to the sampling interval and f indicates the pseudo frequency of the wavelet corresponding to the scale, s. t refers to the time location of the wavelet window as it is slided over the signal along the time axis of the signal and thus apparently refers to time information in the transformed domain. If two functions f(t) and g(t) are square integrable in ℝ (i.e., f ðtÞ, gðtÞ L 2 ðℝÞ), then their inner product is defined as ⟨f ðtÞgðtÞ⟩ ¼
ℝ
f ðtÞ: g ðtÞdt
ð39Þ
where “*” indicates the complex conjugate. According to (39), the wavelet transform of a function f(t) can be defined as the inner product of the mother wavelet c(t) and f(t), given by 1 tt WTf ðt, sÞ ¼ p f ðtÞ:c dt s s
ð40Þ
Equation (40) explains that the wavelet transformation gives a measure of the similarity between the signal and the
Signal Processing in Geosciences
wavelet function. Such a measure at any particular scale s0 and translation t0 is identified by a wavelet coefficient. The larger the value of this coefficient, the higher the similarity between the signal and the wavelet at t0, s0 and vice versa. If a large number of high wavelet coefficients occur by using a particular wavelet, then that indicates the higher degree of suitability of that wavelet to study the signal under investigation. Two types of wavelet transforms are continuous wavelet transform (CWT) and discrete wavelet transform (DWT). A brief explanation of both these techniques is provided below. The Continuous Wavelet Transform (CWT)
In CWT (Eq. 40), the inner product of the signal and the wavelet function is computed for different segments of the data by continuously varying t and s. Because the wavelet window can be scaled (compressed or dilated) at different levels of analysis, the time localization of high-frequency components of the signal and frequency localization of lowfrequency components of the signal can be effectively achieved. Because the wavelet is translated along the time axis of the signal in a continuous fashion, the CWT is ideally suited to identify the jumps and singularities present in the data (Jaffard 1991). In CWT, the wavelet window is first placed at the beginning of the signal. The inner product of the signal and wavelet is computed, and the CWT coefficients are estimated. Next, the wavelet is slightly shifted along the time axis of the signal, and the CWT coefficients are again computed. This process is repeated till the end of the signal is reached. Figure 4a depicts the pictorial representation of the above operation. Next, the wavelet window is dilated (or increased in scale by stretching) by a small amount, and the above process is repeated at that scale, at all translations. Likewise, the CWT coefficients are calculated for several scales and translations. Finally, all the CWT coefficients corresponding to all translations and dilations are expressed in the form of a contour plot in the time-scale plane, known as the scalogram. Figure 4b depicts an example of a scalogram, obtained by performing wavelet analysis of gamma ray log data using the Gaus1 wavelet (see Chandrasekhar and Rao 2012). It can be seen in Fig. 4b that the CWT can clearly delineate and distinguish the thin and thick interfaces between different lithologies (seen as alternating sequences of low and high wavelet coefficients) at small and high wavelet scales as a function of depth, representing the high and low frequency features present in the data respectively. According to Eq. (40), the scalogram signifies that any given one-dimensional time signal is transformed into a twodimensional time-scale (read as time-frequency) plane, thereby increasing the degrees of freedom to understand the given signal effectively. Thus, the wavelet scalograms facilitate to identify what frequencies are present at what times in
1305
any given timeseries. CWT has been applied to a variety of problems in various branches of science and engineering, such as geophysical fluid dynamics (Farge 1992; Farge et al. 1996), geomagnetism (Alexandrescu et al. 1995; Kunagu et al. 2013; Chandrasekhar et al. 2013, 2015), material science (Gururajan et al. 2013), and electromagnetic induction studies (Zhang and Paulson 1997; Zhang et al. 1997; Garcia and Jones 2008). In the case of well-logging, the wavelet analysis has found its wide applications in effectively describing the inter-well relationship (Jansen and Kelkar 1997), determining the sedimentary cycles (Prokoph and Agterberg 2000), reservoir characterization (Panda et al. 2000; Vega 2003), and the space localization and the depths to the tops of reservoir zones (Chandrasekhar and Rao 2012 and references therein; Hill and Uvarova 2018, Zhang et al. 2018). Chandrasekhar and Rao (2012) have also explained the optimization of suitable mother wavelets through histogram analysis of CWT coefficients. Translation-Invariance Property of Wavelets in CWT Unlike in the Fourier transform, one of the important properties of wavelets in CWT is their translation-invariance property. It explains that a small amount of shift in the wavelet function will also result in the same amount of shift in the wavelet transformed signal. Mathematically, this can be proved as follows. Let f t0 ¼ f ðt t0 Þ be the time-shifted form of the signal f(t), by a small shift, t0. The CWT of f t0 is given by 1 tt CWTf t0 ðt, sÞ ¼ p f ðt t0 Þ:c dt s s
ð41Þ
Put t0 ¼ t t0 in (41). Then we have t0 ðt t0 Þ 1 dt0 CWTf t0 ðt, sÞ ¼ p f ðt0 Þ:c s s ¼ CWT f ðt t0 , sÞ
ð42Þ ð43Þ
Equation (43) explains that, since the output is also shifted in time by the same amount as in the input, the CWT is translation-invariant. The translation-invariance property plays a vital role in identifying the stable features in shifted images in pattern recognition (Ma and Tang 2001). For various other properties of wavelets, the reader is referred to Mallat (1989, 1999). Discrete Wavelet Transform (DWT)
In CWT, since the analysis of the entire signal is done by translating and dilating the wavelet in a continuous fashion, the CWT has proven to be very effective in identifying the sudden jumps and singularities in the data (Alexandrescu et al. 1995; Chandrasekhar et al. 2013). Further, in the
S
1306
Signal Processing in Geosciences
Signal Processing in Geosciences, Fig. 4 (a) A pictorial representation of the CWT operation. First, the CWT coefficients, when the wavelet at translation t ¼ 0 and dilation s ¼ 1, are computed. Next, the coefficients, when the wavelet at translation t ¼ 1 and dilation s ¼ 1, are obtained. Likewise, for the same scale of the wavelet, the coefficients corresponding to all other translations, C2, 1, C3, 1, . . .. . .. . Cn, 1, are obtained. Next, the above steps are repeated for different translations and dilations of the wavelet. Then, a complete scalogram depicting the time-scale representation of the signal under study is obtained (after Chandrasekhar and Dimri 2013). (b) An example of a scalogram obtained by performing CWT on gamma ray log data with Gaus1 wavelet (after Chandrasekhar and Rao 2012). The color scale represents the values of wavelet coefficients in the scalogram
scalogram, the region, where the WTf(t, s) is nonzero, is known as the region of support in the ts plane. However, such an analysis in continuous fashion also produces a lot of redundant information, and thus the entire support of WTf(t, s) may not be required to recover the signal f(t), particularly, when it could be possible to represent any signal with a fewer wavelet coefficients. Therefore, to reduce the redundancy, the discrete wavelet transform (DWT) has been introduced (Daubechies 1988; Mallat 1989). Unlike in CWT, the translation and dilation operations in DWT are done in discrete steps by considering, s ¼ sk0 ; t ¼ lt0 sk0, where k and l are integers, known as scale factor and shift factor, respectively. Thus, the DWT representation of Eq. (38) is given by (Daubechies 1988)
ck,l ðt, sÞ ¼
1 sk0
c
t lt0 sk0 sk0 0
ð44Þ
The translation parameter, 0 lt0 sk0 , depends on the chosen dilation rate of the scale (Daubechies 1988). For all practical purposes, generally, the values for s0 and t0 are chosen as 2 and 1, respectively, so that the dyadic sampling is obtained both on frequency (scale) and time axes. Since the DWT samples both the time and scale in a dyadic fashion, it is not translation-invariant (see Ma and Tang 2001). The DWT is computed using Mallat’s algorithm (Mallat 1989), which essentially consists of successive low-pass and high-pass filtering steps. First, the signal (X[n]) to be wavelet
Signal Processing in Geosciences
transformed is decomposed into high-frequency (h[n]) and low-frequency components (l[n]). h[n] and l[n] are also, respectively, known as detailed and approximate coefficients. While the former are known as level 1 coefficients, the latter are again decomposed into the next level of detailed and approximate coefficients and so on. A pictorial representation of the entire DWT process described above is shown in Fig. 5. However, at this stage, it is important to understand what exactly happens at each level of such decomposition. The successive decomposition of the signal into detailed and approximate coefficients implies that at the first level of decomposition, the frequency resolution of the detailed coefficients is enhanced, thereby reducing its uncertainty by half. The decimation by two (see Fig. 5) halves the time resolution, as the entire signal after the first level of decomposition is represented by only half the number of samples. Therefore, in the second level of decomposition, while the half-band low-pass filtering removes half of the frequencies, the
1307
decimation by two doubles the scale. With this approach, the time resolution in the signal gets improved at high frequencies, and the frequency resolution is improved at low frequencies. Such an iterative process is repeated until the desired levels of resolution in frequency are reached. However, the important point to note here is that at every wavelet operation, only half of the spectrum is covered. Does this mean an infinite number of operations are needed to compute the DWT? The answer is no. The DWT process needs to be carried out up to a reasonable level, when the required time and frequency resolutions are achieved. The scaling function of the wavelet is used to control this. Let’s summarize the above description as follows: In DWT, the given signal is analyzed using the combination of a wavelet function (denoted by c(t)) and a scaling function (denoted by f(t)). While the wavelet function helps to calculate the detailed coefficients (high frequency components), the scaling function helps to calculate the approximate coefficients (low frequency components). Finally, the DWT of the original signal is obtained by concatenating all the detailed and approximate coefficients. The correctness of the reconstructed signal will suggest, whether the required levels of decomposition in DWT were properly achieved or not. Thus, the DWT can be viewed as a filter bank operation. More details about DWT and design of wavelets can be found in several literatures such as Daubechies (1988, 1992), Mallat (1989, 1999), Sharma et al. (2013), and Sengar et al. (2013), to name a few. The discretization of translation and dilation parameters in DWT gives rise to a two-dimensional sequence, dj,k, commonly referred to as the DWT of the function f(t). In other words, DWT amounts to mapping of f(t); t ℝ into a twodimensional sequence, dj,k. This implies, any signal, f ðtÞ, L 2 ðℝ Þ can be represented as an orthogonal wavelet series expansion if the sampling on the ts plane is adequate, thereby reducing the redundancy in the estimation of wavelet transform coefficients (Sharma et al. 2013). Because of such a significant reduction in the redundant computations, DWT offers a wide range of applications, particularly in field of image processing. In geosciences, it is widely used in the fields of remote sensing and geomorphology. The next section provides a brief account of the application of multiresolution analysis of remote sensing images based on DWT. Multiresolution Analysis of Remote Sensing Images
Signal Processing in Geosciences, Fig. 5 Schematic representation of the DWT operation in the form a tree. The number “2” in the circles indicates the decimation factor for down sampling. See text for more description of this tree (after Chandrasekhar and Dimri 2013)
Remote sensing is a well-known technique used to classify different regions on the earth, based on the amount of electromagnetic radiation reflected from such regions. Changes in the amount and properties of the electromagnetic radiation in any region are valuable sources of data for interpreting the geological and geophysical properties of the region. The amount of electromagnetic radiation depends on the frequency of operation, instantaneous field of view or spatial resolution, radiometric resolution, etc. Data acquired in the
S
1308
Signal Processing in Geosciences
form of images correspond to localized quantization levels of radiometric resolution in the spectral band of each pixel. Although various techniques have been developed to process and analyze the data (see Gonzalez and Woods 2007), it is difficult to analyze and assess the local changes of the intensity in individual pixels in an image. As discussed earlier, given the poor localization properties of the sinusoid basis functions used in the Fourier transform, the Fourier transform is not very effective in extracting the localized features of the image. To overcome this problem, a multiresolution analysis technique, based on DWT, can be implemented, which can provide a coarse-to-fine and scale-invariant decomposition for the interpretation of images (Mallat 1989). According to Mallat’s algorithm, an input image is analyzed with high-pass and low-pass filters using a wavelet function and its associated scaling function, and downsampling (Fig. 5). Four images (each one with half the size of the original image) are obtained corresponding to high frequencies in the horizontal direction and low frequencies in the vertical direction (HL), low frequencies in the horizontal direction and high frequencies in the vertical direction (LH), high frequencies in both directions (HH), and low frequencies in both directions (LL). The last image (LL) is a low-pass version of the original image and is called as the approximation image. Figure 6a shows a three-level decomposition where LL1 (approximation sub-band at level 1) is further decomposed into LL2, LH2, HL2, and HH2. This procedure is repeated for the approximation image at each resolution 2j (for dyadic scales). Wavelets are viewed as the projection of the signal on a specific set of c(t) and f(t) functions in the vector space, defined as cðtÞ ¼
nℤ
fðtÞ ¼
h½n fð2s t tÞ
ð45Þ
l½n fð2s t tÞ
ð46Þ
nℤ
where h[.] are high-pass filter coefficients and l[.] are lowpass filter coefficients of filter bank. s and t, respectively, denote the scaling and translating indexes. The signal decomposition into different frequency bands is obtained by successive high-pass and low-pass filtering of the time domain signal. The original signal X[n] is first passed through a half-band high-pass filter h[n] and a low-pass filter l [n]. After the filtering, half of the samples can be eliminated according to Nyquist’s criteria; the signal can therefore be down-sampled by 2, by eliminating every other sample. This constitutes one level of decomposition and can mathematically be expressed as follows oh ½t ¼ ¼
n
X½n :h½2t n and ol ½t
n
X½n :l½2t n
ð47Þ
where oh and ol are the outputs of the high-pass and low-pass filters respectively, after down-sampling by two. Eq. (47) is iteratively repeated to obtain DWT coefficients at other levels of decomposition. Figure 6b shows an image of Kuwait City, acquired by the Indian Resourcesat-1 satellite with a spatial resolution of 5.8 m x 5.8 m (609 x 609 pixels). This resolution is well suited for texture analysis, since a spatial resolution of this order is not adequate to extract individual buildings or narrow roads but groups of them, which render a visible checked pattern in dense urban areas. Figure 6c shows the approximate and detailed sub-bands of the original image, where different horizontal, vertical, and diagonal details of urban features can be visualized. Several methods, including wavelet, curvelet, and contourlet transforms have been used in multiresolution analysis for image enhancement (Ansari and Buddhiraju 2015), texture analysis (Ansari and Buddhiraju 2016), slum identification (Ansari et al. 2020a), and change detection (Ansari et al. 2020b) applications using remotely sensed images. Fractal and Multifractal Analyses Benoit B. Mandelbrot, who invented fractal theory, first coined the term, fractal, to open a new discipline of mathematics that deals with non-differentiable curves and geometrical shapes (Mandelbort 1967). The fractal theory provides a framework to study irregular shapes as they exist, instead of approximating them using regular geometry, which usually is done for the sake of mathematical convenience. A prime characteristic of fractal objects is that their measured metric properties, namely, the length, area, or volume, are a function of the scale lengths of measurement. Mandelbrot, in his landmark paper, entitled, “How long is the coast of Britain?” (Mandelbort 1967), illustrates this property by explaining that finer details of coastline are not well realized at larger scale lengths (lower spatial resolution) but become apparent and thus well recognized only at smaller scale lengths (higher spatial resolution). This implies that the measured length of any irregular or wriggled line increases with a decrease in scale length. This further implies that in fractal geometry, the Euclidean concept of “length” becomes a continuously varying process rather than a fixed entity. Two types of fractals that we largely deal with in nature are classed as “self-similar fractals” and “self-affine fractals.” A fractal is defined as self-similar, if its geometry is retained even after magnification. Mathematically it can be expressed in Cartesian reference frame as x ¼ la x’ and y ¼ la y’
ð48Þ
Examples include cauliflower, a cobweb, fern leaves, cantor set, and Koch curve. On the other hand, a self-affine fractal requires different scaling on either axis, to retain its geometry, in the sense that the variation in one axis scales differently
Signal Processing in Geosciences
1309
S
Signal Processing in Geosciences, Fig. 6 Figure showing (a) the structure of three-level decomposition of DWT, (b) original map of Kuwait City as test data, and (c) three-level wavelet decomposition of original image shown in (b)
from that on the other axis. Mathematically it can be expressed in Cartesian reference frame as x ¼ la x’ and y ¼ mb y’
ð49Þ
where, l, α, m and β represent some non-zero constants. Examples include irregular curves and nonlinear timeseries of any dynamical system. The fractal dimension, D, for a selfsimilar set in ℝn defines N nonoverlapping copies of itself
with each copy scaled down by a ratio r < 1 in all coordinates. Mathematically, it is expressed as N ¼ r D )
D¼
ln N lnð1=r Þ
ð50Þ
Although a straight line, a square, and a cube satisfy the self-similar conditions described by Eq. (48), they are not fractals, since they have integer dimensions. Therefore, any
1310
self-similar objects, which have noninteger dimensions only are said to be self-similar fractals. For more details on the estimation of fractal dimension for fractal structures, the reader is referred to Fernandez-Martinez et al. (2019). Fractal dimensions for self-affine irregular shapes and curves can be calculated using several techniques, some of which, we shall briefly discuss below. It is important to note here that the concept of fractals is not limited to the study of geometrical shapes alone. It has rather wide applications in the study of different signals having a variation of some physical properties as a function of time. Since most geophysical timeseries represent the outcome of the behavior of any dynamical system, responsible for generating such nonstationary and nonlinear signals, they are expected to show self-affine behavior. In the case of timeseries analysis, different magnification factors are needed (along the vertical and horizontal axes), to compare the statistical properties of the rescaled and original timeseries, since these two axes represent different physical parameters: amplitude and time. Let us explain this further, as given below. Consider a nonlinear data of length l2 (Fig. 7a). A portion of the same data having length l1 is expanded to a length, equal to l2 (Fig. 7b). Now calculate the probability distribution of the data corresponding to window lengths l1 and l2, and plot them as shown in Fig. 7c. s1 and s2 indicate the halfwidths of the probability distribution curves corresponding to the data of lengths l1 and l2, respectively. The relationship
Signal Processing in Geosciences, Fig. 7 Schematic representation of estimation of scaling exponent for a nonlinear data. (a) example nonlinear signal, (b) a portion of the signal expanded to be equal to the length of the original signal, (c) estimation of half-widths s1 and s2 of probability distribution function, P( y) for the data corresponding to l1 and l2 windows, and (d) calculation of the scaling exponent, α
Signal Processing in Geosciences
between s1, s2 and l1, l2 is expressed by a parameter called, the scaling exponent, denoted by α, is given by s2 ¼ s1
l2 l1
a
ð51Þ
The scaling exponent is estimated as shown in Fig. 7d. Likewise, α is calculated for different portions of the data with varying window lengths. If the α value is observed to be constant, indicating a steady increasing trend with the increasing window lengths, then that data is said to possess monofractal behavior. Alternatively, if the α values show different trends, with increasing window lengths, then that data is said to possess multifractal behavior. Most observational data (generated by the dynamics of the unknown systems producing such nonlinear and nonstationary data) cannot be characterized by a single exponent throughout its length, and thus more than one scaling exponent are obtained for different segments within the same data based on the temporal variations in the statistical properties of the data. Such signals show multifractal behavior. Examples include seismological data, ionospheric data, geomagnetic data, and weather data, to name a few. While it is true that sometimes the nature of the data itself warrants its multifractal behavior, in most other cases, the effect of various types of noises that grossly vitiate the data, make it “behave” like a multifractal.
A few techniques to determine the scaling exponents for nonlinear and nonstationary data are (i) rescaled range (R/S) analysis, (ii) detrended fluctuation analysis (DFA), (iii) multifractal DFA (MFDFA), and (iv) the wavelet transform modulus maxima (WTMM) method. Let us briefly discuss each of them. For a full review of several other fractal and multifractal techniques, the reader is referred to Lopes and Betrouni (2009).

Rescaled Range (R/S) Analysis
This is also known as the R/S technique, where R and S, respectively, denote the range and standard deviation of the integrated series generated from the given signal. The method relies on the fact that R and S bear a power-law relationship, defined as

\frac{R(k)}{S(k)} = k^{H}    (52)

where k designates the time length of the segment of the input signal, say x(t). The integrated timeseries, y_k, of x(t) is estimated as

y_k = \sum_{i=1}^{k} [x(i) - \bar{x}], \quad k = 1, 2, 3, \ldots, N    (53)

The range, R(k), defines the difference between the maximum and minimum of y_k, given by

R(k) = \max(y_1, y_2, \ldots, y_k) - \min(y_1, y_2, \ldots, y_k)    (54)

The standard deviation, S(k), of the integrated series y_k over the period k is given by

S(k) = \sqrt{\frac{1}{k} \sum_{i=1}^{k} [y_i - \bar{y}]^2}    (55)
H in Eq. (52) is known as the Hurst exponent. The slope of the plot of the logarithm of the ratio R(k)/S(k) versus the logarithm of k defines H. Figure 8a shows an example of interevent timeseries data generated from earthquake magnitude (Mw) data (source: http://www.isc.ac.uk/iscbulletin/search/catalogue/interactive/) in the range 4.0–4.9 for the Japan region, corresponding to the period between 1990 and 2016. Figure 8b shows the least-squares estimate of the Hurst exponent, 0.68, calculated using the R/S analysis technique for the same data set, signifying the presence of persistent long-range correlations in the data. The Hurst exponent, H, is understood in the following way. (i) If H > 0.5, then the system behavior is called persistent. This suggests that an increasing (decreasing) trend in the past implies an increasing (decreasing) trend in the future, as is the case for many natural phenomena and for the data in Fig. 8b. (ii) If H < 0.5, then the system behavior is called anti-persistent. This suggests that an increasing (decreasing) trend in the past implies a decreasing (increasing) trend in the future. Finally, (iii) if H = 0.5, then the correlation between past and future increments vanishes at all times, as required for an independent process, and thus the data represent a pure white-noise series. More details about the Hurst exponent can be found in Bodruzzaman et al. (1991) and Gilmore et al. (2002).
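A minimal Python/NumPy sketch of the R/S estimation of H following Eqs. (52)–(55) is given below. The use of non-overlapping windows, the particular window lengths, and the use of the standard deviation of the raw segment for S(k) are common implementation conventions assumed here for the sketch, not prescriptions of this entry.

```python
import numpy as np

def hurst_rs(x, window_sizes):
    # Hurst exponent H as the slope of log(R/S) versus log(k), Eq. (52).
    x = np.asarray(x, dtype=float)
    rs_means = []
    for k in window_sizes:
        rs_vals = []
        for start in range(0, len(x) - k + 1, k):      # non-overlapping windows
            seg = x[start:start + k]
            y = np.cumsum(seg - seg.mean())            # integrated series, Eq. (53)
            r = y.max() - y.min()                      # range R(k), Eq. (54)
            s = seg.std()                              # standard deviation S(k)
            if s > 0:
                rs_vals.append(r / s)
        rs_means.append(np.mean(rs_vals))
    H = np.polyfit(np.log(window_sizes), np.log(rs_means), 1)[0]
    return H

# usage (illustrative):
# H = hurst_rs(interevent_times, window_sizes=[16, 32, 64, 128, 256])
```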
Detrended Fluctuation Analysis (DFA) and Multifractal DFA (MFDFA)

DFA and MFDFA help to study the intrinsic self-similarities in nonstationary and nonlinear signals by determining the scaling exponents in a modified least-squares sense. In other words, they help to understand the order in the chaos. To estimate the scaling exponents through both these techniques, the given signal is first converted to a self-similar process by integration. This helps to bring out the power-law behavior in the signal by making it unbounded, and to detect the underlying self-similarities in the data over various window lengths (Goldberger et al. 2000).
Signal Processing in Geosciences, Fig. 8 The test data (a) of the interevent timeseries of earthquake magnitude (Mw) in the range 4.0–4.9 for the Japan region, corresponding to the period between 1990 and 2016, and (b) the Hurst exponent value calculated using the R/S analysis technique
Next, the integrated series is divided into short windows of equal length, the local trend (a least-squares fit of chosen order) of the data corresponding to each window is calculated and removed from each data point of the respective window, and a fluctuation function is then calculated for each window. This exercise is repeated for various chosen window lengths of the data. It is important to note here that, after estimating the fluctuation functions with one window length, the successive window lengths are incremented by a factor of \sqrt[8]{2}, with the maximum window length not exceeding N/4, where N is the total number of data points in the signal under investigation (Peng et al. 1994). The fractal scaling exponent then signifies the slope of the linear least-squares regression between the logarithm of the fluctuations and the logarithm of the window lengths. In MFDFA, the above procedure is carried out repeatedly for different orders (also known as statistical moments) of the fluctuation functions in a modified least-squares sense. The DFA and MFDFA techniques are described below in simple mathematical steps (see Kantelhardt 2002; Kantelhardt et al. 2002; Subhakar and Chandrasekhar 2016; Chandrasekhar et al. 2016); a code sketch of the procedure is given after the numbered steps below:

1. First determine an integrated series y(m) of the N-point input data sequence, say x(i), by

y(m) = \sum_{i=1}^{m} (x(i) - \bar{x}), \quad m = 1, 2, 3, \ldots, N    (56)

where \bar{x} denotes the mean of the N-point data sequence.
2. Divide the m-length integrated series into various m/k nonoverlapping windows of equal length, each consisting of k samples.
Signal Processing in Geosciences, Fig. 9 Examples of two different nonlinear mono- and multifractal signals characterized by (a) a single scaling exponent and (b) two scaling exponents, respectively (after Subhakar and Chandrasekhar 2015)
3. Calculate the least-squares fit of preferred order to the data points in each window. This represents the local trend, y_k(m).
4. Detrend the integrated series, y(m), by subtracting the local trend, y_k(m), from each data point of the corresponding window.
5. For a window of length k, the average fluctuation, F(k, n), of the detrended series is calculated by

F(k, n) = \sqrt{\frac{1}{k} \sum_{i=s+1}^{nk} [y(i) - y_k(i)]^2}    (57)

where s = (n - 1)k, N_k = \mathrm{int}(m/k), and n denotes the window number.
6. Equation (57) is calculated for various window lengths, k. The power-law relation between F(k, n) and k is given by F(k, n) = k^{\alpha}. The scaling exponent, α, is estimated by

\alpha = \ln F(k, n) / \ln k    (58)
If α bears a linear relation between ln F(k, n) and ln k, such that it steadily increases with increasing window length for the entire data, then the data is said to possess monofractal behavior. If α bears more than one linear relation between ln F(k, n) and ln k, corresponding to different groups of window lengths as k increases, then the data is said to possess multifractal behavior. In other words, data possessing multifractal behavior show persistent short-range correlations up to a certain group of shorter window lengths and persistent long-range correlations for a group of larger window lengths. Figure 9 depicts examples of two different nonlinear mono- and multifractal signals characterized by a single scaling exponent (Fig. 9a) and two scaling exponents (Fig. 9b), respectively.
7. In MFDFA calculations, Eq. (57) is calculated by operating the windows in both forward and backward directions. For forward operations, n is taken as n = 1, 2, 3, ..., N_k, and for backward operations, n is taken as n = N_k, N_k - 1, N_k - 2, ..., 1. Considering both forward and backward operations, the generalized form of the overall average fluctuations is expressed as the qth-order fluctuation function (Kantelhardt et al. 2002):

F_q(k) = \left\{ \frac{1}{2N_k} \sum_{n=1}^{2N_k} \left[ F^2(k, n) \right]^{q/2} \right\}^{1/q} \propto k^{h(q)}    (59)
Signal Processing in Geosciences, Fig. 10 The q vs. h(q) plot for synthetic data consisting of several periodicities, obtained using Eqs. (59), (60), and (61) (after Chandrasekhar et al. 2016)
Equation (59) is iteratively calculated for various k and q values to provide a power-law relation between F_q(k) and k^{h(q)}. The Hurst exponent, h(q), defined as

h(q) = \ln F_q(k) / \ln k    (60)

expresses the slope of the linear least-squares regression between the logarithm of the overall average fluctuations F_q(k) and the logarithm of k, for the corresponding q. Here q signifies the statistical moment, defined as mean (q = 1), variance (q = 2), skewness (q = 3), kurtosis (q = 4), etc. However, for better characterization of the signal, Eq. (60) is calculated for further upper and lower bounds of q, such that q can also be negative. Generally, q values in the range -8 to +8 are sufficient to meaningfully characterize any nonlinear signal. In the range -q to +q, for the case of q = 0 (as the exponent becomes divergent at this value), Eq. (59) is transformed into a logarithmic averaging procedure, given by (Kantelhardt et al. 2002)

F_0(k) = \exp\left\{ \frac{1}{4N_k} \sum_{n=1}^{2N_k} \ln\left[ F^2(k, n) \right] \right\} \propto k^{h(0)}    (61)

Signal Processing in Geosciences, Fig. 11 A schematic representation of an example of a multifractal (MF) singularity spectrum, depicting several parameters that aid in a clear understanding of the spectrum. See the text for better understanding of the MF spectrum (after Chandrasekhar et al. 2016)

The behavior of the multifractal Hurst exponent, h(q), depends on the average fluctuations, F_q(k). It bears a nonlinear relation with the order of the fluctuation function, q, such that the average fluctuations have high (low) h(q) for negative (positive) q (Kantelhardt et al. 2002). If h(q) varies (remains constant) for various q, then the signal is said to have multifractal (monofractal) behavior. This can be easily observed in Fig. 10, which depicts the q vs. h(q) plot obtained using synthetic data consisting of several periodicities (after Chandrasekhar et al. 2016). In Fig. 10, while the 1-day periodicity signal depicts monofractal behavior, the signals of all other periodicities exhibit multifractal behavior.
8. Finally, estimate the multifractal singularity spectrum, defining the relation between the singularity spectrum, f(α), and the strength of the singularity, α, by

f(\alpha) = q[\alpha - h(q)] + 1 \quad \text{and} \quad \alpha = h(q) + q h'(q)    (62)
α is also known as the Hölder exponent. The reader is referred to Kantelhardt et al. (2002) for more details on α and f(α). Figure 11 shows an example of an f(α) versus α plot, known as the multifractal singularity spectrum, depicting the different parameters required to interpret the data. α_max - α_min defines the width of the multifractal spectrum. The broader the width of the singularity spectrum, the stronger the multifractal behavior of the nonlinear data, and vice versa (Kantelhardt et al. 2002). Also, in Fig. 11, the parameters a and b signify the distances of the extrema from the center of the spectrum. If b > a, then the spectrum is said to be right-skewed, and if a > b, then the spectrum is said to be left-skewed. A right-skewed spectrum signifies the presence of finer structures in the data, and a left-skewed spectrum signifies the presence of coarser structures in the data. One easy way to understand this concept is as follows: if the multifractal singularity spectrum of the ECG signal of a heart patient is right-skewed, then the heart functioning is said to be good, indicating the
systematic heartbeat of the patient. However, if the spectrum is left-skewed, then the heart functioning is said to be poor, indicating the presence of large-scale and irregular fluctuations in the heartbeat and suggesting medical attention for that patient. In this way, multifractal analysis helps to understand and interpret the factors contributing to the dynamics of the system generating the nonlinear data. Another important point to understand about the multifractal singularity spectrum is its positioning in the α vs. f(α) plot. Calculating the multifractal spectra of a synthetic time series (the same synthetic data used for calculating h(q) for Fig. 10), Chandrasekhar et al. (2016) have explained (see Fig. 12) that the positioning of the multifractal singularity spectrum in the α vs. f(α) plot corresponding to any periodicity in the data is dictated by the number of cycles of that periodicity present in the data. If fewer cycles of a periodicity are present in the data, then the multifractal spectrum corresponding to the data of that periodicity will be positioned at higher values of α, and vice versa.
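The following is the code sketch referred to above: a minimal Python/NumPy implementation of the DFA/MFDFA procedure of steps 1–8. The window lengths, the set of q values, the detrending order, and the function names are illustrative choices rather than prescriptions of this entry; setting q_values = [2] reduces the calculation to ordinary DFA (Eqs. 56–58).

```python
import numpy as np

def _window_f2(segment, order):
    # Steps 3-5: remove the local least-squares trend of the chosen order and
    # return the mean squared fluctuation F^2(k, n) of the window (Eq. 57).
    t = np.arange(len(segment))
    trend = np.polyval(np.polyfit(t, segment, order), t)
    return np.mean((segment - trend) ** 2)

def mfdfa(x, window_sizes, q_values, order=1):
    # Step 1: integrated series, Eq. (56)
    y = np.cumsum(np.asarray(x, dtype=float) - np.mean(x))
    log_f = np.zeros((len(q_values), len(window_sizes)))
    for j, k in enumerate(window_sizes):
        nk = len(y) // k                                   # step 2: number of windows
        # step 7: forward and backward window passes
        f2 = [_window_f2(y[n * k:(n + 1) * k], order) for n in range(nk)]
        f2 += [_window_f2(y[len(y) - (n + 1) * k:len(y) - n * k], order) for n in range(nk)]
        f2 = np.array([f for f in f2 if f > 0])
        for i, q in enumerate(q_values):
            if q == 0:
                log_f[i, j] = 0.5 * np.mean(np.log(f2))            # Eq. (61)
            else:
                log_f[i, j] = np.log(np.mean(f2 ** (q / 2.0))) / q  # Eq. (59)
    log_k = np.log(np.asarray(window_sizes, dtype=float))
    # h(q): slope of log Fq(k) versus log k for each q (Eq. 60)
    return np.array([np.polyfit(log_k, log_f[i], 1)[0] for i in range(len(q_values))])

# usage (illustrative):
# hq = mfdfa(signal, window_sizes=[16, 32, 64, 128, 256], q_values=list(range(-8, 9)))
```

The singularity spectrum of Eq. (62) can then be obtained from the returned h(q) by numerical differentiation with respect to q.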
Applications

The Hurst exponent, h(q) (Eq. 60), plotted as a function of q, and the multifractal singularity spectrum (Eq. 62), depicting the f(α) versus α plot, are the two important diagnostics used in multifractal analysis to understand the fractal/multifractal behavior of nonlinear and nonstationary data. Telesca et al. (2004) applied the MFDFA algorithm to seismological data recorded for several years at three tectonically active zones in Italy. Multifractal studies helped them to easily demonstrate the differences in the subsurface dynamics in the different zones. They also compared their results with those of R/S analysis performed on the same data sets. Telesca et al. (2012) have also studied environmental influences, such as the ocean effect and the coast effect, on magnetotelluric data recorded at different sites in Taiwan, using the MFDFA technique.

Signal Processing in Geosciences, Fig. 12 The α vs. f(α) plots, comparing the positions of the multifractal singularity spectra of different periodicities for the same synthetic data used in Fig. 10, obtained using Eq. (62) (after Chandrasekhar et al. 2016)
Subhakar and Chandrasekhar (2015, 2016) applied both the DFA and MFDFA algorithms to geophysical well-log data of the Bombay offshore basin (the same data used by Chandrasekhar and Rao (2012) for wavelet analysis), identified the depths to the tops of formations, and compared their DFA results with those of wavelet analysis. Subhakar and Chandrasekhar (2016) also distinguished the oil and gas zones in subsurface hydrocarbon reservoir zones based on the multifractal behavior of the well-log data. Further, they explained whether the multifractal behavior in well-log signals is due to the presence of long-range correlations or to a broad probability distribution in the data (Subhakar and Chandrasekhar 2015). Chandrasekhar et al. (2016) applied the MFDFA algorithm to ionospheric total electron content (TEC) data of a particular longitude zone covering both hemispheres to understand the hemispherical differences in the spatiotemporal behavior of TEC, as well as the multifractal behavior of TEC recorded during geomagnetically quiet and disturbed periods. Multifractal studies enabled them to clearly demonstrate the semi-lunar tidal influences on TEC data as a function of latitude. Chandrasekhar et al. (2015) studied the multifractal behavior of the geomagnetic field corresponding to geomagnetically quiet and disturbed days. They suggest that more geomagnetic data sets corresponding to different seasons would provide more comprehensive information on the multifractal behavior of the extraterrestrial source characteristics influencing the geomagnetic field variations during geomagnetically disturbed days. For other applications of multifractal analysis in the field of geomagnetism, the reader is referred to Hongre et al. (1999), Yu et al. (2009), and the references therein.

Wavelet-Based Multifractal Formalism: The Wavelet Transform Modulus Maxima (WTMM) Method
This method allows the determination of the multifractal behavior of nonlinear dynamical systems using wavelets. The technique relies on the wavelet coefficients estimated from the CWT. In this method, if the chosen mother wavelet has n vanishing moments, it will nullify all polynomial trends of order n - 1 in the data (Daubechies 1992), leading to a natural detrending process. (Vanishing moments of a wavelet signify the ability of the wavelet to suppress polynomial trends in the given data. Higher vanishing moments imply the wavelet's ability to suppress higher-order trends in the data, and vice versa.) This facilitates accurate identification of the singularities in the data. The "modulus" of the wavelet coefficients, obtained along the lines of maxima, can easily bring out the multifractal behavior of the data. (Lines of maxima are the curves formed by joining the points of local maxima at different scales in the wavelet scalogram (cf. Section "The Continuous Wavelet Transform (CWT)"). The convergence of the lines of maxima at the lowest scale on the time axis
indicates the presence of singularities in the signal at those times). Mathematical details of the WTMM technique are as follows. The multifractal behavior of a function, f(t), around any point is characterized by the scaling properties of its local power law, given by
\langle |f(t + k) - f(t)| \rangle \propto k^{\alpha(t)}    (63)

The exponent α(t) describes the local degree of singularity or regularity around t. The collection of all the points that share the same singularity exponent is called the singularity manifold of exponent α and is a fractal set of fractal dimension D(α). The relation between D(α) and α signifies the multifractal singularity spectrum that fully describes the distribution of the singularities in f(t). The CWT of a function is defined by

Wf(u, s) = \langle f, \psi_{u,s} \rangle = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} f(t) \, \psi^{*}\!\left( \frac{t - u}{s} \right) dt    (64)

The presence of a local singularity of f(t) in a scalogram can be checked from the decay of |Wf(u, s)| at t0 on the time axis. The wavelet transform modulus maximum at a point (u0, s0) describes that |Wf(u, s)| is locally maximum at u = u0, such that

\frac{\partial Wf(u_0, s_0)}{\partial u} = 0    (65)

The modulus of the wavelet coefficients along the lines of maxima bears a power-law relationship with the scale corresponding to each identified singularity in the signal (Mallat 1999). Let (u_p, s) be the positions of all local maxima of |Wf(u, s)| at a fixed scale s. The global partition function, Z(q, s), signifying the measure of the sum at a power q of all these wavelet modulus maxima, is given by (Arneodo et al. 2008)

Z(q, s) = \sum_{p} \left| Wf(u_p, s) \right|^{q}, \quad \text{where } Z(q, s) \propto s^{\tau(q)}    (66)

Here q defines the statistical moment. The multifractal singularity spectrum from the wavelet transform local maxima can be obtained using Z(q, s). The modulus of the wavelet transform and the set of maxima locations corresponding to scale s are related by (Mallat 1999)

|Wf(u, s)| \propto s^{(\alpha + 1/2)} \quad \text{when } s \to 0^{+}    (67)

From Eq. (66), the scaling exponent τ(q) can be obtained by

\log_2 Z(q, s) \approx \tau(q) \log_2 s    (68)

Finally, the multifractal singularity spectrum from the WTMM method is obtained by (Bacry et al. 1993)

D(\alpha) = \min_{q \in \mathbb{R}} \left[ q(\alpha + 1/2) - \tau(q) \right]    (69)

For further illustration of the WTMM, the reader is referred to Arneodo et al. (2002, 2008) for numerous examples, and to Kantelhardt et al. (2002), Audit et al. (2002), and Salat et al. (2017) for comparison with the MFDFA method. Using the cumulants of WTMM coefficients, Bhardwaj et al. (2020) studied the multifractal behavior of ionospheric total electron content (TEC) data and found results consistent with those obtained by Chandrasekhar et al. (2016) using MFDFA on the same data sets.
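A simplified Python/NumPy sketch of the WTMM estimation of τ(q) and D(α) is given below. It uses a Mexican-hat (Ricker) wavelet implemented directly with NumPy, takes the local maxima of |Wf(u, s)| independently at each scale rather than chaining them into maxima lines across scales, and obtains D(α) by a numerical Legendre transform of τ(q); these are simplifying assumptions of the sketch, not the full WTMM formalism.

```python
import numpy as np

def _ricker(length, s):
    # Mexican-hat (Ricker) wavelet at scale s: two vanishing moments,
    # so constant and linear trends in the data are suppressed.
    t = np.arange(length) - length // 2
    x = t / float(s)
    return (1.0 - x ** 2) * np.exp(-0.5 * x ** 2)

def wtmm(signal, scales, q_values):
    # Returns tau(q) (Eq. 68) and (alpha, D(alpha)) from Eq. (69).
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    log_z = np.zeros((len(q_values), len(scales)))
    for j, s in enumerate(scales):
        support = min(n, int(10 * s) | 1)                     # odd-length wavelet support
        psi = _ricker(support, s) / np.sqrt(s)
        w = np.abs(np.convolve(signal, psi, mode="same"))     # |Wf(u, s)|, Eq. (64)
        interior = w[1:-1]
        maxima = interior[(interior > w[:-2]) & (interior > w[2:])]   # Eq. (65)
        maxima = maxima[maxima > 1e-12]
        for i, q in enumerate(q_values):
            log_z[i, j] = np.log2(np.sum(maxima ** q))        # Z(q, s), Eq. (66)
    log_s = np.log2(np.asarray(scales, dtype=float))
    tau = np.array([np.polyfit(log_s, log_z[i], 1)[0] for i in range(len(q_values))])
    q = np.asarray(q_values, dtype=float)
    alpha = np.gradient(tau, q) - 0.5                         # alpha = d tau / d q - 1/2
    d_alpha = q * (alpha + 0.5) - tau                         # D(alpha), Eq. (69)
    return tau, alpha, d_alpha
```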
Empirical Mode Decomposition (EMD) Technique

Since most geophysical data are nonstationary and nonlinear in nature, the use of the wavelet transform, instead of the Fourier transform, has proved to be an efficient way of characterizing the spatiotemporal nature of a variety of signals. However, the inherent problem with wavelet analysis lies in identifying a suitable mother wavelet that best suits the signals to be analyzed. Besides wavelet analysis and the data-adaptive fractal and multifractal studies, another efficient and fully data-adaptive signal analysis technique, called the empirical mode decomposition (EMD) technique, can unravel the hidden information in nonlinear signals. The EMD technique, combined with Hilbert spectral analysis, is known as the Hilbert-Huang transform (see Huang et al. 1998; Huang and Wu 2008). The EMD technique decomposes the data into oscillatory signals of different wavelengths, known as intrinsic mode functions (IMFs), in much the same way as the discrete wavelet transform (DWT) decomposes a signal into detailed and approximate wavelet coefficients, representing the high- and low-frequency components of the signal, respectively. However, the advantage of the EMD technique over the DWT is that in the former the decomposition of the signal into IMFs is achieved through simple sifting operations, which are totally data-adaptive, whereas in the latter a similar operation necessitates the arduous task of choosing suitable wavelet and scaling functions. Although the EMD technique lacks a thorough mathematical framework, it has proved to be well suited to analyzing nonlinear and nonstationary signals (Huang et al. 1998; Huang and Wu 2008; Gairola and Chandrasekhar 2017) and also acts as an efficient filter bank (Flandrin et al. 2004).

Methodology to Estimate the IMFs by the EMD Technique
The end result of the fully data-adaptive EMD technique is the decomposition of different frequency components of the
signal (from highest to lowest) till the signal becomes monotonic; this final component is called the residue and represents the overall trend of the signal. The sequential steps to determine the IMFs are as follows (a code sketch of this sifting procedure is given after the list):

1. Identify the local maxima and minima of the given signal, x(t), form the upper and lower envelopes by cubic-spline interpolation of them, respectively, and calculate their mean, m1.
2. Calculate the first proto-mode, x1, given by x_1 = x(t) - m_1.
3. If x1 still contains maxima and minima, repeat steps 1 and 2 to get x_{11} = x_1 - m_{11}. Accordingly, after k iterations, we shall have x_{1k} = x_{1(k-1)} - m_{1k}. According to Huang et al. (1998), the stopping criteria for this iterative process are: (i) the number of extrema and the number of zero crossings must be equal or differ at most by 1, and (ii) m1 should be zero for the entire signal. Condition (i) ensures that the signal under investigation does not contain any local fluctuations, and condition (ii) ensures that the Hilbert transform estimates the correct instantaneous frequencies from the data.
4. Choose the stopping criterion as the normalized squared difference of the data after two successive sifting operations, given by

D_k = \frac{\sum_{i=1}^{N} \left( x_{k-1}(i) - x_k(i) \right)^2}{\sum_{i=1}^{N} x_{k-1}^2(i)}
where N denotes the total number of data samples. Generally, D_k should be as small as possible, say D_k ≤ 0.1, for correct estimation of each IMF after k iterations. Once the stopping criterion is satisfied after the first set of k iterations, the first IMF (IMF-1), signifying the highest frequency present in the data, is determined. It is important to note here that the condition D_k ≤ 0.1 is rather arbitrary and largely depends on the data quality; higher D_k limits can be set for poor-quality data, and vice versa (Gairola and Chandrasekhar 2017).
5. Calculate the residue, r1, by subtracting the IMF-1 data from the original signal, and repeat steps 1–4 above to determine IMF-2.
6. Repeat steps 1–5 to calculate the successive IMFs until the residue becomes monotonic, when no further IMFs can be extracted from the data. The original signal can be recovered by synthesizing all the IMFs and the residue, given by

x(t) = \sum_{i=1}^{n-1} \mathrm{IMF}_i + r_n

where n denotes the iteration number at which r_n becomes monotonic. Figure 13 depicts an example of a nonlinear test signal and the IMFs obtained by the EMD technique.
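The following is the code sketch referred to above: a minimal Python implementation of the sifting procedure of steps 1–6, using NumPy and SciPy's CubicSpline for the envelopes. Boundary (end-effect) handling, the zero-crossing check of step 3, and the default sd_limit = 0.1 are simplifications or illustrative defaults of this sketch, not part of the formal description above.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _envelope_mean(x):
    # Step 1: mean of the cubic-spline upper and lower envelopes.
    # Returns None when too few extrema remain to build the envelopes
    # (used here as the monotonic-residue condition of step 6).
    idx = np.arange(len(x))
    inner = idx[1:-1]
    maxima = inner[(x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])]
    minima = inner[(x[1:-1] < x[:-2]) & (x[1:-1] < x[2:])]
    if len(maxima) < 4 or len(minima) < 4:
        return None
    upper = CubicSpline(maxima, x[maxima])(idx)
    lower = CubicSpline(minima, x[minima])(idx)
    return 0.5 * (upper + lower)

def emd(signal, sd_limit=0.1, max_imfs=12):
    residue = np.asarray(signal, dtype=float).copy()
    imfs = []
    while len(imfs) < max_imfs and _envelope_mean(residue) is not None:
        h = residue.copy()
        while True:                                   # sifting loop (steps 1-4)
            m = _envelope_mean(h)
            if m is None:
                break
            h_new = h - m                             # proto-mode, step 2
            d_k = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if d_k < sd_limit:                        # stopping criterion D_k
                break
        imfs.append(h)                                # next IMF (highest remaining frequency)
        residue = residue - h                         # step 5: new residue
    return imfs, residue                              # residue: monotonic overall trend
```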
The EMD technique has found several applications in a variety of specializations in geophysics, such as magnetotellurics (Cai et al. 2009; Cai 2013; Neukirch and Garcia 2014), well-logging (Gairola and Chandrasekhar 2017; Gaci and Zaourar 2014), seismics (Battista et al. 2007; Xue et al. 2013; Jayaswal et al. 2021), fault detection in mechanical devices (Gao et al. 2008), atmospheric sciences (McDonald et al. 2007), and sequence stratigraphy (Zhao and Li 2015), to name a few. Dätig and Schlurmann (2004) discussed the limitations of the HHT when applying it to study irregular water waves. Gaci and Zaourar (2014) and Gairola and Chandrasekhar (2017) have also discussed the use of IMFs in quantifying the degree of subsurface heterogeneity from nonlinear signals. The EMD technique suffers from the problem of spilling over of frequencies from one IMF into other IMFs, called mode mixing. Mode mixing mainly arises due to overshoot and undershoot in the cubic-spline interpolation of the upper and lower envelopes while estimating the IMFs. To address this problem, a variant of the EMD technique, called the ensemble EMD (EEMD) technique, has been developed (Wu and Huang 2009; Torres et al. 2011; Gaci 2016). Basically, in EEMD, different white-noise series are added to the signal in several sifting operations. Since the noise added at each operation is different, the resulting IMFs do not display any correlation with their adjacent ones, and thus the spilling of frequencies from one IMF to another can be arrested. However, the principle of all these techniques remains the same as that of the EMD technique described above. Also, the applications of the EEMD technique in geosciences have been very limited so far.

Application of IMFs in Characterizing the Degree of Subsurface Heterogeneity

If two nonlinear signals are generated from two different heterogeneous systems, then it is generally natural to believe that the signal producing a larger number of IMFs must have been generated from the system having the higher degree of heterogeneity, thereby allowing a qualitative understanding of the heterogeneity of the two systems generating such signals. While this may be true, it is sometimes not an easy task to assess the heterogeneity of different systems when the signals generated from two different dynamical systems produce an equal number of IMFs. In such situations, it becomes difficult to establish which system is more heterogeneous than the other. A better understanding of the dynamics of such systems, generating such nonlinear signals, can best be achieved by carrying out heterogeneity analysis (see Gaci and Zaourar 2014). Heterogeneity analysis of the IMFs of nonlinear signals provides a quantitative estimate of the degree of heterogeneity of the dynamics of the system producing such signals. We explained earlier that the EMD technique acts as a filter bank (Flandrin et al. 2004), in much the same way as the discrete wavelet transform does (Mallat 1999).
Signal Processing in Geosciences, Fig. 13 Schematic representation of a nonlinear test signal and the IMFs generated from it. Note that the first IMF (IMF-1) designates the highest frequency and the last IMF (IMF-8) the lowest frequency present in the signal. The bottommost curve depicts the residue, a monotonic function (see the text) that cannot be decomposed into any further IMFs (after Gairola and Chandrasekhar 2017)
Therefore, according to the filter bank theory, there must exist a nonlinear relationship between the IMF number (m) and the respective mean wavelength (I_m) of each IMF, given by I_m = k r^m, where r designates the heterogeneity index and k is a constant (Gaci and Zaourar 2014). As the above relation between I_m and m explains, the estimated r values bear an inverse relation with the heterogeneity of the subsurface. Accordingly, smaller (larger) r values designate higher (lesser) heterogeneity of the system. The two-step procedure to determine r is as follows (a short code sketch is given below):

1. For each IMF, calculate the mean wavelength (I_m), defined as the ratio of the total number of points to the total number of peaks.
2. Draw a plot between the IMF number, m, and the logarithm of I_m. The antilogarithm of the slope of this line defines the heterogeneity index, r.

Figure 14 depicts an example of heterogeneity indices determined from the geophysical well-log data of two different wells having an equal number of IMFs (see Gairola and Chandrasekhar 2017). Since the value of the heterogeneity index of the data of the first well (Fig. 14a) is greater than that
of the second well (Fig. 14b), the subsurface of the former well is less heterogeneous than that of the latter. This has also been confirmed by the results of multifractal analysis of the same data sets (see Subhakar and Chandrasekhar 2016).
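The two-step procedure above translates into the short Python/NumPy sketch below (the code sketch referred to above). The peak count based on simple neighbor comparisons and the function names are assumptions of this sketch; the imfs argument is expected to be a list of IMF arrays, such as those returned by the EMD sketch given earlier.

```python
import numpy as np

def mean_wavelength(imf):
    # Step 1: I_m = total number of points / total number of peaks.
    imf = np.asarray(imf, dtype=float)
    peaks = np.sum((imf[1:-1] > imf[:-2]) & (imf[1:-1] > imf[2:]))
    return len(imf) / max(int(peaks), 1)

def heterogeneity_index(imfs):
    # Step 2: regress log(I_m) on the IMF number m; the antilog of the slope
    # is the heterogeneity index r in I_m = k * r**m (Gaci and Zaourar 2014).
    m = np.arange(1, len(imfs) + 1)
    log_im = np.log([mean_wavelength(imf) for imf in imfs])
    slope = np.polyfit(m, log_im, 1)[0]
    return np.exp(slope)

# usage (with the imfs returned by the EMD sketch above):
# r = heterogeneity_index(imfs)
```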
Summary

The fundamentals of signal processing techniques and their applications to various problems in geoscience research have been discussed, from the basics of integral transforms to advanced nonlinear signal analysis techniques. In particular, the applications in geophysics of the linear integral transforms, namely the Fourier transform, the Z-transform, and the Hilbert transform, and the applications of nonlinear signal analysis techniques, such as wavelet analysis (CWT and DWT), fractal and multifractal analysis, and the empirical mode decomposition technique, in diverse fields of the geosciences, such as solid-earth geophysics, ionospheric studies, and geomagnetism, have been discussed. Exhaustive literature on all of the above techniques has also been provided for the benefit of the reader.
Signal Processing in Geosciences, Fig. 14 Linear regression between the logarithm of mean wavelength of each IMF and the respective IMF number for geophysical well-log data of two different wells. Since the heterogeneity index of the well-log data corresponding to the
first well (a) is greater than that of the second well (b), the subsurface of the former well is less heterogeneous than the latter (after Gairola and Chandrasekhar 2017)
Cross-References
▶ Fast Fourier Transform
▶ Fast Wavelet Transform
▶ Hilbert Space
▶ Multifractals
▶ Z-transform

References

Alexandrescu M, Gilbert D, Hulot G, Le Mouël JL, Saracco G (1995) Detection of geomagnetic jerks using wavelet analysis. J Geophys Res 100:12557–12572
Ansari RA, Buddhiraju KM (2015) k-means based hybrid wavelet and curvelet transform approach for denoising of remotely sensed images. Remote Sens Lett 6(12):982–991
Ansari RA, Buddhiraju KM (2016) Textural classification based on wavelet, curvelet and contourlet features. In: 2016 IEEE international geoscience and remote sensing symposium (IGARSS). IEEE, pp 2753–2756
Ansari RA, Malhotra R, Buddhiraju KM (2020a) Informal settlement identification using contourlet assisted deep learning. Sensors 20(9):2733. https://doi.org/10.3390/s200927332
Ansari RA, Buddhiraju KM, Malhotra R (2020b) Urban change detection analysis utilizing multiresolution texture features from polarimetric SAR images. Remote Sens Appl Soc Environ 20:100418. https://doi.org/10.1016/j.rsase.2020.100418
Arneodo A, Audit B, Decoster N, Muzy J-F, Vaillant C (2002) Wavelet based multifractal formalism: applications to DNA sequences, satellite images of the cloud structure and stock market data. In: Bunde A, Kropp J, Schellnhuber HJ (eds) The science of disasters: climate disruptions, heart attacks, and market crashes. Springer, Berlin, pp 26–102
Arneodo A, Audit B, Kestener P, Roux S (2008) Wavelet-based multifractal analysis. Scholarpedia 3(3):4103. revision #137211
Audit B, Bacry E, Muzy J-F, Arneodo A (2002) Wavelet-based estimators of scaling behavior. IEEE Trans Inf Theory 48:2938–2954. https://doi.org/10.1109/TIT.2002.802631
Bacry E, Muzy J-F, Arneodo A (1993) Singularity spectrum of fractal signals from wavelet analysis: exact results. J Stat Phys 70:635–674
Battista BM, Knapp C, McGee T, Goebel V (2007) Application of the empirical mode decomposition and Hilbert-Huang transform to seismic reflection data. Geophysics 72(2):H29–H37. https://doi.org/10.1190/1.2437700
Bhardwaj S, Chandrasekhar E, Seemala GK, Gadre VM (2020) Characterization of ionospheric total electron content using wavelet-based multifractal formalism. Chaos, Solitons Fractals 134:109653. https://doi.org/10.1016/j.chaos.2020.109653
Bhimasankaram VLS, Mohan NL, Rao SVS (1977a) Analysis of gravity effect of two dimensional trapezoidal prism using Fourier transforms. Geophys Prospect 25:334–341
Bhimasankaram VLS, Nagendra R, Rao SVS (1977b) Interpretation of gravity anomalies due to finite incline dikes using Fourier transforms. Geophysics 42:51–60
Bhimasankaram VLS, Mohan NL, Rao SVS (1978) Interpretation of magnetic anomalies of dikes using Fourier transforms. Geoexploration 16:259–266
Bodruzzaman M, Cadzow J, Shiavi R, Kilroy A, Dawant B, Wilkes M (1991) Hurst's rescaled-range (R/S) analysis and fractal dimension of electromyographic (EMG) signal. IEEE Proc. Southeastcon '91, Williamsburg, VA, 2, 1121–1123
Burg JP (1975) Maximum entropy spectral analysis. PhD thesis, Stanford University, Stanford (Unpublished)
Burrus CS, Gopinath RA, Guo H (1998) Introduction to wavelets and wavelet transforms. A Primer, Upper Saddle River, Prentice Hall
Cai JH (2013) Magnetotelluric response function estimation based on Hilbert-Huang transform. Pure Appl Geophys 170:1899–1911
Cai JH, Tang JT, Hua XR, Gong YR (2009) An analysis method for magnetotelluric data based on the Hilbert-Huang transform. Explr Geophys 40:197–205
Chandrasekhar E, Dimri VP (2013) Introduction to wavelets and fractals. In: Chandrasekhar E et al (eds) Wavelets and fractals in earth system sciences. CRC Press, Taylor and Francis, pp 1–28
Signal Processing in Geosciences Chandrasekhar E, Rao VE (2012) Wavelet analysis of geophysical welllog data of Bombay offshore basin, India. Math Geosci 44(8): 901–928. https://doi.org/10.1007/s11004-012-9423-4 Chandrasekhar E, Prasad P, Gurijala VG (2013) Geomagnetic jerks: a study using complex wavelets. In: Chandrasekhar E et al (eds) Wavelets and fractals in earth system sciences. CRC Press, Taylor and Francis, pp 195–217 Chandrasekhar E, Subhakar D, Vishnu Vardhan Y (2015) Multifractal analysis of geomagnetic storms. Proc. of the 17th annual conference of the Intl. Assoc. Math. Geosci. (Editors: Helmut Schaben et al.), ISBN: 978-3-00-050337-5 (DVD), pp 1122–1127 Chandrasekhar E, Prabhudesai SS, Seemala GK, Shenvi N (2016) Multifractal detrended fluctuation analysis of ionospheric Total electron content data during solar minimum and maximum. J Atmos Sol Terr Phys 149:31–39 Dätig M, Schlurmann T (2004) Performance and limitations of the Hilbert-Huang transformation (HHT) with an application to irregular water waves. Ocean Eng 31:1783–1834 Daubechies I (1988) Orthonormal bases of compactly supported wavelets. Commun Pure Appl Math XLI:909–996. John Wiley & Sons Inc. Daubechies I (1992) Ten lectures on wavelets, 2nd ed., SIAM, Philadelphia. CBMS-NSF regional conference series in applied mathematics 61 Farge M (1992) Wavelet transforms and their applications to turbulence. Annu Rev Fluid Mech 24:395–457 Farge M, Kevlahan N, Perrier V, Goirand E (1996) Wavelets and turbulence. Proc IEEE 84(4):639–669 Fernandez-Martinez M, Guirao JLG, Sanchez-Granero MA, Segovia JET (2019) Fractal dimension for fractal structures (with applications to finance). Springer Nature, Cham., 221 pages. https://doi.org/ 10.1007/978-3-030-16645-8 Flandrin P, Rilling G, Goncalves P (2004) Empirical mode decomposition as a filter Bank. IEEE Sig Proc Lett 11:112–114 Gabor D (1946) Theory of communication. J Instit Elect Eng 93: 429–441 Gaci S (2016) A new ensemble empirical mode decomposition (EEMD) denoising method for seismic signals. Energy Procedia 97:84–91 Gaci S, Zaourar N (2014) On exploring heterogeneities from well logs using the empirical mode decomposition. Energy Procedia 59:44–50 Gairola GS, Chandrasekhar E (2017) Heterogeneity analysis of geophysical well-log data using Hilbert Huang transform. Physica A 478: 131–142. https://doi.org/10.1016/j.physa.2017.02.029 Gao Q, Duan C, Fan H, Meng Q (2008) Rotating machine fault diagnosis using empirical mode decomposition. Mech Syst Signal Process 22: 1072–1081 Garcia X, Jones AG (2008) Robust processing of magnetotelluric data in the AMT dead band using the continuous wavelet transform. Geophysics 73:F223–F234 Gilmore M, Yu CX, Rhodes TL, Peebles WA (2002) Investigation of rescaled range analysis, the Hurst exponent, and long-time correlations in plasma turbulence. Phys Plasmas 9:1312–1317 Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000) Physio bank, physio toolkit, and physio net: components of a new research resource for complex physiologic signals. Circulation 101(23):e215– e220. http://circ.ahajournals.org/cgi/content/full/101/23/e215 Gonzalez RC, Woods RE (2007) Digital image processing, 3rd edn. Prentice Hall, New York Graps A (1995) An introduction to wavelets. IEEE Comput Sci Eng 2: 50–61. 
https://doi.org/10.1109/99.388960 Gururajan MP, Mitra M, Amol SB, Chandrasekhar E (2013) Phase field modeling of the evolution of solid–solid and solid–liquid boundaries: Fourier and Wavelet implementations. In: Chandrasekhar E et al (eds) Wavelets and fractals in earth system sciences. CRC Press, Taylor and Francis, pp 247–271 Hayes MH (1999) Schaum’s outline of theory and problems of digital signal processing. McGraw-Hill Companies, Inc. ISBN 0–07–027389–8, 436 pages
1319 Haykins S (2006) Signals and systems, 4th edn. Wiley. 816 pages Hill EJ, Uvarova Y (2018) Identifying the nature of lithogeochemical boundaries in drill holes. J Geochem Explr 184:167–178 Hongre L, Sailhac P, Alexandrescu M, Dubois J (1999) Nonlinear and multifractal approaches of the geomagnetic field. Phys Earth Planet Inter 110:157–190 Huang NE, Wu Z (2008) A review on Hilbert-Huang transform: method and its application to geophysical studies. Rev Geophys 46:RG2006 Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, Yen NC, Tung CC, Liu HH (1998) The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time series analysis. Proc R Soc Lond Ser A Math Phys Eng Sci 454:903–993 Jaffard S (1991) Detection and identification of singularities by the continuous wavelet transform, Preprint, Lab. Math. Modelisation, Ecole Nat. Ponts-et-Chaussees, La Courtine, Noisy-Ie-Grand, France Jansen FE, Kelkar M (1997) Application of wavelets to production data in describing inter-well relationships, in: Society of Petroleum Engineers # 38876: annual technical conference and exhibition, San Antonio, TX, October 5–8, 1997 Jayaswal V, Gairola GS, Chandrasekhar E (2021) Identification of the typical frequency range associated with subsurface gas zones: a study using Hilbert-Huang transform and wavelet analysis. Arab J Geo Sci 14:335. https://doi.org/10.1007/s12517-021-06606-5 Kantelhardt JW (2002) Fractal and multifractal time series. In: Meyers R (ed) Encyclopedia of complexity and system science. Springer, Berlin/Heidelberg, pp 3754–3779 Kantelhardt JW, Zschiegner SA, Koscielny-Bunde E, Havlin S, Bunde A, Stanley HE (2002) Multifractal detrended fluctuation analysis of nonstationary time series. Physica A 316:87–114. https://doi. org/10.1016/S0378-4371(02)01383-3 Kunagu P, Balasis G, Lesur V, Chandrasekhar E, Papadimitriou C (2013) Wavelet characterization of external magnetic sources as observed by CHAMP satellite: evidence for unmodelled signals in geomagnetic field models. Geophys J Int 192:946–950. https://doi.org/10.1093/ gji/ggs093 Lopes R, Betrouni N (2009) Fractal and multifractal analysis: a review. Med Image Anal 13:634–649. https://doi.org/10.1016/j.media.2009. 05.003 Ma K, Tang X (2001) Translation-invariant face feature estimation using discrete wavelet transform. In: Tang YY et al (eds) WAA 2001, LNCS 2251, Springer, Berling, Heidelberg, pp 201–210 Mallat S (1989) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell 11: 674–693 Mallat S (1999) A wavelet tour of signal processing. Academic, San Diego Mandelbort BB (1967) How long is the coast of Britain? Statistical selfsimilarity and fractional dimension. Science 156:636–638 McDonald AJ, Baumgärtner AJG, Frasher GJ, George SE, Marsh S (2007) Empirical mode decomposition of atmospheric wave field. Ann Geophys 25:375–384 Mohan NL, Sundararajan N, Rao SVS (1982) Interpretation of some two-dimensional magnetic bodies using Hilbert transforms. Geophysics 47:376–387 Neukirch M, Garcia X (2014) Nonstationary magnetotelluric data processing with instantaneous parameter. J Geophys Res Solid Earth 119:1634–1654. https://doi.org/10.1002/2013JB010494 Odegard ME, Berg JW (1965) Gravity interpretation using the Fourier integral. Geophysics 30:424–438 Panda MN, Mosher CC, Chopra AK (2000) Application of wavelet transforms to reservoir – data analysis and scaling. 
SPE J 5:92–101 Peng C-K, Buldyrev SV, Havlin S, Simons M, Stanley HE, Goldberger AL (1994) Mosaic organization of DNA nucleotides. Phys Rev E 49(2):1685–1689 Prokoph A, Agterberg FP (2000) Wavelet analysis of well-logging data from oil source rock, egret member offshore eastern Canada. Am Assoc Pet Geol Bull 84:1617–1632
1320 Salat H, Murcio R, Arcaute E (2017) Multifractal methodology. Physica A 473:467–487 Sengar RS, Cherukuri V, Agarwal A, Gadre VM (2013) Multiscale processing: a boon for self-similar data, data compression, singularities, and noise removal. In: Chandrasekhar E et al (eds) Wavelets and fractals in earth system sciences. CRC Press, Taylor and Francis, pp 117–154 Sharma B, Geldart LP (1968) Analysis of gravity anomalies of two dimensional faults using Fourier transforms. Geophys Prospect 16: 76–93 Sharma M, Vanmali AV, Gadre VM (2013) Construction of wavelets: principles and practices. In: Chandrasekhar E et al (eds) Wavelets and fractals in earth system sciences. CRC Press, Taylor and Francis, pp 29–92 Subhakar D, Chandrasekhar E (2015) Detrended fluctuation analysis of geophysical well-log data. In: Dimri VP (ed) Fractal solutions for understanding complex systems in Earth sciences, 2nd edn. Springer International Publishing, Cham, pp 47–65. https://doi.org/10.1007/ 978-3-319-24675-8_4 Subhakar D, Chandrasekhar E (2016) Reservoir characterization using multifractal detrended fluctuation analysis of geophysical well-log data. Physica A 445:57–65. https://doi.org/10.1016/j.physa.2015. 10.103 Sundararajan N (1983) Interpretation techniques in exploration geophysics using Hilbert transforms: Ph.D. thesis, Osmania University, Hyderabad, India Sundararajan N, Srinivas Y (2010) Fourier–Hilbert versus Hartley–Hilbert transforms with some geophysical applications. J Appl Geophys 71:157–161 Telesca L, Lapenna V, Macchiato M (2004) Mono- and multi-fractal investigation of scaling properties in temporal patterns of seismic sequences. Chaos, Solitons Fractals 19:1–15. https://doi.org/ 10.1016/S0960-0779(03)00188-7 Telesca L, Lovallo M, Hsu HL, Chen CC (2012) Analysis of site effects in magnetotelluric data by using the multifractal detrended fluctuation analysis. J Asian Earth Sci 54–55:72–77 Torres ME, Colominas MA, Schlotthauer G, Flandrin P (2011) A complete ensemble empirical mode decomposition with adaptive noise. IEEE Int Conf Acoustics Speech Signal Process (ICASSP), 2011, 4144–4147. https://doi.org/10.1109/ICASSP.2011. 5947265 Vega NR (2003) Reservoir characterization using wavelet transforms (Ph.D. dissertation), Texas A&M University Wu ZH, Huang NE (2009) Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv Adapt Data Anal 1: 1–4 Xue Y, Cao J, Tian R (2013) A comparative study on hydrocarbon detection using three EMD-based time frequency analysis methods. J Appl Geophys 89:108–115 Yu ZG, Anh V, Estes R (2009) Multifractal analysis of geomagnetic storm and solar flare indices and their class dependence. J Geophys Res 114:A05214. https://doi.org/10.1029/2008JA013854 Zhang Y, Paulson KV (1997) Enhancement of signal-to-noise ratio in natural-source transient magnetotelluric data with wavelet transform. PAGEOPH 149:405–419 Zhang Y, Goldak D, Paulson KV (1997) Detection and processing of lightning-sourced magnetotelluric transients with the wavelet transform. IEICE Trans Fund Elect Commun Comp Sci E80: 849–858 Zhang Q, Zhang F, Liu J, Wang X, Chen Q, Zhao L, Tian L, Wang Y (2018) A method for identifying the thin layer using the wavelet transform of density logging data. J Pet Sci Eng 160:433–441 Zhao N, Li R (2015) EMD method applied to identification of logging sequence strata. Acta Geophys 63(5):1256–1275. https://doi.org/ 10.1515/acgeo-2015-0052
Simulated Annealing

Jaya Sreevalsan-Nair
Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Definition

Physical annealing of solids is a thermodynamic process used in metallurgy, where heating and controlled cooling are used to modify the physical properties of a material. The crystalline solid is first heated and then left to cool slowly until it reaches its most stable and regular crystal lattice configuration, i.e., the one with minimum lattice energy, and is free of crystal defects. Annealing gives superior structural integrity of solids if the cooling is done sufficiently slowly (Henderson et al. 2003). Thus, the physical state of a material, broadly represented as a physical system, is measured using its internal energy. The state of the material, i.e., the system, is stable when its internal energy is minimum. A state transition occurs during the cooling schedule in one of two ways, namely, (i) when the material goes to a lower-energy configuration, by design, and (ii) when it goes to a higher-energy configuration under a specific condition. The condition for the latter is that the transition has an acceptance probability higher than a random number between 0 and 1. Simulated annealing (SA) is akin to the annealing process in that a computational system finds the global minimum as an optimum in the solution space through local searches. Thus, SA is defined as an optimization method that is a local search meta-heuristic (Henderson et al. 2003). Optimization methods are a class of computational methods that find a solution that either maximizes or minimizes an objective function, where the state of the system being optimized reaches the maximum or minimum value in its optimal state. Meta-heuristics are strategies for searching for a sufficiently good solution to an optimization problem. SA is also an iterative algorithm that uses a local search strategy based on hill-climbing. Hill-climbing moves involve transitions of the states of the physical system that worsen the objective function. That said, SA is focused on finding an approximate globally optimal solution, as opposed to a precise locally optimal solution; thus, the hill-climbing moves help in escaping local extrema. In each iteration, the annealing process requires identifying two solutions in the solution space, the probability of the system changing from one state to the other is computed, and the solutions are compared so that the one that improves the value of the objective function is chosen for the next iteration. Such transitions are referred to as improving solutions. In a few cases, a solution that
deteriorates the objective function value, referred to as a nonimproving solution, is chosen only to escape the local optima, in favor of the search for the global optima. This characteristic of SA has made it a popular optimization tool for both discrete optimization problems and continuous-variable problems.
Overview

While SA had been used prior to the formal usage of its current name, the documented introduction of the method, which also coined the term, was by Kirkpatrick et al. in 1983. This original method has since been improved, and the resulting methods now form a class of algorithms that mimic the physical annealing process (Kirkpatrick et al. 1983) and also include the use of hill-climbing. The other class of approaches includes those that solve the Langevin equations, which are stochastic differential equations, with the global minimum marked as the solution (Henderson et al. 2003). The Langevin equation describes how a system goes through state transitions under the influence of deterministic but fluctuating random forces and is also used for describing Brownian motion. Every instance of SA has the following elements (Bertsimas and Tsitsiklis 1993):

• A finite set S of states.
• A real-valued objective/cost function J defined on S, such that S* ⊂ S, where the global minima of J lie in S*.
• A set of neighbors of i, given as N(i), with a symmetric neighborhood relationship, i.e., j ∈ N(i) implies that i ∈ N(j), where j ∈ S \ {i}.
• The cooling schedule, which is a monotonically decreasing function T : Z → (0, ∞), where Z is a set of positive integers, i.e., Z ⊂ ℤ+, T(t) is the temperature at time t, and the initial temperature is T(t0) = T0, which is a positive value.
• An initial state i0 ∈ S.

The iterative algorithm has a nested loop (Henderson et al. 2003), where the outer loop is governed by the temperature and the inner loop is governed by the local neighborhood of a chosen state in a specific iteration. For every iteration of the outer loop, a specific number of iterations of the inner loop is implemented, and a decrement of the temperature occurs. The outer loop is terminated when the stopping criterion is met, e.g., a certain temperature is reached. There is a "current state" associated with each iteration of the outer loop, whose neighborhood is searched for better solutions within the inner loop. The "current state" in the first iteration of the outer loop is i0. In the inner iteration, a neighbor of the current state is considered for a transition if it is an improving solution, except in specific cases of avoiding the local
minimum. The transitions lead to the change of the "current state" of the outer loop. In practice, the transitions are selected using hill-climbing. SA has been used for several applications, including traditional optimization problems, e.g., graph partitioning, graph coloring, and the traveling salesman problem (TSP) (Bertsimas and Tsitsiklis 1993), and industrial engineering/operations research problems, e.g., flow-shop and job-shop scheduling and lot sizing (Henderson et al. 2003). The standard SA has a few disadvantages, including (a) a computationally intensive and inefficient approach to finding the optimal solution and (b) its inflexibility in certain problems (Ingber 1993). Thus, throughout the evolution of SA, several directions have been pursued to improve the algorithm and its usability. These include the generalizability of the hill-climbing algorithm, convergence, efficient implementation, extensions, etc. (Henderson et al. 2003). Genetic algorithms have been used to speed up SA (Ingber 1993). One such method is simulated quenching, where the cooling schedule is a logarithmic function to allow faster cooling (Ingber 1993). The computational efficiency can be improved through parallel implementations on multiple processing units, such as CPUs and GPUs. Specific strategies are required to run an inherently sequential algorithm, such as SA, using parallel methods. One such implementation uses speculative computing to impose multi-point statistics for generating stochastic models subject to constraints (Peredo and Ortiz 2011). A variant of SA is adaptive simulated annealing (ASA), which relies on importance sampling of the parameter space, as opposed to deterministic methods. There are several strategies to implement ASA, one of which derives the parameters of the current iteration from previous iterations. SA can also be extended to the multidimensional case, where the objective function is defined on a subset of a k-dimensional Euclidean space (Fabian 1997). This implies that the neighborhood search has to be extended to the k-dimensional space, where both deterministic and random search strategies can be used.
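The nested loop described above can be sketched as follows. This is a minimal, hedged Python sketch: the geometric cooling schedule, the inner-loop length, and the exponential acceptance rule exp(-delta/T) are common textbook choices assumed here for illustration, not prescriptions of this entry, and the function and parameter names are hypothetical.

```python
import math
import random

def simulated_annealing(initial_state, cost, neighbors,
                        t0=1.0, cooling=0.95, inner_iters=100, t_min=1e-3,
                        rng=None):
    # Outer loop follows the cooling schedule; inner loop searches the
    # neighborhood of the current state.  Non-improving (hill-climbing) moves
    # are accepted when exp(-delta/T) exceeds a random number in [0, 1).
    rng = rng or random.Random(0)
    current = initial_state
    best, best_cost = current, cost(current)
    t = t0
    while t > t_min:                                  # outer loop
        for _ in range(inner_iters):                  # inner loop
            candidate = rng.choice(neighbors(current))
            delta = cost(candidate) - cost(current)
            if delta < 0 or rng.random() < math.exp(-delta / t):
                current = candidate                   # state transition
                if cost(current) < best_cost:
                    best, best_cost = current, cost(current)
        t *= cooling                                  # temperature decrement
    return best
```

The user supplies the state representation, the cost (objective) function J, and the neighborhood function N(i); these correspond directly to the elements listed above.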
Applications

SA has been applied to several applications in geoscience and related domains since its inception. One such application is as a feasible alternative for multi-objective land allocation (MOLA) (Santé-Riveira et al. 2008). Rural land-use allocation is a complex problem, given the multifunctionality of land units and noncontiguous fragmentation owing to urbanization. Thus, the conflicting multiple objectives for land-use allocation include evaluating land units for their utility type (e.g., the crop type being cultivated), the contiguity of the units, and the compactness of the resultant single-use landmasses. Additionally, SA is compared against
other similar optimization methods, such as integer programming, with respect to its run-time efficiency for large-scale problems, which is the case for regions with 2000–3000 land units. SA has been found to be superior to alternative methods such as hierarchical optimization, ideal point analysis, and MOLA. The efficiency of SA has been found to improve when it is provided with good a priori land-use areas. Optimization of reconfigurable satellite constellations (ReCon), taking into consideration the satellite design and orbits, is a multidisciplinary problem (Paek et al. 2019). A ReCon is significant in agile Earth observation, where it is used for both regular Earth observation and disaster monitoring. It is a constrained optimization problem that includes four constraints, namely, (i) global observation mode (GOM) coverage, (ii) regional observation mode (ROM) revisit time, (iii) constellation mass, and (iv) reconfiguration time. The threefold objective in optimizing a ReCon includes (i) minimization of revisit access time and maximization of area coverage, (ii) minimization of the initial launch mass of the ReCon, and (iii) minimization of the reconfiguration time. The variables corresponding to the constraints are known to demonstrate nonlinear behavior, owing to which closed-form models are not available. Among the different SA methods used for this optimization problem, the point-based method has been found to outperform the gradient-based one. Given the nonlinear behavior of the variables, genetic algorithms perform slightly better than SA in giving more optimal solutions for this application. SA can also be used in conjunction with other data analytic methods. For an application of flash-flood hazard mapping, SA is used to eliminate redundant variables from the classifier models, which include a boosted generalized linear model (GLMBoost), a random forest classifier (RFC), and a Bayes generalized linear model (BayesGLM) used in an ensemble (Hosseini et al. 2020). The classification done in this application is based on the severity of the flash flood and is used for forecasting. SA has been used in a novel and effective way for feature selection here.
Future Scope

In a similar way of integrating SA into a data analytic process, there has been increased interest of late in the use of SA in deep learning (Rere et al. 2015). While deep learning has been effective in classification and feature learning, it has its limitations in the ease of training models. SA can be used to improve the performance of convolutional neural networks (CNNs), as a form of meta-heuristic optimization of deep learning. This work has shown an improvement in the performance of CNNs with SA, specifically a reduction of the classification error, albeit at the cost of higher computation time. Overall, it can be concluded that simulated annealing continues to be used as a powerful optimization tool in
applications in geosciences and related domains. It has the potential to combine with newer data analysis methods to give improved results.
Cross-References

▶ Constrained optimization
▶ Deep learning in Geoscience
▶ Optimization in Geosciences
References Bertsimas D, Tsitsiklis J (1993) Simulated annealing. Stat Sci 8(1):10–15 Fabian V (1997) Simulated annealing simulated. Comput Math Appl 33(1–2):81–94 Henderson D, Jacobson SH, Johnson AW (2003) The theory and practice of simulated annealing. In: Handbook of metaheuristics. Springer, New York, pp 287–319 Hosseini FS, Choubin B, Mosavi A, Nabipour N, Shamshirband S, Darabi H, Haghighi AT (2020) Flash-flood hazard assessment using ensembles and bayesian-based machine learning models: application of the simulated annealing feature selection method. Sci Total Environ 711(135):161 Ingber L (1993) Simulated annealing: practice versus theory. Math Comput Model 18(11):29–57 Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680 Paek SW, Kim S, de Weck O (2019) Optimization of reconfigurable satellite constellations using simulated annealing and genetic algorithm. Sensors 19(4):765 Peredo O, Ortiz JM (2011) Parallel implementation of simulated annealing to reproduce multiple-point statistics. Comput Geosci 37(8):1110–1121 Rere LR, Fanany MI, Arymurthy AM (2015) Simulated annealing algorithm for deep learning. Proc Comput Sci 72:137–144 Santé-Riveira I, Boullón-Magán M, Crecente-Maseda R, Miranda-Barrós D (2008) Algorithm based on simulated annealing for land-use allocation. Comput Geosci 34(3):259–268
Simulation Behnam Sadeghi1,2 and Julian M. Ortiz3 1 EarthByte Group, School of Geosciences, University of Sydney, Sydney, NSW, Australia 2 Earth and Sustainability Research Centre, School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, Australia 3 The Robert M. Buchan Department of Mining, Queen’s University, Kingston, ON, Canada
Definition Simulation refers to the construction of computer-based models that represent natural phenomena. These models can
be stochastic in nature, where an underlying probabilistic framework defines the possible outcomes of the model, or they can refer to complex numerical modeling based on processes or physics principles. In this article, we refer to stochastic models built for geoscientific applications in a spatial context, where geostatistical methods are used to create multiple possible outcomes of the spatial distribution of different attributes. These outcomes are called realizations and are based on a random function model that is reproduced by a geostatistical method, or are defined algorithmically. In all cases, the goal is to properly characterize the spatial variability of one or more variables that represent the phenomenon.
Introduction The application of geostatistical simulation has been significantly developed in geosciences to quantify and assess the variability at a local scale using conditional realizations, and model the uncertainty at unsampled locations (Journel 1974; Deutsch and Journel 1998; Chilès and Delfiner 2012; Mariethoz and Caers 2014; Sadeghi 2020) – see the ▶ “Uncertainty Quantification” chapter. Geostatistical simulation methods can be used to characterize categorical and continuous variables. In the case of categorical variables, each location can take one and only one of K categories. For continuous variables, the variable can take a value in a continuous range, which depends on its statistical distribution. In both cases, the spatial continuity of the variable is characterized by variograms (see the ▶ “Variogram” chapter) or multiple-point statistics (see the ▶ “Multiple Point Statistics” chapter). Unlike interpolation methods where a single prediction is obtained at every location, and the predicted values do not reproduce the spatial variability of the phenomenon, geostatistical simulations honor the data at sample locations, reproduce the statistical distribution of the variable over the simulated domain, and impose the spatial continuity of the model over each realization, which allows the characterization of uncertainty over any transfer function that involves multiple locations in the domain simultaneously.
Methods
The simplest frameworks to create these stochastic models include the indicator approach, which can be used to characterize categorical and continuous variables, and the multi-Gaussian approach, which is the basis of several simulation methods for continuous variables but can also be used in methods based on truncation rules to represent categorical variables. Algorithmically driven methods are based on
optimization (e.g., simulated annealing) and on reproduction of multiple-point statistics.
Indicator Approach
The indicator approach is based on coding the random variable as an indicator. This indicator variable can then be characterized by its indicator variogram or transition probabilities, and estimated with indicator kriging. The indicator kriging estimates the probability of prevalence of a given class in the case of categorical variables, or the probability of not exceeding a threshold, in continuous variables. These probabilities can be used to build the conditional distribution and draw simulated values in a sequential fashion (Alabert 1987; Journel 1989). For a categorical variable S, where S(u) can take one of K values {s_1, ..., s_K}, K indicator variables are defined as:

$$I(u; k) = \begin{cases} 1, & \text{if } S(u) = s_k \\ 0, & \text{otherwise} \end{cases} \qquad k = 1, \ldots, K$$
The categorical random function S can be characterized by its statistical distribution, in particular its probability mass function, and by the direct and cross-indicator variograms (or transition probabilities):

$$\gamma_I(h; k, k') = \tfrac{1}{2}\, E\{[I(u; k) - I(u+h; k)]\,[I(u; k') - I(u+h; k')]\} \qquad k, k' = 1, \ldots, K$$
The probability of prevalence of any of the K categories at a particular location can be inferred by applying kriging (or cokriging) to the indicator variable for that particular category. For instance, the simple indicator kriging estimate at location u_0 is:

$$I^*(u_0; k)_{SK} = \left(1 - \sum_{j=1}^{n} \lambda_j\right) P_k + \sum_{j=1}^{n} \lambda_j\, I(u_j; k)$$
where P_k is the probability mass of category s_k, and the weights λ_j are determined by solving the simple indicator kriging system of equations:

$$\sum_{j=1}^{n} C_I(u_i, u_j; k)\, \lambda_j = C_I(u_i, u_0; k), \qquad i = 1, \ldots, n$$
where the indicator covariances C_I are inferred from the variogram γ_I that characterizes the indicator variable. The predicted conditional distribution can be built by combining the K estimates I*(u; k)_SK, k = 1, ..., K. This conditional
distribution can be used in a sequential framework for conditional simulation, as described later. In the case of a continuous variable Z, where Z(u) is characterized by its statistical distribution F_Z, a set of K indicator variables is defined as:

$$I(u; k) = \begin{cases} 1, & \text{if } Z(u) \le z_k \\ 0, & \text{otherwise} \end{cases} \qquad k = 1, \ldots, K$$
where the thresholds z_k, k = 1, ..., K, discretize the global distribution F_Z. The random function Z(u) is characterized by its spatial continuity through direct and cross-indicator variograms, and the conditional distribution at a particular location u_0 can be inferred by kriging (or cokriging) the indicators. The conditional cumulative distribution function is approximated by:

$$F_{Z \le z_k}(u_0) = I^*(u_0; k)_{SK}$$

As with the case of categorical variables (Fig. 1), these conditional distributions can be used in sequential simulation to create random fields that reproduce the spatial continuity of the variable.
Simulation, Fig. 1 Sequential indicator simulation (SISIM) representative realizations (#20, 40, 60, and 80) of 100 conditional realizations for four mineralized zones in Daralu copper deposit (SE Iran). (From Sojdehee et al. 2015)
Multi-Gaussian Approach
The multi-Gaussian approach relies on a distributional assumption to infer the properties of the random function (Goovaerts 1997). This greatly simplifies the inference and calculations. In most cases, this requires an initial transformation of the variable into a (standard) normal distribution that is assumed to behave as a multi-Gaussian variable:

$$Z = \varphi(Y), \quad \text{with } Y \sim N(0, 1)$$

The multi-Gaussian assumption for Y means that the joint distribution of Y(u), Y(u + h_1), ..., Y(u + h_n) is an (n + 1)-variate multi-Gaussian distribution for any n, that is:

$$f_{Y(u) \ldots Y(u+h_n)}(y_0, \ldots, y_n) = \frac{\exp\left(-\tfrac{1}{2}\,(y - m)^{T} \Sigma^{-1} (y - m)\right)}{\sqrt{(2\pi)^{n+1}\,|\Sigma|}}$$

where

$$y = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad m = \begin{pmatrix} m_0 \\ m_1 \\ \vdots \\ m_n \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma^2_{Y(u)} & C_{Y(u)Y(u+h_1)} & \cdots & C_{Y(u)Y(u+h_n)} \\ C_{Y(u+h_1)Y(u)} & \sigma^2_{Y(u+h_1)} & \cdots & C_{Y(u+h_1)Y(u+h_n)} \\ \vdots & \vdots & \ddots & \vdots \\ C_{Y(u+h_n)Y(u)} & C_{Y(u+h_n)Y(u+h_1)} & \cdots & \sigma^2_{Y(u+h_n)} \end{pmatrix}$$
The vector m contains the means of the corresponding random variables, while Σ represents the covariances between pairs of random variables. Under this assumption any conditional distribution is Gaussian in shape and its conditional mean and conditional variance can be inferred by simple kriging of the conditioning variables:

$$Y^*_{SK}(u) = \sum_{j=1}^{n} \lambda_j\, Y(u + h_j)$$

$$\sigma^2_{Y,SK}(u) = 1 - \sum_{j=1}^{n} \lambda_j\, C_{Y(u)Y(u+h_j)}$$
As in the indicator framework, these conditional distributions can be used in sequential simulation to create random fields that reproduce the spatial continuity of the variable Y. These values can be back-transformed to the original units to create a random field of the original variable Z.
Sequential Simulation
The indicator or multi-Gaussian approaches facilitate the inference of the local distribution at a given location u, conditioned by a set of known values at locations u + h_1, ..., u + h_n. A random field can be simulated by the recursive application of Bayes’ law. Every newly simulated value
becomes a new conditioning point, leading to a set of spatially correlated points drawn from the joint multivariate (multipoint) distribution (Journel 1994).
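A minimal sketch of this sequential recursion, for a standard Gaussian (normal-score) variable on a one-dimensional grid with an assumed exponential covariance, is given below. It only shows the mechanics — simple kriging from the data plus previously simulated nodes, then a draw from the Gaussian conditional distribution — and omits the search neighborhoods, multiple grids, and back-transformation used by production implementations such as SGSIM.

```python
import numpy as np

rng = np.random.default_rng(1)

def cov(h, a=10.0):
    """Assumed exponential covariance of the standard Gaussian (normal-score) field Y."""
    return np.exp(-3.0 * np.abs(h) / a)

# 1-D grid of 50 nodes; three conditioning data already in Gaussian units,
# placed on grid nodes 5, 20 and 41 (all values here are hypothetical)
grid = np.arange(50.0)
sim = np.full(grid.size, np.nan)
sim[[5, 20, 41]] = [-0.8, 1.2, 0.3]

path = rng.permutation(np.flatnonzero(np.isnan(sim)))   # random path over unknown nodes
for i in path:
    known = np.flatnonzero(~np.isnan(sim))              # data + previously simulated nodes
    xs, ys = grid[known], sim[known]
    C = cov(xs[:, None] - xs[None, :])                  # covariances among conditioning points
    c0 = cov(xs - grid[i])
    lam = np.linalg.solve(C, c0)                        # simple kriging weights
    mean = lam @ ys                                     # conditional mean
    var = max(1.0 - lam @ c0, 1e-8)                     # conditional (SK) variance
    sim[i] = rng.normal(mean, np.sqrt(var))             # draw and add to the conditioning set
print(sim[:10])
```

In practice the conditioning set is restricted to a limited search neighborhood, since the kriging system otherwise grows with every simulated node.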
Simulation Methods
Although the sequential simulation framework provides a general method to simulate categorical or continuous variables, there exist other methods to achieve similar results. Categorical variables can be simulated with sequential indicator simulation, or by truncation of one or more multi-Gaussian fields, as in truncated Gaussian simulation (Matheron et al. 1987) and pluri-Gaussian simulation (Armstrong et al. 2003). The key in the last two methods is to determine the spatial continuity of the multi-Gaussian fields in order to obtain the continuity of the indicator variables, once the truncation rule is applied. Continuous variables can be simulated with sequential Gaussian simulation (SGSIM; Fig. 2) (Sadeghi 2020), but many other methods exist to create multi-Gaussian random fields, such as turning bands (TBSIM; Fig. 3), matrix decomposition (LU simulation), and FFT moving average. Additionally, multiple-point simulation methods (Mariethoz and Caers 2014), some of which also rely on the sequential framework, can also be used to impose more complex relationships that are not properly reproduced with the sole use of variograms, both in the case of categorical and continuous variables.
Simulation, Fig. 2 Several representative realizations obtained by SGSIM, along with their E-type (mean at every location over the realizations) and conditional variance (variance at every location over the realizations)
Furthermore, most methods can be extended to the multivariate case, where more than one attribute can be simulated simultaneously, preserving the direct and cross-spatial correlation, by generalizing the inference of the conditional distribution with simple cokriging. In practice, there are many implementation details associated with the simulation of variables in real applications. Dealing with trends and nonstationary features and modeling multivariate relationships that depart from the multi-Gaussian assumption are the most relevant problems.
Use of Simulated Realizations
The different realizations generated by any of the above methods should:
1. Reproduce the conditioning sample data at their locations
2. Reproduce the global histogram (or proportions) of the variable in a stationary domain
3. Reproduce the spatial continuity of the variable (variogram or multiple-point statistics)
These realizations are not locally accurate, but aim at reproducing the in situ variability of the attribute in space. Therefore, these realizations can be used to characterize the uncertainty around the expected value at every location. This requires processing an ensemble (or a set) of realizations, in order to assess the expected variability in any response. This means that every realization is passed through a transfer function to generate a unique response. The set of responses obtained from processing an ensemble of realizations reflects the uncertainty about that response. The ensemble of realizations can thus be interrogated to answer many different questions, such as the probability of one location to exceed a threshold zC, or the probability of a volume (i.e., a collection of point locations) to exceed that same threshold zC. More complex transfer functions are used in practical applications in environmental sciences, geological engineering, mining, petroleum, and other disciplines. The transfer function may be a complicated function of multiple locations, as in the case of the flow of petroleum in an oil reservoir, or be linked to a nonlinear function of the attribute, such as when calculating the metal recovery in a mine. It can
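The ensemble logic can be sketched in a few lines. The realizations below are random placeholders standing in for geostatistical ones, and the two transfer functions — exceedance of a threshold at a point and the mean grade of a block — are deliberately simple; the same pattern applies to flow simulators or recovery functions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Placeholder ensemble: 100 realizations of an attribute on a 20 x 20 grid
realizations = rng.lognormal(mean=0.0, sigma=0.5, size=(100, 20, 20))
z_c = 1.5                                    # threshold of interest

# Probability of exceedance at every location (proportion of realizations above z_c)
p_exceed = (realizations > z_c).mean(axis=0)

# Transfer function over a volume: mean grade of a 5 x 5 block in each realization
block = realizations[:, 5:10, 5:10].mean(axis=(1, 2))
p_block_exceed = (block > z_c).mean()        # uncertainty on the block-scale response

# E-type and conditional variance maps, as in Fig. 2
etype = realizations.mean(axis=0)
cond_var = realizations.var(axis=0)
print(p_exceed[0, 0], p_block_exceed, etype.shape, cond_var.shape)
```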
Simulation, Fig. 3 Several representative TBSIM realizations (#100, #500, and #900)
also depend on the production sequence and time frame if a net present value is computed. The variability captured by the simulations translates into uncertainty in prediction and design, which can be accounted for to obtain a robust and resilient design of the extractive approach in a mine or a petroleum reservoir.
Summary or Conclusions
Geostatistical simulation provides tools to characterize the variability of attributes at unsampled locations, and a quantification of the uncertainty related to any transfer function over the attributes. Simulation works by adding back the variability that estimation methods smooth out, thus trading local accuracy for the reproduction of the spatial variability. There are many different simulation methods to characterize categorical and continuous variables, which can also be extended to the simulation of multiple variables. The resulting realizations honor the data at sample locations, reproduce the global histogram, and reproduce the spatial continuity. These realizations must be treated as an ensemble to assess any response variable. There is no single best realization. All of the valid realizations must be processed through the transfer function to obtain multiple responses. The collection of responses characterizes the uncertainty after the transfer function. This is the case in geosciences, environmental, mining, and petroleum applications, where the response may depend on multiple locations simultaneously and even change with time, as in the case of the net present value. Geostatistical simulation is a powerful tool to address the uncertainty in geoscience projects and obtain robust optima in decision-making that account for such uncertainty.

Cross-References ▶ Multiple Point Statistics ▶ Uncertainty Quantification ▶ Variogram
Bibliography Alabert FG (1987) Stochastic imaging of spatial distributions using hard and soft information. Master’s thesis, Stanford University, Stanford Armstrong M, Galli A, Le Loc’h G, Geffroy F, Eschard R (2003) Plurigaussian simulations in geosciences. Springer, Berlin Chiles JP, Delfiner P (2012) Geostatistics modeling spatial uncertainty, 2nd edn. Wiley, New York Deutsch CV, Journel AG (1998) GSLIB: geostatistical software library and user’s guide, 2nd edn. Oxford University Press, New York Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York
Journel AG (1974) Geostatistics for conditional simulation of orebodies. Econ Geol 69:673–680 Journel AG (1989) Fundamentals of geostatistics in five lessons. Volume 8, Short course in geology. American Geophysical Union, Washington, DC Journel AG (1994) Modeling uncertainty: some conceptual thoughts. In: Geostatistics for the next century. Springer, Dordrecht Mariethoz G, Caers J (2014) Multiple-point geostatistics: stochastic modeling with training images. Wiley Matheron G, Beucher H, De Fouquet C, Galli A, Guerillot D, Ravenne C (1987) Conditional simulation of the geometry of fluvio-deltaic reservoirs. In: 62nd annual technical conference and exhibition of the society of petroleum engineers, Dallas. SPE 16753, 1987, pp 571–599 Sadeghi B (2020) Quantification of uncertainty in geochemical anomalies in mineral exploration. PhD thesis, University of New South Wales Sojdehee M, Rasa I, Nezafati N, Vosoughi Abedini M, Madani N, Zeinedini E (2015) Probabilistic modeling of mineralized zones in Daralu copper deposit (SE Iran) using sequential indicator simulation. Arab J Geosci 8:8449–8459
Singular Spectrum Analysis U. Radojičić1, Klaus Nordhausen2 and Sara Taskinen2 1 Institute of Statistics and Mathematical Methods in Economics, TU Wien, Vienna, Austria 2 Department of Mathematics and Statistics, University of Jyväskylä, Jyväskylä, Finland
Definition Singular spectrum analysis (SSA) is a family of methods for time series analysis and forecasting, which seeks to decompose the original series into a sum of a small number of interpretable components such as trend, oscillatory components, and noise.
Introduction Singular spectrum analysis (SSA) aims at decomposing the observed time series into the sum of a small number of independent and interpretable components such as a slowly varying trend, oscillatory components, and noise (Elsner and Tsonis 1996; Golyandina et al. 2001). SSA can be used, for example, for finding trends and seasonal components (both short and large period cycles) in time series, smoothing, and forecasting. When SSA is used as an exploratory tool, one does not need to know the underlying model of the time series. The origins of SSA are usually associated with nonlinear dynamics studies (Broomhead and King 1986a,b), and over
time, the method has gained a lot of attention. As stated in Golyandina et al. (2001), SSA has proven to be very successful in time series analysis and is now widely applied in the analysis of climatic, meteorological, and geophysical time series. Many variants of SSA exist. However, in the following, we focus on a specific SSA method, often referred to as basic SSA.
Singular Spectrum Analysis
We define x = {x_t : t = 1, ..., N} for an observable time series. The SSA algorithm consists of the following four steps (Golyandina et al. 2018).

1. Embedding: First, the so-called l × k-dimensional trajectory matrix

$$\mathbf{T}(x) = \begin{pmatrix} x_1 & x_2 & x_3 & \cdots & x_k \\ x_2 & x_3 & x_4 & \cdots & x_{k+1} \\ x_3 & x_4 & x_5 & \cdots & x_{k+2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_l & x_{l+1} & x_{l+2} & \cdots & x_N \end{pmatrix} \qquad (1)$$

is constructed. Here N is the time series length, l is the so-called window length, and k = N − l + 1. The trajectory matrix T is a linear map taking x ∈ ℝ^N into an l × k-dimensional Hankel matrix, i.e., into the l × k-dimensional matrix X = T(x) with equal elements on the off-diagonals. The columns x_i = (x_i, ..., x_{i+l−1})′, i = 1, ..., k, of X are called lagged vectors of dimension l. Golyandina et al. (2001) suggest that l should be smaller than N/2 but sufficiently large so that the l-lagged vector x_i, i = 1, ..., k, incorporates the essential part of the behavior of the initial time series x.

2. Decomposition: Once the time series x is embedded into a trajectory matrix X, it is decomposed into a sum of rank-1 matrices. Denote now S = X X′ and write S = U Λ U′ for an eigendecomposition of S. Here Λ is an l × l diagonal matrix with nonnegative eigenvalues λ_1 ≥ ... ≥ λ_l as diagonal values, and U = (u_1, ..., u_l), u_i ∈ ℝ^l, is an l × l orthogonal matrix that includes the corresponding eigenvectors as columns. If we further write v_i = X′u_i / √λ_i, i = 1, ..., d, where d = rank(X), then X can be decomposed into a sum of rank-1 matrices as follows:

$$X = \sum_{i=1}^{d} X_i = \sum_{i=1}^{d} \sqrt{\lambda_i}\, u_i v_i^{\prime}. \qquad (2)$$

Notice that u_i ∈ ℝ^l and v_i ∈ ℝ^k are the left and right singular vectors of X, respectively, and √λ_i > 0 are the corresponding singular values. In the SSA literature these ordered singular values are often referred to as the singular spectrum, thus giving the name to the method (Elsner and Tsonis 1996). It is important to mention that the above method, where X is decomposed into rank-1 components using the singular value decomposition, is called basic SSA. In practice, other decompositions can also be used. For more details, see Golyandina et al. (2001), for example.

3. Grouping: In this step, the rank-1 components in decomposition (2) are grouped into predefined groups, where the group membership is described by a set of indices I = {i_1, ..., i_p} ⊂ {1, ..., d}, p ≤ d. Then, the matrix corresponding to the group I is defined as X_I = X_{i_1} + ... + X_{i_p}. If the set of indices {1, ..., d} is partitioned into m disjoint subsets I_1, ..., I_m, m ≤ d, then one obtains the so-called grouped decomposition of the matrix X:

$$X = X_{I_1} + \cdots + X_{I_m}. \qquad (3)$$

Especially, if I_k = {k}, for k = 1, ..., d, the grouping is called elementary and the corresponding components in (3) are called elementary matrices. Furthermore, if only one group I ⊂ {1, ..., d} is specified, one proceeds by assuming that the given partition is {I, I^c}, where I^c = {1, ..., d}\I. In that case, X_I usually corresponds to the pattern of interest, while X_{I^c} = X − X_I is treated as the residual.

4. Reconstruction: As the final step, the matrices X_{I_j}, j = 1, ..., m, from decomposition (3) are transformed into new time series of length N. The reconstructed series x̃_{I_k} are obtained by sequentially averaging the elements of the matrix X_{I_k}, k = 1, ..., m, that lie on the off-diagonals, i.e.,

$$[\tilde{x}_{I_k}]_t = \frac{1}{|S_t|} \sum_{i+j=t+1} [X_{I_k}]_{i,j}, \qquad (4)$$

where |S_t| denotes the cardinality of the finite set S_t = {(i, j) : i + j = t + 1}. In the literature, this process is known as diagonal averaging. If one applies the given reconstruction to all components in the grouped decomposition (3) and, for simplicity of notation, denotes x̃_k ≔ x̃_{I_k}, k = 1, ..., m, the resulting decomposition of the initial time series x is given by:

$$x = \tilde{x}_1 + \cdots + \tilde{x}_m. \qquad (5)$$
In the case of basic SSA for univariate time series, the tuning parameters one needs to specify a priori are window length l and the partition of the set of indices. In more general SSA, the trajectory matrix, as well as its rank-1 decomposition, can be chosen more freely.
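The four steps can be written compactly for a synthetic series, as in the sketch below. The window length, the series, and the grouping of components into a trend and an oscillatory pair are arbitrary illustrative choices; a real analysis would inspect the singular spectrum and eigenvectors (and would typically use dedicated software such as the Rssa package referred to below).

```python
import numpy as np

N, l = 200, 50
t = np.arange(N)
x = 0.02 * t + np.sin(2 * np.pi * t / 12) + 0.3 * np.random.default_rng(3).normal(size=N)

# Step 1 - Embedding: l x k trajectory (Hankel) matrix
k = N - l + 1
X = np.column_stack([x[i:i + l] for i in range(k)])

# Step 2 - Decomposition: SVD of the trajectory matrix (the singular spectrum is s)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

def reconstruct(indices):
    """Steps 3-4: group the chosen rank-1 components and apply diagonal averaging."""
    XI = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in indices)
    series = np.zeros(N)
    counts = np.zeros(N)
    for a in range(l):
        for b in range(k):
            series[a + b] += XI[a, b]      # entries with a + b = const lie on one anti-diagonal
            counts[a + b] += 1
    return series / counts

trend = reconstruct([0])                   # leading component, here taken as the trend estimate
seasonal = reconstruct([1, 2])             # a pair of components chosen for the oscillation
residual = x - trend - seasonal
print(trend[:5], seasonal[:5])
```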
Separability
A central concept in SSA is separability, which to some extent ensures the validity of the method. Assume that one can decompose the time series into two univariate series, that is, x = x1 + x2. This representation is usually associated with a signal plus noise model, a trend plus remainder model, and other structured models. (Approximate) separability of the components x1 and x2 implies that there exists a grouping (see Step 4) such that the reconstructed x̃1 and x̃2 are (approximately) equal to x1 and x2, respectively, i.e., x̃_i ≈ x_i, i = 1, 2. In basic SSA, (approximate) separability corresponds to (approximate) orthogonality of the trajectory matrices. Consider as an example a time series of length N. If N is large enough, the trend, which is a slowly varying smooth component, and the periodic components are (approximately) separable, and both are (approximately) separable from the noise. For illustration see Fig. 1, where a trend and a periodic component are separated from a noise component. In Fig. 1a, the trend is extracted using the first three significant elementary components, with window length l = 100. In Fig. 1b, the seasonal component is extracted using the first six significant elementary components, with window length l = 150. The extracted components are approximately equal to the theoretical trend and seasonal component. In order to check the separability of the reconstructed components x̃1 and x̃2, with corresponding trajectory matrices X̃1 and X̃2, respectively, the normalized orthogonality measure given in Golyandina et al. (2018) is:

$$\rho(\tilde{X}_1, \tilde{X}_2) = \frac{\langle \tilde{X}_1, \tilde{X}_2 \rangle_F}{\|\tilde{X}_1\|_F\, \|\tilde{X}_2\|_F},$$

where ⟨·,·⟩_F and ‖·‖_F are the Frobenius matrix inner product and norm, respectively. This orthogonality measure further induces a dependence measure between two time series called the w-correlation (Golyandina and Korobeynikov 2014). A large value of the w-correlation between a pair of elementary components suggests that, in the grouping step, these should perhaps be in the same group. Furthermore, additive sub-series can be identified using the principle that the form of an eigenvector resembles the form of the sub-series produced by that eigenvector. For example, the eigenvectors produced by a slowly varying component (trend) are slowly varying, and the eigenvectors produced by a sine wave (periodic component) are again sine waves with the same period (Golyandina and Korobeynikov 2014). Therefore, plots of eigenvectors are often used in the process of identification. For more details, see Golyandina and Korobeynikov (2014) and Golyandina et al. (2018), for example.
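The orthogonality measure translates directly into code. The sketch below builds the trajectory matrices of two components and computes ρ from the Frobenius inner product and norms, exactly as in the expression above; the toy components (a slow trend and two sinusoids) are assumptions chosen only to contrast small and large values.

```python
import numpy as np

def hankel(series, l):
    """Trajectory matrix of a series for window length l."""
    k = len(series) - l + 1
    return np.column_stack([series[i:i + l] for i in range(k)])

def rho(s1, s2, l):
    """Normalized orthogonality measure between two components, computed from the
    Frobenius inner product and norms of their trajectory matrices."""
    A, B = hankel(s1, l), hankel(s2, l)
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))

# A slowly varying trend and a fast oscillation are nearly orthogonal (well separable),
# while two oscillations with the same period are not.
t = np.arange(300)
trend = 0.01 * t
wave = np.sin(2 * np.pi * t / 12)
wave_shifted = np.sin(2 * np.pi * t / 12 + 0.3)
print(rho(trend, wave, l=60), rho(wave, wave_shifted, l=60))
```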
Singular Spectrum Analysis, Fig. 1 Left: extraction of the polynomial trend in the time series x_t = (0.05(t − 1) − 5)² + y_t, t = 1, ..., 200, where y_t is an MA(0.9) process. Right: extraction of the oscillatory component in the time series x_t = a_t y_t, where a_t = 1 for t = 1, ..., 100, a_t = 0.9 for t = 101, ..., 200, and a_t = 0.92 for t = 201, ..., 300, and y_t = (0.1(t − 1) − 5)² + z_t, where z_t is an MA(0.9) process.
Prediction Using SSA
A class of time series especially suited for SSA are the so-called finite-rank time series, where we say that the time series x is of finite rank if its trajectory matrix is of rank d < min(k, l) and this rank does not depend on the window length l, for l large enough. In that case, under mild conditions, there exists a one-to-one correspondence between the trajectory matrix of the time series x and a linear recurrent relation (LRR):

$$x_{t+d} = \sum_{i=1}^{d-1} a_i x_{t+i}, \qquad t = 1, \ldots, N - d, \qquad (6)$$

that governs the time series x. Furthermore, the solution to LRR (6) can then be expressed as sums and products of polynomial, exponential, and sinusoidal components, whose identification leads to the reconstruction of the trend and various periodic components, among others. If one applies the obtained LRR (6) to the last terms of the initial time series x, one obtains a continuation of x which serves as a prediction of the future.

Real-data time series are not, in general, finite-rank time series. However, if the time series x is the sum of a finite-rank signal x1 and additive noise, then SSA may approximately separate the signal component, and one can further use the methods designed for the analysis and forecasting of finite-rank series, thus obtaining the continuation (forecast) of the signal component x1 of x (Golyandina and Korobeynikov 2014; Golyandina et al. 2001). Such a problem is known as forecasting the signal (trend, seasonality, ...) in the presence of additive noise. Confidence intervals for the forecast can be obtained using bootstrap techniques. For more details, see Golyandina et al. (2001) and Golyandina and Korobeynikov (2014).
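The recurrent continuation itself is straightforward once LRR coefficients are available. The sketch below applies a generic LRR of the form x_t = a_1 x_{t−1} + a_2 x_{t−2} + ... to the last terms of a series; the coefficients used (those of a pure sinusoid) are an assumption made for illustration, whereas in SSA forecasting they would be estimated from the signal subspace.

```python
import numpy as np

def lrr_forecast(x, a, steps):
    """Continue a series using a linear recurrent relation
    x_new = a[0]*x[-1] + a[1]*x[-2] + ... (coefficients ordered from the most recent lag)."""
    x = list(x)
    for _ in range(steps):
        x.append(sum(ai * x[-1 - i] for i, ai in enumerate(a)))
    return np.array(x)

# A sinusoid with period 12 satisfies x_t = 2*cos(2*pi/12)*x_{t-1} - x_{t-2}
t = np.arange(48)
x = np.sin(2 * np.pi * t / 12)
a = [2 * np.cos(2 * np.pi / 12), -1.0]
continued = lrr_forecast(x, a, steps=12)
print(np.round(continued[-12:], 3))          # reproduces the next period of the sinusoid
```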
Example
To illustrate the SSA method, we use the average monthly temperatures (in °C) measured at the Jyväskylä airport, in Finland, from 1960 to 2000. To perform basic SSA as described above, we use the R (R Core Team 2020) package Rssa (Golyandina et al. 2015). As suggested in Golyandina
et al. (2001), for extracting a periodic component in short time series, it is preferable to take the window length l proportional to the period. Thus, we take l = 48. After investigating the singular values and paired scatter plots of the left singular vectors, we identify the first component as depicting a slowly varying trend, and the grouped fifth and sixth components to depict the yearly seasonality of the average monthly temperature. The original time series is shown together with its decomposition into seasonal component, trend component,
and residuals in Fig. 2. As seen in Fig. 2, the seasonal component is clearly dominating. However, there exists also a trend that looks almost linear. Eliminating the noise and using only the reconstructed trend and yearly seasonality, we predict the average monthly temperature for years 2000 and 2001 based on the components shown in Fig. 2. The predictions together with the true observed values are shown in Fig. 3. Figure 3 shows that SSA can be used to predict the temperatures in Jyväskylä quite nicely. For more details and
Singular Spectrum Analysis, Fig. 2 Top to bottom: Average monthly temperature (°C) at the Jyväskylä airport from 1960 to 2000, the reconstructed seasonal component with 12-month period, reconstructed trend component, and residuals.
Singular Spectrum Analysis, Fig. 3 The predicted average monthly temperature (°C) at the Jyväskylä airport for 2000 and 2001 (blue line). The dashed black line shows the true observed average temperatures.
guidelines for grouping and identification of trend and oscillatory components, we refer to Golyandina et al. (2001).
Extensions and Relations to Other Approaches
SSA has been extended to multivariate time series, in which case it is known as multidimensional or multichannel SSA (M-SSA), and to the analysis of images, where it is known as 2D singular spectrum analysis (2D-SSA). While time series analysis is usually performed in either the time or the frequency domain, one can see SSA as a compromise, a time-frequency method: SSA chooses an adaptive basis generated by the time series itself, whereas, for example, Fourier analysis uses a fixed basis of sine and cosine functions. For more detailed discussions and interpretations of SSA, we refer to Elsner and Tsonis (1996), Golyandina et al. (2001), and Ghil et al. (2002).
References Broomhead D, King GP (1986a) Extracting qualitative dynamics from experimental data. Physica D Nonlinear Phenomena 20(2):217–236 Broomhead D, King GP (1986b) On the qualitative analysis of experimental dynamical systems. Nonlinear Phenomena and Chaos:113–144 Elsner JB, Tsonis AA (1996) Singular spectrum analysis: a new tool in time series analysis. Plenum Press, New York Ghil M, Allen MR, Dettinger MD, Ide K, Kondrashov D, Mann ME, Robertson AW, Saunders A, Tian Y, Varadi F, Yiou P (2002) Advanced spectral methods for climatic time series. Rev Geophys 40(1):3–1–3–41 Golyandina N, Korobeynikov A (2014) Basic singular spectrum analysis and forecasting with R. Comput Stat Data Anal 71:934–954 Golyandina N, Nekrutkin V, Zhigljavsky A (2001) Analysis of time series structure: SSA and related techniques. Chapman & Hall/ CRC, Boca Raton Golyandina N, Korobeynikov A, Shlemov A, Usevich K (2015) Multivariate and 2D extensions of singular spectrum analysis with the Rssa package. J Stat Softw 67(2):1–78 Golyandina N, Korobeynikov A, Zhigljavsky A (2018) Singular spectrum analysis with R. Springer, Berlin R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. https://www.R-project.org/
Summary Singular spectrum analysis (SSA) is a model-free family of methods for time series analysis, that can be used, for example, for eliminating noise in the data, identifying interpretable components such as trend and oscillatory components, forecasting, and imputing missing values. The method consists of the four main steps: embedding, decomposition, grouping, and reconstruction. The method is currently widely applied in the analysis of climatic, meteorological, and geophysical time series.
Cross-References ▶ Singularity Analysis ▶ Time Series Analysis ▶ Time Series Analysis in the Geosciences
Singularity Analysis Behnam Sadeghi1,2 and Frits Agterberg3 1 EarthByte Group, School of Geosciences, University of Sydney, Sydney, NSW, Australia 2 Earth and Sustainability Research Centre, University of New South Wales, Sydney, NSW, Australia 3 Central Canada Division, Geomathematics Section, Geological Survey of Canada, Ottawa, ON, Canada
Definition Singular physical processes include various types of mineralization, volcanoes, floods, rainfall, and landslides (Wang et al. 2012). The singularity method is a model to “characterise
anomalous behavior of such processes mostly resulting in anomalous amounts of energy release or material accumulation within a narrow spatial–temporal interval” (Cheng 2007).
Introduction One of the main objectives in the analysis of data in geochemical exploration programs is the selection and application of suitable classification models to distinguish between signals related to the mineralization effects and other processes or sources, as well as controls on variation in background geochemical distributions (Carranza 2009; Najafi et al. 2014; Zuo and Wang 2016; Grunsky and de Caritat 2020; Sadeghi 2020; see the entry ▶ “Exploration Geochemistry”). In environmental or urban geochemistry, the objective is generally the separation of geochemical patterns related to anthropogenic contamination from those derived from natural or geogenic processes (Guillén et al. 2011; Demetriades et al. 2015; Zuluaga et al. 2017; Albanese et al. 2018; Thiombane et al. 2019). The data addressed in these cases is commonly presented in the form of geochemical maps at various scales, sampling densities, and sampling geometries (Darnley 1990; Sadeghi 2020). Various mathematical and statistical models have been applied to regional geochemical data to reveal subtle geochemical patterns related to the effects of different styles of mineralization or variations in parent lithologies. Conventional statistical methods such as exploratory data analysis (Tukey 1977; Chiprés et al. 2009), weights of evidence (Bonham-Carter et al. 1988, 1989), and probability plots (Sinclair 1991) used for anomaly detection or the isolation of geochemical signals and patterns related to the effects of mineralization on regolith geochemistry have a number of limitations. This has encouraged investigation of less conventional alternatives to conventional parametric methods, including fractal modeling and geostatistical simulation to isolate such signals or patterns (Cheng et al. 1994; Gonçalves et al. 2001; Li et al. 2003; Lima et al. 2003; Carranza 2009; Zuo 2011; Sadeghi et al. 2012, 2015). Geochemical distributions, including those derived from regional regolith geochemical mapping studies, have been shown to display fractal behavior at various spatial scales (Bölviken et al. 1992). Such geochemical patterns develop through a variety of processes over geological time. It is the superimposition of several element enrichment or depletion episodes that results in the final element concentration distributions displaying multifractal behavior, containing a hierarchy of geochemical dispersion patterns observed at local to continental scales within most sampling materials (Zuo and Wang 2016, 2019). Differences observed in the fractal characteristics of regolith geochemical patterns relate to a number of factors. Whereas the main factor is typically a variation in
parent lithology (Cheng 2014; Nazarpour et al. 2015; Cámara et al. 2016), variation may also relate to the effects of mineralization and the task of separating element concentrations and/or samples into the categories traditionally termed “anomalous” versus “background” (Cohen and Bowell 2014). The concept of fractal geometry was introduced to explain self-similarity or self-affinity at various scales in natural systems with models based on power-law spatial distributions in data (Mandelbrot 1983). Fractal or multifractal models are derived from power-law relationships between variables, such as the observed relationship between element concentration and the number of samples exceeding a given concentration. A fractal model can be represented in the general form:

$$N(r) = C\, r^{-D}, \qquad r > 0, \; D \in \mathbb{R}_{>0}$$

where r is a characteristic linear measure, N(r) is the number of objects with characteristic linear measure r, C is a constant of proportionality (a prefactor parameter), and D is the generalized fractal dimension (Shen and Zhao 1998). Some parametric distributions in geochemical data are intrinsically fractal (Shen and Cohen 2005). Fractal relationships manifest as linear segments on (log-log) plots of the chosen fractal characteristics, in similar fashion to the identification of populations displaying particular distribution types and determining their parameters via probability or Q-Q plots (Sinclair 1991). The fractal dimensions are represented by the slopes of straight lines fitted to corresponding plots (Spalla et al. 2010; Sadeghi et al. 2012; Khalajmasoumi et al. 2017). In geochemical data, the slope for a fractal dimension can be interpreted as a measure of the element enrichment intensity (Afzal et al. 2010; Bai et al. 2010; Sadeghi et al. 2012; Zuo and Wang 2016). A change in slope or line orientation results in a new representative population for the given variables. In regional regolith geochemical data, this may be associated with a change in processes (e.g., different parent lithologies) or the existence of atypical conditions (such as the influence of mineralization). The threshold values, classifying different populations, are the breakpoints of the straight lines on such log-log plots (Carranza 2009; Afzal et al. 2011), providing a relatively simple graphical means of splitting data into populations. A growing range of spatial and parametric fractal methods has been applied to geochemical data, to associate univariate or multivariate fractal dimensions to specific geochemical processes or domains. Whereas statistically self-similar fractals are typically isotropic, some may be anisotropic and display patterns at various scales depending on spatial orientations (Turcotte 1997), if a spatial fabric exists within the data.
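Fitting such a model is usually done on a log-log plot of the number-size relation. The sketch below generates a hypothetical mixture of a background and an enriched population, fits two straight-line segments to log N(≥c) versus log c, and selects the breakpoint (threshold) that minimizes the total residual; the synthetic concentrations, the candidate breakpoints, and the two-segment choice are all assumptions of the example, not part of this entry.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical element concentrations: a lognormal background plus a small enriched population
conc = np.concatenate([rng.lognormal(3.0, 0.5, 2000), rng.lognormal(5.0, 0.4, 60)])

# Number-size relation: N(c) = number of samples with concentration >= c
c = np.sort(conc)
N = np.arange(len(c), 0, -1)
log_c, log_N = np.log10(c), np.log10(N)

def fit_segments(split):
    """Fit straight lines on the log-log plot below and above a candidate breakpoint;
    the slopes estimate the (negative) fractal dimensions -D of the two populations."""
    lo, hi = log_c <= split, log_c > split
    d_lo = np.polyfit(log_c[lo], log_N[lo], 1)[0]
    d_hi = np.polyfit(log_c[hi], log_N[hi], 1)[0]
    resid = 0.0
    for seg in (lo, hi):
        p = np.polyfit(log_c[seg], log_N[seg], 1)
        resid += np.sum((log_N[seg] - np.polyval(p, log_c[seg])) ** 2)
    return d_lo, d_hi, resid

# Choose the breakpoint (threshold between "background" and "anomalous") that
# minimizes the total residual of the two-segment fit
candidates = np.quantile(log_c, np.linspace(0.5, 0.98, 40))
best = min(candidates, key=lambda s: fit_segments(s)[2])
d_lo, d_hi, _ = fit_segments(best)
print("threshold =", round(10 ** best, 1), "slopes =", round(d_lo, 2), round(d_hi, 2))
```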
In general, geochemical landscapes (Fortescue 1992) contain multiple fractal dimensions in regional geochemical data and thresholds and may be considered as spatially intertwined sets of monofractals (Feder 1988; Stanley and Meakin 1988). This may also be applicable to patterns with continuous spatial variability (Agterberg 1994, 2001). The multifractal behavior of variables within geochemical landscapes can be associated with the distributions of the probability density and spatial distributions of geochemical data (Cheng and Agterberg 1996; Carranza 2009; Zuo et al. 2009; Zuo and Wang 2016). In large, regional geochemical datasets, geochemical populations associated with mineralization are typically small and distinct, as mineralization is typically of limited areal extent and is generally geochemically very distinct from other “background” or non-mineralization-related populations. In conventional parametric methods, outlying values are generally removed prior to development of models (such as regression analysis or factor analysis) as they may significantly bias models. This partly derives from many conventional methods assuming geochemical element distributions following a normal or “normalisable” distribution (Reimann and Filzmoser 1999; Limpert et al. 2001; Sadeghi et al. 2015). Geochemical concentrations in the majority of studies do not follow a normal distribution or are the summation of a large number of component populations that either appear en masse to exhibit normality or log-normality (Li et al. 2003; Bai et al. 2010; He et al. 2013; Luz et al. 2014; Afzal et al. 2010). Q-Q plots are also based on the normality of the data distribution, neglecting the shape and intensity of anomalous areas. Fractal modeling does not typically require or assume normality and is largely unaffected by the presence of even extreme outliers. Therefore, fractal methods are gaining preference over conventional parametric methods (Afzal et al. 2010). Depending on the method used, fractal modeling can simultaneously incorporate population as well as spatial distribution characteristics of geochemical data (He et al. 2013; Luz et al. 2014). Some fractal models have been developed to distinguish between different populations in geochemical datasets, in order to characterize anomalous geochemical values that could be related to the effects of mineralization. Such fractal models are number-size (N-S) (Mandelbrot 1983) and its 3D equivalent (Sadeghi et al. 2012), concentration-area (C-A) and perimeter-area (P-A) (Cheng et al. 1994, 1996; see the entry ▶ “Concentration-Area Plot”), spectrum-area (S-A) based on the relation between power spectrum and the occupied area of geochemical data values (Cheng et al. 1999; see the entry ▶ “Spectrum-Area Method”), singularity (Cheng 1999; Daya Sagar et al. 2018) based on the singularity spectrum, concentration-volume (C-V) (Afzal et al. 2011), spectrum-volume (S-V) (Afzal et al. 2012), concentration-concentration (C-C) (Sadeghi 2021), concentration-distance from centroids (C-DC) (Sadeghi and Cohen 2021a), category-based
fractal modeling (Sadeghi and Cohen 2021b), simulated size-number (SS-N) (Sadeghi et al. 2015), and global simulated size-number (GSS-N) (Madani and Sadeghi 2019). Most such models are based on the relation between element concentrations and geometrical properties such as area and perimeter. The S-A model is used to differentiate geochemical anomalies from background by spectral analysis in the frequency domain through Fourier series analysis (Cheng 1999). The singularity model is applied to characterize the degree of uniqueness of geological and geochemical properties based on its ability to delineate target areas, which are smoothed by conventional contouring models (Cheng 2007, 2008, 2012; Cheng and Agterberg 2009; Zuo et al. 2013; Zuo and Wang 2016). In exploration geochemical data, changes in fractal dimensions, indicated by the fractal classification models and ensuing clustering of samples, are used to cluster or partition samples and areas in geochemical maps. In large-scale regional geochemical mapping, the different classes represent the effects of variation in parent lithology, regolith development, and some environmental factors. Populations that are influenced by the effects of mineralization or contamination are typically represented by small numbers of samples that have conventionally been referred to as “geochemical anomalies” and the remainder as representing local or general “background” populations.
Methodology: Singularity Model Singularity modeling is a form of fractal/multifractal modeling, which has been proposed and developed by Cheng (1999) to use in remote sensing and satellite image processing, with further work by Cheng (2007) focusing on its application in the study of stream sediment geochemical data for prediction of undiscovered mineral deposits. Some other studies on geochemical anomaly recognition have been undertaken using this method; e.g., Cheng and Zhao (2011) proposed that geochemical anomalies could be recognized for mineral deposit prediction through singularity theory, which has been developed to characterize nonlinear mineralization processes; Cheng (2012) applied the singularity theory for mapping geochemical anomalies related to buried sources and for predicting undiscovered mineral deposits in concealed areas; Wang et al. (2012) proposed a tectonic-geochemical exploration model based on the application of singularity theory to fault data; Xiao et al. (2012) combined the singularity mapping technique and spatially weighted principal component analysis (PCA) to delineate geochemical anomalies; Arias et al. (2012) demonstrated the robustness of C-A multifractal model and singularity technique for enhancing weak geochemical anomalies, in comparison with TS methods; and Zuo et al. (2013) compared the robustness of
C-A and S-A fractal models to singularity analysis in order to delineate geochemical anomalies in covered areas. Moreover, Zuo and Wang (2016) published a review paper to discuss and compare the theories, benefits, limitations, applications, and relations of C-A, S-A, concentration-distance (C-D), and C-V multifractal models in addition to the singularity model, especially for use in the recognition of weak anomalies associated with buried sources. In general, singularity modeling has been developed based on the local singularity exponent. Such exponents are calculated by creating a number of maps (e.g., geochemical maps) at different scales, and they quantify the scaling properties of element enrichment and depletion. According to Cheng (2012), the singularity model is highly applicable to define weak but complex anomalous areas associated with mineral deposits, especially in areas covered or concealed by deserts, regolith, or vegetation. The singularity phenomenon can be described in both 2D and 3D spaces based on a power-law relation between the area (A) or volume (V) occupied by the target mineralization and the total amount of the target metal/element in that area (m(A)) or volume (m(V)). Such relationships have been formulated as follows (Cheng 2007). For 3D:

$$m(V_i) \propto V_i^{\alpha/3}$$

where α is the singularity index. The metal/element concentration in this equation can be defined as C(V_i) = m(V_i)/V_i, and consequently:

$$C(V_i) \propto V_i^{\alpha/3 - 1}$$

The same process for 2D models would result in:

$$m(A_i) \propto A_i^{\alpha/2}$$

Then,

$$C(A_i) \propto A_i^{\alpha/2 - 1}$$

The singularity index can be calculated using the following equation (Liu et al. 2019):

$$X = c \cdot \varepsilon^{\alpha - E}$$

where X denotes element concentration, c represents a constant value, ε is the normalized distance measure, α is the singularity index, and E is the Euclidean dimension (usually 2 in geochemical anomaly mapping) (Agterberg 2012; Zuo et al. 2013). In order to estimate the singularity index, we can
Singularity Analysis, Fig. 1 Simplified geology of Cyprus showing the Troodos Ophiolite (TO) flanked by the younger Circum-Troodos Sedimentary Succession (CTSS) and coastal alluvium-colluvium deposits (Sadeghi 2020; after Cohen et al. 2011)
apply window-based methods. The process of estimating the singularity index is discussed in detail in Cheng (2007, 2012) and Zuo et al. (2013). The main point to take into account is that anomalous positive singularity values (with α < 2) and negative singularity values (with α > 2) are usually associated with enrichment and depletion of element concentrations, respectively. However, when α ≈ 2, there is no positive or negative geochemical singularity (Xiao et al. 2012).
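A window-based estimate of the local singularity index can be sketched as follows: mean concentrations are computed in square windows of increasing half-width ε around each cell, and α is obtained from the slope of log C(ε) versus log ε using C(ε) ∝ ε^(α−E) with E = 2. The raster, the enriched patch, and the set of window sizes are hypothetical, and no attempt is made to reproduce the specific implementations of Cheng (2007, 2012).

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical geochemical raster (element concentrations on a 101 x 101 grid)
grid = rng.lognormal(0.0, 0.3, size=(101, 101))
grid[48:53, 48:53] *= 4.0                      # a small enriched (anomalous) patch

def local_singularity(img, i, j, half_widths=(1, 2, 3, 4, 5)):
    """Window-based estimate of the local singularity index alpha at cell (i, j):
    mean concentration C(eps) in square windows of half-width eps follows
    C(eps) ~ eps**(alpha - E) with E = 2, so alpha = 2 + slope of log C vs log eps."""
    eps, cmean = [], []
    for w in half_widths:
        win = img[max(i - w, 0):i + w + 1, max(j - w, 0):j + w + 1]
        eps.append(w)
        cmean.append(win.mean())
    slope = np.polyfit(np.log(eps), np.log(cmean), 1)[0]
    return 2.0 + slope

print(local_singularity(grid, 50, 50))   # inside the enriched patch: alpha < 2 expected
print(local_singularity(grid, 10, 10))   # background: alpha close to 2
```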
Singularity Analysis, Fig. 2 Location of Cyprus-style VMS Cu deposits and mines, the host basalt units, and the 5,515 soil sampling locations (Sadeghi 2020)

Singularity Analysis, Fig. 3 Classified singularity model of the Cyprus-style VMS Cu mineralization in Cyprus (Sadeghi 2020, 2021)

Case Study: Cyprus
Sadeghi (2020) applied the singularity model to the high-density soil geochemical atlas of Cyprus (GAC) with the
objective to isolate patterns associated with the Cyprus-style Cu deposits (against a high but variable background Cu content in most of the lithologies hosting the mineral deposits) (Figs. 1 and 2). Factor analysis had previously been completed on the dataset by Cohen et al. (2011, 2012a, b). All significant factors are related to general lithologically controlled geochemical variations in the soil; at the scale of sampling undertaken, none of the factors could be related to either the effects of mineralization or anthropogenic influences. The singularity model was therefore applied to the third factor (Fig. 3). The model clearly delineates the highly anomalous areas, including the relevant known mineral deposits and mines. Moreover, several other areas in eastern and SW Cyprus have been recognized that could be potential targets for further exploration.
Summary and Conclusions
The singularity index, from a multifractal point of view, can be used to quantify the amount of enrichment or depletion associated with mineralization. It can also be used to generate characteristic maps of mineralization based on different classes, from background to highly anomalous populations, especially in concealed, buried, and undiscovered areas. Phenomena such as mineralization have both spatial and temporal properties and, even in quite small areas or over short time intervals, both properties can be defined by spatial and temporal singularity models.
Cross-References ▶ Concentration-Area Plot ▶ Exploration Geochemistry ▶ Spectrum-Area Method
Bibliography Afzal P, Khakzad A, Moarefvand P, Rashidnejad Omran N, Esfandiari B, Fadakar Alghalandis B (2010) Geochemical anomaly separation by multifractal modeling in Kahang (Gor Gor) porphyry system, Central Iran. J Geochem Explor 104:34–46 Afzal P, Fadakar Alghalandis Y, Khakzad A, Moarefvand P, Rashidnejad Omran N (2011) Delineation of mineralization zones in porphyry Cu deposits by fractal concentration-volume modeling. J Geochem Explor 108:220–232 Afzal P, Fadakar Alghalandis Y, Moarefvand P, Rashidnejad Omran N, Asadi Haroni H (2012) Application of power-spectrum-volume fractal method for detecting hypogene, supergene enrichment, leached and barren zones in Kahang Cu porphyry deposit, Central Iran. J Geochem Explor 112:131–138 Agterberg FP (1994) Fractal, multifractals and change of support. In: Dimitrakopoulus R (ed) Geostatistics for the next century. Kluwer, Dordrecht, pp 223–234
1337 Agterberg FP (2001) Multifractal simulation of geochemical map patterns. In: Merriam DF, Davis JC (eds) Geologic modeling and simulation in sedimentary systems. Kluwer-Plenum, New York, pp 327–346 Agterberg FP (2012) Multifractals and geostatistics. J Geochem Explor 122:113–122 Albanese S, Cicchella D, Lima A, De Vivo B (2018) Geochemical mapping of urban areas. Environ Geochem 2nd ed, pp 133–151 Arias M, Gumiel P, Martín-Izard A (2012) Multifractal analysis of geochemical anomalies: a tool for assessing prospectivity at the SE border of the Ossa Morena Zone, Variscan Massif (Spain). J Geochem Explor 122:101–112 Bai J, Porwal A, Hart C, Ford A, Yu L (2010) Mapping geochemical singularity using multifractal analysis: application to anomaly definition on stream sediments data from Funin Sheet, Yunnan, China. J Geochem Explor 104:1–11 Bölviken B, Stokke PR, Feder J, Jössang T (1992) The fractal nature of geochemical landscapes. J Geochem Explor 43:91–109 Bonham-Carter GF, Agterberg FP, Wright DF (1988) Integration of geological datasets for gold exploration in Nova Scotia. Photogramm Eng Rem Sci 54:1585–1592 Bonham-Carter GF, Agterberg FP, Wright DF (1989) Weights of evidence modeling: a new approach to mapping mineral potential. In: Agterberg FP, Bonham-Carter GF (eds) Statistical applications in the Earth sciences. Geological survey of Canada, Ottawa, Ontario Paper, vol 89, pp 171–183 Cámara J, Gómez-Miguel V, Ángel Martín M (2016) Identification of bedrock lithology using fractal dimensions of drainage networks extracted from medium resolution LiDAR Digital Terrain Models. Pure Appl Geophys 173:945–961 Carranza EJM (2009) Geochemical anomaly and mineral prospectivity mapping in GIS. In: Handbook of exploration and environmental geochemistry, vol 11. Elsevier, Amsterdam. 368 p Cheng Q (1999) Multifractal interpolation. In: SJ Lippard, A Naess, R Sinding-Larsen (eds) Proceedings 5th annual conferences international association mathematical geology, Trondheim, Norway 245–250 Cheng Q (2007) Mapping singularities with stream sediment geochemical data for prediction of undiscovered mineral deposits in Gejiu, Yunnan Province, China. Ore Geol Rev 32:314–324 Cheng Q (2008) Non-linear theory and power-law models for information integration and mineral resources quantitative assessments. Math Geol 40:503–532 Cheng Q (2012) Singularity theory and methods for mapping geochemical anomalies caused by buried sources and for predicting undiscovered mineral deposits in covered areas. J Geochem Explor 122:55–70 Cheng Q (2014) Vertical distribution of elements in regolith over mineral deposits and implications for mapping geochemical weak anomalies in covered areas. Geochem Explor Environ Anal 14:277–289 Cheng Q, Agterberg FP (1996) Multifractal modeling and spatial statistics. Math Geol 28:1–16 Cheng Q, Agterberg FP (2009) Singularity analysis of ore-mineral and toxic trace elements in stream sediments. Comput Geosci 35: 234–244 Cheng Q, Zhao P (2011) Singularity theories and methods for characterizing mineralization processes and mapping geo-anomalies for mineral deposit prediction. Geosci Front 2:67–79 Cheng Q, Agterberg FP, Ballantyne SB (1994) The separation of geochemical anomalies from background by fractal methods. J Geochem Explor 51:109–130 Cheng Q, Agterberg FP, Bonham-Carter GF (1996) A spatial analysis method for geochemical anomaly separation. J Geochem Explor 56: 183–195 Cheng Q, Xu Y, Grunsky E (1999) Integrated spatial and spectral analysis for geochemical anomaly separation. 
In: Lippard SJ,
1338 Naess A, Sinding-Larsen R (eds) Proc. of the IAMG conference, vol 1. Trondheim, Norway, pp 87–92 Chiprés J, Castro-Larragoitia J, Monroy M (2009) Exploratory and spatial data analysis (EDA-SDA) for determining regional background levels and anomalies of potentially toxic elements in soils from Catorce-Matehuala, Mexico. Appl Geochem 24:1579–1589 Cohen DR, Bowell RJ (2014) Chap 24: Exploration geochemistry. In: Scott SD (ed) Treatise on geochemistry: Vol. 13, Geochemistry of mineral deposits. Elsevier. ISBN 978 174223 306 2 Cohen DR, Rutherford NF, Morisseau E, Zissimos AM (2011) The geochemical atlas of Cyprus. UNSW Press, Sydney Cohen DR, Rutherford NF, Morisseau E, Christoforou E, Zissimos AM (2012a) Anthropogenic versus lithological influences on soil geochemical patterns in Cyprus. Geochem Explor Environ Anal 12: 349–360 Cohen DR, Rutherford NF, Morisseau E, Zissimos AM (2012b) Geochemical patterns in the soils of Cyprus. Sci Total Environ 420: 250–262 Darnley AG (1990) International geochemical mapping: a new global project. J Geochem Explor 39:1–13 Daya Sagar BS, Cheng Q, Agterberg F (2018) Handbook of mathematical geosciences. Springer, Cham Demetriades A, Birke M, Albanese S, Schoeters I, De Vivo B (2015) Continental, regional and local scale geochemical mapping. J Geochem Explor 154:1–5 Feder J (1988) Fractals. Plenum, New York Fortescue JAC (1992) Landscape geochemistry: retrospect and prospect. Appl Geochem 7:1–53 Gonçalves MA, Mateus A, Oliveira V (2001) Geochemical anomaly separation by multifractal modelling. J Geochem Explor 72(2): 91–114 Grunsky EC, de Caritat P (2020) State-of-the-art analysis of geochemical data for mineral exploration. Geochem Explor Environ Anal 20(2): 217–232 Guillén MT, Delgado J, Albanese S, Nieto JM, Lima A, De Vivo B (2011) Environmental geochemical mapping of Huelva municipality soils (SW Spain) as a tool to determine background and baseline values. J Geochem Explor 109:59–69 He J, Yao S, Zhang Z, You G (2013) Complexity and productivity differentiation models of metallogenic indicator elements in rocks and supergene media around Daijiazhuang Pb-Zn deposit in Dangchang County, Gansu Province. Nat Resour Res 22:19–36 Khalajmasoumi M, Sadeghi B, Carranza EJM, Sadeghi M (2017) Geochemical anomaly recognition of rare earth elements using multifractal modeling correlated with geological features, Central Iran. J Geochem Explor 181:318–332 Li C, Ma T, Shi J (2003) Application of a fractal method relating concentrations and distances for separation of geochemical anomalies from background. J Geochem Explor 77:167–175 Lima A, De Vivo B, Cicchella D, Cortini M, Albanese S (2003) Multifractal IDW interpolation and fractal filtering method in environmental studies: an application on regional stream sediments of Campania Region (Italy). Appl Geochem 18:1853–1865 Limpert E, Stahel WA, Abbt M (2001) Log-normal distributions across the sciences: keys and clues. Bioscience 51:341–352 Liu Y, Cheng Q, Carranza EJM, Zhou K (2019) Assessment of geochemical anomaly uncertainty through geostatistical simulation and singularity analysis. Nat Resour Res 28:199–212 Luz F, Mateus A, Matos JX, Gonçalves MA (2014) Cu- and Zn-soil anomalies in the NE Border of the South Portuguese Zone (Iberian Variscides, Portugal) identified by multifractal and geostatistical analyses. Nat Resour Res 23:195–215 Madani N, Sadeghi B (2019) Capturing hidden geochemical anomalies in scarce data by fractal analysis and stochastic modeling. 
Smoothing Filter
Alejandro C. Frery
School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand

Synonyms

Low-pass filter

Definition

Smoothing filters aim at improving the signal-to-noise ratio.

Consider real-valued images defined on a finite regular grid S of size m × n. Denote this support S = {1, ..., m} × {1, ..., n} ⊂ ℕ². Elements of S are denoted by (i, j) ∈ S, where 1 ≤ i ≤ m is the row and 1 ≤ j ≤ n is the column, or by s ∈ {1, ..., mn}. An image is then a function of the form f : S → ℝ, i.e., f ∈ ℝ^S, and a pixel is a pair (s, f(s)). A central notion in filters is that of a neighborhood. The neighborhood of any site s ∈ S is any set of sites that does not include s, denoted ∂s ⊂ S \ {s}, obeying the symmetry relationship t ∈ ∂s ⟺ s ∈ ∂t, where "\" denotes the set difference, i.e., A \ B = A ∩ Bᶜ, and Bᶜ is the complement of the set B. Image filters on the data domain are typically defined with respect to a relatively small square neighborhood of odd side ℓ of the form

$$\partial_{(i,j)} = \left( \left[ i - \tfrac{\ell-1}{2},\, i + \tfrac{\ell-1}{2} \right] \times \left[ j - \tfrac{\ell-1}{2},\, j + \tfrac{\ell-1}{2} \right] \right) \cap S \setminus \{(i, j)\}, \qquad (1)$$

"small" in the sense that ℓ ≪ m and ℓ ≪ n. We define the mask on the support

$$M = \left[ -\tfrac{\ell-1}{2},\, \tfrac{\ell-1}{2} \right] \times \left[ -\tfrac{\ell-1}{2},\, \tfrac{\ell-1}{2} \right]$$

with (odd) side ℓ, and it is denoted h_M. Figure 1a shows the image we will use to illustrate the effect of smoothing filters. The image has 235 × 235 pixels, and it shows the amplitude data of a polarimetric synthetic aperture radar, horizontal-vertical channel. The effect of speckle is evident, but it is possible to see linear features.
Smoothing Filter, Fig. 1 Original image and transect: (a) original image; (b) middle horizontal transect (observation vs. column)
The "perfect" filter would eliminate the speckle while retaining the details. Figure 1b shows the values of the pixels in line 128, i.e., the transect at the middle of the image. Although the image shows dark and bright regions along this transect, the individual observations obscure this structure due to the presence of speckle.
Smoothing Filter, Fig. 2 Mean filtered images and transects: original data in lilac, 3 × 3 in teal, 7 × 7 in brown, and 13 × 13 in pink

Linear Filters

Consider C : ℝ^S → ℝ^S an image transformation. It is said to be linear if, for any pair of real numbers α, β and any pair of images f₁, f₂ ∈ ℝ^S, it holds that C(αf₁ + βf₂) = αC(f₁) + βC(f₂). Linear filters belong to the class of linear transformations.
They are defined by means of a mask, and they are also called convolutional filters. The convolution of the image f ∈ ℝ^S by the mask h_M ∈ ℝ^M is a new image g ∈ ℝ^S given by

$$g(i, j) = \sum_{-\frac{\ell - 1}{2} \,\le\, i',\, j' \,\le\, \frac{\ell - 1}{2}} f(i - i', j - j')\, h_M(i', j'), \qquad (2)$$

in every (i, j) ∈ S, and it is denoted g = f * h_M.
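The convolution in Eq. (2) can be implemented directly. The following Python sketch is not part of the original entry; it uses only NumPy, and the edge-replication border policy is an assumption chosen for illustration, since the entry does not prescribe how borders are handled.

```python
import numpy as np

def convolve2d(image, mask):
    """Convolve an image with a small odd-sided mask, as in Eq. (2).

    Border pixels are handled by edge replication; this is one possible
    choice, not prescribed by the entry.
    """
    ell = mask.shape[0]               # odd side of the mask
    half = (ell - 1) // 2
    padded = np.pad(image, half, mode="edge")
    out = np.empty_like(image, dtype=float)
    rows, cols = image.shape
    for i in range(rows):
        for j in range(cols):
            window = padded[i:i + ell, j:j + ell]
            # flip the window so that the indices match f(i - i', j - j')
            out[i, j] = np.sum(window[::-1, ::-1] * mask)
    return out

# 3 x 3 mean mask: every entry is the reciprocal of the number of entries
h_mean = np.full((3, 3), 1.0 / 9.0)

rng = np.random.default_rng(0)
noisy = rng.gamma(shape=1.0, scale=0.1, size=(32, 32))  # toy speckle-like image
smoothed = convolve2d(noisy, h_mean)
print(noisy.std(), smoothed.std())   # the filtered image has lower variability
```

For the symmetric mean masks used below, flipping the window has no effect, but it keeps the sketch faithful to the convolution as written in Eq. (2).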
Smoothing Filter, Fig. 3 Median filtered images and transects: original data in lilac, 3 × 3 in teal, 5 × 5 in brown, and 7 × 7 in pink
If all the elements of the mask are nonnegative, then it is customary to say that the resulting filter is a "low-pass filter." This is related to the fact that, in the frequency-domain representation, such filters tend to reduce the high-frequency content, which is mainly related to noise and to small details. In the following we illustrate some of these filters and their effect on the data presented in Fig. 1a. The class of mean filters is defined by masks h_M in which some or all of the entries take the same positive value and the rest are zero. In order to preserve the mean value of the image f, the usual choice for this value is the reciprocal of the number of nonzero entries of h_M, so that the entries sum to one, as presented in the following:

$$h_M^5 = \begin{pmatrix} 0 & 1/5 & 0 \\ 1/5 & 1/5 & 1/5 \\ 0 & 1/5 & 0 \end{pmatrix}, \qquad h_M^9 = \begin{pmatrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \end{pmatrix},$$

and $h_M^{169} = \big(h(i, j) = 1/169\big)$ in every −6 ≤ i, j ≤ 6. Figure 2a, b, and c are the result of applying convolutions with the masks $h_M^9$, $h_M^{49}$, and $h_M^{169}$ to the image shown in Fig. 1a. These filtered images show the typical effect of low-pass filters: the noise is reduced, but small details are also affected. The larger the support of the mean filter, the more intense both effects are. The transect shown in Fig. 2d shows the effect of the mean filters. The detail presented in the inset plot illustrates how the noise reduction goes along with the blurring of sharp edges.
Statistical Filters

This kind of filter employs statistical ideas. Many of them also belong to the class of linear filters, but many others do not. The median filter returns in each coordinate s the median value of the observations belonging to ∂s. Its effect is similar to that of the mean filter, but it tends to preserve edges better, at the expense of not reducing additive noise as effectively as the latter does. On the other hand, it is excellent for reducing impulsive noise. Figure 3a, b, and c show the effect of the median filter and how it varies according to the window size. Figure 3d shows the original and median-filtered values along the transect.
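As an illustration of the median filter described above, the following Python sketch (not from the original entry; a plain NumPy implementation, with edge replication at the borders assumed for simplicity) shows its effect on impulsive noise:

```python
import numpy as np

def median_filter(image, ell=3):
    """Median filter with an ell x ell window (ell odd).

    Each output pixel is the median of the observations in the window
    centered on it; borders are handled by edge replication.
    """
    half = (ell - 1) // 2
    padded = np.pad(image, half, mode="edge")
    out = np.empty_like(image, dtype=float)
    rows, cols = image.shape
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.median(padded[i:i + ell, j:j + ell])
    return out

# Impulsive ("salt") noise: the median filter removes isolated outliers
# that a mean filter would only spread out.
rng = np.random.default_rng(1)
image = np.ones((16, 16))
image[rng.random((16, 16)) < 0.05] = 50.0   # a few impulsive outliers
print(median_filter(image, ell=3).max())    # close to 1 unless outliers cluster
```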
More Details

A more model-based approach can be found in the book by Velho et al. (2008). Other important references are the works by Barrett and Myers (2004), Jain (1989), Lim (1989), Gonzalez and Woods (1992), Myler and Weeks (1993), and Russ (1998), among many others. Frery and Perciano (2013) discuss these and other filters, and how to implement them, while Frery (2012) discusses their properties and applications to digital document processing. Choosing or designing a smoothing filter requires a good knowledge of both the kind of data and noise, and of the desired effect.

Summary

Smoothing filters aim at reducing noise while improving the signal-to-noise ratio. They often achieve this at the expense of introducing blurring and losing details.

Bibliography
Barrett HH, Myers KJ (2004) Foundations of image science. Pure and applied optics. Wiley-Interscience, New York
Frery AC (2012) Image filtering. In: Mello CAB, Oliveira ALI, Santos WP (eds) Digital document analysis and processing. Nova Science Publishers, pp 53–69. https://www.novapublishers.com/catalog/product_info.php?products_id=28733
Frery AC, Perciano T (2013) Introduction to image processing with R: learning by examples. Springer. http://www.springer.com/computer/image+processing/book/978-1-4471-4949-1
Gonzalez RC, Woods RE (1992) Digital image processing, 3rd edn. Addison-Wesley, Reading
Jain AK (1989) Fundamentals of digital image processing. Prentice-Hall International Editions, Englewood Cliffs
Lim JS (1989) Two-dimensional signal and image processing. Prentice Hall signal processing series. Prentice Hall, Englewood Cliffs
Myler HR, Weeks AR (1993) The pocket handbook of image processing algorithms in C. Prentice Hall, Englewood Cliffs
Russ JC (1998) The image processing handbook, 3rd edn. CRC Press, Boca Raton
Velho L, Frery AC, Miranda J (2008) Image processing for computer graphics and vision, 2nd edn. Springer, London. https://doi.org/10.1007/978-1-84800-193-0
Spatial Analysis Qiyu Chen2, Gang Liu1,2, Xiaogang Ma3 and Xiang Que4 1 State Key Laboratory of Biogeology and Environmental Geology, Wuhan, Hubei, China 2 School of Computer Science, China University of Geosciences, Wuhan, Hubei, China 3 Department of Computer Science, University of Idaho, Moscow, ID, USA 4 Computer and Information College, Fujian Agriculture and Forestry University, Fuzhou, China
Definition Spatial analysis is the quantitative study of spatial phenomena in geography and, more broadly, in earth science. Its main ability is to
manipulate spatial data into different forms and to extract and mine potential information and knowledge. Spatial analysis derives from geographic information science (GIScience) and has been expanded and applied to a wide range of fields, including geoscience, architecture, environmental science, etc. Spatial analysis has been a core of Geographic Information Systems (GIS). Spatial analysis capability, especially for the extraction and transmission of hidden spatial information, is the main feature that distinguishes GISs from other information systems.
Situation and Development of Spatial Analysis Technologies

Since the appearance of maps, people have consciously or unconsciously carried out various types of spatial analysis, for example, measuring the distances and areas between geographic elements on a map, and using maps for tactical research and strategic decision-making. With the application of modern computer technology to cartography and geography, geographic information systems (GIS) began to take shape and develop. Digital maps stored in computers opened up a broader field of application. Using computers to analyze maps, extract information, discover knowledge, and support spatial decision-making has become an important research topic in GIS. "Spatial analysis" has also become a specialized term in this field. Spatial analysis is the core and soul of GIS, and it is one of the main features distinguishing GISs from other information systems (Cressie and Wikle 2015). Spatial analysis, combined with the attribute information of spatial data, can provide powerful and abundant analysis and mining functions for spatial data. Therefore, the position of spatial analysis in GIS is self-evident. So far, GIS and spatial analysis technologies have been applied in a wide range of fields including earth science, architecture, environmental science, social science, etc. (Fotheringham and Rogerson 2013). Spatial analysis aims to obtain derivative information and new knowledge from the relationships between spatial objects. It is a complex process of extracting information from one or more spatial data sources (Burrough 2001). Spatial analysis uses geographic computing and spatial characterization to mine potential spatial patterns and knowledge. Its essence includes 1) detecting patterns in spatial data, 2) studying the relationships between multivariate data and establishing spatial data models, 3) making spatial data more intuitive so as to express their potential meaning, and 4) improving the prediction and control capabilities for geospatial events. Obviously, spatial analysis mainly uses the joint analysis of spatial data and spatial models to mine the potential information among spatial objects. The basic information of these spatial objects consists of their spatial location, distribution, form, distance,
orientation, topological relationship, etc. Among them, distance, orientation, and topological relationship constitute the spatial relationships of spatial objects. These are the spatial characteristics between geographic entities and can be used as the basis for data organization, query, analysis, and reasoning. By combining the spatial data and attribute data of spatial objects, spatial calculation and analysis can be performed for many specialized tasks (De Smith et al. 2018). Many spatial analysis methods have been implemented in GIS software. For instance, a large number of spatial analysis tools have been integrated in ArcToolBox, such as spatial information classification, overlay, network analysis, neighborhood analysis, geostatistical analysis, etc. (Fischer and Getis 2009).
Research Objects

Spatial analysis is a general term for related methods of analyzing spatial data, and it is a sign of the advancement of GISs. Early GISs emphasized simple spatial queries, and their analysis functions were rather limited. With the development of GIS, users need more and more complex spatial analysis functions. These requirements have promoted the development of spatial analysis technology and have also produced a variety of spatial analysis functions. According to the different features of spatial data, spatial analysis can be divided into: ① analysis operations based on spatial graphic data; ② data operations based on non-spatial attributes; ③ joint operations on spatial and non-spatial data. The basis of spatial analysis is geospatial data. Its main goal is to use a variety of geometric and logical operations, mathematical statistical analysis, algebraic operations, and other mathematical methods to solve practical problems of geographic space.
Main Contents Spatial analysis involves a variety of contents, which can be summarized into the following types. 1. Spatial location: the location information of a spatial object is transferred by means of the spatial coordinate system. It is the foundation of spatial object characterization. 2. Spatial distribution: group positioning information of similar spatial objects, including distribution, trend, comparison, etc. 3. Spatial morphology: geometric form of spatial objects. 4. Spatial distance: the proximity of space objects. 5. Spatial relationship: related relationship of spatial objects, including topology, orientation, similarity, correlation, etc.
Basic Methods

Spatial Query and Measurement
Spatial query obtains the location and attribute information of spatial objects according to certain conditions, to form a new data set. Spatial query methods can be divided into the following types: a) location query, b) hierarchical query, c) regional query, d) conditional query, and e) spatial relationship query. There are a variety of spatial relationships between spatial entities, including topological, sequence, distance, orientation, and so on. Querying and locating spatial entities through spatial relationships is one of the features of GISs that distinguishes them from other information systems. Generally, GIS software has the function of measuring points, lines, and areas. Spatial measurement has a different meaning for each feature type: a) point features (0D): coordinates; b) linear features (1D): length, direction, curvature; c) area features (2D): area, perimeter, shape, etc.; d) body features (3D): volume, surface area, etc.

Buffer Analysis
Buffer analysis is one of the spatial analysis tools used to quantify the proximity of geospatial objects. It aims to automatically build a set of polygons around points, lines, areas, and other geographic entities. Proximity describes how close two geospatial objects are. In reality, buffer zones reflect the range of influence or service area of a geospatial object. According to the features of geospatial objects (point, line, area, and body), corresponding buffer analysis methods have been developed.

Overlay Analysis
The overlay analysis of GIS is the operation of superimposing data composed of related thematic layers to generate a new data layer. The result is a synthesis of the attributes of the original two or more layers. Overlay analysis includes not only the comparison of spatial relationships, but also the comparison of attribute relationships. Overlay analysis can be divided into the following categories: visual information overlay, points and polygons overlay, lines and polygons overlay, polygons overlay, and raster overlay. Logical operations are commonly used in overlay analysis, and logical expressions are used to analyze and process the logical relationships between geospatial objects from different layers.

Network Analysis
Network analysis is a basic model in operations research. Its fundamental purpose is to study and plan how to arrange a network and make it run best (Okabe and Sugihara 2012). The analysis and modeling of geospatial networks (such as transportation
networks) and urban infrastructure networks (such as various network cables, power lines, telephone lines, water supply and drainage pipelines, etc.) are the main objects of GIS network analysis. The basic idea is that human activities always tend to choose the best spatial location according to a certain goal. Such problems are numerous in social and economic activities, so it is of great significance to study network problems in GIS. Network analysis includes: path analysis (seeking the best path), address matching (essentially a query of geospatial locations), and resource allocation.

Spatial Statistical Analysis
Due to the interaction between spatial phenomena in different directions and at different distances, traditional mathematical statistics cannot adequately solve the problems of spatial sampling, spatial interpolation, and the relationship between two or more spatial datasets. Therefore, the methods of spatial statistical analysis emerged. In the 1960s, Matheron formed a new branch of statistics through extensive theoretical research, namely, spatial statistics, also called geostatistics (Matheron 1963). Spatial statistics is based on the theory of regionalized variables. It uses the variogram as the main tool to study the spatial interaction and laws of variation of geospatial objects or phenomena (Unwin 1996). Spatial statistical methods assume that all the values in the study area are non-independent and correlated with each other. In the context of space or time, this is called autocorrelation. By detecting whether the variation at a position depends on the variation at other positions in its neighborhood, it can be judged whether the variation bears spatial autocorrelation. The important task of spatial statistical analysis is to reveal the correlation rules of spatial data and use these rules to predict unknown points. Spatial statistics contains two significant tasks: 1) analyzing the variograms or covariograms of spatial variability and structures, and 2) kriging interpolation for local spatial estimation. Spatial interpolation generates a continuous surface using samples collected at different locations. Kriging is the most common and widely used spatial interpolation method. Kriging and its variants are based on the theoretical analysis of the variogram or covariogram. They are methods for unbiased and optimal estimation of regionalized variables in a limited area.
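To make the buffer and overlay operations described above concrete, here is a minimal Python sketch. It uses the third-party Shapely library as one possible tool; the entry does not prescribe any particular software, and the coordinates and distances are purely illustrative.

```python
# A minimal sketch of buffer and overlay analysis with Shapely
# (an assumed, illustrative choice of library).
from shapely.geometry import Point, Polygon

# Buffer analysis: a 2-unit service area around a point object (e.g., a well).
well = Point(3.0, 4.0)
service_area = well.buffer(2.0)          # polygon approximating a disk

# Overlay analysis: intersect the service area with a land-use parcel
# to obtain a new layer combining both geometries.
parcel = Polygon([(0, 0), (6, 0), (6, 6), (0, 6)])
overlay = service_area.intersection(parcel)

print(round(service_area.area, 2))       # close to pi * 2**2
print(round(overlay.area, 2))            # part of the buffer inside the parcel
print(parcel.contains(well))             # a simple spatial relationship query
```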
Conclusions and Outlook Spatial analysis is a spatial data analysis technique based on the location and morphology of geospatial objects. Its
purpose is to discover, extract, mine, and transmit potential spatial information and knowledge. Spatial analysis technology has been applied in many fields including geography, geology, surveying and mapping, architecture, environmental science, and social science. Different application fields attach different meanings to spatial analysis. Their emphases differ, but they all explain the connotation of spatial analysis from different aspects: either focusing on geometric analysis, or focusing on geostatistics and modeling. In summary, GIS spatial analysis is a technology that uses geometric analysis, statistical analysis, mathematical modeling, geographic computation, and other methods to describe, analyze, and model the spatial relationships among geospatial entities, and further provides services for spatial decision-making. In particular, spatial statistics, or geostatistics, has become the basis of quantitative research in the geosciences. Geostatistics is widely used in the characterization of complex heterogeneous geological phenomena and structures, mineral prospectivity mapping, reserve calculation of mineral resources, reservoir prediction and simulation, etc. With the further development of artificial intelligence and big data technologies, using artificial neural networks, machine learning, and deep learning to mine and analyze the deeper correlations and knowledge hidden behind massive data has become a hotspot and key issue in the joint field of mathematical geoscience and GIS.
Cross-References ▶ Geostatistics ▶ Interpolation ▶ Spatial Statistics
Bibliography Burrough PA (2001) GIS and geostatistics: essential partners for spatial analysis. Environ Ecol Stat 8(4):361–377 Cressie N, Wikle C (2015) Statistics for spatio-temporal data. Wiley, Hoboken De Smith M, Longley P, Goodchild M (2018) Geospatial analysis – a comprehensive guide, 6th edn. The Winchelsea Press Fischer MM, Getis A (eds) (2009) Handbook of applied spatial analysis: software tools, methods and applications. Springer Science & Business Media Fotheringham S, Rogerson P (eds) (2013) Spatial analysis and GIS. CRC Press Matheron G (1963) Principles of geostatistics. Econ Geol 58(8): 1246–1266 Okabe A, Sugihara K (2012) Spatial analysis along networks: statistical and computational methods. Wiley, Hoboken Unwin DJ (1996) GIS, spatial analysis and spatial statistics. Prog Hum Geogr 20(4):540–551
Spatial Autocorrelation Donato Posa and Sandra De Iaco Department of Economics, Section of Mathematics and Statistics, University of Salento, Lecce, Italy
Synonyms Covariance function; Spatial correlation; Variogram
Definition

Spatial autocorrelation is an assessment of the correlation between two random variables which describe the same aspect of the phenomenon under study, referred to two locations of the domain. The prefix "auto" is justified since, in some sense, spatial autocorrelation quantifies the correlation of a variable with itself over space. However, the expressions "spatial autocorrelation" and "spatial correlation" are used interchangeably in the literature. Most of the theoretical results of classical statistics consider the observed values as independent realizations of a random variable: this assumption makes the statistical theory much easier. Unfortunately, this hypothesis cannot be valid if the observations are measured in space (or in time). Historically, the concept of spatial autocorrelation has naturally been considered an extension of temporal autocorrelation: however, time is one-dimensional and only goes in one direction, ever forward, whereas physical space has (at least) two dimensions. Any subject which regards data collected at spatial locations, such as soil science, ecology, atmospheric science, geology, image processing, epidemiology, forestry, or astronomy, needs suitable tools to analyze the dependence between observations at different locations.
A Brief Overview

In recent years, spatial data analysis has developed peculiar methods to provide a quantitative description of several spatial variables in various fields of application, from environmental science to mining engineering, from medicine to biology, from economics to finance and insurance. The present contribution will be devoted to spatial autocorrelation for geostatistical data: indeed, besides geostatistics, distributions on lattices and spatial point processes are considered further branches of spatial statistics. In
particular, geostatistical data refer to observations related to an underlying spatially continuous phenomenon, i.e., to measurements which can be taken at any point of a given spatial domain and utilize the formalism of the random functions (RFs) (Christakos 2005; Journel and Huijbregts 1981; Matheron 1971). Most of the applications use the variogram or the covariance function to characterize spatial correlation. A further and more general development to geostatistical methods has been provided by the theory of the intrinsic random functions of order k (Matheron 1973) and by multiple point geostatistics (Krishnan and Journel 2003). Although real-valued covariance functions have been mostly utilized in several applications, it should be properly underlined that a covariance, according to Bochner’s theorem, is a complex valued function: this relevant aspect is often neglected. For this purpose, in recent years, a significant and more general contribution to the theory of spatial autocorrelation has been given thanks to the formalism of complex RFs, which generalize the theory of real RFs (Posa 2020); indeed, the theory of complex valued RFs provides an important contribution in the study of various phenomena in oceanography, signal analysis, meteorology, and geophysics. In particular, in the last part of this entry, it will be underlined which tools of spatial autocorrelation are more appropriate and suitable, according to the various and different case studies.
Real Valued Random Functions

Geostatistical techniques rely on the theory of RFs: in particular, in the whole first part of the entry the discussion will be devoted to real valued RFs, whereas in the last part of the entry a brief outline on complex valued RFs will also be given. Consider the value z(s), s ∈ D, of a spatial variable z (which belongs to the set of the real numbers), where s is the vector of the spatial location and D is the spatial domain. Any sampled measure z(s), with s ∈ D, is considered as a realization of the random variable Z(s). The set of all random variables over the domain D is called a RF. Hence, the sampled values of a spatial phenomenon can be interpreted as a finite realization of a spatial RF (SRF), i.e.,

$$\{Z(s);\ s \in D\}, \qquad D \subseteq \mathbb{R}^n, \quad n \in \mathbb{N}^+. \qquad (1)$$

By assuming the existence of the first and second order moments for Z, the SRF can be decomposed as follows:

$$Z(s) = m(s) + Y(s), \qquad (2)$$

where m(s) = E[Z(s)] is the expected value of Z, usually called drift, and Y is a zero mean SRF, which describes the random fluctuations (or the micro scale variability) of the phenomenon under study around the expected value. In applied sciences the distribution function which characterizes the SRF is usually unknown and cannot be derived empirically due to the small number of realizations (in most cases only one sequence of measurements is available). Thus, the characterization of the SRF is limited to its statistical moments of order up to two. The part of the general theory that studies the properties of a SRF utilizing only the first and second order moments is called correlation theory. In particular, the expected value m (moment of first order), the covariance C, and the semivariogram γ (moments of second order) of an SRF Z are defined, respectively, as follows:

$$m(s) = E[Z(s)], \qquad s \in D, \qquad (3)$$

$$C(s_i, s_j) = E\left[(Z(s_i) - m(s_i))(Z(s_j) - m(s_j))\right], \qquad s_i \in D,\ s_j \in D, \qquad (4)$$

$$2\gamma(s_i, s_j) = \mathrm{Var}\left[Z(s_i) - Z(s_j)\right], \qquad s_i \in D,\ s_j \in D. \qquad (5)$$

Note that 2γ is called variogram and differs from the semivariogram by the constant factor 2.

Stationarity

Stationarity is a prior decision which refers to the SRF, not to the sampled values; this hypothesis makes inference possible from the observed values over the spatial domain D. The SRF Z is said to be stationary of order two if its expected value exists and is independent of the location, and its covariance C exists and depends only on the separation vector h between two spatial locations, i.e.,

$$E[Z(s)] = m, \qquad s \in D, \qquad (6)$$

$$C(h) = E[(Z(s + h) - m)(Z(s) - m)], \qquad s \in D,\ s + h \in D. \qquad (7)$$

As a consequence, the semivariogram γ also exists and depends only on the separation vector h between two spatial locations, i.e.,

$$2\gamma(h) = E\left[(Z(s + h) - Z(s))^2\right], \qquad s \in D,\ s + h \in D. \qquad (8)$$

In the case of a stationary random field, the covariance function, which is represented by a one-parameter function, is called covariogram. The covariance function usually decreases when the distance between two spatial locations increases; on the other hand, the semivariogram, by definition, is a variance, hence it usually increases when the distance between two spatial locations increases. It is well known that the covariance C and the semivariogram γ are connected through the following relationship:
$$\gamma(h) = C(0) - C(h). \qquad (9)$$
A detailed and interesting overview and comprehensive discussion about stationarity is given in Myers (1989). Note also that the variogram and the covariance functions are the main tools utilized to describe the spatial dependence; moreover, the corresponding models are used to make predictions.
Intrinsic Hypotheses

A weaker stationarity hypothesis, often recalled in the literature, assumes that, for every vector h, the increment (Z(s + h) − Z(s)) is second order stationary: in this case, Z is called an intrinsic RF (IRF), i.e.,

$$E[Z(s + h) - Z(s)] = 0, \qquad 2\gamma(h) = E\left[(Z(s + h) - Z(s))^2\right].$$

Hence, the processes that satisfy the hypotheses of second-order stationarity are a subset of the processes that satisfy the intrinsic hypotheses. This aspect justifies why the semivariogram function γ is more utilized than the covariance function in several applications.

Intrinsic Random Functions of Order k

In geostatistics, it is well known that if there is a trend in the data, then a bias affects the variogram of the residuals in universal kriging. Historically, this has been the starting point for constructing models in such a way as to reach stationarity by considering increments of higher order than the ones defined through the intrinsic hypotheses: indeed, these last are defined as IRF of order zero. Through the IRF of order k (IRFk) theory (Matheron 1973), a wider class of covariance functions (the generalized covariance functions) has been obtained and applied in order to overcome the bias mentioned above and to reach stationarity through a suitable order of increments. The theory of IRFk generalizes the approach utilized in time series (ARIMA models): indeed, Z is an IRFk, with k ∈ ℕ, if

$$W(s) = \sum_{i=1}^{m} \lambda_i Z(s_i + s), \qquad s \in D,$$

is second order stationary, where s_i ∈ D, i = 1, ..., m, and Λ = (λ_1, ..., λ_m)^T is a vector of real numbers such that (f_j(s_1), ..., f_j(s_m)) Λ = 0, where f_j, j = 1, ..., k + 1, are mixed monomials. The generalized covariances, which characterize IRFk, are even functions and are unique up to an even polynomial of degree 2k. In summary, the RFs which are second order stationary are a subset of the RFs which are intrinsically stationary, i.e., IRF of order 0 (IRF0), and these last are a subset of IRF1 and so forth. A detailed discussion on IRFk, and on the properties and construction of generalized covariances, can be found in Chiles and Delfiner (1999), in Christakos (1984), and in Cressie (1993).

Properties of Covariance and Semivariogram Functions

The family of the real covariance functions satisfies the following properties:

$$|C(h)| \le C(0), \qquad C(-h) = C(h), \qquad (10)$$

and then C(0) ≥ 0. If C_i, i = 1, ..., n, are covariance functions defined in ℝⁿ and a_i, i = 1, ..., n, are non-negative coefficients, then C_P and C_S, defined as follows:

$$C_P(h) = \prod_{i=1}^{n} C_i(h), \qquad C_S(h) = \sum_{i=1}^{n} a_i C_i(h),$$

are covariance functions. If C_n(h), n ∈ ℕ, are covariance functions in ℝⁿ, then C(h) = lim_{n→∞} C_n(h) is a covariance function, if the limit exists. If C(h; a) is a covariance in ℝⁿ for all values of a ∈ A ⊆ ℝ and if μ(da) is a positive measure on A, then ∫_A C(h; a) μ(da) is a covariance function, if the integral exists for all h ∈ D. Moreover, the condition for a function to be a covariance is that it be positive definite, namely:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\, C(s_i - s_j) \ge 0 \qquad (11)$$

for any s_i ∈ D, any a_i ∈ ℝ, i = 1, ..., n, and any positive integer n. The function C is strictly positive definite if it excludes the possibility that the quadratic form in (11) equals zero, for all n ∈ ℕ⁺, any choice of distinct points (s_i), and all a_i ∈ ℝ (with at least one different from zero). Note that strict positive definiteness is desirable, since it ensures the invertibility of the kriging coefficient matrix; a thorough discussion on strict positive definiteness for covariance functions is given in De Iaco and Posa (2018).

A semivariogram function satisfies the following properties:

$$\gamma(-h) = \gamma(h), \qquad \gamma(0) = 0, \qquad \lim_{\|h\| \to \infty} \frac{\gamma(h)}{\|h\|^2} = 0;$$

if γ_i, i = 1, ..., n, are semivariogram functions defined in ℝⁿ and a_i, i = 1, ..., n, are non-negative coefficients, then γ_S(h) = Σ_{i=1}^{n} a_i γ_i(h) is a semivariogram function. Moreover, a semivariogram is a conditionally negative definite function, namely,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\, \gamma(s_i - s_j) \le 0, \qquad \sum_{i=1}^{n} a_i = 0, \qquad (12)$$

for any s_i ∈ D, a_i ∈ ℝ, i = 1, ..., n, and any positive integer n. Although a number of studies provide theoretical results in terms of the covariance, the spatial correlation is usually analyzed through the variogram, which is preferred to the covariance for different reasons (Cressie 1993).
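As a small numerical illustration of condition (11) — not part of the original entry, using only NumPy and an exponential covariance with illustrative parameters — one can build the covariance matrix of a valid model at a few arbitrary locations and verify that it is positive (semi)definite:

```python
import numpy as np

# Exponential covariance C(h) = c0 * exp(-h / a); parameters are illustrative.
def exp_cov(h, a=1.0, c0=1.0):
    return c0 * np.exp(-np.asarray(h, dtype=float) / a)

rng = np.random.default_rng(42)
sites = rng.uniform(0, 5, size=(30, 2))                  # arbitrary spatial locations
dists = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
cov_matrix = exp_cov(dists)

# Condition (11): the quadratic form a' K a is nonnegative for every vector a,
# i.e., all eigenvalues of the covariance matrix are nonnegative.
eigenvalues = np.linalg.eigvalsh(cov_matrix)
print(eigenvalues.min() >= -1e-10)                       # True (up to round-off)
```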
Peculiar Features of Spatial Autocorrelation

Some relevant peculiar characteristics, such as isotropy, separability, and symmetry, concerning spatial autocorrelation will be described hereafter and will be given in terms of the covariance function. Note that the definition of isotropy can also be given in terms of the variogram; however, the definition of separability can only be described in terms of the covariance function. These special features, which are described in detail in De Iaco et al. (2019), play an important role for the choice of a suitable correlation model for the spatial variable.

Isotropy/Anisotropies

Geostatistical applications usually utilize, although it is very restrictive, the hypothesis of isotropy. Let Z be a SRF which is second order stationary and let C be its covariance function. Then Z is isotropic on ℝⁿ (n ∈ ℕ⁺), if there exists a function C₀ such that:

$$C(s_1 - s_2) = C_0(\|h\|), \qquad s_1, s_2 \in D, \quad s_1 - s_2 = h, \qquad (13)$$

where ‖·‖ indicates the norm in the Euclidean space ℝⁿ. Definition (13) can also be given in terms of the variogram. If C is an isotropic covariance function in ℝⁿ, then C is also an isotropic covariance function in ℝᵐ, with m < n; the same property holds for a variogram function γ. However, the converse may not be true. As a consequence, the definition of anisotropy stems from the rejection of (13). In Yaglom (1987), some relevant properties concerning Eq. (13) are described in detail. In classical geostatistics, the class of anisotropic SRF has often been classified in two main subsets, according to two different kinds of anisotropy: geometric anisotropy and zonal anisotropy, which can be equally defined in terms of the covariance or variogram. The first subset includes SRF characterized by a variogram function which presents the same sill value in all directions, but whose range changes with the spatial direction. Hence, given an isotropic covariance function C₀, it can be transformed to a geometric anisotropic covariance function C through a linear transformation of its lag vector h ∈ ℝⁿ, i.e., C(h) = C₀(‖Ah‖), where the matrix A is obtained as the product of a rotation matrix with a diagonal matrix of scaling parameters. On the other hand, according to Journel and Huijbregts (1981), by setting equal to zero some values of the previous diagonal matrix, a zonal anisotropy is obtained; hence, zonal anisotropy can be defined as a special case of geometric anisotropy. A covariance function characterized by zonal anisotropy can be obtained through the sum of covariance functions, where each covariance is defined on a proper subspace of the spatial domain. As shown hereafter, the following covariance function in ℝⁿ is given as the sum of the models C₁ and C₂, i.e.,

$$C(h) = C_1(h_1) + C_2(h_2), \qquad h = (h_1, h_2),$$

where h₁ ∈ ℝ^{n₁} and h₂ ∈ ℝ^{n₂}, with ℝⁿ = ℝ^{n₁} × ℝ^{n₂}. Zimmerman (1993) introduced a peculiar classification of anisotropy. In particular, he dropped the term "zonal anisotropy" in favor of the more explanatory terms "range anisotropy" (also slope anisotropy), "sill anisotropy," and "nugget anisotropy," and kept the classical term "geometric anisotropy," which is interpreted as a special case of range anisotropy.

Separability

Let C be a covariance function on ℝⁿ;

• if C(h) = C₁(h₁)C₂(h₂)···C_n(h_n), and each Cᵢ : ℝ → ℝ is a covariance function, i = 1, ..., n, then C is fully separable;
• if C(h) = C₁(h₁)C₂(h₂)···C_k(h_k), k < n, and each Cᵢ is a covariance function on ℝ^{nᵢ}, i = 1, ..., k, with ℝⁿ = ℝ^{n₁} × ℝ^{n₂} × ··· × ℝ^{n_k}, then C is partially separable.

If a covariance function is fully separable, then it is partially separable, whereas the converse is not true.

Symmetry

Symmetry is a further characteristic of a SRF which is second order stationary; in particular:

• A SRF is axially symmetric if C(h₁, ..., hⱼ, ..., h_n) = C(h₁, ..., −hⱼ, ..., h_n) for a given j. This property is named axial or reflection symmetry in ℝ².
• A SRF is quadrant symmetric if C(h₁, ..., hⱼ, ..., h_n) = C(h₁, ..., −hⱼ, ..., h_n) for all j = 1, ..., n.

In particular, in ℝ², a SRF is
• Laterally or diagonally symmetric, if C(h₁, h₂) = C(h₂, h₁), (h₁, h₂) ∈ ℝ².
• Completely symmetric, if both the diagonal and reflection symmetry properties are satisfied, i.e., C(h₁, h₂) = C(−h₁, h₂) = C(h₂, h₁) = C(−h₂, h₁), (h₁, h₂) ∈ ℝ².

Note that the set of isotropic covariance functions is a subset of the class of symmetric covariance functions.

Remarks

Symmetry, strict positive definiteness, and separability do not require any form of stationarity for the SRF. On the other hand, the property of isotropy is appropriate only for second order stationary SRFs. A covariance function can be separable and isotropic if and only if all the factors of the product model are Gaussian covariance functions (De Iaco et al. 2020).
Covariance and Variogram Models

Given the difficulties of verifying conditions (11) and (12), it is advisable for users to look for the best model of spatial correlation, for a given case study, among the wide parametric families whose members are known to respect these conditions. Specific details about the construction and characteristics of covariance and variogram models are given in Chiles and Delfiner (1999), Cressie (1993), and Journel and Huijbregts (1981). Several parametric families of covariance and variogram models can be found in the previously mentioned geostatistical textbooks. Among the various variogram models, the most utilized, such as the exponential, the Gaussian, the rational quadratic, and the power model, are valid in ℝⁿ, n ≥ 1, whereas the spherical model is valid in ℝⁿ with n ≤ 3. With the exception of the power model, which is unbounded, all the previous models are bounded. Spatial correlation with an alternation of positive and negative values, caused by a periodicity of the spatial phenomenon, is often modeled by the hole effect or wave variogram. Some isotropic semivariogram models proposed in the literature for a random function that satisfies at least the intrinsic hypotheses are reported below. For simplicity of notation, the semivariogram is assumed to depend on the modulus of h, that is γ(h) = γ(‖h‖), where h = ‖h‖.

• Spherical model.

$$\gamma(h) = \begin{cases} C\left[1.5\,\dfrac{h}{a} - 0.5\left(\dfrac{h}{a}\right)^3\right], & 0 \le h \le a, \\ C, & h > a, \end{cases} \qquad (14)$$

where a ∈ ℝ⁺ is the range and C ∈ ℝ⁺ is the sill value.

• Exponential model.

$$\gamma(h) = C\left[1 - \exp(-h/a)\right], \qquad (15)$$

where a ∈ ℝ⁺ is the range and C ∈ ℝ⁺ is the sill value. This model reaches the sill value asymptotically; the value a₀ for which γ(a₀) = 0.95C is a₀ = 3a, and it is called the effective range.

• Gaussian model.

$$\gamma(h) = C\left[1 - \exp\left(-h^2/a^2\right)\right], \qquad (16)$$

where a ∈ ℝ⁺ is the range and C ∈ ℝ⁺ is the sill value. Similarly to the previous model, it reaches the sill value asymptotically, and the value a₀ for which γ(a₀) = 0.95C is a₀ = √3 a.

• Power model.

$$\gamma(h) = \omega h^{\alpha}, \qquad (17)$$

with α ∈ [0, 2]. This function is concave for 0 < α ≤ 1, while it is convex for 1 < α < 2.

• Hole effect model. The hole effect models are used to describe the behavior of a semivariogram which does not grow monotonically as h increases.

– Hole Effect 1.

$$\gamma(h) = C\left[1 - \cos(h/a)\right], \qquad (18)$$

with a ∈ ℝ⁺ and C ∈ ℝ⁺. This model, valid in ℝ, presents a hole effect, described by a periodic component, without damping in the oscillations.

– Hole Effect 2.

$$\gamma(h) = C\left[1 - \exp(-h/a_1)\cos(h/a_2)\right], \qquad (19)$$

with a₁, a₂ ∈ ℝ⁺ and C ∈ ℝ⁺. This model is valid in ℝ; moreover, it is valid in ℝ² if and only if a₂ ≥ a₁ and in ℝ³ if and only if a₂ ≥ a₁√3 (Yaglom 1987).

– Hole Effect 3.

$$\gamma(h) = C\left[1 - (a/h)\sin(h/a)\right], \qquad (20)$$

with a ∈ ℝ⁺ and C ∈ ℝ⁺. This model is valid in ℝ³ and, similarly to the previous model, presents a damping effect in the oscillations. Figure 1 shows the semivariogram models analytically described above. Further classes and specific properties of correlation models, constructed through the modified Bessel functions, can be found in Matern (1980).
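To make the model formulas above concrete, a short Python sketch follows; it is not part of the original entry, uses only NumPy, and implements the spherical, exponential, and Gaussian semivariogram models of Eqs. (14)–(16) with range a and sill C:

```python
import numpy as np

def spherical(h, a, C):
    """Spherical semivariogram, Eq. (14): reaches the sill C exactly at h = a."""
    h = np.asarray(h, dtype=float)
    return np.where(h <= a, C * (1.5 * h / a - 0.5 * (h / a) ** 3), C)

def exponential(h, a, C):
    """Exponential semivariogram, Eq. (15): sill reached asymptotically."""
    h = np.asarray(h, dtype=float)
    return C * (1.0 - np.exp(-h / a))

def gaussian(h, a, C):
    """Gaussian semivariogram, Eq. (16): parabolic behavior near the origin."""
    h = np.asarray(h, dtype=float)
    return C * (1.0 - np.exp(-(h ** 2) / a ** 2))

lags = np.linspace(0.0, 10.0, 6)
print(spherical(lags, a=5.0, C=2.0))            # equals 2.0 beyond the range
print(exponential(15.0, a=5.0, C=2.0))          # ~0.95 * C at the effective range 3a
print(gaussian(5.0 * 3 ** 0.5, a=5.0, C=2.0))   # ~0.95 * C at sqrt(3) * a
```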
Spatial Autocorrelation, Fig. 1 (a) Spherical model; (b) Exponential model; (c) Gaussian model; (d) Power model; (e) hole effect 1 model; (f) hole effect 2 model; (g) hole effect 3 model
Characteristics of Spatial Autocorrelation

In the present section some characteristics of the variogram functions are briefly discussed, essentially because they are useful for choosing an appropriate model of spatial correlation. For this purpose, the behavior near the origin and the behavior at large distances are often the relevant issues.

Sill and Range

Concerning the behavior of a semivariogram at large distances, two typical situations are usually encountered: the semivariogram systematically increases at large distances, or it stabilizes around a particular value, called the sill, while the corresponding distance at which the sill is reached is called the range. In this last case, for distances greater than the range, there is an absence of spatial correlation.
Nested Structures
The variogram can reveal nested structures, that is, hierarchical structures, each characterized by its own range (Journel and Huijbregts 1981). Behavior Near the Origin
The importance of the behavior of a variogram close to the origin is related to the regularity of the spatial variable. In particular, a parabolic behavior is typical of a spatial variable which is very regular and the corresponding RF is differentiable in the mean square sense. A linear behavior is typical of a spatial variable which is not so regular as in the previous case, while the corresponding RF is continuous but not differentiable in the mean square sense. If the semivariogram presents a discontinuity at the origin, called nugget effect, then the spatial variable is very irregular. The denomination
nugget effect originates from the strong variability observed in gold deposits. However, the discontinuity of a variogram at the origin can have several causes, well described in the classical geostatistical references (Journel and Huijbregts 1981). Finally, if the variogram function is a flat curve, then this behavior is typical of an absence of spatial correlation.

Hole Effect
If the variogram function presents some bumps for some distances, which correspond to negative values (holes) of the covariance function, then the physical interpretation of this behavior derives from the presence of couples of locations, at the same distances, where relatively low values of the spatial variable are combined with relatively high values and vice versa. Periodicities
As is well known, temporal observations frequently present a periodic behavior, because of the natural influence of cycles which characterize human activities and natural phenomena: this periodic behavior is captured by a correlation measure. On the other hand, spatial variables are rarely characterized by periodicities, except in some peculiar cases, such as the behavior of sedimentary rocks.

Presence of a Drift
If the sample variogram increases faster than a parabola at large distances, then this behavior reflects the presence of a trend in the data, hence the expected value of the SRF cannot be assumed constant over the spatial domain.

Structural Analysis

The whole geostatistical analysis can be summarized through the following steps:

1. Look for a model of the SRF from which the data could reasonably be derived.
2. Estimate the variogram as a measure of the spatial correlation exhibited by the data.
3. Fit a suitable model to the estimated variogram.
4. Use this last model in the kriging system for spatial prediction.

In particular, structural analysis is executed on the realizations of a second order stationary SRF. These realizations correspond directly to the observed values (if the macro scale component can reasonably be supposed constant over the domain) or to the residuals otherwise. Structural analysis begins with the estimation of the semivariogram or covariogram of the residual SRF Y defined in (2). Given the set A = {s_i, i = 1, 2, ..., n} of data locations, the semivariogram can be estimated through the sample semivariogram γ̂ as follows:
$$\hat{\gamma}(r_s) = \frac{1}{2\,|L(r_s)|} \sum_{L(r_s)} \left[ Y(s + h) - Y(s) \right]^2, \qquad (21)$$

where |L(r_s)| is the cardinality of the set L(r_s) = {(s + h) ∈ A, s ∈ A : h ∈ Tol(r_s)}, and Tol(r_s) is a specified tolerance region around r_s. Similarly, the sample covariogram Ĉ is defined below:

$$\hat{C}(r_s) = \frac{1}{|L(r_s)|} \sum_{L(r_s)} \left[ Y(s + h) - \bar{Y} \right] \left[ Y(s) - \bar{Y} \right], \qquad (22)$$
where Ȳ is the sample mean. The second step of structural analysis consists in fitting a theoretical model to the sample spatial semivariogram or covariogram. As already underlined in one of the previous sections, various classes of covariance functions or semivariograms are available. Variogram or covariogram modeling has been a research focus in this field for a long time. Maximum likelihood and least squares are the two common ways to achieve this model fitting goal. Some details on the fitting aspects can be found in Cressie (1993). After modeling the spatial empirical variogram/covariogram surface, the subsequent step is to evaluate the reliability of the fitted model through the application of cross-validation and jackknife techniques. Then, if the model performance is satisfactory, the same model can be used to predict the variable under study over the spatial domain by using kriging. Through the cross-validation technique all the observed values are removed, one at a time, and at each location where the observed value has been temporarily removed, the spatial variable is estimated by utilizing all the remaining observed values and the selected variogram model. Finally, the estimated values are compared with the observed ones through the correlation coefficient. On the other hand, the jackknife technique operates in a different way; in particular, two different data sets of observed values are considered. The first data set is the one used in structural analysis to estimate and model the correlation function (variogram); this last model is then used to predict the spatial variable at the locations of the second data set. Hence, these last estimated values are compared with the true values (which are known) of the second data set, called, for this reason, the control data set. From a computational point of view, the literature offers various packages which perform structural analysis in two and three dimensions, such as the Geostatistical Software Library GSLib and its version for Windows WinGSLIB (Deutsch and Journel 1998), SGeMS (Remy et al. 2009), as well as the commercial software ISATIS. Moreover, in the R environment, there are various packages devoted to geostatistical modeling, such as gstat and RGeostats (Pebesma and Wesseling 1998; Renard et al. 2014).
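The sample semivariogram of Eq. (21) can be sketched in a few lines of Python. The block below is not part of the original entry: it uses only NumPy, assumes an isotropic setting, and takes simple distance bins as the tolerance regions Tol(r_s).

```python
import numpy as np

def sample_semivariogram(coords, values, bin_edges):
    """Isotropic sample semivariogram, Eq. (21), with distance-bin tolerance regions.

    coords: (n, 2) array of data locations; values: (n,) array of observations;
    bin_edges: increasing lag-bin boundaries defining Tol(r_s).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # pairwise distances and squared increments over all pairs (i < j)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    sq_inc = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)
    dist, sq_inc = dist[iu], sq_inc[iu]

    gamma, lags = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist >= lo) & (dist < hi)
        if in_bin.any():
            # (1 / 2|L|) times the sum of squared increments in the bin
            gamma.append(0.5 * sq_inc[in_bin].mean())
            lags.append(dist[in_bin].mean())
    return np.array(lags), np.array(gamma)

rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(200, 2))
obs = np.sin(pts[:, 0] / 2.0) + 0.1 * rng.standard_normal(200)  # toy structured data
lags, gamma = sample_semivariogram(pts, obs, bin_edges=np.linspace(0, 5, 11))
print(np.round(gamma, 3))   # typically increases with lag before stabilizing
```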
Multiple Point Statistics

As previously pointed out, the analysis of spatial correlation in classical geostatistics is based on the variogram or covariance function. However, these tools, which are known as two-point statistics, may not be appropriate for modeling phenomena which present curvilinear structures or, more generally, complex spatial patterns. For this purpose, multiple point statistics (Krishnan and Journel 2003) represent a solution to overcome these difficulties, since the relations among the variables at more than two points at a time are considered. Unfortunately, spatial data are usually sparse, hence they are not very informative for the application of multiple point statistics techniques. These problems can be overcome by utilizing training images from which multiple point statistics can be derived.
Complex Random Functions

Complex RFs, as a generalization of real RFs, can be useful to model vectorial data in ℝ²: for example, a wind field can be viewed as a complex variable by considering the intensity of the wind speed and its direction at each spatial location (De Iaco and Posa 2016). Let ℝⁿ and ℂ be the n-dimensional Euclidean space and the set of complex numbers, respectively. Let Z₁ and Z₂ be the real components of the following complex RF Z:

$$Z(s) = Z_1(s) + i\, Z_2(s), \qquad s \in \mathbb{R}^n,$$

where i² = −1, with i the imaginary unit. If Z is a second order stationary RF, its covariance function is defined as follows:

$$C(h) = E\left[ (Z(s) - m)\, \overline{(Z(s + h) - m)} \right]$$

and it can be expressed in the following form: C(h) = C_re(h) + i C_im(h), where

$$C_{re}(h) = C_{Z_1}(h) + C_{Z_2}(h), \qquad C_{im}(h) = C_{Z_2 Z_1}(h) - C_{Z_1 Z_2}(h).$$

C_re is an even function and it is a real covariance function, whereas C_im is an odd function and it is not a covariance function. Note that the following properties generalize the ones given in (10) for a real valued covariance function: C(−h) = \overline{C(h)}; C(0) ≥ 0; |C(h)| ≤ C(0). The theorem of Bochner (1959) specifies that any continuous covariance function can be represented as follows:

$$C(h) = \int_{\mathbb{R}^n} \exp\left( i\, v^T h \right) dF(v), \qquad (23)$$

where F is a non-negative and finite measure on ℝⁿ. Hence, according to this theorem, a covariance is a complex valued function. A detailed discussion on the construction of complex and, in particular, of real covariance functions has recently been given in Posa (2020). In particular, the symmetry of the spectral distribution function, or the property of the spectral density function of being an even function, guarantees that the covariance is a real function. Otherwise the covariance is a complex function.
Summary and New Results

In the present section, a summary on spatial autocorrelation, as well as some recent and significant results, will be provided.

• One of the main issues, which often requires a specific answer, concerns the measure of spatial autocorrelation to be utilized for the case study at hand. Indeed, the answer to this kind of question is not as easy as it could appear or as is often suggested in the literature. In fact, in most geostatistical applications the variogram is preferred to the covariance function, essentially because the first function satisfies the intrinsic hypotheses, which describe a wider class of spatial RFs with respect to the processes which satisfy the hypotheses of second order stationarity. Several papers provide a comparison between these two correlation functions, and the above choice is often justified by further reasons (Cressie and Grondona 1992).
• On the other hand, the variogram, by definition, is a variance, hence it is a real valued function; according to Bochner's theorem, the covariance is a complex valued function, and it is suitable for analyzing vectorial data in two dimensions, as shown in some interesting applications concerning phenomena which often occur in oceanography, environmental and ecological sciences, geophysics, meteorology, and signal analysis.
• If the spatial variable is a scalar quantity, the variogram or, more generally, the formalism of the IRFk could be a suitable choice for the measure of spatial autocorrelation.
• In the presence of curvilinear or complex spatial patterns, if a training image is available, then the techniques of multiple point statistics are recommended, as shown in several applications.
• Wide classes of parametric families of complex covariance functions have recently been constructed (Posa 2020). These complex covariance models could be applied in a spatial context, in several dimensional spaces, as well as in
a spatiotemporal domain to model the correlation structure of complex valued random fields.
• Although it is well known that the difference of two covariance functions may not be a covariance function, several classes of models for the difference of two covariance functions have recently been constructed in the complex domain and in the subset of the real domain (Posa 2021). All the relevant issues and consequences which stem from this last result are also discussed in the same paper.
Cross-References ▶ Multiple Point Statistics ▶ Markov Random Fields ▶ Stationarity ▶ Variance
References Bochner S (1959) Lectures on Fourier integrals. Princeton University Press, Princeton, 338 p Chiles J, Delfiner P (1999) Geostatistics. Wiley, New York, 687 p Christakos G (1984) On the problem of permissible covariance and variogram models. Water Resour Res 20(2):251–265 Christakos G (2005) Random field models in earth sciences. Dover, Mineola, 512 p Cressie N (1993) Statistics for spatial data. Wiley, New York, 900 p Cressie N, Grondona M (1992) A comparison of variogram estimation with covariogram estimation. In: Mardia KV (ed) The art of statistical science: a tribute to G.S. Watson. Wiley, Chichester, pp 191–208 De Iaco S, Posa D (2016) Wind velocity prediction through complex kriging: formalism and computational aspects. Environ Ecol Stat 23(1):115–139 De Iaco S, Posa D (2018) Strict positive definiteness in geostatistics. Stoch Env Res Risk A 32:577–590 De Iaco S, Posa D, Cappello C, Maggio S (2019) Isotropy, symmetry, separability and strict positive definiteness for covariance functions: a critical review. Spat Stat 29:89–108 De Iaco S, Posa D, Cappello C, Maggio S (2020) On some characteristics of Gaussian covariance functions. Int Stat Rev 89(1):36–53 Deutsch CV, Journel AG (1998) GSLib: geostatistical software library and user’s guide, Applied Geostatistics Series, 2nd edn. Oxford University Press, New York Journel AG, Huijbregts CJ (1981) Mining geostatistics. Academic Press, London, 610 p Krishnan S, Journel AG (2003) Spatial connectivity: from variograms to multiple-point measures. Math Geol 35(8):915–925 Matern B (1980) Spatial variation, 2, lecture notes in statistics. Springer, New York Matheron G (1971) The theory of regionalized variables and its applications. Les Cahiers du Centre de Morphologie Mathematique in Fontainebleu, Fontainebleau, 211 p Matheron G (1973) The intrinsic random functions and their applications. Adv Appl Probab 5:439–468 Myers DE (1989) To be or not to be... Stationary? That is the question. Math Geol 21:347–362
Pebesma EJ, Wesseling CG (1998) Gstat: a program for geostatistical modelling, prediction and simulation. Comput Geosci 24(1):17–31. https://doi.org/10.1016/s0098-3004(97)00082-4
Posa D (2020) Parametric families for complex valued covariance functions: some results, an overview and critical aspects. Spat Stat 39(2):1–20. https://doi.org/10.1016/j.spasta.2020.100473
Posa D (2021) Models for the difference of continuous covariance functions. Stoch Env Res Risk A 35:1369–1386
Remy N, Boucher A, Wu J (2009) Applied geostatistics with SGeMS: a user's guide. Cambridge University Press, New York
Renard D, Desassis N, Beucher H, Ors F, Laporte F (2014) RGeostats: the geostatistical package. Mines Paris Tech
Yaglom AM (1987) Correlation theory of stationary and related random functions, vol I. Springer, Berlin, 526 p
Zimmerman DL (1993) Another look at anisotropy in geostatistics. Math Geol 25(4):453–470
Spatial Data Sabrina Maggio and Claudia Cappello Dipartimento di Scienze dell’Economia, Università del Salento, Lecce, Italy
Synonyms Geodatabase; Geographical data; Georeferenced data; Territorial data
Definition Data which present a spatial component are called spatial data (Fischer and Wang 2011). In a nutshell, they are sample data of a random field referred to a spatial domain.
Overview Spatial data are a collection of observations measured in different locations on a spatial domain. They are interpreted as a finite realization of a random field, whose distribution law (often unknown) provides a likelihood measure of the spatial evolution of the phenomena under study. Spatial data have two characteristics, that is: • They are non-repetitive, since only one observation is available in every single location. • They are spatially correlated, which means that the phenomenon under study varies in space, but the variations are correlated over some distances. For this last reason, the usual assumption of independence, made in classical statistical inference, is not reasonable in
spatial statistics. Indeed spatial data usually show dependencies along any direction in the domain, and their intensity is generally closely connected with the distance between locations and weakens as the distance increases (Cressie 1993). This feature of spatial data recalls Tobler's first law of geography: "everything is related to everything else, but near things are more related than distant things" (Tobler 1970). Spatial data are used in various scientific fields, such as agriculture, geology, environmental sciences, hydrology, epidemiology, and economics (Müller 2007). In geographical information systems, the term spatial data is used to refer to features (points, lines, and polygons) linked to a geographic location on the Earth; in this sense, the features are georeferenced. Note that each specific application often requires an appropriate data resolution, since some features of interest are scale-dependent (Flury et al. 2021).

Spatial Data, Table 1 Classification and characteristics of spatial data, together with the R packages available for their analysis

Types of data^a | Variable type | Spatial index | Examples | R packages^b
Geostatistical | Random variables, discrete or continuous | Variable defined everywhere in the domain | Soil pH; air temperature; soil clay content | gstat, geoR, geospt, RandomFields, RGeostats (MINES 2020)
Lattice | Random variables, discrete or continuous | Point/area objects fixed in the domain | Disease rate; crime rate; land use; total fertility rate | spdep, spgwr, spatialreg, DCluster
Point patterns | Random variables, discrete or continuous (if measured) | Randomly located points | Location of trees, of restaurants, of bicycle theft crimes | spatstat, splancs, spatial, SpatialKernel

^a Classification suggested by Cressie (1993)
^b An exhaustive list is available on the CRAN Task View: Analysis of Spatial Data (Bivand 2021)
Types of Spatial Data Over the past 40 years, many studies have been carried out on spatial data analysis considered from different points of view (Ripley 1981; Isaaks and Srivastava 1989; Goovaerts 1997; Chilès and Delfiner 2012; Schabenberger and Gotway 2005 and others). Given a spatial random field in a d-dimensional space {Z(s) : s D ℝd}, where Z(s) denotes a spatial random variable and s a location of the spatial domain D, the spatial data can be classified with respect to the nature of the domain D, as follows: • Geostatistical data • Lattice data • Point patterns
A summary of these spatial data classification is provided in Table 1. For further details on the spatial data classification, the readers may refer to Cressie (1993).
Geostatistical Data In the case of geostatistical data, the sample data are considered as a finite realization of the random field {Z(s) : s D}. The values that can be observed at the i-th spatial point is interpreted as a possible realization of a random variable, which is denoted as Z(si), where si represents the spatial location (Cressie 1993). Moreover, the peculiar feature of geostatistical data is that the domain D ℝd is continuous and fixed. If d ¼ 2, the spatial location s is characterized by the Cartesian coordinates (x, y). In this field of spatial data, the locations are treated as explanatory variables and the values recorded at these locations as response variables. The spatial domain is continuous since the random variable Z(s) can be observed at any point of D; hence between two spatial locations si and sj, an infinite/non-countable number of other spatial locations can be fixed. It is worth noting that the adjective continuous refers to the spatial domain and not to the type of the random variable measured on it, which could be continuous or discrete. Furthermore, the spatial domain is fixed in the sense that the spatial points si, i ¼ 1,. . ., n, are non-random (Schabenberger and Gotway 2005; Cressie 1993). It is important to highlight that since the spatial domain is continuous, it cannot be sampled totally; consequently, the sample data is sampled at a finite number of locations, chosen at random or by a deterministic method. The sampling design takes into account sampling costs, benefits, and experimental constraints (i.e., locations of measuring stations).
Spatial Data
1355
a
b
1.33 |- 24.29 24.29 |- 32.20 32.20 |- 39.55 39.55 |- 68.62
Spatial Data, Fig. 1 (a) Gray scale map and (b) contour map of the soil clay percentage in an Italian district (Source data: soil map of Italy, year 2000, https://esdac.jrc.ec.europa.eu/)
Since there are only few measurements of the phenomena under study over the spatial domain, in geostatistics the main objective is the estimation of the random variable Z(s) at unsampled spatial location, starting from observations available at a finite set of locations. The kriging is the most and widely used method for predicting the random variable in an unsampled location, by means of the observed data. This optimal prediction method requires the knowledge of a variogram/covariogram model, for this reason a relevant effort is devoted to estimate the spatial correlation function (through the variogram or the covariogram) as well as to fit a model to it. In the following, an example regarding geostatistical data is proposed. Example 1 Soil clay content is a key parameter in soil science, since it influences magnitudes and rates of several chemical, physical, and hydrological phenomena in soils. Given a spatial sampling, the observations of this feature taken at the sampled spatial locations can be considered as geostatistical data. Indeed, the soil clay content can be evaluated at any location of a given domain D, but only few measurements at some spatial locations are available. One of the main goals might be to define a continuous surface across the domain in order to estimate the spatial variability of the soil clay content over the area under study.
In Fig. 1, a location map of 485 sample points regarding the soil clay percentage in an Italian district and the corresponding kriging estimates, over the entire domain, are shown (Posa and De Iaco 2009). The estimates are obtained by using the ordinary kriging method and a variogram model.
Lattice Data Lattice data are spatial data where the domain D is a countable collection of spatial regions at which the random variable is observed. The spatial regions are such that none of them can intersect each other, and the regions are characterized by a neighborhood structure. For example, in a 2D space, the spatial domain can be expressed as the set of the areas Ai, i ¼ 1,. . ., n, which cover the territory under study D: D ¼ fA 1 , A 2 , . . . , A n g
A i D ℝ2
A1 [ A2 [ [ An ¼ D
with Ai \ Aj ¼ ; , i, j ℕ+, i 6¼ j. Hence, in this context, the spatial domain is fixed and discrete. In particular, the spatial domain is • Fixed, in the sense that the spatial regions Ai, i ¼ 1,. . ., n, in D are not stochastic (Schabenberger and Gotway 2005). • Discrete, since the number of spatial regions could be infinite, although countable.
S
1356
Spatial Data
The spatial data could refer to a regular lattice (i.e., the cells/regions show the same form and size) or irregular lattice (i.e., the spatial domain is divided into cells/regions according to the natural or policy barriers). Moreover, the data do not correspond to a specific point in space, but to the entire region. Thus, it is usually assigned to each area a representative location in space, which is usually the centroid of the region, and then recorded for each centroid the corresponding spatial coordinates (i.e., longitude and latitude) in a 2D domain. Therefore, the spatial domain can be also formalized as follows: D ¼ fði; xi , yi Þ : i ¼ 1, . . . , ng where i denotes the i-th region under study, xi and yi are the longitude and latitude of the i-th centroid, respectively. Similarly to the geostatistical context, the values that can be observed at the i-th region are interpreted as a possible realization of a random variable, which is denoted as Z(si), where si, represents the centroid of the corresponding region and refers to the representative location (Cressie 1993). In other contributions (see, for example, Schabenberger and Gotway 2005), the notation Z(Ai) is used, in order to underline the areal nature of lattice data, where Ai is the i-th sampled areal unit. It is important to point out that a specific feature of the lattice data is that they are exhaustive over the spatial domain. Moreover, they usually concern aggregated data over the sample region. However, the aggregation of the data causes sometimes some problems; in particular if the level of aggregation is chosen arbitrarily, the modifiable areal unit problem can arise (Openshaw and Taylor 1979), since the size of the areas on which aggregation is applied influences the correlation between variables. This problem occurs frequently in ecology, where most of ecological variables depend on other covariates, acting at different spatial scales (Saveliev et al. 2007). Note that, in the analysis of the lattice data, the estimation of the random variable at unsampled points is not relevant, since the phenomenon under study is known over the entire domain, but the main goal is to model the spatial process using the spatial regression method. In the model, the variable of interest is predicted by using other variables measured on the same areas as well as by including the spatial dependence of neighboring regions, since if regions are nearby, the random variables collected at these areas could be spatially correlated. In order to include the spatial component in the regression model, the covariance structure must be identified. This last requires to define previously a neighborhood structure by means of the (n n) spatial weights matrix W, whose elements wij are defined as follows:
wij ¼
1 if Ai and Aj share a common border, 0 otherwise:
ð1Þ
Note that symmetry of the weights is not a requirement (Schabenberger and Gotway 2005). Regions which are neighboring and with similar values identify a spatial pattern, which is representative of a cluster, and the selection of correlated variables with the one of interest helps in the detection of explicative factors for the analyzed phenomenon. For lattice data modeling, the wider applied approaches are based on the linear regression models. However, if the residuals highlight the presence of spatial correlation, alternative models which take into account the spatial dependence could be used. Among these, it is worth mentioning the conditional spatial autoregression (CAR), simultaneous spatial autoregression (SAR), and moving average (MA) models. In the following, an example regarding lattice data is proposed. Example 2 The total fertility rate (TFR) is the average number of children that a woman would potentially have in her child-bearing years (i.e., for ages 15–49). In Fig. 2, the map of the centroids of the regions and the color map of this demographic index, available for each region in France, are provided (De Iaco et al. 2015). This is a typical example of lattice data, where the domain D is a set of 21 counties; thus it is fixed and discrete. Moreover, the TFR is an aggregate statistic over each region.
Point Patterns Geostatistical and lattice data are measured over a fixed, nonstochastic domain, where the term “fixed” is used to highlight that the domain D does not change if different realizations of the spatial process are considered. Moreover, for these types of data, the analyst is interested in the measurement of the attribute Z over the spatial domain D. On the other hand, if the analyst intends to focus only on the location at which an event occurs, such a dataset is known as spatial point pattern. Diggle (2003) defined the point process as a “stochastic mechanism which generates a countable set of events,” where the events are the points of the spatial pattern in a bounded region. The point patterns are said to be sampled or mapped, if the events are partially observed or if all the events are recorded, respectively. For a spatial point process, the spatial domain D is a collection of points on the hyperplane D ¼ fs 1 , s 2 , . . . s n g
Spatial Data
1357
a
b
Nord Pas de Calais Haute Picardie Normandie Basse Champagne Normandie ^ Ile de Ardenne Lorraine France Pays de Alsace Bretagne la Loire Bourgogne Centre Franche Poitou Comté Charentes Limousin ^ Rhone Alpes Auvergne Aquitaine Midi Pyrénées Languedoc Roussillon
Provence Alpes ^ d’Azur Cote
0
100 200 km
Centroid of the region Classes of TFR values up to 1.43 1.44 | 1.59 1.59 | 1.75 1.75 | 1.91 1.91 | 2.07 more than 2.07
Spatial Data, Fig. 2 (a) Location map of the regional centroids and (b) color map of the yearly TFR registered in 2011 for the 21 regions of mainland France (Source data: Eurostat, 2013, http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics)
In general, no random variable Z(s) is observed at these points. Otherwise, if an attribute is measured in each point, the realization of a point process {Z(s) : s D ℝd} represents a pattern of points over the domain D. In order to distinguish these two cases, in the literature the terms unmarked and marked point pattern are often used. In particular, the former is given to the locations where the event occurs, while the latter is used when an attribute Z( ) is also observed at these points. For instance, the locations at which trees emerge in a forest, the spatial distribution of schools in a city, or the locations of an earthquake are examples of unmarked point patterns. Nevertheless, the set of locations regarding the height of trees in a forest, the size of schools in a city, or the magnitude of earthquakes are examples of marked point patterns. It is important to highlight that, in contrast with geostatistical and lattice data, which can be measured at any point of the spatial domain D, a point process takes nonzero values only in specific points that are the ones in which the underlying process occurs (Hristopulos 2020). Moreover, in spatial point pattern statistics, the spatial locations as well as the values measured in each point are treated as response variables. Given a random set D, for each region Ai D, N(Ai) denotes the number of points in the region Ai and l(Ai) is the average number of events in the same area. The point pattern is named completely random process or, alternatively, Poisson point process if the following properties are satisfied: • The intensity function l( ), which is the average number of points in each area, is constant. • The numbers of events in two areas Ai and Aj, such as Ai \ Aj ¼ ;, i, j ℕ+, i 6¼ j, in the domain D are
independent; hence the probability that an event occurs is equal in each spatial location. Note that this model is widely used in spatial point pattern analysis, to compare the locations of the events with this reference distribution. However, the Poisson point process does not always adequately describe phenomena with spatial clustering of points and spatial regularity. In the literature, a variety of other models were proposed, among these the Cox process, the inhibition process, or the Markov point process (Cressie 1993; Schabenberger and Gotway 2005). The objectives of point pattern analysis vary according to the scientist’s purposes. In general, it is relevant to focus on the spatial distribution of the observed events and make inference about the underlying process that generates them. In particular, it could be interesting to analyze the distribution of the points in space, in terms of intensity of the point pattern, as well as the existence of possible interactions between events, which may reveal the tendency of events to appear clustered, independently, or regularly spaced (Bivand et al. 2013). For example in ecology, a target could be determining the spatial distribution of a tree species over a study area or assessing whether two or more species are equally distributed. Furthermore, in spatial epidemiology, it could be challenging to check whether or not the cases of a certain disease are clustered. This approach can be also used in human sciences, to identify, for example, areas where crimes occur, the activity spaces of individuals, the catchment areas of services, such as banks and restaurants. In the following, an example regarding point patterns is proposed. Example 3 In Fig. 3, a map of bicycle theft crimes occurring in London (September 2017) is shown. This represents an
S
1358
Spatial Data Infrastructure and Generalization
0
10
20 km
bicycle theft
Spatial Data, Fig. 3 Location map of bicycle thefts in the city of London in September 2017 (Source data: UK Police Department 2017, https://data.police.uk/data/archive/)
example of unmarked spatial point pattern, which consists of 2218 locations of bicycle thefts recorded within the city of London.
Summary The spatial data include information on the location and the measured attributes. They have two fundamental characteristics: (a) dependency between specific locations along any direction, where the intensity becomes weaker as the distance arises, and (b) non-repetitive, since only one observation is available in every single location. As reported in Cressie (1993), the spatial data types have been classified with respect to the characteristics of the domain D, as geostatistical data, lattice data, and point patterns.
Cross-References ▶ Geostatistics ▶ Multiple Point Statistics ▶ Random Variable ▶ Spatial Analysis ▶ Spatial Statistics
Bibliography Bivand RS (2021) CRAN task view: analysis of spatial Data. Version 2021-03-01, URL: https://CRAN.R-project.org/view¼Spatial
Bivand RS, Pebesma E, Gomez-Rubio V (2013) Applied spatial data analysis with R, 2nd edn. Springer, New York, 405 pp Chilès JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty, 2nd edn. Wiley, Hoboken, p 734 Cressie NAC (1993) Statistics for spatial data, Revised edn. Wiley, New York, p 416 De Iaco S, Palma M, Posa D (2015) Spatio-temporal geostatistical modeling for French fertility predictions. Spat Stat 14:546–562 Diggle PJ (2003) Statistical analysis of spatial point patterns, 2nd edn. Arnold, London, p 267 Fischer MM, Wang J (2011) Spatial data analysis. Models, methods and techniques. Springer, Heidelberg/Dordrecht/London/New York, p 91 Flury R, Gerber F, Schmid B, Furrer R (2021) Identification of dominant features in spatial data. Spat Stat 41:25. https://doi.org/10.1016/j. spasta.2020.100483 Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York, p 483 Hristopulos D (2020) Random fields for spatial data modeling: a primer for scientists and engineers. Springer, Netherlands, p 867 Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics. Oxford University Press, New York, p 561 MINES ParisTech/ARMINES (2020) RGeostats: the geostatistical R Package. Free download from http://cg.ensmp.fr/rgeostats Müller WG (2007) Collecting spatial data. Optimum design of experiments for random fields. Springer, Berlin/Heidelberg, 250 pp Openshaw S, Taylor PJ (1979) A million or so correlation coefficients: three experiments on the modifiable areal unit problem. In: Wrigley N (ed) Statistical applications in the spatial sciences. Pion, London, pp 127–144 Posa D, De Iaco S (2009) Geostatistica: Teoria e Applicazioni. Giappichelli, Torino, p 264 Ripley BD (1981) Spatial statistics. John Wiley & Sons, New York, p 252 Saveliev AA, Mukharamova SS, Zuur AF (2007) Analysis and modelling of lattice data. In: Analysing ecological data. Statistics for biology and health. Springer, New York, pp 321–339 Schabenberger O, Gotway CA (2005) Statistical methods for spatial data analysis: texts in statistical science, 1st edn. CRC Press, Boca Raton, p 512 Tobler WR (1970) A computer movie simulating urban growth in the Detroit region. J Econ Geogr 46:234–240
Spatial Data Infrastructure and Generalization Jagadish Boodala, Onkar Dikshit and Nagarajan Balasubramanian Department of Civil Engineering, Indian Institute of Technology Kanpur, Kanpur, UP, India
Definition The Spatial Data Infrastructure (SDI) is a framework that facilitates the reuse and integration of spatial data and services available from different sources. SDI framework is a collection of policies, spatial data, technologies, humans, and institutional arrangements that ease the sharing and reuse of spatial data. The main component of SDI is spatial data. The
Spatial Data Infrastructure and Generalization
generalization plays an essential role in situations where the spatial data of different levels of detail or scales are to be combined and made homogeneous. The transformation of spatial data from more detailed to the abstract level of detail is called generalization.
Introduction Traditional maps were the outcome of spatial observations for many years to get the spatial context (Tóth et al. 2012). With technological advancements, conventional paper maps were replaced by digital spatial data. Initially, the public sector was responsible for capturing and producing a significant portion of the spatial data, such as National Mapping and Cadastral Agencies (Toomanian 2012). The private sector has shown interest in the spatial data when its applications have increased in areas like location-based services, agriculture, intelligent navigation, etc. The licensing issues of authoritative spatial data from public and private sectors lead to the
1359
crowdsourced data called volunteered geographic information (Toomanian 2012). The availability of spatial data from the public, private, and crowdsourced have resulted in the data explosion. Furthermore, the number of spatial data processing tools has also increased. The cost of data capturing is high; hence there is a necessity to reuse the existing spatial data. There is a need for a framework, called SDI, to make better sense and use of the current spatial data. One such legal framework to establish European SDI is INSPIRE. INSPIRE stands for the Infrastructure for Spatial Information in the European Community. INSPIRE’s main objective is to assist in policy-making related to environmental issues by improving the accessibility and interoperability of spatial data among European Union member states (Duchêne et al. 2014; INSPIRE 2007a). As the spatial data are coming from various sources, there is a possibility that the same real-world phenomena are modeled and represented by different viewpoints and also at different levels of detail or scales. The multiple representations are quite common, which can be derived and handled by a generalization process (Tóth 2007). The following sections
S
Spatial Data Infrastructure and Generalization, Fig. 1 Features of different themes at the same level of detail. (Contains OS data © Crown copyright and database right (2017))
1360
highlight the requirements set out by the INSPIRE Directive (INSPIRE 2007a) to achieve data interoperability and explain how generalization can help in achieving it.
INSPIRE: Multiple Representations and Data Consistency The possibility of combining spatial data sets, without repetitive human intervention, into coherent data sets whose value is enhanced is called data interoperability (INSPIRE 2007a). Recital 6 of the INSPIRE Directive provides five common principles of INSPIRE (INSPIRE 2007a, b). The first three principles form the basis for defining data interoperability (INSPIRE 2014). Article 8(3) and Article 10(2) of the INSPIRE Directive provide the requirements to handle multiple representations and data consistency (INSPIRE 2007a). Hence, the elements “(Q) Consistency between data,” and “(R) Multiple representations” are included as the data interoperability components (INSPIRE 2014).
Spatial Data Infrastructure and Generalization
The component “(Q) Consistency between data” provides the rules to maintain consistency between the data corresponding to the same location or between the same data collected at different levels of detail or scales (INSPIRE 2007a). However, the component “(R) Multiple representations” explains how generalization can be used to derive multiple representations (INSPIRE 2014). Another data interoperability component that specifies the target scale range is the element “(S) Data capturing.” This component defines the data capturing rules or selection criteria according to the target scale (INSPIRE 2014). The requirements of INSPIRE discussed above helps to pinpoint the situations where generalization finds its worth.
Generalization in the Context of INSPIRE The first principle of INSPIRE (INSPIRE 2007b), “data should be collected only once,” motivates to apply an automatic generalization to derive spatial data of abstract levels of
Spatial Data Infrastructure and Generalization, Fig. 2 Building features at two different levels of detail. (Contains OS data © Crown copyright and database right (2017))
Spatial Data Infrastructure and Generalization
detail. These different representations of the same real-world phenomena are called multiple representations. The generalization is a process of reducing the level of detail or scale of the spatial data by applying a broad set of operators called generalization operators. The high-level classification of generalization operators is either based on automation (Foerster et al. 2007) or the cartographer’s (Roth et al. 2011) point of view. However, the essential list of generalization operators includes selection, simplification, classification, collapse, merge, typification, displacement, enhancement, and elimination. The data capturing rules, for example, minimum area or length, defined in INSPIRE, are useful while performing the selection operation. The generalization is a complex process. It is challenging to achieve unique results because of the subjectivity involved in the process itself (Stoter 2005). The issues with generalization are its subjectivity, complexity, and immaturity. Hence, generalization service is not part of the INSPIRE data transformation services; however, multiple representations are allowed in INSPIRE (INSPIRE 2008). The modeling of multiple representations in INSPIRE paves the way for easy integration of spatial data sets from
1361
different sources. While dealing with multiple representations and data sets from different sources, it is necessary to maintain consistency between the datasets. INSPIRE identifies different types of consistency checks: within a data set, between features of different themes at the same level of detail, and between the same feature at different levels of detail (INSPIRE 2014). Figure 1 shows the scenario where the features of different themes are represented together in a product of scale 1:10000. The generalization process should consider the consistency rules defined between all the features. These rules are developed based on logical, functional, and geometrical relationships between features. For example, Building features must not overlap between themselves and with the Woodland features. The consistency rules are expressed as topological constraints to avoid inconsistencies in the data set and guide the generalization process. The “OS OpenMap – Local” product of Ordnance Survey, UK, is used for the illustration in Fig. 1. The consistency between multiple representations of the same feature at different levels of detail should be maintained. Fig. 2 explains the importance of this consistency
S
Spatial Data Infrastructure and Generalization, Fig. 3 Road features at two different levels of detail. (Contains OS data © Crown copyright and database right (2017 & 2015))
1362
requirement. It shows the two representations of the Building features in products of 1:10000 (OS OpenMap – Local) and 1: 25000 (OS VectorMap District) scales. The advantages of automatic generalization are twofold. One is in deriving the abstract representations, and the other is in deriving the consistent data sets. Maintaining links between multiple representations from existing data sets is quite challenging than maintaining it between the representations derived via automatic generalization. The established links help in maintaining data consistency (INSPIRE 2008). Figure 3 illustrates the situation frequently encountered in SDI. It considers the Road features from products of 1:10000 (OS OpenMap – Local) and 1:250000 (Strategi) scales. In this case, the spatial data user has two different data sets of different levels of detail or scales separately covering the region of his/her interest. It is clear from Fig. 3 that the spatial data sets covering the region of the user’s interest are heterogeneous. In such cases, generalization comes into the picture to homogenize the level of the detail or scale of the combined spatial data set.
Summary An increase in spatial data applications in various fields attracted investments in spatial data capturing from both the public and private sectors. Nevertheless, the data capturing cost remains high. Some policies or decisions, for example, related to the environment or disaster management, have to be taken at the national or international level. In such cases, spatial data from different sources are required to support the policy or decision-making. The spatial data from different sources cannot be integrated and used instantaneously because of different standards followed in capturing it. Also, the accessibility of the spatial data sets is not easy and quick. All these aspects lead to the SDI, and the INSPIRE Directive aims to establish a European Union SDI. The spatial data is an essential component of INSPIRE, and for that matter, of any SDI. Hence, INSPIRE defines a total of 20 elements as the data interoperability components. Among those 20 elements, generalization finds its application in only 3 components to achieve data interoperability. They are “(Q) Consistency between data,” “(R) Multiple representations,” and “(S) Data capturing.” In the future, generalization can be used in INSPIRE view services to provide high-quality cartographic data in an INSPIRE geoportal.
Cross-References ▶ Data Visualization ▶ Geographical Information Science ▶ Metadata ▶ Spatial Data
Spatial Statistics
Bibliography Duchêne C, Baella B, Brewer CA, Burghardt D, Buttenfield BP, Gaffuri J, Käuferle D, Lecordix F, Maugeais E, Nijhuis R, Pla M, Post M, Regnauld N, Stanislawski LV, Stoter J, Tóth K, Urbanke S, van Altena V, Wiedemann A (2014) Generalisation in practice within National Mapping Agencies. In: Burghardt D, Duchêne C, Mackaness W (eds) Abstracting geographic information in a data rich world. Springer International Publishing, Cham, pp 329–391 Foerster T, Stoter J, Köbben B (2007) Towards a formal classification of generalization operators. In: Proceedings of the 23rd international cartographic conference – cartography for everyone and for you. International Cartographic Association, Moscow INSPIRE (2007a) Directive 2007/2/EC of the European Parliament and of the Council of 14 March 2007 establishing an Infrastructure for Spatial Information in the European Community (INSPIRE). https:// eur-lex.europa.eu/eli/dir/2007/2/oj. Accessed 29 Oct 2020 INSPIRE (2007b) INSPIRE Principles. https://inspire.ec.europa.eu/ inspire-principles/9. Accessed 29 Oct 2020 INSPIRE (2008) D2.6: Methodology for the development of data specifications, v3.0. https://inspire.ec.europa.eu/documents/ methodology-development-data-specifications-baseline-version-d26-version-30. Accessed 29 Oct 2020 INSPIRE (2014) D2.5: Generic conceptual model, Version 3.4. https:// inspire.ec.europa.eu/documents/inspire-generic-conceptual-model. Accessed 29 Oct 2020 Roth RE, Brewer CA, Stryker MS (2011) A typology of operators for maintaining legible map designs at multiple scales. Cartograph Perspect 68:29–64. https://doi.org/10.14714/CP68.7 Stoter JE (2005) Generalisation within NMA’s in the 21st century. In: Proceedings of the 22nd international cartographic conference: mapping approaches into a changing world. International Cartographic Association, A Coruña Toomanian A (2012) Methods to improve and evaluate spatial data infrastructures. Department of Physical Geography and Ecosystem Science. Lund University, Lund Tóth K (2007) Data consistency and multiple-representation in the European spatial data infrastructure. In: 10th ICA workshop on generalisation and multiple representation, Moscow Tóth K, Portele C, Illert A, Lutz M, Nunes de Lima M (2012) A conceptual model for developing interoperability specifications in spatial data infrastructures. EUR, Scientific and technical research series, vol 25280. Publications Office of the European Union, Luxembourg
Spatial Statistics Noel Cressie and Matthew T. Moores School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, Australia
Definition Spatial statistics is an area of study devoted to the statistical analysis of data that have a spatial label associated with them. Geographers often refer to the “location information” associated with the “attribute information,” whose study defines a research area called “spatial analysis.” Many of the ways to manipulate spatial data are driven by algorithms with no
Spatial Statistics
1363
uncertainty quantification associated with them. When a spatial analysis is statistical, that is, when it incorporates uncertainty quantification, it falls in the research area called spatial statistics. The primary feature of spatial statistical models is that nearby attribute values are more statistically dependent than distant attribute values; this is a paraphrasing of what is sometimes called the First Law of Geography (Tobler 1970).
data, denoted here as Z (e.g., Ripley 1981; Upton and Fingleton 1985; and Cressie 1993), rather than the types of spatial processes Y that underly the spatial data. In this review, we classify our spatial-statistical modeling choices according to the process model (2). Then the data model, namely, the distribution of the data Z given both Y and D in (2), is the straightforward conditional-probability measure, ½ZjY, D :
Introduction Spatial statistics provides a probabilistic framework for giving answers to those scientific questions where spatiallocation information is present in the data, and that information is relevant to the questions being asked. The role of probability theory in (spatial) statistics is to model the uncertainty, both in the scientific theory behind the question, and in the (spatial) data coming from measurements of the (spatial) process that is a representation of the scientific theory. In spatial statistics, uncertainty in the scientific theory is expressed probabilistically through a spatial stochastic process, which can be written most generally as: fY ðsÞ : s D g,
ð1Þ
where Y(s) is the random attribute value at location s, and D is a subset of a d-dimensional space, for illustration Euclidean space ℝd, that indexes all possible spatial locations of interest. Contained within D is a (possibly random) set D that indexes those parts of D relevant to the scientific study. We shall see below that D can have different set properties, depending upon whether the spatial process is a geostatistical process, a lattice process, or a point process. It is convenient to express the joint probability model defined by random {Y(s) : s D} and random D in the following shorthand, [Y, D], which we refer to as the spatial process model. Now, ½Y, D ¼ ½YjD ½D ,
ð2Þ
where for generic random quantities A and B, their joint probability measure is denoted by [A, B]; the conditional probability measure of A given B is denoted by [A | B]; and the marginal probability measure of B is denoted by [B]. In this review of spatial statistics, expression (2) formalizes the general definition of a spatial statistical model given in Cressie (1993, Section 1.1). The model (2) covers the three principal spatial statistical areas according to three different assumptions about [D], which leads to three different types of spatial stochastic process, [Y | D]; these are described further in the next section “Spatial Process Models”. Spatial statistics has, in the past, classified its methodology according to the types of spatial
ð3Þ
For example, the spatial data Z could be the vector (Z(s1), . . ., Z(sn))0 , of imperfect measurements of Y taken at given spatial locations {s1, . . ., sn} D, where the data are assumed to be conditionally independent. That is, the data model is n
½ZjY, D ¼
½Zðsi ÞjY, D :
ð4Þ
i¼1
Notice that while (4) is based on conditional independence, the marginal distribution, [Z | D], does not exhibit independence: The spatial-statistical dependence in Z, articulated in the First Law of Geography that was discussed in section “Definition”, is inherited from [Y | D] and (4) as follows: ½ZjD ¼ ½ZjY, D ½YjD dY: Another example is where the randomness is in D but not in Y. If D is a spatial point process (a special case of a random set), then the data Z ¼ {N, s1, . . ., sN}, where N is the random number of points in the now-bounded region D, and D ¼ {s1, . . ., sN} are the random locations of the points. If there are measurements (sometimes called “marks”) {Z(s1), . . ., Z(sN)} associated with the random points in D, these should be included within Z. That is, Z ¼ fN, ðs1 , Z ðs1 ÞÞ, . . . , ðsN , Z ðsN ÞÞg:
ð5Þ
This description of spatial statistics given by (2) and (3) captures the (known) uncertainty in the scientific problem being addressed, namely, scientific uncertainty through the spatial process model (2) and measurement uncertainty through the data model (3). Together, (2) and (3) define a hierarchical statistical model, here for spatial data, although this hierarchical formulation through the conditional probability distributions, [Z | Y, D], [Y | D], and [D] for general Yand D, is appropriate throughout all of applied statistics. It is implicit in (2) and (3) that any parameters θ associated with the process model and the data model are known. We now discuss how to handle parameter uncertainty in the hierarchical statistical model. A Bayesian would put a
S
1364
Spatial Statistics
probability distribution on θ : Let [θ] denote the parameter model (or prior) that captures parameter uncertainty. Then, using obvious notation, all the uncertainty in the problem is expressed through the joint probability measure, ½Z, Y, D, y ¼ ½Z, Y, Djy ½y
ð6Þ
¼ ½ZjY, D, y ½YjD, y ½Djy ½y :
ð7Þ
A Bayesian hierarchical model uses the decomposition (7), but there is also an empirical hierarchical model that substitutes a point estimate y of θ into the first factor on the righthand side of (6), resulting in its being written as, Z, Y, Djy ¼ ZjY, D, y YjD, y Djy :
ð8Þ
Finding efficient estimators of θ from the spatial data Z is an important problem in spatial statistics, but in this review we emphasize the problem of spatial prediction of Y. In what follows, we shall assume that the parameters are either known or have been estimated. Hence, for convenience, we can drop y in (8) and express the uncertainty in the problem through the joint probability measure, ½Z, Y, D ¼ ½ZjY, D ½YjD ½D ,
ð9Þ
and Bayes’ Rule can be used to infer the unknowns Y and D through the predictive distribution: ½Y, DjZ ¼
½ZjY, D ½YjD ½D ; ½Z
ð10Þ
here [Z] is the normalization constant that ensures that the right-hand side of (10) integrates or sums to 1. If the spatial index set D is fixed and known then we can drop D from (10), and Bayes’ Rule simplifies to: ½YjZ ¼
½ZjY ½Y , ½Z
ð11Þ
which is the predictive distribution of Y (when D is fixed and known). It is this expression that is often used in spatial statistics for prediction. For example, the well-known simple kriging predictor can easily be identified as the predictive mean of (11) under Gaussian distributional assumptions for both (2) and (3) (Cressie and Wikle 2011, pp. 139–141). Our review of spatial statistics starts with a presentation in the next section “Spatial Process Models”, of a number of commonly used spatial process models, which includes multivariate models. Following that, the section “Spatial Discretization”, turns attention to discretization of D ℝd, which is an extremely important consideration when actually computing the predictive distribution (10) or (11). The
extension of spatial process models to spatiotemporal process models is discussed in the section “Spatiotemporal Processes”. Finally, in the section “Conclusion”, we briefly discuss important recent research topics in spatial statistics, but due to a lack of space we are unable to present them in full. It will be interesting to see 10 years from now, how these topics will have evolved.
Spatial Process Models In this section, we set out various ways that the probability distributions [Y | D], [Y ], and [D], given in Bayes’ Rule (10), can be represented in the spatial context. These are not to be confused with [Z | D] and [Z], the probability distributions of the spatial data. In many parts of the spatial-statistics literature, this confusion is noticeable when researchers build models directly for [Z | D]. Taking a hierarchical approach, we capture knowledge of the scientific process starting with the statistical models, [Y | D] and [D], and then we model the measurement errors and missing data through [Z | Y, D]. Finally, Bayes’ Rule (10) allows inference on the unknowns Y and D through the predictive distribution, [Y, D | Z]. We present three types of spatial process models, where their distinction is made according to the index set D of all spatial locations at which the process Y is defined. For a geostatistical process, D ¼ DG, which is a known set over which the locations vary continuously and whose area (or volume) is >0. For a lattice process, D ¼ DL, which is a known set whose locations vary discretely and the number of locations is countable; note that the area of DL is equal to zero. For a point process, D ¼ DP, which is a random set made up of random points in D. Geostatistical Processes In this section, we assume that the spatial locations D are given by DG, where DG is known. Hence D can be dropped from any of the probability distributions in (10), resulting in (11). This allows us to concentrate on Y and, to feature the spatial index, we write Y equivalently as {Y(s) : s DG}. A property of geostatistical processes is that DG has positive area and hence is uncountable. Traditionally, a geostatistical process has been specified up to second moments. Starting with the most general specification, we have mY ðsÞ EðY ðsÞÞ;
s DG
CY ðs, uÞ covðY ðsÞ, Y ðuÞÞ;
s, u DG :
ð12Þ ð13Þ
From (12) and (13), an optimal spatial linear predictor ^ 0) of Y(s0) can be obtained that depends on spatial data Y(s
Spatial Statistics
1365
Z (Z(s1), . . ., Z(sn))0 . This is an n-dimensional vector indexed by the data’s n known spatial locations, DG {s1, . . ., sn} DG. In practice, estimation of the parameters θ that specify completely (12) and (13) can be problematic due to the lack of replicated data, so Matheron (1963) made stationarity assumptions that together are now known as intrinsic stationarity. That is, for all s, u DG, assume EðY ðsÞÞ ¼ moY
ð14Þ
varðY ðsÞ Y ðuÞÞ ¼ 2goY ðs uÞ,
ð15Þ
where (15) is equal to CY (s, s) þ CY (u, u) 2CY (s, u). The quantity 2goY ð Þ is called the variogram, and goY ð Þ is called the semivariogram (or occasionally the semivariance). If the assumption in (15) were replaced by cov Y ðsÞ, Y ðuÞ ¼ CoY ðs uÞ, for all s, u DG ,
The optimized MSPE (17) is called the kriging variance, and its square root is called the kriging standard error: sk ðs0 Þ
E
Y ðs0 Þ Y ðs0 Þ
2
1=2
, for any s0 DG : ð18Þ
Figure 2 shows a map over DG of the kriging standard error associated with the kriging predictor mapped in Fig. 1. It can be shown that a smaller sk(s0) corresponds to a higher density of weather stations near s0. While ordinary and universal kriging produce an optimal linear predictor, there is an even better predictor, the best optimal predictor (BOP), which is
ð16Þ
40
then (16) and (14) together are known as second-order stationarity. Matheron chose (15) because he could derive optimal-spatial-linear-prediction (i.e., kriging) equations of Y(s0) without having to know or estimate moY . Here, “optimal” is in reference to a spatial linear predictor Y ðs0 Þthat minimizes the mean-squared prediction error (MSPE),
35
2
20
E
Y ðs0 Þ Y ðs0 Þ
, for any s0 DG ,
30
25
ð17Þ 15
n i¼1 li Z ðsi Þ.
The minimization in (17) is with where Y ðs0 Þ respect to the coefficients {li : i ¼ 1, . . ., n} subject to the ^ 0)) ¼ E(Y(s0)), or equivalently unbiasedness constraint, E(Y(s subject to the constraint ni¼1 li ¼ 1 on {li}. With optimally ^ 0) is known as the kriging predictor. chosen {li}, Y(s Matheron called this approach to spatial prediction ordinary kriging, although it is known in other fields as BLUP (Best Linear Unbiased Prediction); Cressie (1990) gave the history of kriging and showed that it could also be referred to descriptively as spatial BLUP. The constant-mean assumption (14) can be generalized to E(Y(s)) x(s)0 β, for s DG, which is a linear regression where the regression coefficients β are unknown and the covariate vector x(s) includes the entry 1. Under this assumption on E(Y(s)), ordinary kriging is generalized to universal ^ 0). Figure 1 shows the universalkriging, also notated as Y(s kriging predictor of Australian temperature in the month of January 2009, mapped over the whole continent DG, where the spatial locations DG ¼ {s1, . . ., sn} of weather stations ^ 0) that supplied the data Z are superimposed. Formulas for Y(s can be found in, for example, Chilès and Delfiner (2012, Section 3.4).
0
500 1000 km
Spatial Statistics, Fig. 1 Map of a kriging predictor of Australian temperature in January 2009, superimposed on spatial locations of data
3.0
2.5
2.0
1.5
1.0
0.5
0.0
Spatial Statistics, Fig. 2 Map of the kriging standard error (18) for the kriging predictor shown in Fig. 1
S
1366
Spatial Statistics
the best of all the best predictors obtained under a variety of constraints (e.g., linearity). From Bayes’ Rule (10), the predictor that minimizes the MSPE (16) without any constraints is Y*(s0) E(Y(s0) | Z), which is the mean of the predictive distribution. Notice that the BOP, Y*(s0), is unbiased, namely, E(Y*(s0)) ¼ E(Y(s0)), without having to constrain it to be so. Lattice Processes In this section, we assume that the spatial locations D are given by DL, a known countable subset of ℝd. This usually represents a collection of grid nodes, pixels, or small areas and the spatial locations associated with them; we write the countable set of all such locations as DL {s 1, s2, . . .}. Each si has a set of neighbors, N (s i) DL\si, associated with it, and whose locations are spatially proximate (and note that a location is not considered to be its own neighbor). Spatial-statistical dependence between locations in lattice processes is defined in terms of these neighborhood relations. Typically, the neighbors are represented by a spatialdependence matrix W with entries wi,j nonzero if sj N (si), and hence the diagonal entries of W are all zero. The nondiagonal entries of W might be, for example, inversely proportional to the distance, ||si sj||, or they might involve some other way of moderating dependence based on spatial proximity. For example, they might be assigned the value 1 if a neighborhood relation exists and 0 otherwise. In this case, W is called an adjacency matrix, and it is symmetric if sj N (si) whenever si N (sj) and vice versa. Consider a lattice process in ℝ2 defined on the finite grid L D ¼ {(x, y) : x, y ¼ 1, . . ., 5}. The first-order neighbors of the grid node (x, y) in the interior of the lattice are the four adjacent nodes, N (x, y) ¼ {(x 1, y), (x, y 1), (x þ 1, y), (x, y þ 1)}, shown as: ∘ ∘
∘ ∘
∘ ∘
• ∘
∘
∘
∘ • • ∘
∘ ∘
∘ ∘
• ∘
∘ ∘
∘
∘
where grid node si is represented by , and its first-order neighbors are represented by •. Nodes situated on the boundary of the grid will have less than four neighbors. The most common type of lattice process is the Markov random field (MRF), which has a conditional-probability property in the spatial domain ℝd that is a generalization of the temporal Markov property found in section “Spatiotemporal Processes”. A lattice process {Y(s) : s DL} is a MRF if, for all si DL, its conditional probabilities satisfy
Y ðsi ÞjY DL ∖si
¼ ½Y ðsi ÞjYðN ðsi ÞÞ ,
ð19Þ
where Y(A) {Y(sj) : sj A}. The MRF is defined in terms of these conditional probabilities (19), which represent statistical dependencies between neighboring nodes that are captured differently from those given by the variogram or the covariance function. Specifically, Y ðsi ÞjY DL ∖si
¼
expff ðY ðsi Þ, YðN ðsi ÞÞÞg , C
ð20Þ
where C is a normalizing constant that ensures the right-hand side of (20) integrates (or sums) to 1. Equation (20) is also known as a Gibbs random field in statistical mechanics since, under regularity conditions, the Hammersley-Clifford Theorem relates the joint probability distribution to the Gibbs measure (Besag 1974). The function f (Y(si), Y(N (si))) is referred to as the potential energy, since it quantifies the strength of interactions between neighbors. A wide variety of MRF models can be defined by choosing different forms of the potential-energy function (Winkler 2003, Section 3.2). Note that care needs to be taken to ensure that specification of the model through all the conditional probability distributions, {[Y(si) | Y(N (si))] : si DL}, results in a valid joint probability distribution, [{Y(si) : si DL}] (Kaiser and Cressie 2000). Revisiting the previous simple example of a first-order neighborhood structure on a regular lattice in ℝ2, notice that grid nodes situated diagonally across from each other are conditionally independent. Hence, DL can be partitioned into two sub-lattices DL1 and DL2 , such that the values at the nodes in DL1 are independent given the values at the nodes in DL2 and vice versa (Besag 1974; Winkler 2003, Section 8.1): • ∘ • ∘ •
∘ • ∘ • ∘
• ∘ • ∘ •
∘ • ∘ • ∘
• ∘ • ∘ •
This forms a checkerboard pattern where Y ðsÞ : s DL1 at nodes DL1 represented by • are mutually independent, given the values Y ðuÞ : u DL2 at nodes DL2 represented by ∘. Besag (1974) introduced the conditional autoregressive (CAR) model, which is a Gaussian MRF that is defined in terms of its conditional means and variances. We refer the reader to LeSage and Pace (2009) for discussion of a different lattice-process model, known as the simultaneous autoregressive (SAR) model, and a comparison of it with the CAR model. We define the CAR model as follows: For si DL, Y(si) is conditionally Gaussian defined by its first and second moments,
Spatial Statistics
1367
ð21Þ
ð22Þ
where ci,j are spatial autoregressive coefficients such that the diagonal elements c1,1 ¼ ¼ cn,n ¼ 0, and t2i are the variability parameters for the locations {si}, respectively. Under an important regularity condition (see below), this specification results in a joint probability distribution that is multivariate Gaussian. That is, Y Gau 0, ðI CÞ1 M ,
½N ðA1 Þ, N ðA2 Þ ¼ ½N ðA1 Þ ½N ðA2 Þ :
ð23Þ
where Gau (m, S) denotes a Gaussian distribution with mean vector m and covariance matrix S; the matrix M diag t21 , . . . , t2n is diagonal; and the regularity condition referred to above is that the coefficients C {ci,j} in (21) have to result in M1 (I C) being a symmetric and positive-definite matrix. With a first-order neighborhood structure, such as shown in the simple example above in ℝ2, the precision matrix is block-diagonal, which makes it possible to sample efficiently from this Gaussian MRF using sparse-matrix methods (Rue and Held 2005, Section 2.4). The data vector for lattice processes is Z (Z(s1), . . ., Z(sn))0 , where DL {s1, . . ., sn} DL. As for the previous subsection, the data model is [Z | {Y(s) : s DL}] which, to emphasize dependance on parameters θ, we reactivate earlier notation and write as [Z | Y, θ]. Now, if we write the latticeprocess model as [{Y(s) : s DL} | θ] [Y | θ], then estimation of θ follows by maximizing the likelihood, L(θ) [Z | Y, θ] [Y | θ] dY. Regarding spatial prediction, Y*(s0) E(Y(s0) | Z, θ) is the best optimal predictor of Y(s0), for s0 DL and known θ (e.g., Besag et al. 1991). Note that s0 may not belong to DL , and hence Y*(s0) is a predictor of Y(s0) even when there is no datum observed at the node s0. Inference on unobserved parts of the process Y is just as important for lattice processes as it is for geostatistical processes.
Spatial Point Processes and Random Sets A spatial point process is a countable collection of random locations D DP D. Closely related to this random set of points is the counting process that we shall call {N(A) : A D}, where recall that D indexes all possible locations of interest, and now we assume it is bounded. For example, if A is a given subset of D , and two of the random points {si} are contained in A, then N(A) ¼ 2. Since DP ¼ {si} is
ð24Þ
The basic spatial point process known as the Poisson point process has the independence property (24), and its associated counting process satisfies ½N ðAÞ ¼ expflðAÞg
lðAÞNðAÞ ; N ðAÞ!
A D,
ð25Þ
where l(A) A l(s)ds. In (25), l( ) is a given intensity function defined according to: lðsÞ lim
jdsj!0
EðY ðdsÞÞ , j ds j
ð26Þ
where δs is a small set centered at s D, and whose volume |δs| tends to 0. In (25), the special case of l(s) l, for all s D, results in a homogeneous Poisson point process, and a simulation of it is shown in Fig. 3. The simulation was obtained using an equivalent probabilistic representation for which the count random variable N(D), for D ¼ [0, 1] [0, 1], was simulated
S
0.8
varðY ðsi ÞjYðN ðsi ÞÞÞ ¼ t2i ,
random and A is fixed, N(A) is a random variable defined on the non-negative integers. Clearly, the joint distributions [N(A1), . . ., N(Am)], for any subsets {Aj : j ¼ 1, . . ., m} contained in D (possibly overlapping) and for any m ¼ 0, 1, 2, . . ., are well defined. Spatial dependence can be seen through the spatial proximity between the {Aj}. Consider just two fixed subsets, A1 and A2 (i.e., m ¼ 2) and, to avoid ambiguity caused by potentially sharing points, let A1 \ A2 be empty. Then no spatial dependence is exhibited if, for any disjoint A1 and A2, there is statistical independence; that is,
0.4
ci,j Y sj sj N ðsi Þ
0.0
EðY ðsi ÞjYðN ðsi ÞÞÞ ¼
0.0
0.4
0.8
Spatial Statistics, Fig. 3 A realization on the unit square D ¼ [0, 1] [0, 1], of the homogenous Poisson point process (25) with parameter l ¼ 50; for this realization, N(D) ¼ 46
1368
Spatial Statistics
according to (25). Then, conditional on N(D), {s1, . . ., sNðDÞ } was simulated independently and identically according to the uniform distribution,
½u ¼
1 ; uD lð D Þ 0; elsewhere:
ð27Þ
This representation explains why the homogenous Poisson point process is commonly referred to as a Completely Spatially Random (CSR) process, and why it is used as a baseline for testing for the absence of spatial dependence in a point process. That is, before a spatial model is fitted to a point pattern, a test of the null hypothesis that the point pattern originates from a CSR process, is often carried out. Rejection of CSR then justifies the fitting of spatially dependent point processes to the data (e.g., Ripley 1981; Diggle 2013). Much of the early research in point processes was devoted to establishing test statistics that were sensitive to various types of departures from the CSR process (e.g., Cressie 1993, Section 8.2). This was followed by researchers’ defining and then estimating spatial-dependence measures such as the second-order-intensity function and the K-function (e.g., Ripley 1981, Chapter 8), where inference was often in terms of method-of-moments estimation of these functions. More efficient likelihood-based inference came later; Baddeley et al. (2015) give a comprehensive review of these methodologies for point processes. From a modeling perspective, particular attention has been paid to the log-Gaussian Cox point processes; here, l( ) in (25) is random, such that {log(l(s)) : s D } is a Gaussian process (e.g., Møller and Waagepetersen 2003). This model leads naturally to hierarchical Bayesian inference for l( ) and its parameters (e.g., Gelfand and Schliep 2018). If an attribute process, {Y(si) : si DP}, is included with the spatial point process DP, one obtains a so-called marked point process (e.g., Cressie 1993, Section 8.7). For example, the study of a natural forest where both the locations {si} and the sizes of the trees {Y(si)} are modeled together probabilistically, results in a marked point process where the “mark” process is a spatial process {Y(si) : si DP} of tree size. Now, Bayes’ Rule given by (10), where both Y and D (¼ DP) are random, should be used to make inference on Y and DP through the predictive distribution [Y, DP | Z]. Here, Z consists of the number of trees, the trees’ locations, and the trees’ size measurements, as denoted in (5). After marginalization, we can obtain [DP | Z], the predictive distribution of the spatial point process DP. A spatial point process is a special case of a random set, which is a random quantity in Euclidean space that was defined rigorously by Matheron (1975). Some geological
processes are more naturally modeled as set-valued phenomena (e.g., the facies of a mineralization); however, inference for random-set processes has lagged behind those for spatial point processes. It is difficult to define a likelihood based on set-valued data, which has held back statistically efficient inferences; nevertheless, basic method-of-moment estimators are often available. The most well-known random set that allows statistical inference from set-valued data is the Boolean Model (e.g., Cressie and Wikle 2011, Section 4.4).
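The two-stage representation in (25) and (27) is straightforward to reproduce by simulation. The base-R sketch below is illustrative only: the intensity and the unit-square domain mirror the values used for Fig. 3, and it is not code taken from the sources cited above.

```r
# Simulate a homogeneous Poisson (CSR) point process on the unit square,
# following the two-step construction in the text: first draw N(D),
# then place the points independently and uniformly over D.
set.seed(1)
lambda <- 50                       # intensity (points per unit area), as in Fig. 3
area_D <- 1                        # |D| for D = [0, 1] x [0, 1]

N <- rpois(1, lambda * area_D)     # N(D) ~ Poisson(lambda * |D|), as in (25)
s <- cbind(x = runif(N), y = runif(N))  # conditional on N(D), locations are iid Uniform(D), as in (27)

head(s)
```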
Multivariate Spatial Processes The previous subsections have presented single spatial statistical processes but, as models become more realistic representations of a complex world, there is a need to express interactions between multiple processes. This is most directly seen by modeling vector-valued geostatistical processes {Y(s) : s ∈ DG}, and vector-valued lattice processes {Y(si) : si ∈ DL}, where the k-dimensional vector Y(s) ≡ (Y1(s), . . ., Yk(s))′ represents the multiple processes at the generic location s ∈ D. Vector-valued spatial point processes can be represented as a set of k point processes, {{s1,i}, . . ., {sk,i}}, and these are presented in Baddeley et al. (2015, Chapter 14). If we adopt a hierarchical-statistical-modeling approach, it is possible to construct multivariate spatial processes whose component univariate processes could come from any of the three types of spatial processes presented in the previous three subsections. This is because, at a deeper level of the hierarchy, a core multivariate geostatistical process can control the spatial dependence for processes of any type, which allows the possibility of hybrid multivariate spatial statistical processes. In what follows, we describe briefly two approaches to constructing multivariate geostatistical processes, one based on a joint approach and the other based on a conditional approach. We consider the case k = 2, namely, the bivariate spatial process {(Y1(s), Y2(s))′ : s ∈ DG}, for illustration. The joint approach involves directly constructing a valid spatial statistical model from m(s) ≡ (m1(s), m2(s))′ ≡ (E(Y1(s)), E(Y2(s)))′, for s ∈ DG, and from

cov(Yl(s), Ym(u)) ≡ Clm(s, u);  l, m = 1, 2,  (28)

for s, u ∈ DG. The bivariate-process mean m(·) is typically modeled as a vector linear regression; hence it is straightforward to model the bivariate mean once the appropriate regressors have been chosen. Analogous to the univariate case, the set of covariance and cross-covariance functions, {C11(·, ·), C22(·, ·), C12(·, ·), C21(·, ·)}, have to satisfy positive-definiteness conditions for the bivariate geostatistical model to be valid, and it is important to note that, in general, C12(s, u) ≠ C21(s, u). There are classes of valid models that exhibit symmetric cross-
dependence, namely, C12(s, u) = C21(s, u), such as the linear model of co-regionalization (Gelfand et al. 2004). These are not reasonable models for ore-reserve estimation when there has been preferential mineralization in the ore body. The joint approach can be contrasted with a conditional approach (Cressie and Zammit-Mangion 2016), where each of the k processes is a node of a directed acyclic graph that guides the conditional dependence of any process, given the remaining processes. Again consider the bivariate case (i.e., k = 2), where there are only two nodes such that Y1(·) is at node 1, Y2(·) is at node 2, and a directed edge is declared from node 1 to node 2. Then the appropriate way to model the joint distribution is through

[Y1(·), Y2(·)] = [Y2(·) | Y1(·)] [Y1(·)],  (29)

where [Y2(·) | Y1(·)] is shorthand for [Y2(·) | {Y1(s) : s ∈ DG}]. The geostatistical model for [Y1(·)] is simply a univariate spatial model based on a mean function m1(·) and a valid covariance function C11(·, ·), which was discussed in section “Geostatistical Processes”. Now assume that Y2(·) depends on Y1(·) as follows: For s, u ∈ DG,

E[Y2(s) | Y1(·)] ≡ m2(s) + ∫_{DG} b(s, v) (Y1(v) − m1(v)) dv,  (30)

cov(Y2(s), Y2(u) | Y1(·)) ≡ C2|1(s, u),  (31)

where C2|1(·, ·) is a valid univariate covariance function and b(·, ·) is an integrable interaction function. The conditional-moment assumptions given by (30) and (31) follow if one assumes that (Y1(·), Y2(·))′ is a bivariate Gaussian process. Cressie and Zammit-Mangion (2016) show that, from (30) and (31),

C12(s, u) = ∫_{DG} C11(s, v) b(u, v) dv,  (32)

C21(s, u) = ∫_{DG} C11(v, u) b(v, s) dv,  (33)

C22(s, u) = C2|1(s, u) + ∫_{DG} ∫_{DG} b(s, v) C11(v, w) b(u, w) dv dw,  (34)

for s, u ∈ DG. Along with m1(·), m2(·), and C11(·, ·), these functions (32)–(34) define a valid bivariate geostatistical process [Y1(·), Y2(·)]. A notable property of the conditional approach is that asymmetric cross-dependence (i.e., C12(s, u) ≠ C21(s, u)) occurs if b(s, u) ≠ b(u, s).

In summary, the conditional approach allows multivariate modeling to be carried out validly by simply specifying m(·) = (m1(·), m2(·))′ and two valid univariate covariance functions, C11(·, ·) and C2|1(·, ·). The strengths of the conditional approach are that only univariate covariance functions need to be specified (for which there is a very large body of research; e.g., Cressie and Wikle 2011, Chapter 4), and that only integrability of b(·, ·), the interaction function, needs to be assumed (Cressie and Zammit-Mangion 2016).
Spatial Discretization Although geostatistical processes are defined on a continuous spatial domain DG, this can limit the practical feasibility of statistical inferences due to computational and mathematical considerations. For example, kriging from an n-dimensional vector of data involves the inversion of an n × n covariance matrix, which requires order n³ floating-point operations and order n² storage in available memory. These costs can be prohibitive for large spatial datasets; hence, spatial discretization to achieve scalable computation for spatial models is an active area of research. In practical applications, spatial statistical inference is required up to a finite spatial resolution. Many approaches take advantage of this by dividing the spatial domain D into a lattice of discrete points in D, as shown in Fig. 4.

Spatial Statistics, Fig. 4 Discretization of the spatial domain D, a convex region around Australia, into a triangular lattice

As a consequence of this discretization, a geostatistical process can be approximated by a lattice process, such as a Gaussian MRF (e.g., Rue and Held 2005, Section 5.1); however, sometimes this can result in undesirable discretization errors and artifacts. More sophisticated approaches have been developed
to obtain highly accurate approximations of a geostatistical (i.e., continuously indexed) spatial process evaluated over an irregular lattice, as we now discuss. Let the original domain D be bounded and suppose it is tessellated into the areas {Aj ⊂ D : j = 1, . . ., m} that are small, non-overlapping basic areal units (BAUs; Nguyen et al. 2012), so that D = ∪_{j=1}^{m} Aj, and Aj ∩ Ak is the empty set for any j ≠ k ∈ {1, . . ., m}; Fig. 4 gives an example of triangular BAUs. Spatial basis functions {φℓ(·) : ℓ = 1, . . ., r} can then be defined on the BAUs. For example, Lindgren et al. (2011) used triangular basis functions where r > m, while fixed rank kriging (FRK; Cressie and Johannesson 2008; Zammit-Mangion and Cressie 2021) can employ a variety of different basis functions for r < m, including multi-resolution wavelets and bisquares.

Vecchia approximations (e.g., Datta et al. 2016; Katzfuss et al. 2020) are also defined using a lattice of discrete points DL in D that include the coordinates of the observed data, {s1, . . ., sn}, and the prediction locations, {s_{n+1}, . . ., s_{n+p}}. Let [X] ≡ [Z, Y], where data Z and spatial process Y are associated with the lattice DL ≡ {s1, . . ., sn, s_{n+1}, . . ., s_{n+p}}. It is a property of joint and conditional distributions that this can be factorized into a product:

[X] = [X(s1)] ∏_{i=2}^{n+p} [X(si) | X(s1), . . ., X(s_{i−1})].  (35)

In the previous section, the set of spatial coordinates DL had no fixed ordering. However, a Vecchia approximation requires that an artificial ordering is imposed on {s1, . . ., s_{n+p}}. Let the ordering be denoted by {s_(1), . . ., s_(n+p)}, and define the set of neighbors N(s_(i)) ⊂ {s_(1), . . ., s_(i−1)}, similarly to “Lattice Processes,” except that these neighborhood relations are not reciprocal: If, for j < i, s_(j) belongs to N(s_(i)), then s_(i) cannot belong to N(s_(j)). As part of the Vecchia approximation, a fixed upper bound q ≪ n on the number of neighbors is chosen. That is, |N(s_(i))| ≤ q, so that the lattice formed by {N(si) : i = 1, . . ., n + p} is a directed acyclic graph, which results in a partial order in D. The joint distribution [X] given by (35) is then approximated by

[X̃] = [X(s_(1))] ∏_{i=2}^{n+p} [X(s_(i)) | X(N(s_(i)))],  (36)

which is a Partially Ordered Markov Model (POMM; Cressie and Davidson 1998). This Vecchia approximation, X̃, is a distribution coming from a valid spatial process on the original, uncountable, unordered index set DG (Datta et al. 2016), which means that it can be used as a geostatistical process model with considerable computational advantages. For example, it can be used as a random log-intensity function, log(λ(s)), in a hierarchical point-process model, or it can be combined with other processes to define models described in section “Multivariate Spatial Processes”. However, in all of these contexts it should be remembered that the resulting predictive process, [X̃ | Z], is an approximation to the true predictive process, [X | Z].
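To illustrate the factorization (36), the base-R sketch below evaluates a Vecchia-type approximate Gaussian log-density under illustrative assumptions: a zero-mean process with an exponential covariance, an ordering by coordinates, and neighbor sets formed from the q nearest previously ordered points. It is only a prototype and not the implementation used by the packages cited above.

```r
# Vecchia-type approximation of a zero-mean Gaussian log-density, as in (36):
# each ordered variable is conditioned on at most q previously ordered
# neighbors instead of on all preceding variables. Assumes n >= 2 points.
vecchia_loglik <- function(z, s, q = 5, cov_fun = function(d) exp(-d / 0.3)) {
  ord <- order(s[, 1], s[, 2])              # an artificial (coordinate) ordering
  z <- z[ord]; s <- s[ord, , drop = FALSE]
  n <- length(z)
  ll <- dnorm(z[1], mean = 0, sd = sqrt(cov_fun(0)), log = TRUE)
  for (i in 2:n) {
    d_prev <- sqrt(rowSums((s[1:(i - 1), , drop = FALSE] -
                            matrix(s[i, ], i - 1, 2, byrow = TRUE))^2))
    nb <- order(d_prev)[1:min(q, i - 1)]    # neighbor set N(s_(i))
    C_nb    <- cov_fun(as.matrix(dist(s[nb, , drop = FALSE])))
    c_cross <- cov_fun(d_prev[nb])
    w <- solve(C_nb, c_cross)               # kriging-type conditioning weights
    cond_mean <- sum(w * z[nb])
    cond_var  <- cov_fun(0) - sum(w * c_cross)
    ll <- ll + dnorm(z[i], cond_mean, sqrt(cond_var), log = TRUE)
  }
  ll
}

# Example use (hypothetical inputs):
# s <- cbind(runif(200), runif(200)); z <- rnorm(200)
# vecchia_loglik(z, s, q = 10)
```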
Spatiotemporal Processes The section “Multivariate Spatial Processes” introduced processes that were written in vector form as

Y(s) ≡ (Y1(s), . . ., Yk(s))′;  s ∈ DG.  (37)

In that section, we distinguished between the joint approach and the conditional approach to multivariate-spatial-statistical modeling and, under the conditional approach, we used a directed acyclic graph to give a blueprint for modeling the multivariate spatial dependence. Now, consider a spatiotemporal process,

{Y(s; t) : s ∈ DG; t ∈ T},  (38)

where T is a temporal index set. Clearly, if T = {1, 2, . . .} then (38) becomes a spatial process of time series, {Y(s; 1), Y(s; 2), . . . : s ∈ DG}. If T = {1, 2, . . ., k}, and we define Yj(s) ≡ Y(s; j), for j = 1, . . ., k, then the resulting spatiotemporal process can be represented as a multivariate spatial process given by (37). Not surprisingly, the same dichotomy of approach to modeling statistical dependence (i.e., joint versus conditional) occurs for spatiotemporal processes as it does for multivariate spatial processes. Describing all possible covariances between Y at any spatiotemporal “location” (s; t) and any other one (u; v) amounts to treating “time” as simply another dimension to be added to the d-dimensional Euclidean space, ℝd. Taking this approach, spatiotemporal statistical dependence can be expressed in (d + 1)-dimensional space through the covariance function,

C(s; t, u; v) ≡ cov(Y(s; t), Y(u; v));  s, u ∈ DG, t, v ∈ T.  (39)
Of course, the time dimension has different units than the spatial dimensions, and its interpretation is different since the future is unobserved. Hence, the joint modeling of space and time based on (39) must be done with care to account for the special nature of the time dimension in this descriptive approach to spatiotemporal modeling.
From current and past spatiotemporal data Z, predicting past values of Y is called smoothing, predicting unobserved values of the current Y is called filtering, and predicting future values of Y is called forecasting. The Kalman filter (Kalman 1960) was developed to provide fast predictions of the current state using a methodology that recognizes the ordering of the time dimension. Today’s filtered values become “old” the next day when a new set of current data are received. Using a dynamical approach that we shall now describe, the Kalman filter updates yesterday’s optimal filtered value with today’s data very rapidly, to obtain a current optimal filtered value. The best way to describe the dynamical approach is to discretize the spatial domain. The previous section, “Spatial Discretization”, describes a number of ways this can be done; here we shall consider the discretization that is most natural for storing the attribute and location information in computer memory, namely, a fine-resolution lattice DL of pixels or voxels (short for “volume elements”). Replace {Y(s; t) : s ∈ DG, t = 1, 2, . . .} with {Y(s; t) : s ∈ DL, t = 1, 2, . . .}, where DL ≡ {s1, . . ., sm} are the centroids of elements of small area (or small volume) that make up DG. Often the areas of these elements are equal, having been defined by a regular grid. As we explain below, this allows a dynamical approach to constructing a statistical model for the spatiotemporal process Y on the discretized space-time cube, {s1, . . ., sm} × {1, 2, . . .}. Define Yt ≡ (Y(s; t) : s ∈ DL)′, which is an m-dimensional vector. Because of the temporal ordering, we can write the joint distribution of {Y(s; t) : s ∈ DL, t = 1, . . ., k} from t = 1 up to the present time t = k, as

[Y1, Y2, . . ., Yk] = [Y1] [Y2 | Y1] . . . [Yk | Yk−1, . . ., Y2, Y1],  (40)

which has the same form as (35). Note that this conditional modeling of space and time is a natural approach, since time is completely ordered. The next step is to make a Markov assumption, and hence (40) can be written as

[Y1, Y2, . . ., Yk] = [Y1] ∏_{j=2}^{k} [Yj | Yj−1].  (41)

This is the same Markov property that we previously discussed in section “Lattice Processes”, except it is now applied to the completely ordered one-dimensional domain, T = {1, 2, . . .}, and N(j) = j − 1. The Markov assumption makes our approach dynamical: It says that the present, conditional on the past, in fact only depends on the “most recent past.” That is, since N(j) = j − 1, the factor [Yj | Yj−1, . . ., Y2, Y1] = [Yj | Yj−1], which results in the model (41). For further information on the types of models used in the descriptive approach given by (39) and the types of models used in the dynamical approach given by (41), see Cressie and
Wikle (2011, Chapters 6–8) and Wikle et al. (2019, Chapters 4 and 5). The statistical analysis of observations from these processes is known as spatiotemporal statistics. Inference (estimation and prediction) from spatiotemporal data using R software can be found in Wikle et al. (2019).
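For readers who want to connect the Markov factorization (41) to the Kalman filter described above, the sketch below codes one forecast-and-update cycle for a generic linear Gaussian state-space model. The propagator M, the observation matrix H, and the error covariances Q and R are illustrative inputs, not quantities defined in this entry.

```r
# One forecast/update cycle of the Kalman filter for a linear Gaussian
# dynamical model: Y_t = M %*% Y_{t-1} + eta_t, Z_t = H %*% Y_t + eps_t,
# with eta_t ~ N(0, Q) and eps_t ~ N(0, R).
kalman_step <- function(m_prev, P_prev, z_t, M, H, Q, R) {
  # Forecast step (relies on the first-order Markov assumption in (41))
  m_fc <- M %*% m_prev
  P_fc <- M %*% P_prev %*% t(M) + Q
  # Update (filtering) step: combine the forecast with today's data z_t
  S <- H %*% P_fc %*% t(H) + R
  K <- P_fc %*% t(H) %*% solve(S)            # Kalman gain
  m_filt <- m_fc + K %*% (z_t - H %*% m_fc)
  P_filt <- P_fc - K %*% H %*% P_fc
  list(mean = m_filt, cov = P_filt)
}
```

Iterating this function over t = 1, 2, . . . gives filtered values; smoothing and forecasting follow from the same state-space representation.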
Conclusion Spatial-statistical methods distinguish themselves from spatial-analysis methods found in the geographical and environmental sciences, by providing well-calibrated quantification of the uncertainty involved with estimation or prediction. Uncertainty in the scientific phenomenon of interest is represented by a spatial process model, {Y(s) : s D}, defined on possibly random D in ℝd, while measurement uncertainty in the observations Z is represented in a data model. In section “Introduction”, we saw how these two models are combined using Bayes’ Rule (10), or the simpler version (11), to calculate the overall uncertainty needed for statistical inference. There are three main types of spatial process models: geostatistical processes where uncertainty is in the process Y, which is indexed continuously in D ¼ DG; lattice processes where uncertainty is also in Y, but now Y is indexed over a countable number of spatial locations D ¼ DL; and point processes where uncertainty is in the spatial locations D ¼ DP. Multiple spatial processes can interact with each other to form a multivariate spatial process. Importantly, processes can vary over time as well as spatially, forming a spatiotemporal process. With some exceptions, spatial-statistical models (1) rarely consider the case of measurement error in the locations in D. Here we focus on a spatial-statistical model for the location error (Cressie and Kornak 2003): Write the observed locations as D* {ui : i ¼ 1, . . ., n}; in this case, a part of the data model is [D* | D], and a part of the process model is [D]. Finally then, the data consist of both locations and attributes and are Z {(ui, Z(ui)) : i ¼ 1, . . ., n}, the spatial process model is [Y, D], and the data model is [Z | Y, D]. Then Bayes’ Rule given by (10) is used to infer the unknowns Y and D from the predictive distribution [Y, D | Z]. As the size of spatial datasets have been increasing dramatically, more and more attention has been devoted to scalable computation for spatial-statistical models. Of particular interest are methods that use spatial discretization to approximate a continuous spatial domain, DG. There are other recent advances in spatial statistics that we feel are important to mention, but their discussion here is necessarily brief. Physical barriers can sometimes interrupt the statistical association between locations in close spatial proximity. Barrier models (Bakka et al. 2019) have been developed to account for these kinds of discontinuities in the spatial correlation function. Other methods for modeling nonstationarity,
anisotropy, and heteroskedasticity in spatial process models are an active area of research. It can often be difficult to select appropriate prior distributions for the parameters of a stationary spatial process, for example its correlation-length scale. Penalized complexity (PC) priors (Simpson et al. 2017) are a way to encourage parsimony by favoring parameter values that result in the simplest model consistent with the data. The likelihood function of a point process or of a non-Gaussian lattice model can be both analytically and computationally intractable. Surrogate models, emulators, and quasi-likelihoods have been developed to approximate these intractable likelihoods (Moores et al. 2020). Copulas are an alternative method for modeling spatial dependence in multivariate data, particularly when the data are non-Gaussian (Krupskii and Genton 2019). One area where non-Gaussianity can arise is in modeling the spatial association between extreme events, such as for temperature or precipitation (Tawn et al. 2018; Bacro et al. 2020). As a final comment, we reflect on how the field of geostatistics has evolved, beginning with applications of spatial stochastic processes to mining: In the 1970s, Georges Matheron and his Centre of Mathematical Morphology in Fontainebleau were part of the Paris School of Mines, a celebrated French tertiary-education and research institution. To see what the geostatistical methodology of the time was like, the interested reader could consult Journel and Huijbregts (1978), for example. Over the following decade, geostatistics became notationally and methodologically integrated into statistical science and the growing field of spatial statistics (e.g., Ripley 1981; Cressie 1993). It took one or two more decades before geostatistics became integrated into the hierarchical-statistical-modeling approach to spatial statistics (e.g., Banerjee et al. 2003; Cressie and Wikle 2011, Chapter 4). The presentation given in our review takes this latter viewpoint and explains well-known geostatistical quantities such as the variogram and kriging in terms of this advanced, modern view of geostatistics. We also include a discussion of uncertainty in the spatial index set as part of our review, which offers new insights into spatial-statistical modeling. Probabilistic difficulties with geostatistics, of making inference on a possibly non-countable number of spatial random variables from a finite number of observations, can be finessed by discretizing the process. In a modern computing environment, this is key to carrying out spatial-statistical inference (including kriging).
Cross-References ▶ Bayesian Inversion in Geoscience ▶ Cressie, Noel A.C. ▶ Krige, Daniel Gerhardus
▶ Matheron, Georges ▶ Geostatistics ▶ High-Order Spatial Stochastic Models ▶ Hypothesis Testing ▶ Interpolation ▶ Kriging ▶ Markov Random Fields ▶ Multiple Point Statistics ▶ Multivariate Analysis ▶ Point Pattern Statistics ▶ Spatial Analysis ▶ Spatial Autocorrelation ▶ Spatial Data ▶ Spatiotemporal Analysis ▶ Spatiotemporal Modeling ▶ Statistical Computing ▶ Stochastic Geometry in the Geosciences ▶ Tobler, Waldo ▶ Uncertainty Quantification ▶ Variogram Acknowledgments Cressie’s research was supported by an Australian Research Council Discovery Project (Project number DP190100180). Our thanks go to Karin Karr and Laura Cartwright for their assistance in typesetting the manuscript.
Bibliography Bacro JN, Gaetan C, Opitz T, Toulemonde G (2020) Hierarchical spacetime modeling of asymptotically independent exceedances with an application to precipitation data. J Am Stat Assoc 115(530):555–569. https://doi.org/10.1080/01621459.2019.1617152 Baddeley A, Rubak E, Turner R (2015) Spatial point patterns: Methodology and applications with R. Chapman & Hall/CRC Press, Boca Raton Bakka H, Vanhatalo J, Illian JB, Simpson D, Rue H (2019) Nonstationary Gaussian models with physical barriers. Spat Stat 29: 268–288. https://doi.org/10.1016/j.spasta.2019.01.002 Banerjee S, Carlin BP, Gelfand AE (2003) Hierarchical modeling and analysis for spatial data. Chapman & Hall/CRC Press, Boca Raton Besag J (1974) Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Ser B Stat Methodol 36(2):192–236 Besag J, York J, Mollié A (1991) Bayesian image restoration, with two applications in spatial statistics. Ann Inst Stat Math 43:1–20 Chilès JP, Delfiner P (2012) Geostatistics: Modeling spatial uncertainty, 2nd edn. Wiley, Hoboken Cressie N (1990) The origins of kriging. Math Geol 22(3):239–252. https://doi.org/10.1007/BF00889887 Cressie N (1993) Statistics for spatial data, Revised edn. Wiley, Hoboken Cressie N, Davidson JL (1998) Image analysis with partially ordered Markov models. Comput Stat Data Anal 29(1):1–26 Cressie N, Johannesson G (2008) Fixed rank kriging for very large spatial data sets. J R Stat Soc Ser B Stat Methodol 70(1):209–226. https://doi.org/10.1111/j.1467-9868.2007.00633.x Cressie N, Kornak J (2003) Spatial statistics in the presence of location error with an application to remote sensing of the environment. Stat Sci 18(4):436–456. https://doi.org/10.1214/ss/1081443228
Spatiotemporal Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, Hoboken Cressie N, Zammit-Mangion A (2016) Multivariate spatial covariance models: A conditional approach. Biometrika 103(4):915–935. https://doi.org/10.1093/biomet/asw045 Datta A, Banerjee S, Finley AO, Gelfand AE (2016) Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 111(514):800–812. https://doi.org/10. 1080/01621459.2015.1044091 Diggle PJ (2013) Statistical analysis of spatial and spatio-temporal point patterns, 3rd edn. Chapman & Hall/CRC Press, Boca Raton Gelfand AE, Schliep EM (2018) Bayesian inference and computing for spatial point patterns. In: NSF-CBMS regional conference series in probability and statistics, vol 10. Institute of Mathematical Statistics and the American Statistical Association, Alexandria, VA, pp i–125 Gelfand AE, Schmidt AM, Banerjee S, Sirmans C (2004) Nonstationary multivariate process modeling through spatially varying coregionalization. TEST 13(2):263–312 Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic Press, London Kaiser MS, Cressie N (2000) The construction of multivariate distributions from Markov random fields. J Multivar Anal 73(2):199–220 Kalman RE (1960) A new approach to linear filtering and prediction problems. Trans ASME J Basic Eng 82:35–45 Katzfuss M, Guinness J, Gong W, Zilber D (2020) Vecchia approximations of Gaussian-process predictions. J Agric Biol Environ Stat 25(3):383–414. https://doi.org/10.1007/s13253-020-00401-7 Krupskii P, Genton MG (2019) A copula model for non-Gaussian multivariate spatial data. J Multivar Anal 169:264–277 LeSage J, Pace RK (2009) Introduction to spatial econometrics. Chapman & Hall/CRC Press, Boca Raton Lindgren F, Rue H, Lindström J (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. J R Stat Soc Ser B Stat Methodol 73(4):423–498. https://doi.org/10.1111/j.1467-9868.2011.00777.x Matheron G (1963) Principles of geostatistics. Econ Geol 58(8): 1246–1266 Matheron G (1975) Random sets and integral geometry. Wiley, Hoboken Møller J, Waagepetersen RP (2003) Statistical inference and simulation for spatial point processes. Chapman & Hall/CRC Press, Boca Raton Moores MT, Pettitt AN, Mengersen KL (2020) Bayesian computation with intractable likelihoods. In: Case studies in applied Bayesian data science. Springer, Berlin, pp 137–151 Nguyen H, Cressie N, Braverman A (2012) Spatial statistical data fusion for remote sensing applications. J Am Stat Assoc 107(499): 1004–1018. https://doi.org/10.1080/01621459.2012.694717 Ripley BD (1981) Spatial statistics. Wiley, Hoboken Rue H, Held L (2005) Gaussian Markov random fields: Theory and applications. Chapman & Hall/CRC Press, Boca Raton Simpson D, Rue H, Riebler A, Martins TG, Sørbye SH (2017) Penalising model component complexity: A principled, practical approach to constructing priors. Stat Sci 32(1):1–28 Tawn J, Shooter R, Towe R, Lamb R (2018) Modelling spatial extreme events with environmental applications. Spat Stat 28:39–58 Tobler WR (1970) A computer movie simulating urban growth in the Detroit region. Econ Geogr 46(suppl):234–240 Upton G, Fingleton B (1985) Spatial data analysis by example, volume 1: point pattern and quantitative data. Wiley, Hoboken Wikle CK, Zammit-Mangion A, Cressie N (2019) Spatio-temporal statistics with R. 
Chapman & Hall/CRC Press, Boca Raton Winkler G (2003) Image analysis, random fields and Markov Chain Monte Carlo methods: A mathematical introduction, 2nd edn. Springer, Berlin Zammit-Mangion A, Cressie N (2021) FRK: an R package for spatial and spatio-temporal prediction with large datasets. J Stat Softw, vol 98, pp 1–48
Spatiotemporal Sandra De Iaco1, Donald E. Myers2 and Donato Posa3 1 Department of Economics, Section of Mathematics and Statistics, University of Salento, National Biodiversity Future Center, Lecce, Italy 2 Department of Mathematics, University of Arizona, Tucson, AZ, USA 3 Department of Economics, Section of Mathematics and Statistics, University of Salento, Lecce, Italy
Synonyms Space-time, ℝd × ℝ domain, product space
Definition The term spatio-temporal is associated with a wide area of research, which aims to provide methods and tools for analyzing the evolution of a phenomenon over a spatial and temporal domain. There are many variables that can be viewed as phenomena with a spatio-temporal variability, such as meteorological readings (temperature, humidity, pressure), hydrological parameters (permeability and hydraulic conductivity), and many measures of air, soil, and water pollution. In this context, methods for analyzing and explaining the joint spatio-temporal behavior of the above variables are necessary in order to get different goals: optimization of sampling design network, estimation at unsampled spatial locations or unsampled times and computation of maps of predicted values, assessing the uncertainty of predicted values starting from the experimental measurements, trends’ detection in space and time, particularly important to cope with risks coming from concentrations of hazardous pollutants. Hence, more and more attention is given to the spatial-temporal analysis in order to sort out these issues. The term spatio-temporal and space-time are generally used interchangeably in literature.
Geostatistical Approach Geostatistics is a branch of spatial and spatio-temporal statistics that uses the formalism of random functions to analyze data characterized by spatial or spatio-temporal structure. It is assumed that the data relate to a phenomenon with an underlying continuous evolution, and thus other types of phenomena, e.g., lattice or point processes, are excluded from geostatistical analysis and are not the object of this contribution. Geostatistics was born to solve mining engineering problems; however, the domain of geostatistical applications
is much broader to date. Important contributions to the current knowledge about space-time models can be also found in Kyriakidis and Journel (1999) as well as in Christakos (2017), which skillfully combine theory and modeling aspects. A recent text by Cressie and Wikle (2011) presents a statistical approach to spatio-temporal data modeling from the perspective of Bayesian Hierarchical Modeling. The main advantages of geostatistical techniques in modeling spatio-temporal variables are related to the possibility of: • Choosing an appropriate spatio-temporal model for the non-constant mean (or deterministic component), if any. Note that there are no restrictions in the methods or in the models to be used to define the deterministic component. • Analyzing both the spatial distribution and the temporal evolution of the data (or the trend residuals) through a spatio-temporal correlation measure (variogram or covariance function). This paper provides an introduction to the spatio-temporal random function (STRF) theory, together with modeling tools and prediction methods based on kriging. Note that the terms random function and random field are equivalent, although this last terminology is less common.
Space-Time Domain The natural domain for a geostatistical space-time model is ℝd × ℝ, where ℝd stands for the d-dimensional spatial domain (d ∈ ℕ+) and ℝ for the temporal domain. From a purely mathematical perspective, ℝd × ℝ = ℝd+1, thus there are no differences between the coordinates. However, physically there is a clear-cut separation between the spatial and time dimensions, and a realistic statistical model will take account of it. It is worth underlining that all technical results on spatial covariance functions or on least-squares prediction, or kriging, in Euclidean spaces apply directly to space-time problems, simply by separating the vector u (representing a point in space-time) into its spatial and temporal components, that is, s and t, respectively. Other spatio-temporal domains are also relevant in practice even if they are not used hereafter. Monitoring data are frequently observed at fixed temporal lags, and it might be useful to model a random function on ℝd × ℤ, with time considered discrete. In atmospheric and geophysical applications, the spatial domain of interest is frequently expansive or global and the curvature of the Earth has to be considered; thus, the spatio-temporal domain is defined as 𝕊 × ℝ or 𝕊 × ℤ, where 𝕊 denotes a sphere in the three-dimensional space.
Problems in Space-Time Although spatio-temporal estimation techniques can be viewed as an extension of the spatial ones or, alternatively, of the temporal ones, a number of theoretical and practical problems must be addressed. These problems are related to the following aspects: • Differences between spatial and temporal phenomena • Data characteristics • Metric in space-time Among the fundamental differences between spatial and temporal phenomena, it is worth mentioning that, while there is the notion of past, present, and future in time, conversely, in a spatial context, there is in general no ordering; in particular, only directional dependence, known as anisotropy, may be found. A typical characteristic of the space-time data is related to the data sampling. Usually, a restricted number of survey stations are established in space, while an intense sampling in time is available at each spatial point. A poor monitoring network is often due to the relatively high cost of the sampling devices and the accessibility of sampling sites. On the other hand, long times series can be easily obtained at each spatial location. Thus, the most common data sets in a space-time context are sparse in space and dense in time. As a consequence, the spatial and temporal correlations are estimated with different degrees of reliability and accuracy. Moreover, temporal periodicity is often associated with non-stationarity in space, and, in many cases, while the temporal periodicity is limited to the first moment, the nonstationarity in space is extended to higher order moments. With respect to time, periodic functions are often combined (with deterministic or stochastic coefficients) in order to describe seasonal components with a certain period (i.e. daily, weekly, monthly cycles according to the the features of the examined variable and temporal aggregation) and different amplitudes and phases. In the presence of quasiperiodic temporal behavior with varying periods, amplitudes, and phases, it is common to assume linear or quadratic trends, unless long time series allow to detect and model a more complex functional form for the mean component. As regards the spatial non-stationarity, a wide class of drift functions is represented by polynomials or, in general, by linear combinations of basic functions, such as powers, exponentials, sigmoids, sines, and cosines in the position coordinates. Some difficulties rise in distinguishing the drift from the spatial correlated residuals, then in modeling the same drift. The intrinsic random functions of order k may be used to solve the problem when polynomial drift functions are
reasonable. Problems may also originate from relevant disparity in the variances at different locations. In this case, it is advisable to divide the spatial domain into more homogeneous regions or to remove the outlier sites, if any. One of the best approaches to extending the usual geostatistical tools to a space-time domain is to treat time as another dimension. After that, spatial and temporal coordinates can be kept separate or can be used to introduce a metric into the higher dimensional space. However, since the units for space and time are disparate, e.g., meters and hours, one might ask if it makes any sense to define a spatio-temporal metric,

d(u1, u2) = [a(x1 − x2)² + b(y1 − y2)² + c(t1 − t2)²]^{1/2},

with u1 = (x1, y1, t1), u2 = (x2, y2, t2), where (x1, y1), (x2, y2) ∈ D ⊆ ℝ2 and t1, t2 ∈ T ⊆ ℝ, where D and T are the spatial and temporal domains, respectively.
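As a simple illustration of the issue, the metric above can be coded directly; the scaling constants a, b, and c below are arbitrary user choices, which is precisely why such a metric needs a physical justification.

```r
# Anisotropic space-time "distance" of the form quoted above.
# u1 and u2 are vectors c(x, y, t); a, b, c are user-chosen scalings
# with no default physical meaning.
st_dist <- function(u1, u2, a = 1, b = 1, c = 1) {
  sqrt(a * (u1[1] - u2[1])^2 + b * (u1[2] - u2[2])^2 + c * (u1[3] - u2[3])^2)
}

st_dist(c(0, 0, 0), c(100, 50, 24), c = 10)  # e.g., metres vs hours: c rescales time
```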
Basic Theoretical Framework In the literature, geostatistical methods have been widely used to analyze spatial and, more recently, spatio-temporal processes in nearly all the areas of applied sciences and especially in Earth Sciences, such as Hydrogeology, to study the movement, distribution and management of water; Environmental engineering, to study concentrations of pollutants in different environmental media, water/air/soil; Meteorology for the analysis of variations in atmospheric temperature, density, and moisture contents. In these contexts, geostatistical tools are flexible enough to analyze the spatio-temporal evolution of a wide range of phenomena, where their spatio-temporal pattern presents a systematic structure at the macroscopic level and a random behavior at the microscopic level. For this reason, suitable stochastic models for spatio-temporal processes should be considered, and a theory of STRFs is needed in order to proceed with the rigorous mathematical modeling of processes that jointly change in space and time. The observed values of a spatio-temporal process are considered as a finite realization of a STRF {Z(s, t); s D, t T}, where D ℝd, d 3, is the spatial domain and T ℝ+ is the temporal domain. The STRF Z can be decomposed as follows: Z ðs, tÞ ¼ mðs, tÞ þ Y ðs, tÞ,
ð1Þ
where m(s, t) ¼ E[Z(s, t)] represents a macro scale component, which can be nonconstant in space and time, and Y represents the residual stochastic component with E[Y(s, t)] ¼ 0, which describes the random fluctuations, around
m, on a small scale. The mean, or deterministic component, m(·) is defined as a linear combination of linearly independent functions of the space-time coordinates; it is also called drift, while the term trend, erroneously associated with m(·), pertains to the data and not to the STRF model. Under the second order stationarity hypothesis, the expected value of Z is assumed to be constant over the domain, i.e., E[Z(s, t)] = m, and its second order moments, such as the spatio-temporal variogram and the covariance function, depend on the spatio-temporal lag (hs, ht), for any (s, t), (s′, t′) in the spatio-temporal domain, with hs = s − s′ and ht = t − t′. Then, given the hypothesis of second order stationarity, the spatio-temporal variogram and the covariance function are defined, respectively, as follows:

γ(hs, ht) = 0.5 E[Z(s, t) − Z(s + hs, t + ht)]²,  (2)

C(hs, ht) = E[Z(s, t) Z(s + hs, t + ht)] − m².  (3)
The variogram is a measure of dissimilarity, in the sense that it increases when the distance between (s, t) and (s + hs, t + ht) increases; thus, it is usually characterized by small values for short distances in space and time. On the other hand, the covariance function is a measure of similarity, in the sense that it usually decreases when the distance between (s, t) and (s + hs, t + ht) increases; thus, it is often characterized by high values for short distances in space and time. Note that the term covariogram is associated with the graphical representation of the covariance function. In a 2D domain, the covariogram is obtained through a Cartesian diagram, with the stationary covariance function on the vertical axis and the distance on the horizontal axis. A space-time variogram function γ must be conditionally negative definite, namely

∑_{i=1}^{n} ∑_{j=1}^{n} ai aj γ(si − sj, ti − tj) ≤ 0,  (4)

for any n ∈ ℕ+, any (si, ti) ∈ D × T, where the coefficients ai ∈ ℝ respect the condition

∑_{i=1}^{n} ai = 0.  (5)

The function γ is conditionally strictly negative definite if the quadratic form in (4) is strictly less than zero for any n ∈ ℕ+, any choice of distinct points (si, ti), and any ai ∈ ℝ, not all zero, under the condition (5). Similarly, the covariance function must be positive definite, namely
∑_{i=1}^{n} ∑_{j=1}^{n} ai aj C(si − sj, ti − tj) ≥ 0,  (6)

for any n ∈ ℕ+, any (si, ti) ∈ D × T, and any ai ∈ ℝ. It is easy to show that, given a linear combination of spatio-temporal random variables,

C = ∑_{i=1}^{n} ai Z(si, ti),
the above conditions ensure the variance to be non-negative. The function C is strictly positive definite (SPD) if it is excluded the chance that the quadratic form in (6) is equal to zero for any choice of distinct points (si, ti), 8 ai ℝ (with at least one different from zero) and 8n ℕ+. Note also that strict positive definiteness is desirable, since it ensures the invertibility of the kriging coefficient matrix (De Iaco and Posa 2018). Given the difficulties to verify the conditions (4) and (6), it is advisable for users to look for the best model among the wide parametric families whose members are known to respect the above-mentioned admissibility conditions. It is worth highlighting that, although a number of studies provide theoretical results in terms of the covariance, the space-time correlation is usually analyzed through the variogram, which is preferred to the covariance for several reasons (Cressie and Grondona 1992). A space-time geostatistical analysis can be carried out through the following steps: 1. Look for a model of the STRF from which data could reasonably be derived 2. Estimate the variogram or the covariogram, as a measure of the spatio-temporal correlation exhibited by the data 3. Fit a suitable model to the estimated variogram or covariogram; 4. Use this last model in the kriging system for spatiotemporal prediction. Regarding the first point, it is common to use the model for the STRF given in (1), which can include a component m and the stationary residual component Y. In presence of a non-constant component with large-scale variation, this is commonly estimated (e.g., through least squares regression) and removed from the observed data; then the spatio-temporal correlation analysis is conducted on the residuals, which can be reasonably assumed to be a realization of a second order stationary random field. After analyzing and modeling both components of model (1), predictions of the variable under study can be computed by adding the estimated drift to the predicted residuals, obtained through ordinary kriging.
Alternatively, other forms of kriging can be used in case of non-stationarity, as will be clarified in the section dedicated to prediction.
Structural Analysis Structural analysis is executed on the spatio-temporal realizations of a second-order stationary random function. These realizations correspond directly to the observed values (if the macro scale component can be reasonably supposed constant over the domain) or to the residuals otherwise. The two steps in the structural analysis consist in estimating the space-time variogram or covariogram and fitting a model. Then, geostatistical analysis proceeds with model validation.
Sample Variogram and Covariogram Under second order stationarity, structural analysis begins with the estimation of the space-time variogram or covariogram of the random function Z. Given the set A = {(si, ti), i = 1, 2, . . ., n} of data locations in space-time, the variogram can be estimated through the sample space-time variogram γ̂ as follows:

γ̂(rs, rt) = [1 / (2 |L(rs, rt)|)] ∑_{L(rs, rt)} [Z(s + hs, t + ht) − Z(s, t)]²,  (7)

where |L(rs, rt)| is the cardinality of the set L(rs, rt) = {(s + hs, t + ht) ∈ A, (s, t) ∈ A : hs ∈ Tol(rs) and ht ∈ Tol(rt)}, and Tol(rs), Tol(rt) are, respectively, some specified tolerance regions around rs and rt. Similarly, the space-time sample covariogram Ĉ is defined as follows:

Ĉ(rs, rt) = [1 / |L(rs, rt)|] ∑_{L(rs, rt)} [Z(s + hs, t + ht) − m] [Z(s, t) − m],  (8)

where m is substituted by the sample mean Z̄, if m is an unknown constant. Note that the estimator of the space-time variogram is not influenced by the expected value m or its estimator Z̄ and is unbiased with respect to γ. It is important to underline that no space-time metric is needed for the computation of either estimator. In fact, the pairs of points separated by (hs, ht) are detected by computing, separately, the purely spatial and purely temporal distances; thus, the pairs of realizations, z(s, t) and z(s + hs, t + ht), correspond to the observations at the points that are simultaneously separated by hs, in the space domain, and ht, in the time domain.
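A compact base-R version of the estimator (7) is sketched below for small data sets; it assumes simple symmetric tolerance intervals around the nominal lags rs and rt, whereas routine analyses would typically rely on the dedicated packages discussed later in the section on computational aspects.

```r
# Sample space-time variogram (7): average squared increments over all
# pairs whose spatial and temporal separations fall within the tolerances.
# 'coords' is an n x 2 matrix, 'times' a length-n vector, 'z' the data;
# 'r_s' and 'r_t' are vectors of nominal spatial and temporal lags.
st_variogram <- function(z, coords, times, r_s, r_t, tol_s, tol_t) {
  hs <- as.matrix(dist(coords))            # spatial separations
  ht <- abs(outer(times, times, "-"))      # temporal separations
  gamma <- matrix(NA_real_, length(r_s), length(r_t),
                  dimnames = list(paste0("hs=", r_s), paste0("ht=", r_t)))
  for (i in seq_along(r_s)) {
    for (j in seq_along(r_t)) {
      sel <- abs(hs - r_s[i]) <= tol_s & abs(ht - r_t[j]) <= tol_t
      sel[lower.tri(sel, diag = TRUE)] <- FALSE   # count each pair once
      if (any(sel)) {
        idx <- which(sel, arr.ind = TRUE)
        gamma[i, j] <- mean((z[idx[, 1]] - z[idx[, 2]])^2) / 2
      }
    }
  }
  gamma
}
```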
Models for Variogram or Covariance Function The second step of structural analysis consists in fitting a theoretically valid model to the sample space-time variogram or covariogram. Nowadays, various classes of space-time covariance functions or variograms are available, such as the metric class of models, the sum model, the product model, and a wide range of families of non-separable space-time models, such as the product-sum and its integrated version, and the classes proposed by Cressie-Huang, Gneiting, Ma, and Porcu, to cite a few (De Iaco et al. 2013):

• The Cressie-Huang class of models (Cressie and Huang 1999)

C(hs, ht) = ∫_{ℝd} e^{i hs′v} ρ(ht; v) k(v) dv,  (9)

where ρ(·; v) is a continuous integrable correlation function for all v ∈ ℝd, and k(·) is a positive function, which is integrable on ℝd;

• The Gneiting class of models (Gneiting 2002)

C(hs, ht) = [σ² / c(ht²)^{d/2}] f(‖hs‖² / c(ht²)),  (10)

where f(t), t ≥ 0, is a completely monotone function, c(t), t ≥ 0, is a positive function with a completely monotone derivative, and σ² is the variance;

• The Ma class of models (Ma 2002)

C(hs, ht) = ∑_{k=0}^{∞} {CS(hs) CT(ht)}^k pk,  (11)

where CS(·) and CT(·) are purely spatial and purely temporal covariance functions on D and T, respectively, and {pk, k ∈ ℕ} is a discrete probability function;

• The Porcu, Mateu, and Gregori class (Porcu et al. 2007; Mateu et al. 2007)

C(hs, ht) = ∫_0^∞ ∫_0^∞ exp(−∑_{i=1}^{d} ci(|hi|) v1 − ct(|ht|) v2) dF(v1, v2),  (12)

where F is a bivariate distribution function, ci and ct are Bernstein functions (positive functions on ℝ+ with completely monotone derivatives), with ci(0) = ct(0) = 1;

• The class of integrated models (De Iaco et al. 2002)

C(hs, ht) = ∫_V [k1 CS(hs; x) CT(ht; x) + k2 CS(hs; x) + k3 CT(ht; x)] dm(x),  (13)

where m(x) is a positive measure on U ⊆ ℝ, CS(hs; x) and CT(ht; x) are covariance functions defined on D ⊆ ℝd and T ⊆ ℝ, respectively, for all x ∈ V ⊆ U, k1 > 0, k2, k3 ≥ 0. If k2 = k3 = 0, the above class of models is called integrated product models; otherwise, it is called integrated product-sum models. As specified in the following, this class is very flexible, since it is able to describe all types of non-separability.

The choice of an appropriate class of models can be supported by testing the main properties (symmetry/asymmetry, separability/non-separability, type of non-separability) of the spatio-temporal sample variogram/covariance function (Cappello et al. 2018); the related computational aspects were developed in Cappello et al. (2020). Although the selection of a suitable class of models can be based on its geometric features and theoretical properties, in practice the generalized product-sum model, thanks to its versatility, is largely used in different areas, ranging from environmental sciences to medicine and from ecology to hydrology. This last model, written in terms of the variogram, has the following form:

γ(hs, ht) = γ(hs, 0) + γ(0, ht) − k γ(hs, 0) γ(0, ht),  (14)
where γ(hs, 0) and γ(0, ht) are the spatial and temporal marginal variograms, respectively (De Iaco et al. 2001). The power of the model lies in the flexibility of the fitting process, which uses the sample marginals and only one parameter, which depends on the global sill (i.e., the space-time variogram upper bound). The parameter k is selected in such a way to ensure that the global sill is fitted. Admissible values for this parameter are dependent on the sill values of the spatial and temporal marginals γ(hs, 0) and γ(0, ht), as specified in De Iaco et al. (2001). Moreover, the generalized product-sum model enables modeling processes characterized by different variability along space and time.
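As an illustration, the generalized product-sum variogram (14) can be assembled from fitted marginals in a few lines of R. The spherical marginals and the parameter values below are placeholders for whatever marginal models were actually fitted, and the bound on k noted in the comment reflects the admissibility condition discussed above; it is a sketch, not a fitting routine.

```r
# Generalized product-sum variogram (14), built from spatial and temporal
# marginal variograms. Spherical marginals with illustrative sills/ranges.
sph <- function(h, sill, range) ifelse(h < range,
                                       sill * (1.5 * h / range - 0.5 * (h / range)^3),
                                       sill)
gamma_s <- function(hs) sph(hs, sill = 0.8, range = 40)   # spatial marginal (placeholder)
gamma_t <- function(ht) sph(ht, sill = 0.6, range = 10)   # temporal marginal (placeholder)
k <- 0.9   # admissibility: k > 0 and bounded above in terms of the marginal sills

gamma_st <- function(hs, ht) gamma_s(hs) + gamma_t(ht) - k * gamma_s(hs) * gamma_t(ht)
gamma_st(20, 5)
```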
Model Validation
After modeling the space-time empirical variogram/ covariogram surface, the subsequent step is to evaluate the reliability of the fitted model through the application of spatio-temporal cross-validation and jackknife techniques. Then, if the model performance is satisfactory, the same model can be used to predict the variable under study over
the spatial domain and for future time points by space-time kriging. Cross-validation allows a comparison to be made between estimated values and observed ones, simply using the sample information: the value measured at a fixed space-time point is temporarily removed, and at the same point, the value of the variable of interest is estimated by using all the other available values and the fitted space-time variogram/covariogram model. This process is repeated for all the sample points. Finally, statistical tools, such as the linear correlation coefficient between the observed values and the estimated ones, can be used to evaluate the goodness of the fitted model. In the jackknife technique, the variogram/covariogram model is used to estimate the variable under study measured at some points, which are usually not coincident with the original sample points. Then, the data available at the points, where the estimation is required, are compared with the estimated values. In other terms, the validation is based on a new data set, not previously used in the structural analysis and thus not considered in the construction of the covariance/variogram model to be validated.
Prediction Geostatistical spatio-temporal prediction techniques can be viewed as an extension of the spatial ones. Indeed, given the observed data {z(ui) = z(s, t)i, i = 1, . . ., n}, the problem is to predict Z(u) = Z(s, t) at the unsampled space-time location u = (s, t). For this aim, the following class of linear predictors is defined:

Ẑ(u) = ∑_{i=1}^{n} λi(u) Z(ui).  (15)

The weights λi(·), i = 1, . . ., n, are obtained, through the kriging methods, by requiring that (15) is unbiased and that the mean squared prediction error is minimized. In particular, under second-order stationarity, one can choose between simple kriging, if the expected value m is constant and known, and ordinary kriging, if the expected value m is constant and unknown. In this last case, the following linear system, often called the ordinary kriging system, is obtained:

∑_{i=1}^{n} λi(u) γ(ui − uj) + ω(u) = γ(uj − u),  j = 1, . . ., n,
∑_{i=1}^{n} λi(u) = 1,  (16)

where γ represents the space-time variogram of Z and ω is the Lagrange multiplier. The above system can also be expressed in terms of a space-time covariance model. Note that the kriging system requires the knowledge of the space-time covariance/variogram model in order to be solved in terms of the weights λi(·), i = 1, . . ., n. On the other hand, if the expected value of the random field Z is known, then the estimator of Z at the unsampled space-time location u = (s, t) is equivalent to the estimator of the zero-mean random field Y at u, and the weights λi(·), i = 1, . . ., n, are the solutions of the following linear system of n equations, called the simple kriging system:

∑_{i=1}^{n} λi(u) γ(ui − uj) = γ(uj − u),  j = 1, . . ., n.  (17)
This system has a unique solution if the coefficient matrix (or the variogram/covariogram matrix) is non-singular, and this is guaranteed if and only if the covariance function is strictly positive definite (or the variogram is conditionally strictly negative definite) and all sample points are distinct. Note that the simple and ordinary kriging predictors are distribution-free and are considered to be the “best” because they minimize the estimation variance among all the linear unbiased estimators. In the case of a Gaussian random field, the simple kriging predictor coincides with the conditional expectation E(Z | Z1, . . ., Zn); thus it is the best estimator of Z in the mean square sense. Moreover, if the unknown expected value cannot be assumed to be constant, the drift can be modeled (even by using external variables) and removed; then the above stochastic methods for spatio-temporal prediction can be applied to the residuals. If the expected value of the random function can be expressed in a polynomial functional form, kriging with a trend model, known as universal kriging, can be applied. Alternatively, the intrinsic random functions of order k, introduced by Matheron (1973), can also be considered; in this case the drift component is interpreted as a deterministic component, modeled with local polynomial forms. In both options, these kriging predictors can be built to have minimum error variance subject to the unbiasedness constraint.
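The ordinary kriging system (16) reduces to a single linear solve once a valid space-time variogram is available. The base-R sketch below is illustrative only: it adopts one common sign convention for the Lagrange multiplier and accepts any generic variogram function (for instance, the product-sum sketch given earlier); dedicated packages should be preferred for production work.

```r
# Space-time ordinary kriging weights, assembling system (16).
# 'gamma_st' is a valid space-time variogram function gamma_st(hs, ht);
# 'coords' (n x 2) and 'times' (length n) give the sampled points;
# 'u0' = c(x, y, t) is the prediction location.
ok_st_weights <- function(gamma_st, coords, times, u0) {
  n  <- nrow(coords)
  hs <- as.matrix(dist(coords))
  ht <- abs(outer(times, times, "-"))
  G  <- gamma_st(hs, ht)                        # n x n variogram matrix
  A  <- rbind(cbind(G, 1), c(rep(1, n), 0))     # append the unbiasedness constraint
  hs0 <- sqrt((coords[, 1] - u0[1])^2 + (coords[, 2] - u0[2])^2)
  ht0 <- abs(times - u0[3])
  b  <- c(gamma_st(hs0, ht0), 1)
  sol <- solve(A, b)
  list(lambda = sol[1:n], lagrange = sol[n + 1])
}

# Predictor, as in (15):
# w <- ok_st_weights(gamma_st, coords, times, u0 = c(x0, y0, t0))
# z_hat <- sum(w$lambda * z)
```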
Computational Aspects Spatio-temporal data analysis can be supported by using various packages, such as the ones in the R programming language. Among these, it is worth mentioning the package space-time (Pebesma 2012) for dealing with spatio-temporal data, the package gstat for geostatistical modelling and interpolation (Gräler et al. 2016), where different choices of spatio-temporal covariance models have been implemented, such as the separable, product-sum, metric, and sum-metric
models. In addition, the R package RandomFields (Schlather et al. 2015) supports kriging, conditional simulation, covariance functions, and maximum likelihood function fitting for a wide range of spatio-temporal covariance models. Padoan and Bevilacqua (2015) built the R package CompRandFld for the analysis of spatial and spatio-temporal Gaussian and binary data, and spatial extremes through composite likelihood inferential methods. A more complete list of R packages related to spatio-temporal topics is available on CRAN Task Views “SpatioTemporal” (Pebesma 2020). Other contributions concern specialized routines and packages which perform spatio-temporal geostatistical analysis (De Cesare et al. 2002). The package covatest can help to cope with the problem of selecting a suitable class of space-time covariance functions for a given data set (Cappello et al. 2020).
Key Research Findings Major applications of spatio-temporal modeling and kriging continue to be found in Earth Sciences, Engineering, Mining as well as in Environmental Health and Biodiversity. Specific case studies along these lines are too numerous to be mentioned here, and thus interested readers are invited to consult some recent review books (Montero et al. 2015) for more details. Rather we provide a brief overview of a selected number of key research findings with regards to recent contributions on some properties (such as symmetry/asymmetry, separability/non-separability, type of non-separability) of the space-time covariance functions (Gneiting et al. 2007; De Iaco et al. 2019; Cappello et al. 2020), which can support the model selection. Indeed, one might look for the class of models whose properties are consistent with respect to the characteristics of the empirical space-time covariance surface estimated from the data. For this reason, as clarified in De Iaco et al. (2013), finding an answer to the following questions is essential: (a) How do the spatial and/or the temporal variogram/covariance marginals behave at the origin? (b) Does the space-time data set present different variability along space and time? (c) Which kind of spatial anisotropy is required by the data? d) Is there a space-time interaction? e) Is the empirical space-time variogram/covariance function fully symmetric?
Anisotropy in Space-Time A covariance function is said to be anisotropic in space if at least two directional spatial covariance functions differ. In the classical literature, two types of anisotropy are usually defined: the first type is the geometric anisotropy, which is associated with an isotropic covariance function where the coordinates are linearly transformed; the second one is the
stratified or zonal anisotropy, which is associated with the sum of covariance models on factor spaces. Space and time cannot be directly comparable: hence, the definition of isotropy has no meaning for STRFs, although several textbooks and papers have tried to provide a definition of isotropy or geometric anisotropy in this context. An attempt to introduce a geometric anisotropy in space-time was given with the metric covariance model, where the spatial and temporal distances, hs and ht, are re-scaled through the use of appropriate spatial and temporal ranges, as and at, so that the terms hs/as and ht/at are a-dimensional. This allows to work with spatial and temporal coordinates with different physical dimensions (and units). In some cases (diffusion and transport processes), this anisotropy might describe rather well the reality, but the main problem is how to reasonably define the scale parameters on the basis of a physical interpretation. It is also worth highlighting that a second order stationary STRF Z, defined on ℝd ℝ, is said to be spatially isotropic in the weak sense, if its covariance function is such that CWI ðhs , ht Þ ¼ Cðkhs k, jht jÞ:
ð18Þ
On the other hand, the zonal anisotropic models can be built as the sum of different covariance models, each defined on different subspaces of the spatio-temporal domain. For example, zonal anisotropic covariance functions in a spacetime domain ℝd ℝ can be expressed as sum of spatial and temporal covariance models CS and Ct as follows: Cðhs , ht Þ ¼ CS ðhs Þ þ CT ðht Þ, where hs ℝd and ht ℝ. However, such models can lead to non-invertible kriging matrices, i.e., even though CS and Ct are each SPD on the respective subspaces, their sum is only positive definite on the higher dimensional space (Myers and Journel 1990). Thus, an alternative zonal anisotropic covariance model can be obtained by summing to a weak isotropic component, as in (18), another component which can be a purely spatial covariance function, i.e., Cðhs , ht Þ ¼ CWI ðkhs k, jht jÞ þ CS ðhs Þ: In the spatio-temporal literature, various classes of stationary non-separable covariance functions with spatial anisotropy are available (Porcu et al. 2006).
Separability The STRF Z on ℝd × ℝ is said to have a space-time separable covariance if it can be written as the product of a purely spatial covariance function CS and a purely temporal covariance function CT, defined on ℝd and ℝ, respectively, i.e.,

C(hs, ht) = CS(hs) CT(ht),  (19)

or equivalently

C(hs, ht) = C(hs, 0) C(0, ht) / C(0, 0),  (20)

where hs ∈ ℝd and ht ∈ ℝ. This implies that there is no interaction in space-time. It is clear that the above-mentioned spatio-temporal separability is essentially a partial separability. In case of non-separability, another interesting aspect concerns the type of non-separability, that is: pointwise or uniform, and positive or negative (De Iaco and Posa 2013). The type of non-separability can be easily detected by computing, for each lag, the ratio between the empirical space-time covariance and the product of the sample spatial and temporal covariance marginals, Ĉ(hs, 0) and Ĉ(0, ht), if these last are positive. Indeed, if the sample ratio is overall greater (less) than 1, uniformly positive (negative) non-separable covariance classes are required. On the other hand, if the sample ratio is greater (less) than 1 for some lags and less (greater) than 1 for other lags, space-time covariance classes which are non-uniformly non-separable should be considered. Regarding this aspect, a very flexible class of models is the integrated family given in (13), since pointwise or uniform positive/negative non-separable models can be derived from it.
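The diagnostic described above can be computed directly from a table of sample covariances. In the sketch below, the ratio is normalized by Ĉ(0, 0) so that an exactly separable model, as in (20), gives a ratio of 1; the assumed input layout (spatial lags on rows, temporal lags on columns, zero lags first) is a convention of this sketch, not of the entry.

```r
# Pointwise non-separability ratio: values systematically above (below) 1
# suggest uniformly positive (negative) non-separability, while mixed
# behavior points to a non-uniformly non-separable class.
# 'C_hat' is a matrix of sample covariances with spatial lags on rows and
# temporal lags on columns, so C_hat[, 1] and C_hat[1, ] are the marginals
# and C_hat[1, 1] is the estimated C(0, 0).
nonsep_ratio <- function(C_hat) {
  stopifnot(all(C_hat[, 1] > 0), all(C_hat[1, ] > 0))
  C_hat * C_hat[1, 1] / outer(C_hat[, 1], C_hat[1, ])
}
```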
Symmetry Given a stationary space-time covariance function, it is fully symmetric if

C(h_s, h_t) = C(−h_s, h_t) = C(h_s, −h_t) = C(−h_s, −h_t),  ∀(h_s, h_t) ∈ ℝ^d × ℝ.   (21)

Full symmetry can be an unrealistic assumption for space-time covariance functions in the many cases where there is a dominant flow direction over time (for example, in atmospheric processes). Note that the definition of full symmetry for a space-time covariance function corresponds to the definition of reflection or axial symmetry on ℝ². Although the two definitions refer to different domains, in both cases the domains are partitioned into two factor domains (i.e., ℝ × ℝ or ℝ^d × ℝ). Taking into account the above definitions, it is important to point out that symmetry properties represent weaker hypotheses than isotropy, since isotropy requires that C is only a function of the norm of the lag vector. Isotropic covariance functions are a subset of symmetric covariance functions; thus methods for testing isotropy can often be used to test symmetry properties. Note that the definitions of strict positive definiteness, symmetry, and separability can be given for STRFs without assuming any form of stationarity. On the other hand, the definition of isotropy is appropriate only for second-order stationary spatial random functions on ℝ^d and can be extended to a space-time context only in a weak sense. Figure 1 offers a graphical representation of the relations among separability, symmetry, isotropy, and the SPD condition for stationary random functions in space-time. Assuming full symmetry is equivalent to assuming axial symmetry with respect to the temporal axis, and assuming separability between space and time is equivalent to introducing a partial separability. Regarding isotropy, a spatio-temporal random function is said to be spatially isotropic in the weak sense if condition (18) is satisfied. Thus, in a space-time context, it is worth pointing out that isotropy, as in (18), or partial separability, as in (19), implies axial (or full) symmetry, as in (21), whereas the converse is not true. A covariance model can be isotropic and separable if and only if all the factors of the product model are Gaussian covariance functions (gray areas in Fig. 1), as clarified in De Iaco et al. (2020). Strict positive definiteness is also a transverse characteristic among the other mentioned properties (De Iaco and Posa 2018).

Spatiotemporal, Fig. 1 Relationships among some common second-order properties of spatio-temporal covariance functions (gray areas represent the class of Gaussian covariance functions)

Conclusions It is worth pointing out that there are nowadays important areas of research involving random functions in space-time (Porcu et al. 2012), as well as relevant computational challenges posed by big spatial and spatio-temporal data sets. Some significant lines of research include:
• Multivariate spatio-temporal analysis, which includes methods for modeling, predicting, and simulating two or more variables in space and time (Apanasovich and Genton 2010; Genton and Kleiber 2015; Cappello et al. 2022b)
• Complex-valued random functions, which have applications in modeling wind fields, sea currents, and electromagnetic fields (Posa 2020, 2021; De Iaco 2022; Cappello et al. 2022a)
• Bayesian methods, which are gaining popularity both in spatial and spatio-temporal statistics and in physics (Diggle and Ribeiro 2007; Xu et al. 2015; Gelfand and Banerjee 2017)
• Machine learning methods developed by computational science researchers, which can also be applied to spatio-temporal data (Hristopulos 2015; McKinley and Atkinson 2020; De Iaco et al. 2022)
• Modeling spatio-temporal random fields with a nonstationary covariance structure in both the space and time domains (Porcu 2007; Zimmerman and Stein 2010; Shand and Li 2017)
• Parametric covariance modeling techniques for space-time processes where space is the sphere representing our planet (Jun and Stein 2012; Porcu et al. 2016, 2018)
Most of the modeling advances have contributed to addressing interpolation or prediction problems, performing stochastic simulation, and dealing with classification issues. In conclusion, this scientific field has been very active over the last 30 years, as confirmed by the extensive literature and the dedicated sessions at conferences worldwide, and it still presents broad margins for progress.
Cross-References ▶ Copula in Earth Sciences ▶ Kriging ▶ Spatial Analysis ▶ Spatial Autocorrelation ▶ Spatial Statistics ▶ Time Series Analysis in the Geosciences ▶ Variance
References Apanasovich TV, Genton MG (2010) Cross-covariance functions for multivariate random fields based on latent dimensions. Biometrika 97(1):15–30 Cappello C, De Iaco S, Posa D (2018) Testing the type of nonseparability and some classes of space-time covariance function models. Stoch Environ Res Risk Assess 32:17–35 Cappello C, De Iaco S, Posa D (2020) Covatest: an R package for selecting a class of space-time covariance functions. J Stat Softw 94(1):1–42
1381 Cappello C, De Iaco S, Maggio S, Posa D (2022a) Modeling spatiotemporal complex covariance functions for vectorial data. Spat Stat 47:100562 Cappello C, De Iaco S, Palma M (2022b) Computational advances for spatio-temporal multivariate environmental models. Comput Stat 37:651–670 Christakos G (2017) Spatiotemporal random fields, 2nd edn. Elsevier, Amsterdam Cressie N, Grondona M.O (1992) A comparison of variogram estimation with covariogram estimation. In: Mardia, K.V. (ed.) The art of statistical science. Wiley, Chichester Cressie N, Huang H (1999) Classes of nonseparable, Spatio-temporal stationary covariance functions. J Am Stat Assoc 94:1330–1340 Cressie N, Wikle CL (2011) Statistics for Spatio-temporal data. Wiley, New York De Cesare L, Myers DE, Posa D (2002) Fortran programs for space-time modeling. Comput Geosci 28(2):205–212 De Iaco S (2022) New spatio-temporal complex covariance functions for vectorial data through positive mixtures. Stoch Environ Res Risk Assess 36:2769–2787 De Iaco S, Posa D (2013) Positive and negative non-separability for space-time covariance models. J Stat Plan Infer 143:378–391 De Iaco S, Posa D (2018) Strict positive definiteness in geostatistics. Stoch Environ Res Risk Assess 32:577–590 De Iaco S, Myers DE, Posa D (2001) Space-time analysis using a general product sum model. Stat and Probab Letters 52(1):21–28 De Iaco S, Myers DE, Posa D (2002) Nonseparable space–time covariance models: some parametric families. Math Geol 34(1):23–41 De Iaco S, Palma M, Posa D (2019) Choosing suitable linear coregionalization models for spatio-temporal data. Stoch Environ Risk Assess 33:1419–1434 De Iaco S, Posa D, Myers DE (2013) Characteristics of some classes of space-time covariance functions. J Stat Plan Infer 143(11): 2002–2015 De Iaco S, Hristopulos DT, Lin G (2022) Special issue: geostatistics and machine learning. Math Geosci 54:459–465 De Iaco S, Posa D, Cappello C, Maggio S (2019) Isotropy, symmetry, separability and strict positive definiteness for covariance functions: a critical review. Spat Stat 29:89–108 De Iaco S, Posa D, Cappello C, Maggio S (2020) On some characteristics of Gaussian covariance functions. Int Stat Rev 89(1):36–53 Diggle PJ, Ribeiro Jr. PJ (2007) Model-based Geostatistics, Springer Science+Business Media, New York, NY, USA Gelfand AE, Banerjee S (2017) Bayesian Modeling and Analysis of Geostatistical Data. Annu Rev Stat Appl 4:245–266 Genton MG, Kleiber W (2015) Cross-covariance functions for multivariate geostatistics. Stat Sci 30(2):147–163 Gneiting T (2002) Nonseparable, stationary covariance functions for space–time data. J Am Stat Assoc 97(458):590–600 Gneiting T, Genton MG, Guttorp P (2007) Geostatistical space-time models, stationarity, separability and full symmetry. In: Finkenstaedt B, Held L, Isham V (eds) Statistics of Spatio-temporal systems. Monographs in statistics and applied probability. Chapman & Hall/CRC Press, Boca Raton, pp 151–175 Gräler B, Pebesma E, Heuvelink G (2016) Spatio-temporal interpolation using gstat. The R Journal 8(1):204–218 Hristopulos DT (2015) Stochastic local interaction (SLI) model: Bridging machine learning and geostatistics. Comput Geosci 85(Part B):26–37 Jun M, Stein ML (2012) An Approach to Producing Space–Time Covariance Functions on Spheres. Technometrics 49:468–479 Kyriakidis PC, Journel AG (1999) Geostatistical space-time models: a review. Math Geol 31:651–684 Ma C (2002) Spatio-temporal covariance functions generated by mixtures. 
Math Geol 34(8):965–975 Mateu J, Porcu E, Gregori P (2007) Recent advances to model anisotropic space–time data. Stat Method Appl 17(2):209–223
McKinley JM, Atkinson PM (2020) A special issue on the importance of geostatistics in the era of data science. Math Geosci 52:311–315 Montero JM, Fernández-Avilés G, Mateu J (2015) Spatial and Spatio-temporal Geostatistical modeling and kriging. Wiley, Chichester Myers DE, Journel AG (1990) Variograms with zonal anisotropies and non-invertible kriging systems. Math Geol 22(7):779–785 Padoan SA, Bevilacqua M (2015) Analysis of random fields using CompRandFld. J Stat Softw 63(9):1–27 Pebesma EJ (2012) spacetime: Spatio-Temporal Data in R. J Stat Softw 51(7):1–30 Pebesma EJ (2020) CRAN task view: handling and analyzing Spatiotemporal data. Version 2020-03-18, URL https://CRAN.R-project.org/view=SpatioTemporal Porcu E (2007) Covariance functions that are stationary or nonstationary in space and stationary in time. Stat Neerl 61(3):358–382 Porcu E, Gregori P, Mateu J (2006) Nonseparable stationary anisotropic space-time covariance functions. Stoch Environ Res Risk Assess 21:113–122 Porcu E, Gregori P, Mateu J (2007) La descente et la montée étendues: the spatially d-anisotropic and spatio-temporal case. Stoch Environ Res Risk Assess 21(6):683–693 Porcu E, Montero J, Schlather M (eds) (2012) Advances and challenges in space-time modelling of natural events. Lecture notes in statistics, vol 207. Springer, Heidelberg Porcu E, Bevilacqua M, Genton MG (2016) Spatio-temporal covariance and cross-covariance functions of the great circle distance on a sphere. J Am Stat Assoc 111(514):888–898 Porcu E, Alegría A, Furrer R (2018) Modeling temporally evolving and spatially globally dependent data. Int Stat Rev 86(2):344–377 Posa D (2020) Parametric families for complex valued covariance functions: some results, an overview and critical aspects. Spat Stat 9:100473 Posa D (2021) Models for the difference of continuous covariance functions. Stoch Environ Res Risk Assess 35:1369–1386 Schlather M, Malinowski A, Menck PJ, Oesting M, Strokorb K (2015) Analysis, simulation and prediction of multivariate random fields with package RandomFields. J Stat Softw 63(8):1–25 Shand L, Li B (2017) Modeling nonstationarity in space and time. Biometrics 73(3):759–768 Xu G, Liang F, Genton MG (2015) A Bayesian spatio-temporal geostatistical model with an auxiliary lattice for large datasets. Stat Sin 25(1):61–79 Zimmerman DL, Stein M (2010) Constructions for nonstationary spatial processes. In: Gelfand AE, Diggle P, Guttorp P, Fuentes M (eds) Handbook of Spatial Statistics. CRC Press, Boca Raton, pp 119–127
Spatiotemporal Analysis Shrutilipi Bhattacharjee1, Johannes Madl2, Jia Chen2 and Varad Kshirsagar3 1 National Institute of Technology Karnataka (NITK), Surathkal, India 2 Technical University of Munich, Munich, Germany 3 Birla Institute of Technology and Science, Pilani, Hyderabad, India
Synonyms Remote sensing; Spatiotemporal analysis; Trace gases
Definition Trace gases are those constituents of the atmosphere that exist in very small amounts, together making up approximately 0.1% of the atmosphere. Many common trace gases, such as water vapor, carbon dioxide (CO2), methane (CH4), and ozone (O3), are greenhouse gases (GHGs), which are increasing rapidly in Earth's atmosphere due to anthropogenic activity and have therefore drawn significant attention to their measurement and analysis. On the other hand, nitrogen dioxide (NO2) is not a GHG but is important for the formation of tropospheric ozone, thereby contributing indirectly to the rise in GHG concentrations. The main sources of NO2 are cars, trucks, power plants, and other industrial facilities that burn fossil fuels (How to Find and Visualize 2021). In high concentrations the gas can cause serious problems in the human respiratory system; even short-term exposure may cause difficulty in breathing and aggravate respiratory diseases, particularly asthma. In this entry, a brief overview of the spatiotemporal analysis of a few atmospheric trace gases is given, including recent developments in their remote sensing-based measurement. In particular, applications of visualization, validation of trace gas measurements, and different analysis methods, such as prediction of missing data, atmospheric transport modeling, inverse modeling, machine learning, etc., are presented. Spatiotemporal analysis of trace gases supports a broad spectrum of applications. Different analysis approaches are discussed in association with recent investigations of three common atmospheric trace gases: CO2, CH4, and NO2.
Remote Sensing–based Measurement of the Trace Gases Many satellites currently observe atmospheric trace gas concentrations, mainly the GHGs and air pollutants, and measure their spatiotemporal concentration data across the globe. Examples include the Orbiting Carbon Observatory-2 (OCO-2) (Crisp 2015) from NASA, launched in 2014, and its successor OCO-3 (Eldering et al. 2019), which mainly estimate the column-averaged dry-air mole fraction of carbon dioxide (XCO2). Another complementary example is the Sentinel-5P satellite (Judd et al. 2020), whose TROPOMI spectrometer estimates many trace gases, including O3, CH4, CO, NO2, sulphur dioxide (SO2), etc., and aerosols in the atmosphere. The spectrometers aboard these satellites measure the radiation emitted or reflected from the Earth in order to extract different parameters related to the Earth's surface and atmosphere. Satellite retrievals are thus not direct measurements; the instruments measure electromagnetic energy radiating from the Earth and atmosphere, which is then converted into geophysical parameters. Due to
great achievements in enhancing measurement techniques and the instruments aboard, it is possible to obtain better temporal and spatial resolutions of satellite-borne trace gas data. A good example can be seen by comparing the Dutch-Finnish Ozone Monitoring Instrument (OMI), launched on the NASA Aura spacecraft in 2004 with a spatial resolution of 13 km × 24 km at nadir, and its successor, the Tropospheric Monitoring Instrument (TROPOMI) on the Sentinel-5P satellite, launched by ESA in 2017 with a spatial resolution of approximately 3.5 km × 5.5 km at nadir (Judd et al. 2020). Besides the improvements in spatial resolution, the temporal resolution of satellites generally ranges between 1 and 16 days, depending on the orbit, the sensor's characteristics, and the swath width (What is Remote Sensing? 2022). In the case of the OCO-2 satellite, the temporal resolution is 16 days, which means that a huge portion of the Earth is unmeasured on a particular day, whereas Sentinel-5P provides daily global coverage. Other well-known satellites for trace gas measurement are the Exploratory Satellite for Atmospheric CO2 (TanSat), the Japanese Greenhouse Gases Observing SATellite (GOSAT), the Greenhouse Gas Satellite-Demonstrator (GHGSat-D), etc. Combining all of the desirable features into a single remote sensor is very difficult, and there are trade-offs. To acquire observations with high spatial resolution, a narrower swath is required, which means more time between observations of a given area and hence a lower temporal resolution (What is Remote Sensing? 2022). Further, the missing data prevent a complete understanding of the global cycles of trace gases, the mechanisms that control their spatial and temporal variability, and other important features (Bhattacharjee and Chen 2020). In order to obtain a full picture of the Earth's atmospheric conditions and trace gases, and to analyze them for different applications, spatiotemporal methods need to be applied to the collected data. An appropriate spatiotemporal analysis is necessary to learn about current atmospheric trace gas conditions and to tackle global climate change by developing innovative techniques based on the observed data. A brief overview of spatiotemporal analysis, including a few applications and methods used in the remote sensing of trace gases, is given in the next section.
Spatiotemporal Analysis Many applications and methods of spatiotemporal data analysis have been reported for satellite-borne trace gas data. These can be broadly categorized according to their research and application objectives. In this entry, three broad analysis approaches for different research objectives are described in brief: the exploration of spatiotemporal characteristics by visualization, the validation of remotely sensed trace gas data, and important methods of analyzing trace gases. Other research
objectives could be spatiotemporal decision-making, prediction, inversion, etc.; hence, the approaches mentioned here are not exhaustive for remotely sensed trace gas data.
Visualization Visualization plays a central role in the process of spatiotemporal analysis of trace gases and can be considered one of the initial steps for further analysis. Some practical uses of visualization are detecting emission point sources of trace gases, like power plants or tar sands, and monitoring changes in pollution over time due to anthropogenic activities, climate change policies, and regulations. A recent example is the reduction of NO2 measured by NASA's Aura satellite over China in early 2020. The NO2 values were 10–30% lower than the normal range observed for this time period, which can be partly attributed to the economic slowdown after the coronavirus outbreak (How to Find and Visualize 2021). Any temporal window can be chosen to visualize and compare the trace gas concentrations, depending on the needs of the application. For example, a temporal window of 1 day is chosen in Fig. 1, where the NO2 concentration data of two consecutive days, April 28 and 29, 2021, measured by TROPOMI, are shown around the harbor in Bremerhaven, Germany. It is clearly seen that there is a burst in the NO2 concentration in the lower-left corner, which is mostly caused by the nearby power plant in Wilhelmshaven and is subject to further investigation. ESA's TROPOMI estimates the tropospheric column density of NO2 in molec/cm². It is not feasible to measure surface NO2 with the current satellite instruments, but it is possible to estimate the tropospheric NO2 vertical column density, which correlates well with surface values in industrialized regions. The databases of both TROPOMI and OMI NO2 data can be accessed via Earthdata Search (2021), typically in hierarchical data format. With the freely available tool Panoply from NASA, these data can be easily explored and visualized. Another possible way to explore the NO2 data is provided by the application Giovanni. The data can be viewed as a seasonal or time-averaged map, a scatter plot, or a time series for the chosen temporal window. Similarly, another tool to visualize the global carbon cycle is the Global Carbon Atlas, established by the Global Carbon Project (Global Carbon Atlas 2021). It provides estimates of the emissions generated by natural processes and anthropogenic activities and updates the database annually. Pulse GHGSat (2021) is a visualization tool to check methane concentrations around the world, established by GHGSat. It is updated weekly and shows monthly concentrations, averaged over land at a resolution of 2 km × 2 km.
Validation of Datasets After the launch of a satellite, the instruments onboard have to be recalibrated regularly to make sure that they are providing
Spatiotemporal Analysis, Fig. 1 The two images show TROPOMI's tropospheric NO2 column measurements on April 28 (left) and April 29, 2021 (right) in the region around the harbor in Bremerhaven, Germany. The second image shows a burst in the NO2 measurements in the lower-left corner, which is mostly caused by the nearby power plant and is subject to further investigation
accurate measurements. Complete and accurate satellite measurements of trace gases are heavily affected by poor signal-to-noise ratios and dense cloud cover over the observed area, as a consequence of which missing or erroneous measurements can be found at some locations. It is not possible to directly measure near-surface concentrations below the clouds. Bad weather conditions, faults in the sensors, and several other factors can make the uncertainty of remotely sensed trace gas measurements high. As Loew et al. (2017) rightly point out, the uncertainties present in satellite data are a known unknown. Furthermore, validation is an integral part of satellite measurements for general quality assurance purposes. The spatiotemporal validation of remotely sensed trace gases is typically based on a comparison with more reliable ground-based networks. In Verhoelst et al. (2021), the first 2 years of Sentinel-5P TROPOMI NO2 measurements are validated against multiple ground-based instruments worldwide, such as Multi-Axis DOAS, NDACC Zenith-Scattered-Light DOAS, and 25 PGN/Pandora instruments. Another example is the Total Carbon Column Observing Network (TCCON), a global network of ground-based Fourier transform spectrometers. It provides accurate measurements of different trace gases, such as CO2, CH4, and others (Hardwick and Graven 2016). The instruments of TCCON offer a spectral resolution of 0.02 cm⁻¹ and a temporal resolution of about 90 s. Another global network for GHG measurements, named COCCON, was started by the Karlsruhe Institute of Technology, Germany, using the EM27/SUN spectrometer (Frey et al. 2019). A similar ground-based urban GHG network (MUCCnet) was established in 2019 by the Technical University of Munich, Germany, which consists of five fully automated and highly
precise Fourier-transform infrared spectroscopy (FTIR) monitoring systems distributed in and around the city of Munich (Dietrich et al. 2021). It measures the column concentrations of CO2, CH4, and CO throughout the year, using the sun as a light source. Such local networks are essential in providing more reliable ground-based measurements for local- and regional-level validation and for the general calibration applied to the satellite instruments.
Spatiotemporal Analysis Methods A brief overview of the common methods applied for different applications is presented here. Geostatistical interpolation methods like kriging and many of its variants, such as simple kriging, ordinary kriging, universal kriging, spatial block kriging, fixed rank kriging, etc., can be applied for predicting missing data that occur due to cloud cover, poor signal caused by bad weather conditions, etc. (Bhattacharjee et al. 2013). A recent work (Bhattacharjee and Chen 2020) analyzed and made use of trace gas emission estimates, which in turn contribute to atmospheric concentrations. It adopted a multivariate interpolation method to predict the local column-averaged CO2 distribution by combining auxiliary emission estimates and the land use/land cover distribution of the study areas. As mentioned before, the prediction results are validated against TCCON data. Wind has a significant influence on the atmospheric transport pattern of trace gases (Zhao et al. 2019), and these local spatiotemporal processes in turn have a high impact on the atmospheric concentrations of trace gases like CO2. Wind develops as a result of spatial differences in atmospheric pressure, which occur, for example, at sea-land transitions due to the uneven absorption of solar radiation (Jacob 1999). The rapidly changing wind speed and wind directions
can only be taken into account properly when the scope of the spatiotemporal analysis is narrowed to a smaller time period. Anthropogenic CO2 emissions are perturbing the natural balance of CO2 and causing its atmospheric concentration to rise. The emitted CO2 is transported by the wind from one place to another, which can change the concentration at a particular location, depending on the wind speed and direction. Therefore, wind information should be taken into account for short-term spatiotemporal analysis. One way to incorporate this information is to use footprint information, which can be created using a Lagrangian particle dispersion model (LPDM) driven by meteorology models. In combination with surface fluxes, it is then possible to determine concentration changes over any region (Bhattacharjee et al. 2020). Similarly, the high-resolution Weather Research and Forecasting model, in combination with GHG modules (WRF-GHG), can model precise mesoscale atmospheric GHG transport, especially in urban areas. In Zhao et al. (2019), the WRF-GHG model is implemented to simulate the photosynthetic activity of vegetation and the emission and transport of CO2 and CH4 in the city of Berlin at 1 km resolution. Inverse analysis (Jacob et al. 2016) is another important and widely used approach for flux inversion, aimed at understanding the source and sink distribution at the Earth's surface. The basic idea is to combine global atmospheric transport models with trace gas concentration measurements to estimate the relationship between flux and tracer distributions using different inverse techniques, such as Bayesian inversion, the adjoint approach, Markov Chain Monte Carlo (MCMC), etc. The accuracy of the inverse modeling technique depends on the quality of the transport field, the prior flux information used, etc. Many studies have used satellite observations for flux inversion, which involves handling very large atmospheric datasets. The inversion technique has been widely applied to CO2, CH4, NO2, and other trace gas measurements from different satellite observations, including OCO-2, GOSAT, Sentinel-5P, etc. Finally, different machine learning approaches are also currently being used for the spatiotemporal analysis of trace gases in different applications. In Lary et al. (2018), the potential of machine learning methods for the mandatory bias correction and cross-calibration of the measurement instruments is discussed. Studies have been reported that infer the CO2 flux over land regions using an LSTM recurrent neural network on OCO-2 data combined with CO2 flux tower data (Nguyen and Halem 2018). Machine learning classifiers are being used to identify bad pixels in the satellite measurements (Marchetti et al. 2019), to estimate surface concentrations from the vertical column densities (Chan et al. 2021), etc.
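As a minimal illustration of the kriging-based gap filling mentioned at the beginning of this subsection, the following sketch implements ordinary kriging with an assumed exponential covariance; the coordinates, XCO2 values, and parameters are invented for illustration, and this is not the multivariate scheme of Bhattacharjee and Chen (2020):

import numpy as np

def ordinary_kriging(coords, values, target, sill=1.0, rng=100.0):
    """Ordinary kriging with an exponential covariance C(h) = sill*exp(-h/rng).
    coords: (n, 2) observation coordinates (e.g., km); values: (n,) observations;
    target: (2,) location with a missing value."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = sill * np.exp(-d / rng)
    # Extend the system with the unbiasedness (Lagrange multiplier) constraint.
    A = np.ones((n + 1, n + 1)); A[:n, :n] = C; A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = sill * np.exp(-np.linalg.norm(coords - np.asarray(target), axis=1) / rng)
    w = np.linalg.solve(A, b)
    return float(w[:n] @ values)   # kriged estimate at the target location

# Hypothetical cloud-free XCO2 soundings (ppm) around a missing pixel.
obs_xy = [(0, 0), (40, 10), (15, 60), (70, 70)]
obs_val = [412.1, 411.6, 412.4, 411.9]
print(ordinary_kriging(obs_xy, obs_val, target=(30, 30)))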
Summary or Conclusions Several aspects of the spatiotemporal analysis of trace gases have been discussed, including visualization, validation, and different spatiotemporal analysis methods, such as missing data handling, atmospheric transport modeling, inverse modeling, machine learning methods, etc. Each of them explores the characteristics of atmospheric trace gases like CO2, CH4, NO2, etc., in different application domains and helps to understand global and local atmospheric processes. Satellite-borne trace gas data, combined with various ground-based monitoring networks, are the foundation that enables a broad spectrum of spatiotemporal analyses. Different investigations around the globe have been mentioned here to illustrate traditional methods for the spatiotemporal analysis of trace gases and to point toward recent extensions based on data fusion approaches. Though the discussion is not exhaustive, it gives initial pointers for further exploration.
Cross-References ▶ Bayesian Inversion in Geoscience ▶ Data Visualization ▶ Forward and Inverse Stratigraphic Models ▶ Interpolation ▶ Inversion Theory ▶ Kriging ▶ Machine Learning ▶ Markov Chain Monte Carlo ▶ Remote Sensing ▶ Spatial Analysis
Bibliography Bhattacharjee S, Chen J (2020) Prediction of satellite-based column CO2 concentration by combining emission inventory and LULC information. IEEE Trans Geosci Remote Sens 58(12):8285–8300 Bhattacharjee S, Mitra P, Ghosh SK (2013) Spatial interpolation to predict missing attributes in GIS using semantic kriging. IEEE Trans Geosci Remote Sens 52(8):4771–4780 Bhattacharjee S, Chen J, Jindun L, Zhao X (2020) Kriging-based mapping of space-borne CO2 measurements by combining emission inventory and atmospheric transport modeling. In: EGU General Assembly Conference Abstracts, p 10076 Chan KL, Khorsandi E, Liu S, Baier F, Valks P (2021) Estimation of surface NO2 concentrations over Germany from TROPOMI satellite observations using a machine learning method. Remote Sens 13(5): 969 Crisp D (2015) Measuring atmospheric carbon dioxide from space with the Orbiting Carbon Observatory-2 (OCO-2). In: James JB, Xiaoxiong X, Xingfa G (eds) Earth observing systems xx, vol 9607. International
1386 Society for Optics and Photonics, SPIE, pp 1–7. https://doi.org/10. 1117/12.2187291 Dietrich F, Chen J, Voggenreiter B, Aigner P, Nachtigall N, Reger B (2021) MUCCnet: Munich Urban Carbon Column network. Atmos Meas Tech 14(2):1111–1126 Earthdata Search. Available at: https://search.earthdata.nasa.gov/search. Accessed on: 27 May 2021 Eldering A, Taylor TE, O’Dell CW, Pavlick R (2019) The OCO-3 mission: measurement objectives and expected performance based on 1 year of simulated data. Atmos Meas Tech 12(4):2341–2370 Frey M, Sha MK, Hase F, Kiel M, Blumenstock T, Harig R, Surawicz G et al (2019) Building the COllaborative Carbon Column Observing Network (COCCON): long-term stability and ensemble performance of the EM27/SUN Fourier transform spectrometer. Atmos Meas Tech 12(3):1513–1530 Giovanni the bridge between data and science v 4.36. Available at: https://giovanni.gsfc.nasa.gov/giovanni/. Accessed on: 24 May 2022 Global Carbon Atlas. Available at: http://www.globalcarbonatlas.org/en/ content/welcome-carbon-atlas; Accessed on: 29 May 2021 Hardwick S, Graven H (2016) Satellite observations to support monitoring of greenhouse gas emissions. Grantham Institute Research Paper No 16. Imperial College London. https://www.imperial.ac.uk/media/ imperial-college/grantham-institute/public/publications/briefingpapers/Satellite-observations-to-support-monitoring-of-greenhousegas-emissions-Grantham-BP-16.pdf. Accessed on: 24 May 2022 How to Find and Visualize Nitrogen Dioxide Satellite Data. Available at: https://earthdata.nasa.gov/learn/articles/feature-articles/health-andair-quality-articles/find-no2-data. Accessed on: 27 May 2021 Jacob DJ (1999) Introduction to atmospheric chemistry. Princeton University Press, Princeton Jacob DJ, Turner AJ, Maasakkers JD, Sheng J, Sun K, Liu X, Chance K, Aben I, McKeever J, Frankenberg C (2016) Satellite observations of atmospheric methane and their value for quantifying methane emissions. Atmos Chem Phys 16(22):14371–14396 Judd LM, Al-Saadi JA, Szykman JJ, Valin LC, Janz SJ, Kowalewski MG, Eskes HJ et al (2020) Evaluating Sentinel-5P TROPOMI tropospheric NO2 column densities with airborne and Pandora spectrometers near New York City and Long Island Sound. Atmos Meas Tech 13(11):6113–6140 Lary DJ, Zewdie GK, Liu X, Wu D, Levetin E, Allee RJ, Malakar N et al (2018) Machine learning applications for earth observation. In: Earth observation open science and innovation, vol 165. Springer Cham, Switzerland Loew A, Bell W, Brocca L, Bulgin CE, Burdanowitz J, Calbet X, Donner RV, Ghent D, Gruber A, Kaminski T, Kinzel J (2017) Validation practices for satellite-based Earth observation data across communities. Rev Geophys 55(3):779–817 Marchetti Y, Rosenberg R, Crisp D (2019) Classification of anomalous pixels in the focal plane arrays of Orbiting Carbon Observatory-2 and-3 via machine learning. Remote Sens 11(24):2901 Nguyen P, Halem M (2018) Prediction of CO2 flux using Long Short Term Memory (LSTM) Recurrent Neural Networks with data from Flux towers and OCO-2 remote sensing. In AGU Fall Meeting Abstracts, vol 2018, pp T31E-0364 Pulse GHGSat. Available at: https://ghgsat.com/en/pulse. Accessed on: 30 May 2021 Verhoelst T, Compernolle S, Pinardi G, Lambert J-C, Eskes HJ, Eichmann K-U, Fjæraa AM et al (2021) Ground-based validation of the Copernicus Sentinel-5p TROPOMI NO2 measurements with the NDACC ZSL-DOAS, MAX-DOAS and Pandonia global networks. Atmos Meas Tech 14(1):481–510 What is Remote Sensing?. 
Available at: https://earthdata.nasa.gov/learn/ backgrounders/remote-sensing. Accessed on: 14 July 2022 Zhao X, Marshall J, Hachinger S, Gerbig C, Frey M, Hase F, Chen J (2019) Analysis of total column CO2 and CH4 measurements in Berlin with WRF-GHG. Atmos Chem Phys 19(17):11279–11302
Spatiotemporal Modeling Shrutilipi Bhattacharjee1, Johannes Madl2, Jia Chen2 and Varad Kshirsagar3 1 National Institute of Technology Karnataka (NITK), Surathkal, India 2 Technical University of Munich, Munich, Germany 3 Birla Institute of Technology and Science, Pilani, Hyderabad, India
Synonyms Environmental analysis; Spatiotemporal modeling; Outlier detection
Definition Spatiotemporal modeling covers a broad spectrum of algorithms dealing with spatiotemporal data and a wide range of applications in many fields. The generic idea behind these modeling approaches is to analyze the temporal pattern of target variables within a spatial domain of interest. Such modeling approaches are usually complex in nature because of heterogeneous behavior in the space and time domains, the diverse and voluminous data to deal with, the associated uncertainties, etc. These models can be used for different purposes, such as estimation or prediction, spatiotemporal trend analysis, pattern recognition, and more. In this entry, some applications in the area of environmental data analytics and the corresponding spatiotemporal modeling approaches are discussed. Spatiotemporal outlier detection is then discussed as a case study, since it is one of the basic data-preprocessing steps applied in many spatiotemporal analyses. Such models can help detect anomalies in data that change over time as well as space, identify hotspots, and improve the understanding of their effects in different applications.
Introduction With the ongoing trend toward big data and growing computational capabilities, the earlier computational limits of complex spatiotemporal analysis are being pushed back further. New spatiotemporal approaches are being investigated that build on the strong foundations of well-established spatiotemporal analysis methods. These new possibilities target many research areas that can be very different, widely spread, and of current interest, but that all must address the challenge of modeling spatial data that change over time. Among the many applications of spatiotemporal data modeling,
one significant example can be investigating geospatial processes on our planet Earth. Geospatial data is a good example of spatiotemporal data. It changes with time as well as space, and some changes even might span centuries. There are many initiatives on a global scale to capture and monitor geospatial data in many forms. Satellite missions have been measuring geospatial environmental variables, for example, TIROS-1, launched by NASA in 1960 (Schnapf 1982), is considered to be the first successful weather satellite. Modeling these data has been an area of interest for a long time, mainly due to the global problem of climate change, global warming, and related subproblems. Geospatial data usually comes with spatial and temporal uncertainties due to the difficulty of capturing it. Therefore, once this data has been captured, it needs to be processed to remove erroneous readings and converted into data products. These data products are then used for modeling to infer implicit patterns. One of the major data-preprocessing stages that could be involved in any modeling processes is the outlier detection. Outliers can reveal significant information regarding the datasets and the applications. As these are the points that do not follow a specific pattern or are highly isolated from similar data points, investigating the reason behind these points being outliers can lead us to infer important conclusions about the specific problem domain. For example, outliers can reveal information about hotspots, coldspots, or any other point sources of the target variables. Here, the spatiotemporal modeling approaches and its characteristics, one application domain, one modeling approach, i.e., the outlier detection method in general, and a few algorithms for the spatiotemporal outlier detection are discussed.
Spatiotemporal Modeling Spatiotemporal modeling covers a broad spectrum of approaches and can include different research objectives. It refers to a methodological formalization that accurately describes the behavior of a certain set of target variables varying over space and time. The research and application objectives of spatiotemporal modeling can include the following aspects (Song et al. 2017):
• Description of spatiotemporal characteristics
• Exploration of potential features and spatiotemporal prediction
• Modeling and simulation of spatiotemporal processes
• Spatiotemporal decision-making
Many models are associated with the abovementioned objectives. For example, models such as spatiotemporal kriging build variogram models that describe
underlying spatiotemporal characteristics of the data. Similarly, the Bayesian Maximum Entropy (BME) model (Christakos and Li 1998) can be used to incorporate prior information in the case of limited data availability. Various types of regression models, such as spatiotemporal multiple regression, geographically and temporally weighted regression (GTWR), and the spatiotemporal Bayesian hierarchical model (BHM), are used for exploring potential factors and for spatiotemporal prediction. On the other hand, popular modeling and simulation approaches include cellular automata (CA) and the geographical agent-based model (ABM). Spatiotemporal decision-making processes use models such as the spatiotemporal multicriteria decision-making (MCDM) model.
Applications of Spatiotemporal Modeling The abovementioned objectives of spatiotemporal modeling approaches are common to a wide range of applications. For example, the analysis and modeling of meteorological and atmospheric data, which falls under the broad spectrum of environmental applications, has made use of these models for different purposes. In Qin et al. (2017), the geographically and temporally weighted regression (GTWR) model was considered for predicting ground-level NO2 concentrations over central-eastern China, based on tropospheric NO2 columns provided by the Ozone Monitoring Instrument (OMI) combined with ambient monitoring station measurements and meteorological data. The approach showed better or comparable performance with respect to chemistry-based transport models. It was also observed that people in densely populated areas are more prone to be affected by high NO2 pollution. Another application of spatiotemporal modeling can be found in the prediction of wildfire propagation and extinction, as shown in Clarke et al. (1994). According to Pechony and Shindell (2010), global fire activity is increasing, and there are indications that the future climate will be the deciding factor for global fire trends, compared with the direct effects of human intervention on wildfires. Clarke et al. (1994) used a cellular automaton model with additional environmental data, such as wind speed and magnitude, in order to predict how a wildfire behaves and expands. On the other hand, for estimating the sparse column-averaged dry-air mole fraction of carbon dioxide (XCO2) measured by the Orbiting Carbon Observatory-2 (OCO-2) satellite, kriging-based spatiotemporal and cokriging-based spatial interpolation have been used to generate a Level-3 XCO2 mapping (Bhattacharjee et al. 2020; Bhattacharjee and Chen 2020). The method combined the satellite-borne data with auxiliary emission data and land use/land cover data, which enhanced the prediction accuracy of XCO2. Lee
et al. (2018) proposed an approach using support vector regression to predict weather station observations from a spatial perspective. Excluding potential measurement errors from the observations of automatic weather stations is important for increasing the quality of the meteorological observations and for subsequent weather forecasting, disaster warning, and policy formulation. In that work, abnormal data points are detected and excluded based on the difference between their actual and estimated values. Marchetti et al. (2019) presented a machine learning approach to identify the damaged or unusable pixels in the focal plane array (FPA) of NASA's OCO-2 and Orbiting Carbon Observatory-3 (OCO-3). This task is challenging mainly because of the voluminous data and the diversity of anomalous behavior. Therefore, identifying abnormal factors or data points, to increase the accuracy of the measurements or to identify hotspots, is one of the crucial tasks in the field of spatiotemporal modeling. In the next section, spatiotemporal outlier detection methods and related algorithms are discussed as a representative of widely used methods in spatiotemporal applications.
Spatiotemporal Outlier Detection Method A spatial outlier is a data point or an object whose nonspatial attribute values are distinctly different from those of its neighboring data points. A point can be an outlier for a certain neighborhood even if it is typical with respect to the entire population. Against this background, in statistics and data science the three common categories of outliers (in general) are global, contextual, and collective outliers (Han et al. 2012). When a data point or an object is significantly different from the entire sample population, it is referred to as a global outlier. On the other hand, data points that deviate within a selected context are contextual outliers. For example, a high greenhouse gas concentration may be typical for an industrial area, but could be a hotspot for an area of vegetation cover. Collective outliers are a subset of data points that deviate significantly from the entire population as a group, but not individually in a global or contextual sense. The same
definition can be extended to spatiotemporal outliers. For spatial and spatiotemporal outliers, the difference lies in how the neighborhood is defined. For a spatial outlier, the neighborhood is the set of points within a certain distance of the point under consideration at a single timestamp, whereas spatiotemporal outliers are data points which exhibit abnormal behavior within a region of interest at some timestamps over a consecutive period (Cheng and Li 2006). Checking and identifying anomalous points in both the spatial and temporal dimensions makes spatiotemporal outlier detection more complex than purely spatial or purely temporal outlier detection. A sample point might seem like an outlier at a certain timestamp (a spatial outlier) but may not actually be one in a spatiotemporal context. For example, Fig. 1 shows the land use/land cover distribution of a region with nine pixels at different timestamps (Anbaroğlu 2009). At time Ti, the vegetation (green) pixel in the middle might be an outlier compared with the surrounding built-up area pixels (yellow) and would be subject to further investigation. However, if future timestamps are considered, it becomes clear that the vegetation is growing and forming a temporal pattern; hence, it is not a spatiotemporal outlier when the four timestamps are considered.

Spatiotemporal Modeling, Fig. 1 The change/growth of the "vegetation" (green) pixel over time, shown at four consecutive timestamps Ti, Ti+1, Ti+2, and Ti+3

According to Aggarwal (2017), spatiotemporal outliers can be detected in two possible ways:
• Combining the spatial and temporal outliers that are found in two separate processes
• Using both the spatial (for example, latitude, longitude) and temporal (for example, timestamp) attributes as contextual attributes and finding the outliers
The author notes that the second approach is more general, integrated, and flexible, because both spatial and temporal locality can be used to detect the outliers, and the two dimensions can be weighted differently based on the application requirements. However, depending on the application and the dataset used, either of the methods can produce the best results.
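As a loose illustration of the first of the two approaches listed above, spatial outlier flags computed per timestamp can be intersected with per-pixel temporal outlier flags; the 3 × 3 neighborhood, the z-score rule, and the thresholds are assumptions made for this sketch, not part of the original entry:

import numpy as np

def spatial_outliers(grid, z_thresh=2.0):
    """Flag cells whose value deviates strongly from the mean of their
    3 x 3 neighborhood (excluding the cell itself) at a single timestamp."""
    rows, cols = grid.shape
    flags = np.zeros((rows, cols), dtype=bool)
    for i in range(rows):
        for j in range(cols):
            nb = [grid[r, c]
                  for r in range(max(i - 1, 0), min(i + 2, rows))
                  for c in range(max(j - 1, 0), min(j + 2, cols))
                  if (r, c) != (i, j)]
            mu, sd = np.mean(nb), np.std(nb)
            if sd > 0 and abs(grid[i, j] - mu) > z_thresh * sd:
                flags[i, j] = True
    return flags

def temporal_outliers(cube, z_thresh=2.0):
    """Flag, per cell and timestamp, values that deviate strongly from the
    cell's own temporal mean (a simple per-pixel z-score over time); with very
    short series a lower threshold may be needed."""
    cube = np.asarray(cube, dtype=float)
    mu, sd = cube.mean(axis=0), cube.std(axis=0)
    sd = np.where(sd == 0, np.inf, sd)
    return np.abs(cube - mu) > z_thresh * sd

def spatiotemporal_outliers(cube, z_thresh=2.0):
    """Approach 1: combine spatial and temporal outliers found in two separate
    processes; a (cell, timestamp) pair is kept only if it is flagged by both.
    A persistent trend, such as the growing vegetation pixel in Fig. 1, is
    unlikely to be flagged temporally and therefore drops out."""
    cube = np.asarray(cube, dtype=float)
    spatial = np.array([spatial_outliers(g, z_thresh) for g in cube])
    return spatial & temporal_outliers(cube)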
Spatiotemporal Outlier Detection Algorithms The algorithm used to detect spatiotemporal outliers varies from one use case to another, as it depends strongly on the application, the temporal frequency of data collection, the number of available timestamps, the size of the study region, etc. Some spatiotemporal outlier detection algorithms used in the literature are discussed below. Cheng and Li (2006) propose a multiscale technique for spatiotemporal outlier detection. In the preliminary stage of the approach, spatiotemporal objects are classified into clusters using clustering algorithms adapted to the characteristics of the data. Then, in the second phase, more spatiotemporal objects are aggregated into each cluster by increasing the window or scale of the cluster. An object which was not classified into any cluster before might be assigned to one now, and objects might change clusters as well. This step is followed by a comparison of the results of the first and second phases to detect potential spatial outliers. Finally, the potential spatial outliers are observed over multiple time periods by human inspection. If they show consistent behavior over time, they are no longer considered outliers; the remaining set of objects is then concluded to be spatiotemporal outliers. This method is more suitable for large geographical datasets with a very low temporal frequency: as the final phase is carried out by human inspection, too many frames or maps make it difficult to identify patterns or trends. In that study, the authors used a dataset with a temporal frequency of once per year. Wu et al. (2008) demonstrate the use of a grid-based approach for outlier detection in a South American precipitation data set obtained from NOAA. Here, the top k outliers are detected and extracted for each time period, and sequences of outliers are stored using the Outstretch algorithm. At the end, all the extracted sequences are identified as the spatiotemporal outliers. Despite the large dataset used in the investigation, the running time of the algorithm remains impressively low. Rogers et al. (2009) propose a statistical method to identify spatiotemporal outliers using the strangeness-based outlier detection algorithm (StrOUD). The "strangeness" measure of each object is the sum of the distances to its nearest neighbors, where the distance between two points is a weighted combination of their geographical, temporal, and feature-vector distances. By comparing the strangeness of an individual object with the baseline strangeness of the normal objects, spatiotemporal outliers can be detected: if the difference is significant, the object is said to be an outlier.
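A sketch of the strangeness measure used in StrOUD, as summarized above; the weights, the value of k, and the simple additive combination of geographical, temporal, and feature distances are illustrative assumptions rather than the exact formulation of Rogers et al. (2009):

import numpy as np

def strangeness(coords, times, features, k=5, w_geo=1.0, w_time=1.0, w_feat=1.0):
    """coords: (n, 2) spatial coordinates; times: (n,) timestamps;
    features: (n, p) attribute vectors. The strangeness of each observation is
    the sum of its weighted distances to its k nearest neighbors."""
    coords = np.asarray(coords, dtype=float)
    times = np.asarray(times, dtype=float).reshape(-1, 1)
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        features = features[:, None]
    d_geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_time = np.abs(times - times.T)
    d_feat = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    d = w_geo * d_geo + w_time * d_time + w_feat * d_feat
    np.fill_diagonal(d, np.inf)               # exclude the distance to itself
    return np.sort(d, axis=1)[:, :k].sum(axis=1)

# An observation would be declared an outlier when its strangeness is extreme
# relative to the baseline strangeness of observations known to be normal,
# for example above a high percentile of those baseline values.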
When focusing purely on spatial outlier detection algorithms, Chen et al. (2008) have proposed a Mahalanobis distance-based approach. Here, the neighborhood of each data point is defined with the k-nearest-neighbor method, using the spatial attributes, such as latitude and longitude. The generated cluster defines the spatial region with which each data point is compared. All nonspatial attributes within the neighborhood are summarized by their median value. Based on the Mahalanobis distance between the nonspatial attributes of each data point and this median, individual outliers can be detected: once the calculated distance becomes too large and exceeds a predefined threshold, the corresponding data point is treated as an outlier. Another approach for detecting spatial outliers within a given data set is the density-based local outlier factor (LOF) algorithm proposed by Breunig et al. (2000). Here, the isolation of a data point with respect to its surrounding neighborhood determines its anomaly score; hence, samples within areas of high density are less likely to be anomalous, since they are supported by a large number of similar data samples. Using domain knowledge or further investigation, outliers can be identified based on their assigned local outlier factors. In general, for spatiotemporal applications, purely spatial outlier detection algorithms applied to each timestamp independently might not be very suitable, as the temporal correlation needs to be taken into account by the models. The efficiency of spatiotemporal models depends strongly on the (spatial and temporal) resolutions of the application dataset, its autocorrelation, the measurement uncertainty, etc. In some cases, adding additional context about the study region to the process may help to define an outlier more accurately.
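A sketch loosely following the Mahalanobis-distance idea described above; the neighborhood size k, the threshold, and the use of the global attribute covariance are assumptions made for illustration, not the exact procedure of Chen et al. (2008):

import numpy as np

def mahalanobis_spatial_outliers(coords, attrs, k=8, threshold=3.0):
    """coords: (n, 2) spatial coordinates; attrs: (n, p) nonspatial attributes.
    Each point is compared with the median attributes of its k nearest spatial
    neighbors; points whose Mahalanobis distance from that median exceeds the
    threshold are flagged as spatial outliers."""
    coords = np.asarray(coords, dtype=float)
    attrs = np.asarray(attrs, dtype=float)
    if attrs.ndim == 1:
        attrs = attrs[:, None]
    n, p = attrs.shape
    cov_inv = np.linalg.pinv(np.cov(attrs, rowvar=False).reshape(p, p))
    d_geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    flags = np.zeros(n, dtype=bool)
    for i in range(n):
        nbrs = np.argsort(d_geo[i])[1:k + 1]   # k nearest neighbors (index 0 is the point itself)
        diff = attrs[i] - np.median(attrs[nbrs], axis=0)
        flags[i] = np.sqrt(diff @ cov_inv @ diff) > threshold
    return flags

For the LOF alternative mentioned above, an off-the-shelf implementation such as scikit-learn's LocalOutlierFactor could be used instead, if that library is available.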
Summary or Conclusions A spatiotemporal dataset has, in general, three components: attributes, space, and time. Modeling approaches for spatiotemporal data cover a broad spectrum of applications in many fields, including environmental applications, crime hotspot analysis, healthcare informatics, transportation modeling, social media, and many others. Clustering, predictive learning, frequent pattern mining, anomaly detection, change detection, and relationship mining are a few broad categories of modeling approaches, irrespective of the application (Atluri et al. 2018). This entry discusses some modeling approaches used for environmental applications in general. Further, one spatiotemporal modeling approach, outlier detection, has been chosen and presented here; outlier detection within the application data is an essential preprocessing step for most spatiotemporal applications. Some important literature on spatiotemporal outlier detection methods has also been discussed. Though the applications and methods presented here are not exhaustive, this entry
gives the initial pointers for further exploration of the spatiotemporal models.
Cross-References ▶ Bayesian Inversion in Geoscience ▶ Data Visualization ▶ Forward and Inverse Stratigraphic Models ▶ Interpolation ▶ Inversion Theory ▶ Kriging ▶ Machine Learning ▶ Markov Chain Monte Carlo ▶ Remote Sensing ▶ Spatial Analysis ▶ Spatiotemporal Analysis
Spatiotemporal Weighted Regression Qin K, Rao L, Jian X, Bai Y, Zou J, Hao N, Li S, Chao Y (2017) Estimating ground level NO2 concentrations over Central-Eastern China using a satellite-based geographically and temporally weighted regression model. Remote Sens 9(9):950 Rogers JP, Barbara D, Domeniconi C (2009) Detecting spatiotemporal outliers with kernels and statistical testing. In: 2009 17th international conference on geoinformatics. IEEE, pp 1–6 Schnapf A (1982) The development of the TIROS global environmental satellite system. Meteorol Satellites-Past Present Future 7 Song Y, Wang X, Tan Y, Peng W, Sutrisna M, Cheng JCP, Hampson K (2017) Trends and opportunities of BIM-GIS integration in the architecture, engineering and construction industry: a review from a spatiotemporal statistical perspective. ISPRS Int J Geo Inf 6(12):397 Wu E, Liu W, Chawla S (2008) Spatiotemporal outlier detection in precipitation data. In: International workshop on knowledge discovery from sensor data. Springer, Berlin/Heidelberg, pp 115–133
Spatiotemporal Weighted Regression Bibliography Aggarwal CC (2017) An introduction to outlier analysis. In: Outlier analysis. Springer, Cham, pp 1–34 Anbaroğlu TCB (2009) Spatio-temporal outlier detection in environmental data. Spatial and Temporal Reasoning for Ambient Intelligence Systems 1 Atluri G, Karpatne A, Kumar V (2018) Spatiotemporal data mining: a survey of problems and methods. ACM Comput Surveys (CSUR) 51(4):1–41 Bhattacharjee S, Chen J (2020) Prediction of satellite-based column CO 2 concentration by combining emission inventory and LULC information. IEEE Trans Geosci Remote Sens 58(12):8285–8300 Bhattacharjee S, Dill K, Chen J (2020) Forecasting interannual spacebased CO 2 concentration using geostatistical mapping approach. In: 2020 IEEE international conference on electronics, computing and communication technologies (CONECCT). IEEE, pp 1–6 Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density- based local outliers. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data. ACM, pp 93–104 Chen D, Chang-Tien L, Kou Y, Chen F (2008) On detecting spatial outliers. GeoInformatica 12(4):455–475 Cheng T, Li Z (2006) A multiscale approach for spatio-temporal outlier detection. Trans GIS 10(2):253–263 Christakos G, Li X (1998) Bayesian maximum entropy analysis and mapping: a farewell to kriging estimators? Math Geol 30(4):435–462 Clarke KC, Brass JA, Riggan PJ (1994) A cellular automaton model of wildfire propagation and extinction. Photogramm Eng Rem S 60(11): 1355–1367 Han J, Kamber M, Pei J (2012) Outlier detection. In: Data mining: concepts and techniques. Amsterdam, Boston, pp 543–584 Lee M-K, Moon S-H, Yoon Y, Kim Y-H, Moon B-R (2018) Detecting anomalies in meteorological data using support vector regression. Adv Meteorol 2018 Marchetti Y, Rosenberg R, Crisp D (2019) Classification of anomalous pixels in the focal plane arrays of orbiting carbon observatory-2 and-3 via machine learning. Remote Sens 11(24):2901 Pechony O, Shindell DT (2010) Driving forces of global wildfires over the past millennium and the forthcoming century. Proc Natl Acad Sci 107(45):19167–19170
Xiang Que1,2, Xiaogang Ma2, Chao Ma3, Fan Liu4 and Qiyu Chen5 1 Computer and Information College, Fujian Agriculture and Forestry University, Fuzhou, China 2 Department of Computer Science, University of Idaho, Moscow, ID, USA 3 State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, Chengdu University of Technology, Chengdu, China 4 Google, Sunnyvale, CA, USA 5 School of Computer Science, China University of Geosciences, Wuhan, China
Definition Spatiotemporal weighted regression (STWR) (Que et al. 2020) is a GWR-based model extended with the time dimension for analyzing local nonstationarity in space and time. Different from geographically and temporally weighted regression (GTWR) (Huang et al. 2010; Fotheringham et al. 2015), a novel concept of time distance, defined as the rate of value difference between regression and observed points rather than the time interval itself, is proposed for weighting the local temporal effects. STWR utilizes a newly designed weighted-average form of the spatiotemporal kernel for combining the spatial and temporal weights.
Introduction Geographically weighted regression (GWR) (Brunsdon et al. 1996; Fotheringham et al. 2003) is a useful tool for exploring the potential spatial heterogeneity in geospatial processes. It is
widely applied in many geoscience-related fields, such as climate science (Brown et al. 2012), geology (Atkinson et al. 2003), and environmental science (Mennis and Jordan 2005). In addition to spatial heterogeneity, temporal heterogeneity is also widespread in natural and geospatial processes. Although geographically and temporally weighted regression (GTWR) (Huang et al. 2010) can incorporate the time dimension into the GWR model, the fitting and prediction performance of GTWR usually does not surpass that of the basic GWR model. This is probably because of the design and configuration of the spatiotemporal kernel, which may limit the performance of the spatiotemporal weight matrix in GTWR. The GTWR model (Huang et al. 2010) takes time as a new distance dimension, combines it with the spatial dimension to calculate a spatiotemporal (Euclidean) distance, and then obtains the weight through a Gaussian or bi-square kernel. However, time and space are at different scales and dimensions, and their units, scales, and impacts on regression points are fundamentally different. Although GTWR uses an adjustment factor t to combine the temporal and spatial distances, directly integrating the spatial distance and time distance is easily misleading (Fotheringham et al. 2015), and the fitting and prediction performance is sometimes inferior to that of traditional GWR models. The GTWR model of Fotheringham et al. (2015) uses a set of time-isolated spatial bandwidths to calculate the weights of the influence of observation points on regression points in space and time, but the calibration process is cumbersome and cannot optimize the temporal and spatial bandwidths simultaneously. Both GTWR models use the time interval as the time distance and use a distance-decay kernel function to calculate the weight. This causes observation points at different spatial positions, recorded at the same past time, to have the same temporal impact on the regression point. However, at these observed points some values change significantly, while others may be almost unchanged, and the magnitude of the value change has a different influence on the regression point: the more significant the value change during the time interval, the higher its impact. The rate of change of attribute values (nonstationarity in time) is also heterogeneous across spatial locations. Using the time interval as the time distance to calculate the weight therefore cannot fully capture the local spatiotemporal effects of observations on the regression point. Besides, the spatiotemporal kernel of both GTWR models takes the form of a product of the temporal and spatial kernel functions (whose values range between 0 and 1). However, this multiplicative form may cause a problematic interaction between time and space: suppose the spatial weight of an observation point on the regression point is equal to 0.9, while the temporal weight is 0.1. If the multiplicative form is adopted, the composite weight of the
1391
observation point on the regression point will eventually change by 0.09. The weight value might be improper for the spatiotemporal effects. Supposed a house B (observation point) is very close to a house A (regression point) in space. And the house price of B is observed in a relatively long-time interval from now (not exceeding the optimized temporal bandwidth). The multiplying weight values may not correctly reflect the spatiotemporal effect of the observation house price at B on the current house price at the regression point A. If the rates of house prices near A change fast, the past price of B will still have substantial impacts on the current price at A. But if adopted the time interval as the time distance, the composited weight value will be seriously underestimated. How to establish a spatiotemporal kernel to capture the actual weights of observation points to the current regression points becomes a problem worthy study. Unlike both geographical and temporal weighted regression (GTWR) (Huang et al. 2010; Fotheringham et al. 2015) mentioned above, STWR (Que et al. 2020) proposed a novel time-distance decay weighting method, in which the time distance is the rate of value difference through a time interval rather than the time interval between an observed point and a regression point. Based on the temporal weighting method (temporal kernel), the degree of temporal impact from each observed point to a regression point can be measured, which is a highlighted feature in STWR. Besides, a weighted average form of the spatiotemporal kernel, instead of multiplication form in GTWR (Huang et al. 2010; Fotheringham et al. 2015), was proposed in STWR, which can better capture the combined effects of a nonstationary spatiotemporal process from observed data than GTWR. These characteristics enable STWR to significantly outperform the GWR and GTWR in model fitting and prediction for the latest observed time stage. Thus, STWR is a practical tool for analyzing the local nonstationary in spatiotemporal process.
S
Model Formulation Principle of GWR GWR is the background of STWR; it is helpful to introduce the basic framework of GWR (Brunsdon et al. 1996; Fotheringham et al. 1996, 2003). The basic formulation of GWR is described in Eq. 1: y i ¼ b0 ð u i , v i Þ þ
bk ðui , vi Þxik þ ei c
ð1Þ
k
In Eq. 1, (ui, vi) is the coordinate of a point i, and yi is a response variable of the point i. xik is the kth dependent variable, and εi denotes the error term, which is assumed to be independent and drawn identically from the normal
1392
Spatiotemporal Weighted Regression
Spatiotemporal Weighted Regression, Fig. 1 Spatiotemporal impacts of observed points with different rates of value change on a regression point at time stage T. Temporal bandwidth is the length of
time from the intersection point A of the spatiotemporal bandwidth and the timeline to the regression point. Spatial bandwidth and spatiotemporal bandwidth are illustrated in the figure legend (Que et al. 2020)
distribution N(0, s2), i.e., i.i.d. The main difference between GWR and the general global regression method, such as ordinary least squares (OLS), is that GWR allows the coefficient βk(ui, vi) vary spatially to identify spatial heterogeneity. The estimated coefficient bk ðui , vi Þ can be expressed by Eq. 2:
and T are the four-time stages from the past to present (Fig. 1). STWR considers the effects of rate of value difference between different observation points and the regression point. Although some data points are farther away from the regression point in space, they may have more significant influences on the regression point (because of the different temporal effects among observed points). As shown at the T-p time stage in Fig. 1, some star points are farther from the regression point in space than some pentagonal points. STWR can better capture the mixed effects of time and space. If the time of the observation point is too old or the rate of numerical difference is too low, that is, exceeding the time-bandwidth, then the weight of temporal effects from the observation point to the regression point is 0.
bk ðui , vi Þ ¼ XT W ðui , vi ÞX
1 T
X W ðui , vi Þy
ð2Þ
In Eq. 2, W(ui, vi) denotes a diagonal matrix, whose diagonal elements represent the influence of each observation point on the regression point i. The weight values of the diagonal elements are obtained by the bi-square or Gaussian kernel function with inputting their distances. An optimal spatial bandwidth can be optimized through minimizing the errors of the regression model. After introducing the time dimension, to calibrate the regression point, the model not only needs to “borrow points” from the nearby spatial location but also “borrow points” from the near past time to capture local spatial and temporal effects. To obtain the weight matrix W(ui, vi), we need a spatiotemporal kernel and to optimize the temporal bandwidth and spatial bandwidth. Time-Distance Decay Weighting Strategy Based on the Rate of Value Difference between Regression and Observed Points Like the spatial distance decay weighting strategy, the timedistance decay weighting strategy is discussed in the previous GTWR model (Crespo et al. 2007; Huang et al. 2010; Wu et al. 2014; Fotheringham et al. 2015). Supposed T-s, T-q, T-p,
Spatiotemporal Kernel Suppose a set of observation points ODt ¼ ON t , ONt1 , . . . ONtq jDt ¼ ½t q, t is collected within a certain time interval Δt, where t denotes the current time stage and Nt i, i {0, 1, 2, . . ., q}(Nt ¼ Nt 0) is the number of points observed at each time stage. In STWR, the spatiotemporal takes the weighted average form: wtijST ¼ ð1 aÞks d sij , bST þ akT dtij , bT , 0 a 1 ð3Þ In Eq. 3, wtijST denotes the spatiotemporal weight on the observed location j at time stage t. ks and kT, both ranging from 0 to 1, are the spatial and temporal kernel, respectively. α is an
Spatiotemporal Weighted Regression
1393
adjustable parameter used to weigh the strength of effects from local time and space. It can be optimized by using the same search strategy as bandwidth optimization. bST is the spatial bandwidth bS at a certain time stage T, and bT denotes
the temporal bandwidth. dsij and dtij denote the spatial (Euclidean) and temporal distance between the regression point i and an observed data point j, respectively. In STWR, the temporal kernel kT is specified as below:
2 wtijST
¼ 1 þ exp
yiðtÞ yjðtqÞ =yjðtqÞ
1 , if 0 < Dt < bT
ð4Þ
Dt=bT
0, otherwise
In Eq. 4, Δt is the time interval.yi(t) yj(t q) denotes the value difference between the observed value of regression point i at t and the value of observation point j at t q. The weight will be set to zero if the time interval Δt is out of the range (0, bT). Compared with GTWR models, the temporal kernel kT of STWR can better capture the effects of temporal variation. Thus, it can represent that the faster the rate of value difference is during a certain time interval (within the temporal bandwidth), the bigger weight value it will be. In the spatiotemporal process context, the optimized spatial bandwidth can also vary over time. Various functions can be specified for describing the variation of spatial bandwidth
ð 1 aÞ 1 wtijST
¼
2 2 d sij bSt tan yDt
þa
over time. For convenience of model calibration, STWR assumed that the relationship is linear: bST ¼ bSt tany Dt,
p p 0 is a shape parameter), there is no derivative discontinuity, and the accuracy generally improves from algebraic to spectral (beyond any algebraic order). Regarding item #2, one can generalize (4) to N
sðxÞ ¼
lk fðkx xk kÞ,
ð5Þ
k¼1
where || || denotes the standard Euclidean 2-norm. Here, the nodes xk can be arbitrarily scattered in a d-dimensional space. For a wide choice of radial functions f(r), the linear system for the lk that arises when enforcing sðxk Þ ¼ yk (function values at the nodes) can never be singular, no matter how distinct nodes xk , k ¼ 1, 2, . . ., N are scattered in any number of dimensions. A wealth of theory and practical computational results and enhancements are available, such as
1. Using f(r) ¼ r2 log r and f(r) ¼ r generalizes to 2-D and 3-D, respectively, the cubic spline minimal curvature result. 2. As described above, the linear systems will have a full (rather than a sparse) coefficient matrix. However, the “local” RBF-FD (Radial Basis Function-generated Finite Differences) approach overcomes this. (RBF-FD for interpolation is closely related to kriging – a statistically motivated approach to also predict function values by weighted average of nearby known ones. Splines and kriging are compared in (Dubrule 1984).) 3. With RBF-FD, it can be advantageous to use f(r) ¼ r3 and f(r) ¼ r5 (as for cubic and quintic splines), but to supplement the RBF sum with multivariate polynomials. 4. Applications of RBF-FD extend well beyond data interpolation, to solving a wide range of PDEs in complex geometries. RBFs in the present application of geoscience are surveyed in (Flyer et al. 2014; Fornberg and Flyer 2015).
Summary and Conclusions Splines provide a routine approach for interpolating or for finding smooth approximations to data, especially in one dimension. A rich set of library routines are available in almost all programming environments. Available variations include reduction of data noise, adjusting smoothness, and preventing spurious oscillations (such as enforcing positivity), etc. Generalizations are also available both to gridded and to fully unstructured data in higher dimensions.
Bibliography Akima H (1970) A new method of interpolation and smooth curve fitting based on local procedures. J ACM 17(4):589–602 de Boor C (1962) Bicubic spline interpolation. J Math Phys 41(1–4):212–218 de Boor C (2001) A practical guide to splines, Revised edn. Springer, New York/Berlin/Heidelberg Dierckx P (1993) Curve and surface fitting with splines. Oxford University Press, Oxford Dubrule O (1984) Comparing splines and kriging. Comput Geosci 10(2–3):327–338 Flyer N, Wright GB, Fornberg B (2014) Radial basis function-generated finite differences: a mesh-free method for computational geosciences. In: Freeden W, Nashed M, Sonar T (eds) Handbook of geomathematics. Springer, Berlin/Heidelberg. https://doi.org/10. 1007/978-3-642-27793-1_61-1 Fornberg B, Flyer N (2015) A primer on radial basis functions with applications to the geosciences. SIAM, Philadelphia Powell MJD (1981) Approximation theory and methods. Cambridge University Press, Cambridge Prautzsch H, Boehm W, Paluszny M (2002) B́ezier and B-spline techniques. Springer, Berlin Schoenberg IJ (1946) Contributions to the problem of approximation of equidistant data by analytic functions. Q Appl Math 4:45–99, 112–141
S
1408
Standard Deviation
Standard Deviation Alejandro C. Frery School of Mathematics and Statistics, Victoria University of Wellington, Wellington, New Zealand
Definition The standard deviation is a measure of spread of either a random variable or a sample. When referring to a random variable, it requires the existence of at least the second moment. When referring to a sample, it requires at least two observations.
Overview Consider the real-valued random variable X: Ω ! ℝ, in which Ω is the sample space; the distribution of X is uniquely characterized by its cumulative distribution function FX(t) ¼ Pr(X t). Moreover, if X is continuous, then its distribution is also characterized by the probability density function fX(x) ¼ dFX(x)/dx. The first and second moments of the continuous random variable X are, respectively, E(X) ¼ xf (x)dx and E(X2) ¼ x2f (x)dx, provided the integrals exist. If X is discrete with values in X ¼ {x1, x2, . . .} ℝ, the distribution of X is uniquely characterized by the probability function pX ¼ (xi, pi ¼ Pr(X ¼ xi)). The first and second moments of the discrete random variable X are, respectively, E(X) ¼ ixipi and E X2 ¼ i x2i pi, provided the sums exist. If both E(X) and E(X2) are finite, then the variance of X is also finite and is given by Var(X) ¼ s2(X) ¼ E(X2) – (E(X))2. We then define the standard deviation of X as the square root of its
variance: SDðXÞ ¼ sðXÞ ¼ VarðXÞ . Since the variance cannot be negative, this quantity is always well defined. The standard deviation is a measure of the spread of the distribution around its expected value. It is expressed in the same units of the random variable. Table 1 presents the name, notation, probability density function, and standard deviation of some commonly used continuous distributions. We denote the indicator function of the set A as 1A(x), i.e., 1A(x) is one if x A and zero otherwise. The gamma function Γ(z) is the extension of the factorial function to complex and, in our case, real numbers: GðzÞ ¼ ℝþ xz1 ex dx. Notice that the gamma distribution reduces to the exponential distribution when α ¼ 1. Also, the Student’s t-distribution becomes the Cauchy distribution when n ¼ 1 (Fig. 1). Table 2 presents the name, notation, parameter space, probability function, and standard deviation of some commonly used discrete distributions. Figure 2 shows the probability function of a Poisson random variable with mean l ¼ p3.5, along with the mean and the area corresponding to l l. The probability in this area is Pr(2 X 5) ≈ 0.72. The Chebyshev’s inequality relates the mean and standard deviation of a random variable with the probability of observing values distant from the mean: PrðjX mj zsÞ
1 : z2
Consider the sample n of real values x ¼ (x1, x2, . . ., xn). The sample mean is x ¼ n1 ni¼1 xi, and the sample standard deviation is sðxÞ ¼ n1 ni¼1 ðxi xÞ2 . The user should always check the accuracy of the computational platform. Simple computations as, for instance, the sample standard deviation, are prone to gross numerical errors. The reader is referred to the work by Almiron et al. (2010) for an analysis of the numerical precision of five
Standard Deviation, Table 1 Continuous distributions, parameter space, their densities, expected values, and standard deviations Name and notation Uniform U (a,b)
Parameter space 1 < a < b < 1
fX(x) (b a)1(a,b)(x)
E(X) (a þ b)/2
Gamma Γ(α, β)
α, β > 0
ba xa1 expfbxg 1ℝþ ðxÞ GðaÞ
α/β
SD(X) p 1 2 3 ðb aÞ p a=b
Normal N(m, s )
m ℝ, s > 0
m
s
Student’s t t(n)
n>0
exp 12 ðxmÞ2 p2s 2ps2 Gððnþ1Þ=2Þð1þx2 =nÞ p npGðn=2Þ
Lognormal LN(m, s2)
m ℝ, s > 0
expfðln xmÞ2 =ð2s2 Þg p 1ℝþ ðxÞ xs 2p
Weibull W(l, k)
l, k > 0
kxk1 expfðx=lÞk g
2
lk
Beta B(α, β)
α, β > 0
b1
GðaþbÞx ð1xÞ GðaÞGðbÞ a1
ðnþ1Þ=2
1ℝþ ðxÞ 1ð0,1Þ ðxÞ
0 if n > 1,
n=ðn 2Þ if n > 2,
Otherwise undefined Exp{m þ s2/2}
Otherwise undefined
lΓ(1 þ 1/k)
l Gð1 þ 2=kÞ G2 ð1 þ 1=kÞ
α/(α þ β)
ðes2 1Þ expf2m þ s2 g
ab=ða þ b þ 1Þ=ða þ bÞ
Standard Deviation
1409
Standard Deviation, Fig. 1 Normal density, standard deviations, and probability Standard Deviation, Table 2 Discrete distributions, their probability functions, parameter space, expected values, and standard deviations Name and notation Binomial Bi( p, n)
Parameter space 0 < p < 1, n
Poisson Po(l)
l>0
Negative binomial NBi( p, n)
0 < p < 1, n
Pr(X ¼ k) n k p ð1 pÞnk k lkel/k! kþn1 ð1 pÞk pn n1
E(X) np l pn/(1 p)
SD(X) npð1 pÞ p p
l pn=ð1 pÞ
S
Standard Deviation, Fig. 2 Poisson probability function for l ¼ 3.5
1410
spreadsheets (Calc, Excel, Gnumeric, NeoOffice, and Oleo) running on two hardware platforms (i386 and amd64) and on three operating systems (Windows Vista, Ubuntu Intrepid, and Mac OS Leopard). The article discusses a methodology for assessing their performance with datasets of varying numerical difficulty.
Summary The standard deviation is a measure of the spread of either the probability of a distribution or the proportion of observed data around the mean.
Bibliography Almiron M, Vieira BL, Oliveira ALC, Medeiros AC, Frery AC (2010) On the numerical accuracy of spreadsheets. J Stat Softw 34(4):1–29. https://doi.org/10.18637/jss.v034.i04. URL http:// www.jstatsoft.org/v34/i04 Johnson NL, Kotz S, Kemp AW (1993) Univariate discrete distributions. Wiley series in probability and mathematical statistics, 2nd edn. Wiley, New York Johnson NL, Kotz S, Balakrishnan N (1994) Continuous univariate distributions. Wiley series in probability and mathematical statistics, vol 1, 2nd edn. Wiley, New York Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions, vol 2, 2nd edn. Wiley, New York
Stationarity Francky Fouedjio Department of Geological Sciences, Stanford University, Stanford, CA, USA
Definition The stationarity concept refers to the translation invariance of all or some statistical characteristics of a random function (also referred to as random field or spatial stochastic process). A random function is called strictly stationary if its spatial distribution is invariant under spatial shifts. Second-order (or weakly or wide-sense) stationarity for a random function means that its first- and second-order moments are translation invariant. A random function is said to be intrinsically stationary if its increments are second-order (or weakly or widesense) stationary random functions. A random function is said to be locally (or quasi) second-order stationary if its first two moments are approximately stationary at any given position of a neighborhood moved around in the study domain.
Stationarity
Introduction A classical problem in geosciences is predicting a spatially continuous variable of interest over the whole study region, from measurements taken at some locations. Examples of spatially continuous variables include the elevation, the temperature, the moisture, the rainfall, the soil fertility, the permeability, the porosity, and the mineral grade. In geostatistics, the spatially continuous variable of interest is modeled as a random function (also referred to as random field or spatial stochastic process) and the observed data as the random function’s unique realization. Simplifying modeling assumptions such as the stationarity are often made on the random function to make the statistical inference possible. The notion of stationarity broadly refers to the invariance by translation of all or part of the random function’s spatial distribution. The need for a stationarity assumption is based on the fact that any type of statistical inference requires some sort of data replication. Two reasons prevent us from carrying out statistical inference in all generality. On the one hand, one has only a single realization of the random function; the observed data are considered as a unique realization of the random function. This takes all its meaning because there are not multiple parallel physical worlds. On the other hand, the realization is only partially known at certain sampling locations. However, this second restriction is not as problematic as the first one. The question of inference would still arise if one knew the reality exhaustively because the spatial distribution or its moments are not regional quantities (i.e., any quantity whose value is determined by the data). To evaluate them, it would take many realizations (multiple copies) of the random function, but these are purely theoretical and do not exist in reality. To get out of this impasse, certain restrictions on the random function are necessary. They appeal to the notion of stationarity, which describes, in a way, a form of spatial homogeneity of the random function. The basic idea is to allow statistical inference by replacing the replication on the realizations of the random function by the replication in the space. Thus, the values encountered in the different areas of the study region present the same characteristics and can be considered as different realizations of the same random function. Thus, the stationarity assumption provides some replication of the data, which makes statistical inference possible. There are different forms of stationarity, depending on which of the random function’s statistical characteristics are restricted and the working spatial scale. They include the strict stationarity, the second-order (or weakly or widesense) stationarity, the intrinsic stationarity, and the local (or quasi) second-order stationarity. These notions can be found in the following reference books: Cressie (2015), Wackernagel (2013), Chiles and Delfiner (2012), Olea (2012), Christakos (2012), Webster and Oliver (2007), Diggle
Stationarity
1411
and Ribeiro (2007), Goovaerts (1997), and Isaaks and Srivastava (1989).
stationarity implies the existence not only of the mean and the covariance function but also other second-order moments such as the variance, the variogram, and the correlogram: 8 x, x þ h D,
Formulation Var ðZ ðxÞÞ ¼ Cov ðZ ðxÞ, ZðxÞÞ ¼ Cð0Þ Suppose that the spatially continuous variable of interest is modeled as a random function (also referred to random field or spatial stochastic process) {Z(x) : x D ℝd} defined over a fixed continuous spatial domain of interest D ℝd, where ℝd is d-dimensional Euclidean space, typically d ¼ 2 (plane), and d ¼ 3 (space). The random function Z ( ) is said to be strictly stationary if its spatial distribution is invariant under translations, that is to say, Fx1 ,...,xk ðv1 , . . . , vk Þ ¼ Fx1 þh,...,xk þh ðv1 , . . . , vk Þ
ð1Þ
for all integers k ℕ, collections of points x1, . . ., xk in the spatial domain D, set of possible values v1, . . ., vk, and vectors h ℝd; in which F denotes finitedimensional joint distributions defined by Fx1 ,...,xk ðv1 , . . . , vk Þ ¼ P ðZ ðx1 Þ v1 , . . . , Zðxk Þ vk Þ. Under the strict stationarity assumption, finitedimensional joint distributions characterizing the spatial distribution of the random function stay the same when shifting a given set of points from one part of the study domain to another. In other words, a translation of a point configuration in a given direction does not change finite-dimensional joint distributions. Thus, the random function appears as homogenous and self-repeating in the whole spatial domain. The strict stationarity assumption is very restrictive because it assumes an identity of all probability distributions in the space. Moreover, it is only partially verifiable since it is made for all k ℕ, and it does not tell us about the existence of moments of the random function. The random function Z( ) is said to be second-order (or weakly or wide-sense) stationary if its first- and secondorder moments exist and are both invariant under translations over the spatial domain D, i.e., 8x, x þ h D, EðZ ðxÞÞ ¼ E ðZ ðx þ hÞÞ
ð2Þ
Cov ðZ ðx þ hÞ, Z ðxÞÞ ¼ CðhÞ
ð3Þ
1 Var ðZ ðx þ hÞ Z ðxÞÞ ¼ C ð0Þ C ðhÞ ¼ gðhÞ 2
ð5Þ
Cov ðZ ðx þ hÞ, Z ðxÞÞ C ð hÞ ¼ ¼ rðhÞ Var ðZ ðxÞÞ Var ðZ ðx þ hÞÞ C ð0Þ
ð6Þ
In particular, when h ¼ 0, the covariance comes back to the ordinary variance which must also be constant. Stationarity of the covariance implies stationarity of variance, and the variogram, and the correlogram. These moments also do not depend on the absolute position of the points where they are calculated but only on their separation. The second-order stationarity assumption requires the existence of the covariance function which might not exist. The strict stationarity requires all the moments of the random function (if they exist) to be invariant under translations. In particular, the strict stationarity implies the weak stationarity when the mean and the covariance function of the random function exist. These two forms of stationarity are equivalent if the random function is Gaussian. Indeed, since a Gaussian distribution is completely defined by its first two moments, knowledge of the mean and the covariance function suffices to determine the spatial distribution of a Gaussian random function. Overall, weak stationarity is less restrictive than strict stationarity. Moreover, it turns out that the weak stationarity condition is sufficient for most of the statistical results derived for spatial data to hold. For these reasons, weak stationarity is employed more often than strict stationarity. Various authors use the term stationarity to refer to weak stationarity. The random function Z( ) is called intrinsically stationary if for any vector h ℝd, the increment (or difference) {Zh(x) ¼ Z (x þ h) Z(x) : x D ℝd} is a second-order stationary random function. Specifically, E ð Z ð x þ hÞ Z ð xÞ Þ ¼ 0 Var ðZ ðx þ hÞ Z ðxÞÞ ¼ E
Thus, under the second-order stationarity, the expected value (or mean) of the random function is constant, i.e., the same at any point x of the spatial domain D. The covariance function which measures the second-order association between any pair of points depends only on the separation between them in both distance and direction. When a random function is second-order stationary, its first- and second-order moments are also said to be stationary. The second-order
ð4Þ
Z ð x þ hÞ Z ð xÞ 2 Þ
¼ 2g ðhÞ
ð7Þ ð8Þ ð9Þ
The intrinsic stationarity focuses on the second-order stationarity of the increments of the random function. It is less restrictive than the second-order stationary. Indeed, the second-order stationarity implies the intrinsic stationarity.
S
1412
Stationarity
The converse is not necessarily true. The intrinsic stationarity hypothesis expands the second-order stationarity hypothesis which no longer relates to the increments of the random function. The intrinsic stationarity is the reason that the variogram has emerged as the preferred measure of spatial association in geostatistics. The variogram can still be defined for cases where the covariance function and the correlogram cannot, and it does not require knowledge of the mean. The covariance function of an intrinsic stationary random function might not exist; it exists only if the variogram is bounded, in which case one has the relation, gð hÞ ¼ C ð 0Þ C ð hÞ
ð10Þ
Thus, under the second-order stationarity, it is equivalent to work with the covariance function or the variogram, one deduced from the other by the above relation. Classical Brownian motion in one dimension (or Wiener process) provides an example of a Gaussian random function that is intrinsically stationary, but not second-order stationary. It has independent and stationary Gaussian increments, and its variogram is given by γ(h) ¼ jhj while its covariance function is expressed as follows: Cov (Z (s), Z (t)) ¼ min (s, t), s, t > 0. Examples of realizations of second-order and intrinsically stationary random functions defined on the unit square are given in Fig. 1.
The different forms of stationarity presented previously do not explicitly involve the working spatial scale, which is nevertheless an essential parameter in practice. A model valid at a particular spatial scale may no longer be valid at larger or smaller spatial scales. In most applications (especially prediction problems), it is not useful for secondorder stationary or intrinsically stationary assumptions to be valid at the scale of the whole spatial domain, but only for distances less than a certain limit distance. Thus, they do not require global stationarity, but only local stationarity. A random function is said to be locally (or quasi) secondorder stationary if its mean is a very regular function varying slowly in space such that E (Z (x þ h)) ≈ E (Z (x)) if khk b (b > 0), and its covariance function Cov (Z (x þ h), Z (x)) only depends on the vector h for distances khk b (b > 0). Such random functions, which are not necessarily second-order stationary at the scale of the whole spatial domain, have a smooth mean function, which can be evaluated locally as a constant and a covariance function which can be considered locally as stationary. Under the local (or quasi) second-order stationarity assumption, the statistical inference of the first- and secondorder moments requires in practice to have enough data locally. Thus, the local second-order stationarity hypothesis is therefore a trade-off between the size of the neighborhood where the random function can be considered as stationary
Stationarity, Fig. 1 (a) example of realization of a second-order stationary random function; and (b) example of realization of an intrinsically stationary random function
Stationarity
and the number of data necessary to perform a reliable statistical inference. In addition to the stationarity assumption, isotropy assumption is often added, which extends the notion of translational invariance of stationarity by including rotational invariance (around the origin). In other words, the covariance function (or variogram) is simply a function of the Euclidean distance, i.e., C(h) ¼ C(khk) or γ(h) ¼ γ(khk). One then calls both the random function and the covariance function (or variogram) isotropic. Thus, stationarity means translation-invariance of the statistical properties of the random function, while isotropy is a corresponding rotationinvariance.
Limitations The assumption that all or some statistical characteristics of the random function are translation invariant over the whole spatial domain may be appropriate, when the latter is small in size, when there is not enough data to justify the use of a complex model, or simply because there is no other reasonable alternative. Although often justified and leading to a reasonable analysis, this assumption is often inappropriate and unrealistic given certain spatial data collected in practice. Stationarity assumption can be doubtful due to many factors, including specific landscape and topographic features of the study region or other localized effects. These local influences are reflected in the fact that observed data can obey to a
1413
covariance function (or variogram) whose characteristics vary across the study domain. In such a setting, a stationary assumption is not appropriate because it could produce less accurate predictions, including an incorrect assessment of the prediction error. Hence, the need to go beyond the stationarity. When the stationary assumptions do not hold, we are in the non-stationarity framework which means that some statistical characteristics of the random function are no longer translational invariant; they depend on the location. A random function that does not have the stationarity property is called nonstationary. One distinguishes different types of nonstationarity: nonstationarity in mean, nonstationarity in variance, and nonstationarity in covariance (or variogram). It is now recognized that most spatial processes exhibit a nonstationary spatial dependence structure when considering large distances. A detailed discussion on nonstationary random functions and the various models can be found in Fouedjio (2017). Figure 2 shows two examples of realizations of nonstationary random functions defined on the unit square.
Diagnostic Tools The random function’s modeling choice, either in the stationary framework or the nonstationary framework, is a fundamental decision to make during geostatistical analysis. A common practice to check for the second-order stationarity assumption is to informally assess plots of local experimental variograms computed at different subregions across the study
S
Stationarity, Fig. 2 Two examples of realizations of nonstationary random functions
1414
region. If the different local experimental variograms are bounded and do not differ notably, the assumption of second-order stationarity may be reasonable. Although a useful diagnostic tool, this graphical tool is challenging to assess and open to subjective interpretations. Hypothesis tests of the assumption of second-order stationarity may be more valuable. However, it is not straightforward to build such tests in the context of a single realization of the random function. Very few formal procedures to test for the second-order stationarity have been developed (Fuentes 2005; Jun and Genton 2012; Bandyopadhyay and Rao 2017). They are adaptations of stationarity tests for time series. Most of them are built up based on the spectral representation of the random function. They consist essentially of testing either the homogeneity or the uncorrelatedness of some variables defined in the frequency or spectral domain. Some of the testing procedures check both for stationarity and isotropy.
Summary and Conclusions In geostatistics, the spatially continuous variable of interest is modeled as a random function and the observed data as the random function’s unique realization. To infer the statistical characteristics of the spatial distribution of the random function from observed data, limiting assumptions are needed. They appeal to the notion of stationarity of the random function. It enables us to treat observed data as though they have the same degree of variation over a region of interest. Several stationary assumptions are possible: strict stationarity, second-order (or weakly or wide-sense) stationarity, intrinsic stationarity, and local (or quasi) second-order stationarity. Although stationarity assumptions are often violated in practice, they remain fundamental. They form the basic building blocks of more complex models, such as second-order nonstationary random function models. Indeed, in most of these models, the stationarity assumption is present, either locally or globally. The majority of second-order nonstationary random function models include some stationary random function models as special cases. Hence, the concept of stationarity plays a critical role in geostatistics. Although it is challenging to accept or reject a stationarity hypothesis in the context of a single realization of the random function, some diagnostic tools can help to decide whether this assumption is appropriate or not. Moreover, the relevance of a stationarity hypothesis can depend on the working spatial scale considered. In this entry, the concept of stationarity has been presented in the case of a scalar random function for univariate spatially continuous data. However, it extends to the case of a vectorvalued random function (i.e., set of scalar random functions stochastically correlated to one another) for multivariate spatially continuous data.
Statistical Bias
Cross-References ▶ Geostatistics ▶ Kriging ▶ Variogram
Bibliography Bandyopadhyay S, Rao SS (2017) A test for stationarity for irregularly spaced spatial data. J R Stat Soc B Stat Methodol 79(1): 95–123 Chilès JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty. Wiley Christakos G (2012) Random field models in earth sciences. Dover earth science. Dover Publications Cressie N (2015) Statistics for spatial data. Wiley Diggle P, Ribeiro P (2007) Model-based geostatistics. Springer series in statistics. Springer, New York Fouedjio F (2017) Second-order non-stationary modeling approaches for univariate geostatistical data. Stoch Env Res Risk A 31(8): 1887–1906 Fuentes M (2005) A formal test for nonstationarity of spatial stochastic processes. J Multivar Anal 96(1):30–54 Goovaerts P (1997) Geostatistics for natural resources evaluation. Applied geostatistics series. Oxford University Press Isaaks E, Srivastava R (1989) Applied geostatistics. Oxford University Press Jun M, Genton MG (2012) A test for stationarity of spatio-temporal random fields on planar and spherical domains. Stat Sin 22: 1737–1764 Olea R (2012) Geostatistics for engineers and earth scientists. Springer Wackernagel H (2013) Multivariate geostatistics: an introduction with applications. Springer, Berlin/Heidelberg Webster R, Oliver M (2007) Geostatistics for environmental scientists. Statistics in practice. Wiley
Statistical Bias Gilles Bourgault Computer Modelling Group Ltd., Calgary, AB, Canada
Definitions Sampling Bias
Estimation Bias
In its basic definition, statistical bias is simply the difference in values between a statistic that is estimated from a sample and its equivalent characterizing the entire population. It is an intrinsic sampling error that affects all statistics estimated from the partial sampling of a population. In a spatial context (geostatistics), a statistical estimate is localized in space, and multiple estimates are required for spatial estimation from a given sampling. Because all non-
Statistical Bias
1415
sampled locations are equivalent in regards of statistical estimation, it is the set of estimated values that can be considered for statistical bias. The estimated values will, in general, show a statistical bias in their distribution shape (less variance and less skewness) and in their spatial continuity (greater spatial continuity) when compared to the population values. For spatial estimation, the estimation bias is combined to the sampling bias.
Estimate and Estimator For any population statistical parameter (θ), and its estimate (θ) from a sample, the statistical bias is the difference. Bias ¼ y y
ð1Þ
This difference measures the statistical bias for a given sampling. It cannot be avoided and fluctuates from sampling to sampling. The most important factor affecting this statistical bias is the sample size since the statistic θ will converge towards the population value as the sample size increases. But in practice, this statistical bias can never be known from a single sampling since populations are always very large and can never be sampled exhaustively. Instead of correcting the statistical bias, one tries to minimize it. It calls on the unbiasedness of the estimate when it is viewed as a random variable (estimator). The sample estimator (θ) is said to be unbiased, if on average over many samplings, the expected value of the statistical bias is zero. E½y y ¼ 0
ð3Þ
For example, from a sample of n values (zi), the estimator z¼
n i¼1 zi
n
ð4Þ
is unbiased for estimating the mean of the population (Z). E½z ¼ E½Z
An estimator can be biased or unbiased and of variable precision. For a biased estimator, its expected value, over multiple samplings, will be different from the population value (E[θ] 6¼ θ). Most often, this type of bias would be introduced by a systematic calibration error in the instrumentations, or in the laboratory processes, used to measure each sample value. For an unbiased estimator, its sampling error may be small or large, depending on its precision. The precision is higher when the estimated statistic varies little between samplings. Usually, the precision increases with the sampling size. Some of the most important estimators’ characteristics are depicted in Fig. 1. Note that a biased estimator can still be precise without being accurate. Thus, precision does not guaranty accuracy. The precision of an estimator may also depend on the distribution shape of the population values to be sampled. The precision tends to be lower as the skewness of the distribution is more pronounced.
Spatial Estimate and Spatial Estimator The mean estimator (Eq. 4) for the global population can be adapted for local spatial estimation (geostatistics) when assuming that each location (u ¼ [ux, uy, uz]) is associated with a random variable Z(u). The set of all spatial locations, within a domain of interest, defines the population (Z). Kriging (Matheron 1965; Armstrong 1998) is a spatially weighted average that has been developed as the best linear unbiased estimator (BLUE) for spatial estimation. Multiple types of kriging exist, but ordinary kriging is the most used in practice.
ð2Þ
Or equivalently, the expected value of the estimator is equal to the population parameter. E½y ¼ y
Estimator Quality
ð5Þ
The unbiasedness of an estimator is easily demonstrated by replacing each sample value (zi) with an equivalent random variable (Zi), and assuming E[Zi] ¼ E[Z] for all Zi.
z ðu0 Þ ¼
n
l a z ð ua Þ
ð6Þ
a¼1
where z*(u0) is the kriging estimate at a spatial location u0, the z(uα)’s are the sample values at n locations (uα), and the lα’s are the kriging weights. The weights are optimum in the sense that they account for spatial correlation (variogram) between the n þ 1 locations involved (Isaaks and Srivastava 1989). The estimator is unbiased by ensuring that the sum of the weights equals 1, and assuming each sample value z(uα) can be replaced by an equivalent random variable (Z(uα)), with same mean as the spatial population (E[Z(uα)] ¼ E[Z] for all locations uα in the spatial domain). Note that a statistical bias can still be induced, even with an unbiased spatial estimator, when the sampling favors certain locations in the population (i.e., high grade zones in a mineral deposit). Random or systematic samplings can be used to avoid this type of bias.
S
1416
Statistical Bias
Statistical Bias, Fig. 1 Schematic representation for the distribution (over multiple samplings) of a biased estimator (left) and the distributions of two unbiased estimators (right). For the unbiased estimators, the
estimator 2 is more precise than estimator 1 since its distribution over multiple random samplings is tighter around the population value (in red)
Transfer Function Bias
the concept of conditional bias. Kriging is such an estimator that is globally unbiased but can be conditionally biased when each estimated location value is compared to the true value at that location (comparing z(u0) with z*(u0) for all estimation locations u0). In the spatial context, a special quality for an estimator is to be conditionally unbiased. On average, over the spatial domain, one expects that the estimator will predict the correct value. Thus, the estimator is conditionally unbiased if
Kriging is a particular estimator (Eq. 6) in the sense that it can generate multiple estimated values, one for each non-sampled location in the spatial domain. Because of estimation bias, the set of all estimated values ({z*}) will have different statistical properties, such as lower variance and greater spatial continuity (variogram), as compared with the set of their corresponding population values ({z}). This will, in general, introduce some functional bias.
Bias ¼ f ðfz gÞ f ðfzgÞ
E½Z j Z ¼ z ¼ E½Z j Z ¼ z
ð8Þ
ð7Þ or equivalently
The bias depends on the degree of estimation bias in the set of estimated values, and on the degree of nonlinearity of the function f () (i.e., spatial correlation, metal recovery function in mining, flow simulation in oil and gas). Conditional simulations (Goovaerts 1997; Deutsch and Journel 1998) can minimize this bias by restoring the data variance and the spatial variability in the estimated values. But contrary to the kriging estimates, simulation results are not unique as an infinite number of realizations can be simulated.
Conditional Bias Spatial estimators are designed to be unbiased globally for the population, E[Z*(u0)] ¼ E[Z(u0)] ¼ E[Z]. However, they can still be biased, on average over the spatial domain, when considering a particular estimate value. This brings about
E½Z j Z ¼ z ¼ z
ð9Þ
where Z is the kriging estimator viewed as a random variable, Z the random variable representing the spatial population, and z* a kriging estimate value. If one averages all true values, for all locations with the same kriging estimate value, then this average should be equal to that kriging estimate value. The conditional expectation is localized in the sense that the kriging estimates with the same given value z* are localized in space. These locations represent a subset of the population. The expectation is conditional to this subset (Eq. 8). Conditional unbiasedness is very important in mining to best categorize spatial locations as being mineralized or sterile for example. Minimizing conditional bias is at the origin of kriging. It was first addressed by Krige (1951) and revisited by Bourgault (2021).
Statistical Bias
1417
In the following, statistical biases and non-biases are illustrated for various statistics, and for the kriging estimator, in using samplings from synthetic datasets.
Global Fluctuations for Basic Statistics The effects of the sampling size and distribution shape on statistical bias are illustrated when sampling two synthetic populations with same mean and variance, but with different distributions. One is characterized with a normal distribution (no skewness) and the other is characterized with a lognormal distribution (highly skewed). Each population has a total of 2500 values that are spatially distributed on a regular 50 50 grid. The lognormal distribution is from the primary variable of GSLIB dataset (Deutsch and Journel 1998). The normal distribution is the lognormal distribution transformed in normal scores and rescaled to have the same mean and same variance as the lognormal distribution. Fluctuations in sampling mean, variance, histogram, and variogram are illustrated for sample sizes of 50, 100, and 200. For each sample size, 11 random samplings were drawn from the population. For each sampling, the same spatial locations were sampled in both populations. Table 1 presents the fluctuations for the sample mean and the sample variance, over the 11 random samplings, for the 3 different sampling sizes, for the lognormal population. Table 2 presents the fluctuations for the sample mean and the sample variance, over the 11 random samplings, for the 3 different sampling sizes, for the normal population. The results show that statistical bias from sampling can be significant, but on average, the mean and variance are fairly
well estimated for the two populations. The statistical bias fluctuations (precision), as measured by the range and the variance of each statistic, tend to diminish when the sampling size increases. The fluctuations are much less when the population is normally distributed as compared to lognormally distributed, especially for the variance. Figures 2, 3, and 4 show the histogram for each sampling with the average histogram over the 11 samplings and the histogram of the lognormal population. For each sampling size, the averaged histogram matches very well the true population histogram. However, the fluctuations between the 11 histograms increase as the sample size increases. Figures 5, 6, and 7 show the histogram for each sampling with the average histogram over the 11 samplings and the histogram of the normal population. Again, for each sampling size, the averaged histogram matches very well the true histogram. Contrary to the lognormal population, the fluctuations between the 11 histograms decrease as the sample size increases. In a spatial context, not only the data histogram is important, but also the data variogram as it needs to be modelled for measuring the spatial correlation required for spatial estimation by kriging (Isaaks and Srivastava 1989). Figures 8, 9, and 10 present the variogram for each sampling with the averaged variogram over the 11 samplings, and the variogram for the lognormal population. For each sampling size, the averaged variogram matches very well the population true variogram. The fluctuations between the 11 variograms diminish somewhat when the sample size increases. However, similar to the sample variance (Table 1), there is not much reduction when increasing the sampling size from 100 to 200. Figures 11, 12, and 13 present the variogram for each sampling with the
Statistical Bias, Table 1 Average, variance, and range for the sample mean and the sample variance for 11 random samplings of different size (50, 100, 200) of the lognormal distribution. The last column provides the mean and variance for the entire population (size ¼ 2500) Mean
Variance
Average Variance Range Average Variance Range
n ¼ 50 2.60 0.49 1.66–3.59 27.35 478.60 6.55–69.48
n ¼ 100 2.57 0.20 1.92–3.07 24.49 150.78 9.98–40.44
n ¼ 200 2.68 0.09 1.98–3.01 29.75 182.62 9.09–63.45
n ¼ 2500 2.58 0 26.53 0
Statistical Bias, Table 2 Average, variance, and range for the sample mean and the sample variance for 11 random samplings of different size (50, 100, 200) of the normal distribution. The last column provides the mean and variance for the entire population (size ¼ 2500) Mean
Variance
Average Variance Range Average Variance Range
n ¼ 50 2.28 0.25 1.33–3.19 28.71 47.07 18.46–38.85
n ¼ 100 2.46 0.17 1.79–3.25 27.11 18.62 20.02–33.77
n ¼ 200 2.60 0.12 2.0–3.13 27.39 4.85 24.17–31.08
n ¼ 2500 2.58 0 26.53 0
S
1418
Statistical Bias
Statistical Bias, Fig. 2 Histograms for 11 samplings (s1–s11) of 50 sample values each for the lognormal population. The averaged histogram (in black) can be compared to the true histogram (in red)
Statistical Bias, Fig. 3 Histograms for 11 samplings (s1–s11) of 100 sample values each for the lognormal population. The averaged histogram (in black) can be compared to the true histogram (in red)
Statistical Bias
1419
Statistical Bias, Fig. 4 Histograms for 11 samplings (s1–s11) of 200 sample values each for the lognormal population. The averaged histogram (in black) can be compared to the true histogram (in red)
averaged variogram over the 11 samplings and the variogram for the normal population. For each sampling size, the averaged variogram matches very well the true variogram. Contrary to the lognormal population, and similarly to the sample variance of the normal population (Table 2), the fluctuations between the 11 variograms diminish significantly when the sample size increases.
Local Fluctuations for Spatial Estimation For both populations, ordinary kriging was used for spatial estimation (Eq. 6) for all samplings. Figure 14 shows a crossplot for the true values (z) versus their kriging estimates (z*) for the sampling size of 50 in the lognormal population. Although the kriging estimates are not globally biased (E[Z*] ¼ E[Z]), they show a regression line with a slope smaller than 1. This is typical for kriging estimates (or any type of linear spatial averaging). The regression line with a slope smaller than 1 indicates that, on average over the field, the kriging estimates tend to over-estimate the true values when the kriging estimates are greater than their global mean, and they tend to underestimate the true values when
they are smaller than their global mean. In terms of conditional expectations (Eq. 9): E[Z | Z* ¼ z*] < z* when z* > E [Z*] and the opposite E[Z | Z* ¼ z*] > z* when z* < E[Z*]. It is only when the kriging estimates are equal to their global mean that the conditional bias disappears as E[Z | Z* ¼ E [Z*]] ¼ E[Z*] (¼ E[Z] since kriging is globally unbiased). A spatial estimator such as kriging, achieves global unbiasedness at the price of conditional biases. Note that kriging is still the best spatial estimator. Other linear estimators will usually show larger conditional biases (Isaaks and Srivastava 1989). Bourgault (2021) has shown that the slope of the linear regression is 1 (45 ͦ line) when a spatial estimator is both, globally (Eq. 3) and conditionally unbiased (Eq. 9). Therefore, the slope value of the linear regression between the true values versus their kriging estimates is a good measure of the degree of conditional bias. Table 3 gives the average, variance, and range of the regression slopes over the 11 samplings for each of the 3 sampling sizes for the lognormal population. Table 4 gives the average, variance, and range of the regression slopes over the 11 samplings for each of the 3 sampling sizes for the normal population. Results show significant statistical fluctuations between samplings, especially for the lognormal population. For both
S
1420
Statistical Bias
Statistical Bias, Fig. 5 Histograms for 11 samplings (s1–s11) of 50 sample values each for the normal population. The averaged histogram (in black) can be compared to the true histogram (in red)
populations, the averaged regression slope increases closer to 1 with the sampling size. Thus, on average, conditional bias diminishes as the sampling size increases. But for all sampling sizes, the regression slope is on average larger, and its range and variance are smaller, for samplings in the normal population. The normal population is less prone to conditional bias than the lognormal population, even if the variance of the estimates is similar for both.
Spatial Correlation Figure 15 shows the variograms calculated from the 2300 kriging estimates, for each of the 11 samplings of size 200, in the lognormal population. Typically, the curves are lower (less variance), especially near the origin of the plot, when compared to the true variogram of the population. Also, the curves gradient near the origin of the plot becomes smaller than for the true variogram. This is characteristic of a smoother spatial correlation (or greater spatial continuity) (Chiles and Delfiner 2012). Smoothing is a common form of bias in the earth sciences in general and particularly with kriging. Contrary to the averaged variogram from the sample values (Fig. 10), the averaged variogram from the kriging estimates shows a clear bias when compared to the true
variogram. This estimation bias tends to be more severe for kriging estimates from the smaller sampling sizes of 50 and 100 (not shown).
Global Fluctuations for Statistics from Truncated Distributions Even, if kriging estimates are globally unbiased, and may show minimal conditional bias, another type of statistical bias is yet present. The expected value above a given threshold (t) will be underestimated due to the estimation bias (smoothing) of the kriging estimates (Isaaks 2004; Bourgault 2021). A global bias still exists when truncating the distribution for values above a threshold. This truncation is often required in the practice of mining as not all mineralized spatial locations in the population can be considered economical and should not all be considered for mineral recovery. As an example, Tables 5 and 6 give the expected value for values above the first quartile of the entire population. As with other statistics, results show significant statistical fluctuations between samplings for both populations. Increasing the sampling size tends to reduce the fluctuations. It is observed that the average expected value, for values above the first quartile over all samplings, is not biased for the sample
Statistical Bias
1421
Statistical Bias, Fig. 6 Histograms for 11 samplings (s1–s11) of 100 sample values each for the normal population. The averaged histogram (in black) can be compared to the true histogram (in red)
S
Statistical Bias, Fig. 7 Histograms for 11 samplings (s1–s11) of 200 sample values each for the normal population. The averaged histogram (in black) can be compared to the true histogram (in red)
1422
Statistical Bias
Statistical Bias, Fig. 8 Variograms for 11 samplings (s1–s11) of 50 sample values each for the lognormal population. The averaged variogram (in black) can be compared to the true variogram (in red)
Statistical Bias, Fig. 9 Variograms for 11 samplings (s1–s11) of 100 sample values each for the lognormal population. The averaged variogram (in black) can be compared to the true variogram (in red)
values (z). However, it is biased (smaller) for the estimated values by kriging (z*), because the kriging estimates are smoother (less variance ranging from about 15 to 18 from Tables 3 and 4) than the population (26.53 from Tables 1 and 2) (see also Fig. 15).
Correcting Conditional Bias Bourgault (2021) has shown that conditional bias could be corrected by averaging the kriging estimates over all the random samplings. Tables 7 and 8 provide the regression
Statistical Bias
1423
Statistical Bias, Fig. 10 Variograms for 11 samplings (s1–s11) of 200 sample values each for the lognormal population. The averaged variogram (in black) can be compared to the true variogram (in red)
S
Statistical Bias, Fig. 11 Variograms for 11 samplings (s1–s11) of 50 sample values each for the normal population. The averaged variogram (in black) can be compared to the true variogram (in red)
slopes between the true values and the averaged kriging estimates (avg(z*)), over the 11 random samplings, for each sample size.
From the results of Tables 7 and 8, it is observed that averaging the kriging estimates corrects well the conditional bias for the lognormal distribution but tends to overcorrect
1424
Statistical Bias
Statistical Bias, Fig. 12 Variograms for 11 samplings (s1–s11) of 100 sample values each for the normal population. The averaged variogram (in black) can be compared to the true variogram (in red)
Statistical Bias, Fig. 13 Variograms for 11 samplings (s1–s11) of 200 sample values each for the normal population. The averaged variogram (in black) can be compared to the true variogram (in red)
(slopes larger than 1) for the normal distribution, especially for the smaller sampling sizes. In such a case, instead of averaging the kriging estimates, it is advised to pick the sampling that is associated with the smaller data variance.
For the normal distribution, the sample with the smallest data variance for sampling size 50 is associated with kriging estimates showing a regression slope of 0.84, 0.86 for sampling size of 100, and 0.91 for sampling size 200. Bourgault (2021)
Statistical Bias
1425
Statistical Bias, Fig. 14 Crossplot of true values versus kriging estimates (Z*) using a sample size of 50 for the lognormal population. The regression line is shown in green, the 45 ͦline is shown in black and the horizontal and vertical lines indicate the global means
has shown that, in general, samplings with a smaller data variance are associated with a regression slope closer to 1.0 for their kriging estimates. Comparing Table 7 and Table 3 shows that correcting for conditional bias is associated with an important variance reduction for the estimates (ranging from about 14 to 16 in Table 3 versus about 5 to 9 for the corrected kriging estimates in Table 7).

Statistical Bias, Table 3 Average, variance, and range for the regression slope between the true values and their corresponding kriging estimates for 11 random samplings of different size (50, 100, 200) of the lognormal distribution. Last row provides the averaged variance for the (n*) kriging estimates

                          n = 50              n = 100             n = 200
Regression slope
  Average                 0.5                 0.62                0.68
  Variance                0.06                0.04                0.04
  Range                   0.19–0.9            0.35–0.95           0.36–0.97
Variance of estimates
  Average                 15.61 (n* = 2450)   14.47 (n* = 2400)   16.17 (n* = 2300)

Statistical Bias, Table 4 Average, variance, and range for the regression slope between the true values and their corresponding kriging estimates for 11 random samplings of different size (50, 100, 200) of the normal distribution. Last row provides the averaged variance for the (n*) kriging estimates

                          n = 50              n = 100             n = 200
Regression slope
  Average                 0.71                0.82                0.90
  Variance                0.01                0.003               0.002
  Range                   0.55–0.94           0.76–0.91           0.84–0.96
Variance of estimates
  Average                 15.28 (n* = 2450)   16.5 (n* = 2400)    18.57 (n* = 2300)
For the lognormal population, the averaged kriging estimates (avg(z*)) are not conditionally biased (Eq. 9, Table 7). Thus, they are the best estimates to categorize the spatial locations in terms of ore (avg(z*) > t) or waste (avg (z*) < t), for example. This brings back the topic of expected value for truncated distribution (E[Z | Z > t]). Table 9 shows
Statistical Bias, Table 5 Average, variance, and range for the average above the population first quartile (Q1) for the sample (z) and the (n*) estimated (z*) values of the population for 11 random samplings of different size (50, 100, 200) of the lognormal distribution. The last column provides the statistic for the entire population

Q1 = 0.34                        n = 50       n = 100      n = 200      n = 2500
E[Z | Z > Q1]     Average        3.44         3.4          3.51         3.37
                  Variance       1.13         0.41         0.13         0
                  Range          1.95–5.2     2.39–4.55    2.76–4.07    –
                                 n* = 2450    n* = 2400    n* = 2300    n* = 0
E[Z* | Z* > Q1]   Average        3.16         3.0          3.13         –
                  Variance       0.81         0.28         0.15         –
                  Range          2.39–5.33    2.22–3.83    2.52–3.83    –

Statistical Bias, Table 6 Average, variance, and range for the average above the population first quartile (Q1) for the sample (z) and the (n*) estimated (z*) values of the population for 11 random samplings of different size (50, 100, 200) of the normal distribution. The last column provides the statistic for the entire population

Q1 = 0.89                        n = 50       n = 100      n = 200      n = 2500
E[Z | Z > Q1]     Average        4.72         4.73         4.81         4.77
                  Variance       0.48         0.21         0.21         0
                  Range          3.5–6.09     3.76–5.47    4.46–5.29    –
                                 n* = 2450    n* = 2400    n* = 2300    n* = 0
E[Z* | Z* > Q1]   Average        3.93         3.99         4.27         –
                  Variance       0.31         0.1          0.04         –
                  Range          3.19–5.3     3.29–4.33    3.83–4.57    –
Statistical Bias, Fig. 15 Variograms using the kriging estimates (k1–k11) from the 11 samplings (s1–s11) of sample size 200 in the lognormal population. The averaged variogram (in black) can be compared to the true variogram (in red)
their expected value for values above the first quartile of the true population. From the results of Table 9, it is seen that avg(z*), over the 11 samplings, is not really conditionally biased for estimating E[Z | avg(Z*) > Q1]. However, it underestimates E[Z | Z > Q1] = 3.37 from the lognormal population. Even if E[Z | Z > Q1] = 3.37 is fairly well estimated by the sample data values (Table 5), it is always underestimated by the kriging estimates, and even more so when they are corrected for conditional bias (Tables 5 and 9) to improve the selectivity for locations above or below the threshold (i.e., between ore and waste). Indeed, on average over the 11 samplings, avg
Statistical Bias, Table 7 Regression slope between the true values and their corresponding averaged kriging estimates for 11 random samplings of different size (50, 100, 200) of the lognormal distribution. Last row provides the variance for the averaged kriging estimates over the 2500 spatial locations

                          n = 50    n = 100    n = 200
Regression slope          1.07      1.1        1.07
Variance of estimates     5.08      6.92       8.95

Statistical Bias, Table 8 Regression slope between the true values and their corresponding averaged kriging estimates for 11 random samplings of different size (50, 100, 200) of the normal distribution. Last row provides the variance for the averaged kriging estimates over the 2500 spatial locations

                          n = 50    n = 100    n = 200
Regression slope          1.36      1.26       1.14
Variance of estimates     7.76      10.61      14.56

Statistical Bias, Table 9 Average above the population first quartile (Q1) for the averaged kriging estimates for 11 random samplings of different size (50, 100, 200) of the lognormal distribution. The second row provides the conditional expectation for the true values when their corresponding estimates are above the first quartile. The last row provides the averaged conditional expectation, over the 11 random samplings, of the true values when their corresponding kriging estimates are above the first quartile

Q1 = 0.34                            n = 50      n = 100     n = 200
E[avg(Z*) | avg(Z*) > Q1]            2.87        2.71        2.93
E[Z | avg(Z*) > Q1]                  2.59        2.65        2.80
avg(E[Z | Z* > Q1])     Mean         2.81        2.89        2.96
                        Range        2.7–2.96    2.8–3.02    2.84–3.17
(E[Z | Z* > Q1]), for the kriging estimates, is always a bit closer to the true value (3.37) than E[Z | avg(Z*) > Q1] for the averaged kriging estimates. The underestimation by the averaged kriging estimates has its source in the lower variance for the conditionally unbiased estimates (compare variance of estimates in Tables 3 and 7).
Correcting for Estimation Bias
Some authors (McLennan and Deutsch 2002; Journel and Kyriakidis 2004) have proposed to estimate E[Z | Z > t] using conditional simulations (Deutsch and Journel 1998) instead of kriging, since conditional simulations are designed to preserve the data variance in reproducing the sample histogram and variogram. This corrects the estimation bias by avoiding smoothing. Bourgault (2021) has shown that, indeed, conditional simulations (Zs) can be used if their resulting statistic (E[Zs | Zs > t]) is averaged over various samplings. Even if the simulated values do not suffer from the estimation bias, they still suffer from sampling bias, just like
the sample histogram and variogram do. But in practice, and unlike the kriging estimates, or averaged kriging estimates, simulations cannot be used to categorize ore and waste on the spatial domain because the simulated values (Zs) cannot be definitively associated with any particular spatial locations as they change for each simulated realization. For recovering the minerals above an economical threshold, E[Z | avg(Z*) > t] (or E[Z | Z* > t] from the kriging estimates of a particular sampling) can be achieved, but E[Z | Z > t] cannot. The statistical bias E[Z | avg(Z*) > t] < E[Z | Z > t] (or E[Z | Z* > t] < E[Z | Z > t]) cannot be avoided. It can only be estimated, either from the sample values (Tables 5 and 6) or from conditional simulations.
Summary and Conclusions
Statistical estimates fluctuate from sampling to sampling, leading to statistical bias associated with any given sampling. The fluctuations are more important when the sampling size is smaller, but also when the population has a skewed distribution, as is often the case in the earth sciences. Sampling a lognormal population is more sensitive to statistical bias than sampling a normal population. Averaging the statistic of interest over multiple samplings is a proven approach for correcting the statistical bias for basic statistics (average, variance, histogram) and for spatial statistics (variogram). For spatial estimation, estimation bias is added to the sampling bias. Averaging kriging estimates over multiple samplings (avg(z*)) can correct for conditional bias, especially when the distribution is highly skewed. Thus, more smoothing, or more estimation bias, helps to reduce conditional bias. This provides estimated values that are unbiased for estimating E[Z | avg(Z*) > t] but underestimate the true E[Z | Z > t]. Even if, for many statistics, the statistical bias can be practically eliminated by averaging over multiple samplings, the processing of estimated values, or averaged estimated values, by a transfer function will in general give biased results. For example, the application of a threshold value in mineral estimation will always suffer from some degree of statistical bias due to the smoothing of the estimator (variance(avg(z*)) < variance(z)). This is the mining paradox: a variance reduction (smoothing) is required to remove conditional bias, so as to improve selectivity between ore and waste, but it also induces a statistical bias in the expected recovery when the ore category is defined by grades above a threshold value.
Cross-References
▶ Best Linear Unbiased Estimation
▶ Bootstrap
▶ Expectation-Maximization Algorithm
▶ Simulation
▶ Kriging
▶ Lognormal Distribution
▶ Normal Distribution
▶ Random Variable
▶ Regression
▶ Sequential Gaussian Simulation
▶ Smoothing Filter
▶ Spatial Statistics
▶ Variogram
Bibliography
Armstrong M (1998) Basic linear geostatistics. Springer, 153 p
Bourgault G (2021) Clarifications and new insights on conditional bias. Math Geosci 53:623–654. https://doi.org/10.1007/s11004-020-09853-6
Chiles JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty. Wiley series in probability and statistics, 2nd edn. Wiley, 726 p
Deutsch CV, Journel AG (1998) GSLIB: geostatistical software library and user's guide, 2nd edn. Oxford University Press, 369 p
Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York, 483 p
Isaaks E (2004) The kriging oxymoron: a conditionally unbiased and accurate predictor. In: Leuangthong O, Deutsch CV (eds) Geostatistics Banff 2004, 2nd edn. Springer, pp 363–374
Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics. Oxford University Press, Oxford, 561 p
Journel A, Kyriakidis P (2004) Evaluation of mineral reserves: a simulation approach. Oxford University Press, New York, 216 p
Krige DG (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. J Chem Metall Min Soc S Afr 52:119–139
Matheron G (1965) Principles of geostatistics. Econ Geol 58:1246–1266
McLennan J, Deutsch C (2002) Conditional bias of geostatistical simulation for estimation of recoverable reserves. In: CIM proceedings Vancouver 2002, Vancouver

Statistical Computing

Alan D. Chave
Department of Applied Ocean Physics and Engineering, Deep Submergence Laboratory, Woods Hole Oceanographic Institution, Woods Hole, MA, USA

Definition
Statistical computing is a broad subject, covering all aspects of the analysis of data or statistical simulation that require the use of some type of computational engine. An incomplete list of statistical computing topics includes exploratory data analysis, resampling methods for the estimation of confidence intervals or hypothesis testing, standard or robust linear regression, compositional data analysis, computational maximum likelihood estimation, nonlinear optimization, and multivariate tools such as principal component analysis. Statistical computing is especially important in the Earth sciences because data tend to be statistically complicated, rarely being described by simple distributions, and because of the proliferation of vast data sets as observing systems expand.

Introduction
It is assumed at the outset that readers of this entry are familiar with elementary probability and statistical concepts, such as set theory, the definition of probability, sample spaces, probability density and cumulative distributions, and measures of location, dispersion, and association, that are covered in the first few chapters of textbooks such as De Groot and Schervish (2011) or Chave (2017a). Some familiarity with standard distributions, such as the Gaussian and chi-square types, is also expected. Statistical computation is implemented in software packages such as Matlab, which is widely used in the Earth and ocean sciences; open source implementations include R and Python with third-party modules. As noted in the Definition, statistical computing is a vast subject and cannot be covered comprehensively in anything short of a book. The treatment here is limited to exploratory data analysis tools, resampling methods including the bootstrap and permutation tests, linear regression, and nonlinear optimization to implement a maximum likelihood estimator. A more comprehensive treatment of computational data analysis may be found in Chave (2017a), where Matlab implementation is a focus, and at a more advanced level in Efron and Hastie (2016).

Exploratory Data Analysis Tools
Exploratory data analysis comprises the early stages in evaluating a given data set, where the goal is its characterization. Let {Xi} be a data sample of size N drawn from an unknown probability density function (pdf) f(x). Partition the data range [min(xi), max(xi)] into a set of r bins according to some rule. For example, the rule could partition the data into equal-sized intervals. Count the number of data in each bin. A frequency histogram is a plot of the counts of data in each bin against the midpoint of the bin. Other normalizations can be used, such as transforming counts into probabilities. The outcome is a simple way to characterize a given data set, but it remains
somewhat subjective because the bin sizes and their boundaries are arbitrary. A better qualitative approach to characterizing the distribution of {Xi}, which are assumed to be independent and identically distributed (iid), is the kernel density estimator, which is a smoothed rather than binned representation of a distribution. The kernel density estimator that represents f(x) is given by

f(x) = \frac{1}{N\delta} \sum_{i=1}^{N} K\!\left( \frac{x - x_i}{\delta} \right) \qquad (1)

where K(x) is a symmetric kernel function that integrates to one over its support and δ > 0 is the smoothing bandwidth. A number of kernel functions are in use, but the standard Gaussian works well in practice. The smoothing bandwidth δ can be chosen by the user, but for a Gaussian kernel function the optimal value is

\delta = \left( \frac{4\, s_N^5}{3N} \right)^{1/5} \qquad (2)

where s_N is the sample standard deviation. Figure 1a shows a kernel density pdf for 5000 random draws from Gaussian distributions with means of −2 and 2 and a common variance of 1. The solid line uses (2) to give a smoothing bandwidth of 0.4971, which adequately resolves the peaks from the two Gaussian distributions. It also accurately reproduces the analytical form of the pdf. The dashed line shows the kernel density pdf for δ = 0.1, which overestimates the peak heights and introduces multiple peaks at their top. The dotted line shows the kernel density pdf for δ = 2, which smears the two peaks together. In many instances, data may be drawn from a distribution whose support is finite, in contrast with the previous example. In this case, the kernel density estimator will give incorrect results unless the kernel density algorithm limits the support. Figure 1b shows a simple example using the arcsine distribution, which is defined only over (0, 1). The arcsine distribution is equivalent to a beta distribution with arguments (½, ½). The solid line shows the kernel density pdf when the support is unconstrained. By contrast, the dashed line shows it when the support is limited to (0, 1) and is almost identical to the dotted line that shows the analytical form.
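A minimal numpy sketch of the Gaussian-kernel estimator (1) with the bandwidth rule (2) is given below; the two-component Gaussian mixture is chosen to resemble the example of Fig. 1a, and the grid limits are illustrative assumptions.

```python
import numpy as np

def kernel_density(x_grid, data):
    """Gaussian kernel density estimate of Eq. (1) with the bandwidth of Eq. (2)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    delta = (4.0 * np.std(data, ddof=1) ** 5 / (3.0 * n)) ** 0.2   # Eq. (2)
    u = (x_grid[:, None] - data[None, :]) / delta
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)          # standard Gaussian K
    return kernel.sum(axis=1) / (n * delta)                        # Eq. (1)

rng = np.random.default_rng(2)
sample = np.concatenate([rng.normal(-2.0, 1.0, 2500), rng.normal(2.0, 1.0, 2500)])
grid = np.linspace(-6.0, 6.0, 201)
pdf = kernel_density(grid, sample)
print("estimate integrates to ~1:", np.trapz(pdf, grid).round(3))
```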
Statistical Computing, Fig. 1 (left) Kernel density pdf for 5000 random draws from Gaussian distributions with means of −2 and 2 and a variance of 1. The solid line uses (2) for the smoothing bandwidth, while the dashed and dotted lines use values of 0.1 and 1, respectively. (right) Kernel density pdf for 500 random draws from the arcsine distribution. The solid line assumes unbounded support, while the dashed line has support of (0, 1). The dotted line is the analytic pdf
The quantile-quantile or q-q plot enables the assessment of the distribution of a set of data in a qualitative manner and can be made quantitative by adding error bounds based on the Kolmogorov-Smirnov statistic. A q-q plot consists of the quantiles of the target distribution F(x) plotted against the order statistics x_(i) computed by sorting the sample. The quantiles are those values of the inverse cumulative distribution function at the uniform distribution quantiles:

u_i = \frac{i - 0.5}{N}, \qquad i = 1, \ldots, N \qquad (3)
The order statistics are obtained by sorting and optionally scaling the data. If the data sample is drawn from the target distribution, a q-q plot will be a straight line, while departures from a line yield insights into their actual distribution. A q-q plot emphasizes the target distribution tails and hence is useful for detecting outliers but is not sensitive near the distribution mode. Figure 2a shows a q-q plot for 1000 random samples from a standard normal distribution but with the four smallest and largest values replaced by outlying values. The outliers are readily apparent at the bottom and top of the line. The dashed lines show confidence bounds obtained using the 1 − α probability value of the Kolmogorov-Smirnov statistic c_α, plotting F^{-1}(u) against F^{-1}(u ± c_α) with α = 0.05; see Chave (2017a, §7.2) for further details. Because the confidence bounds flare outward at the distribution extremes, the outliers are not sufficiently large to be quantitatively detected. The percent-percent or p-p plot also enables the assessment of the distribution of a set of data in a qualitative manner and can be made quantitative by adding error bounds based on the Kolmogorov-Smirnov statistic. A p-p plot consists of the uniform quantiles (3) plotted against the order statistics x_(i) of the data sample transformed using a target cdf, typically after standardizing the data. As with a q-q plot, a p-p plot will be a straight line if the target distribution is correct, and deviations from a straight line provide insight into the actual distribution of the data. However, a p-p plot is most sensitive near the
Statistical Computing, Fig. 2 (left) Q-q plot for 1000 random samples from a standard Gaussian distribution with the four smallest and largest values replaced by outliers, along with dashed lines showing the 95% confidence interval on the result. (right) P-p plot for the same data
distribution mode and hence is not very useful for outlier detection but is quite useful for data that are drawn from long-tailed distributions. Michael (1983) introduced a useful variant on the p-p plot that equalizes its variance at all points. Figure 2b shows a p-p plot of the contaminated Gaussian sample from Fig. 2a. The outliers are barely visible as a slight turn to the left and right at the bottom and top of the line. Confidence bounds are omitted but can easily be added. The outcome of a statistical procedure may require the identification and elimination of outlying data. This is called censoring and requires changes to both q-q and p-p plots in its presence, as the target distribution has to be truncated. Let f_x(x) and F_x(x) be the pdf and cumulative distribution function (cdf), respectively, of the data before truncation. The pdf of the truncated data set is

f'_x(x') = \frac{f_x(x')}{F_x(d) - F_x(c)} \qquad (4)
where c ≤ x′ ≤ d. Suppose the original number of data is N, and the numbers of data censored at the bottom and top of the distribution are m1 and m2, respectively. Suitable choices for c and d are the m1th and (N − m2)th quantiles of the original distribution f_x(x). A q-q or p-p plot computed using (4) as the target distribution against the censored data will yield the correct result. Exploratory data analysis sometimes requires the computation of combinations of distributions or understanding of the properties of specific distributions that cannot be obtained analytically. A suitable substitute is simulation. An example appears in Chave (2014), where it was postulated that the residuals obtained from magnetotelluric (MT) response function estimation are pervasively distributed as a member of the
stable family (a set of distributions characterized by infinite variance and sometimes mean and which describe phenomena in fields ranging from physics to finance). Stable distributions have four parameters (tail thickness α, skewness, location, and scale), and while there are known relationships for the latter three parameters for ensembles of stable random variables, these do not exist for α. Chave (2014) used simulation to show that ensembles of stable random variables with different tail thickness parameters converge to a stable distribution with a new value for α. He did this by obtaining 10,000 random draws from a uniform distribution over the range [0.6, 2.0] to specify α in random draws from a standardized symmetric stable distribution. The ensemble was fit by a stable distribution with a tail thickness parameter of 1.16.
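The construction of a q-q plot from (3) can be sketched in a few lines of Python. The example below assumes a standard normal target and plants a few outliers at the extremes, in the spirit of Fig. 2a; the Kolmogorov-Smirnov confidence bounds described above are omitted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.standard_normal(1000)
data[:4] = [-8.0, -7.5, -7.0, -6.5]            # plant a few low outliers
data[-4:] = [6.5, 7.0, 7.5, 8.0]               # and a few high ones

n = data.size
u = (np.arange(1, n + 1) - 0.5) / n            # uniform quantiles, Eq. (3)
order_stats = np.sort(data)                    # order statistics x_(i)
target_quantiles = stats.norm.ppf(u)           # F^{-1}(u) for the normal target

# Under the target distribution the points scatter about the 1:1 line; large
# departures at either end flag the planted outliers.
departure = order_stats - target_quantiles
print("largest departures:", departure[:2].round(2), departure[-2:].round(2))
```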
Resampling Methods
Beginning in the early twentieth century, a variety of parametric (meaning that they are based on some distributional assumptions) estimators for the mean and variance (among other statistics) and for hypothesis testing were established. In many instances, a data analyst will not want to make the assumptions about the data distribution required for parametric estimators because of the possibility of mixture distributions, outliers, and non-centrality. This led to the development of estimators based on resampling of the data as computational power grew. The most widely used resampling estimator is the bootstrap introduced by Efron (1979). More recently, permutation hypothesis tests that typically are more powerful than the bootstrap have been developed. The empirical distribution based on the data sample plays a central role in the bootstrap, taking the place of a parametric
sampling distribution. The bootstrap utilizes sampling with replacement (meaning that a given sample may be obtained more than once) and hence is not exact. It can be used for parameter estimation, confidence interval estimation, and hypothesis testing. Suppose that the statistic of interest is \hat{\lambda}_N along with its standard error. Obtain B bootstrap samples X_k^* from the data set, each of which contains N samples, by resampling with replacement. Apply the estimator for \hat{\lambda}_N to each bootstrap sample to get a set of bootstrap replicates \hat{\lambda}_N(X_k^*) for k = 1, \ldots, B. The sample mean of the bootstrap replicates is

\bar{\lambda}^* = \frac{1}{B} \sum_{k=1}^{B} \hat{\lambda}_N\!\left(X_k^*\right) \qquad (5)

and the standard error on \hat{\lambda}_N is

SE_B\!\left(\hat{\lambda}_N\right) = \left[ \frac{1}{B} \sum_{k=1}^{B} \left( \hat{\lambda}_N\!\left(X_k^*\right) - \bar{\lambda}^* \right)^2 \right]^{1/2} \qquad (6)
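A minimal Python sketch of (5) and (6), together with a percentile confidence interval of the kind discussed below, follows; the skewed lognormal sample and the number of replicates are illustrative assumptions.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=5000, seed=0):
    """Bootstrap mean (Eq. 5) and standard error (Eq. 6) of a statistic."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    replicates = np.array([statistic(rng.choice(data, size=data.size, replace=True))
                           for _ in range(n_boot)])
    return replicates.mean(), replicates.std(ddof=0), replicates

rng = np.random.default_rng(4)
sample = rng.lognormal(0.0, 1.0, size=200)      # a skewed data set
boot_mean, boot_se, reps = bootstrap_se(sample, np.mean)
print("sample mean      :", sample.mean().round(3))
print("bootstrap mean   :", boot_mean.round(3))
print("bootstrap SE     :", boot_se.round(3))
print("percentile 95% CI:", np.percentile(reps, [2.5, 97.5]).round(3))
```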
The same methodology applies to other common statistics such as the skewness or kurtosis. While statistical lore holds that a couple of hundred replicates should suffice, it is better to use many thousands and examine the bootstrap distribution for consistency with a given statistic; see Chave (2017a, §8.2) for the details. Note also that a naïve bootstrap applied to estimators derived from the order statistics (e.g., the interquartile range) will fail to converge. The bootstrap is most commonly used to obtain confidence intervals when the statistic under study is complicated such that its sampling distribution is either unknown or difficult to work with. The standard bootstrap confidence intervals are computed by simply substituting bootstrap estimates for the statistic and its standard deviation into the Student t confidence interval formula, yielding

\Pr\!\left[ \bar{\lambda}^* - t_{N-1}\!\left(1 - \tfrac{\alpha}{2}\right) SE_B\!\left(\hat{\lambda}_N\right) \le \lambda \le \bar{\lambda}^* + t_{N-1}\!\left(1 - \tfrac{\alpha}{2}\right) SE_B\!\left(\hat{\lambda}_N\right) \right] = 1 - \alpha \qquad (7)
where \bar{\lambda}^* and SE_B(\hat{\lambda}_N) are given by (5) and (6). A better approach than the naïve formulation (7) is the bootstrap-t confidence interval based on the bootstrap replicates given by the normalized statistic

z_k^* = \frac{\hat{\lambda}_N\!\left(X_k^*\right) - \bar{\lambda}^*}{SE_B\!\left(\hat{\lambda}_N\right)} \qquad (8)

The \{z_k^*\} are then sorted, and the α/2 and 1 − α/2 values are extracted for use in (7) in place of the t_{N-1} quantile. Accurate
results require the use of a large (many thousands) number of bootstrap replicates. The bootstrap percentile method directly utilizes the α/2 and 1 − α/2 quantiles of the bootstrap distribution and has better convergence and stability properties than either the standard or bootstrap-t approaches. The best bootstrap confidence interval estimator is the bias-corrected and accelerated (BCa) approach that adjusts the percentile method to correct for bias and skewness. The formulas for bias correction and acceleration are complicated and will be omitted; see Chave (2017a, §8.2.3) for details. It is the preferred approach for most applications. The bootstrap is also useful for hypothesis testing, although the results typically are neither exact (meaning that the probability of a type 1 error for a composite hypothesis is α for all of the possibilities which make up the hypothesis) nor conservative (meaning that the type 1 error never exceeds α); an exact test is always conservative. A better alternative for most applications is the permutation test, which was introduced by Pitman (1937) but only became practical in recent years because of its computational complexity. Permutation tests are usually applied to the comparison of two populations, although they can be used with a single data set with some additional assumptions. A permutation test is at least as powerful as the alternative for large samples and is nearly distribution-free, requiring only very simple assumptions about the population. Permutation tests are based on sampling without replacement. A sufficient condition for a permutation test to be exact and unbiased is exchangeability of the observations (meaning that any finite permutation of their indices does not change their distribution) in a combined sample. Hypothesis testing usually utilizes the p-value to make decisions. A p-value is the probability of observing a value of the test statistic as large as or larger than the one that is observed for a given experiment. A small number is taken as evidence for the alternate hypothesis, or rejection of the null hypothesis. The definition of small is subjective, but typically a p-value below 0.05 is taken as strong evidence for rejection of the null hypothesis. A p-value is a random variable and should not be confused with the type 1 error α, which is a parameter chosen before an experiment is performed and hence applies to an ensemble of experiments. The p-value concept is not without controversy, and the American Statistical Association issued a statement on its concept, context, and purpose in Wasserstein and Lazar (2016) that is certainly pertinent. The key steps in carrying out a permutation test on two populations X and Y are:
1. Choose a test statistic \hat{\lambda} that measures the difference between X and Y.
2. Compute the value of the test statistic for the original sets of data.
3. Combine X and Y into a pooled data set.
4. Compute the permutation distribution of \hat{\lambda} by randomly permuting the pooled data, ensuring that the values are unique to enforce sampling without replacement, and recompute the test statistic \lambda_k^*.
5. Repeat step 4 many times (typically tens of thousands to millions).
6. Accept or reject the null hypothesis according to the p-value for the original test statistic compared with the permutation test statistics.

Let B be the number of permutations. The two-sided equal tail permutation p-value is given by

p_{perm} = 2 \min\left\{ \frac{1}{B+1}\left[ \sum_{k=1}^{B} 1\!\left(\lambda_k^* \ge \hat{\lambda}\right) + 1 \right],\; \frac{1}{B+1}\left[ \sum_{k=1}^{B} 1\!\left(\lambda_k^* < \hat{\lambda}\right) + 1 \right] \right\} \qquad (9)

where 1(x) is an indicator function that is either 0 or 1 as its argument is false or true. An example taken from Chave (2017a, §8.3.3) serves to illustrate a permutation test. In the 1950s and 1960s, several countries carried out experiments to see if cloud seeding would increase the amount of rain. Simpson et al. (1975) provide an exemplar data set containing 52 clouds, half of which were unseeded or seeded. Q-q plots of the data show that they are quite non-Gaussian, while a q-q plot of the logarithms of the data is quite linear. A two-sample t test on the data yields a p-value of 0.0511, which constitutes weak acceptance of the null hypothesis that unseeded and seeded clouds yield the same amount of rain. However, a two-sample t test on the logs of the data yields a p-value of 0.0141, strongly rejecting the null hypothesis. A permutation test will be used to test the null hypothesis using 1,000,000 permutations of the merged data set. The test statistic is the difference of the means of the first and last 26 entries in the permuted merged data. A set of 1,000,000 random permutations of the 52 data indices is generated and checked to ensure that none of them constitutes the original data order and that they are unique. The p-value (9) is 0.0441, rejecting the null hypothesis. Repeating the permutation test using the logs of the data gives a p-value of 0.0143, again strongly rejecting the null hypothesis and showing that a permutation test does not require strong distributional assumptions, in contrast to the parametric t test. Figure 3 shows the permutation distribution compared to a Gaussian distribution using the sample mean and variance as its parameters for reference, along with the value of the test statistic.

Statistical Computing, Fig. 3 Kernel density pdf of the distribution of the test statistic for the cloud seeding data. The vertical dashed line is the test statistic based on all of the data. (Taken from Chave (2017a))

Permutation tests can be used for one-sample tests if the underlying data distribution is symmetric. They can also be applied to two-sample tests on paired data such as come from medical trials, where patients are assigned at random to control and treated groups, and to two-sample tests for dispersion. Care is required in devising a test statistic for the latter unless the means of the two data sets are known to be the same. The solution is to first center each data set about its median value and then use the difference of the sum of squares or absolute values as the test statistic. Details on these applications may be found in Chave (2017a, §8.3). A final application of resampling is bias correction in goodness-of-fit tests such as the Kolmogorov-Smirnov or Anderson-Darling types. It is well known that such tests require that the parameters in the target distribution be known a priori, a condition that rarely holds in practice, and they are usually estimated from the data. This results in unknown bias in the test which can be removed using a Monte Carlo approach. The steps to follow are:
1. Obtain the test statistic for the N observations against a target distribution whose parameters are estimated from the data.
2. Compute N random draws with replacement from the target distribution using the same parameters.
3. Obtain the K-S or A-D statistic for the random draw.
4. Repeat steps 2 and 3 a large number of times.
5. Compute the p-value using (9).
This approach was used by Chave (2017b) to remove bias while assessing the fit of a stable distribution to regression
residuals in the computation of the magnetotelluric response function.
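The p-value of (9) can be computed with a short Python sketch. The two samples below are synthetic lognormal stand-ins and not the Simpson et al. (1975) rainfall data, and the uniqueness checks on the permutations described above are omitted for brevity; the sketch only shows the mechanics of the equal-tail permutation p-value for a difference of means.

```python
import numpy as np

def permutation_p_value(x, y, n_perm=20_000, seed=0):
    """Two-sided equal-tail permutation p-value of Eq. (9), using the
    difference of group means as the test statistic."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()
    n_x = len(x)
    perm_stats = np.empty(n_perm)
    for k in range(n_perm):
        # Random permutation of the pooled data, i.e., sampling without replacement.
        perm = rng.permutation(pooled)
        perm_stats[k] = perm[:n_x].mean() - perm[n_x:].mean()
    upper = (np.sum(perm_stats >= observed) + 1) / (n_perm + 1)
    lower = (np.sum(perm_stats < observed) + 1) / (n_perm + 1)
    return 2.0 * min(upper, lower)

rng = np.random.default_rng(5)
unseeded = rng.lognormal(4.0, 1.6, 26)   # synthetic stand-ins, not the
seeded = rng.lognormal(4.7, 1.6, 26)     # Simpson et al. (1975) rainfall data
print("permutation p-value:", permutation_p_value(unseeded, seeded))
```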
Linear Regression
Linear regression is the most widely used statistical procedure in existence. It encompasses problems that can be cast as a linear model for p parameters given N data, where p ≤ N is assumed:

\mathbf{y} = \mathbf{X} \cdot \boldsymbol{\beta} + \boldsymbol{\epsilon} \qquad (10)

where y is the response N-vector, X is an N × p matrix of predictors, · denotes the inner product, β is a parameter p-vector, and ϵ is an N-vector of unobservable random errors. Four assumptions underlie linear regression: (a) the model is linear in the parameters, which is implied by (10); (b) the error structure is additive, which is also implied by (10); (c) the random errors have zero mean and equal variance while being mutually uncorrelated; and (d) the rank of the predictor matrix is p. In elementary statistical texts, the predictor variables are presumed to have no measurement error, which is rarely true in practice. In real applications, X contains random variables, in which case the statistical model underlying (10) is

E(\mathbf{y} \mid \mathbf{X}) = \mathbf{X} \cdot \boldsymbol{\beta}, \qquad \operatorname{cov}(\mathbf{y} \mid \mathbf{X}) = \sigma^2 \mathbf{I}_N \qquad (11)

where the leading elements are the conditional expected value and covariance, σ² is the population variance, and I_N is the N × N identity matrix. The least squares estimator for (10) is

\hat{\boldsymbol{\beta}} = \left( \mathbf{X}^H \cdot \mathbf{X} \right)^{-1} \cdot \mathbf{X}^H \cdot \mathbf{y} \qquad (12)

where the superscript H denotes the Hermitian transpose to accommodate complex response and predictor variables. A list of the properties of the linear regression estimator is given in Chave (2017a, §9.2). A summary is:
1. The unconditional expected value of (12) is

E\!\left(\hat{\boldsymbol{\beta}}\right) = \boldsymbol{\beta} \qquad (13)

so that linear regression is unbiased presuming assumption (c) holds.
2. The unconditional covariance matrix for \hat{\boldsymbol{\beta}} is

\operatorname{cov}\!\left(\hat{\boldsymbol{\beta}}\right) = \sigma^2 E\!\left[ \left( \mathbf{X}^H \cdot \mathbf{X} \right)^{-1} \right] \qquad (14)

and in practice the expected value is replaced with an estimate.
3. The predicted value N-vector is

\hat{\mathbf{y}} = \mathbf{X} \cdot \left( \mathbf{X}^H \cdot \mathbf{X} \right)^{-1} \cdot \mathbf{X}^H \cdot \mathbf{y} = \mathbf{H} \cdot \mathbf{y} \qquad (15)

where H is the N × N hat matrix. The hat matrix is a projection matrix and hence is symmetric and idempotent. Its diagonal elements h_ii are real and satisfy 0 ≤ h_ii ≤ 1. Chave and Thomson (2003) showed that when the rows of X are complex multivariate Gaussian, the distribution of h_ii is beta with parameters p and N − p.
4. The estimated regression residuals are given by

\hat{\mathbf{r}} = \mathbf{y} - \mathbf{X} \cdot \hat{\boldsymbol{\beta}} = \left( \mathbf{I}_N - \mathbf{H} \right) \cdot \mathbf{y} \qquad (16)

5. The sum of squared residuals is

\hat{\mathbf{r}}^H \cdot \hat{\mathbf{r}} = \mathbf{y}^H \cdot \left( \mathbf{I}_N - \mathbf{H} \right) \cdot \mathbf{y} \qquad (17)

6. An unbiased estimate for the variance is

s^2 = \frac{\hat{\mathbf{r}}^H \cdot \hat{\mathbf{r}}}{N - p} \qquad (18)

7. The regression residuals \hat{\mathbf{r}} are uncorrelated with the predicted values \hat{\mathbf{y}}.
8. The parameter vector \hat{\boldsymbol{\beta}} is the best linear unbiased estimator (Gauss-Markov theorem).
9. Statements 1–8 do not require any distributional assumptions. If the random errors are real and N-variate Gaussian distributed with mean μ and common variance σ², then \hat{\boldsymbol{\beta}} is also the maximum likelihood estimator, and hence \left(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\right)/s converges in distribution to a standardized
1
S ð19Þ
where ~ means “is distributed as” and Np is a p-variate Gaussian. In addition, the regression estimator is consistent, and b and s2 are independent and serve as sufficient statistics for estimating β and s2. Complex random errors require that propriety also be considered; see Schreier and Scharf (2010) for the details. Equation (12) is called the normal equation solution and possesses poor numerical properties, particularly when the
predictor matrix has entries that cover a wide range. The problem comes from inverting X^H · X, and the issue can be avoided if that step is removed. The predictor matrix can always be factored into the product of two matrices Q · R, where Q is N × p orthonormal, so that Q^H · Q = I_p. The second matrix R is p × p upper triangular, so that all of the entries to the left of the main diagonal are zero. Combining the QR decomposition with the least squares problem yields

\mathbf{Q}^H \cdot \mathbf{y} = \mathbf{R} \cdot \hat{\boldsymbol{\beta}} \qquad (20)
The left side of (20) is a p-vector. The pth element of \hat{\boldsymbol{\beta}} multiplied by the (p, p) element of R is equal to the pth element of the left side of (20). Stepping up one row, there is only one unknown because the pth element of \hat{\boldsymbol{\beta}} is already determined. This process of stepping back row by row is called back substitution, and the least squares problem has been solved without ever computing X^H · X or its inverse. QR decomposition is the best approach for numerically solving least squares problems.
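A minimal Python sketch of the QR route to (20), with explicit back substitution, is given below; the small synthetic design matrix and noise level are illustrative assumptions, and the result is checked against a library least squares solver.

```python
import numpy as np

def qr_least_squares(X, y):
    """Solve the least squares problem via the QR decomposition of Eq. (20)."""
    Q, R = np.linalg.qr(X)                 # X = Q R, Q orthonormal, R upper triangular
    rhs = Q.conj().T @ y                   # Q^H . y, a p-vector
    p = R.shape[1]
    beta = np.zeros(p, dtype=rhs.dtype)
    for j in range(p - 1, -1, -1):         # back substitution, last row first
        beta[j] = (rhs[j] - R[j, j + 1:] @ beta[j + 1:]) / R[j, j]
    return beta

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)
print("QR solution:", qr_least_squares(X, y).round(3))
print("lstsq check:", np.linalg.lstsq(X, y, rcond=None)[0].round(3))
```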
Statistical inference for linear regression comprises a set of tools for assessing the fit of a given set of data to a linear model. Analysis of variance (ANOVA) provides a gross tool to assess the need for linear regression and, along with the widely used residual sum of squares, is not very informative. A more useful approach is a test of the null hypothesis H0: βj = βj⁰ against the alternate hypothesis H1: βj ≠ βj⁰, where βj⁰ is a particular value for the jth element in β. The most widely used case sets βj⁰ = 0 to test whether a given parameter is required. Let ζ = X^H · X. The test statistic is

t_j = \frac{\sqrt{\zeta_{jj}}\,\left(\hat{\beta}_j - \beta_j^0\right)}{s} \qquad (21)
and is distributed as Student's t distribution with N − p degrees of freedom if the null hypothesis is true. It is standard practice to produce a table that lists each element in β, its standard error, the test statistic (21), and the corresponding p-value. If a given element of β has a p-value that lies above the type 1 error level α, then it is a candidate for removal from the linear model, although this must be done carefully and in conjunction with assessing the regression for outlying data. The well-known equivalence between hypothesis testing and confidence interval estimation means that the 1 − α confidence interval on βj is the set of all values βj⁰ for which the null hypothesis is acceptable with a type 1 error of α. Consequently, it follows that the 1 − α confidence interval on βj is
\hat{\beta}_j - t_{N-p}\!\left(1 - \frac{\alpha}{2}\right)\sqrt{\frac{\hat{\mathbf{r}}^H \cdot \hat{\mathbf{r}}}{(N - p)\,\zeta_{jj}}} \;\le\; \beta_j \;\le\; \hat{\beta}_j + t_{N-p}\!\left(1 - \frac{\alpha}{2}\right)\sqrt{\frac{\hat{\mathbf{r}}^H \cdot \hat{\mathbf{r}}}{(N - p)\,\zeta_{jj}}} \qquad (22)
When confidence intervals on more than one parameter are required, the Bonferroni method of replacing α with α/p should be utilized to avoid underestimating the confidence interval. The bootstrap may also be used to estimate both the regression parameters and confidence intervals on them and is preferred because it does not depend as strongly on distributional assumptions as the parametric form (22). First, B bootstrap replicates are obtained by sampling with replacement from the response and predictor variables, and the linear regression estimator is applied to each to yield the B × p matrix \hat{\boldsymbol{\beta}}_k^* whose column averages are the bootstrap estimate of the regression coefficients \bar{\boldsymbol{\beta}}^*. Bootstrap confidence intervals using the percentile method follow by sorting the columns of \hat{\boldsymbol{\beta}}_k^* and taking the ⌊Bα/(2p)⌋ and ⌊B(1 − α/(2p))⌋ entries as the confidence interval for each. The BCa method may also be utilized. A double bootstrap estimator may be used to assess the significance of the regression coefficients; see Chave (2017a), §9.4.3 and §9.5.2, for the details. It is important to evaluate the regression residuals for lack of correlation, or randomness, to verify that assumption (c) above is in effect. This is most easily achieved using the nonparametric Wald-Wolfowitz runs test that is described in Kvam and Vidakovic (2007, §6.5). The regression residuals should also be evaluated for the presence of autocorrelation using the Durbin-Watson test. Positive autocorrelation in the residuals will bias the parameter standard errors downward, hence increasing the false rejection rate on the parameters themselves. The details may be found in Chave (2017a, §9.4.4). Influential data are those that are statistically unusual or that exert undue control on the outcome of a linear regression. They are classified as outliers or leverage points when they appear in the residuals or predictors, respectively, for which the hat matrix diagonal is a useful statistic. It is certainly possible to hand cull these from small data sets, but as data sets get larger this becomes impractical. What is needed is estimators that are robust to outlying data but that approach the efficiency of standard linear regression when outliers are absent. Such estimators have been devised over the past 50 years. The most widely used robust estimators are maximum-likelihood-like, or M-estimators. Chave et al. (1987) review the topic in a geophysical context. M-estimators turn maximum likelihood estimation backward by a priori specifying a loss function that ensures robustness while maintaining efficiency rather than
specifying their distribution. The loss function ρ(r) for a Gaussian distribution is r²/2 + c, where c is a constant and r is a residual (e.g., the difference between a datum x_i and the mean λ), and the estimation problem is

\min_{\lambda} \sum_{i=1}^{N} \rho(r_i) \qquad (23)

A typical M-estimator will utilize a loss function that is Gaussian at the center but with longer tails at the extremes. Taking the derivative in (23) with respect to λ yields the M-estimation analog to the normal equations:

\sum_{i=1}^{N} \psi(r_i) = 0 \qquad (24)

where ψ(r) = ∂_r ρ(r), and ψ(r) is called the influence function. Huber (1964) introduced a simple robust estimator based conceptually on a model that is Gaussian at the center and double exponential in the tails. The double exponential is the natural distribution for the L1 (least absolute values) norm, just as the Gaussian is the natural distribution for the L2 (least squares) norm. The loss and influence functions for the Huber estimator are

\rho(r) = \begin{cases} \dfrac{r^2}{2} & |r| \le \alpha \\ \alpha |r| - \dfrac{\alpha^2}{2} & |r| > \alpha \end{cases}
\qquad
\psi(r) = \begin{cases} r & |r| \le \alpha \\ \alpha \operatorname{sgn}(r) & |r| > \alpha \end{cases} \qquad (25)

When α = 1.5, the Huber estimator has 95% efficiency with purely Gaussian data, in contrast with an L1 estimator that has a ~60% variance penalty. From (25), the Huber estimator clearly bounds the influence of extreme data where the L2 estimator does not. More extreme loss and influence functions may be devised; for example, trimming the data at |r| = α means replacing the second terms in (25) with α²/2 and 0, respectively. The numerical implementation of a robust estimator requires two additional steps because (24) is nonlinear, as the residuals are not known until the equation is solved. The argument in (23) or (24) must be independent of the scale of the data; in the Huber estimator (25), the L2 to L1 transition occurs at α = 1.5 standard deviations, so the argument needs to be expressed in standard deviation rather than data units. This can be achieved by replacing r_i in (24) with r_i/d, where d is the sample scale estimate divided by the population scale estimate for the target distribution (which is usually Gaussian). The parameter d should not be sensitive to extreme data; a suitable choice is s_IQ/σ_IQ, where IQ denotes the interquartile range. The second step in implementing a robust estimator is linearization of the equations. This is accomplished by rewriting (24) as a weighted least squares problem:

\sum_{i=1}^{N} w_i r_i = 0, \qquad w_i = \frac{\psi(r_i/d)}{r_i/d} \qquad (26)

Linearization is achieved by iteratively solving (26), computing the weights w_i and the scale d from the residuals r_i of the prior iteration. Equation (26) is initialized using standard least squares (i.e., w_i = 1), and the iterative solution proceeds until the sum of squared residuals changes by less than a threshold amount (such as 1%). The robust estimator extends to linear regression if the residuals in (24) are identified with the regression residuals (16). Performing the minimization that yielded (24) gives

\sum_{i=1}^{N} \psi(r_i/d)\, x_{ij} = 0 \qquad (27)

whose solution using weighted least squares is straightforward. The result should be assessed using the runs test, Durbin-Watson test, and t-statistic p-value, along with a q-q plot to check for outliers, just as for ordinary linear regression. Rousseeuw and Leroy (1987, p. 27) presented what is now a classic data set comprising the logarithms of the effective surface temperature and light intensity of stars from the cluster Cygnus OB1. Chave (2017a, §9.5.2) presented their analysis, including an intercept, using tedious manual culling that showed that four outlying values due to red giant stars were "pulling" the fit and yielding a result that is orthogonal to the visual best fit. In §9.6.1, Chave (2017a) repeated the analysis using a Huber robust estimator. Only one iteration ensued, and the normalized sum of squared residuals was reduced from 0.3052 to 0.3049. The residuals are random and uncorrelated, but the regression is not fitting anything very well. Figure 4 shows the data and the fitted line. It is clear that the four red giant stars at the upper left are "pulling" the fit and forcing it to be rotated about 90° from a visual best fit through the main cluster of stars at the right side of the plot. In fact, the robust fit is not very different from what is achieved with ordinary least squares. The problem is that the red giant stars produce leverage, manifest in the predictors, rather than outliers, manifest in the residuals, and robust estimators are insensitive to leverage points. In fact, M-estimators often make leverage worse or create leverage points that were not present at the start.
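A minimal Python sketch of the Huber M-estimator for linear regression, solved by iteratively reweighted least squares in the spirit of (25)–(27), follows. It is not the estimator of Chave (2017a): the IQR-based scale with the normal-consistency constant 1.349, the 1% stopping rule, and the synthetic outlier-contaminated data are all illustrative assumptions.

```python
import numpy as np

def huber_irls(X, y, a=1.5, n_iter=50, tol=1e-2):
    """Huber M-estimation of a linear regression by iteratively reweighted
    least squares, following Eqs. (25)-(27); a is the L2-to-L1 transition."""
    w = np.ones(len(y))                       # start from ordinary least squares
    prev_ss = np.inf
    for _ in range(n_iter):
        Xw = X * w[:, None]
        beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)             # weighted normal equations
        r = y - X @ beta
        d = np.subtract(*np.percentile(r, [75, 25])) / 1.349   # IQR-based scale
        u = r / d
        w = np.where(np.abs(u) <= a, 1.0, a / np.abs(u))       # psi(u)/u for Huber
        ss = np.sum(r ** 2)
        if abs(prev_ss - ss) < tol * prev_ss:                  # ~1% change stopping rule
            break
        prev_ss = ss
    return beta

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(60), rng.normal(size=60)])
y = X @ np.array([2.0, 1.0]) + rng.normal(0.0, 0.3, size=60)
y[:5] += 8.0                                   # a few gross outliers in the response
print("Huber fit (intercept, slope):", huber_irls(X, y).round(2))
```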
Statistical Computing, Fig. 4 The star data (crosses) along with a robust linear regression fit including an intercept (dashed)
This limitation has led to the introduction of bounded influence estimators that protect against both outliers and leverage points. Chave and Thomson (2003) replaced (26)–(27) with

\sum_{i=1}^{N} w_i^{[k]}\, v_i^{[k]}\, r_i^{[k-1]}\, x_{ij} = 0, \qquad j = 1, \ldots, p \qquad (28)

where the superscript in brackets denotes the iteration number. The first set of weights w_i are standard M-estimator weights such as the Huber set. The second set of weights v_i utilizes a hat matrix measure of leverage that is applied in an iteratively multiplicative fashion to prevent instability as leverage points appear and disappear during the solution of (28). The form used is

v_i^{[k]} = v_i^{[k-1]} \exp\!\left( -e^{\,w\left(\chi_i^{[k]} - w\right)} \right) \qquad (29)
The statistic χ_i^{[k]} is chosen to be N h_ii^{[k]}/p, where h_ii^{[k]} is the ith diagonal element of the hat matrix that includes the product of the M-estimator and leverage weights. The free parameter w sets the value where down-weighting begins, and a suitable value is N/p times the quantile of the beta distribution with parameters (p, N − p) at some chosen probability level, such as 0.99. Returning to the star data, the bounded influence estimator (28) was applied, resulting in four iterations during which the normalized residual sum of squares was reduced from 0.3052 to 0.1140. The residuals are random and uncorrelated. Weighted residual and hat matrix diagonal q-q plots reveal no outliers or leverage points remaining, and five data have been removed by the estimator. Figure 5 shows the result, which compares favorably with the hand-culled result from Chave (2017a, §9.5.2).
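The following Python sketch shows one multiplicative leverage-weight update of the general kind used in (28)–(29). It should be treated as schematic: the exact weight function of (29) is reconstructed above from a damaged source and should be checked against Chave and Thomson (2003), and the threshold parameter (here called chi0) and the synthetic leverage point are illustrative assumptions.

```python
import numpy as np

def leverage_weights(X, w_m, chi0, v_prev=None):
    """One multiplicative update of hat-matrix leverage weights, in the spirit
    of Eqs. (28)-(29); the weight function follows the reconstruction given in
    the text and is schematic only."""
    if v_prev is None:
        v_prev = np.ones(X.shape[0])
    n, p = X.shape
    Xw = X * np.sqrt(w_m * v_prev)[:, None]        # weighted predictor matrix
    # Diagonal of the hat matrix H = Xw (Xw^H Xw)^{-1} Xw^H without forming H.
    Q, _ = np.linalg.qr(Xw)
    h = np.sum(np.abs(Q) ** 2, axis=1)
    chi = n * h / p                                # the leverage statistic chi_i
    return v_prev * np.exp(-np.exp(chi0 * (chi - chi0)))

rng = np.random.default_rng(8)
X = rng.normal(size=(50, 2))
X[0] = [12.0, -10.0]                               # one strong leverage point
v = leverage_weights(X, w_m=np.ones(50), chi0=3.0)
print("weight at leverage point:", v[0].round(4), " typical weight:", v[1:].min().round(2))
```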
Nonlinear Multivariable Maximum Likelihood Estimation
As a final example of statistical computing, the implementation of a maximum likelihood estimator (MLE) for the analysis of magnetotelluric (MT) data will be described. MT utilizes measurements of the electric and magnetic fields on Earth's surface to infer the electrical structure beneath it. The fundamental datum in MT is a location-specific, frequency-
Statistical Computing, Fig. 5 The star data (crosses) and a bounded influence linear regression fit to them including an intercept (dashed). The axis limits are the same as for Fig. 4, and data that have been eliminated by bounded influence weighting have been omitted
dependent tensor Z linearly connecting the horizontal electric and magnetic fields, E = Z · B. This relation does not hold exactly when E and B are measurements, and instead the row-by-row linear regression is employed:

\mathbf{e} = \mathbf{b} \cdot \mathbf{z} + \boldsymbol{\varepsilon} \qquad (30)

where e is the electric field response N-vector, b is the N × 2 magnetic field predictor matrix, z is the MT response 2-vector, and ε is an N-vector of random errors, with all of these entities being complex. The time series measurements are Fourier transformed into the frequency domain over N segments that are of order one over the frequency of interest in length; see Chave (2017b) for details. MT was plagued by problems with bias and erratic responses until the late 1980s, when robust and then bounded influence estimators were devised that dramatically improved results. All of these approaches are based on the robust model comprising a Gaussian core of data plus outlying non-Gaussian data that cause analysis problems. However, Chave (2014, 2017b) showed that the residuals from a robust estimator were systematically long tailed, so that the Gaussian core does not exist, and then showed that the residuals were pervasively distributed according to a stable model. This immediately suggested that an MLE based on the stable model could be implemented. For stable MT data, the pdf of a single residual is S(r̂_i | ς), where r̂_i = e_i − b_i · ẑ is the ith estimated residual, b_i is the ith row of b in (30), and ς = (α, β, γ, δ) are the tail thickness, skewness, scale, and location parameters for a stable distribution. For independent data, the sampling distribution is

S_N(\hat{\mathbf{r}} \mid \boldsymbol{\varsigma}) = \prod_{i=1}^{N} S(\hat{r}_i \mid \boldsymbol{\varsigma}) \qquad (31)

The likelihood function is the sampling distribution regarded as a function of the parameters for a given set of residuals. The MLE is obtained by maximizing the likelihood function, or equivalently its logarithm:

L(\boldsymbol{\varsigma}, \mathbf{z} \mid \hat{\mathbf{r}}) = \sum_{i=1}^{N} \log S(\hat{r}_i \mid \boldsymbol{\varsigma}) \qquad (32)

The first-order conditions for the MLE solution follow by setting the derivatives of (32) with respect to the parameters to zero:
\partial_{\varsigma_j} L(\boldsymbol{\varsigma}, \mathbf{z} \mid \hat{\mathbf{r}}) = \sum_{i=1}^{N} \frac{\partial_{\varsigma_j} S(\hat{r}_i \mid \boldsymbol{\varsigma})}{S(\hat{r}_i \mid \boldsymbol{\varsigma})} = 0, \qquad j = 1, \ldots, 4

\partial_{z_k} L(\boldsymbol{\varsigma}, \mathbf{z} \mid \hat{\mathbf{r}}) = \sum_{i=1}^{N} \frac{\partial_{z_k} S(\hat{r}_i \mid \boldsymbol{\varsigma})}{S(\hat{r}_i \mid \boldsymbol{\varsigma})}\, b_{ik} = 0, \qquad k = 1, \ldots, 4 \qquad (33)

Statistical Computing, Fig. 6 The real and imaginary parts of the Zxy component of the MT response scaled by the square root of period for a site located in South Africa. The confidence limits for the robust (gray) and stable MLE (black) are obtained using the jackknife and diagonal elements of the improper Fisher information matrix, respectively, assuming that the tail probability is apportioned equally among the eight real and imaginary elements of the response tensor. The stable MLE estimates have been offset slightly for clarity. (Taken from Chave (2014))
The sufficient condition for the solution to (33) to be a maximum is that the Hessian matrix of the log likelihood function (32) be negative definite. Chave (2014) solved (33) using a two-stage approach that decouples the stable distribution parameter vector ς and the MT response function z. In the first stage, the stable distribution was estimated using the stable MLE algorithm of Nolan (2001). In the second stage, the MT response function was obtained using the second part of (33) while fixing the stable distribution parameters. This proceeds iteratively until convergence is obtained. The starting solution is the ordinary least squares result with 5% trimming. Equations (33) can be solved using a weighted least squares approach, but convergence slows as the tail thickness parameter α decreases toward 1 and fails for smaller values. Experience showed that MT data exhibit tail thicknesses as low as 0.8, so this limitation was a real problem. An alternative approach utilized nonlinear multivariable minimization of an objective function based on a trust region algorithm for the second stage, including explicit provision of the gradient and Hessian matrix to speed convergence. The objective function that was minimized is the negative log likelihood given by (32) preceded by a minus sign and with the stable parameter vector fixed. This method worked well for all values of the tail thickness parameter at the cost of one to two orders of magnitude more computer time than weighted least squares. Figure 6 shows a typical result from the stable MLE compared to a standard robust estimator. The stable MLE produces a smoother result as a function of frequency, as
expected from diffusion physics, and smaller error estimates compared to a bounded influence estimator. The latter is because the MLE is asymptotically efficient, reaching the lower bound of the Cramér-Rao inequality, which a bounded influence estimator cannot achieve. In fact, there are substantial differences (up to 10 standard deviations) between the stable MLE and bounded influence results, although these are subtle in Fig. 6. The tail thickness parameter hovers around 1 at periods under 30 s. The p-values for a test of the null hypothesis that the data are stably distributed, based on the Kolmogorov-Smirnov statistic, do not reject at any frequency once the test is corrected for bias due to estimation of the distribution parameters from the data using the resampling approach described above.
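The second-stage idea, minimizing the negative log likelihood (32) with the stable parameters held fixed, can be sketched in Python as below. This is a simplification of the actual method: the data are a real-valued toy version of (30) rather than complex MT spectra, the stable parameters are assumed known rather than fitted in a first stage, and a derivative-free Nelder-Mead search replaces the trust region algorithm with analytic gradient and Hessian; scipy's stable pdf is also slow, so the example is small.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(9)

# Synthetic, real-valued stand-in for one row of Eq. (30): two predictors,
# one response, heavy-tailed (stable) errors. The real MT problem is complex.
b = rng.normal(size=(100, 2))
z_true = np.array([1.5, -0.7])
alpha, beta_s, scale = 1.3, 0.0, 0.3      # stable parameters, here assumed known
errors = stats.levy_stable.rvs(alpha, beta_s, scale=scale, size=100, random_state=rng)
e = b @ z_true + errors

def neg_log_likelihood(z):
    """Negative log likelihood of Eq. (32) with the stable parameters held fixed."""
    r = e - b @ z
    return -np.sum(stats.levy_stable.logpdf(r, alpha, beta_s, scale=scale))

start = np.linalg.lstsq(b, e, rcond=None)[0]   # least squares starting solution
result = optimize.minimize(neg_log_likelihood, start, method="Nelder-Mead")
print("least squares start:", start.round(3))
print("stable MLE estimate:", result.x.round(3))
```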
Summary
This entry has covered four topics in statistical computing: exploratory data analysis, resampling methods, linear regression, and nonlinear optimization. The first topic includes tools to statistically characterize a data set using a kernel density estimator, quantile-quantile plot, percentile-percentile plot, and simulation. Resampling methods include the use of the bootstrap to compute estimators and confidence limits and permutation methods for hypothesis testing that are nearly exact. Linear regression was described both theoretically and numerically and then extended to robust and bounded influence estimators that automatically handle unusual data (e.g., outliers). Nonlinear optimization was described based on the solution for the magnetotelluric response function after recognizing that such data in the frequency domain pervasively follow a stable distribution. This set of topics is not comprehensive, and many additional ones exist that comprise statistical computing.
Cross-References
▶ Compositional Data
▶ Computational Geoscience
▶ Constrained Optimization
▶ Exploratory Data Analysis
▶ Frequency Distribution
▶ Geostatistics
▶ Hypothesis Testing
▶ Iterative Weighted Least Squares
▶ Multivariate Analysis
▶ Normal Distribution
▶ Optimization in Geosciences
▶ Ordinary Least Squares
▶ Regression
Bibliography
Chave AD (2014) Magnetotelluric data, stable distributions and impropriety: an existential combination. Geophys J Int 198:622–636
Chave AD (2017a) Computational statistics in the earth sciences. Cambridge University Press, Cambridge
Chave AD (2017b) Estimation of the magnetotelluric response function: the path from robust estimation to a stable MLE. Surv Geophys 38:837–867
Chave AD, Thomson DJ (2003) A bounded influence regression estimator based on the statistics of the hat matrix. J R Stat Soc Ser C 52:307–322
Chave AD, Thomson DJ, Ander ME (1987) On the robust estimation of power spectra, coherences, and transfer functions. J Geophys Res 92:633–648
De Groot MH, Schervish MJ (2011) Probability and statistics, 4th edn. Pearson, London
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
Efron B, Hastie T (2016) Computer age statistical inference. Cambridge University Press, Cambridge
Huber P (1964) Robust estimation of a location parameter. Ann Math Stat 35:73–101
Kvam PH, Vidakovic B (2007) Nonparametric statistics with applications to science and engineering. Wiley, Hoboken
Michael JR (1983) The stabilized probability plot. Biometrika 70:11–17
Nolan JP (2001) Maximum likelihood estimation and diagnostics for stable distributions. In: Lévy processes: theory and applications. Birkhäuser, Cham
Pitman EJG (1937) Significance tests which may be applied to samples from any distribution. J R Stat Soc Suppl 4:119–130
Rousseeuw PJW, Leroy AM (1987) Robust regression and outlier detection. Wiley, New York
Schreier PJ, Scharf LL (2010) Statistical signal processing of complex-valued data. Cambridge University Press, Cambridge
Simpson J, Olsen A, Eden JC (1975) A Bayesian analysis of a multiplicative treatment effect in weather modifications. Technometrics 17:161–166
Wasserstein RL, Lazar NA (2016) The ASA's statement on p-values: context, process and purpose. Am Stat 70:129–133
Statistical Inferential Testing

R. Webster
Rothamsted Research, Harpenden, UK
Definition
Information on the constituents of rock, sediment, and soil necessarily derives from samples; it is never complete. Inferences drawn from data are therefore uncertain, and the main aim of statistical testing is to attach probabilities to those inferences. The tests are prefaced by hypotheses or models, such as no differences among population means or no relations between variables, the so-called null hypotheses. They are designed to refute those hypotheses by determining the probabilities of obtaining the data, or data more extreme, given that the hypotheses or models are true. A result is declared statistically significant when the probability, P, is less than some small value, conventionally P < 0.05. Statistical significance is not the same as scientific significance, and calculated values of P should always be presented along with standard errors in reports and papers so that clients and readers can draw their own inferences from results.
Introduction
Statistical inference is a broad subject. In the geosciences it can be narrowed to inferences about populations that may be drawn from sample data. Rock formations and sediments are continuous bodies of variable material. Soil mantles the land surface except where it is broken by rock, and it too varies both from place to place and from time to time. Knowledge about many of the properties of these materials is incomplete because it derives from observations on samples; one cannot measure them everywhere. Further, the measurements themselves are more or less in error. Knowledge of the true states of the materials is therefore uncertain, and one of the main aims of statistics in this context is to put that uncertainty on a quantitative footing, that is, to attach probabilities to the inferences drawn from data. Following from it is the notion of statistical significance, of separating meaningful information, i.e., signal, from "noise," i.e., unaccountable variation in data or variation from sources of no interest. The information might be a change in some property of the soil arising from a change of use, the difference between analyses of the metal content of an ore from two laboratories, or the response to some treatment to combat pollution in a sediment. Given the uncertainties in data, the question then becomes: are the observed differences or responses so large in relation to the unaccountable variation
that their occurrence by chance is highly improbable? If the answer is “yes,” then the result is said to be “significant”; it will be accompanied by a value P, a measure of this improbability – see below. A significance test is prefaced by a hypothesis. In the above examples this would be that there has been no effect on the soil caused by change of land use, that results from different laboratories do not differ, that the pollution in the sediment has not abated in response to the treatment. That is the “null hypothesis,” often designated H0, and the test is designed to reject it, not to confirm it. The alternative that there is a difference is denoted either H1 or HA. A valid test requires (a) a plausible underlying model of the phenomenon being investigated and (b) that the sampling has been suitably randomized to provide the data. Embodied in the model is a statistical distribution within which to judge the data.
The Normal Distribution
Statistical tests of significance are set against a background of variation with assumed probability distributions, by far the commonest of which is the normal, or Gaussian, distribution. Its probability density is given by

$$y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \qquad (1)$$
where μ is the mean and σ² is the variance of the distribution. Its graph of y against x is the familiar bell-shaped curve. A variable with mean μ = 0 and variance σ² = 1 that has this distribution is a standard normal deviate, often denoted z. Many variables describing the properties of geological bodies and materials are distributed approximately in this way or can be made so by transformation of their scales of measurement. Further, the distributions of sample means approach the normal ever more closely as the sample sizes increase, whatever the distributions of the individual measurements, a result known as the central limit theorem.
Tests With z
One use of z is to place confidence limits about a population mean, μ, estimated by the mean, x̄, of a sample. For a sample of size n with variance s², the variance of its mean is s²/n and its square root is the standard error. In these circumstances the lower and upper limits on μ for some given confidence are

$$\bar{x} - z\sqrt{s^2/n} \quad \text{and} \quad \bar{x} + z\sqrt{s^2/n}.$$

Frequently used values are

Confidence (%)   80     90     95     99
z                1.28   1.64   1.96   2.58

The z test is often used to decide whether a sample belongs to some population or whether it deviates from some norm with mean μ and variance σ². The null hypothesis is no deviation: x̄ = μ. The test is given by

$$z = \frac{\bar{x} - \mu}{\sqrt{\sigma^2/n}}, \qquad (2)$$

for which the corresponding probability, P, can be calculated or found in tables of the standard normal deviate. In practice the population variance is rarely known and is estimated by the sample variance, s². In these circumstances, unless the sample is large (n ≳ 60), z must be replaced by Student's t:

$$t = \frac{\bar{x} - \mu}{\sqrt{s^2/n}}. \qquad (3)$$

Again, the values of P corresponding to those of t for the number of degrees of freedom are tabulated in published tables and nowadays can be calculated in most statistical packages. For sample sizes exceeding about 60 the distributions of z and t are so similar that the tests give almost identical results. Equations (2) and (3) lead to probabilities that the differences between x̄ and μ lie in either tail of the distributions; the differences may be either negative or positive. The tests are two-tailed or two-sided in the jargon. In many instances investigators are interested in only one tail. For example, a public health authority may want to know only whether the lead content of the soil exceeds some threshold; it would calculate

$$c = \bar{x} - z\sqrt{s^2/n} \quad \text{or} \quad c = \bar{x} + t\sqrt{s^2/n}, \qquad (4)$$

and obtain the corresponding probability, P, which would be half that for the two-sided test.
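As a minimal illustration of Eqs. (2)–(4), the snippet below (an editorial sketch, not part of the original entry; the sample values and the norm are invented for illustration) computes t and its one- and two-sided P values, together with confidence limits on the mean, using SciPy.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of a soil property; mu0 is the norm being tested against.
sample = np.array([52.1, 49.8, 55.3, 51.0, 48.7, 53.9, 50.4, 52.6])
mu0 = 50.0

n = sample.size
xbar = sample.mean()
s2 = sample.var(ddof=1)          # sample variance
se = np.sqrt(s2 / n)             # standard error of the mean

# Eq. (3): Student's t (population variance unknown)
t = (xbar - mu0) / se
p_two_sided = 2 * stats.t.sf(abs(t), df=n - 1)
p_one_sided = stats.t.sf(t, df=n - 1)   # e.g., "does the mean exceed mu0?"

# 95% confidence limits on mu, cf. the limits xbar -/+ z*sqrt(s2/n) above
lower, upper = stats.t.interval(0.95, df=n - 1, loc=xbar, scale=se)

print(f"t = {t:.3f}, two-sided P = {p_two_sided:.3f}, one-sided P = {p_one_sided:.3f}")
print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```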
The F Test
The quantity F, named in honor of R.A. Fisher, is the quotient resulting from the division of one variance, say $s_1^2$, by a smaller one, $s_2^2$: $F = s_1^2/s_2^2$. Fisher worked out its distribution when $s_1^2$ and $s_2^2$ are the variances of two independent random samples from two normally distributed populations with the same variance $\sigma^2$. The calculated value of F is compared with that distribution to obtain the probability, P, given the evidence, that the null hypothesis, i.e., no difference between the
variances of the populations from which the samples were drawn, is true. The larger the calculated F, the smaller is P, other things being equal. More often F is invoked to test for differences among means of classes in experiments and surveys after analyses of variance (ANOVAs). Table 1 is an example; it summarizes the ANOVA of lead (Pb) in the soil of the Swiss Jura classified by geology (from Atteia et al. 1994), with measurements in mg kg⁻¹ transformed to their common logarithms. The probability of F = 5.95 under the null hypothesis of no differences among the means is P < 0.001; the null hypothesis is clearly untenable. Table 2 reveals why; the soil on the Argovian rocks is poorer in Pb than on the others. Table 2 lists the means both of the measurements in mg kg⁻¹ and of their transforms to common logarithms to bring the distributions close to normal. The last column lists the standard errors derived from the residual mean square from the ANOVA set out in Table 1. The differences are not huge, but with so many data the test is very sensitive. The natural scientist might seek reasons for the soil on the Argovian's being poorer in Pb. The environmental protection agency might be more concerned because the concentrations of Pb on four of the five formations exceed the statutory limit of 50 mg kg⁻¹. Individual means may be compared. The standard error of a difference is

$$\mathrm{SED} = \sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}, \qquad (5)$$

where s² is the residual mean square from the ANOVA and n₁ and n₂ are the sizes of the two samples. Student's t for the comparison of two means, say x̄₁ and x̄₂, is then

$$t = (\bar{x}_1 - \bar{x}_2)/\mathrm{SED}. \qquad (6)$$
In this example, the comparison of the Argovian with the four others would be the main interest. It would require a vector of weighting coefficients w = [1, −¼, −¼, −¼, −¼] by which to combine the means and their standard errors to obtain the appropriate value of t. Details are set out in Sokal and Rohlf (2012), which is among the most up-to-date and nearly comprehensive didactic texts. The practice of comparing every pair or sets of pairs in this way is to be deprecated. In particular, one cannot validly assign a probability to a difference that is singled out only after it has become apparent from an ANOVA. Hypotheses should be set out first, and experiments and surveys designed to test them. Ideally, they should comprise orthogonal contrasts, the number of which is equal to the degrees of freedom between classes; Sokal and Rohlf (2012) and Webster and Lark (2018) explain. If there are only two classes, then the F test is equivalent to the t test: F = t².
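A short sketch (an editorial addition; the F ratio and degrees of freedom come from Table 1, the contrast weights from the text, while the class means and sizes are purely hypothetical placeholders) of how a tabulated F converts to P and how a single-degree-of-freedom contrast t is formed:

```python
import numpy as np
from scipy import stats

# P value for the tabulated F ratio (4 and 358 degrees of freedom, Table 1)
F, df_between, df_within = 5.95, 4, 358
p = stats.f.sf(F, df_between, df_within)
print(f"P(F > {F}) = {p:.5f}")   # below 0.001, as stated in the text

# Single contrast: class 1 versus the mean of classes 2-5 (weights from the text).
w = np.array([1, -0.25, -0.25, -0.25, -0.25])
means = np.array([1.20, 1.45, 1.50, 1.42, 1.48])   # hypothetical class means (log10 mg/kg)
sizes = np.array([60, 80, 75, 70, 78])             # hypothetical class sizes
s2 = 0.03847                                       # residual mean square (Table 1)

contrast = np.dot(w, means)
se_contrast = np.sqrt(s2 * np.sum(w**2 / sizes))   # SE of a linear contrast of means
t = contrast / se_contrast
p_t = 2 * stats.t.sf(abs(t), df=df_within)
print(f"t = {t:.2f}, two-sided P = {p_t:.4f}")
```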
Tests for Normality
Geoscientists often worry that the validity of ANOVAs and regression analyses is compromised if data are not normally distributed. Formal tests such as the Kolmogorov–Smirnov test, the Shapiro–Wilk test, or a χ² test on the frequencies of a histogram are of little help. With many data almost any departure from normality will be judged significant, whereas with few data the tests are inconclusive. The tests do not reveal in what way the distributions deviate from normal. Further, what matter for ANOVA and regression are the distributions of the residuals, not those of the data. Far better guides are histograms and Q–Q graphs of the residuals (see Welham et al. 2015). Figure 1 shows the distributions of the residuals from the ANOVAs of the lead data, both on the original measurements (Fig. 1a and b) and on their logarithmic transforms (Fig. 1c and d), summarized in Table 2. The histogram of the residuals from the
Statistical Inferential Testing, Table 1 Analysis of variance for log10 Pb in the soil of the Swiss Jura on five geological formations

Source                      Degrees of freedom   Sum of squares   Mean square   F ratio   P
Between classes             4                    0.83017          0.20754       5.95      <0.001
Within classes (residual)   358                  12.08266         0.03847
Total                       362                  13.31261
An observation whose Mahalanobis distance $D^2(x,\bar{x}) = (x-\bar{x})^{\mathrm T} S^{-1}(x-\bar{x})$ exceeds a cutoff k is regarded as an outlier. Figure 1 provides a visualization of the univariate and bivariate distance measures. In the univariate case, distance is measured by the z-score $z = (x-\bar{x})\,s^{-1}$, where s is the sample standard deviation. Points with high z-scores are treated as outliers. In the multivariate case, outlier detection considers patterns between all variables. Since the Mahalanobis distance is sensitive to outliers, Rousseeuw (1985) described a method for removing outliers (i.e., retaining representative samples) in a given domain to minimize the impact of outliers on the estimate of S. This estimator is robust as it uses only a subset of observations (the h points) with the smallest covariance determinant. The z-score/bivariate analogy generalizes to higher dimensions via the Mahalanobis distance. Assuming the data distribution is multivariate Gaussian [for compositional data such as chemical assays, this is only approximately true after log-ratio
transformation is applied], it has been shown that the distribution of the Mahalanobis distance behaves as a chi-square distribution given a large number of samples (Garrett 1989). Therefore, a popular choice for the proposed cutoff point is $k = \chi^2_{p,1-\alpha}$, where χ² stands for the chi-square distribution and α is the significance level, usually taken as 0.025 (Rousseeuw 1985).

Minimum Volume Ellipsoid
The minimum volume ellipsoid (MVE) estimator finds the mean $\bar{x}_J$ and covariance $S_J$ from h (≤ n) points that minimize the volume of the enclosing ellipsoid. This ellipsoid is computed from the covariance matrix associated with the subsample. Formally, $\mathrm{MVE}(X) \rightarrow (\bar{x}_J, S_J)$, where the optimal set $\{X_i \mid X_i \in J\}$ satisfies

$$\mathrm{volume}\!\left(\mathrm{ellipsoid}(S_J)\right) \leq \mathrm{volume}\!\left(\mathrm{ellipsoid}(S_K)\right) \qquad (2)$$
for all observation subsets K ≠ J with #(J) = #(K) = h. Concretely, $X_i$ denotes the ith row (or assay sample) in the observation matrix X. We interpret J as a collection of h row vectors from X such that the ellipsoid fitted to the data (i) passes through p + 1 data points, (ii) contains the h observations, and (iii) has the smallest volume among all sets K with cardinality h that also satisfy conditions (i) and (ii). By construction, outliers must be excluded from the set J. Typically, h = [(n + p + 1)/2].

Minimum Covariance Determinant
The minimum covariance determinant (MCD) estimator finds the robust mean and covariance from h (≤ n) points that minimize the determinant of the covariance matrix associated with the subsample. It looks for an ellipsoid with the smallest volume that encompasses the h data points. Formally, $\mathrm{MCD}(X) \rightarrow (\bar{x}_J, S_J)$, where the optimal set $\{X_i \mid X_i \in J\}$ satisfies

$$|S_J| \leq |S_K| \quad \forall \text{ subsets } K \neq J \text{ with } \#(J) = \#(K) = h. \qquad (3)$$
In other words, J minimizes the volume of the ellipsoid among all subsets K that contain h observations from X. Note: $|S_J|$ is used as a proxy measure since the actual volume is proportional to $\det S_J$. Obtaining the center and covariance estimates in the Mahalanobis distance from the MCD estimator produces a robust distance measure that is not skewed by outliers. Meanwhile, outliers will continue to possess large values under this robust distance measure. The most common cutoff value k selected for the $D^2(x,\bar{x}) > k$ test is again based on chi-square critical values. The chi-square Q-Q plot provides a useful way to visually assess whether distances are $\chi^2_p$
distributed. To generate a plot of robust distance versus estimated percentiles, the data points are ordered according to their Mahalanobis distances and paired with an ordered sequence of $\chi^2_p$ percentiles, which may be obtained by drawing n samples from a chi-square distribution with p degrees of freedom. If the data are from a multivariate normal distribution, the points should lie on a straight line. Points with Mahalanobis distances larger than the corresponding chi-square quantiles depart from the straight-line trend and are treated as outliers.
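The following sketch (an editorial illustration, not from the original entry; the data are simulated and all parameter choices are arbitrary) uses scikit-learn's MinCovDet to compute MCD-based robust Mahalanobis distances and flags points beyond the chi-square cutoff described above.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
p = 4                                    # number of variables (e.g., assay components)
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=500)
X[:20] += 6.0                            # contaminate 20 samples to act as outliers

# Robust location and scatter from the h points with smallest covariance determinant
mcd = MinCovDet(support_fraction=0.75, random_state=0).fit(X)
d2 = mcd.mahalanobis(X)                  # squared robust distances D^2

cutoff = chi2.ppf(1 - 0.025, df=p)       # k = chi^2_{p, 1-alpha}, alpha = 0.025
outliers = np.flatnonzero(d2 > cutoff)
print(f"cutoff = {cutoff:.2f}, flagged {outliers.size} outliers")

# Chi-square Q-Q check: ordered robust distances vs chi-square quantiles
q = chi2.ppf((np.arange(1, len(X) + 1) - 0.5) / len(X), df=p)
ordered = np.sort(d2)                    # plot 'ordered' against 'q' to inspect the tail
```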
Application: Multivariate Outlier Detection in the Context of Iron Ore Mining in Western Australia
In mining, assay analysis is performed routinely to determine the geochemical composition of subterranean ore bodies. This is achieved by sampling material extracted from drilled holes. In the context of iron ore mining in the Pilbara region in Western Australia (Fig. 2), the chemical components of interest include Fe, SiO2, Al2O3, P, LOI, S, MgO, Mn, TiO2, and CaO. These measurements constitute a point cloud in a high-dimensional space, which makes it more difficult to articulate meaningful differences between samples that belong to different geological domains. Balamurali and Melkumyan (2015) evaluated the standard classical Mahalanobis distance method and the robust distance methods, minimum volume ellipsoid (MVE) and minimum covariance determinant (MCD), to find the presence of outliers in a given geological domain. The study further evaluated the methods with different contamination levels and concluded that the robust methods outperform the classical Mahalanobis distance-based approach. However, the performance of the robust methods tends to decrease with increasing contamination levels. The authors introduced ratios between the weight percentages (wt%) of chemical grades, SiO2/Al2O3, LOI/Al2O3, and TiO2/Fe, as additional variables and observed a significant increase in the outlier detection rate even with a high contamination level (Fig. 3). In addition, Gaussian process classification and the chi-square Q-Q plot were tested to gain insight with respect to finding abnormal data points. Not surprisingly, some of the multivariate outliers identified were not detected as univariate outliers. On the other hand, univariate outliers pertaining to trace elements (nondominant chemical components) may trigger false positives when the samples themselves are not considered multivariate outliers.

Ensembles of Subsets of Variables
In order to further understand the influence of the raw chemical variables and the ratios between the variables in outlier detection, a novel method of outlier detection was proposed by Balamurali and Melkumyan (2018) using ensembles of
Statistical Outliers, Fig. 2 Geological overview of the Pilbara region in Western Australia. The economically valuable banded-iron formations are hosted mainly by the Hamersley Group of the Mount Bruce Supergroup. © Commonwealth of Australia (Geoscience Australia) 2021
subsets of variables (EoSV). This study extended the previous work by incorporating a variable selection method. Similar to the work that incorporated three additional variables, in this work, all possible combinations of ratios between the raw chemical variables were used as additional variables. The Tree Bagger (Breiman 1998) method was used to combine features that have a large discrimination power at different contamination levels. Previous studies recognized that feature engineering can significantly improve training and model prediction outcomes (Mitra et al. 2002; Saeys et al. 2007).
The EoSV method demonstrated that the relevance of features varies with contamination level in the outlier classification problem. Hence, the stability of the influence of features was assessed and a method was proposed to keep only the features with stable influences in outlier detection. t-Distributed Stochastic Neighbor Embedding (t-SNE) for Outlier Detection Although multivariate outlier identification is important, it is a means rather than an end. Very often, there is value in
Statistical Outliers, Fig. 3 Comparing robust outlier detection methods using MCD with different feature combinations (Reference: Balamurali and Melkumyan (2018))
retaining the outliers for further analysis. When the outliers are viewed in the context of the raw and filtered data, spatial clustering may emerge. These clusters may suggest the existence of latent domains and highlight locations where the existing domain boundary is possibly incorrect. Figure 4 demonstrates a successful application of t-SNE (Van der Maaten and Hinton 2008) to outlier identification. The proposed method combines the t-SNE dimensionality reduction technique with spectral clustering (Von Luxburg 2007) to uncover persistent outliers. Figure 4 (top) shows two random realizations of the t-SNE and the chemical signatures of two clusters in the t-SNE domain. The smaller cluster corresponds to outliers sampled from a spatially adjacent geological domain (see Fig. 4 bottom). These green points are chemically distinct from the main population (target domain) and they are spatially correlated. By projecting the assay data to two or three dimensions, t-SNE allows geochemical differences between the main population and outliers to be visualized in a scatter plot and meaningful geological domain clusters to be discerned (Balamurali and Melkumyan 2016). Although t-SNE was successfully used in identifying subpopulations, because of its stochastic nature, t-SNE provides different results in different runs (Fig. 4). In order to produce stable results, the authors proposed a novel methodology for
ensemble clustering based on t-distributed stochastic neighbor embedding (EC-tSNE) to extract persistent patterns that more reliably correlate with geologically meaningful clusters (Balamurali and Melkumyan 2020). The proposed method uses silhouette scores to select the optimal number of clusters in a data-dependent way and builds consensus to cancel out inconsistency. EC-tSNE uses a consensus matrix (Monti et al. 2003) and agglomerative hierarchical clustering to achieve this. A map of multivariate outliers (see Fig. 4 bottom-left) can clearly identify the “contaminated” sites where samples are collected outside the target domain and potential contamination (or dilution) of high-grade ore material may occur. MCD Outlier Detection with Sample Truncation Strategy Following the experiments on EC-tSNE, Leung et al. (2019) proposed two sample truncation methods: maximum hull distance and maximum silhouette, to reject outliers in multivariate geochemical data. This builds on the foundation where two essential points remain unchanged: (i) the distance between samples is computed using robust estimators and (ii) outliers manifest as points that are a long distance away from the consensus values such as the robust mean obtained from the sampled population. The key observation is that outliers are relegated to the upper tail region when points are sorted by their robust distances in ascending order and
Statistical Outliers, Fig. 4 Top row: results from t-SNE spectral clustering displayed in t-SNE coordinates for two random runs (clusters in the t-SNE domain, Run 1 and Run 2). Bottom row (left): the corresponding clusters in spatial coordinates; (right) the normalized Fe grade (low to high Fe concentration) of samples determined by geochemical assays (Reference: Leung et al. (2019))
plotted in a chi-square quantile plot. Sample truncation simply refers to the action of removing samples from this ordered sequence beyond a certain cutoff point (see Fig. 5). This approach allows the outlier removal decision to be made after the points are sorted by their robust distances; it also provides flexibility, as the rejection threshold may be chosen by a domain expert or computed automatically to reduce false negatives, for instance. Traditionally, a fixed threshold based on the chi-square critical value, $\chi^2_{p,1-\alpha}$, is used for MCD. For compositional data analysis, this choice may result in suboptimal outlier rejection. Figure 5 illustrates the improvement that may be obtained using one of the proposed MCD sample truncation strategies, where thresholds A and B correspond to choosing $D \geq \sqrt{\chi^2_{p,1-\alpha}}$ and the proposed cut-off value, respectively. For the former, the extent of outlier removal is clearly inadequate. The outliers identified using MCD were compared to the EC-tSNE results in Leung et al. (2019), and the MCD sample truncation strategies were recommended as an alternative to EC-tSNE since t-SNE is computationally costly even for datasets of a moderate size (n > 10 k).
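A minimal sketch of the t-SNE-plus-clustering idea discussed above (an editorial addition; a single run is shown, whereas EC-tSNE builds a consensus over many runs, and all data and parameter values here are invented for illustration):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(1)
# Simulated assay data: a main population plus a small chemically distinct group
main = rng.normal(0.0, 1.0, size=(300, 6))
intruders = rng.normal(3.0, 1.0, size=(30, 6))
X = np.vstack([main, intruders])

# Project to 2-D with t-SNE (stochastic: different runs give different embeddings)
emb = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)

# Cluster the embedding; the small cluster is a candidate set of multivariate outliers
labels = SpectralClustering(n_clusters=2, random_state=42).fit_predict(emb)
sizes = np.bincount(labels)
outlier_cluster = int(np.argmin(sizes))
print(f"cluster sizes: {sizes}, candidate outlier cluster: {outlier_cluster}")
```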
Summary
This chapter described the fundamental characteristics of multivariate outliers, focusing specifically on a scenario most often encountered during mining, whereby outliers result from collecting assay samples from an adjacent rather than the intended domain due to inaccurate domain boundary definitions. The presence of outliers can impact the performance of grade prediction models, and this has far-reaching consequences for grade control and mine planning as it affects target conformance, process efficiency, and the profitability of a mining operation. In the technical section, univariate and multivariate outlier identification methods were reviewed. This covered Tukey's method, the use of the Mahalanobis distance, and robust
Statistical Outliers, Fig. 5 Outlier rejection based on MCD robust distance and threshold selection. The workflow proceeds from multivariate geochemical data (feature 1 = Fe, ..., feature N = SiO2) to computation of the MCD robust distance and, finally, outlier rejection via sample truncation; the robust distances are plotted against chi-square quantiles and truncated using threshold A or threshold B (Reference: Leung et al. (2019))
alternatives based on the minimum volume ellipsoid (MVE). Additionally, more recent approaches based on ensemble clustering with t-distributed stochastic neighbor embedding (EC-tSNE) and MCD robust distances were introduced. The importance of feature selection was highlighted. Using real geochemical assay data collected from a Pilbara iron ore mine, the case study demonstrated successful extraction of outliers using both EC-tSNE and MCD robust distances. When spatially coherent outlier clusters emerge, latent domains and the areas where domain boundary delineation is poor can both be identified. This can potentially give new insight into the structural geology and add local knowledge. While there is no universal outlier detection method that works in all circumstances, the MCD robust distance approach combined with appropriate threshold selection strategies seems to offer promising results in terms of efficiency and efficacy. At the very least, it allows outliers to be segregated and, if necessary, modeled as a separate process (or latent domain) during grade estimation.
Cross-References ▶ Geologic Time Scale Estimation ▶ Mine Planning ▶ Multiple Point Statistics ▶ Object Boundary Analysis ▶ t-Distributed Stochastic Neighbor Embedding ▶ Tukey, John Wilder Acknowledgments This work has been supported by the Australian Centre for Field Robotics and the Rio Tinto Centre for Mine Automation, the University of Sydney.
Bibliography
Balamurali M, Melkumyan A (2015) Multivariate outlier detection in geochemical data. In: Proceedings of the IAMG conference, vol 17. International Association of Mathematical Geosciences, Freiberg, pp 602–610
Balamurali M, Melkumyan A (2016) t-SNE based visualisation and clustering of geological domain. In: International conference on neural information processing. Springer, Berlin, pp 565–572
Balamurali M, Melkumyan A (2018) Detection of outliers in geochemical data using ensembles of subsets of variables. Math Geosci 50(4):369–380. https://doi.org/10.1007/s11004-017-9716-8
Balamurali M, Melkumyan A (2020) Computer aided subdomain detection using t-SNE incorporating cluster ensemble for improved mine modelling. International Association for Mathematical Geosciences (under review)
Barnett V, Lewis T (1994) Outliers in statistical data. Wiley series in probability and mathematical statistics: applied probability and statistics. Wiley, New York
Breiman L (1998) Arcing classifier (with discussion and a rejoinder by the author). Ann Stat 26(3):801–849. https://doi.org/10.1214/aos/1024691079
De Maesschalck R, Jouan-Rimbaud D, Massart DL (2000) The Mahalanobis distance. Chemom Intell Lab Syst 50(1):1–18
Filzmoser P, Garrett RG, Reimann C (2005) Multivariate outlier detection in exploration geochemistry. Comput Geosci 31(5):579–587
Garrett RG (1989) The chi-square plot: a tool for multivariate outlier recognition. J Geochem Explor 32(1–3):319–341
Gnanadesikan R, Kettenring JR (1972) Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 28:81–124
Grubbs FE (1969) Procedures for detecting outlying observations in samples. Technometrics 11(1):1–21
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (2011) Robust statistics: the approach based on influence functions, vol 196. Wiley, New York
Hawkes HE, Webb JS (1962) Geochemistry in mineral exploration. Harper & Row, New York
Iglewicz B, Hoaglin D (1993) How to detect and handle outliers (the ASQC basic reference in quality control). In: Mykytka EF (ed) Statistical techniques. American Society for Quality, Statistics Division, Milwaukee
Leung R, Balamurali M, Melkumyan A (2019) Sample truncation strategies for outlier removal in geochemical data: the MCD robust distance approach versus t-SNE ensemble clustering. Math Geosci:1–26. https://doi.org/10.1007/s11004-019-09839-z
Leung R, Lowe A, Chlingaryan A, Melkumyan A, Zigman J (2022) Bayesian surface warping approach for rectifying geological boundaries using displacement likelihood and evidence from geochemical assays. ACM Transactions on Spatial Algorithms and Systems 18(1). https://doi.org/10.1145/3476979. Preprint available at https://arxiv.org/abs/2005.14427
Mahalanobis PC (1936) On the generalized distance in statistics. National Institute of Science of India, vol 2, pp 49–55. http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/MiscDocs/1936_Mahalanobis.pdf
Mitra P, Murthy C, Pal SK (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312
Monti S, Tamayo P, Mesirov J, Golub T (2003) Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach Learn 52(1):91–118
Reimann C, Filzmoser P, Garrett RG (2005) Background and threshold: critical comparison of methods of determination. Sci Total Environ 346(1–3):1–16
Rousseeuw PJ (1985) Multivariate estimation with high breakdown point. Math Stat Appl 8(283–297):37
Rousseeuw PJ, Leroy AM (1987) Robust regression and outlier detection. Wiley series in probability and mathematical statistics. https://doi.org/10.1002/0471725382
Saeys Y, Inza I, Larranaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517
Strehl A, Ghosh J (2002) Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3(Dec):583–617
Summerfield D. Australian resource reviews: iron ore 2019. Geoscience Australia. Available at: https://www.ga.gov.au/scientific-topics/minerals/mineral-resources-and-advice/australianresource-reviews/iron-ore
Tukey JW (1977) Exploratory data analysis, vol 2. Pearson, Reading
Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579–2605
Von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
Statistical Quality Control

Jörg Benndorf
TU Bergakademie Freiberg, Freiberg, Germany

Definition
Statistical quality control (SQC) is the application of statistical methods and optimization to monitor and maintain the quality of products and services. Two branches of SQC are distinguished. The first is concerned with the quality of particular product items and is referred to as acceptance sampling. Here, a decision is made to either accept or reject the items based on their quality. The second branch is concerned with controlling the process in such a way as to guarantee the on-spec quality of the products and is referred to as statistical process control. It typically includes the steps (1) establishment of expected performance indicators, (2) targeted process monitoring, (3) comparison between observed and expected performance, and (4) corrective adjustment of process control parameters and decisions. Although initially developed for the manufacturing industry in the 1920s and then intensively applied during World War II, the principles of SQC have found their way into many nonmanufacturing applications, including the earth sciences (e.g., Montgomery 2020). As part of a quality management system to fulfill quality requirements, SQC can today be a key constituent in implementing the ISO 9000/ISO 9001 quality management system standards.

Acceptance Sampling
The idea of acceptance sampling is to choose a representative sample of a product lot and test it for defects. If the number of defects exceeds a certain acceptance criterion, the product lot is rejected. The underlying sampling plan is based on a hypothesis test and involves the following steps:

• Step 1: Definition of the size of the product lot to be sampled
• Step 2: Definition of a representative random sample size for the lot
• Step 3: Definition of an acceptance criterion for not rejecting the lot (number of defects)

The null hypothesis H0 is that the lot is of acceptable quality. The alternative hypothesis HA is that the lot is not of acceptable quality. There are two types of risks associated with acceptance sampling:

• The lot is rejected although it has acceptable quality. This is referred to as the type I error in statistical testing and is quantified by the error probability α.
• The lot is accepted although it is not of acceptable quality. This is referred to as the type II error in statistical testing and is quantified by the error probability β.

A main decision in acceptance sampling is the number of samples required to guarantee acceptable probabilities α and β while still reacting to a desired deviation limit. This number is defined based on an understanding of the product variability and the uncertainty in monitoring using the test statistic. An example in a later section demonstrates this process.
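To make the α/β trade-off concrete, here is a small editorial sketch (not from the original entry; the sample size, acceptance number, and quality levels are invented) that evaluates a single attribute sampling plan with the binomial distribution.

```python
from scipy.stats import binom

# Hypothetical attribute sampling plan: draw n items, accept the lot if
# the number of defectives found is <= c.
n, c = 80, 2
p_acceptable = 0.01     # defect rate of a lot that should be accepted
p_rejectable = 0.08     # defect rate of a lot that should be rejected

# Type I risk: probability of rejecting a good lot
alpha = 1.0 - binom.cdf(c, n, p_acceptable)
# Type II risk: probability of accepting a bad lot
beta = binom.cdf(c, n, p_rejectable)

print(f"alpha (reject good lot) = {alpha:.3f}")
print(f"beta  (accept bad lot)  = {beta:.3f}")
# Increasing n (or lowering c) reduces beta at the cost of alpha, and vice versa.
```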
Statistical Process Control
Statistical process control (SPC) is a methodology for measuring and controlling quality during the manufacturing process. It requires that quality data be obtained in the form of product or process observations at regular intervals or in real time during manufacturing. These data are then plotted on a graph with predetermined control limits, the so-called control chart. Control limits are determined by the capability of the process, whereas specification limits are determined by the client's needs. A typical example of such a control chart is given in Fig. 1 for quality control of run-of-mine (ROM) ore. The control limits correspond to the 2σ and 3σ confidence intervals. According to the founder of this technique, W.A. Shewhart, the variation observed within the process results from two sources (Shewhart 1931):

• Superposition of a number of sources of small variation and background noise (unavoidable)
• Large variability resulting from inaccuracies in process control caused by process defects such as imprecise working equipment or human error (avoidable)
Statistical Quality Control, Fig. 1 Example of a control chart
A reactive process control is managed by means of the control chart and the defined upper and lower limits. Typically, and as illustrated in Fig. 1, the limits are derived by statistical inspection of the sample population. The process is deemed "in control" if sample measurements fall between the limits. Observations that are close to the limits or even exceed them indicate the likely occurrence of avoidable causes within the process and call for a root cause analysis and corresponding corrective action. In this way, the so-called iterative Shewhart cycle can be implemented, which consists of the following steps:

• Plan: Plan and design the process and establish expectations in the form of process indicators.
• Do: Implement the design.
• Check: Monitor the process and identify differences between expectations and outcomes or statistical anomalies.
• Act: Adjust the process if deviations are significant.

This iterative cycle is often used to continuously monitor and adjust the process so as to ensure continual improvement of product quality.
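A brief editorial sketch of a Shewhart-type control chart of the kind shown in Fig. 1 (the ore-grade data are simulated; only the 2σ/3σ limit logic follows the text):

```python
import numpy as np

rng = np.random.default_rng(7)
# Simulated ROM ore grade measurements (e.g., % Fe) arriving in sequence
grades = rng.normal(loc=62.0, scale=0.8, size=60)
grades[45] += 3.5                      # inject an avoidable disturbance

mean = grades.mean()
sigma = grades.std(ddof=1)

ucl_2s, lcl_2s = mean + 2 * sigma, mean - 2 * sigma   # warning limits
ucl_3s, lcl_3s = mean + 3 * sigma, mean - 3 * sigma   # action limits

warnings = np.flatnonzero((grades > ucl_2s) | (grades < lcl_2s))
actions = np.flatnonzero((grades > ucl_3s) | (grades < lcl_3s))
print(f"mean = {mean:.2f}, sigma = {sigma:.2f}")
print(f"samples beyond 2-sigma: {warnings}, beyond 3-sigma: {actions}")
# Points beyond the limits trigger the 'Act' step of the Shewhart cycle.
```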
Application of Acceptance Sampling in Environmental Monitoring
One goal of environmental monitoring for any industrial activity is to check the compliance of emission or contamination levels with respect to restrictions imposed by permitting authorities or legislation. Restrictions can be defined in such a way that the background contamination existing before the industrial activity starts should not increase significantly. In the sense of statistical quality control, environmental monitoring can be interpreted as a process of taking a set of representative samples of a contamination or emission variable within a possibly impacted area or domain. This is done to test whether the contamination level at a given point in time is statistically significantly increased compared to the background contamination. Sampling is repeated regularly, as imposed by the application dynamics or the permitting requirements, and also to understand possible temporal trends. An essential task of this type of environmental monitoring is to define the appropriate number of samples in space and in time to meet a desired quality of the applied hypothesis test (e.g., De Gruijter et al. 2006). The quality of the hypothesis test is defined by the acceptable levels of risk associated with the type I and type II errors in acceptance sampling. Thus, the required number of samples is a direct function of the frequency distribution of the variable considered within the domain, the threshold of interest, and the error probabilities α and β related to the statistical test.

Example
A simple example shall illustrate the reasoning behind this type of application. As part of an environmental impact assessment, the average Cd content of soil within an area possibly impacted by mining activities is determined during baseline monitoring from a set of 50 randomly selected independent samples. The mean value and the variance of the data are calculated according to Eqs. (1) and (2).
$$\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i \qquad (1)$$

$$s^2(z) = \frac{1}{n-1}\sum_{i=1}^{n} (z_i - \bar{z})^2 \qquad (2)$$

The standard deviation of the mean value is

$$s(\bar{z}) = \sqrt{\frac{1}{n(n-1)}\sum_{i=1}^{n} (z_i - \bar{z})^2} \qquad (3)$$

Applied to the sample values, a mean Cd content of $\bar{z}_0 = 0.20$ mg/kg with a standard deviation of the mean of $s(\bar{z}_0) = 0.02$ mg/kg has been obtained. The latter has been derived based on the standard deviation of the samples of s(z) = 0.14 mg/kg. After some months of operation, during a first monitoring epoch, the mean value has been determined again. This time, the mean Cd content is $\bar{z}_1 = 0.23$ mg/kg with a comparable standard deviation of the mean value $s(\bar{z}_1) = 0.02$ mg/kg. The first obvious question is whether the Cd content has statistically significantly increased in comparison to the baseline monitoring, or in other words: is the difference between the two mean values of systematic or random nature? To answer this question, we can start with the following thoughts. The difference between the two mean values is distributed with an expectation of

$$\Delta = \mu_1 - \mu_0, \qquad (4)$$

where $\mu_0$ and $\mu_1$ are the true mean values of the Cd content for the two epochs, and a standard deviation of

$$s_{\bar{z}_1-\bar{z}_0} = \sqrt{s^2(\bar{z}_1) + s^2(\bar{z}_0)}. \qquad (5)$$

The resulting normalized standard variable is

$$z = \frac{\bar{z}_1 - \bar{z}_0 - (\mu_1 - \mu_0)}{\sqrt{s^2(\bar{z}_1) + s^2(\bar{z}_0)}} \qquad (6)$$

To test whether the difference is of systematic or random nature, a null hypothesis

$$H_0: \Delta_0 = \mu_1 - \mu_0 = 0 \qquad (7)$$

can be formulated that assumes a zero difference between the mean Cd content of the baseline monitoring and monitoring epoch 1. The corresponding test statistic is

$$z = \frac{\bar{z}_1 - \bar{z}_0 - \Delta_0}{\sqrt{s^2(\bar{z}_1) + s^2(\bar{z}_0)}} \qquad (8)$$
This statistical test value is to be compared to the quantile of the standard normal distribution u that corresponds to the desired error probability α related to the type I error in statistical testing. Depending on whether the focus is on a change in general or a directed change (increase or decrease), a quantile of the standard normal distribution is chosen. This corresponds to either the 100 − α/2 or the 100 − α confidence interval. For the numeric example, we are interested in a change in general of the mean Cd content. With a desired error probability α = 5%, the corresponding quantile of the standard
Statistical Quality Control, Fig. 2 Hypothesis test for change detection: the distributions of the difference between the mean values under H0 (Δ0 = 0) and under the alternative (Δ1 = DL, the detection limit) are shown together with their variances, the desired error probability α, and the desired maximum detection error β
normal distribution is $u_{100-\alpha/2} = 1.96$. This value is compared to the test statistic, which for the numerical example is z = 1.06. H0 is accepted if $z < u_{100-\alpha/2}$. Thus, in the sense of acceptance sampling, the quality of the product (the impact of the mining activity on the Cd content) is within the limits. The second question relates to the number of samples required to detect a change within a predefined detection limit with related error probabilities α and β. Figure 2 illustrates the case for a two-sided problem, that is, a change in any direction, positive or negative, is to be detected. The detection limit Δ1 is equal to the difference between the true mean values of the Cd content, $\mu_1 - \mu_0$. According to Fig. 2, the detection limit DL is obtained by summing up the quantiles of the confidence interval 100 − α/2 related to H0 and β related to the alternative hypothesis HA. Thus the detection limit is

$$DL = \left(u_{100-\alpha/2} + u_{\beta}\right)\sqrt{s^2(\bar{z}_1) + s^2(\bar{z}_0)}, \qquad (9)$$

where $u_{100-\alpha/2}$ is the quantile of the standard normal distribution related to a two-sided confidence level of 100 − α, and $u_{\beta}$ is the quantile of the standard normal distribution related to an error probability β (detection error).

Assuming α = 5% and β = 5% and the standard deviations of the mean value for the two epochs being equal with $s(\bar{z}) = 0.02$ mg/kg, the detection limit is DL = 0.10 mg/kg. What is now the minimum number of data n required to guarantee a certain detection limit, say DL = 0.05 mg/kg? First, we assume that the standard deviations of the mean value remain the same for the two epochs, $s^2(\bar{z}_1) = s^2(\bar{z}_0) = s^2(\bar{z})$. Solving Eq. (9) with respect to $s(\bar{z})$ results in

$$s(\bar{z}) = \frac{1}{\sqrt{2}}\,\frac{DL}{u_{100-\alpha/2} + u_{\beta}}. \qquad (10)$$

For the numerical example, a necessary standard deviation of the mean of $s(\bar{z}) = 0.01$ mg/kg is required. According to the relationship

$$s(\bar{z})^2 = \frac{1}{n} s^2(z) \qquad (11)$$

and taking into account the standard deviation of the data of s(z) = 0.14 mg/kg, the required number of samples is approximately 200.
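The numerical example can be reproduced with a few lines of Python (an editorial sketch; the values are those quoted in the text):

```python
import numpy as np
from scipy.stats import norm

# Baseline and first monitoring epoch (values from the worked example)
z0_mean, z1_mean = 0.20, 0.23          # mean Cd content, mg/kg
se0, se1 = 0.02, 0.02                  # standard deviations of the means
s_data = 0.14                          # standard deviation of individual samples

# Eq. (8): two-epoch z test under H0: mu1 - mu0 = 0
z = (z1_mean - z0_mean) / np.sqrt(se0**2 + se1**2)
u = norm.ppf(1 - 0.05 / 2)             # u_{100-alpha/2} for alpha = 5% (two-sided)
print(f"z = {z:.2f}, critical value = {u:.2f}, reject H0: {z >= u}")

# Eqs. (9)-(11): samples needed to detect DL = 0.05 mg/kg with alpha = beta = 5%
DL = 0.05
u_beta = norm.ppf(1 - 0.05)
se_required = DL / ((u + u_beta) * np.sqrt(2))
n_required = (s_data / se_required) ** 2
print(f"required s(mean) = {se_required:.4f} mg/kg, n per epoch ~ {n_required:.0f}")
```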
Application of SPC in Grade Control in Mining
A typical application of statistical quality control in the field of geosciences and geoengineering is ore quality or grade control in mineral resource extraction. Benndorf (2020) documents a closed-loop framework to implement real-time process monitoring and control for grade control in mineral resource exploitation. This is based on the plan-do-check-act (PDCA) iterative management cycle as suggested by Shewhart. It is general and applicable to surface mining and underground operations and can be interpreted as follows:

• P – Plan and predict: Based on the mineral resource and grade control model, strategic long-term mine planning, short-term scheduling, and production control decisions are made. Performance indicators such as the expected ore tonnage extracted per day, expected ore quality attributes, and process efficiency are predicted.
Statistical Quality Control, Fig. 3 The real-time mining concept. (After Benndorf and Jansen 2017)
• D – Do: The mine plan is executed.
• C – Check: Production monitoring systems continuously deliver data about process indicators using modern sensor technology. For example, the grade attributes of the extracted ore are monitored using cross-belt scanners. Differences between model-based predictions from the planning stage and actual measured sensor data are detected.
• A – Act: Differences between prediction and production monitoring are analyzed and root causes investigated. One root cause may be the uncertainty associated with the resource or grade control model used to predict the expected performance. Another root cause may be the precision and accuracy of the sensor measurements. Using innovative data assimilation methods (e.g., Wambeke et al. 2018), the differences are then used to update the resource model and mine planning assumptions, such as ore losses and dilution. With the updated resource and planning model, decisions made in the planning stage may have to be reviewed and adjusted in order to maximize process performance and meet production targets through the use of sophisticated optimization methods (e.g., Paduraru and Dimitrakopoulos 2018) (Fig. 3).

The two key requirements in this application of statistical process control to real-time grade control are the ability to update the resource/grade control model quickly and to derive new decisions quickly from the updated model. Key enablers are techniques of data assimilation for resource model updating and mine planning optimization.
Conclusions
Natural variability is a key characteristic of any spatial or temporal process or variable considered in the earth sciences. Due to the interaction of humans with the geosphere, variables may change significantly over time. The ability to monitor the impacts of anthropogenic activities with respect to environmental or economic indicators requires distinguishing between natural variability and systematic change. Techniques of SQC allow for a statistically sound judgment of this change by designing appropriate sampling strategies, not only to detect change but also to take corrective action in case some indicators drift away. The availability of online sensors and sensor networks in earth observation today allows the implementation of near real-time closed-loop concepts for transparent impact monitoring and process control. In this context, the results of SQC provide the means for objective and transparent communication of project impact.
Bibliography
Benndorf J (2020) Closed loop management in mineral resource extraction. Springer, Cham. https://doi.org/10.1007/978-3-030-40900-5
Benndorf J, Jansen JD (2017) Recent developments in closed-loop approaches for real-time mining and petroleum extraction. Math Geosci 49(3):277–306
De Gruijter J, Brus DJ, Bierkens MF, Knotters M (2006) Sampling for natural resource monitoring. Springer
Montgomery DC (2020) Introduction to statistical quality control. Wiley
Paduraru C, Dimitrakopoulos R (2018) Adaptive policies for short-term material flow optimization in a mining complex. Mining Technol 127(1):56–63
Shewhart WA (1931) Economic control of quality of manufactured product. Van Nostrand, New York
Wambeke T, Elder D, Miller A, Benndorf J, Peattie R (2018) Real-time reconciliation of a geometallurgical model based on ball mill performance measurements – a pilot study at the Tropicana gold mine. Mining Technol 127(3):115–130
Statistical Rock Physics Gabor Korvin Earth Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Kingdom of Saudi Arabia
Definition Statistical Rock Physics is a part of Rock Physics (Petrophysics) where we use concepts, methods, and computational techniques borrowed from Stochastic Geometry and Statistical Physics to describe the interior geometry of rocks; to derive their effective physical properties based on their random composition and the random arrangement of their constituents; and to simulate geologic processes that had led to the present state of the rock.
Introduction This chapter is about the novel computational tools used in Rock Physics, based on concepts, methods, and computational techniques borrowed from Stochastic Geometry and Statistical Physics. Most techniques that will be discussed belong to the toolbox of the “Digital Rock Physics” (DRP) technology of Core Analysis or the “Statistical Rock Physics” (SRP) procedure of Seismic Reservoir Characterization. My aim has been to help the reader to understand the claims and published results of these powerful new technologies, and – most importantly – to creatively apply stochastic geometry and statistical physics in their own research to other fields of rock physics. For this purpose, the theory discussed will be illustrated on diverse applications: hydraulic permeability, rock failure, NMR, contact forces in random sphere packings, compaction of shale, HC maturation. There are five sections, each with a few applications. Section “Stochastic Geometry”, introduces Gaussian Random Fields and the Kozeny-Carman equation (1.1), and then gives three applications: specific surface from 2D microscopic image, estimating tortuosity, and reconstructing 3D pore space from a 2D image (sections “ACF (Autocorrelation Function) Based Estimates of the Specific Surface”, “The Random Geometry of Tortuous Flow Paths”, and “Reconstructing 3D Pore Space from 2D Sections”). Section “Maximum Entropy Methods” is devoted to Maximum Entropy (ME). ME is explained in section “The Maximum Entropy Method”; section “Maximum Entropy (ME) Pore Shape Inversion” applies ME to pore-shape determination; section “Shale Compaction and Maximal Entropy” derives Athy’s empirical law of shale compaction from the ME principle; section “Contact Force Distribution in Granular Media” is about contact-force distribution in loose
Statistical Rock Physics
granular packs. Section “Self-Consistency and Effective Media” treats the Effective Medium and Differential Effective Medium methods. EM is introduced on the example of random resistor networks in section “Effective Medium (EM) Approximation”, and DEM is illustrated by the iterative computation of the effective elastic moduli of porous or cracked rock in section “Differential Effective Medium (DEM) Approximation”. Section “Thermodynamic Algorithms” summarizes three important thermodynamic algorithms: random walk in section “Random Walk” (applications: spin magnetic relaxation, and electric conductivity in porous rocks); Lattice Boltzmann Method in section “Lattice Boltzmann Methods for Flow in Porous Media” (application: hydraulic flow); and Simulated Annealing (already introduced in section “Reconstructing 3D Pore Space from 2D Sections” as a technique for pore space reconstruction) is applied in section “Simulated Annealing” to find the thermal conductivity distribution of soil constituents. The last part, section “Computational Phase Transitions”, is on Computational Phase Transitions: section “Percolation Theory” deals with Percolation Theory (application: permeability of kaolinite-bearing sandstones); section “Renormalization Group (RNG) Models of Rock Failure” with the Renormalization Group (RNG) approach to rock failure; section “Discrete Scale Invariance” with determining the critical point of phase transition (based on Discrete Scale Invariance); the final section “Arrhenius Law and Source-Rock Maturity” is about how Arrhenius Law of Physical Chemistry is applied to estimate the maturation of source rocks.
Stochastic Geometry Random Functions Along a Line, Random Fields, KozenyCarman Equation A random function, {f(x)}α, is a family of functions depending on a random parameter α, where the independent variable x varies along some line. A given f(x) picked at random from among all possible {f(x)}α-s is a realization. At some fixed point on the line x1, f(x1) is a random value, and it attains different values y with the probabilities Prob [f(x1) < y] ¼ F(x1, y). The following expected values (taken with respect to α, i.e., over all realizations of {f(x)}α) are often enough to characterize a random function: hf(x1)i, hf2(x1)i, hf(x1) ∙ f(x2)i, termed mean value, mean square value, and autocorrelation function. A random function is translation invariant if its statistical properties do not change with respect to a shift along the line; in particular, if hf(x)i ¼ hf(0)i; hf2(x)i ¼ hf2(0)i; hf(x1) ∙ f(x2)i ¼ hf(0) ∙ f(x2 x1)i ≔ Rff(|x1 x2|) where the function Rff is the autocorrelation function (ACF) of f(x). It is an even function, R(x) ¼ R(x). assuming its maximum at x ¼ 0, Rff(0) ¼ hf2(x)i. The normalized autocorrelation function is rff(x) ¼ Rff(x)/hf2(x)i.
Statistical Rock Physics
1457
If x(x,y) or x(x,y,z) is a point in two- resp. threedimensional Euclidean space, then {f(x, y)}α resp. {f(x, y, z)}α are called random fields over the plane, or space, a given field f(x) picked out at random is a realization of the field. A random field is homogeneous if its statistical properties are invariant with respect to a shift, that is if hf(x)i and hf2(x)i are constant, and the autocorrelation function only depends on the difference of x and y hf(x)f(y)i ≔ Rff(x, y) Rff(x y). If the statistical properties of the field are also invariant with respect to rotations and reflections, we speak about a homogeneous and isotropic random field. In such a field the autocorrelation is a function of the magnitude of x y : hf(x)f(y)i ¼ Rff(|x y|). The autocorrelation function
R(r)
¼
a2
exp
(r/r0),
where
tropic field, and r0 is the correlation distance, that is the value of r for which the autocorrelation function decreases to 1/e times its value at r ¼ 0. Most rock-physical applications of random geometry are related to the Kozeny-Carman (KC) Equation which expresses the permeability of a porous sedimentary rock as
t¼
Lhydr 1 L
F3 : S2spec
ð2Þ
There are three concepts of specific surface: Sspec ¼ surface area per unit bulk volume of the rock; S0 ¼ surface area per unit volume of solid material; Sp ¼ surface area per unit volume of pore space. For any arrangement consisting of spherical grains of the same radius r one has Sspec ¼ ð1 FÞS0 ; Sspec ¼ FSp ; Sp ¼
1F S0 : ð3Þ F
ð4aÞ
F3 ð1 FÞ2 S20
ð4bÞ
F S2p
ð4cÞ
1 : bt2
ð4dÞ
k¼C where C¼
An important geometric theorem (Debye’s theorem, see Korvin 2016) about specific surface area Sp (surface area per unit volume of pore space) states that Select in a 2D image of an isotropic porous rock two randomly placed points A and B a distance r apart, where r is small, r 1. Then the probability that A and B are in different media (i.e., one is in a pore, the other in a grain) is Sp ∙
r 2
ð5Þ
ACF (Autocorrelation Function) Based Estimates of the Specific Surface A plane section of the rock can be represented as a map of binary values Pðx, yÞ ¼
where hLhydri is the average hydraulic path length between two points a distance L. Combining bt12 to a single constant C, the KC equation is expressed as k¼C
k¼C
ð1Þ
where b is a shape factor (of the pore throats) of order one, F [0, 1] is porosity, Sspec is specific surface, the dimensionless parameter t is called hydraulic tortuosity,
F3 S2spec
k¼C
r¼
ðx1 y1 Þ2 þ ðx2 y2 Þ2 þ ðz1 z2 Þ2 , belongs to an iso-
1 1 1 k ¼ F3 2 b Sspec t2
(F is porosity) and the KC equation can be written in three forms
1 0
if if
ðx, yÞ P ðx, yÞ P
where P is the total set of pores in the microscopic image of the rock. P(x,y) is the characteristic function of the pores. Assume that P(x,y) is a translation invariant and isotropic random field. If F is the overall porosity, a randomly selected point (x,y) is with probability F in pore, with probability 1 F in the rock matrix. Taking expected values over different realizations of P(x,y), the mean and variance of P (x,y) are hPðx, yÞi ¼ F 1 þ ð1 FÞ 0 ¼ F ½Pðx, yÞ F 2 ¼ P2 ðx, yÞ 2FPðx, yÞ þ F2 ¼ F F2 ¼ Fð1 FÞ: In case of translation- and rotation invariance of P(x,y) its normalized Autocorrelation Function
S
1458
Statistical Rock Physics
rPP(x, ) only depends on r ¼ rPP ðx, Þ ¼
estimates of F and of the ACF are needed. A practical problem that arises here is the thresholding of the images, that is, distinguishing pores from grains.
x2 þ 2 :
h½Pðx, yÞ F ½Pðx þ x, y þ Þ F i ½Pðx, yÞ F 2
¼ rPP ðr Þ: Using Debye’s theorem (Eq. 5), the specific surface of the pore boundaries can be estimated from the slope of the ACF rPP(r) at r ¼ 0 as Sp ¼ 4
d r ðr Þ dr PP
r¼0
:
ð6Þ
Using the Kozeny-Carman Eq. (4) and assuming reasonable default values for shape factor b and tortuosity t, Eq. (6) can be used to predict permeability from the ACF of the microscopic image. If the characteristic function P(x,y) of the pore space has an exponential ACF with correlation distance r0, that is, if rPP ðr Þ ¼ exp rr0 , we get Sspec ¼ r4
The Random Geometry of Tortuous Flow Paths A simple scaling argument (Korvin 2016) shows that the hydraulic tortuosity in 2D sections of granular porous sedimentary rocks is an increasing function of grain size, while it decreases with porosity. Let L be the vertical size of the section considered (assuming that flow goes from top to bottom); F porosity (in fraction); t tortuosity; r0, P0, A0 characteristic size, characteristic perimeter, characteristic area of the grains in the 2D cross section; Z average number of pores adjacent to a grain (in the 2D section). We shall denote by DP/A the exponent in the celebrated Mandelbrot’s perimeter-area scaling law (Mandelbrot 1982) applied to the grains in the microscopic 2D section: p A P ¼ P0 p pr 0
0
and using this in the KC equation gives k ¼ const F3 r 20 : The exponential ACF rðr Þ ¼ exp
ð7Þ
rr0
rðr Þ ¼ exp
r r0
n
ð8Þ
has been found (Ioannidis et al. 1996) to better describe the measured ACF. Here r0 is the correlation distance, and n a positive real exponent. The derivative of this ACF at r ¼ 0 is d dr rðr Þ r¼0
¼ rnn r n1 0
u¼0
which, except for n ¼ 1, cannot
express the specific surface by means of Eq. (6) because for n < 1 it is divergent, while for n > 1 it is zero. To avoid this difficulty, one can use an average correlation distance 1
IS ¼
1
rðr Þdr ¼ 0
exp 0
r r0
n
dr ¼
r0 1 G n n
ð9Þ
ð11Þ
We proceed to show that the average tortuosity of a flow path from top to bottom is given by Lhydr L
works well for
ordinary black-and-white photographs, but for microscopic images of sedimentary rock a stretched exponential function
DP=A
ð1 FÞ P0 ¼t¼Fþ Z r0
p A p ro p
DP=A
ð12Þ
Note the special cases of Eq. (12). For F ¼ 1 we have t ¼ 1; for F ¼ 0 there are no pores at all, the grain/pore coordination number is Z ¼ 0 and consequently t ¼ 1 as it should be (see section “Percolation Theory”; Korvin 1992b). Along a randomly selected vertical line of length L a part FL of the line goes through pore space. In these parts the flow goes along straight line segments. The remaining (1 F)L length of the vertical line is filled by grains, and the fluid ÞL would cross ð1F grains if it could go straight. But it cannot, r0 because every time it reaches a grain it changes direction and continues in a “throat” following the curvature of the grain’s perimeter. By the definition of coordination number Z, the periphery P of a grain is adjacent to Z other grains, so that every individual “detour” adds a length PZ to the hydraulic path. This length of the detour is, by Mandelbrot’s Eq. (11), equal to
P Z
¼ PZ0
p DP=A pA . r0 p
ÞL As there are ð1F such detours, r0
the average tortuosity down to the bottom is indeed and, instead of using (Eq. 7), which was valid for the ACF Rðr Þ ¼ exp rr0 , expresses permeability as k ¼ const FB I Cs , that is, log k ¼ A þ B∙logF þ C∙ log I S
ð10Þ
To find the value of I S ¼ rn0 G 1n , one needs the correlation distance r0 and the stretching exponent n, for which precise
Lhydr L
Þ ¼ t ¼ F þ ð1F Z
P0 r0
p DP=A pA . ro p
Equation (12) com-
pares fairly well with other theoretical predictions (Matyka et al. 2008) of the dependence of tortuosity on porosity. In a published Lattice Gas (LG) model t ¼ 0.8(1 F) + 1; in a ð1FÞ percolation model t ¼ 1 þ a ðFF (a and m are fitting m cÞ parameters, Fc is the percolation threshold). The rule t ¼ Fp for Archie’s electric tortuosity, or t ≈ 1 p logF (p is
Statistical Rock Physics
1459
a fitting parameter; “log” will denote natural logarithm from now on) was obtained for cube-shaped grains and in theoretical studies on diffusive transport in porous systems composed of freely overlapping spheres, the empirical relation t ¼ 1 + p(1 F) was found for sandy or clay-silt sediments, and t ¼ [1 + p(1 F)]2 was reported for marine muds. Reconstructing 3D Pore Space from 2D Sections In this technique, statistical information, obtained from binary images of thin-sections, is used to create a 3D representation of the porous structure. The resulting 3D model honors the statistics of the thin sections, and is used to determine the topological attributes and effective physical properties of the rock. A powerful reconstruction algorithm of this kind employs an optimization technique imitating a thermodynamic process (Simulated Annealing, Yeong and Torquato 1998; see section “Simulated Annealing”). The statistical properties needed are the first two moments ! ! of the characteristic function w r (which is unity if r lies in the pore space, and zero otherwise): the porosity, F ¼ ! w r , and the normalized autocorrelation function, h½wð!r ÞF ∙½wð!r þ!u ÞF i ! . The ACF is fitted by the r u ¼ Fð1FÞ !
model (u) ¼ exp [(u/r0)n], where u ¼ u , r0 is a characteristic length scale (correlation length), and n is an adjustable parameter. In each iteration step i we compute the “energy” function Ei ðrÞ ¼
ri ðuk Þ rref ðui Þ
2
coordination number Zc, defined as the number of throats connected to the given pore. The connectivity of a volume V of the rock is measured by its specific genus GV, GV ¼ G/ V ¼ (b n + 1)/V where b and n are the number of branches and nodes in the pore network (see Korvin 1992b: 302–303). The parameter G ¼ b n + 1 is the maximal number of independent paths between two points in pore space, that is, the maximum number of independent cuts that can be made in the branches of the network without disconnecting it into separate networks. (Only pores with Z 3 are considered “nodes” in the topological context.) For large networks, the genus per node G0 is related to average coordination number by G0 ¼ (hZ3i/2) 1).
Maximum Entropy Methods The Maximum Entropy Method Suppose a measurable rock property l can assume values belonging to L distinct ranges Λ1, . . ., ΛL. If we measure l on a large number N of samples, we will find N1 values in range Λ1, . . ., NL values in range ΛL. Letting ¼
L i¼1
N i , pi ¼ NNi , the
set of numbers L
fp1 , p2 , , pL g, pi 0,
pi ¼ 1
ð13Þ
i¼1
which is to be min-
k
imized, where {ui} ¼ {0, Δu, 2Δu, } is the digitized argument of the ACF, ri(u) is the actually observed ACF in this iteration step, rref(u) is the ACF of the field to be reconstructed. Starting from a random configuration with porosity F, two voxels of different phases are exchanged at each iteration step. Thus F remains constant. The new configuration is accepted with the probability P given by the Metropolis rule P¼
exp½ðEi1 Ei Þ=T
1 if Ei Ei1 if Ei >Ei1
where T plays the role of temperature. In case of rejection, the old configuration is restored. By decreasing T, fields fi with minimal energy Ei(r) will be generated, that is their correlation properties will be similar to that of the reference field fref. A 3D pore space can be transformed to a network of nodes (pore bodies) connected by bonds (pore channels, throats). In the most commonly used method one applies a 3D morphological thinning algorithm to extract the pores’ medial axes for the skeleton of the pore space. The network model is useful to determine the rock’s local and overall connectivity properties. Local connectivity is expressed by the pore’s
constitute a discrete probability distribution. It can represent different degrees of randomness: the distribution p1 ¼ 1, p2 ¼ p3 ¼ pL ¼ 0 is not random; the distribution p1 ¼ 12 , p2 ¼ 12 , p3 ¼ pL ¼ 0 is not too random, the distribution where all lithologies are equally possible, p1 ¼ p2 ¼ p3¼ ¼ pL ¼ L1 is the most random. To characterize quantitatively the “randomness” of the distribution {p1, p2, , pL}, count how many ways one can classify the N samples such that N1¼ p1N belongs to Λ1, N2¼ p2N to Λ2, . . ., NL ¼ pLN to ΛL. The number of such classifications is given by P¼
N! N 1 !N 2 ! N L !
ð14Þ
The larger is P, the more random the distribution. Instead of P it is easier to estimate log P (“log” always means natural logarithm in this chapter), logP ¼ log N! log N 1 ! log N L If n>>1 we have the approximate Stirling’s formula
ð15Þ
S
1460
Statistical Rock Physics
logðn!Þ ¼ logð1∙2∙3 nÞ ¼ log 1 þ log 2 þ log 3 þ þ log n
n
logxdx ¼ nlogn n~nlogn
a, c
(16) a, c
1
Using this approximation in Eq. (15): L
L
N i log N i ¼
logP N log N i¼1 L
¼ N i¼1
N i log i¼1
Ni n
Ni N log i N n
L
pi log pi ¼ N∙Sðp1 , p2 , , pL Þ
¼ N
ð17Þ
i¼1
where Sðp1 , p2 , , pL Þ ¼
L
pi log pi
is the Shannon
i¼1
entropy of the probability distribution (p1, p2, , pL). In Rock Physics we frequently have to solve an overdetermined system of equations ð1Þ
F1 ðx1 , x2 , , xL Þ
¼
ymeasured
⋮ FM ðx1 , x2 , , xL Þ
⋮ ¼
ðMÞ ymeasured
⋮
ðL M Þ
In the Maximum Entropy (ME) Technique we accept that particular solution of this system whose Shannon entropy is maximal. Maximum Entropy (ME) Pore Shape Inversion In an ME-based rock-physical inversion Doyen (1987) determined the pore-shape distribution in igneous rocks from hydraulic conductivity and/or dc electric conductivity measurements performed at a series of confining pressures. His rock model consisted of a random distribution of spherical pores connected with each other by tubes (“throats”). (This model is discussed in Korvin et al. 2014.) At a given reference pressure P ¼ P0 the pores’ radii r are distributed according to the probability density function (pdf) n (r); the throat-length l is distributed according to the pdf n(l); the throats have elliptic cross-section with semi-axes b and c following the bivariate pdf n(α, c) where α is the aspect ratio α ¼ b/c. Using Elasticity Theory (Zimmerman 1991) to compute the deformation of voids under pressure P, the pdf’s for r(P), c(P), l(P), α (P) can be determined for any pressure step from their values at reference pressure. The pdf n(α,c) satisfies the selfconsistency equation (cf. Kirkpatrick 1973; see section “Effective Medium (EM) Approximation”) for all pressure steps P:
ge ðPÞ ge ðP, a, c, lÞ ge ðP, a, c, lÞ þ Z2 1 ge ðPÞ
ð18aÞ
gh ðPÞ gh ðP, a, c, lÞ ¼0 gh ðP, a, c, lÞ þ Z2 1 gh ðPÞ
ð18bÞ
nða, cÞ
nða, cÞ
where Z is the average pore coordination number (number of throats incident to a pore); ge ðPÞ and gh ðPÞ are the measured bulk electric and hydraulic conductivities at pressure P, and ge ¼ ge(P, α, c, l) and gh ¼ gh(P, α, c, l) are the theoretical conductivities of an originally (α,c)-shaped throat of length l subjected to pressure P. Elasticity Theory also provides a theoretical value βp(r, F) for the compressibility of a rock at pressure P if it contains a volume-fraction F of r-sized pores; and an expression for its expected value βP ({n(r)},F). Similarly, for a volume fraction F of throats of length l and cross-sectional shape (α,c) both βP(α, c, l, F) and its expected value can be theoretically determined. Subjecting the rock to pressure P, the pore parameters change as r ! r(P), l ! l(P), c ! (P), α ! α(P), the measured bulk compressibility becomes β(P). Assuming that the compressibilities due to pores and throats are independent and additive, one gets bp fnðr Þg, Fp , P þ bl ðfnða, cÞg, Fl , l, PÞ ¼ bðPÞ
ð19Þ
Let V(r) be the volume of a pore of radius r, and V(α,c,l) the volume of an (α,c)-shaped tube of length l. Then n(r) and n(α,c) satisfy the additional constraints: Sr nðr Þ ¼ 1, Sa,c nða, cÞ ¼ 1, N p Sr nðr ÞV ðr Þ þ N t Sa,c nða:cÞV ða:c, lÞ ¼ Fp þ Ft ¼ F where Np and N t ¼ N p Z2 are the numbers of pores , respectively throats, in a unit volume of rock. By simple geometry, Np, Nt, Fp, Ft, Z, l can be expressed in terms of n(r),n(α, c), F. As there are much more possible values of n(r) and n(a,c) than pressure steps P, we assume that n(r) and n(α,c) are Maximum Entropy solutions, that is, Sr nðr Þ log nðr Þ ¼ max; Sr nða, cÞ log nða, cÞ ¼ max
nðr Þ log nðrÞ ¼ max; r
a, c
nða, cÞ log nða, cÞ ¼ max
ð20Þ subject to the conditions
Statistical Rock Physics
a, c
nða, cÞ
a, c
1461
ge ðPÞ ge ðP, a, c, lÞ ¼0 ge ðP, a, c, lÞ þ Z2 1 ge ðPÞ
ð20aÞ
gh ðPÞ gh ðP, a, c, lÞ gh ðP, a, c, lÞ þ Z2 1 gh ðPÞ
ð20bÞ
nða, cÞ
bp fnðr Þg, Fp , P þ bl ðfnða, cÞg, Fl , l, PÞ ¼ bðPÞ
ð20cÞ
Sr nðr Þ ¼ 1, Sa,c nða, cÞ ¼ 1, N p Sr nðr ÞV ðr Þ þ N t Sa,c nða:cÞV ða:c, lÞ ¼ Fp þ Ft ¼F
ð20dÞ
The constraining Equations (20a–d) have the common form r, a, c
nðr:a, cÞAðr, a, c, PÞ ¼ 0 for P ¼ ¼ P1 , P2 , , PN
where (r, α, c), satisfying nðr, a, cÞ 0;
r, a, c
ð21Þ
nðr, a, cÞ ¼ 1 is
the joint pdf of {r, α, c } at the reference pressure. Its Shannon entropy is S½nðr, a, cÞ ¼
r , a, c
nðr, a, cÞ log nðr, a, cÞ
ð22Þ
Using the Lagrange multiplier technique, we first find the vector which minimizes the function N
f ðlP1 , , lPN Þ ¼ log
r , a, c
exp
lPk Aðr, a, c, Pk Þ k¼1
ð23Þ and then find the ME solution (called “pore spectra”) as exp
N
lPk Aðr, a, c, Pk Þ
k¼1 N
nðr, a, cÞ ¼ r , a, c
exp
ð24Þ lPk Aðr, a, c, Pk Þ
k¼1
Shale Compaction and Maximal Entropy By Athy’s law (Athy 1930) in thick pure shale porosity decreases with depth as FðzÞ ¼ F0 expðkzÞ
ð25Þ
where F(z) is porosity at depth z, F0 porosity at the surface, and k a constant. Assuming all pores have the same volume, the porosity of a rock is proportional to the number of pores in
a unit volume of the rock. Athy’s rule states in this case that the pores in compacted shales are distributed in such a manner that their number in a unit volume of rock exponentially decreases with depth. There are several analogies of this rule in Statistical Physics. The most familiar is the barometric equation of Boltzmann for the density r(z) of the air at altitude z as nrðzÞ ¼ rð0Þ∙ exp mgz kT , where m is the mass of a single gas molecule, g gravity acceleration, k Boltzmann’s constant, and T absolute temperature. In Statistical Physics (Landau and Lifshitz 1980: 106–114) Boltzmann’s equation is derived from the assumptions that the gas particles move independently of each other and the system tends toward its most probable (maximum entropy) state. During compaction of shale, water is expulsed and clay particles rearrange themselves toward a more dense system of packing. Korvin (1984) adopted Litwiniszyn’s model (1974) and considered shale compaction history as an upward migration of pores. Take a rectangular prism P of the present-day shale of unit cross-section reaching down to the basement at depth Z0, and suppose its mean porosity is F, that is, it contains a fractional volume FZ0 of fluid and a volume (1 F)Z0 of solid clay particles. Assume that the compaction process is ergodic, that is, it tends toward the maximumentropy final state. Neglecting the actual depositional history we assume for time t ¼ 0 an initial condition where a prism of water of unit cross-section, height FZ0 and density r1 had been overlain by solid clay of height (1 F)Z0 and density r2, r1 < r2. Divide the prism of water into N “particles” (water-filled pores), each of volume ΔV, which at t ¼ 0 started to migrate upwards independently of each other, until the final (maximum entropy) state had been reached. The initial potential energy of the system had been E ¼ N∙V∙gðr2 r1 Þð1 FÞZ 0,
ð26Þ
where initial porosity and particle number are connected by 0F N ¼ ZDV . Divide the prism P into N equal slabs of thickness Δz ¼ Z0/Δz, denote the ith slab by γi (i ¼ 0, 1, , N 1), and divide the prism P into N ¼ Z0/ΔV nonoverlapping small cubes. We have N N if F is sufficiently small. We rank the N* possible positions, called “states,” of a pore into N groups: a pore is said to belong to the group γi if and only if its center (x, y, z) lies within the slab γi. This implies every group γi contains G ¼ Δz/ΔV states. Suppose that Ni pores are found in state γi. The numbers Ni satisfy two constraints, the conservation of pore-particle number, and the conservation of total potential energy: N1
Ni ¼ N i¼0
ð27aÞ
S
1462
Statistical Rock Physics N1
ei N i ¼ E
ð27bÞ
i¼0
where E is the total energy (see Eq. 26); εi is the potential energy of a single pore particle in group γi, due to buoyancy: ei ¼ g ðr2 r1 Þ∙DV∙i∙Dz
ð28Þ
The set of numbers {Ni} determine the macroscopic distribution of pores inside the prism P. Apart from a constant factor, the entropy of the distribution is N1
S¼
N i log i¼1
eG Ni
are acting on this body’s boundary, contact forces arise between neighboring grains and the average stress of the assembly, sij , depends on the contact forces f ci between neighbor grains and on the vectors lci pointing from the center of the first grain to the center of the second,
ð29Þ
sij ¼
N1
ni log i¼1
e ni
ð30Þ
and the constraints (27a, b) become N1
N1
ni ¼ N, G∙
G∙ i¼0
ni ei ¼ E
ð31Þ
i¼0
The pore particles will migrate to a position such that the entropy (Eq. 29) be maximal. To maximize the entropy subject to the constraints (31), we introduce Lagrange multipliers α, β assume that @ N E @ni S þ a G þ b G ¼ 0 ði¼ 0, 1, , N 1Þ, that is F ni ¼ expða þ bei Þ, wherefrom exp a ¼ 1f , b ¼ E1 and FðzÞ ¼
F z ∙ exp 1f ð1 fÞZ 0
ð32Þ
Identifying the first factor in Eq. (32) with surface porosity F0, the equation becomes FðzÞ ¼ F0 ∙ exp
ð1 þ F0 Þz Z0
ð33Þ
an equation that reproduces Athy’s compaction law. Contact Force Distribution in Granular Media K. Bagi (2003) determined the probability density functions of contact force components in the principal directions of average stress in random granular assemblies by maximizing the statistical entropy, and proved that the distribution of force magnitudes is exponential. She considered a body consisting of randomly packed, cohesionless grains. If external stresses
M
ð34Þ
lci f cj c¼1
(M is the total number of contacts; the index “c” runs over all contacts; i, j ¼ 1, 2, 3 indicate vector components; V is the total volume). If the coordinate axes (x1, x2,x3) are parallel to the principal directions of sij , then sij ¼ 0 if i 6¼ j. Consider Eq. (34) first for i ¼ j ¼ 1:
Denote the average number of pore particles in group γi by ni , then ni ¼ N i =G, S¼G
1 V
s11 ¼
1 V
M
lc1 f cj :
ð35Þ
c¼1
Since when two grains are in contact, any of them can be “first” or “second,” we can assume that 0 f c1 f max .We discretize this domain into equal (small) intervals: a particular f c1 can fall into the kth interval where the medium value is F1k ð1Þ with probability pk , k ¼ 1, 2, , or into the second interval where the medium value is F12, etc. The distribution of the f1ð1Þ ð1Þ ð1Þ component is described by p1 , p2 , , pk , ; representing the probabilities that f c1 falls into the 1st, 2nd, . . . kth,. . . interval. Next consider all those grain contacts where the f c1 contact force component falls into a chosen kth interval (that is f c1 F1k ). Take the average of the lc1 grain-center-to-graincenter vector components for these contacts, and denote it as ð1Þ l1k where the superscript “(1)” expresses that the grouping was done according to the f1-components. With these notations, Eq. (35) is averaged to s11 ¼
1 V
1
ð1Þ
M∙pk
ð1Þ
ð36Þ
ðjÞ
ð37Þ
lk Fk1k :
k¼1
and we similarly get all nine equations sij ¼
1 V
1
ðjÞ
M∙pk
lik Fkjk
k¼1
Suppose the geometrical data and average stress are given and we are searching for the distribution of the ith contact force component {fi}. The unknown {p1, p2, , pk, } values satisfy 1
pk ¼ 1 k¼1
and (for a fixed i ¼ j)
ð38Þ
Statistical Rock Physics
1463 1
M V
sij ¼
ðjÞ pl F k¼1 k ik jk:
ð39Þ ðjÞ
We need a geometric assumption on the lik values. Suppose that they are constant: ðjÞ
ð jÞ
ðjÞ
ðjÞ
li1 ¼ li2 ¼ li3 ¼ ≔li
ð40Þ
The average stress then can be expressed (for fixed i ¼ j) as sij ¼
1
M l V i
pF : k¼1 k jk
ð41Þ
We are looking for that distribution which has the highest entropy:
1
p k¼1 k
log pk ¼ max,
ð42Þ
pk ¼ 1
ð43Þ
subject to the conditions 1 k¼1
sij ¼
M V
1
ðjÞ pl F k¼1 k ik jk
ðfor a fixed i ¼ jÞ:
ð44Þ
Introduce the Lagrange-multipliers α and β associated with the constraints (Eqs. 43 and 44). Easy algebra yields: pk ¼
ð jÞ 1 ∙ exp bli Fjk expð1 þ aÞ
ð45Þ
where α and β can be (numerically) determined from the system of nonlinear equations (using the given data sii , li , V and M): 1
ðjÞ
Fjk exp 1 bli Fjk a k¼1 1
V ð jÞ
Mli
sij ¼ 0 ði¼ jÞ ð46Þ
ðjÞ
exp 1 bli Fjk a 1 ¼ 0
ð47Þ
k¼1
Equation (45) means that the distribution of contact force components is exponential, provided that Eq. (40) is an acceptable simplification.
Self-Consistency and Effective Media Effective Medium (EM) Approximation A good introduction (Kirkpatrick 1973) to classical Effective Medium Theory is the study of random resistors on the bounds
of a lattice. Apply an external field along one axis so that the voltages increase by a constant amount per row of nodes. The average effect of the random resistors can be described by an effective medium, that is, a hypothetical homogeneous medium in which the total field inside is equal to the external field. Consider an effective field which is made up of equal conductances, gm, connecting nearest neighbors on a cubic mesh. To find gm, we require that the local perturbations in this medium due to the voltages, induced when individual conductances gij replace gm, should average to zero. Consider a conductance gAB between randomly picked neighboring nodes A, B in the direction of the external field. Using Kirchoff’s laws it can be shown that if we change gm to gAB, the arising voltage, Vo, induced between A and B will be V 0 ¼ V m ðgm gAB Þ= gAB þ Z2 1 gm , where Vm is the constant increase in voltages per row, and Z is the coordination number. If the conductivities gij are distributed according to the pdf f(g), the requirement that the average of Vo should vanish yields an equation for gm: dgf ðgÞðgm gÞ= g þ Z2 1 gm . (A special case of this equation was applied in section “Maximum Entropy (ME) Pore Shape Inversion”, as Eq. 18a, 18b.) Another simple case of the EM approximation (Stroud 1998) concerns the random mixture of two types of grains, denoted by 1 and 2, which are present in relative volume fractions p and 1 p, and characterized by conductivities s1 and s2. To calculate the effective conductivity se of this composite, imagine that each grain, instead of being embedded in its actual random environment, lies in an homogeneous effective medium, whose conductivity se will be determined self-consistently. If the grains are spherical, the electric field e inside them is given in Electrostatics as Ei ¼ E0 si 3s þ2se (i ¼ 1 or i ¼ 2), where E0 is the electric field far from the grain. Selfconsistency requires that the average electric field within the grain should equal E0, that is 3se e pE0 s13s þ2se þ ð1 pÞE0 s2 þ2se ¼ E0 leading to the quadratic equation se se p ss11þ2s þ ð1 pÞ ss22þ2s ¼ 0. The equation has two solue e tions for se, one physically sensible, the other unphysical. The EM approximation is also used in Percolation Theory (see section “Percolation Theory”) where transport properties (such as conductivity or diffusivity) grow as a power-function above, but near to, the percolation threshold, pc. For p > pc the electrical conductivity s of a random network grows as s / (p pc)t, p > pc where p is the occupation probability for a site or bond, and the scaling exponent t is equal to 2 in 3D. In the Effective Medium (EM) approach (Ghanbarian et al. 2014) a randomly selected part of the heterogeneous medium is replaced by the homogeneous material. Replacing the actual transport coefficient (e.g., conductivity) by an effective one perturbs the local current. The effective conductivity is determined by requiring that the average of these
S
1464
Statistical Rock Physics
perturbations be zero. For the lattice percolation model of electric conductivity in rocks this gives (for two components, 1p see Kirkpatrick 1973, Stroud 1998) sðpÞ=sb ¼ 1 1 2= Z
where sb is the electrical conductivity of the brine, grains are nonconducting, and Z is the coordination number. In n-dimensional lattices the coordination number Z and pc are approximately related as Zpc ¼ d/(d 1) so in 2D one has 2/ c Z ≈ pc, and ssðpb Þ pp 1p . c
Differential Effective Medium (DEM) Approximation The elastic properties of porous or cracked rocks are commonly modeled by the DEM (Differential Effective Medium) approach (Mavko et al. 1998; Almqvist et al. 2011; Haidar et al. 2018). In DEM a composite material is constructed stepwise by making infinitesimal changes to an already existing composite. For example, a two-phase porous or cracked rock can be built up by incrementally adding small fractions of pores or cracks to the rock matrix. As we insert new inclusions into the background model, it will continuously change as: ð1 F Þ½ K ðF Þ ¼ ð1 FÞ½m ðFÞ ¼
ðK 2 K ÞPð2Þ ðFÞ ðm2 m ÞQð2Þ ðFÞ
(see Mavko et al. 1998) where F is porosity; K(F) is the effective bulk moduli of DEM; K* is bulk modulus of the matrix (phase 1); K2 bulk modulus of the inclusion (phase 2); P(2) is geometry factor for an inclusion of material 2 in a background medium with effective moduli K* and m; m(F) is the effective shear moduli of DEM; m shear modulus of the matrix (phase 1); m2 shear modulus of the inclusion (phase 2); Q(2) geometry factor for an inclusion of material 2 in a background medium with effective moduli K and m. The DEM equations (for K and m) are coupled, as both depend on both the bulk and shear moduli of the composite through the geometry factors. The equations are numerically integrated starting from porosity F ¼ 0 with initial values K*(0) ¼ Km and m(0) ¼ mm which are the mineral values for the single homogeneous solid constituent. Integration then proceeds from y ¼ 0 to the desired highest value y ¼ Fmax as the inclusions are slowly introduced into the solid. The method needs a background rock matrix, the geometry factors, the elastic moduli of the inclusions, and the fraction of inclusions as input. The value of the aspect ratios are also needed in the geometry factors. There are three ranges of aspect ratio values, for the interparticle pores, the stiff pores, and the cracks. There is a control step where one calculates the reference velocity V ref P using the DEM equations. (It depends on the aspect ratio of the interparticle pores, the fraction of inclusions, the elastic moduli of the rock matrix and of the inclusions.) The reference value V ref P serves to decide whether stiff pores or cracks are to be added to the rock matrix. If
meas meas , then stiff pores are added, and if V ref , V ref P < VP P > VP then cracks are added to the matrix. When the process is complete, the effective elastic moduli can be calculated. In an interesting application (Almqvist et al. 2011) of DEM to compute the elastic constants for calcite and muscovite mixtures the background medium was anisotropic. For anisotropic elastic media DEM is computed as
dC DEM 1 ¼ C CDEM Ai 1V i dV where CDEM is the elastic tensor of the effective media, V is the volume fraction of the inclusions, Ci is the elastic tensor of the inclusion, and Ai, which relates the strain inside the inclusion with that of the background matrix, is defined as Ai ¼ I þ G Ci CDEM
1
Here I is the fourth-rank unit tensor, G is a symmetric tensor Green’s function, Ci is the elastic modulus of the inclusion, the tensor CDEM is updated in each inclusion step. Inclusions are ellipsoids with given aspect ratio (a b c).
Thermodynamic Algorithms Random Walk Gaussian random walk is a mathematical model of the Brownian motion of small particles in liquids. For this motion the change of the position X(t) of the randomly moving particle ΔX(t) ¼ X(t + Δt) X(t) obeys a Gaussian distribution with mean hΔX(t)i ¼ 0 and variance h[ΔX(t)]2i1/2 / Δt. Two characteristic rock-physical applications of random walk modeling are a study of nuclear magnetic relaxation (NMR) in brine-filled porous rocks (Jin et al. 2009), and randomwalk-based determination of the electric conductivity in rocks (Ioannidis et al. 1997). The random-walk method of simulating the Nuclear Magnetic Resonance (NMR) response of fluids in porous rocks can be applied on grain-based and voxel-based representations of the rock. In the grain-based approach a spherical grain pack is the input, where the solid surface is analytically defined; in the voxel-based approach the input is a computer-generated 3D image of the reconstructed porous media. The implementation of random walk is the same in both cases, but it has been found that spin magnetization decays much faster in the digitized models than in case of analytically given surfaces because of overestimating the irregular pore surface areas. For fluids in porous media, total magnetization originates from diffusion and relaxation processes. Three processes are involved in the relaxation of spin magnetization (only
Statistical Rock Physics
1465
transverse relaxation is considered): (1) bulk fluid relaxation, with characteristic time T2B due to dipole–dipole interactions between spins within the fluid; (2) surface relaxation, characterized by effective relaxation time T2S, due to the additional interaction of protons at the pore-grain interface with paramagnetic impurities in the grains, and the motion of water molecules in a layer near the pore-grain interface; and (3) relaxation due to background magnetic field heterogeneities, T2D. The effect of the pore surface on the relaxation time T2S is proportional to specific surface area: T12S ¼ r VS where r is surface relaxivity, DB diffusion coefficient of the bulk fluid, S is pore surface area, and V pore volume. The decay rate T2 of the magnetization is given by T12 ¼ T12B þ T12S. Far away from a boundary, T2 is equal to the bulk fluid transversal relaxation time, T2B. For spins close to a pore boundary, the magnetization decay is locally enhanced by surface relaxation. Most random-walk simulations of NMR assume that the spin is “killed” with a probability p when it reaches a boundary, and relate this probability to surface relaxivity r as p ¼ 0:96A rDr DB where Δr is the displacement the spin achieves during the interval [t, t + Δt] when it reaches the boundary, and A is an empirical factor of order 1 (usually set to 2/3). In the discussed more realistic model (Jin et al. 2009) of local surface relaxation the spin is not “killed,” but its magnetization decreases by a factor (1 p) ≈ exp [Δt(p/Δt)]. Therefore, the total relaxation rate becomes 1/T2 ¼ 1/T2B + p/Δt. Equivalently, the relaxation time used in each step of the random walk can be written 1 1 3:84r ¼ þw T 2 T 2B Dr
ð48Þ
where in the grain-based model, w ¼ 1 when the walker is located within a step of the relaxing boundary, and 0 otherwise. The starting points of walkers are randomly picked in the pore-space occupied by fluids (water and/or oil). Each walker is allowed to walk for a sufficiently long time, whenever it attempts to step out of the pore space into the matrix, its magnetization decays with a rate determined by Eq. (48). At ! each time-step Δt, a random vector n ¼ nx , ny , nz is generated, whose independent components are normally distributed with unit variance and zero mean. The walker is then ! ! spatially displaced by D r ¼ ð6DB DtÞ !nn . The time step Δt k k is dynamically changed along the walk so that it is smaller than any magnetic-field pulse duration, and the amplitude of ! the total displacement, D r , is much shorter than the relevant geometric length scales (e.g., pore throats, wetting film thickness, etc.). It was assumed that the pores are fully brinesaturated, the pulse sequence was of the Carr–Purcell– Meiboom–Gill (CPMG) type with an inter-echo spacing of 1 ms, the parameters were T2B ¼ 2800ms, DB ¼ 2mm2/ms,
r ¼ 0.005. One analytically treatable case, a cubic packing of spheres, a random packing of monodisperse and polydisperse spheres, and a digitized micro X-ray-CT image were used as test cases. The theoretical case was a single spherical pore of radius R where the magnetization decay in the fast diffusion limit (rR/DB < 1) is given by MMð0tÞ ¼ exp t T12B þ r VS , M0 being the initial transverse magnetization (equal to 1 in this study). Simulations were performed for the case DrRB ¼ 0:02. Grain-pack based simulations were made for the cubic packing of spheres, and for random packing of monodisperse and polydisperse spheres. The voxel-based approach was run on the micro X-ray-CT image of a Fontainebleau sandstone sample with 21.5% porosity, and the image consisted of 512 512 512 voxels, with 5.68 mm voxel size. In another random-walk model (Ioannidis et al. 1997), in an attempt to find the formation factor F in Archie’s equation, instead of solving Laplace’s equation these authors exploited the formal analogy between random walks and the solutions of the Laplace equation (Kim and Torquato 1990). They calculated the mean square displacement a function of time for a large number of random walkers. A single walker starts at a random void voxel (i0. j0, k0). It can take random steps to any neighboring void, without penetrating solid grains. The 2
Þ time required for each step is, by Einstein’s rule tvv ¼ ð@D s0 , where s0 is conductivity of the pore fluid (set to 1), @D is distance traveled p in one step p in a 3D lattice of spacing δ, so that it can be d, d 2 or d 3. To avoid premature disappearance of walkers, periodic boundary conditions are assumed. The random walker travels a large number of steps (~106) and there are many (~103) independent walkers. The average displacement after time t will be R2 ¼ Fsa t where s(¼s0/F) is the effective electrical conductivity, and Fa is the fraction of voxels accessible for the walker. For such rocks (carbonates, shales) where solid-void, void-solid, and solidsolid transitions are also allowed, Einstein’s formula changes Þ2 ð@DÞ2 to tss ¼ ð@D ¼ F (Fs is the formation factor of the s ss s0 matrix). Steps between solid and void are only allowed with a 1 given probability. In the model they used pvs ¼ 1þF and s
psv ¼
Fs 1þFs
. The time to cross an interface is t ¼
ð@DÞ2 2 t t F , and tsv ¼ ss þ t, tvs ¼ vv þ t. ∙ 1 þ Fs 4s0 s 4 4 Lattice Boltzmann Methods for Flow in Porous Media The Lattice Boltzmann Method (LBM) is a numerical tool for simulating complex fluid flows or other transport processes (Ghassemi and Pak 2010), a finite-difference method to solve the Boltzmann transport equation on a lattice. It is a simplified molecular dynamics model where space, time, and particle-velocities assume discrete values, each lattice node is connected to its neighbors, there can be either 0 or 1 particles at a lattice node, particles move with a given velocity. In
S
1466
Statistical Rock Physics
discrete time instants, each particle moves to a neighboring node, and this process is called streaming. When more than one particle arrive at the same node from different directions, they collide and change their velocities according to collision rules conserving particle number (mass), momentum, and energy before and after the collision. The LBM originated from Boltzmann’s kinetic theory of gases, where the gas consists of a large number of particles in random motion. Transport in this model is described by the ! ! equation @f @t þ u ∙∇f ¼ O where f x , t is the particle distribution function, ! u is particle velocity, Ω the collision operator. As compared to Boltzmann’s gas dynamics the number of particles is much reduced, and they are confined to the nodes of a lattice. The different numerical realizations of LBM are classified by the DnQm scheme, where “Dn” stands for “n dimensions,” “Qm” stands for “m speeds.” Here we only discuss D2Q9, the 2D model where a particle moves (“streams”) in nine possible directions, including the possibility to stay at rest. These velocities are referred to as microscopic velocities and are denoted by ei, i ¼ 0,. . .,8. In the D2Q9 model the nine velocity vectors are: i 0 1 2 3 4 5 6 7 8 ! ei (0,0) (1,0) (0,1) (1,0) (0,-1) (1.1) (1,1) (1,-1) (1,-1)
Each particle follows a discrete distribution of probabili! ties f i x , t , i ¼ 0. . .8, of streaming in any one particular direction. The macroscopic fluid density is the sum of micro8
!
scopic particle distribution functions, r x , t ¼
!
fi x, t ,
i¼0
! !
the macroscopic velocity u x , t is the average of microscopic velocities ! e weighted with the distribution functions fi, u x , t ¼ r1
i
8
! !
cf i ! ei where c ¼ Δx/Δt is the lattice speed.
i¼0
The key steps in LBM are streaming and collision, which are given by !
f i x þc ei ∙Dt, tþDt f i ð x , tÞ !
!
!
¼ !
where f eq x, t i
fi x,t t
! f eq x,t i
ðstreamingÞ
ðcollisionÞ
is the local equilibrium distribution, and t
the relaxation time toward local equilibrium. For single-phase flows, it suffices to use the simplified Bhatnagar-Gross-Krook collision rules (Bhatnagar et al. 1954) where equilibrium ! ! ! distribution is defined as f eq x , t ¼ wi r þ rsi u x , t i 2 ! !! !! ei ∙!u ! with si u ¼ wi 3 eic∙ u þ 92 32 uc∙2u and the 2 c weights wi are 4/9 for i ¼ 0; 1/9 for i ¼ 1, 2, 3, 4; and 1/36 for i ¼ 5, 6, 7, 8. The fluid’s kinematic viscosity in the D2Q9
2
ðDxÞ model is related to relaxation time by ¼ 2t1 6 ∙ Dt . Parameters are selected such that the lattice speed c be much larger than the maximum fluid velocity vmax expected to arise in the simulation. This is expressed by the “computational” Mach number, Ma ¼ vmax/c. Theoretically, it is required that Ma 1, in practice, Ma should be, at least, 0.1. ! The algorithm runs as follows: (1) Initialize r, u , f i and eq f i ; (2) Streaming step: f i ! f i in the direction of ! ei ; ! (3) Compute the macroscopic r and u from f i ; (4) Compute f eq i ; (5) Collision step: calculate the updated distribution f i ¼ f i 1t f i f eq i ; (6) Repeat steps 2–5. During streaming and collision, the boundary nodes need special treatment. In classical fluid dynamics, the interface between solids and the fluid is assumed to be non-slip, but this would slow down computations. In most studies “bounceback” conditions are used, so that any flux of fluid particles that hits a boundary simply reverses its velocity.
Simulated Annealing Simulated Annealing (SA) is an optimization algorithm devised by Kirkpatrick et al. (1983), who realized the analogy between the thermodynamics of the annealing of metals (i.e., the metallurgic technology of alternating heating and controlled cooling of a metal to reduce its defects), and the optimization problem of finding the global minimum of a multiparameter objective function. In SA a random perturbation is applied to a system’s current configuration, so that a trial configuration is obtained. Let Ec and Et be the energy levels of the current and trial configurations. If Ec Et, then the trial configuration is accepted and it becomes the current configuration. On the other hand, if Ec < Et, then the trial configuration is accepted with a probability given by P(ΔE) ¼ exp [ΔE/kBT] where ΔE ¼ Et Ec, kB is the Boltzmann constant, and T is temperature. This step helps the system to jump out from local minima. After a sufficient number of iterations at a fixed temperature, the system approaches equilibrium, where the free energy reaches its minimum value. By gradually decreasing T and repeating the simulation process (starting out every time from the equilibrium state found for the previous T value), new lower energy configurations are achieved. Two characteristic rockphysical applications (apart from pore-space reconstruction, see section “Reconstructing 3D Pore Space from 2D Sections”) are determination of fluid distribution in a porous rock (Politis et al. 2008) and finding the soil-components’ specific thermal conductivity (Stefaniuk et al. 2016). In the first, pore space is represented by a set of cubic voxels of side l. Each voxel is labeled by an integer indicating its phase. The solid phase is labeled as 0, the fluid phases as 1, 2, 3, , n if there are n different pore-fillers. The saturation Si of phase i is the volume fraction of the total pore space occupied by phase i. The distribution of the n fluid phases in the pore space is
Statistical Rock Physics
1467
determined assuming that it minimizes the total interfacial free energy n1
n
i¼0
j>i
Gs ¼
Aij sij ,
ð49Þ
where Aij is the area of the interface between phases i and j, sij the interfacial free energy per unit area between phases i, j. The interfacial free energies obey Young’s equation so that the following n(n 1)/2 equations are satisfied: s0i ¼ sij cos yij
ð50Þ
where θij is the contact angle of the ij-interface with the solid surface, i 6¼ 0, j 6¼ 0, i < j. Suppose n fluid phases are distributed in the pore space. The number of voxels belonging to each phase depends on the saturation Si. At each minimization step n voxels, each from a different fluid phase, are randomly selected, and n(n 1)/2 random swaps are performed. Each trial swap causes a change of Gs by ΔGs, ij where i 0, the swap i $ j is accepted with probability Pij ¼
exp DGs,ij =Gref nðn 1Þ=2
ð51Þ
where Gref stands instead of the kBT parameter. After a sufficient number of iterations, the system approaches equilibrium for a specific Gref value. By gradually decreasing Gref, and repeating the process, lower energy levels of Gs will be approached. The “cooling schedule” in this example was Gref ¼ lNGref, N where N is iteration number, l (0 < l < 1) a tunable parameter. In the soil-physical application the soil consists of organic matter, water in its pores, and some soil materials say clay, silt, and sand. Suppose the thermal conductivity of water sw and of the organic matter som are known, and we want to determine the probability density functions fCl(s), fSi(s), fSa(s) for the specific thermal conductivities of the clay, silt, and sand constituents, from the knowledge of the volumetric fractions FCl, FSi, FSa, Fw, Fom and from the measured overall thermal conductivity smeas of the composite soil. The pdf of the thermal conductivity of the solid phase is given by the mixture rule (Korvin 1982) f ðsÞ ¼ ½FCl ∙f Cl ðsÞ þ FSi ∙f Si ðsÞ þ FSa ∙f Sa ðsÞ =ðFCl þ FSi þ FSa Þ:
ð52Þ We create a large cubic sample consisting of N 1 voxels. Out of them, N∙ð1 FCl FSi FSa Þ randomly selected
voxels are kept for water and organic matter, we assign to them the known thermal conductivities sw ¼ 0.6Wm1K1 and som ¼ 0.25Wm1K1. The remaining N∙ð1 Fw Fom Þ voxels are randomly associated with clay, silt, or sand, for example, clay is taken with the probability FCl/ (FCl + FSi + FSa). The thermal conductivities assigned to these voxels are random values with the pdf (52). To find the effective thermal conductivity s for a realization o we numerically solve the heat flow boundary value problem. The solution yields the effective thermal conductivity hs(o)i and averaging over a sufficient number N of realizations we get the cube’s homogenized thermal conductivity as shom ¼
1 N
N
s oj
ð53Þ
j¼1
The SA step is used to find such pdfs fCl(l), fSi(l), fSa(l) for which shom of Eq. (53) gives the closest match to the measured data. Starting from some initial guess of the pdfs we make them evolve toward optimal solutions that minimize the “energy” E which, at any step, is defined as E ¼ (shom smeas)2. As the pdf changes through the steps, it modifies the energy function such that E ! E, ΔE ¼ E E. The change of the pdf is accepted with the probability PðDEÞ ¼
1 exp½DE=T
DE0 DE>0 i
where T is a fictitious temper-
ature, being decreased as T ¼ αTi 1, where 0 < α < 1 is a control parameter. (In Stefaniuk et al. 2016, they had α ¼ 0.9.)
Computational Phase Transitions Percolation Theory We explain the method through a hydraulic permeability study (Korvin 1992a; for a percolation treatment of Archie’s law of electric conductivity, see, e.g., Hunt 2004). Percolation Theory was invented by S. R. Broadbent in the 1950s who worked on the design of gas masks for use in coal mines. The masks contained porous carbon granules into which the gas could penetrate. Broadbent found that if the pores were large enough and sufficiently well connected, the gas could permeate the interior of the granules; but if the pores were too small or inadequately connected, the gas would not get beyond the granules’ surface. There was a critical porosity and pore interconnectedness, above which the mask worked well and below which it was ineffective. Thresholds of this sort are typical of percolation processes. In the bond-percolation problem we assume that a fraction 1-p (0 < p < 1) of the bonds of a regular grid are randomly cut and a fraction p are left uncut. Then there exists a critical fraction pc (called percolation threshold) such that there is no continuous connection along the bonds of the network between the opposite
S
1468
Statistical Rock Physics
Statistical Rock Physics, Table 1 Lattices with their percolation probabilities (From Korvin 1992a, b: 22) Lattice
Dimension
Coordination n
Statistical Rock Physics, Table 2 Tortuosity exponents (after Korvin 1992a, b: 29) Model of the percolation path Straight line through the correlation length x Minimum path Conductive path Self-avoiding random walk on uncut bonds Brownian motion in 3D Brownian motion on a dfdimensional fractal
faces for p < pc, and there exists a connection with probability 1 for p > pc. For the two-dimensional square lattice the percolation threshold is 0.5. In the more general case the percolation threshold depends on the dimensionality of the network, d, and on its coordination number Z (the average number of bonds connected to any node of the network), but it is independent of the detailed structure of the network. Table 1 lists coordination numbers and percolation thresholds for some common networks. In n-dimensions, the percolation thresholds and coordination numbers conform closely to the empirical rule: Zpc ¼d/(d 1). Close to the percolation threshold (p > pc) the nodes that are connected with each other by continuous paths form large clusters of average size x, called the correlation length. The correlation distance diverges for p ! pc, p > pc as x / (p pc)n, for three-dimensional networks u ¼ 0.83, independently of the coordination number. Percolation between two opposite nodes of a cluster, a distance x apart, takes place along tortuous zig-zag paths. Near the percolation threshold the length L(x) of a typical flow path will grow as a power of x: L(x) / (p pc)α for p ! pc, p > pc. As the correlation length x is the natural length scale in percolation problems, we define the tortuosity of the percolation path as: t ¼ L(x)/x / xα 1 ≔ (p pc)γ where, for different models of the percolation path the tortuosity exponents γ are compiled in Table 2. Percolation theory was applied to compute the permeability of kaolinite-bearing sandstone samples (see Korvin 1992a, b: 28–33. Kaolinite is a “discrete-particle” clay, and it is preferentially deposited in the throats of the sandstone’s pores, completely blocking them.) If the pore structure of a sedimentary rock is converted to a discrete lattice by letting pores correspond to nodes, and throats to bonds, then the continuous Darcy flow becomes a lattice percolation. For kaolinite-bearing sandstones, if a given throat is completely blocked by kaolinite the corresponding bond will be considered as “cut.” If any throat is open with probability p and blocked by kaolinite particles with probability q ¼ 1-p, then in the equivalent lattice
γ 0
Note 3D percolation
0.25 0.29
3D percolation 3D percolationconduction 3D percolation
0.58 0.83 0.83(1.5df 1)
Using the AlexanderOrbach conjecture
percolation problem a fraction q of the bonds is randomly cut. There exists a percolation threshold such that the fluid cannot flow through the sample for p < pc and percolation starts for p > pc. At the onset of percolation, the fluid particles follow zig-zag paths; the closer p is to pc, the greater will be the length L(x) of a typical path between two nodes, which are geometrically a distance x apart. Expressing the Kozeny-Carman (KC) equation in terms of the hydraulic radius as R2HYD 1 b ∙F∙ t2 ðRHYD ½mm Þ2 F t12 ∙109 ), b
k¼
,
(more
k½md ¼
precisely,
let l denote the volume fraction of kaolinite, F porosity, then the ratio of open pore space to the total F space filled by pores or clays is p ¼ Fþð1F Þl. The tortuosity tends to infinity with p ! pc, p > pc as t / (p pc)n, that is n2 1 t2 / ðp pc Þ . Define a percolation function PERC as PERC ¼
n2
∁ 0 ð p pc Þ ¼ ∁ 0 ð p pc Þ
0
if
p pc
PEX
if
p > pc
where PEX ¼ n2, the normalizing constant ∁0 is chosen such as to make PERC(1) ¼ 1, that is ∁0 ¼ 1/(1 pc)PEX. To find 2 the prefactor in the asymptotic law t12 / ðp pc Þn , we consider clean sand with l ¼ 0 kaolinite content, in which case p ¼ 1 and PERC(1) ¼ 1, that is for t0 we can choose a reasonable average tortuosity for clean sands, say t0 ¼ 4. ð1FÞl p Geometrical considerations give RHYD ¼ 13 ∙ ðFþ 1FÞð1lÞ ∙r p (where F, l 6¼ 1, r is mean grain radius). The final expression for k becomes, as function of F, r, Z, l (porosity, grain radius, coordination number, and kaolinite volume content):
k¼
R2HYD p pc ∙F∙109 ∙ 1 pc bt20
PEX
p pc 0
ð54Þ
p < pc
F with b ¼ 2, t0 ¼ 4, pc ¼ 1.5/Z; p ¼ Fþð1F Þl ; RHYD ¼ p Fþ ð 1F Þl RHYD ¼ 13 ∙ ð1FÞð1lÞ ∙r p.
Statistical Rock Physics
1469
Renormalization Group (RNG) Models of Rock Failure How can we find the critical percolation probability pc, or the critical number of cracks per unit volume that make an originally solid rock fall to pieces? Such questions of rock physics can be solved by the Renormalizations Group method of Statistical Physics (where it had been adopted from Quantum Field Theory). In the RNG approach, to study the critical behavior of a large system, we first consider its smaller subsystems, find their behavior, and then we put them together and express in terms of them the behavior of a larger-scale subsystem. Suppose that each of the nth order subsystems can be in one of the states S1, S2, , SK with probabilities ðnÞ ðnÞ ð nÞ ð nÞ K p1 , p2 , , pK i¼1 pi ¼ 1 . Similarly, any (n + 1)stðnþ1Þ
, order subsystem belongs to state Si with probability pi ðnþ1Þ ðnþ1Þ ðnÞ K p ¼ 1 . The distributions p and pi i i¼1 i are related by a system of K nonlinear equations ðnþ1Þ
ðnÞ
ðnÞ
ðnÞ
p1
¼
F1 p1 , p2 , , pK
ðnþ1Þ pK
ðnÞ ðnÞ ðnÞ ¼ FK p1 , p2 , , pK
ð55Þ
where the functions Fi depend on the physical properties of the states, and on the geometry as the (n + 1)st-order subsystems are put together from the nth order ones (“coarsegraining”). The critical probabilities are solutions of Eq. (55) in the limit n ! 1: p1
ðcritÞ
¼
F1 p1
ðcritÞ
ðcritÞ pK
¼
FK p1
ðcritÞ
, p2
ðcritÞ
ðcritÞ
, , pK
ðcritÞ ðcritÞ , p2 , , pK
ð56Þ
In the case of rock damage (Allègre et al. 1982) there are only twostates(fragile/sound)withprobabilitiesp(n),q(n);p(n) +q(n) ¼ 1 in the nth step of coarse-graining, that is Eqs. (55 and 56) become p(n + 1) ¼ F[p(n)] and, for n ! 1, pc¼ F(pc). Coarse-graining is based on the observation, that rock failure is consequence of fractures, a fracture is consequence of microfissures, and a microfissure is consequence of microcracks. This is shown in Fig. 1 where each block in the nth step can be fragile or sound with probabilities p(n) and 1 p(n). Define a cubic cell “fragile” if it contains no continuous pillar connecting any two opposite faces. Otherwise, the cell is “sound.” Summing up the probabilities of all possible fragile configurations (with multiplicities because of topological equivalence) and letting n ! 1 the critical probability satisfies the algebraic equation p ¼ p4(3p4 8p3 + 4p2 + 2), whose nontrivial solution is pc ¼ 0.896. We note that a
Statistical Rock Physics, Fig. 1 Coarse graining. (From Allègre et al. 1982)
different definition (Turcotte 1986) of “fragility” would lead to another equation p ¼ 3p8 32p7 + 88p6 96p5 + 86p4 whose acceptable solution yields quite a different (and more realistically!) pc ¼ 0.49. Discrete Scale Invariance In May, 2006, a mud volcano erupted in Java, triggered by the drilling of an oil exploration well. The pressure created by the drilling was sufficient to break an about 1800 m long connected path for the deep mud to propagate up to the surface. Can we understand, simulate, and predict such overall failures of solid rock massifs using Renormalization Group (RNG) and Phase Transition Theory? Suppose there exists a function F(t) of time (or of temperature, pressure, porosity, etc.) that can be continuously measured and which behaves as an order parameter in the sense of Landau’s theory of second order phase transitions, that is, it is identically zero on one side of the phase transition, while nonzero and increasing as a power function on the other side, changing its behavior at a critical time instant tc. How can one find such an F(t) and the critical time-instant tc? A way to solve this is to assume Discrete Scale Invariance (Saleur et al. 1996; Sornette 1998) at the critical time tc. Suppose a system undergoes phase transition (nonconductive ! conductive, predictable ! chaotic, solid ! damaged, etc.). In
S
1470
Statistical Rock Physics
Statistical Rock Physics, Table 3 Lopatin’s TT index and corresponding stages of CH generation. (From Waples 1980) TTI 15 75 160 ~500 ~1000 ~1500 >65,000
Stage Onset of oil generation Peak oil generation End of oil generation 40 oil preservation deadline 50 oil preservation deadline Wet gas preservation deadline Dry gas preservation deadline
spent in the heat-window 100 C + 10 ∙ k C T0 100 C + 10 ∙ (k + 1) C; k ¼ 0, 1, 2, . N.V. Lopatin used TTI in the 1970s to predict the thermal maturation of coal. Both for coal and HC good correlation has been established between TTI and vitrinite reflection, which is a microscopically determinable measure of maturity. The threshold values given in Table 3 connect Lopatin’s TT index to the distinct stages of CH generation.
Conclusions case of scale invariance, the order parameter F(t) satisfies F(0) ¼ 0 and F(l|t tc|)¼ m(l) F(|t tc|) for all l > 0. This is Pexider’s functional equation (Korvin 1992b: 75–76), whose solution is m(l) ¼ lα; F(|t tc|) ¼ |t tc|α, that is F behaves as a power function. In case of discrete scale invariance F(l|t tc|)¼ m(l) F(|t tc|) only holds for a finite set of the l values, l ¼ l1, l2, , ln. For the case lk ¼ lk and letting t ¼ |t tc| we have FðtÞ ¼ ta Y
log t log l
where Θ is a periodic function. Fourier expanding to first order: F(t) ¼ A + Btα + Ctα cos (o ∙ logt j) with o ¼ 2π/logl, that is at phase transition the power-function-like increase is “decorated” with logperiodic corrections. The unknown parameters of the Fourier expansion, A. B, C, tc, α, l, ’ can be found with nonlinear fitting. (An approximate value of tc might be found from RNG.) Arrhenius Law and Source-Rock Maturity In Physical Chemistry, according to the Arrhenius equation, the rate r of chemical processes exponentially grows with temperature: r ¼ A ∙ exp [Ea/RT] where A is a constant, Ea activation energy, R the universal gas constant, and T absolute temperature. In Basin Analysis (Waples 1980; Lerche 1990), Arrhenius equation is our main tool to estimate the maturation of hydrocarbon in source rocks, in the knowledge of the basin’s burial-temperature-time history. The summary effect of increasing temperature T(t) during a geological time window [t1, t2] is given by the maturation integral t ½ a =RT ðtÞ dt þ C0 . For hydrocarbon maturation, A∙ t12 expE the constant Ea/R is selected such as to satisfy the empirical finding that in the 50 C 250 C temperature range, and for typical reservoir pressures, the rate of decomposition of organic materials doubles at every 10 C temperature rise. If one selects A ¼ 1, C0 ¼ 0, and the integral is substituted by a discrete sum, it will go over to Lopatin’s Time Temperature Index(TTIÞ ¼ Dtk ∙2k where Δtk is the time (in million years) k
Apart from living organisms, rocks are the most complex objects in the world. Prediction of their effective physical properties (such as elastic moduli, hydraulic or electric conductivity) based on their mineralogical composition and internal geometry as function of geologic age, depth, pressure, and temperature is the basic task of Geophysics, Geodynamics, and Geomechanics. Because of the inherent randomness of rocks, this task requires the powerful tools developed for treating random systems of an abundant number of degrees of freedom. This chapter described some recent Rock Physical applications of Stochastic Geometry and Statistical Physics. To help readers to follow recent developments, and to apply these techniques in their own research, each technique was described in a self-contained manner, and illustrated by actual examples. We foresee that similar to the Edwards-style Statistical Physics of Granular Materials (Makse et al. 2004) Statistical Rock Physics would become an important, independent part of Condensed Matter Physics.
Cross-References ▶ Autocorrelation ▶ Computational Geoscience ▶ Entropy ▶ Flow in Porous Media ▶ Geocomputing ▶ Geologic Time Scale Estimation ▶ Geomechanics ▶ Grain Size Analysis ▶ Inversion Theory in Geoscience ▶ Maximum Entropy Method ▶ Optimization in Geosciences ▶ Partial Differential Equations ▶ Porosity ▶ Pore Structure ▶ Porous Medium ▶ Probability Density Function ▶ Rock Fracture Pattern and Modeling
Statistical Rock Physics
▶ Scaling and Scale Invariance ▶ Simulated Annealing ▶ Simulation ▶ Spatial Autocorrelation ▶ Spatial Statistics ▶ Statistical Computing ▶ Stochastic Geometry in the Geosciences Acknowledgments This report could not have been prepared without the excellent facilities provided by my “second home,” the Oriental Collection of the Hungarian Academy of Sciences’ Library. When working on the first revision, I learned the sad news about the death of Dr. Károly Posgay, geophysicist, on December 25, 2019, at the age of 95. Twenty years under his guidance, 1966–1986, changed me, a budding applied mathematician, to an Earth Scientist. I dedicate this contribution to his memory.
Bibliography Allègre CJ, Le Mouel JL, Provost A (1982) Scaling rules in rock fracture and possible implications for earthquake prediction. Nature 297:47–49 Almqvist BSG, Mainprice D, Madonna C, Burlini L, Hirt AM (2011) Application of differential effective medium, magnetic pore fabric analysis, and X-ray microtomography to calculate elastic properties of porous and anisotropic rock aggregates. J Geophys Res 116:B01204. https://doi.org/10.1029/2010JB007750 Athy LI (1930) Compaction and oil migration. Bull Am Ass Petrol Geol 14:25–35 Bagi K (2003) Statistical analysis of contact force components in random granular assemblies. Granul Matter 5:45–54 Bhatnagar P, Gross E, Krook M (1954) A model for collisional processes in gases I: small amplitude processes in charged and neutral onecomponent system. Phys Rev A 94:511–524 Doyen PE (1987) Crack geometry of igneous rocks: a maximum entropy inversion of elastic and transport properties. J Geophys Res 92(B8):8169–8181 Ghanbarian B, Hunt AG, Ewing RP, Skinner TE (2014) Universal scaling of the formation factor in porous media derived by combining percolation and effective medium theories. Geophys Res Lett 41(11):3884–3890 Ghassemi A, Pak A (2010) Pore scale study of permeability and tortuosity for flow through particulate media using lattice Boltzmann method. Int J Numer Anal Meth Geomech. https://doi.org/10.1002/ nag.932 Haidar MW, Reza W, Iksan M, Rosid MS (2018) Optimization of rock physics models by combining the differential effective medium (DEM) and adaptive Batzle-Wang methods in “R” field, East Java. Sci Bruneiana 17(2):23–33 Hunt AG (2004) Continuum percolation theory and Archie’s law. Geoph Res Lett 31:L19503 Ioannidis MA, Kwiecen MJ, Chatzis I (1996) Statistical analysis of porous microstructure as a method for estimating reservoir permeability. J Pet Sci Eng 16:251–261 Ioannidis MA, Kwiecen MJ, Chatzis I (1997) Electrical conductivity and percolation aspects of statistically homogeneous porous media. Transp Porous Media 29(1):61–83
1471 Jin G, Torres-Verdín C, Toumelin E (2009) Comparison of NMR simulations of porous media derived from analytical and voxelized representations. J Magn Reson 200:313–320 Kim IC, Torquato S (1990) Determination of the effective conductivity of heterogeneous media by Brownian motion simulation. J Appl Phys 68:3892–3903 Kirkpatrick S (1973) Percolation and conduction. Rev Mod Phys 45(4):574–588 Kirkpatrick S, Gelatt CD Jr, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680 Korvin G (1982) Axiomatic characterization of the general mixture rule. Geoexploration 19(4):267–276 Korvin G (1984) Shale compaction and statistical physics. Geophys J R Astron Soc 78(1):35–50 Korvin G (1992a) A percolation model for the permeability of kaolinitebearing sandstone. Geophys Trans 37(2–3):177–209 Korvin G (1992b) Fractal models in the earth sciences. Elsevier, Amsterdam Korvin G (2016) Permeability from microscopy: review of a dream. Arab J Sci Eng 41(6):2045–2065 Korvin G, Oleschko K, Abdulraheem A (2014) A simple geometric model of sedimentary rock to connect transfer and acoustic properties. Arab J Geosci 7(3):1127–1138 Landau LD, Lifshitz EM (1980) Statistical physics. Part1. (Vol. 5 of Course of theoretical physics). Pergamon Press, Oxford, pp 106–114 Lerche I (1990) Basin analysis. Quantitative methods. Part I. Academic, San Diego Litwiniszyn J (1974) Stochastic methods in the mechanics of granular bodies. International centre for mechanical sciences. Courses and lecture notes no 93. Springer, Wien Makse HA, Brujic J, Edwards SF (2004) Statistical mechanics of jammed matter. Wiley – VCH Verlag GmbH, Berlin Mandelbrot B (1982) The fractal geometry of nature. W.H. Freeman, New York Matyka M, Khalili A, Koza Z (2008) Tortuosity-porosity relation in porous media flow. Phys Rev E 78:026306 Mavko G, Mukerji T, Dvorkin J (1998) The rock physics handbook: tools for seismic analysis in porous media. Cambridge University Press, Cambridge Politis MG, Kainourgiakis ME, Kikkinides ES, Stubos AK (2008) Application of simulated annealing on the study of multiphase systems. In: Tan CM (ed) Simulated annealing. I-Tech Education and Publishing, Vienna, pp 207–226 Saleur H, Sammis CG, Sornette D (1996) Discrete scale invariance, complex fractal dimensions, and log-periodic fluctuations in seismicity. J Geophys Res 101(88):17661–17677 Sornette D (1998) Discrete scale invariance and complex dimensions. Phys Rep 297:239–270 Stefaniuk D, Różański A, Łydżba D (2016) Recovery of microstructure properties: random variability of soil solid thermal conductivity. Stud Geotech Mech 38(1):99–107 Stroud D (1998) The effective medium approximations: some recent developments. Superlattice Microst 23(3/4):567–573 Turcotte DL (1986) Fractals and fragmentation. J Geophys Res B91:1921–1926 Waples DW (1980) Time and temperature in petroleum formation: application of Lopatin’s method to petroleum exploration. AAPG Bull 64:916–926 Yeong CLY, Torquato S (1998) Reconstructing random media. Phys Rev E Stat Nonlinear Soft Matter Phys 57:495–506 Zimmerman RW (1991) Compressibility of sandstones. Elsevier, Amsterdam
S
1472
Statistical Seismology Jiancang Zhuang The Institute of Statistical Mathematics, Research Organization of Information and Systems, Tokyo, Japan
Synonyms Earthquake statistics; Seismometrics; Seismostatistics
Statistical Seismology
The history of statistical seismology can be roughly divided into three episodes: • Episode I: Exploration and accumulation of simple individual empirical laws • Episode II: Construction of a theoretical framework and development of time-dependent forecasting models • Episode III: Model development and forecasting and testing attempts This chapter presents an episode-wise discussion of the concepts, theories, and methods in statistical seismology according to the above timeline.
Definition The terminology statistical seismology can be considered from two different perspectives or scopes: special and general. The special scope is similar to that used in statistical physics. Statistical seismology aims to develop stochastic models that describe the patterns of seismicity, along with related statistical inferences, to understand the physical mechanisms of earthquake occurrence and produce probability forecasts of earthquakes. The general scope of statistical seismology includes the use of statistical methods related to traditional seismology, such as Bayesian methods in geophysical inversion. Synonyms of statistical seismology include “seismometrics” and “seismostatistics.” Generally, statistical seismology falls under the special scope. Knowledge of this interdisciplinary subject has been widely applied in earthquake physics, earthquake forecasting, and earthquake engineering. This chapter only considers topics under the special scope.
Historical Overview The term “statistical seismology” first appeared in a paper, titled “An Application of the Theory of Fluctuation to Problems in Statistical Seismology,” by Kishinouye and Kawasumi (1928) in the Bulletin of the Earthquake Research Institute, Tokyo Imperial University, 4, 75–83 (Fig. 1). This term was then used by Keiti Aki in a paper, titled “Some problems in statistical seismology” in Japanese on Zisin in 1956. In 1995, David Vere-Jones gave lectures on the statistical analysis of seismicity at the Graduate School of Chinese Academy of Science (now the University of Chinese Academy of Science) in Beijing. As suggested by Yaolin Shi, he named the lecture “statistical seismology” (Vere-Jones 2001). At present, this subject has become an important branch of seismology, serving as a bridge between seismic-waveform dominant traditional seismology and tectonics/geodynamics by providing theoretical and technical tools for analyzing seismicity.
Episode I: Exploration and Accumulation of Simple Individual Empirical Laws This period started at the beginning of the twentieth century. In 1982, John Milne, James Ewing, and Thomas Cray installed the first model seismometer in Japan, marking the start of modern seismology. Seismometers enable the detection of the occurrence of global earthquakes, such that we can calculate their occurrence time and hypocenter locations and compile relatively complete earthquake catalogs. The primary applications of statistics in earthquake studies during this period were simple statistical techniques, such as linear regression and point estimates, among others, scattered among individual studies on different topics. The following subsections summarize the main findings during this stage. The Gutenberg-Richter Law for the Magnitude-Frequency Relationship In 1944, Gutenberg and Richter published a formula describing the relationship between any magnitude, e.g., m, and the number of earthquakes in any given region and period (Gutenberg and Richter 1944): log 10 N ðPmÞ ¼ a bm,
or,
N ¼ 10abm ,
ð1Þ
where N(⩾m) is the number of earthquakes with a magnitude no less than m in the given region and period and b is the so-called Gutenberg-Richter b-value. In probability terms: PrfmagnitudePm j magnitudePmc g ¼
10abm ¼ 10bðmmc Þ : 10abmc
N ðPmÞ N ðPmc Þ ð2Þ
Therefore, the cumulative distribution function (c.d. f. or cdf) for the magnitude distribution is as follows:
Statistical Seismology
1473
Statistical Seismology, Fig. 1 Head portion of the first page of Kishinouye and Kawasumi’s 1928 paper
S
1474
Statistical Seismology
FðmÞ ¼ PrfmagnitudeOmg ¼ 1 PrfmagnitudePmg ¼
1 10bðmmc Þ ,
if mPmc ,
0,
otherwise,
lðtÞ ¼ K=ðt t0 þ cÞp þ
i¼1
ð3Þ
and the corresponding probability density function (p.d.f. or pdf) is as follows: f ðm Þ ¼ ¼
b 10bðmmc Þ ln 10,
if mPmc ;
0,
otherwise,
b ebðmmc Þ 0,
if mPmc ; otherwise,
ð4Þ
where β is linked with b by β ¼ b ln 10 ≈ 2.3026b. The Omori-Utsu Formula for the Aftershock Frequency The Omori-Utsu formula describes the decay of the aftershock frequency with time after the mainshock, as an inverse power law. Omori (1894) examined the aftershocks of the 1891 Ms8.0 Nobi earthquake, first attempting to use an exponential decay function to fit the data, but unsatisfactory results were obtained. Afterward, it was found that the number of aftershocks occurring each day can be described by the equation: 1
nð t Þ ¼ K ð t þ c Þ ,
ð5Þ
where t is the time from the occurrence of the mainshock, and K and c are constants. Utsu (1957) postulated that the decay of the aftershock numbers can vary, proposing the following equation: nðtÞ ¼ K ðt þ cÞp ,
NT
ð6Þ
which yields better fitting results. Equation (6) is referred to as the modified Omori formula or the Omori-Utsu formula. The Omori-Utsu formula has been used extensively to analyze, model, and forecast aftershock activity. Utsu et al. (1995) reviewed the values of p for more than 200 aftershock sequences, finding that this parameter ranges from 0.6 to 2.5, with a median of 1.1. No clear relationship between the estimates of the p-values, and the mainshock magnitudes was found. Not only do mainshocks trigger aftershocks, but large aftershocks can also trigger their own aftershocks. To model such phenomena, Utsu (1970) used the following multiple Omori-Utsu formula:
K i H ðt t i Þ , ðt ti þ ci Þpi
ð7Þ
where t0 is the occurrence time of the mainshock, ti, i ¼ 1, . . ., NT define the occurrence times of the triggering aftershocks, and H is the Heaviside function. The likelihood for the multiple Omori-Utsu formula is slightly more complicated than that for the simple Omori- Ustu formula but can be written in a similar manner. One difficulty in applying the multiple Omori-Utsu formula is to determine which earthquakes have triggered an event. The largest aftershocks often (but not always) have secondary aftershocks. We can use the techniques of residual analysis described in the section as a diagnostic tool, to understand which events have secondary aftershocks. The Ba˚th Law for the Maximum Magnitude of Aftershocks and Other Scaling Laws Another well-known empirical law in earthquake clusters is the Båth law, which asserts that the magnitude difference between the mainshock and the largest aftershock has a median of 1.2 (Båth 1965). A more general form proposed by Utsu (1961) has the following form: M0 M1 ¼ c1 M0 þ c2 ,
ð8Þ
where c1 and c2 are constants and M0 and M1 are the magnitudes of the mainshock and its largest aftershock, respectively. This law is useful for evaluating the possible loss caused by aftershocks. Many empirical scaling laws related to earthquake magnitude were established during this period, including the following: 1. Several empirical laws for the relationship between the Richter magnitude, ML Richter 1935, and the amplitude, A, of waves recorded at seismographs. For example, the Lilie empirical formula is as follows: ML ¼ log 10 A c1 þ c2 log 10 D,
ð9Þ
where Δ is the epicenter distance and c1 and c2 are constants. 2. Relationship among the shaking intensity, I0, at the epicenter, magnitude, M, and focal depth, h (e.g., Gutenberg and Richter 1942): M ¼ a0 I 0 þ a1 log h a2 , where a0, a1, and a2 are constants.
ð10Þ
Statistical Seismology
1475
3. Scaling relationship between the number, N, of aftershocks and the size, M, of the mainshock (Yamanaka and Shimazaki 1990): log 10 N ¼ c log 10 M d,
ð11Þ
where c and d are constants. 4. Seismic moment, M0, scaling with the fault area, S (Kanamori and Anderson 1975): log 10 S ¼ a0 log 10 M0 a1 ,
Basic Formulation Point process modeling is the key tool in statistical seismology for describing the occurrence of earthquakes, where some probabilistic rules are specified for the occurrence of earthquakes in time and/or space. When some other quantities, such as intensities or magnitudes, are attached to each event, the point process is then referred to as a marked point process.
ð12Þ
Introduction to Conditional Intensity
ð13Þ
Denote a point process, in time by N, and a certain temporal location, by t. We assume in the following discussion that we have known observations up to time t. The most important characteristic is the waiting time u to the next event from time t. We can therefore consider the following cumulative probability distribution function of the waiting time:
or fault length, L (Shimazaki 2013): log 10 L ¼ b0 log 10 M0 b1 ,
model and the ETAS model were developed based on the Omori-Utsu formula.
where a0, a1, b0, and b1 are constants. Most empirical studies have been based on simple linear regressions.
Ft ðuÞ ¼ Prffrom t onward, waiting time to next event u j observations before tg,
Episode II: Construction of a Theoretical Framework and Development of Time-Dependent Forecasting Models This stage began in the 1970s with two milestones: the introduction of the point process model and the development of the theory of conditional intensity in the point process. First, the development of stochastic models for earthquake risks was a requirement for earthquake engineering. Building codes, with respect to the design of a building structure, required the consideration of the probability of the occurrence of the largest ground-shaking event for a specific period into the future. In earthquake engineering, the stationary Poisson model (also referred to as the time-independent model) was often used to estimate the future earthquake hazard. Considered as a classic reference, Vere-Jones (1970) proposed the point process to de- scribe the process of earthquake occurrence times, focusing on the tools required to generate a functional and spectrum analysis. Vere-Jones (1973) introduced the use of the conditional intensity in statistical seismology, which is defined as the expectation of earthquake occurrence under the condition of previous knowledge on the earthquake process and/or external observations. The core idea of model development is bringing into existing stochastic models with more nonrandomness based on physical theory and observations. During this episode, for long-term earthquake hazard evaluations, renewal models were developed by modifying the inter-event time in the Poisson model to more general random distributions. The stress release model was developed by adding Reid’s elastic rebound theory to the rate function of the Poisson model. For short-term earthquake forecasting, the Reasenberg and Jones
ð14Þ with the corresponding probability density function: f t ðuÞdu ¼ Prffrom t onward, waiting time is between u and u þ du j observations before tg, ð15Þ The survival function can be defined as follows: St ðuÞ ¼ Prffrom t onward, waiting time > u j observtions before tg ¼ PrfNo event occurs between t and t þ u j observations before tg:
ð16Þ The hazard function can be defined as follows: ht ðuÞdu ¼ Pr
next event occurs between tþu and tþuþdu
Observations bef ore t and that no event occurs between t and tþu
ð17Þ
Finally, the cumulative hazard function can be defined as L t ð uÞ ¼
u 0
ht ðsÞ ds:
ð18Þ
These functions are related in the following manner: St ð uÞ ¼ 1 Ft ð uÞ ¼
1 u
¼ expL ½ t ðuÞ ,
u
f t ðsÞds ¼ exp ht ðsÞds 0
ð19Þ
S
1476
Statistical Seismology
f t ð uÞ ¼
u dFt dS ¼ t ¼ ht ðuÞ exp ht ðsÞds du du 0
¼ expL ½ t ð uÞ ht ð uÞ ¼
dLt ðuÞ , du
n
ð20Þ
dLt ðuÞ d , ½logð1 Ft ðuÞÞ ¼ du du
u 0
f t ðsÞds :
n
log lðti Þ
ð25Þ
T
lðuÞ du:
ð26Þ
S
i¼1
ð21Þ
ð22Þ
Using the above formulas, if a random variable, W, represents the waiting time, which has a cumulative distribution function of Ft( ), then Ft(W) belongs to a uniform distribution on [0, 1]; consequently, Λt(W ) follows an exponential distribution of the unit rate. In the above concepts, the hazard function ht has the property of additivity; the hazard function of the superposition of two subprocesses is the sum of the two hazard functions of these subprocesses. Additionally, if there is no event occurring between t and t þ u, and v u 0, then ht(v) ¼ ht þ u(v u). Thus, ht(0) is identical to ht0 ðt t0 Þ for any time t0 between t and the occurrence time of the previous event. Assuming left-continuity with t, the function ht(0) is referred to specifically as the conditional intensity, denoted by l(t) or l(t| ℋt), where ℋt represents the observation history up to time t, but does not include t. Based on the definition of the hazard function, we can obtain the following: lðtÞdt ¼ Prfone or more events occur in ½t, t þ dt Þ j ℋt g: ð23Þ This is typically used as the definition of conditional intensity. When the point process, N, is simple, i.e., there is at most one event occurring at the same time and location, then Pr{N[t, t þ dt) > 1} ¼ o(dt). Using this relationship, l(t) can also be defined as follows: E½N ½t, t þ dtÞjℋt ¼ lðtÞdt þ oðdtÞ:
lðuÞdu ,
or in logarithm form: log LðN; S, T Þ ¼
Lt ðuÞ ¼ log St ðuÞ ¼ log½1 Ft ðuÞ ¼ log 1
T S
i¼1
f t ð uÞ d ¼ ½log:St ðuÞ du St ð uÞ
¼
lðti Þ exp
LðN; S, T Þ ¼
A direct use of the likelihood function is parameter estimation: When the model involves some unknown regular parameters, e.g., θ, we can estimate θ by maximizing the likelihood, i.e., the MLE (maximum likelihood estimate) is y ¼ argy max LðN; S, T; yÞ:
If several models fit to the same dataset, the optimal model can be selected using the Akaike information criterion (AIC, see Akaike 1974). The statistic AIC ¼ 2 max log LðuÞ þ 2kp u
Transformed Time Sequence and Residual Analysis
Residual analysis is a powerful and widely used tool for assessing the fit of a particular model to a set of occurrence times (e.g., Ogata 1988). Assume that we have a realization of a point process with event times denoted by t1, t2, , tn. We can then calculate the transformed event times, denoted by t1, t2, , tn, in such a manner that they have the same distributional properties as the homogeneous Poisson process with a unit rate parameter. If we suppose that the conditional intensity is l(t), the transformed time sequence for i¼ i ¼ 1, 2, , n, is calculated with ti ¼
Likelihood, Maximum Likelihood Estimate, and AIC
Given an observed dataset of a point process, N, say {t1, t2, tn}, in a given time interval [S, T], the likelihood function, L, is the joint probability density of the waiting times for each of these events, which can be written as
ð28Þ
is computed for each model fit to the data, where kp is the total number of estimated parameters for a given model. Based on a comparison of models with different numbers of parameters, the addition of 2kp roughly compensates for the additional flexibility provided by the extra parameters. The model with the lowest AIC value is the optimal choice for forward prediction purposes.
ð24Þ
This process becomes a stationary Poisson process when ht(u) is a constant.
ð27Þ
ti 0
lðuÞ du:
ð29Þ
The sequence {ti : i ¼ 1, 2, , n} then forms a Poisson process with a unit rate as the above transformation modifies the waiting time into an exponential random variable with a unit rate. This method can be used to test the goodness-of-fit of the model. If the fitted model, lðtÞ, is close enough to the true t model, then ti ¼ 0i lðuÞ du : i ¼ 1, 2, , is similar to the
Statistical Seismology
1477
standard Poisson model. The deviation of the rate in the transformed time sequence from the unit rate indicates either increased activity or quiescence relative to the seismic rate based on the original model. Models Developed in Episode II Renewal/Recurrence Models
A renewal process is a generalization of the Poisson process. The Poisson process has independent identically distributed waiting times that are exponentially distributed before the occurrence of the next event; however, a renewal process has a more general distribution for the waiting times. This indicates that the time interval between any two adjacent events is independent from the other intervals; the occurrence of the next event depends only on the time of the last event, but not on the full history. Renewal models yield the characteristic recurrence of earthquakes along a fault or in a region. This class of models is widely used in seismicity and seismic hazard analysis. For example, Field (2007) summarized how the Working Group on California Earthquake Probabilities (WGCEP) estimates the recurrence probabilities of large earthquakes on major fault segments using various recurrence models to produce the official California seismic hazard map. These models sometimes use the elastic rebound theory proposed by Reid (1910). According to this theory, large earthquakes release elastic strain that has accumulated since the last large earthquake. Some seismologists have deduced that longer quiescence periods indicate a higher probability of an imminent event (e.g., Nishenko and Buland 1987), whereas others contend that the data contradict this view (e.g., Kagan and Jackson 1995). Renewal models are often used to quantitatively demonstrate whether earthquakes occur temporally in clusters or quasiperiodically. If we denote the density function of the waiting times by f( ), which is also usually referred to as the renewal density, the conditional intensity of the renewal process is the same as the hazard function: lðtÞ ¼
f t tNðt Þ , 1 F t tNðt Þ
ð30Þ
where tNðt Þ is the occurrence time of the last event before t and F is the cumulative probability function of f. The following probability functions are often selected as the renewal densities: • The gamma renewal model has a density defined as follows:
f ðu; k, yÞ ¼ uk1
eu=0 , yk GðkÞ
for u 0 and k, y
> 0,
ð31Þ
where θ is the scale parameter, k is the shape parameter, and Γ and Γα are the gamma and incomplete gamma functions, 1 respectively, defined by GðxÞ ¼ 0 tx1 et dt and Ga ðxÞ ¼ 1 x1 t e dt: a t • The log-normal renewal model has a density function defined as follows: f ðu; m, sÞ ¼
ðlnðuÞmÞ2 1 p e 2s2 , us 2p
for u 0,
ð32Þ
where m and s are the mean and standard deviation of the variable’s natural logarithm, respectively, and erf is the error function. • The Weibull distribution, used in the Weibull renewal model, is one of the most widely used lifetime distributions in reliability engineering. The probability density function of a Weibull random variable, X, is defined as follows:
f ðu; m, kÞ ¼
k u m m 0
k1
k
eðu=mÞ
u 0,
ð33Þ
u < 0,
where k > 0 is the shape parameter and m > 0 is the scale parameter of the distribution. • Brownian passage time model Kagan and Knopoff (1987) used an inverse Gaussian distribution to model the evolution of stress as a random walk with tectonic drift. Matthews et al. (2002) then proposed the Brownian passage-time model based on the properties of the Brownian relaxation oscillator (BRO). In this conceptual model, the loading of the system has two components: (1) a constant-rate loading component, lt, and (2) a random component, ε(t) ¼ sW(t), defined as a Brownian motion (where W is a standard Brownian motion and s is a nonnegative scale parameter. The Brownian perturbation process for the state variable, X(t), (see Matthews et al. 2002) is defined as: XðtÞ ¼ lt þ sW ðtÞ: An event will occur when X(t) Xf. These event times can be observed as the “first Passage” or “hitting” times of Brownian motion with drift. The BRO is a family of stochastic renewal processes defined by four parameters: the drift or
S
1478
Statistical Seismology
mean loading (l), perturbation rate (s2), ground state (X0), and failure state (Xf). In fact, the recurrence properties of the BRO (response times) can be described by a Brownian passage-time distribution, which is characterized by two parameters: (1) the mean time or period between events, (m), and (2) the aperiodicity of the mean-time, α, which is equivalent to the coefficient of variation (defined as the ratio of the variance to the mean occurrence time). The probability density for the Brownian passage-time (BPT) model is given by f ðt; m, aÞ ¼
m 2pa2 t3
1 2
exp
ðt mÞ2 , 2a2 mt
t0
tm 2 tþm p þ exp 2 F p , a mx a mx a
si :
ð37Þ
i:ti 250 m) Stage 2 (100 m) Stage 3 (40 m) Stage 4 (15 m) Stage 5 (6 m) Total
Degrees of freedoma 211 38 37 38 38 362
Mean square 2047 1082 856 937 326 1528
Variance component 590.2 154.0 3.6 458.8 325.6
Cumulative variance 1532.2 942.0 788.0 784.4 325.6
The measurements were in mg kg1 There were three missing values; the total degrees of freedom are therefore 362 for 363 data from the 366 sampling points
a
estimate 402.7 282.4 7.0 483.2 277.2
REML
Uncertainty Quantification
Bibliography Atteia O, Dubois J-P, Webster R (1994) Geostatistical analysis of soil contamination in the Swiss Jura. Environ Pollut 86:315–327 Cochran WG, Cox GM (1957) Experimental designs. Wiley, New York Gower JC (1962) Variance component estimation for unbalanced hierarchical classification. Biometrics 18:168–182 Miesch AT (1975) Variograms and variance components in geochemistry and ore evaluation. Geol Soc Am Mem 142:333–340 Patterson HD, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58:545–554 Webster R, Lark RM (2013) Field sampling for environmental science and management. Routledge, London Webster R, Welham SJ, Potts JM, Oliver MA (2006) Estimating the spatial scales of regionalized variables by nested sampling, hierarchical analysis of variance and residual maximum likelihood. Comput Geosci 32:1320–1333 Youden WJ, Mehlich A (1937) Selection of efficient methods of soil sampling. Contrib Boyce Thompson Inst Plant Res 9:59–70
Uncertainty Quantification Behnam Sadeghi1,2, Eric Grunsky3 and Vera Pawlowsky-Glahn4 1 EarthByte Group, School of Geosciences, University of Sydney, Sydney, Australia 2 Earth and Sustainability Research Centre, University of New South Wales, Sydney, Australia 3 Department of Earth and Environmental Sciences, University of Waterloo, Waterloo, ON, Canada 4 Department of Computer Science, Applied Mathematics and Statistics, Universitat de Girona, Girona, Spain
Definition An issue in all geochemical anomaly classification methods is uncertainty in the identification of different populations and allocation of samples to those populations, including the critical category of geochemical anomalies or patterns that are associated with the effects of mineralization. This is a major challenge where the effects of mineralization are subtle.
Introduction There are various possible sources of such uncertainty, including (i) gaps in coverage of geochemical sampling within a study area; (ii) errors in geochemical data analysis, spatial measurement, interpolation; (iii) misunderstanding of geological and geochemical processes; (iv) fuzziness or vagueness of the threshold between geochemical background and geochemical anomalies; and (v) errors associated with instrument measurement and the associated calibration, in
1583
particular errors associated with accuracy and errors associated with precision (Costa and Koppe 1999; Lima et al. 2003; Walker et al. 2003; Bárdossy and Fodor 2004; McCuaig et al. 2007, 2009, 2010; Kreuzer et al. 2008; Kiureghian and Ditlevsen 2009; Singer 2010; Singer and Menzie 2010; Caers 2011; Zuo et al. 2015). However, there are further potential biases induced by data closure effects in compositional data, which may be dealt with by the use of ratios for which a number of methods exist on the basis of Aitchison geometry (Filzmoser et al. 2009; Carranza 2011; PawlowskyGlahn and Buccianti 2011; Gallo and Buccianti 2013; Buccianti and Grunsky 2014; Sagar et al. 2018; Zuzolo et al. 2018; Pospiech et al. 2020). The use of appropriate log-ratio transformations, including clr and ilr – see the ▶ “Compositional Data” entry – is adequate to represent the data in real coordinates, necessary for a proper modeling and outlier detection. Such issues generate two main types of uncertainty in geochemical anomaly recognition and mapping that are (i) stochastic uncertainty related to the quality and variability of geochemical data, which affects the results of methods and (ii) systematic uncertainty associated with assumptions and procedures for collection, analysis, and modeling of geochemical data (Porwal et al. 2003; Zuo et al. 2015). Due to analytical and interpolation errors in unsampled or undersampled areas, there will be both stochastic uncertainty and procedural systematic uncertainty in thresholds achieved from geochemical anomaly classification models. Mathematical uncertainties inherent within the different thresholddetermining methods defined can easily be quantified using existing statistical measures. Geospatial interpolation by any method is one of the main sources of uncertainty in continuous field geochemical models (Costa and Koppe 1999). However, errors in the characterization of significant geochemical anomalies are not only due to interpolation errors in sampled/unsampled areas. Error propagation analysis should be considered and can be assessed using methods such as Monte Carlo simulation. Error propagation from data collection to data analysis results in thresholds to populations spanning a range of values with the relative magnitude of such ranges dependent on the magnitude of sampling, analytical and interpolation errors (Stanley 2003; Stanley and Lawie 2007; Stanley et al. 2010). In effect, every estimate of a geochemical value in a continuous field model represents a range of values due to statistical and interpolation errors. However, existing methods of geochemical anomaly recognition do not consider error propagation and stochastic and systematic uncertainty quantification in the analysis. Additionally, existing methods assume in general errors to be additive, what is clearly not appropriate with compositional data. This, in turn, results in uncertainty whether observed geochemical anomalous samples or populations are statistically significant.
U
1584
Uncertainty Quantification
Uncertainty in Geochemical Anomaly Classification Models A variety of geostatistical and other mathematical methods have been applied to classify or cluster regolith geochemical data. A common objective is to define clusters that can be spatially related to mineralization, based on univariate or multivariate characteristics. Given the range of factors that can affect regolith geochemistry, including variations in parent lithology, regolith processes, and climatic effects, all such methods are subject to uncertainty in sample classification and pattern recognition, and identification of samples which geochemistry has (or has not) been affected by mineralization (i.e., a geochemical anomaly). This affects stochastic geochemical classification methods including fractal-based models. Uncertainties associated with geochemical data collection and data analysis can result in two types of errors (Koch and Link, 1970) in relation to the null hypothesis connecting detection or lack of detection of a geochemical anomaly with the associated presence or absence of mineralization. As with all such statistical tests, a Type I error or false positive occurs when the null hypothesis is rejected when the proposition is true, and a Type II error or false negative occurs when the null hypothesis is false but is accepted. From geochemical exploration point of view (Table 1), a Type I error occurs when an ore deposit exists in an area, but no mineralization is detected on the classified map; and a Type II error occurs in an area where there is no deposit but has been classified as an anomalous area. Type I errors preclude mineral discovery, whereas Type II errors will entail exploration expenditure that does not result in mineral discovery. Both types of errors have economic consequences (Rose et al. 1979; Table 1), hence there is a need to be able to quantify the overall accuracy (OA) of the geochemical population characterization as represented by the OA matrix contingency table of Carranza (2011) set out in Table 2. This ideal approach is, however, flawed as neither b or d can be determined due to the inability in many cases to determine whether undetected mineralization is actually
Uncertainty Quantification, Table 1 Classification of samples versus real condition
Assigned class
Anomalous
Background
Reality Ore present Correct decision
Type I error – mineralization not detected
Adapted from Rose et al. (1979)
Ore not present Type II error – wasted exploration expenditure Correct decision
present in an area where the geology would permit the presence of such mineralization (Yilmaz et al. 2017). It is only possible to experimentally gauge the OA in terms of the geochemical response to known mineralization, hence the typical design of most geochemical orientation surveys that focus on patterns associated with known mineralization and commonly provide insufficient assessment of pattern variations away from the direct influence of mineralization. A partial overall accuracy (POA) contingency table is therefore defined as: POA ¼ a=ða þ cÞ
ð1Þ
In geochemical exploration projects or orientation studies, the number of samples collected is always limited due to budgetary and other constraints. Therefore, in order to define and predict the spatial mineralization patterns using the available samples in sampled and unsampled areas, interpolation of the available data values is required to assign estimated values to unsampled areas (Chilès and Delfiner 2012; Sadeghi et al. 2015). Interpolation using any single method is one of the main sources of uncertainty in continuous field geochemical models as interpolation estimates are based only on data within the search window (Costa and Koppe, 1999). In order to evaluate the effect of interpolation errors in detecting geochemically anomalous populations, error propagation should be analyzed (Bedford and Cooke 2001; Oberkampf et al. 2002, 2004; Stanley 2003; Helton et al. 2004; Stanley and Lawie 2007; Verly et al. 2008; Stanley et al. 2010; Mert et al. 2016; Sagar et al. 2018).
Monte Carlo Simulation (MCSIM) to Quantify Systematic Uncertainty One significant method for analyzing the propagation of error in models and evaluating their stability is the Monte Carlo simulation (MCSIM) (Taylor 1982; Heuvelink et al. 1989). The MCSIM is a computational algorithm that generates random numbers or realizations given a specific density function (Pyrcz and Deutsch 2014; Pakyuz-Charrier et al. 2018; Athens and Caers 2019; Madani and Sadeghi 2019), derived from various parameters controlling that probability density function or the cumulative equivalent (the PDF or CDF) as shown in Fig. 1 (Deutsch and Journel 1998; Scheidt et al. 2018). As the MCSIM is based on the PDF and CDF evaluations, it can be applied to high-dimensional and nonlinear spatial modeling approaches to determine the probability of the target mineralization values occurrence based on the probability distributions relating to a given model (Athens and Caers 2019).
Uncertainty Quantification
1585
Uncertainty Quantification, Table 2 Overall accuracy (OA) matrix to compare performance of a binary model geochemical anomaly or signal recognition with binary ground truth (Sadeghi 2020)
Classification of sample (or pixel)
Anomalous Background
Geological ground truth Mineralization present and affecting sample geochemistry True anomaly ¼ a False background ¼ c Type I error ¼ Type II error ¼ Overall accuracy (OA) ¼
No mineralization present False anomaly ¼ b True background ¼ d c/(a þ b þ c þ d) b/(a þ b þ c þ d) (a þ d)/(a þ b þ c þ d)
Uncertainty Quantification, Fig. 1 Schematic PDF and CDF plots and associated quantiles (in this case the P10, P50, and P90 values). (Sadeghi 2020: modified from Scheidt et al. 2018)
Under the MCSIM approach, the P50 (median) value (the average 50th percentile of the multiple simulated distributions) represents a neutral probability in decision-making (Scheidt et al. 2018), and is defined as the expected “return.” The uncertainty is calculated, in this approach, as 1/(P90-P10) for which P10 (lower decile) and P90 (upper decile) are the average tenth and 90th percentiles of the multiple simulated values (Caers 2011; Scheidt et al. 2018), associated with each element. The implications of the MCSIM modeling mainly relate to sampling density (though may also relate to sample data quality if that can be disentangled from natural geochemical variability). In the recognition of geochemical processes, the sampling density reflects processes that can be adequately represented or processes that are under-represented. Underrepresented processes can appear as noise and is apparent when multivariate methods such as principal components are applied to the data (Grunsky and Kjarsgaard 2016). Typically, under-represented processes occur in such methods that are defined by the smaller eigenvalues. Where the quantified return is low or negative, and the quantified uncertainty is high, especially higher than its related return, a decision may be made that additional sampling is required to achieve the minimum required spatial continuity in the data or the stability of the fractal models (Fig. 2). If it goes uncertain, the final models and related classes would not be certain enough, which affects the entire certainty of the models generated in the whole study area.
Spatial Uncertainty The systematic uncertainty of the thresholds or classes obtained by each classification model was discussed above. However, there is another type of uncertainty that is related to the final maps generated by interpolation of the available values in the sampled areas and assignation of interpolated values to unsampled areas. A single interpolated map has unknown precision (stability) and accuracy. The precision can be estimated by application of simulation methods, such as sequential Gaussian simulation (SGSIM), with a large number of realizations (simulated maps) – see the ▶ “Simulation” entry. Such simulation models are mostly based on kriging interpolation methods (e.g., ordinary kriging and regression kriging) that minimize or compensate for precision issues. The difference or dissimilarity between realizations provides the estimate spatial uncertainty of the simulated models. Based on the initial and estimated values in a frequency framework, and the spatial uncertainty obtained using several realizations simulated in a Bayesian framework, areas for targeting are those with highest concentration frequency and lowest spatial uncertainty. The principal target areas for the follow-up exploration are those displaying strongly anomalous or very strongly anomalous populations. For such evaluations and to apply the established simulation algorithms, the rasterized (regular or Cartesian) grids are not required as
U
1586
Uncertainty Quantification
Uncertainty Quantification, Fig. 2 Schematic of Monte Carlo uncertainty propagation and decision-making workflow applicable to any classification model (Sadeghi 2020)
well, so they are applied to irregular grids such as point-sets (Remy et al. 2009). The realizations are often complex and of high dimensionality, hence representing them mathematically in extremely high-dimensional Cartesian space is not feasible. As an alternative to focusing on the realizations themselves, the differences between realizations can be analyzed. To achieve this, the concept of the “distance” has been proposed by Scheidt and Caers (2008, 2009). Conceptually, the distance is a single positive value quantifying the total difference between the outputs from any two realizations (Caers 2011; Scheidt et al. 2018). The average distances or variability between realizations (similar to the standard error of the mean) can also be determined. Based on the data types and the associated parameters, various metrics have been developed to quantify the distance between two realizations, such as the Hausdorff distance (Suzuki and Caers 2006), time-of-flight-based distances (Park and Caers 2007), and flow-based distance using fast flow simulators (Scheidt and Caers 2008). The simplest distance calculation in 2D models for non-dynamic data such as regolith geochemical mapping data is the Euclidean distance (Caers 2011; Scheidt et al. 2018; Fig. 3). Of course, for compositional data, all conventional methods can be used, as long as the data are represented in orthonormal log-ratio
coordinates (ilr, clr, balances) (Pawlowsky-Glahn and Buccianti 2011). Here such spatial uncertainty quantifications were introduced in general, and we do not go into mathematical details.
Summary and Conclusions The uncertainty in the classification models has been approached from two aspects. These are (i) determination of errors in classification and ways in which the errors can be measured and (ii) comparison of different classification models to determine overall stability of the classification and especially samples defined as “anomalous” using the two test datasets where the location of a number of mineral deposits is known. In essence, it is the question of being able to assess the risk that a geochemical “anomaly” is a false indicator of mineralization based on the combination of the nature of the data and the modeling approaches. In establishing the efficiency of models to accurately classify samples in relation to indications of the actual presence or absence of mineralization, it is again emphasized that the OA model is potentially misleading in quantifying the efficiency of classification models in orientation studies. In the classic contingency tables applied to geochemical sample
Uncertainty Quantification
1587
Uncertainty Quantification, Fig. 3 Schematic definition of Euclidean distance demonstrating the dissimilarities of different realizations (Sadeghi 2020: modified from Scheidt et al. 2018)
classification and the presence or absence of mineralization, the proportion of “b” samples that display a lack of mineralization signal in an area with no mineralization (or true background) and the “d” samples where there is an anomalous geochemical signal despite no mineralization being present (or the Type II errors) cannot be reliably known. The reason for this is the potential at most scales of geochemical mapping for the existence of unknown or undiscovered mineral deposits and ensuing incorrect allocation of areas or domains on maps as to the “expected background” classification. The POA approach evaluates the efficiency of the models based only on the known related mineral deposits; hence, it is a partial assessment of efficiency as it relates only to the probability of a geochemical anomaly being present within the dispersion halo of mineralization and not the probability of having no geochemical signal in areas unaffected by mineralization. The POA approach also provides a single scenario in a frequency framework. For cases with different patterns but similar POA, deciding on the most reliable or valid approach to model geochemical data to extract univariate or multivariate patterns or signals related to the effects of mineralization was not valid. MCSIM modeling, applied to each of the models and their individual characterized population, has partly addressed this problem as a precondition test before either final selection of the models to apply or the decision-making process for selecting the most reliable areas for follow-up exploration. The use of MCSIM on the input data requires initial estimates of (statistical) sampling uncertainty or measurement errors to determine if the potential effects of error propagation on the
final interpolated or simulated models are sufficiently low to allow for modeling to be justified. This is in comparison with a simple visual assessment of the raw data in relation to known mineralization, variations in lithology, structures, and other key supporting data. The MCSIM approach should preferentially be applied to geochemical samples separated into the main domains (typically lithology controlled) but might include landform-regolith associations in terrains different to those in Sweden or Cyprus. The metric to accept the sampling statistical certainty is “return” versus “uncertainty” (Caers 2011; Scheidt et al. 2018). The return must be higher than uncertainty otherwise the sampling (physical or statistical) may need to be improved. This approach has similarity to the question in generating data for purposes of ore reserve estimations of “when to stop drilling” (King et al. 1982).
Cross-References ▶ Compositional Data ▶ Simulation
Bibliography Athens ND, Caers JK (2019) A Monte Carlo-based framework for assessing the value of information and development risk in geothermal exploration. Appl Energy. https://doi.org/10.1016/j.apenergy. 2019.113932 Bárdossy G, Fodor J (2004) Evaluation of uncertainties and risks in geology. Springer, Berlin
U
1588 Bedford T, Cooke R (2001) Probabilistic risk analysis, foundations and methods. Cambridge University Press. ISBN 978-052-1773-20-1 Buccianti A, Grunsky E (2014) Compositional data analysis in geochemistry: are we sure to see what really occurs during natural processes? J Geochem Explor 141:1–5 Caers JK (2011) Modeling uncertainty in earth sciences. Wiley, Hoboken Carranza EJM (2011) Analysis and mapping of geochemical anomalies using logratio-transformed stream sediment data with censored values. J Geochem Explor 110:167–185 Chilès JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty, 2nd edn. Wiley, Hoboken Costa JF, Koppe JC (1999) Assessing uncertainty associated with the delineation of geochemical anomalies. Nat Resour Res 8:59–67 Deutsch CV, Journel AG (1998) GSLIB. Geostatistical software library and User's guide. Oxford University Press, New York Filzmoser P, Hron K, Reimann C (2009) Univariate statistical analysis of environmental (compositional data): problems and possibilities. Sci Total Environ 407:6100–6108 Gallo M, Buccianti A (2013) Weighted principal component analysis for compositional data: application example for the water chemistry of the Arno river (Tuscany, Central Italy). Environmetrics 24:269–277 Grunsky EC, Kjarsgaard BA (2016) Recognizing and validating structural processes in geochemical data: examples from a diamondiferous kimberlite and a regional lake sediment geochemical survey. In: Martin-Fernandez JA, Thio-Henestrosa S (eds) Compositional data analysis, Springer proceedings in mathematics and statistics, vol 187. Springer, Cham, pp 85–115. 209 p Helton JC, Johnson JD, Oberkampf WL (2004) An exploration of alternative approaches to the representation of uncertainty in model predictions. Reliab Eng Syst Saf 85:39–71 Heuvelink GBM, Burrough PA, Stein A (1989) Propagation of errors in spatial modeling with GIS. Int J Geog Info Sys 3:303–322 King H, McMahon DW, Bujtor GJ (1982) A guide to the understanding of ore reserve estimation. AusIMM Rpt 281 Kiureghian AD, Ditlevsen O (2009) Aleatory or epistemic? Does it matter? Struct Saf 31:105–112 Koch GS, Link RF (1970) Statistical analysis of geological data, vol I. Wiley, New York, 375 p Kreuzer OP, Etheridge MA, Guj P, Maureen E, McMahon ME, Holden DJ (2008) Linking mineral deposit models to quantitative risk analysis and decision-making in exploration. Econ Geol 103:829–850 Lima A, De Vivo B, Cicchella D, Cortini M, Albanese S (2003) Multifractal IDW interpolation and fractal filtering method in environmental studies: an application on regional stream sediments of Campania region (Italy). Appl Geochem 18:1853–1865 Madani N, Sadeghi B (2019) Capturing hidden geochemical anomalies in scarce data by fractal analysis and stochastic modeling. Nat Resour Res 28:833–847 McCuaig TC, Kreuzer OP, Brown WM (2007) Fooling ourselves – dealing with model uncertainty in a mineral systems approach to exploration. In: Proceedings of the ninth biennial SGA meeting, Dublin McCuaig TC, Porwal A, Gessner K (2009) Fooling ourselves: recognizing uncertainty and bias in exploration targeting. Centre Explor Target 2:1–6 McCuaig TC, Beresford S, Hronsky J (2010) Translating the mineral systems approach into an effective targeting system. Ore Geol Rev 38:128–138 Mert MC, Filzmoser P, Hron K (2016) Error propagation in isometric log-ratio coordinates for compositional data: theoretical and practical considerations. 
Math Geosci 48:941–961 Oberkampf WL, DeLand SM, Rutherford BM, Diegert KV, Alvin KF (2002) Error and uncertainty in modelling and simulation. Reliab Eng Syst Saf 75:333–357
Uncertainty Quantification Oberkampf WL, Helton JC, Joslyn CA, Wojtkiewicz SF, Ferson S (2004) Challenge problems, uncertainty in system response given uncertain parameters. Reliab Eng Syst Saf 85:11–19 Pakyuz-Charrier E, Lindsay M, Ogarko V, Giraud J, Jessel M (2018) Monte Carlo simulation for uncertainty estimation on structural data in implicit 3-D geological modelling: a guide for disturbance distribution selection and parameterization. Solid Earth 9:385–402 Park K, Caers JK (2007) History matching in low-dimensional connectivity vector space. Stanford Univ SCRF Rpt 20 Pawlowsky-Glahn V, Buccianti A (2011) Compositional data analysis: theory and applications. Wiley, Hoboken, 378 p Porwal A, Carranza EJM, Hale M (2003) Artificial neural networks for mineral-potential mapping: a case study from Aravallia province, Western India. Nat Resour Res 12:155–171 Pospiech S, Tolosana-Delgado R, van den Boogaart KG (2020) Discriminant analysis for compositional data incorporating cell-wise uncertainties. Math Geosci. https://doi.org/10.1007/s11004-020-09878-x Pyrcz MJ, Deutsch CV (2014) Geostatistical reservoir modeling. Oxford University Press Remy N, Boucher A, Wu J (2009) Applied geostatistics with SGeMS (a user’s guide). Cambridge University Press, Cambridge, 264 p Rose AW, Hawkes HE, Webb JS (1979) Geochemistry in mineral exploration, 2nd edn. Academic Press, London Sadeghi B (2020) Quantification of uncertainty in geochemical anomalies in mineral exploration. PhD thesis, University of New South Wales Sadeghi B, Madani N, Carranza EJM (2015) Combination of geostatistical simulation and fractal modeling for mineral resource classification. J Geochem Explor 149:59–73 Sagar BSD, Cheng Q, Agterberg F (2018) Handbook of mathematical geosciences. Springer, Berlin Scheidt C, Caers JK (2008) Uncertainty quantification using distances and kernel methods – application to a Deepwater Turbidite reservoir. pangea.stanford.edu, pp 1–29 Scheidt C, Caers JK (2009) Representing spatial uncertainty using distances and kernels. Math Geosci 41:397–419 Scheidt C, Li L, Caers JK (2018) Quantifying uncertainty in subsurface systems, American Geophysical Union. Wiley, New York Singer DA (2010) Progress in integrated quantitative mineral resource assessments. Ore Geol Rev 38:242–250 Singer DA, Menzie WD (2010) Quantitative mineral resource assessments-an integrated approach. Oxford University Press, New York Stanley CR (2003) Estimating sampling errors for major and trace elements in geological materials using a propagation of variance approach. Geochem Explore Environ Anal 3:169–178 Stanley C, Lawie D (2007) Average relative error in geochemical determinations: clarification, calculation, and a plea for consistency. Explor Min Geol 16(3–4):267–275 Stanley C, O’Driscoll NJ, Ranjan P (2010) Determining the magnitude of true analytical error in geochemical analysis. Geochem Explor Environ Anal 10(4):355–364 Suzuki S, Caers JK (2006) History matching with and uncertain geological scenario. SPE Ann. Tech Conf Taylor JR (1982) An introduction to error analysis: the study of uncertainties in physical measurement. Oxford University Press, Sausalito Verly G, Brisebois K, Hart W (2008) Simulation of geological uncertainty, resolution porphyry copper deposit. In: Proceedings of the eighth geostatistics congress, Gecamin, vol 1, pp 31–40 Walker WE, Harremoës P, Rotmans J, van der Sluijs JP, van Asselt MBA, Janssen P (2003) Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support. 
Integr Assess 4: 5–17 Yilmaz H, Cohen DR, Sonmez FN (2017) Comparison between the effectiveness of regional BLEG and 0 ,
ð7Þ
CN C ðY Þ ¼ y U : 0 < mCy ðyÞ < 1 :
ð8Þ
ð3Þ
If the borderline area of Y is empty which means if CN C ðY Þ ¼ ; , then dataset Y is considered to be exact in
Now we can give two definitions for rough sets shown as follows:
U
1602
Upper Approximation
Definition 1 Set Y is rough with respect to C if, C ðY Þ 6¼ C ðY Þ
ð9Þ
Once the certain rules are generated, the possible rules can also be generated by considering upper approximation. Which means the inconsistent objects are also taken into account for generating the rules.
Definition 2 Set Y is rough with respect to C if for some y, 0 < mCy ðyÞ < 1
ð10Þ
Reduction of Features Feature reduction is a vital aspect of RST. Reducing the redundant information from the system but at the same time to keep the relation of indiscernibility is the key idea of this step. The target is to search for indispensable features and eliminating the dispensable features. If k is an indiscernible feature in K, deleting it from the information system I will cause I to be inconsistent. In the other hand, if a feature is discernible, it could be eliminated from the dataset, and in this way the dimensionality of the dataset will be reduced. Let L be a subset of K and k belongs to L. The following exist: 1. If I(L ) ¼ I(L {k}), then k is discernible in L; or else k is indiscernible in L. 2. If features are necessary, then L is independent. 3. If I(L0) ¼ I(L ) and L0 is independent, then L0, the subset of L, is an L’s reduction. Reduction consists of some functions. First, the core of the features is calculated. Core is basically the feature set in where the features are present in all the reducts. In another words, a Core consists of the features which are indispensable from the information system devoid of breaking the structure of the similarity class. Let us assume L to be a subset of K. Then the set of the necessary features in L is the core of L. Core and reduction are related in the following way: CoreðLÞ ¼ \ReductðLÞ
Experimental Steps Data Collection A collection of 2747 samples and 13 well log features are considered where ten major lithology classeslithology class found in the well namely Mudstone, Claystone, Sandstone, Siltston, Sandy Siltstone, Sandy Mudstone, Silty Sandstone, Muddy Sandstone, Silty Mudstone, and Granulestone are the categories to be classified by the methodology. Table 1 shows the information of the lithology classeslithology class and the assigned class number. Discretization There are 12 well log conditional features such as Gamma Ray, Neutron Porosity, Density Correlation, Photo Electric Effect, Density Porosity, Conductivity, Caliper, Borehole Volume, Compressional Sonic, Hole Diameter, SQp, and SQs. For applying the methodology, the above mentioned conditional features need to be discretized. In this experiment, each of these conditional features is discretized into 10 equal length intervals. Upper and Lower Approximations Based on the RST theory, for the training samples belonging to each particular class, the lower and the upper approximations are found. The quality and accuracy of approximation can be computed with the RST approximations. The accuracy of approximations lies between [0,1]. The accuracy of approximation is defined as
ð11Þ
where Reduct(L ) is the list of all the reductions s of L. Certain and Possible Rules Generation Once the reduction is done, the core finding is done for the examples. Core is derived with the condition that the table still needs to be consistent (Hossain et al. 2021; Hossain et al., 2–5, December 2018). Then certain rules are generated from the consistent portion of the information table. In certain rules generation, lower approximation set is taken into account. Which means, the inconsistent objects are ignored to generate only the definite rules.
Upper Approximation, Table 1 Lithology classeslithology class and their corresponding information
Lithology Claystone Mudstone Siltston Sandstone Sandy mudstone Sandy siltstone Silty sandstone Silty mudstone Muddy sandstone Granulestone
Class no. 1 2 3 4 5 6 7 8 9 10
Upper Approximation
1603
Upper Approximation, Fig. 2 RST approximation accuracy. (Note this is based on Rose 2 http://idss.cs.put.poznan.pl/site/rose.html)
Upper Approximation, Fig. 3 Approximation rules. (Note this is based on Rose 2 http://idss.cs.put.poznan.pl/site/rose.html)
Approximation accuracy ¼
No:of objects in lower approximation No:of objects in upper approximation
ð12Þ The figure below shows the sample sizes for upper and lower approximations for each class. Certain and Uncertain Rules Generation 1. In Fig. 2, we can see that, other than class no. 8, the approximation accuracy is less than 1.
2. Since, for class no. 8, lower approximation ¼ upper approximation, only certain rules can be generated from this class and no possible rules are needed. 3. For all the classes except class no. 8, both certain and possible rules can be generated. 4. Sample Certain and Possible rules are shown in Fig. 3. Rules no. 47–51 are certain rules. And Rules No. 52–54 are possible rules.
U
1604
Classification Using the RST Rules The rules that are found in section “Certain and Uncertain Rules Generation” are implemented on the test dataset for classifying the objects and computing the accuracy. 1. First, the certain rules are applied on the testing samples. 2. Then the possible rules are applied to the samples that could not be classified by the certain rules. 3. The classification accuracy is 84.83% and the misclassification rate is 15.17%. 4. The computation of minimizing the decision rules is a combinatorial computation. It is time-consuming. Kim et al. (2013) and Kim et al. (2011) enable us to solve the fast solution of RST by using DNA computation.
Summary In this entry, a classification method based on upper and lower approximations for generating possible and certain rules is shown and a real time application is also provided to solve the lithology identification problem. The conclusions are as follows: 1. By using the RST approximations, certain and possible rules can be generated and data inconsistency is handled. 2. The generated rules are usable for classifying the decision classes for the future objects. 3. The classification accuracy is 84.83%. Therefore, the model can be adapted as a real-world solution to lithology classification. 4. The rules provide explainability to the model which helps the researchers to extract the reasoning for the classifications of the objects.
Upper Approximation
Bibliography Grzymala-Busse JW (1992) LERS a system for learning from examples based on rough sets. In: Slowinski R (ed) Intelligent decision support handbook of applications and advances of the rough sets theory. Kluwer Academic, p 318 Hossain TM, Watada J, Hermana M, Sakai H (2018) A rough set based rule induction approach to geoscience data. In: International conference of unconventional modelling, simulation optimization on soft computing and meta heuristics, UMSO 2018, Kitakyushu, Japan, 2–5, December 2018. IEEE. https://doi.org/10.1109/UMSO.2018. 8637237 Hossain TM, Watada J, Aziz IA, Hermana M (2020a) Machine learning in electrofacies classification and subsurface lithology interpretation: a rough set theory approach. Appl Sci 10:5940 Hossain TM, Watada J, Hermana M, Aziz IA (2020b) Supervised machine learning in electrofacies classification: A rough set theory approach. J Phys Conf Ser 1529:052048 Hossain TM, Watada J, Aziz IA, Hermana M, Meraj ST, Sakai H (2021) Lithology prediction using well logs: a granular computing approach. Int J Innov Comput Inf Control 17:225. Accepted on August, 22, 2020 Kim I, Chu Y-Y, Watada J, Wu J-Y, Pedrycz W (2011) A DNA-based algorithm for minimizing decision rules: A rough sets approach. IEEE Trans NanoBioscience 10(3):139–151. https://doi.org/10. 1109/TNB.2011.2168535 Kim I, Watada J, Pedrycz W (2013) DNA rough-set computing in the development of decision rule reducts. In: Skowron A, Suraj Z (eds) Rough sets and intelligent systems: Professor Zdzis?aw Pawlak in Memoriam, volume 1 of Intelligent systems reference library. Springer, pp 409–438. https://doi.org/10.1007/978-3-64230344-915 Pawlak Z (1982) Rough sets. Int J Parallel Prog 11(5):341–356. https:// doi.org/10.1007/BF01001956 Pawlak ZI (2002) Rough sets and intelligent data analysis. Inf Sci 147: 1–12 Sudha M, Kumaravel A (2018) Quality of classification with lers system in the data size context. Appl Comput Inf 16(1/2):29–38 Zhang Q, Xie Q, Wang G (2016) A survey on rough set theory and its applications. CAAI Trans Intell Technol. https://doi.org/10.1016/j. trit.2016.11.001
Variance Claudia Cappello1 and Monica Palma2 1 Dipartimento di Scienze dell’Economia, Università del Salento, Lecce, Italy 2 Università del Salento-Dip. Scienze dell’Economia, Complesso Ecotekne, Lecce, Italy
Synonyms
Second-order moment; Standard deviation

Definition
The variance of a regionalized variable measures the dispersion around the expected value and is defined as the expected value of the squared difference between the random variable and its expected value.

Overview
In spatial statistics, the observed value z(x) of a given aspect of interest at the location x is considered as a particular realization of the corresponding random variable Z(x), and the set of random variables {Z(x), x ∈ V}, where V ⊆ ℝᵈ, d ∈ ℕ⁺ (d ≤ 3), is called a spatial random field. It is important to point out that in applications the sample data could be measured on points or blocks. If the spatial data refer to a block, both its location and its volume (in terms of shape and size, which define the support of the sample) have to be considered. Moreover, if the volume of the blocks is very small and equal for all the sample data, it can be neglected and the sample data can be treated as points (Isaaks and Srivastava 1989).

In the following, the first- and second-order moments of Z(·) are provided and the central role of the variance in the geostatistical context is highlighted. Given a random variable Z(x) at the point x, its first-order moment depends on the spatial location x and is defined as follows:

E[Z(x)] = m(x),  x ∈ V,

while the second-order moments are given below:

• Variance, also known as a priori variance, which measures the dispersion of Z(x) around its expected value m(x):

Var[Z(x)] = E[Z(x) − m(x)]²,  x ∈ V.  (1)

The variance can also be written as Var[Z(x)] = E[Z²(x)] − [m(x)]²; thus, when the expectation of Z(x) is zero, the variance corresponds to E[Z²(x)].

• Covariance, which is defined as follows:

C[Z(x), Z(x′)] = E{[Z(x) − m(x)][Z(x′) − m(x′)]},  x, x′ ∈ V;  (2)

it exists if the variance of the two random variables Z(x) and Z(x′) is finite.

• Variogram, which measures the variance of the increments between Z(x) and Z(x′):

2γ[Z(x), Z(x′)] = Var[Z(x) − Z(x′)],  x, x′ ∈ V.  (3)
Note that the function γ is called semivariogram. Looking at the second-order moments of a regionalized variable, the variance measures the spread of the phenomenon around the mean, and is a measure of the dissimilarity between Z(x) and Z(x0 ).
The covariance (2) and the variogram (3) depend on the support points x and x′, hence many realizations of the random variables at the points x and x′ would be necessary to make statistical inference on these moments. However, spatial data are non-repetitive, since only one observation is available at every single location. To overcome this inferential problem, the stationarity hypotheses have to be introduced, which assume that the observations at spatial points characterized by the same separation vector h (in length and direction) can be considered as replicates of the pair Z(x) and Z(x′). In particular, a random field is second-order stationary if

• The expected value does not depend on the spatial point x: E[Z(x)] = m, x ∈ V.
• The covariance only relies on the separation vector h and not on the spatial locations x and x′:

C(h) = E{[Z(x) − m][Z(x + h) − m]},  x, x + h ∈ V;  (4)

where h = (|h|, α) is the separation vector, whose elements are |h|, the length of the vector, and α, the corresponding direction.

The stationarity of the covariance implies the stationarity of the variance and of the variogram (Journel and Huijbregts 1978), as specified hereinafter.

• For h = 0 the covariance is equal to the variance:

C(0) = E{[Z(x) − m][Z(x) − m]} = E[Z(x) − m]² = Var[Z(x)],  x ∈ V;

hence, all the random variables over the same spatial domain have the same expectation and the same finite variance.

• For any couple of random variables Z(x) and Z(x + h) the variogram is equal to

2γ(h) = Var[Z(x + h) − Z(x)] = E[Z(x + h) − Z(x)]²,  x, x + h ∈ V.  (5)

From (5) it can be proved that if two random variables are separated by a vector h, then the semivariogram γ is strictly related to the variance and the covariance through the following relation:

γ(h) = C(0) − C(h),  (6)

where C(0) = Var[Z(x)]. In addition, under the second-order stationarity hypothesis, the existence of the covariance and, consequently, of a finite a priori variance is assumed. Moreover, the covariance is a bounded function, namely |C(h)| ≤ C(0); then, according to (6), the variogram is also bounded by C(0).

However, for some phenomena the variance and the covariance do not exist in finite form; in this case it is convenient to resort to the intrinsic hypotheses, which are based on the second-order stationarity of the increments between Z(x + h) and Z(x). In particular, given a separation vector h:

• The expected value of the increments is supposed to be equal to zero: E[Z(x + h) − Z(x)] = 0, x, x + h ∈ V;
• The variance of the increments [Z(x + h) − Z(x)] is finite and does not depend on x: 2γ(h) = Var[Z(x + h) − Z(x)] = E[Z(x + h) − Z(x)]², x, x + h ∈ V.

Variance for an Indicator Random Field
In some applications it is interesting to evaluate the probability whether or not a random variable at unsampled locations exceeds a fixed threshold. For this aim it is relevant to introduce the indicator random field and the corresponding variance. Given a second-order stationary random field {Z(x), x ∈ V}, where V ⊆ ℝᵈ, and a fixed threshold z ∈ ℝ, the indicator random field {I(x; z), x ∈ V} is such that

I(x; z) = 1 if Z(x) ≤ z, and I(x; z) = 0 otherwise.

The variance of an indicator random field is finite and equal to

Var[I(x; z)] = F(z) − F²(z),  (7)

where F(·) denotes the marginal distribution of Z(x). A remarkable characteristic of the variance of an indicator random field is that its maximum value is 0.25 for z = z_M, where z_M is the median of the distribution F(z), hence F(z_M) = 0.5.

Properties of the Variance
Given a random variable Z(x) at the spatial location x, the following properties of the variance hold:
1. Var[Z(x)] ≥ 0,
2. ∀a ∈ ℝ, Var(a) = 0,
3. ∀a ∈ ℝ, Var[Z(x) + a] = Var[Z(x)],
4. ∀a ∈ ℝ, Var[aZ(x)] = a² Var[Z(x)].
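These properties can be checked numerically with a minimal sketch; the Gaussian replicates used below are purely an illustrative assumption, since in practice only one realization per location is available.

import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=2.0, scale=3.0, size=100_000)   # hypothetical replicates of Z(x)
a = 5.0

print(np.var(z) >= 0)                                   # property 1: Var[Z(x)] >= 0
print(np.isclose(np.var(np.full_like(z, a)), 0.0))      # property 2: Var(a) = 0
print(np.isclose(np.var(z + a), np.var(z)))             # property 3: shift invariance
print(np.isclose(np.var(a * z), a**2 * np.var(z)))      # property 4: quadratic scaling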
Moreover, given the regionalized variables Z(x_i), i = 1, . . . , N, the variance of a sum of random variables results as follows:

Var[ ∑_{i=1}^{N} Z(x_i) ] = ∑_{i=1}^{N} Var[Z(x_i)] + 2 ∑_{i=1}^{N−1} ∑_{j=i+1}^{N} C[Z(x_i), Z(x_j)],

where C(·) is the covariance function. Starting from the first property of the variance, the admissibility condition for a function to be a covariogram (i.e., the condition of positive definiteness) can be derived.

The variance of the vector Z = [Z(x_1), Z(x_2), . . . , Z(x_N)] can also be written in matrix form, known in the literature as the variance-covariance matrix and usually indicated with the Greek letter Σ. This matrix, of order (N × N), contains the variances of the regionalized variables Z(x_i), i = 1, . . . , N, on the diagonal, while the off-diagonal entries are the covariances between the variables, i.e.,

Σ = [ Var[Z(x_1)]         C[Z(x_1), Z(x_2)]   ...   C[Z(x_1), Z(x_N)]
      C[Z(x_2), Z(x_1)]   Var[Z(x_2)]         ...   C[Z(x_2), Z(x_N)]
      ⋮                    ⋮                    ⋱     ⋮
      C[Z(x_N), Z(x_1)]   C[Z(x_N), Z(x_2)]   ...   Var[Z(x_N)] ].

Note that the Σ matrix is symmetric and positive semidefinite (Schott 2018).
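As an illustration of how such a matrix can be assembled numerically, the minimal sketch below uses an isotropic exponential covariance function and a handful of hypothetical 2D locations; the covariance model and all parameter values are assumptions made only for this example.

import numpy as np

def cov_exponential(h, sill=1.0, a=10.0):
    # Isotropic exponential covariance C(h) = sill * exp(-|h|/a)
    return sill * np.exp(-h / a)

# hypothetical sample locations x_1, ..., x_N in 2D
x = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [10.0, 10.0]])

# pairwise distances |x_i - x_j|
d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

# variance-covariance matrix: variances on the diagonal, covariances off-diagonal
Sigma = cov_exponential(d)

print(np.allclose(Sigma, Sigma.T))                     # symmetric
print(np.all(np.linalg.eigvalsh(Sigma) >= -1e-10))     # positive semidefinite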
Sample Variance
As previously pointed out, the a priori variance of a second-order stationary random field is finite and corresponds to C(0); on the other hand, if Z satisfies the intrinsic hypotheses but not the second-order stationarity ones, then Var[Z(x)] might not be finite, although the variance of the increments between Z(x + h) and Z(x) is finite. Given the random variables Z(x_i), i = 1, . . . , N, of a random function Z, the estimator of the variance is the sample variance, whose analytic expression is given below:

S² = (1/N) ∑_{i=1}^{N} [Z(x_i) − Z̄]²,  x_i ∈ V,  (8)

where Z̄ = (1/N) ∑_{i=1}^{N} Z(x_i) is the sample mean.

Alternatively, the sample variance can also be expressed in terms of the sum of the squared differences between all the random variables Z(x_i):

S² = (1/(2N²)) ∑_{i=1}^{N} ∑_{j=1}^{N} [Z(x_i) − Z(x_j)]²,  x_i, x_j ∈ V.  (9)

It is important to point out that the sample variance is a random variable; hence it is characterized by first- and second-order moments. In particular, the expected value of the sample variance, denoted by s², is obtained as follows (Chilès and Delfiner 2012):

E[S²] = s² = (1/N²) ∑_{i=1}^{N} ∑_{j=1}^{N} γ(x_i − x_j).  (10)

Moreover, for second-order stationary random functions, the expected value of the sample variance (10) is also equivalent to

E[S²] = s² = C(0) − (1/N²) ∑_{i=1}^{N} ∑_{j=1}^{N} C(x_i − x_j).  (11)

Hence, starting from the equivalence in (11) the following relation can be derived: s² < C(0). In addition, if the sample locations are far apart, which means that the regionalized variables are uncorrelated, the expectation of the sample variance is equal to

E[S²] = s² = (1 − 1/N) C(0),  (12)

where the difference between s² and C(0) vanishes as the sample size increases.
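The equivalence of (8) and (9) is easy to verify numerically, as in the brief sketch below; the data values are arbitrary and serve only as an illustration.

import numpy as np

z = np.array([3.1, 2.7, 4.0, 3.3, 2.9, 3.8])   # hypothetical z(x_i) values
N = z.size

# Eq. (8): sample variance around the sample mean
s2_mean = np.sum((z - z.mean())**2) / N

# Eq. (9): half the average squared difference over all ordered pairs
s2_pairs = np.sum((z[:, None] - z[None, :])**2) / (2 * N**2)

print(np.isclose(s2_mean, s2_pairs))   # True: the two expressions coincide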
Estimation Variance
It is well known that in geostatistics the estimation of a regionalized variable at unsampled spatial locations, obtained through different techniques, is a relevant aspect. An estimation error occurs in every estimation technique, due to the fact that the quantity to be estimated generally differs from its estimate. Given a random function Z over a spatial domain V, Z takes its values Z(x) on small regular-shaped areas v centered on x, and if the support v is a point its area is negligible with respect to V. Assuming that the true value z(x_i) is estimated through ẑ(x_i), the estimation error is equal to r(x_i) = ẑ(x_i) − z(x_i). Then, the error r(x_i) is a particular realization of the random variable R(x_i) = Ẑ(x_i) − Z(x_i), i = 1, . . . , N. Note that if the random field Z is stationary, R(x) is also stationary; moreover,
since R(x) is a random variable, it is relevant to evaluate its first- and second-order moments (Journel and Huijbregts 1978), which are:

• the expected value E[R(x)] = m_E, ∀x;
• the variance (or estimation variance) Var[R(x)] = E[R(x)²] − m_E² = σ²_E, ∀x.

The estimation of the above-mentioned moments is required since a good estimation technique ensures a mean error equal to zero and the smallest estimation variance. In particular, the estimator Ẑ must be a function of the Z(x_i),

Ẑ(x) = f{Z(x_1), Z(x_2), . . . , Z(x_N)},

and it is such that:

• The unbiasedness condition is satisfied, i.e., E[R(x)] = E[Ẑ(x) − Z(x)] = 0.
• It is possible to easily compute the estimation variance, i.e., Var[R(x)] = σ²_E = Var[Ẑ(x) − Z(x)].

For any function f(·), the calculation of the expected value and of the variance of the random variable R(·) requires that the distribution of Z(x_i), i = 1, . . . , N, is known. Nevertheless, since it is generally not possible to infer this distribution from a unique realization of Z(x), Ẑ is usually chosen in the class of the linear estimators (Journel and Huijbregts 1978):

Ẑ = ∑_{i=1}^{N} λ_i Z(x_i).  (13)

Hence, given a stationary random field Z, if Ẑ is chosen in the class of the linear estimators, the unbiasedness condition is ensured if ∑_{i=1}^{N} λ_i = 1, and the estimation variance (Matheron 1963, 1965) corresponds to

σ²_E = 2γ̄(v, V) − γ̄(V, V) − γ̄(v, v),  (14)

or, more specifically, can be calculated as follows:

σ²_E = 2 ∑_{i=1}^{N} λ_i γ̄(x_i, V) − γ̄(V, V) − ∑_{i=1}^{N} ∑_{j=1}^{N} λ_i λ_j γ(x_i, x_j),  (15)

where

• γ̄(x_i, V) = (1/V) ∫_V γ(x_i, x) dx is the average semivariogram between the sampling point x_i and all the other points in the domain V;
• γ̄(V, V) = (1/V²) ∫_V ∫_V γ(x, x′) dx dx′ is the average semivariogram over V.

From (14) or (15) it is possible to conclude that σ²_E depends not only on the shape and size of the area under study and on the relative locations of the sample data within V, but also on the spatial correlation structure γ. Hence, σ²_E is strictly related to the model chosen for γ, keeping the area V and the locations of the sample data constant. Note that, by minimizing the estimation variance σ²_E, the Ordinary Kriging (OK) estimator is obtained (Stein 1999). Hence, in the class of the linear weighted average estimators, kriging provides the best linear unbiased estimator (B.L.U.E.), since the optimal weights λ_i minimize the estimation variance σ²_E under the constraint ∑_{i=1}^{N} λ_i = 1. It is well known that different forms of kriging are available in the literature and each type is characterized by a specific estimation variance. In the contributions of Journel and Huijbregts (1978) and Journel (1977) the hierarchy among the estimation variances is detailed, that is σ²_EDK ≤ σ²_ESK ≤ σ²_EOK ≤ σ²_EUK, where σ²_EDK is the estimation variance associated with disjunctive kriging, σ²_ESK is the one associated with simple kriging, while σ²_EOK and σ²_EUK refer to the estimation variance in the case of ordinary and universal kriging, respectively.
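As a rough numerical illustration of (15), the sketch below approximates the average semivariograms γ̄(x_i, V) and γ̄(V, V) by discretizing a square domain on a grid; the spherical semivariogram model, the sample locations, and the equal weights λ_i = 1/N are illustrative assumptions rather than values from the text.

import numpy as np

def gamma_spherical(h, c=1.0, a=50.0):
    # spherical semivariogram with sill c and range a (zero nugget)
    h = np.asarray(h, dtype=float)
    return np.where(h < a, c * (1.5 * h / a - 0.5 * (h / a)**3), c)

# domain V: a 100 m x 100 m square discretized into grid points
gx, gy = np.meshgrid(np.linspace(0, 100, 41), np.linspace(0, 100, 41))
V = np.column_stack([gx.ravel(), gy.ravel()])

# hypothetical sample locations and equal weights lambda_i = 1/N
xs = np.array([[20.0, 20.0], [80.0, 25.0], [50.0, 70.0], [30.0, 85.0]])
lam = np.full(len(xs), 1.0 / len(xs))

dist = lambda A, B: np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

gbar_xV = gamma_spherical(dist(xs, V)).mean(axis=1)   # average gamma(x_i, V)
gbar_VV = gamma_spherical(dist(V, V)).mean()          # average gamma over V x V
gamma_xx = gamma_spherical(dist(xs, xs))              # gamma(x_i, x_j)

# Eq. (15): estimation variance for the chosen weights
sigma2_E = 2 * lam @ gbar_xV - gbar_VV - lam @ gamma_xx @ lam
print(sigma2_E)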
Dispersion Variance
The term dispersion variance, as clarified in Journel and Huijbregts (1978), is used in geostatistics to denote:

• The dispersion around the mean value of a set of data collected within a domain V. This dispersion increases as the dimension of the spatial domain increases.
• The dispersion within a fixed domain V with respect to the support v. This dispersion decreases as the support v on which each datum is defined increases.

The dispersion variance of the support v within the spatial domain V also depends on the variogram:

D²(v|V) = γ̄(V, V) − γ̄(v, v).  (16)
From the relationship in (16) it is evident that as the support v increases, γ̄(v, v) increases, hence the dispersion variance D²(v|V) decreases, keeping the other factors constant. For the extreme case in which the spatial unit is a point (v = 0), the dispersion variance in terms of the variogram corresponds to

D²(0|V) = γ̄(V, V).  (17)

It is important to note that the dispersion variances in (16) and (17) are linked by the following additive relationship:

D²(0|V) = D²(0|v) + D²(v|V).

This property explains the drop of the variance when the support changes from a quasi-point sample support to a larger support of interest:

D²(0|V) − D²(v|V) = D²(0|v) = γ̄(v, v).

Summary
In spatial statistics the second-order moments of a random variable Z(x) at the location x are always evaluated. In particular, the variance measures the dispersion of the random variable around its expected value. In this entry, the first- and second-order moments of Z(·) are provided and the central role of the variance in the geostatistical context is highlighted.
Cross-References ▶ Kriging ▶ Stationarity ▶ Variogram
References
Chilès JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty, 2nd edn. Wiley, Hoboken. 734 pp
Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics. Oxford University Press, New York. 561 pp
Journel AG (1977) Kriging in terms of projections. J Int Assoc Math Geol 9:563–586
Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic, London/New York. 600 pp
Matheron G (1963) Principles of geostatistics. Econ Geol 58(8):1246–1266
Matheron G (1965) Les variables régionalisées et leur estimation: une application de la théorie des fonctions aléatoires aux sciences de la nature. Masson, Paris
Schott JR (2018) Matrix analysis for statistics, 3rd edn. Wiley, Hoboken. 552 pp
Stein ML (1999) Interpolation of spatial data: some theory for kriging. Series in statistics. Springer, New York. 247 pp
Variogram

Behnam Sadeghi
EarthByte Group, School of Geosciences, University of Sydney, Sydney, NSW, Australia
Earth and Sustainability Research Centre, University of New South Wales, Sydney, NSW, Australia

Definition
A variogram is a tool to describe the spatial continuity of data. When a number of data samples are available, the analyst can use a variability measure between each pair of samples at different distances to calculate the experimental variogram function and fit the most suitable variogram model type to it (Deutsch and Journel 1998).

Introduction
Geostatistics combines the disciplines of geology and statistics. Its main focus is on spatial and spatiotemporal datasets in various fields of the geosciences and engineering. One of the significant and basic concepts in geostatistics is the variogram, its different types, and variography. A variogram is a function that describes the spatial continuity of data using the variability between sample points located at different distances (lags) from each other. In practice, it helps to find the largest distance within which samples are spatially related and influence each other: closer samples show lower variabilities, i.e., stronger spatial association, whereas more distant samples show higher variabilities, i.e., weaker spatial association. The final interpolation is implemented only on those samples that are situated within that effective range.
Samples and Datasets
In the geosciences, various types of samples may be available in 2D (on the ground) or 3D (underground and subsurface), such as litho, soil, till, rock chip, and stream sediment samples. Depending on the study for which the samples are collected, a specific target value, i.e., the data, is sought. For instance, in geochemical studies, that value could be the concentration of the target elements in the various samples.
Interpolation
In geological studies, the number of samples collected is generally limited, owing to lack of access to parts of the study area (e.g., mountainous terrain) and to budget constraints, which are always critical. Based on expert opinion and on the exploration level and scale (e.g., reconnaissance, regional, local, and detailed exploration), the sampling network may be regular or irregular, dense or sparse, and the distance between samples may range from a few meters to more than kilometers. To generate a final continuous map from the sample values, and thereby provide a better picture of the mineral deposits, their concentrations, and their trends, the values in the unsampled areas must also be estimated on the basis of the sampled ones. To do so, interpolation methods are applied. A group of multivariate interpolation methods is applicable in geostatistical analysis, depending on the type of samples and studies. Some of the more conventional methods are bilinear interpolation and nearest-neighbor interpolation, which are mainly based on the distance between samples. Generally, in geochemical surface-map generation and exploration projects, weighted moving average (WMA) interpolation techniques have received considerable attention. Inverse distance weighting (IDW) and kriging are among the most common WMA techniques (Carranza 2009). The IDW technique considers both the distances between samples and the spatial effect of samples on each other as a function of that distance (assigned as a weight). Simply put, closer samples have higher effects (larger weights) on each other than distant samples (Fig. 1). This interpolation technique is mostly applicable to 2D modeling/mapping; in 3D modeling it has some limitations, which are outside this entry's scope. To apply this technique, knowledge of the point uni-element concentration data is needed, as it helps with better adjustment of the "moving average" window (or kernel) size and of the distance-weight parameters (Carranza 2009).
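A minimal sketch of the IDW weighting idea follows; the sample coordinates, values, and power parameter are hypothetical and serve only to illustrate how distance-based weights are assigned.

import numpy as np

def idw_estimate(x0, samples, values, power=2.0):
    # inverse distance weighted estimate at location x0
    d = np.linalg.norm(samples - x0, axis=1)
    if np.any(d == 0):                       # x0 coincides with a sample location
        return values[np.argmin(d)]
    w = 1.0 / d**power                       # closer samples get larger weights
    return np.sum(w * values) / np.sum(w)    # weighted moving average

samples = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
values = np.array([1.2, 2.5, 0.8, 3.1])      # e.g., element concentrations
print(idw_estimate(np.array([3.0, 4.0]), samples, values))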
Variogram, Fig. 1 The simplified concept of IDW
The main point raised here is that we may have a local group of samples close together, in addition to several other samples at various distances from them. Can we use all those samples for the interpolation? Would that bias the final interpolation? Do the distant samples really have any spatial effect on the main group of samples? If so, how much? If not, considering the samples' values and their variances, which samples should effectively be involved in the estimation and which samples have no spatial effect? This means we need a distance range within which all samples are spatially related and suitable for interpolation, while the samples beyond that range have no spatial effect or association and will not be used for the interpolation (Fig. 2). To do so, we need a new concept or interpolation method. Kriging interpolation/estimation was developed based on such a search radius (Sagar et al. 2018; see the entry "Kriging"). Kriging, as a Gaussian process regression, is the most important interpolation method in geostatistics and provides the best linear unbiased estimation (BLUE) based on covariances (Deutsch and Journel 1998). It applies the Gauss–Markov theorem to demonstrate the independence between estimation and error. The main difference between kriging and regression is that kriging is applied to an individual realization of a single random field, whereas regression deals with multiple realizations of a multivariate dataset. To apply kriging interpolation (see the entry "Kriging"), or Gaussian simulation to generate more than a single scenario (i.e., realizations) based on the same data – see the entries "Simulation," "Sequential Gaussian Simulation," and "Uncertainty Quantification" – a variogram is required.
Variogram Main Principles
In Fig. 2, imagine that the distances between samples A and B and between samples B and C are 50 m and 100 m, respectively. The differences between the target element concentration values of the samples are available, too. In the same way, the distances between the other samples, with defined concentration values, are also calculated. Having all these details, the variogram is generated using the equation below (David 1982; Isaaks and Srivastava 1990; Journel and Huijbregts 2004; Chilès and Delfiner 2012; Pyrcz and Deutsch 2014):

γ(h) = (1/(2N(h))) ∑_{i=1}^{N(h)} [Z(x_i) − Z(x_i + h)]²,

where N(h) is the number of sample pairs separated by the (lag) distance h, and Z(x_i) and Z(x_i + h) are the variable values at locations separated by that distance. Strictly speaking, γ(h) is the semivariogram, i.e., half of the variogram, although it is generally called the variogram in the literature. The variogram is a tool to evaluate the dissimilarity of a quantitative value, i.e., a local variable, between two samples separated by the distance h. In most mineral deposits, such local variables are represented by element concentrations (a computational sketch of this calculation is given below).

Variogram, Fig. 2 Variogram calculation process

Figure 3 simply demonstrates the variogram's structure. In most mineralization settings and mineral deposits, the variogram does not start from 0, i.e., the variance at the origin is not zero. Such a difference is called the "nugget effect" (Bohling 2005). In general, the nugget effect represents any discontinuity at the origin, whatever its cause. It arises from sources of variance such as (Chilès and Delfiner 2012):

• A structure, microstructure, or "geological noise"; this happens when the range is less than the distance between the samples in the sampling network.
• Measurement or positioning errors.

Variogram, Fig. 3 Schematic variogram and its representative parts
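A compact sketch of how this experimental semivariogram can be computed from scattered samples is given below; the lag binning with a tolerance and the synthetic data are assumptions introduced only for illustration.

import numpy as np

def experimental_semivariogram(coords, z, lags, tol):
    # gamma(h) = (1 / 2N(h)) * sum of squared differences over pairs in each lag bin
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    dz2 = (z[:, None] - z[None, :])**2
    iu = np.triu_indices(len(z), k=1)        # each pair counted once
    d, dz2 = d[iu], dz2[iu]
    gamma = []
    for h in lags:
        mask = np.abs(d - h) <= tol
        gamma.append(0.5 * dz2[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(200, 2))   # hypothetical sample locations
z = rng.normal(size=200)                      # hypothetical concentrations
lags = np.arange(5, 55, 5)
print(experimental_semivariogram(coords, z, lags, tol=2.5))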
Variogram, Fig. 4 Various main types of variograms
The variogram fitted to the points typically rises gently until it reaches the sill, i.e., the variance. The distance at which this happens is called the "range," which represents the maximum distance over which samples spatially affect each other; beyond the range, their covariance is zero.
Main Types of Variogram In order to evaluate variograms, we need to study the standard variograms rather than the experimental ones. The main standard variogram types are (Fig. 4) (Remy et al. 2009):
1. Spherical variogram (Matheron 1963):

γ(h) = 0 for h = 0;
γ(h) = c · [1.5 (‖h‖/a) − 0.5 (‖h‖/a)³] for 0 < ‖h‖ ≤ a;
γ(h) = c for ‖h‖ > a;

where c represents the difference between the nugget effect and the sill, a is the range, and h is the lag. This equation is applicable when the nugget effect is zero; however, when it is not, its value needs to be added to the equation. This is the most widely used variogram in geostatistics.

2. Exponential variogram:

γ(h) = c · [1 − exp(−3‖h‖/a)]

3. Gaussian variogram:

γ(h) = c · [1 − exp(−3‖h‖²/a²)]

The covariance equivalent of the above models is defined as (Remy et al. 2009):

C(h) = C(0) − γ(h), with C(0) = 1.

It is always advisable to generate several variograms with different dip and azimuth values to define the optimum values of the sill and range using the different available scenarios (Deutsch and Journel 1998; Remy et al. 2009; Sadeghi 2020). This helps with the evaluation of the experimental variogram tolerance (Fig. 5). In other words, the variogram is calculated per principal direction after any coordinate and data transformation step. Such directions are represented by azimuth and dip angles. The resulting variogram tolerance is illustrated in Fig. 5.

Variogram, Fig. 5 Schematic explanation of the experimental variogram tolerance

Generally, in fitting any of the abovementioned variogram types, one condition should be taken into consideration (Goovaerts 1997; Remy et al. 2009):

nugget effect + sill = 1.

In the case of zonal mineralization, the variogram can have more than a single structure. In other words, it can have one nugget effect but several sills and ranges. Each structure can have its own individual variogram type. However, the sum of the nugget effect and all the sill values together should still be 1 when fitting the variogram.
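The three standard models above can be coded as short functions, as in the sketch below; treating the nugget as an optional additive term and fixing C(0) = 1 follow the conventions stated above, while the specific parameter defaults are arbitrary illustrative choices.

import numpy as np

def spherical(h, c=1.0, a=100.0, nugget=0.0):
    h = np.abs(np.asarray(h, dtype=float))
    g = np.where(h <= a, c * (1.5 * h / a - 0.5 * (h / a)**3), c)
    return np.where(h == 0, 0.0, nugget + g)

def exponential(h, c=1.0, a=100.0, nugget=0.0):
    h = np.abs(np.asarray(h, dtype=float))
    return np.where(h == 0, 0.0, nugget + c * (1 - np.exp(-3 * h / a)))

def gaussian(h, c=1.0, a=100.0, nugget=0.0):
    h = np.abs(np.asarray(h, dtype=float))
    return np.where(h == 0, 0.0, nugget + c * (1 - np.exp(-3 * h**2 / a**2)))

def covariance(h, model=spherical, **kw):
    # covariance equivalent C(h) = C(0) - gamma(h), with C(0) = 1
    return 1.0 - model(h, **kw)

h = np.array([0.0, 25.0, 50.0, 100.0, 150.0])
print(spherical(h), exponential(h), gaussian(h), sep="\n")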
Summary and Conclusions In projects such as mineral or petroleum exploration, the limited number of relevant geological samples on or under the ground has been always critical. This is because of some reasons such as lack of access to some areas, limited budget for further sampling and analysis, and of course limited time for further sampling and analysis. To study of the sample values as the representatives for sampled and unsampled areas, interpolation techniques such as IDW and various types of kriging have been developed and applied to generate estimated models and provide a better view of the
mineralization and the spatial continuity and connectivity of the sample data values. In order to apply the kriging interpolation method and, through that, quantify the related errors and the probabilities required for spatial and stochastic uncertainty quantification, (semi-)variograms have been introduced and applied as the main tools. Using variograms and variography, the range within which samples affect each other spatially is calculated; only the samples located within that range are then used for the interpolations. Such variograms can be studied for various azimuths and dips (e.g., 0°, 45°, 90°, 135°, and 180°) to provide optimum values of the sill, the nugget effect, the range, and other required statistics. Then, the interpolated models are generated based on such calculated values. The same details are also applied to simulate more than one scenario (i.e., realization) based on the same datasets, to provide both Bayesian and frequency frameworks and to calculate the stochastic and spatial uncertainties – see the entries "Uncertainty Quantification" and "Simulation."
Cross-References ▶ Kriging ▶ Sequential Gaussian Simulation ▶ Simulation ▶ Uncertainty Quantification
Bibliography
Bohling G (2005) Introduction to geostatistics and variogram analysis. Kansas Geological Survey, 20 p
Carranza EJM (2009) Geochemical anomaly and mineral prospectivity mapping in GIS. In: Handbook of exploration and environmental geochemistry, vol 11. Elsevier, Amsterdam, 368 p
Chilès JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty, 2nd edn. Wiley, Hoboken, New Jersey, USA, 726 p
David M (1982) Geostatistical ore reserve estimation, 1st edn. Elsevier Science, Amsterdam, 384 p
Deutsch CV, Journel AG (1998) GSLIB. Geostatistical software library and user's guide. Oxford University Press, New York, USA
Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York
Isaaks EH, Srivastava RM (1990) An introduction to applied geostatistics. Oxford University Press, New York, USA, 592 p
Journel AG, Huijbregts CJ (2004) Mining geostatistics. The Blackburn Press, Caldwell, New Jersey, USA, 600 p
Matheron G (1963) Traité de géostatistique appliquée, tome II, vol 2. Technip, Paris
Pyrcz MJ, Deutsch CV (2014) Geostatistical reservoir modeling. Oxford University Press, New York, NY
Remy N, Boucher A, Wu J (2009) Applied geostatistics with SGeMS (a user's guide). Cambridge University Press, Cambridge, UK, 264 p
Sadeghi B (2020) Quantification of uncertainty in geochemical anomalies in mineral exploration. PhD thesis, University of New South Wales
Sagar BSD, Cheng Q, Agterberg F (2018) Handbook of mathematical geosciences. Springer, Switzerland
Very Fast Simulated Reannealing Dionissios T. Hristopulos School of Electrical and Computer Engineering, Technical University of Crete, Chania, Crete, Greece
Abbreviations
ASA: Adaptive simulated annealing
SA: Simulated annealing
SQ: Simulated quenching
VFSR: Very fast simulated reannealing
Definition
Very fast simulated reannealing (VFSR) is an improved version of simulated annealing (SA). The latter is a global optimization method suitable for complex, non-convex problems. It aims to optimize an objective function ℋ(x) with respect to the D-dimensional parameter vector x = (x_1, . . ., x_D)⊤. Simulated annealing relies on random Metropolis sampling of the parameter space, which mimics the physical annealing process of materials. The annealing algorithm treats ℋ(x) as a fictitious energy function. The algorithm proposes moves which change the current state x, seeking the optimum state. The proposed states are controlled by an internal parameter which plays the role of temperature and controls the acceptance rate of the proposed states. Very fast simulated reannealing employs a fast temperature reduction schedule in combination with periodic resetting of the annealing temperature to higher values. In addition, it allows different temperatures for different parameters, and an adaptive mechanism which tunes the temperatures to the sensitivities of the objective function with respect to each parameter. These improvements allow fast convergence of the algorithm to the global optimum. For reasons of conciseness and without loss of generality, it is assumed in the following that the optimization problem refers to the minimization of the objective function ℋ(x).
Overview Annealing is a physical process which involves heat treatment and is used in metallurgy to produce materials (e.g., steel) with improved mechanical properties. It was presumably crucial in the manufacturing of Damascus steel swords which were famous in antiquity for their toughness, sharpness, and strength. The well-kept secrets of Damascus swordmaking were lost in time as the interest in swords as weapons waned. However, their legend survived in popular culture,
inspiring the references to “Valyrian steel” in the popular Game of Thrones saga. The secrets of the long-lost art were presumably re-discovered in 1981 by Oleg D. Sherby and Jeffrey Wadsworth at Stanford University. A key factor for the effectiveness of annealing is the heating process which enables the material to escape metastable atomic configurations (corresponding to local optima of the energy) and to find the most stable configuration that corresponds to the global minimum of the energy. Simulated annealing (SA) is a stochastic optimization algorithm developed by Scott Kirkpatrick and co-workers. It uses randomness in the search for the global minimum of ℋ(x). The randomness is introduced through (i) a proposal distribution which generates new proposal states xnew of the parameter vector and (ii) through probabilistic decisions whether to accept or reject the proposal states. In contrast, in deterministic optimization methods, every move is fully determined from the current state and the properties of the objective function. Simulated annealing belongs in a class of models inspired by physical or biological processes; these algorithms are known as meta-heuristics (see Ingber (2012) and references therein.) Physical annealing involves a heat treatment process which in SA is simulated using the Metropolis Monte Carlo algorithm (Metropolis et al. 1953). At every iteration of the SA algorithm, a random solution (state) xnew is generated for the parameters by local perturbation of the current state xcur. The proposed state is accepted and becomes the new current state according to an acceptance probability; the latter depends on the value of ℋ(xnew) compared to ℋ(xcur) and an internal SA “temperature” variable T. The new state is accepted if the proposal lowers the objective function. On the other hand, even states xnew such that ℋ(xnew) > ℋ(xcur) are assigned a non-zero acceptance probability which is higher for higher temperatures. The SA temperature is initially set to a problem-specific, user-defined high value which allows higher acceptance rates for sub-optimal states. This choice helps the algorithm to effectively explore the parameter space because it allows escaping from local minima of the objective function. The temperature is then gradually lowered in order to “freeze” the system in the minimum energy state. The function T(k), where k is an integer index, describes the evolution of temperature as a function of the “annealing time” k and defines the SA cooling schedule. This should be carefully designed to allow convergence of the algorithm to the global minimum (Geman and Geman 1984; Salamon et al. 2002). SA can be applied to objective functions with arbitrary nonlinearities, discontinuities, and noise. It does not require the evaluation of derivatives of the objective function, and thus it does not get stuck in local minima. In addition, it can handle arbitrary boundary conditions and constraints imposed on the objective function. On the other hand, SA requires
tuning various parameters, and the quality of the solutions practically achieved by SA depends on the computational time spent. SA provides a statistical guarantee for finding the (global) optimal solution, provided the correct combination of state proposal (generating) distribution and cooling schedule are used (Ingber 2012). The convergence of the algorithm is based on the weak ergodic property of SA which ensures that almost every possible state of the system is visited. Achieving the statistical guarantee can in practice significantly slow down classical SA, since a very gradual cooling schedule is required. To reduce the computational time, often a fast temperature reduction schedule is applied, i.e., T(k + 1) = cT(k), where 0 < c < 1, which amounts to simulated quenching (SQ) of the temperature. However, this exponential cooling schedule does not guarantee convergence to the global optimum. Very fast simulated reannealing (VFSR) is also known as adaptive simulated annealing (ASA) (Ingber 1989, 2012). ASA is the currently preferred term, while VFSR was used initially to emphasize the fast convergence of the method compared to the standard Boltzmann annealing approach. ASA employs a generating probability distribution for proposal states which allows an optimal cooling schedule. This setup enables the algorithm to explore efficiently the parameter space. Reannealing refers to a periodical resetting of the temperature to higher (than the current) value after a number of proposal states have been accepted. Then, the search begins again at the higher temperature. This strategy helps the algorithm to avoid getting trapped at local minima. Finally, ASA allows for different temperatures in each direction of the parameter space. The parameter temperatures determine the width of the generating distribution for each parameter, thus enabling the cooling schedule to be adapted according to the sensitivity of the objective function in each direction. ASA maintains SA's statistical guarantee of finding the global minimum for a given objective function, but it also features significantly improved convergence speed. It has been demonstrated that ASA is competitive with respect to other nongradient-based global optimization methods such as genetic algorithms and Taboo search (Chen and Luk 1999; Ingber 2012).
Methodology
The SA algorithm is a method for global – possibly constrained – optimization of general, nonlinear, real-valued objective functions ℋ(x), where x ∈ X ⊆ ℝ^D is a vector of parameters that takes values in the space X:

x* = arg min_{x ∈ X} ℋ(x), potentially with constraints on x.
The standard SA algorithm is based on the Markov chain Monte Carlo method (Geman and Geman 1984). It involves homogeneous Markov chains of finite length which are generated at progressively lower temperatures. The following parameters should thus be specified: (i) A sufficiently high initial temperature T0; (ii) a final “freezing” temperature Tf (alternatively a different stopping criterion); (iii) the length of the Markov chains; (iv) the procedure for generating a proposal state xnew “neighboring” the current state xcur; (v) the acceptance criterion which determines if the proposal state xnew is admitted; and (vi) a rule for temperature reduction (annealing schedule).
Boltzmann Simulated Annealing
The SA algorithm is initialized with a guess for the parameter vector x_0. The simulation proceeds by iteratively proposing new states x_new based on the Metropolis algorithm (Metropolis et al. 1953). These proposal states are generated by perturbing the current state x_cur. For example, in Boltzmann annealing, this can be done by drawing the proposal state x_new from the proposal distribution

g(x_new | x_cur) = (2πT)^(−D/2) exp(−‖x_new − x_cur‖² / 2T).

For every proposed move, the difference Δℋ = ℋ(x_new) − ℋ(x_cur) between the current state, x_cur, and the proposed state, x_new, is evaluated. If Δℋ < 0, the proposed state is accepted as the current state. On the other hand, if Δℋ > 0, the acceptance probability is given by the exponential expression P_acc = 1/[1 + exp(Δℋ/T)] (Salamon et al. 2002; Ingber 2012). The decision whether to accept x_new is implemented by generating a random number r ~ U(0, 1) from the uniform distribution between 0 and 1; if r ≤ P_acc, the proposed state is accepted (x_cur is updated to x_new), otherwise the present state is retained as the current state. The above procedure is repeated a number of times before the temperature is lowered. The acceptance rate is equal to the percentage of proposal states that are accepted. The initial temperature is selected to allow a high acceptance rate (e.g., ≈80%) so that the algorithm can move between different local optima. The temperature reduction is determined by the annealing schedule, which is a key factor for the performance of the SA algorithm. The annealing schedule depends on the form of the generating function used: to ensure that the global minimum of ℋ(x) is reached, it must be guaranteed that all states x ∈ X can be visited an infinite number of times during the annealing process. In the case of Boltzmann annealing, this condition
requires a cooling schedule no faster than logarithmic, i.e., T_k = T_0 ln k_0 / ln k, where k_0 ≤ k ≤ k_max. The integer index k counts the simulation annealing time, and k_0 > 1 is an arbitrary initial counter value (Ingber 2012). The final temperature, T_{k_max}, should be low enough to trap the objective function in the global optimum state.
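A minimal sketch of this basic Boltzmann scheme (Gaussian proposals, the acceptance rule above, and logarithmic cooling) is given below; the toy objective function and all numerical settings are arbitrary illustrative choices.

import numpy as np

def boltzmann_sa(H, x0, T0=10.0, k0=2, kmax=5000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x_cur = np.asarray(x0, dtype=float)
    h_cur = H(x_cur)
    x_best, h_best = x_cur.copy(), h_cur
    for k in range(k0, kmax):
        T = T0 * np.log(k0) / np.log(k)    # logarithmic cooling schedule
        x_new = x_cur + rng.normal(scale=np.sqrt(T) * step, size=x_cur.size)
        dH = H(x_new) - h_cur
        # downhill moves always accepted; uphill with probability 1/(1 + exp(dH/T))
        if dH < 0 or rng.random() < 1.0 / (1.0 + np.exp(min(dH / T, 50.0))):
            x_cur, h_cur = x_new, h_cur + dH
            if h_cur < h_best:
                x_best, h_best = x_cur.copy(), h_cur
    return x_best, h_best

H = lambda x: float(np.sum((np.asarray(x) - 3.0)**2))   # toy objective with minimum at (3, 3)
print(boltzmann_sa(H, x0=[0.0, 0.0]))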
Fast (Cauchy) Simulated Annealing
Boltzmann SA requires a slow temperature reduction schedule as determined by the logarithm function. In practical applications, an exponential schedule T_{k+1} = cT_k, where 0 < c < 1, is often followed. However, this fast temperature reduction enforces simulated quenching, which drives the system too fast toward the final temperature. The exponential annealing schedule offers computational gains, but it does not fulfill the requirement of weak ergodicity (infinite number of visits to each state). Hence, it does not guarantee convergence of SA to the global minimum. A fast annealing schedule which lowers the temperature according to T_k = T_0/k is suitable for the Cauchy generating distribution:

g(x_new | x_cur) = T / (‖x_new − x_cur‖² + T²)^((D+1)/2).
Fast cooling works in this case due to the “heavy tail” of the Cauchy distribution which carries more weight in the tail than the Gaussian distribution used in Boltzmann SA (Ingber 2012; Salamon et al. 2002). The resulting higher probability density for proposal states considerably different than the current state allows the algorithm to visit efficiently all the probable states in the parameter space.
Very Fast Simulated Reannealing
ASA includes three main ingredients similar to classical annealing: the generating distribution of proposal states, the acceptance probability, and the annealing schedule. ASA uses an acceptance temperature T_acc(k_a), which controls the acceptance rate of proposed moves, and a set of D temperatures {T_i(k_i)}_{i=1}^{D}, which control the width of the generating distribution for each parameter individually. In addition to these three ingredients, ASA includes a reannealing scheme which rescales all the temperatures after a certain number of steps so that they adapt to the current state of ℋ(x). Reannealing adjusts the rate of change of the annealing schedule independently for each parameter. This helps the algorithm adapt to changing sensitivities of the objective function as
it explores the parameter space, encountering points (in D-dimensional space) with very different local geometry, where ℋ may change rapidly with respect to some parameters but considerably more slowly with respect to others. The main steps of ASA are outlined in the text below and in Algorithm 1.

Generation of proposal states: If the parameters x_i are constrained within [A_i, B_i], where A_i and B_i are, respectively, the lower and upper bounds, the ASA proposal states are generated by means of the following steps (Ingber 1989; Chen and Luk 1999; Ingber 2012):

x_new = x_cur + Δx,  (1a)
Δx_i = y_i (B_i − A_i),  i = 1, . . . , D,  (1b)
y_i = sign(u_i − 1/2) T_i(k_i) [(1 + 1/T_i(k_i))^{|2u_i − 1|} − 1],  (1c)
where u_i ~ U[0, 1]  (1d)

is a random variable uniformly distributed between zero and one. Consequently, the random variable y_i is centered around zero and takes values in the interval [−1, 1]. The integer indices k_i are used to count time. Certain values of y_i can yield proposed parameters outside the range [A_i, B_i]; these values should be discarded (Ingber 1989). Lower values of the temperature force y_i to concentrate around zero, while high temperature values lead to an almost uniform distribution of y_i between −1 and 1.

Acceptance probability: The acceptance probability of a proposed state is given by (Chen and Luk 1999)

P_acc = 1 / [1 + e^{Δℋ/T_acc(k_a)}].  (2)
In Eqs. (1) and (2), the set of integer indices {k_i}_{i=1}^{D} ∪ {k_a} represents different annealing times. ASA uses one time index per parameter, so that the reannealing process can adjust the annealing time differently for each parameter. This multidimensional annealing schedule enables adaptation of the proposal states to different sensitivities of ℋ(x) with respect to the parameters x_i, i = 1, . . ., D.

Reannealing: Every time a number N_acc of proposal states have been accepted, reannealing is performed. This procedure adjusts the annealing temperatures and times to the local geometry of the parameter space. The sensitivities {s_i}_{i=1}^{D} are calculated as follows:

s_i = ∂ℋ(x)/∂x_i ≈ [ℋ(x + a e_i) − ℋ(x)] / a,  i = 1, . . . , D,  (3)
Very Fast Simulated Reannealing, Algorithm 1 ASA main steps. The integers n and N count the accepted and generated states, respectively. Ns: # proposed states between annealing steps. Nacc: # accepted proposals between reannealing steps.
Input: Initialize: x_0 ∈ X, T_acc(0) ← ℋ(x_0), k_i ← 1, k_a ← 1, x_cur ← x_0, T_i(0) ← 1, n ← 0, N ← 0
1   while termination condition is not met do   (annealing loop)
2       Generate proposal state x_new using Eq. (1);
3       N ← N + 1 (increase generated-states counter);
4       if ℋ(x_new) < ℋ(x_cur) then
5           Update current state x_cur ← x_new;
6           n ← n + 1 (increase accepted-states counter)
7       else
8           Calculate acceptance probability P_acc from Eq. (2);
9           if P_acc > r ~ U(0, 1) then
10              Update current state x_cur ← x_new; n ← n + 1 (increase accepted-states counter)
11          end
12      end
13      if n > N_acc then
14          Reannealing based on Eqs. (3)–(5);
15          n ← 0 (reset accepted-states counter)
16      else
17          Go to Step 2
18      end
19      if N > N_s then
20          Annealing based on Eq. (6);
21          N ← 0 (reset generated-states counter)
22      end
    end
where x is the current optimal parameter vector, a is a small increment, and e_i is a D-dimensional unit vector in the i-th direction of parameter space, i.e., {e_i}_k = δ_{i,k} (with δ_{i,k} = 1 for i = k and δ_{i,k} = 0 for i ≠ k being the Kronecker delta). Reannealing adjusts the temperatures and annealing times as follows (← implies assignment of the value on the right side to the variable appearing on the left side):

T_i(k_i) ← (s_max / s_i) T_i(k_i),  s_max = max_{i=1,...,D} s_i,  (4a)

k_i ← [ (1/c) log( T_i(0) / T_i(k_i) ) ]^D,  i = 1, . . . , D,  (4b)
where c > 0 is a user-defined parameter that adjusts the rate of reannealing and Ti(0) is usually set to unity (Chen and Luk 1999). Similarly, the acceptance temperature is rescaled according to
T_acc(k_a) ← ℋ(x),  T_acc(0) ← ℋ(x_a),  (5a)

k_a ← [ (1/c) log( T_acc(0) / T_acc(k_a) ) ]^D,  (5b)
where xa is the last accepted state. The reannealing procedure enables ASA to decrease the temperatures Ti along the high-sensitivity directions of ℋ(x), thus allowing smaller steps in these directions, and to increase Ti along the low-sensitivity directions, thus allowing larger jumps. Annealing: After a number of steps Ns have been completed, the annealing procedure takes place. The annealing times increase by one, and the annealing temperatures are modified according to Eq. (6). The annealing schedules for the temperature parameters follow the stretched exponential expression
T_acc(k_a) = T_acc(0) exp(−c k_a^{1/D}),

T_i(k_i) = T_i(0) exp(−c k_i^{1/D}),  i = 1, . . . , D,  (6)
where T_i(k_i) is the current temperature for the i-th parameter (Ingber 1989).

Termination: The above operations are repeated until the algorithm terminates. Different termination criteria can be used in ASA depending on the available computational resources and a priori knowledge of ℋ. Such criteria include the following: (i) an average change of ℋ(x) (over a number of accepted proposal states) lower than a specified tolerance; (ii) exceedance of a maximum number of iterations (generated proposal states); (iii) exceedance of a maximum number (usually proportional to D) of ℋ evaluations; (iv) exceedance of a maximum computational running time; and (v) attainment of a certain target minimum value for ℋ.

Initialization: An initial value, T_acc(0), should be assigned to the acceptance parameter. One possible choice is to set T_acc(0) = ℋ(x_0), where x_0 is the random initial parameter vector (Chen and Luk 1999). A more flexible approach matches T_acc(0) with a selected acceptance probability, e.g., P_acc = 0.25 (Iglesias-Marzoa et al. 2015). The optimal values of N_s and N_acc do not seem to depend crucially on the specific problem (Chen and Luk 1999). Often an adequate choice for N_acc is on the order of tens or hundreds and for N_s on the order of hundreds or thousands. Higher values may be necessary to adequately explore high-dimensional and topologically complex parameter spaces. A good choice for the annealing rate control parameter is c ≈ 1–10. In general, higher values lead to faster temperature convergence at the risk that the algorithm may get stuck near a local minimum. Lower values lead to slower temperature reduction and therefore increase the computational time. Given ASA's adaptive ability, the choices for c, N_acc, and N_s do not critically influence the algorithm's performance, but they have an impact on the computational time. For problems where ASA is used for the first time, it is a good idea to experiment with different choices (Iglesias-Marzoa et al. 2015).
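To make the workflow of Algorithm 1 concrete, the simplified sketch below combines proposal generation via Eq. (1), the acceptance rule (2), and the annealing schedule (6); reannealing (Eqs. (3)–(5)) is omitted for brevity, out-of-range proposals are clipped rather than discarded, and the bounds, test function, and control settings are illustrative assumptions rather than recommended values.

import numpy as np

def asa_sketch(H, A, B, c=2.0, Ns=200, n_anneal=300, seed=0):
    rng = np.random.default_rng(seed)
    A, B = np.asarray(A, float), np.asarray(B, float)
    D = A.size
    x_cur = rng.uniform(A, B)
    h_cur = H(x_cur)
    x_best, h_best = x_cur.copy(), h_cur
    Tacc0 = max(abs(h_cur), 1e-8)            # T_acc(0) <- H(x_0), kept positive
    Ti0 = np.ones(D)                          # T_i(0) <- 1
    for k in range(n_anneal):
        Ti = Ti0 * np.exp(-c * k**(1.0 / D))       # Eq. (6): parameter temperatures
        Tacc = Tacc0 * np.exp(-c * k**(1.0 / D))   # Eq. (6): acceptance temperature
        for _ in range(Ns):
            u = rng.uniform(size=D)
            # Eq. (1c): heavy-tailed generating variable y_i in [-1, 1]
            y = np.sign(u - 0.5) * Ti * ((1.0 + 1.0 / Ti)**np.abs(2 * u - 1) - 1.0)
            x_new = np.clip(x_cur + y * (B - A), A, B)   # Eqs. (1a)-(1b), clipped to bounds
            dH = H(x_new) - h_cur
            # Eq. (2): logistic acceptance probability (exponent capped for stability)
            if dH < 0 or rng.random() < 1.0 / (1.0 + np.exp(min(dH / Tacc, 50.0))):
                x_cur, h_cur = x_new, h_cur + dH
                if h_cur < h_best:
                    x_best, h_best = x_cur.copy(), h_cur
    return x_best, h_best

H = lambda x: np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)   # Rastrigin-type test function
print(asa_sketch(H, A=[-5.0, -5.0], B=[5.0, 5.0]))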
determining the radial velocities of binary star systems and exoplanets is given in Iglesias-Marzoa et al. (2015). Software Implementations In Matlab, SA is implemented by means of the command simulannealbnd. In R, the packages optimization and optim_sa provide SA capabilities. In Python, this functionality can be found in the scipy.optimize module. A C-language code for ASA (VFSR) developed by Lester Ingber is available from his personal website.
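In Python, for instance, scipy.optimize.dual_annealing provides a closely related generalized simulated annealing (not ASA itself); the toy objective and bounds in the sketch below are arbitrary and only illustrate the typical calling pattern.

import numpy as np
from scipy.optimize import dual_annealing

def objective(x):
    # toy multimodal function; the global minimum is at (0, 0)
    return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)

bounds = [(-5.12, 5.12), (-5.12, 5.12)]
result = dual_annealing(objective, bounds, seed=42)
print(result.x, result.fun)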
Summary and Conclusions ASA (VFSR) is a computationally efficient implementation of the simulated annealing algorithm which employs a fast (stretched-exponential) annealing schedule in combination with a reannealing program which periodically resets the temperature. ASA is an adaptive algorithm: both the annealing and reannealing schedules take into account the sensitivity of the objective function in different directions of parameter space. Thanks to this adaptive ability, ASA is an effective stochastic global optimization method which has been successfully applied to complex, nonlinear objective functions that may involve many local minima. Estimates of parameter uncertainty can be obtained by means of the Fisher matrix and Markov chain Monte Carlo methods. ASA converges faster than the classical SA algorithm, it allows each parameter to adapt individually to the local topology of the objective function, and it involves a faster schedule for temperature reduction (Iglesias-Marzoa et al. 2015). The counterbalance of these advantages is that ASA is more complex than the classical SA algorithm, it involves more internal parameters, and the stretched exponential expression for temperature reduction requires using double precision arithmetic and checking of the exponents to avoid numerical problems. Overall, ASA has found several applications in the geosciences (Sen and Stoffa 1996; Pyrcz and Deutsch 2014; Hristopulos 2020).
Cross-References Applications An introduction to SA for spatial data applications is found in Hristopulos (2020). Geostatistical applications are reviewed in Pyrcz and Deutsch (2014). An informative review of ASA (VFSR) is given in Ingber (2012), while a mathematical treatment of SA is presented in Salamon et al. (2002). ASA has been successfully used for geophysical inversion (Sen and Stoffa 1996). The ASA algorithm and its application in various signal processing problems are reviewed in Chen and Luk (1999). A detailed investigation of ASA’s application for
▶ Markov Chain Monte Carlo ▶ Optimization in Geosciences ▶ Simulated Annealing ▶ Simulation
Bibliography Chen S, Luk BL (1999) Adaptive simulated annealing for optimization in signal processing applications. Signal Process 79(1):117–128. https://doi.org/10.1016/S0165-1684(99)00084-5
Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6(6):721–741. https://doi.org/10.1109/TPAMI.1984.4767596
Hristopulos DT (2020) Random fields for spatial data modeling: a primer for scientists and engineers. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-1918-4
Iglesias-Marzoa R, López-Morales M, Arévalo Morales MJ (2015) The rvfit code: a detailed adaptive simulated annealing code for fitting binaries and exoplanets radial velocities. Publ Astron Soc Pac 127(952):567–582. https://doi.org/10.1086/682056
Ingber L (1989) Very fast simulated re-annealing. Math Comput Model 12(8):967–973. https://doi.org/10.1016/0895-7177(89)90202-1
Ingber L (2012) Adaptive simulated annealing. In: Hime A, Ingber L, Petraglia A, Petraglia MR, Machado MAS (eds) Stochastic global optimization and its applications with fuzzy adaptive simulated annealing. Springer, Heidelberg, pp 33–62. https://doi.org/10.1007/978-3-642-27479-4
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092. https://doi.org/10.1063/1.1699114
Pyrcz MJ, Deutsch CV (2014) Geostatistical reservoir modeling, 2nd edn. Oxford University Press, New York
Salamon P, Sibani P, Frost R (2002) Facts, conjectures, and improvements for simulated annealing. SIAM monographs on mathematical modeling and computation. SIAM, Philadelphia. https://doi.org/10.1137/1.9780898718300
Sen MK, Stoffa PL (1996) Bayesian inference, Gibbs' sampler and uncertainty estimation in geophysical inversion. Geophys Prospect 44(2):313–350. https://doi.org/10.1111/j.1365-2478.1996.tb00152.x
Virtual Globe Jaya Sreevalsan-Nair Graphics-Visualization-Computing Lab, International Institute of Information Technology Bangalore, Electronic City, Bangalore, Karnataka, India
Definition
While data visualization is widely used for exploring and visually comprehending the data, virtual reality (VR) takes it to a higher level of immersive experience. VR provides an artificially simulated environment in which the user can be part of the environment and interact with it, as permitted by the VR application. VR has been used as an extension to several graphical user interface (GUI) applications, but one of the most relevant applications is representing the three-dimensional (3D) model of the planet Earth itself. This 3D representation of the Earth extends to a four-dimensional (4D) one whenever spatiotemporal data is used. Such an application allows the user to explore and experience the environment and to edit the data loaded to visualize the world, representing the natural and man-made artifacts. Such a VR application is referred to as the virtual world
(Yu and Gong 2012). An example of a virtual globe that has popularized this technology in recent times is the Google Earth VR.1
Overview The history of virtual globes is not very old. The Geoscope was built as a large spherical display that entailed modeling and simulating a virtual environment of the Earth using computers, by the architect Buckminster Fuller in the 1960s (Yu and Gong 2012). This is considered as the early beginnings of the virtual globe. This has been further conceptualized by Al Gore in the early 1990s as the “Digital Earth” to serve as a digital access point for massive scientific as well as human-centric data. After a hiatus, by the late 1990s, the advent of high-performance graphics processor unit (GPU) technology, availability of high-resolution satellite images, and high-speed Internet connectivity paved the way to actual implementations of the virtual globes. The popular virtual globes released by the prominent technology as well as space organizations include the Encarta Virtual Globe 98 and Bing Maps Platform (Microsoft), 3D World Atlas (Cosmi), WorldWind (NASA), Google Earth (Google), and ArcGIS Explorer (ESRI). There are also several free virtual globe software, such as Earth3D and Marble, where the latter is open source too. The virtual globe must be distinguished from a data visualization tool with a GUI involving 3D rendering of the globe. An example of such a browser-based visualization tool with 3D animated map for visualization of global weather conditions is the Earth Nullschool.2 While a virtual globe is a special visualization tool of the planet, all visualization tools depicting the Earth cannot be considered as virtual globes. This is because virtual globes involve VR where the user is immersed in the virtual environment, which is achieved by using simulations of the camera orientation and position in the graphical application. The virtual globes have also been referred to as “geobrowsers” (Yu and Gong 2012) owing to their function of exploring the Earth, which is analogous to the exploration of the Internet by the Web browsers. In a virtual globe, the worldwide geographic maps can be graphically rendered using different viewpoints and geographical projection systems (Yang et al. 2018). It can be set up in one of the four ways – (i) the conventionally used exocentric 3D globe, where the viewer sees from outside of the globe; (ii) a 2D flat map of the globe projected onto a plane; (iii) an egocentric 3D globe, where the viewer is immersed in the environment; or (iv) a curved map of the
1. https://arvr.google.com/earth/
2. https://earth.nullschool.net/about.html
globe. The curved map is a projection on a section of a sphere, giving the appearance of an opera stage or a theater. Extensive user study has shown that the most accurate responses in the analytical tasks using the virtual globe, as well as in direction estimation, have been obtained in the case of the exocentric virtual globe. Motion sickness is a known issue users face in VR applications. The user study also revealed that the motion sickness was least felt in the cases of exocentric globe and flat maps. Exocentric globe, where the viewer sees the world as from space, is the most intuitive, and egocentric globe, where the viewer is in the center of the sphere, which renders the globe in its inner concave walls, is the least intuitive. The egocentric view provided the best immersive experience, but the perceptual distortion of the familiar convex (spherical) globe to concave walls made it a poor choice for this specific VR application. Similarly, the flat map using Hammer map projection, i.e., equal-area projection, has been found to be a better fit for VR application compared to the spherical projection used in curved map. This study has been conducted using head-mounted displays (HMDs). HMDs and controllers are important accessories for virtual globes, and the technologies supported by the VR application are critical for its usability, e.g., Google Earth VR supports HMDs from HTC Vive and Oculus Rift, which are widely used HMDs. Figure 1 shows how an HMD and a hand controller are used for different settings of the Google Earth VR application.
Applications
Google Earth itself has served several purposes since its release in 2005 (Yu and Gong 2012). It has supported and advanced scientific research in domains such as Earth observation, natural hazards, and human geography. It has provided services for data visualization, collection, validation, integration, dissemination, modeling, and exploration, and has also served as a decision support system. Its merits include (i) support for global-scale research, owing to access to global data acquired from high-resolution satellite and aerial imagery through a single coordinate system, the World Geodetic System 1984 (WGS 84) datum, and (ii) easy-to-use, browser-based visualization through WebGL. Studying the evolution of Google Earth also points to a few gaps that need to be addressed to improve the uptake and usability of virtual globes. These include inconsistent data, owing to the nonuniform precision of satellite imagery and other Earth observation data; limited interoperability of the digital elevation model with other high-resolution topography data, e.g., data acquired with Light Detection and Ranging (LiDAR) sensors (Bernardin et al. 2011); and limited flexibility in analytical functionality tied to specific file formats (e.g., Keyhole Markup Language, KML), Web services, and APIs or software development kits (SDKs), e.g., NASA WorldWind.
The lessons learned from the Google Earth project have been incorporated into Google Earth VR, a more recent virtual globe launched in late 2016 (Käser et al. 2017). In addition to features to scale-and-fly, search, rotate, and teleport using the HMD and controllers (Fig. 1), Google Earth VR provides haptic and sound effects; e.g., when using the 3D Cone Drag gesture to grab the planet itself, the user can feel and hear the static friction that the Earth exhibits. In terms of performance, despite the need to render trillions of triangles, Google Earth VR achieves 90 fps (frames per second), a requirement for VR, through both hardware (GPU) and software optimizations. The improved interactivity of the virtual globe has also found application in facilitating public participation in urban planning (Wu et al. 2010). In an example where Google Earth is used as the virtual globe, a distributed system was designed to display urban planning information gathered from the virtual globe and to host discussion forums between urban planning designers and the general public. The public can explore the relevant geospatial data from a macro- to a micro-spatial (street) scale. The architecture of such a distributed system is based on Web Service technology, namely a Service-Oriented Architecture (SOA) in which the virtual globe is itself treated as a Web Service; it has three layers, the support, service, and client layers. Such a system has enabled the public to inspect an urban planning project at their own convenience and provide feedback to the planners. Additionally, the application uses the interoperable CityGML, a standard descriptive language for city models, which improves the system's extensibility; however, this also requires appropriate system capability to reduce the network latency incurred by transferring large XML files in CityGML. Overall, this example shows how virtual globes become relevant when integrated with other systems for pertinent applications. Demonstrating the diversity of applications, Crusta is a virtual globe for visualizing planetary-scale topography for geological studies (Bernardin et al. 2011). Crusta was proposed to address the challenges of visualizing digital elevation models (DEMs) of high spatial resolution (0.5-1 m² per sample) and large extent (>2000 km²). The globe is rendered geometrically using a 30-sided polyhedron, defined by base patches that can be subdivided into different resolutions in a pyramidal format. The subdivision accommodates input data of varying resolution, from global (NASA's BlueMarble) to local (tripod LiDAR) spatial scales. The use of the polyhedron resolves the issue of "pinching," or distortion, at the poles.
Virtual Globe, Fig. 1 Examples of the usage of head-mounted display (HMD) and hand controller by a viewer in a virtual globe application, namely, the Google Earth VR, and its corresponding visualizations. (Image courtesy: https://arvr.google.com/earth/)
Crusta implements dynamic shading for its graphical rendering, which has clearly revealed previous ruptures along the Owens Valley fault in eastern California. One of Crusta's strengths is that it provides visualizations with high-resolution topographic detail along with shading and exaggeration of the topography, in addition to the vertical exaggeration that all virtual globes provide. This is a use case of virtual globes for geological exploration and knowledge discovery. Archaeology is another area where virtual globes have practical applications, as they help preserve fragile artifacts by allowing remote exploration without the physical presence of human beings. Archaeological data involve spatiotemporal information and are hence 4D (De Roo et al. 2017). A usability test extending a 3D virtual globe to a 4D archaeological geographical information system (GIS) has demonstrated its feasibility. Testing the application with five archaeologists on a scenario of the excavation site of Kerkhove, Belgium, showed that the 4D virtual globe can be used for fieldwork preparation, report generation, and communication through knowledge sharing on the Internet. While the spatial and temporal dimensions were treated largely decoupled in this specific application, there is scope for tighter coupling between the two to improve querying as well as simulation in the combined 4D space. Altogether, virtual globes can be used in archaeology for site exploration, as well as for gaining insight into site evolution using 4D GIS. A similar use case of 4D geospace in a virtual globe is the visualization and assessment of geohazards (Havenith et al. 2019). The requirement here, for the assessment and analysis of spatiotemporal phenomena such as geohazards, is a combination of GIS and geological modeling, supplemented by interactive tools such as the freely available Google Earth. Virtual geographic environments (VGEs) were originally conceptualized and implemented to provide multidimensional spatiotemporal geographic analysis along with real-time collaboration between geoscientists. However, relatively low contributions by geoscientists to building VGEs did not lead to the intended fluid interactions and sharing of results, despite the existence of collaborative virtual environments (CVEs). In the past, CVEs have been used effectively for collaborative visualization and analysis of geospatial data, using mapping elements, interactive cooperative work, and semiotics. Against this backdrop, the development of VR now allows immersive visualization, adding a new dimension to VGEs, and the commoditization and advent of low-cost VR technology have promoted the use of virtual globes as VGEs. In addition to geological applications such as Crusta, virtual globes can now be extended with numerical models to simulate, visualize, and analyze natural hazards such as dam breaks, rock falls, etc. Such a system requires combining large-scale, high-resolution geophysical-
geological data, DEMs, textures from remote imagery, etc., with visualization support such as 3D stereo-visualization with an HMD for VR. While there is scope for extending Google Earth VR to the study of geohazards, more computing resources need to be integrated for solving the numerical models while maintaining the 90 fps frame rate required for VR rendering. As an application dealing with risk assessment, handling the epistemic and total uncertainty in geohazard assessment is a desirable feature of the 4D geospace depicted in a virtual globe; handling uncertainty therefore has to be considered a future development. Another critical feature of a fully integrated 3D geohazard modeling application is seamless interoperability of the virtual globe with several decision support systems. Apart from geospatial applications, virtual environments also play an instrumental role in geo-education (Shakirova et al. 2020). VR is considered one of the effective emerging technologies for STEM education, and the use of virtual globes in geoscience education is one example. A study testing the usability of Google Earth VR, Apple Maps (for "visiting" cities), My Way VR (for a virtual journey through countries and cultures), and VR Museum of Fine Arts (for visiting museums) for teaching and learning a physical geography course revealed that the experience of using virtual globes shifted aspects of e-learning such as gamification, interaction, and imitation of professional activity from being "desirable" to being "expected" by the students. This shows that the impact of virtual globes has been significant, leading to an increased demand for their application in e-learning. In addition to developing a pedagogy involving experiences of physical phenomena (climatic changes, natural phenomena) and world tourism, using VR in geo-education provides opportunities for creating content through international educational groups, thus improving the quality of global education. Altogether, the use of more advanced VGEs, through virtual globes, in geo-education promotes geographic research.
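To make the terrain-shading idea used by Crusta above a little more concrete, the following minimal Python sketch computes a simple hillshade for a DEM. It is an editorial illustration only: NumPy, the Horn-style gradient formulation, the synthetic Gaussian-ridge DEM, and the illumination parameters are assumptions of this sketch and are not part of Crusta or of any system described in this entry.

```python
import numpy as np

def hillshade(dem, cellsize=1.0, azimuth_deg=315.0, altitude_deg=45.0):
    """Shade a DEM (2D elevation array) with a single light source (one common convention)."""
    az = np.radians(azimuth_deg)
    alt = np.radians(altitude_deg)
    dzdy, dzdx = np.gradient(dem, cellsize)           # elevation change per map unit
    slope = np.arctan(np.hypot(dzdx, dzdy))           # steepest-descent angle
    aspect = np.arctan2(-dzdx, dzdy)                  # downslope direction
    shaded = (np.sin(alt) * np.cos(slope)
              + np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return np.clip(shaded, 0.0, 1.0)                  # 0 = shadow, 1 = fully lit

# Illustrative use on a synthetic ridge sampled at 10-m spacing (assumed values)
x, y = np.meshgrid(np.linspace(0, 1000, 101), np.linspace(0, 1000, 101))
dem = 50.0 * np.exp(-((x - 500.0) ** 2) / 2e4)        # a Gaussian ridge, in metres
shade = hillshade(dem, cellsize=10.0)
print(shade.shape, float(shade.min()), float(shade.max()))
```

Real systems add vertical exaggeration, multi-resolution tiling, and texture draping on top of this basic shading step.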
Future Scope
While specific challenges remain to be addressed in specialized applications, the future of virtual globe technology lies in its transition to open-source software (OSS) (Coetzee et al. 2020). Several virtual globe technologies are already available as free applications, Google Earth being one example. Going one step further, NASA WorldWind (https://worldwind.arc.nasa.gov/about/) is an example of an open-source virtual globe (Pirotti et al. 2017). WorldWind was originally developed for visualizing the NASA MODIS (Moderate Resolution Imaging Spectroradiometer) land products.
It enables visualization of multidimensional data with the help of a framework that uses a digital terrain model (DTM) as its base. It supports several applications involving local communities through GeoWeb 2.0, e.g., exploration of cultural heritage sites and virtual cities to promote world tourism. Like Google Earth, it was initially freely available, and it then transitioned to open source in the early 2000s. Open-source WorldWind has been made available in a variety of programming languages and for several operating systems: in C# since 2003, in Java since 2006, on Android since 2012, and in JavaScript since 2014. This has been made possible by publishing an SDK that developers can use to build their own software and GUIs, which are increasingly developed in the browser using HTML5 and JavaScript. The SDK supports several file formats for input data, e.g., GeoTIFF, JPG, and PNG for raster data, and ESRI Shapefiles, KML, and COLLADA for vector models. It also supports REST, WMS, and Bing Maps services for adding Web Service layers. Releasing the WorldWind software as open source has encouraged the developer community to build several relevant applications, e.g., the Environment Space and Time Web Analyzer (EST-WA), based on the netCDF file format, and PoliCrowd, a collaborative tourism/cultural heritage platform based on Java. Thus, the evolution of WorldWind itself and the engagement of a larger developer community have greatly promoted the collaborative use of virtual globes in several innovative applications. One near-future goal for improving the usability of virtual globes is to include them in OSGeo (open-source geospatial) software, which can open up further opportunities in benchmarking, technology maturation, and increased uptake in applications worldwide (Coetzee et al. 2020). Overall, the virtual globe continues to be a useful geoscientific exploratory and analytical tool offering an immersive experience.
Bibliography
Bernardin T, Cowgill E, Kreylos O, Bowles C, Gold P, Hamann B, Kellogg L (2011) Crusta: a new virtual globe for real-time visualization of sub-meter digital topography at planetary scales. Comput Geosci 37(1):75-85
Coetzee S, Ivánová I, Mitasova H, Brovelli MA (2020) Open geospatial software and data: a review of the current state and a perspective into the future. ISPRS Int J Geo Inf 9(2):90
De Roo B, Bourgeois J, De Maeyer P (2017) Usability assessment of a virtual globe-based 4D archaeological GIS. In: Advances in 3D geoinformation. Springer, pp 323-335
Havenith HB, Cerfontaine P, Mreyen AS (2019) How virtual reality can help visualise and assess geohazards. Int J Digital Earth 12(2):173-189
Käser DP, Parker E, Glazier A, Podwal M, Seegmiller M, Wang CP, Karlsson P, Ashkenazi N, Kim J, Le A et al (2017) The making of Google Earth VR. In: ACM SIGGRAPH 2017 talks, ACM, pp 1-2
Pirotti F, Brovelli MA, Prestifilippo G, Zamboni G, Kilsedar CE, Piragnolo M, Hogan P (2017) An open source virtual globe rendering engine for 3D applications: NASA WorldWind. Open Geospat Data Softw Stand 2(1):1-14
Shakirova N, Said N, Konyushenko S (2020) The use of virtual reality in geo-education. Int J Emerg Technol Learn 15(20):59-70
Wu H, He Z, Gong J (2010) A virtual globe-based 3D visualization and interactive framework for public participation in urban planning processes. Comput Environ Urban Syst 34(4):291-298
Yang Y, Jenny B, Dwyer T, Marriott K, Chen H, Cordeil M (2018) Maps and globes in virtual reality. Comput Graph Forum 37:427-438
Yu L, Gong P (2012) Google Earth as a virtual globe tool for earth science applications at the global scale: progress and perspectives. Int J Remote Sens 33(12):3966-3986

Vistelius, Andrey Borisovich
Stephen Henley
Resources Computing International Ltd, Matlock, UK

Fig. 1 Andrey Borisovich Vistelius (1915-1995). Modified after Henley S. (2018) Andrey Borisovich Vistelius. In: Daya Sagar B., Cheng Q., Agterberg F. (eds) Handbook of Mathematical Geosciences. Springer, Cham; direct link to the article (https://link.springer.com/article/10.1007/s13202-015-0213-7), used according to the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/)

Biography
Andrey Borisovich Vistelius was born in St. Petersburg, Russia, on 7 December 1915. His father Boris Vistelius was a lawyer before the October Revolution of 1917. Boris's father (Andrey's grandfather) was a senior civil servant in the Russian Empire. Relatives of Andrey's mother (the Bogaevsky family) included distinguished academics. There is no published information on Vistelius' early childhood and how he and his family fared during the
revolution and civil war. However, in 1933 Andrey entered Leningrad State University as a student. In 1935 the family was exiled from Leningrad, like many other intellectuals, first to a remote village in middle Russia and later to the city of Samara, and Andrey's education was interrupted. His studies in Leningrad were eventually resumed, and in 1939 he graduated brilliantly from the Department of Mineralogy. Geology and mathematics were his overwhelming passions, with an emphasis on practical applications. He was committed to evidence-based science, which in the prevailing atmosphere of "lysenkoism" was a barrier to his career progress. During World War II, Andrey Vistelius was trapped in besieged Leningrad; he was not enlisted into the army because of poor eyesight. His studies nevertheless continued, with the award of his "Candidacy" (equivalent to a PhD) in 1941 and of his Doctor of Science in 1948. He then worked in several state geological organizations, including as director of "expeditions" (responsible for regional geological mapping). To overcome continuing political obstacles, and with the support of mathematical Academicians, in 1961 a group headed by Vistelius was set up as the Laboratory of Mathematical Geology within the Leningrad Branch of the Steklov Institute of Mathematics (LOMI) of the USSR Academy of Sciences. In 1968, he was instrumental, with others, in founding the International Association for Mathematical Geology (IAMG) and was elected its first president. Although the Cold War prevented him from participating in many of IAMG's activities, he continued to work as a prolific researcher in Leningrad. Vistelius' scientific achievements, notably in process modeling and in the statistical testing of hypotheses against real data, were recognized by the IAMG, which awarded him the Krumbein Medal in 1980 and named a new award (for young scientists) after him. The Soviet authorities rejected this latter proposal on the grounds that such an honor should not be conferred upon a living person, so it was initially designated the President's Award; it was changed to the Vistelius Award, as originally intended, after his death in 1995.
Vistelius, Andrey Borisovich
In 1987 the Laboratory of Mathematical Geology was transferred to the Institute of Precambrian Geology and Geochronology, as a prelude to its conversion to an Institute in 1991 when the Russian Academy of Natural Sciences (RANS) was founded, and Andrey Vistelius was named an Honorary Member of this Academy. Andrey Borisovich Vistelius died on 12 September, 1995. He continued to work, lucid and inventive, to the end despite serious illness. In 1992, an English translation of his life’s work “Principles of Mathematical Geology” was published (Vistelius 1992) – a reworked version of his Russian monograph published in 1980. Altogether he had more than 200 published works (Henley 2018; Dech and Henley 2003; Romanova and Sarmanov 1970). Dech and Glebovitsky (2000) give a detailed account of the many fields in which the work of Vistelius advanced geological knowledge. A special issue of the Journal of Mathematical Geology (volume 35, number 4) in memory of Vistelius was published in 2003 with papers by many of his former colleagues, as well as one previously unpublished paper by Vistelius himself. The breadth of geoscientific subject matter and mathematical approaches shown by this collection of papers is ample illustration of the scientific legacy of Andrey Borisovich Vistelius.
Bibliography
Dech VN, Glebovitsky VA (2000) Establishment of mathematical geology in Russia: some brush strokes in the portrait of the founder of mathematical geology, Prof. A.B. Vistelius. Math Geol 32(8):897-918
Dech VN, Henley S (2003) On the scientific heritage of Prof. A.B. Vistelius. Math Geol 35(4):363-379
Henley S (2018) Andrey Borisovich Vistelius. In: Sagar BSD, Cheng Q, Agterberg F (eds) Handbook of mathematical geosciences: fifty years of IAMG. SpringerOpen, pp 793-812. https://doi.org/10.1007/978-3-319-78999-6
Romanova MA, Sarmanov OV (1970) Chapter 2: The published works of A. B. Vistelius. In: Romanova MA, Sarmanov OV (eds) Topics in mathematical geology. Springer, New York, pp 6-12
Vistelius AB (1992) Principles of mathematical geology. Kluwer Academic Publishers, Dordrecht/Boston/London, 477 p
Watson, Geoffrey S. Noel Cressie1 and Carol A. Gotway Crawford2 1 School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, NSW, Australia 2 Albuquerque, NM, USA
Fig. 1 By Grasso Luigi - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=68766582
Biography Geoffrey S. Watson was born in 1921 in Bendigo, a rural community in the state of Victoria, Australia. He was an Australian statistician who spent the majority of his career in North America (USA and Canada), with shorter periods in Australia and the UK. He received a Bachelor of Arts with Honours from the University of Melbourne (1942) and a PhD in statistics from North Carolina State University (1951). He wrote his PhD thesis while visiting Cambridge University in the UK and, while there, he worked with James Durbin of the
London School of Economics. Together they published companion papers in 1950 and 1951 in the journal Biometrika on what is now called the Durbin-Watson statistic. It is used to test for the presence of serial correlation in data observed at equal time intervals and is a fundamental component of most time series software packages. Generalizing from time series to spatial data brings attention to the variogram at its closest spatial lag: if this value is small compared to the sill, a null hypothesis of spatial independence is rejected. With fast computing, it is now possible to test for temporal (spatial) independence by computing the statistic for all permutations of the times (locations) of the data; from this null distribution, the percentile of the observed statistic can be obtained. Geof (as he was known and as he signed his letters) was a quintessential statistical scientist, curious about new ideas in science, with strong mathematical statistical skills and a gift for clear writing in the collaborating discipline. In the geosciences, his most influential work helped resolve the controversy between "fixism" and "mobilism," leading up to mobile-plate tectonic theory (continental drift). His research on statistics for directional data on the circle and the sphere allowed hypothesis testing on the paleomagnetism of rocks on different sides of the oceans. He is the author of an important monograph, Statistics on Spheres, published in 1983. He was also influential in bringing the work of Georges Matheron and his centers of geostatistics and mathematical morphology, located in Fontainebleau, France, to the English-speaking world; Watson spent a summer visiting Matheron in 1972. Much of his working life was spent at Princeton University. He arrived in 1970 to head the Department of Statistics, a position he held until 1985, and he retired from the university in 1992 as emeritus professor. While at Princeton, Geof became involved in energy and environmental issues such as estimating US oil and gas reserves, climate trends, and assessing the effects of air pollution on public health and the ozone layer.
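The permutation approach mentioned above can be sketched in a few lines of Python. This is an editorial illustration under stated assumptions: equally spaced residuals, the standard Durbin-Watson formula, a synthetic AR(1)-like series, and 2000 random permutations in place of the full permutation set.

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
# Residuals with positive serial correlation, purely for illustration
e = np.empty(200)
e[0] = rng.standard_normal()
for t in range(1, e.size):
    e[t] = 0.6 * e[t - 1] + rng.standard_normal()

d_obs = durbin_watson(e)
# Null distribution obtained by permuting the time order, as described in the text
d_null = np.array([durbin_watson(rng.permutation(e)) for _ in range(2000)])
p_value = np.mean(d_null <= d_obs)   # small d indicates positive serial correlation
print(round(d_obs, 2), round(p_value, 3))
```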
An autobiographical account of his life and research is given in two chapters of the edited book The Art of Statistical Science: A Tribute to G.S. Watson, published in 1992. Geof was an accomplished watercolor painter; from his retirement until his death in 1998 he showed his work in several solo shows in Princeton art galleries. His personality was unique, drawing on a palette of Australian, British, and US influences.
Cross-References ▶ Geostatistics ▶ Mathematical Morphology ▶ Matheron, Georges ▶ Spatial Statistics ▶ Variogram
Bibliography
Durbin J, Watson GS (1950) Testing for serial correlation in least squares regression, I. Biometrika 37:409-428
Durbin J, Watson GS (1951) Testing for serial correlation in least squares regression, II. Biometrika 38:159-178
Mardia KV (ed) (1992) The art of statistical science: a tribute to G.S. Watson. Wiley, Chichester
Watson GS (1983) Statistics on spheres. Wiley, New York
Wavelets in Geosciences Guoxiong Chen1 and Henglei Zhang2 1 State Key Laboratory of Geological Processes and Mineral Resources, China University of Geosciences, Wuhan, China 2 Institution of Geophysics and Geomatics, China University of Geosciences, Wuhan, China
Definition
Wavelets (or the wavelet transform) are a family of mathematical functions that can analyze nonstationary signals with both time and frequency resolution. The fundamental idea behind the wavelet transform is to analyze a signal according to scale, using scaled and shifted versions of a wavelet function. Like the Fourier transform, it decomposes a signal into the frequency components that make it up, but it also identifies where a certain frequency or wavelength occurs in the temporal or spatial domain. Wavelets possess many properties that make them attractive for signal analysis, including time-frequency localization, multiresolution, compression (or sparsity), and decorrelation. Wavelets have therefore enjoyed tremendous attention and success in the geosciences community, including, but not limited to, applications in time-frequency analysis, multiscale decomposition, filtering/denoising, and fractal/multifractal analysis.
History of Wavelet Analysis
Wavelets have fascinated researchers in engineering, mathematics, and the natural sciences since the 1980s because of their versatility in numerical analysis and function approximation. Their origin can be traced back to Fourier transform theory, pioneered by Joseph Fourier in 1807 to express a signal as the sum of a, possibly infinite, series of sines. The Fourier transform provides a powerful tool for analyzing stationary signals by converting them into the frequency domain in order to characterize the frequency (or wavelength) content of a signal. As the most classical and successful signal processing and analysis method, it has achieved great success in both theory and applications over the past 200 years. However, it is, in principle, only suitable for analyzing stationary signals: it offers no resolution in time and therefore cannot provide time-frequency localization of nonstationary signals. For this reason, Gabor (1946) proposed the windowed Fourier transform, which made substantial progress on time-frequency localization but is often limited in practice when analyzing complicated signals, because its time-frequency window is not self-adaptive.
As a milestone in the development of Fourier analysis, the wavelet transform (WT) combines achievements of functional analysis, Fourier analysis, harmonic analysis, and numerical analysis. The WT overcomes many shortcomings of classical Fourier analysis, such as its limitations for nonstationary signals and its lack of time-frequency positioning ability and, hence, of time-frequency resolution. Wavelets have thus earned the reputation of a "mathematical microscope" that can analyze nonstationary signals and capture details of a signal at multiple time-frequency resolutions. The initial concept of the wavelet transform was proposed by the French geophysicist Morlet and his colleagues (Morlet et al. 1982), motivated by the analysis and processing of seismic reflection signals in oil and gas exploration. They proposed the rudiments of wavelet analysis, using a shape-invariance property to overcome the shortcomings of the windowed Fourier transform. The powerful numerical analysis capability of Morlet's method immediately inspired many scholars, including Morlet himself, the French theoretical physicist Grossmann, the French mathematician Meyer, and many others, to promote the rapid development of wavelet transforms and wavelet analysis (e.g., Daubechies 1988; Grossmann and Morlet 1984; Lemarié-Rieusset and Meyer 1986; Mallat 1989). In 1986, Meyer and his student Lemarié first proposed the fundamental idea of wavelet multiscale analysis. In 1988, Daubechies constructed smooth orthogonal wavelets with compact support (the Daubechies bases), which became some of the most widely used wavelet bases in the WT. In 1989, Mallat proposed the concept of multiresolution analysis, a general method for constructing orthogonal wavelet bases, and the well-known Mallat algorithm for multiscale decomposition based on the discrete wavelet transform. Sweldens (1995) developed the lifting scheme to construct second-generation wavelets, freeing the construction of wavelets from its conventional dependence on the Fourier transform. The fundamental theory of wavelet transforms and wavelet analysis matured during the last decade of the twentieth century (Mallat 1999).
Because wavelet analysis has many advantages over Fourier analysis, including time-frequency localization, multiresolution, compression, and decorrelation, its applications have achieved tremendous popularity and success in a wide range of scientific fields, such as mathematics, physics, chemistry, biology, earth science, and computer science. In the earth sciences, the wavelet transform, as an advanced data and signal analysis tool, has produced remarkable results in various case studies, such as signal denoising/filtering and time-frequency analysis in seismic exploration data processing, multiresolution analysis of geophysical fields (e.g., gravity and magnetic fields) and remote sensing images, hydrological and geological time series analysis, and turbulence analysis, to name but a few examples. It is worth mentioning that wavelet analysis has also been used as a powerful tool for the multiscale analysis of fractals and multifractals, which emphasize scale-invariant properties across scales, including singularity detection (Chen and Cheng 2016; Mallat and Hwang 1992) and multifractal spectrum analysis (Arneodo et al. 1988; Chen and Cheng 2017). Overall, wavelets are essentially used to study nonstationary processes and signals in geoscience in two ways: (1) as a signal analysis tool to extract multiscale information or components of signals, and (2) as a basis for representation or characterization of a signal or process. Since wavelets are no longer a new theory or methodology, and the theoretical development of wavelets was largely completed over the last two decades of the twentieth century, we do not present a full and in-depth theoretical description here. Several good handbooks on wavelet theory are available (e.g., Mallat (1999), Percival and Walden (2000)), as are many readable papers reviewing wavelet theory (e.g., Kumar and Foufoula (1997)). We do, however, describe some mathematical background of wavelet analysis in this entry and present some important applications of wavelet analysis in geoscience research.
Concepts
Fourier Transform
The Fourier transform of a function f(x) is formulated as

$$F(\omega) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} f(x)\, e^{-i\omega x}\, dx, \qquad (1)$$

where ω is the angular frequency of the periodic function e^{-iωx}; F(ω) therefore represents the frequency content of the function f(x). Using the inverse Fourier transform, the original function can be recovered by

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} F(\omega)\, e^{i\omega x}\, d\omega. \qquad (2)$$

According to the form of Eq. (2), f(x) can be regarded as a weighted sum of the simple waveforms e^{iωx}, where the weight at a particular frequency ω is given by F(ω). Although the Fourier transform builds a bridge between the time and frequency domains, it has significant limitations for analyzing and processing nonstationary signals. Specifically, the Fourier transform integrates over the entire time domain and lacks any local analysis capability, and an instantaneous change in the signal (such as a singularity) affects the entire spectrum. The short-time Fourier transform (STFT) was developed to overcome these issues and is defined as (Gabor 1946)

$$\mathrm{STFT}(\omega, x) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} f(u)\, w(u - x)\, e^{-i\omega u}\, du, \qquad (3)$$

where w(x) is a windowing function (e.g., a Gaussian window) and ω and u are the frequency and translation (time) parameters, respectively. The Fourier transform of the signal segmented by w(x) centered at x = u then provides the position of each frequency and extracts local frequency information from the signal. However, the STFT cannot adapt its resolution cells in practical applications, because the resolution cell defined in Eq. (3) is fixed. What is needed when analyzing real signals is a time-frequency window, or resolution cell, that adjusts itself adaptively: when analyzing the low-frequency components of a signal, the window should automatically widen in time, trading time localization for frequency localization, whereas when analyzing the high-frequency components, the window should automatically narrow in time, improving time localization at the expense of frequency localization. In short, an ideal method of time-frequency localization should act as a "microscope" that can zoom across scales.
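The discrete analogue of the Fourier pair in Eqs. (1) and (2) can be illustrated with NumPy. This is an editorial sketch only; the sampling rate and the two-tone test signal are arbitrary assumptions, not part of the entry.

```python
import numpy as np

# A toy nonstationary signal: 10 Hz in the first half, 40 Hz in the second half
fs = 1000.0                          # sampling frequency (Hz), assumed
t = np.arange(0, 1.0, 1.0 / fs)
x = np.where(t < 0.5, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 40 * t))

X = np.fft.rfft(x)                   # discrete counterpart of Eq. (1)
freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
x_rec = np.fft.irfft(X, n=x.size)    # discrete counterpart of Eq. (2)

print(freqs[np.argmax(np.abs(X))])   # a dominant frequency of the whole record
print(np.allclose(x, x_rec))         # True: the signal is recovered exactly
```

The global spectrum reports energy at both 10 Hz and 40 Hz but says nothing about when each occurs, which is precisely the limitation that the windowed Fourier transform and, more flexibly, the wavelet transform address.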
Continuous Wavelet Transform
A wavelet is a wave-like oscillation whose amplitude begins at zero, increases, and then decreases back to zero. From a mother wavelet ψ(x), a family of scaled and translated wavelets is defined as

$$\psi_{s,b}(x) = \frac{1}{s^{n}}\,\psi\!\left(\frac{x - b}{s}\right), \quad s > 0,\; x \in \mathbb{R}^{n},\; b \in \mathbb{R}^{n}, \qquad (4)$$

where s is the scaling factor and b is a translation (time-like) parameter. Note that the L1 norm (1/s^n) is used in Eq. (4) for normalization. The continuous wavelet transform (CWT) of a function f(x) at scale s (related to frequency) and position b is given by (Grossmann and Morlet 1984)

$$Wf(s, b) = \frac{1}{s^{n}}\int_{-\infty}^{+\infty} f(x)\,\psi\!\left(\frac{x - b}{s}\right) dx, \qquad (5)$$

where ψ_{s,b} defines a family of isotropic wavelets obtained through translation and dilation of the mother wavelet ψ(x). Eq. (5) can also be viewed as a convolution product, Wf(s, b) = (f * ψ_s)(b); the function f(x) is thereby decomposed into elementary space-scale contributions by convolving it with a suite of wavelets that are well localized in both time and frequency. The wavelet transform thus provides a good solution to the practical problems, described above, that face the Fourier transform and the STFT. Figure 1 shows the tiling of the space-frequency plane defined by a 1D wavelet basis, illustrating how the shape of a wavelet resolution cell adapts to scale: as the scale decreases, the spatial support shrinks while the frequency spread increases. The resolution of the wavelet transform is therefore refined adaptively for analyzing high-frequency components, while it becomes coarser for low-frequency components. This zooming capability not only helps to locate isolated singular events but can also characterize more complex multifractal signals possessing non-isolated singularities (Mallat 1999; Mallat and Hwang 1992).
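A hedged numerical illustration of Eq. (5) follows. The sketch uses the PyWavelets package and its real Morlet wavelet ('morl'); the library choice, the scale range, and the test signal are assumptions of this illustration rather than part of the entry.

```python
import numpy as np
import pywt  # PyWavelets, assumed to be installed

fs = 100.0                               # sampling frequency (Hz), assumed
t = np.arange(0, 10.0, 1.0 / fs)
# Nonstationary test signal: 2 Hz early on, 8 Hz later
x = np.where(t < 5.0, np.sin(2 * np.pi * 2 * t), np.sin(2 * np.pi * 8 * t))

scales = np.arange(1, 128)               # range of scales s to evaluate
coeffs, freqs = pywt.cwt(x, scales, 'morl', sampling_period=1.0 / fs)

# coeffs has shape (n_scales, n_samples): a space-scale (time-frequency) map
power = np.abs(coeffs) ** 2
best = freqs[np.argmax(power, axis=0)]   # dominant pseudo-frequency at each time
print(power.shape, best[:3], best[-3:])  # expected: low frequency early, higher later
```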
Discrete Wavelet Transform
Like the discrete Fourier transform, the discrete wavelet transform (DWT) decomposes the signal into a mutually orthogonal set of wavelets, using a discrete set of wavelet bases with scales s and translations b; this is the main difference between the CWT and the DWT. In practice, the DWT often employs a dyadic grid (Mallat 1999), in which the mother wavelet is scaled by powers of two (s = 2^j) and translated by integers (b = k 2^j), where k is a location index running from 1 to 2^{-j}N (N is the number of observations) and j runs from 0 to J (J is the total number of scales). Defining the dyadic mother wavelet as

$$\psi_{j,k}(x) = 2^{-j/n}\,\psi\!\left(2^{-j}x - k\right), \qquad (6)$$

the DWT coefficients are obtained from

$$W_{j,k} = W\!\left(2^{j}, 2^{j}k\right) = 2^{-j/2}\int_{-\infty}^{+\infty} f(x)\,\psi\!\left(2^{-j}x - k\right) dx. \qquad (7)$$

Conversely, the original signal (or parts of it) can be reconstructed from the wavelet coefficients by means of the inverse discrete wavelet transform (IDWT), which is formulated as

$$f(x) = \sum_{j=-\infty}^{+\infty}\;\sum_{k=-\infty}^{+\infty} W_{j,k}\,\psi_{j,k}(x). \qquad (8)$$

The DWT supports different mother wavelets, such as the Haar, Daubechies, biorthogonal, symlet, Meyer, and coiflet wavelets. The Haar wavelet is the first and simplest mother wavelet, and the Daubechies (db) wavelets are a family of orthogonal wavelets, of which db1 coincides with the Haar wavelet. A proper wavelet should be chosen for real signal analysis, a choice that often depends on the similarity between the wavelet and the signal.
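A minimal numerical counterpart of Eqs. (6), (7), and (8), again as a hedged sketch: the PyWavelets calls, the db4 wavelet, and the test signal are assumptions of this illustration.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

rng = np.random.default_rng(42)
x = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.2 * rng.standard_normal(256)

# One dyadic decomposition step: approximation (scaling) and detail (wavelet) coefficients
cA, cD = pywt.dwt(x, 'db4')             # analysis, cf. Eq. (7)
x_rec = pywt.idwt(cA, cD, 'db4')        # synthesis (IDWT), cf. Eq. (8)

print(cA.size, cD.size)                 # roughly half the signal length each
print(np.allclose(x, x_rec[:x.size]))   # perfect reconstruction up to round-off
```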
Applications
Review of Wavelets in Geosciences
Wavelets in Geosciences, Fig. 1 Multiresolution space-frequency tiling of wavelet analysis, illustrating how the shape of a wavelet resolution cell adapts to scale. (Chen and Cheng 2016)
Wavelet transforms and wavelet analysis have been widely used in many aspects of geoscience research over the past decades. In this section, we present only a few representative applications of wavelets in geoscience (e.g., multiscale decomposition, multifractal analysis, time-frequency analysis, and filtering/denoising), as reviewing all of them in a couple of pages is certainly not possible; we hope they will be a source of inspiration for new research.

Multiresolution Analysis/Decomposition
An essence of the WT is "to see the wood and the trees," in other words, to obtain the multiscale nature of a signal through the wavelet-based multiscale decomposition (WMD) algorithm. The classic framework of WMD, introduced by Mallat (1989), projects the signal f(x) onto a set of basis functions

$$\left\{\varphi_{J,k}(x),\ \psi^{(i)}_{j,k}(x)\right\}_{j \le J,\; j \in \mathbb{Z},\; k \in \mathbb{Z}^{n},\; i = 1, 2, \ldots, 2^{n}-1},$$

where

$$\varphi_{J,k}(x) = 2^{-J/n}\,\varphi\!\left(2^{-J}x - k\right) \qquad (9)$$

and

$$\psi^{(i)}_{j,k}(x) = 2^{-j/n}\,\psi^{(i)}\!\left(2^{-j}x - k\right) \qquad (10)$$

are obtained by dilating and translating the mother wavelet ψ(x) and the father wavelet (scaling function) φ(x), with the special choices s = 2^j and b = 2^j k for discretization. The WMD is realized using a simple scheme, the so-called pyramid algorithm (Fig. 2); more details about the WMD algorithm can be found in Mallat (1989). The signal f(x) can then be expressed as

$$f(x) = \sum_{k \in \mathbb{Z}^{n}} A_{k}\,\varphi_{J,k}(x) + \sum_{j=1}^{J}\,\sum_{k \in \mathbb{Z}^{n}}\,\sum_{i} d^{(i)}_{j,k}\,\psi^{(i)}_{j,k}(x), \qquad (11)$$
Wavelets in Geosciences, Fig. 2 Pyramid architecture of the multiresolution analysis. Details and approximations are built progressively in a pyramid up to scale j (Chen and Cheng 2016)
with $A_k = \int f(x)\,\varphi_{J,k}(x)\,dx$ and $d^{(i)}_{j,k} = \int f(x)\,\psi^{(i)}_{j,k}(x)\,dx$ termed the approximation coefficient and the detail (wavelet) coefficient, respectively. For instance, the multiscale representation of a 2D signal f(x) up to scale J consists of a low-frequency approximation A_J and three high-frequency details D^{(i)}_j, i = 1, 2, 3, in the horizontal, vertical, and diagonal directions. In this entry, we present an application of wavelet analysis to geological time series data to determine the cyclicity of Earth's long-term evolution, such as the Wilson cycle of plate tectonics (supercontinent assembly and breakup). Zircon U-Pb age peaks and evolved δ18O isotopes in deep geological time are often linked to supercontinent assembly, and the recently improved global database of U-Pb ages and δ18O isotopes from zircon grains allows time series periodicity analysis with the popular wavelet transform (see Chen and Cheng (2018) and references therein). Here, wavelet-based multiscale decomposition is undertaken to decompose the one-dimensional (1D) time series of the U-Pb age distribution into multiscale components, including approximations and details. In practice, a variety of wavelet families are available for the WMD algorithm (e.g., Haar, Daubechies (db), symlets (sym), and biorthogonal wavelets), and a proper wavelet should be determined for processing geochemical time series data. Three criteria may influence the choice of wavelet: the number of vanishing moments, the support size, and the regularity (Mallat 1999). Considering that practical geochemical time series often have complex or fractal natures, regular orthonormal wavelets with larger compact support, such as db8, may be better choices for obtaining reasonable decompositions. Using five levels in the WMD, the raw U-Pb age time series was decomposed into multiscale components comprising wavelet details at scales j = 1, 2, 3, 4, and 5, shown in Fig. 3a-e, and the wavelet approximation at scale j = 5, shown in Fig. 3f. The wavelet approximation can be regarded as a smoothed (averaged) version of the original signal because of the low-pass filtering of the father wavelet in the WMD. Superposition of the Fig. 3a-f time series recovers the raw record, while superposition of Fig. 3a-d constructs a new signal without the background trend, i.e., a detrended time series. Also shown in Fig. 3g-k are Fourier spectrum analyses of each wavelet detail component; they identify the periodicity of each detail component of the U-Pb age distribution, ranging from 100 Myr to 800 Myr, related to global and continental-scale plate tectonics. In addition, WMD is often used to extract the multiscale nature (including components and edges) of two-dimensional images or fields (e.g., Chen et al. 2015; Fedi and Quarta 1998).
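The decomposition described above can be reproduced in outline as follows. This is a hedged sketch only: the PyWavelets calls, the synthetic stand-in for the binned U-Pb age series, and the bin width are assumptions; only the db8 wavelet and the five decomposition levels come from the text.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

# Stand-in for a binned zircon U-Pb age series (the real database is not reproduced here)
rng = np.random.default_rng(7)
t = np.linspace(0, 4400, 881)                      # age axis in Myr, 5-Myr bins (assumed)
signal = (np.cos(2 * np.pi * t / 750) + 0.5 * np.cos(2 * np.pi * t / 200)
          + 0.3 * rng.standard_normal(t.size) + 0.001 * t)   # cycles + noise + trend

# Five-level wavelet multiscale decomposition with db8, as in the text
coeffs = pywt.wavedec(signal, 'db8', level=5)      # [A5, D5, D4, D3, D2, D1]

# Rebuild each detail D1..D5 and the approximation A5 as full-length components
parts = {}
for i in range(1, 6):
    sel = [np.zeros_like(c) for c in coeffs]
    sel[-i] = coeffs[-i]
    parts[f'D{i}'] = pywt.waverec(sel, 'db8')[:signal.size]
sel = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
parts['A5'] = pywt.waverec(sel, 'db8')[:signal.size]

detrended = signal - parts['A5']                   # superposition of D1..D5 only
print(np.allclose(signal, sum(parts.values()), atol=1e-8))  # components sum back to the signal
```

Fourier spectra of the reconstructed details (e.g., via numpy.fft) would then yield the band-wise periodicities discussed above.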
Fractal/Multifractal Analysis
The wavelet transform acts as a natural tool for investigating the interscale relationships of fractal measures (e.g., scale-free behavior, self-similarity, and singularity) because of the inherent scaling property of the wavelet basis (Arneodo et al. 1988). Using wavelet (detail) coefficients, the scaling behavior (i.e., the Hurst exponent h) of the singular increment function, |f(x + Δl) - f(x)| ∝ |Δl|^{h(x)}, can be expressed as (Argoul et al. 1989)

$$W_{\psi}(\lambda s, x) \propto \lambda^{\,h-n+1}\, W_{\psi}(s, x). \qquad (12)$$

On the other hand, the scaling behavior (i.e., the singularity index α) of a fractal clustering measure, $\mu(\epsilon) = \int_{B(\epsilon)} d\mu \propto \epsilon^{\alpha(x)}$, can be mirrored by the scaling (approximation) coefficients as (Chen and Cheng 2016)

$$W_{\varphi}(\lambda s, x) \propto \lambda^{\,\alpha-n}\, W_{\varphi}(s, x), \qquad (13)$$

where s is the scale, λ is a scalar, and n is the dimension of the dataset. Notably, within the context of fractals, the Hurst index h characterizes the persistence or intermittency of a time series, while the singularity index α characterizes the clustering or accumulation of mass or energy. The wavelet transform therefore provides an appropriate framework for both local Hurst detection (LHD) and local singularity analysis (LSA) of fractal signals or fields. Moving forward, utilizing the inherent scaling property of the wavelet basis, the classic multifractal spectrum analysis based on box counting can also be revisited in wavelet form. Accordingly, the power-law scaling relation defined in Eq. (13) can be written as a fractal density model when using the approximation coefficients of the DWT (Chen and Cheng 2016), namely A(j) ∝ (2^j)^{α-n}. The mass-partition function of the q-th moment routinely used for multifractal spectrum calculation can then be revisited as (Chen and Cheng 2017)

$$\chi(j, q) = \frac{1}{N_j}\sum_{k=1}^{N_j}\left[A(j, k)\right]^{q}, \quad -\infty < q < +\infty, \qquad (14)$$

where N_j is the total number of scaling coefficients at level j. The DWT is generally implemented using a dyadic decimation (down-sampling) scheme, the so-called Mallat algorithm; these functions are built recursively in the space-scale half-plane (the dyadic tree) of the orthogonal wavelet transform (see Fig. 1), cascading from an arbitrary large scale toward small scales. The above wavelet-based strategy for multifractal spectrum analysis exploits the fast multiscale decomposition algorithm and the sparsity of the wavelet representation; it is therefore less time-consuming than the conventional method, especially for analyzing large datasets.
Figures 4a, b show the local singularity sequences of the zircon U-Pb age distribution and the δ18O isotope time series, respectively, and Figs. 4e, f show the corresponding point-wise Hurst sequences. Both indices were estimated using the multiscale wavelet method. The local singularity indices, representing the clustering of zircon ages, identify the peaks and troughs in detail, while the point-wise Hurst index characterizes the persistence of adjacent values within the series. Six statistically significant peaks, at approximately 4.1, 3.4, 2.6, 1.8, 1.0, and 0.2 Ga, are observed in both Fig. 4a and b for the U-Pb age spectra and the δ18O analyses, and relatively smaller peaks at approximately 3.7, 2.5, 2.1, 1.4, 0.8, 0.6, and 0.1 Ga are detected in the zircon age spectra. In particular, the peak around 2.7-2.5 Ga, which is subdued in the raw δ18O pattern, is significantly amplified in Fig. 4b. Consequently, the estimated peaks of the age clusters and δ18O values, and their extents, correspond well with the supercontinent assembly periods at approximately 2.7-2.5 Ga (Superia and Sclavia), 2.1-1.7 Ga (Nuna), 1.3-0.95 Ga (Rodinia), and 0.7-0.18 Ga (Gondwana and Pangea). In addition, the Hurst sequences (Fig. 4e, f) obtained from the zircon age distribution and the δ18O time series also reveal periodic patterns over 4.4 Gyr. Higher Hurst values indicate strong persistence of the time series (for the U-Pb age signal this may imply continuous growth of continents), while lower values indicate weaker persistence, with h = 0 implying an interrupted time series. In particular, the timing of the valleys within the Hurst sequences marks the beginning and end of supercontinent development well (see Fig. 4e, f). Both the singularity and the Hurst series indicate that these geological time series have a highly episodic nature from a fractal perspective, but their periodicity requires further quantitative evaluation.

Time-Frequency Analysis
Wavelet analysis allows rapid detection of periodic patterns and determination of the persistence of cycles across time. Following Eq. (5), a discrete time series x(t_i) of length N with sample spacing Δt can be transformed into wavelet coefficients in both the time and the frequency domain by

$$W_{i}(s) = \sqrt{\frac{\Delta t}{s}}\;\sum_{j=0}^{N-1} x(t_j)\,\psi\!\left(\frac{(j - i)\,\Delta t}{s}\right), \qquad (15)$$

Wavelets in Geosciences, Fig. 3 Multiscale decomposition of the U-Pb age time series using the wavelet-based multiscale decomposition algorithm. (a-e) show the wavelet details at scales j = 1, 2, 3, 4, 5, and (f) the approximation (residual) component. Superposition of the (a-f) time series recovers the raw record (l). Also shown in (g-k) are Fourier spectrum analyses of the wavelet detail components, which identify a dominant Wilson cycle of 600-800 Myr with minor cycles (~300, ~200, ~100 Myr)
Wavelets in Geosciences, Fig. 4 Local singularity analysis of (a) zircon U-Pb age database and (b) δ18O isotope signal, and point-wise Hurst detection of (e) zircon U-Pb age and (f) δ18O series (Chen and Cheng 2018a). Also shown in (c), (d), (g), and (h) are the continuous
wavelet spectrum of the singularity logs and Hurst logs of zircon U-Pb age distribution and δ18O signal, respectively. The black dotted line represents the estimates of cyclicity determined on the basis of global wavelet power spectrum
where i represents the position shift and s the scale expansion factor. In fact, Eq. (15) can be regarded as an enhanced version of the discrete Fourier transform in Eq. (2), $F(\omega) = \sum_j x(t_j)\, e^{-i\omega t_j}$: the periodic exponential e^{-iωt_j} is replaced by a wavelet function ψ(t, s), and the advantage of wavelet functions is that they are well localized in both time and frequency. There is a wide choice of wavelet functions, as diverse as the Haar, Daubechies, Morlet, and Mexican hat wavelets. Among them, the Morlet wavelet is widely used because it is non-orthogonal and has optimal joint time-frequency concentration; its form is

$$\psi(t) = \pi^{-1/4}\, e^{i\omega t}\, e^{-t^{2}/2}, \qquad (16)$$

where ω is a dimensionless frequency. Although the WT is subject to the Heisenberg uncertainty principle, the Morlet wavelet is a good choice for achieving a balance between time and frequency resolution.
Wavelet coherence analysis (WCA) (also known as wavelet cross-correlation analysis) is a method for quantifying the similarity, or coherency, between two nonstationary series using localized correlation coefficients in time-frequency space (Torrence and Compo 1997). Given two time series X and Y, the wavelet cross-correlation transform is first defined as

$$W^{XY}_{i}(s) = W^{X}_{i}(s)\,\overline{W^{Y}_{i}(s)}. \qquad (17)$$

The wavelet coherence is then defined by

$$R^{2}_{i}(s) = \frac{\left|S\!\left(W^{XY}_{i}(s)/s\right)\right|^{2}}{S\!\left(\left|W^{X}_{i}(s)\right|^{2}/s\right)\; S\!\left(\left|W^{Y}_{i}(s)\right|^{2}/s\right)}, \qquad (18)$$

in which S is a smoothing operator defined by the wavelet type used. R² takes values between 0 and 1 and resembles a traditional correlation coefficient.
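The Morlet-based transform and the coherence of Eqs. (16), (17), and (18) can be sketched as follows. This is a rough illustration under several assumptions: the PyWavelets complex Morlet name 'cmor1.5-1.0', the boxcar smoother standing in for the operator S (smoothing only along time here), and all signal parameters are choices of this sketch, not part of the entry or of Torrence and Compo (1997).

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def smooth(a, w=9):
    """Crude boxcar smoothing along the time axis (a stand-in for the operator S)."""
    kernel = np.ones(w) / w
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode='same'), 1, a)

fs, n = 10.0, 600
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 0.5 * t) + 0.3 * np.random.default_rng(1).standard_normal(n)
y = np.sin(2 * np.pi * 0.5 * t + 0.8) + 0.3 * np.random.default_rng(2).standard_normal(n)

scales = np.arange(2, 64)
Wx, freqs = pywt.cwt(x, scales, 'cmor1.5-1.0', sampling_period=1.0 / fs)
Wy, _ = pywt.cwt(y, scales, 'cmor1.5-1.0', sampling_period=1.0 / fs)

s = scales[:, None]                     # scale normalization 1/s
Wxy = Wx * np.conj(Wy)                  # cross-wavelet transform, Eq. (17)
num = np.abs(smooth(Wxy / s)) ** 2
den = smooth(np.abs(Wx) ** 2 / s) * smooth(np.abs(Wy) ** 2 / s)
R2 = num / den                          # wavelet coherence, Eq. (18)

k = np.argmin(np.abs(freqs - 0.5))      # scale closest to the shared 0.5 Hz cycle
print(float(np.real(R2[k]).mean()))     # expected: high coherence near the common period
```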
Here, the CWT was used to determine the periodicity of the global zircon age distribution and the δ18O time series. When processing the raw data, the trend in zircon preservation potential interferes with the continuous wavelet spectra, which poses a significant problem for periodicity determination (Chen and Cheng 2018a). This trend is eliminated by LSA and LHD, as shown in Fig. 4a and b, and the resulting singularity and Hurst sequences, which represent the isolated fluctuations of the geological time series, can be employed to explore the underlying periodicity quantitatively. As expected, applying the CWT to the singularity and Hurst sequences (Fig. 4c, d, g, and h) yields clearer patterns with persistent power for period estimation. The results reveal three prominent cycles, of ~200 Ma, ~400 Ma, and ~750 Ma, for both the U-Pb age distribution and the δ18O signal. Specifically, the ~750 Ma cycle is persistent throughout geological time, indicating a secular change, while the less persistent cycles of ~400 Ma and ~200 Ma are mainly identified for the period