Geotechnologies and the Environment

Wenwu Tang Shaowen Wang  Editors

High Performance Computing for Geospatial Applications

Geotechnologies and the Environment Volume 23

Series editors
Jay D. Gatrell, Department of Geology & Geography, Eastern Illinois University, Charleston, IL, USA
Ryan R. Jensen, Department of Geography, Brigham Young University, Provo, UT, USA

The Geotechnologies and the Environment series is intended to provide specialists in the geotechnologies and academics who utilize these technologies with an opportunity to share novel approaches, present interesting (sometimes counterintuitive) case studies, and, most importantly, to situate GIS, remote sensing, GPS, the internet, new technologies, and methodological advances in a real world context. In doing so, the books in the series will be inherently applied and reflect the rich variety of research performed by geographers and allied professionals.

Beyond the applied nature of many of the papers and individual contributions, the series interrogates the dynamic relationship between nature and society. For this reason, many contributors focus on human-environment interactions. The series is not limited to an interpretation of the environment as nature per se. Rather, the series "places" people and social forces in context and thus explores the many socio-spatial environments humans construct for themselves as they settle the landscape. Consequently, contributions will use geotechnologies to examine both urban and rural landscapes.

More information about this series at http://www.springer.com/series/8088

Wenwu Tang • Shaowen Wang Editors

High Performance Computing for Geospatial Applications

Editors Wenwu Tang Center for Applied Geographic Information Science Department of Geography and Earth Sciences University of North Carolina at Charlotte Charlotte, NC, USA

Shaowen Wang Department of Geography and Geographic Information Science University of Illinois at Urbana-Champaign Urbana, IL, USA

ISSN 2365-0575          ISSN 2365-0583 (electronic)
Geotechnologies and the Environment
ISBN 978-3-030-47997-8          ISBN 978-3-030-47998-5 (eBook)
https://doi.org/10.1007/978-3-030-47998-5

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

“Use the right tool for the job.” While high-performance computing (HPC) has been recognized as the right tool for computationally intensive geospatial applications, there is often a gap between the rapid development of HPC approaches and their geospatial applications, which often lag behind. The objective of this edited book is to (help) fill this gap so that this important right tool can be used in an appropriate and timely manner.

This book includes fifteen chapters to examine the utility of HPC for novel geospatial applications. Chapter 1 serves as an introduction to the entire book, overarching all the other fourteen chapters, which are organized into four parts. Part I (Chaps. 2 and 3) focuses on theoretical and algorithmic aspects of HPC within the context of geospatial applications. Part II (Chaps. 4–9) concentrates on how HPC is applied to geospatial data processing, spatial analysis and modeling, and cartography and geovisualization. Part III (Chaps. 10–14) covers representative geospatial applications of HPC from multiple domains. Part IV (Chap. 15) is a prospective view of HPC for future geospatial applications.

This book serves as a collection of recent work written as review and research papers on how HPC is applied to solve a variety of geospatial problems. As advanced computing technologies keep evolving, HPC will continue to function as the right tool for the resolution of many complex geospatial problems that are often computationally demanding. The key is to understand both the computational and geospatial nature of these problems to best exploit the amazing power of HPC. The book is designed to serve this key purpose, which may help readers to identify pertinent opportunities and challenges revolving around geospatial applications of HPC, that is, to use the right tool for the job at the right time.

Charlotte, NC, USA    Wenwu Tang
Urbana, IL, USA    Shaowen Wang
February 28, 2020


Acknowledgements

The editors want to take this opportunity to sincerely thank all the coauthors and reviewers of chapters of this book for their considerable efforts and hard work. The editors also owe special thanks to Zachary Romano and Silembarasan Panneerselvam from Springer Nature for their strong support and timely help during the review and editing process of this book. Minrui Zheng and Tianyang Chen provided assistance on the formatting of the manuscripts of this book.

Nothing is more motivating than a baby cry; nothing is more relaxing than baby giggles. Wenwu Tang wants to thank his wife, Shuyan Xia, and two sons, William Tang and Henry Tang, for their love and understanding. The preparation of this book was accompanied by the birth of baby Henry. While it is always challenging to balance interesting academic work and continuous family duties, enduring support from his family is the greatest source of strength for completing this book.

Shaowen Wang wants to acknowledge the support of the U.S. National Science Foundation under grant numbers 1443080 and 1743184. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.


Contents

  1  Navigating High Performance Computing for Geospatial Applications ........  1
     Wenwu Tang and Shaowen Wang

Part I  Theoretical Aspects of High Performance Computing

  2  High Performance Computing for Geospatial Applications: A Retrospective View ........  9
     Marc P. Armstrong
  3  Spatiotemporal Domain Decomposition for High Performance Computing: A Flexible Splits Heuristic to Minimize Redundancy ........  27
     Alexander Hohl, Erik Saule, Eric Delmelle, and Wenwu Tang

Part II  High Performance Computing for Geospatial Analytics

  4  Geospatial Big Data Handling with High Performance Computing: Current Approaches and Future Directions ........  53
     Zhenlong Li
  5  Parallel Landscape Visibility Analysis: A Case Study in Archaeology ........  77
     Minrui Zheng, Wenwu Tang, Akinwumi Ogundiran, Tianyang Chen, and Jianxin Yang
  6  Quantum Computing for Solving Spatial Optimization Problems ........  97
     Mengyu Guo and Shaowen Wang
  7  Code Reusability and Transparency of Agent-Based Modeling: A Review from a Cyberinfrastructure Perspective ........  115
     Wenwu Tang, Volker Grimm, Leigh Tesfatsion, Eric Shook, David Bennett, Li An, Zhaoya Gong, and Xinyue Ye
  8  Integration of Web GIS with High-Performance Computing: A Container-Based Cloud Computing Approach ........  135
     Zachery Slocum and Wenwu Tang
  9  Cartographic Mapping Driven by High-Performance Computing: A Review ........  159
     Wenwu Tang

Part III  Domain Applications of High Performance Computing

 10  High-Performance Computing for Earth System Modeling ........  175
     Dali Wang and Fengming Yuan
 11  High-Performance Pareto-Based Optimization Model for Spatial Land Use Allocation ........  185
     Xiaoya Ma, Xiang Zhao, Ping Jiang, and Yuangang Liu
 12  High-Performance Computing in Urban Modeling ........  211
     Zhaoya Gong and Wenwu Tang
 13  Building a GPU-Enabled Analytical Workflow for Maritime Pattern Discovery Using Automatic Identification System Data ........  227
     Xuantong Wang, Jing Li, and Tong Zhang
 14  Domain Application of High Performance Computing in Earth Science: An Example of Dust Storm Modeling and Visualization ........  249
     Qunying Huang, Jing Li, and Tong Zhang

Part IV  Future of High Performance Computing for Geospatial Applications

 15  High Performance Computing for Geospatial Applications: A Prospective View ........  271
     Marc P. Armstrong

Index ........  285

Contributors

Li An  Department of Geography and PKU-SDSU Center for Complex Human-Environment Systems, San Diego State University, San Diego, CA, USA
Marc P. Armstrong  Department of Geographical and Sustainability Sciences, The University of Iowa, Iowa City, IA, USA
David Bennett  Department of Geographical and Sustainability Sciences, University of Iowa, Iowa City, IA, USA
Tianyang Chen  Center for Applied Geographic Information Science, The University of North Carolina at Charlotte, Charlotte, NC, USA; Department of Geography and Earth Sciences, The University of North Carolina at Charlotte, Charlotte, NC, USA
Eric Delmelle  Department of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC, USA
Zhaoya Gong  School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, UK
Volker Grimm  Department of Ecological Modelling, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
Mengyu Guo  University of California, Berkeley, CA, USA
Alexander Hohl  Department of Geography, University of Utah, Salt Lake City, UT, USA
Qunying Huang  Department of Geography, University of Wisconsin-Madison, Madison, WI, USA
Ping Jiang  School of Resource and Environmental Science, Wuhan University, Wuhan, Hubei, China


Jing Li  Department of Geography and the Environment, University of Denver, Denver, CO, USA
Yuangang Liu  School of Geosciences, Yangtze University, Wuhan, Hubei, China
Zhenlong Li  Geoinformation and Big Data Research Laboratory, Department of Geography, University of South Carolina, Columbia, SC, USA
Xiaoya Ma  School of Geosciences, Yangtze University, Wuhan, Hubei, China; Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen, Guangdong, China
Akinwumi Ogundiran  Department of Africana Studies, The University of North Carolina at Charlotte, Charlotte, NC, USA
Erik Saule  Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, USA
Eric Shook  Department of Geography, Environment, and Society, University of Minnesota, Minneapolis, MN, USA
Zachery Slocum  Center for Applied Geographic Information Science, Department of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC, USA
Wenwu Tang  Center for Applied Geographic Information Science, Department of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC, USA
Leigh Tesfatsion  Department of Economics, Iowa State University, Ames, IA, USA
Dali Wang  Oak Ridge National Laboratory, Oak Ridge, TN, USA
Shaowen Wang  Department of Geography and Geographic Information Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Xuantong Wang  Department of Geography and the Environment, University of Denver, Denver, CO, USA
Jianxin Yang  School of Public Administration, China University of Geoscience (Wuhan), Wuhan, China
Xinyue Ye  Department of Informatics, New Jersey Institute of Technology, Newark, NJ, USA
Fengming Yuan  Oak Ridge National Laboratory, Oak Ridge, TN, USA
Tong Zhang  State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, Hubei, China


Xiang Zhao  School of Resource and Environmental Science, Wuhan University, Wuhan, Hubei, China
Minrui Zheng  Center for Applied Geographic Information Science, The University of North Carolina at Charlotte, Charlotte, NC, USA; Department of Geography and Earth Sciences, The University of North Carolina at Charlotte, Charlotte, NC, USA

Chapter 1

Navigating High Performance Computing for Geospatial Applications

Wenwu Tang and Shaowen Wang

Abstract  High performance computing, as an important component of state-of-the-art cyberinfrastructure, has been extensively applied to support geospatial studies facing computational challenges. Investigations on the utility of high performance computing in geospatially related problem-solving and decision-making are timely and important. This chapter is an introduction to the geospatial applications of high performance computing reported in this book. Summaries of all other chapters of this book, adapted from contributions from their authors, are highlighted to demonstrate the power of high performance computing in enabling, enhancing, and even transforming geospatial analytics and modeling.

Keywords  High performance computing · Geospatial applications · Computational science · Geospatial analytics and modeling

High performance computing (HPC) is an important enabler for computational and data sciences (Armstrong 2000), allowing the use of advanced computer systems to accelerate intensive computation for solving problems in numerous domains, including many geospatial problems. As multi- and many-core computing architectures continue to evolve rapidly in the context of advanced cyberinfrastructure (NSF 2007), HPC capabilities and resources (e.g., computing clusters, computational grids, and cloud computing) have become abundantly available to support and transform geospatial research and education. HPC is a key component of state-of-the-art cyberinfrastructure (NSF 2007), and its utility in empowering and enhancing domain-specific problem-solving has been widely recognized. Over the past decade in particular, HPC has been increasingly introduced into geospatial applications to resolve the computational and big data challenges facing these applications, driven by the emergence of cyberGIS (Wang 2010). Therefore, investigation of how HPC has been applied to geospatially related studies is important and timely. This book is edited to serve this purpose by covering the following four parts: (1) theoretical and algorithmic aspects of HPC for geospatial applications; (2) applications of HPC in geospatial analytics, including data processing, spatial analysis, spatial modeling, and cartography and geovisualization; (3) domain applications of HPC within a geospatial context; and (4) the future of HPC for geospatial applications. Below are the highlights of the chapters in these four parts.

Theories are fundamental in guiding us to fully and properly harness HPC power for geospatial applications. In Chap. 2, Armstrong provides a brief overview of the computational complexity associated with many types of geospatial analyses and describes the ways in which these challenges have been addressed in previous research. Different computer architectures are elucidated, as are current approaches that employ network-enabled distributed resources, including cloud and edge computing in the context of cyberinfrastructure.

Domain decomposition is one of the important parallel strategies for HPC from the algorithmic perspective. Spatiotemporal domain decomposition divides a computational task into smaller ones by partitioning input datasets and related analyses along the spatiotemporal domain. In Chap. 3, Hohl et al. present a spatiotemporal domain decomposition method (ST-FLEX-D) that supports flexible domain splits to minimize data replication, refining a static partitioning approach. Two spatiotemporal datasets, dengue fever and pollen tweets, were used in their study. Their spatiotemporal domain decomposition strategies hold potential for tackling the computational challenges facing spatiotemporal analysis and modeling.

Geospatial big data plays an increasingly important role in addressing numerous scientific and societal challenges. Efficacious handling of geospatial big data through HPC is critical if we are to extract meaningful information that supports knowledge discovery and decision-making. In Chap. 4, Li summarizes key components of geospatial big data handling using HPC and then discusses existing HPC platforms that we can leverage for geospatial big data processing. Three future research directions are suggested in terms of utilizing HPC for geospatial big data handling.

Viewshed analysis is a representative spatial analysis approach that has a variety of geospatial applications. In Chap. 5, Zheng et al. propose a parallel computing approach to handle the computation and data challenges of viewshed analysis within the context of archaeological applications. This chapter focuses on bridging the gap in conducting computationally intensive viewshed analysis for geospatial applications and on investigating the capabilities of landscape pattern analysis in understanding the location choices of human settlements.

Spatial modeling has three pillars: spatial statistics, spatial optimization, and spatial simulation (Tang et al. 2018). HPC has been extensively applied to enable these spatial modeling approaches.


For example, quantum computing has been receiving much attention for its potential power in resolving computational challenges facing complex problems. In Chap. 6, Guo and Wang present one of the quantum computing approaches, quantum annealing, for tackling combinatorial optimization problems. An example of applying a p-median model to a spatial supply chain optimization, representative of spatial optimization problems, is used in this chapter.

Agent-based modeling falls within the category of spatial simulation, a pillar of spatial modeling. In Chap. 7, Tang et al. present a discussion on the code reusability and transparency of agent-based models from a cyberinfrastructure perspective. Code reusability and transparency pose a significant challenge for spatial models in general and agent-based models in particular. Tang et al. investigate in detail the utility of cyberinfrastructure-based capabilities such as cloud computing and high-performance parallel computing in resolving the challenge of code reusability and transparency of agent-based modeling.

Stimulated by cyberinfrastructure technologies, Web GIS has become increasingly popular for geospatial applications. However, the processing and visual presentation of increasingly available geospatial data using Web GIS are often computationally challenging. In Chap. 8, Slocum and Tang examine the convergence of Web GIS with HPC by implementing an integrative framework, GeoWebSwarm. Container technologies and their modular design are demonstrated to be effective for integrating HPC and Web GIS to resolve spatial big data challenges. GeoWebSwarm has been made available on GitHub.

While HPC has been amply applied to geospatial data processing, analysis, and modeling, its applications in cartography and geovisualization seem less developed. In Chap. 9, Tang conducts a detailed review of HPC applications in cartographic mapping. This review concentrates on four major cartographic mapping steps: map projection, cartographic generalization, mapping methods, and map rendering. Challenges associated with big data handling and spatiotemporal mapping are discussed within the context of cartography and geovisualization.

Earth system models have been developed for understanding global-level climate change and environmental issues. However, such models often require support from HPC to resolve related computational challenges. In Chap. 10, Wang and Yuan conduct a review of HPC applications for Earth system modeling, specifically covering the early phase of model development, coupled system simulation, I/O challenges, and in situ data analysis. This chapter then identifies fundamental challenges of Earth system modeling, including computing heterogeneity, coupling mechanisms, accelerator-based model implementation, artificial intelligence-enhanced numerical simulation, and in situ data analysis driven by artificial intelligence.

Geospatial applications in the context of land use and land cover change have gained considerable attention over the years. Alternative spatial models have been developed to support the investigation of complex land dynamics. Pareto-based heuristic optimization models, belonging to the category of spatial optimization for land change modeling, are effective tools for modeling the land use decision-making of multiple stakeholders.


However, Pareto-based optimization models for spatial land use allocation are extremely time-consuming, which becomes one of the major challenges in obtaining the Pareto frontier in spatial land use allocation problems. In Chap. 11, Ma et al. report the development of a high-performance Pareto-based optimization model based on shared- and distributed-memory computing platforms to better support multi-objective decision-making within the context of land use planning.

How to leverage HPC to enable large-scale computational urban models requires in-depth understanding of spatially explicit computational and data intensity. In Chap. 12, Gong and Tang review the design and development of HPC-enabled urban models in a series of application categories. A case study of a general urban system model and its HPC implementation is described to show the utility of parallel urban models in resolving the challenge of computational intensity.

The advancement of sensor technologies leads to huge amounts of location-based data, often available at fine levels, such as the ship tracking data collected by the automatic identification system (AIS). Yet, the analysis of these location-aware data is often computationally demanding. In Chap. 13, Wang et al. describe a graphics processing unit (GPU)-based approach that supports automatic and accelerated analysis of AIS trajectory data for the discovery of interesting patterns. Their parallel algorithms were implemented at two levels: trajectory and point. A web-based portal was developed for the visualization of these trajectory-related data.

A range of HPC technologies has become available for geospatial applications. However, the selection of appropriate HPC technologies for specific applications represents a challenge, as these applications often involve different data and modeling approaches. In Chap. 14, Huang et al. present a generalized HPC architecture that provides multiple computing options for Earth science applications. This generalized architecture is based on the integration of common HPC platforms and libraries that include data- and computing-based technologies. A dust storm example is used to illustrate the capability of the proposed architecture.

Understanding the past and present of geospatial applications of HPC provides better insight into their future. In Chap. 15, Armstrong highlights current advances in HPC and suggests several alternative approaches that are likely candidates for future exploitation of performance improvement for geospatial applications. A particular focus is placed on the use of heterogeneous processing, where problems are first decomposed and the resulting parts are then assigned to different, well-suited types of architectures. The chapter also includes a discussion of quantum computing, which holds considerable promise for becoming a new generation of HPC.

In summary, HPC has been playing a pivotal role in meeting the computational needs of geospatial applications and enhancing the quality of computational solutions. This trend will likely persist into the foreseeable future, particularly as transformative advances in big data and artificial intelligence continue and as the learning curve of HPC for geospatial applications is lowered by ongoing research and development of cyberGIS (Wang and Goodchild 2018).


References

Armstrong, M. P. (2000). Geography and computational science. Annals of the Association of American Geographers, 90(1), 146–156.
NSF. (2007). Cyberinfrastructure vision for 21st century discovery (Report of NSF Council). Retrieved from http://www.nsf.gov/od/oci/ci_v5.pdf
Tang, W., Feng, W., Deng, J., Jia, M., & Zuo, H. (2018). Parallel computing for geocomputational modeling. In GeoComputational analysis and modeling of regional systems (pp. 37–54). Cham: Springer.
Wang, S. (2010). A CyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Annals of the Association of American Geographers, 100(3), 535–557.
Wang, S., & Goodchild, M. F. (Eds.). (2018). CyberGIS for geospatial innovation and discovery. Dordrecht: Springer.

Part I

Theoretical Aspects of High Performance Computing

Chapter 2

High Performance Computing for Geospatial Applications: A Retrospective View

Marc P. Armstrong

Abstract  Many types of geospatial analyses are computationally complex, involving, for example, solution processes that require numerous iterations or combinatorial comparisons. This complexity has motivated the application of high performance computing (HPC) to a variety of geospatial problems. In many instances, HPC assumes even greater importance because complexity interacts with rapidly growing volumes of geospatial information to further impede analysis and display. This chapter briefly reviews the underlying need for HPC in geospatial applications and describes different approaches to past implementations. Many of these applications were developed using hardware systems that had a relatively short life-span and were implemented in software that was not easily portable. More promising recent approaches have turned to the use of distributed resources that includes cyberinfrastructure as well as cloud and fog computing. Keywords  Computational complexity · Parallel processing · Cyberinfrastructure · Cloud computing · Edge computing

1  Introduction

High performance computing (HPC) has been used to address geospatial problems for several decades (see, for example, Sandu and Marble 1988; Franklin et al. 1989; Mower 1992). An original motivation for seeking performance improvements was the intrinsic computational complexity of geospatial analyses, particularly combinatorial optimization problems (Armstrong 2000).


Non-trivial examples of such problems require considerable amounts of memory and processing time and, even now, remain intractable. Other spatial analysis methods also require substantial amounts of computation to generate solutions. This chapter briefly reviews the computational complexity of different kinds of geospatial analyses and traces the ways in which HPC has been used in the past and during the present era. Several HPC approaches have been investigated, with developments shifting from an early focus on manufacturer-specific systems, which in most cases had idiosyncrasies (such as parallel language extensions) that limited portability. This limitation was recognized and addressed by using software environments that were designed to free programmers from this type of system dependence (e.g., the Message Passing Interface). This is also acknowledged, for example, by the work presented in Healey et al. (1998), with its focus on algorithms rather than implementations. Later approaches used flexible configurations of commodity components linked by high-performance networks. In the final section of this chapter, still-emerging approaches such as cloud and edge computing are briefly described.

2  Computational Complexity of Geospatial Analyses: A Brief Look

2  High Performance Computing for Geospatial Applications: A Retrospective View

11

Table 2.1 Number of operations required with variable orders of time complexity and problems sizes Input 1 2 3 4 5 6 7 8 9 10 20

log n 0 0.301 0.477 0.602 0.699 0.778 0.845 0.903 0.954 1 1.301

Table 2.2  Brute force solution size for four representative p-median problems

n log n 1 0.602 1.431 2.408 3.495 4.668 5.915 7.224 8.586 10 26.021

n2

n3

2n

1 4 9 16 25 36 49 64 81 100 400

1 8 27 64 125 216 343 512 729 1000 8000

2 4 8 16 32 64 128 256 512 1024 1,048,576

n! 1 2 6 24 120 720 5040 40,320 36,2880 3,628,800 2.40E+17

n candidates p facilities Possible solutions 10 3 120 20 5 15,504 50 10 10,272,278,170 100 15 253,338,471,349,988,640

• As a result of this explosive combinatorial complexity, much effort has been expended to develop robust heuristics that reduce the computational complexity of search spaces. Densham and Armstrong (Densham and Armstrong 1994) describe the use of the Teitz and Bart vertex substitution heuristic for two case studies in India. This algorithm has a worst-case complexity of O(p∗n2). In one of their problems 130 facilities are located at 1664 candidate sites and in a larger problem, 2500 facilities are located among 30,000 candidates. The smaller problem required the evaluation of 199,420 substitutions per iteration, while the larger problem required 68,750,000 evaluations. Thus, a problem that was approximately 19 times larger required 344 times the number of substitutions to be evaluated during each iteration. In both cases, however, the heuristic search space was far smaller than the full universe of alternatives. • Many geospatial methods are based on the concept of neighborhood and require the computation of distances among geographical features such as point samples and polygon centroids. For example, the Gi∗(d) method (Ord and Getis 1995) requires pairwise distance computations to derive distance-based weights used in the computation of results. Armstrong et al. (2005) report a worst-case time complexity of O(n3) in their implementation. • Bayesian statistical methods that employ Markov Chain Monte Carlo (MCMC) are computationally demanding in terms of memory and compute time. MCMC samples are often derived using a Gibbs sampler or Metropolis–Hastings approach that may yield autocorrelated samples, which, in turn require larger

12

M. P. Armstrong

sample sizes to make inferences. Yan et  al. (2007) report, for example, that a Bayesian geostatistical model requires O(n2) memory for n observations and the Cholesky decomposition of this matrix requires O(n3) in terms of computational requirements. These examples are only the proverbial tip-of-the-iceberg. Computational burdens are exacerbated when many methods (e.g., interpolation) are used with big data, a condition that is now routinely encountered. Big data collections also introduce substantial latency penalties during input-output operations.

3  Performance Evaluation Until recently, computational performance routinely and steadily increased as a consequence of Moore’s Law and Dennard Scaling. These improvements also applied to the individual components of parallel systems. But those technological artifacts are the result of engineering innovations and do not reflect conceptual advances due to creation of algorithms that exploit the characteristics of problems. To address this assessment issue, performance is often measured using “speedup” which is simply a fractional measure of improvement (t1/t2) where times are normally execution times achieved using either different processors or numbers of processors. In the latter case,

Speedup = t1 / t n

(2.1)

where t1 is sequential (one processor) time and tn is time required with n processors. Speedup is sometimes standardized by the number of processors used (n) and reported as efficiency where

Efficiency n = Speedup n / n

(2.2)

Perfect efficiencies are rarely observed, however, since normally there are parts of a program that must remain sequential. Amdahl’s Law is the theoretical maximum improvement that can be obtained using parallelism (Amdahl 1967):

Theoretical _ Speedup = 1 / ( (1 − p ) + p / n )




where p = proportion of the program that can be made parallel (1−p is the serial part) and n is the number of processors. As n grows, the right-hand term diminishes and speedups tend to 1/(1−p). The consequence of Amdahl’s law is that the weakest link in the code will determine the maximum parallel effectiveness. The effect of Amdahl’s Law can be observed in an example reported by Rokos and Armstrong (1996), in which serial input-output comprised a large fraction of total compute time, which had a deleterious effect on overall speedup.
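A compact sketch of Eqs. (2.1)–(2.3) makes the ceiling imposed by the serial fraction easy to see (our illustration; the parallel fraction p = 0.9 is an assumed example, not a value from the chapter):

```python
def speedup(t_1: float, t_n: float) -> float:
    """Eq. (2.1): ratio of one-processor time to n-processor time."""
    return t_1 / t_n

def efficiency(speedup_n: float, n: int) -> float:
    """Eq. (2.2): speedup standardized by the number of processors."""
    return speedup_n / n

def theoretical_speedup(p: float, n: int) -> float:
    """Eq. (2.3): Amdahl's Law for a parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# With 90% of the program parallelizable, speedup approaches 1 / (1 - 0.9) = 10
# no matter how many processors are added.
for n in (2, 8, 64, 1024):
    print(n, round(theoretical_speedup(0.9, n), 2))
```

The printed values climb toward, but never reach, 10, which is the 1/(1 − p) limit discussed above.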


It should be noted that Amdahl had an axe to grind as he was involved in the design of large mainframe uniprocessor computer systems for IBM (System/360) and later Amdahl Computer, a lower-cost, plug-compatible, competitor of IBM mainframes. In fact, Gustafson (1988, p. 533) reinterprets Amdahl’s “law” and suggests that the computing community should overcome the “mental block” against massive parallelism imposed by a misuse of Amdahl’s equation, asserting that speedup should be measured by scaling the problem to the number of processors.

4  A  lternative Approaches to Implementing HPC Geospatial Applications The earliest work on the application of HPC to geospatial analyses tended to focus first on the use of uniprocessor architectures with relatively limited parallelism. Subsequent work exploited pipelining and an increased number of processors executing in parallel. Both cases, however, attempted to employ vertical scaling that relies on continued increases in the performance of system components, such as processor and network speeds. These vertically scaled parallel systems (Sect. 4.1) were expensive and thus scarce. Moreover, companies that produced such systems usually did not stay in business very long (see Kendall Square Research, Encore Computer, Alliant Computer Systems). A different way to think about gaining performance improvements is horizontal scaling in which distributed resources are integrated into a configurable system that is linked using middleware (NIST 2015). This latter approach (Sect. 4.2) has been championed using a variety of concepts and terms such as grid computing and cyberinfrastructure.

4.1  T  he Early Era: Vertical Scaling, Uni-Processors, and Early Architectural Innovations The von Neumann architecture (see for example, Stone and Cocke 1991) serves as a straightforward basis for modern computer architectures that have grown increasingly complex during the past several decades. In its most simple form, a computer system accepts input from a device, operates on that input, and moves the results of the operation to an output device. Inputs in the form of data and instructions are stored in memory and operations are typically performed by a central processing unit (CPU) that can take many forms. Early computers used a single bus to access both data and instructions and contention for this bus slowed program execution in what is known as the von Neumann “bottleneck.” Though performance could be improved by increasing the clock speed of the processor, computer architects have also invented many other paths (e.g., the so-called Harvard architecture) that exploit multiple buses, a well-developed memory hierarchy (Fig. 2.1) and multiple cores

14

M. P. Armstrong

Fig. 2.1 Memory hierarchy in an abstract representation of a uniprocessor. Level 1 cache is usually accessible in very few clock cycles, while access to other levels in the hierarchy typically requires an increased number of cycles

and processors, in an attempt to overcome processing roadblocks. In practice, most geospatial processing takes place using the limited number of cores that are available on the commodity chips used by desktop systems. Some current generation CPUs, for example, feature six cores and 12 threads, thus enabling limited parallelism that may be exploited automatically by compilers and thus remain invisible to the average user. When examining alternative architectures, it is instructive to use a generic taxonomy to place them into categories in a 2 × 2 grid with the axes representing data streams and instruction streams (Flynn 1972). The top left category (single instruction and single data streams; SISD) represents a simple von Neumann architecture. The MISD category (multiple instruction and single data streams; top right) has no modern commercial examples that can be used for illustration. The remaining two categories (the bottom row of Fig. 2.2) are parallel architectures that have had many commercial implementations and a large number of architectural offshoots. The simple SISD model (von Neumann architecture) is sometimes referred to as a scalar architecture. This is illustrated in Fig. 2.3a, which shows that a single result is produced only after several low-level instructions have been executed. The vector approach, as the name implies, operates on entire vectors of data. It requires the same number of steps to fill a “pipeline” but then produces a result with every clock cycle (Fig. 2.3b). An imperfect analogy would be that a scalar approach would allow only one person on an escalator at a time, whereas a vector approach would allow a person on each step, with the net effect of much higher throughput. Vector processing was a key performance enhancement of many early supercomputers, particularly those manufactured by Cray Research (e.g., Cray X-MP) which

2  High Performance Computing for Geospatial Applications: A Retrospective View

15

Fig. 2.2  Flynn’s taxonomy represented in a 2  ×  2 cross-classification (S single, M multiple, I instruction streams, D data streams)

Fig. 2.3 (a) (top) and (b) (bottom). The top table shows a scalar approach that produces a single result after multiple clock cycles. The bottom table (b) shows that the same number of cycles (40) is required to compute the first result and then after that, the full pipeline produces a result at the end of every cycle. (Adapted from Karplus 1989)

had only limited shared memory parallelism (August et al. 1989). The Cray-2 was a successor to the X-MP and Griffith (1990) describes the advantages that accrue to geospatial applications with vectorized code. In particular, in the Fortran examples provided, the best improvements occur when short loops are processed; when nested do-loops are used the inner loops are efficiently vectorized, while outer loops are scalar.

16

M. P. Armstrong

In a more conventional SIMD approach to parallel computing, systems are often organized in an array (such as 64 × 64 = 4 K) of relatively simple processors that are 4- or 8-connected. SIMD processing is extremely efficient for gridded data because rather than cycling through a matrix element by element, all elements in an array (or a large portion of it) are processed in  lockstep; a single operation (e.g., add two integers) is executed on all matrix elements simultaneously. This is often referred to as data parallelism and is particularly propitious for problems that are represented by regular geometrical tessellations, as encountered, for example, during cartographic modeling of raster data. In other cases, while significant improvements can be observed, processing efficiency may drop because of the intrinsic sequential nature of required computation steps. For example, Armstrong and Marciano (1995) reported substantial improvements over a then state-of-the-art workstation for a spatial statistics application using a SIMD MasPar system with 16 K processors, though efficiency values were more modest. In the current era, SIMD array-like processing is now performed using GPU (graphics processing units) accelerators (e.g., Lim and Ma 2013; Tang and Feng 2017). In short, a graphics card plays the same role that an entire computer system played in the 1990s. Tang (2017, p. 3196) provides an overview of the use of GPU computing for geospatial problems. He correctly points out that data structures and algorithms must be transformed to accommodate SIMD architectures and that, if not properly performed, only modest improvements will obtain (see also Armstrong and Marciano 1997). MIMD computers take several forms. In general, they consist of multiple processors connected by a local high-speed bus. One simple model is the shared memory approach, which hides much of the complexity of parallel processing from the user. This model typically does not scale well as a consequence of bus contention: all processors need to use the same bus to access the shared memory (see Fig. 2.4). This

Fig. 2.4  A simplified view of a four-processor shared memory architecture. The connection between the bus and shared memory can become a choke-point that prevents scalability

2  High Performance Computing for Geospatial Applications: A Retrospective View

17

approach also requires that the integrity of shared data structures be maintained, since each process has equal memory modification rights. Armstrong et al. (1994) implemented a parallel version of G(d) (Ord and Getis 1995) using an Encore Multimax shared memory system with 14 processors. While performance improvements were observed, the results also show decreasing scalability, measured by speedup, particularly when small problems are processed, since the initialization and synchronization processes (performed sequentially) occupy a larger proportion of total compute time for those problems (Amdahl’s Law strikes again). Shekhar et al. (1996) report better scalability in their application of data partitioning to support range query applications, but, again, their Silicon Graphics shared memory system had only 16 processors. A different approach to shared memory was taken by Kendall Square Research when implementing their KSR-1 system from the mid-1990s. Rather than employing a single monolithic shared memory it used a hierarchical set of caches (ALLCACHE, Frank et al. 1993). As shown in Fig. 2.5, each processor can access its own memory as well as that of all other processing nodes to form a large virtual address space. Processors are linked by a high-speed bus divided into hierarchical zones, though the amount of time required to access different portions of shared memory from any given processor will vary if zone borders must be crossed. A

Fig. 2.5  Non-uniform (hierarchical) memory structure of the KSR1 (after Armstrong and Marciano 1996)

18

M. P. Armstrong

processor can access its 256 K sub-cache in only 2 clock cycles, it can access 32 MB memory in its own local cache in 18 clock cycles, and if memory access is required on another processor within a local zone, then a penalty of 175 clock cycles is incurred (Armstrong and Marciano 1996). However, if a memory location outside this zone must be accessed, then 600 cycles are required. This hierarchical approach exploits data locality and is a computer architectural restatement of Tobler’s First Law: All things are interconnected, but near things are interconnected faster than distant things.

4.2  D  istributed Parallelism and Increased Horizontal Scalability Horizontal scaling began to gain considerable traction in the early 1990s. In contrast to the vertically scaled systems typically provided by a single, turn-key vendor, these new approaches addressed scalability issues using a shared nothing approach that had scalability as a central design element. Stonebraker (1986), for example, described three conceptual architectures for multiprocessor systems (shared memory, shared disk, and shared nothing) and concluded that shared nothing was the superior approach for database applications. Smarr and Catlett (1992, p. 45) pushed this concept further in conceptualizing a metacomputer as “a network of heterogeneous, computational resources linked by software in such a way that they can be used as easily as a personal computer.” They suggest an evolutionary path in network connectivity from local area networks to wide area networks to a third stage: a transparent national network that relies on the development of standards that enable local nodes to interoperate in flexible configurations. These concepts continued to be funded by NSF and eventually evolved into what was termed grid computing (Foster and Kesselman 1999) with its central metaphor of the electric grid (with computer cycles substituting for electricity). Wang and Armstrong (2003), Armstrong et al. (2005), and Wang et al. (2008) illustrate the effectiveness of the grid computing paradigm to geospatial information analysis. At around the same time, several other related concepts were being developed that bear some similarity to the grid approach. The Network of Workstations (NOW) project originated in the mid-1990s at UC-Berkeley in an attempt to construct configurable collections of commodity workstations that are connected using what were then high-performance networks (Anderson et  al. 1995). Armstrong and Marciano (1998) developed a NOW implementation (using Message Passing Interface, Snir et  al. 1996) to examine its feasibility in geospatial processing (inverse-distance weighted interpolation). While substantial reductions in computing time were realized, the processor configuration achieved only moderate levels of

2  High Performance Computing for Geospatial Applications: A Retrospective View

19

efficiency when more than 20 processors were used due, in part, to communication latency penalties from the master-worker approach used to assign parallel tasks. At around the same time, Beowulf clusters were also developed with a similar architectural philosophy: commodity class processors, linked by Ethernet to construct a distributed parallel architecture.

4.3  Cyberinfrastructure and CyberGIS Cyberinfrastructure is a related term that was promoted by the National Science Foundation beginning in the early 2000s (Atkins et al. 2003) and it continues into the present era with the establishment of NSF’s Office of Advanced Cyberinfrastructure, which is part of the Computer and Information Science and Engineering Directorate. While numerous papers have described various aspects of Cyberinfrastructure, Stewart et  al. (2010, p.  37) define the concept in the following way: Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.

The “all linked together” part of the definition moves the concept from a more localized view promulgated by NOW and Beowulf to a much more decentralized, even international, scope of operation more aligned with the concepts advanced as grid computing. This linking is performed through the use of middleware, software that acts as a kind of digital plumbing to enable disparate components to work together. The most fully realized geospatial implementation of the cyberinfrastructure concept is the CyberGIS project at the University of Illinois (Wang 2010, 2013; also see http://cybergis.illinois.edu/).Wang (2019) characterizes CyberGIS as a “fundamentally new GIS modality based on holistic integration of high-performance and distributed computing, data-driven knowledge discovery, visualization and visual analytics, and collaborative problem-solving and decision-making capabilities.” CyberGIS uses general-purpose middleware, but goes beyond that to also implement geospatially tailored middleware that is designed to capture spatial characteristics of problems in order to promote application specific efficiencies in locating and using distributed resources (Wang 2010). It seems clear, based on current trends, that the concept of cyberinfrastructure will continue to be central to developments in HPC at least for the next decade (NSF 2018) and that CyberGIS will continue to undergo “parallel” developments.

20

M. P. Armstrong

4.4  Cloud Computing Cloud computing is yet another general distributed model in which configurable computer services, such as compute cycles and storage, are provided over a network (Sugumaran and Armstrong 2017). It is, as such, a logical outgrowth of grid computing and it is sometimes referred to as utility computing. The US National Institute of Standards and Technology has provided an authoritative definition of cloud computing (Mell and Grance 2011, p. 2) with five essential characteristics that are paraphrased here: 1. On-demand self-service. Consumers must be able to access computing capabilities, such as compute time and network storage automatically. 2. Broad network access. Capabilities must be available over the network and accessed through standard protocols that enable use by diverse platforms (e.g., tablets and desktop systems). 3. Resource pooling. Service provider’s resources are pooled to serve multiple consumers, with different physical and virtual resources dynamically assigned according to consumer demand. The consumer has no control or knowledge about the location of the provided resources. 4. Rapid elasticity. Capabilities are elastically provided commensurate, with demand. 5. Measured service. Cloud systems control and optimize resource use by metering resources; usage is monitored, controlled, and reported, providing transparency to providers and consumers. The flexibility of the cloud approach means that users and organizations are able to adapt to changes in demand. The approach also changes the economics of computing from a model that may require considerable capital investment in hardware, with associated support and upgrade (fixed) costs to one in which operational expenses are shifted to inexpensive networked clients (variable costs). While there are private clouds, cloud computing services are often public and provided by large corporate enterprises (e.g., Amazon and Google) that offer attractive, tailorable hardware and software configurations. Cloud computing is reaching a mature stage and because of its tailorable cost structures and configurability, the approach will continue to be broadly adopted in the foreseeable future. Cloud computing has been demonstrated to be effective in several geospatial problem domains (Hegeman et al. 2014; Yang et al. 2013).

4.5  Moving Closer to the Edge

Despite the substantial advantages provided by cloud computing, it does suffer from some limitations, particularly latency. Communication is, after all, limited by the speed of light, and in practice it is far slower than that limit (Satyanarayanan 2017).


With the proliferation of electronic devices connected as part of the Internet of Things (estimated to be approximately 50,000,000,000 by 2020) that are generating zettabytes (10²¹ bytes) of data each year, bandwidth is now a major concern (Shi and Dustdar 2016). Trends in distributed data collection and processing are likely to persist, with one report1 by the FCC 5G IoT Working Group suggesting that the amount of data created and processed outside a centralized data center or cloud is now around 10% and will likely increase to 75% by 2022. Communication latency is particularly problematic for real-time systems, such as augmented reality and autonomous vehicle control. As a consequence, edge and fog computing have emerged as important concepts in which processing is decentralized, taking place between individual devices and the cloud, to obviate the need to move massive amounts of data and thereby increase overall computational performance. It turns out that geography matters. Achieving a proper balance between centralized and distributed processing is key here. The movement of massive amounts of data has also become a source of concern for those companies that are now providing 5G wireless network service that could be overwhelmed by data fluxes before the systems are even fully implemented. In a fashion similar to that of cloud computing, NIST has promulgated a definition of fog computing that contains these six fundamental elements, as reported by Iorga et al. (2018, pp. 3–4):
1. Contextual awareness and low latency. Fog computing is low latency because nodes are often co-located with end-devices, and analysis and response to data generated by these devices are faster than from a centralized data center.
2. Geographical distribution. In contrast to centralized cloud resources, fog computing services and applications are geographically distributed.
3. Heterogeneity. Fog computing supports the collection and processing of different types of data acquired by multiple devices and network communication capabilities.
4. Interoperability and federation. Components must interoperate, and services are federated across domains.
5. Real-time interactions. Fog computing applications operate in real-time rather than in batch mode.
6. Scalability and agility. Fog computing is adaptive and supports, for example, elastic computation, resource pooling, data-load changes, and network condition variations.
The battle between centralization and de-centralization of computing is ongoing. Much like the episodic sagas of the mainframe vs the personal computer, the cloud vs fog approach requires that trade-offs be made in order to satisfy performance objectives and meet economic constraints. While cloud computing provides access to flexibly specified, metered, centralized resources, fog computing offloads

1  https://www.fcc.gov/bureaus/oet/tac/tacdocs/reports/2018/5G-Edge-Computing-Whitepaperv6-Final.pdf.


burdens from the cloud to provide low-latency services to applications that require it. Cloud and fog computing, therefore, should be viewed as complementary.

5  Summary and Conclusion

The use of HPC in geospatial applications has had a checkered history, with many implementations showing success only to see particular architectures or systems become obsolete. Nevertheless, there were lessons learned that could be passed across generations of systems and researchers. For example, early conceptual work on geospatial domain decomposition (Armstrong and Densham 1992) informed further empirical investigations of performance that modelled processing time as a function of location (Cramer and Armstrong 1999). A decade later, this work was significantly extended to undergird the assignment of tasks in modern cyberinfrastructure-based approaches to distributed parallelism (Wang and Armstrong 2009). It also seems clear that HPC is reaching a maturation point, with much attention now focused on the use of distributed resources that interoperate over a network (e.g., cyberinfrastructure, cloud and fog computing). Though much work remains, there is at least a clear developmental path forward (Wang 2013). There is a cloud on the horizon, however. Moore's Law (Moore 1965) is no longer in force (Hennessy and Patterson 2019). As a consequence, computer architects are searching for alternative methods, such as heterogeneous processing and quantum computing, to increase performance. It is likely, however, that many of these emerging technologies will continue to be accessed using cyberinfrastructure.

References

Amdahl, G. M. (1967). Validity of the single-processor approach to achieving large-scale computing capabilities. In Proceedings of the American Federation of Information Processing Societies Conference (pp. 483–485). Reston, VA: AFIPS.
Anderson, T. E., Culler, D. E., Patterson, D. A., & the NOW Team. (1995). A case for NOW (networks of workstations). IEEE Micro, 15(1), 54–64.
Armstrong, M. P. (2000). Geography and computational science. Annals of the Association of American Geographers, 90(1), 146–156.
Armstrong, M. P., Cowles, M., & Wang, S. (2005). Using a computational grid for geographic information analysis. The Professional Geographer, 57(3), 365–375.
Armstrong, M. P., & Densham, P. J. (1992). Domain decomposition for parallel processing of spatial problems. Computers, Environment and Urban Systems, 16(6), 497–513.
Armstrong, M. P., & Marciano, R. J. (1995). Massively parallel processing of spatial statistics. International Journal of Geographical Information Systems, 9(2), 169–189.
Armstrong, M. P., & Marciano, R. J. (1996). Local interpolation using a distributed parallel supercomputer. International Journal of Geographical Information Systems, 10(6), 713–729.
Armstrong, M. P., & Marciano, R. J. (1997). Massively parallel strategies for local spatial interpolation. Computers & Geosciences, 23(8), 859–867.
Armstrong, M. P., & Marciano, R. J. (1998). A network of workstations (NOW) approach to spatial data analysis: The case of distributed parallel interpolation. In Proceedings of the Eighth International Symposium on Spatial Data Handling (pp. 287–296). Burnaby, BC: International Geographical Union.
Armstrong, M. P., Pavlik, C. E., & Marciano, R. J. (1994). Parallel processing of spatial statistics. Computers & Geosciences, 20(2), 91–104.
Atkins, D. E., Droegemeier, K. K., Feldman, S. I., Garcia-Molina, H., Klein, M. L., Messerschmitt, D. G., et al. (2003). Revolutionizing science and engineering through cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. http://www.nsf.gov/od/oci/reports/toc.jsp
August, M. C., Brost, G. M., Hsiung, C. C., & Schiffleger, A. J. (1989). Cray X-MP: The birth of a supercomputer. IEEE Computer, 22(1), 45–52.
Cramer, B. E., & Armstrong, M. P. (1999). An evaluation of domain decomposition strategies for parallel spatial interpolation of surfaces. Geographical Analysis, 31(2), 148–168.
Densham, P. J., & Armstrong, M. P. (1994). A heterogeneous processing approach to spatial decision support systems. In T. C. Waugh & R. G. Healey (Eds.), Advances in GIS research (Vol. 1, pp. 29–45). London: Taylor and Francis.
Flynn, M. J. (1972). Some computer organizations and their effectiveness. IEEE Transactions on Computers, C-21(9), 948–960. https://doi.org/10.1109/TC.1972.5009071
Foster, I., & Kesselman, C. (Eds.). (1999). The grid: Blueprint for a new computing infrastructure. San Francisco, CA: Morgan Kaufmann.
Frank, S., Burkhardt, H., & Rothnie, J. (1993). The KSR 1: Bridging the gap between shared memory and MPPs. COMPCON Spring '93, Digest of Papers, pp. 285–294. https://doi.ieeecomputersociety.org/10.1109/CMPCON.1993.289682
Franklin, W. R., Narayanaswami, C., Kankanhalli, M., Sun, D., Zhou, M.-C., & Wu, P. Y. F. (1989). Uniform grids: A technique for intersection detection on serial and parallel machines. In Proceedings of the Ninth International Symposium on Computer-Assisted Cartography, Baltimore, MD, 2–7 April (pp. 100–109). Bethesda, MD: American Congress on Surveying and Mapping.
Griffith, D. A. (1990). Supercomputing and spatial statistics: A reconnaissance. The Professional Geographer, 42(4), 481–492.
Gustafson, J. L. (1988). Reevaluating Amdahl's law. Communications of the Association for Computing Machinery, 31(5), 532–533.
Healey, R., Dowers, S., Gittings, B., & Mineter, M. (Eds.). (1998). Parallel processing algorithms for GIS. Bristol, PA: Taylor & Francis.
Hegeman, J. W., Sardeshmukh, V. B., Sugumaran, R., & Armstrong, M. P. (2014). Distributed LiDAR data processing in a high-memory cloud-computing environment. Annals of GIS, 20(4), 255–264.
Hennessy, J. L., & Patterson, D. A. (2019). A new golden age for computer architecture. Communications of the Association for Computing Machinery, 62(2), 48–60.
Iorga, M., Feldman, L., Barton, R., Martin, M. J., Goren, N., & Mahmoudi, C. (2018). Fog computing conceptual model. Gaithersburg, MD: NIST. NIST Special Publication 500-325. https://doi.org/10.6028/NIST.SP.500-325
Karplus, W. J. (1989). Vector processors and multiprocessors. In K. Hwang & D. DeGroot (Eds.), Parallel processing for supercomputers and artificial intelligence. New York, NY: McGraw Hill.
Lim, G. J., & Ma, L. (2013). GPU-based parallel vertex substitution algorithm for the p-median problem. Computers & Industrial Engineering, 64, 381–388.
Mell, P., & Grance, T. (2011). The NIST definition of cloud computing. Gaithersburg, MD: NIST. National Institute of Standards and Technology Special Publication 800-145. https://doi.org/10.6028/NIST.SP.800-145
Moore, G. (1965). Cramming more components onto integrated circuits. Electronics, 38(8), 114–117.
Mower, J. E. (1992). Building a GIS for parallel computing environments. In Proceedings of the Fifth International Symposium on Spatial Data Handling (pp. 219–229). Columbia, SC: International Geographic Union.
NIST (National Institute of Standards and Technology). (2015). NIST big data interoperability framework: Volume 1, Definitions. NIST Special Publication 1500-1. Gaithersburg, MD: NIST. https://doi.org/10.6028/NIST.SP.1500-1
NSF (National Science Foundation). (2018). CI2030: Future advanced cyberinfrastructure. A Report of the NSF Advisory Committee for Cyberinfrastructure. https://www.nsf.gov/cise/oac/ci2030/ACCI_CI2030Report_Approved_Pub.pdf
Ord, J. K., & Getis, A. (1995). Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis, 27, 286–306.
Rokos, D., & Armstrong, M. P. (1996). Using Linda to compute spatial autocorrelation in parallel. Computers & Geosciences, 22(5), 425–432.
Sandu, J. S., & Marble, D. F. (1988). An investigation into the utility of the Cray X-MP supercomputer for handling spatial data. In Proceedings of the Third International Symposium on Spatial Data Handling, IGU, Sydney, Australia, pp. 253–266.
Satyanarayanan, M. (2017). The emergence of edge computing. IEEE Computer, 50(1), 30–39.
Shekhar, S., Ravada, S., Kumar, V., Chubb, D., & Turner, G. (1996). Parallelizing a GIS on a shared address space architecture. IEEE Computer, 29(12), 42–48.
Shi, W., & Dustdar, S. (2016). The promise of edge computing. IEEE Computer, 49(5), 78–81.
Smarr, L., & Catlett, C. E. (1992). Metacomputing. Communications of the Association for Computing Machinery, 35(6), 44–52.
Snir, M., Otto, S., Huss-Lederman, S., Walker, D., & Dongarra, J. (1996). MPI: The complete reference. Cambridge, MA: MIT Press.
Stewart, C., Simms, S., Plale, B., Link, M., Hancock, D., & Fox, G. (2010). What is cyberinfrastructure? In SIGUCCS '10: Proceedings of the 38th Annual ACM SIGUCCS Fall Conference: Navigation and Discovery, Norfolk, VA, 24–27 Oct, pp. 37–44. https://dl.acm.org/citation.cfm?doid=1878335.1878347
Stone, H. S., & Cocke, J. (1991). Computer architecture in the 1990s. IEEE Computer, 24(9), 30–38.
Stonebraker, M. (1986). The case for shared nothing architecture. Database Engineering, 9, 1.
Sugumaran, R., & Armstrong, M. P. (2017). Cloud computing. In The international encyclopedia of geography: People, the earth, environment, and technology. New York, NY: John Wiley. https://doi.org/10.1002/9781118786352.wbieg1017
Tang, W. (2017). GPU computing. In M. F. Goodchild & M. P. Armstrong (Eds.), International encyclopedia of geography. Hoboken, NJ: Wiley. https://doi.org/10.1002/9781118786352.wbieg0129
Tang, W., & Feng, W. (2017). Parallel map projection of vector-based big spatial data using general-purpose graphics processing units. Computers, Environment and Urban Systems, 61, 187–197.
Wang, S. (2010). A CyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Annals of the Association of American Geographers, 100(3), 535–557.
Wang, S. (2013). CyberGIS: Blueprint for integrated and scalable geospatial software ecosystems. International Journal of Geographical Information Science, 27(11), 2119–2121.
Wang, S. (2019). Cyberinfrastructure. In J. P. Wilson (Ed.), The geographic information science & technology body of knowledge. Ithaca, NY: UCGIS. https://doi.org/10.22224/gistbok/2019.2.4
Wang, S., & Armstrong, M. P. (2003). A quadtree approach to domain decomposition for spatial interpolation in grid computing environments. Parallel Computing, 29(10), 1481–1504.
Wang, S., & Armstrong, M. P. (2009). A theoretical approach to the use of cyberinfrastructure in geographical analysis. International Journal of Geographical Information Science, 23(2), 169–193. https://doi.org/10.1080/13658810801918509
Wang, S., Cowles, M. K., & Armstrong, M. P. (2008). Grid computing of spatial statistics: Using the TeraGrid for Gi∗(d) analysis. Concurrency and Computation: Practice and Experience, 20(14), 1697–1720. https://doi.org/10.1002/cpe.1294
Yan, J., Cowles, M. K., Wang, S., & Armstrong, M. P. (2007). Parallelizing MCMC for Bayesian spatiotemporal geostatistical models. Statistics and Computing, 17(4), 323–335.
Yang, C., Huang, Q., Li, Z., Xu, C., & Liu, K. (2013). Spatial cloud computing: A practical approach. Boca Raton, FL: CRC Press.

Chapter 3

Spatiotemporal Domain Decomposition for High Performance Computing: A Flexible Splits Heuristic to Minimize Redundancy

Alexander Hohl, Erik Saule, Eric Delmelle, and Wenwu Tang

Abstract There are three steps towards implementing the divide-and-conquer strategy for accelerating spatiotemporal analysis: First, performing spatiotemporal domain decomposition: dividing a large computational task into smaller parts (subdomains) by partitioning the input dataset along its spatiotemporal domain. Second, computing a spatiotemporal analysis algorithm (e.g., kernel density estimation) for each of the resulting subdomains using high performance parallel computing. Third, collecting and reassembling the outputs. However, as many spatiotemporal analysis approaches employ neighborhood search, data elements near domain boundaries need to be assigned to multiple processors by replication to avoid spurious boundary effects. We focus on the first step of the divide-and-conquer strategy because replication of data decreases the efficiency of our approach. We develop a spatiotemporal domain decomposition method (ST-FLEX-D) that allows for flexibility in domain split positions, which refines the well-known static approach (ST-STATIC-D). We design a heuristic to find domain splits that minimize data replication and compare the resulting set of subdomains to ST-STATIC-D using the following metrics: (1) execution time of decomposition, (2) total number of replicated data points, (3) average leaf node depth, and (4) average leaf node size. We make the following key assumption: The spatiotemporal analysis in the second step of the divide-and-conquer strategy uses known, fixed spatial and temporal search radii, as determining split positions would be very difficult otherwise. Our results show that ST-FLEX-D is successful in reducing data replication across a range of parameterizations but comes at the expense of increased decomposition time. Our approach is portable to other space-time analysis methods and holds the potential to enable scalable geospatial applications.

Keywords  Domain decomposition · Algorithms · Parallel computing · Space-time

A. Hohl (*)
Department of Geography, University of Utah, Salt Lake City, UT, USA
e-mail: [email protected]
E. Saule
Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, USA
E. Delmelle · W. Tang
Department of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC, USA
Center for Applied Geographic Information Science, University of North Carolina at Charlotte, Charlotte, NC, USA

© Springer Nature Switzerland AG 2020
W. Tang, S. Wang (eds.), High Performance Computing for Geospatial Applications, Geotechnologies and the Environment 23, https://doi.org/10.1007/978-3-030-47998-5_3

1  Introduction

Cyberinfrastructure (CI) and high performance computing (HPC) enable solving computational problems that were previously inconceivable or intractable (Armstrong 2000) and have transformative impacts on many disciplines such as Geography, Engineering, or Biology. The advent of CI and HPC has allowed for significant scientific enterprise within the GIScience community: the study of health and wellbeing through participatory data collection using mobile devices (Yin et al. 2017) or web-scraping (Desjardins et al. 2018), the analysis of human digital footprints (Soliman et al. 2017), the use of social media data for a geospatial perspective on health issues (Gao et al. 2018; Padmanabhan et al. 2014; Ye et al. 2016; Shi and Wang 2015), hydrologic modelling (Survila et al. 2016; Ye et al. 2014), biomass and carbon assessment (Tang et al. 2016, 2017; Stringer et al. 2015), and agent-based modelling (Shook et al. 2013; Fachada et al. 2017; Tang 2008; Tang and Bennett 2010; Tang and Wang 2009; Tang et al. 2011).

The evolution of technologies such as sensor systems, automated geocoding abilities, and social media platforms has enabled us to collect large quantities of spatial and spatiotemporal data at an increasingly fast pace (Goodchild 2007). Therefore, we are facing a stream of geographically referenced data of unprecedented volume, velocity, and variety, which necessitates advanced computational capabilities for analysis and prediction (Zikopoulos and Eaton 2011). There are three additional factors motivating the call for increased computational power. The first is the soaring computational intensity of many methods for analyzing geographic phenomena, due to underlying algorithms that require large amounts of spatial search. Computational intensity is defined as “the magnitude of computational requirements of a problem based on the evaluation of the characteristics of the problem, its input and output, and computational complexity” (Wang 2008). The second is the use of simulation approaches (i.e., Monte Carlo) for significance testing, which further increases the computational cost of geospatial analysis (Tang et al. 2015). The third is the inclusion of a true representation of time in geographic models that further complicates the computation of geospatial applications (Kwan and Neutens 2014). Together, these factors unequivocally raise the need for further


scientific investigation to cope with computational problems within the domain of GIScience.

Deploying many computing resources in parallel allows for increased performance and therefore for solving large-scale computational problems using parallel algorithms and strategies (Wilkinson and Allen 2004). As opposed to sequential algorithms, parallel algorithms perform multiple operations concurrently (Blelloch and Maggs 1996). They include message passing, a standard for sending messages between tasks or processes to synchronize tasks and to perform operations on data in transit (Huang et al. 2011); shared memory, which is accessed by multiple concurrent programs to enable communication (Armstrong and Marciano 1997); MapReduce, which is a programming model and implementation that allows for processing big data and features the specification of a map and a reduce function (Dean and Ghemawat 2008); and lastly, hybrid parallel algorithms, which combine the above processes (Biswas et al. 2003). Parallel strategies are techniques designed to achieve the major aim of parallelism in a computation (Wilkinson and Allen 2004). They include load balancing, which aims at balancing the workload among concurrent processors (Hussain et al. 2013); task scheduling, which optimizes the time computing tasks start execution on a set of processors (Dutot et al. 2004); and finally, spatial and spatiotemporal domain decomposition, which divide a dataset into smaller subsets along its spatiotemporal domain (Ding and Densham 1996).

Spatial and spatiotemporal domain decomposition are widely used strategies for parallel problem solving (Deveci et al. 2016; Berger and Bokhari 1987; Nicol 1994) and are one out of three crucial steps for implementing the divide-and-conquer strategy (Hohl et al. 2015). These steps are:
1. Partition the dataset by decomposing its spatiotemporal domain into smaller subsets.
2. Distribute the subsets to multiple concurrent processors1 for computing a spatiotemporal analysis algorithm (e.g., kernel density estimation, KDE).
3. Collect the results and reassemble to one coherent dataset.
This approach is scalable if the computational intensity is balanced among concurrent processors by accounting for the explicit characteristics of the data (Wang and Armstrong 2003; Wang et al. 2008; Wang 2008). Here, we focus on recursive domain decomposition such as quadtrees and octrees, as these are common approaches that promote balanced workloads for processing data that are heterogeneously distributed in space and time (Turton 2003; Hohl et al. 2016; Ding and Densham 1996). Spatiotemporal domain decomposition approaches partition the domain of a given dataset into a hierarchical set of rectangular (2D) or cuboid (3D, 2D + time) subdomains (Samet 1984). Each subdomain contains a similar number of data points, which facilitates workload balance across concurrent processors.

1 We use in this chapter the term “processor” as a generic term for what is performing the computation, which depending on context could be a core, a thread, a reduce task, a computing node, an MPI rank, a physical processor, etc.


However, many spatiotemporal analysis and modelling approaches rely on neighborhood information within a given distance of a location (“bandwidth”). This complicates the domain decomposition procedure because, by partitioning and distributing the dataset to different processors, exactly that neighborhood information is no longer accessible near domain boundaries (Zheng et al. 2018). Approaches to prevent the introduction of edge effects (e.g., in KDE) due to partitioning include overlapping subdomains, where data that fall within the overlapping region are assigned to multiple neighboring subdomains, which is achieved either through (1) replication of data points (Hohl et al. 2015) or (2) communication between processors (Guan and Clarke 2010). Both approaches reduce the efficiency of spatiotemporal domain decomposition and therefore limit scalability and applicability of the divide-and-conquer strategy.

In this study, we focus on the first step of the divide-and-conquer strategy and present a methodology that allows for scalable space-time analysis by leveraging the power of high-performance computing. We introduce a heuristic that optimizes the spatiotemporal domain decomposition procedure by reducing the loss in efficiency from overlapping subdomains, which greatly benefits the scalability of the divide-and-conquer strategy. Our method accelerates spatiotemporal analysis while maintaining its integrity by preventing the introduction of edge effects through overlapping subdomains. This chapter is organized as follows: Section 2 lays out our methodology and data, Sect. 3 illustrates the results, and Sect. 4 holds discussion and conclusions.

2  Data and Methods

2.1  Data

To illustrate the general applicability of our approach, we use two spatiotemporally explicit datasets in this study. They differ in size, as well as spatial and temporal extents, but have in common that they contain [x, y, t] tuples for each record. The first dataset (“the dengue fever dataset,” Fig. 3.1, left) contains dengue fever cases in the city of Cali, Colombia (Delmelle et al. 2013). We use a total of 11,056 geocoded cases, 9606 cases in 2010, and 1562 in 2011. The cumulative distribution of the number of cases has a steep initial incline early on and is illustrated in Fig. 3.1, right. We explain the difference in the number of cases by the fact that 2010 was identified as an epidemic year (Varela et al. 2010). We use the home addresses of patients geomasked to the nearest street intersection as the spatial coordinates of the cases to maintain privacy (Kwan et al. 2004). The second dataset (“the pollen tweets dataset,” Fig. 3.2, left) contains 551,627 geolocated tweets within the contiguous USA from February 2016 to April 2016 (Saule et al. 2017). Tweets were selected by keywords, such as “pollen” or “allergy” and if precise geographic location was not available, we picked a random location within the approximated region provided by


Fig. 3.1  Spatial (left) and cumulative temporal (right) distribution of geomasked dengue fever cases in Cali, Colombia

Gnip (www.gnip.com). Therefore, the dataset contains spurious rectangular patterns; however, they should not matter for the purpose of our study. The cumulative temporal distribution (Fig. 3.2, right) shows a lower number of tweets at the beginning of the study period and a steeper incline later on.

2.2  Methods

In this section, we introduce a method (ST-FLEX-D) that improves computational scalability of spatiotemporal analysis, such as space-time kernel density estimation (STKDE, Nakaya and Yano 2010). We seek to refine and complement an existing methodology (ST-STATIC-D, Hohl et al. 2015), which has been successfully used to enable and accelerate spatial and spatiotemporal analysis (Desjardins et al. 2018; Hohl et al. 2016, 2018), and which involves the three steps of domain decomposition outlined previously. We attempt to address a crucial problem of ST-STATIC-D by developing a heuristic that is characterized by flexible split locations for domain partitioning with the goal of minimizing data replication due to domain overlap. We assume known, fixed bandwidths of the spatiotemporal analysis algorithm at step 2 of the divide-and-conquer strategy.2 We employ the concept of the space-time cube, which substitutes the third (vertical) dimension with time (2D + time, Hagerstrand 1970).

2 Otherwise, we would need to determine the spatial and temporal bandwidths prior to decomposition by utilizing a sequential procedure.



Fig. 3.2  Spatial (left) and cumulative temporal (right) distribution of tweets in the USA

All computational experiments are conducted using a workstation with a 64-bit operating system, an Intel® Core i3 CPU at 3.60 GHz clock speed, and 16 GB of memory. The decomposition procedures, which we coded in the Python programming language, are run sequentially.

2.2.1  The Existing Method: ST-STATIC-D

ST-STATIC-D is an existing implementation of spatiotemporal domain decomposition for accelerating geospatial analysis using parallel computing (Hohl et al. 2015). The procedure results in a set of subdomains of similar computational intensity, thereby facilitating balanced workloads among concurrent processors. Computational intensity of the spatiotemporal analysis algorithm at step 2 of the divide-and-conquer strategy may depend on the following characteristics: (1) the number of data points within the subdomain, and (2) the number of grid points within a regular grid in 3D space (x, y, t) at which local analysis is performed, e.g., STKDE (Nakaya and Yano 2010) or the local space-time Ripley's K function (Hohl et al. 2017). Recursion is a concept where “the solution to a problem depends on solutions to smaller instances of the same problem” (Graham 1994). We use recursive spatiotemporal domain decomposition, which explicitly handles heavily clustered distributions of point data, which pose a threat to workload balance otherwise. The ST-STATIC-D decomposition algorithm consists of the following steps (see Fig. 3.3): (1) Find the minimum and maximum values for each dimension (x, y, t), a.k.a. the spatiotemporal domain of the dataset. (2) Bisect each dimension (midway split), which creates eight subdomains of equal size and cuboid shape, and assign each data point to the subdomain it falls into. (3) Decompose subdomains recursively until (4) the exit condition is met: the crossing of either threshold T1 or T2. T1 is the threshold on the number of data points within a subdomain and T2 the subdomain volume threshold. Both metrics typically decrease as decomposition progresses. Therefore, low thresholds result in fine-grained decompositions, which is advantageous as a high number of small computing tasks (as opposed to a low number of large tasks)


Fig. 3.3  Octree-based recursive spatiotemporal domain decomposition

facilitates workload balance among processors. However, choosing low thresholds increases the depth of recursion, which could cause an exponential increase in computational intensity. We use the load balancing procedure described in Hohl, Delmelle, and Tang (2015), which equalizes the computational intensity CI (Wang 2008) among multiple processors. Partitioning the spatiotemporal domain of a dataset and distributing the subdomains to different processors means crucial neighborhood information is no longer available near subdomain boundaries. Unless properly dealt with, this loss of information degrades the results of the spatiotemporal analysis in step 2 of the divide-and-conquer strategy, e.g., creating spurious patterns in the resulting kernel density estimates. To prevent this undesirable case, we create circular/cylindrical buffers of distance equal to the spatial and temporal search radii around each data point (Fig. 3.4). A data point is assigned to subdomain sd1 either if it is located within sd1 (i.e., point p1 in Fig. 3.4) or if its buffer intersects with the boundary plane of sd1 (i.e., p2). As boundaries separate neighboring domains, it is possible that buffers intersect with up to eight subdomains (i.e., p3). Therefore, data points may be replicated and assigned to eight subdomains in the worst case, which causes data redundancy. Further recursive decomposition could cause a data point to be part of many subdomains. Such redundancy limits our ability to run computation at scale in two ways: First, the redundancy grows with increasing dataset size. While the redundancy may not be prohibitive for the datasets used in this study, it certainly is for bigger datasets (i.e., billions of observations). Second, the redundancy grows with increasing search radii. While it is up to the analyst to choose buffer distances (e.g., spatial and temporal bandwidths for STKDE), it is our goal to decrease redundancy at any choice of buffer distance.
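To make the octree recursion and the buffer-based replication concrete, the following minimal Python sketch illustrates the idea (this is not the authors' implementation; the class and helper names, the cylinder-overlap test, and the use of a plain volume threshold as a stand-in for T2 are illustrative assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class Subdomain:
    bounds: tuple                     # (xmin, xmax, ymin, ymax, tmin, tmax)
    points: list                      # [(x, y, t), ...] assigned to this node
    children: list = field(default_factory=list)

def bounds_of(points):
    """Step (1): the spatiotemporal domain = min/max of each dimension."""
    xs, ys, ts = zip(*points)
    return (min(xs), max(xs), min(ys), max(ys), min(ts), max(ts))

def volume(b):
    return (b[1] - b[0]) * (b[3] - b[2]) * (b[5] - b[4])

def touches(p, b, hs, ht):
    """True if the space-time cylinder around p (spatial radius hs, temporal
    half-height ht) lies inside or overlaps the cuboid subdomain b."""
    x, y, t = p
    dx = max(b[0] - x, 0.0, x - b[1])     # distance to the (x, y) rectangle
    dy = max(b[2] - y, 0.0, y - b[3])
    dt = max(b[4] - t, 0.0, t - b[5])     # distance to the time interval
    return dx * dx + dy * dy <= hs * hs and dt <= ht

def decompose(node, hs, ht, T1=50, min_volume=1e3):
    """Steps (2)-(4): bisect every axis at its midpoint (octree) and recurse
    until a subdomain holds <= T1 points or becomes too small (T2 stand-in)."""
    if len(node.points) <= T1 or volume(node.bounds) <= min_volume:
        return                            # leaf node: exit condition met
    x0, x1, y0, y1, t0, t1 = node.bounds
    xm, ym, tm = (x0 + x1) / 2, (y0 + y1) / 2, (t0 + t1) / 2
    for bx in ((x0, xm), (xm, x1)):
        for by in ((y0, ym), (ym, y1)):
            for bt in ((t0, tm), (tm, t1)):
                b = (bx[0], bx[1], by[0], by[1], bt[0], bt[1])
                # replication: a point goes to every octant its buffer touches
                pts = [p for p in node.points if touches(p, b, hs, ht)]
                child = Subdomain(b, pts)
                node.children.append(child)
                decompose(child, hs, ht, T1, min_volume)

def leaf_stats(node, depth=0, acc=None):
    """(depth, size) of every leaf: basis for the average leaf node depth and
    average leaf node size metrics discussed in Sect. 2.2.3."""
    acc = [] if acc is None else acc
    if not node.children:
        acc.append((depth, len(node.points)))
    else:
        for child in node.children:
            leaf_stats(child, depth + 1, acc)
    return acc
```

The touches test expresses the assignment rule described above: a point is kept in every octant that its space-time cylinder overlaps, so points near boundaries are replicated rather than lost.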


Fig. 3.4  Buffer implementation for handling edge effects. Note that we chose to represent this 3D problem (2D + time) in 2D for ease of visualization. Same concepts apply in 3D, where circles become cylinders and lines become planes

2.2.2  The New Approach: ST-FLEX-D

Here, we introduce two alternatives to ST-STATIC-D: ST-FLEX-D-base and ST-FLEX-D-uneven. These methods focus on minimizing the redundancy caused by replication of points whose buffers intersect with subdomain boundaries. Their design is based on the observation that ST-STATIC-D bisects domains at the midpoint of each dimension (x, y, t). Therefore, ST-FLEX-D relaxes the midway split dictate and picks the optimal split out of a set of candidate splits. ST-FLEX-D-base defines Nc = 5 candidate split positions by regular increments along each axis (see Figs. 3.5, 3.6, and 3.7). It then chooses the best candidate split for partitioning according to the following rules:
• Rule 1—The number of cuts. Pick the candidate split that produces the lowest number of replicated points. A data point is replicated if the splitting plane intersects (cuts) the cylinder centered on it. The spatial and temporal search distances (bandwidths) of the spatiotemporal analysis algorithm are equal to the radius and height of the cylinder (Fig. 3.5).
• Rule 2—Split evenness. In case of a tie (two candidate splits both produce the lowest number of replicated points), pick the most even split. This split partitions the set of points most evenly among candidate splits in consideration (Fig. 3.6).


Fig. 3.5  Rule 1 of ST-FLEX-D-base. In this example, we choose S4 because it minimizes the number of circles cut by the bisection line

• Rule 3—Centrality. If still tied, pick the most central split, i.e., the candidate split in consideration that lies closest to the center (Fig. 3.7).
Figure 3.8 illustrates the entire process using an example in 2D (again, the same concepts apply in 3D). We focus on the x-axis first: Two candidate splits (SX1 and SX5) are tied in producing the minimum number of cuts. Therefore, we apply Rule 2 and pick SX5 because its split evenness (9/1) is higher than that of SX1 (0/10). Then, we focus on the y-axis, where again two candidate splits (SY1 and SY5) both produce the minimum number of cuts. We pick SY1 by applying Rule 2 (evenness of 1/9 over evenness of 10/0).
ST-FLEX-D-base may face the issue of picking splits that do not advance the decomposition procedure (“bad splits”). The issue arises when the outer splits (SX1, SX5, SY1, SY5) are selected while the points are distributed more centrally within the domain. Bad splits cut zero cylinders (and are therefore chosen by ST-FLEX-D-base), yet all points lie on the same side of the split, which does not further the goal of decomposing the data points. ST-FLEX-D-uneven addresses the problem by an uneven distribution of candidate split locations that do not cover the entire axis but congregate around the midway split (Fig. 3.9). This regime maintains flexible split locations while reducing the odds of choosing bad splits. Rules 1–3 of ST-FLEX-D-base for picking the best candidate split still apply for ST-FLEX-D-uneven.
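The candidate generation and the rule-based selection can be sketched as follows (again a hedged illustration rather than the authors' code; the exact spacing of the uneven candidates and the helper names are assumptions):

```python
def candidate_positions(lo, hi, nc=5, uneven=False):
    """Candidate split positions along one axis. ST-FLEX-D-base spaces them at
    regular increments; the 'uneven' variant clusters them around the midway
    split (the offsets used here are an illustrative choice)."""
    if not uneven:
        step = (hi - lo) / (nc + 1)
        return [lo + step * (i + 1) for i in range(nc)]
    mid = (lo + hi) / 2
    return [mid + f * (hi - lo) for f in (-0.2, -0.1, 0.0, 0.1, 0.2)]

def best_split(coords, radius, candidates):
    """Rules 1-3: fewest cut buffers, then most even partition, then most
    central candidate. 'coords' are the point coordinates along this axis and
    'radius' is the matching (spatial or temporal) bandwidth."""
    center = (min(candidates) + max(candidates)) / 2
    def score(s):
        cuts = sum(1 for c in coords if abs(c - s) <= radius)        # Rule 1
        left = sum(1 for c in coords if c < s)
        unevenness = abs(2 * left - len(coords))                     # Rule 2
        return (cuts, unevenness, abs(s - center))                   # Rule 3
    return min(candidates, key=score)

# Hypothetical usage: best x-split for points pts with spatial bandwidth hs
# sx = best_split([p[0] for p in pts], hs, candidate_positions(x0, x1))
```

Because the score is a tuple, ties under Rule 1 are broken by Rule 2 and remaining ties by Rule 3, mirroring the cascade described above.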


Fig. 3.6  Rule 2 of ST-FLEX-D-base. Here, the minimum number of cut circles ties between S4 and S5 (Rule 1). Hence, we pick S4, which bisects the set of points more evenly

2.2.3  Performance Metrics and Sensitivity

For both datasets, we compare the performance of ST-STATIC-D with the implementations of ST-FLEX-D (ST-FLEX-D-base, ST-FLEX-D-uneven) using the following metrics:
1. execution time of decomposition,
2. total number of cut cylinders (total number of replicated data points),
3. average leaf node depth,
4. average leaf node size.
The execution time of decomposition is the total amount of time required for decomposing the dataset, disregarding I/O. The total number of cut cylinders is equal to the number of replicated data points that result from the decomposition due to partitioning. It is a measure of the redundancy within the decomposition procedure and our goal is to minimize it. The decomposition procedure is inherently hierarchical, where a domain splits into multiple subdomains. Therefore, it is common to illustrate the procedure as a tree, where the domain of the input dataset is represented by the root node, and the subdomains resulting from the first split are children nodes linked to the root node (see Fig. 3.10 for illustration and example).


Fig. 3.7  Rule 3 of ST-FLEX-D-base. Here, the minimum number of cut circles ties between candidate splits S2, S3 and S4 (Rule 1). Split evenness ties between S3 and S4 (Rule 2). We pick S3, which is more central than S4

Since the recursion does not go equally deep in all of its branches, we use the average leaf node depth to measure how many times on average the initial domain is split to form a particular subdomain. The average leaf node size is the number of data points that leaf nodes contain: this measures the granularity of the decomposition. The largest leaf node ultimately limits scalability of the calculation as it is the largest chunk of undivided work.

2.2.4  Decomposition Parameters

Given a set of spatiotemporal points, the following parameters determine the outcomes (i.e., performance metrics) of our spatiotemporal domain decomposition implementations: (1) the maximum number of points per subdomain (threshold T1, see Sect. 2.2.1), (2) the buffer ratio (threshold T2, Sect. 2.2.1), (3) spatial and temporal bandwidths, and (4) output grid resolution. We picked the spatial and temporal bandwidths, which are very important and well-discussed parameters in spatiotemporal analysis (Brunsdon 1995), to illustrate the sensitivity of our


Fig. 3.8  Example of ST-FLEX-D-base

decomposition implementations to varying inputs. We decomposed both datasets using the parameter configuration given in Table  3.1, where all values are kept steady but values for spatial and temporal bandwidth vary (spatial: 200–2500 m in steps of 100 m; temporal: 1 day–14 days in steps of 1 day). Hence, we have 336 different parameter configurations (treatments) for which we compute the metrics introduced above for all implementations (ST-STATIC-D, ST-FLEX-D-base, ST-FLEX-D-uneven) and datasets (dengue fever, pollen tweets) separately. We report boxplots to illustrate the distribution of the performance metrics across the varying parameter configurations. The parameters in Table  3.1 have a substantial influence on the outcome of decomposition. We expect an increase in the number of cut cylinders metric (and the number of replicated points) with increasing bandwidths, as likely more buffers will


Fig. 3.9  Uneven candidate splits

Fig. 3.10  Domain decomposition. Spatial depiction (left), tree (right). Leaf nodes of the tree are denoted by gray color. The average leaf node depth is (1 + 1 + 1 + 2 + 2 + 2 + 2)/7 = 1.57

be intersected by the splitting planes. Therefore, we assessed the sensitivity of the number of cut cylinders metric to varying bandwidth configurations. For both datasets, we report contour plots to show how the metric varies across the parameter space. This gives us an indication of how the decomposition implementations behave for varying parameter configurations, which might be an important factor for choosing a method for a given decomposition task.


Table 3.1  Parameter values for ST-STATIC-D and ST-FLEX-D

Parameter 1. Maximum number of points per subdomain: 50
Parameter 2. Buffer ratio: 0.01
Parameter 3. Grid resolution: 50 m, 1 day
Parameter 4. Spatial and temporal bandwidths: spatial [200 m, 300 m, 400 m, 500 m, 600 m, 700 m, 800 m, 900 m, 1000 m, 1100 m, 1200 m, 1300 m, 1400 m, 1500 m, 1600 m, 1700 m, 1800 m, 1900 m, 2000 m, 2100 m, 2200 m, 2300 m, 2400 m, 2500 m]; temporal [1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days, 8 days, 9 days, 10 days, 11 days, 12 days, 13 days, 14 days]
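A sweep over these treatments might be driven as in the short sketch below (illustrative only; it reuses the Subdomain, bounds_of, decompose, and leaf_stats helpers assumed in the earlier sketch and records only decomposition time and the two leaf metrics):

```python
import time
from itertools import product

# Bandwidth grid from Table 3.1: 24 spatial x 14 temporal = 336 treatments.
spatial_bandwidths = [200 + 100 * i for i in range(24)]   # 200 m ... 2500 m
temporal_bandwidths = list(range(1, 15))                  # 1 day ... 14 days

def run_sweep(points):
    """Decompose the dataset once per (hs, ht) treatment and collect metrics."""
    results = []
    for hs, ht in product(spatial_bandwidths, temporal_bandwidths):
        root = Subdomain(bounds_of(points), list(points))
        start = time.perf_counter()
        decompose(root, hs, ht, T1=50)                    # T1 = 50 (Table 3.1)
        elapsed = time.perf_counter() - start
        stats = leaf_stats(root)
        results.append({
            "hs": hs, "ht": ht, "time_s": elapsed,
            "avg_leaf_depth": sum(d for d, _ in stats) / len(stats),
            "avg_leaf_size": sum(n for _, n in stats) / len(stats),
        })
    return results
```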

3  Results

3.1  Distribution of Performance Metrics

Here, we report boxplots that illustrate the distribution of the performance metrics across the 336 different parameter configurations that result from the values given in Table 3.1. We draw a boxplot for each metric (execution time, number of cut cylinders, average leaf node depth, average leaf node size), decomposition implementation (ST-STATIC-D, ST-FLEX-D-base, ST-FLEX-D-uneven), and dataset (dengue fever, pollen tweets), resulting in a total of 24 plots.

3.1.1  Execution Time of Decomposition

Figure 3.11 shows decomposition execution times in seconds. ST-STATIC-D is the fastest, followed by ST-FLEX-D-uneven and ST-FLEX-D-base. We attribute the difference to the added cost of choosing the optimal split for the ST-FLEX-D implementations. In addition, ST-STATIC-D has the smallest variation across parameter configurations, whereas execution times vary substantially more for ST-FLEX-D-base and ST-FLEX-D-uneven. The pollen tweets dataset takes much longer to decompose, but for comparing decomposition implementations, we largely observe the same patterns for both datasets.

3.1.2  Total Number of Cut Cylinders (Total Number of Replicated Data Points)

Generally, the number of cut cylinders is very high (Fig. 3.12), ranging from around 55,000 to 5,500,000 for the dengue fever dataset and from 10,000,000 to 100,000,000 for the pollen tweets dataset. What seem to be unexpectedly high numbers, given the initial number of data points, can be explained by the fact that cylinders may be


Fig. 3.11  Execution times in seconds for dengue fever (left) and pollen tweets (right) datasets

Fig. 3.12  Number of cut cylinders for dengue fever (left) and pollen tweets (right) datasets

cut multiple times as the recursion runs multiple levels deep. ST-STATIC-D performs surprisingly well for this metric (Fig. 3.12): its median is lower than those of ST-FLEX-D-base and ST-FLEX-D-uneven, although it exhibits the largest range of values for the dengue fever dataset, as well as outliers of extremely high values. Such a distribution is not desired in spatiotemporal domain decomposition as the data redundancy is less predictable. However, the situation is different for the pollen tweets dataset, where the range is similar across all implementations, and the median of ST-STATIC-D (61,591,096) is lower than that of ST-FLEX-D-base (66,964,108), but higher than that of ST-FLEX-D-uneven (56,951,505). For both datasets, the interquartile range is similar between ST-STATIC-D and ST-FLEX-D-uneven, whereas ST-FLEX-D-base performs worse.


3.1.3  Average Leaf Node Depth

ST-STATIC-D produces the shallowest trees out of all implementations across both datasets (Fig. 3.13). Subdomains resulting from the decomposition procedure are created by splitting the initial domain 5.4 times (dengue fever) and 7.8 times (pollen tweets) on average. ST-FLEX-D-base produces a substantially deeper tree for both datasets, which explains the higher number of cut cylinders as compared with other implementations (deeper recursion = more splits = more cut cylinders), as seen in Sect. 3.1.2. ST-FLEX-D-uneven has a similar average depth to ST-STATIC-D because the candidate split locations are not distributed evenly across the entire domain. Therefore, the split locations are more similar to those of ST-STATIC-D than to those of ST-FLEX-D-base.

3.1.4  Average Leaf Node Size

This metric is most relevant for scalability, as the “biggest” subdomain represents the biggest chunk of undivided work. Therefore, it sets a limit for parallel execution time of the spatiotemporal analysis algorithm (step 2 of the divide-and-conquer strategy, see Sect. 1). Figure 3.14 shows that the ST-FLEX-D implementations result in slightly smaller subdomains than ST-STATIC-D for the dengue fever dataset. The pollen tweets dataset produces subdomains of similar size for all implementations, but ST-STATIC-D produces slightly smaller subdomains and a smaller range of values than ST-FLEX-D-base, whereas ST-FLEX-D-uneven is about the same.

Fig. 3.13  Average leaf node depth for dengue fever (left) and pollen tweets (right) datasets


Fig. 3.14  Average leaf node size for dengue fever (left) and pollen tweets (right) datasets

Fig. 3.15 Number of cut cylinders vs. bandwidths resulting from decomposition using ST-STATIC-D for dengue fever (left) and pollen tweets (right) datasets

3.2  Sensitivity to Varying Bandwidths

Figures 3.15, 3.16, and 3.17 show the sensitivity of the number of cut cylinders metric to varying spatial-temporal bandwidth configurations. We calculated the metric for each of the 336 different combinations of spatial and temporal bandwidths separately for both datasets. The values of the metric are substantially higher for the pollen tweets dataset, as compared to the dengue fever dataset. A comparison across all three methods shows that depending on the bandwidth configuration, the ST-FLEX-D implementations may outperform ST-STATIC-D or not. For instance, the numbers of cut cylinders for the bandwidth configuration of 10 days/1500 m


Fig. 3.16  Number of cut cylinders vs. bandwidths resulting from decomposition using ST-FLEX-­ D-base for dengue fever (left) and pollen tweets (right) datasets

Fig. 3.17  Number of cut cylinders vs. bandwidths resulting from decomposition using ST-FLEX-­ D-uneven for dengue fever (left) and pollen tweets (right) datasets

using the pollen tweets dataset are: 88,809,305 (ST-STATIC-D), 85,066,912 (ST-FLEX-D-base), and 71,058,247 (ST-FLEX-D-uneven). Even though the opposite is true for other configurations, it shows that gains in efficiency do exist. We focus on ST-STATIC-D first (Fig. 3.15): Using the dengue fever dataset, the number of cut cylinders is highly influenced by the spatial bandwidth, at least more so than for the ST-FLEX-D implementations. At spatial and temporal bandwidths of around 1600 m and 11 days, we observe the highest values, where a further increase in either bandwidth results in a sharp drop and a subsequent increase in the metric (sawtooth pattern). We believe that the pattern is formed by interacting decomposition parameters (bandwidths, maximum number of points per subdomain,


buffer ratio). For instance, beyond a spatial bandwidth of 1400–2000 m, the buffer ratio threshold (T2, Sect. 2.2.1) could have the effect of preventing deeper recursion and finer decomposition, and therefore higher values of the number of cut cylinders metric. ST-FLEX-D-base exhibits a similar pattern, where the metric increases with increasing bandwidths (Fig. 3.16). With the pollen tweets dataset, the temporal bandwidth dominates the increase, as compared to the dengue fever dataset. This is evident from the contour lines, which mostly exhibit steeper decreasing slopes for the dengue fever dataset. Interestingly, the dengue fever dataset exhibits an “elbow” pattern with ST-FLEX-D-base, where the increase is dominated by the temporal bandwidth up to a value of 3 days. Above that, the spatial bandwidth is more influential. With ST-FLEX-D-uneven (Fig. 3.17), the increase is dominated by the spatial bandwidth for the dengue fever dataset, and by the temporal bandwidth for the pollen tweets dataset. Both datasets exhibit a gradual increase in the number of cut cylinders metric.

4  Discussion and Conclusions

In this study, we focused on the first step of the divide-and-conquer strategy and designed and implemented two spatiotemporal domain decomposition methods, ST-FLEX-D-base and ST-FLEX-D-uneven. We built upon a previous method, ST-STATIC-D, which faced the problem of excessive replication of data points. Replication of data points near domain boundaries is a strategy to prevent the introduction of spurious patterns that would arise from the loss of neighborhood information due to partitioning. This could severely degrade the results of the spatiotemporal analysis algorithm we seek to accelerate through decomposition and high-performance parallel computing. Our results indicate that the new methods prevent worst-case decompositions where the number of replicated points is excessively high. This is a very encouraging result, given that the redundancy stemming from replication limits scalability. Note that reporting performance results from subsequent parallel processing of a spatiotemporal analysis algorithm (the second step of the divide-and-conquer strategy) like STKDE lies beyond the scope of our study. In addition, parallel performance of such algorithms is heavily dependent on leaf node size (the number of data points per subdomain), which we include in our study.

A deeper look at the results reveals that reducing redundancy may come at the price of increased decomposition time, coarse granularity, and deeper recursion of the decomposition, depending on parameter configuration and input dataset (Sect. 3.1). These results are potentially troublesome: First, the goal of domain decomposition is to run spatiotemporal analysis at scale, and a slow execution time of decomposition is not a step toward that goal. However, we think that it is time well spent, as our domain decomposition methods not only enable parallel processing, but also result in increasingly balanced workloads and therefore scalable computing. Second, a


fine-grained decomposition fosters workload balance because it increases flexibility for allocating subdomains to processors. Conversely, a coarse-grained decomposition takes away flexibility and likely leads to unevenly distributed workloads. Third, if the decomposition is too coarse (which was the case for some parameter configurations of ST-FLEX-D-base using the pollen tweets dataset), scalability is limited because the largest subdomain resulting from decomposition determines parallel execution time. Fourth, the numbers of replicated points (Sect. 3.1.2) are very high, which decreases the efficiency of our approach. A straightforward way to bring down these numbers would be to adjust the decomposition parameters, i.e., the spatial and temporal bandwidths. However, these parameters are determined by analytical considerations and not computational ones and are therefore out of our control. For instance, the bandwidth of STKDE can be determined by domain experts or data-driven methods. Delmelle et al. (2014) used a bandwidth configuration of 750 m/10 days for STKDE using the dengue fever dataset. They justified their choice as the scale at which clustering was strongest. This shows that the range of bandwidths we chose in this study is within the realm of the possible. Previous efforts of spatiotemporal domain decomposition for parallel STKDE have shown some encouraging results regarding computational efficiency using the dengue fever dataset (Hohl et al. 2016), where efficiency decreases with increasing levels of parallelization, but less strongly so for bigger computations (larger bandwidths).

Naturally, the performance metrics are heavily influenced by the parameter configuration of decomposition and the input dataset (Sect. 3.2). We observed tremendous variation of the number of cut cylinders metric for different spatial and temporal bandwidths. As expected, the values generally increase with increasing bandwidths across all decomposition implementations and datasets presented here, but some patterns stand out: First, the spatial bandwidth drives the increase of the metric for the dengue fever dataset, at least more so than for the pollen tweets. We think that the reason for this disparity lies in the relationship between spatial and temporal clustering intensity: due to the geomasking procedure (see Sect. 2.1), the dengue fever cases exhibit higher temporal than spatial clustering intensity compared to the pollen tweets. Choosing a bandwidth larger than the scale at which points cluster does not substantially increase the number of points that fall within the buffer zone, hence the “elbow” pattern in Fig. 3.16. This knowledge has direct implications for the general applicability of our decomposition methods to other datasets. Second, the sawtooth pattern of ST-STATIC-D that is visible for both datasets is likely caused by an interplay between parameters of the decomposition, especially the thresholds T1 and T2. Knowing about this behavior can be helpful for reducing redundancy in future applications of ST-STATIC-D. Either way, since ST-FLEX-D-base and ST-FLEX-D-uneven do not exhibit the sawtooth pattern, they are preferable from this point of view.

Current and future efforts include a systematic sensitivity analysis of our decomposition implementations to varying parameter configurations. We will expand our analysis to include more parameters that are relevant to decomposition, such as the spatial and temporal grid resolutions, the number of data points threshold (T1) and


the buffer ratio threshold (T2), and the number of candidate splits (Nc, Sect. 2.2.2). In addition, we work on identifying and implementing ways to speed up the decomposition. These include algorithmic improvements in finding the best split for the ST-FLEX-D implementations, where we could substitute sequential procedures with increased memory usage. Further, we intend to relax our assumption of known, fixed bandwidths and put forward decomposition procedures that cater to space-time analyses that are characterized by adaptive bandwidths (Tiwari and Rushton 2005; Davies and Hazelton 2010). The same applies to anisotropic point patterns, which necessitate an elliptic/ellipsoid search region rather than a circle/cylinder at step 2 of the divide-and-conquer strategy (Li et al. 2015). Lastly, we will investigate different concepts in recursive decomposition. So far, each domain is partitioned by partitioning all dimensions concurrently. We think an investigation may be warranted into the utility of picking the best dimension for partitioning of each subdomain. This may promote subdomain compactness, which is ultimately beneficial for computing performance.

As a final note, the concepts and methods set forth in this chapter are transferable to other analysis algorithms and different data types. For instance, raster datasets exhibit constant grid spacing, which is different from the irregularly distributed (clustered) data points of our datasets. However, decomposing a raster dataset for parallel processing belongs to a different class of problems, and the methods presented here would require some adaptation. In the case of an irregular domain, quadtree- or octree-based methods have been applied for decomposing raster datasets (see Guan and Clarke 2010). However, in the simple case of a square or rectangular domain, a raster dataset can be easily partitioned into subdomains of the same size and shape (stripes, squares, or rectangles), and does not require the recursive decomposition methods used in this article.
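As a minimal illustration of that simpler case (not a method from this chapter; the tiling function is a generic sketch and NumPy is used only to represent the raster), such a regular partition could be produced as follows:

```python
import numpy as np

def tile_raster(raster, n_rows, n_cols):
    """Partition a 2D raster into an n_rows x n_cols grid of near-equal tiles;
    edge tiles absorb any remainder when the shape does not divide evenly."""
    h, w = raster.shape
    r_edges = [round(i * h / n_rows) for i in range(n_rows + 1)]
    c_edges = [round(j * w / n_cols) for j in range(n_cols + 1)]
    return [raster[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
            for i in range(n_rows) for j in range(n_cols)]

# Example: a 1000 x 1200 cell raster split into 4 x 4 rectangular subdomains.
tiles = tile_raster(np.zeros((1000, 1200)), 4, 4)
```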

References Armstrong, M.  P. (2000). Geography and computational science. Annals of the Association of American Geographers, 90(1), 146–156. Armstrong, M. P., & Marciano, R. J. (1997). Massively parallel strategies for local spatial interpolation. Computers & Geosciences, 23(8), 859–867. Berger, M. J., & Bokhari, S. H. (1987). A partitioning strategy for nonuniform problems on multiprocessors. IEEE Transactions on Computers, 5, 570–580. Biswas, R., Oliker, L., & Shan, H. (2003). Parallel computing strategies for irregular algorithms. Annual Review of Scalable Computing, 5, 1. Blelloch, G. E., & Maggs, B. M. (1996). Parallel algorithms. ACM Computing Surveys (CSUR), 28(1), 51–54. Brunsdon, C. (1995). Estimating probability surfaces for geographical point data: An adaptive kernel algorithm. Computers & Geosciences, 21(7), 877–894. Davies, T.  M., & Hazelton, M.  L. (2010). Adaptive kernel estimation of spatial relative risk. Statistics in Medicine, 29(23), 2423–2437. Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113.

Delmelle, E., Casas, I., Rojas, J. H., & Varela, A. (2013). Spatio-temporal patterns of dengue fever in Cali, Colombia. International Journal of Applied Geospatial Research (IJAGR), 4(4), 58–75. Delmelle, E., Dony, C., Casas, I., Jia, M., & Tang, W. (2014). Visualizing the impact of space-time uncertainties on dengue fever patterns. International Journal of Geographical Information Science, 28(5), 1107–1127. Desjardins, M. R., Hohl, A., Griffith, A., & Delmelle, E. (2018). A space–time parallel framework for fine-scale visualization of pollen levels across the Eastern United States. Cartography and Geographic Information Science, 46(5), 1–13. Deveci, M., Rajamanickam, S., Devine, K. D., & Çatalyürek, Ü. V. (2016). Multi-jagged: A scalable parallel spatial partitioning algorithm. IEEE Transactions on Parallel and Distributed Systems, 27(3), 803–817. Ding, Y., & Densham, P. J. (1996). Spatial strategies for parallel spatial modelling. International Journal of Geographical Information Systems, 10(6), 669–698. Dutot, P. F., Mounié, G., & Trystram, D. (2004). Scheduling parallel tasks: Approximation algorithms. In J. T. Leung (Ed.), Handbook of scheduling: Algorithms, models, and performance analysis. Boca Raton, FL: CRC Press. Fachada, N., Lopes, V.  V., Martins, R.  C., & Rosa, A.  C. (2017). Parallelization strategies for spatial agent-based models. International Journal of Parallel Programming, 45(3), 449–481. Gao, Y., Wang, S., Padmanabhan, A., Yin, J., & Cao, G. (2018). Mapping spatiotemporal patterns of events using social media: A case study of influenza trends. International Journal of Geographical Information Science, 32(3), 425–449. Goodchild, M. F. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal, 69(4), 211–221. Graham, R. L. (1994). Concrete mathematics: [a foundation for computer science; dedicated to Leonhard Euler (1707–1783)]. New Delhi: Pearson Education. Guan, Q., & Clarke, K.  C. (2010). A general-purpose parallel raster processing programming library test application using a geographic cellular automata model. International Journal of Geographical Information Science, 24(5), 695–722. Hagerstrand, T. (1970). What about people in regional science? Papers of the Regional Science Association, 24, 7–21. Hohl, A., Delmelle, E., Tang, W., & Casas, I. (2016). Accelerating the discovery of space-time patterns of infectious diseases using parallel computing. Spatial and spatio-temporal epidemiology, 19, 10–20. Hohl, A., Delmelle, E. M., & Tang, W. (2015). Spatiotemporal domain decomposition for massive parallel computation of space-time kernel density. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2(4), 7. Hohl, A., Griffith, A.  D., Eppes, M.  C., & Delmelle, E. (2018). Computationally enabled 4D visualizations facilitate the detection of rock fracture patterns from acoustic emissions. Rock Mechanics and Rock Engineering, 51(9), 2733–2746. Hohl, A., Zheng, M., Tang, W., Delmelle, E., & Casas, I. (2017). Spatiotemporal point pattern analysis using Ripley’s K function. In H. A. Karimi & B. Karimi (Eds.), Geospatial data science: techniques and applications. Boca Raton, FL: CRC Press. Huang, F., Liu, D., Tan, X., Wang, J., Chen, Y., & He, B. (2011). Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS. Computers & Geosciences, 37(4), 426–434. Hussain, H., Shoaib, M., Qureshi, M. B., & Shah, S. 2013. 
Load balancing through task shifting and task splitting strategies in multi-core environment. Paper Read at Eighth International Conference on Digital Information Management. IEEE, pp. 385–390. Kwan, M.-P., Casas, I., & Schmitz, B. (2004). Protection of geoprivacy and accuracy of spatial information: How effective are geographical masks? Cartographica: The International Journal for Geographic Information and Geovisualization, 39(2), 15–28. Kwan, M.-P., & Neutens, T. (2014). Space-time research in GIScience. International Journal of Geographical Information Science, 28(5), 851–854.

Li, L., Bian, L., Rogerson, P., & Yan, G. (2015). Point pattern analysis for clusters influenced by linear features: An application for mosquito larval sites. Transactions in GIS, 19(6), 835–847. Nakaya, T., & Yano, K. (2010). Visualising crime clusters in a space-time cube: An exploratory data-analysis approach using space-time kernel density estimation and scan statistics. Transactions in GIS, 14(3), 223–239. Nicol, D. M. (1994). Rectilinear partitioning of irregular data parallel computations. Journal of Parallel and Distributed Computing, 23(2), 119–134. Padmanabhan, A., Wang, S., Cao, G., Hwang, M., Zhang, Z., Gao, Y., et al. (2014). FluMapper: A cyberGIS application for interactive analysis of massive location-based social media. Concurrency and Computation: Practice and Experience, 26, 13. Samet, H. (1984). The quadtree and related hierarchical data structures. ACM Computing Surveys (CSUR), 16(2), 187–260. Saule, E., Panchananam, D., Hohl, A., Tang, W., & Delmelle, E. (2017). Parallel space-time kernel density estimation. Paper read at 2017 46th International Conference on Parallel Processing (ICPP). Shi, X., & Wang, S. (2015). Computational and data sciences for health-GIS. Annals of GIS, 21(2), 111–118. Shook, E., Wang, S., & Tang, W. (2013). A communication-aware framework for parallel spatially explicit agent-based models. International Journal of Geographical Information Science, 27(11), 2160–2181. Soliman, A., Soltani, K., Yin, J., Padmanabhan, A., & Wang, S. (2017). Social sensing of urban land use based on analysis of twitter users’ mobility patterns. PLoS One, 12(7), e0181657. Stringer, C.  E., Trettin, C.  C., Zarnoch, S.  J., & Tang, W. (2015). Carbon stocks of mangroves within the Zambezi River Delta, Mozambique. Forest Ecology and Management, 354, 139–148. Survila, K., Yιldιrιm, A. A., Li, T., Liu, Y. Y., Tarboton, D. G., & Wang, S. (2016). A scalable high-­ performance topographic flow direction algorithm for hydrological information analysis. Paper read at Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale. Tang, W. (2008). Geographically-aware intelligent agents. Iowa: University of Iowa. Tang, W., & Bennett, D.  A. (2010). Agent-based modeling of animal movement: A review. Geography Compass, 4(7), 682–700. Tang, W., Bennett, D. A., & Wang, S. (2011). A parallel agent-based model of land use opinions. Journal of Land Use Science, 6(2–3), 121–135. Tang, W., Feng, W., & Jia, M. (2015). Massively parallel spatial point pattern analysis: Ripley’s K function accelerated using graphics processing units. International Journal of Geographical Information Science, 29(3), 412–439. Tang, W., Feng, W., Jia, M., Shi, J., Zuo, H., Stringer, C. E., et al. (2017). A cyber-enabled spatial decision support system to inventory mangroves in Mozambique: Coupling scientific workflows and cloud computing. International Journal of Geographical Information Science, 31(5), 907–938. Tang, W., Feng, W., Jia, M., Shi, J., Zuo, H., & Trettin, C. C. (2016). The assessment of mangrove biomass and carbon in West Africa: A spatially explicit analytical framework. Wetlands Ecology and Management, 24(2), 153–171. Tang, W., & Wang, S. (2009). HPABM: A hierarchical parallel simulation framework for spatially-­ explicit agent-based models. Transactions in GIS, 13(3), 315–333. Tiwari, C., & Rushton, G. (2005). Using spatially adaptive filters to map late stage colorectal cancer incidence in Iowa. In Developments in spatial data handling (pp.  665–676). Berlin: Springer. 
Turton, I. (2003). Parallel processing in geography. Paper Read at Geocomputation. Varela, A., Aristizabal, E. G., & Rojas, J. H. (2010). Analisis epidemiologico de dengue en Cali. Cali: Secretaria de Salud Publica Municipal. Wang, S. (2008). Formalizing computational intensity of spatial analysis. Paper Read at Proceedings of the 5th International Conference on Geographic Information Science.

Wang, S., & Armstrong, M. P. (2003). A quadtree approach to domain decomposition for spatial interpolation in grid computing environments. Parallel Computing, 29(10), 1481–1504. Wang, S., Cowles, M. K., & Armstrong, M. P. (2008). Grid computing of spatial statistics: Using the TeraGrid for G i∗(d) analysis. Concurrency and Computation: Practice and Experience, 20(14), 1697–1720. Wilkinson, B., & Allen, M. (2004). Parallel programming: Techniques and applications using networked workstations and parallel computers (2nd ed.). Upper Saddle River, NJ: Pearson Prentice Hall. Ye, S., Li, H.-Y., Huang, M., Ali, M., Leng, G., Leung, L. R., et al. (2014). Regionalization of subsurface stormflow parameters of hydrologic models: Derivation from regional analysis of streamflow recession curves. Journal of Hydrology, 519, 670–682. Ye, X., Li, S., Yang, X., & Qin, C. (2016). Use of social media for the detection and analysis of infectious diseases in China. ISPRS International Journal of Geo-Information, 5(9), 156. Yin, J., Gao, Y., & Wang, S. (2017). CyberGIS-enabled urban sensing from volunteered citizen participation using mobile devices. In Seeing cities through big data (pp. 83–96). Cham: Springer. Zheng, M., Tang, W., Lan, Y., Zhao, X., Jia, M., Allan, C., et al. (2018). Parallel generation of very high resolution digital elevation models: High-performance computing for big spatial data analysis. In Big data in engineering applications (pp. 21–39). Singapore: Springer. Zikopoulos, P., & Eaton, C. (2011). Understanding big data: Analytics for enterprise class hadoop and streaming data. New York: McGraw-Hill Osborne Media.

Part II

High Performance Computing for Geospatial Analytics

Chapter 4

Geospatial Big Data Handling with High Performance Computing: Current Approaches and Future Directions
Zhenlong Li

Abstract  Geospatial big data plays a major role in the era of big data, as most data today are inherently spatial, collected with ubiquitous location-aware sensors such as mobile apps, the global positioning system (GPS), satellites, environmental observations, and social media. Efficiently collecting, managing, storing, and analyzing geospatial data streams provide unprecedented opportunities for business, science, and engineering. However, handling the “Vs” (volume, variety, velocity, veracity, and value) of big data is a challenging task. This is especially true for geospatial big data since the massive datasets must be analyzed in the context of space and time. High performance computing (HPC) provides an essential solution to geospatial big data challenges. This chapter first summarizes four critical aspects for handling geospatial big data with HPC and then briefly reviews existing HPC-­ related platforms and tools for geospatial big data processing. Lastly, future research directions in using HPC for geospatial big data handling are discussed. Keywords  Geospatial big data · High performance computing · Cloud computing · Fog computing · Spatiotemporal indexing · Domain decomposition · GeoAI

1  Introduction

Huge quantities of data are being generated across a broad range of domains, including banking, marketing, health, telecommunications, homeland security, computer networks, e-commerce, and scientific observations and simulations. These data are

called big data. While there is no consensus on the definition of big data (Ward and Barker 2013; De Mauro et al. 2015), one widely used definition is: "datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze" (Manyika et al. 2011, p. 1). Geospatial big data refers to a specific type of big data that contains location information. Location information plays a significant role in the big data era, as most data today are inherently spatial, collected with ubiquitous location-aware sensors such as satellites, GPS, and environmental observations. Geospatial big data offers great opportunities for advancing scientific discoveries across a broad range of fields, including climate science, disaster management, public health, precision agriculture, and smart cities. However, what matters is not the big data itself but the ability to efficiently and promptly extract meaningful information from it, an aspect reflected in the widely used big data definition provided above. Efficiently extracting such meaningful information and patterns is challenging due to big data's 5-V characteristics—volume, velocity, variety, veracity, value (Zikopoulos and Eaton 2011; Zikopoulos et al. 2012; Gudivada et al. 2015)—and geospatial data's intrinsic feature of space and time. Volume refers to the large amounts of data being generated. Velocity indicates the high speed at which data streams are generated and accumulate, exceeding what traditional processing settings can handle. Variety refers to the high heterogeneity of data, such as different data sources, formats, and types. Veracity refers to the uncertainty and poor quality of data, including low accuracy, bias, and misinformation. For geospatial big data, these four Vs must be handled in the context of dynamic space and time to extract the "value", which creates further challenges. High performance computing (HPC) provides an essential solution to geospatial big data challenges by allowing fast processing of massive data collections in parallel. Handling geospatial big data with HPC can help us make quick and better decisions in time-sensitive situations, such as emergency response (Bhangale et al. 2016). It also helps us to solve larger problems, such as high-resolution global forest cover change mapping in reasonable timeframes (Hansen et al. 2013), and to achieve interactive analysis and visualization of big data (Yin et al. 2017). This chapter explores how HPC is used to handle geospatial big data. Section 2 first summarizes four typical sources of geospatial big data. Section 3 describes the four key components, including data storage and management (Sect. 3.1), spatial indexing (Sect. 3.2), domain decomposition (Sect. 3.3), and task scheduling (Sect. 3.4). Section 4 briefly reviews existing HPC-enabled geospatial big data handling platforms and tools, which are summarized into four categories: general-purpose (Sect. 4.1), geospatial-oriented (Sect. 4.2), query processing (Sect. 4.3), and workflow-based (Sect. 4.4). Three future research directions for handling geospatial big data with HPC are suggested in Sect. 5, including working towards a discrete global grid system (Sect. 5.1), fog computing (Sect. 5.2), and geospatial artificial intelligence (Sect. 5.3). Lastly, Sect. 6 summarizes the chapter.

2  Sources of Geospatial Big Data Four typical sources of geospatial big data are summarized below. • Earth observations Earth observation systems generate massive volumes of disparate, dynamic, and geographically distributed geospatial data with in-situ and remote sensors. Remote sensing, with its increasingly higher spatial, temporal, and spectral resolutions, is one primary approach for collecting Earth observation data on a global scale. The Landsat archive, for example, exceeded one petabyte and contained over 5.5 million images several years ago (Wulder et  al. 2016; Camara et  al. 2016). As of 2014, NASA’s Earth Observing System Data and Information System (EOSDIS) was managing more than nine petabytes of data, and it is adding about 6.4 terabytes to its archives every day (Blumenfeld 2019). In recent years, the wide use of drone-­ based remote sensing has opened another channel for big Earth observation data collection (Athanasis et al. 2018). • Geoscience model simulations The rapid advancement of computing power allows us to model and simulate Earth phenomena with increasingly higher spatiotemporal resolution and greater spatiotemporal coverage, producing huge amounts of simulated geospatial data. A typical example is the climate model simulations conducted by the Intergovernmental Panel on Climate Change (IPCC). The IPCC Fifth Assessment Report (AR5) alone produced ten petabytes of simulated climate data, and the next IPCC report is estimated to produce hundreds of petabytes (Yang et al. 2017; Schnase et al. 2017). Besides simulations, the process of calibrating the geoscience models also produces large amounts of geospatial data, since a model often must be run many times to sweep different parameters (Murphy et al. 2004). When calibrating ModelE (a climate model from NASA), for example, three terabytes of climate data were generated from 300 model-runs in just one experiment (Li et al. 2015). • Internet of Things The term Internet of Things (IoT) was first coined by Kevin Ashton in 1999 in the context of using radio frequency identification (RFID) for supply chain management (Ashton 2009). Simply speaking, the IoT connects “things” to the internet and allows them to communicate and interact with one another, forming a vast network of connected things. The things include devices and objects such as sensors, cellphones, vehicles, appliances, and medical devices, to name a few. These things, coupled with now-ubiquitous location-based sensors, are generating massive amounts of geospatial data. In contrast to Earth observations and model simulations that produce structured multidimensional geospatial data, IoT continuously generates unstructured or semi-structured geospatial data streams across the globe, which are more dynamic, heterogeneous, and noisy.

• Volunteered geographic information Volunteered geographic information (VGI) refers to the creation and dissemination of geographic information from the public, a process in which citizens are regarded as sensors moving “freely” over the surface of the Earth (Goodchild 2007). Enabled by the internet, Web 2.0, GPS, and smartphone technologies, massive amounts of location-based data are being generated and disseminated by billions of citizen sensors inhabiting the world. Through geotagging (location sharing), for example, social media platforms such as Twitter, Facebook, Instagram, and Flickr provide environments for digital interactions among millions of people in the virtual space while leaving “digital footprints” in the physical space. For example, about 500 million tweets are sent per day according to Internet Live Stats (2019); assuming the estimated 1% geotagging rate (Marciniec 2017), five million tweets are geotagged daily.

3  Key Components of Geospatial Big Data Handling with HPC

3.1  Data Storage and Management

Data storage and management is essential for any data manipulation system, and it is especially challenging when handling geospatial big data with HPC for two reasons. First, the massive volumes of data require large and reliable data storage. Traditional storage and protective fault-tolerance mechanisms, such as RAID (redundant array of independent disks), cannot efficiently handle data at the petabyte scale (Robinson 2012). Second, the fast velocity of the data requires storage with the flexibility to scale up or out to handle the ever-increasing storage demands (Katal et al. 2013). There are three common types of data storage paradigms in HPC: shared-everything architecture (SEA), shared-disk architecture (SDA), and shared-nothing architecture (SNA) (Fig. 4.1). With SEA, data storage and processing are often

Fig. 4.1  Illustration of different data storage architectures in HPC systems

backed by a single high-end computer. The parallelization is typically achieved with multi-cores or graphics processing units (GPUs) accessing data from local disks. The storage of SEA is limited to a single computer and thus cannot efficiently handle big data. SDA is a traditional HPC data storage architecture that stores data in a shared system that can be accessed by a cluster of computers in parallel over the network. Coupled with the message passing interface (MPI) (Gropp et al. 1996), the SDA-­ based HPC enables data to be transferred from storage to the compute nodes and processed in parallel. Most computing-intensive geospatial applications used it prior to the big data era. However, SDA does not work well with big data, as transferring large amounts of data over the network quickly creates a bottleneck in the system (Yin et al. 2013). In addition, the shared disk is prone to become the single point failure of the system. Shared-nothing architecture (SNA) is not a new paradigm. Stonebraker pointed out in 1986 that shared-nothing was a preferred approach in developing multiprocessor systems at that time. With SNA, the data are distributedly stored on the cluster computers, each locally storing a subset of the data. SNA has become the de-facto big data storage architecture nowadays because: (1) it is scalable, as new compute nodes can be easily added to an HPC cluster to increase its storage and computing capacity, (2) each data subset can be processed locally by the computer storing it, significantly reducing data transmission over the network, and (3) the single point failure is eliminated since the computers are independent and share no centralized storage. One popular implementation of SNA is the Hadoop Distributed File System (HDFS) (Shvachko et al. 2010)—the core storage system for the Hadoop ecosystem. HDFS splits data into blocks and stores them across different compute nodes in a Hadoop cluster so that they can be processed in parallel. Like HDFS, most NoSQL (not only SQL) databases—including HBase (Vora 2011), MongoDB (Abramova and Bernardino 2013), and Google BigTable (Chang et  al. 2008)— adopt SNA to store and manage big unstructured or semi-structured data. Since HDFS and NoSQL databases are not designed to store and manage geospatial data, many studies have been conducted to modify or extend these systems by integrating the spatial dimension (e.g., Wang et  al. 2013a; Zhang et  al. 2014; Eldawy and Mokbel 2015). Because the access patterns of a geospatial data partition (or block) are strongly linked to its neighboring partitions, co-locating the partitions that are spatially close with each other to the same computer node often improves data access efficiency in SNA (Fahmy et al. 2016; Baumann et al. 2018).
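As a simple illustration of the co-location idea mentioned above (not a description of any particular system), the sketch below groups data blocks into coarse grid cells so that spatially adjacent blocks can be placed on the same compute node; the cell size, block coordinates, and node count are arbitrary assumptions.

```python
import math
from collections import defaultdict

def colocation_key(min_x, min_y, cell_size=10.0):
    """Map a data block to a coarse grid cell; blocks sharing a key are
    spatially close and are candidates for the same compute node."""
    return (math.floor(min_x / cell_size), math.floor(min_y / cell_size))

def assign_blocks_to_nodes(blocks, n_nodes, cell_size=10.0):
    """Group blocks by coarse grid cell, then map each group to a node.

    blocks: list of (block_id, min_x, min_y) bounding-box corners.
    Returns {node_id: [block_id, ...]}.
    """
    groups = defaultdict(list)
    for block_id, min_x, min_y in blocks:
        groups[colocation_key(min_x, min_y, cell_size)].append(block_id)
    placement = defaultdict(list)
    for i, (_, group) in enumerate(sorted(groups.items())):
        placement[i % n_nodes].extend(group)   # spatial neighbors stay together
    return placement

blocks = [("b1", 1, 2), ("b2", 3, 4), ("b3", 55, 60), ("b4", 57, 61)]
print(assign_blocks_to_nodes(blocks, n_nodes=2))
```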

3.2  Spatial Indexing With HPC, many processing units must concurrently retrieve different pieces of the data to perform various data processing and spatial analysis in parallel (e.g., clipping, road network analysis, remote sensing image classification). Spatial indexing is used

to quickly locate and access the needed data, such as specific image tiles for raster data or specific geometries for vector data, from a massive dataset. Since the performance of the spatial index determines the efficiency of concurrent spatial data visits (Zhao et al. 2016), it directly impacts the performance of parallel data processing. Most spatial indexes are based on tree data structures, such as the quadtree (Samet 1984), KD-tree (Ooi 1987), R-tree (Guttman 1984), and their variants. Quadtree recursively divides a two-dimensional space into four quadrants based on the maximum data capacity of each leaf cell (e.g., the maximum number of points allowed). A KD-tree is a binary tree often used for efficient nearest-neighbor searches. An R-tree is similar to a KD-tree, but it handles not only point data but also rectangles such as geometry bounding boxes. As a result, R-trees and their variants have been widely used for spatial indexing (e.g., Xia et al. 2014; Wang et al. 2013a). Especially focusing on geospatial big data, He et  al. (2015) introduced a spatiotemporal indexing method based on decomposition tree raster data indexing for parallel access of big multidimensional movement data. SpatialHadoop uses an R-tree-based, two-level (global and local) spatial indexing mechanism to manage vector data (Eldawy and Mokbel 2015) and a quadtree-based approach to index raster data (Eldawy et al. 2015). The ability to store and process big data in its native formats is important because converting vast amounts of data to other formats requires effort and time. However, most indexing approaches for handling geospatial big data in an HPC environment (such as Hadoop) require data conversion or preprocessing. To tackle this challenge, Li et al. (2017c) proposed a spatiotemporal indexing approach (SIA) to store and manage massive climate datasets in HDFS in their native formats (Fig. 4.2). By linking the physical location information of node, file, and byte to the logical spatiotemporal information of variable, time, and space, a specific climate variable at a specific time, for example, can be quickly located and retrieved from terabytes of climate data at the byte level. The SIA approach has been extended to support other array-based datasets and distributed computing systems. For example, it was adopted by the National Aeronautics and Space Administration (NASA) as one of the key technologies in its Data Analytics and Storage System (DAAS) (Duffy et al. 2016). Based on SIA, Hu

Fig. 4.2  Illustration of the spatiotemporal indexing approach (Li et al. 2017c)

et al. (2018) developed an in-memory distributed computing framework for big climate data using Apache Spark (Zaharia et al. 2016). Following a concept similar to SIA, Li et al. (2018) developed a tile-based spatial index to handle large-scale LiDAR (light detection and ranging) point-cloud data in HDFS in their native LAS formats.
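The following is a minimal, self-contained sketch of the general tile-based indexing idea (mapping a spatial query to tile identifiers and then to file and node locations); it is an illustration only and does not reproduce the SIA or LiDAR index implementations cited above. The file paths and node names are hypothetical.

```python
def tile_id(x, y, tile_size=1.0):
    """Compute the tile a coordinate falls in; tiles are fixed-size squares."""
    return (int(x // tile_size), int(y // tile_size))

class TileIndex:
    """Minimal tile-based spatial index: tile id -> list of (file, node)."""

    def __init__(self, tile_size=1.0):
        self.tile_size = tile_size
        self.entries = {}

    def add(self, x, y, file_path, node):
        key = tile_id(x, y, self.tile_size)
        self.entries.setdefault(key, []).append((file_path, node))

    def query_bbox(self, xmin, ymin, xmax, ymax):
        """Return storage locations of all tiles intersecting the query box."""
        txmin, tymin = tile_id(xmin, ymin, self.tile_size)
        txmax, tymax = tile_id(xmax, ymax, self.tile_size)
        hits = []
        for tx in range(txmin, txmax + 1):
            for ty in range(tymin, tymax + 1):
                hits.extend(self.entries.get((tx, ty), []))
        return hits

index = TileIndex(tile_size=0.5)
index.add(10.2, 20.7, "tiles/t_10_20.las", "node-3")   # hypothetical tile files
index.add(10.9, 20.1, "tiles/t_10_20b.las", "node-1")
print(index.query_bbox(10.0, 20.0, 11.0, 21.0))
```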

3.3  Domain Decomposition

Taking a divide-and-conquer approach, HPC first divides a big problem into concurrent small problems and then processes them in parallel using multiple processing units (Ding and Densham 1996). This procedure is called decomposition. Based on the problem to be solved, the decomposition will take one of three forms: domain decomposition, function decomposition, or both. Domain decomposition treats the data to be processed as the problem and decomposes them into many small datasets. Parallel operations are then performed on the decomposed data. Function decomposition, on the other hand, focuses on the computation, dividing the big computation problem (e.g., a climate simulation model) into small ones (e.g., ocean model, atmospheric model). We focus on domain decomposition here, as it is the typical approach used for processing geospatial big data with HPC. Geospatial data, regardless of source or type, can be abstracted as a five-dimensional (5D) tuple <X, Y, Z, T, V>, where X, Y, Z denotes a location in three-dimensional space, T denotes time, and V denotes a variable (spatial phenomenon), such as the land surface temperature observed at location X, Y, Z and time T. If a dimension has only one value, it is set to 1 in the tuple. For example, NASA's Modern-Era Retrospective analysis for Research and Applications (MERRA) hourly land surface data can be represented as <X, Y, 1, T, V> since there are no vertical layers. Based on this abstraction, domain decomposition can be applied to different dimensions of the data, resulting in different decompositions, such as 1D decomposition, 2D decomposition, and so on (Fig. 4.3). The total number of

Fig. 4.3  Illustration of domain decomposition. (a) 1D decomposition, decomposing any one dimension of <X, Y, Z, T, V>; (b) 2D decomposition, decomposing any two dimensions of <X, Y, Z, T, V>; (c) 3D decomposition, decomposing any three dimensions of <X, Y, Z, T, V>

subdomains produced by a domain decomposition equals the product of the number of slices of each dimension. Spatial decomposition occurs when data along the spatial dimensions are decomposed. 2D spatial decomposition along <X, Y> often utilizes a regular grid or a quadtree-based approach, though irregular decomposition has also been used (Widlund 2009; Guan 2009). Wang and Armstrong (2003), for example, developed a parallel inverse-distance-weighted (IDW) spatial interpolation algorithm in an HPC environment using a quadtree-based domain decomposition approach. The quadtree was used to decompose the study area for adaptive load balancing. In a similar approach described by Guan et al. (2006), a spatially adaptive decomposition method was used to produce workload-oriented spatially adaptive decompositions. A more recent study by Li et al. (2018) used a regular grid to divide the study area into many equal-sized subdomains for parallel LiDAR data processing. The size of the grid cell is calculated based on the study area size and available computing resources to maximize load balancing. Like 2D spatial decomposition, 3D spatial decomposition often uses a regular cube or octree-based approach to create 3D subdomains (Tschauner and Salinas 2006). For example, Li et al. (2013) processed 3D environmental data (dust storm data) in parallel in an integrated GPU and CPU framework by equally dividing the data into 3D cubes. Temporal decomposition decomposes data along the time dimension, which works well for time-series data. Variable decomposition can be applied when a dataset contains many variables. For instance, MERRA land reanalysis data (MST1NXMLD) contains 50 climate variables that span from 1979 to the present with an hourly temporal resolution and a spatial resolution of 2/3 × 1/2 degree (Rienecker et al. 2011). In this case, the decomposition can be applied to the temporal dimension (T), the variable dimension (V), or both (T, V) (Li et al. 2015, 2017c). When conducting domain decomposition, we need to consider whether dependence exists among the subdomains—in other words, whether a subdomain must communicate with others. For spatial decomposition, we need to check whether spatial dependence exists. For example, when parallelizing the IDW spatial interpolation algorithm using quadtree-based spatial decomposition, neighboring quads need to be considered (Wang and Armstrong 2003). For some other operations, such as rasterizing LiDAR points, each subdomain can be processed independently without communicating with others (Li et al. 2018). For temporal decomposition, temporal dependence may need to be considered. For example, extracting short- or long-term patterns from time-series data requires considering temporal dependencies in the decomposition (Asadi and Regan 2019). Conversely, computing the global annual mean of an hourly climate variable does not require such consideration. Knowing whether to consider dependence when decomposing data helps us design more efficient decomposition methods because avoiding unnecessary communications among subdomains often leads to better performance (Li et al. 2018). The problem of spatial dependence can be solved in multiple ways as summarized in Zheng et al. (2018). Spatial and temporal buffering can be used in domain decomposition to prevent communication with neighboring subdomains. For example,

Hohl et al. (2015) used spatiotemporal buffers to include adjacent data points when parallelizing the kernel density analysis. In addition to spatiotemporal dependence, the distribution of underlying data also needs special consideration for spatial and spatiotemporal domain decomposition because different data might pose different requirements for decomposition. For instance, while Hohl et al. (2018) decompose data that are distributed irregularly in all three dimensions, the data in Desjardins et al. (2018) are distributed irregularly in space, but regularly in time. As a result, different decomposition methods are used in the two examples for optimized performance.
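As an illustrative sketch of the buffering strategy described above (not code from the cited studies), the function below splits a rectangular study area into a regular grid of subdomains and attaches a read-only buffer to each, so that a neighborhood operation such as kernel density estimation can run on every subdomain independently; the extents, grid size, and buffer distance are placeholders.

```python
def decompose_with_buffer(xmin, ymin, xmax, ymax, nx, ny, buffer_dist):
    """Split a rectangular study area into nx * ny subdomains.

    Each subdomain carries two extents: 'core' (the area it is responsible
    for) and 'read' (core expanded by buffer_dist), so neighborhood
    operations can run without inter-process communication. Points falling
    in the buffer are read-only duplicates of neighboring subdomains.
    """
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    subdomains = []
    for i in range(nx):
        for j in range(ny):
            core = (xmin + i * dx, ymin + j * dy,
                    xmin + (i + 1) * dx, ymin + (j + 1) * dy)
            read = (max(xmin, core[0] - buffer_dist),
                    max(ymin, core[1] - buffer_dist),
                    min(xmax, core[2] + buffer_dist),
                    min(ymax, core[3] + buffer_dist))
            subdomains.append({"core": core, "read": read})
    return subdomains

# 4 x 4 subdomains with a 750-unit buffer (e.g., a spatial bandwidth)
parts = decompose_with_buffer(0, 0, 10000, 10000, 4, 4, 750)
print(parts[0])
```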

3.4  Task Scheduling Task scheduling refers to distributing subtasks (subdomains) to concurrent computing units (e.g., CPU cores or computers) to be processed in parallel. Task scheduling is essential in HPC because the time spent to finish subtasks has a direct impact on parallelization performance. Determining an effective task schedule depends on the HPC programming paradigms and platforms (e.g., MPI-based or Hadoop-based), the problems to be parallelized (e.g., data-intensive or computation-intensive), and the underlying computing resources (e.g., on-premise HPC cluster or on-demand cloud-based HPC cluster). Regardless, two significant aspects must be considered to design efficient task scheduling approaches for geospatial big data processing: load balancing and data locality. Load balancing aims to ensure each computing unit receives a similar (if not identical) number of subtasks for a data processing job so that each finishes at the same time. This is important because in parallel computing, the job’s finishing time is determined by the last finished task. Therefore, the number of subdomains and the workload of each should be considered along with the number of available concurrent computing units for load balancing. A load balancing algorithm can use static scheduling that either pre-allocates or adaptively allocates tasks to each computing unit (Guan 2009; Shook et  al. 2016). For example, Wang and Armstrong (2003) scheduled tasks based on the variability of the computing capacity at each computing site and the number of workloads used to partition the problem in a grid computing environment. While most big data processing platforms (such as Hadoop) have built-in load balancing mechanisms, they are not efficient when processing geospatial big data. Hadoop-based geospatial big data platforms, such as GeoSpark (Yu et al. 2015) and SpatialHadoop (Eldawy and Mokbel 2015), often provide customized load balancing mechanisms that consider the nature of spatial data. For example, Li et al. (2017c) used a grid assignment algorithm and a grid combination algorithm to ensure each compute node received a balanced workload when processing big climate data using Hadoop. When processing big LiDAR data, Li et al. (2018) calculated the number of subdomains to be decomposed based on the data volume and the number of compute nodes in a cluster. In all cases, the subdomains should be comparably sized to better

balance the load. In a cloud-based HPC environment, load balancing can also be achieved by automatically provisioning computing resources (e.g., adding more compute nodes) based on the dynamic workload (Li et al. 2016). Data locality refers to how close data are to their processing locations; a shorter distance indicates better data locality (Unat et al. 2017). Good data locality requires less data movement during parallel data processing and thus leads to better performance. Discussing data locality makes little sense in traditional HPC since it uses shared-disk architecture (Sect. 3.1). A shared-disk architecture separates compute nodes and storage, thus requiring data movement. However, data locality is important for geospatial big data processing (Guo et al. 2012) because big data platforms (e.g., Hadoop) use shared-nothing storage; moving massive data among the compute nodes over the network is costly. To achieve data locality, the task scheduler is responsible for assigning a subdomain (data subset) to the compute node where the subdomain is located or stored. Thus, the task scheduler must know a subdomain's storage location, which can be realized by building an index to link data location in the cluster space to other spaces—geographic, variable, and file spaces. For instance, with a spatiotemporal index recording the compute node on which a climate variable is stored, 99% of the data grids can be assigned to the compute nodes where the grids are stored, significantly improving performance (Li et al. 2017c). In a LiDAR data processing study (Li et al. 2018), a spatial index was used to record a data tile's location in both the cluster and geographic spaces. Each subdomain was then assigned to the node where most of the tiles were stored. It is worth noting that besides load balancing and data locality, other factors such as computing and communication costs should also be considered for task scheduling.
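To make the interplay of load balancing and data locality concrete, here is a hedged sketch of a greedy scheduler that prefers the node storing a subdomain's data and falls back to the least-loaded node when that would create imbalance; the 20% imbalance tolerance, cost values, and node names are illustrative assumptions rather than any platform's actual policy.

```python
def schedule(subdomains, nodes):
    """Greedy, locality-aware task scheduling sketch.

    subdomains: list of dicts like {"id": "s1", "stored_on": "node-2", "cost": 3}
    nodes: list of node names.
    Each subdomain is placed on the node holding its data unless that node
    is already overloaded, in which case it goes to the least-loaded node.
    """
    load = {n: 0 for n in nodes}
    assignment = {}
    avg = sum(s["cost"] for s in subdomains) / len(nodes)
    for s in sorted(subdomains, key=lambda s: -s["cost"]):  # biggest tasks first
        preferred = s["stored_on"]
        if load[preferred] + s["cost"] <= 1.2 * avg:   # data locality first
            target = preferred
        else:                                          # fall back to balance
            target = min(load, key=load.get)
        assignment[s["id"]] = target
        load[target] += s["cost"]
    return assignment

subs = [{"id": "s1", "stored_on": "node-1", "cost": 4},
        {"id": "s2", "stored_on": "node-1", "cost": 4},
        {"id": "s3", "stored_on": "node-2", "cost": 2}]
print(schedule(subs, ["node-1", "node-2"]))
```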

4  Existing Platforms for Geospatial Big Data Handling with HPC

There are many existing platforms for handling geospatial big data with HPC. These offer various programming models and languages, software libraries, and application programming interfaces (APIs). Here I briefly review some of the popular platforms by summarizing them into four general categories.

4.1  General-Purpose Platforms General-purpose parallel programming platforms are designed to handle data from different domains. Open MPI, for example, is an open source MPI implementation for traditional HPC systems (Gabriel et al. 2004). Another open source HPC software framework is HTCondor (known as Condor before 2012), which supports both MPI and Parallel Virtual Machine (Thain et  al. 2005). Different from Open MPI and HTCondor, CUDA is a parallel computing platform designed to harness the power of

the graphics processing unit (GPU) (Nvidia 2011). GPUs have had a transformative impact on big data handling. A good example of how GPUs enable big data analytics in the geospatial domain can be found in Tang et al. (2015). In the big data world, Hadoop, an open source platform, is designed to handle big data using a shared-nothing architecture consisting of commodity computers (Taylor 2010). With Hadoop, big data is stored in the Hadoop Distributed File System (HDFS) and is processed in parallel using the MapReduce programming model introduced by Google (Dean and Ghemawat 2008). However, Hadoop is a batch processing framework with high latency and does not support real-time data processing. Apache Spark, an in-memory distributed computing platform using the same shared-nothing architecture as Hadoop, overcomes some of Hadoop's limitations (Zaharia et al. 2016).
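As a minimal sketch of the MapReduce-style programming model on one of these platforms (assuming a local PySpark installation; the points and grid size are made up), the job below counts points per one-degree grid cell: the map step assigns each point a cell key and the reduce step sums the counts per cell.

```python
import math
from pyspark.sql import SparkSession

# Count points per 1-degree grid cell with a map-reduce style Spark job.
spark = SparkSession.builder.appName("grid-count").getOrCreate()
sc = spark.sparkContext

points = [(-81.03, 34.00), (-81.20, 34.10), (2.35, 48.85)]  # (lon, lat) pairs
rdd = sc.parallelize(points)

cell_counts = (
    rdd.map(lambda p: ((math.floor(p[0]), math.floor(p[1])), 1))  # map: point -> (cell, 1)
       .reduceByKey(lambda a, b: a + b)                           # reduce: sum per cell
)

print(cell_counts.collect())
spark.stop()
```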

4.2  Geospatial-Oriented Platforms As general-purpose platforms are not designed for handling geospatial data, efforts have been made to adapt existing parallel libraries or frameworks for them. Domain decomposition, spatial indexing, and task scheduling are often given special considerations when building geospatial-oriented programming libraries. One outstanding early work is GISolve Toolkit (Wang 2008), which aims to enhance large geospatial problem-solving by integrating HPC, data management, and visualization in cyber-­ enabled geographic information systems (CyberGIS) environment (Wang 2010; Wang et al. 2013b). Later, Guan (2009) introduced an open source general-purpose parallel-raster-processing C++ library using MPI. More recently, Shook et al. (2016) developed a Python-based library for multi-core parallel processing of spatial data using a parallel cartographic modeling language (PCML). In the big data landscape, an array of open source geospatial platforms has been developed based on Hadoop or Hadoop-like distributed computing platforms, including, for example, HadoopGIS (Wang et  al. 2011), GeoTrellis (Kini and Emanuele 2014), SpatialHadoop (Eldawy and Mokbel 2015), GeoSpark (Yu et al. 2015), GeoMesa (Hughes et al. 2015), EarthServer (Baumann et al. 2016), GeoWave (Whitby et al. 2017), and ST_Hadoop (Alarabi et al. 2018). While not open source, Google Earth Engine (Gorelick et al. 2017) is a powerful and planetary-scale geospatial big data platform for parallel processing and analysis of petabytes of satellite imagery and other geospatial datasets.

4.3  Query Processing Most general-purpose and geospatial-oriented programming libraries allow users to develop parallel data processing programs based on APIs. Computer programming or scripting is generally needed, though some platforms offer high-level interfaces

to ease development. Query processing falls into another category of big data processing that leverages structured query language for programming. Query processing, especially SQL-based, has gained noticeable popularity in the big data era, partly because it balances the usability and flexibility of a big data processing platform: more flexible than a static graphic user interface with fixed functions but less complicated than programming libraries (Li et al. 2019). For raster data processing, the data can be naturally organized as data cubes (an array database), and traditional data cube operations—such as roll-up, drill-down, and slice—can be performed in parallel in an HPC environment. Examples of such platforms include RasDaMan (Baumann et al. 1999), SciDB (Cudré-Mauroux et al. 2009), and EarthDB (Planthaber et al. 2012). More recently, large-scale raster data query processing has been investigated using Hadoop Hive and Apache Spark. Li et  al. (2017b), for example, introduced a query analytic framework to manage, aggregate, and retrieve array-based data in parallel with intuitive SQL-style queries (HiveSQL). Based on the query analytical framework, an online scalable visual analytical system called SOVAS (Fig. 4.4) was developed for query processing of big climate data using an extended-SQL as the query language (Li et  al. 2019). Instead of using Hadoop, Hu et al. (2018) developed an in-memory big climate data computing framework based on the Spark platform that uses Spark SQL for query processing. PostGIS is a good example demonstrating how SQL works for vector data query processing (Ramsey 2005). However, it falls short in handling geospatial big data due to its limited scalability. Esri tools for Hadoop (Esri 2013) is one early effort to build a scalable big-vector data query processing framework based

Fig. 4.4  SQL-based query analytics of big climate data with SOVAS (https://gidbusc.github.io/ SCOVAS)

on Hadoop. In this framework, HiveSQL is the query language, and a suite of user-defined functions (UDFs) developed on top of the Esri Geometry API supports various spatial operations, such as point-in-polygon and overlay. Later, Apache SparkSQL was adapted to develop a number of large-scale vector data query processing systems, such as GeoMesa SparkSQL (Kini and Emanuele 2014), GeoSpark SQL (Huang et al. 2017), and Elcano (Engélinus and Badard 2018). In contrast to these open source systems, Google BigQuery GIS offers a commercial tool that performs spatial operations using standard SQL to analyze big vector data (Google 2019).
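To illustrate the SQL-based style of query processing discussed in this section, the sketch below runs a bounding-box selection with plain Spark SQL predicates on a tiny in-memory table standing in for a large distributed one; dedicated spatial SQL engines such as those named above additionally expose spatial functions (e.g., point-in-polygon), which are deliberately not used here to avoid assuming a particular API.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bbox-query").getOrCreate()

# A tiny stand-in for a large, distributed table of geotagged records.
rows = [("t1", -81.03, 34.00), ("t2", -80.85, 35.22), ("t3", 2.35, 48.85)]
df = spark.createDataFrame(rows, ["id", "lon", "lat"])
df.createOrReplaceTempView("events")

# A bounding-box selection expressed in plain SQL. Spatial SQL engines add
# richer predicates on top of the same query model.
result = spark.sql("""
    SELECT id, lon, lat
    FROM events
    WHERE lon BETWEEN -82 AND -80
      AND lat BETWEEN 33 AND 36
""")
result.show()
spark.stop()
```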

4.4  Workflow-Based Systems Scientific workflow treats the data processing task as a pipeline consisting of a series of connected operations. For big data processing, an operation can be a parallel data processing task powered by HPC. There are many general-purpose scientific workflow systems developed to work in a distributed computing environment, including Kepler (Altintas et al. 2004), Triana (Taylor et al. 2005), Taverna (Hull et al. 2006), and VisTrails (Callahan et al. 2006). Since these workflow systems are not designed to work with geospatial data, efforts have been made to adapt them to build workflows for geospatial data processing (e.g., Jaeger et al. 2005; Zhang et al. 2006; Bouziane et al. 2008). Geospatial service chaining is a service-based workflow approach for geospatial data processing in which each operation is provided as a web service (Yue et al. 2010; Gong et al. 2012). The web services used in the service chain are often based on the Open Geospatial Consortium’s (OGC) standardized spatial web services for interoperability, including its Web Processing Service (WPS) for data processing, Web Feature Service (WFS) for vector data manipulation, Web Coverage Service (WCS) for raster data manipulation, and Web Mapping Service (WMS) for data visualization (Li et al. 2011). Over the past few years, studies have developed geospatial processing services running in the cloud-based HPC environment (Yoon and Lee 2015; Tan et al. 2015; Baumann et al. 2016; Zhang et al. 2017; Lee and Kim 2018). A cloud-based HPC brings several advantages for geoprocessing workflow with big data, such as on-demand computing resource provision and high scalability. For example, Li et al. (2015) developed a cloud-based workflow framework for parallel processing of geospatial big data (Fig. 4.5). In this framework, computing resources, such as Hadoop computing clusters and MaaS clusters (Li et  al. 2017a), can be provisioned as needed when running the workflow.

Fig. 4.5  Geospatial big data handling using a cloud-based and MapReduce-enabled workflow
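As a hedged illustration of invoking one step of such a service chain programmatically (not taken from the cited frameworks), the snippet below sends an OGC WPS 1.0.0 Execute request with the requests library; the endpoint URL, process identifier, and input are hypothetical placeholders, and a real chain would feed each service's output into the next operation.

```python
import requests

# Invoke one (hypothetical) step of a geospatial service chain via an
# OGC WPS 1.0.0 Execute request. The endpoint URL, process identifier,
# and input name/value are placeholders, not a real service.
WPS_URL = "https://example.org/wps"
params = {
    "service": "WPS",
    "version": "1.0.0",
    "request": "Execute",
    "identifier": "BufferProcess",   # hypothetical process name
    "datainputs": "distance=1000",   # hypothetical literal input
}
response = requests.get(WPS_URL, params=params, timeout=60)
print(response.status_code)
print(response.text[:500])  # the service returns an XML ExecuteResponse document
```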

5  Directions for Further Research

5.1  Towards a Discrete Global Reference Framework with HPC

Heterogeneity has long been a challenge in traditional geospatial data handling. It manifests in multiple aspects, including data collection approaches (e.g., remote sensing, land surveying, GPS), data models and formats (e.g., raster, vector), and spatiotemporal scales/resolutions (e.g., from local to regional to global, from centimeters to meters to kilometers). Geospatial big data further creates heterogeneity through the ubiquitous location-based sensors collecting data from a broad range of sectors. Such heterogeneity makes it challenging to integrate and fuse geospatial big data with HPC. Most current HPC systems and studies handle a specific type of geospatial data with specific parallel algorithms, partly due to the lack of a referencing framework that can efficiently store, integrate, and manage the data in a way optimized for data integration and parallel processing. While traditional coordinate systems (such as the system based on latitude and longitude) have been successful as a frame of reference, a relatively new framework called the discrete global grid system (DGGS) is believed to work better in managing and processing the heterogeneous geospatial big data associated with the curved surface of the Earth (Sabeur et al. 2019). DGGS represents "the Earth as hierarchical sequences of equal area tessellations on the surface of the Earth, each with global coverage and with progressively finer spatial resolution" (OGC 2017). It aims to provide a unified, globally consistent reference framework to integrate heterogeneous spatial data—such as raster, vector, and point cloud—with different

spatiotemporal scales and resolutions. The design of DGGS makes it natively suitable for parallel processing with HPC, as the data that it stores and manages has already been decomposed into discrete subdomains. However, currently, most HPC-­ based spatial data processing research and tools remain based on traditional reference frameworks. Future research is needed to investigate spatiotemporal indexes, parallel algorithms, and big data computing platforms in the context of DGGS and HPC.
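Real DGGS implementations tessellate the curved Earth surface into (approximately) equal-area cells; as a simplified stand-in that only illustrates the hierarchical, progressively finer indexing idea, the sketch below derives quadtree-style cell identifiers on a plain latitude/longitude grid (which, unlike a true DGGS, is not equal-area).

```python
def grid_cell_id(lat, lon, resolution):
    """Hierarchical cell id on a simple lat/lon grid (quadtree-style).

    Only an illustration of progressively finer global tessellation;
    unlike a true DGGS, lat/lon cells are not equal-area.
    """
    xmin, xmax, ymin, ymax = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(resolution):
        xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
        quad = (1 if lon >= xmid else 0) + (2 if lat >= ymid else 0)
        digits.append(str(quad))
        xmin, xmax = (xmid, xmax) if lon >= xmid else (xmin, xmid)
        ymin, ymax = (ymid, ymax) if lat >= ymid else (ymin, ymid)
    return "".join(digits)

# The same point indexed at a coarse and a fine resolution; the coarse id is
# a prefix of the fine id, which is what enables hierarchical aggregation.
print(grid_cell_id(34.0, -81.03, 4))
print(grid_cell_id(34.0, -81.03, 8))
```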

5.2  Towards Fog Computing with HPC Fog computing is an emerging computing paradigm that resides between smart end-­ devices and traditional cloud or data centers (Iorga et al. 2017). It aims to process big data generated from distributed IoT devices (also called edge devices) in real-­ time to support applications such as smart cities, precision agriculture, and autonomous vehicles. In traditional IoT architecture, the limited computing power of edge devices means the data they generate are directly uploaded to the cloud with no or very limited processing. This creates noticeable latency because the data are often far away from the cloud (poor data locality). Fog computing provides a middle computing layer—a cluster of fog nodes—between the edge devices and cloud. Since the fog nodes have more computing power and are close to the edge devices with low network latency (good data locality), edge device data can be quickly transferred to them for real-time filtering and processing. The filtered data can then be transferred to the cloud as needed for data mining and analysis using Hadoop-­ like systems, artificial intelligence, or traditional HPC. IoT generates geospatial big data, thanks to the ubiquitous location-based sensors on edge devices. In this sense, real-time geospatial data processing is critical in fog computing. HPC should be researched and utilized in fog computing to deliver real-time responses for decision making (e.g., by an autonomous vehicle) from the following aspects: (a) Geospatial data processing in the cloud: As cloud computing plays an important role in fog computing, research on how to efficiently transfer data from edge devices to the cloud and to process geospatial data in parallel in a cloud environment is greatly needed. (b) Geospatial data processing on the fog node: Since fog computing aims to provide real-time data processing, research is needed to design parallel computing algorithms and platforms that better utilize the embedded, mobile, and low-end fog node computers. (c) Geospatial data processing in the fog cluster: Fog nodes are connected with a high-speed, low-latency network, which can form a high performance computing cluster. Unlike traditional computing clusters, such nodes might be mobile within a complex networking environment. For example, if autonomous cars are deployed as fog nodes, we could use those parked in a garage as a computing cluster. The challenges include, for example, how to efficiently form a computing cluster considering the spatial locations of fog nodes, how to use domain decomposition to assign the distributed edge devices

to fog nodes, and how to develop smart scheduling algorithms to assign data processing tasks to appropriate nodes.
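As a toy example of the kind of pre-processing a fog node might perform before forwarding data to the cloud (purely illustrative; the area of interest, sampling rate, and record format are assumptions), the sketch below filters a stream of location records to an area of interest and downsamples it to reduce network traffic.

```python
def within_aoi(record, bbox):
    """Keep only records inside the area of interest (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    return xmin <= record["lon"] <= xmax and ymin <= record["lat"] <= ymax

def fog_filter(stream, bbox, keep_every=10):
    """Filter and downsample a stream of location records on a fog node.

    Only every `keep_every`-th in-AOI record is forwarded to the cloud,
    reducing network traffic while keeping local processing real-time.
    """
    forwarded = []
    kept = 0
    for record in stream:
        if within_aoi(record, bbox):
            if kept % keep_every == 0:
                forwarded.append(record)
            kept += 1
    return forwarded

stream = [{"id": i, "lon": -81.0 + i * 0.01, "lat": 34.0} for i in range(100)]
print(len(fog_filter(stream, bbox=(-81.0, 33.5, -80.5, 34.5), keep_every=10)))
```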

5.3  Towards Geospatial Artificial Intelligence with HPC

Artificial intelligence (AI) is a computer science field that uses computers to mimic human intelligence for problem-solving (Minsky 1961). Deep learning, a branch of machine learning in AI, has made significant progress in recent years with a broad range of applications, such as natural language processing and computer vision (Chen and Lin 2014; LeCun et al. 2015). Unlike traditional machine learning, in which parameters of an algorithm (e.g., support vector machine) are configured by experts, deep learning determines these parameters by learning the patterns in a large amount of data based on artificial neural networks. Geospatial artificial intelligence (GeoAI) uses AI technologies like deep learning to extract meaningful information from geospatial big data (VoPham et al. 2018). GeoAI has had success across a broad range of applications, especially in remote sensing, such as image classification (Hu et al. 2015), object detection (Cheng et al. 2016), and land cover mapping (Kussul et al. 2017; Ling and Foody 2019). While GeoAI is a promising solution for geospatial big data challenges, geospatial big data is likewise critical in training GeoAI's complex deep neural networks (DNNs) and is the catalyst that has stimulated deep learning advancements in recent years. As highlighted by Jeff Dean (2016) of the Google Brain team, an important property of neural networks is that results improve when using more data and computations to train bigger models. This is where high performance computing comes into play. Tech giants such as Google, Microsoft, and IBM have been leading the development of large-scale AI platforms that run on big computing clusters. Most current GeoAI research in the literature, however, is conducted on single-node computers or workstations using relatively small amounts of data to train the model. For example, Zhang et al. (2018) applied an object-based convolutional neural network for urban land use classification based on only two 0.5 m resolution images of about 6000 × 5000 pixels. A recent review reveals that 95.6% of published research on remote sensing land-cover image classification covers less than 300 ha and uses small training sets (Ma et al. 2017). One potential reason is the lack of geospatial-oriented deep learning platforms available for academic research that support parallelization in a distributed environment. For example, DeepNetsForEO, an open source deep learning framework based on the SegNet architecture for semantic labeling of Earth observation images (Badrinarayanan et al. 2017; Audebert et al. 2018), only supports reading the entire training set into the computer memory, which is not scalable to large datasets. More research, from the geospatial big data and engineering perspectives, is urgently needed to develop high-performance, scalable GeoAI frameworks and platforms that take full advantage of geospatial big data to build bigger and better models. This can be achieved by integrating general-purpose deep learning

platforms, such as TensorFlow (Abadi et al. 2016), Caffe (Jia et al. 2014), and Apache SINGA (Ooi et al. 2015), with HPC technologies in the geospatial context, similar to adapting general-purpose big data platforms such as Hadoop to handle geospatial big data. Specific research directions might include the development of efficient spatiotemporal indexing, domain decomposition, and scheduling approaches to parallelize a deep convolutional neural network in a distributed HPC environment.
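As a small sketch of the kind of domain decomposition such a framework would need on the data side (illustrative only; the scene size and patch size are placeholders), the generator below cuts a large multi-band image into fixed-size training patches so that a training pipeline could stream tiles from distributed storage instead of loading an entire scene into memory, addressing the scalability limitation noted above.

```python
import numpy as np

def image_to_patches(image, patch_size=256, stride=256):
    """Cut a large (H, W, bands) image into fixed-size training patches.

    Yielding patches (instead of materializing them all) is what would let a
    training pipeline stream tiles from distributed storage rather than
    reading an entire scene into memory.
    """
    height, width = image.shape[:2]
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            yield image[top:top + patch_size, left:left + patch_size]

# A stand-in for a 6000 x 5000 pixel, 4-band scene
scene = np.zeros((6000, 5000, 4), dtype=np.uint8)
patches = list(image_to_patches(scene, patch_size=256, stride=256))
print(len(patches), patches[0].shape)   # 437 patches of shape (256, 256, 4)
```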

6  Summary Geospatial big data is playing an increasingly important role in the big data era. Effectively and efficiently handling geospatial big data is critical to extracting meaningful information for knowledge discovery and decision making, and HPC is a viable solution. This chapter began with a brief introduction of geospatial big data and its sources and then discussed several key components of using HPC to handle geospatial big data. A review of current tools was then provided from four different aspects. Lastly, three research directions were discussed in the context of HPC and geospatial big data. HPC has been used for geospatial data handling for almost two decades (Armstrong 2000; Clarke 2003; Wang and Armstrong 2003) and is becoming more important in tackling geospatial big data challenges. Geospatial big data, in turn, brings new challenges and opportunities to HPC. It is evident that the interweaving of geospatial big data, cloud computing, fog computing, and artificial intelligence is driving and reshaping geospatial data science. High performance computing, with its fundamental divide-and-conquer approach to solving big problems faster, will continue to play a crucial role in this new era.

References Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). Tensorflow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265–283). Abramova, V., & Bernardino, J. (2013, July). NoSQL databases: MongoDB vs Cassandra. In Proceedings of the International C∗ Conference on Computer Science and Software Engineering (pp. 14–22). New York: ACM Alarabi, L., Mokbel, M.  F., & Musleh, M. (2018). St-Hadoop: A MapReduce framework for spatio-­temporal data. GeoInformatica, 22(4), 785–813. Altintas, I., Berkley, C., Jaeger, E., Jones, M., Ludascher, B., & Mock, S. (2004, June). Kepler: an extensible system for design and execution of scientific workflows. In Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004 (pp. 423– 424). Piscataway, NJ: IEEE. Armstrong, M.  P. (2000). Geography and computational science. Annals of the Association of American Geographers, 90(1), 146–156.

Asadi, R., & Regan, A. (2019). A spatial-temporal decomposition based deep neural network for time series forecasting. arXiv preprint arXiv:1902.00636. Ashton, K. (2009). That ‘Internet of Things’ thing. RFID Journal, 22(7), 97–114. Athanasis, N., Themistocleous, M., Kalabokidis, K., & Chatzitheodorou, C. (2018, October). Big Data Analysis in UAV Surveillance for Wildfire Prevention and Management. In European, Mediterranean, and Middle Eastern Conference on Information Systems (pp. 47–58). Cham: Springer. Audebert, N., Le Saux, B., & Lefèvre, S. (2018). Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks. ISPRS Journal of Photogrammetry and Remote Sensing, 140, 20–32. Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder-­ decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481–2495. Baumann, P., Dehmel, A., Furtado, P., Ritsch, R., & Widmann, N. (1999, September). Spatio-­ temporal retrieval with RasDaMan. In VLDB (pp. 746–749). Baumann, P., Mazzetti, P., Ungar, J., Barbera, R., Barboni, D., Beccati, A., et al. (2016). Big data analytics for earth sciences: The EarthServer approach. International Journal of Digital Earth, 9(1), 3–29. Baumann, P., Misev, D., Merticariu, V., Huu, B. P., Bell, B., Kuo, K. S., et al. (2018). Array databases: Concepts, standards, Implementations. Research Data Alliance (RDA) Working Group Report. Bhangale, U. M., Kurte, K. R., Durbha, S. S., King, R. L., & Younan, N. H. (2016, July). Big data processing using hpc for remote sensing disaster data. In 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (pp. 5894–5897). Piscataway, NJ: IEEE. Blumenfeld J. (2019). Getting petabytes to people: How EOSDIS facilitates earth observing data discovery and use. Retrieved May 1, 2019, from https://earthdata.nasa.gov/getting-petabytesto-people-how-the-eosdis-facilitates-earth-observing-data-discovery-and-use Bouziane, H. L., Pérez, C., & Priol, T. (2008, August). A software component model with spatial and temporal compositions for grid infrastructures. In European Conference on Parallel Processing (pp. 698-708). Berlin: Springer. Callahan, S. P., Freire, J., Santos, E., Scheidegger, C. E., Silva, C. T., & Vo, H. T. (2006, June). VisTrails: Visualization meets data management. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (pp. 745–747). New York: ACM. Camara, G., Assis, L. F., Ribeiro, G., Ferreira, K. R., Llapa, E., & Vinhas, L. (2016, October). Big earth observation data analytics: Matching requirements to system architectures. In Proceedings of the 5th ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (pp. 1–6). New York: ACM. Chang, F., Dean, J., Ghemawat, S., Hsieh, W.  C., Wallach, D.  A., Burrows, M., et  al. (2008). Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2), 4. Chen, X. W., & Lin, X. (2014). Big data deep learning: Challenges and perspectives. IEEE Access, 2, 514–525. Cheng, G., Zhou, P., & Han, J. (2016). Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 54(12), 7405–7415. Clarke, K. C. (2003). Geocomputation’s future at the extremes: High performance computing and nanoclients. Parallel Computing, 29(10), 1281–1295. Cudré-Mauroux, P., Kimura, H., Lim, K. 
T., Rogers, J., Simakov, R., Soroush, E., et al. (2009). A demonstration of SciDB: A science-oriented DBMS. Proceedings of the VLDB Endowment, 2(2), 1534–1537. De Mauro, A., Greco, M., & Grimaldi, M. (2015, February). What is big data? A consensual definition and a review of key research topics. In AIP Conference Proceedings (Vol. 1644, No. 1, pp. 97–104). College Park, MD: AIP.


Dean, J. (2016). Large-scale deep learning for building intelligent computer systems. https://storage.googleapis.com/pub-tools-public-publication-data/pdf/44921.pdf Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113. Desjardins, M. R., Hohl, A., Griffith, A., & Delmelle, E. (2018). A space–time parallel framework for fine-scale visualization of pollen levels across the Eastern United States. Cartography and Geographic Information Science, 46(5), 428–440. Ding, Y., & Densham, P. (1996). Spatial strategies for parallel spatial modelling. International Journal of Geographical Information Systems, 10(6), 669–698. https://doi.org/10.1080/02 693799608902104 Duffy, D., Spear, C., Bowen, M., Thompson, J., Hu, F., Yang, C., et  al. (2016, December). Emerging cyber infrastructure for NASA’s large-scale climate data analytics. In AGU Fall Meeting Abstracts. Eldawy, A., & Mokbel, M.  F. (2015, April). SpatialHadoop: A mapreduce framework for spatial data. In 2015 IEEE 31st International Conference on Data Engineering (pp. 1352–1363). Piscataway, NJ: IEEE. Eldawy, A., Mokbel, M. F., Alharthi, S., Alzaidy, A., Tarek, K., & Ghani, S. (2015, April). Shahed: A mapreduce-based system for querying and visualizing spatio-temporal satellite data. In 2015 IEEE 31st International Conference on Data Engineering (pp. 1585–1596). Piscataway, NJ: IEEE. Engélinus, J., & Badard, T. (2018). Elcano: A Geospatial Big Data Processing System based on SparkSQL.  In Geographical Information Systems Theory, Applications and Management (GISTAM) (pp. 119–128). Esri. (2013). GIS tools for Hadoop. Retrieved April 25, 2019, from https://github.com/Esri/ gis-tools-for-hadoop Fahmy, M.  M., Elghandour, I., & Nagi, M. (2016, December). CoS-HDFS: Co-locating geo-­ distributed spatial data in Hadoop distributed file system. In Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (pp. 123– 132). New York: ACM Gabriel, E., Fagg, G. E., Bosilca, G., Angskun, T., Dongarra, J. J., Squyres, J. M., et al. (2004, September). Open MPI: Goals, concept, and design of a next generation MPI implementation. In European Parallel Virtual Machine/Message Passing Interface Users’ Group Meeting (pp. 97–104). Berlin: Springer. Gong, J., Wu, H., Zhang, T., Gui, Z., Li, Z., You, L., et al. (2012). Geospatial service web: Towards integrated cyberinfrastructure for GIScience. Geo-spatial Information Science, 15(2), 73–84. Goodchild, M. F. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal, 69(4), 211–221. Google. (2019). Google BigQuery GIS. Retrieved April 25, 2019, from https://cloud.google.com/ bigquery/docs/gis-intro Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google earth engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27. Gropp, W., Lusk, E., Doss, N., & Skjellum, A. (1996). A high-performance, portable implementation of the MPI message passing interface standard. Parallel Computing, 22(6), 789–828. Guan, Q. (2009). pRPL: An open-source general-purpose parallel Raster processing programming library. SIGSPATIAL Special, 1(1), 57–62. Guan, Q., Zhang, T., & Clarke, K. C. (2006, December). GeoComputation in the grid computing age. In International Symposium on Web and Wireless Geographical Information Systems (pp. 237–246). Berlin: Springer. Gudivada, V. N., Baeza-Yates, R., & Raghavan, V. V. (2015). 
Big data: Promises and problems. Computer, 3, 20–23.


Guo, Z., Fox, G., & Zhou, M. (2012, May). Investigation of data locality in mapreduce. In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012) (pp. 419–426). Washington, DC: IEEE Computer Society. Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. ACM SIGMOD International Conference on Management of Data (Vol. 14, No. 2, pp. 47–57). Boston: ACM. Hansen, M. C., Potapov, P. V., Moore, R., Hancher, M., Turubanova, S. A. A., Tyukavina, A., et al. (2013). High-resolution global maps of 21st-century forest cover change. Science, 342(6160), 850–853. He, Z., Wu, C., Liu, G., Zheng, Z., & Tian, Y. (2015). Decomposition tree: A spatio-temporal indexing method for movement big data. Cluster Computing, 18(4), 1481–1492. Hohl, A., Delmelle, E. M., & Tang, W. (2015). Spatiotemporal domain decomposition for massive parallel computation of space-time Kernel density. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences (Vol. 2–4). International Workshop on Spatiotemporal Computing. July 13–15, 2015, Fairfax, VA. Hohl, A., Griffith, A.  D., Eppes, M.  C., & Delmelle, E. (2018). Computationally enabled 4D visualizations facilitate the detection of rock fracture patterns from acoustic emissions. Rock Mechanics and Rock Engineering, 51, 2733–2746. Hu, F., Xia, G. S., Hu, J., & Zhang, L. (2015). Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sensing, 7(11), 14680–14707. Hu, F., Yang, C., Schnase, J. L., Duffy, D. Q., Xu, M., Bowen, M. K., et al. (2018). ClimateSpark: An in-memory distributed computing framework for big climate data analytics. Computers & Geosciences, 115, 154–166. Huang, Z., Chen, Y., Wan, L., & Peng, X. (2017). GeoSpark SQL: An effective framework enabling spatial queries on spark. ISPRS International Journal of Geo-Information, 6(9), 285. Hughes, J. N., Annex, A., Eichelberger, C. N., Fox, A., Hulbert, A., & Ronquest, M. (2015, May). GeoMesa: A distributed architecture for spatio-temporal fusion. In Geospatial informatics, fusion, and motion video analytics V (Vol. 9473, p. 94730F). Washington, DC: International Society for Optics and Photonics. Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock, M. R., Li, P., et al. (2006). Taverna: A tool for building and running workflows of services. Nucleic Acids Research, 34(suppl_2), W729–W732. Internet Live Stats. (2019). Retrieved May 3, 2019, from https://www.internetlivestats.com/ twitter-statistics/ Iorga, M., Feldman, L., Barton, R., Martin, M., Goren, N., & Mahmoudi, C. (2017). The nist definition of fog computing (No. NIST Special Publication (SP) 800–191 (Draft)). National Institute of Standards and Technology. Jaeger, E., Altintas, I., Zhang, J., Ludäscher, B., Pennington, D., & Michener, W. (2005, June). A scientific workflow approach to distributed geospatial data processing using web services. In SSDBM (Vol. 3, No. 42, pp. 87–90). Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., et al. (2014, November). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia (pp. 675–678). New York: ACM Katal, A., Wazid, M., & Goudar, R.  H. (2013, August). Big data: issues, challenges, tools and good practices. In 2013 Sixth International Conference on Contemporary Computing (IC3) (pp. 404–409). Piscataway, NJ: IEEE. Kini, A., & Emanuele, R. (2014). 
Geotrellis: Adding geospatial capabilities to spark. Spark Summit. Kussul, N., Lavreniuk, M., Skakun, S., & Shelestov, A. (2017). Deep learning classification of land cover and crop types using remote sensing data. IEEE Geoscience and Remote Sensing Letters, 14(5), 778–782. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436.


Lee, K., & Kim, K. (2018, July). Geo-based image analysis system supporting OGC-WPS standard on open PaaS cloud platform. In IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium (pp. 5262–5265). Piscataway, NJ: IEEE. Li, J., Jiang, Y., Yang, C., Huang, Q., & Rice, M. (2013). Visualizing 3D/4D environmental data using many-core graphics processing units (GPUs) and multi-core central processing units (CPUs). Computers & Geosciences, 59, 78–89. Li, Z., Hodgson, M., & Li, W. (2018). A general-purpose framework for large-scale LiDAR data processing. International Journal of Digital Earth, 11(1), 26–47. Li, Z., Hu, F., Schnase, J. L., Duffy, D. Q., Lee, T., Bowen, M. K., et al. (2017c). A spatiotemporal indexing approach for efficient processing of big array-based climate data with MapReduce. International Journal of Geographical Information Science, 31(1), 17–35. Li, Z., Huang, Q., Carbone, G., & Hu, F. (2017b). A high performance query analytical framework for supporting data-intensive climate studies, computers. Environment and Urban Systems, 62(3), 210–221. Li, Z., Huang, Q., Jiang, Y., & Hu, F. (2019). SOVAS: A scalable online visual analytic system for big climate data analysis. International Journal of Geographic Information Science. https:// doi.org/10.1080/13658816.2019.1605073 Li, Z., Yang, C., Huang, Q., Liu, K., Sun, M., & Xia, J. (2017a). Building model as a service for supporting geosciences, computers. Environment and Urban Systems, 61(B), 141–152. Li, Z., Yang, C., Liu, K., Hu, F., & Jin, B. (2016). Automatic Scaling Hadoop in the Cloud for Efficient Process of Big Geospatial Data. ISPRS International Journal of Geo-Information, 5(10), 173. Li, Z., Yang, C., Yu, M., Liu, K., & Sun, M. (2015). Enabling big geoscience data analytics with a cloud-based, MapReduce-enabled and service-oriented workflow framework. PloS one, 10(3), e0116781. Li, Z., Yang, C. P., Wu, H., Li, W., & Miao, L. (2011). An optimized framework for seamlessly integrating OGC Web services to support geospatial sciences. International Journal of Geographical Information Science, 25(4), 595–613. Ling, F., & Foody, G. M. (2019). Super-resolution land cover mapping by deep learning. Remote Sensing Letters, 10(6), 598–606. Ma, L., Li, M., Ma, X., Cheng, L., Du, P., & Liu, Y. (2017). A review of supervised object-based land-cover image classification. ISPRS J. Photogramm. Remote Sens., 130, 277–293. Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C. et al. (2011). Big data: The next frontier for innovation, competition, and productivity (pp. 1–143). McKinsey Global Institute. Retrieved from: https://bigdatawg.nist.gov/pdf/MGI_big_data_full_report.pdf Marciniec, M. (2017). Observing world tweeting tendencies in real-time. Retrieved May 3, 2019, from https://codete.com/blog/observing-world-tweeting-tendencies-in-real-time-part-2 Minsky, M. (1961). Steps toward artificial intelligence. Proceedings of the IRE, 49(1), 8–30. Murphy, J. M., Sexton, D. M., Barnett, D. N., Jones, G. S., Webb, M. J., et al. (2004). Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature, 430, 768–772. Nvidia, C. U. D. A. (2011). Nvidia CUDA C programming guide. Nvidia Corporation, 120(18), 8. OGC. (2017). OGC announces a new standard that improves the way information is referenced to the earth. https://www.ogc.org/pressroom/pressreleases/2656 Ooi, B. C. (1987). Spatial kd-tree: A data structure for geographic database. 
In Datenbanksysteme in Büro, Technik und Wissenschaft (pp. 247–258). Berlin: Springer. Ooi, B. C., Tan, K. L., Wang, S., Wang, W., Cai, Q., Chen, G., et al. (2015, October). SINGA: A distributed deep learning platform. In Proceedings of the 23rd ACM International Conference on Multimedia (pp. 685–688). New York: ACM. Planthaber, G., Stonebraker, M., & Frew, J. (2012, November). EarthDB: scalable analysis of MODIS data using SciDB. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data (pp. 11–19). New York: ACM


Ramsey, P. (2005). PostGis manual. Refractions Research Inc, 17. Rienecker, M.  M., Suarez, M.  J., Gelaro, R., Todling, R., Bacmeister, J., Liu, E., et  al. (2011). MERRA: NASA’s modern-era retrospective analysis for research and applications. Journal of climate, 24(14), 3624–3648. Robinson. (2012). The storage and transfer challenges of Big Data. Retrieved November 25, 2015, from http://sloanreview.mit.edu/article/the-storage-and-transfer-challenges-of-big-data/ Sabeur, Z, Gibb, R., & Purss, M. (2019). Discrete global grid systems SWG. Retrieved March 13, 2019, from http://www.opengeospatial.org/projects/groups/dggsswg Samet, H. (1984). The quadtree and related hierarchical data structures. ACM Computing Surveys (CSUR), 16(2), 187–260. Schnase, J. L., Duffy, D. Q., Tamkin, G. S., Nadeau, D., Thompson, J. H., Grieg, C. M., ... & Webster, W. P. (2017). MERRA analytic services: Meeting the big data challenges of climate science through cloud-enabled climate analytics-as-a-service. Computers, Environment and Urban Systems, 61, 198–211. Shook, E., Hodgson, M. E., Wang, S., Behzad, B., Soltani, K., Hiscox, A., et al. (2016). Parallel cartographic modeling: A methodology for parallelizing spatial data processing. International Journal of Geographical Information Science, 30(12), 2355–2376. Shvachko, K., Kuang, H., Radia, S., & Chansler, R. (2010, May). The hadoop distributed file system. In MSST (Vol. 10, pp. 1–10). Tan, X., Di, L., Deng, M., Fu, J., Shao, G., Gao, M., et al. (2015). Building an elastic parallel OGC web processing service on a cloud-based cluster: A case study of remote sensing data processing service. Sustainability, 7(10), 14245–14258. Tang, W., Feng, W., & Jia, M. (2015). Massively parallel spatial point pattern analysis: Ripley’s K function accelerated using graphics processing units. International Journal of Geographical Information Science, 29(3), 412–439. Taylor, I., Wang, I., Shields, M., & Majithia, S. (2005). Distributed computing with Triana on the grid. Concurrency and Computation: Practice and Experience, 17(9), 1197–1214. Taylor, R. C. (2010, December). An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC Bioinformatics (BioMed Central), 11(12), S1. Thain, D., Tannenbaum, T., & Livny, M. (2005). Distributed computing in practice: The Condor experience. Concurrency and Computation: Practice and Experience, 17(2-4), 323–356. Tschauner, H., & Salinas, V. S. (2006, April). Stratigraphic modeling and 3D spatial analysis using photogrammetry and octree spatial decomposition. In Proceedings of the 34th Conference. Digital Discovery. Exploring New Frontiers in Human Heritage. Computer Applications and Quantitative Methods in Archaeology: CAA2006 (pp. 257–270). Fargo, ND. Unat, D., Dubey, A., Hoefler, T., Shalf, J., Abraham, M., Bianco, M., et al. (2017). Trends in data locality abstractions for HPC systems. IEEE Transactions on Parallel and Distributed Systems, 28(10), 3007–3020. VoPham, T., Hart, J. E., Laden, F., & Chiang, Y. Y. (2018). Emerging trends in geospatial artificial intelligence (geoAI): Potential applications for environmental epidemiology. Environmental Health, 17(1), 40. Vora, M.  N. (2011, December). Hadoop-HBase for large-scale data. In Proceedings of 2011 International Conference on Computer Science and Network Technology (Vol. 1, pp.  601– 605). Piscataway, NJ: IEEE. Wang, F., Aji, A., Liu, Q., & Saltz, J. (2011). 
Hadoop-GIS: A high performance spatial query system for analytical medical imaging with MapReduce. Center for Comprehensive Informatics, Technical Report. Retrieved September 21, 2015, from https://pdfs.semanticscholar.org/578f/7 c003de822fbafaaf82f0dc1c5cf8ed92a14.pdf Wang, L., Chen, B., & Liu, Y. (2013a, June). Distributed storage and index of vector spatial data based on HBase. In 2013 21st International Conference on Geoinformatics (pp.  1–5). Piscataway, NJ: IEEE.


Wang, S. (2008, November). GISolve toolkit: Advancing GIS through cyberinfrastructure. In Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (p. 83). New York: ACM Wang, S. (2010). A CyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Annals of the Association of American Geographers, 100(3), 535–557. Wang, S., Anselin, L., Bhaduri, B., Crosby, C., Goodchild, M. F., Liu, Y., et al. (2013b). CyberGIS software: A synthetic review and integration roadmap. International Journal of Geographical Information Science, 27(11), 2122–2145. Wang, S., & Armstrong, M. P. (2003). A quadtree approach to domain decomposition for spatial interpolation in grid computing environments. Parallel Computing, 29(10), 1481–1504. Ward, J. S., & Barker, A. (2013). Undefined by data: A survey of big data definitions. School of Computer Science, University of St Andrews, UK Whitby, M. A., Fecher, R., & Bennight, C. (2017, August). Geowave: Utilizing distributed key-­ value stores for multidimensional data. In International Symposium on Spatial and Temporal Databases (pp. 105–122). Cham: Springer. Widlund, O. B. (2009). Accommodating irregular subdomains in domain decomposition theory. In Domain decomposition methods in science and engineering XVIII (pp.  87–98). Berlin: Springer. Wulder, M. A., White, J. C., Loveland, T. R., Woodcock, C. E., Belward, A. S., Cohen, W. B., et al. (2016). The global Landsat archive: Status, consolidation, and direction. Remote Sensing of Environment, 185, 271–283. Xia, J., Yang, C., Gui, Z., Liu, K., & Li, Z. (2014). Optimizing an index with spatiotemporal patterns to support GEOSS Clearinghouse. International Journal of Geographical Information Science, 28(7), 1459–1481. Yang, C., Huang, Q., Li, Z., Liu, K., & Hu, F. (2017). Big data and cloud computing: Innovation opportunities and challenges. International Journal of Digital Earth, 10(1), 1–41. Yin, D., Liu, Y., Padmanabhan, A., Terstriep, J., Rush, J., & Wang, S. (2017, July). A CyberGIS-­ Jupyter framework for geospatial analytics at scale. In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact (p. 18). New York: ACM Yin, J., Foran, A., & Wang, J. (2013, October). DL-MPI: Enabling data locality computation for MPI-based data-intensive applications. In 2013 IEEE International Conference on Big Data (pp. 506–511). Piscataway, NJ: IEEE. Yoon, G., & Lee, K. (2015). WPS-based satellite image processing onweb framework and cloud computing environment. Korean Journal of Remote Sensing, 31(6), 561–570. Yu, J., Wu, J., & Sarwat, M. (2015, November). GeoSpark: A cluster computing framework for processing large-scale spatial data. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems (p. 70). New York: ACM. Yue, P., Gong, J., & Di, L. (2010). Augmenting geospatial data provenance through metadata tracking in geospatial service chaining. Computers & Geosciences, 36(3), 270–281. Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., et al. (2016). Apache spark: A unified engine for big data processing. Communications of the ACM, 59(11), 56–65. Zhang, C., Di, L., Sun, Z., Eugene, G. Y., Hu, L., Lin, L., et al. (2017, August). Integrating OGC Web Processing Service with cloud computing environment for Earth Observation data. In 2017 6th International Conference on Agro-Geoinformatics (pp. 1–4). Piscataway, NJ: IEEE. 
Zhang, C., Sargent, I., Pan, X., Li, H., Gardiner, A., Hare, J., et  al. (2018). An object-based convolutional neural network (OCNN) for urban land use classification. Remote Sensing of Environment, 216, 57–70. Zhang, J., Pennington, D. D., & Michener, W. K. (2006, May). Automatic transformation from geospatial conceptual workflow to executable workflow using GRASS GIS command line modules in Kepler. In International Conference on Computational Science (pp.  912–919). Berlin: Springer.


Zhang, X., Song, W., & Liu, L. (2014, June). An implementation approach to store GIS spatial data on NoSQL database. In 2014 22nd International Conference on Geoinformatics (pp. 1–5). Piscataway, NJ: IEEE. Zhao, L., Chen, L., Ranjan, R., Choo, K. K. R., & He, J. (2016). Geographical information system parallelization for spatial big data processing: A review. Cluster Computing, 19(1), 139–152. Zheng, M., Tang, W., Lan, Y., Zhao, X., Jia, M., Allan, C., et al. (2018). Parallel generation of very high resolution digital elevation models: High-performance computing for big spatial data analysis. In Big data in engineering applications (pp. 21–39). Singapore: Springer. Zikopoulos, P., & Eaton, C. (2011). Understanding big data: Analytics for enterprise class hadoop and streaming data. New York: McGraw-Hill. Zikopoulos, P., Parasuraman, K., Deutsch, T., Giles, J., & Corrigan, D. (2012). Harness the power of big data the IBM big data platform. New York: McGraw-Hill.

Chapter 5

Parallel Landscape Visibility Analysis: A Case Study in Archaeology

Minrui Zheng, Wenwu Tang, Akinwumi Ogundiran, Tianyang Chen, and Jianxin Yang

Abstract  Viewshed analysis is one of the geographic information system (GIS) applications most commonly used in archaeology. With the availability of large, high-resolution terrain data, viewshed analysis offers a significant computational opportunity while also posing a challenge in GIS and landscape archaeology. Although a number of studies have adopted high-performance and parallel computing (HPC) to handle the compute- and data-intensity of viewshed analysis, few archaeological studies have used HPC to address this challenge, likely because HPC techniques are complex and difficult for archaeologists to apply. Therefore, this study presents a simple solution to accelerate viewshed analysis for archaeological studies: a parallel computing approach with shared-nothing parallelism (each computing node accesses specific pieces of the datasets and has its own memory and storage) that is well suited to compute- and data-intensive research in landscape archaeology. In addition, the unique features of visibility patterns (irregular, fragmented, and discontinuous) may provide useful information for landscape archaeologists. Thus, we added a fragmentation calculation after viewshed analysis to further examine the influence

M. Zheng (*) · W. Tang · T. Chen Center for Applied Geographic Information Science, The University of North Carolina at Charlotte, Charlotte, NC, USA Department of Geography and Earth Sciences, The University of North Carolina at Charlotte, Charlotte, NC, USA e-mail: [email protected] A. Ogundiran Department of Africana Studies, The University of North Carolina at Charlotte, Charlotte, NC, USA J. Yang School of Public Administration, China University of Geoscience (Wuhan), Wuhan, China © Springer Nature Switzerland AG 2020 W. Tang, S. Wang (eds.), High Performance Computing for Geospatial Applications, Geotechnologies and the Environment 23, https://doi.org/10.1007/978-3-030-47998-5_5



of visibility patterns. We draw our case study from the metropolitan area of the Oyo Empire, West Africa (1600–1830 AD). The parallel computing approach used an equal-viewpoint decomposition strategy on a Windows-based computing cluster. Our results showed that the parallel computing approach significantly improves the computing performance of viewshed fragmentation analysis. The results of the viewshed fragmentation analysis also demonstrated that there is a relationship between visibility patterns and terrain information (elevation).

Keywords  Viewshed analysis · Fragmentation · Parallel computing · Landscape archaeology

1  Introduction

Since the early 1990s, archaeologists have been attracted to the potential of geographic information systems (GIS) applications for studying visualscape. Visualscape refers to how humans and societies use visual senses to coordinate all other senses in their interactions with one another spatially and with the physical landscape through time (Llobera 2003). The standard GIS procedure for exploring these relationships is viewshed analysis. Viewshed is important in archaeology because it has enabled archaeologists to use GIS to explore research questions about cognition and human agency in the past across different levels of social complexity. This is an important departure from the environmental determinism that tends to dominate GIS applications in general (Kvamme 1999). In particular, viewshed has been utilized in archaeology to examine questions of territoriality by using the computational advantages of GIS (spatial and mathematical) to estimate the visible areas from a particular locus, the defensibility of a site in relation to the surrounding landscape, and the degree of visual dominance of a location vis-à-vis other sites in the study area.

Viewshed analysis has also been used to account for how social and cognitive factors affected settlement location decisions in the past. For example, by analyzing intervisibility among around 400 potentially defensive sites in Arizona, Haas and Creamer (1993) were able to determine that visualscape, or lines of sight, constituted a major factor in settlement location among the Kayenta Anasazi, a Native American group that lived in the Black Mesa area of northeastern Arizona (USA) during the thirteenth century. The Kayenta Anasazi were monoagriculturists who cultivated corn, but their decisions on where to locate their fields and settlements were based not only on access to fertile soils and water but also on intervisibility factors to enhance security, social relations, and spiritual piety. Viewshed analysis has also empowered archaeologists to explore how a place may be imbued with meanings and values to the extent that having a line of sight to that place becomes a primary factor for settlement location decisions (Gaffney and Van Leusen 1995).


The heuristic value of viewshed analysis in GIS studies has offered insights into how the landscape can be read as a contested social space. In this regard, locating particular residential, religious, or military features on the most prominent landscape that enjoys the most lines of sight was often a form of social statement and a way of establishing social control over the surrounding regions (Lock and Harris 1996). One of the challenges facing the archaeological adaptation of viewshed analysis in GIS is how to account for the obstacles that may impair the line of sight between the observer location and the viewpoint. These obstacles can be both social and physical and may include vegetation, the built environment, movement restrictions, atmospheric conditions, and the ability of the observer. To this end, developing algorithms that address these obstacles has been an ongoing effort in GIS applications. Those algorithms have emphasized the imperative of combining viewshed factors with other physiographic factors such as slope and aspect in the analysis of human visual space (James 2007).

Multiple approaches to viewshed analysis have compounded the computational challenges of GIS applications in archaeology. This is because viewshed algorithms are usually based on raster terrain datasets that require an enormous amount of computational power. Moreover, studies pertaining to visualscapes generally cover a vast area in irregular, fragmented, and discrete patches. In archaeology especially, the unit or point of data collection must be at a fine scale in order to produce a reliable result. This means that the database for viewshed analysis is usually large. Several archaeological scholars have attempted to resolve some of these inherent problems of viewshed calculation (Fisher 1993; Llobera 2007). One solution focuses on developing GIS viewshed algorithms that concentrate on how much a viewpoint is visible at different viewpoint locations (visual exposure), as opposed to simply determining whether or not a location is visible (Llobera 2003). These computational algorithms are often coupled with mathematical models based on fuzzy set theory (Fisher 1995; Wheatley and Gillings 2000). Other approaches have used computational simulations to explore multiple variables at multiscalar levels (Llobera 2007).

Meanwhile, with the availability of high-resolution terrain data, viewshed analysis poses a significant computational challenge to the archaeology and GIS communities. In the GIS community, a popular and effective way to address this computational challenge is high-performance and parallel computing (HPC) (De Floriani et al. 1994; Zhao et al. 2013; Song et al. 2016). An HPC environment consists of either a supercomputer or a parallel computing system (e.g., a computer cluster) that provides more computational power for handling data- and compute-intensive applications (Wilkinson and Allen 1999). HPC can usually be achieved by pipelining or parallelism (Ding and Densham 1996). Pipelining divides an entire operation into a number of steps, each of which is executed by a processor, and the order of the individual steps matters. In other words, pipelining works like an assembly line, where the input of each step is the output of the previous one (Ding and Densham 1996). In parallelism, there typically exists a master processor (i.e., head node) and multiple worker processors (i.e., computing nodes or computing elements, such as CPUs) in the HPC environment.
The master processor splits a task into



sub-tasks and then sends these sub-tasks to worker processors. Once the assigned sub-tasks are completed, their results are aggregated by the master processor (Ding and Densham 1996). Decomposition is a crucial part of parallelism because parallelism requires that a problem be divided into a set of sub-problems, and the way the problem is decomposed determines the computing performance (Ding and Densham 1996; Wilkinson and Allen 1999). HPC techniques have been used in viewshed analysis, including CPU-based viewshed analysis (Mills et al. 1992; Germain et al. 1996; Llobera et al. 2010) and GPU-based viewshed analysis (Xia et al. 2010; Chao et al. 2011; Zhao et al. 2013). One of the key issues of HPC-based viewshed analysis is how to select a suitable decomposition method that minimizes the computing time of calculating viewsheds (De Floriani et al. 1994; Zhao et al. 2013; Song et al. 2016). Moreover, the calculation of a viewshed runs independently on the related terrain data, that is, the partition boundaries do not impose any special requirements on viewshed analysis (Wu et al. 2007). Because of this characteristic, various solutions have been developed based on block and irregular partitions (Song et al. 2016). For example, De Floriani et al. (1994) used three decomposition criteria based on the locations of viewpoints: equal-angle, quadrant-based, and equal-area. Their results showed that the equal-area criterion makes the best use of computing resources. Wu et al. (2007) adopted a block-based decomposition method and concluded that the optimal blocks of the entire dataset should be loaded into the main memory of a computer only once. Song et al. (2016) proposed an equal-area DEM segmentation with a free-shape-boundary domain decomposition method to handle null points in DEM datasets.

However, only a few studies in archaeology have applied these parallel computing approaches to viewshed analysis, because of the relatively complex implementation and the expense of HPC techniques. One example was presented by Lake and Ortega (2013), who used a computer with 4 cores and 4 logical processors to handle the computational challenge of viewshed analysis.1 Another example was conducted by Llobera et al. (2010). In their study, they used Condor, a software system that supports high-throughput computing on collections of distributed computing resources (https://research.cs.wisc.edu/htcondor/), to accelerate viewshed analysis. The main reason for adopting Condor was that it works with a non-dedicated pool of computers and is cheaper than building a dedicated set of computing resources. However, Condor has several limitations: (1) the configurations of computing resources in the Condor pool may differ, and (2) all computing resources in the pool are volunteered and may be removed from the pool without notice. These limitations cause unpredictable computing times, which in the worst case may be similar to those of a single computer. Thus, a parallel computing approach that has a simple decomposition method and a

1. “For each stone circle/control point the second smaller geographical region is used to constrain the area that must be examined when one is computing the viewshed” (Lake and Ortega 2013, p. 224).


low-cost and efficient parallel computing technique is needed in landscape archaeology.

Although viewshed analysis is a common method in landscape archaeology, most studies only consider the total viewshed2 or the cumulative viewshed3 (two results of viewshed analysis). Few studies have investigated visibility patterns (another result of viewshed analysis). Visibility patterns are irregular, fragmented, and discontinuous. The analysis of visibility patterns, also called landscape fragmentation analysis, may provide additional information for understanding the locations of monuments and settlements. Landscape fragmentation analysis adopts a number of quantitative descriptions or indexes to examine the visibility patterns. Typically, monuments and settlements were built in areas whose visibility patterns are compact and physically connected. However, only a few archaeological studies have taken landscape fragmentation analysis into account. Two pioneering archaeologists used visibility patterns and the idea of landscape fragmentation analysis to investigate the settings of prehistoric stone circles (Lake and Ortega 2013). Although they did not find any linear or polynomial relationship between the locations of stone circles and visibility patterns, they found more fragmented visibility patterns for stone circles located close to inland areas. The findings of their study indicate that landscape fragmentation analysis based on visibility patterns is meaningful for understanding the locations of monuments and settlements. Thus, landscape fragmentation analysis should be added to conventional viewshed analysis in archaeological studies.

This study focuses on bridging the gap by applying a simple and low-cost parallel fragmented viewshed analysis in archaeological studies and investigating the capabilities of landscape pattern analysis for understanding the locations of archaeological sites. A parallel computing approach is designed to address the compute- and data-intensity challenge of viewshed analysis. In the following sections, we first introduce our study area and data. Then we describe our parallel computing approach and other related methods, such as the evaluation of computing performance and fragmentation analysis. After that, we present experiments and results. Finally, we discuss the results and summarize the findings.

2  Study Area and Data

We draw our case study from the metropolitan area of the Oyo Empire (1600–1830 AD), Nigeria, West Africa (Fig. 5.1). The study area was an integrated and hierarchical urban landscape comprising the capital of the Empire, Oyo-Ile. The area of the case study is 453.80 km².

2. The total viewshed is obtained by calculating viewshed results from all points of a given terrain dataset (Llobera et al. 2010).
3. The cumulative viewshed is obtained by calculating viewshed results from all viewpoints of interest on a given terrain dataset and then adding them together (Wheatley 1995).


Fig. 5.1  Map of DEM and study area

Topography is constantly changing, but in a very slow process. Our assumption is that the topography in this area has remained approximately the same over the past 200 years, so that data collected in recent years can represent the historical terrain. We used digital elevation model (DEM) data as our terrain dataset. The DEM dataset was retrieved from the Shuttle Radar Topography Mission (SRTM; https://www2.jpl.nasa.gov/srtm) with a 30 m spatial resolution. The highest elevation of the DEM in our study area is 456 m, and most of the high-elevation regions are located in the west of the study area. The elevation of the east region is lower than that of the west region, with the lowest elevation being 223 m. In this study area, the numbers of rows and columns of the DEM dataset are 1,099 and 442. We use the center of each cell as the observer point to calculate the viewshed. After removing null values, we have 478,197 observer points in this study. Furthermore, in order to obtain accurate viewshed results, the DEM data used in this study covers our study area and extends 30 km beyond it (30 km is the maximal visible distance of human eyes used in this study), which


contains 7,340,706 cells (the number of rows and columns are 3,051 and 2,406). In other words, the viewshed result of each observer point is calculated based on the extended DEM data.
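To make this pre-processing step concrete, the fragment below is a minimal Python sketch of how observer points can be derived as cell centers of the valid DEM cells. It is our own illustration, not the chapter's code: the function name and arguments are hypothetical, and it assumes the DEM has already been loaded into a NumPy array with a known NoData value.

```python
import numpy as np

def observer_points(dem, nodata, x_min, y_max, cell=30.0):
    """Return (x, y, z) cell-center coordinates for every valid DEM cell.

    dem          -- 2D NumPy array of elevations (rows x cols)
    nodata       -- value that marks null cells to be skipped
    x_min, y_max -- map coordinates of the grid's upper-left corner
    cell         -- cell size in metres (30 m SRTM in this study)
    """
    rows, cols = np.where(dem != nodata)        # indices of valid cells only
    xs = x_min + (cols + 0.5) * cell            # cell-center eastings
    ys = y_max - (rows + 0.5) * cell            # cell-center northings
    return np.column_stack([xs, ys, dem[rows, cols]])

# For the 1,099 x 442 study-area grid, this yields one observer point per
# valid cell (478,197 points in this chapter after nulls are removed).
```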

3  Methods

The workflow of our parallel landscape visibility analysis is shown in Fig. 5.2. The workflow contains four steps: (1) pre-processing, (2) domain decomposition, (3) parallel computing, and (4) post-processing. Pre-processing focuses on acquiring the terrain dataset and observer points from the raw data. Domain decomposition splits the entire set of observer points into different subsets. Viewshed analysis and landscape fragmentation analysis (hereafter referred to as fragmentation analysis) are run in the parallel computing step. Result aggregation is performed in the post-processing step.

Fig. 5.2  Workflow of parallel landscape visibility analysis in this study


3.1  Domain Decomposition

Generally, parallel computing divides a task into a collection of sub-tasks based on decomposition methods. There are two types of partitioning methods: data decomposition and algorithm decomposition (Wilkinson and Allen 1999). In viewshed analysis, the calculation of the viewshed is independent at each observer point location. In addition, the task can be divided into completely independent parts because there is no spatial relation among the sub-tasks of viewshed analysis and no communication among computing processors. This is so-called embarrassingly parallel computing (Wilkinson and Allen 1999). Although there exists a suite of data decomposition methods (Ding and Densham 1996), we use an equal-point domain decomposition method that evenly splits all observer points into different subsets. Each subset is allocated to a computing element for computation. Figure 5.3 illustrates the equal-point domain decomposition method used in this study. The method includes the following steps: we first decide the number of points (n) per subset and then evenly split the N observer points into k subsets. To find an appropriate value of n, we suggest that readers distribute the tasks as evenly as possible and use a trial-and-error approach.
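A minimal Python sketch of this equal-point strategy is given below; the function and variable names are ours and are not taken from the study's implementation.

```python
def equal_point_split(points, n_per_subset=1000):
    """Equal-point domain decomposition: split the observer points into
    consecutive subsets of at most n_per_subset points.  Each subset is
    an independent job for one computing element."""
    return [points[i:i + n_per_subset]
            for i in range(0, len(points), n_per_subset)]


# With the 478,197 observer points and 1,000 points per subset used in
# this chapter, the split yields 479 subsets (i.e., 479 jobs).
subsets = equal_point_split(list(range(478197)), 1000)
assert len(subsets) == 479
```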

3.2  Viewshed Analysis and Fragmentation Analysis

Viewshed analysis has been examined from different aspects in previous studies, such as the development of improved viewshed algorithms (Fisher 1993; Chao et al. 2011; Osterman et al. 2014), addressing the computational issues of viewshed analysis (De Floriani et al. 1994; Wu et al. 2007; Zhao et al. 2013; Lake and Ortega 2013), or the generation of viewsheds on different terrain data (i.e., DEMs or

Fig. 5.3  Illustration of equal-point domain decomposition method (in this example, each subset contains 5 observer points)


triangulated irregular networks) (De Floriani et al. 1994). The surrounding topography has an influence on the calculation of the viewshed. In 1979, Benedikt first introduced a number of quantitative descriptions of the viewshed4 (e.g., geometric properties of viewsheds) to describe the landscape. However, those quantitative indexes proposed by Benedikt have been applied in only a small number of studies, because they can only represent local properties of the landscape and are hard to interpret (Benedikt 1979; Turner et al. 2001). Turner et al. (2001) proposed the visibility graph to resolve these limitations of Benedikt's work and applied the method to an architectural space. Following Turner's study, O'Sullivan and Turner (2001) applied the visibility graph to a landscape region. The major benefit of the visibility graph is that, if the visibility graph of a landscape has been previously generated and stored, the landscape information of a given location can be rapidly retrieved and displayed (O'Sullivan and Turner 2001; Turner et al. 2001). However, the visibility graph poses three significant shortcomings. First, the visibility graph cannot generate and store landscape information for all observer points. Second, the computing time of the visibility graph is highly related to the size of the study area, that is, a larger study area needs more computing time. Third, the visibility graph needs more memory space because pre-processing of the landscape under study is involved.

Although there exist a number of quantitative methods for analyzing properties of viewsheds in a landscape, archaeologists still focus on adopting straightforward methods, such as mapping techniques, and ignore the relationships between viewsheds in their studies. For instance, Wheatley (1995) used cumulative viewsheds to determine the number of visible locations from ancient monuments. Lake and Ortega (2013) derived a fragmentation index to investigate the spatial pattern of viewsheds. However, the use of landscape fragmentation analysis in analyzing spatial patterns of viewsheds is still limited. Landscape fragmentation analysis involves a collection of quantitative measures of spatial heterogeneity. Landscape metrics quantify characteristics of spatial pattern at different levels (i.e., patch, class, and landscape; see McGarigal 2014 for details). The evaluation of spatial patterns using landscape metrics receives much attention, particularly in landscape ecology (Uuemaa et al. 2009). A variety of metrics have been developed, such as shape complexity, richness, or diversity (McGarigal 2014). Moreover, a number of software packages are available (e.g., Fragstats), and some metrics have been integrated into GIS software (e.g., Patch Analyst in ArcView) (Uuemaa et al. 2009). Compared with graph-based landscape pattern analysis, landscape metrics have a set of advantages. For example, landscape metrics provide more measurements whose numerical results can be fully interpreted in terms of the local and global properties of spatial pattern. In addition, landscape metrics are easy for archaeologists to implement because of the number of available software packages.

4. In Benedikt's study, he used "isovist" instead of "viewshed." An "isovist" is defined as "taking away from the architectural or landscape site a permanent record of what would otherwise be dependent on either memory or upon an unwieldy number of annotated photographs" (Tandy 1967, p. 9).


However, few archaeological studies have adopted landscape metrics to examine the characteristics of the spatial pattern of viewsheds. Therefore, following Lake and Ortega's study, we employ a set of fragmentation indexes to analyze the spatial patterns of viewsheds. In this study, we calculate fragmentation indexes for each observer point, and we use the software Fragstats 4.2 to calculate fragmentation for each visibility pattern (McGarigal et al. 2012). Specifically, the viewshed is calculated within a range of 30 km of each observer point, and each observer point has its own visibility pattern, which is stored in raster format. The visibility patterns are then fed into Fragstats to obtain the fragmentation indexes for each observer point. Fragstats provides a collection of spatial pattern metrics at the patch, class, and landscape levels (McGarigal and Marks 1994). We choose two particular metrics to measure visibility patterns from two different aspects (i.e., shape and connectedness): the mean perimeter-area ratio (PARA_MN) and the patch cohesion index (COHESION). Both are calculated at the class level. Complete descriptions of these metrics and the equations for their calculation are provided in McGarigal and Marks (1994).
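To make the shape metric concrete, the sketch below approximates a class-level mean perimeter-area ratio for a binary visibility raster using NumPy and SciPy. It is our own simplified, cell-unit illustration, not Fragstats' implementation; Fragstats computes PARA_MN in its own units (and COHESION by a different formula), so absolute values will differ from those reported by the software.

```python
import numpy as np
from scipy import ndimage

def mean_perimeter_area_ratio(visible):
    """Approximate PARA_MN for the 'visible' class of a binary raster.

    visible -- 2D boolean array (True = visible cell).
    Each 4-connected patch contributes perimeter/area in cell units;
    the metric is the mean of these ratios over all patches.
    """
    labels, n_patches = ndimage.label(visible)   # 4-connected patches
    if n_patches == 0:
        return 0.0
    ratios = []
    for patch_id in range(1, n_patches + 1):
        patch = labels == patch_id
        area = patch.sum()
        # Count ordered cell-neighbor pairs inside the patch; every shared
        # edge is counted twice, so perimeter = 4 * area - shared.
        padded = np.pad(patch, 1, constant_values=False)
        core = padded[1:-1, 1:-1]
        shared = ((core & padded[:-2, 1:-1]).sum() +   # neighbor above
                  (core & padded[2:, 1:-1]).sum() +    # neighbor below
                  (core & padded[1:-1, :-2]).sum() +   # neighbor left
                  (core & padded[1:-1, 2:]).sum())     # neighbor right
        ratios.append((4 * area - shared) / area)
    return float(np.mean(ratios))
```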

3.3  Evaluation of Computing Performance

There are two measurements for evaluating computing performance: speedup (sp) and efficiency (e). Speedup is the ratio of sequential computing time to parallel computing time for the same algorithm (Wilkinson and Allen 1999). The speedup (sp) is calculated as

sp = Ts / Tn    (5.1)

where Ts is the execution time using only one computing element (e.g., one CPU), and Tn is the total running time elapsed using n computing elements. The closer the speedup is to n, the higher the computing performance. The theoretical upper limit of speedup is n; when the speedup reaches n (the number of computing elements used), it is called linear speedup. When the speedup goes far beyond this theoretical upper limit, the situation is known as superlinear speedup (Wilkinson and Allen 1999). Efficiency (e) measures the proportion of the execution time for which a computing element is fully utilized. Efficiency is defined as:

e = sp / n    (5.2)

where sp is the speedup, and n is the number of computing elements. Generally, efficiency decreases as the number of computing elements increases.
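As a quick worked example, the helper below (our own, not part of the study's code) evaluates Eqs. (5.1) and (5.2) using two of the timings reported later in Table 5.1.

```python
def speedup_and_efficiency(t_sequential, t_parallel, n_cpus):
    """Speedup (Eq. 5.1) and efficiency (Eq. 5.2)."""
    sp = t_sequential / t_parallel
    return sp, sp / n_cpus


# Timings (seconds) from Table 5.1: one CPU versus 190 CPUs.
sp, e = speedup_and_efficiency(7_613_597, 52_112, 190)
print(f"speedup = {sp:.2f}, efficiency = {e:.2f}")   # -> about 146.10 and 0.77
```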


3.4  Implementation

Our parallel computing approach was built on a Windows-based computing cluster. There are 34 computing nodes in our cluster, including 24 computing nodes with eight CPUs each (192 available CPUs in total; CPU clock rate: 3.4 GHz) and ten computing nodes with two CPUs each (20 available CPUs in total; CPU clock rate: 3.00 GHz). The nodes are interconnected by a gigabit network switch. On each computing node, the operating system, job scheduling software, and GIS software were installed: Windows Server 2016, Microsoft HPC Pack 2016, and ArcGIS 10.6, respectively. In order to control the size of the datasets, we used geodatabases to organize the input (DEM dataset and viewpoints) and outputs (visibility patterns and fragmentation results). Moreover, a Python script was used to automatically calculate the viewshed and fragmentation results and to handle the post-processing of outputs. The total number of observer points in our study area is 478,197. The number of observer points per group was 1,000, which means that 479 jobs were submitted to the computing cluster. We used computing nodes with the same configuration (eight CPUs per node) for the purpose of computing performance evaluation.
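The chapter does not list its scripts, so the fragment below is only our sketch of what a single per-subset job might look like, assuming the ArcPy Spatial Analyst Viewshed tool and the geodatabase layout described above; all paths, dataset names, and the Fragstats hand-off are hypothetical.

```python
# Sketch of one worker job: compute a viewshed for every observer point
# in its assigned subset and save each visibility pattern for the later
# fragmentation step.  Paths, names, and fields are illustrative only.
import arcpy
from arcpy.sa import Viewshed

arcpy.CheckOutExtension("Spatial")

DEM = r"C:\data\oyo.gdb\dem_extended"        # DEM extended 30 km beyond the study area
SUBSET = r"C:\data\jobs.gdb\observers_0001"  # the 1,000 observer points of this job
OUT_GDB = r"C:\data\results.gdb"

with arcpy.da.SearchCursor(SUBSET, ["OID@"]) as cursor:
    for (oid,) in cursor:
        # One observer at a time, so that each visibility pattern can be
        # exported separately to Fragstats afterwards.  (A 30 km visibility
        # limit would be supplied through the observers' RADIUS2 field.)
        lyr = f"obs_{oid}"
        arcpy.management.MakeFeatureLayer(SUBSET, lyr, f"OBJECTID = {oid}")
        vis = Viewshed(DEM, lyr)             # binary visible/not-visible raster
        vis.save(f"{OUT_GDB}\\viewshed_{oid}")
```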

4  Results

4.1  Viewshed Analysis

The viewshed analysis used the Viewshed tool from the Spatial Analyst toolbox in ArcMap 10.6. We present the number of visible cells in Fig. 5.4a. As shown in Fig. 5.4a, observer points with relatively higher elevations in the visual field (30 km) correspond to higher numbers of visible cells,5 whereas those with relatively lower elevations have lower numbers of visible cells. Generally, we can see that the number of visible cells is basically correlated with elevation. However, our results showed an interesting phenomenon: the number of visible cells in the east part of our study area (a low-elevation region; see red boxes in Fig. 5.4a) was also high. This finding goes against the common assumption that the highest elevation has the largest number of visible cells. The reason is that the visible area of an observer point is determined not only by its elevation but also by its surrounding terrain. Even though an observer point has the highest elevation, its number of visible cells may not be the largest.

5. The number of visible cells stands for how many cells can be seen from other cells.


Fig. 5.4  Maps of the number of visible cells (a), DEM (b), mean perimeter-area ratio (PARA_MN) (c; landscape fragmentation index), and patch cohesion index (COHESION) (d; landscape fragmentation index)


4.2  Fragmentation Analysis

We used two landscape metrics, PARA_MN (mean perimeter-area ratio) and COHESION (patch cohesion index), to analyze landscape fragmentation based on the visibility patterns in our case study. PARA_MN is a shape metric that measures the geometric complexity of patches (McGarigal and Marks 1994). The range of PARA_MN is from zero to infinity; as PARA_MN increases, the shapes of the visibility pattern become more complex. Conversely, the lower the value, the more compact the shape of the visibility patterns. As shown in Fig. 5.4c, the shapes of most visibility patterns are complex. Specifically, the most complex shapes are concentrated in the lower-elevation regions. In this case study, PARA_MN ranged from 233.15 to 1239.46. The distribution of PARA_MN was skewed toward large values (Fig. 5.5a). The average (870.41) is slightly smaller than the median (879.71). Thus, most shapes of the visibility patterns were non-compact in our study area.

COHESION is an aggregation metric that evaluates the level of connectivity of patches (McGarigal and Marks 1994). The range of COHESION is from 0 to 100. COHESION increases (i.e., tends to 100) as the patches become more physically connected; the patches become more isolated as COHESION approaches zero. From our results, we can see that the spatial distribution of COHESION values is similar to that of the number of visible cells (Fig. 5.4d). That is, small values of COHESION are associated with the lowest elevations, whereas large values of COHESION concentrate around high elevations. In other words, the areas with higher elevations are more aggregated. Figure 5.5b shows that COHESION values ranged from 63.50 to 99.31. The distribution of COHESION was negatively skewed, such that the median value (96.52) was slightly larger than the mean value (95.95). In this case study, the visibility patterns showed more physically connected characteristics.

The ground-truthing of the settlement model and predictive viewshed analysis with archaeological survey shows that settlements appear to have been distributed across multiple elevation zones during the pre-imperial period, especially between the thirteenth and sixteenth centuries. The settlements of that period, belonging to what has been named the Stonemarker period, consisted mostly of dispersed homesteads and hamlets (Ogundiran 2019). These were located along the major drainage systems and on different soil types (lateritic clay and sandy soils). Elevation appears not to have been a major consideration for settlement location. We concluded that viewshed was not a significant consideration for settlement location during the pre-imperial period. However, the settlement strategy changed during the imperial period, which began ca. 1560 and was rapidly consolidated by 1650. In this period, the settlements at the base of these hills were larger, nucleated, and compact towns. So far, our archaeological survey has demonstrated that the higher the elevation, the more likely an imperial-period settlement or activity will be found there. The ceramics and structural remains found on the summits and escarpments of these hills attest to the fact that the hills were actively patrolled to watch for


Fig. 5.5  Histograms of viewshed fragmentation (a: mean perimeter-area ratio (PARA_MN); b: patch cohesion index (COHESION))

movements from far and near, to protect the inhabitants of the cities, and to monitor the inflow and outflow of populations. All of this shows that excellent visibility was an integral part of the reasons for imperial-period settlement location and that visibility was central to the strategy of security during the Oyo Imperial Period. A follow-up archaeological survey and excavations that target those areas will likely yield more information about the role of viewshed fragmentation in settlement location and the spatial configuration of activity areas within the broad landscape.


4.3  Computing Performance

In this study, we used 24 computing nodes with the same configuration (see Sect. 3.4 for details) to conduct the landscape visibility analysis. The computing results obtained using the equal-point decomposition method are presented in Table 5.1, which reports total computing times (including viewshed analysis, fragmentation analysis, and data reading and writing), speedups, and efficiency. Compared with the sequential computing result (one CPU involved), the results show an increasing performance pattern: the total computing time decreases as the number of CPUs increases. The total computing time decreased from 91.5 days with one CPU to around 15 h with 190 CPUs. The speedup ratio showed the opposite pattern to the total computing time, that is, it increased. The reason is that the sub-tasks of viewshed analysis are independent and there is no communication among different CPUs. In addition, a linear speedup pattern was observed when the number of CPUs was less than 40 (Fig. 5.6). However, the linear speedup pattern was no longer preserved as more CPUs were used, because reading data from disk becomes more frequent with a larger number of CPUs, which produces more I/O time.

Table 5.1  Computing performance of parallel algorithm (time unit: seconds)

#CPUs   Time        Speedup   Efficiency     #CPUs   Time      Speedup   Efficiency
1       7,613,597   –         –              100     83,891    90.66     0.91
5       1,531,053   4.97      0.99           105     85,099    89.47     0.85
10      770,906     9.88      0.99           110     83,146    91.57     0.83
15      516,111     14.75     0.98           115     83,553    91.12     0.79
20      387,968     19.62     0.98           120     69,651    109.31    0.91
25      323,191     23.56     0.94           125     68,623    110.95    0.89
30      262,234     29.03     0.97           130     68,788    110.68    0.85
35      227,037     35.53     0.96           135     70,704    107.68    0.80
40      198,737     38.31     0.96           140     67,896    112.14    0.80
45      183,272     41.54     0.92           145     67,919    112.10    0.77
50      164,924     46.16     0.92           150     68,488    111.17    0.74
55      148,847     51.15     0.93           155     68,994    110.35    0.71
60      133,185     57.17     0.95           160     52,369    145.38    0.91
65      131,254     58.01     0.89           165     52,198    145.86    0.88
70      115,895     65.69     0.94           170     52,016    146.37    0.86
75      116,716     65.23     0.87           175     51,607    147.53    0.84
80      103,318     73.69     0.92           180     52,375    145.37    0.81
85      101,824     74.77     0.88           185     54,037    140.90    0.76
90      99,704      76.36     0.85           190     52,112    146.10    0.77
95      97,747      77.89     0.82


Fig. 5.6  Computing time and speedup for different numbers of CPUs

In other words, there is an optimum number of CPUs that keeps the computing time to a minimum. In this study, 160 CPUs was the optimum number, because the time required and the speedup ratio remained almost unchanged as the number of CPUs increased further. Our results show an improvement in computational performance based on the execution time of the viewshed analysis. This improvement is straightforward to understand because the computation tasks are distributed among different computing nodes and the results are calculated simultaneously. Increasing the number of computing nodes improves the execution time; however, it yields diminishing returns in speedup. The major reason is that reading and writing data from disk become more frequent.

5  Concluding Discussion

GIS-based viewshed analysis is a widespread GIS application in archaeology. The primary contribution of this study is a low-cost parallel computing approach with a simple decomposition strategy to address the compute- and data-intensive issues of viewshed analysis. This low-cost and straightforward parallel computing approach provides strong evidence that GIS-based viewshed analysis can become a common part of landscape archaeological studies, allowing archaeologists to draw inferences from entire datasets rather than from only part of them. Meanwhile, the viewshed fragmentation results provide a detailed investigation


Computational experiments were conducted on a Windows-based computing cluster. The sequential computing time (one CPU) was 91.5 days, whereas the computing time decreased to about 15 h with 190 CPUs when our parallel computing approach was applied. Compared with previous parallel viewshed analyses in archaeology, our approach performed better. For instance, Lake and Ortega (2013) computed viewsheds for 29,624 observer points at 50 m × 50 m spatial resolution using four CPUs, which took approximately 425 h. Our parallel approach needs only around 34.32 h (estimated computation time) with four CPUs to calculate 29,624 observer points, and it uses a finer-resolution (30 m × 30 m) elevation dataset. From the fragmentation analysis results, we conclude that viewshed fragmentation is related to elevation: higher physical connectedness occurs in high-elevation regions, while complex shapes of visibility patterns are concentrated in lower-elevation regions. Moreover, our viewshed fragmentation results showed that the shape complexity of visibility patterns was similar across observer points in our study area (most PARA_MN values fell between 822.78 and 927.78). Meanwhile, the visible areas presented a contiguous pattern because over 90% of the COHESION values were greater than 90, indicating high physical connectedness. However, future viewshed fragmentation analysis should focus on the total number of points contained within the viewshed at each of these elevations and determine the number of points or areas that overlap between pairs of elevations and among all of them. Future research directions of viewshed analysis include: (1) adding viewshed fragmentation as a conventional part of archaeological studies; (2) more detailed investigation of fragmentation indices, such as the fractal dimension index or the number of patches; (3) using open source GIS software, such as QGIS; and (4) further improving computing performance by using advanced multicore HPC resources.

Appendix

See Table 5.2.


Table 5.2  Computing performance of each analysis step over different CPUs (time unit: seconds)

#CPUs  Viewshed analysis  Fragmentation analysis  I/O time   Total time
1      4,855,088          804,628                 1,953,881  7,613,597
5      975,781            165,170                 393,702    1,531,053
10     492,684            80,916                  197,306    770,906
15     330,079            53,999                  132,033    516,111
20     248,239            40,543                  99,186     387,968
25     207,080            33,688                  82,423     323,191
30     168,058            27,080                  67,097     262,234
35     145,462            23,753                  57,823     227,037
40     128,313            20,299                  50,125     198,737
45     119,216            18,614                  45,442     183,272
50     106,498            16,903                  41,524     164,924
55     96,192             15,242                  37,413     148,847
60     85,518             13,595                  34,073     133,185
65     84,507             13,564                  33,183     131,254
70     74,735             11,899                  29,261     115,895
75     75,280             11,869                  29,566     116,716
80     67,756             10,221                  25,342     103,318
85     66,320             10,209                  25,295     101,824
90     64,284             10,195                  25,224     99,704
95     62,669             10,142                  24,937     97,747
100    54,309             8,499                   21,173     83,891
105    55,478             8,513                   21,107     85,099
110    53,574             8,520                   21,052     83,146
115    54,187             8,481                   20,885     83,553
120    45,202             6,843                   17,606     69,651
125    44,784             6,808                   17,031     68,623
130    44,760             6,812                   17,216     68,788
135    46,742             6,847                   17,115     70,704
140    44,071             6,830                   16,995     67,896
145    43,966             6,824                   17,129     67,919
150    44,682             6,794                   17,012     68,488
155    45,589             6,782                   16,623     68,994
160    34,437             5,152                   12,779     52,369
165    34,011             5,127                   13,060     52,198
170    34,049             5,157                   12,810     52,016
175    33,523             5,139                   12,945     51,607
180    34,293             5,150                   12,932     52,375
185    36,061             5,135                   12,841     54,037
190    34,045             5,134                   12,933     52,112


References Benedikt, M.  L. (1979). To take hold of space: Isovists and isovist fields. Environment and Planning B: Planning and design, 6(1), 47–65. Chao, F., Yang, C., Zhuo, C., Xiaojing, Y., & Guo, H. (2011). Parallel algorithm for viewshed analysis on a modern GPU. International Journal of Digital Earth, 4(6), 471–486. Ding, Y., & Densham, P. J. (1996). Spatial strategies for parallel spatial modelling. International Journal of Geographical Information Systems, 10(6), 669–698. Fisher, P. F. (1993). Algorithm and implementation uncertainty in viewshed analysis. International Journal of Geographical Information Science, 7(4), 331–347. Fisher, P. F. (1995). An exploration of probable viewsheds in landscape planning. Environment and Planning B: Planning and Design 22(5), 527–546. Floriani, D., Leila, C. M., & Scopigno, R. (1994). Parallelizing visibility computations on triangulated terrains. International Journal of Geographical Information Systems, 8(6), 515–531. Gaffney, V., & Van Leusen, M. (1995). Postscript-GIS, environmental determinism and archaeology: A parallel text. In Archaeology and Geographical Information Systems: A European Perspective (pp. 367–382). London: Taylor and Francis. Germain, D., Laurendeau, D., & Vézina, G. (1996). Visibility analysis on a massively data-parallel computer. Concurrency: Practice and Experience, 8(6), 475–487. Haas, J., & Creamer, W. (1993). Stress and warfare among the Kayenta Anasazi of the thirteenth century AD. Fieldiana. Anthropology, 21, 149–211. James, N. N. (2007). Using enhanced GIS surface analysis in landscape archaeology: A case study of the hillforts and defended enclosures on Gower, Wales. Postgraduate Certificate School of Archaeology and Ancient History University of Leicester. Kvamme, K. L. (1999). Recent directions and developments in geographical information systems. Journal of Archaeological Research, 7(2), 153–201. Lake, M., & Ortega, D. (2013). Compute-intensive GIS visibility analysis of the settings of prehistoric stone circles. Computational Approaches to Archaeological Spaces, 60, 213. Llobera, M., Wheatley, D., Steele, J., Cox, S., &Parchment, O. (2010). Calculating the inherent visual structure of a landscape (inherent viewshed) using high-throughput computing. Beyond the artefact: Digital Interpretation of the Past: Proceedings of CAA2004, Prato, April 13–17, 2004 (pp. 146–151). Budapest, Hungary: Archaeolingua. Llobera, M. (2003). Extending GIS-based visual analysis: the concept of visualscapes. International Journal of Geographical Information Science, 17(1), 25–48. Llobera, M. (2007). Reconstructing visual landscapes. World Archaeology, 39(1), 51–69. Lock, G. R., & Harris, T. M. (1996). Danebury revisited: An English Iron Age Hillfort in a digital landscape. In Anthropology, space, and geographic information systems (pp.  214–240). Oxford, UK: Oxford University Press. McGarigal, K. (2014). Landscape pattern metrics. Wiley StatsRef: Statistics Reference Online. McGarigal, K., Cushman, S.  A., & Ene, E. (2012). FRAGSTATS v4: Spatial pattern analysis program for categorical and continuous maps (Computer software program produced by the authors at the University of Massachusetts, Amherst). Retrieved from: http://www.umass.edu/ landeco/research/fragstats/fragstats.html McGarigal, K., & Marks, B. J. (1994). Fragstats: Spatial pattern analysis program for quantifying landscape structure. Reference manual. Forest Science Department. Corvallis, OR: Oregon State University. Mills, K., Fox, G., & Heimbach, R. (1992). 
Implementing an intervisibility analysis model on a parallel computing system. Computers & Geosciences, 18(8), 1047–1054. Ogundiran, A. (2019). The Oyo Empire archaeological research project (Third Season): Interim report of the fieldwork in Bara, Nigeria. January 11–February 15, 2019. Submitted to the Nigerian National Park Service, Abuja. August 12, 2019.


Osterman, A., Benedičič, L., & Ritoša, P. (2014). An IO-efficient parallel implementation of an R2 viewshed algorithm for large terrain maps on a CUDA GPU. International Journal of Geographical Information Science, 28(11), 2304–2327. O’Sullivan, D., & Turner, A. (2001). Visibility graphs and landscape visibility analysis. International Journal of Geographical Information Science, 15(3), 221–237. Song, X.-D., Tang, G.-A., Liu, X.-J., Dou, W.-F., & Li, F.-Y. (2016). Parallel viewshed analysis on a PC cluster system using triple-based irregular partition scheme. Earth Science Informatics, 9(4), 511–523. Tandy, C. R. V. (1967). The isovist method of landscape survey. In H. C. Murray (Ed.), Symposium on methods of landscape analysis (pp. 9–10). London: Landscape Research Group. Turner, A., Doxa, M., O’Sullivan, D., & Penn, A. (2001). From isovists to visibility graphs: A methodology for the analysis of architectural space. Environment and Planning B: Planning and Design, 28(1), 103–121. Uuemaa, E., Antrop, M., Roosaare, J., Marja, R., & Mander, Ü. (2009). Landscape metrics and indices: An overview of their use in landscape research. Living Reviews in Landscape Research, 3(1), 1–28. Wheatley, D. (1995). Cumulative viewshed analysis: A GIS-based method for investigating intervisibility, and its archaeological application. In Archaeology and GIS: A European perspective (pp. 171–186). London: Routledge. Wilkinson, B., & Allen, M. (1999). Parallel programming (Vol. 999). Upper Saddle River, NJ: Prentice Hall. Wheatley, D., & Gillings, M. (2000). “Vision, perception and GIS: developing enriched approaches to the study of archaeological visibility.” Nato Asi Series a Life Sciences, 321, 1–27. Wu, H., Mao, P., Yao, L., & Luo, B. (2007). A partition-based serial algorithm for generating viewshed on massive DEMs. International Journal of Geographical Information Science, 21(9), 955–964. Xia, Y., Yang, L., & Shi, X. (2010). Parallel viewshed analysis on GPU using CUDA. 2010 Third International Joint Conference on Computational Science and Optimization. Zhao, Y., Padmanabhan, A., & Wang, S. (2013). A parallel computing approach to viewshed analysis of large terrain data using graphics processing units. International Journal of Geographical Information Science, 27(2), 363–384.

Chapter 6

Quantum Computing for Solving Spatial Optimization Problems

Mengyu Guo and Shaowen Wang

Abstract  Ever since Shor's quantum factoring algorithm was developed, quantum computing has been pursued as a promising and powerful approach to solving many computationally complex problems such as combinatorial optimization and machine learning. As an important quantum computing approach, quantum annealing (QA) has received considerable attention. Extensive research has shown that QA, exploiting quantum-mechanical effects such as tunneling, entanglement and superposition, could be much more efficient in solving hard combinatorial optimization problems than its classical counterpart, simulated annealing. Recent advances in quantum annealing hardware open the possibility of empirical testing of QA against the most challenging computational problems arising in geospatial applications. This chapter demonstrates how to employ QA to solve NP-hard spatial optimization problems through an illustrative example of programming a p-median model and a case study on spatial supply chain optimization. The research findings also address the short- and long-term potential of quantum computing in the future development of high-performance computing for geospatial applications.

Keywords  Quantum computing · Spatial optimization · Quantum annealing · p-median

M. Guo University of California, Berkeley, CA, USA S. Wang (*) University of Illinois at Urbana-Champaign, Urbana, IL, USA e-mail: [email protected] © Springer Nature Switzerland AG 2020 W. Tang, S. Wang (eds.), High Performance Computing for Geospatial Applications, Geotechnologies and the Environment 23, https://doi.org/10.1007/978-3-030-47998-5_6


1  Introduction Spatial optimization involves the use of mathematical models and computational approaches to search for the best solutions of spatial decision problems that are ubiquitous in our everyday lives. For example, how should emergency vehicles be distributed across a road network to ensure an adequate coverage and a quick response to incidents (Geroliminis et  al. 2009)? How do we design a system of charging stations to minimize the total travel distance of electric vehicles (Zhu et al. 2016)? Other recognizable examples range from land use planning, facility location to school and voting area redistricting (Tong and Murray 2012; Williams 2002; Church 1999). Classical spatial optimization problems are usually solved with mathematical programming techniques (Tong and Murray 2012). A generic form of spatial optimization problems can be defined as follows:

Minimize (or Maximize) f(x),    (1)

Subject to g_i(x) ≤ b_i, ∀i,    (2)

x conditions.    (3)

where x is a vector of decision variables with conditions (3) on real, integer, or binary requirements, and/or non-negativity stipulations. f(x) is the objective function to be minimized or maximized, often reflecting goals to be achieved in a specific problem context. Constraints (2) make sure that only the solutions satisfying all constraints are considered as feasible solutions. Numerous spatial optimization models exist in the literature and a representative sample is provided in Table 6.1 (Laporte et al. 2015; Drezner and Hamacher 2002; Mladenović et al. 2007; Burkard 1984). While finding optimal solutions of spatial optimization models is of vital importance to real-world applications, this task is computationally complex.

Table 6.1  Representative examples of spatial optimization models

Name                        Problem description
P-median model              Locating p facilities so that the total weighted distance of serving all demand is minimized
P-center model              Locating p facilities so that the maximal weighted distance of serving all demand is minimized
Quadratic assignment model  Assigning N facilities to N locations so that the total weighted assignment cost is minimized
Maximal covering model      Locating N facilities so that the largest proportion of demand is covered within the desired service distance
Set covering model          Locating a minimum number of facilities so that all demand is covered within a coverage radius

In computational complexity theory, all of the models listed in Table 6.1 belong to the class of NP-hard problems, and it is believed that no classical polynomial-time algorithm exists for such problems (Karp 1975; Papadimitriou and Steiglitz 1998; Smith-Miles and Lopes 2012). This conclusion is straightforward if we consider the problem of deploying emergency vehicles across a road network for responding to possible incidents. For an instance of k emergency vehicles and n candidate deployment locations, the number of potential solutions is C(n, k), i.e., the number of k-combinations of n elements, which grows exponentially with n (Robbins 1955). Table 6.2 shows C(n, k) for a selection of small values of n and k, to provide a sense of magnitude. Consequently, to find the optimal solution, classical methods need to enumerate all potential solutions, which will surely end up with "combinatorial explosion" in both algorithmic runtime and physical memory of classical computers.

Table 6.2  C(n, k) for a selection of small values of n and k

k \ n    5     6     …    10    …    25          …    50
2        10    15    …    45    …    300         …    1225
3        10    20    …    120   …    2300        …    19,600
5        1     6     …    252   …    53,130      …    2.2×10⁶
7        –     –     …    120   …    480,700     …    9.9×10⁷
9        –     –     …    10    …    2.0×10⁶     …    2.5×10⁹
…        …     …     …    …     …    …           …    …
15       –     –     …    –     …    3.2×10⁶     …    2.2×10¹²

In order to tackle the computational challenges of NP-hard spatial optimization problems, high-performance computing (HPC) has been extensively studied. Exploiting high-performance and massively parallel computer architectures, various parallel algorithms, such as parallel genetic algorithms (Liu and Wang 2015; Liu et al. 2016) and parallel simulated annealing algorithms (Lou and Reinitz 2016), were developed for challenging spatial optimization problems such as the generalized assignment problem and the political redistricting problem, and linear speedups on large problem instances were obtained with desirable scalability as the number of processor cores increases. However, as the performance of classical computers approaches a physical limit following Moore's law (Lundstrom 2003), fundamentally new approaches to high-performance computing are needed in order to achieve breakthroughs for solving NP-hard spatial optimization problems. During the past few decades, quantum computing has advanced significantly in both theoretical and empirical aspects with the aim of solving problems believed to be intractable on classical computers. Notable examples include integer factoring, where Shor's quantum factoring algorithm yields an exponential speedup over any known classical algorithm (Shor 1997); combinatorial search, where a quantum search algorithm gives a quadratic speedup (Grover 1997; Hogg 1998); and combinatorial optimization, where both theoretical and numerical evidence of significant quantum speedup has been observed (Biswas et al. 2017). Besides the development of quantum computing algorithms, hardware (i.e., the quantum computer)


that utilizes quantum effects to enable high-performance computing has also begun to emerge, such as D-Wave's 2000-qubit annealing-based machine and Google's 72-qubit gate-based machine. All of these impressive results raise the question of whether and how quantum computing could be employed for NP-hard spatial optimization problems. In this chapter, we apply a quantum computing approach, i.e., quantum annealing, to solve an NP-hard spatial optimization problem. In Sect. 2, the background of quantum annealing is introduced in both theoretical and empirical aspects. In Sect. 3, the details of programming spatial optimization models in state-of-the-art quantum annealing hardware are illustrated through the example of a p-median model. In Sect. 4, quantum annealing is applied to solve a real-world biomass-to-biofuel supply chain optimization problem. In Sect. 5, conclusions and future research directions are provided.
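To make the combinatorial explosion summarized in Table 6.2 concrete, the following minimal Python sketch reproduces a few of its entries with exact binomial coefficients (the selection of n and k values is illustrative).

```python
# Minimal sketch: reproduce a few entries of Table 6.2 with exact
# binomial coefficients C(n, k).
from math import comb

for n in (5, 10, 25, 50):
    row = {k: comb(n, k) for k in (2, 3, 5, 7, 9, 15) if k <= n}
    print(n, row)

# comb(25, 5) = 53_130 and comb(50, 15) is roughly 2.25e12, so exhaustive
# enumeration of candidate deployments quickly becomes infeasible.
```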

2  Background of Quantum Annealing

Quantum annealing is a quantum computing approach to finding the ground state (global minimum or maximum equivalently) over a rugged energy (cost) landscape, which lies at the core of solving hard optimization problems (Tayarani et al. 2014). The power of quantum annealing comes from encoding information in a non-classical way, in quantum bits (i.e. qubits) that enable computations to take advantage of purely quantum effects, such as superposition and quantum tunneling (Nielsen and Chuang 2001; Rieffel and Polak 2011). Unlike the classical binary bit, which has to be in one state or the other (i.e. 0 or 1), quantum mechanics allows a qubit to be in a linear superposition of basis states at the same time:

|φ⟩ = α|0⟩ + β|1⟩    (4)



where α and β are probability amplitudes satisfying |α|² + |β|² = 1. This means that when we measure this qubit in the standard basis |0⟩ and |1⟩, the probability of outcome |0⟩ is |α|² and the probability of outcome |1⟩ is |β|². Superposition provides a fundamental computational advantage to quantum annealing. For example, given a system of 3 qubits, the state of the system can be prepared in a superposition of 2³ = 8 basis states with equal amplitudes:

|φ⟩ = (1/√(2³)) (|000⟩ + |001⟩ + |010⟩ + |011⟩ + |100⟩ + |101⟩ + |110⟩ + |111⟩)    (5)
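As a concrete illustration of Eq. (5), the sketch below (assuming NumPy is available) builds the three-qubit uniform superposition as a classical state vector and recovers the outcome probabilities from the squared amplitudes; it only mimics the bookkeeping of amplitudes, not an actual quantum device.

```python
# Minimal sketch: the uniform superposition of Eq. (5) as a state vector
# over the 2**3 = 8 basis states, with measurement probabilities given by
# the squared amplitudes.
import numpy as np

n_qubits = 3
dim = 2 ** n_qubits
phi = np.full(dim, 1 / np.sqrt(dim))     # amplitude 1/sqrt(2^3) on every basis state

probs = np.abs(phi) ** 2                 # probability of each measurement outcome
for index, p in enumerate(probs):
    print(f"|{index:03b}> measured with probability {p:.3f}")   # 1/8 each
print("total probability:", probs.sum())                        # sums to 1
```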

Thus, a calculation performed on |φ⟩ is performed on each of the 8 basis states in the superposition, which implies that quantum annealing can search for the optimal solution in a parallel way, starting from all potential solutions (Farhi et al. 2001;


Kadowaki and Nishimori 1998). Existing theoretical results demonstrate that, by means of quantum tunneling (i.e. the quantum mechanical phenomenon where a particle tunnels through an energy barrier that it classically could not surmount), quantum annealing can explore rugged energy landscape more efficiently than classical methods, such as the classical simulated annealing (Das et al. 2005; Denchev et al. 2016; Kirkpatrick et al. 1983). Besides the potential theoretical computational advantage, quantum annealing is also recognized as a feasible computational paradigm for realizing large-scale quantum computing. By adopting a sparsely connected hardware topology (i.e. Chimera graph, see Fig. 6.1 for an illustration), recent quantum annealing hardware (i.e. the D-Wave 2000Q™ system) has reached a scale of approximately 2000 qubits, and it has shown great potential for solving real-world complex problems,

Fig. 6.1 (a) A unit cell with 8 physical qubits and 16 couplers; (b) the hardware topology of the D-Wave II system: consisting of 8×8 grid of unit cells. Imperfections in the fabrication process lead to inactive qubits (e.g., 502 of the 512 qubits are active in this chip)


such as optimization (Neukart et  al. 2017; Rieffel et  al. 2015; Stollenwerk et  al. 2017), machine learning (Li et al. 2018; Mott et al. 2017) and sampling problems (Benedetti et  al. 2017). With recent significant advances in both the theory and hardware aspects of quantum annealing, we are now on the cusp of being able to tackle NP-hard spatial optimization problems in a quantum computing manner, promising a fundamental breakthrough.

3  Programming Spatial Optimization on Quantum Annealing Computer

3.1  Overview

Because of the fundamental difference in working mechanism, programming a quantum annealing computer is very different from programming a classical computer. In this section, we introduce how to program with a D-Wave system, which is by now one of the few commercially available quantum annealing computers. We focus on the programming aspect; the technical specifications of the D-Wave system can be found in Bian et al. (2010). Overall, the quantum annealing programming process consists of two interrelated models (see Fig. 6.2): a physical model at the hardware level and an original model at the user level. At the hardware level, the D-Wave system is designed to find the lowest energy of a quantum system described by the well-known Ising model (Bian et al. 2010). In the Ising model, let q_i represent a qubit that can take values ±1 and N represent the number of qubits in the system. The energy of the quantum system made up of a collection of qubits q = [q_1, q_2, …, q_N] is then modeled as:



Minimize: Obj_Ising = ∑_i h_i q_i + ∑_{(i,j)∈neigh} J_ij q_i q_j    (6)



Fig. 6.2  The two models involved in the programming process of D-Wave system


where h_i and J_ij are the programmable parameters of the system, representing the bias on each qubit and the interaction between two neighboring qubits, respectively. In the D-Wave system, the neighboring relationship is defined by the Chimera topology. Therefore, in order to use the D-Wave system to solve real-world optimization models, we need to program those models into the Chimera-structured Ising model, which consists of the following two steps.

Step 1: Mapping to QUBO

The quadratic unconstrained binary optimization (QUBO) model is a well-known optimization model (Kochenberger et al. 2014) and can be viewed as an arbitrarily structured Ising model with binary-valued variables x_i ∈ {0, 1} transformed to q_i ∈ {−1, +1} via q_i = 2x_i − 1.



Minimize: f(x) = ∑_{i=1}^{N} Q_ii x_i + ∑_{i<j} Q_ij x_i x_j    (7)

where Q is an n × n upper-triangular matrix of real weights.

Step 2: Embedding in Chimera topology

Given the current Chimera-structured hardware, not every Ising model can be directly programmed in the D-Wave system. Therefore, an embedding process is needed to determine which physical qubits should represent each logical variable, the strength of the internal couplers between qubits representing the same logical variable, and how to distribute the external couplers between sets of qubits representing coupled logical variables. Consider the following simple example as an illustration:



Minimize: Obj_Ising = a_1 q_1 + a_2 q_2 + a_3 q_3 + a_4 q_4 + b_12 q_1 q_2 + b_13 q_1 q_3 + b_14 q_1 q_4 + b_23 q_2 q_3 + b_24 q_2 q_4 + b_34 q_3 q_4    (8)

Fig. 6.3  An example illustrating how to embed a fully connected Ising model in Chimera topology

One prominent characteristic of the 4-variable Ising model in (8) is its full connectivity, i.e., each variable interacts with the other three variables. If four physical


qubits are used to represent the four logical variables q = [q_1, q_2, q_3, q_4] as shown in Fig. 6.3a, the terms b_13 q_1 q_3 and b_24 q_2 q_4 become illegal because there are no physical couplers between variables q_1 and q_3 or between q_2 and q_4. By grouping two physical qubits into a chain to represent one logical variable, enough physical couplers are available to model the full connectivity of the Ising model in (8), as shown in Fig. 6.3b. With this embedding strategy, large negative weights should be assigned to the physical couplers within a chain to keep the physical qubits in the chain aligned (misaligned chains impose large positive penalties on the total objective value in (8) that is to be minimized). The weights a_i and b_ij also need to be mapped to the physical qubits and couplers. For example, in Fig. 6.3b, logical variable q_1 is represented by a chain of two physical qubits, so the corresponding weight a_1 is divided in half and a_1/2 is applied to each physical qubit in the chain. Likewise, the weight b_12 should be distributed evenly between the two physical couplers that connect the two chains representing logical variables q_1 and q_2. Next, we give a proof-of-principle demonstration of how to program the p-median model in the D-Wave system.
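Before turning to the p-median model, the weight-splitting step described above can be sketched in a few lines of Python. The chain layout, qubit indices, and numerical values below are purely illustrative (they do not correspond to an actual Chimera unit cell), and the snippet only builds the physical bias and coupler dictionaries rather than calling any D-Wave software.

```python
# Minimal sketch of the weight-splitting idea for the 4-variable Ising
# model in Eq. (8): each logical variable is represented by a chain of two
# physical qubits, the linear weight a_i is split across the chain, each
# logical coupler b_ij is split across the physical couplers joining two
# chains, and a large negative intra-chain weight keeps each chain aligned.
CHAIN_WEIGHT = -10.0                                         # strong tie within a chain (illustrative)

a = {1: 0.5, 2: -1.0, 3: 0.8, 4: 0.3}                        # illustrative logical biases a_i
b = {(1, 2): 0.7, (1, 3): -0.4, (1, 4): 0.2,
     (2, 3): 0.9, (2, 4): -0.6, (3, 4): 0.1}                 # illustrative logical couplings b_ij
chains = {1: (0, 4), 2: (1, 5), 3: (2, 6), 4: (3, 7)}        # logical variable -> two physical qubits

h = {}                                                       # physical qubit biases
J = {}                                                       # physical couplings
for var, (p, q) in chains.items():
    h[p] = h[q] = a[var] / 2.0                               # split a_i over the two chain qubits
    J[(p, q)] = CHAIN_WEIGHT                                 # tie the chain together

for (u, v), b_uv in b.items():
    # Split each logical coupling evenly over two (hypothetical) physical
    # couplers that connect the chains representing u and v.
    J[(chains[u][0], chains[v][0])] = b_uv / 2.0
    J[(chains[u][1], chains[v][1])] = b_uv / 2.0

print(h)
print(J)
```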

3.2  Programming the p-Median Model

In general, given a network of N demand nodes and M candidate facilities, the p-median model can be formulated as the following integer programming (IP) form:

Minimize: Obj_IP = ∑_{i=1}^{N} ∑_{j=1}^{M} d_ij x_ij,    (9)

Subject to: ∑_{j=1}^{M} x_ij = 1, i = 1, …, N,    (10)

∑_{j=1}^{M} y_j = P,    (11)

x_ij ≤ y_j, i = 1, …, N, j = 1, …, M,    (12)

x_ij, y_j ∈ {0, 1}, i = 1, …, N, j = 1, …, M,    (13)

where x_ij denotes whether demand node i is allocated to facility j, and y_j denotes whether facility j is open. The first step of programming the p-median model in the D-Wave system is to convert it to QUBO form, which can be accomplished by adding quadratic penalty terms to the objective function (9) as an alternative to explicitly imposing constraints (10)–(12). For constraints (10)–(12), equivalent quadratic penalty terms are as follows:


W ∑_{i=1}^{N} ( ∑_{j=1}^{M} x_ij − 1 )²    (14)

W ( ∑_{j=1}^{M} y_j − P )²    (15)

W ∑_{i=1}^{N} ∑_{j=1}^{M} ( x_ij − x_ij y_j )    (16)

where W is a positive penalty weight. After some simple algebra we derive the QUBO form of the p-median model:

Obj_QUBO = ∑_{i=1}^{N} ∑_{j=1}^{M} d_ij x_ij + W(1 − 2P) ∑_{j=1}^{M} y_j + 2W ∑_{i=1}^{N} ∑_{j=1}^{M} ∑_{k>j}^{M} x_ij x_ik
           + 2W ∑_{j=1}^{M} ∑_{k>j}^{M} y_j y_k − W ∑_{i=1}^{N} ∑_{j=1}^{M} x_ij y_j    (17)



which can be further converted to the Ising form via q_ij = 2x_ij − 1 and s_j = 2y_j − 1:

Obj_Ising = ∑_{i=1}^{N} ∑_{j=1}^{M} ( d_ij/2 + W(M − 1)/2 − W/4 ) q_ij + ( WM/2 − WN/4 − WP ) ∑_{j=1}^{M} s_j
            + (W/2) ∑_{i=1}^{N} ∑_{j=1}^{M} ∑_{k>j}^{M} q_ij q_ik + (W/2) ∑_{j=1}^{M} ∑_{k>j}^{M} s_j s_k − (W/4) ∑_{i=1}^{N} ∑_{j=1}^{M} q_ij s_j    (18)



Then, the Ising model (18) needs to be embedded in Chimera topology before it can be solved in the D-Wave system. In Fig. 6.4, we present a possible scheme for embedding a small size p-median model with N = M = 4 and P = 2 in a D-Wave system.
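To make the formulation concrete, the following Python sketch builds and minimizes the QUBO objective of Eq. (17) by brute force for a tiny illustrative instance; the distance matrix and penalty weight W are made up for illustration and are not from the chapter. For a sufficiently large W, the minimizer satisfies constraints (10)–(12) and recovers a p-median solution.

```python
# Minimal sketch: evaluate the p-median QUBO of Eq. (17) and minimize it
# by brute force on a tiny instance to check that the penalty terms
# enforce the original constraints.
from itertools import product

N, M, P = 3, 3, 2                       # demand nodes, candidate facilities, facilities to open
d = [[2, 7, 5],
     [6, 3, 8],
     [4, 9, 1]]                         # d[i][j]: distance from demand i to facility j (illustrative)
W = 50.0                                # penalty weight; must dominate the distances

def qubo_objective(x, y):
    """Evaluate Eq. (17) for assignment bits x[i][j] and siting bits y[j]."""
    obj = sum(d[i][j] * x[i][j] for i in range(N) for j in range(M))
    obj += W * (1 - 2 * P) * sum(y)
    obj += 2 * W * sum(x[i][j] * x[i][k]
                       for i in range(N) for j in range(M) for k in range(j + 1, M))
    obj += 2 * W * sum(y[j] * y[k] for j in range(M) for k in range(j + 1, M))
    obj -= W * sum(x[i][j] * y[j] for i in range(N) for j in range(M))
    return obj

best = None
for bits in product((0, 1), repeat=N * M + M):       # 2**12 candidate bit strings
    x = [list(bits[i * M:(i + 1) * M]) for i in range(N)]
    y = list(bits[N * M:])
    value = qubo_objective(x, y)
    if best is None or value < best[0]:
        best = (value, x, y)

print("best QUBO value:", best[0])
print("assignment x:", best[1])
print("open facilities y:", best[2])
```

On an annealer, the same objective would not be enumerated exhaustively; it would instead be handed over, after the transformation in (18) and the embedding step, as the bias and coupler coefficients of the hardware Ising model.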

4  Solving Spatial Optimization Models with Quantum Annealing

In this section, we apply quantum annealing to solve a real-world biomass-to-biofuel supply chain optimization problem. Specifically, the problem is formulated as a quadratic assignment model and solved with quantum annealing as well as classical simulated annealing. Quantum annealing is simulated on classical computers using a path-integral Monte Carlo method.


Fig. 6.4  A scheme for embedding a small size p-median model in a D-Wave system

4.1  Quantum Annealing Using a Path Integral Monte Carlo Method

As an efficient approach to simulating the dynamics of quantum systems, the path-integral Monte Carlo method has been widely used for implementing quantum annealing on classical computers as an efficient algorithm for hard optimization problems (Martoňák et al. 2002; Heim et al. 2015). Given a quantum system modeled by the Ising model in (6), the standard Suzuki-Trotter transformation (Martoňák et al. 2002) yields the following path-integral form:

Obj_PI = (1/P) ∑_{ρ=1}^{P} [ ∑_i h_i q_{i,ρ} + ∑_{(i,j)∈neigh} J_ij q_{i,ρ} q_{j,ρ} ] − J_Γ ∑_{i=1}^{N} ∑_{ρ=1}^{P} q_{i,ρ} q_{i,ρ+1}    (19)

where q_{i,ρ} ∈ {−1, +1}. The classical system (19) can be seen as composed of P replicas of the original quantum system (6), coupled by a user-controlled nearest-neighbor interaction J_Γ. Theoretically, as P goes to infinity, the statistical-mechanical properties of the two systems (6) and (19) become equivalent. One can then simulate the dynamics of (6) by applying a classical Monte Carlo method (e.g., Metropolis sampling) to its path-integral representation (19). The accuracy of the simulation increases for large values of P, while the memory and time requirements increase accordingly.
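As an illustration of how Eq. (19) is evaluated, the sketch below (Python with NumPy; parameter values are illustrative, and periodic coupling between the last and first replica is assumed) computes the path-integral objective for one configuration of P coupled replicas. A full quantum annealing run would embed this evaluation in a Metropolis sampling loop with a gradually changing transverse-field coupling, as in the pseudo code of Fig. 6.5.

```python
# Minimal sketch: evaluate the path-integral objective of Eq. (19) for a
# configuration q of shape (P, N), i.e., P replicas of an N-spin Ising model.
import numpy as np

rng = np.random.default_rng(0)
N, P = 6, 10                                   # spins per replica, number of replicas
h = rng.normal(size=N)                         # illustrative biases h_i
J = {(i, j): rng.normal()                      # illustrative couplings J_ij on all pairs
     for i in range(N) for j in range(i + 1, N)}
J_gamma = 0.5                                  # replica (transverse-field) coupling J_Gamma
q = rng.choice([-1, 1], size=(P, N))           # one random spin configuration per replica

def obj_pi(q):
    classical = 0.0
    for rho in range(P):                       # Ising energy averaged over replicas
        classical += h @ q[rho]
        classical += sum(J_ij * q[rho, i] * q[rho, j] for (i, j), J_ij in J.items())
    quantum = sum(q[rho, i] * q[(rho + 1) % P, i]      # coupling between neighboring replicas
                  for rho in range(P) for i in range(N))
    return classical / P - J_gamma * quantum

print(obj_pi(q))
```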


4.2  Formulation of Biomass-to-Biofuel Supply Chain Optimization

Biofuels such as ethanol and biogas have emerged as a renewable energy option to alleviate the current heavy dependence on fossil fuels, reduce greenhouse gas emissions, and improve energy security. However, due to distributed supply and low energy density of biofuel crops, transportation of bulky feedstock incurs one of the major operational costs in biofuel supply chain systems. Therefore, strategic and effective designs of biofuel supply chains involve optimal location of biorefineries and centralized storage and pre-processing (CSP) sites, with the aim of minimizing the cost for transportation (Lin et al. 2013; Hu et al. 2015). Formally, this problem could be formulated as the following NP-hard quadratic assignment model:

Minimize: Obj_IP = ∑_{i=1}^{M} ∑_{j=1}^{M} ∑_{k=1}^{N} ∑_{l=1}^{N} f_ij d_kl x_ik x_jl,    (20)

Subject to: ∑_{i=1}^{M} x_ik = 1, k = 1, …, N,    (21)

∑_{k=1}^{N} x_ik = 1, i = 1, …, M,    (22)

x_ik ∈ {0, 1}, i = 1, …, M, k = 1, …, N.    (23)

where x_ik denotes whether facility i (i.e., biorefinery or CSP site) is assigned to location k. The objective is to assign M facilities to N locations in such a way as to minimize the transportation cost, which is the sum, over all pairs, of the amount of feedstock transportation f_ij between a pair of facilities (i, j) multiplied by the distance d_kl between their assigned locations (k, l). Constraints (21)–(22) ensure a one-to-one assignment between facilities and locations.
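Because constraints (21)–(22) force a one-to-one assignment, a feasible solution can be stored as a permutation loc, where loc[i] is the location assigned to facility i, and the quadruple sum in (20) collapses to a double sum. The Python sketch below evaluates this cost; the flow and distance matrices are illustrative, not the case-study data.

```python
# Minimal sketch: transportation cost of Eq. (20) for a permutation-encoded
# assignment of facilities to locations.
f = [[0, 3, 1],
     [3, 0, 2],
     [1, 2, 0]]            # f[i][j]: feedstock flow between facilities i and j (illustrative)
d = [[0, 5, 9],
     [5, 0, 4],
     [9, 4, 0]]            # d[k][l]: distance between locations k and l (illustrative)

def qap_cost(loc):
    """Sum of f[i][j] * d[loc[i]][loc[j]] over all facility pairs."""
    m = len(loc)
    return sum(f[i][j] * d[loc[i]][loc[j]] for i in range(m) for j in range(m))

print(qap_cost([0, 1, 2]))   # identity assignment
print(qap_cost([2, 0, 1]))   # an alternative assignment
```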

4.3  Solution Procedures

By converting the quadratic assignment model (20)–(23) to the Ising form as described in Sect. 3 and further adopting the path-integral representation in (19), we could apply the following quantum annealing algorithm (pseudo code in Fig. 6.5) and classical simulated annealing algorithm (pseudo code in Fig. 6.6) to solve the biomass-to-biofuel supply chain optimization problem.
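For comparison with the pseudo code in Fig. 6.6, a compact classical simulated annealing loop over permutation-encoded solutions might look as follows. This is a hedged sketch, not the authors' implementation: the geometric cooling schedule and parameter values are illustrative, and qap_cost refers to the function from the previous sketch.

```python
# Minimal sketch: classical simulated annealing with swap moves and the
# Metropolis acceptance rule, operating on permutation-encoded solutions.
import math
import random

def simulated_annealing(n, cost, t_start=10.0, t_end=0.01, steps=20000, seed=1):
    random.seed(seed)
    loc = list(range(n))
    random.shuffle(loc)
    current = cost(loc)
    best, best_loc = current, loc[:]
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)    # geometric cooling schedule
        i, j = random.sample(range(n), 2)
        loc[i], loc[j] = loc[j], loc[i]                      # propose a swap move
        candidate = cost(loc)
        if candidate <= current or random.random() < math.exp((current - candidate) / t):
            current = candidate                              # accept the move
            if current < best:
                best, best_loc = current, loc[:]
        else:
            loc[i], loc[j] = loc[j], loc[i]                  # reject: undo the swap
    return best, best_loc

print(simulated_annealing(3, qap_cost))                      # qap_cost from the sketch above
```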


Fig. 6.5  Pseudo code for the quantum annealing algorithm

Fig. 6.6  Pseudo code for the classical simulated annealing algorithm

4.4  Computational Results

In order to test the performance of quantum annealing, numerical experiments are conducted on two instances (N = M = 12 and 15 (QAPLIB n.d.)) of the biomass-to-biofuel supply chain optimization problem formulated in Sect. 4.2. Two performance indicators are used to compare the computational performance of quantum


annealing and classical simulated annealing: (1) success rate—probability of finding optima in 45 independent runs of each algorithm; (2) residual error—relative difference between averaged objective value (of 45 runs) and optima. In all experiments, t is measured as Monte Carlo steps. Controllable parameters are tuned during experiments to achieve best performance. For instance-1 with N = M = 12, we can see from Fig. 6.7 that quantum annealing outperforms simulated annealing in both success rate and residual error. Regarding success rate, by the end of annealing processes, quantum annealing achieves a success rate of about 55%, nearly 2 times the success rate of simulated annealing. Regarding residual error, the results in Fig. 6.7 show that quantum annealing anneals more efficiently, reducing the residual error at a much steeper rate than simulated

Fig. 6.7  Numerical results for instance-1: success rate and residual error


Fig. 6.8  Numerical results for instance-2: success rate and residual error

annealing at the early stage of annealing: after only one fifth (i.e. about 650,000 steps) of total Monte Carlo steps, the residual error of quantum annealing stabilizes around 0.5%, which is half of the final residual error of simulated annealing. Similar results can be observed for instance-2 with N = M = 15 (see Fig. 6.8), validating the computational advantage of quantum annealing over classical simulated annealing.
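The two indicators themselves are simple to compute from the final objective values of the independent runs; a minimal Python sketch (with made-up numbers, not the experimental data) is given below.

```python
# Minimal sketch: success rate and residual error over independent runs.
run_values = [100, 100, 102, 101, 100, 103, 100, 100, 100, 104]   # illustrative final objective values
optimum = 100                                                      # known optimal objective value

success_rate = sum(v == optimum for v in run_values) / len(run_values)
residual_error = (sum(run_values) / len(run_values) - optimum) / optimum

print(f"success rate: {success_rate:.0%}")       # share of runs that reached the optimum
print(f"residual error: {residual_error:.2%}")   # relative gap of the averaged objective
```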

5  Conclusions

In this chapter, we explore the frontiers of applying quantum computing to solve NP-hard spatial optimization problems. Specifically, quantum annealing is a quantum computing approach utilizing quantum-mechanical effects such as superposition and


quantum tunneling to facilitate quantum computation processes. Both theoretical and numerical results have demonstrated the significant potential of quantum computing for solving classically intractable problems. Moreover, with recent advances in hardware, empirical testing of quantum annealing becomes more and more feasible. However, at this early stage of the development of quantum computing, much more needs to be researched about how to solve NP-hard spatial optimization problems with quantum annealing and what the best computational performance could be. In this study, we first investigate the programming details of using a D-Wave system, commercially available quantum annealing hardware, to solve a p-median model. Next, for testing the computational performance of quantum annealing, we resort to the path-integral Monte Carlo method, which is a reliable and widely adopted method for simulating quantum annealing on classical computers. Using the biomass-to-biofuel supply chain optimization problem (which is formulated as a quadratic assignment model) as a case study, our numerical results indicate an evident computational advantage of quantum annealing over classical simulated annealing. Although promising results have been achieved in this study, it is still too early to conclude on the ultimate power of quantum annealing. Large-scale quantum computers are difficult to build mainly because of serious decoherence phenomena (i.e., superpositions of states can be easily destroyed by noise). Therefore, a promising near-term future for quantum computing would be to integrate quantum processing units (QPUs) with modern HPC architecture (Britt and Humble 2017). Novel HPC approaches are needed to decompose large-scale real-world spatial optimization models into scalable and solvable forms to take advantage of quantum computing. In this way, small QPUs could become an important supplement to HPC if they achieve high-quality solutions to suitable problems significantly faster than CPUs. In the long run, quantum computing promises to be a game changer for solving computationally demanding geospatial problems to which the classical von Neumann computer architecture is ill suited.

References Benedetti, M., Realpe-Gómez, J., Biswas, R., & Perdomo-Ortiz, A. (2017). Quantum-assisted learning of hardware-embedded probabilistic graphical models. Physical Review X, 7(4), 041052. Bian, Z., Chudak, F., Macready, W. G., & Rose, G. (2010). The Ising model: Teaching an old problem new tricks. Burnaby, Canada: D-Wave Systems. Biswas, R., Jiang, Z., Kechezhi, K., Knysh, S., Mandrà, S., O’Gorman, B., et al. (2017). A NASA perspective on quantum computing: Opportunities and challenges. Parallel Computing, 64, 81–98. Britt, K.  A., & Humble, T.  S. (2017). High-performance computing with quantum processing units. ACM Journal on Emerging Technologies in Computing Systems, 1(1), 1–13. Burkard, R.  E. (1984). Quadratic assignment problems. European Journal of Operational Research, 15(3), 283–289.


Church, R.  L. (1999). Location modelling and GIS. Geographical Information Systems, 1, 293–303. Das, A., Chakrabarti, B. K., & Stinchcombe, R. B. (2005). Quantum annealing in a kinetically constrained system. Physical Review E, 72(2), 026701(4). Denchev, V. S., Boixo, S., Isakov, S. V., Ding, N., Babbush, R., Smelyanskiy, V., et al. (2016). What is the computational value of finite-range tunneling? Physical Review X, 6(3), 031015(19). Drezner, Z., & Hamacher, H.  W. (2002). Facility location: Applications and theory. Berlin: Springer. Farhi, E., Goldstone, J., Gutmann, S., Lapan, J., Lundgren, A., & Preda, D. (2001). A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem. Science, 292(5516), 472–475. Geroliminis, N., Karlaftis, M.  G., & Skabardonis, A. (2009). A spatial queuing model for the emergency vehicle districting and location problem. Transportation Research Part B: Methodological, 43(7), 798–811. Grover, L. K. (1997). Quantum mechanics helps in searching for a needle in a haystack. Physical Review Letters, 79(2), 325–328. Heim, B., Rønnow, T. F., Isakov, S. V., & Troyer, M. (2015). Quantum versus classical annealing of Ising spin glasses. Science, 348(6231), 215–217. Hogg, T. (1998). Highly structured searches with quantum computers. Physical Review Letters, 80(11), 2473–2476. Hu, H., Lin, T., Liu, Y.  Y., Wang, S., & Rodríguez, L.  F. (2015). CyberGIS-BioScope: A cyberinfrastructure-­based spatial decision-making environment for biomass-to-biofuel supply chain optimization. Concurrency Computation Practice and Experience, 27, 4437–4450. Kadowaki, T., & Nishimori, H. (1998). Quantum annealing in the transverse Ising model. Physical Review E, 58(5), 5355. Karp, R. M. (1975). On the computational complexity of combinatorial problems. Networks, 5(1), 45–68. Kirkpatrick, S., Gelatt, C.  D., & Vecchi, M.  P. (1983). Optimization by simulated annealing. Science, 220(4598), 671–680. Kochenberger, G., Hao, J. K., Glover, F., Lewis, M., Lü, Z., Wang, H., et al. (2014). The unconstrained binary quadratic programming problem: A survey. Journal of Combinatorial Optimization, 28, 58–81. Laporte, G., Nickel, S., & da Gama, F. S. (2015). Location science. Berlin: Springer. Li, R.  Y., Di Felice, R., Rohs, R., & Lidar, D.  A. (2018). Quantum annealing versus classical machine learning applied to a simplified computational biology problem. NPJ Quantum Information, 4(1), 14. Lin, T., Rodríguez, L. F., Shastri, Y. N., Hansen, A. C., & Ting, K. (2013). GIS-enabled biomass-­ ethanol supply chain optimization: Model development and Miscanthus application. Biofuels, Bioproducts and Biorefining, 7, 314–333. Liu, Y. Y., Cho, W. K. T., & Wang, S. (2016). PEAR: A massively parallel evolutionary computation approach for political redistricting optimization and analysis. Swarm and Evolutionary Computation, 30, 78–92. Liu, Y. Y., & Wang, S. (2015). A scalable parallel genetic algorithm for the generalized assignment problem. Parallel Computing, 46, 98–119. Lou, Z., & Reinitz, J. (2016). Parallel simulated annealing using an adaptive resampling interval. Parallel Computing, 53, 23–31. Lundstrom, M. (2003). Moore's law forever? Science, 299(5604), 210–211. Martoňák, R., Santoro, G. E., & Tosatti, E. (2002). Quantum annealing by the path-integral Monte Carlo method: The two-dimensional random Ising model. Physical Review B, 66(9), 094203. Mladenović, N., Brimberg, J., Hansen, P., & Moreno-Pérez, J. A. (2007). The p-median problem: A survey of metaheuristic approaches. 
European Journal of Operational Research, 179(3), 927–939.


Mott, A., Job, J., Vlimant, J. R., Lidar, D., & Spiropulu, M. (2017). Solving a Higgs optimization problem with quantum annealing for machine learning. Nature, 550(7676), 375. Neukart, F., Compostella, G., Seidel, C., von Dollen, D., Yarkoni, S., & Parney, B. (2017). Traffic flow optimization using a quantum annealer. arXiv: 1708.01625v2, pp 1–12. Nielsen, M.  A., & Chuang, I.  L. (2001). Quantum computation and quantum information. Cambridge: Cambridge University Press. Papadimitriou, C. H., & Steiglitz, K. (1998). Combinatorial optimization: Algorithms and complexity. Chelmsford, MA: Courier Corporation. QAPLIB. (n.d.). A quadratic assignment problem library. http://anjos.mgi.polymtl.ca/qaplib/ Rieffel, E. G., & Polak, W. H. (2011). Quantum computing: A gentle introduction. MIT Press. Rieffel, E.  G., Venturelli, D., O’Gorman, B., Do, M.  B., Prystay, E.  M., & Smelyanskiy, V.  N. (2015). A case study in programming a quantum annealer for hard operational planning problems. Quantum Information Processing, 14(1), 1–36. Robbins, H. (1955). A remark on stirling's formula. The American Mathematical Monthly, 62(1), 26–29. Shor, P. W. (1997). Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Journal on Computing, 26(5), 1484–1509. Smith-Miles, K., & Lopes, L. (2012). Measuring instance difficulty for combinatorial optimization problems. Computers and Operations Research, 39(5), 875–889. Stollenwerk, T., O’Gorman, B., Venturelli, D., Mandrà, S., Rodionova, O., Ng, H. K., et al. (2017). Quantum annealing applied to de-conflicting optimal trajectories for air traffic management. arXiv: 1711.04889v2, pp 1–13. Tayarani, N., Mohammad, H., & Prügel-Bennett, A. (2014). On the landscape of combinatorial optimization problems. IEEE Transactions on Evolutionary Computation, 18(3), 420–434. Tong, D., & Murray, A. T. (2012). Spatial optimization in geography. Annals of the Association of American Geographers, 102(6), 1290–1309. Williams, J.  C. (2002). A zero-one programming model for contiguous land acquisition. Geographical Analysis, 34(4), 330–349. Zhu, Z. H., Gao, Z. Y., Zheng, J. F., & Du, H. M. (2016). Charging station location problem of plug-in electric vehicles. Journal of Transport Geography, 52, 11–22.

Chapter 7

Code Reusability and Transparency of Agent-Based Modeling: A Review from a Cyberinfrastructure Perspective

Wenwu Tang, Volker Grimm, Leigh Tesfatsion, Eric Shook, David Bennett, Li An, Zhaoya Gong, and Xinyue Ye

Abstract Agent-based models have been increasingly applied to the study of space-time dynamics in real-world systems driven by biophysical and social processes. For the sharing and communication of these models, code reusability and transparency play a pivotal role. In this chapter, we focus on code reusability and W. Tang (*) Center for Applied Geographic Information Science, University of North Carolina at Charlotte, Charlotte, NC, USA Department of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC, USA e-mail: [email protected] V. Grimm Department of Ecological Modelling, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany L. Tesfatsion Department of Economics, Iowa State University, Ames, IA, USA E. Shook Department of Geography, Environment, and Society, University of Minnesota, Minneapolis, MN, USA D. Bennett Department of Geographical and Sustainability Sciences, University of Iowa, Iowa City, IA, USA L. An Department of Geography and PKU-SDSU Center for Complex Human-Environment Systems, San Diego State University, San Diego, CA, USA Z. Gong School of Geography, Earth and Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, UK X. Ye Department of Informatics, New Jersey Institute of Technology, Newark, NJ, USA © Springer Nature Switzerland AG 2020 W. Tang, S. Wang (eds.), High Performance Computing for Geospatial Applications, Geotechnologies and the Environment 23, https://doi.org/10.1007/978-3-030-47998-5_7


transparency of agent-based models from a cyberinfrastructure perspective. We identify challenges of code reusability and transparency in agent-based modeling and suggest how to overcome these challenges. As our findings reveal, while the understanding of and demands for code reuse and transparency are different in ­various domains, they are inherently related, and they contribute to each step of the agent-based modeling process. While the challenges to code development are daunting, continually evolving cyberinfrastructure-enabled computing technologies such as cloud computing, high-performance computing, and parallel computing tend to lower the computing-level learning curve and, more importantly, facilitate code reuse and transparency of agent-based models. Keywords  Agent-based models · Code reusability · Code transparency · Model sharing · Cyberinfrastructure

1  Introduction Agent-based models (ABMs) are computational models that use representations of individualized behaviors and interactions (agents) to simulate space-time dynamics in real-world systems (Epstein and Axtell 1996; Grimm et  al. 2005). The use of agents as building blocks in ABMs allows for the representation of decentralized actions of and interactions among individuals belonging to the same or different organization levels (e.g., decision makers or their institutions) and often allows for better understanding of how small-scale bottom up processes lead to identifiable system-level outcomes. This representational power of ABMs makes them an important tool for the understanding of how real-world systems self organize to produce complex, cross-scale spatiotemporal phenomena. ABMs thus have extensive applications in multiple domains, such as biology, ecology, economics, computer science, engineering, and social sciences (An et  al. 2014; Benenson and Torrens 2004; Parker et  al. 2003; Tang and Bennett 2010; Vincenot 2018). As a result, ABMs are conceptual frameworks, as well as computational platforms, that support the integration of various data and models (e.g., statistics, optimization, or domain-specific models) for tackling spatiotemporal complexity in real-world systems (Parker et al. 2003). The wide-spread application of ABMs has led to increasing interest in code and model sharing. Code reuse, in turn, requires code transparency to ensure that any modifications made to these complicated models perform as expected and the underlying scientific or statistical assumptions are not violated. This is particularly important when sharing, reusing, or modifying ABMs that incorporate sophisticated domain-specific knowledge. While code sharing has the potential to save resources and standardize commonly used modules, a number of factors significantly complicate code reuse. Modelers develop ABMs with different objectives in mind, such as prediction, scenario exploration, thought experiments, and theory development


(Grimm et al. 2005), using a variety of programming languages developed for a wide array of computing environments. Individual models embed theories and assumptions that simulate complex behaviors of agents who interact with other agents as well as with their simulated environment. These models are often implemented using complicated code spreading across a suite of interconnected modules written by scientists not well versed in software design and documentation. Further, to maintain functionality, ABMs must be adapted (e.g., re-programed or re-compiled) to work within continually changing computing environments. Code reusability and transparency, therefore, have been recognized as a “bottleneck problem” whose resolution is essential for ensuring communication and progress within the ABM community (An et al. 2014; NRC 2014; Parker et al. 2003). Some important progress has been made, represented by the Open ABM (https:// www.openabm.org). Yet, the lack of adequate transparency and reusability makes it difficult to verify and validate ABMs, wasting valuable resources (e.g., modules and programming libraries) expended on development and testing by ABM experts. This is an issue facing scientific research in general (Kedron et al. 2019; Nosek et al. 2015) and agent-based modeling, in particular. Code Reusability and Transparency (CRaT) are fundamental in computer science and, in particular, the domain of software engineering. Over the past decades, computer hardware and software technologies have advanced dramatically, as evidenced by cyberinfrastructure technologies (Atkins et al. 2003; NSF 2007) such as high-performance computing, cloud computing (Mell and Grance 2011; Yang and Huang 2013), and scientific workflows. Advanced computing resources are increasingly available and associated technologies provide strong support for code reuse and reusability. CRaT, therefore, cannot be explored without considering the current capabilities and potential advancement of cyberinfrastructure-enabled technologies more generally. In this chapter, we focus on the challenges associated with the CRaT of ABMs from a cyberinfrastructure perspective and offer potential solutions, as well as direction for future research. Our investigation integrates perspectives from domains related to ecological, social, and social-ecological dimensions. These three domains, which have received a wide variety of ABM applications, are broadly defined here to cover biophysical and social aspects of study systems of interest. We then focus our discussion on potential contribution of cyberinfrastructure-enabled technologies, including cloud computing and high-performance and parallel computing, to CRaT of ABMs. Figure 7.1 illustrates a framework that guides our discussion in this chapter.

2  Code Reusability and Transparency for Agent-Based Modeling

CRaT is a core software engineering field that covers the requirement, design, development, testing, and management of software products using a systematic approach (Sommerville 2016). Interest in software or code reuse endures because of


Fig. 7.1  Framework of code reusability and transparency of agent-based models from a cyberinfrastructure perspective

the benefits brought by reusing existing software or code (e.g., saving of time and cost, direct leverage of domain-specific knowledge that has been implemented). The reusable code could be functions, sub-models, or an entire ABM, which are often only slightly modified for addressing new questions, or integrated with other code, for example a biophysical model. The shift of paradigm from procedure-based to object-oriented programming has greatly stimulated the reuse of software products in the form of, for example, software libraries, design patterns, objects/components, or architectures (Frakes and Kang 2005). A suite of metrics and models have been proposed to evaluate software reusability (the likelihood that code can be reused) and reuse (Frakes and Terry 1996), including cost-benefit models, maturity assessment models, amount of reuse metrics, failure modes models, reusability assessment models, and reuse library metrics. These metrics and models can be used to guide and evaluate code reusability in the context of ABMs. Of course, code transparency is the necessary condition for ensuring ABM code reusability. In other words, even if the source code and associated documentation are available, the code of ABMs may not be able to be reused (easily) because of the complex nature of many ABMs. To achieve transparent and reproducible scientific discovery using ABMs, a set of modeling aspects need to be archived, which include model code, model depen-


dencies (such as the modeling system, software packages, and operating system), model description, model input and/or output data, and associated workflow steps. Workflow documents might include such topics as model parameterization, model execution, the capture of outputs, and the analysis and visualization of output, which ideally document all elements of model design, implementation, analysis, and application to warrant transparency and credibility (e.g., TRACE documents; Schmolke et al. 2010; Grimm et al. 2014). Following these principles and best practices will ensure that others can reproduce an ABM with confidence. In the bibliometric analysis conducted by Janssen (2017), ABMs published in the literature were typically defined using one or more of seven forms of documentation—i.e., written narrative, mathematical description, flowcharts, source code, pseudo-code, Overview-Design Concepts-Detail (ODD) protocol (Grimm et  al. 2006, 2010), and Unified Modeling Language (UML). The ODD protocol is increasingly used for describing ABMs and has the potential to become a standard format (Hauke et  al. 2017; Vincenot 2018), which would facilitate writing and reading model descriptions, and communication and collaboration with and between disciplines and, hence, theory development with ABMs (Lorscheid et al. 2019). ODD is meant for writing model descriptions which are both transparent and complete, so that in principle a replication of the ABM even without the code would be possible. Without a complete written model description, which is independent of the code, we would not know what the code is supposed to do and therefore could not assess whether it is correct and hence can be reused. CRaT has, as we will discuss next, great potential to make ABM development and use more coherent and efficient, but without corresponding model descriptions CRaT cannot work. Certainly, mathematics is a fundamental tool for us to formulate and communicate theoretical, empirical, or simulation models. Mathematical expressions are viewed by modelers as the most precise description of model behaviors; most ABMs include such expressions at some level(s), which in turn are also included in ODD model descriptions, for example. Flow charts serve as an intuitive depiction of how model elements and processes are related to each other. Sometimes, flow charts are viewed as adding additional level of clarity rather than being necessary.

2.1  Object-Orientation The implementation of ABMs has greatly benefited from the object-oriented paradigm (OOP), which groups operations and data (or behavior and state) into modular units referred to classes (Rumbaugh et al. 1991). Agents and their environment(s) in an ABM can be encapsulated into classes that can act on their own and interact with each other. Agents of various types can be organized in a hierarchical (i.e., inheritance) or non-hierarchical (e.g., linked via various ad hoc or empirical rules) manner, and common behaviors of agents can be represented or coded as methods associated with the corresponding objects (instantiated from class). The object-­ oriented paradigm provides concepts, design principles, and programming support


that are well suited to the design and development of ABMs (Bennett 1997; Bennett and Tang 2006; Tang 2008; Tang and Yang 2020). As a result, the object-oriented approach has been extensively used by domain-specific researchers for the development of agent-based models. Further, the nature of object-oriented approach greatly facilitates the reuse and sharing of ABMs. For example, modelers can design sub-­ classes of an agent type (e.g., farmer agents from household-type agents as a superclass) that has been well implemented in the existing ABMs. Thus, modelers just need to focus on the implementation of specific properties and behavioral rules (for an overview on this topic, see An 2012) and leverage existing ABMs for generic properties and rules. Still, it seems that using OOP is not necessary for most ABMs, which tend to be of intermediate complexity. For example, the software platform and programming language most commonly used to implement ABMs at this time, NetLogo, is not object-oriented, although it was implemented using OOP.  Establishing class-­ hierarchies that can be reused can take more than a decade and involve huge numbers of software developers and users, such as for operating systems, which is by orders of magnitude larger than the agent-based modeling community. Based on different considerations of reusability, existing ABM platforms fall into two categories: general-purpose and domain-specific. The former, such as NetLogo and Repast, aims to serve potential needs from any domains of application. The latter is specifically designed for a certain domain of application. For example, MATSim (Horni et al. 2016) is a large-scale agent-based simulation platform for transportation planning and engineering, and UrbanSim (Waddell et  al. 2006) is for microscopic simulations of urban land-use dynamics to support urban planning. Benefitting from the OOP paradigm, general-purpose ABM platforms commonly supply capabilities to allow potential reusability through both high-level modular abstraction to describe and represent elements of ABMs (e.g., built-in generic agents in NetLogo) and standardized software design, libraries, and toolkits (e.g., modular functionalities such as model specification and execution, data storage, and visualization in Repast Simphony and NetLogo) with portable programming environments (e.g., Java). These platforms also provide internal model libraries and external portals for model sharing as a supplementary way to promote model reusability. In contrast, domain-specific ABM platforms act more as templates or meta-models that allow direct reuse and customization. This is because built-in agent abstractions within the model template are tailored for specific domains where problem sets are well-defined (Horni et  al. 2016; Waddell et  al. 2006). For example, MATSim has built-in network and vehicle agents, which can be extended to more specific types of agents.
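As a minimal illustration of this inheritance-based reuse (class names, attributes, and rules below are hypothetical and not drawn from any particular ABM platform), a generic household agent can encapsulate common state and behavior, while a farmer agent subclasses it and adds only the domain-specific rules.

```python
# Minimal sketch: reusing a generic agent class through inheritance.
class HouseholdAgent:
    """Generic household agent with common properties and behaviors."""

    def __init__(self, agent_id, location, income):
        self.agent_id = agent_id
        self.location = location
        self.income = income

    def step(self, environment):
        """Generic per-step behavior; subclasses extend or override it."""
        self.income *= 1.01                      # placeholder rule: small income growth

class FarmerAgent(HouseholdAgent):
    """Farmer agent: reuses household behavior and adds land-use decisions."""

    def __init__(self, agent_id, location, income, land_parcels):
        super().__init__(agent_id, location, income)
        self.land_parcels = land_parcels

    def step(self, environment):
        super().step(environment)                # reuse the inherited household rule
        for parcel in self.land_parcels:         # hypothetical domain-specific rule
            parcel["crop"] = "subsistence" if self.income < 500 else "cash_crop"

farmer = FarmerAgent("f1", location=(12, 48), income=400, land_parcels=[{"crop": None}])
farmer.step(environment=None)
print(farmer.income, farmer.land_parcels)
```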

2.2  ODD Protocol

The ODD protocol (Grimm et al. 2006, 2010) provides specific guidelines and support for standardizing the description and design of ABMs. The ODD protocol functions as an easy-to-use and discipline-independent framework to describe the

components of an ABM and the way that the ABM is implemented. While the ODD protocol is evolving, the use of this ABM-specific protocol ensures the transparency of ABMs to a large degree, thus greatly facilitating the reuse of ABMs. Other tools exist to facilitate understanding and replicating a model, for example, UML. Refinements of the ODD protocol for specific purposes can be developed. For example, the ODD+D protocol provides additional categories for describing how human decision making is represented (Müller et al. 2013). An et al. (2014) presented an ODD expansion for application in coupled human-environment systems where complex features and system dynamics are prevalent. The ODD protocol was designed as a standard for written model descriptions, which are essential when we want to show that our programs are doing what they are supposed to do. Written model descriptions include equations and pseudo-code, but by themselves are free of program code or markup languages. ODD was also intended to provide information needed for re-implementing a model. Yet whether this works has not been systematically tested so far. General experience, though, shows that model replication based on written descriptions will always be likely to include some ambiguities that can only be resolved if the corresponding code (or at least pseudo-code) is available. Therefore, and also to limit the length of the ODD description, Becher et al. (2014) mixed pieces of NetLogo (Wilensky and Evanston 1999) code into the ODD of the honeybee model BEEHAVE; these pieces were short enough to be understood without knowing NetLogo. Still, a better solution would be code-free ODDs that include hyperlinks to the corresponding elements of the code. Overall, ODD does not directly address code reusability but complements it, because code without a corresponding written model description makes no sense. A condition for this is that writing ODD model descriptions is taken seriously, which is often not the case. Sloppy use of ODD is counterproductive and should not be tolerated. ODD model descriptions should be based on what the code is set up to do, not on the image of the model in the modeler's mind.
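For reference, the seven elements of the current ODD standard (Grimm et al. 2010) can be summarized as a simple checklist. The sketch below stores that checklist as a Python dictionary purely for illustration, so that a description skeleton can be versioned alongside model code; the placeholder entries are generic and not taken from any particular model.

# The seven ODD elements (Grimm et al. 2010) as a minimal description template.
odd_description = {
    "purpose": "State the question or problem the model addresses.",
    "entities_state_variables_scales": "Agents, spatial units, state variables, "
                                       "temporal and spatial extent and resolution.",
    "process_overview_and_scheduling": "Which processes run, in what order, and how time is modeled.",
    "design_concepts": "Emergence, adaptation, interaction, stochasticity, observation, etc.",
    "initialization": "How the initial state is created (from data or assumptions).",
    "input_data": "External data sets that drive the model, if any.",
    "submodels": "Detailed equations, algorithms, and parameters for each process.",
}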

3  CRaT of Agent-Based Modeling in Application-Specific Science

ABMs have been extensively applied in a variety of domains. In this chapter, we focus on three representative application domains related to biophysical and social dimensions of study systems. Either biophysical or social aspects of study systems cover a range of application domains. Thus, in this section, we focus on the domain of ecological studies for biophysical aspects. For social aspects, we opt to concentrate on economic studies, while ABMs have been applied to other social aspects, including crime, culture, history, politics, public health, and psychology. The third application domain is social-ecological studies as it covers both biophysical and social aspects of study systems.

3.1  Ecological Studies for Biophysical Aspects

Code reuse is rare in ecological modeling. The main reason might be that the number of programming languages used in ecology is much greater than that in social modeling. In social modeling, NetLogo has become the dominant programming platform and language, which is used and understood by many modelers in the corresponding fields. On the other hand, the practice of providing corresponding code with a publication is increasing. Accessibility of these online supplements is limited if the journals are proprietary or unstable, or if institutional web pages are used (Janssen 2017). The use of ODD is much more established in ecology than in other disciplines, presumably because it was developed by ecologists. ODD has the potential to directly support the provisioning and actual use of existing code by referring to "sub-models" of certain behaviors, for example, territorial behavior. This makes it easier to identify procedures, functions, or methods from existing programs, which can be used without having to work with the rest of the program. The developers of ODD also hoped that using ODD would make it easier to review sub-models of certain behaviors in order to identify generic formulations. For example, Zakrzweski et al. (unpublished manuscript) reviewed agent-based models of territorial behavior and identified two main classes, based on habitat maps or movement data. They then formulated generic versions of both classes and implemented them in NetLogo. The generic models could then immediately be used for a model of the cotton rat (A. Larson et al., personal communication), although the original models were developed for other species. In contrast to the social and social-ecological sciences, where Open ABM has become an established platform for models and, in the future, also for reusable building blocks of code, no such platform exists in ecology. Open ABM is open to ecologists, so perhaps we only need a critical mass of ecologists using it. Ecological modeling also includes a wide range of model types and models of different complexity, which makes different models seemingly incompatible. Ecology needs big models, and models that build on each other (Thiele and Grimm 2015). We thus need good examples of original scientific work that is brilliant, inspiring, and moves the field forward despite being built on building blocks of existing models or modules.

3.2  Economic Studies for Social Aspects

Agent-based computational economics (ACE, Tesfatsion 2002) is a variant of agent-based modeling developed specifically for the study of economic systems. ACE is roughly defined to be the computational modeling of economic processes, including whole economies, as open-ended dynamic systems of interacting agents (Tesfatsion 2020). In Tesfatsion (2017), seven specific modeling principles are presented that characterize the ACE modeling approach. These principles highlight the relevance of ACE for the study of economic systems and permit ACE to be carefully distinguished from other modeling approaches.

A key issue facing ACE researchers is how to communicate their models and model findings to others in a careful, clear, and compelling manner. ACE modeling often proceeds from agent taxonomy and flow diagrams, to pseudo-code, and finally to a software program that can be compiled and run. In this case, the resulting software program is the model; it is not simply a computational implementation of a model originally expressed in analytical form. To present such a model, most ACE researchers resort to verbal descriptions, graphical depictions of model components and interactions, UML diagrams, and/or pseudo-code expressing the logical flow of agent interactions over time. Anyone wishing to replicate reported results is pointed to the original source code. On the other hand, it follows from the modeling principles presented in Tesfatsion (2017, Sect. 2) that ACE models are initial-value state space models. Consequently, in principle, any ACE model can equivalently be represented as a system of discrete-time or discrete-event difference equations. These analytical representations become increasingly complex as the number of agents increases. However, they can greatly facilitate the communication of ACE models and model findings to researchers and industry practitioners who are used to seeing problem situations represented in analytical model form.
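As a generic illustration (our notation, not a formulation taken from Tesfatsion's publications), such an initial-value state space model can be written as

x_{t+1} = f(x_t, \varepsilon_t; \theta), \qquad x_0 \text{ given},

where x_t collects the states of all agents and their environment at period t, \varepsilon_t denotes stochastic terms (e.g., a random activation order), \theta the model parameters, and f the composite mapping induced by all agent decision and interaction rules during period t.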

3.3  Social-Ecological Studies for Coupled Aspects

ABMs are pivotal in the study of social-ecological systems (SES), which are also referred to as coupled human and natural systems (CHANS; see Liu et al. 2007a, b). For example, the study of land use and land cover change is often framed within the context of SES or CHANS. Investigations of space-time dynamics in CHANS are often very complicated because, for example, a series of influential factors from both social and natural dimensions, and the feedback between these sub-systems, need to be taken into account. The coupled nature of SES further introduces difficulties in the development of ABMs because of the need to tackle this spatiotemporal complexity. Human decision-makers, for example, often decompose space and time into discrete units (e.g., a farm field, decisions made on an annual cycle), while natural processes tend to be continuous in space and time. Schulze et al. (2017) conducted a detailed review of ABMs of SES from alternative modeling aspects (Augusiak et al. 2014; Grimm et al. 2014), including problem formulation, data and model evaluation, implementation verification, model output corroboration, and upscaling or transferability, to name a few. They stressed that software (or code) availability of ABMs (e.g., through Open ABM or GitHub (https://github.com/)) is important for theory development, model implementation and refinement, and model replication. Readers are directed to Tang and Yang (2020) for an open-source agent-based land change model (Agent-LCM) available on GitHub (also documented using the ODD protocol). In coupled (often complex) human-environment systems, there is the potential to publicize and reuse some modules or pseudo-code, such as those related to creating

the environment, childbearing, migration, marriage, or converting land use of a certain land parcel. Figure 7.2 shows the pseudo-code for creating the physical environment of an SES (An et al. 2020). The same (or at least a very similar) process has occurred in a number of SES studies (e.g., see An et al. 2005; Zvoleff and An 2014). Once these common processes are identified, they can be implemented as modules or primitives that can be easily shared and reused (similar to the design pattern approach in software engineering). Code transparency is a necessary condition that ensures the reusability of ABM code. Rendering the code of ABMs reusable remains a significant challenge, in particular for agent-based modeling of social-ecological systems. Because ABMs of these systems may be based on different spatiotemporal granularities (see Bennett and Tang 2006; Tang and Bennett 2010), the code that implements the driving processes may only be suitable or feasible for a specific range of spatiotemporal scales. Thus, modelers need to make sure that the spatiotemporal scales at which the processes operate are consistent between the ABM to be reused and their new ABM to be developed.

CreateEnvironment() {
    Create objects representing the environment;
    Read in land parcel locational and attribute data;
    Loop through all pixels or parcels:
        Assign geographic data to objects (pixels);
        Assign environmental data to objects (pixels);
    End loop
}

Fig. 7.2  Illustration of module identification and reuse in agent-based models (environment creation of an agent-based land use model is used as an example)
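A minimal runnable counterpart of the pseudo-code in Fig. 7.2, sketched here in Python, shows how such an environment-creation module could be packaged for reuse; the Pixel class and field names are hypothetical illustrations rather than code from An et al. (2020).

from dataclasses import dataclass

@dataclass
class Pixel:
    # one spatial unit of the environment (a pixel or parcel)
    x: int
    y: int
    land_use: str = "unknown"
    elevation: float = 0.0

def create_environment(land_parcels):
    """Build environment objects from parcel records (dicts with locational and attribute data)."""
    environment = []
    for record in land_parcels:                              # loop through all pixels or parcels
        pixel = Pixel(x=record["x"], y=record["y"])          # assign geographic data
        pixel.land_use = record.get("land_use", "unknown")   # assign environmental data
        pixel.elevation = record.get("elevation", 0.0)
        environment.append(pixel)
    return environment

# example usage with two hypothetical parcel records
env = create_environment([
    {"x": 0, "y": 0, "land_use": "cropland", "elevation": 120.5},
    {"x": 0, "y": 1, "land_use": "forest"},
])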

4  Perspectives from Cyberinfrastructure for ABM Code Reusability and Transparency

Cyberinfrastructure is a system that provides powerful and advanced computing capabilities by connecting networked computing resources (hardware and software), data, people, and technologies (Atkins et al. 2003). A cyberinfrastructure is characterized by three key capabilities: high-performance and parallel computing, massive data handling, and virtual organization (NSF 2007). The power and connectivity of cyberinfrastructure have the potential to support the ABM community with better reusability and transparency. A cyberinfrastructure-enabled ABM platform could provide a shared resource from which to build, test, and validate ABMs. A recent example is the R package nlrx (Salecker et al. 2019), which allows for defining and running simulation experiments with models implemented in NetLogo,

and analyzing the results in R. Standard experiments, for example, global sensitivity analyses or model calibration, are predefined, and the simulations can easily be run on HPC clusters using systems such as the NetLogo ABM Workflow System (NAWS), which parallelizes BehaviorSpace experiments (Shook et al. 2015). The platform OpenMole (https://openmole.org/) has a similar scope but is not restricted to NetLogo. Models built using such platforms could easily be reused, because they could share the same computational environment, including data, software packages, and operating systems. This would reduce the burden on model developers to reuse existing models. It would also have an implicit effect on improving transparency, because model developers could access the model code, modify the model, and rerun simulations to evaluate the effect of the modifications. Yet, these types of models do not fully exploit the advanced computing capabilities of cyberinfrastructure because each individual model only uses one processing core. Creating ABMs that leverage multiple processing cores (so-called parallel ABMs) requires more advanced programming and technical skills. Parallel ABMs embed parallelization algorithms within the models themselves, which limits code reusability and transparency. Further, the process of re-implementing an ABM to be parallel is a major source of bugs (Parry and Bithell 2012). An additional challenge is thus the tradeoff between transparency and reusability from a developer/scientist perspective and computational performance from a cyberinfrastructure perspective. While a suite of cyberinfrastructure technologies exists, we focus on cloud computing and high-performance and parallel computing in this chapter.

4.1  Cloud Computing

Cloud computing provides on-demand computing support through virtualization technologies. Computing resources, such as processors, networks, and storage, can be virtualized and configured in response to alternative needs of users (Mell and Grance 2011; Yang and Huang 2013). Computing capabilities such as operating systems, development environments, software, and data can be encapsulated as services that allow users to customize their computing requirements. Agent-based modeling can directly reap the benefits of cloud computing. For example, virtual machines are software-level replications of the computer environments of physical machines based on the emulation of computer architectures including CPU, memory, and storage devices. In other words, virtual machines are simulators of physical machines that allow platform independence for running ABMs regardless of the operating system of a physical machine. Virtual machine technologies have become mature and are well supported by cloud computing vendors such as Amazon EC2, Google Cloud, and Microsoft Azure. Thus, virtual machines that are designed or used for specific ABMs can be provided to developers or users so that they will not need to spend extra effort on the configuration of computing environments (including operating systems, software

dependencies), which is often complicated and time-consuming (e.g., the configuration and use of SWARM; see https://www.swarm.org/). Once ABMs are implemented, they can be used or reused through virtual machines on cloud computing infrastructures (see Kim and Tsou 2013). This may hold great promise for the sharing and reuse of ABMs as cloud computing resources become increasingly available. Furthermore, the specific functionality of ABMs can be implemented and shared as services (SaaS: Software as a Service) (see Tang et al. 2011). Thus, developers or users only need to integrate these services through mash-up mechanisms for the development of their own ABMs. State-of-the-art cyberinfrastructure environments such as the NSF-supported Jetstream (http://www.jetstream-cloud.org/), part of the Extreme Science and Engineering Discovery Environment (XSEDE) (Towns et al. 2014), provide free access to customized virtual machines (VMs) that could be used for ABM research. The VMs can be published with a DOI, providing a stable system repository that developers can use to develop and test their models. Cyberinfrastructure also provides modelers with a new window to the world through web-accessible computing. These infrastructures lower barriers to reuse by making software, data, and models publicly available over the web. For example, NetLogo Web (http://www.netlogoweb.org) provides an online platform to run and explore models as well as a feature to export them to the users' own computers. This low barrier to entry through a web browser, with the potential to export and modify models, provides solid support for reusability and transparency. However, if models are hidden behind a web-based interface with no opportunity for export, the same systems can quickly create black boxes in which models are opaque to the user.
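As a hedged sketch of the mash-up idea, the snippet below calls a hypothetical ABM web service over HTTP; the endpoint URL, request parameters, and response fields are assumptions for illustration only and do not correspond to any service cited above.

import requests

def run_remote_simulation(base_url, scenario, ticks=100):
    # submit a simulation job to a hypothetical ABM service endpoint
    response = requests.post(f"{base_url}/simulations",
                             json={"scenario": scenario, "ticks": ticks},
                             timeout=60)
    response.raise_for_status()
    return response.json()  # e.g., {"job_id": "...", "status": "queued"}

# example call (hypothetical endpoint, not executed here):
# result = run_remote_simulation("https://example.org/abm-api", "baseline")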

4.2  High-Performance and Parallel Computing

High-performance computing resources, based on the networking of individual computing elements, have become widely available with the emergence of cyberinfrastructure, such as computing clusters (tightly or loosely coupled; physical or virtual; CPU- or GPU-based). These high-performance computing resources provide a shared platform enabled by cyberinfrastructure for scientific discovery. It would be possible to create shared scientific workflows that follow best practices for scientific computing (Wilson et al. 2014) for common tasks such as sensitivity analyses or validation testing of ABMs. Such a platform could support multiple users and provide high-performance computing power to ABMs, thus eliminating many computational limitations of desktop computers. High-performance computing could greatly facilitate code reusability of ABMs, particularly for those facing a computational challenge. Without resolving such computational limitations, ABMs may not be reusable even with sufficient code transparency. Parallel computing strategies play a critical role in harnessing high-performance computing power for agent-based modeling that is often time-consuming and computationally demanding (e.g., calibration, validation, and experimentation). Parallel computing strategies allow for the splitting of a large

computational problem into many sub-problems that are computationally affordable or tractable. Parallel computing capabilities are available and supported in generic ABM software platforms. For example, NetLogo (now open source; see https://github.com/NetLogo/NetLogo) allows for the parallelization of repetitive model runs for parameter sweeping (in BehaviorSpace; see https://ccl.northwestern.edu/netlogo/docs/behaviorspace.html). Repast supports the exploitation of computer clusters for accelerating ABMs that require considerable computation (see https://repast.github.io/). Repast for high-performance computing is specifically developed for large-scale agent-based modeling. DMASON (https://github.com/isislab-unisa/dmason) is a parallel alternative to the ABM software MASON. The parallel computing solutions of these generic ABM software platforms are available in an open source manner. A series of domain-specific parallel ABMs that leverage cyberinfrastructure-enabled high-performance computing resources for acceleration have been reported in the literature (Shook et al. 2013; Tang et al. 2011; Tang and Bennett 2011; Tang and Jia 2014; Tang and Wang 2009). However, the code transparency of these domain-specific parallel ABMs is low, which may limit their reusability.
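The splitting strategy described above can be illustrated with a minimal Python sketch that distributes a BehaviorSpace-style parameter sweep across processor cores; run_model is a hypothetical stand-in for a single simulation run, not code from NetLogo, Repast, or DMASON.

from multiprocessing import Pool
from itertools import product

def run_model(params):
    # execute one simulation for a given parameter combination (placeholder rule)
    birth_rate, mobility = params
    return {"birth_rate": birth_rate,
            "mobility": mobility,
            "mean_population": 1000 * birth_rate * mobility}

if __name__ == "__main__":
    sweep = list(product([0.01, 0.02, 0.03], [0.1, 0.5, 1.0]))  # 9 sub-problems
    with Pool(processes=4) as pool:
        results = pool.map(run_model, sweep)  # distribute runs across 4 cores
    print(results[0])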

5  Major Challenges Facing Cyberinfrastructure for ABM Code Reusability and Transparency

5.1  Challenges from Software Engineering and Potential Solutions

Common ABM platforms (e.g., NetLogo, Repast) are often developed with help from software engineering experts, which explains the high reusability of these ABM platforms. However, specific ABMs for particular domain problems are often developed by domain-specific modelers in an ad hoc manner (even when based on existing platforms) and, as a result, often lack the systematic software engineering support that facilitates the maintenance, updates, user support, and extensions needed to keep models relevant and viable. Applying the knowledge and approaches of software engineering to the design and development of ABMs in specific domains or applications is, therefore, urgently needed to ensure and enhance the software quality and code reusability of ABMs. For example, while there have been a number of ABMs with source code and documentation available, the computing environments (operating system, compiling software) required by these models may only be suitable for older computing environments (e.g., legacy software). Thus, the coupling of these ABMs (or their reusable modules), which are only functional for legacy computing platforms, to a new ABM under development presents significant challenges. This is linked to another challenging issue in the development of ABMs: version control. While a suite of generic version control platforms (e.g., Git or Subversion (SVN); see https://subversion.apache.org/)

are available, modelers may need to spend a significant amount of time and effort to handle the incompatibility brought on by software versions. One potential solution that is gaining attention in other fields is the use of container technologies such as Docker (https://www.docker.com) and Singularity (https://github.com/sylabs/singularity), which provide a means to package software, data, and documentation into a single container that can be shared and re-executed. Another challenge related to code reusability and transparency of ABMs is associated with the representation of adaptive behavior (learning and evolution) in real-world systems. Machine learning approaches (e.g., artificial neural networks, reinforcement learning, evolutionary algorithms, ant colony algorithms; see Niu et al. 2016) have been well developed in the domain of artificial intelligence and applied to the modeling of individual- or collective-level adaptive decision making processes in real-world systems (see Bennett and Tang 2006; Tang and Bennett 2010). Software packages or libraries have been implemented to support these machine learning algorithms. However, the appropriate representation of adaptive decision making processes of agents or their aggregates often complicates the programming process (developers need background in both ABMs and machine learning). A loose coupling approach (see Brown et al. 2005; Tang 2008; Tang and Bennett 2010) is often ill-suited to the representation of adaptive behavior. In other words, a tight or full coupling scheme may be needed, but this places further requirements on the background and skillset of ABM developers.

5.2  Challenges Facing Cloud Computing and Potential Solutions

While computing resources provided by cloud computing are configurable and scalable, the use of cloud computing for agent-based modeling is mostly done through the sharing of virtual machines. However, a virtual machine may not be efficient in terms of its use and reuse because, for example, the associated operating systems may not be compatible with the model's code. This efficiency issue poses a challenge for code reuse and sharing of ABMs. A potential solution or alternative is to use Container-as-a-Service (CaaS), which is a lightweight virtualization approach that provides (only) packages of software environments (instead of entire operating systems). Docker is a representative CaaS platform, which has been supported by such commercial cloud vendors as Amazon EC2 (https://aws.amazon.com/ecs/). This CaaS approach allows for quick and easy configuration of development environments for ABMs. In other words, developers or users can focus more on the adaptation and reuse of the ABMs of their interest. For example, MIRACLE is a web-based platform that uses Docker container technologies for the reproducible visual analytics of ABM outputs (see Jin et al. 2017). While the MIRACLE platform has a focus on data analytics, it may provide insights into CRaT of ABMs.

One promising technological platform is Jupyter Notebooks (http://jupyter.org/), which is being used by thousands of data and computational scientists. Notebooks represent executable diaries that can contain documentation, code, data, and visualizations. Creating workflows and documented models as Jupyter Notebooks, which can be shared and made available through online repositories and supplementary materials, could go a long way to improving reusability and transparency. GISandbox, for example, is a Science Gateway for Geographic Information Science (including for agent-based modeling). The GISandbox is based on Jupyter Notebooks and powered by Jetstream, a high-performance cloud computing platform on the advanced cyberinfrastructure XSEDE. Computing infrastructure, like GISandBox, allows model developers to create, execute, and share ABMs using only a web browser, which could be linked to existing repositories such as OpenABM, thus facilitating reusability and transparency of ABMs. Here are two interesting case studies in which Jupyter Notebooks, integrated within cyberinfrastructure environments, were applied to facilitate agent-based modeling for emergency evacuation and influenza transmission (Kang et  al. 2019; Vandewalle et al. 2019).

5.3  Challenges Facing High-Performance and Parallel Computing and Potential Solutions

While cyberinfrastructure-enabled agent-based modeling is an exciting prospect to enhance reusability and transparency, several key challenges still exist. First, computational resources including supercomputers, clusters, high-end visualization resources, and massive data stores are retired and replaced every 4–5 years, mostly due to the maintenance cost of these massive resources. As a result, cyberinfrastructure is constantly shifting, raising challenges for agent-based modelers to keep up with changing platforms. This constant shift, and oftentimes steep learning curve, detracts from the modelers' primary goal, which is to improve understanding of the underlying mechanisms that drive complex phenomena in real-world systems. Furthermore, it can be difficult to replicate computing performance results of ABMs on a supercomputer that no longer exists. However, this drawback is not exclusive to cyberinfrastructure, as operating systems and software packages continue to evolve as well. Cyberinfrastructure can be an enabler, but only for those who have access to cyberinfrastructure. Developing countries and science domains that are not traditional users of cyberinfrastructure may lack access or the skillset to leverage advanced cyberinfrastructure. This separation may create a rift in the ABM community if cyberinfrastructure is fully adopted in certain scientific areas and cannot be adopted in others. Combined with the advanced computational capabilities and lower barriers for accessibility and sharing, the cyberinfrastructure community may leap ahead of the technologically limited communities, thus increasing a digital or

cyber divide between those who have access and skills to use cyberinfrastructure and those who do not. While high-performance and parallel computing resources have been available, parallel algorithms that are implemented to accelerate agent-based models are often developed within alternative parallel computing platforms (e.g., message-passing, shared-memory, or hybrid of the two) (see Tang and Wang 2009; Tang and Bennett 2011; Tang and Jia 2014). Further, the parallel ABMs that have been implemented may be only suitable for special hardware architectures (e.g., Graphics Processing Units; see Tang and Bennett 2011). Therefore, while high-performance and parallel computing resources and technologies increase processing speed and have the potential to resolve the computationally intensive issues facing ABMs, significant difficulties remain for the reuse of the code of the existing parallel ABMs (e.g., the availability of computing resources and parallel computing environments). Further, if we want to reuse the source code of ABMs implemented within sequential computing environments, the development of parallel ABMs may need additional handling of specific computing aspects, for example, random number generation, parallelization of agent-based interactions, analysis of agent or environmental patterns, and evaluation of acceleration performance (see Tang 2013). These computing-­ level aspects must be handled in the code that implements parallel agent-based models.

6  Conclusion

The use of agent-based modeling for the study of real-world social and/or ecological systems has received increasing attention. CRaT of ABMs is key to model sharing and communication among researchers with a common interest in agent-based modeling. Perspectives from social, ecological, and social-ecological studies show that alternative domains have different understandings of, and demands for, CRaT of ABMs. More importantly, CRaT is inherently related to each modeling step of the development of ABMs (from model conceptualization, design, implementation, verification, calibration and validation, to experimentation), and thus will be of great help for model evaluation, integration, and application (O'Sullivan et al. 2016). Code with high transparency will greatly stimulate the sharing and reuse of ABMs, as indicated by Janssen (2017). The continually evolving cyberinfrastructure-enabled computing resources and technologies, represented by cloud computing and high-performance and parallel computing, will keep spurring the sharing and reuse of ABMs as the learning curve at the computing level becomes lower. Thus, modelers can focus on the design and implementation of their own ABMs, part of which may be replicated or reused from existing models. Using ABMs implies not only their design, formulation, and implementation, but usually also extensive simulation experiments for analysis and application. For this, considerable computing power is often required. Ideally, platforms would exist that facilitate the setup and execution of such experiments, and could even provide standard experiments, for example, for sensitivity analysis. Such platforms would

benefit from, and foster, code reusability. The option to quickly run extensive simulations and obtain pre-analyzed results would be a strong incentive to write code that is compatible with the platform and therefore could more easily be reused by others. Further, the combined sharing of source code with other model documentation approaches (e.g., mathematical expressions, UML, and ODD) is suggested to ensure model transparency and thus reusability. This has been increasingly encouraged by journals such as Ecological Modelling, Environmental Modelling & Software, and Journal of Artificial Societies and Social Simulation. Among all the alternatives, we call for developing a relatively simple, easy-to-master pseudo-code language (or UML language), including definitions of terms, simple syntax, and declaration or presentation rules. Such a language may have common components (universally usable for ABMs in many, if not all, domains) and domain-specific components. The former should be jointly discussed and developed by ABM experts from various disciplines, while the latter would be more of a disciplinary effort and product. Moreover, the documentation of ABMs should be compliant with standards or protocols of code reuse and transparency (e.g., open source standards), which will facilitate the sharing and communication of ABMs among researchers with different domain backgrounds. The goal of this chapter was to raise awareness of and sensitivity to the issue of CRaT and to promote the continued discussion of related issues among researchers interested in agent-based modeling.

Acknowledgments  This chapter was partially sponsored by the US National Science Foundation through the Method, Measure & Statistics (MMS) and the Geography and Spatial Sciences (GSS) programs (BCS #1638446). We also thank all the participants of the ABM Code Reusability and Transparency Workshop at the ABM 17 Symposium (http://complexities.org/ABM17/) for their comments and input. Special thanks go to Drs. Michael Barton and Marco Janssen for leading the oral discussion of this symposium session. The authors owe thanks to the reviewers for their insightful comments.

References

An, L. (2012). Modeling human decisions in coupled human and natural systems: Review of agent-based models. Ecological Modelling, 229, 25–36.
An, L., Linderman, M., Qi, J., Shortridge, A., & Liu, J. (2005). Exploring complexity in a human–environment system: An agent-based spatial model for multidisciplinary and multiscale integration. Annals of the Association of American Geographers, 95(1), 54–79.
An, L., Mak, J., Yang, S., Lewison, R., Stow, D. A., Chen, H. L., et al. (2020). Cascading impacts of payments for ecosystem services in complex human-environment systems. Journal of Artificial Societies and Social Simulation (JASSS), 23(1), 5.
An, L., Zvoleff, A., Liu, J., & Axinn, W. (2014). Agent-based modeling in coupled human and natural systems (CHANS): Lessons from a comparative analysis. Annals of the Association of American Geographers, 104(4), 723–745.
Atkins, D. E., Droegemeier, K. K., Feldman, S. I., Garcia-Molina, H., Klein, M. L., Messerschmitt, D. G., Messina, P., Ostriker, J. P., & Wright, M. H. (2003). Revolutionizing science and

e­ngineering through Cyberinfrastructure: Report of the National Science Foundation blueribbon advisory panel on Cyberinfrastructure. Augusiak, J., Van den Brink, P. J., & Grimm, V. (2014). Merging validation and evaluation of ecological models to ‘evaludation’: A review of terminology and a practical approach. Ecological Modelling, 280, 117–128. Becher, M.  A., Grimm, V., Thorbek, P., Horn, J., Kennedy, P.  J., & Osborne, J.  L. (2014). BEEHAVE: A systems model of honeybee colony dynamics and foraging to explore multifactorial causes of colony failure. Journal of Applied Ecology, 51(2), 470–482. Benenson, I., & Torrens, P. M. (2004). Geosimulation: Automata-based modeling of urban phenomena. Hoboken, NJ: Wiley. Bennett, D. A. (1997). A framework for the integration of geographical information systems and modelbase management. International Journal of Geographical Information Science, 11(4), 337–357. Bennett, D. A., & Tang, W. (2006). Modelling adaptive, spatially aware, and mobile agents: Elk migration in Yellowstone. International Journal of Geographical Information Science, 20(9), 1039–1066. Brown, D. G., Riolo, R., Robinson, D. T., North, M., & Rand, W. (2005). Spatial process and data models: Toward integration of agent-based models and GIS. Journal of Geographic Systems, 7(1), 1–23. Epstein, J. M., & Axtell, I. (1996). Growing artificial societies: Social science from the bottom up. Cambridge, MA: The MIT Press. Frakes, W., & Terry, C. (1996). Software reuse: Metrics and models. ACM Computing Surveys (CSUR), 28(2), 415–435. Frakes, W. B., & Kang, K. (2005). Software reuse research: Status and future. IEEE Transactions on Software Engineering, 31(7), 529–536. Grimm, V., Augusiak, J., Focks, A., Frank, B. M., Gabsi, F., Johnston, A. S., et al. (2014). Towards better modelling and decision support: Documenting model development, testing, and analysis using TRACE. Ecological Modelling, 280, 129–139. Grimm, V., Berger, U., Bastiansen, F., Eliassen, S., Ginot, V., Giske, J., et  al. (2006). A standard protocol for describing individual-based and agent-based models. Ecological Modelling, 198(1), 115–126. Grimm, V., Berger, U., DeAngelis, D. L., Polhill, J. G., Giske, J., & Railsback, S. F. (2010). The ODD protocol: A review and first update. Ecological Modelling, 221(23), 2760–2768. Grimm, V., Revilla, E., Berger, U., Jeltsch, F., Mooij, W.  M., Railsback, S.  F., et  al. (2005). Pattern-oriented modeling of agent-based complex systems: Lessons from ecology. Science, 310(5750), 987–991. https://doi.org/10.1126/science.1116681 Hauke, J., Lorscheid, I., & Meyer, M. (2017). Recent development of social simulation as reflected in JASSS between 2008 and 2014: A citation and co-citation analysis. Journal of Artificial Societies and Social Simulation, 20(1). Horni, A., Nagel, K., & Axhausen, K. W. (2016). The multi-agent transport simulation MATSim: Ubiquity press London. Janssen, M. A. (2017). The practice of archiving model code of agent-based models. Journal of Artificial Societies and Social Simulation, 20(1), 1–2. Jin, X., Robinson, K., Lee, A., Polhill, J. G., Pritchard, C., & Parker, D. C. (2017). A prototype cloud-based reproducible data analysis and visualization platform for outputs of agent-based models. Environmental Modelling & Software, 96, 172–180. Kang, J.-Y., Aldstadt, J., Michels, A., Vandewalle, R., & Wang, S. (2019). CyberGIS-Jupyter for spatially explicit agent-based modeling: a case study on influenza transmission. 
In: Paper presented at the proceedings of the 2nd ACM SIGSPATIAL international workshop on GeoSpatial simulation. Kedron, P., Frazier, A. E., Trgovac, A. B., Nelson, T., & Fotheringham, A. S. (2019). Reproducibility and replicability in geographical analysis. Geographical Analysis.

Kim, I.-H., & Tsou, M.-H. (2013). Enabling digital earth simulation models using cloud computing or grid computing–two approaches supporting high-performance GIS simulation frameworks. International Journal of Digital Earth, 6(4), 383–403. Liu, J., Dietz, T., Carpenter, S. R., Alberti, M., Folke, C., Moran, E., et al. (2007a). Complexity of coupled human and natural systems. Science, 317(5844), 1513–1516. Liu, J., Dietz, T., Carpenter, S. R., Folke, C., Alberti, M., Redman, C. L., et al. (2007b). Coupled human and natural systems. Ambio: A Journal of the Human Environment, 36(8), 639–649. Lorscheid, I., Berger, U., Grimm, V., & Meyer, M. (2019). From cases to general principles: A call for theory development through agent-based modeling. Ecological Modelling, 393, 153–156. Mell, P., & Grance, T. (2011). The NIST definition of cloud computing (draft). NIST Special Publication, 800(145), 7. Müller, B., Bohn, F., Dreßler, G., Groeneveld, J., Klassert, C., Martin, R., et al. (2013). Describing human decisions in agent-based models–ODD+ D, an extension of the ODD protocol. Environmental Modelling & Software, 48, 37–48. Niu, J., Tang, W., Xu, F., Zhou, X., & Song, Y. (2016). Global research on artificial intelligence from 1990–2014: Spatially-explicit Bibliometric analysis. ISPRS International Journal of Geo-Information, 5(5), 66. Nosek, B.  A., Alter, G., Banks, G.  C., Borsboom, D., Bowman, S., Breckler, S., et  al. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425. NRC. (2014). Advancing land change modeling: Opportunities and research requirements. Washington, DC: National Academies Press. NSF. (2007). Cyberinfrastructure vision for 21st century discovery. In: Report of NSF council, Retrieved from http://www.nsf.gov/od/oci/ci_v5.pdf. O'Sullivan, D., Evans, T., Manson, S., Metcalf, S., Ligmann-Zielinska, A., & Bone, C. (2016). Strategic directions for agent-based modeling: Avoiding the YAAWN syndrome. Journal of Land Use Science, 11(2), 177–187. https://doi.org/10.1080/1747423X.2015.1030463 Parker, D. C., Manson, S. M., Janssen, M. A., Hoffmann, M. J., & Deadman, P. (2003). Multi-­ agent systems for the simulation of land-use and land-cover change: A review. Annals of the Association of American Geographers, 93(2), 314–337. Parry, H. R., & Bithell, M. (2012). Large scale agent-based modelling: A review and guidelines for model scaling. In A. J. Heppenstall, A. T. Crooks, L. M. See, & M. Batty (Eds.), Agent-based models of geographical systems (pp. 271–308). Dordrecht, Netherlands: Springer. Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., & Lorensen, W. E. (1991). Object-oriented modeling and design (Vol. 199). Englewood Cliffs, NJ: Prentice-Hall. Salecker, J., Sciaini, M., Meyer, K.  M., & Wiegand, K. (2019). The nlrx R package: A next-­ generation framework for reproducible NetLogo model analyses. Methods in Ecology and Evolution, 10, 1854–1863. Schmolke, A., Thorbek, P., DeAngelis, D. L., & Grimm, V. (2010). Ecological models supporting environmental decision making: A strategy for the future. Trends in Ecology & Evolution, 25(8), 479–486. Schulze, J., Müller, B., Groeneveld, J., & Grimm, V. (2017). Agent-based Modelling of social-­ ecological systems: Achievements, challenges, and a way forward. Journal of Artificial Societies and Social Simulation, 20(2), 1–8. Shook, E., Wang, S., & Tang, W. (2013). A communication-aware framework for parallel spatially explicit agent-based models. International Journal of Geographical Information Science, 27(11), 2160–2181. 
Shook, E., Wren, C., Marean, C. W., Potts, A. J., Franklin, J., Engelbrecht, F., O'Neal, D., Janssen, M., Fisher, E., & Hill, K. (2015). Paleoscape model of coastal South Africa during modern human origins: progress in scaling and coupling climate, vegetation, and agent-based models on XSEDE. In: Paper presented at the proceedings of the 2015 XSEDE conference: Scientific advancements enabled by enhanced Cyberinfrastructure. Sommerville, I. (2016). Software engineering (10th ed.). Essex, UK: Pearson Education.

Tang, W. (2008). Simulating complex adaptive geographic systems: A geographically aware intelligent agent approach. Cartography and Geographic Information Science, 35(4), 239–263. Tang, W. (2013). Accelerating agent-based modeling using graphics processing units. In X. Shi, V. Kindratenko, & C. Yang (Eds.), Modern accelerator technologies for geographic information science (pp. 113–129). New York: Springer. Tang, W., Bennett, D., & Wang, S. (2011). A parallel agent-based model of land use opinions. Journal of Land Use Science, 6(2–3), 121–135. Tang, W., & Bennett, D.  A. (2010). Agent-based modeling of animal movement: A review. Geography Compass, 4(7), 682–700. Tang, W., & Bennett, D.  A. (2011). Parallel agent-based modeling of spatial opinion diffusion accelerated using graphics processing units. Ecological Modelling, 222(19), 3605–3615. Tang, W., & Jia, M. (2014). Global sensitivity analysis of large agent-based modeling of spatial opinion exchange: A heterogeneous multi-GPU acceleration approach. Annals of Association of American Geographers, 104(3), 485–509. Tang, W., & Wang, S. (2009). HPABM: A hierarchical parallel simulation framework for spatially-­ explicit agent-based models. Transactions in GIS, 13(3), 315–333. Tang, W., & Yang, J. (2020). Agent-based land change modeling of a large watershed: Space-time locations of critical threshold. Journal of Artificial Societies and Social Simulation, 23(1), 15. Tesfatsion, L. (2002). Agent-based computational economics: Growing economies from the bottom up. Artificial Life, 8(1), 55–82. Tesfatsion, L. (2017). Modeling economic systems as locally-constructive sequential games. Journal of Economic Methodology, 24(4), 384–409. Tesfatsion, L. (2020). Agent-based computational economics: homepage. http://www2.econ. iastate.edu/tesfatsi/ace.htm. Thiele, J. C., & Grimm, V. (2015). Replicating and breaking models: Good for you and good for ecology. Oikos, 124(6), 691–696. Towns, J., Cockerill, T., Dahan, M., Foster, I., Gaither, K., Grimshaw, A., et al. (2014). XSEDE: Accelerating scientific discovery. Computing in Science & Engineering, 16(5), 62–74. Vandewalle, R., Kang, J.-Y., Yin, D., & Wang, S. (2019). Integrating CyberGIS-Jupyter and spatial agent-based modelling to evaluate emergency evacuation time. In: Paper presented at the proceedings of the 2nd ACM SIGSPATIAL international workshop on GeoSpatial simulation. Vincenot, C. E. (2018). How new concepts become universal scientific approaches: Insights from citation network analysis of agent-based complex systems science. Proceedings of the Royal Society of London B, 285(1874), 20172360. Waddell, P., Borning, A., Ševčíková, H., & Socha, D. (2006). Opus (the open platform for urban simulation) and UrbanSim 4. In: Paper presented at the proceedings of the 2006 international conference on digital government research, San Diego, California, USA. Wilensky, U., & Evanston, I. (1999). NetLogo: Center for connected learning and computer-based modeling. Evanston, IL: Northwestern University. Wilson, G., Aruliah, D. A., Brown, C. T., Hong, N. P. C., Davis, M., Guy, R. T., et al. (2014). Best practices for scientific computing. PLoS Biology, 12(1), e1001745. Yang, C., & Huang, Q. (2013). Spatial cloud computing: A practical approach. Boca Raton, FL: CRC Press. Zvoleff, A., & An, L. (2014). The effect of reciprocal connections between demographic decision making and land use on decadal dynamics of population and land-use change. Ecology and Society, 19(2).

Chapter 8

Integration of Web GIS with High-Performance Computing: A Container-Based Cloud Computing Approach

Zachery Slocum and Wenwu Tang

Abstract  In this chapter, we present a Web GIS framework, called GeoWebSwarm, which is driven by container-based cloud computing technologies. Web GIS applications have been widely used for the dissemination of spatial data and knowledge. However, the computationally intensive nature of these applications prevents the use of Web GIS to explore large spatial data when using traditional single-server paradigms, i.e., a big data challenge. Containers as a service (CaaS) are a potential solution for implementing responsive and reliable Web GIS applications while handling big data. CaaS is made possible through cyberinfrastructure-enabled high-performance computing. Our container-based framework is designed using container orchestration to integrate high-performance computing with Web GIS, which results in improvements in the capacity and capability of Web GIS over single-server deployments. Map tile requests are distributed using a load balancing approach to multiple Web GIS servers through cloud computing-based technologies. Through experiments measuring the real-time user request performance of multiple Web GIS containers, we demonstrate significant computing performance benefits in response time and concurrent capacity. Utilizing the GeoWebSwarm framework, Web GIS can be efficiently used to explore and share geospatial big data.

Keywords  Big Data · Container Orchestration · Containers as a Service · Docker

Z. Slocum (*) · W. Tang
Center for Applied Geographic Information Science, Department of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC, USA
e-mail: [email protected]

1  Introduction

Web GIS applications are built to provide support for the dissemination of spatial information on specific subjects via Internet technologies. Web GIS applications are easy to use as well as informative through spatial data processing, analysis, and visualization within web-based computing environments (Peng and Tsou 2003; Fu and Sun 2010). A variety of Web GIS applications have been developed for different domains, including modelling vector-borne disease risk (Huang et al. 2012), provisioning of information on political representation (Chen and Perrella 2016), and providing citizens with tax or land records (Askounis et al. 2000). Web GIS is well suited to presenting spatial data and maps in visual forms to researchers, decision-makers, and the general public. At the most basic level of a Web GIS application is data, an essential component used to create maps and knowledge. Large amounts of data, popularly referred to as big data, are now being utilized for geographic studies, as predicted by Krygier (1999). However, big data is a challenge for many areas of research, including geography and Web GIS specifically. The big data challenges can be categorized into three aspects, born out of the general frameworks behind Web GIS: storage, processing, and visualization. Underlying all of these aspects is the problem of big data transmission and concurrent access. Each aspect of a Web GIS framework is affected by the characteristics of big data (Yang et al. 2017), including volume (file size), velocity (collection or update over time), variety (multiple data types), veracity (quality), and value (usefulness). Web GIS must be able to cope with the volume of big data, as modern web browsers are not designed to support large data sizes. Velocity of data is also a concern, as web browsers are not always connected to a Web GIS portal to handle changes to data. Support for multiple data types, qualities, and degrees of usefulness of data is also required because Web GIS is the tool that needs to present information to users in an effective manner. One must design a Web GIS system to cut through the noise of big data to provide useful knowledge and actionable insights. Web GIS that handles big data, however, requires large amounts of processing power, data storage capacity, and networking speed. As a result, Web GIS has seen an infusion of cyberinfrastructure technologies (NSF 2007), represented by high-performance computing (Wilkinson and Allen 2005), to solve the challenges of working with big spatial data. High-performance computing organizes a set of computing elements (e.g., servers) into a computing cluster for the acceleration of a large computational task (Wang et al. 2019). The use of HPC for Web GIS often requires the development of a Web GIS cluster, which requires configuring multiple Web GIS servers (nodes) one by one (with a fixed number of servers). The difficulties in developing Web GIS clusters lie not only in setting up the basic hardware and operating systems, but also in configuring applications to communicate with each other, with the goal of creating an integrated system to serve web maps. The addition of load

balancing to a web-based system is not a trivial exercise. Software packages may be incompatible and unable to be deployed on the same computing node, or may have incompatible external dependencies on other packages or libraries. Additionally, initial software installation and applying updates can be non-trivial in an operating system supporting multiple software packages. This requires significant effort and thus poses a challenge for the development of Web GIS clusters (e.g., setup and reconfiguration). One of the potential solutions is to use containers as a service (CaaS), a state-of-the-art technology from cloud computing (Pahl et al. 2017). Containers are the building blocks of CaaS (Tosatto et al. 2015) and represent a shift in the way applications are run in the cloud. Cloud computing technologies have solved some of the issues raised by Krygier (1999). Specifically, cloud computing offers a welcome boost to the capabilities of Web GIS and a reduction of its processing times, at the additional cost of increased setup time and effort. CaaS provides another layer of abstraction away from bare-metal servers. Development of a cyberinfrastructure for a Web GIS application can be driven by CaaS to create the computing environment in a reproducible and automatic fashion. Time and effort are saved through the development of a CaaS-based framework, as it is a one-time process with continuous benefits. Thus, in this study, we propose GeoWebSwarm, a Web GIS cluster framework, as a potential solution to address the big data challenge facing Web GIS by leveraging cutting-edge Container as a Service (CaaS) technologies. Our framework allows for the orchestration of an entire Web GIS application, from data storage to processing, and finally the provisioning of geospatial web services within high-performance computing environments. This Web GIS cluster framework could be implemented on CaaS platforms for ease of use during installation, maintenance, and geospatial analytics tasks. We provide an easy-to-use solution that allows geospatial web services to leverage networked computing resources with minimal effort. For example, setup, maintenance, and redeployment are straightforward processes accomplished with a handful of commands. Our framework is designed to provide high-performance computing capabilities without requiring extensive effort or expert knowledge. Our framework can bring the benefits of high-performance computing to those who may not have access to high-performance computing resources. The design of this framework also allows for high-throughput computing by taking advantage of parallel services to increase available throughput. Our framework uses container technologies to reduce the use of data storage compared to virtual machines. Distribution of updates to containers is incremental, as opposed to virtual machine disk images, which require distribution of the complete disk. While virtual machines are a widely used tool to manage deployment, their size introduces unnecessary delays during deployment.

2  Literature Review

2.1  Web GIS

Web GIS is based on the use of web technologies to provide mapping and other GIS functionality (e.g., spatial data management, processing, and analysis) over a network. Over the years, Web GIS technologies have been evolving, resulting in several Web GIS products. The first Web GIS example is the Xerox PARC Map Viewer, which appeared in 1993 (Hardie 1998; Neumann 2008; Veenendaal et al. 2017). The PARC Map Viewer was a static mapping application with multiple zoom levels, map projections, and layers. Different vendors have their own Web GIS solutions (commercial or open source). For example, ESRI entered the Web GIS business with the MapObjects Internet Map Server in 1998, and now ArcGIS Server and ArcGIS Online are its representative Web GIS products. GeoServer (http://geoserver.org/) is a popular server-side Web GIS software package that was first released in 2001 and implements many of the OGC standards. Another Web GIS example is OpenStreetMap, which was founded in 2004 and is based on the mechanism of crowdsourcing (also known as volunteered geographic information (VGI); see Goodchild (2007)). The Open Geospatial Consortium (OGC), which was founded in 1994, is in charge of directing the development of multiple Web GIS standards and technologies. The goal of the consortium is to provide open standards for Findable, Accessible, Interoperable, and Reusable (FAIR) geospatial information and services (Open Geospatial Consortium 2020). The consortium members consist of a variety of organizations such as government agencies, universities, and private sector businesses. The launch of OGC has substantially stimulated the development of Web GIS. A series of OGC standards have been developed to support geospatial web services. These standards include, according to Batty et al. (2010), the Web Map Service (WMS) for single-image raster maps, the Web Feature Service (WFS) for vector data, the Web Coverage Service (WCS) for raster layers, and the Web Processing Service (WPS) for server-side data processing and analysis. Each of these services may have specific extensions. For example, the Web Map Tile Service (WMTS), as an extension of WMS, was first developed by Google in 2005 (as a new method for the management of map tiles at that time). WMTS was later codified into an OGC standard, with tiles defined by their location and zoom level instead of a direct rendering of a specific user's view of a map. Beyond web standards, OGC is responsible for a number of other protocols such as GeoRSS, GML, and GeoPackages (Open Geospatial Consortium 2020). An important feature of Web GIS is the mashup. While originally used as a term in music production, the term mashup is commonly used in Web GIS to refer to the combination of data layers to form a single web map. Batty et al. (2010) describe mashups as a powerful tool to provide abstraction from raw datasets. Despite the addition of large datasets, web applications must remain fault tolerant and responsive to user interactions. Users on the internet are not interested in waiting on maps


Users on the internet are not interested in waiting on maps to load or data to appear. Yang et al. (2011) discussed this issue in terms of the performance of Web GIS and stressed the necessity of using performance indicators to quantitatively measure user interactions with Web GIS infrastructure.

2.2  Container Technologies

Containers are computing processes running on a host system with access controls in place to prevent interaction between the container processes and host processes. Containers are also configured to share part of the resources of the host system instead of replicating an entire operating system (as in virtual machines). A container only knows about the disk resources it has been provided—hence the sandbox paradigm, which is similar to a virtual machine but lightweight. Compared to virtual machines, containers perform much closer to native resources running directly on a computing node and can outperform a virtual machine solution (Felter et al. 2015). A single container may be designed to include all required software dependencies for a specific version of a software package. Computing tasks within a container are completed without interfering with other applications. Containers allow a set of software packages to be deployed on one or more physical machines, and container instances can be easily replicated and destroyed as needed. These containers can be chained together through an orchestration mechanism—so-called container orchestration (Khan 2017). A suite of container software is available now, including Docker (https://www.docker.com/), Singularity (https://singularity.lbl.gov/), and Kubernetes (https://kubernetes.io/).

Pahl and Lee (2015) reviewed the technology required to move computing from large-scale data centers to distributed nodes at the edge of networks. They argued that virtual machines (VMs), the traditional virtualization technology of cloud computing, consume too many resources to be viable in, for example, an edge cloud architecture, and they investigated the lighter weight alternative—containers. The authors noted that containers are more flexible, portable, and interoperable than VMs. They also stressed the value of componentizing workflows. A key property of containers is their replicability, which provides multiple endpoints for a workflow task. Pahl and Lee (2015) defined this as an application service—a "logical group of containers from the same image"—and highlighted that application services allow for scaling an application across computing nodes. Often, this multi-node scaling is conducted in search of high availability and/or increased throughput. In this fashion, containers may be placed closer to or at the source of data and provide increased access, distributed across all data sources. For example, Docker provides the necessary virtualization of workflows to make the lightweight distribution of workflows possible. Pahl and Lee (2015) use the example of a web interface component provided by the Tomcat web server to illustrate the need for containers: a container holds all libraries and binaries required to run the Tomcat web server wherever the container is placed.
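As a minimal illustration of this isolation model, the sketch below uses the Docker SDK for Python (the docker package) to launch a containerized web server with its own port mapping and a mounted data volume. The image name, container name, port, and host path are placeholder assumptions for illustration only; this is a sketch of the general pattern rather than the exact configuration used in any particular deployment.

```python
import docker

# Connect to the local Docker daemon (assumes Docker is installed and running).
client = docker.from_env()

# Launch a container from a public image; the names and paths below are
# hypothetical placeholders.
container = client.containers.run(
    "tomcat:9",                               # image pulled from Docker Hub if absent
    detach=True,                              # run in the background
    name="webgis-worker-1",
    ports={"8080/tcp": 8080},                 # host port 8080 -> container port 8080
    volumes={"/data/geodata": {"bind": "/opt/geodata", "mode": "ro"}},
)

print(container.status)   # e.g., "created" or "running"

# Containers are disposable: stop and remove them when no longer needed.
container.stop()
container.remove()
```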


Geographic Information Science (GIScience) and related domains have seen uses of containers in the recent literature, and containers are quickly becoming another tool for GIScientists and researchers to efficiently conduct and share analyses. Containers have been used to support the monitoring of hazardous waste transportation (Cherradi et al. 2017). Container technologies have also been used to support Data as a Service (DaaS); for example, they provide increased capabilities to local machines by leveraging cloud resources, such as increasing access to real-time spatiotemporal datasets (Wang et al. 2019). While containers have been used to bridge the gap between the cloud and local machines, workflows deployed purely in the cloud have also been introduced for spatial data handling. For example, JupyTEP IDE was developed for high-performance processing of Earth observation data in a multi-user, multi-data source fashion (Rapiński et al. 2019). All these examples demonstrate the utility of container technologies in GIScience.

3  Methods

We present the GeoWebSwarm framework, which supports the rapid development of a high-throughput Web GIS cluster via container technologies—i.e., the integration of high-performance computing and Web GIS for the needs of high availability or high-throughput computing. First, we focus on the overall framework and its modules. Then, we discuss specific implementation details of the conceptual framework.

3.1  Framework Design of Web GIS Cluster

The HPC-enabled Web GIS cluster is a framework designed for easy and rapid development of a cluster of Web GIS servers based on container as a service (CaaS) technologies (see Fig. 8.1). While creating GeoWebSwarm, we aimed to satisfy six of our own goals: being approachable for geographers, rapid deployment, straightforward setup, increased performance over traditional solutions, efficient scaling, and hardware agnosticism. We gain a high-throughput cluster when efficient scaling is combined with a container-aware load balancer. These design goals relate to increasing the use and applicability of the framework. The design of this framework also maintains support for hosting multiple datasets and layers—an important feature of Web GIS that was not compromised in our design. Our framework supports map mashups, or the use of multiple data layers on a map to further an analysis. Crucially, containers are an essential element in the design of the framework as they provide a variety of benefits over traditional bare-metal servers or virtual machines. The design of the HPC-enabled Web GIS cluster framework follows the generic Internet GIS framework, which is based on the coupling of client- and server-side Web GIS and distributed geospatial web services.


Fig. 8.1  Conceptual design of the Web GIS Cluster framework, GeoWebSwarm

Specifically, on the server side, this framework includes four main modules: parallel web services, load balancing, data server, and container orchestration.

3.1.1  Parallel Web Services

The framework relies on the coordination of multiple Web GIS servers to provide high-throughput geospatial web services. This module oversees the management of Web GIS servers. Each of the servers communicates with the data server for configuration and data storage. There are two types of Web GIS servers: worker and manager. Configuration of multiple Web GIS server workers is conducted by a Web GIS server manager—i.e., forming a cluster of Web GIS servers. Containers are the underlying mechanism used to host Web GIS servers in this module. Each container hosts a web server such as Apache Tomcat. The web server runs a Web GIS server for the provisioning of geospatial web services, represented by Web Map Service, Web Feature Service, Web Coverage Service, and Web Processing Service. Any change in the configuration of these Web GIS servers can be propagated through the container orchestration platform, restarting containers according to a schedule for seamless transitions between configurations.


By restarting the entire container, this framework avoids complicated setup of communication between manager and worker Web GIS servers. Workers are only aware of themselves, allowing service discovery, load balancing, and configuration propagation to take place at the orchestration platform level.

3.1.2  Load Balancing

The load balancer module is another key component of this framework. Without the load balancer, this Web GIS cluster would be unable to handle requests efficiently. The load balancer serves as a proxy (under a single web address) to communicate with the set of Web GIS servers. The primary function of the load balancer is to assign user requests to Web GIS servers that are available so that the workload assigned to each Web GIS server is balanced as much as possible. There are multiple load balancing algorithms (see DeJonghe 2019), such as round-robin to distribute requests among servers equally, least-connected to distribute requests to the servers that have the least number of active connections (see Traefik 2019), and hybrid approaches that combine multiple algorithms (a simple sketch of the first two strategies is given below). Load balancing also provides features such as high availability and fault tolerance as needed for Web GIS applications designed as microservices enabled by containers (see Khan 2017). Load balancing has been applied in the literature, allowing client requests to be distributed among the servers of a service (Wu et al. 2013). Additionally, Cheng and Guan (2016) contributed the use of caching servers in a Web GIS framework to further speed up services under large loads.

Load balancing techniques are often used in microservices to better distribute workloads across nodes. High availability relies on detecting failures in a microservice and adjusting the routing of work to working microservices. This process is enabled through another responsibility of the orchestration platform known as service discovery. When a microservice fails, the platform must remove the malfunctioning container and route to other containers of the same image. In traditional VM-based workflows, services are statically defined by IP addresses and ports (in application configuration files). Containers are dynamically assigned to nodes and software-defined IP addresses, requiring service discovery techniques to keep track of where container endpoints are located. A load balancer supporting service discovery is a server-based method to handle service requests. When a container is created or removed, the load balancer is updated through the orchestration platform with a new list of available containers and their network locations. This method of managing the connections between services is beneficial for fault tolerance and maintenance activities as well as for the rebalancing of services across a cloud computing cluster. Containers may be changed in a rolling-update fashion to maintain service availability while performing platform or application maintenance tasks. The framework utilizes a cluster-aware load balancer such as Traefik or NGINX to adapt to changing container IP addresses and network topologies.
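To make the two scheduling strategies mentioned above concrete, the following self-contained Python sketch assigns incoming requests to a pool of worker names using either round-robin or least-connected selection. It is an illustration of the scheduling logic only, not code taken from Traefik or NGINX.

```python
import itertools
from collections import defaultdict

class RoundRobinBalancer:
    """Cycle through workers so each receives requests in turn."""
    def __init__(self, workers):
        self._cycle = itertools.cycle(workers)

    def pick(self):
        return next(self._cycle)

class LeastConnectedBalancer:
    """Send each request to the worker with the fewest active connections."""
    def __init__(self, workers):
        self._active = defaultdict(int, {w: 0 for w in workers})

    def pick(self):
        worker = min(self._active, key=self._active.get)
        self._active[worker] += 1      # request assigned
        return worker

    def release(self, worker):
        self._active[worker] -= 1      # request completed

workers = ["geoserver-1", "geoserver-2", "geoserver-3"]
rr = RoundRobinBalancer(workers)
lc = LeastConnectedBalancer(workers)
print([rr.pick() for _ in range(5)])   # round-robin order
print([lc.pick() for _ in range(5)])   # least-connected order
```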


3.1.3  Data Server Module

The data server module stores service configuration and geographic datasets, resulting in the need for a data server that can support the size and speed required to provide data to all servers. Examples of data servers that fit into this module of the framework are Ceph, GlusterFS, and NFS. Service configuration for the framework is file based, requiring the use of a network file system. Management of these configurations is conducted using text editors to access the files.

3.1.4  Container Orchestration Module

The functionality of the first three modules (parallel web services, load balancing, and data server) in this framework is encapsulated in containers. Containers allow a workflow to be split into independent tasks, called services, which communicate over the network. However, there is a need to manage running services, as their number can be modified on the fly as user demands change. Container orchestration, as an important module in our framework, provides such support for the management of these container-based cloud services. Container orchestration is "coordinating, at software and hardware layer, the deployment of a set of virtualized services in order to fulfill operational and quality objectives of end users and Cloud providers" (Tosatto et al. 2015, p. 70). With support from container orchestration, software and hardware from different containers in a computing system work together to provide cloud-based composite services (e.g., Web GIS services in this study). Container orchestration provides the ability to rapidly deploy several containers and scale them on the fly. In order to remain cost effective, containerized services often need to be scaled to satisfy end user demands that are dynamic. Managing and scheduling the containers for cloud services is an important function of container orchestration platforms. A platform must provide abilities such as flexible scheduling, support for maintenance activities, service discovery, and throttling of tasks. Existing container-based software platforms provide capabilities for container orchestration. A collection of such orchestration platforms is available (see Khan 2017), such as Kubernetes (https://kubernetes.io/), Amazon's Elastic Container Service (https://aws.amazon.com/ecs/), Docker Swarm (https://docs.docker.com/engine/swarm/), Microsoft Azure (https://azure.microsoft.com/), and Rancher OS (https://rancher.com/rancher-os/). The primary use of container orchestration in this framework is to provide high availability and consistent installation of the framework.
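As a hedged sketch of what this orchestration looks like programmatically, the snippet below uses the Docker SDK for Python against a Docker Swarm manager to create a replicated service and then scale it on the fly. The image name, service name, and replica counts are hypothetical assumptions, and a production deployment would more typically declare the same settings in a stack file rather than in Python.

```python
import docker
from docker.types import EndpointSpec, ServiceMode

# Assumes this host is (or can reach) a Docker Swarm manager node.
client = docker.from_env()

# Create a replicated service: the orchestrator keeps four container
# replicas of a (hypothetical) Web GIS server image running.
service = client.services.create(
    "tomcat:9",
    name="webgis-workers",
    mode=ServiceMode("replicated", replicas=4),
    endpoint_spec=EndpointSpec(ports={8080: 8080}),  # published:target port
)

# Scale the service up as user demand grows; the swarm schedules the
# additional replicas across available nodes.
service.scale(8)

# Inspect running services.
for s in client.services.list():
    print(s.name)
```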


3.2  Implementation of the Framework

The implementation of the framework consists of six major software packages: Docker Swarm, Traefik, Apache HTTP server, Apache Tomcat, GeoServer, and NFS. The components and their relationships are described in this section and in Fig. 8.1. Specifications of the commodity hardware used to implement GeoWebSwarm are described in Table 8.1. We used a Beowulf-style cluster (Wilkinson and Allen 2005) due to the low cost and readily available hardware. Our cluster is Beowulf-style because it uses heterogeneous hardware: the commodity hardware in our cluster was not originally designed for cluster computing but was pooled together to approach a consistent computing environment.

The container-enabled Web GIS framework is designed for easy setup and rapid reconfiguration. To support this goal, we used Docker to implement the containers of the Web GIS framework. Docker is a container virtualization solution based on the idea of operating system images defined by scripts called Dockerfiles. A Docker container starts as a bare-bones (less than 100 MB) Linux image with software and configuration added as layers. Container images may share these layers to conserve disk space and decrease build times. Each line of a Dockerfile script creates a new layer. The Docker project also provides mechanisms to distribute a container after it has been built. A publicly available registry, Docker Hub, allows developers to share their container images. A container may be pulled down from Docker Hub and then executed on any node running Docker using a single command. Container layers are also cached locally on nodes to decrease initialization times to within seconds. In this way, containers are extremely portable as they require only a Docker installation on the host node. This portability suits cloud computing infrastructure well, as nodes in a cloud maintain a consistent environment. Further, in this study, we used Docker Swarm (https://docs.docker.com/engine/swarm/) as the container orchestration software.

Traefik (https://docs.traefik.io/) is a reverse proxy and load balancer which integrates with the Docker daemon to provide a range of benefits to this framework. The most important benefit is service discovery: this functionality allows Traefik to reconfigure itself automatically when the structure of the framework is changed. For example, if an additional GeoServer worker is started, Traefik detects this change and begins forwarding requests to the worker. The next benefit is acting as a reverse proxy, which takes requests from outside sources and forwards them to the correct service.

Table 8.1  Physical configuration of the high-performance computing cluster

Node (# nodes)       RAM     CPU                Disk         Network
Manager node (1)     32 GB   4 cores, 3.6 GHz   500 GB HDD   1 Gbit/s
Worker nodes (11)    16 GB   4 cores, 3.6 GHz   500 GB HDD   1 Gbit/s


The Apache container is another component of the framework. An Apache web server (https://httpd.apache.org/) provides the Web GIS client software (see Fig. 8.2) written in HTML, CSS, JavaScript, and PHP. A single Apache container is used because the size of the client is small in comparison to the number of requests and request file sizes served by GeoServer. A web browser requests data from the Apache container only once, during the initial page load. Our implementation uses a single Network File System (NFS) host to provide persistent storage to the containers. Traefik is configured on the fly during container startup and does not require any data persistence. The Apache web server sources all Web GIS client files from a network volume defined for its container. GeoServer, with its extension GeoWebCache, uses the network volumes for all configuration data and geospatial data stores. GeoServer provides geospatial web services, for example, Web Map Service (WMS) and Web Map Tile Service (WMTS), that can be mashed up (assembled) in the Web GIS client (OpenLayers was used in this study; https://openlayers.org/). In our implementation, GeoServer instances in containers are independent and do not communicate with each other directly. Additionally, a management container is designed to provide access to the configuration interface of GeoServer.

Fig. 8.2  Client interface of the GeoWebSwarm framework


The framework does not limit the number of GeoServer workers that are allowed and needs no reconfiguration when the number of workers changes (i.e., our Web GIS system is auto-scalable).

4  Experiment and Results

In this section, we conduct experiments to evaluate the computing performance of our Web GIS cluster framework, GeoWebSwarm. We focus on evaluating the capacities and capabilities of the framework.

4.1  Hardware Configuration and Data

The experiment was conducted on a computing cluster consisting of one manager node and a set of identically configured worker nodes. Physical specifications of these nodes are detailed in Table 8.1. The manager node is responsible for running the load balancer. Worker nodes were dedicated to Web GIS functionality—i.e., each of them is a Web GIS server.

We use a case study to determine the utility of this framework in enabling access to big data through HPC resources. Our case study is designed to put the framework under a large amount of load for a single Web GIS server so as to accurately measure the performance gains of multiple parallel servers. Road data in the Southeastern United States from OpenStreetMap (OSM) are used to provide a sufficiently large dataset for experimentation. Figure 8.3 shows the extent of this dataset. The Web GIS cluster is necessary to process this large dataset.

Fig. 8.3  Map of study area, Southeast United States

Table 8.2  Details on datasets used in this study

Dataset        File size (MB)      # Spatial features
Motorway       309                 170,672
Trunk          199                 107,561
Primary        335                 167,458
Secondary      477                 241,266
Tertiary       463                 227,481
Residential    8,360               4,225,428
Others         6,460               3,647,482
Unclassified   304                 150,041
Total          16,907 (16.9 GB)    8,937,389

The dataset does not fit inside the memory of a cluster computer used for the experiment, allowing for a test of all components of the cluster computer. OpenStreetMap data presents a big data challenge because it must be processed from files and sent as tiles back to the Web GIS client. An overview of the file size and feature count of the case study dataset is included in Table 8.2.

4.2  Simulation of Client-Side User Activities

We used Apache JMeter as a simulation engine to emulate user activities on the client side. JMeter was designed to simulate the functional behavior of Internet users in order to evaluate the performance of web applications (https://jmeter.apache.org/). Apache JMeter is flexible and often used to test the performance of Web GIS servers (e.g., Guiliani et al. 2013). Figure 8.4 details the relationship between the framework and the testing machine.

JMeter can simulate multiple users interacting with a web application. In this experiment, 200 users were simulated to submit requests. Each user was a JMeter thread, requesting single tiles in a loop over the course of the treatment. Each request is a standard HTTP GET request and conforms to the Web Map Tile Service (WMTS) specification. A WMTS request includes parameters for the URL of the Web GIS server, the requested layer, image format, projection, and bounding box of the requested tile. The request is recorded in JMeter (as part of a Test Plan). To facilitate the performance test, we simulated 100 WMTS requests in a request pool. Simulated users randomly draw a request from the pool and then submit it to the Web GIS cluster. Users then receive a response, which is a single tile for a specified location and zoom level. We refer to this request and response pair as a transaction (see Fig. 8.4). Additionally, to further eliminate the impact of a web cache, we disabled tile caching in GeoServer to force redrawing of each tile from the source dataset. We also disabled any caching in JMeter, as it would otherwise skew our experiment results.
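The logic of this simulated workload can be sketched in a few lines of Python. The snippet below builds a small pool of WMTS GetTile requests, has a simulated user draw requests at random, and records the round-trip time of each transaction. The endpoint URL, layer name, and tile matrix values are hypothetical placeholders, and the key/value parameters follow the generic WMTS convention rather than the exact Test Plan used in the experiment (which was defined in JMeter).

```python
import random
import time
import requests

# Hypothetical GeoWebSwarm endpoint and layer; replace with real values.
ENDPOINT = "http://webgis.example.org/geoserver/gwc/service/wmts"
LAYER = "osm:roads_southeast"

def make_request_pool(n=100):
    """Build n WMTS GetTile parameter sets at random tile positions."""
    pool = []
    for _ in range(n):
        zoom = random.randint(8, 12)
        pool.append({
            "SERVICE": "WMTS", "REQUEST": "GetTile", "VERSION": "1.0.0",
            "LAYER": LAYER, "STYLE": "", "FORMAT": "image/png",
            "TILEMATRIXSET": "EPSG:900913",
            "TILEMATRIX": f"EPSG:900913:{zoom}",
            "TILEROW": random.randint(0, 2 ** zoom - 1),
            "TILECOL": random.randint(0, 2 ** zoom - 1),
        })
    return pool

def run_user(pool, duration_s=300):
    """One simulated user: draw random requests for duration_s seconds."""
    times_ms = []
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        params = random.choice(pool)
        start = time.perf_counter()
        requests.get(ENDPOINT, params=params, timeout=5)
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms

# Example (single user; the experiment ran 200 such users concurrently):
# transaction_times = run_user(make_request_pool())
```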


Fig. 8.4  Interaction between load testing computer and the Web GIS cluster

Table 8.3  Experimental results with performance metrics (average transaction time in milliseconds; throughput in transactions per second; std.: standard deviation)

Treatment   Web GIS servers   #Transactions   Average trans time (std)   Throughput   Speedup   Efficiency
T1          1                 138,155         136.05 (129.99)            460.52       1.000     1.000
T2          2                 232,097         98.70 (96.63)              773.66       1.680     0.595
T3          4                 348,832         69.01 (81.75)              1162.77      2.525     0.396
T4          6                 420,302         56.82 (69.87)              1401.01      3.042     0.329
T5          8                 438,168         49.97 (61.55)              1460.56      3.172     0.315
T6          10                514,810         45.17 (54.55)              1716.03      3.726     0.268

4.3  Experiment Configuration

This experiment includes six treatments to evaluate the performance of the Web GIS cluster framework. These treatments use different numbers of worker nodes in the Web GIS cluster. Treatment T1 uses one Web GIS server worker (as a control to compare with the other treatments). The numbers of Web GIS server workers for the other treatments (T2–T6) are 2, 4, 6, 8, and 10 (see Table 8.3). In this study, JMeter was deployed on a single computer (outside the cluster network). All treatments have the same parameters in JMeter. In each treatment, simulated users (from JMeter) interact with the Web GIS cluster for 5 min. The experiment was run for 5 min to avoid a "burst" of throughput in the cluster from datasets already loaded into memory; 5 min provided time to simulate enough transactions that the burst was averaged out of the data, giving realistic sustained performance metrics.


During the testing of this framework, only transactions completed within 500 ms were analyzed. Transactions which took longer than 500 ms were outliers and represented 1.79% of the total transactions.

4.4  Performance Metrics

To quantify the performance of utilizing multiple Web GIS servers, two measures were collected for each treatment: the number of transactions completed in the test duration per treatment, and the transaction time—i.e., the amount of time each transaction took to complete. Transaction time in this study is defined as the total round-trip time from the start of a web map tile request to the end of the response from the cluster. This time is measured in milliseconds (ms). A lower value of transaction time indicates better performance.

We define three metrics in this study to measure the computing performance of our system: throughput, speedup, and efficiency. Average transaction time (Eq. 8.1) serves as the basis for all three metrics. Throughput evaluates the number of transactions completed per time unit (seconds here; see Eq. 8.2) and is used for the evaluation of the capacity of our Web GIS cluster; higher values of throughput mean higher computing capacity to process user requests. Speedup and efficiency are metrics for evaluating the acceleration capabilities of our Web GIS cluster. Speedup is derived by dividing the execution time using a multiprocessor by that using a single processor (Wilkinson and Allen 2005) and is a well-used metric to evaluate the acceleration of parallel computing algorithms. We adapt the speedup equation to use servers instead of processors in this study. As the experiment used a consistent time duration for each treatment, the speedup, s(i) (see Eq. 8.3), was adapted to use average transaction time instead of execution time. Efficiency, e(i), is defined as the speedup using i servers divided by the number of servers (see Eq. 8.4).

$$ t_{\mathrm{trans}}(i) = \frac{a_{\mathrm{trans}}(i)}{n_{\mathrm{trans}}(i)} \tag{8.1} $$

$$ th_i = \frac{n_{\mathrm{trans}}(i)}{t_{\mathrm{duration}}} \tag{8.2} $$

$$ s(i) = \frac{th_i}{th_1} \tag{8.3} $$

$$ e(i) = \frac{s(i)}{i} \tag{8.4} $$


where $t_{\mathrm{trans}}(i)$ is the average transaction time when the number of Web GIS servers is $i$; $a_{\mathrm{trans}}(i)$ is the sum of transaction times for a specific treatment; $n_{\mathrm{trans}}(i)$ is the number of transactions for a specific treatment with $i$ Web GIS servers working together; $t_{\mathrm{duration}}$ is the time duration of the test conducted for a specific treatment; and $th_i$ is the throughput of a treatment using $i$ Web GIS servers. The speedup formula used in this study uses the average transaction time to evaluate the acceleration of our Web GIS cluster (see the following derivation for its relationship with the original speedup formula):

$$ s(i) = \frac{th_i}{th_1} = \frac{n_{\mathrm{trans}}(i)/t_{\mathrm{duration}}}{n_{\mathrm{trans}}(1)/t_{\mathrm{duration}}} = \frac{n_{\mathrm{trans}}(i)}{n_{\mathrm{trans}}(1)} = \frac{t_{\mathrm{trans}}(1)}{t_{\mathrm{trans}}(i)} \tag{8.5} $$
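As a small numerical illustration of Eqs. 8.2–8.4, the Python sketch below recomputes throughput, speedup, and efficiency from the per-treatment transaction counts reported in Table 8.3 and the 300-s (5-min) test duration; it is illustrative only, and the input values are taken directly from the table.

```python
# Transaction counts per number of Web GIS servers (from Table 8.3).
transactions = {1: 138_155, 2: 232_097, 4: 348_832, 6: 420_302, 8: 438_168, 10: 514_810}
duration_s = 300  # each treatment ran for 5 min

throughput = {i: n / duration_s for i, n in transactions.items()}       # Eq. 8.2
speedup = {i: throughput[i] / throughput[1] for i in transactions}      # Eq. 8.3
efficiency = {i: speedup[i] / i for i in transactions}                  # Eq. 8.4

for i in sorted(transactions):
    print(f"{i:2d} servers: throughput={throughput[i]:7.2f} tr/s, "
          f"speedup={speedup[i]:.3f}, efficiency={efficiency[i]:.3f}")
# e.g., 1 server -> ~460.5 transactions/s; 10 servers -> ~1716.0 transactions/s,
# speedup ~3.73
```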



4.5  Results

For each treatment of the experiment, performance measures were collected by JMeter, which provides data on every request it makes to a service. We monitored the entire process for our experiment in terms of total CPU, memory, and network usage. The monitoring results were collected from the Docker Swarm cluster by external containers running on each cluster node. Figure 8.5 shows the monitoring results, which indicate that there was no computational bottleneck for our Web GIS cluster when running the six treatments in our experiment. We did not observe a CPU utilization overload on the Web GIS workers, nor problems in memory or network utilization. In each monitoring metric, there was additional capacity remaining unused.

Results of performance measures were processed by treatment and are shown in Table 8.3. Our experimental results illustrate that the framework based on a Web GIS cluster provides substantial improvement over a single Web GIS server in general. This is supported by the throughput and speedup results (in Table 8.3). Efficiency has a negative relationship with the number of servers, indicating additional overhead with each additional server. This may be due to some overhead introduced by the load balancer. While monitoring the experiment, the testing servers did not reach maximum capacity while serving requests.



Fig. 8.5  Monitored results of the Web GIS cluster performance (a: Total cluster CPU usage; b: Total cluster memory usage; c: Total cluster network usage)

Although the efficiency of the framework tends to be lower with each increment in the number of Web GIS servers, performance gains are evident. The total number of transactions as well as the average transaction time improves as the number of servers increases (see Fig. 8.6). Concurrent capacity is shown to increase in Fig. 8.7: the number of transactions completed in approximately 8 ms increases with the number of servers. While constructing Fig. 8.7, we chose a cutoff at 150 ms to focus on the lower transaction times; times above 150 ms continued the pattern of decline until 500 ms, where we set a hard cutoff on transactions. Transactions exceeding 500 ms in duration were excluded from this analysis as outliers. In this figure it is apparent that the peak of each line is around 14 ms, no matter the number of servers used. We believe the similarity in this peak, as well as the increasing slope of the transaction times as servers increased, is a good indicator that a performance bottleneck was not present while sending requests to the servers.


Fig. 8.6  Number of transactions served in 5 min compared to average transaction time

Fig. 8.7  Distribution of transaction time by treatment

In the case of an ingress bottleneck, we would expect the peaks to shift toward higher transaction times because there would be requests in the queue pending completion. It also appears that between 8 and 14 ms is the absolute minimum time needed by each individual GeoServer worker to respond to a request.


A thorough investigation of the response times per treatment is necessary to fully understand the speedup factor. Here we specifically compare treatments T1 (1 server used) and T6 (10 servers used). Figure 8.8 shows histograms of response times; the histograms were constructed with a common bin size and frequency axis for comparison purposes. In these histograms, we can observe performance improvements when using multiple Web GIS servers. Table 8.4 shows the data used to generate the histograms. Using ten servers, the framework was capable of handling 56.06% of requests within 25 ms, compared to only 19.50% for the single server (used in treatment T1). These results demonstrate that our framework's capacity increases with the number of servers.

5  Concluding Discussion

Web GIS presents an opportunity to share spatial data and information with others across a range of domains. As the introduction of big data to GIScience has progressed, Web GIS has remained a key tool for the processing, analytics, and visualization of spatial data. However, the configuration and maintenance of Web GIS application backends has always been a challenge. Cyberinfrastructure-enabled computing platforms and technologies have been introduced to provide Web GIS the computational support it requires, including high-performance computing and cloud computing represented by container technologies.

Fig. 8.8  Histograms of transaction times for treatment 1 and 6


Table 8.4  Frequency distribution of transaction times

Bin (in ms)   T1 (1 worker)   T2 (2 workers)   T3 (4 workers)   T4 (6 workers)   T5 (8 workers)   T6 (10 workers)
25            26,937          47,212           126,120          202,779          245,043          288,578
50            27,799          53,791           99,373           96,150           71,616           95,057
75            13,900          30,469           34,824           30,498           37,281           51,996
100           9,123           21,665           17,580           20,066           24,683           26,116
125           6,682           17,583           11,796           15,900           15,622           13,744
150           5,260           12,086           10,831           12,006           9,000            6,829
175           4,664           8,699            10,395           11,051           8,918            7,178
200           4,360           7,473            8,932            8,379            7,371            7,429
225           4,523           6,633            6,750            5,629            5,078            5,588
250           4,750           5,442            4,745            4,342            3,934            4,345
275           4,847           4,248            3,718            3,469            2,973            2,989
300           4,600           3,496            3,055            2,724            2,222            2,103
325           4,150           3,042            2,591            2,099            1,598            1,273
350           3,568           2,451            2,018            1,647            1,122            759
375           3,115           2,022            1,657            1,172            700              369
400           2,632           1,553            1,309            892              436              177
425           2,223           1,386            1,017            623              215              96
450           1,896           1,142            822              397              145              61
475           1,639           944              662              241              101              57
500           1,487           760              637              238              110              66
More          0               0                0                0                0                0
Total         138,155         232,097          348,832          420,302          438,168          514,810

Virtual machines could have been used in our framework, in exchange for slower deployment, startup, and configuration reloads. We found that while virtual machines may boot rather quickly (within 30 s), this boot time is added to the time it takes the applications to start (around 20 s); the combined time is acceptable but at least doubles the time for each boot. Our framework uses approximately 2 GB for the Apache, GeoServer, and Traefik containers. In comparison, a virtual machine image would require this 2 GB of applications as well as an entire operating system, which we found could be up to 5 GB in size.

GeoWebSwarm, the Web GIS cluster framework presented here, provides solid support for resolving the big data challenge facing Web GIS. The container-based cloud computing approach built into this framework allows for the rapid and automatic integration of high-performance computing and Web GIS. The Web GIS cluster framework allows geographers to focus on the application and provides an automated infrastructure that can manage considerable computation needs by leveraging high-performance computing.

Our framework has the capability to handle web map tile requests within a reasonable time to provide smooth real-time user interactivity, as demonstrated by the throughput results in the experiment. The capability to handle these requests is supported using a container-aware load balancing approach.


Alongside capability, the framework also provides the capacity to respond to many user requests through concurrent Web GIS servers. Our experimental results indicate that the use of multiple identical containers is responsible for the increased capacity of this framework, with additional benefits. One such benefit is real-time adaptation to partial cluster outages. Containers may be created or destroyed on the fly as compute nodes are put under maintenance. A container-aware load balancer deals with this changing network situation and routes requests to containers that are active. Thus, maintenance may be performed on the host cluster as well as on the framework in a rolling-update fashion, reducing downtime and providing a seamless experience to users. Through the integration of a container orchestration platform and our framework, Web GIS professionals are presented with a reliable software infrastructure for the management and visualization of spatial data.

While the framework is fully operational, there exist some limitations in the implementation that need further investigation. The Web GIS cluster in this study uses NFS, a non-distributed file system, which may prevent our framework from working if the single data server fails. In the future, we will implement the data server using a distributed file system approach such as Ceph or GlusterFS. Another limitation of the framework is that spatial data are maintained in file formats (ESRI Shapefiles), which affects the computing performance of the Web GIS cluster in terms of data access. In the future, we will use a spatial database approach (e.g., PostGIS run as a container) to help resolve this limitation.

As highlighted by Dangermond and Goodchild (2019), reproducibility remains important in GIScience. GeoWebSwarm is an example of using containers to conduct reproducible GIS-based data analytics, because containers perform repetitive and consistent actions despite changing HPC environments. While the framework was designed for Web GIS applications, it can be generalized to provide access to other kinds of services such as machine learning APIs or geocoding. A generalized version of this framework would also provide benefits for non-web applications. At the most basic level, our framework provides network access to services without the need for direct access to the Internet. We believe this framework would also provide benefits for non-real-time workloads. Within the GIScience domain, there are many computing problems which could be automated, such as spatial data processing, analytics, and visualization. The computation of these problems could be implemented in containers as well, which could benefit from our framework's support for computationally intensive applications. This deserves further study in our future work.

6  Code Availability

Our implementation of GeoWebSwarm has been made available at https://github.com/zacheryslocum/GeoWebSwarm.

Acknowledgement  We would like to thank Dr. Elizabeth Delmelle and Dr. Eric Delmelle for their guidance as members of the capstone committee of the first author, on which this work is based.


We are also indebted to the anonymous reviewers for their insightful comments and suggestions. The authors also recognize the Center for Applied Geographic Information Science at UNC Charlotte for providing the computing resources to make this work possible.

References

Askounis, D. T., Psychoyiou, M. V., & Mourtzinou, N. K. (2000). Using GIS and web-based technology to distribute land records: The case of Kallithea, Greece. Journal of Urban Technology, 7(1), 31–44. https://doi.org/10.1080/713684105
Batty, M., Hudson-Smith, A., Milton, R., & Crooks, A. (2010). Map mashups, web 2.0 and the GIS revolution. Annals of GIS, 16(1), 1–13. https://doi.org/10.1080/19475681003700831
Chen, Y., & Perrella, A. (2016). Interactive map to illustrate seat distributions of political party support levels: A web GIS application. Cartographica, 51(3), 147. https://doi.org/10.3138/cart.51.3.3288
Cheng, B., & Guan, X. (2016). Design and evaluation of a high-concurrency web map tile service framework on a high performance cluster. International Journal of Grid and Distributed Computing, 9(12), 127–142. https://doi.org/10.14257/ijgdc.2016.9.12.12
Cherradi, G., El Bouziri, A., Boulmakoul, A., & Zeitouni, K. (2017). Real-time HazMat environmental information system: A micro-service based architecture (Vol. 109, pp. 982–987). Amsterdam: Elsevier B.V.
Dangermond, J., & Goodchild, M. F. (2019). Building geospatial infrastructure. Geo-Spatial Information Science, 1–9. https://doi.org/10.1080/10095020.2019.1698274
DeJonghe, D. (2019). NGINX cookbook [second release]: Advanced recipes for high performance load balancing (1st ed.).
Felter, W., Ferreira, A., Rajamony, R., & Rubio, J. (2015). An updated performance comparison of virtual machines and Linux containers. Paper presented at the 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 29–31 March 2015.
Fu, P., & Sun, J. (2010). Web GIS: Principles and applications. Redlands, CA: Esri Books.
Goodchild, M. F. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal, 69(4), 211–221. https://doi.org/10.1007/s10708-007-9111-y
Guiliani, G., Dubois, A., & Lacroix, P. (2013). Testing OGC web feature and coverage service performance: Towards efficient delivery of geospatial data. Journal of Spatial Information Science, 7, 1–23. https://doi.org/10.5311/JOSIS.2013.7.112
Hardie, A. (1998). The development and present state of web-GIS. Cartography, 27(2), 11–26. https://doi.org/10.1080/00690805.1998.9714273
Huang, Z., Das, A., Qiu, Y., & Tatem, A. (2012). Web-based GIS: The vector-borne disease airline importation risk (VBD-AIR) tool. International Journal of Health Geographics, 11(1), 33. https://doi.org/10.1186/1476-072X-11-33
Khan, A. (2017). Key characteristics of a container orchestration platform to enable a modern application. IEEE Cloud Computing, 4(5), 42–48. https://doi.org/10.1109/MCC.2017.4250933
Krygier, J. (1999). World wide web mapping and GIS: An application for public participation. Cartographic Perspectives, 33, 66–67. https://doi.org/10.14714/CP33.1023
Neumann, A. (2008). Web mapping and web cartography. In S. Shekhar & H. Xiong (Eds.), Encyclopedia of GIS (pp. 1261–1269). Boston, MA: Springer US.
NSF. (2007). Cyberinfrastructure vision for 21st century discovery.
Open Geospatial Consortium. (2020). Welcome to The Open Geospatial Consortium. Retrieved from https://www.opengeospatial.org/
Pahl, C., Brogi, A., Soldani, J., & Jamshidi, P. (2017). Cloud container technologies: A state-of-the-art review. IEEE Transactions on Cloud Computing. https://doi.org/10.1109/TCC.2017.2702586
Pahl, C., & Lee, B. (2015). Containers and clusters for edge cloud architectures – A technology review. Paper presented at the 2015 3rd International Conference on Future Internet of Things and Cloud, Rome, Italy.
Peng, Z.-R., & Tsou, M.-H. (2003). Internet GIS: Distributed geographic information services for the internet and wireless networks. New York: Wiley.
Rapiński, J., Bednarczyk, M., & Zinkiewicz, D. (2019). JupyTEP IDE as an online tool for earth observation data processing. Remote Sensing, 11(17). https://doi.org/10.3390/rs11171973
Tosatto, A., Ruiu, P., & Attanasio, A. (2015). Container-based orchestration in cloud: State of the art and challenges. Paper presented at the Ninth International Conference on Complex, Intelligent, and Software Intensive Systems, Blumenau, Brazil.
Traefik. (2019). Basics. Retrieved from https://docs.traefik.io/v1.7/basics/#load-balancing
Veenendaal, B., Brovelli, M., & Li, S. (2017). Review of web mapping: Eras, trends and directions. ISPRS International Journal of Geo-Information, 6(10), 317. https://doi.org/10.3390/ijgi6100317
Wang, S., Zhong, Y., & Wang, E. (2019). An integrated GIS platform architecture for spatiotemporal big data. Future Generation Computer Systems, 94, 160–172. https://doi.org/10.1016/j.future.2018.10.034
Wilkinson, B., & Allen, M. (2005). Parallel programming (2nd ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
Wu, H., Guan, X., Liu, T., You, L., & Li, Z. (2013). A high-concurrency web map tile service built with open-source software. In Modern accelerator technologies for geographic information science (pp. 183–195). Boston, MA: Springer.
Yang, C., Huang, Q., Li, Z., Liu, K., & Hu, F. (2017). Big data and cloud computing: Innovation opportunities and challenges. International Journal of Digital Earth, 10(1), 13–53. https://doi.org/10.1080/17538947.2016.1239771
Yang, C., Wu, H., Huang, Q., Li, Z., Li, J., Li, W., et al. (2011). WebGIS performance issues and solutions. In D. V. Li (Ed.), Advances in web-based GIS, mapping services and applications (pp. 121–138). London: Taylor & Francis Group.

Chapter 9

Cartographic Mapping Driven by High-Performance Computing: A Review

Wenwu Tang

Abstract  Cartography and geovisualization allow for the abstraction and transformation of spatial information into visual presentations that facilitate our understanding of geographic phenomena of interest. However, cartographic mapping, which is central in cartography and geovisualization, often faces a computational challenge when spatial data become massive or the algorithms that process these data are complicated. In this chapter, I conduct a review to investigate the use of high-performance computing in the domain of cartography and geovisualization. The review focuses on major cartographic mapping steps, including map projection, cartographic generalization, mapping methods, and map rendering. Further, specific challenges facing cartography and geovisualization are discussed by focusing on big data handling and spatiotemporal mapping.

Keywords  Cartography · Geovisualization · Cartographic Mapping · High-Performance Computing

1  Introduction

Cartography and geovisualization provide a way of abstracting and transforming spatial information from our environment into the form of maps (Slocum et al. 2008; Dent et al. 1999), which can facilitate the communication of these maps among users. While cartography is central to static mapping and geovisualization focuses on interactive visualization of spatial information, maps are the products of both.

W. Tang (*) Center for Applied Geographic Information Science, Department of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC, USA e-mail: [email protected] © Springer Nature Switzerland AG 2020 W. Tang, S. Wang (eds.), High Performance Computing for Geospatial Applications, Geotechnologies and the Environment 23, https://doi.org/10.1007/978-3-030-47998-5_9


Maps can help us understand the spatial characteristics of a geographic system of interest and the spatial relationships among entities within the system. Different forms of maps, including general reference and thematic maps, have been developed in the domain of cartography and geovisualization to support the visualization and interpretation of spatial information (Slocum et al. 2008). The design and implementation of these maps require a series of cartographic steps such as map projection, map symbolization, map generalization, mapping methods, and map production (Slocum et al. 2008). These cartographic steps constitute the process of mapping and are central to the paradigm of analytical cartography (Tobler 1976).

The visualization of spatial information via cartographic mapping has been stimulated by the advancement of computer techniques. Visualization is an important technology from computer science that enables geocomputation (Gahegan 1999). As a result, mapping has become a necessary capability in GIS and spatial analysis software packages. However, over the past decade, massive spatial data in different dimensions (2D, 3D, and even higher) have been collected from sensor technologies, and the spatial resolutions of these data have become higher. More data are generated as we apply geospatial processing and modeling approaches to these data. Further, algorithms related to cartographic mapping steps may be complex (e.g., map projection, generalization, or data classification; see Armstrong et al. 2003). Driven by the factors above, cartographic mapping often poses a computational challenge, and the resolution of this challenge requires support with higher computing capacities and capabilities. Thus, high-performance computing (HPC) has been recognized as a potential option to address the computational challenge facing cartographic mapping in the study of cartography and geovisualization (Armstrong 2000).

The application of HPC in cartography and geovisualization can be traced back to the 1990s, when researchers were increasingly applying HPC in geographic studies (Openshaw and Turton 2000; Armstrong 2000). Since then, HPC has experienced a significant evolution in technologies, as demonstrated and driven by cyberinfrastructure (Atkins et al. 2003). Yet, how HPC has been applied in the literature to empower cartographic mapping in the domain of cartography and geovisualization has been inadequately investigated. This investigation, however, is urgent as the learning curve of HPC for geospatial applications becomes lower and new HPC resources (e.g., cloud computing, quantum computing) and visualization technologies (e.g., augmented reality, virtual reality) are developing dramatically. The goal of this chapter is thus to conduct a review of the applications of HPC in cartography and geovisualization.

The structure of the rest of this chapter is organized in the following manner. First, I will introduce HPC within the context of cartography and geovisualization. Then, I will discuss in detail the applications of HPC in cartography and geovisualization from four aspects: map projection, cartographic generalization, mapping methods, and map rendering. These four aspects dominate the process of cartographic mapping (see Dent et al. 1999; Slocum et al. 2008).


After that, I will identify specific challenges facing cartographic mapping using HPC and give suggestions on potential directions. This chapter ends with conclusions on the application of HPC in cartography and geovisualization.

2  High-Performance Computing

HPC provides computational support for problem-solving through the coordinated use of networked computing resources instead of stand-alone computers (Wilkinson and Allen 2004; Foster 1995). The evolution of HPC has gone through different stages, including distributed computing, cluster computing, grid computing, and cloud computing. HPC resources can be provisioned from clusters of networked desktop computers to supercomputers. To harness the advanced computing power of HPC resources, parallel computing algorithms are often used. Parallel computing algorithms partition a computational problem of interest—one that is otherwise computationally infeasible—into a set of smaller subproblems (Wilkinson and Allen 2004; Dongarra et al. 2003). These subproblems are then assigned to, and solved by, the networked computing elements of the HPC resources.

Originally, most HPC resources were based on Central Processing Units (CPUs). Since 2006, Graphics Processing Units (GPUs) have been promoted as a many-core massively parallel computing technique for general-purpose computation (Owens et al. 2007). GPUs were originally engineered for the acceleration of graphics operations, which are inherently related to cartography and geovisualization. In other words, GPU-related techniques are commonly used in visualization studies in general and in cartography and geovisualization in particular. Programming frameworks or platforms for GPUs include graphics-specific options (acceleration of graphics operations) and general-purpose options (acceleration of general computation such as spatial analysis algorithms). Graphics-specific GPU programming support is represented by OpenGL (Open Graphics Library; from the Khronos Group; see https://www.opengl.org/) and Direct3D (from Microsoft; see https://docs.microsoft.com/en-us/windows/win32/direct3d) for 3D visualization. OpenGL and Direct3D provide APIs (Application Programming Interfaces) that use rendering pipelines for accelerated graphics operations at the hardware level (i.e., GPUs). WebGL (Web Graphics Library) is an extension of OpenGL that uses JavaScript bindings for web-based 3D visualization. General-purpose GPU programming frameworks or standards include CUDA (Compute Unified Device Architecture; specific to Nvidia GPUs) and OpenCL (Open Computing Language; compliant with different GPUs). Both CUDA and OpenCL rely on thread-based data parallelism that leverages the many-core computing power of GPUs for general-purpose computation (Kirk and Hwu 2010). CUDA is a proprietary solution from Nvidia, while OpenCL provides open standards that are well suited to different computing platforms.


Further, CUDA dominates the use of GPUs for general-purpose computation, partially because of its large market share and widely available libraries. In general, CUDA has higher acceleration performance compared to OpenCL. Because of its cross-platform support and open standards, OpenCL is more suitable for general-purpose computation on mobile or embedded devices (e.g., unmanned vehicles, which are increasingly available).

In addition to HPC resources, parallel visualization software platforms that harness HPC power have become available for the visualization needs of large scientific data. For example, ParaView (https://www.paraview.org/) and VisIt (https://wci.llnl.gov/simulation/computer-codes/visit/) are two representative open-source software products for parallel interactive visualization (both projects were initiated in the early 2000s). While these two open-source options are designed for the 2D and 3D visualization of terascale scientific data, they can be deployed on alternative computing resources, from single desktop computers to supercomputers. These two software products have been extensively applied in various domains for large-scale visualization and analytics of scientific data.
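To illustrate the thread-based data parallelism described above, the sketch below uses the Numba package's CUDA bindings in Python (assumed to be available, along with an Nvidia GPU) to apply a simple per-coordinate operation, with one GPU thread handling one element. It is a generic illustration of the CUDA programming model, not code from any of the studies reviewed here.

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale_coords(x, y, out_x, out_y, factor):
    # Each GPU thread processes one coordinate pair (data parallelism).
    i = cuda.grid(1)
    if i < x.size:
        out_x[i] = x[i] * factor
        out_y[i] = y[i] * factor

n = 1_000_000
x = np.random.rand(n)
y = np.random.rand(n)

# Explicitly move input data to the GPU and allocate device output arrays.
d_x, d_y = cuda.to_device(x), cuda.to_device(y)
d_out_x, d_out_y = cuda.device_array_like(x), cuda.device_array_like(y)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
scale_coords[blocks, threads_per_block](d_x, d_y, d_out_x, d_out_y, 2.0)

result_x = d_out_x.copy_to_host()   # copy results back to the CPU
```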

3  High Performance Computing for Cartography and Geovisualization

In this chapter, I focus on discussing the use of HPC in cartography and geovisualization from four aspects: map projection, cartographic generalization, mapping methods, and map rendering (see Fig. 9.1 for a framework). These four aspects constitute major components of a cartographic mapping process, and support 2D or 3D mapping within desktop- or web-based environments.

Fig. 9.1  Framework of cartography and geovisualization accelerated using high-performance computing


Specifically, mapping methods include general reference mapping (focusing on the location of spatial features) and thematic mapping (focusing on specific themes) (Slocum et al. 2008). Cartographic mapping within web-based environments requires extra handling of, for example, communication between clients and servers, whereas desktop-based environments have the convenience of directly leveraging local computing power for cartographic mapping. Of course, with advancements in computing and networking technologies, desktop- and web-based environments tend to be integrated for cartographic mapping. A series of GIS software platforms (either commercial or open source; e.g., ArcGIS Pro, Quantum GIS) allows for the mash-up of remote web mapping services within desktop environments.

3.1  Map Projection

Map projection allows for the transformation of spatial data between different coordinate systems (Snyder 1987; Slocum et al. 2008). A map product is often based on multiple spatial datasets that may use different coordinate systems, which requires the (re)projection of these datasets into the single coordinate system used by the final map product. Map projection typically includes two steps (Slocum et al. 2008): inverse transformation (from a planar coordinate system to an ellipsoidal coordinate system) and forward transformation (from the ellipsoidal to a planar coordinate system). These two steps rely on mathematical transformations, which may be complicated in nature, and resolving these transformations is often computationally demanding. While these two steps are necessary for both vector and raster data, the projection of raster data requires further interpolation (or sampling) operations on neighboring pixels (Finn et al. 2012; Usery and Seong 2001).

A series of parallel computing studies have been reported in the literature to accelerate map projection. These studies focus on using either many-core GPUs or HPC clusters (e.g., supercomputers). In terms of using GPUs for accelerated map projection, Jenny et al. (2016) presented a real-time raster projection approach based on WebGL. This raster projection approach combines inverse per-fragment projection and forward per-triangle projection for the on-the-fly projection (at least 30 frames per second in terms of frame rates) of spatial data within web browsers. Li et al. (2017) proposed a massively parallel algorithm that relies on CUDA-compliant GPUs for the acceleration of the map reprojection of raster data. The reprojection of each cell (pixel) in the output raster is handled by a CUDA thread. Li et al. reported a speedup (also known as acceleration factor) of 10–100 times (GPU: Nvidia GeForce GT 640 with 384 cores; CPU: Intel quad-core i5 and i7) for their CUDA algorithm of map reprojection. US and global datasets were used in Li et al.'s study, and the largest landscape size is 32,238 × 20,885 in terms of numbers of rows and columns. With respect to the use of HPC clusters, Tang and Feng (2017) developed a parallel map projection framework based on the combination of GPU clusters and cloud computing capabilities.


This GPU-acceleration framework enables the map projection of big vector data (LiDAR point clouds were used). Cloud-based virtual machines were used to manage data, algorithms, computing task scheduling, and visualization (via a Web GIS interface). A GPU cluster with 32 nodes and 96 GPUs was used to demonstrate the utility of this massively parallel computing solution. A LiDAR dataset (230 GB of storage space) from the North Carolina Floodplain Mapping Program was used. A speedup of 900 (with consideration of data transfer time between the CPU host and GPU device) was achieved when using 60 GPUs together for map projection. Finn et al. (2019) conducted a detailed computing performance analysis for pRasterBlaster, an MPI-based parallel map projection software package designed for HPC environments. Two XSEDE supercomputers were used in Finn et al.'s (2019) study: Trestles (#processors: 10,368) and Lonestar (#processors: 22,656). By using these supercomputing resources from the XSEDE cyberinfrastructure, Finn et al. stressed that workload distribution and data-dependent load balancing are two computational bottlenecks for leveraging supercomputing clusters for efficient parallel map projection of large raster data. Finn et al. further identified a set of issues that may affect the computing performance of parallel map projection, including parallel I/O, load balancing, and dynamic partitioning.
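The per-cell logic that these parallel raster reprojection studies accelerate can be sketched in Python with the pyproj package (assumed available; the EPSG codes, grid parameters, and nearest-neighbor sampling below are illustrative choices, not the algorithms used in the cited studies). For every cell of the output raster, the cell center is transformed back into the source coordinate system and the nearest source pixel is sampled; on a GPU, each of these independent per-cell transformations would be mapped to one thread.

```python
import numpy as np
from pyproj import Transformer

def reproject_nearest(src, src_origin, src_cellsize, src_crs,
                      dst_shape, dst_origin, dst_cellsize, dst_crs):
    """Nearest-neighbor raster reprojection; every output cell is independent."""
    rows, cols = dst_shape
    # Center coordinates of all output cells in the destination CRS.
    xs = dst_origin[0] + (np.arange(cols) + 0.5) * dst_cellsize
    ys = dst_origin[1] - (np.arange(rows) + 0.5) * dst_cellsize
    xx, yy = np.meshgrid(xs, ys)

    # Transform output cell centers back into the source CRS (the inverse leg
    # of the projection); vectorized here, one GPU thread per cell in a CUDA
    # implementation.
    to_src = Transformer.from_crs(dst_crs, src_crs, always_xy=True)
    sx, sy = to_src.transform(xx, yy)

    # Convert source coordinates to row/column indices and sample.
    col_idx = np.floor((sx - src_origin[0]) / src_cellsize).astype(int)
    row_idx = np.floor((src_origin[1] - sy) / src_cellsize).astype(int)
    valid = ((row_idx >= 0) & (row_idx < src.shape[0]) &
             (col_idx >= 0) & (col_idx < src.shape[1]))
    out = np.full(dst_shape, np.nan)
    out[valid] = src[row_idx[valid], col_idx[valid]]
    return out

# Hypothetical usage: project a coarse global grid from geographic coordinates
# (EPSG:4326) to Web Mercator (EPSG:3857).
src = np.random.rand(180, 360)
result = reproject_nearest(src, src_origin=(-180.0, 90.0), src_cellsize=1.0,
                           src_crs="EPSG:4326", dst_shape=(200, 200),
                           dst_origin=(-20_000_000.0, 20_000_000.0),
                           dst_cellsize=200_000.0, dst_crs="EPSG:3857")
```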

3.2  Cartographic Generalization

Cartographic generalization is "the process of reducing the information content of maps because of scale change, map purpose, intended audience, and/or technical constraints" (Slocum et al. 2008, p. 97). Generalization is needed when four conditions for mapping elements occur (Slocum et al. 2008): congestion, coalescence, conflict, and complication. As a result, a set of generalization operations have been developed for both raster and vector data. Typically, vector-based operations, represented by simplification, smoothing, aggregation, collapse, merging, refinement, and displacement, have been well developed to generalize different geometric features (points, polylines, and polygons). Raster-based generalization operations include resampling, filters, and categorization. For more detail, readers are directed to Slocum et al. (2008).

While cartographic generalization does not directly change the underlying spatial dataset per se, generalization operations applied to geometric features often require considerable computational support. This is because the geometric criteria (e.g., distance or angle) used in generalization are often complicated, and generalization algorithms may operate at a global level (requiring consideration of a whole geometric feature and/or other features, i.e., the entire dataset). Gao et al. (2015) investigated the use of Hadoop clusters (based on a MapReduce framework that moves compute to data; see Dean and Ghemawat 2008 and White 2012) to accelerate line simplification based on a multi-scale visual curvature algorithm. With support from a Hadoop cluster with 8 computing nodes, Gao et al. reported that the parallel computing time was less than 2.2% of the sequential computing time for the simplification of GeoLife GPS trajectory data. Zhou et al. (2018) presented a detailed study that considers multiple constraints for point generalization on the Hadoop architecture. These constraints cover the relationship of points of interest with other features (a road network in their case), the point features per se, and cartographic scale. A circle growth algorithm guided by these constraints was used for the generalization of point data. The point generalization process was parallelized through the MapReduce mechanism on Hadoop (9 virtual machine nodes, each with two CPUs, were used). Experimental results from Zhou et al. suggested that acceleration performance on Hadoop clusters is dependent on the volume of spatial data: the larger the data volume, the higher the acceleration performance when more computing resources are used.
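The feature-level parallelism exploited in these studies can be sketched with a minimal example in which many polylines are simplified independently. The radial-distance vertex reduction below is a deliberately simple stand-in for the multi-scale visual curvature algorithm used by Gao et al. (2015), and OpenMP threads stand in for Hadoop map tasks; all names are hypothetical.

    #include <cmath>
    #include <cstddef>
    #include <vector>
    #include <omp.h>

    struct Pt { double x, y; };
    using Polyline = std::vector<Pt>;

    // Radial-distance vertex reduction: keep a vertex only if it lies farther
    // than 'tol' from the last vertex that was kept (a simple placeholder for
    // more sophisticated simplification algorithms).
    static Polyline simplify(const Polyline& in, double tol)
    {
        if (in.size() < 3) return in;
        Polyline out;
        out.push_back(in.front());
        for (std::size_t i = 1; i + 1 < in.size(); ++i) {
            double dx = in[i].x - out.back().x, dy = in[i].y - out.back().y;
            if (std::sqrt(dx * dx + dy * dy) >= tol) out.push_back(in[i]);
        }
        out.push_back(in.back());            // always retain the end point
        return out;
    }

    // Feature-level parallelism: every polyline is independent, so the loop can
    // be distributed across threads without synchronization.
    void simplifyAll(std::vector<Polyline>& features, double tol)
    {
        #pragma omp parallel for schedule(dynamic)
        for (long i = 0; i < (long)features.size(); ++i)
            features[i] = simplify(features[i], tol);
    }

Because each polyline is processed independently, the same loop structure maps naturally onto MapReduce: each map task receives a subset of polylines and emits their simplified counterparts.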

3.3  Mapping Methods

Mapping methods consist of general reference mapping (with a focus on location; e.g., topographic maps) and thematic mapping. A set of thematic mapping methods is available, including choropleth maps, dasymetric maps, isarithmic maps, proportional symbol maps, dot maps, cartograms, and flow maps (Slocum et al. 2008). Parallel computing studies reported in the literature focus on choropleth mapping, heat mapping (a special type of isarithmic mapping), and cartograms.

Choropleth mapping is a thematic mapping technique that has been most commonly used to present numerical attribute information on enumeration units (Armstrong et al. 2003). Class interval selection is an important step in choropleth mapping that groups original data into a series of ordered classes. Optimal data classification approaches, such as the Fisher–Jenks algorithm (Jenks 1977; Fisher 1958), have been developed for class interval selection by minimizing within-interval variability. Optimal data classification presents a combinatorial exploration problem, which is computationally intractable for large problem sizes. Heuristic search methods such as evolutionary algorithms (see Armstrong et al. 2003) have been proposed to address the computational issue of optimal data classification for choropleth mapping from an algorithmic perspective. A decade after Armstrong et al.'s classic work, Rey et al. (2013) developed a parallel Fisher–Jenks optimal data classification approach for choropleth mapping. Of the four steps of the data classification algorithm, Rey et al. parallelized the first: computing the sum of absolute deviations. This parallel optimal data classification approach was implemented and compared using three Python libraries: PyOpenCL (for GPU programming), Multiprocessing, and Parallel Python. Rey et al. (2013) tested their parallel implementations by varying the number of geometric features and partitions for data classification. The reported acceleration performance is not very high (speedup of less than 2), which is attributed to the hardware (Apple Mac Pro with 12 CPUs and an ATI Radeon graphics card) used in Rey et al.'s study. However, Rey et al. underlined that parallel data classification benefits from larger data sizes for choropleth mapping and that the low acceleration could be improved by using high-end cluster computing resources. Laura and Rey (2013) further improved the parallel data classification for choropleth mapping, using shared memory and vectorization mechanisms as extensions to the work of Rey et al. (2013).

Heat maps, as a special type of isarithmic map, have been extensively used in the visualization of point-based data. Kernel density estimation, which is computationally demanding, is often needed to extract density-based information from point data. To accelerate visualization, WebGL is often used for the rendering of heat maps. For example, Thöny et al. (2016) presented a WebGL-based approach for accelerated heat map rendering. Thöny et al. used car accident datasets in the UK (including 1,494,275 accidents) to explore the capability of this WebGL-based approach, which outperformed other alternatives, including ArcGIS Online, Google Maps, and Leaflet. Ježek et al. (2017) also applied WebGL in their solution for the heat map visualization of massive point data.

The cartogram is the third thematic mapping method for which parallel computing implementations have been reported in the literature. Tang (2013) presented a parallel circular cartogram algorithm accelerated by GPUs. The construction of a circular cartogram, as a form of area cartogram (see Dorling 1996), relies on an iterative process to adjust the locations of enumeration units (circles in this case). While the goal of the iterative process is to minimize the overlap among enumeration units, this process requires significant computation. The massively parallel computing approach that Tang (2013) proposed uses thread-based parallelism in which large numbers of CUDA threads update the locations of spatially indexed enumeration units in parallel. Tang (2013) achieved 15–20 times acceleration by using an NVIDIA GeForce GTX 480 GPU (480 CUDA cores). The acceleration performance of the GPU-based parallel circular cartogram algorithm is dependent on problem size, spatial indexing, and the parameters of the cartogram algorithm (see Tang 2013).
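The thread-per-unit parallelism used for circular cartograms can be illustrated with a simplified CUDA sketch in which each thread moves one circle away from the circles it overlaps. This is not Tang's (2013) implementation: it scans all circles instead of using spatial indexing, and the damping factor, double-buffered layout, and names are assumptions.

    // One iteration of a naive overlap-removal step for a circular cartogram:
    // each CUDA thread updates one circle, reading current positions and
    // writing to a separate output buffer to avoid data races.
    __global__ void relaxCircles(const float* xin, const float* yin,
                                 const float* r, float* xout, float* yout,
                                 int n, float step)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float dxSum = 0.0f, dySum = 0.0f;
        for (int j = 0; j < n; ++j) {            // Tang (2013) restricts this loop
            if (j == i) continue;                // to spatially indexed neighbors
            float dx = xin[i] - xin[j], dy = yin[i] - yin[j];
            float d  = sqrtf(dx * dx + dy * dy) + 1e-6f;
            float overlap = r[i] + r[j] - d;
            if (overlap > 0.0f) {                // circles i and j overlap
                dxSum += (dx / d) * overlap;     // push circle i away from j
                dySum += (dy / d) * overlap;
            }
        }
        xout[i] = xin[i] + step * dxSum;
        yout[i] = yin[i] + step * dySum;
    }

In practice, such a kernel would be invoked repeatedly, swapping the input and output buffers between iterations, until the total overlap falls below a tolerance.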

3.4  Map Rendering

Map rendering is the last step of cartographic mapping. Once a map is symbolized and generalized, map rendering converts the map into images for the visualization of geographic information from spatial data. Map rendering is fundamentally related to rendering in computer graphics. Map rendering for both 2D and 3D spatial data within GIS environments can be conducted beforehand (pre-rendering) or in an on-the-fly manner (real-time rendering). The mapping of spatial data can benefit from pre-rendering (for either web-based or desktop-based platforms). Pre-rendering is often known as map caching or tiling (see Goodchild 1989). Map caching can significantly enhance the efficiency of mapping, but it needs large storage space for the map tiles generated by the caching mechanism. Further, map caching of large spatial datasets can be computationally intensive because map tiles at different cartographic scales need to be rendered. This computational issue becomes worse if the spatial data are frequently updated.
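The computational load of map caching follows directly from the tiling arithmetic. The snippet below assumes the widely used Web Mercator "XYZ" tiling scheme (other schemes exist) and maps a longitude/latitude pair to a tile index at zoom level z; because the number of tiles quadruples with each additional zoom level, pre-rendering a deep tile pyramid is costly, although the work is embarrassingly parallel across tiles.

    #include <cmath>

    // Web Mercator "XYZ" tile indices: at zoom level z the world is divided
    // into 2^z x 2^z tiles, so a cache covering zoom levels 0..z contains
    // (4^(z+1) - 1) / 3 tiles, which is why deep tile pyramids are expensive
    // to pre-render.  Valid for latitudes within roughly +/- 85 degrees.
    void lonLatToTile(double lonDeg, double latDeg, int z, int& tx, int& ty)
    {
        const double pi  = 3.14159265358979323846;
        const double lat = latDeg * pi / 180.0;
        const double n   = std::pow(2.0, z);
        tx = (int)((lonDeg + 180.0) / 360.0 * n);
        ty = (int)((1.0 - std::log(std::tan(lat) + 1.0 / std::cos(lat)) / pi) / 2.0 * n);
    }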


Parallel computing support has been used to accelerate the map caching mechanism (see Wang 2012) and has been implemented in GIS software packages, such as ArcGIS Server (https://enterprise.arcgis.com/en/server/latest/publish-services/windows/allocation-of-server-resources-to-caching.htm) and GeoServer integrated with GeoWebCache (see https://docs.geoserver.org/stable/en/user/geowebcache/index.html). The goal of map rendering in GIS software is to achieve on-the-fly visualization of geographic information, which is necessary for interactive geovisualization. This becomes more important given the recent noticeable growth in augmented reality (AR) and virtual reality (VR) technologies together with state-of-the-art artificial intelligence. Pokemon GO (see https://www.pokemongo.com/) is an example of such AR technology, although it has raised safety and privacy concerns. GPUs, which were designed for rapid graphics operations, have been used to accelerate map rendering for interactive visualization. The programmable rendering pipelines of high-performance graphics APIs (e.g., OpenGL) allow for leveraging the many-core parallel computing power of GPUs for accelerated map rendering. In terms of 2D map rendering, Yue et al. (2016) utilized a rendering pipeline that combines vertex shaders and fragment shaders to speed up the rendering of linear map symbols using GPUs. Heitzler et al. (2017) applied GPU-enabled rendering to the visualization of disaster simulations. GPUs have also been increasingly used for 3D map rendering. Li et al. (2013) investigated the use of both CPUs and GPUs for the visualization of 3D/4D environmental data. Li et al. developed a visualization pipeline that consists of four major steps: preprocessing, coordinate transformation, interpolation, and rendering. Ray-cast rendering in their visualization pipeline was parallelized using GPUs. Li et al. suggested that GPUs are suitable for rendering only when the size of the spatial data does not exceed the on-board memory capacities of GPUs. Tully et al. (2015) used GPUs to accelerate the rendering of 3D environments in their case study of crisis management. She et al. (2017) applied GPUs to parallelize the rendering of geometric features within 3D terrain environments. Wang et al. (2017) explored the web-based visualization of multidimensional spatial data. Wang et al.'s virtual globe platform, referred to as PolarGlobe, leverages a spatially indexed level-of-detail (LOD; similar to cartographic generalization in response to distance) approach for web-based 3D volume rendering using WebGL.

4  Challenges

While HPC provides a way of resolving the computational challenges facing cartographic mapping, specific challenges exist in terms of utilizing HPC in cartography and geovisualization. These challenges include, but are not limited to, the handling of big spatial data, spatiotemporal mapping and visualization, the reconsideration of artificial intelligence (e.g., deep learning), and preparation for forthcoming HPC technologies such as quantum computing. In this chapter, I focus on discussing two specific challenges: the handling of big spatial data, and spatiotemporal mapping and visualization.

4.1  Handling of Big Spatial Data

Scientific domains have been actively handling the big data challenge facing their studies since the early 2010s. Spatial data are characterized by five generic big data characteristics: volume, velocity, variety, veracity, and value (Marr 2015; Yang et al. 2017). Because of these characteristics, the management, storage, processing, and analysis of big spatial data for cartographic mapping is extremely challenging. Oftentimes, real-time mapping or interactive visualization of these big spatial data may not be realistic even on HPC infrastructure. To resolve this big data challenge for cartographic mapping, potential solutions may lie in the following threads. First, there is a need for advances in data handling algorithms for the aggregation, resampling, indexing, fusion, and compression of spatial data (see Yang et al. 2017). These data handling algorithms will lead to more efficient management, access, and processing of spatial data for cartographic mapping and visualization. Second, parallel visualization software platforms (e.g., ParaView and VisIt) have been available in the cyberinfrastructure domain and are increasingly used for scientific visualization of domain-specific data at terascale or higher by leveraging HPC power from, for example, supercomputing resources. However, the use of these parallel visualization platforms for big data-driven cartographic mapping and geovisualization is rare. Third, parallel strategies for cartographic mapping should be paid more attention. A suite of parallel strategies, including domain decomposition, load balancing, and task scheduling, have been developed for generic GIS-based algorithms and spatial analysis (Wang and Armstrong 2009). These parallel strategies can be adapted specifically to support the cartographic mapping and geovisualization of big spatial data. For example, Guo et al. (2015) presented a spatially adaptive decomposition approach that is based on computational intensity analysis of geospatial features and space-filling curves for the parallel visualization of vector spatial data. As HPC technologies keep evolving (e.g., quantum computing; see NASEM 2019), the role that these threads (with respect to data handling, integration of parallel visualization platforms, and parallel strategies) play in resolving the big data challenge facing cartographic mapping will become more and more important.
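As a minimal sketch of the space-filling-curve idea behind such decomposition strategies, the code below orders features by their Morton (Z-order) code and then cuts the ordered list into equally sized partitions. It assumes that each feature has already been mapped to a grid cell and omits the computational intensity weighting used by Guo et al. (2015); the function names are hypothetical.

    #include <algorithm>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Interleave the bits of 16-bit x and y grid coordinates to obtain a
    // Morton (Z-order) code; sorting features by this code keeps spatially
    // close features close in the ordering.
    static std::uint32_t part1by1(std::uint32_t v)
    {
        v &= 0x0000ffff;
        v = (v | (v << 8)) & 0x00ff00ff;
        v = (v | (v << 4)) & 0x0f0f0f0f;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    std::uint32_t mortonCode(std::uint32_t gx, std::uint32_t gy)
    {
        return (part1by1(gy) << 1) | part1by1(gx);
    }

    // Assign each feature (identified by its grid cell) to one of nParts
    // partitions by cutting the Morton-ordered feature list into equal chunks.
    std::vector<int> decompose(const std::vector<std::pair<std::uint32_t,
                               std::uint32_t>>& cells, int nParts)
    {
        if (cells.empty() || nParts <= 0) return {};
        std::vector<std::uint32_t> keys(cells.size());
        std::vector<std::size_t> order(cells.size());
        for (std::size_t i = 0; i < cells.size(); ++i) {
            keys[i] = mortonCode(cells[i].first, cells[i].second);
            order[i] = i;
        }
        std::sort(order.begin(), order.end(),
                  [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
        std::vector<int> part(cells.size());
        for (std::size_t k = 0; k < order.size(); ++k)
            part[order[k]] = (int)((k * (std::size_t)nParts) / order.size());
        return part;
    }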

4.2  Spatiotemporal Mapping and Visualization

Spatiotemporal GIS and analysis have become a hot topic in GIScience and geocomputation (Goodchild 2013; An et al. 2015). Significant amounts of data with space-time stamps have been generated via sensor technologies and geographic analysis and modeling. Cartographic mapping and visualization of these data can help reveal spatiotemporal patterns for, and thus understand the complexity of, dynamic geographic phenomena of interest. While spatiotemporal data models, data structures, and databases have been developed, introducing the temporal dimension into spatial data complicates the mapping and visualization of these data from a computational perspective. As a result, studies on spatiotemporal mapping and visualization tend to lag behind the increasingly available spatiotemporal data and the development of spatiotemporal analysis and modeling. Overcoming this gap requires improvements across the series of cartographic steps (e.g., generalization, mapping methods, and rendering). Such a requirement creates great opportunities for applying HPC to the mapping and visualization of spatiotemporal data. For example, with acceleration support from HPC, spatiotemporal simulation outcomes can be efficiently processed and presented in visual forms. This HPC-accelerated visual presentation capability may provide support for more insights into the spatiotemporal complexity of geographic systems of interest, which is often infeasible with standalone computing. Further, spatiotemporal principles (see Yang et al. 2011, 2017) should always be considered to guide the use of HPC for spatiotemporal mapping and visualization.

5  Conclusion

In this chapter, I conducted a review to investigate the use of HPC for the acceleration of cartographic mapping in the domain of cartography and geovisualization. The investigation is based on four major components of a cartographic mapping process: map projection, cartographic generalization, mapping methods, and map rendering. Alternative types of HPC resources and technologies have been applied to enable cartographic mapping that is often computationally demanding. HPC resources, including computing clusters and many-core GPUs deployed on different computing infrastructure (cloud or non-cloud), have been increasingly used to address the computational challenges facing cartographic mapping. In particular, because of their hardware-level acceleration of graphics operations, GPUs and associated technologies (e.g., OpenGL and WebGL) have been extensively applied for computationally demanding cartographic mapping and visualization of spatial data. Further, GPUs have been used to accelerate general-purpose computation needs for cartographic mapping (e.g., map projection and generalization). The choice of HPC resources for cartographic mapping depends on the mapping purpose and the needs of cartographic operations. For example, computing clusters (tightly or loosely coupled) are suitable for cartographic generalization or map projection of large spatial datasets, but if on-the-fly visualization (i.e., map rendering) is needed, the use of GPUs should be preferred.

A series of specific challenges exist for HPC-driven cartography and geovisualization. Efficient handling of big data and spatiotemporal mapping are representative of these challenges. If we are to resolve these challenges, parallel strategies and data handling approaches (e.g., for data storage and access) are needed to better leverage HPC power for the acceleration of cartographic mapping. At the same time, we may need to consider other state-of-the-art computing technologies such as artificial intelligence and be prepared for the coming of new technologies (e.g., quantum computing). These new technologies may bring revolutionary capabilities to further support the resolution of computational challenges facing cartography and geovisualization.

References An, L., Tsou, M.-H., Crook, S. E., Chun, Y., Spitzberg, B., Gawron, J. M., et al. (2015). Space–time analysis: Concepts, quantitative methods, and future directions. Annals of the Association of American Geographers, 105, 891–914. Armstrong, M.  P. (2000). Geography and computational science. Annals of the Association of American Geographers, 90, 146–156. Armstrong, M. P., Xiao, N., & Bennett, D. A. (2003). Using genetic algorithms to create multicriteria class intervals for choropleth maps. Annals of the Association of American Geographers, 93, 595–623. Atkins, D. E., Droegemeie, K. K., Feldman, S. I., Garcia-Molina, H., Klein, M. L., Messerschmitt, D.  G., et  al. (2003). Revolutionizing science and engineering through cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. Arlington, VA: US National Science Foundation. Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51, 107–113. Dent, B.  D., Torguson, J.  S., & Hodler, T.  W. (1999). Cartography: Thematic map design. New York: WCB/McGraw-Hill. Dongarra, J., Foster, I., Fox, G., Gropp, W., Kennedy, K., Torczon, L., et al. (Eds.). (2003). The sourcebook of parallel computing. San Francisco, CA: Morgan Kaufmann Publishers. Dorling, D. (1996) Area Cartograms: Their Use and Creation, Concepts and Techniques in Modern Geography (CATMOG) 59, Geo Books, Norwich. Finn, M. P., Liu, Y., Mattli, D. M., Behzad, B., Yamamoto, K. H., Shook, E., et al. (2019). High-­ performance small-scale raster map projection empowered by cyberinfrastructure. In S. Wang & M.  F. Goodchild (Eds.), CyberGIS for geospatial discovery and innovation. Dordrecht: Springer. Finn, M. P., Steinwand, D. R., Trent, J. R., Buehler, R. A., Mattli, D. M., & Yamamoto, K. H. (2012). A program for handling map projections of small scale geospatial raster data. Cartographic Perspectives, 71, 53–67. Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53, 789–798. Foster, I. (1995). Designing and building parallel programs: Concepts and tools for parallel software engineering. Reading, MA: Addison-Wesley. Gahegan, M. (1999). What is geocomputation. Transactions in GIS, 3, 203–206. Gao, P., Liu, Z., Han, F., Tang, L., & Xie, M. (2015). Accelerating the computation of multi-scale visual curvature for simplifying a large set of polylines with Hadoop. GIScience & Remote Sensing, 52, 315–331. Goodchild, M. F. (1989). Tiling large geographical databases. In Symposium on large spatial databases (pp. 135–146). Springer. Goodchild, M. F. (2013). Prospects for a space–time GIS: Space–time integration in geography and GIScience. Annals of the Association of American Geographers, 103, 1072–1077.

9  Cartographic Mapping Driven by High-Performance Computing: A Review

171

Guo, M., Guan, Q., Xie, Z., Wu, L., Luo, X., & Huang, Y. (2015). A spatially adaptive decomposition approach for parallel vector data visualization of polylines and polygons. International Journal of Geographical Information Science, 29, 1419–1440. Heitzler, M., Lam, J. C., Hackl, J., Adey, B. T., & Hurni, L. (2017). GPU-accelerated rendering methods to visually analyze large-scale disaster simulation data. Journal of Geovisualization and Spatial Analysis, 1, 3. Jenks, G. F. (1977). Optimal data classification for choropleth maps. Department of Geography, University of Kansas. Jenny, B., Šavrič, B., & Liem, J. (2016). Real-time raster projection for web maps. International Journal of Digital Earth, 9, 215–229. Ježek, J., Jedlička, K., Mildorf, T., Kellar, J., & Beran, D. (2017). Design and evaluation of WebGL-based heat map visualization for big point data. In I. Ivan, A. Singleton, J. Horak, & T. Inspektor (Eds.), The rise of big spatial data. Springer. Kirk, D.  B., & Hwu, W.-M. (2010). Programming massively parallel processors: A hands-on approach. Burlington, MA: Morgan Kaufmann. Laura, J., & Rey, S. J. (2013). Improved parallel optimal choropleth map classification. In X. Shi, V. Kindratenko, & C. Yang (Eds.), Modern accelerator technologies for geographic information science. New York, NY: Springer. Li, J., Finn, M., & Blanco Castano, M. (2017). A lightweight CUDA-based parallel map reprojection method for raster datasets of continental to global extent. ISPRS International Journal of Geo-Information, 6, 92. Li, J., Jiang, Y., Yang, C., Huang, Q., & Rice, M. (2013). Visualizing 3D/4D environmental data using many-core graphics processing units (GPUs) and multi-core central processing units (CPUs). Computers & Geosciences, 59, 78–89. Marr, B. (2015). Big data: Using SMART big data, analytics and metrics to make better decisions and improve performance. West Sussex: John Wiley & Sons. NASEM (National Academies of Sciences, Engineering, and Medicine). (2019). Quantum computing: Progress and prospects. Washington, DC: The National Academies Press. Openshaw, S., & Turton, I. (2000). High performance computing and art of parallel programming: An introduction for geographers, social scientists, and engineers. London: Taylor & Francis Group. Owens, J. D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A. E., et al. (2007). A survey of general-purpose computation on graphics hardware. Computer Graphics Forum, 26, 80–113. Rey, S. J., Anselin, L., Pahle, R., Kang, X., & Stephens, P. (2013). Parallel optimal choropleth map classification in PySAL. International Journal of Geographical Information Science, 27, 1023–1039. She, J., Zhou, Y., Tan, X., Li, X., & Guo, X. (2017). A parallelized screen-based method for rendering polylines and polygons on terrain surfaces. Computers & Geosciences, 99, 19–27. Slocum, T.  A., Mcmaster, R.  M., Kessler, F.  C., Howard, H.  H., & Mc Master, R.  B. (2008). Thematic cartography and geographic visualization. Upper Saddle River, NJ: Pearson Prentice Hall. Snyder, J. P. (1987). Map projections-a working manual. Washington, DC: USGPO. Tang, W. (2013). Parallel construction of large circular cartograms using graphics processing units. International Journal of Geographical Information Science, 27, 2182–2206. Tang, W., & Feng, W. (2017). Parallel map projection of vector-based big spatial data: Coupling cloud computing with graphics processing units. Computers, Environment and Urban Systems, 61, 187–197. Thöny, M., Billeter, M., & Pajarola, R. (2016). 
Deferred vector map visualization. In Proceedings ACM SIGGRAPH ASIA 2016 Symposium on Visualization, 16. Tobler, W. R. (1976). Analytical cartography. The American Cartographer, 3, 21–31. Tully, D., Rhalibi, A., Carter, C., & Sudirman, S. (2015). Hybrid 3D rendering of large map data for crisis management. ISPRS International Journal of Geo-Information, 4, 1033–1054.


Usery, L. E., & Seong, J. C. (2001). All equal-area map projections are created equal, but some are more equal than others. Cartography and Geographic Information Science, 28, 183–194. Wang, H. (2012). A large-scale dynamic vector and raster data visualization geographic information system based on parallel map tiling. Ph.D., Florida International University. Wang, S., & Armstrong, M. (2009). A theoretical approach to the use of cyberinfrastructure in geographical analysis. International Journal of Geographical Information Science, 23, 169–193. Wang, S., Li, W., & Wang, F. (2017). Web-scale multidimensional visualization of big spatial data to support earth sciences—A case study with visualizing climate simulation data. Informatics, 4, 17. White, T. (2012). Hadoop: The definitive guide. Sebastopol, CA: O’Reilly Media, Inc. Wilkinson, B., & Allen, M. (2004). Parallel programming: Techniques and applications using networked workstations and parallel computers (2nd ed.). Upper Saddle River, NJ: Pearson Prentice Hall. Yang, C., Huang, Q., Li, Z., Liu, K., & Hu, F. (2017). Big data and cloud computing: Innovation opportunities and challenges. International Journal of Digital Earth, 10, 13–53. Yang, C., Wu, H., Huang, Q., Li, Z., & Li, J. (2011). Using spatial principles to optimize distributed computing for enabling the physical science discoveries. Proceedings of the National Academy of Sciences, 108, 5498–5503. Yue, S., Yang, J., Chen, M., Lu, G., Zhu, A.-X., & Wen, Y. (2016). A function-based linear map symbol building and rendering method using shader language. International Journal of Geographical Information Science, 30, 143–167. Zhou, J., Shen, J., Yang, S., Yu, Z., Stanek, K., & Stampach, R. (2018). Method of constructing point generalization constraints based on the cloud platform. ISPRS International Journal of Geo-Information, 7, 235.

Part III

Domain Applications of High Performance Computing

Chapter 10

High-Performance Computing for Earth System Modeling

Dali Wang and Fengming Yuan

Abstract  High-performance computing (HPC) plays an important role in the development of Earth system models. This chapter reviews HPC efforts related to Earth system models, including the Community Earth System Model and the Energy Exascale Earth System Model. Specifically, this chapter evaluates computational and software design issues, analyzes several current HPC-related model developments, and provides an outlook for some promising areas within Earth system modeling in the era of exascale computing.

Keywords  Earth system model · High-performance computing · Artificial intelligence

D. Wang (*) · F. Yuan
Oak Ridge National Laboratory, Oak Ridge, TN, USA
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
W. Tang, S. Wang (eds.), High Performance Computing for Geospatial Applications, Geotechnologies and the Environment 23, https://doi.org/10.1007/978-3-030-47998-5_10

1  Introduction

Over the past several decades, high-fidelity numerical models have been developed to advance our understanding of Earth systems for better projecting future climate change scenarios. Models range from relatively simple radiant heat transfer models to fully coupled general circulation models. In this chapter, we focus on several issues regarding high-performance computing (HPC) for fully coupled Earth system model development. Many models adopt a component-based modeling approach in which different numerical methods are used to model each Earth system component. A flux coupler is also designed to integrate the interactions among individual components (Wang et al. 2011). Two typical examples of these Earth system models are the Community Earth System Model (CESM, www.cesm.ucar.edu) administered by the National Center for Atmospheric Research and the Energy Exascale Earth System Model (E3SM, www.e3sm.org) maintained by the US Department of Energy (DOE). Considering the general interests of the audience of this book, we focus on HPC and software engineering issues associated with Earth system modeling. First, we review HPC-enabled Earth system model developments, including individual component model development, coupler design, I/O challenges, and in situ data analysis. Then we focus on new challenges in Earth system modeling in the era of exascale computing. We cover topics related to heterogeneity in computing, coupling strategies, software refactoring and redesign, as well as new frontiers arising from interactions with promising technologies from artificial intelligence (AI) and machine learning.

2  Early Stage of Earth System Model Development

Earth system models (ESMs) have evolved from early efforts on climate system models, such as general circulation models (GCMs) that can decipher the dynamics of the full 3D atmosphere using computationally intensive numerical models (McGuffie and Henderson-Sellers 2001; Shine and Henderson-Sellers 1983; Weart 2019). Early GCMs treated the ocean as a motionless water body (Manabe 1969), which could not represent ocean feedback and system evolution throughout time. Coupled atmosphere–ocean GCMs, including a sea-ice component, were subsequently developed (Meehl 1984). Land surface components (Manabe 1969; Pitman 2003) were included in later ESMs to better understand the aerodynamics and evapotranspiration at the land–atmosphere boundary layers. In the past two decades, advanced ESMs have been developed to represent more complicated physical (e.g., energy, water, dynamics), biological, and chemical (e.g., carbon, nitrogen, and other greenhouse gases and other elements like sulfur) processes among the atmosphere, ocean, cryosphere, and land surface (Flato 2011; Meehl 1984; Randall et al. 2019). The carbon cycle is considered an important aspect of ESMs to project the impacts of postindustrial rising atmospheric carbon dioxide (CO2) (Flato 2011; Hansen et al. 1984; Randall et al. 2019). Further extensions include other noncarbon elements and their biological and biogeochemical effects in the physical climate system (Morgenstern et al. 2017). A variety of efforts have been reported to estimate CO2 and other greenhouse gas emissions and their reactive transport and uptake by both ocean and terrestrial ecosystems.

3  Coupled Earth System Development

ESMs contain various components of the climate system. Individual Earth components were developed by different communities using their own methodologies over their own geographical domains (grids and resolutions). Data transfer and mapping (i.e., geospatial data indexing, referencing, and transforming), communication, and other coordination among components are carried out by a software component called a coupler (Liu et al. 2014; Valcke et al. 2011, 2012). Although a coupler provides great flexibility and portability to integrate individual components, it can cause overall computational performance degradation because it enforces explicit data communications/exchanges and computational synchronization between all Earth system components, which are calculated at different speeds with different time intervals (Balaji et al. 2017).

Some couplers are implemented as a "centralized" system, in which component-level interfaces are integrated under a single driver (i.e., one executable) (Ji et al. 2014). Examples include the Earth System Modeling Framework (Hill et al. 2004), CPL7 for CCSM4/CESM1 (Craig et al. 2011), and the GFDL Flexible Modeling System (Balaji et al. 2006). If a coupler is implemented as an individual component, standardized data communication/transfer protocols are required; therefore, code modification is often necessary (Liu et al. 2014). Other couplers are implemented in a "distributed" fashion (Ji et al. 2014). Examples include the Ocean Atmosphere Sea Ice Soil coupler version 3 (OASIS3) (Valcke 2013), Bespoke Framework Generator 2 (Armstrong et al. 2009), the Model Coupling Toolkit (Larson et al. 2005; Warner et al. 2008), CPL6 for CCSM3 (Craig et al. 2005), and C-Coupler1 (Liu et al. 2014). This kind of coupler requires minimal modifications to individual ESM component codes, but it might come at a higher computational cost (Balaji et al. 2017; Valcke et al. 2012) because of system synchronization overhead and internal data exchange scheduling.
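The basic pattern that every coupler manages, namely components running on disjoint groups of processes and periodically exchanging fields at synchronization points, can be illustrated with a deliberately small MPI sketch. It does not reproduce the interface of any coupler cited above; the two-component split, the single exchanged scalar, and all names are assumptions made for illustration (and it assumes an even number of ranks of at least two).

    #include <mpi.h>

    // Toy two-component coupling loop: half of the ranks act as the "atmosphere"
    // component, the other half as the "land" component.  At every coupling
    // interval the two component roots exchange one aggregated field, which is
    // the kind of explicit data exchange and synchronization point that a real
    // coupler manages for many fields and grids.
    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int color = (rank < size / 2) ? 0 : 1;          // 0 = atmosphere, 1 = land
        MPI_Comm comp;                                   // per-component communicator
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &comp);

        int crank;
        MPI_Comm_rank(comp, &crank);

        double local = (color == 0) ? 1.5 : 0.3;         // stand-in local flux
        for (int step = 0; step < 4; ++step) {           // coupling intervals
            double compSum = 0.0;
            MPI_Reduce(&local, &compSum, 1, MPI_DOUBLE, MPI_SUM, 0, comp);

            if (crank == 0) {                            // component roots swap fields
                int peer = (color == 0) ? size / 2 : 0;  // root of the other component
                double recv = 0.0;
                MPI_Sendrecv(&compSum, 1, MPI_DOUBLE, peer, 0,
                             &recv,    1, MPI_DOUBLE, peer, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                local += 0.01 * recv;                    // apply the coupled forcing
            }
            MPI_Bcast(&local, 1, MPI_DOUBLE, 0, comp);   // scatter back within component
        }

        MPI_Comm_free(&comp);
        MPI_Finalize();
        return 0;
    }

Real couplers generalize this pattern to many components, many fields, regridding between different grids, and flexible process layouts, which is where the performance degradation and load-balancing issues discussed above arise.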

4  Input/Output Solution and In Situ Data Analysis

The ESM community has encountered input/output (I/O) issues since the early days. Parallel I/O was introduced and is now broadly employed after the adoption of parallel NetCDF (Li et al. 2003), which combines MPI-IO (Thakur et al. 1997) and the popular NetCDF data format (Rew and Davis 1990) for the climate and broader computing community. NetCDF provides the capability to include both variables and metadata describing those variables, while MPI-IO provides a great deal of flexibility on how to perform I/O in an HPC environment. The CESM and E3SM communities developed a customized software layer, called PIO (Dennis et al. 2012), to better integrate Earth system code with more generic general-purpose libraries, such as PnetCDF. There are also efforts to link Earth system code with the ADIOS library (Lofstead et al. 2008). ADIOS is a meta I/O library that provides a simplified user interface and flexible support for multiple underlying generic I/O back ends. Although ADIOS provides parallel BP file formatting and asynchronous I/O capabilities (Abbasi et al. 2009), integration issues between the current BP format and NetCDF limit its adoption by the ESM community.

More recently, the ESM community has become more interested in in situ data analysis while a simulation is running (Turuncoglu 2018). A series of workshops have been organized to explore and evaluate several aspects of "In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization" (https://vis.lbl.gov/events/ISAV2019/). These workshops cover in situ data analysis infrastructure, system architecture, algorithms, workflow, usability, case studies, etc. A good example of an in situ data analysis platform designed for Earth system simulation is the virtual observation system (Wang et al. 2017). Based on advanced computing technologies, such as compiler-based software analysis, automatic code instrumentation, and high-performance data transport, a virtual observation system was implemented to provide run-time observation and in situ data analytics capability for a terrestrial land model simulation within E3SM.
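The flexibility offered by MPI-IO can be seen in a minimal collective-write sketch in which each rank writes its slice of a distributed field to a shared file at a computed offset. This is raw MPI-IO rather than the PnetCDF or PIO interfaces used by CESM and E3SM; the file name, slice size, and one-dimensional decomposition are assumptions.

    #include <mpi.h>
    #include <vector>

    // Each rank owns a contiguous slice of a global field and writes it with a
    // single collective call; the offset arithmetic shown here is what
    // higher-level libraries hide behind a NetCDF-style variable interface.
    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int nLocal = 1000;                          // cells owned by this rank
        std::vector<double> field(nLocal, (double)rank);  // stand-in model output

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "history.bin",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        MPI_Offset offset = (MPI_Offset)rank * nLocal * sizeof(double);
        MPI_File_write_at_all(fh, offset, field.data(), nLocal,
                              MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }

Libraries such as PnetCDF and PIO wrap this kind of collective I/O so that model developers work with named, self-describing NetCDF variables instead of raw byte offsets.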

5  Challenges and Advances in the Era of Exascale Computing

5.1  Heterogeneity in Computing

The underlying technology of shrinking transistors to advance computing, as described by Moore's Law, has reached its limit. From the HPC perspective, we have witnessed the transition from single central processing units to multicore central processing unit architectures. Depending on the underlying hardware architecture of these multicore computing units, such systems can be categorized as homogeneous (or symmetric) multicore systems or heterogeneous multicore systems. Recently, heterogeneous systems, which integrate various computing units (e.g., GPUs, DSPs, ASSPs, and FPGAs), have brought hope for new ways to address current computing needs. However, heterogeneous systems pose a unique challenge in computing because they contain distinct sets of compute units, each with its own architecture and programming model. Because the instruction set architecture of each of the compute units is different, it becomes challenging to achieve load balancing between them. Furthermore, data consistency and coherency semantics are difficult because each compute unit views the memory spaces disjointedly. NVIDIA's CUDA is one of the solutions that addresses the evolving aspects of GPU-based heterogeneous systems. Other solutions, such as newer open platforms, standards, and architectures like the HSA Foundation, OpenCL, OpenACC, IBM's Liquid Metal, and OpenHPC, have emerged to provide the software stack for heterogeneous systems.

A recent report (Vetter et al. 2019) states that several types of special-purpose accelerated processing units are under development and will play a huge role in the future of computer architectures. It is also likely that these processors will be augmented with diverse types of memory and data storage capabilities. These significant changes are driven by extreme growth in the data-centric machine learning and AI marketplaces. Therefore, it is clear that mainstream hardware is becoming permanently parallel, heterogeneous, and distributed. These changes are here to stay and will disrupt the way we develop computationally intensive ESMs on mainstream architectures.


5.2  Coupling Mechanisms

The coupling of component-based models can reduce the degrees of freedom and provides a way to validate each Earth component. However, the commonly used component-based coupling technology can inject extra numerical noise into the simulation system (Wang et al. 2011). The traditional coupler-enabled software architecture faces challenges on exascale computing platforms. New features within existing components and new Earth system components might change the computing patterns dramatically and create further difficulties for system-wide performance tuning. Besides traditional methods that use profiling and tracing tools to improve communication and computational performance, several new coupling strategies are in development. A good example is the Coupling Approaches for Next-Generation Architectures project (CANGA, www.canga-scidac.org). This project is developing (1) a new approach for assembling ESMs to better use new HPC architectures, (2) new methods for transferring data between models to improve the accuracy and fidelity of the fully coupled system, and (3) enhanced analyses and techniques for integrating multiple components forward in time in a stable and robust manner.

Exascale computers will adopt deep hierarchical architectures. Some early-stage efforts are exploring new I/O methods for ESMs. A good example is the deployment of the Unified Memory and Storage Space (UNITY, www.unity-ssio.org) data framework within the E3SM model. The UNITY framework frees the application (E3SM) from the complexity of directly placing and moving data within multitier storage hierarchies. The UNITY framework has also been tested for data access performance and efficient data sharing within E3SM (e.g., component coupling, visualization, and data durability).

5.3  Accelerator-Based Component Model Implementation

Nowadays, accelerators (such as GPUs and TPUs) are the most popular elements for exascale computing. Some efforts use GPU accelerators to improve the performance of components or individual modules within ESMs. For example, GPU accelerators have been used to increase the performance of several components/submodels within Earth system or weather/climate models, including the ultrahigh-resolution global atmospheric circulation model NICAM (Satoh et al. 2008; Shimokawabe et al. 2010), the community Weather Research and Forecasting Model (Michalakes and Vachharajani 2008), and the ACME Atmosphere model (Norman et al. 2017) on the Summit supercomputer. All of these approaches require extensive data layout and loop restructuring to obtain reasonable performance. The CUDA programming model (Cook 2012) is widely adopted because of strong debugging support and improved performance with the CUDA compiler framework.


The insertion of directives is an alternative method for targeting different accelerators. Programming models using directives, such as OpenACC (Wienke et al. 2012), are helpful in simple configurations that do not require significant source code refactoring. Directives have been used in several accelerator-based model implementations. Typical examples are the Non-Hydrostatic Icosahedral Model (Govett et al. 2010), the COSMO limited-area numerical weather prediction model (Fuhrer et al. 2014), and the Community Atmosphere Model code effort on the Sunway TaihuLight supercomputer (Fu et al. 2016). Source-to-source translation is sometimes used in code refactoring and optimization. For example, Alvanos and Christoudias (2019) used a source-to-source parser written in Python to transform the FORTRAN-produced Atmospheric Chemical Kinetics model into CUDA-accelerated code. Fu et al. (2016) used a source-to-source translator tool to exploit the most suitable parallelism for a computing cluster with accelerators. Scientific software refactoring and performance tuning also attract research interest from the software engineering community. A series of workshops have been organized to facilitate collaboration. Two examples are the International Workshop on Legacy Software Refactoring for Performance (https://refac-ws.gitlab.io/2019/) and the Software Engineering for Science workshop series (https://se4science.org).
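A minimal sketch of the directive-based approach is given below: a generic column-update loop nest is offloaded to an accelerator simply by annotating it with OpenACC pragmas. The decay update and all names are hypothetical; real ports such as those cited above annotate far larger Fortran or C kernels and manage device data movement across many routines.

    #include <vector>

    // A column-physics style loop nest annotated with OpenACC directives: the
    // simple exponential decay below stands in for a real process kernel.  The
    // pragma requests offload of both loops and manages the host/device copy
    // of the tracer array.
    void decayColumns(std::vector<double>& tracer, int nCols, int nLevs,
                      double rate, double dt)
    {
        double* q = tracer.data();
        #pragma acc parallel loop collapse(2) copy(q[0:nCols*nLevs])
        for (int c = 0; c < nCols; ++c)
            for (int k = 0; k < nLevs; ++k)
                q[c * nLevs + k] *= (1.0 - rate * dt);   // decay over one time step
    }

Without an OpenACC-capable compiler the pragma is simply ignored and the loops run serially, which is part of the appeal of the directive-based approach for legacy model code.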

5.4  Artificial Intelligence: Enhanced Numerical Simulations

Fully coupled ESMs require massive computational resources, which in turn limits many practical uses such as data analyses, uncertainty quantification, and process optimization. Proxy models have been developed over the past several decades to alleviate the computational burden. Traditional approaches to developing proxy models include reduced-order models and statistical response surfaces. The advances in AI and machine learning introduce a paradigm shift in how proxy models are developed. These smart proxy models accurately mimic the performance of highly complex numerical simulation models at speeds that are multiple orders of magnitude faster (Amini and Mohaghegh 2019; Vida et al. 2019). Furthermore, AI-based neural models have also been used as a first-order estimation to speed up iterative numerical simulations. For example, Wiewel et al. (2019) propose a method for the data-driven inference of the temporal evolution of physical functions with deep learning. Specifically, they targeted fluid flows (i.e., Navier–Stokes problems) and developed a novel LSTM-based approach to predict the changes of pressure fields over time. Rasp et al. (2018) note that current climate models are too coarse to resolve many of the atmosphere's most important processes and that traditional methods using parameterizations have impeded progress towards more accurate climate predictions for decades. Therefore, they developed data-driven models that use deep learning to leverage the power of short-term cloud-resolving simulations for climate modeling.


5.5  AI-Enabled In Situ Data Analysis

In the era of exascale computing, we will witness more in situ data analysis, since it is an advanced option to save I/O cost, increase prediction accuracy, and increase the utilization of HPC resources. An excellent example is the extreme weather pattern identification work that uses variants of the Tiramisu and DeepLabv3+ neural networks on a high-end computer at Oak Ridge National Laboratory (ORNL) (Kurth et al. 2018). By taking advantage of FP16 tensor cores, a half-precision version of the DeepLabv3+ network achieved a peak throughput of 1.13 EF/s and a sustained throughput of 999.0 PF/s.

To increase the accuracy of ESM prediction and to reduce the uncertainty of model prediction, real-time data assimilation is necessary. The deluge of Internet of Things (IoT) data creates enormous opportunities to collect information on the physical world (Song et al. 2018). We expect the ESM community will extensively explore data assimilation opportunities, especially with the help of deep learning techniques as part of "edge computing." Currently, the cloud is the common option for deploying deep learning-based applications. However, the challenges of cloud-centric IoT systems are increasing because of significant data movement overhead, escalating energy needs, and privacy issues. Rather than continually moving a tremendous amount of raw data to the cloud, it would be beneficial to leverage the emerging powerful IoT devices to perform the inference task. Also, big raw IoT data challenge the traditional supervised training method in the cloud. Therefore, we expect that in situ AI, the autonomous and incremental computing framework and architecture for deep learning-based IoT applications, will soon be adopted by the ESM community.

6  Summary

HPC plays an important role in ESM development. The hardware development of HPC continuously outpaces the software and application development; therefore, how to efficiently use those computing resources remains a challenge for the ESM community. The chapter reviews HPC activities within ESM efforts, including the early phase of ESM development, coupled system simulation, I/O challenges, and in situ data analysis. The chapter then articulates five key challenging issues encountered by the ESM community in the era of exascale computing. These issues cover heterogeneity in computing, coupling mechanisms, accelerator-based model implementation, AI-enhanced numerical simulation, and AI-enabled in situ data analysis.

Acknowledgements  This research was funded by the US Department of Energy (DOE), Office of Science, Biological and Environmental Research (BER) program and Advanced Scientific Computing Research (ASCR) program, and by an ORNL AI initiative. This research used resources of the Oak Ridge Leadership Computing Facility, located in the National Center for Computational Sciences at ORNL, which is managed by UT-Battelle LLC for DOE under contract DE-AC05-00OR22725.


References Abbasi, H., et al. (2009, August). Extending i/o through high performance data services. In 2009 IEEE International Conference on Cluster Computing and Workshops (pp. 1–10). IEEE. Alvanos, M., & Christoudias, T. (2019). Accelerating atmospheric chemical kinetics for climate simulations. IEEE Transactions on Parallel and Distributed Systems, 30(11), 2396–2407. Amini, S., & Mohaghegh, S. (2019). Application of machine learning and artificial intelligence in proxy modeling for fluid flow in porous media. Fluids, 4(3), 126. Armstrong, C. W., Ford, R. W., & Riley, G. D. (2009). Coupling integrated Earth system model components with BFG2. Concurrency and Computation: Practice and Experience, 21(6), 767–791. Balaji, V., et  al. (2006). The Exchange Grid: A mechanism for data exchange between Earth System components on independent grids. In Parallel computational fluid dynamics 2005 (pp. 179–186). Elsevier. Balaji, V., et al. (2017). CPMIP: Measurements of real computational performance of Earth system models in CMIP6. Geoscientific Model Development, 10, 19–34. Cook, S. (2012). CUDA programming: A developer’s guide to parallel computing with GPUs. Oxford: Newnes. Craig, A. P., Vertenstein, M., & Jacob, R. (2011). “A new flexible coupler for earth system modeling developed for CCSM4 and CESM1.” The International Journal of High Performance Computing Applications 26, no. 1 (2012): 31–42. Craig, A.  P., et  al. (2005). CPL6: The new extensible, high performance parallel coupler for the Community Climate System Model. The International Journal of High Performance Computing Applications, 19(3), 309–327. Dennis, J. M., et al. (2012). An application-level parallel I/O library for Earth system models. The International Journal of High Performance Computing Applications, 26(1), 43–53. Flato, G. M. (2011). Earth system models: An overview. Wiley Interdisciplinary Reviews: Climate Change, 2(6), 783–800. Fu, H., et al. (2016). Refactoring and optimizing the community atmosphere model (CAM) on the Sunway TaihuLight supercomputer. In SC’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE. Fuhrer, O., et al. (2014). Towards a performance portable, architecture agnostic implementation strategy for weather and climate models. Supercomputing Frontiers and Innovations, 1(1), 45–62. Govett, M. W., Middlecoff, J., & Henderson, T. (2010). Running the NIM next-generation weather model on GPUs. In 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing. IEEE. Hansen, J., et al. (1984). Climate sensitivity: Analysis of feedback mechanisms. In J. E. Hansen & T. Takhashi (Eds.), Climate processes and climate sensitivity, Geophysical Monograph 29 (pp. 130–163). Washington, DC: American Geophysical Union. Hill, C., et al. (2004). The architecture of the earth system modeling framework. Computing in Science & Engineering, 6(1), 18–28. Ji, Y., Zhang, Y., & Yang, G. (2014). Interpolation oriented parallel communication to optimize coupling in earth system modeling. Frontiers of Computer Science, 8(4), 693–708. Kurth, T., et  al. (2018). Exascale deep learning for climate analytics. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE. Larson, J., Jacob, R., & Ong, E. (2005). The model coupling toolkit: A new Fortran90 toolkit for building multiphysics parallel coupled models. The International Journal of High Performance Computing Applications, 19(3), 277–292. Li, J., et  al. 
(2003). Parallel netCDF: A high-performance scientific I/O interface. In SC’03: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing. IEEE. Liu, L., et  al. (2014). C-Coupler1: A Chinese community coupler for Earth system modeling. Geoscientific Model Development, 7(5), 2281–2302.


Lofstead, J. F., et al. (2008). Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS). In Proceedings of the 6th International Workshop on Challenges of Large Applications in Distributed Environments. Manabe, S. (1969). Climate and the ocean circulation: I.  The atmospheric circulation and the hydrology of the earth’s surface. Monthly Weather Review, 97(11), 739–774. McGuffie, K., & Henderson-Sellers, A. (2001). Forty years of numerical climate modelling. International Journal of Climatology: A Journal of the Royal Meteorological Society, 21(9), 1067–1109. Meehl, G. A. (1984). Modeling the earth’s climate. Climatic Change, 6(3), 259–286. Michalakes, J., & Vachharajani, M. (2008). GPU acceleration of numerical weather prediction. Parallel Processing Letters, 18(04), 531–548. Morgenstern, O., et al. (2017). Review of the global models used within phase 1 of the Chemistry-­ Climate Model Initiative (CCMI). Geoscientific Model Development, 10(2), 639–671. https:// doi.org/10.5194/gmd-10-639-2017 Norman, M. R., Mametjanov, A., & Taylor, M. (2017). Exascale programming approaches for the accelerated model for climate and energy. In T. P. Straatsma, K. B. Antypas, & T. J. Williams (Eds.), Exascale scientific applications: Scalability and performance portability. New York: Chapman and Hall. Pitman, A. J. (2003). The evolution of, and revolution in, land surface schemes designed for climate models. International Journal of Climatology: A Journal of the Royal Meteorological Society, 23(5), 479–510. Randall, D.  A., et  al. (2019). 100 years of earth system model development. Meteorological Monographs, 59, 12.1–12.66. Rasp, S., Pritchard, M. S., & Gentine, P. (2018). Deep learning to represent subgrid processes in climate models. Proceedings of the National Academy of Sciences, 115(39), 9684–9689. Rew, R., & Davis, G. (1990). NetCDF: an interface for scientific data access. IEEE Computer Graphics and Applications, 10(4), 76–82. Satoh, M., et al. (2008). Nonhydrostatic icosahedral atmospheric model (NICAM) for global cloud resolving simulations. Journal of Computational Physics, 227(7), 3486–3514. Shimokawabe, T., et al. (2010). An 80-fold speedup, 15.0 TFlops full GPU acceleration of non-­ hydrostatic weather model ASUCA production code. In SC’10: Proceedings of the 2010 ACM/ IEEE International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE. Shine, K., & Henderson-Sellers, A. (1983). Modelling climate and the nature of climate models: A review. Journal of Climatology, 3(1), 81–94. Song, M., et al. (2018). In-situ AI: Towards autonomous and incremental deep learning for IoT systems. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE. Thakur, R., Lusk, E., & Gropp, W. (1997). Users guide for ROMIO: A high-performance, portable MPI-IO implementation. Lemont, IL: Argonne National Lab. Turuncoglu, U. U. (2018). Towards in-situ visualization integrated Earth System Models: RegESM 1.1 regional modeling system. Geoscientific Model Development Discussions. https://doi. org/10.5194/gmd-2018-179 Valcke, S. (2013). The OASIS3 coupler: A European climate modelling community software. Geoscientific Model Development, 6(2), 373. Valcke, S., Redler, R., & Budich, R. (2011). Earth system modelling-volume 3: Coupling software and strategies. Heidelberg: Springer. Valcke, S., et al. (2012). Coupling technologies for earth system modelling. Geoscientific Model Development, 5(6), 1589–1596. Vetter, J. S., et al. 
(2019). Extreme heterogeneity 2018-productive computational science in the era of extreme heterogeneity: Report for DOE ASCR workshop on extreme heterogeneity. Berkeley, CA: Lawrence Berkeley National Lab (LBNL).


Vida, G., Shahab, M. D., & Mohammad, M. (2019). Smart proxy modeling of SACROC CO2-­ EOR. Fluids, 4(2), 85. Wang, D., Post, W. M., & Wilson, B. E. (2011). Climate change modeling: Computational opportunities and challenges. Computing in Science and Engineering, 13(5), 36–42. Wang, D., et  al. (2017). Virtual observation system for earth system model: An application to ACME land model simulations. International Journal of Advanced Computer Science and Applications, 8(2), 171–175. Warner, J. C., Perlin, N., & Skyllingstad, E. D. (2008). Using the Model Coupling Toolkit to couple earth system models. Environmental Modelling and Software, 23(10–11), 1240–1249. Weart, S. (2019). General circulation models of climate, in: The discovery of global warming. Retrieved from: https://www.aip.org/history/climate/GCM.htm Wienke, S., et al. (2012). OpenACC—first experiences with real-world applications. In European Conference on Parallel Processing. Springer. Wiewel, S., Becher, M., & Thuerey, N. (2019). Latent space physics: Towards learning the temporal evolution of fluid flow. In Computer graphics forum. Wiley Online Library.

Chapter 11

High-Performance Pareto-Based Optimization Model for Spatial Land Use Allocation

Xiaoya Ma, Xiang Zhao, Ping Jiang, and Yuangang Liu

Abstract  Spatial land use allocation is often formulated as a complex multiobjective optimization problem. As effective tools for multiobjective optimization, Pareto-based heuristic optimization algorithms, such as genetic, artificial immune system, particle swarm optimization, and ant colony optimization algorithms, have been introduced to support trade-off analysis and posterior stakeholder involvement in land use decision making. However, these algorithms are extremely time consuming, and minimizing the computational time has become one of the largest challenges in obtaining the Pareto frontier in spatial land use allocation problems. To improve the efficiency of these algorithms and better support multiobjective decision making in land use planning, high-performance Pareto-based optimization algorithms for shared-memory and distributed-memory computing platforms were developed in this study. The OpenMP and Message Passing Interface (MPI) parallel programming technologies were employed to implement the shared-memory and distributed-memory parallel models, respectively, in the Pareto-based optimization algorithm. Experiments show that both the shared-memory and message-passing parallel models can effectively accelerate multiobjective spatial land use allocation models. The shared-memory model achieves satisfactory performance when the number of CPU cores used for computing is less than 8. Conversely, the message-passing model displays better scalability than the shared-memory model when the number of CPU cores used for computing is greater than 8.

X. Ma School of Geosciences, Yangtze University, Wuhan, Hubei, China Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources, Shenzhen, Guangdong, China X. Zhao (*) · P. Jiang School of Resource and Environmental Science, Wuhan University, Wuhan, Hubei, China e-mail: [email protected] Y. Liu School of Geosciences, Yangtze University, Wuhan, Hubei, China © Springer Nature Switzerland AG 2020 W. Tang, S. Wang (eds.), High Performance Computing for Geospatial Applications, Geotechnologies and the Environment 23, https://doi.org/10.1007/978-3-030-47998-5_11


Keywords  Pareto-based optimization · Land use allocation · High-performance computing · Spatial optimization

1  Introduction As one of the most important tasks in land use planning, spatial land use allocation (SLUA) plays a crucial role in promoting the land use efficiency and protecting land resources. SLUA aims to optimize the spatial pattern of regional land use based on the present spatial land use patterns and future land resource demand (Memmah et al. 2015). To achieve this goal, SLUA focuses on the process of allocating different land use requests to specific spatial land use units according to the natural, economic, and social characteristics of the land resources to maximize the land use objectives (Stewart et  al. 2004). Because SLUA problems must address complex objectives and with geospatial data, they are often defined as complex multiobjective combinatorial spatial optimization problems (Liu et al. 2012). Due to their strong ability to address high-dimensional nonlinear problems, heuristic algorithms have become the main search method used in solving SLUA problems (Memmah et al. 2015). Most of the SLUA models proposed in recent years have been based on heuristic algorithms, such as genetic algorithms (Matthews et al. 1999; Porta et al. 2013; Stewart et al. 2004), particle swarm optimization (Liu et al. 2017; Masoomi et al. 2013), ant colony algorithms (Liu et al. 2012; Mousa and El Desoky 2013), artificial immune algorithms (Huang et  al. 2013; Zhao et  al. 2019), and artificial bee colony algorithms (Shao et al. 2015a; Yang et al. 2015). Multiobjective heuristic optimization algorithms, which have been widely used in solving SLUA problems, can be divided into two categories: scalarization optimization methods and Pareto-based optimization methods (Kaim et  al. 2018). The former usually combine multiple objective functions into a single objection function by using the weighted-sum approach; thus, multiobjective SLUA problems are solved as single-objective problems (Porta et al. 2013; Sante-Riveira et al. 2008). The trade-offs between different land use objectives (e.g., economic, ecological, and social objectives) are assessed by weighting coefficients in the weighted-sum approach, thereby allowing decision makers to obtain an optimal SLUA solution for specific decision scenarios or preferences. However, it is sometimes difficult for decision makers to properly determine the weights of the nonlinear optimization objectives in an uncertain decision environment. Alternatively, Pareto-based optimization methods use a Pareto frontier composed of a set of optimal solutions to support trade-off analyses in multiobjective decision making. Compared with scalarization methods, Pareto-based optimization methods are able to find the entire Pareto frontier at one time and perform trade-off analyses with more and better alternatives than those based on scalarization (Deb et al. 2002; Shang et  al. 2012). Pareto-based multiobjective algorithms have been used to analyze the trade-offs among various land use objectives in recent SLUA modeling


studies (Duh and Brown 2007; Hou et al. 2014; Huang et al. 2013; Li and Parrott 2016; Shao et al. 2015b). Unlike scalarization optimization algorithms, which only obtain a unique optimal solution, Pareto-based optimization algorithms search for a set of Pareto optimal solutions in a large search space. Using Pareto-based heuristic optimization algorithms to search for optimal solutions in SLUA problems is a time consuming and challenging task (Huang et al. 2013). Thus, the main objective of this study is to develop a high-performance Pareto-based SLUA optimization algorithm to promote the efficacy of SLUA models and provide better support for multiobjective trade-off analysis under uncertain land use decision-making scenarios. In the past decades, many Pareto-based heuristic optimization algorithms have been developed to solve multiobjective optimization problems, such as nondominated sorting genetic algorithm II (NSGA- II) and strength Pareto evolutionary algorithm 2 (SPEA-2). Because previous studies have already shown that multiobjective artificial immune system have a strong ability to obtain effective spreading and convergence characteristics for the Pareto optimal front in multiobjective optimization problems (Huang et al. 2013; Shang et al. 2012). Consequently, we chose a multiobjective artificial immune algorithm as the fundamental optimization method to build our optimization model. The remainder of this paper is organized as follows. Section 2 presents a brief review of heuristic algorithms and high-performance computing in solving SLUA problems. Section 3 introduces the basic concepts and methodology used in this study. Section 4 provides a case study, and the performance of the proposed model is evaluated and discussed in this section. Section 5 concludes the paper and introduces future works.

2  A  Brief Review of High-Performance Computing in SLUA Modeling There is no doubt that heuristic algorithms have achieved great success in solving SLUA problems. Nevertheless, one of the biggest challenges faced by heuristic models is that as the scale of the SLUA problem grows, the computing time required to obtain satisfactory optimal solutions becomes unbearable, especially for Pareto-­ based optimization methods (Memmah et al. 2015). Consequently, due to the constraints of computing resources, heuristic models in early studies were only applied in SLUA problems with no more than 1000 decision variables (Duh and Brown 2007; Matthews et al. 1999; Stewart et al. 2004). As computers have become faster, heuristic algorithm-based models have been used to solve large-scale SLUA problems in recent years (see Table 11.1). Obviously, solving SLUA problems is a computationally intensive task. Furthermore, with increases in the numbers of decision variables and optimization objectives, the search space of SLUA problems can grow exponentially (Memmah et al. 2015). Table 11.1 shows that even scenarios with tens


Table 11.1  Representative examples using heuristic methods for land use optimization

Reference              #Decision variables  #Objectives  Algorithm          Optimization method  Iterations  Time (hours)
Cao et al. (2012)      16,779               3            NSGA-II            Pareto based         5000        5
Shaygan et al. (2014)  94,000               2            NSGA-II            Pareto based         10,000      84
Liu et al. (2015)      281,736              2            Genetic algorithm  Scalarization        40,000      17
Li and Parrott (2016)  174,484              3            Genetic algorithm  Pareto based         2897        9
Garcia et al. (2017)   13,431               3            NSGA-II            Pareto based         150         12

Table 11.2  High-performance SLUA models in recent studies

Reference            Optimization method  Number of decision variables  Parallelization technique  Max number of threads/processes used  Max speedup
Cao and Ye (2013)    Scalarization        586                           Shared memory              4                                     ≈2.5
Porta et al. (2013)  Scalarization        86,130                        Message passing (MPJ)      32                                    ≈9.6
Zhang et al. (2015)  Scalarization        73,396                        Message passing (MPI)      64                                    ≈42.7
Sante et al. (2016)  Scalarization        84,216                        Shared memory (OpenMP)     7                                     ≈5.4

of thousands of decision variables require thousands or even tens of thousands of iterations and many hours to obtain optimal solutions. It is foreseeable that as the problem scale increases further, the computational costs of modeling may become unaccepted. Consequently, parallel computing technology, which can use multiple cores or CPUs to improve the computational efficiency and reduce the computational time, has been employed to accelerate the optimization process of SLUA in recent years. Table 11.2 summarizes the applications of high-performance computing to solve SLUA problems in recent studies. Experiments in recent studies demonstrated that both the quality of the optimal solutions and the efficacy of the parallel algorithm have been improved compared with the features of traditional sequential algorithms. The speedup and efficiency of the parallel algorithms (see Table 11.2) developed in these studies indicate that parallel computing is a promising approach for efficient and effective SLUA modeling (Memmah et  al. 2015). Although previous studies have made many successful attempts to parallelize SLUA models and obtained notable results, the existing parallel SLUA models still have some disadvantages in supporting multiobjective land use decision making. As Table 11.2 shows, the multiobjective optimization methods used in previous parallel SLUA models were based on scalarization methods (Cao


and Ye 2013; Porta et al. 2013; Sante et al. 2016). To perform trade-off analysis in land use planning, decision makers must adjust the weights of each objective in these models to simulate different decision preferences or scenarios. Consequently, these models must run many times to obtain optimal solutions according to specific decision preferences or scenarios. Overall, high-performance computing technology strengthens the ability of optimization models to solve large-scale SLUA problems and provides powerful computing support for land use decision making. However, most of the high-performance optimization models used to solve multiobjective SLUA problems in previous studies were based on scalarization methods, which are inappropriate for trade-off analysis and posterior stakeholder involvement scenarios (Kaim et  al. 2018). Consequently, to better support multiobjective decision making and trade-off analysis in land use planning, a high-performance Pareto-based optimization algorithm for solving multiobjective SLUA problems is urgently needed.

3  Methodology

3.1  Definitions and Basic Concepts

3.1.1  Multiobjective Optimization

Multiobjective optimization problems can be formulated as follows:

\begin{aligned}
\max \; \mathbf{y} = F(\mathbf{x}) &= \big( f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x}) \big) \\
\text{s.t.} \quad g_i(\mathbf{x}) &\le 0, \quad i = 1, 2, \ldots, p \\
h_j(\mathbf{x}) &= 0, \quad j = 1, 2, \ldots, q \\
\mathbf{x} = (x_1, x_2, \ldots, x_n) &\in X \subset \mathbb{R}^n \\
\mathbf{y} = (y_1, y_2, \ldots, y_m) &\in Y \subset \mathbb{R}^m
\end{aligned} \tag{11.1}



where x is the decision vector and X is the n-dimensional decision space. F is an m-dimensional vector that contains the m objective values of the optimization problem. All the constraints can be defined by p inequalities and q equalities. Only decision vectors x ∈ X that satisfy all constraints gi(x) ≤ 0 (i = 1, 2, …, p) and hj(x) = 0 (j = 1, 2, …, q) belong to the feasible set Xf ⊆ X. Following this definition, a feasible decision vector xA ∈ X is said to dominate (Pareto dominance) another feasible decision vector xB ∈ X (denoted as xA ≻ xB) only if Eq. (11.2) is satisfied.




\big( \forall i \in \{1, 2, \ldots, m\} : f_i(\mathbf{x}^A) \ge f_i(\mathbf{x}^B) \big) \wedge \big( \exists k \in \{1, 2, \ldots, m\} : f_k(\mathbf{x}^A) > f_k(\mathbf{x}^B) \big) \tag{11.2}

In contrast, a solution x∗ is said to be a nondominated, or Pareto optimal, solution if it satisfies Eq. (11.3).

\neg \exists\, \mathbf{x} \in X_f : \mathbf{x} \succ \mathbf{x}^{*} \tag{11.3}
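To make the dominance relation of Eqs. (11.2) and (11.3) concrete, a minimal C++ sketch is given below; it is an illustration rather than the authors' implementation, the type and function names are ours, and all objectives are assumed to be maximized as in Eq. (11.1).

#include <cstddef>
#include <vector>

// Objective vector of one solution; larger values are better (maximization).
using Objectives = std::vector<double>;

// True if a dominates b in the sense of Eq. (11.2): a is no worse in every
// objective and strictly better in at least one.
bool dominates(const Objectives& a, const Objectives& b) {
    bool strictlyBetter = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] < b[i]) return false;       // worse in some objective
        if (a[i] > b[i]) strictlyBetter = true;
    }
    return strictlyBetter;
}

// Indices of the nondominated solutions of a population, i.e., those
// satisfying Eq. (11.3) with respect to the given set.
std::vector<std::size_t> nondominatedIndices(const std::vector<Objectives>& pop) {
    std::vector<std::size_t> front;
    for (std::size_t i = 0; i < pop.size(); ++i) {
        bool isDominated = false;
        for (std::size_t j = 0; j < pop.size() && !isDominated; ++j) {
            if (j != i && dominates(pop[j], pop[i])) isDominated = true;
        }
        if (!isDominated) front.push_back(i);
    }
    return front;
}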

The Pareto optimal set Ps is the set of Pareto optimal solutions to an optimization problem, and the set PF of all the objective values corresponding to the solutions in Ps is called the Pareto optimal front. Pareto-based optimization algorithms search for the Pareto optimal solutions of multiobjective problems instead of combining multiple objectives into a single objective to obtain a unique optimal solution.

3.1.2  Multiobjective Artificial Immune Algorithm

The artificial immune system (AIS) is a type of robust heuristic algorithm, which was developed based on the principles of natural immune systems (de Castro and Timmis 2003). In past decades, many AIS algorithms have been developed based on different immune mechanisms and theories for various purposes. For example, negative selection algorithms are often applied in information security (Ji and Dasgupta 2004), and clonal selection algorithms are usually used for optimization and machine learning (de Castro and Von Zuben 2002). Therefore, we use the multiobjective clonal selection algorithm to implement the parallel optimization model in this study. Figure 11.1 illustrates the basic steps of the algorithm.

Fig. 11.1  Flowchart of the multiobjective AIS algorithm


(1) The first step is encoding, which aims to represent the feasible solutions of the optimization problem as the artificial individuals (antibodies) of the algorithm. Thus, spatial decision variables in the solution are represented as virtual units (genes) in AIS. (2) The second step is initialization, which randomly generates the initial antibody population with N antibodies for the n decision variables. (3) The third step is evaluation, which uses the objective functions to calculate the objective vector for each antibody. (4) The fourth step is nondominated sorting, in which the antibodies in AIS are sorted based on the principles of nondominated ranking and their crowding distance (Deb et al. 2002). Then, the entire population is divided into two subpopulations: a dominated population and a nondominated population. If the size of the nondominated population exceeds its expected size, i.e., Nnon > Nn, then the antibody with the smallest crowding distance is deleted; the deletion is repeated until Nnon = Nn, where Nn is the expected size of the nondominated population. (5) If It reaches the maximum iteration gmax, then the algorithm is terminated; otherwise, It is set to It = It + 1, and the algorithm proceeds to the next step. (6) The sixth step involves cloning, in which each antibody in the nondominated population is cloned q times, where q denotes the cloning rate. (7) In hypermutation, gene values in antibodies are altered to produce new individuals for the next generation and obtain better solutions. Steps 3–7 above are repeated until the termination condition is met. The details of the multiobjective clonal selection algorithm can be found in previous studies (Deb et al. 2002; Shang et al. 2012).
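The control flow of steps 1–7 can be summarized in a compact, runnable C++ sketch. The encoding, objectives, and hypermutation below are toy placeholders (and the crowding-distance truncation of step 4 is replaced by a simple size cap), so this only illustrates the loop structure, not the SLUA operators of Sects. 3.2 and 3.3.

#include <cstddef>
#include <random>
#include <vector>

// Toy illustration of the clonal selection loop (steps 1-7 above).
struct Antibody {
    std::vector<int> genes;          // step 1: one gene (land use code) per cell
    std::vector<double> objectives;  // step 3: objective vector (maximized)
};

std::mt19937 rng(42);

bool dominates(const Antibody& a, const Antibody& b) {
    bool strict = false;
    for (std::size_t i = 0; i < a.objectives.size(); ++i) {
        if (a.objectives[i] < b.objectives[i]) return false;
        if (a.objectives[i] > b.objectives[i]) strict = true;
    }
    return strict;
}

void evaluate(Antibody& ab) {        // placeholder objectives, not f1/f2 of Sect. 3.2
    double s = 0.0, c = 0.0;
    for (std::size_t i = 0; i < ab.genes.size(); ++i) {
        s += ab.genes[i] % 5;
        if (i > 0 && ab.genes[i] == ab.genes[i - 1]) c += 1.0;
    }
    ab.objectives = {s, c};
}

std::vector<Antibody> nondominatedSubset(const std::vector<Antibody>& pop) {
    std::vector<Antibody> front;     // step 4 (real model truncates by crowding distance)
    for (const auto& a : pop) {
        bool dominated = false;
        for (const auto& b : pop)
            if (dominates(b, a)) { dominated = true; break; }
        if (!dominated) front.push_back(a);
    }
    return front;
}

int main() {
    const int N = 40, nGenes = 100, q = 4, gmax = 200, Nn = 96;
    std::uniform_int_distribution<int> landUse(1, 5), cell(0, nGenes - 1);

    std::vector<Antibody> pop(N);                        // step 2: initialization
    for (auto& ab : pop) {
        ab.genes.resize(nGenes);
        for (auto& g : ab.genes) g = landUse(rng);
        evaluate(ab);                                    // step 3: evaluation
    }
    std::vector<Antibody> archive = nondominatedSubset(pop);

    for (int it = 0; it < gmax; ++it) {                  // step 5: termination test
        std::vector<Antibody> offspring = archive;       // keep current nondominated set
        for (const auto& parent : archive) {
            for (int k = 0; k < q; ++k) {                // step 6: clone q times
                Antibody child = parent;
                child.genes[cell(rng)] = landUse(rng);   // step 7: (random) hypermutation
                evaluate(child);
                offspring.push_back(child);
            }
        }
        archive = nondominatedSubset(offspring);         // step 4 on the new generation
        if (archive.size() > static_cast<std::size_t>(Nn))
            archive.resize(Nn);                          // crude stand-in for Nnon = Nn
    }
    return 0;                                            // archive approximates the Pareto set
}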

3.2  Formulation of SLUA Problems

SLUA focuses on optimizing the spatial patterns of land use by regulating the use of each land use unit at the microlevel to meet the needs of future land use. As a type of combinatorial optimization problem, we define SLUA problems as follows:

Objective functions:

f_1(\mathbf{x}) = \max\left( \frac{\sum_{i=1}^{n} S_i}{n} \right) \tag{11.4}

f_2(\mathbf{x}) = \max\left( \frac{\sum_{j=1}^{m} C_j}{m} \right) \tag{11.5}

C_j = \frac{4\sqrt{A_j}}{P_j} \times 100 \tag{11.6}

subject to:

\sum_{i=1}^{N} A_i \times l_{ik} = A_k \tag{11.7}





l_{ik} = \begin{cases} 1, & l_i = k \\ 0, & l_i \neq k \end{cases} \tag{11.8}

\sum_{k=1}^{K} l_{ik} = 1 \tag{11.9}

where objective function f1(x) aims to maximize the land use suitability of the solution of SLUA and f2(x) seeks to maximize the spatial compactness of the optimal solutions. Both f1(x) and f2(x) have been widely used in previous studies (Huang et al. 2013; Porta et al. 2013; Sante et al. 2016). n is the number of land use units in the study area. m is the number of land use patches in a feasible solution. A patch is composed of a set of adjacent land use units that have the same type of land use. Si is the suitability of the ith land use unit according to the allocated land use in a solution. Land use suitability indicates whether a land use unit is appropriate for given land uses according to its natural, economic, and social characteristics (Malczewski 2004). Land use suitability varies with different land uses in the same land use unit, and the corresponding value in this study is in the range of [0, 100]. Cj is a landscape index used to measure the compactness of patch. Aj and Pj are the area and perimeter of the jth patch, respectively. Ak is the total area of the kth land use type, which is determined in the quantity allocation of land resources. li is the allocated land use in the ith land use unit. In this study, the land use types are encoded with integers from 1 to 5, which represent arable land, orchard land, forest, grassland, and built-up land. Equation (11.7) ensures that the area of the kth land use in the optimized spatial land use scheme equals the value determined in the quantity allocation of land resources. Equations (11.8) and (11.9) are constrained such that each land use unit can only be allocated for one land use type.
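As an illustration of how these objectives might be computed for a raster-encoded solution, a C++ sketch follows. The names are ours; it assumes that land use codes index a precomputed suitability table, that patches have already been delineated by connected-component labeling of same-use cells, and that the compactness index takes the square-root form given in Eq. (11.6).

#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative evaluation of f1 and f2 for one raster-encoded solution.
// suit[k][i]: 0-100 suitability of cell i for land use k (codes index rows directly).
// alloc[i]: land use allocated to cell i.

// f1 (Eq. 11.4): mean suitability of the allocated land uses over the n cells.
double suitabilityObjective(const std::vector<std::vector<double>>& suit,
                            const std::vector<int>& alloc) {
    double total = 0.0;
    for (std::size_t i = 0; i < alloc.size(); ++i)
        total += suit[alloc[i]][i];
    return total / static_cast<double>(alloc.size());
}

// Compactness of one patch (Eq. 11.6): 100 for a square patch, smaller otherwise.
double patchCompactness(double area, double perimeter) {
    return 4.0 * std::sqrt(area) / perimeter * 100.0;
}

// f2 (Eq. 11.5): mean compactness over the m patches of the solution.
double compactnessObjective(const std::vector<double>& patchArea,
                            const std::vector<double>& patchPerimeter) {
    double total = 0.0;
    for (std::size_t j = 0; j < patchArea.size(); ++j)
        total += patchCompactness(patchArea[j], patchPerimeter[j]);
    return total / static_cast<double>(patchArea.size());
}

Because each antibody is evaluated independently, a loop over the population that calls such functions carries no data dependence and can be shared among threads with a single OpenMP directive (e.g., #pragma omp parallel for), which is the kind of loop-level strategy the shared-memory model described later relies on.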

3.3  Multiobjective AIS for SLUA 3.3.1  Encoding and Initialization Land use units in SLUA problems can be represented by two types of data models: vector-based (Cao and Ye 2013; Sante et al. 2016) and raster-based (Cao et al. 2012) models. Compared with vector-based models, raster-based land use unit models are more efficient and effective in optimizing spatial land use patterns (Huang et  al. 2013). Moreover, the area of each cell in raster-based models is equal, and the data structure of raster-based models is simpler than that of the polygons in vector-based models. Consequently, raster-based methods can greatly reduce the complexity of the local search algorithm in an optimization model. Therefore, land use units are represented by a raster-based data model in this study (Fig. 11.2). To address spatially explicit land use allocation problems, the initialization and hypermutation algorithms in the clonal selection algorithm were redesigned to


Fig. 11.2  Encoding solution for SLUA problems

Fig. 11.3  Improved initialization operation of AIS for solving SLUA problems

improve the efficiency of the overall algorithm and meet the demands of SLUA modeling. The details of these improved operations are as follows. SLUA problems are complex geospatial problems, and land use allocation must follow certain rules, such as land use suitability rules. Moreover, initialization has crucial impacts on the performance of a model (Duh and Brown 2007; Liu et al. 2016; Yang et  al. 2018). Consequently, we improved the stochastic initialization algorithm in AIS to obtain reasonable initial land use allocation schemes. The principles of the improved initialization algorithm are illustrated in Fig. 11.3. Before performing the allocation operation, we build two lists: the location list of the unallocated land use units (denoted as U) and the land use requirement list L. All of the unallocated land use units in the study area are kept in list U. The elements Lk in list L represent the number of cells to be allocated for the kth land use. This value can be obtained by using the following formula:

L_k = \mathrm{Round}\!\left( \frac{A_k}{A_c} \right) \tag{11.10}

where Ak is the area of the kth land use, which is determined in the quantity allocation of land resources, and Ac is the area of each cell. Overall, the initialization operation includes the following steps: (1) randomly select an unallocated land unit Ui from list U; (2) when Lk > 0, obtain the land use suitability information for land use type k; (3) based on the land use suitability information, build a roulette wheel to determine which land use is the winner (denoted as w) in land use unit Ui; and (4) allocate w to Ui and remove Ui from list U. Then, set Lk = Lk − 1. The steps above are repeated until all the element values in L equal 0. Then, a new antibody is initialized. All the antibodies generated by using the operations stated above must meet the constraints defined in formulas (11.7)–(11.9).

3.3.2  Mutation

Mutation is the only way to generate new individuals in AIS; it often alters the values of the genes of an antibody randomly in classical models. To improve the efficacy of the optimization algorithm, we adopt a greedy strategy to improve the mutation operation. In this study, for any arbitrary cell C, if the number of land use types in the 3 × 3 Moore neighborhood of C is greater than 1, then C and the other 8 neighborhood cells are edge cells. Otherwise, C is an interior cell. Based on these assumptions, Fig. 11.4 illustrates the principles and flowchart of the improved mutation operation. For a given antibody Ab that the mutation operation is performed on, the main steps of the mutation are as follows. (1) Build the edge cell list for each land use type, denoted as E. Each element in E represents the location of edge cells. (2) Build the list of alternative land uses AL. For any arbitrary element Ei, assuming that the current land use type of Ei is x, the suitability for the xth land use at location Ei is Six, and the compactness of the patch that Ei belongs to is Cix. For a given land use type y, if Siy > Six or Ciy > Cix, then y is an alternative land use type for Ei. In this case, the location of Ei is added to list ALy. (3) Randomly select gene Ei from list E; if rnd …

P_i = \begin{cases} N_{non} / (N_p - 1), & i > N_{non} \bmod (N_p - 1) \\ N_{non} / (N_p - 1) + 1, & i \le N_{non} \bmod (N_p - 1) \end{cases} \tag{11.12}

where i is the rank ID of the slaves and i ∈ [1, Np − 1]. Pi is the number of antibodies in the ith subset, which will be distributed to the ith slave process. Nnon is the number of nondominated antibodies in the master process. (4) After receiving nondominated antibodies from the master, slaves are activated to perform cloning, mutation, and evaluation operations and produce a new generation of antibodies. (5) The new antibodies produced by slaves are sent to the master, and the slave processes are suspended again and wait to receive the next generation of nondominated antibodies. (6) The master gathers all the new antibodies generated by slaves and performs the nondominated sorting of these antibodies. Additionally, some nondominated antibodies will be removed based on their crowding distance by the master process. (7) The master process determine whether the terminal condition is satisfied. If yes, then the optimal results are output, and the master and slave processes are terminated. Otherwise, steps 3–6 are repeated until the terminal condition is satisfied.
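A minimal MPI sketch of this master-slave exchange pattern is shown below. It is illustrative only and not the authors' code: antibodies are reduced to fixed-length integer gene arrays, the evolutionary operators and nondominated sorting are left as comments, and at least two processes are assumed.

#include <mpi.h>
#include <vector>

// Toy illustration of the master-slave communication pattern described above.
const int GENES = 100;                 // genes per antibody (toy size)
const int TAG_WORK = 1, TAG_RESULT = 2;

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, np;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    const int nNon = 96;               // size of the nondominated population (Nnon)
    const int q = 4;                   // cloning rate
    const int gmax = 10;               // iterations, kept small for illustration

    if (rank == 0) {                   // master: manages the archive and the workflow
        std::vector<int> archive(nNon * GENES, 1);
        for (int it = 0; it < gmax; ++it) {
            // step 3: partition the nondominated antibodies among np-1 slaves (Eq. 11.12)
            int base = nNon / (np - 1), extra = nNon % (np - 1), offset = 0;
            for (int i = 1; i < np; ++i) {
                int Pi = base + (i <= extra ? 1 : 0);
                MPI_Send(&archive[offset * GENES], Pi * GENES, MPI_INT,
                         i, TAG_WORK, MPI_COMM_WORLD);
                offset += Pi;
            }
            // step 6: gather the offspring returned by the slaves
            std::vector<int> offspring;
            for (int i = 1; i < np; ++i) {
                int Pi = base + (i <= extra ? 1 : 0);
                std::vector<int> buf(q * Pi * GENES);
                MPI_Recv(buf.data(), static_cast<int>(buf.size()), MPI_INT, i,
                         TAG_RESULT, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                offspring.insert(offspring.end(), buf.begin(), buf.end());
            }
            // steps 6-7: nondominated sorting, crowding-distance truncation, and the
            // termination test would update `archive` here.
        }
    } else {                           // slave: steps 4-5, clone/mutate/evaluate and send back
        for (int it = 0; it < gmax; ++it) {
            MPI_Status st;
            MPI_Probe(0, TAG_WORK, MPI_COMM_WORLD, &st);
            int n = 0;
            MPI_Get_count(&st, MPI_INT, &n);
            std::vector<int> work(n);
            MPI_Recv(work.data(), n, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            std::vector<int> result;
            for (int c = 0; c < q; ++c)    // clone q times; mutation and evaluation omitted
                result.insert(result.end(), work.begin(), work.end());
            MPI_Send(result.data(), static_cast<int>(result.size()), MPI_INT, 0,
                     TAG_RESULT, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}

Because the master blocks in MPI_Recv while the slaves compute, its core contributes little to the computation itself, which is one reason a master-slave design pays off only when enough slave processes participate.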

4  Case Study 4.1  Study Area and Data Anyue County, which is located in Sichuan Province (Fig. 11.7) and has an area of 2710 km2, was chosen as the study area to evaluate the performance of the proposed model. Anyue has experienced rapid urbanization over the past decade. Additionally, agricultural reform has played an essential role in promoting farmers’ incomes in this area and has influenced changes in spatial land use patterns. The parallel Pareto AIS algorithms we developed were used to generate alternative SLUA schemes to support land use planning and decision making in this area. We obtained statistical yearbook data and land use data from 2015, and the results of land use suitability evaluations from different departments of the local county governments were obtained. The land use data in the study area were rasterized into 100 × 100 m cells corresponding to a land use raster with 708 rows and 778 columns. Suitability of each land use is represented within a range from 0 to 100 (0 means completely unsuitable for a specific land use and 100 is the highest suitability). The suitability data have the same spatial resolution and spatial reference systems as the land use data (Fig. 11.8). In this study, we take 2015 as the base year of land use planning and use our model to optimize SLUA in the study area in 2025. Because the allocation of transportation land is achieved with special infrastructure construction planning, transportation land is not included in the optimization in this study. In addition, we assume that the distribution of water will not change during the planning period to


Fig. 11.7  Location of the study area: Anyue County, Sichuan Province, China

protect the ecosystem. Consequently, only cropland, orchards, forest, grassland, built-up areas are included in this case study, and the number of decision variables is 259,097 in the study area. Table 11.3 presents the results of the quantity allocation of land resources, which were optimized by using the multiobjective artificial immune algorithm based on an analysis of the land use demand and land resource supply (Ma and Zhao 2015) in 2025 in the study area.
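As a quick consistency check (ours, combining Eq. (11.10) with the optimized areas in Table 11.3): with 100 m × 100 m cells, each cell covers A_c = 1 ha, so L_k is simply the 2025 target area expressed in hectares, and the targets sum to the number of allocated cells:

L_k = \mathrm{Round}\!\left(\frac{A_k}{A_c}\right) = A_k \;\; \text{(in 1-ha cells)}, \qquad \sum_k L_k = 134{,}832 + 14{,}956 + 93{,}289 + 2{,}641 + 13{,}379 + 0 = 259{,}097,

which matches the 259,097 decision variables (land use cells) reported above for the study area.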

4.2  Experiments and Optimization Results To assess the performance of the proposed model, we conducted experiments on a high-performance computer at the Super-Computing Center of Wuhan University (http://hpc.whu.edu.cn/). The supercomputer includes 120 heterogeneous physical nodes connected by the InfiniBand FDR network. Each node has two Intel Xeon E5–2630 v3 8-core 2.4 GHz CPUs with 6 GB memory per CPU core. The operating system running on the supercomputer is Linux 64-bit CentOS 6.6. GCC v5.4, and MPICH v3.2 installed on the supercomputer was selected to compile the program. The C++ language was employed to develop the program, and the GNU Compiler Collection (GCC) v5.4 was used to compile source files and to support OpenMP. The


Fig. 11.8  Spatial patterns of land use in 2015 (a) and land use suitability in Anyue: (b) Cropland; (c) Orchard; (d) Forest; (e) Grassland; (f) Built-up land

Table 11.3  Areas of the six land use types in Anyue in 2015 and 2025 (unit: ha)

Land use type   2015 (base year)  2025 (optimized)
Cropland        135,767           134,832
Orchard         14,696            14,956
Forest          92,165            93,289
Grassland       4634              2641
Built-up land   11,813            13,379
Bare land       22                0

open source package MPICH provides the essential message-passing functions for the implementation of the message-passing parallel algorithm. Jobs were submitted to the supercomputer via the workload management system SLURM (Simple Linux Utility for Resource Management). The parameters of the message-passing parallel algorithm are set as follows: Nn is 96, the cloning rate is 4, D is 100, and 96 processes are used for optimization. After 10,000 iterations, we obtained a satisfactory Pareto frontier with 96 nondominated solutions in the study area. Figure 11.9 shows that the obtained nondominated solutions are widespread at the near-Pareto front: the compactness objective value


Fig. 11.9  Pareto frontier obtained by the message-passing parallel optimization algorithm in the study area

of the solutions ranges from 47 to 76, and the suitability objective value ranges from 82 to 85. In addition, the nondominated solutions are evenly distributed along the Pareto front, especially between solution a and solution e in Fig. 11.9. As shown in Fig. 11.9, solution a has the highest suitability and the lowest compactness value. In contrast, solution f has the highest compactness value and the lowest suitability. Figure 11.10 shows the corresponding land use patterns of solutions a-f in Fig. 11.9, and Fig. 11.11 illustrates the land use patterns of two representative regions in the study area. As demonstrated in Fig. 11.10, the land use suitability of the selected solutions decreases from solution a to solution f, whereas land use patches become more compact. In Fig. 11.11, Region 1 is the area around the capital of the county, and Region 2 is the major orchard area in this county. Both areas experienced rapid land use change over the past 10 years due to rapid urbanization (Region 1) and agriculture reform (Region 2). From a1 to f1 and a2 to f2 in Fig. 11.11, as the compactness value increases, we obtain more connected and larger patches in the land use allocation scheme. The decision makers can choose an approach to support practical land use planning based on the trade-off analysis between suitability and compactness according to the land use scenarios. Compared with the weighted-sum multiobjective optimization method, the Pareto-based approach can provide decision makers with more alternatives in an intuitive way; furthermore, it can avoid the subjective weight configuration required for different optimization objectives, which is difficult for decision makers to properly determine.


Fig. 11.10  Spatial land use patterns of the labeled solutions in Fig. 11.9: (a–f) are the spatial land use allocation solutions labeled (a–f) in Fig. 11.9

Fig. 11.11  Optimal spatial land use patterns generated from different solutions in the specified regions: a1 ~ f1and and a2 ~ f2 are the optimal land use patterns of Region 1 and Region 2 in solutions a–f, respectively


4.3  Efficiency Analysis and Comparison To assess the efficiency of the two parallel models, both algorithms were run with the same parameters on a high-performance computer. Figure 11.12 shows that the computing time decreased as the number of threads (denoted as Nt) or processes increased. The sequential model required 362 h (approximately 15 days) to complete the optimization. The message-passing supported parallel algorithm took 17 h to accomplish the same task. When less than 8 CPUs were used for computing, the computational times of the shared-memory model were shorter than those of the message-passing model. However, as the number of CPUs increased to 8 or more, the performance of the message-passing model increased beyond that of the shared-­ memory model. We can also observe from Fig. 11.13 that the decrease in the parallel efficiency of the message-passing model is slower than that for the shared-memory model in

Fig. 11.12  Computing times of the message-passing and shared-memory algorithms

Fig. 11.13  Speedup and efficiency of the parallel algorithms


this study. When using only two threads, the shared-memory model can obtain the highest parallel efficiency of 98.5%. Nevertheless, as Nt exceeds 8, model performance declines rapidly. The efficiency of the shared-memory model is 38.3% when the number of threads reaches 16. Due to the use of fork-join parallelism, all threads or CPU resources can participate in the optimization computations. Hence, the shared-memory model can achieve good performance when there are few threads that participate in the computations. However, as the number of threads increases, the amount of time be wasted on resource competition or synchronization increases, which leads to a decline in parallel performance. Another explanation for the decline in efficiency of the shared-memory algorithm is that the platform we used to run the experiments is a NUMA (non-uniform memory access) node (Pilla et al. 2014). The allocation of threads in our program does not consider this particular structure and leads to some overhead of communication for data access (Gong et al. 2017). The efficiency of the message-passing model initially increased and then decreased with increasing the number of processes or CPUs. We obtained the highest efficiency of 76.6% when 8 CPUs were used to run this model. Notably, the model has to use one process to manage the workflow and to coordinate communication between the master and slaves. The master process does not participate in the computations. Therefore, the superiority of the “master–-slave” architecture is not evident when only a few CPUs are used to run the model. As the number of slave processes increases, the master process requires more time to send and receive messages between the master and slaves. Therefore, the communication and synchronization costs increase over the computing time. Figure  11.13 also shows that the parallel efficiency of the message-passing model declines rapidly when the number of processes exceeds 16. When the number of processes reaches 96 in this case, and the parallel efficiency is only 21.8%. Additionally, the speedup of the shared-memory model significantly increases when the number of threads is less than 8. However, when Nt ≥ 8, the speedup is relatively constant. In contrast, before Np reaches 96, the speedup of the message-­ passing model shows a significant growth trend, which suggests that the MPI accelerated model has better scalability in this study. The cloning rate q is an essential parameter in AIS, and a large cloning rate is important for maintaining the diversity of the population and improving the ability of the algorithm to avoid local optima. However, the cloning rate also has a significant influence on the computational complexity. For example, if the clone rate used in optimization is doubled, the computing time for cloning, evaluation, mutation and message passing would double. Consequently, we conducted additional experiments to examine the influence of the cloning rate on the parallel efficiency of the algorithm. The cloning rate used in these experiments was set to 2, 4, 6, 8, and 10, and the results are shown in Fig. 11.14. According to Fig. 11.14, we can obtain consistent speed-up curves for each parallel model despite the difference in the cloning rate. Therefore, we can conclude that although the cloning rate is an important factor that affects the computational


Fig. 11.14  Influence of the cloning rate on the speedup of the shared-memory (a) and message-passing (b) models

complexity, it has no significant influence on parallel efficiency in both the shared-memory and message-passing models in this study.
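The comparison can be read against the usual definitions of speedup and parallel efficiency; as a rough check (ours, taking the reported sequential time of 362 h, the 96-process time of 17 h, and efficiency over all 96 processes, master included):

S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}, \qquad S(96) \approx \frac{362\ \text{h}}{17\ \text{h}} \approx 21.3, \qquad E(96) \approx \frac{21.3}{96} \approx 22\%,

which roughly agrees with the 21.8% efficiency reported above for the message-passing model at 96 processes.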

5  Conclusions Multiobjective heuristic algorithms have been shown to be the most effective approaches for searching the Pareto frontier of land use allocation problems. However, as the numbers of decision variables and objectives increase, the search space of a multiobjective heuristic algorithm grows exponentially. Consequently, improving the efficiency and reducing the optimization time of SLUA models have become two of the largest challenges in the multiobjective optimization of spatial land use allocation. In this study, we developed a high-performance Pareto-based optimization model for SLUA based on a multiobjective artificial immune optimization algorithm and parallel computing technologies. Both shared-memory and message-passing parallel models were employed in parallel in the Pareto-based optimization algorithm for SLUA. Because most of the computing time of the algorithm is spent on evaluation and mutation, the parallelization of the shared-memory model focused on the loop-­ level parallelism by using the compiler directives provided by OpenMP. Additionally, the message-passing parallel algorithm was implemented based on a “master-slave” architecture, which uses one master process to manage the population and workflow of AIS and many slave processes running on distributed computing nodes to perform the computing tasks. To assess the performance of the parallelized Pareto-based optimization algorithms for SLUA, a series of experiments was conducted on a super computer. The experimental results show that high-performance computing is an effective way to


promote the efficiency of multiobjective AIS algorithms and to reduce the computing time required to obtain the Pareto frontier in SLUA problems. The shared-­ memory model displayed excellent performance when less than 8 threads were used for computing. In contrast, the message-passing algorithm achieves its highest parallel efficiency of 76.6% when 8 processes were used to run the model. Although the efficiency of the message-passing model continues to decline as the number of CPUs increases, we still obtained a satisfactory speed-up level when 96 CPUs were used for optimization. Overall, the message-passing model displays better scalability than the shared-memory model in this study. Nevertheless, the shared-memory model is still the most effective and convenient way to support high-performance optimization for solving SLUA problems on PCs or workstations with multiple cores/CPUs. We also examined the influence of the parameter cloning rate of AIS on the speedup of the parallel algorithms. The experimental results indicated that although the cloning rate is an important factor that affects the computational complexity, it has no significant influence on the efficiency of the parallel models in this study. Therefore, we can conclude that the parallel model designed in this study is a promising way to promote the efficiency of multiobjective AIS and to reduce the computing time required to obtain the Pareto frontier in SLUA problems. The future works based on this study will include the following topics. First, more objective functions should be considered. In this study, we only adopt the most common and widely used objective functions to model the SLUA problem. However, the decision-making process in land use planning might consider more factors, such as objectives related to environmental protection and ecological security. Second, a hybrid parallel computing model could be used to improve the performance of our model. As the complexity of the SLUA problem increases, the search space of the optimization algorithm grows dramatically; as a result, the optimization algorithm requires a large number of iterations to obtain a satisfactory Pareto frontier. Consequently, the spatial domain decomposition of SLUA problems is essential not only for implementing hybrid parallel computing models but also for reducing the search space size and number of iterations of the optimization algorithm. We will focus on the decomposition of the spatial domain, multistage gradual optimization, and the design of a hybrid parallelization strategy (data parallelism and task parallelism) in the future. Acknowledgments  This research was supported by the National Nature Science Foundation of China (Grant Nos. 41971336 and 41771429), National key research and development program (Grant No. 2018YFD1100801), and Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural  Resources (Grant No. KF-2018-03-033). We would like to thank Wenwu Tang and the two anonymous reviewers for their valuable comments. The authors also acknowledge the support received from the Supercomputing Center of Wuhan University. The optimization calculations in this paper were performed with the supercomputing system at the Supercomputing Center of Wuhan University.


References Cao, K., & Ye, X. Y. (2013). Coarse-grained parallel genetic algorithm applied to a vector based land use allocation optimization problem: The case study of Tongzhou Newtown, Beijing, China. Stochastic Environmental Research and Risk Assessment, 27(5), 1133–1142. https:// doi.org/10.1007/s00477-012-0649-y Cao, K., Huang, B., Wang, S.  W., & Lin, H. (2012). Sustainable land use optimization using boundary-based fast genetic algorithm. Computers Environment and Urban Systems, 36(3), 257–269. https://doi.org/10.1016/j.compenvurbsys.2011.08.001 de Castro, L. N., & Timmis, J. I. (2003). Artificial immune systems as a novel soft computing paradigm. Soft Computing, 7(8), 526–544. https://doi.org/10.1007/S00500-002-0237-z de Castro, L. N., & Von Zuben, F. J. (2002). Learning and optimization using the clonal selection principle. IEEE Transactions on Evolutionary Computation, 6(3), 239–251. https://doi. org/10.1109/tevc.2002.1011539 Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182–197. https:// doi.org/10.1109/4235.996017 Duh, J. D., & Brown, D. G. (2007). Knowledge-informed Pareto simulated annealing for multi-­ objective spatial allocation. Computers Environment and Urban Systems, 31(3), 253–281. https://doi.org/10.1016/j.compenvurbsys.2006.08.002 Garcia, G.  A., Rosas, E.  P., Garcia-Ferrer, A., & Barrios, P.  M. (2017). Multi-objective spatial optimization: Sustainable land use allocation at sub-regional scale. Sustainability, 9(6). https:// doi.org/10.3390/su9060927 Gong, Z., Tang, W., & Thill, J.-C. (2017). A graph-based locality-aware approach to scalable parallel agent-based models of spatial interaction. In D. A. Griffith, Y. Chun, & D. J. Dean (Eds.), Advances in geocomputation (pp. 405–423). Cham: Springer. Hou, J. W., Mi, W. B., & Sun, J. L. (2014). Optimal spatial allocation of water resources based on Pareto ant colony algorithm. International Journal of Geographical Information Science, 28(2), 213–233. https://doi.org/10.1080/13658816.2013.849809 Huang, K. N., Liu, X. P., Li, X., Liang, J. Y., & He, S. J. (2013). An improved artificial immune system for seeking the Pareto front of land-use allocation problem in large areas. International Journal of Geographical Information Science, 27(5), 922–946. https://doi.org/10.108 0/13658816.2012.730147 Ji, Z., & Dasgupta, D. (2004). Real-valued negative selection algorithm with variable-sized detectors. In K. Deb, R. Poli, W. Banzhaf, H. G. Beyer, E. Burke, P. Darwen, … A. Tyrrell (Eds.), Genetic and Evolutionary Computation - Gecco 2004, Pt 1, Proceedings (Vol. 3102, pp. 287–298). Jin, H., Jespersen, D., Mehrotra, P., Biswas, R., Huang, L., & Chapman, B. (2011). High performance computing using MPI and OpenMP on multi-core parallel systems. Parallel Computing, 37(9), 562–575. https://doi.org/10.1016/j.parco.2011.02.002 Kaim, A., Cord, A. F., & Volk, M. (2018). A review of multi-criteria optimization techniques for agricultural land use allocation. Environmental Modelling & Software, 105, 79–93. https://doi. org/10.1016/j.envsoft.2018.03.031 Li, X., & Parrott, L. (2016). An improved genetic algorithm for spatial optimization of multi-­ objective and multi-site land use allocation. Computers Environment and Urban Systems, 59, 184–194. https://doi.org/10.1016/j.compenvurbsys.2016.07.002 Liu, D. F., Tang, W. W., Liu, Y. L., Zhao, X., & He, J. H. (2017). 
Optimal rural land use allocation in Central China: Linking the effect of spatiotemporal patterns and policy interventions. Applied Geography, 86, 165–182. https://doi.org/10.1016/j.apgeog.2017.05.012 Liu, X. P., Li, X., Shi, X., Huang, K. N., & Liu, Y. L. (2012). A multi-type ant colony optimization (MACO) method for optimal land use allocation in large areas. International Journal of Geographical Information Science, 26(7), 1325–1343. https://doi.org/10.1080/1365881 6.2011.635594


Liu, Y. F. L., Tang, W., He, J. H., Liu, Y. F. L., Ai, T. H., & Liu, D. F. (2015). A land-use spatial optimization model based on genetic optimization and game theory. Computers Environment and Urban Systems, 49, 1–14. https://doi.org/10.1016/j.compenvurbsys.2014.09.002 Liu, Y.  L., Peng, J.  J., Jiao, L.  M., & Liu, Y.  F. (2016). PSOLA: A heuristic land-use allocation model using patch-level operations and knowledge-informed rules. PLoS One, 11(6), e0157728. https://doi.org/10.1371/journal.pone.0157728 Ma, X. Y., & Zhao, X. (2015). Land use allocation based on a multi-objective artificial immune optimization model: An application in Anlu County, China. Sustainability, 7(11), 15632–15651. https://doi.org/10.3390/su71115632 Malczewski, J. (2004). GIS-based land-use suitability analysis: A critical overview. Progress in Planning, 62(1), 3–65. https://doi.org/10.1016/j.progress.2003.09.002 Masoomi, Z., Mesgari, M.  S., & Hamrah, M. (2013). Allocation of urban land uses by multi-­ objective particle swarm optimization algorithm. International Journal of Geographical Information Science, 27(3), 542–566. https://doi.org/10.1080/13658816.2012.698016 Matthews, K. B., Sibbald, A. R., & Craw, S. (1999). Implementation of a spatial decision support system for rural land use planning: Integrating geographic information system and environmental models with search and optimisation algorithms. Computers and Electronics in Agriculture, 23(1), 9–26. https://doi.org/10.1016/s0168-1699(99)00005-8 Memmah, M. M., Lescourret, F., Yao, X., & Lavigne, C. (2015). Metaheuristics for agricultural land use optimization: A review. Agronomy for Sustainable Development, 35(3), 975–998. https://doi.org/10.1007/s13593-015-0303-4 Mousa, A.  A., & El Desoky, I.  M. (2013). Stability of Pareto optimal allocation of land reclamation by multistage decision-based multipheromone ant colony optimization. Swarm and Evolutionary Computation, 13, 13–21. https://doi.org/10.1016/j.swevo.2013.06.003 Pilla, L. L., Ribeiro, C. P., Coucheney, P., Broquedis, F., Gaujal, B., Navaux, P. O. A., et al. (2014). A topology-aware load balancing algorithm for clustered hierarchical multi-core machines. Future Generation Computer Systems-The International Journal of Escience, 30, 191–201. https://doi.org/10.1016/j.future.2013.06.023 Porta, J., Parapar, J., Doallo, R., Rivera, F. F., Sante, I., & Crecente, R. (2013). High performance genetic algorithm for land use planning. Computers Environment and Urban Systems, 37, 45–58. https://doi.org/10.1016/j.compenvurbsys.2012.05.003 Sante, I., Rivera, F. F., Crecente, R., Boullon, M., Suarez, M., Porta, J., et al. (2016). A simulated annealing algorithm for zoning in planning using parallel computing. Computers Environment and Urban Systems, 59, 95–106. https://doi.org/10.1016/j.compenvurbsys.2016.05.005 Sante-Riveira, I., Boullon-Magan, M., Crecente-Maseda, R., & Miranda-Barros, D. (2008). Algorithm based on simulated annealing for land-use allocation. Computers & Geosciences, 34(3), 259–268. https://doi.org/10.1016/j.cageo.2007.03.014 Shang, R.  H., Jiao, L.  C., Liu, F., & Ma, W.  P. (2012). A novel immune clonal algorithm for MO problems. IEEE Transactions on Evolutionary Computation, 16(1), 35–50. https://doi. org/10.1109/tevc.2010.2046328 Shao, J., Yang, L. N., Peng, L., Chi, T. H., & Wang, X. M. (2015a). An improved artificial bee colony-based approach for zoning protected ecological areas. PLoS One, 10(9), e0137880. https://doi.org/10.1371/journal.pone.0137880 Shao, J., Yang, L. 
N., Peng, L., Chi, T. H., Wang, X. M., & Destech Publicat, I. (2015b). Artificial bee colony based algorithm for seeking the pareto front of multi-objective land-use allocation. In International Conference on Electrical and Control Engineering (ICECE 2015) (pp. 346–351). Shaygan, M., Alimohammadi, A., Mansourian, A., Govara, Z. S., & Kalami, S. M. (2014). Spatial multi-objective optimization approach for land use allocation using NSGA-II. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(3), 906–916. https://doi. org/10.1109/jstars.2013.2280697


Stewart, T. J., Janssen, R., & van Herwijnen, M. (2004). A genetic algorithm approach to multiobjective land use planning. Computers & Operations Research, 31(14), 2293–2313. https://doi. org/10.1016/s0305-0548(03)00188-6 Yang, L. N., Sun, X., Peng, L., Shao, J., & Chi, T. H. (2015). An improved artificial bee colony algorithm for optimal land-use allocation. International Journal of Geographical Information Science, 29(8), 1470–1489. https://doi.org/10.1080/13658816.2015.1012512 Yang, L. N., Zhu, A. X., Shao, J., & Chi, T. H. (2018). A knowledge-informed and pareto-based artificial bee colony optimization algorithm for multi-objective land-use allocation. ISPRS International Journal of Geo-Information, 7(2), 63. https://doi.org/10.3390/ijgi7020063 Zhang, T., Hua, G. F., & Ligmann-Zielinska, A. (2015). Visually-driven parallel solving of multi-­ objective land-use allocation problems: A case study in Chelan, Washington. Earth Science Informatics, 8(4), 809–825. https://doi.org/10.1007/s12145-015-0214-6 Zhao, X., Ma, X., Tang, W., & Liu, D. (2019). An adaptive agent-based optimization model for spatial planning: A case study of Anyue County, China. Sustainable Cities and Society, 51, 101733. https://doi.org/10.1016/j.scs.2019.101733

Chapter 12

High-Performance Computing in Urban Modeling

Zhaoya Gong and Wenwu Tang

Abstract  In order to better understand the intrinsic mechanisms of urbanization, urban modeling has become a multidisciplinary effort, from disciplines such as geography, planning, regional science, urban and regional economics, and environmental science, which intends to create scientific models to account for functions and processes that generate urban spatial structures at either intra-urban or inter-­ urban scales. It is due to these intrinsic properties that urban models involve tremendous computational and data complexity and intensity. As the development of computational technologies such as high-performance computing, how to leverage high-performance computing to tackle the computational and data issues for urban modeling becomes an imperative objective. This chapter reviews and discusses the design and development of high-performance computing-enabled operational models for urban studies in several identified modeling application categories. To support our review and discussions, a case study of a general urban system model and its implementation within high-performance computing environments is presented to demonstrate the process of parallelization for urban models. Keywords  High-performance computing · Urban modeling · Urban systems · Parallel urban models · Urban dynamics and complexity

Z. Gong (*) School of Geography, Earth and Environmental Sciences, University of Birmingham, Birmingham, UK e-mail: [email protected] W. Tang Center for Applied Geographic Information Science, Department of Geography and Earth Sciences, University of North Carolina at Charlotte, Charlotte, NC, USA e-mail: [email protected] © Springer Nature Switzerland AG 2020 W. Tang, S. Wang (eds.), High Performance Computing for Geospatial Applications, Geotechnologies and the Environment 23, https://doi.org/10.1007/978-3-030-47998-5_12


1  Introduction An urban and regional system is an open, dynamic, and complex system that comprises various dimensions, scales, processes, actors, and their interactions. Urbanization, manifested by the change of land use patterns, is a process of concentration of human population and activities. In order to better understand the intrinsic mechanisms of urbanization, urban modeling has become a multidisciplinary effort, from disciplines such as geography, planning, regional science, urban and regional economics, and environmental science, which intends to create scientific models to account for functions and processes that generate urban spatial structures at either intra-urban or inter-urban scales (Batty 1976, 2013). At the intra-urban scale, the focus is on the internal structure of a city which itself is treated as a system, while at the inter-urban scale the external relations of cities are emphasized to explain the distribution of urban centers in a system of cities (or urban-regional system). Practically, to facilitate policy-making for planning and sustainable development, these urban models are operationalized and implemented as computer programs fed with empirical data to estimate, simulate, and forecast urban patterns. It is due to these intrinsic properties that urban models involve tremendous computational and data complexity and intensity. As the development of computational technologies such as high-performance computing (HPC), how to leverage HPC to tackle the computational and data issues for urban modeling becomes an imperative objective. This chapter reviews the design and development of HPC-enabled operational models for urban studies in several identified modeling and application categories. The rest of this chapter is organized as follows: The second section sets a context of urban modeling in terms of incorporating urban dynamics and complexity as the key properties of urban models. Section 3 systematically reviews, identifies, and discusses applications of HPC to different categories of urban models and the associated computational complexity. Presented as a case study in Sect. 4, a general urban system model and its HPC-enabled implementation are examined with suggested parallel computing strategies and platforms. Finally, conclusions are drawn from our investigations.

2  Modeling Urban Dynamics and Complexity in Urban Studies

Following Forrester's (1969) early attempt to introduce dynamics into urban systems theory, theoretical developments of disequilibrium models focused on the dynamic process of nonlinear growth of urban systems that can generate not only continuous change but also discontinuity and catastrophe. Harris and Wilson (1978) embedded a spatial interaction model of retail centers in a dynamic framework that can give rise to nonlinearities and qualitative change when some parameter exceeds a critical


value. Allen and Sanglier (1979, 1981) built a dynamic model of a central place system and showed how it can generate bifurcations where new centers may emerge because of random fluctuations and grow along different paths otherwise. In particular, the growth of population and employment in their model is interdependently determined by accounting for both agglomeration economies and congestion diseconomies in an ad hoc way. Specifically, existing employment/population attracts new employment/population, but eventually the capacity of places hits a ceiling. Their models have been calibrated and applied to a number of cities and geographies (Allen and Sanglier 1979). Although these dynamic models may employ “ad hoc” specifications for considerations of the economic motivation of individuals, they lack microeconomic foundations as other non-economic models do. Furthermore, their dynamic behavior is backward-looking rather than forward-­ looking, as individual decisions are static in the models when they consider future expected benefits and costs. In the same vein as the early development of dynamic urban models but with a rather aggregate approach, the new disequilibrium modeling paradigm arising in the recent decades views cities as open, complex, and self-organizing systems and aims to incorporate both temporal dynamics and spatial heterogeneity that characterize urban processes in a highly disaggregate and decentralized approach. Therefore, these models are advocated to accommodate the increasing availability of finer-­ resolution land use/cover data in space and time facilitated by the advent of new information technologies. This includes the penetration of information technologies such as personal computers into the entire modern society and the diffusion of geospatial technologies such as geographic information system (GIS) and remote sensing (RS) for data collecting, digital mapping, and spatial database management in urban planning practices. Along with developments in complexity theory, this approach emphasizes a new bottom-up paradigm, where the accumulation and aggregation of numerous localized decisions and decentralized interactions in disaggregated spatial-temporal dimensions give rise to the evolution of macroscopic urban structure as a globally emergent property. With a focus on urban morphology, cellular automata (CA), as a typical urban model of this paradigm, has gained significant popularity due to its simplicity and explicit spatial characteristics to represent the city and region as a fine-scale grid, where urban development and land use change can be simulated as diffusion processes that are reflected as the iteratively changing state (land use type or development) of each cell on the grid governed by decision rules and neighborhood effects. Interestingly, a decision-making process with a dedicated human behavioral point of view becomes the central concept of this paradigm. Agent-based models (ABM) provide an explicit representation for individuals or group entities as agents/actors to simulate their decision-making behaviors, interactions, and responses to the urban context and policy environment through various processes of change with different speed ranging from daily travel to relocation, from housing choice to real estate development. Among these bottom-up models, CA has been widely applied empirically to a variety of domains such as natural sciences, geography, urban studies, and land use


and land cover change studies, but only a few of them have become operational models in that, being primarily physical environment driven, they pay little attention to transportation and spatial economy such as land price and travel costs; their highly disaggregate and dynamic structure raises issues for model calibration, which makes them remain indicative rather than predictive; as a result they lack the capability to test policies and support practical planning. The continuation of the behavioral paradigm that has prevailed in planning and urban studies since the 1970s, which is reflected in accommodations for the changing policy environment in transportation planning and travel demand management during this time, serves as a major catalyst for the activity-based research on travel behavior in particular and human behavior in general. The activity-based approach is intellectually rooted in activity analysis. This body of literature was singularly initiated by Hägerstrand (1970), Chapin (1974), and Fried et al. (1977) on the patterns of activity behavior, the constraints on the social structural causes for activity participation under the space-time context. Along the same line, the tenet of this approach is that travel decisions are based on the demand for activity participation, and therefore the understanding of travel behavior depends on the understanding of the underlying activity behavior. As such, activity-based models aim to replace the traditional trip-based aggregate models that generate trips based on a spatial interaction framework with a highly disaggregate approach. This approach focuses on the formation of daily activity agendas, the scheduling of activity programs, and the choice process of associated decisions for participation performed by individuals and households at the micro-level, which constrains the spatial pattern of their activities and characterizes their travel behaviors. As a typical realization specialized in transportation planning, the activity-based approach belongs to a broader concept named microsimulation which has a close relationship with the CA/ABM approach. It allows the simulation of the decision-­ making process of and complex interactions between individual actors within an open system at the micro-level (disaggregate). Therefore, it enables to trace the evolution of the whole system over time at the macro-level (aggregate) by accounting for path dependence and stochastic elements. With advances in computing power and increasing availability of disaggregate data, microsimulation has been introduced to land use and transportation modeling to account for the dynamics and complexity of urban systems, exemplified by practical applications such as TRANSIMS (Nagel et  al. 1999), UrbanSim (Waddell 2002), and ILUTE (Miller et al. 2004). These models were intentionally developed as fully operational models that have targeted purposes for planning support, efficient computer programs for the implementation of model algorithms, clear specification for data requirements, well-organized procedures for empirical calibration or even validation, and powerful capability for policy analysis. However, due to their enormous data requirements, they only have been practically applied to city contexts where data availability is not an issue.

3  HPC for Urban Spatial Modeling

High-performance computing, in layman's terms, refers to the practice of increasing computing power through advanced hardware and/or software (e.g., supercomputers or parallel computers) that offer much higher performance than regular computers or workstations. Early exploration of applying HPC technologies to urban spatial modeling dates back to the 1980s (Harris 1985; Openshaw 1987). Making use of HPC, parallel processing was proposed to accommodate the increasing complexity of spatial analysis and modeling (SAM), in terms of both the volume of data at fine spatial and temporal resolutions and the growing sophistication of algorithms and models (Armstrong 2000). Openshaw and Turton (2000) identified a list of opportunities that would lead towards a computational human geography:
• To speed up existing compute-bound tasks in order to engage large-scale experimentation and simulation of complex human and physical systems and real-time geospatial analysis.
• To improve the quality of results by using compute-intensive methods to reduce the number of assumptions and shortcuts forced by computational restraints.
• To permit larger databases to be analyzed or to obtain better results by being able to process finer-resolution data.
• To develop completely new and novel approaches based on computational technologies such as computational intelligence methods.
Along these lines, several special journal issues stand out as landmark contributions. A 1996 issue of the International Journal of Geographical Information Science (IJGIS) initiated a focus on parallelization of existing computationally intensive geospatial operations (Clematis et al. 1996; Ding and Densham 1996). Later, a 2003 issue of Parallel Computing extended this line of research to parallel spatial algorithms and data structures (Clematis et al. 2003; Wang and Armstrong 2003). As cyberinfrastructure emerged as a new paradigm to harness the power of data and computational sciences, two special issues on geospatial cyberinfrastructure (a 2009 issue of IJGIS and a 2010 issue of Computers, Environment and Urban Systems) followed, aiming to elevate geospatial sciences to the next level with the support of HPC as one of their critical components (Yang and Raskin 2009; Yang et al. 2010). In those issues, significant enhancements were demonstrated by specific applications employing HPC, such as computationally intensive geospatial analysis methods and large-scale forecasting of dust storms (Wang and Liu 2009; Xie et al. 2010). Riding this tide, recent research has targeted taking full advantage of available HPC resources, encompassing the development and application of parallel algorithms (Wang and Armstrong 2009; Li et al. 2010; Guan et al. 2011; Yin et al. 2012; Widener et al. 2012) and parallel libraries (e.g., pRPL, Repast HPC, and EcoLab; see Guan and Clarke 2010; RepastTeam 2014; Standish 2014) for SAM. The latest contribution is featured in a 2013 issue of IJGIS (Wang 2013; Wang et al. 2013) that focuses on the development of a new generation of cyberinfrastructure-based GIS (CyberGIS) as the synthesis of advanced cyberinfrastructure, GIScience, and SAM (Wang 2010). Expanding the frontiers of CyberGIS, this issue highlights the establishment of integrated and scalable geospatial software ecosystems and the pursuit of scalable methods, algorithms, and tools that can harness heterogeneous HPC resources, platforms, and paradigms (message passing vs. shared memory) (Zhao et al. 2013; Shook et al. 2013; Tang 2013; Zhang and You 2013).
In the following sections, three categories of HPC-enabled modeling endeavors are discussed with respect to the domain of applications in urban and regional modeling. For each category, we further identify and examine the associated computational complexity, in terms of computational and data intensities, in order to leverage the power of HPC to achieve superior performance.

3.1  Integrated Land and Transport Modeling

In the domain of integrated land use and transport modeling, parallel computing is utilized to help solve general equilibrium or fixed-point problems that require a number of iterations of numerical approximation. Specifically, this involves matrix balancing and estimation in spatial interaction, input–output, and spatial regression models (Davy and Essah 1998; Openshaw and Turton 2000; Wong et al. 2001) and spatial network and location optimization in path search, traffic assignment, and location allocation problems (Smith et al. 1989; Birkin et al. 1995; Wisten and Smith 1997; Hribar et al. 2001; Lanthier et al. 2003). These studies revealed how effective HPC is at accelerating existing models so that they can be applied at a finer spatial detail or resolution to the largest available databases and thus provide improved levels of solution, accuracy, and representation.
The computational complexity of urban models in this category lies in their nature as model systems, in which each sub-model (land use model or transport model) searches for its local and short-run equilibrium before all sub-models together reach a global and long-run equilibrium for the entire model system. Due to the nonlinearity and spatial setting, sub-models require a short-run equilibrium to be computed by numerical approximation, which is usually time-consuming, as an analytical solution does not exist. Furthermore, the stability of each short-run equilibrium must be numerically verified through sensitivity analysis. Due to the multiplicity of equilibria, the model must obtain all the stable equilibria and select an appropriate one given the condition of path dependence. Once short-run/local equilibria are determined for all sub-models, the model completes one iteration towards the long-run/global equilibrium. If the long-run equilibrium conditions for the model system are not satisfied, the next iteration is performed with a predefined adjustment of some model parameters. This process continues until the long-run equilibrium conditions are all met. It is then followed by a stability test to verify the achieved long-run equilibrium. A common testing approach is through perturbation of the initial distributions of input conditions. Commonly, 30 instances of random perturbation runs are needed to justify the stability of a long-run equilibrium. Each perturbation run follows the same procedures as described above; this is how the model systems embrace stochasticity to search for regularities.
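The nested structure described above can be summarized in a short sketch. This is illustrative Python pseudocode rather than an implementation from the cited studies: solve_short_run_equilibrium, long_run_converged, adjust_parameters, and perturb_initial_conditions are hypothetical placeholders for model-specific routines, and the value of 30 perturbation runs follows the text.

```python
N_PERTURBATIONS = 30  # number of random perturbation runs used to test stability (see text)

def solve_model_system(params, state, solve_short_run_equilibrium,
                       long_run_converged, adjust_parameters):
    """Iterate sub-model (short-run) equilibria until the long-run conditions hold."""
    while True:
        # Each sub-model (e.g., land use, transport) computes its own short-run equilibrium.
        state = {name: solve_short_run_equilibrium(name, state, params) for name in state}
        if long_run_converged(state, params):
            return state
        # Otherwise adjust model parameters by a predefined rule and iterate again.
        params = adjust_parameters(params, state)

def stability_test(params, initial_state, solvers, perturb_initial_conditions):
    """Re-solve the system under randomly perturbed initial input conditions."""
    baseline = solve_model_system(params, initial_state, *solvers)
    perturbed_outcomes = [
        solve_model_system(params, perturb_initial_conditions(initial_state, seed=i), *solvers)
        for i in range(N_PERTURBATIONS)
    ]
    # Comparing the perturbed outcomes against the baseline indicates whether the
    # achieved long-run equilibrium is stable.
    return baseline, perturbed_outcomes
```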

3.2  Computational Intelligence Models

In the domain of computational intelligence, CA, ABM, and various biologically and linguistically motivated computational paradigms, such as genetic algorithms and neural networks, have been applied to model the complexity and dynamics of urban geographic phenomena. These modeling endeavors necessitate the support of HPC because they incorporate heterogeneous factors and processes at multiple spatiotemporal scales, whose decentralized micro-level interactions give rise to macro-level structures or regularities and thus prompt massive computational demands (Dattilo and Spezzano 2003; Guan 2008; Guan and Clarke 2010; Li et al. 2010; Tang et al. 2011; Gong et al. 2012, 2013; Porta et al. 2013; Meentemeyer et al. 2013; Pijanowski et al. 2014). Furthermore, due to their intrinsic mechanisms of concurrency and parallelism (e.g., decentralization of cells/agents and interactions, evaluation of individuals in natural selection, and distributed processing of interconnected neurons), these computational models are inherently suitable for parallel computing (Wong et al. 2001; Tang et al. 2011; Gong et al. 2012). In particular, the calibration of these models, which involves estimating a large number of parameter combinations, and their simulations, which entail a considerable number of iterations, all justify the utilization of HPC, which in turn makes it possible to gain unprecedented insights into the complexity and dynamics of urban-regional systems and opens opportunities to discover new theories (Meentemeyer et al. 2013; Pijanowski et al. 2014; Tang and Jia 2014; Gong et al. 2015).
In addition to the computational complexity caused by the short- and long-run equilibria to be achieved at different levels of the model systems, urban models in this category, as a highly disaggregated approach, have added dimensions of complexity in the heterogeneous setting of space and distance relations. As a result, they allow a high level of spatial variation in the density of the modeled agents, e.g., firms and households. To obtain a short-run equilibrium under this spatial setting, instead of using standard numerical methods to solve nonlinear problems, an agent-based approach is commonly implemented as an iterative algorithm to approximate the equilibrium conditions.

3.3  Microsimulation and Activity-Based Modeling

The domain of microsimulation and activity-based modeling, increasingly popular as a comprehensive decision support system for practical urban and transportation planning, is notorious for the level of detail of the data it requires (parcel-level spatial resolution and individual travel activities) and the heavy computing load it relies on (individual-level location choice and vehicle-level traffic simulation). General modeling frameworks of this approach, such as UrbanSim and TRANSIMS, without exception resort to HPC for effective and efficient problem solving once applied to real-world planning projects (Rickert and Nagel 2001; Nagel and Rickert 2001; Cetin et al. 2002; Awaludin and Chen 2007). Notably, operational models implemented via this approach and applied to real-world planning practices include the Oregon Statewide models (Donnelly 2018; Brinkerhoff et al. 2010), which combine macro- and micro-level simulations of statewide land use, economy, and transport systems, and Chicago Metro Evacuation Planning (TRACC 2011), which adapts TRANSIMS' normal traffic forecasting capability to dynamic evacuation scenarios; all of these take advantage of HPC clusters in order to achieve extraordinary performance.
As an operational modeling approach, urban models in this category involve a large amount of data, namely high data intensity. In general, data intensity increases with the number of geographic units, such as the number of locations or regions. Therefore, a larger extent of the study area or a finer granularity of the geographic unit can easily intensify the computational complexity. Moreover, with an increasing number of geographic units, the interactions between them, such as commuting flows and trade flows, grow exponentially, which imposes even higher intensity on data processing. In addition, data intensity is also related to model complexity. For example, when multiple types of activities and agents are considered, the added dimension of complexity requires the differentiation of different types of interactions and thus causes a more-than-proportional increase in data intensity.

4  A Case Study: Parallel Models of Urban Systems

This section provides a case study of the implementation of an urban systems model that takes into account the computational complexity discussed in the previous section. For proof of concept, the general urban systems model employed here must involve urban dynamics and complexity. Given this requirement, we adopt the theoretical and empirical models proposed in Gong (2015), which fall into the first and second categories of the last section, respectively. Theoretically, they are hyper-models for a system of polycentric urban models that incorporate urban costs and the internal spatial structures of cities. Specifically, each hyper-model embeds a number of polycentric urban models into one hyper-model structure of urban systems (Fujita and Mori 1997; Tabuchi and Thisse 2011). Empirically, each urban model considers the fact that the accessibility to locations of households and firms varies across a network of places. An agent-based approach is employed to model the location choices made by households and firms. The resulting geographic model can reflect the spatial structures of urban and regional development by coupling the first nature, a realistic heterogeneous space, with the space-economy.

Fig. 12.1  Parallelization of urban systems models supported by both shared-memory and message-passing platforms

The general structure of the urban systems model is depicted in Fig. 12.1. The population is exogenous. Within each polycentric urban model, households and firms with a low level of utility or profit are prompted to relocate to improve their own situation. This process repeats until no household can improve its utility level and no firm earns a non-zero profit, at which point a short-run equilibrium is reached at the intra-urban level. Once each urban region achieves its own short-run equilibrium, households respond to the inter-regional differences in utility levels (short-run equilibria) and migrate to urban regions with higher utility levels. This inter-regional process repeats until all regions have the same utility level, at which point a long-run equilibrium is reached between regions.
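The two-level equilibrium process can be sketched as follows. This is a minimal illustration of the logic in Fig. 12.1, not the authors' Mathematica implementation; the region objects, utility, relocate_best, and migrate are hypothetical placeholders, and the tolerance value is arbitrary.

```python
EPS = 1e-6  # convergence tolerance (illustrative value)

def intra_urban_equilibrium(region, utility, relocate_best):
    """Households and firms relocate until no agent can improve (short-run equilibrium)."""
    improved = True
    while improved:
        improved = False
        for agent in list(region.households) + list(region.firms):
            # relocate_best moves the agent to its best location and returns the
            # achieved gain in utility (households) or profit (firms).
            if relocate_best(agent, region) > EPS:
                improved = True
    return min(utility(h, region) for h in region.households)  # region-wide utility level

def inter_regional_equilibrium(regions, utility, relocate_best, migrate):
    """Households migrate toward higher-utility regions until utility levels equalize."""
    while True:
        levels = [intra_urban_equilibrium(r, utility, relocate_best) for r in regions]
        if max(levels) - min(levels) < EPS:   # long-run equilibrium across regions
            return levels
        migrate(regions, levels)              # move households from low- to high-utility regions
```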

It is of great importance to investigate how to leverage existing, distinct HPC platforms to tackle the identified computational and data intensities involved in a general model of urban systems. Two existing paradigms for parallel computing are first discussed in terms of their advantages and disadvantages. Then, parallel computing strategies are suggested to develop the parallel models of urban systems.
It has become standard to use the message-passing paradigm on platforms of cluster computing, grid computing, and cloud computing, which have become less expensive computational resources to access. The advent of parallel computing in personal computers is now opening new avenues for parallel SAM on platforms such as multicore CPUs and many-core graphics processing units (GPUs) (Owens et al. 2008), in compliance with the shared-memory paradigm. In the message-passing paradigm, computing elements (e.g., individual computers) have their own local memory space and exchange data through sending and receiving data streams packaged as messages over interconnected networks (Wilkinson and Allen 2004). Thus, communications must be coordinated to reduce the costs of accessing remote memories during data exchange, which may significantly complicate parallel programming. However, due to its explicit consideration of the communication process, the message-passing paradigm is highly flexible and portable to a range of parallel platforms such as vector supercomputers, computer clusters, and grid computing systems. In contrast to the distributed memory systems of the message-passing paradigm, the fundamental principle of shared-memory systems is that multiple processors or cores are organized in a way that allows multiple processing units to access a common memory space simultaneously (Wilkinson and Allen 2004). Such an architecture supports thread-level parallelism to boost computational performance when physical limits curtail further clock rate increases of a single CPU. Multicore architectures are based on a coarse-grained shared-memory paradigm aiming to exploit parallelism through coordination among multiple concurrent processing threads within a single program. Compared to many-core shared-memory architectures, such as GPUs, which support a large number of fine-grained light-weight threads (billions), they use a small number of threads, each of which has much more powerful computational capability. In practical applications, the message-passing and shared-memory paradigms can be combined to maximize the exploitation of different types of parallelism on heterogeneous parallel platforms (Kranz et al. 1993).
The parallel model of urban systems is designed to leverage both shared-memory and message-passing platforms. The parallelization strategies, depicted in Fig. 12.1, are applied to both the theoretical and geographic models, since they follow the same set of general procedures. Practically, the design of the parallel model aims to adapt to the general architecture of cluster computing, ranging from small-scale cluster computers to supercomputers, which can be characterized by a hierarchical structure combining shared-memory and message-passing platforms: a computer cluster constitutes a message-passing platform by implementing the message-passing interface (MPI) protocol between a large number of interconnected computers via high-speed networks, while each individual computer itself is a shared-memory platform. Given that the shared-memory paradigm is more efficient at handling communications between operations than the message-passing paradigm, the design of the parallel model follows a main principle: model operations with high dependency are assigned to the shared-memory platform, while relatively independent model runs (or simulations with changing parameters) are handled by the message-passing platform. Therefore, the computational and data intensity internal to the model is shared by CPUs and/or cores within a computer, while numerous simulations with different sets of parameters are processed in parallel by different computers in the cluster. Specifically, on a shared-memory platform, each urban model in the urban systems model is processed by a CPU/core, and the interactions between urban models occur through inter-thread communications by accessing the memory space shared among CPUs/cores. MPI is currently used to allocate simulations of models with different parameters to different computers. If an urban systems model is too large to be accommodated by a single computer, MPI can be leveraged to decompose it into smaller sub-domains, each of which can be adequately handled by an individual computer. This extension will be part of future studies.
The proposed theoretical and geographic models are implemented in Wolfram Mathematica, which is a widely used symbolic and numerical modeling system with built-in ABM capability. More importantly, Mathematica integrates both the message-passing and shared-memory paradigms in a seamless HPC environment. Thus, it provides parallel computing support using both threading-based multicore platforms and MPI-based computer cluster platforms. The resulting implementation of the parallel models speeds up model performance by several orders of magnitude. Specifically, the CPU time for 100 model runs has been reduced from more than a week to 2 h by using the computing cluster PYTHON at University Research Computing at the University of North Carolina at Charlotte.
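Purely as an illustration of this hybrid design (the chapter's implementation is in Mathematica), the following sketch expresses the same principle with mpi4py across cluster nodes and a thread pool within each node; run_urban_systems_model and the parameter sets are hypothetical placeholders.

```python
from mpi4py import MPI
from concurrent.futures import ThreadPoolExecutor

def run_urban_systems_model(params):
    """Hypothetical placeholder: solve one urban systems model for one parameter set."""
    raise NotImplementedError

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    parameter_sets = [{"run_id": i} for i in range(100)]      # e.g., 100 model runs
    chunks = [parameter_sets[i::size] for i in range(size)]   # one chunk per cluster node
else:
    chunks = None

# Message passing: relatively independent model runs are distributed across nodes.
my_params = comm.scatter(chunks, root=0)

# Shared memory: within a node, concurrent threads/cores handle the model operations.
with ThreadPoolExecutor() as pool:
    my_results = list(pool.map(run_urban_systems_model, my_params))

# Gather all results back on the root node for post-processing.
all_results = comm.gather(my_results, root=0)
```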

5  Conclusions

This chapter reviewed various types of urban models for urban studies in terms of how they incorporate urban dynamics and complexity as the intrinsic properties of urban systems. Given these properties, the computational complexity of urban models becomes a prominent challenge, and leveraging the power of HPC to properly cope with this issue has paved the way for many recent HPC-enabled urban modeling endeavors. Three categories of urban modeling work have been identified and discussed in terms of their design and development considerations with the support of HPC and their associated computational and data intensities. The review revealed that the support of HPC has not only been effective in accelerating urban models to reduce computing time, but has also enabled and facilitated the design and development of new models with finer granularities and the exploration of wider model parameter spaces for extensive model calibration, validation, and sensitivity analysis. Hence, it can greatly enhance the efficacy of urban modeling and help to provide new insights into solutions for urban problems and to discover new urban theories. On the other hand, with the advent of the big data era, urban data are accumulating across themes, spatiotemporal resolutions, and a variety of sources. This provides unprecedented opportunities but also creates challenges for the evaluation of urban models, as data complexity and intensity may come from different space and time scales and varying sources and formats. How to leverage HPC to tackle these new challenges in order to better support the integration of urban modeling with urban big data warrants future investigation.

References Allen, P.  M., & Sanglier, M. (1979). A dynamic model of growth in a central place system. Geographical Analysis, 11, 256–272. https://doi.org/10.1111/j.1538-4632.1979.tb00693.x Allen, P. M., & Sanglier, M. (1981). A dynamic model of a central place system - II. Geographical Analysis, 13, 149–164. https://doi.org/10.1111/j.1538-4632.1981.tb00722.x Armstrong, M.  P. (2000). Geography and computational science. Annals of the Association of American Geographers, 90, 146–156. https://doi.org/10.1111/0004-5608.00190 Awaludin, A., & Chen, D. (2007). UrbanSim parallel programming capstone paper.  Computer Science and Engineering 481E course report. June 4, 2007. https://courses.cs.washington.edu/ courses/cse481e/07sp/parallel.pdf Batty, M. (1976). Urban modelling: Algorithms, calibrations, predictions. New York: Cambridge University Press. Batty, M. (2013). The new science of cities. Cambridge, MA: The MIT Press. Birkin, M., Clarke, M., & George, F. (1995). The use of parallel computers to solve nonlinear spatial optimization problems - an application to network planning. Environment & Planning A, 27, 1049–1068. https://doi.org/10.1068/a271049 Cetin, N., Nagel, K., Raney, B., & Voellmy, A. (2002). Large-scale multi-agent transportation simulations. Computer Physics Communications, 147, 559–564. https://doi.org/10.1016/ S0010-4655(02)00353-3 Chapin, F. S. (1974). Human activity patterns in the city. New York: Wiley. Clematis, A., Falcidieno, B., & Spagnuolo, M. (1996). Parallel processing on heterogeneous networks for GIS applications. International Journal of Geographical Information Systems, 10, 747–767. https://doi.org/10.1080/02693799608902108 Clematis, A., Mineter, M., & Marciano, R. (2003). High performance computing with geographical data. Parallel Computing, 29, 1275–1279. https://doi.org/10.1016/j.parco.2003.07.001 Dattilo, G., & Spezzano, G. (2003). Simulation of a cellular landslide model with CAMELOT on high performance computers. Parallel Computing, 29, 1403–1418. https://doi.org/10.1016/j. parco.2003.05.002 Davy, J., & Essah, W. (1998). Generating parallel applications of spatial interaction models. In: Euro-Par’98 Parallel Processing (pp.136–145). Ding, Y.  Y. Y., & Densham, P.  J. P. (1996). Spatial strategies for parallel spatial modelling. International Journal of Geographical Information Systems, 10, 669–698. https://doi. org/10.1080/02693799608902104 Forrester, J. (1969). Urban dynamics. Cambridge, MA: The MIT Press. Fried, M., Havens, J., & Thall, M. (1977). Travel behavior-a synthesized theory. Washington, DC: NCHRP, Transportation Research Board. Fujita, M., & Mori, T. (1997). Structural stability and evolution of urban systems. Regional Science and Urban Economics, 27, 399–442. https://doi.org/10.1016/S0166-0462(97)80004-X
Gong, Z. (2015). Multiscalar modeling of polycentric urban-regional systems: Economic agglomeration, scale dependency and agent interactions. The University of North Carolina at Charlotte. Gong, Z., Tang, W., Bennett, D.  A., & Thill, J.-C. (2013). Parallel agent-based simulation of individual-level spatial interactions within a multicore computing environment. International Journal of Geographical Information Science, 27, 1152–1170. https://doi.org/10.1080/13658 816.2012.741240 Gong, Z., Tang, W., & Thill, J. (2012). Parallelization of ensemble neural networks for spatial land-­ use modeling. In Proceedings of the 5th International Workshop on Location-Based Social Networks - LBSN ‘12 (p. 48). New York, NY: ACM Press. Gong, Z., Thill, J.-C., & Liu, W. (2015). ART-P-MAP neural networks modeling of land-use change: Accounting for spatial heterogeneity and uncertainty. Geographical Analysis, 47, 376– 409. https://doi.org/10.1111/gean.12077 Guan, Q. (2008). Parallel algorithms for geographic processing. Santa Barbara: University of California. Guan, Q., & Clarke, K.  C. (2010). A general-purpose parallel raster processing programming library test application using a geographic cellular automata model. International Journal of Geographical Information Science, 24, 695–722. https://doi.org/10.1080/13658810902984228 Guan, Q., Kyriakidis, P. C., & Goodchild, M. F. (2011). A parallel computing approach to fast geostatistical areal interpolation. International Journal of Geographical Information Science, 25, 1241–1267. https://doi.org/10.1080/13658816.2011.563744 Hägerstrand, T. (1970). What about people in regional science? Pap Reg Sci Assoc, 24, 6–21. https://doi.org/10.1007/BF01936872 Harris, B. (1985). Some notes on parallel computing: With special reference to transportation and land-use modeling. Environment & Planning A, 17, 1275–1278. https://doi.org/10.1068/ a171275 Harris, B., & Wilson, A. G. (1978). Equilibrium values and dynamics of attractiveness terms in production-constrained spatial-interaction models. Environment & Planning A, 10, 371–388. https://doi.org/10.1068/a100371 Hribar, M. R., Taylor, V. E., & DE Boyce. (2001). Implementing parallel shortest path for parallel transportation applications. Parallel Computing, 27, 1537–1568. https://doi.org/10.1016/ S0167-8191(01)00105-3 Kranz, D., Johnson, K., Agarwal, A., et  al. (1993). Integrating message-passing and shared-­ memory. In Proceedings of the fourth ACM SIGPLAN symposium on principles and practice of parallel programming - PPOPP ‘93 (pp. 54–63). New York: ACM Press. Lanthier, M., Nussbaum, D., & Sack, J.  R. (2003). Parallel implementation of geometric shortest path algorithms. Parallel Computing, 29, 1445–1479. https://doi.org/10.1016/j. parco.2003.05.004 Li, X., Zhang, X., Yeh, A., & Liu, X. (2010). Parallel cellular automata for large-scale urban simulation using load-balancing techniques. International Journal of Geographical Information Science, 24, 803–820. https://doi.org/10.1080/13658810903107464 Meentemeyer, R. K., Tang, W., Dorning, M. A., et al. (2013). FUTURES: Multilevel simulations of emerging urban–rural landscape structure using a stochastic patch-growing algorithm. Annals of the Association of American Geographers, 103, 785–807. https://doi.org/10.1080/ 00045608.2012.707591 Miller, E. J., Douglas Hunt, J., Abraham, J. E., & Salvini, P. A. (2004). Microsimulating urban systems. Computers, Environment and Urban Systems, 28, 9–44. 
https://doi.org/10.1016/ S0198-9715(02)00044-3 Nagel, K., Beckman, R., & Barrett, C. (1999). TRANSIMS for urban planning. In: 6th International Conference on Computers in Urban Planning and Urban Management, Venice, Italy. Nagel, K., & Rickert, M. (2001). Parallel implementation of the TRANSIMS micro-simulation. Parallel Computing, 27, 1611–1639. https://doi.org/10.1016/S0167-8191(01)00106-5
Donnelly, R. (2018). Oregon’s transportation and land use model integration program: a retrospective. Journal of Transport and Land Use, 11. https://doi.org/10.5198/jtlu.2018.1210 Brinkerhoff, Parson, HBA Specto Incorporated, & EcoNorthwest (2010). Oregon Statewide Integrated Model (SWIM2): model description. https://www.oregon.gov/ODOT/Planning/ Documents/Statewide-Integrated-Model-2nd-Gen-Description.pdf. Openshaw, S. (1987). Research policy and review 17. Some applications of supercomputers in urban and regional analysis and modelling. Environment & Planning A, 19, 853–860. https:// doi.org/10.1068/a190853 Openshaw, S., & Turton, I. (2000). High performance computing and the art of parallel programming: An introduction for geographers, social scientists and engineers. New York: Routledge. Owens, J. D., Houston, M., Luebke, D., et al. (2008). GPU computing. Proceedings of the IEEE, 96, 879–899. https://doi.org/10.1109/JPROC.2008.917757 Pijanowski, B. C., Tayyebi, A., Doucette, J., et al. (2014). A big data urban growth simulation at a national scale: Configuring the GIS and neural network based land transformation model to run in a high performance computing (HPC) environment. Environ Model Softw, 51, 250–268. https://doi.org/10.1016/j.envsoft.2013.09.015 Porta, J., Parapar, J., Doallo, R., et al. (2013). High performance genetic algorithm for land use planning. Computers, Environment and Urban Systems, 37, 45–58. https://doi.org/10.1016/j. compenvurbsys.2012.05.003 RepastTeam. (2014). Repast for high performance computing. http://repast.sourceforge.net/ Rickert, M., & Nagel, K. (2001). Dynamic traffic assignment on parallel computers in TRANSIMS. Future Generation Computer Systems, 17, 637–648. https://doi.org/10.1016/ S0167-739X(00)00032-7 Shook, E., Wang, S., & Tang, W. (2013). A communication-aware framework for parallel spatially explicit agent-based models. International Journal of Geographical Information Science, 27, 2160–2181. https://doi.org/10.1080/13658816.2013.771740 Smith, T.  R., Peng, G., & Gahinet, P. (1989). Asynchronous, iterative, and parallel procedures for solving the weighted-region least cost path problem. Geographical Analysis, 21, 147–166. https://doi.org/10.1111/j.1538-4632.1989.tb00885.x Standish, R. K. (2014). EcoLab. http://ecolab.sourceforge.net Tabuchi, T., & Thisse, J.-F. (2011). A new economic geography model of central places. Journal of Urban Economics, 69, 240–252. https://doi.org/10.1016/j.jue.2010.11.001 Tang, W. (2013). Parallel construction of large circular cartograms using graphics processing units. International Journal of Geographical Information Science, 27, 2182–2206. https://doi.org/10 .1080/13658816.2013.778413 Tang, W., Bennett, D. A., & Wang, S. (2011). A parallel agent-based model of land use opinions. Journal of Land Use Science, 6, 121–135. https://doi.org/10.1080/1747423X.2011.558597 Tang, W., & Jia, M. (2014). Global sensitivity analysis of a large agent-based model of spatial opinion exchange: A heterogeneous multi-GPU acceleration approach. Annals of the Association of American Geographers, 104, 485–509. https://doi.org/10.1080/00045608.2014.892342 TRACC. (2011). Advanced evacuation modeling using TRANSIMS. https://www.tracc.anl.gov/ index.php/evacuation-planning Waddell, P. (2002). UrbanSim: Modeling urban development for land use, transportation, and environmental planning. Journal of the American Planning Association, 68, 297–314. https://doi. org/10.1080/01944360208976274 Wang, S. (2010). 
A cyberGIS framework for the synthesis of cyberinfrastructure, GIS, and spatial analysis. Annals of the Association of American Geographers, 100, 535–557. https://doi. org/10.1080/00045601003791243 Wang, S. (2013). CyberGIS: Blueprint for integrated and scalable geospatial software ecosystems. International Journal of Geographical Information Science, 27, 2119–2121. https://doi.org/10 .1080/13658816.2013.841318
Wang, S., Anselin, L., Bhaduri, B., et al. (2013). CyberGIS software: A synthetic review and integration roadmap. International Journal of Geographical Information Science, 27, 2122–2145. https://doi.org/10.1080/13658816.2013.776049 Wang, S., & Armstrong, M. P. (2003). A quadtree approach to domain decomposition for spatial interpolation in grid computing environments. Parallel Computing, 29, 1481–1504. https://doi. org/10.1016/j.parco.2003.04.003 Wang, S., & Armstrong, M. P. (2009). A theoretical approach to the use of cyberinfrastructure in geographical analysis. International Journal of Geographical Information Science, 23, 169– 193. https://doi.org/10.1080/13658810801918509 Wang, S., & Liu, Y. (2009). TeraGrid GIScience gateway: Bridging cyberinfrastructure and GIScience. International Journal of Geographical Information Science, 23, 631–656. https:// doi.org/10.1080/13658810902754977 Widener, M. J., Crago, N. C., & Aldstadt, J. (2012). Developing a parallel computational implementation of AMOEBA. International Journal of Geographical Information Science, 26, 1707–1723. https://doi.org/10.1080/13658816.2011.645477 Wilkinson, B., & Allen, M. (2004). Parallel programming: Techniques and applications using networked workstations and parallel computers (2nd ed.). Englewood Cliffs, NJ: Prentice-­ Hall, Inc. Wisten, M.  B., & Smith, M.  J. (1997). Distributed computation of dynamic traffic equilibria. Transportation Research Part C: Emerging Technologies, 5, 77–93. https://doi.org/10.1016/ S0968-090X(97)00003-X Wong, S. C., Wong, C. K., & Tong, C. O. (2001). A parallelized genetic algorithm for the calibration of Lowry model. Parallel Computing, 27, 1523–1536. https://doi.org/10.1016/ S0167-8191(01)00104-1 Xie, J., Yang, C., Zhou, B., & Huang, Q. (2010). High-performance computing for the simulation of dust storms. Computers, Environment and Urban Systems, 34, 278–290. https://doi. org/10.1016/j.compenvurbsys.2009.08.002 Yang, C., & Raskin, R. (2009). Introduction to distributed geographic information processing research. International Journal of Geographical Information Science, 23, 553–560. https://doi. org/10.1080/13658810902733682 Yang, C., Raskin, R., Goodchild, M., & Gahegan, M. (2010). Geospatial cyberinfrastructure: Past, present and future. Computers, Environment and Urban Systems, 34, 264–277. https://doi. org/10.1016/j.compenvurbsys.2010.04.001 Yin, L., Shaw, S.-L., Wang, D., et al. (2012). A framework of integrating GIS and parallel computing for spatial control problems: A case study of wildfire control. International Journal of Geographical Information Science, 26, 621–641. https://doi.org/10.1080/13658816.2011.609 487 Zhang, J., & You, S. (2013). High-performance quadtree constructions on large-scale geospatial rasters using GPGPU parallel primitives. International Journal of Geographical Information Science, 27, 2207–2226. https://doi.org/10.1080/13658816.2013.828840 Zhao, Y., Padmanabhan, A., & Wang, S. (2013). A parallel computing approach to viewshed analysis of large terrain data using graphics processing units. International Journal of Geographical Information Science, 27, 363–384. https://doi.org/10.1080/13658816.2012.692372

Chapter 13

Building a GPU-Enabled Analytical Workflow for Maritime Pattern Discovery Using Automatic Identification System Data Xuantong Wang, Jing Li, and Tong Zhang

Abstract  With the development of location-based data collection techniques, fine-level tracking data have become abundant. One example is the tracking information of ships from the automatic identification system (AIS). Scientists and practitioners have formulated methods to identify patterns and predict movements of ships. Nevertheless, it is a computational challenge to enable efficient and timely implementations of these methods on massive trajectory data. This article describes the development efforts of building a graphics processing unit (GPU)-based analytical workflow for AIS data. The workflow contributes to the community of high performance geocomputation in two ways: First, it leverages GPU computing to deliver efficient processing capabilities for the major stages of pattern analysis. Second, the integrated analytical workflow, along with the web system, enhances the usability of pattern analysis tools by hiding the complexity of separate pattern mining steps. We have successfully customized the workflow to analyze AIS data in the Gulf Intracoastal Waterway. Results show that our GPU-based framework outperformed serial methods, with more efficient data queries and higher processing speed.

Keywords  Machine learning · Parallel processing · AIS · GPU · Data mining

X. Wang (*) · J. Li
Department of Geography and the Environment, University of Denver, Denver, CO, USA
e-mail: [email protected]
T. Zhang
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, Hubei, China


1  Introduction

The automatic identification system (AIS) is a wireless communication system for ship-to-ship and ship-to-shore information exchange (Tu et al. 2018). AIS works by collecting GPS coordinate data and broadcasting them over radio waves to exchange information between ships and maritime authorities. The data signal usually contains ship information such as the Maritime Mobile Service Identity (MMSI), coordinates, direction, and speed. Unlike vehicle routes, ship routes are not restricted by a road network, but they are usually affected by factors such as extreme weather events and seasonal changes. With the improvements in AIS techniques, scientists and practitioners have better access to high-resolution ship tracking information and can perform advanced analysis more accurately. The increasing availability of AIS data has greatly changed the traditional paradigm of marine transportation analysis, which largely relies on surveys or historical summary data (Dawson et al. 2018). Typical examples of analysis include traffic anomaly detection, route prediction, collision prediction, and path planning (Tu et al. 2018; Laxhammar et al. 2009; Pallotta et al. 2013; Chang et al. 2015; Silveira et al. 2013; Shelmerdine 2015). Despite the improved accessibility and the development of methods in spatiotemporal analysis, most of the analysis is conducted at an aggregated level as a summary report (Shelmerdine 2015). Advanced analysis, such as fine-level spatiotemporal patterns of vessels, is not widely accessible, partly due to the lack of efficient data processing capabilities for massive data and the disconnect between basic data management solutions and advanced processing tools. Incorporating high performance computing techniques with pattern mining methods is necessary, in particular, to support time-critical analysis such as early abnormal pattern detection.
To improve the efficiency of pattern analysis for trajectory data, we have developed a parallel analytical workflow for AIS data. This workflow leverages the parallel computing capabilities delivered by graphics processing units (GPUs) to speed up computing-intensive steps of the analytical workflow. While scientists have formulated GPU parallel solutions for trajectory pattern analysis, a GPU-based analytical workflow that implements all stages of pattern analysis using GPU techniques is not yet available. The availability of such a workflow can deliver highly efficient analysis and improve the usability of pattern analysis tools. We have built a web-based prototype system that consists of four core modules: a database engine, spatiotemporal querying interfaces, a core analytical module, and an interactive visualization module. Both the database engine and the analytical module are implemented in a parallel manner with GPU parallel computing techniques to deliver efficient processing capabilities. The visualization module is provided as a web-based visualization toolkit. The four modules are integrated to form an analytical workflow, which is configurable through the web interfaces of the system. The utilization of GPU techniques ensures that the parallel analytical workflow can be deployed on conventional computing facilities.

2  Related Work

Maritime transportation has been widely recognized as the most economical method of transporting large quantities of goods (Tu et al. 2018). According to the International Maritime Organization (IMO), maritime transportation plays a central role in the global logistics system. Marine transportation has two unique features. First, ships move in unrestricted and restricted waters (e.g., oceans vs. narrow waterways) (Sheng and Yin 2018). Typical operations therefore exhibit a certain degree of flexibility; for example, routing does not follow exact waterways. Second, navigation is largely performed by humans, with latency in applying actions and with human judgment aided by data gathered from instruments such as radar systems. Understanding ship traffic is critical to ensure the secure and energy-efficient operation of vessels and to formulate reasonable route plans. Traditional pattern analysis of ship traffic largely relies on surveyed data. Since 2002, a large number of vessels have been equipped with AIS on board, allowing a fine level of pattern analysis of ship movements.
AIS data has the following characteristics that greatly change traditional pattern analysis. First, the majority of large ships are required to carry AIS devices, which ensures large coverage of the data collection (Safety of Life at Sea (SOLAS) n.d.). The data is generated at a highly frequent rate, ranging from a few seconds to minutes depending on the motions. Compared to traditional surveys of a selected number of ships, AIS data supports pattern analysis at a fine spatiotemporal level for a large number of ships. Second, every AIS record contains location, temporal, and other necessary attribute information (such as direction and speed). The diversity of information permits a variety of pattern analyses. For example, anomaly pattern detection is determined not only by the locational information but also by attribute data (Pallotta et al. 2013). Due to the high frequency of data generation and the large number of ships generating AIS data, the data volume of AIS-based datasets is significantly larger than that of traditional surveyed data (Sheng and Yin 2018).
While many studies have examined various patterns of maritime transportation using AIS, few have tackled the computational challenges of processing a massive volume of AIS tracking data in a timely fashion (Zhang et al. 2017). By contrast, numerous high performance solutions have been formulated in the field of trajectory data management and pattern analysis (Huang and Yuan 2015). One notable category of techniques is GPU-based parallel solutions, which have gained significant attention due to the low costs of deploying such techniques (Li et al. 2013). The GPU, also known as the "video card," is not only a powerful graphics engine but also an important part of today's computing systems. The main difference is that a CPU relies on serial processing, whereas a GPU exploits parallel processing. GPUs can help us significantly increase the programmability and capacity of processors to solve computationally challenging and complex tasks. For example, GPUs can be accessible to end users who do not have access to high-end CPU clusters, and the many-core strategy allows launching many threads concurrently for computation to achieve high efficiency (Li et al. 2013, 2019; Zhang and Li 2015; Nurvitadhi et al. 2016). GPUs are particularly suitable for
applications that (1) have large computational requirements, (2) dependent on parallelism, and (3) prioritize throughput over latency (Owens et al. 2008). In the scope of this study, we only focus on the GPU-based parallel solutions and how to leverage those solutions to enable the workflow of pattern analysis. A typical workflow of pattern analysis consists of four ordered tasks: (a) data preprocessing; (b) data management, (c) data querying, and (d) data mining tasks (Feng and Zhu 2016). As most studies rely on secondary data sources, in this paper, we do not consider the step of data preprocessing that cleans data or removes noise. Below we briefly discuss the GPU parallel solutions for the other three steps. GPU-based data management solutions leverage GPU parallel computing techniques to perform basic data operations such as data deletion, insertion, and aggregations. One notable example is the OmniSci solution, which provides a GPU-enabled database engine to manage spatial data (OmniSci 2019). With OmniSci, users can create persistent data files for spatial data in a database and execute database operations with GPU. Since database engines are generic to data management, further customizations (e.g., creating spatial data structures) are necessary to support the management of trajectory data. Among the three tasks, the work on querying trajectory data is the most abundant. Examples include performing similarity queries to identify the top-K similar trajectories and calculating distances between trajectories (Gowanlock and Casanova 2014; Li et al. 2018a). Experiments demonstrate that GPU-based parallel querying processing significantly outperforms CPU-based serial processing. As the primary goal is to improve the efficiency of querying, those methods are not yet integrated with data management solutions to enable a complete analytical workflow. In terms of GPU-based approaches for data mining methods, approaches vary significantly with different pattern mining algorithms (Feng and Zhu 2016). Several GPU-based mining approaches address the computational intensity of geometric computations. For example, many researchers have proposed solutions to speed up the distance computations between trajectories (Sart et al. 2010; Gudmundsson and Valladares 2015; Zhao et al. 2013). Recently, with the popularity of machine learning algorithms, scientists have begun to utilize ready-to-use GPU machine libraries to accelerate those algorithms to perform pattern analysis (Li et al. 2018b). However, due to the inherent complexity of mining algorithms and GPU programming, those approaches are implemented in a case-by-case manner (Feng and Zhu 2016). In summary, while separate efforts are made for different steps, a coherent framework that seamlessly integrates all stages of pattern analysis for trajectory data is missing. There is a need for developing an integrated high performance data mining workflow to enable the efficient analysis on massive trajectory data. On the one hand, a complete analytical workflow can connect separate complex analytical methods to reduce the difficulties of performing multiple analytical tasks. On the other hand, executing the workflow can reduce unnecessary low level data communication and preprocessing through leveraging the data management capabilities. 
The successful implementation of the workflow will not only deliver an efficient pattern mining solution but also demonstrate the feasibility of using GPU parallel computing techniques to accelerate various types of computing tasks.

3  System Design and Implementation

3.1  System Architecture

As discussed in Sect. 2, given the separate efforts made in different stages of pattern analysis, we have built the workflow by expanding existing implementations in the following ways:
(a) Expand the data structures of the basic database engine to support the management of trajectory data and relevant data products. The expansion allows users to perform spatiotemporal queries on data and to process data required by the workflow.
(b) Provide parallel implementations of representative mining algorithms as examples for further expansion. We choose one traditional clustering algorithm and one machine learning algorithm to illustrate the design of the GPU-enabled pattern mining methods.
(c) Connect all stages of pattern mining to build a complete analytical workflow. We develop a web portal to facilitate the configuration of pattern mining tasks. Users can configure analytical tasks through graphical web interfaces that connect multiple steps.
This framework consists of four layers: data management, advanced querying, pattern analysis, and data representation. The data management layer includes a GPU-enabled database engine and a set of customized spatiotemporal manipulation functions to interact with data. Users can upload AIS data in plain text format and store the data in the GPU database. The advanced querying layer includes a set of predefined querying functions. The data analysis layer provides pattern mining methods, including clustering analysis and density analysis. The data representation layer includes a light-weight, web-based visualization portal for end users to view graphical representations of data and analytical results (Fig. 13.1).
Based on the framework, we define a typical workflow as follows: (a) managing data with the GPU database; (b) preparing and transforming data for mining tasks; (c) executing data mining tasks; and (d) visualizing analytical results. The first two steps are implemented as database functions; the third step is implemented as a separate function module. The entire workflow except for the last step is based on GPU parallel techniques. A minimal sketch of how these stages can be chained is given below, and the following subsections explain the components.
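As a structural illustration only, the four stages can be chained as below; every function passed in is a hypothetical placeholder for the corresponding module of the system.

```python
def run_workflow(csv_path, query_spec, mining_config,
                 load_into_gpu_database,   # (a) hypothetical: ingest raw AIS csv into the GPU database
                 query_trajectories,       # (b) hypothetical: spatiotemporal query and data preparation
                 mine_patterns,            # (c) hypothetical: clustering or ROI identification
                 publish_results):         # (d) hypothetical: hand results to the web visualization portal
    """Chain the four stages of the analytical workflow in order."""
    table = load_into_gpu_database(csv_path)
    trajectories = query_trajectories(table, query_spec)
    patterns = mine_patterns(trajectories, mining_config)
    return publish_results(patterns)
```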

Fig. 13.1  An overview of the system architecture

3.2  Database Management

The core component of the module is a GPU database engine. With this engine, the module stores data as persistent database files. We mainly use OmniSci's GPU-accelerated database engine to store AIS data. OmniSci's analytical platform, which originated from research at MIT, has great capacity to process large datasets in an efficient manner by utilizing the parallel processing power of GPUs. According to OmniSci's white paper, it is currently the fastest SQL engine in the world. Due to the complexity of GPU architectures, it is very difficult for users to exploit the benefits of parallel processing by designing or configuring their own database engine. Thus, we have selected OmniSci, an open-source data engine, to store, process, and analyze data efficiently. Compared to traditional database engines, OmniSci can (1) query data in milliseconds using the OmniSci Core SQL engine, (2) process and compute multiple data items simultaneously for big data mining, and (3) load data efficiently with a distributed configuration (OmniSci 2018).
In order to analyze the massive trajectory data, we mainly focus on three types of classes: point, trajectory, and segment (Fig. 13.2). We design tables to store the records of the three classes as persistent files in the database. According to Zheng (2015), points consist of geospatial coordinates and temporal stamps, trajectories can be represented by a list of points in chronological order, and segments are fragments of trajectories that can be used for classification and clustering analysis. AIS raw data files usually provide point information that captures the status of a ship at a point in time. Raw AIS data files (in the form of csv files) can be loaded into the OmniSci database engine using its own utilities or SQL commands. We therefore add additional processing tools to generate trajectories from the point information to support trajectory-based operations. Finally, raw AIS data files are loaded into the database and converted into tracking points, trajectories, and segments for the pattern analysis.

Fig. 13.2  Data structures of major classes
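As a sketch of this loading step, the snippet below creates a point table and bulk-loads a csv file through OmniSci's Python client (pymapd). The connection parameters, table name, column layout, and file path are assumptions for illustration; the actual schema follows the classes in Fig. 13.2.

```python
from pymapd import connect  # OmniSci's Python client

# Deployment-specific connection settings (assumed values).
con = connect(user="admin", password="HyperInteractive",
              host="localhost", dbname="omnisci")

# A possible layout for the point table (columns are assumptions).
con.execute("""
    CREATE TABLE IF NOT EXISTS ais_points (
        mmsi BIGINT,
        ts   TIMESTAMP,
        lon  DOUBLE,
        lat  DOUBLE,
        sog  DOUBLE,
        cog  DOUBLE
    )
""")

# Bulk-load a raw AIS csv file into the GPU database (path is an assumption).
con.execute("COPY ais_points FROM '/data/ais_raw.csv' WITH (header='true')")
```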

3.3  Querying Functions

We have implemented two categories of queries: simple queries and complex queries (Li et al. 2018a). The simple query function consists of spatial, temporal, and attribute queries, and allows users to select data based on (1) a spatial bounding box, (2) a temporal window, or (3) attributes. We have also designed a complex query function that allows users to perform more complicated data queries. By using the complex query function, users can either (1) combine the simple query functions or (2) use a specific trajectory for selection based on the nearest neighbor query (Fig. 13.3b). To develop the query methods, we extend the querying interfaces of OmniSci. Figure 13.3a shows a sample control panel for querying data from our web-based platform based on the nearest neighbor query; in this case, the top 10 closest trajectories are extracted based on the selected trajectory. Users choose simple or complex queries by selecting the corresponding functions. The panel generates the relevant query commands, based on the users' inputs, for the OmniSci database to select the relevant data. These query functions are provided as stored views for easy integration with the front-end visualization functions to enable the integration of queries and data management.

Fig. 13.3 (a) Control panel for complex query functions and (b) flowchart of data query through OmniSci
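For illustration, a simple query combining a spatial bounding box, a temporal window, and an attribute filter could be expressed against the point table sketched in Sect. 3.2 as follows; the table, columns, coordinate values, and dates are assumptions, and con is the pymapd connection from the earlier sketch.

```python
# Illustrative bounding box and time window (values are assumptions).
bbox = {"min_lon": -95.5, "max_lon": -88.0, "min_lat": 27.5, "max_lat": 30.5}

sql = """
    SELECT mmsi, ts, lon, lat, sog
    FROM ais_points
    WHERE lon BETWEEN {min_lon} AND {max_lon}
      AND lat BETWEEN {min_lat} AND {max_lat}
      AND ts BETWEEN '2017-01-01 00:00:00' AND '2017-01-31 23:59:59'
      AND sog > 0.5
""".format(**bbox)

rows = list(con.execute(sql))  # executed by the GPU-accelerated SQL engine
```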

3.4  Pattern Analysis and Parallel Design

Pattern analysis varies with the mining algorithms. Considering ship tracking data as a type of trajectory data, the fundamental patterns are moving together, trajectory clustering, periodic patterns, and frequent sequential patterns (Zheng 2015). Besides the generic patterns for trajectory data, there are also pattern analysis types that are unique to maritime transportation (e.g., special navigation patterns and trajectory-feature-based clustering). We chose three types of pattern analysis methods for the workflow: clustering analysis, major flow identification, and density-based region of interest (ROI) identification. Given the variety of mining algorithms, we only provide representative examples of implementing mining algorithms.

3.4.1  Clustering Analysis

Typical clustering algorithms for trajectory data use the locational information to perform clustering analysis. In our framework, we integrate machine learning algorithms that convert massive trajectories into compressed features to reduce computational intensity. The analysis consists of four steps:
Step 1. Data preprocessing: We implement an image-based method that represents large-scale trajectory data as images and employ the deep learning architecture of a convolutional autoencoder (CAE), which is a type of convolutional neural network (CNN), to extract and cluster the trajectory features contained in the images. A CAE framework can combine feature learning and clustering in order to cluster images with better performance. A CAE can also be trained to learn features from unlabeled images. Thus, compared to traditional machine learning methods like k-means clustering, a CAE can better handle multi-dimensional data like images. A CAE generally has two parts: an encoder that is responsible for reduction and a decoder that is responsible for reconstruction. We extract all the unique IDs from a group of trajectories (the AIS dataset) to create an image of the trajectory polyline for each unique trajectory based on its corresponding geolocation points. Due to the spatial heterogeneity of trajectory points, we set the maximum spatial extent of the dataset as the extent for all the images.
Step 2. Model training: We build the CAEs with the Keras (https://keras.io/) and TensorFlow (https://www.tensorflow.org/) libraries and train the model with the images. Therefore, in this data preprocessing stage, we have combined the geospatial representation of trajectory data, convolutional neural networks, and autoencoders for image-based trajectory data reduction and representation.
Step 3. k-means clustering based on feature locations from the CAE: k-means clustering is a popular clustering method for data mining that partitions a number of observations into k clusters based on generated cluster centroids. In the previous step, we used the convolutional layers to compress the trajectory images. We extract the compressed representations from the model and apply k-means clustering to them.
Step 4. Representative trajectory formation for every cluster: By using k-means clustering, we separate trajectories into different clusters. In order to measure the performance of our image-based CAE clustering model, we calculate the accuracy by comparing the assigned labels with manually added labels. The accuracy is measured as the percentage of correctly assigned trajectories (Fig. 13.4). A code sketch of this clustering pipeline is given below, after Fig. 13.4.

3.4.2  ROI Identification

Density-based ROI identification usually takes a grid-based approach that assigns trajectories or moving points to grid cells and calculates the number of points in each cell. With the count information, the function classifies the grid cells into different categories and merges cells with high counts into zones. The identification helps planners to identify popular regions and perform management tasks, such as rerouting, as needed. The ROI identification consists of the following steps (Fig. 13.5), and an illustrative GPU sketch of the density calculation is given at the end of Sect. 3.4:
Step 1. Rasterization of trajectories: The first step is to generate the raster representation of trajectories. Two options are available: line-based and point-based. This step is the most computationally intensive. After the rasterization, every tracking point stores its grid information.

Fig. 13.4  Using CAE for data clustering
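To make the four steps of Sect. 3.4.1 concrete, the following is a minimal sketch of the CAE-plus-k-means pipeline, not the authors' exact implementation: trajectory images are compressed by a convolutional encoder and the compressed codes are clustered with k-means. The image size, filter counts, and number of clusters are illustrative assumptions; the batch size, epochs, optimizer, and loss follow Appendix 2.

```python
# Minimal sketch (assumed architecture) of CAE-based compression + k-means clustering
# of rasterized trajectory images.
import numpy as np
from tensorflow.keras import layers, models
from sklearn.cluster import KMeans

def build_cae(img_size=64):
    inp = layers.Input(shape=(img_size, img_size, 1))
    # Encoder: convolution + pooling reduces each trajectory image to a compact code
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2, padding="same")(x)
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
    encoded = layers.MaxPooling2D(2, padding="same")(x)
    # Decoder: upsampling reconstructs the image so the encoder can be trained unsupervised
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(encoded)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    out = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)
    autoencoder = models.Model(inp, out)
    encoder = models.Model(inp, encoded)
    autoencoder.compile(optimizer="adadelta", loss="binary_crossentropy")
    return autoencoder, encoder

def cluster_trajectory_images(images, k=5, epochs=50, batch_size=32):
    """images: N x 64 x 64 x 1 array of rasterized trajectory polylines scaled to [0, 1]."""
    autoencoder, encoder = build_cae(images.shape[1])
    autoencoder.fit(images, images, epochs=epochs, batch_size=batch_size, verbose=0)
    codes = encoder.predict(images).reshape(len(images), -1)  # flatten compressed features
    return KMeans(n_clusters=k).fit_predict(codes)            # cluster label per trajectory
```

In this sketch the clustering accuracy of Step 4 would be computed by comparing the returned labels against manually assigned labels.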


Fig. 13.5  ROI identification flowchart

Step 2. Grid-based density calculation: The process recursively checks the grid locations of every trajectory and accumulates the number of points within each grid cell.

Step 3. Region classification: Based on the count information, the algorithm determines a classification scheme and reassigns the class information to every grid cell based on the scheme.

Step 4. Zoning to extract ROIs: The zoning searches for high-density cells and connects adjacent cells with similarly high levels of density to form large zones.

Step 5. Point-to-zone conversion: From Step 4, every zone contains a list of points that form the zone. This step performs a reverse search to assign zone information to every grid cell, so that every cell eventually stores its zone ID.

3.4.3  Parallel Strategies Used in the Functions

In implementing the parallel mining algorithms, we design two types of parallelism strategies: (a) trajectory-level parallelism and (b) point-level parallelism. Trajectory-level parallelism (Fig. 13.6a) means that every processing thread executes the computation tasks on a set of assigned trajectories. For example, Step 1 in ROI extraction follows trajectory-level parallelism, in which every thread generates the raster representation for its assigned trajectories. Point-level parallelism (Fig. 13.6b) means that every processing thread executes the computation tasks on a set of assigned points. For example, during the k-means computation, every thread calculates the distances between points. While the GPU-enabled parallel processing paradigm can launch large numbers of threads concurrently, every thread still processes its own set of trajectories or points. The typical workflow (Fig. 13.6c) starts with configuring the thread and the block sizes. Based on the thread-block configuration, the workflow copies data and relevant metadata information to every thread from CPU to GPU. Multiple threads start concurrently to process data (e.g., calculating the raster


Fig. 13.6  Task assignment: (a) trajectory level parallelism, (b) point level parallelism, and (c) the parallel execution workflow

coordinates of every point on a trajectory). Upon completion, all data are merged and sent back to the CPU for further processing.
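As an illustration of point-level parallelism, below is a minimal Numba CUDA sketch, an assumed simplification rather than the system's actual code, in which each GPU thread maps one AIS point to a grid cell and atomically increments that cell's count (the density calculation of Step 2 in Sect. 3.4.2). The grid extent and cell size are hypothetical parameters.

```python
# Minimal point-level parallelism sketch with Numba CUDA: one thread per AIS point.
import numpy as np
from numba import cuda

@cuda.jit
def point_density_kernel(lons, lats, counts, min_lon, min_lat, cell_size, ncols, nrows):
    i = cuda.grid(1)                                  # global thread index = point index
    if i < lons.size:
        col = int((lons[i] - min_lon) / cell_size)
        row = int((lats[i] - min_lat) / cell_size)
        if col >= 0 and col < ncols and row >= 0 and row < nrows:
            cuda.atomic.add(counts, (row, col), 1)    # atomic update avoids race conditions

def grid_density(lons, lats, min_lon, min_lat, cell_size, ncols, nrows):
    counts = np.zeros((nrows, ncols), dtype=np.int32)
    d_counts = cuda.to_device(counts)                 # copy CPU -> GPU
    threads = 256
    blocks = (lons.size + threads - 1) // threads     # thread-block configuration
    point_density_kernel[blocks, threads](
        cuda.to_device(lons), cuda.to_device(lats), d_counts,
        min_lon, min_lat, cell_size, ncols, nrows)
    return d_counts.copy_to_host()                    # merge results back on the CPU
```

The same configure-copy-launch-merge pattern sketched here corresponds to the generic GPU execution workflow of Fig. 13.6c.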

3.5  Visualization

The visualization module allows users to manipulate data, customize data mining tasks, and view data and analytical results. We enhance interactive manipulation by providing additional visual tools, such as line charts and histograms of attribute values, so that users can configure the data mining tasks based on the spatial and temporal distributions of the data (Fig. 13.7a). Three types of views are available: map view, table view, and chart view. There are additional configuration controls that allow interactive manipulation of data and visualization results.

Map view: This view shows spatiotemporal data at different spatial and temporal levels in a dynamic map (Fig. 13.7b).

Table view: This view shows detailed information from the database. It can also present non-spatial information that highlights the relationships among trajectories (Fig. 13.7c).

Chart view: This view shows quantitative information to facilitate the exploration of mining configurations. When the configuration is updated, the summary results of pattern analysis are plotted as charts (Fig. 13.7d).


Fig. 13.7 (a) An overview of the system, (b) map view of vessels’ information, (c) table view, and (d) chart view of trajectory data summary

3.6  Other Implementation Details

In order to test our system, we have utilized virtual machines on the Microsoft Azure Cloud Computing Platform. All virtual machines are equipped with NVIDIA GPUs. To compare performance and deploy the GPU-accelerated database engine, we have deployed both Linux- and Windows-based virtual machines through the Azure portal. The system follows a typical client-server architecture, which consists of a front-end web portal, a server-side database, and a set of query and pattern analysis tools. We chose JSP as the front-end programming language and Python as the server-side programming language. Both are high-level programming languages that are popular in geospatial domains. Therefore, scientists can easily adopt the prototype system to support different applications. Note that the system works only with machines equipped with NVIDIA graphics cards; we hope to expand the system in the future for better compatibility.

To develop the system, we leverage the capabilities of three major libraries: (a) the OmniSci Python client (pymapd), (b) Anaconda Numba, and (c) Baidu ECharts. To build the GPU-enabled trajectory database, we use OmniSci as the database engine. We extend OmniSci to support trajectory-related queries and spatiotemporal operations. We develop a parallel version of the pattern analysis methods with NVIDIA GPUs. We choose Anaconda Numba, which provides Just-In-Time (JIT) Python compilation support for CUDA kernels (i.e., parallel functions). The performance of Numba kernels is comparable to C++ based implementations. Baidu ECharts provides a variety of visualization options and is highly scalable for a large volume of data. We leverage the options of Baidu ECharts to develop different views of visualization.

In addition, we also provide a web-based model builder, which is similar to the ArcGIS model builder, so that users can build and customize their own workflows without having to click through the control panel or write their own code (Fig. 13.8). Users can (1) drag data mining and querying modules onto the canvas, (2) configure methods of data processing, and (3) connect the modules to complete the workflow and perform data analysis.

Fig. 13.8  Web-based model builder
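As a brief illustration of the server-side use of the pymapd client, the sketch below connects to an OmniSci instance and runs a spatial and temporal selection followed by an aggregation. The connection settings are OmniSci's documented defaults, and the table and column names are hypothetical assumptions rather than the system's actual schema.

```python
# Minimal pymapd sketch (assumed table/column names) for querying the
# GPU-accelerated OmniSci engine from the Python server side.
from pymapd import connect

# Default OmniSci credentials; adjust host/user/password/dbname for a real deployment
con = connect(user="admin", password="HyperInteractive",
              host="localhost", dbname="omnisci")

# Spatial and temporal selection followed by an aggregation by vessel name
sql = """
SELECT VesselName, AVG(SOG) AS avg_speed
FROM ais_points
WHERE LON BETWEEN -92 AND -80
  AND LAT BETWEEN 24 AND 31
  AND BaseDateTime BETWEEN '2018-01-01' AND '2018-01-31'
GROUP BY VesselName
"""
cur = con.cursor()
cur.execute(sql)                       # executed against the GPU-resident table
for vessel, avg_speed in cur.fetchall()[:5]:
    print(vessel, avg_speed)
```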

4  Case Study

4.1  Overview of the Study Area and Datasets

The Gulf Intracoastal Waterway is one of the busiest waterways in the world. To demonstrate the efficiency of the workflow, we extracted AIS data from the Marine Cadastre's Zone 14 to Zone 17 dataset of January 2018. We include a total of 19,037,100 data points for 6282 unique trajectories. The average number of points per trajectory is 3030. A series of subsets of the data are prepared, and Appendix 1 summarizes the information of these datasets.

We conduct three groups of tests (denoted as "Group 1," "Group 2," and "Group 3" in later sections) for efficiency analysis. We have pre-processed and filtered the data from Zones 14–17 so that only trajectories with more than 1000 points are selected. We select 600 different trajectories from each zone. Tests in Group 1 compare the performance of the GPU-accelerated OmniSci database engine and


Python's Pandas analysis toolkit to demonstrate the performance gains of using GPU acceleration for data aggregation and queries. Tests in Group 2 and Group 3 evaluate the benefits of using optimization methods to speed up the computation and are conducted with our parallel version of the trajectory data mining functions. Tests in Group 2 compare the performance differences with and without applying the GPU-accelerated method for trajectory data clustering. Tests in Group 3 compare the performance differences with and without using the GPU for all data filtering and processing. In all groups, we record the time costs of running the queries and analyze the performance of our machine learning methods.

We have prepared two test environments on Microsoft Azure's virtual machines. The first test environment is a Windows-based (Windows 2016) virtual machine and the second one is a Linux-based (Ubuntu 18.1) virtual machine. Both virtual machines have 24 cores (E5-2690v3), 4 × K80 GPUs (2 physical cards), and 224 GB memory. CUDA Toolkit 10.0 and Python 3.7 are installed on both virtual machines, whereas ArcGIS 10.6 is installed on the Windows machine and OmniSci (GPU-based community version) is installed on the Linux machine.

4.2  Performance Evaluation

4.2.1  Basic Queries: GPU- and CPU-Based Data Aggregations and Queries

Five sets of trajectories with different sizes from Zone 17 (Fig. 13.9) are selected for comparisons between Pandas (with CPU) and OmniSci (with GPU) performances. In this test, we evaluate the impacts of using the GPU-accelerated database engine to optimize data queries for a large-scale trajectory database. The speedup ratio is calculated as follows:

Fig. 13.9  Group 1 sample trajectories of Zone 17

Speedup Ratio = CPU calculation time (s) / GPU calculation time (s)        (13.1)

We compare two different tests: (1) a spatial and temporal selection test and (2) a data aggregation test. For the first test, we implement a spatial and a temporal restriction for both the OmniSci and Python implementations to filter and query data. For the second test, we implement an aggregation function that groups the selected trajectories by name to find the average speed. All five sets of trajectory data are used for both tests.

According to Fig. 13.10, using parallel processing has significantly improved the data query efficiency. The average speedup ratios are 26.3 for test 1 and 2.28 for test 2. The highest speedup ratios are 52.2 for test 1 and 2.5 for test 2, whereas the lowest speedup ratios are 12.6 for test 1 and 2.0 for test 2. We have found that the average speedup ratio decreases in test 1 as the query data size increases, which implies that GPU-based parallel querying operations are more likely to be impacted by the size of the data. The speedup ratios of the querying operations are higher than those of the aggregation. Because different parallelism strategies are adopted, the performance results differ: the aggregation query function is based on parallel reduction, which does not fully exploit the computational power of many-core GPUs.

4.2.2  Data Analysis

Clustering (k-Means)

In this group of tests, we evaluate the accuracy of our image-based clustering method and compare the performance differences between the parallel processing and CPU-based methods. The first dataset (set 1) contains AIS data from Zones 14 and 15. The second dataset (set 2) contains AIS data from Zones 16 and 17 (Fig. 13.11).

Fig. 13.10  Speedup ratio of group 1 tests including (a) spatial and temporal selection speedup ratio and (b) aggregation speedup ratio


Fig. 13.11  Group 2 sample trajectories of Zones 14–17

Fig. 13.12  Group 2 Zones 14–17 data performance results including (a) group 2 speedup ratio and (b) test accuracy

Five subsets of AIS data with different sizes are selected from the two datasets to test the efficiency of GPU-based model training and the accuracy of clustering. We compare (a) the performance differences when performing trajectory clustering (CAE-based image compression followed by k-means clustering) with CPU- and GPU-based implementations and (b) the accuracy of the clustering results against pre-labeled data. In both cases, we use the OmniSci database engine to accelerate the data selection process. Model configurations can be found in Appendix 2.


We record the time cost of performing clustering to calculate the speedup ratio and the final class assignments for the test trajectory data. Figure 13.12b shows that the average clustering accuracy for the first set of data is 90% and the average clustering accuracy for the second set of data is 83%. The overall average accuracy is 87%. Figure 13.12a shows that the speedup ratio remains roughly constant (between 5 and 6) as we increase the size of the test data in both cases. The speedup ratio is slightly higher for the second set of data, but its accuracy is lower. This shows that the distribution and shapes of trajectories can affect model performance.

ROI Identification

In the ROI identification test of Group 2, we have selected five different sets of trajectories from Zone 14 (Fig. 13.13a) to compare and evaluate the performance of ROI identification with GPU acceleration. In this test, we evaluate the impacts of using parallel processing to optimize ROI identification by comparing our method with ArcGIS's Point Density tool. The speedup ratio is calculated as in Eq. (13.1). According to Fig. 13.13b, using parallel processing has significantly improved the efficiency of ROI identification. The average speedup ratio is about 1.5. As the size of the test data increases, the speedup ratios remain similar. Nevertheless, there is a slight decrease (by 16%) in the speedup ratio as the test data size increases (by 7 times) from dataset 1 to dataset 5. This is probably because the ROI computation is not fully parallel. For instance, although the Numba library allows automatic parallelization and performs optimizations, the implementation of the ROI identification model still relies on other libraries that may affect data processing.

4.2.3  Complete Workflow

GPU-based parallel processing improves the performance of individual pattern analysis functions. Our complete workflow seamlessly integrates all GPU-enabled parallel processing functions to further improve the overall performance. Below we

Fig. 13.13 (a) ROI test trajectories of Zone 14 and (b) ROI speedup ratio result


demonstrate the performance of the workflow. The workflow consists of the following steps: (a) an initial data exploration with querying functions; (b) ROI-based pattern mining on the selected data; (c) another round of selection; (d) ROI-based pattern mining on the newly selected data; and (e) a final comparison. A user configures a query by interactively setting up its conditions. Note that our GPU-based workflow connects all tasks of pattern mining, manages intermediate data through the database, and prepares data with GPU functions (e.g., preparing selected data for ROI calculation). By contrast, the traditional analytical practice includes additional steps to store and prepare data for analysis (Fig. 13.14a). Figure 13.14b is a demonstration of a GPU-based workflow created in our web-based model builder.

In Group 3, we evaluate the performance based on Zone 17's trajectory data. We have used all of Zone 17's January raw data and applied temporal restrictions to query the data. For each round of tests, we first apply a temporal frame to query data, perform the ROI analysis, and save the result. Then we apply another temporal frame and repeat the previous step. As the temporal frame is slightly larger in the second round of tests, the input data for the ROI analysis also increase. We record all the individual query time costs and the total time costs for performance comparisons. Based on the results in Fig. 13.15a, we can see that the speedup ratio decreases significantly as we increase the size of the data. The speedup ratio for the second round is lower than that of the first round. The overall total time of calculation has increased by about 80% as the data size has increased by 4 times.

Fig. 13.14 (a) Complete workflow for ROI-based pattern mining and (b) sample GPU-based workflow created in the model builder


Fig. 13.15 (a) Speedup ratio and (b) total time costs for the complete workflow for ROI-based pattern mining

5  Conclusion

We report the effort of designing and implementing a GPU-enabled analytical workflow for pattern analysis on trajectory data. The major steps of the workflow are implemented in a parallel manner. Users can configure a pattern mining workflow through the web portal in an interactive manner. The experiments demonstrate the potential of using light-weight parallel computing techniques to support all stages of pattern mining analysis. In all experiments, we observe performance gains. The variations in speedup ratios are determined by the processing tasks, the degree of parallelism, and the data sizes. Further expansion includes: (a) integrating more methods, such as line-based sequence representation, into the workflow; and (b) providing higher-level interfaces for users with different levels of technical proficiency to enhance the usability of the workflow.

Acknowledgements  The authors would like to thank the Microsoft Azure Cloud Computing Platform and Services for offering cloud credits.

Appendix 1

Below is the information of the test datasets.

Table 13.1  Group 1: spatial and temporal selection

            Trajectories   Points
Dataset 1   198            56,303
Dataset 2   265            107,613
Dataset 3   359            160,654
Dataset 4   425            224,440
Dataset 5   489            280,086

Table 13.2  Group 1: aggregation

            Trajectories   Points
Dataset 1   1390           169,685
Dataset 2   1479           480,349
Dataset 3   1535           707,585
Dataset 4   1661           1,586,405
Dataset 5   1703           1,745,280

Table 13.3  Group 2: clustering analysis

                 Trajectories   Points
Set1 subset 1    939            5,158,832
Set1 subset 2    996            6,208,392
Set1 subset 3    1034           6,931,493
Set1 subset 4    1101           8,127,300
Set1 subset 5    1130           8,964,427
Set2 subset 1    887            3,318,627
Set2 subset 2    959            4,061,089
Set2 subset 3    1013           4,566,102
Set2 subset 4    1089           5,328,967
Set2 subset 5    1134           5,820,866

Table 13.4  Group 2: ROI analysis

            Trajectories   Points
Dataset 1   122            578,048
Dataset 2   245            1,767,673
Dataset 3   367            2,897,851
Dataset 4   490            4,029,606
Dataset 5   613            4,625,709

Table 13.5  Group 3: complete workflow

            Round 1 points   Round 2 points
Dataset 1   995,541          1,995,213
Dataset 2   1,995,213        2,999,248
Dataset 3   2,999,248        4,061,839
Dataset 4   4,061,839        5,141,769
Dataset 5   5,141,769        6,247,085

Appendix 2

To build our CAE model, we set the batch size to 32, the number of epochs to 50, Adadelta as the optimizer, and binary cross-entropy as the loss. For each set of test data, we use 70% for model training and the other 30% for model testing.

References

Chang, S.-J., Yeh, K.-H., Peng, G.-D., Chang, S.-M., & Huang, C.-H. (2015). From safety to security – pattern and anomaly detections in maritime trajectories. In 2015 International Carnahan Conference on Security Technology (ICCST) (pp. 415–419). IEEE.
Dawson, J., Pizzolato, L., Howell, S. E., Copland, L., & Johnston, M. E. (2018). Temporal and spatial patterns of ship traffic in the Canadian Arctic from 1990 to 2015 + supplementary appendix 1: Figs. S1–S7 (see article tools). Arctic, 71, 15–26.
Feng, Z., & Zhu, Y. (2016). A survey on trajectory data mining: Techniques and applications. IEEE Access, 4, 2056–2067.
Gowanlock, M. G., & Casanova, H. (2014). Parallel distance threshold query processing for spatiotemporal trajectory databases on the GPU.
Gudmundsson, J., & Valladares, N. (2015). A GPU approach to subtrajectory clustering using the Fréchet distance. IEEE Transactions on Parallel and Distributed Systems, 26, 924–937.
Huang, P., & Yuan, B. (2015). Mining massive-scale spatiotemporal trajectories in parallel: A survey. In Trends and applications in knowledge discovery and data mining (pp. 41–52). Cham: Springer.
Laxhammar, R., Falkman, G., & Sviestins, E. (2009). Anomaly detection in sea traffic – a comparison of the Gaussian mixture model and the kernel density estimator. In 2009 12th International Conference on Information Fusion (pp. 756–763). IEEE.
Li, J., Jiang, Y., Yang, C., Huang, Q., & Rice, M. (2013). Visualizing 3D/4D environmental data using many-core graphics processing units (GPUs) and multi-core central processing units (CPUs). Computers & Geosciences, 59, 78–89.
Li, J., Wang, X., Zhang, T., & Xu, Y. (2018a). Efficient parallel K best connected trajectory (K-BCT) query with GPGPU: A combinatorial min-distance and progressive bounding box approach. ISPRS International Journal of Geo-Information, 7, 239.
Li, J., Xu, Y., Macrander, H., Atkinson, L., Thomas, T., & Lopez, M. A. (2019). GPU-based lightweight parallel processing toolset for LiDAR data for terrain analysis. Environmental Modelling & Software, 117, 55–68.
Li, X., Zhao, K., Cong, G., Jensen, C. S., & Wei, W. (2018b). Deep representation learning for trajectory similarity computation. In 2018 IEEE 34th International Conference on Data Engineering (ICDE) (pp. 617–628). IEEE.
Nurvitadhi, E., Sheffield, D., Sim, J., Mishra, A., Venkatesh, G., & Marr, D. (2016). Accelerating binarized neural networks: Comparison of FPGA, CPU, GPU, and ASIC. In 2016 International Conference on Field-Programmable Technology (FPT) (pp. 77–84). IEEE.
OmniSci. (2018). OmniSci technical white paper. Retrieved July 30, 2019, from http://www2.omnisci.com/resources/technical-whitepaper/lp?_ga=2.192127720.316702718.1564495503-925270820.1564495503
OmniSci. (2019). Retrieved February 23, 2018, from https://www.omnisci.com/
Owens, J. D., Houston, M., Luebke, D., Green, S., Stone, J. E., & Phillips, J. C. (2008). GPU computing. Proceedings of the IEEE, 96, 879.
Pallotta, G., Vespe, M., & Bryan, K. (2013). Vessel pattern knowledge discovery from AIS data: A framework for anomaly detection and route prediction. Entropy, 15, 2218–2245.
Safety of Life at Sea (SOLAS). Convention chapter V, regulation 19.
Sart, D., Mueen, A., Najjar, W., Keogh, E., & Niennattrakul, V. (2010). Accelerating dynamic time warping subsequence search with GPUs and FPGAs. In 2010 IEEE International Conference on Data Mining (pp. 1001–1006). IEEE.
Shelmerdine, R. L. (2015). Teasing out the detail: How our understanding of marine AIS data can better inform industries, developments, and planning. Marine Policy, 54, 17–25.
Sheng, P., & Yin, J. (2018). Extracting shipping route patterns by trajectory clustering model based on automatic identification system data. Sustainability, 10, 2327.
Silveira, P. A. M., Teixeira, A. P., & Soares, C. G. (2013). Use of AIS data to characterise marine traffic patterns and ship collision risk off the coast of Portugal. The Journal of Navigation, 66, 879–898.
Tu, E., Zhang, G., Rachmawati, L., Rajabally, E., & Huang, G.-B. (2018). Exploiting AIS data for intelligent maritime navigation: A comprehensive survey from data to methodology. IEEE Transactions on Intelligent Transportation Systems, 19, 1559–1582.
Zhang, L., Meng, Q., & Fwa, T. F. (2017). Big AIS data based spatial-temporal analyses of ship traffic in Singapore port waters. Transportation Research Part E: Logistics and Transportation Review, 129, 287.
Zhang, T., & Li, J. (2015). Online task scheduling for LiDAR data preprocessing on hybrid GPU/CPU devices: A reinforcement learning approach. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8, 386–397.
Zhao, Y., Sheong, F. K., Sun, J., Sander, P., & Huang, X. (2013). A fast parallel clustering algorithm for molecular simulation trajectories. Journal of Computational Chemistry, 34, 95–104.
Zheng, Y. (2015). Trajectory data mining: An overview. ACM Transactions on Intelligent Systems and Technology (TIST), 6, 29.

Chapter 14

Domain Application of High Performance Computing in Earth Science: An Example of Dust Storm Modeling and Visualization

Qunying Huang, Jing Li, and Tong Zhang

Abstract  Earth science models often raise computational challenges that require a large number of computing resources, for which serial computing on a single computer is not sufficient. Further, earth science datasets produced by observations and models are increasingly large and complex, exceeding the limits of most analysis and visualization tools, as well as the capacities of a single computer. HPC-enabled modeling, analysis, and visualization solutions are needed to better understand the behaviors, dynamics, and interactions of the complex earth system and its sub-systems. However, there is a wide range of computing paradigms (e.g., Cluster, Grid, GPU, Volunteer, and Cloud Computing) and associated parallel programming standards and libraries (e.g., MPI/OpenMPI, CUDA, and MapReduce). In addition, the selection of specific HPC technologies varies widely for different datasets, computational models, and user requirements. To demystify HPC technologies and unfold different computing options for scientists, this chapter first presents a generalized HPC architecture for earth science applications, and then demonstrates how such a generalized architecture can be instantiated to support the modeling and visualization of dust storms.

Keywords  High performance computing · Distributed computing · Cloud computing · Parallel computing

Q. Huang (*)
Department of Geography, University of Wisconsin-Madison, Madison, WI, USA
e-mail: [email protected]

J. Li
Department of Geography and the Environment, University of Denver, Denver, CO, USA

T. Zhang
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, Hubei, China

© Springer Nature Switzerland AG 2020
W. Tang, S. Wang (eds.), High Performance Computing for Geospatial Applications, Geotechnologies and the Environment 23, https://doi.org/10.1007/978-3-030-47998-5_14


1  Introduction

The advancement of observations, analysis, and prediction of our earth system, such as weather and climate, could inform crucial environmental decisions impacting current and future generations, and even significantly save lives (Shapiro et al. 2010). Computational modeling and numerical analysis are commonly used methods to understand and predict the behaviors, dynamics, and interactions of the earth system and its sub-systems, such as the atmosphere, ocean, and land. High performance computing (HPC) is essential to support these complex models and analyses, which often require the availability of a large number of computing resources (Huang et al. 2013b; Prims et al. 2018; Massonnet et al. 2018). HPC enables earth prediction models to incorporate more realistic physical processes and a high degree of earth system complexity (Shapiro et al. 2010). As such, the Japanese government initiated the Earth Simulator project in 1997 to promote research for global change predictions by using HPC simulation. A large-scale geophysical and tectonic modeling cluster was also built to perform a range of geosciences simulations at Munich University's Geosciences department (Oeser et al. 2006).

Further, earth system modeling and simulations are able to produce enormous amounts of spatiotemporal data at the peta- and exa-scale (Feng et al. 2018). The growth of data volumes gives rise to serious performance bottlenecks in analyzing and visualizing data. In particular, performing interactive visualization can impose computational intensity on computing devices and exceed the limits of most analysis tools and the capacities of these devices. To address the intensity, scientists have utilized high performance techniques to analyze, visualize, interpret, and understand data of large magnitude (Hoffman et al. 2011; Ayachit et al. 2015; Bauer et al. 2016; Childs 2012). Previously, most high performance visualization solutions required high-end computing devices. Most recently, visualization of massive data with remote servers was enabled by utilizing parallel computing capabilities in the cloud to perform visualization-intensive tasks (Zhang et al. 2016). With the remote high performance visualization paradigm, users only need to install a light-weight visualization client to visualize massive data (Al-Saidi et al. 2012; Huntington et al. 2017). Such a design removes the hardware constraints on end users and delivers scalable visualization capabilities.

To date, the choice of HPC technologies varies widely for different datasets, computational models, and user requirements (Huang et al. 2018). In fact, there is a wide range of computing paradigms (e.g., Cluster, Grid, GPU, Volunteer and Cloud Computing) and associated data parallel programming standards and libraries, such as the Message Passing Interface (MPI)/OpenMPI, the Compute Unified Device Architecture (CUDA), and MapReduce. While Cluster and Grid Computing used to dominate HPC solutions, Cloud Computing has recently emerged as a new computing paradigm with the goal of providing computing infrastructure that is economical and on-demand. As an accompaniment, the MapReduce parallel programming framework has become increasingly important to support distributed computing over large datasets on the cloud. More complicatedly, a single HPC infrastructure may


not meet the computing requirements of a scientific application, leading to the popularity of hybrid computing infrastructures, which leverage multi-sourced HPC resources from both local and cloud data centers (Huang et al. 2018), or GPU and CPU devices (Li et al. 2013). As such, it has been a challenge to select optimal HPC technologies for a specific application. The first step to address such a challenge is to unmask the different computing options for scientists. This chapter will first present a generalized HPC architecture for earth science applications based on a summary of existing HPC solutions from different aspects, including the data storage and file system, computing infrastructure, and programming model. Next, we demonstrate the instantiation of the proposed computing architecture to facilitate the modeling and visualization of earth system processes, using dust storms as an example.

2  Related Work

2.1  HPC for Environmental Modeling

Previously, large-scale earth science applications, such as ocean and atmosphere models, were developed under monolithic software-development practices, usually within a single institution (Hill et al. 2004). However, such a practice prevents the broader community from sharing and reusing earth system modeling algorithms and tools. To increase software reuse, component interoperability, performance portability, and ease of use in earth science applications, the Earth System Modeling Framework project was initiated to develop a common modeling infrastructure and standards-based open-source software for climate, weather, and data assimilation applications (Hill et al. 2004; Collins et al. 2005). The international Earth-system Prediction Initiative (EPI) was proposed by scientists worldwide to provide research and services that accelerate advances in weather, climate, and earth system prediction and the use of this information by global societies (Shapiro et al. 2010). In addition, these models were mostly processed sequentially on a single computer without taking advantage of parallel computing (Hill et al. 2004; Collins et al. 2005; Shapiro et al. 2010). In contrast to sequential processing, parallel processing distributes data and tasks to different computing units through multiple processes and/or threads to reduce the data and tasks to be handled on each unit, as well as the execution time of each computing unit.

The advent of large-scale HPC infrastructure has benefited earth science modeling tremendously. For example, Dennis et al. (2012) demonstrated that the resolution of climate models can be greatly increased on HPC architectures by enabling the use of parallel computing for models. Building upon Grid Computing and a variety of other technologies, the Earth System Grid provides a virtual collaborative environment that supports the management, discovery, access, and analysis of climate datasets in a distributed and heterogeneous computational environment


(Bernholdt et al. 2005; Williams et al. 2009). The Land Information System software was also developed to support high performance land surface modeling and data assimilation (Peters-Lidard et al. 2007).

While parallel computing has moved into the mainstream, computational needs have mostly been addressed by using HPC facilities such as clusters, supercomputers, and distributed grids (Huang and Yang 2011; Wang and Liu 2009). However, such facilities are difficult to configure, maintain, and operate (Vecchiola et al. 2009), and it is not economically feasible for many scientists and researchers to invest in dedicated HPC machines sufficient to handle large-scale computations (Oeser et al. 2006). Unlike most traditional HPC infrastructures (such as clusters), which lack the agility to keep up with the demand for more computing resources to address big data challenges, Cloud Computing provides a flexible stack of massive computing, storage, and software services in a scalable manner at low cost. As a result, more and more scientific applications traditionally handled using HPC Cluster or Grid facilities have been tested and deployed on the cloud, and various strategies and experiments have been devised to better leverage cloud capabilities (Huang et al. 2013a; Ramachandran et al. 2018). For example, a knowledge infrastructure, in which a cloud-based HPC platform is used to host the software environment, models, and personal user space, was developed to address common barriers to numerical modeling in the Earth sciences (Bandaragoda et al. 2019).

2.2  HPC for Massive Data Visualization

In the earth science domain, observational and modeled data created by the various disciplines vary widely in both temporal scales, ranging from seconds to millions of years, and spatial scales, from microns to thousands of kilometers (Hoffman et al. 2011). The ability to analyze and visualize these massive data is critical to support advanced analysis and decision making. Over the past decade, there has been growing research interest in the development of various visualization techniques and tools. Moreland et al. (2016) built a framework (VTK-m) to support feasible visualization of large-volume data. Scientists from Lawrence Livermore National Laboratory (LLNL) have successfully developed an open-source visualization tool (Childs 2012). The Unidata program center also released a few Java-based visual analytics tools for various climate and geoscience data, such as the Integrated Data Viewer (IDV) and the Advanced Weather Interactive Processing System (AWIPS) II (Unidata 2019). Williams et al. (2013) developed the Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT) to support the visualization of massive data. Li and Wang (2017) developed a virtual-globe-based high performance visualization system for visualizing multidimensional climate data. Although a number of visual analytics tools have been developed and put into practice, most of those solutions depend on specific HPC infrastructure to address computational intensity. Such dependence prohibits scientists who have limited access to high-end computing resources from exploring larger datasets.


Performing interactive visualization of massive spatiotemporal data poses a challenging task for resource-constrained clients. Recently, scientists have explored remote visualization methods implemented on Cloud Computing facilities (Zhang et al. 2016). A complete interactive visualization pipeline for massive data that can be adapted to feasible computing resources is desirable.

3  High Performance Computing Solution Architecture for Earth Science

To address the major computing challenges of earth system modeling and data visualization, this section proposes a generalized HPC architecture by summarizing common HPC tools, libraries, technologies, and solutions (Fig. 14.1). Such an architecture includes four layers: data storage and file system, computing infrastructure, programming model, and application. While there are different computing technologies and options available at each layer, combinations of these technologies can be used to solve most data and computing challenges raised by an earth science application. The application layer includes common libraries, tools, and models for a specific scientific problem. This layer varies with different earth science applications and problems, which in turn determine the other three layers that provide the data storage, computing, and programming support for the problem.

Fig. 14.1  A generalized HPC solution architecture for earth science applications


3.1  Data Storage and File System

The purpose of the first layer in the architecture is to organize and store datasets. Different storage models could have different impacts on the performance of I/O- and data-intensive applications (Huang et al. 2013b). Additionally, the performance of the parallel system would be compromised if the network connection and topology of the remote storage and computing nodes were not properly configured (Huang and Yang 2011). The selection of a file system depends highly on the selected computing infrastructure (Sect. 3.2). In the HPC cluster infrastructure, for example, each computing node is usually designed to access the same remote data storage to execute tasks in parallel. Therefore, a network file system (NFS) or other methods, such as Lustre (Schwan 2003), the Parallel Virtual File System (PVFS; Ross and Latham 2006), and the General Parallel File System (GPFS) developed by IBM, are used to share the storage and to ensure the synchronization of data access. In other words, these file systems are mostly used for HPC servers and supercomputers, and can provide high-speed access to large amounts of data.

The demand and challenge of big data storage and processing have catalyzed the development and adoption of distributed file systems (DFSs; Yang et al. 2017), which meet the challenge with four key techniques: storage of small files, load balancing, copy consistency, and de-duplication (Zhang and Xu 2013). DFSs allow access to files via a network from multiple computing nodes, which include cheap "commodity" computers. Each file is duplicated and stored on multiple nodes, enabling a computing node to access the local dataset to complete the computing tasks. With advances in distributed computing and Cloud Computing, many DFSs have been developed to support large-scale data applications and to fulfill other cloud service requirements. For example, the Google File System is a DFS designed to meet Google's core data storage and usage needs (e.g., Google Search Engines). The Hadoop distributed file system (HDFS), used along with the Hadoop Cloud Computing infrastructure and the MapReduce programming framework (Sect. 3.3), has been adopted by many IT companies, including Yahoo, Intel, and IBM, as their big data storage technology. In many cases, to enable the efficient processing of scientific data, a computing infrastructure has to leverage the computational power of both traditional HPC environments and big data cloud computing systems, which are built upon different storage infrastructures. For example, Scientific Data Processing (SciDP) was developed to integrate both PFS and HDFS for fast data transfer (Feng et al. 2018).

3.2  Computing Infrastructure

The computing infrastructure layer provides facilities to run the computing tasks, which could run on central processing units (CPUs), graphics processing units (GPUs), or hybrid units to best utilize CPUs and GPUs collaboratively (Li et al.


2013). The CPU or GPU devices can be deployed on a single supercomputer and/or multiple computing servers working in parallel. They can be embedded in local computing resources or cloud-based virtual machines (Tang and Feng 2017). A hybrid CPU-GPU computing infrastructure can leverage both CPUs and GPUs cooperatively to maximize their computing capabilities. Further, a hybrid local-cloud infrastructure can be implemented by running tasks on local HPC systems and bursting to clouds (Huang et al. 2018). Such a hybrid infrastructure often runs computing tasks on the local computing infrastructure by default, but automatically leverages the capabilities of cloud resources when needed.

Advances in computing hardware have greatly enhanced the computing capabilities for earth science data and models. Before GPU computing technology became popular, scientists designed computing strategies with high performance CPU clusters or multi-core CPUs, which addressed the computational needs of earth science for many years. As a specialized circuit, the GPU was initially designed to accelerate image processing. Advances in GPUs have yielded powerful computing capabilities for scientific data processing, rendering, and visualization, serving as ubiquitous and affordable computing devices for high performance applications (Li et al. 2013). Similar to the multithreading parallelization implemented with CPUs, GPU Computing enables programs to process data or tasks through multiple threads, with each executing a portion of the data or tasks. Currently, GPU Computing has been gradually integrated into cloud computing infrastructure, called GPU-based Cloud Computing, which benefits from the efficiency of the parallel computing power of GPUs while utilizing cloud capabilities simultaneously.

Typically, this layer includes a middleware deployed on each computing server to discover, organize, and communicate with all the computing devices. The middleware also implements the functions of scheduling jobs and collecting the results. Several open-source middleware solutions are widely used, including MPICH (Gropp 2002) and the Globus Grid Toolkit (Globus 2017). As a standard for Grid Computing infrastructure, the Globus Grid Toolkit is a low-level middleware and requires substantial technical expertise to set up and utilize (Hawick et al. 2003).
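As a small illustration of the hybrid CPU-GPU idea, the following sketch, an assumption for exposition rather than part of any middleware described here, dispatches a simple computation to the GPU when Numba detects one and falls back to the CPU otherwise.

```python
# Minimal hybrid CPU/GPU dispatch sketch using Numba (illustrative only).
import numpy as np
from numba import cuda, njit

@njit
def saxpy_cpu(a, x, y):
    return a * x + y            # CPU fallback (could also be parallelized with numba.prange)

@cuda.jit
def saxpy_gpu(a, x, y, out):
    i = cuda.grid(1)            # one GPU thread per array element
    if i < x.size:
        out[i] = a * x[i] + y[i]

def saxpy(a, x, y):
    if cuda.is_available():     # prefer the GPU if one is detected
        out = np.empty_like(x)
        threads = 256
        blocks = (x.size + threads - 1) // threads
        saxpy_gpu[blocks, threads](a, x, y, out)
        return out
    return saxpy_cpu(a, x, y)

print(saxpy(2.0, np.arange(5.0), np.ones(5)))
```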

3.3  Programming Model

The programming model layer determines the standards, libraries, and application programming interfaces (APIs) used to interact with the underlying HPC devices and to achieve parallel processing of datasets and computing tasks. Similar to the storage and file system layer, the selection of the programming model or framework is most likely determined by the computing infrastructure layer. For example, a GPU Computing infrastructure requires leveraging libraries such as CUDA, ATI Stream, and OpenACC to invoke GPU devices, and MPI defines communication specifications to implement parallel computing on HPC clusters (Prims et al. 2018).


As a distributed programming model, MapReduce includes associated implementation to support distributed computing, and is increasingly utilized for scalable processing and generation of large datasets due to its scalability and fault tolerance (Dean and Ghemawat 2008). In both industry and academia, it has played an essential role by simplifying the design of large-scale data-intensive applications. The high demand on MapReduce has stimulated the investigation of MapReduce implementations with different computing architectural models and paradigms, such as multi-core clusters, and Clouds (Jiang et al. 2015). In fact, MapReduce has become a primary choice for cloud providers to deliver data analytical services since this model is specially designed for data-intensive applications (Zhao et  al. 2014). Accordingly, many traditional algorithms and models developed in a single machine environment have been moved to the MapReduce platform (Kim 2014; Cosulschi et al. 2013).
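To make the MapReduce model concrete, below is a minimal sketch, an illustrative assumption with a made-up input format rather than a production job, that computes a mean dust load per grid cell with the mrjob library: the mapper emits (cell, load) pairs and the reducer aggregates them across the distributed input.

```python
# Minimal MapReduce sketch with mrjob (assumed "lat,lon,dust_load" input lines).
from mrjob.job import MRJob

class MeanDustLoad(MRJob):
    def mapper(self, _, line):
        lat, lon, load = line.split(",")
        cell = (round(float(lat), 1), round(float(lon), 1))   # 0.1-degree cell key
        yield str(cell), float(load)

    def reducer(self, cell, loads):
        values = list(loads)
        yield cell, sum(values) / len(values)                 # mean load per cell

if __name__ == "__main__":
    # e.g., python mean_dust_load.py -r hadoop hdfs:///dust/points.csv
    MeanDustLoad.run()
```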

4  Dust Storm Modeling and Visualization Using HPC

This section presents two case studies, dust storm modeling and model output visualization, to demonstrate how the generalized computing architecture in Sect. 3 can be instantiated to support real-world earth science applications.

4.1  Dust Storm Modeling

In this work, the non-hydrostatic mesoscale model (NMM)-dust (Huang et al. 2013b) is used as the dust storm simulation model for demonstration purposes. The model's parallel implementation is supported through the MPI programming model at the programming model layer and adopts an HPC cluster at the computing infrastructure layer (Fig. 14.1). Specifically, MPICH is used to manage the HPC cluster. In the HPC cluster, all computing nodes access the same remote data storage, and NFS is used in the data storage and file system layer to share the storage and to synchronize data access.

4.1.1  Dust Storm Models

Dust storm simulation models are developed by coupling dust process modules to atmospheric process models. The Dust Regional Atmospheric Model (DREAM) is one of the most widely used models for dust cycle modeling (Nickovic et al. 2001). DREAM can be easily incorporated into many atmosphere models, such as the Eta weather prediction model (Janjić 1994), including Eta-4bin, which simulates and divides dust into 4 classes of particle size, and Eta-8bin for 8 classes of particle size. While both Eta-4bin and Eta-8bin have been tested for various dust storm episodes


in various places and resolutions, Eta has a coarse spatial resolution of 1/3 of a degree that cannot be used for many potential applications (Xie et al. 2010). Additionally, such numerical weather prediction models run in sequence and reach their valid limits as resolution increases. Therefore, the Eta model was replaced by the Non-hydrostatic Mesoscale Model (NMM), which can produce high resolution forecasting up to 1 km and runs in parallel (Janjic 2003). As such, the coupling of DREAM and NMM (NMM-dust; Huang et al. 2013b) is adopted in this work for achieving parallel processing and high resolution forecasting of dust storms.

4.1.2  Parallel Implementation

The atmosphere is modeled by dividing the domain (i.e., the study area) into 3D grid cells and solving a system of coupled nonlinear partial differential equations on each cell (Huang et al. 2013b). The calculations of the equations on each cell are repeated at every time step to model the evolution of the phenomena. Correspondingly, the computational cost of an atmospheric model is a function of the number of cells in the domain and the number of time steps (Baillie et al. 1997). As dust storm models are developed by adding dust solvers into regional atmospheric models, dust storm model parallelization parallelizes the core atmospheric modules using a data decomposition approach. Specifically, the study domain, represented as 3D grid cells, is decomposed into multiple sub-domains (i.e., sub-regions), which are then distributed onto the HPC cluster nodes, with each sub-domain processed by one CPU of a node as a process (i.e., task). Figure 14.2 shows the parallelization of a 4.5° × 7.1° domain with a spatial resolution of 0.021° into 12 sub-domains for one vertical layer. This would result in 215 × 345 grid cells, with 71 × 86 cells for each sub-domain except for those on the border. The processes handling the sub-domains need to communicate with their neighbor processes for local computation and synchronization. The processes responsible for sub-domains within the interior of the domain, such as sub-domains 4 and 7, require communication with four neighbor processes, causing intensive

Fig. 14.2  An example of parallelizing a 4.56° × 7.12° domain into 12 sub-domains with a 0.021° spatial resolution forecasting requirement, producing 215 × 343 grid cells in total


communication overhead. During the computation, the state and intermediate data representing a sub-domain are produced in the local memory of a processor. Other processes need to access these data through file transfer across the computer network. The cost of data transfer due to the communication among neighbor sub-domains is a key efficiency issue because it adds significant overhead (Baillie et al. 1997; Huang et al. 2013b).

4.1.3  Performance Evaluation

The HPC cluster used to evaluate the performance and scalability of the proposed HPC solution for speeding up the dust storm model has a 10 Gbps network and 14 computing nodes. Each node has 96 GB of memory and dual 6-core processors (12 physical cores) with a clock frequency of 2.8 GHz. We parallelized the geographic scope of 4.5° × 7.1° along the longitude and latitude, respectively, into different numbers of sub-domains and utilized different numbers of computing nodes and processes (i.e., sub-domains) to test the performance (Fig. 14.3). The total execution time, including the computing time to run the model and the communication and synchronization time among processes, drops sharply as the number of processes (sub-domains) increases from 8 to 16, and then to 24. After that, the computing time is still reduced but not significantly, especially when two computing nodes are used. This is because the communication and synchronization time also gradually increases until it equals the computing time when 96 processes are used (Fig. 14.4). The experimental results also show that the execution times of the model with different domain sizes converge to roughly the same values when the number of CPUs increases. The cases where 7 and 14 computing nodes are used

Fig. 14.3  Scalability experiment with different computing nodes and different sub-domain numbers to run the NMM-dust model over a rectangular area of 4.5° × 7.1° in the southwest US for 3 km resolution, and 3-h simulations


Fig. 14.4  Comparison of the total computing time and communication and synchronization time with different numbers of sub-domains/processes involved using 7 computing nodes

yield similar performance. In particular, when more and more processes are utilized, seven computing nodes can perform slightly better than 14 computing nodes. In summary, this experiment demonstrates that HPC can significantly reduce the execution time of dust storm modeling. However, the communication overhead results in two scalability issues: (1) no matter how many computing nodes are involved, there is always a peak performance point, i.e., a highest number of processes that can be leveraged for a specific problem size (the peak point is 128 processes for 14 computing nodes, 80 processes for 7 computing nodes, and 32 for 2 computing nodes); and (2) a suitable number of computing nodes should be used to complete the model simulation.
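For illustration, the sketch below uses mpi4py to mimic, in one dimension, the neighbor (ghost-cell) exchange described in Sect. 4.1.2. It is a simplified assumption for exposition only, since NMM-dust itself is a compiled model parallelized with MPI, and the array sizes and update rule here are made up.

```python
# Minimal mpi4py sketch of a 1-D halo exchange between neighboring sub-domains.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nx, nz = 100, 25                      # local sub-domain columns and vertical layers (assumed)
field = np.random.rand(nx + 2, nz)    # +2 ghost columns for the left/right neighbors

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(10):                # simplified time loop
    # Refresh ghost columns from neighbors; this communication grows with the number
    # of sub-domains, which is the overhead seen in Fig. 14.4.
    comm.Sendrecv(field[1].copy(), dest=left, recvbuf=field[-1], source=right)
    comm.Sendrecv(field[-2].copy(), dest=right, recvbuf=field[0], source=left)
    # Local computation on interior cells (stand-in for the real dust/atmosphere solver)
    field[1:-1] = 0.5 * (field[:-2] + field[2:])
```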

4.2  Massive Model Output Visualization

The visualization of massive NMM-dust model data is implemented by developing a tightly-coupled, cloud-enabled remote visualization system for data analysis. Scientists can exploit our remote visualization system to interactively examine the spatiotemporal variations of dust load at different pressure levels. We adopt ParaView as the primary visualization engine. As an open-source, multi-platform data analysis and visualization application, ParaView supports distributed rendering with multiple computing nodes. We have extended ParaView by including interfaces to process model outputs and perform rendering based on view settings sent from the client. Several visualization methods are available for multidimensional data, such as volume rendering with ray casting, iso-volume representation, and field visualization. To support the parallel visualization, computing resources from Amazon Elastic Compute Cloud (EC2) are used to build the computing

infrastructure (Fig. 14.1). The communication between instances is based on MPICH as well. The datasets are hosted in the NFS and are sent to the computing nodes during the visualization process. We also built JavaScript-based tools in the application layer to support the interaction between users and the computing infrastructure (Fig. 14.1).

4.2.1  Test Datasets

The NMM-dust model generates one output in NetCDF format at a three-hour interval for the simulated spatiotemporal domain. Each output contains the information of dust dry deposition load (μg/m2), dust wet deposition (μg/m2), total dust load (μg/m2), and surface dust concentration of PM10 aerosols (μg/m3) over four dimensions: latitude, longitude, time, and pressure. Performance evaluation is conducted using two model datasets about 50 Megabytes (MB) and 1.8 Gigabytes (GB) in size (referred to as the small and large dataset hereafter, respectively; Table 14.1). Note that although the large dataset is relatively small compared to some massive climate simulation datasets in the literature, the dust load distribution was described at a spatial resolution of 0.021°, which is relatively high at the regional scale compared to other data from mesoscale climate simulation models. The datasets can be easily expanded by including larger temporal time frames. The production of dust storm data is computationally demanding (Xie et al. 2010) and presents a challenging problem for interactive remote data visualization.

Table 14.1  Test data description

Dataset  Dimension (lat × lon × time × level)  Resolution         Spatial extent (lat, lon)    Temporal extent         Size
Small    91 × 121 × 25 × 8                     0.3°, 3-h step     23 to 50, −128 to −92        7/1/2007 to 7/4/2007    50 MB
Large    480 × 240 × 25 × 24                   0.021°, 3-h step   26 to 36, −113 to −108       7/1/2007 to 7/4/2007    1.8 GB

Attributes (both datasets): dust dry deposition, dust wet deposition, surface dust concentration, and total dust load.

4.2.2  Parallel Implementation

We followed the parallel design of ParaView to build the visualization pipeline. In particular, we have extended ParaView to support time-series volume rendering for multidimensional datasets. The distributed execution strategy is based on data parallelism that splits one dataset into many subsets and distributes the subsets on multiple rendering nodes. Upon finishing rendering on the subsets, every node will


Fig. 14.5  The parallel implementation strategy of parallel rendering

upload the rendering results for combination. A sort-last composite rendering method is used to ensure the correct composition of the final rendering results. The visualization pipeline is identical on all nodes. Figure 14.5 shows the parallel rendering process on the model output. The output dataset is stored in the form of NetCDF, which represents an attribute of a time period as a multidimensional array. The parallel strategy splits the array into a series of 3D arrays. Every 3D array is further decomposed into small arrays as subsets. The decomposition is along all three spatial dimensions. Every subset is assigned to a processor to perform rendering. Upon the completion of rendering by the processors, the pipeline merges the results to form the final image. Note that only the 3D arrays from the same time stamp are merged, to ensure that all relevant subsets of data are combined.

4.2.3  Performance Evaluation

We examined the relative performance of the pipeline components. We used a range of m1.xlarge instances provided by the Amazon EC2 service and measured the rendering performance delivered by Amazon EC2 clusters of 1–8 instances with 4–32 cores. Every instance has one four-core CPU, 15 GB main memory, and more than 1 TB storage. Due to the popularity of volume rendering for multidimensional data, we use volume rendering as an example of the high performance visualization application (Fig. 14.6). The ray casting algorithm was adopted as the default volume rendering method for representing 3D spatiotemporal attributes, such as surface dust concentration at different pressure levels. For each time step, dust information on 3D voxels (longitude, latitude, and pressure level) can be rendered and displayed in high-quality images. Unless otherwise specified, all results reported below were averaged over five independent runs.

Figure 14.7 shows the detailed breakdowns of overall time costs given different network connections and CPU cores. The entire pipeline is divided into six components: sending user requests on visualization, connecting and disconnecting to the cloud, reading data and variable values, preparing visualization (including preparing duplicated data subsets for parallel rendering), rendering (creating images and streaming), and transferring visualization results. To identify the impact of

Fig. 14.6  Examples showing interactive plots of the dust storm data. Left figure: A bird's-eye view; Right figure: An orthographic view

networking, user requests are issued from three different locations: (1) a residential area that is connected to the network with low internet speed (less than 20 Mbps), (2) a campus area (organization) with a relatively stable and fast Internet-2 connection (200–300 Mbps), and (3) Amazon inter-region connections with an average network speed of more than 100 Mbps. These three types are typical connections that users may have. Figure 14.7 shows the visualization with the small dataset, whereas Fig. 14.8 shows the visualization with the large dataset in the cloud. In Fig. 14.7, when one core is used, the connection time is zero because no connection to multiple CPU cores is initiated. Sending requests, which takes an extremely small amount of time (~0.002 s), is not included in the figures.

The principal observation from Figs. 14.7 and 14.8 is that rendering time is the dominant cost (e.g., 61.45% on average for the larger dataset), followed by connection time if multiple CPU cores are used. The main performance bottleneck of our system is server-side rendering, as rendering of massive data degrades remote visualization performance. When increasing the number of nodes, the time spent on rendering decreases, though not proportionally. Therefore, in the case of multidimensional visualization, we suggest that configuring a powerful rendering cluster is important in delivering an efficient visualization pipeline. Connection is the second most time-consuming process in the visualization pipeline. It occurs when multiple cores are used. The connection time changes slightly with the number of cores but does not increase with the size of the dataset. The connection time is mainly the warm-up time for visualization nodes. Similarly, the preparation time increases with the number of nodes, as the datasets must be further divided to produce more subtasks for visualization.

The effects of network speeds between the user client and the server side on the overall runtime are very minor. Both sending requests and transferring rendered images have limited contributions to the overall runtime (e.g., 0.02% and 1.23% on average, respectively, for the larger dataset). The time to transfer the rendered image is basically constant for all the tests as the size of the rendered image does not vary


Fig. 14.7  Time cost allocation for remote visualization pipeline components with the small dataset. Disconnection time is not included due to its small time cost. We tested the visualization pipeline in three types of network environments, each with a one-core instance and a four-core instance. (a) Time costs of data reading (grey bar), preparation (yellow bar) and image transfer (green bar). (b) Time costs of connecting to the cloud (orange bar) and rendering images (blue bar)

Other resources such as data storage and network connection are less important in the pipeline because they take relatively small amounts of time. Another observation is that each dataset presents a unique time allocation pattern: system connection and image transfer time for the small dataset (Fig. 14.7) account for a much larger share than for the large dataset (Fig. 14.8) because the rendering takes less time, indicating that the applicability of remote visualization is limited when the data size is small. In particular, we do not see significant performance gains when increasing the computing cores from 1 to 4, due to high communication overheads in the cloud.
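The limited gain from one to four cores can be illustrated with a simple fixed-overhead model in which only the rendering stage scales with the number of cores. The timings below are arbitrary placeholders, not the values measured in Figs. 14.7 and 14.8; the sketch is meant only to show why end-to-end speedup flattens when connection and preparation costs do not shrink.

```python
# Fixed-overhead speedup estimate: only rendering parallelizes, while
# connection, reading, preparation, and transfer remain as overhead.
# All times (seconds) are illustrative placeholders, not measured values.
def pipeline_time(cores, render_serial=40.0, connect=6.0,
                  read=3.0, prepare_per_core=0.3, transfer=0.5):
    overhead = (connect if cores > 1 else 0.0) + read \
               + prepare_per_core * cores + transfer
    return overhead + render_serial / cores

base = pipeline_time(1)
for cores in (1, 2, 4, 8, 16, 32):
    t = pipeline_time(cores)
    print(f"{cores:2d} cores: {t:6.1f} s  speedup {base / t:4.2f}x")
```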


Fig. 14.8  Time cost allocation for remote visualization pipeline components with the large dataset. Disconnection time is not included due to its small time cost. We tested the visualization pipeline in three types of network environments, each with a one-core instance and a four-core instance. (a) Time costs of data reading (grey bar), preparation (yellow bar) and image transfer (green bar). (b) Time costs of connecting to the cloud (orange bar) and rendering images (blue bar)

In summary, based on the implementation and the tests, we make the following recommendations when planning a remote visualization cluster.

1. The benefits of a remote visualization framework are to provide powerful visualization capabilities through the use of multiple nodes and to reduce the hardware requirements on the clients. Building such a shared remote visualization framework will help organizations and communities reduce shared costs.
2. Generally speaking, adding more powerful computing nodes to a visualization cluster will increase rendering performance. However, additional overhead, such as data preparation and connection to multiple nodes, should be carefully considered to achieve better performance.
3. The visualization costs vary significantly with different visualization methods and data volumes. Therefore, users should evaluate the data sizes and perform small-scale visualization tests on the data before configuring remote visualization strategies; in particular, users should evaluate the performance gains when changing the number of cores from 1 to a larger number.


4. Given the relatively long warm-up time of computing nodes, system administrators should modify the visualization pipeline so that it requires only a single connection and supports multiple rounds of interactive operations.

5  Conclusion

This chapter reviews relevant work on using HPC to address the major challenges in earth science modeling and visualization, and presents a generalized HPC architecture for earth science. Next, we introduce two case studies, dust storm modeling and visualization, to demonstrate how such a generalized HPC architecture can be instantiated to support real-world applications. Specifically, dust storm modeling is supported through local Cluster Computing, and model output visualization is enabled through Cloud-based Cluster Computing, where all computing resources are provisioned on the Amazon EC2 platform. In both cases, all computing nodes communicate through MPI and datasets are managed by NFS. Of course, other HPC solutions, as summarized in Sect. 3, can be leveraged to support the two applications. For example, Huang et al. (2013b) used Cloud Computing to enable high resolution dust storm forecasting, and Li et al. (2013) leveraged hybrid GPU-CPU computing for model output visualization.

The work presented in this chapter explores the feasibility of building a generalized framework that allows scientists to choose computing options for specific domain problems. Results demonstrate that the applicability of different computing resources varies with the characteristics of the computing resources as well as the features of the specific domain problems. Selecting the optimal computing solution requires further investigation. While the proposed generalized HPC architecture is demonstrated to address the computational needs posed by dust storm modeling and visualization, the architecture is general and extensible to support applications that go beyond earth science.

References

Al-Saidi, A., Walker, D. W., & Rana, O. F. (2012). On-demand transmission model for remote visualization using image-based rendering. Concurrency and Computation: Practice and Experience, 24(18), 2328–2345. Ayachit, U., Bauer, A., Geveci, B., O'Leary, P., Moreland, K., Fabian, N., et al. (2015). ParaView catalyst: Enabling in situ data analysis and visualization. In Proceedings of the First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (pp. 25–29). ACM. Baillie, C., Michalakes, J., & Skålin, R. (1997). Regional weather modeling on parallel computers. New York: Elsevier.


Bandaragoda, C. J., Castronova, A., Istanbulluoglu, E., Strauch, R., Nudurupati, S. S., Phuong, J., et al. (2019). Enabling collaborative numerical Modeling in Earth sciences using Knowledge Infrastructure. Environmental Modelling & Software., 120, 104424. Bauer, A. C., Abbasi, H., Ahrens, J., Childs, H., Geveci, B., Klasky, S., et al. (2016). In situ methods, infrastructures, and applications on high performance computing platforms. Computer Graphics Forum, 35(3), 577–597. Bernholdt, D., Bharathi, S., Brown, D., Chanchio, K., Chen, M., Chervenak, A., et al. (2005). The earth system grid: Supporting the next generation of climate modeling research. Proceedings of the IEEE, 93(3), 485–495. Childs, H. (2012, October). VisIt: An end-user tool for visualizing and analyzing very large data. In High Performance Visualization-Enabling Extreme-Scale Scientific Insight (pp. 357–372). Collins, N., Theurich, G., Deluca, C., Suarez, M., Trayanov, A., Balaji, V., et al. (2005). Design and implementation of components in the Earth System Modeling Framework. International Journal of High Performance Computing Applications, 19(3), 341–350. Cosulschi, M., Cuzzocrea, A., & De Virgilio, R. (2013). Implementing BFS-based traversals of RDF graphs over MapReduce efficiently. In 2013 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) (pp. 569–574). IEEE. Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113. Dennis, J. M., Vertenstein, M., Worley, P. H., Mirin, A. A., Craig, A. P., Jacob, R., et al. (2012). Computational performance of ultra-high-resolution capability in the Community Earth System Model. The International Journal of High Performance Computing Applications, 26(1), 5–16. Feng, K., Sun, X. H., Yang, X., & Zhou, S. (2018, September). SciDP: Support HPC and big data applications via integrated scientific data processing. In 2018 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 114–123). IEEE. Gropp, W. (2002). MPICH2: A new start for MPI implementations. In European Parallel Virtual Machine/Message Passing Interface Users’ Group Meeting (p. 7). Springer. Hawick, K. A., Coddington, P. D., & James, H. A. (2003). Distributed frameworks and parallel algorithms for processing large-scale geographic data. Parallel Computing, 29(10), 1297–1333. Hill, C., DeLuca, C., Balaji, S. M., & Ad, S. (2004). The architecture of the earth system modeling framework. Computing in Science & Engineering, 6(1), 18–28. Hoffman, F. M., Larson, J. W., Mills, R. T., Brooks, B.-G. J., Ganguly, A. R., Hargrove, W. W., et  al. (2011). Data mining in Earth system science (DMESS 2011). Procedia Computer Science, 4, 1450–1455. Huang, Q., Li, J., & Li, Z. (2018). A geospatial hybrid cloud platform based on multi-sourced computing and model resources for geosciences. International Journal of Digital Earth, 11, 1184. Huang, Q., & Yang, C. (2011). Optimizing grid computing configuration and scheduling for geospatial analysis: An example with interpolating DEM. Computers & Geosciences, 37(2), 165–176. Huang, Q., Yang, C., Benedict, K., Chen, S., Rezgui, A., & Xie, J. (2013b). Utilize cloud computing to support dust storm forecasting. International Journal of Digital Earth, 6(4), 338–355. Huang, Q., Yang, C., Benedict, K., Rezgui, A., Xie, J., Xia, J., et al. (2013a). Using adaptively coupled models and high-performance computing for enabling the computability of dust storm forecasting. 
International Journal of Geographical Information Science, 27(4), 765–784. Huntington, J. L., Hegewisch, K. C., Daudert, B., Morton, C. G., Abatzoglou, J. T., McEvoy, D. J., et al. (2017). Climate Engine: Cloud computing and visualization of climate and remote sensing data for advanced natural resource monitoring and process understanding. Bulletin of the American Meteorological Society, 98(11), 2397–2410. Janjic, Z. (2003). A nonhydrostatic model based on a new approach. Meteorology and Atmospheric Physics, 82(1–4), 271–285. Janjić, Z. I. (1994). The step-mountain eta coordinate model: Further developments of the convection, viscous sublayer, and turbulence closure schemes. Monthly Weather Review, 122(5), 927–945.


Jiang, H., Chen, Y., Qiao, Z., Weng, T.-H., & Li, K.-C. (2015). Scaling up MapReduce-based big data processing on multi-GPU systems. Cluster Computing, 18(1), 369–383. Kim, C. (2014). Theoretical analysis of constructing wavelet synopsis on partitioned data sets. Multimedia Tools and Applications, 74(7), 2417–2432. Li, J., Jiang, Y., Yang, C., Huang, Q., & Rice, M. (2013). Visualizing 3D/4D environmental data using many-core graphics processing units (GPUs) and multi-core central processing units (CPUs). Computers & Geosciences, 59, 78–89. Li, W., & Wang, S. (2017). PolarGlobe: A web-wide virtual globe system for visualizing multidimensional, time-varying, big climate data. International Journal of Geographical Information Science, 31(8), 1562–1582. Massonnet, F., Ménégoz, M., Acosta, M. C., Yepes-Arbós, X., Exarchou, E., & Doblas-Reyes, F. J. (2018). Reproducibility of an Earth System Model under a change in computing environment (No. UCL-Université Catholique de Louvain). Technical Report. Barcelona Supercomputing Center. Moreland, K., Sewell, C., Usher, W., Lo, L.-t., Meredith, J., Pugmire, D., et al. (2016). VTK-m: Accelerating the visualization toolkit for massively threaded architectures. IEEE Computer Graphics and Applications, 36(3), 48–58. Nickovic, S., Kallos, G., Papadopoulos, A., & Kakaliagou, O. (2001). A model for prediction of desert dust cycle in the atmosphere. Journal of Geophysical Research: Atmospheres, 106(D16), 18113–18129. Oeser, J., Bunge, H.-P., & Mohr, M. (2006). Cluster design in the Earth Sciences tethys. In International Conference on High Performance Computing and Communications. (pp. 31–40). Springer. Peters-Lidard, C.  D., Houser, P.  R., Tian, Y., Kumar, S.  V., Geiger, J., Olden, S., et  al. (2007). High-performance Earth system modeling with NASA/GSFC’s Land Information System. Innovations in Systems and Software Engineering, 3(3), 157–165. Prims, O.  T., Castrillo, M., Acosta, M.  C., Mula-Valls, O., Lorente, A.  S., Serradell, K., et  al. (2018). Finding, analysing and solving MPI communication bottlenecks in Earth System models. Journal of Computational Science, 36, 100864. Project TG. (2017). The Globus Project. Retrieved from http://www.globus.org Ramachandran, R., Lynnes, C., Bingham, A. W., & Quam, B. M. (2018). Enabling analytics in the cloud for earth science data. Ross, R., & Latham, R. (2006). PVFS: A parallel file system. In Proceedings of the 2006 ACM/ IEEE Conference on Supercomputing (p. 34). ACM. Schwan, P. (2003). Lustre: Building a file system for 1000-node clusters. In Proceedings of the 2003 Linux Symposium, vol. 2003. Shapiro, M., Shukla, J., Brunet, G., Nobre, C., Béland, M., Dole, R., et al. (2010). An earth-­system prediction initiative for the twenty-first century. Bulletin of the American Meteorological Society, 91(10), 1377–1388. Tang, W., & Feng, W. (2017). Parallel map projection of vector-based big spatial data: Coupling cloud computing with graphics processing units. Computers, Environment and Urban Systems, 61, 187–197. Unidata. (2019). Unidata. Retrieved 14, August, 2019, from http://www.unidata.ucar.edu/software/ Vecchiola, C., Pandey, S., & Buyya, R. (2009, December). High-performance cloud computing: A view of scientific applications. In 2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks (pp. 4–16), Kaohsiung, Taiwan, December 14–16. IEEE. Wang, S., & Liu, Y. (2009). TeraGrid GIScience gateway: Bridging cyberinfrastructure and GIScience. 
International Journal of Geographical Information Science, 23(5), 631–656. Williams, D.  N., Ananthakrishnan, R., Bernholdt, D., Bharathi, S., Brown, D., Chen, M., et  al. (2009). The earth system grid: Enabling access to multimodel climate simulation data. Bulletin of the American Meteorological Society, 90(2), 195–206.


Williams, D. N., Bremer, T., Doutriaux, C., Patchett, J., Williams, S., Shipman, G., et al. (2013). Ultrascale visualization of climate data. Computer, 46(9), 68–76. Xie, J., Yang, C., Zhou, B., & Huang, Q. (2010). High-performance computing for the simulation of dust storms. Computers, Environment and Urban Systems, 34(4), 278–290. Yang, C., Huang, Q., Li, Z., Liu, K., & Hu, F. (2017). Big Data and cloud computing: Innovation opportunities and challenges. International Journal of Digital Earth, 10(1), 13–53. Zhang, T., Li, J., Liu, Q., & Huang, Q. (2016). A cloud-enabled remote visualization tool for time-­ varying climate data analytics. Environmental Modelling & Software, 75, 513–518. Zhang, X., & Xu, F. (2013). Survey of research on big data storage. In 2013 12th International Symposium on Distributed Computing and Applications to Business, Engineering & Science (DCABES) (pp. 76–80). IEEE. Zhao, J., Tao, J., & Streit, A. (2014). Enabling collaborative MapReduce on the Cloud with a single-sign-on mechanism. Computing, 98, 55–72.

Part IV

Future of High Performance Computing for Geospatial Applications

Chapter 15

High Performance Computing for Geospatial Applications: A Prospective View

Marc P. Armstrong

Abstract  The pace of improvement in the performance of conventional computer hardware has slowed significantly during the past decade, largely as a consequence of reaching the physical limits of manufacturing processes. To offset this slowdown, new approaches to HPC are now undergoing rapid development. This chapter describes current work on the development of cutting-edge exascale computing systems that are intended to be in place in 2021 and then turns to address several other important developments in HPC, some of which are only in the early stage of development. Domain-specific heterogeneous processing approaches use hardware that is tailored to specific problem types. Neuromorphic systems are designed to mimic brain function and are well suited to machine learning. And then there is quantum computing, which is the subject of some controversy despite the enormous funding initiatives that are in place to ensure that systems continue to scale-up from current small demonstration systems.

Keywords  Heterogeneous processing · Neuromorphic computing · Quantum computing

1  Introduction

Rapid gains in computing performance were sustained for several decades by a set of interacting forces that, taken together, became widely known as Moore's Law (Moore 1965). The basic idea is simple: transistor density would double approximately every 2 years. This increase in density yielded continual improvements in



processor performance even though it was long known that the model could not be sustained indefinitely (Stone and Cocke 1991). And this has come to pass. Part of the reason for this performance fall-off is related to Dennard Scaling, which correlates transistor density with power consumption per transistor: for many years, as density increased, power consumption per unit area remained near-constant. As described by Hennessy and Patterson (2019, p. 53), this relationship held up well until around 2007, when it began to fail, and fail big-time. This failure is a reason why clock frequencies have not continued to increase rapidly during the last decade.

Instead, computer architects turned to the use of multiple cores to increase performance. This multicore path, however, raised other issues, such as inefficiencies that arise as a consequence of ineffective thread exploitation, as well as failures in speculative branch prediction, which, in turn, cause wasted processing effort and excessive power consumption. Indeed, the use of increasing numbers of cores pushes chips past their thermal design limits and requires selective power-gating approaches to prevent thermal failures (Perricone et al. 2018, p. 60). This has led to the problem of "dark silicon" in which cores only operate part-time.

These problems paint a dire performance picture, but Hennessy and Patterson show a path forward, one that will require some re-thinking about programming and the use of novel architectural arrangements. This chapter describes the near-future state of exascale computing and then turns to a discussion of other ways to wrest improved performance out of both existing and experimental technologies, including heterogeneous processing as well as neuromorphic and quantum computing.

2  The Pursuit of Exascale Systems

There is now a race to construct an exascale system that will be able to provide a quintillion floating point calculations per second (Normile 2018). What is that? 1,000,000,000,000,000,000 calculations per second, or a thousand petaflops. Exascale systems are viewed as strategic initiatives by several national governments that have made significant investments to support their development. While these investments can be considered part of ongoing support for scientific research, other rationales include national prestige and military defense (e.g., simulating nuclear explosions). This is an area of very rapid change and international intrigue, with major initiatives being pursued by the US, China and Japan.

In the US, the Aurora Project is being funded by the Department of Energy to the tune of $500 million. Located at Argonne National Laboratory, Aurora will use a Cray Shasta architecture with Intel nodes built using a new generation of Xeon Scalable processors and a new Xe architecture for GPUs. Cray Slingshot communications technology provides Aurora with very high bandwidth communication: 25.6 Tb/s per switch, from 64 × 200 Gb/s ports.1

1. www.cray.com/sites/default/files/Slingshot-The-Interconnect-for-the-Exascale-Era.pdf


China had topped the "Top500" list of supercomputers for several years prior to being displaced by the US in 2018. The Chinese government is now funding three projects, at levels generally reported to be in the "several billion dollar" range, to help reach the goal of exascale performance levels.2 Each project has constructed prototype systems as a proof-of-concept. It is notable that all Chinese high performance implementations since 2015 use domestically-sourced processors because of technology trade barriers put in place by the US government. This has given a substantial boost to the Chinese chip manufacturing industry.

1. The National Research Center of Parallel Computer (NRCPC) prototype uses Sunway SW26010 processors that are configured into 512 two-processor nodes connected by a locally-developed network. Though each processor has four quadrants, each with its own management core and 64 compute cores, to reach exascale performance the system will need improved processors, and a lot of them.
2. The Sugon system is heterogeneous and uses a Hygon clone of an AMD x86 processor. The system is constructed with nodes consisting of two Hygon processors and two data cache unit (DCU) prefetch accelerators that are connected by a 6D torus network. It was recently reported that Sugon was going to demonstrate a world-class machine (using an x86 processor and 4 AMD GPUs per blade) at a recent supercomputing conference, but it was held back in order not to shed light on its prowess. That tactic did not work, and Sugon has recently been denied access to US technologies by the US Department of Commerce3 after being placed on the so-called "entity list".
3. The National University of Defense Technology (NUDT) is also building a heterogeneous system comprised of CPUs and digital signal processor (DSP) chips that each have 218 cores. The system is configured using a series of "blades" containing eight CPUs and eight DSPs. Each blade will provide 96 teraflops and will be housed in a cabinet holding 218 blades; with 100 of these cabinets installed, a total of 1.29 peak exaflops will be possible.

Not to be outdone, Japan is in the process of designing and implementing a new system named Fugaku (a nickname for Mt. Fuji) that will be completed in 2021. Funded by the Japanese government at a level of approximately $1 billion through its RIKEN institute, Fugaku will be constructed using Fujitsu A64FX CPUs. Each chip has a total of 48 compute cores: four core memory group modules, each with 12 cores and a controller core, L2 cache, a memory controller, and other architectural enhancements.4 These chips will be connected using Tofu (torus fusion) interconnects with six axes: X, Y, and Z vary globally according to the size of the system configuration, while A, B, and C are local (2 × 3 × 2) toroidal connections.

2. https://www.nextplatform.com/2019/05/02/china-fleshes-out-exascale-design-for-tianhe-3/
3. https://www.hpcwire.com/2019/06/26/sugon-placed-on-us-entity-list-after-strongshowing-at-isc/
4. https://www.fujitsu.com/global/Images/post-k-supercomputer-development.pdf


The search for processing supremacy will undoubtedly continue beyond the exascale milestone. It should also be noted that an exaflop level of performance is not meaningful without a software environment that enables programmers to harness the full potential of the hardware.

3  Coding Practices

One well-known way to improve performance is to re-write software in more efficient code. Hennessy and Patterson (2019, p. 56) cite the example of a particular matrix multiplication code first written in Python, a scripting language that uses inefficient dynamic typing and storage management. The code was then re-written in C to improve cache hits and to exploit the use of multiple cores and other hardware extensions. This re-casting led to substantially increased performance. It should be noted, however, that improvements of this type are well-known. For example, early geospatial software was written partly in assembler to boost the performance of core computational tasks, and MacDougall (1984, p. 137) also reports results for interpolation software that was written first in Basic and then in C. For the largest problem size reported, this language switch (from interpreted to compiled) reduced execution time from 747.4 to 5.9 min. Though such changes require significant programming intervention, it is clear that substantial performance improvements can be realized.
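The kind of rewrite described above can be illustrated in miniature by timing the same matrix product coded as an interpreted triple loop and as a call into an optimized, compiled kernel (NumPy's BLAS-backed routine here stands in for the hand-written C discussed by Hennessy and Patterson). The example is illustrative only; absolute timings will vary by machine.

```python
# Naive interpreted matrix multiply versus a call into compiled,
# cache- and SIMD-aware code (NumPy); illustrative comparison only.
import time
import numpy as np

n = 200
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def matmul_python(a, b):
    rows, inner, cols = a.shape[0], a.shape[1], b.shape[1]
    c = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):        # loop order chosen to improve locality
            aik = a[i, k]
            for j in range(cols):
                c[i][j] += aik * b[k, j]
    return c

t0 = time.perf_counter(); matmul_python(a, b); t1 = time.perf_counter()
t2 = time.perf_counter(); a @ b;               t3 = time.perf_counter()
print(f"interpreted loops: {t1 - t0:.2f} s, NumPy/BLAS: {t3 - t2:.4f} s")
```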

4  Domain-Specific Computing and Heterogeneous Processing

Hennessy and Patterson (2019) also point to a more architecturally-centric approach to performance gains: domain-specific architectures. Their position represents a significant expansion of concepts that have a long history. Examples include math co-processors for early x86 chips, and graphics processing units (GPUs) that have their roots in the 1980s. Though floating-point units and graphics processing have now been incorporated into the functionality of commodity CPUs, a key aspect of domain-specific computing is the identification of a particular type of architecture that is most germane to a specific application or problem class. Research is now being conducted to introduce new types of specialized hardware that can be applied to specific types of problems, and work is proceeding apace on the development of software tools that can be used to harness disparate architectures. Zahran (2019) provides a comprehensive look at the current state of the art in both the hardware and software used to implement these heterogeneous systems.

Arguments were advanced decades ago in support of the use of domain-specific architectures and heterogeneous processing (Siegel et al. 1992; Freund and Siegel


1993). Siegel and associates describe a vision in which different code components are matched to architectures that are best suited to their execution. As Moore's Law has faltered, interest has been renewed in a kind of neo-heterogeneity that has been further expanded to consider energy efficiency. And the view of heterogeneity now includes a number of new types of processing devices.

• Field-programmable gate arrays (FPGAs) are a type of integrated circuit that can be re-configured after it is manufactured (soft hardware). Intel, for example, has developed a commercial FPGA product called Stratix 10. It is also important to note that FPGAs normally consume far less power than CPUs or GPUs (Alonso 2018, p. 2).
• Jouppi et al. (2018) describe a unique architecture that has been developed to support the execution of deep neural networks (DNNs). A tensor processing unit (TPU) is optimized to perform matrix multiplications and outperforms CPUs and GPUs in both number of operations and energy efficiency. Part of this efficiency is achieved through the use of a single, large two-dimensional multiply unit, the removal of features normally required by CPUs, and the use of eight-bit integers, since DNN applications do not require high precision computation but do require very large numbers of training cycles. In effect, a TPU is optimized to perform lots of inexpensive, low-precision matrix operations.
• Intel is also developing an approach to supporting deep learning with an application specific integrated circuit (ASIC) product known as the Nervana Engine.5 Having also rejected GPU approaches, Nervana has settled on 16-bit precision, and each ASIC contains six links that enable it to be connected in a torus configuration to improve communication. A key aspect of training deep neural networks is data movement. Given the need for large numbers of training examples, to avoid data starvation the Nervana Engine provides 32 GB of storage with 8 terabits/second of bandwidth.
• Corden (2019) describes the development of vectorization instructions in Intel processors. In the example provided (a double precision floating-point loop), a comparison is made among SSE (Streaming SIMD Extensions), AVX (Advanced Vector Extensions) and the newest AVX-512, which increases the vector width to 512 bits. In processing the same loop, scalar mode produces one result, SSE produces two results, AVX produces four results, and AVX-512 produces eight results. This represents a substantial advance in throughput for those applications that can exploit this architecture.

Figure 15.1 provides a simplified view of a collection of architectures that could comprise a heterogeneous system. Li et al. (2016) describe a much more tightly integrated approach to linking CPU and FPGA modes. Their CPU+[GPU, DSP, ASIC, FPGA] approach embeds heterogeneity either on chip or coupled with a high speed bus (Fig. 15.2).

5. https://www.intel.ai/nervana-engine-delivers-deep-learning-at-ludicrous-speed/#gs.k0p7wf


Fig. 15.1  A stylized view of a collection of heterogeneous processors accessing shared memory

Fig. 15.2  Schematic architecture of an integrated heterogeneous processor incorporating a CPU and FPGA. (Figure based on Li et al. 2016)

While programming is hard, parallel programming is harder, and writing efficient programs for heterogeneous environments is exceedingly difficult (Wang et al. 2018). Zahran (2016, p. 9) points to several measures of success against which heterogeneous software can be assessed:

• Performance (as measured by speedup).
• Scalability (number of homogeneous cores and across different architectures).
• Reliability (as determined by graceful degradation with core reductions and transient faults).
• Portability (across different architectures).


Siegel, Dietz and Antonio (Siegel et al. 1996, p. 237) describe several factors that must be considered when writing heterogeneous programs, including matching a subtask to a specific machine architecture, the time to move data that is analyzed on different machines, and the overhead incurred in stopping a task and restarting it on a different machine. This entails problem decomposition into subtasks, assigning subtasks to machine types, coding subtasks for specific target machines, and scheduling the execution of subtasks. Because of this complexity, researchers have searched for ways to hide it by using abstraction. One view, advanced by Singh (2011), is to develop language-agnostic multi-target programming environments that are distantly related to the idea of device independence in computer graphics. In the abstract example presented, a data parallel program would be compiled and executed using different target architectures that would vary according to runtime performance and energy reduction requirements (Fig. 15.3).

Reichenbach et al. (2018) describe an application that uses a library (LibHSA) that enables programmers to insert custom accelerators (e.g., to perform image processing with Sobel and Laplacian edge detectors) to accomplish tasks in heterogeneous environments. The library is compliant with an emerging standard model for data parallel processors and accelerators supported by the non-profit HSA Foundation, which is working toward the goal of making it easier to program heterogeneous computing devices. The approach aims at reducing complexity and works with standard programming languages (e.g., Python) while abstracting low-level details to simplify the user view. It is worth noting that the HSA Foundation was founded by key hardware players such as AMD, ARM, Samsung, Texas Instruments and others (see: hsafoundation.com).
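One way to read the device-independence idea is as a thin dispatch layer that sends each subtask to whichever backend is available at run time. The sketch below is purely illustrative: the choice of CuPy as a GPU backend, the fallback logic, and the toy kernel are assumptions, not part of the frameworks cited above.

```python
# Minimal "multi-target" dispatch: run an array operation on a GPU
# backend (CuPy) when present, otherwise fall back to NumPy on the CPU.
# Backend choice and the example kernel are illustrative assumptions.
import numpy as np

try:
    import cupy as cp          # GPU backend, if installed
    _gpu_available = True
except ImportError:
    cp = None
    _gpu_available = False

def run_kernel(data, prefer="gpu"):
    """Execute a simple data-parallel kernel on the preferred device."""
    if prefer == "gpu" and _gpu_available:
        arr = cp.asarray(data)
        return float(cp.sqrt(arr * arr + 1.0).sum()), "gpu"
    arr = np.asarray(data)
    return float(np.sqrt(arr * arr + 1.0).sum()), "cpu"

value, device = run_kernel(np.random.rand(1_000_000))
print(f"result {value:.2f} computed on {device}")
```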

Fig. 15.3  Compilation to multiple target architectures. CUDA is the Compute Unified Device Architecture developed by Nvidia for GPU devices; VHDL is Very High Speed Integrated Circuit Hardware Description Language which enables flexible configurations of FPGAs and other devices; SSE3 is Streaming SIMD Extensions, an evolving collection of low-level instructions that was developed by Intel. (Source: Based on Singh 2011, p. 3)


Table 15.1  Sequence of steps required to construct a location-allocation model for decision support in a heterogeneous environment

Sequence | Operation type       | Time complexity             | Architecture
1        | Shortest path        | O(n^4) or less              | GPU
2        | Create strings       | Sort: O(n^2) or O(n log n)  | MIMD column sort
3        | Hillsman edit        | Modify: O(n)                | MIMD edit each element
4        | Vertex substitution  | Substitute: O(p*(n-p))      | GPU
5        | Visualization        | Variable                    | GPU

Note: Step 1 is due to Arlinghaus et al. (1990); step 4 is due to Lim and Ma (2013). Since vertex substitution is a heuristic method, it is often run numerous times to try to escape local optima

The argument for domain-specific computing advanced by Hennessy and Patterson is worth considering in a geospatial context. While it is unlikely that specific electronic devices will be developed specifically to support spatial data analysis, a la the TPU, there are good reasons for spatial middleware to closely align the specific characteristics of geospatial algorithms to particular types of hardware environments. For example, some geospatial problems lend themselves to SIMD architectures, while others are more naturally suited to MIMD architectures; and then, of course, there are data distribution issues that affect efficiency (e.g., Armstrong and Densham 1992; Marciano and Armstrong 1997; Cramer and Armstrong 1999). Densham and Armstrong (1994) sketched out a sequence of steps that would be required to run a vertex-substitution location-allocation model in a heterogeneous environment, though they did not advance it into implementation. Table 15.1 is an updated version of their sequence that includes new architectures and algorithms.
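As a concrete illustration of step 4 in Table 15.1, the fragment below performs one improvement pass of a vertex-substitution (Teitz and Bart style) heuristic for a small p-median problem. It is a CPU-only simplification offered for illustration, not the GPU implementation of Lim and Ma (2013); the random coordinates and p = 3 are arbitrary.

```python
# One improvement pass of a vertex-substitution heuristic for the
# p-median problem: try swapping each chosen facility with each
# unchosen candidate and keep the best improving swap.
import numpy as np

def objective(dist, facilities):
    # Total distance when each demand point is served by its nearest
    # open facility (unit demand weights).
    return dist[:, facilities].min(axis=1).sum()

def vertex_substitution_pass(dist, facilities):
    best = objective(dist, facilities)
    best_sol = list(facilities)
    n = dist.shape[1]
    for out in facilities:                       # facility to remove
        for cand in range(n):                    # candidate to insert
            if cand in facilities:
                continue
            trial = [cand if f == out else f for f in facilities]
            val = objective(dist, trial)
            if val < best:
                best, best_sol = val, trial      # O(p*(n-p)) swaps per pass
    return best_sol, best

rng = np.random.default_rng(0)
pts = rng.random((60, 2))                        # demand points = candidates
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
sol, val = vertex_substitution_pass(dist, [0, 1, 2])   # p = 3
print(sol, round(val, 3))
```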

5  A Few Words on the Changing Nature of Storage

At the present time, storage technology is undergoing a phase-shift from spinning disks to solid state drives (SSDs), and this shift has become quite pronounced at the high end, where custom-made SSDs are now widely used in data centers. This technology is under considerable pressure given the pressing requirements caused by the multiple dimensions of big data (e.g., volume, velocity, and variety). It turns out, however, that for backup storage, tape continues to be an excellent choice in terms of density, capacity, durability and energy efficiency, and there are also fruitful avenues that can be exploited to improve the performance of this medium (Greengard 2019).

SSDs are designed with flash storage and a firmware controller that is responsible for managing the memory resource by doing low-level tasks such as garbage collection and maintaining rough equality of writing to all storage areas to prolong the life of the device. Do et al. (2019) describe significant disruptive trends in SSD design. There is now a move away from the current bare-bones approach toward designs that incorporate compute and software flexibility in the SSD: a model in which each storage device has a greater number of cores and increased clock speeds, as well as increased flexibility provided by a more general-purpose embedded operating


system that will support upgrades. Moving computation closer to the data, akin to edge computing, brings corresponding benefits in many applications, including improved bandwidth and decreased latency, both of which are extremely important to fast data applications in the IoT.

6  Neuromorphic Computing

Neuromorphic computer systems are designed to emulate brain functions using spiking neural networks and are said to be superior to traditional DNNs because they are somewhat immune to a problem called catastrophic interference; when novel inputs are presented to a neuromorphic system, it can flexibly adapt to them and does not require complete re-training, as DNNs do. While this approach is in the early stages of development and has been downplayed by some AI researchers, it is also clear that very large investments are being made in the technology. For example, Intel's product is called Loihi6 and it contains 128 neuromorphic cores. Intel has announced plans to construct a scalable system with 100 million neurons by 2020 (Moore 2019). It is also notable that the Loihi chip set consumes far less power than a GPU or CPU.

IBM's TrueNorth program is a strong competitor. Developed initially by simulating neuromorphic systems on conventional supercomputers such as BlueGene/Q, the simulation project eventually was instantiated in hardware. The current generation is configured in a hierarchical arrangement with 16 TrueNorth chips (one million neurons each) placed in a 4 × 4 grid connected by a custom I/O interface; four of these systems are used to create the NS16e-4 system with 64 million neurons and 16 billion synapses (DeBole et al. 2019). What is truly impressive is that this assemblage is extremely energy efficient, operating at approximately the same level as an incandescent light bulb (70 W). Even though a neuromorphic supercomputer and a data center are very different, there is a sharp contrast between 70 W and the megawatt power consumption of current data centers. Facebook, for example, has invested in a 138 megawatt wind farm to offset electrical grid power consumption at its Altoona, IA data center.7

Application domains for neuromorphic computing include image recognition and feature extraction, as well as robotic and vehicle path planning. The TrueNorth NS16e-4 system, for example, is being used to detect and classify objects in high-definition aerial video at greater than 100 frames per second (DeBole et al. 2019, p. 25).

6. https://en.wikichip.org/wiki/intel/loihi
7. https://www.datacenterknowledge.com/archives/2014/11/14/facebook-launches-iowa-data-center-with-entirely-new-network-architecture
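The spiking behavior that such hardware emulates can be illustrated with a minimal leaky integrate-and-fire neuron, far simpler than the neuron circuits implemented on Loihi or TrueNorth; all parameter values below are arbitrary and purely illustrative.

```python
# Minimal leaky integrate-and-fire neuron: the membrane potential decays
# toward rest, integrates input current, and emits a spike (then resets)
# when a threshold is crossed. Parameter values are illustrative only.
import numpy as np

def lif(current, dt=1.0, tau=20.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    v, spikes = v_rest, []
    for t, i_in in enumerate(current):
        v += dt / tau * (v_rest - v) + dt * i_in      # leak + integrate
        if v >= v_thresh:                             # fire and reset
            spikes.append(t)
            v = v_reset
    return spikes

current = np.where(np.arange(200) % 50 < 25, 0.08, 0.0)  # pulsed input
print("spike times:", lif(current))
```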


7  Technology Jumps and the Quantum Quandary

Though Hennessy and Patterson have painted a relatively grim picture about the future of high performance computing, other researchers have used a somewhat sunnier palette. Denning and Lewis (2017), for example, conclude that a Moore's Law level of performance can be sustained as a consequence of assessing performance not at the chip level, but by expanding the assessment to include entire systems and communities. And perhaps more importantly, they suggest several ways in which jumps can occur to escape the inherent limiting embrace of Moore's CMOS-based law and include alternative technologies with far greater promise of high performance. They are not alone in holding such views. The IEEE, for example, has a well-maintained web site8 concerned with the design and application of experimental approaches to computing. Among the most widely discussed technologies are:

• Three-dimensional fabrication and packaging to escape planar form factors.
• Using spintronics for data representation and switch construction.
• DNA and biological computing.
• Quantum computing.

The last element in this list, quantum computing, has the potential to be a "game changer" by exponentially improving computational performance. Such performance increases have several important applications in the geospatial domain, chiefly in the areas of combinatorial spatial optimization and machine learning. How might this work? Traditional computer systems, at their root, use bits (0 or 1) and collections of bits (e.g., bytes) to represent values. A quantum computer uses a radically different approach, using quantum bits, or qubits, that can assume values of 0 or 1, as well as both at the same time using the quantum feature of superposition. This means that qubits can inhabit all possible states that can be assumed by a traditional computer, thus enabling work in an exponentially larger problem space with vastly improved performance (NAS 2019, p. 2).

As a consequence, there is considerable interest in advancing quantum computing capabilities, and a kind of arms race has ensued, in part because a fully functioning quantum computer could rapidly (in polynomial time) factor integers and therefore decrypt public key encoded messages (e.g., RSA; Rivest et al. 1978). In support of these activities, the National Quantum Initiative Act9 was signed into law in 2018. The Act enables the Department of Energy to provide $1.25 billion to build interest in the business community, as well as to support the establishment of quantum computing national research centers. Similar funding initiatives are taking place in China, Japan and Europe.

It is important to recognize, however, that there are many large barriers to overcome before a fully functional, large quantum computer is implemented.

8. https://rebootingcomputing.ieee.org/
9. https://www.congress.gov/115/bills/hr6227/BILLS-115hr6227enr.xml
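The exponential growth of the quantum state space can be made concrete with a small classical simulation: the joint state of n qubits is a vector of 2^n complex amplitudes, and applying a Hadamard gate to every qubit of |00...0> produces an equal superposition over all 2^n basis states. The sketch below uses plain NumPy rather than a quantum SDK and is intended only to illustrate the bookkeeping cost.

```python
# Classical bookkeeping for an n-qubit register: 2**n complex amplitudes.
# Applying a Hadamard to every qubit of |00...0> yields a uniform
# superposition, illustrating why simulation cost grows exponentially.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)    # single-qubit Hadamard gate

def uniform_superposition(n):
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0                               # |00...0>
    op = np.array([[1.0]])
    for _ in range(n):                           # build H (x) H (x) ... (x) H
        op = np.kron(op, H)
    return op @ state

for n in (2, 10, 20):
    print(n, "qubits ->", 2 ** n, "amplitudes to track")
amps = uniform_superposition(3)
print(np.allclose(amps, 1 / np.sqrt(8)))         # all amplitudes equal
```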


Some of these may take a decade or more to address, as described in a NAS Consensus Study Report (2019, pp. 2–11).

• Unlike traditional binary computers, which have large capacities for noise rejection, qubits can be any combination of one and zero, and as a consequence quantum gates cannot reject even small amounts of noise, since it can be confused with an actual state.
• Though quantum error correction can be used to reduce the effects of noise and yield error-corrected results, the amount of processing overhead required to achieve this goal is prohibitive.
• The conversion of "normal" digital data into a quantum state is time consuming and would dominate processing time requirements in an Amdahl's Law-like fashion; this is particularly problematic for geospatial big data applications.
• New algorithms and a new software stack will be needed.
• Debugging is exceedingly difficult. Intermediate states are not determinable, since any measurement of a quantum state halts execution; when a register is read, the superposition collapses.

Because of these problems, a number of researchers are beginning to question the viability of the entire quantum enterprise (see, for example, Edwards 2019 and Vardi 2019). This skepticism is reinforced by the absence of a so-called virtuous cycle that has characterized developments in traditional integrated circuits during the past several decades: a voracious market generated large profits that were reinvested in research and development, which then led to new products that created new revenue streams. In the absence of such a cycle, it could be many years before a viable quantum product is created. It should be noted, however, that the Quantum Initiative Act is intended to jump-start such a cycle.

Despite these concerns, researchers are moving forward with work using the relatively small quantum devices that are now available. Though these devices are not able to handle practical applications, quantum computers will need to scale in order to be successful. One approach decomposes problems into smaller parts using classical computing systems, and the sub-components are allocated to quantum processing. This approach incurs an overhead processing penalty, and as described by Shaydulin et al. (2019, p. 20), two n-qubit computers would be less powerful and efficient than one 2n-qubit processor. However, if the quantum part of a complex problem can be effectively exploited, the penalty is minor, a kind of inverse Amdahl's Law, as it were.

DeBenedictis et al. (2018) outline a further set of requirements that need to be satisfied to reach the objective of quantum scalability. They begin by describing the slow original progress made by metal-oxide semiconductors and suggest that a similar path be developed for quantum systems, citing the work of DiVincenzo (2000), who lists several criteria, including:

• Physical scalability with well-defined qubits.
• A generalized quantum gate structure.
• Decoherence times that are substantially greater than gate operation times.


The first criterion is an echo of a Moore's Law-like concept. The second is a paean to the march of technology, while the third may prove to be an intractable difficulty, though DeBenedictis et al. (2018) liken the problem to the quest to reduce defects in integrated circuits. Nevertheless, it is also important to recall that currently unforeseen innovations can fundamentally alter the trajectory of technological developments.

8  Summary and Conclusion

This chapter has provided an overview of several responses to the end of Dennard scaling as applied to the manufacture of conventional CMOS components. These responses become important when considering the substantial computational complexity of many geospatial methods, particularly when they are applied to increasingly large databases. For example, increases in the volume and velocity of data streamed from internet-connected devices will accelerate as the Internet of Things continues to infiltrate routine daily activities (Armstrong et al. 2019; Jarr 2015). Though these new approaches to computing hold promise, they also present interesting challenges that will need to be addressed, in many cases, by inventing new algorithms and revising software stacks.

The next generation of exascale systems will exploit conventional advances, but such systems are difficult to sustain and require enormous quantities of power to operate. And software that can efficiently exploit thousands of cores in a mixed CPU/GPU architecture is difficult to create. Instead of continuing along the same path, new approaches are being developed to match problems to specialized processing elements and to develop new classes of hardware that do not follow traditional architectural models. Advances are being made in the development of new heterogeneous systems that use conventional CPUs but also deploy other architectures, such as GPUs, TPUs and FPGAs. Related progress is also being made on the development of programming models that are designed to enable better matches between architectures and model components, though at the present time programmer intervention is required.

Neuromorphic computing, which attempts to emulate cognitive functioning, is viewed by some with skepticism, though it is also clear that enormous investments in the development of hardware are being made by major component manufacturers. A key aspect of the approach is that it is designed to escape the brittleness of current deep neural networks, which work well for a limited range of problems but fail when they are presented with novel inputs. This fragility problem is described by Lewis and Denning (2018) and amplified by a collection of letters in response (see Communications of the Association for Computing Machinery, 08/2019, p. 9).

Quantum computing remains a "known unknown" at present. There is much promise, as well as tremendous hype, and quantum scaling seems to be accepted as inevitable by some. One article in the non-technical press sums it up: "Google believes it will reach 'quantum supremacy' – a stunt-like demonstration of a


machine’s superiority over a traditional computer – in the very near term. Chinese scientists say they’re on a similar timeline.” (Hackett 2019, p. 164). This is far from the pessimistic tone promulgated by the recent NAS report. But Hackett also quotes a researcher who advocates for patience, indicating that Sputnik was launched in 1957, but Neil Armstrong didn’t leap onto the Moon until 1969. And while the transistor was invented in 1947, the first integrated circuit wasn’t produced until 1958. So, it seems as if time will tell with quantum computing.

References

Alonso, G. (2018). FPGAs in data centers. ACM Queue, 16(2), 52. Retrieved from https://queue.acm.org/detail.cfm?id=3231573 Arlinghaus, S. L., Arlinghaus, W. C., & Nystuen, J. D. (1990). The Hedetniemi matrix sum: An algorithm for shortest path and shortest distance. Geographical Analysis, 22(4), 351–360. Armstrong, M. P., & Densham, P. J. (1992). Domain decomposition for parallel processing of spatial problems. Computers, Environment and Urban Systems, 16(6), 497–513. Armstrong, M. P., Wang, S., & Zhang, Z. (2019). The Internet of Things and fast data streams: Prospects for geospatial data science in emerging information ecosystems. Cartography and Geographic Information Science, 46(1), 39–56. https://doi.org/10.1080/15230406.2018.1503973 Corden, M. (2019). Vectorization opportunities for improved performance with Intel® AVX-512: Examples of how Intel® compilers can vectorize and speed up loops. Retrieved from https://techdecoded.intel.io/resources/vectorization-opportunities-for-improved-performance-withintel-avx-512/#gs.hom3s3 Cramer, B. E., & Armstrong, M. P. (1999). An evaluation of domain decomposition strategies for parallel spatial interpolation of surfaces. Geographical Analysis, 31(2), 148–168. DeBenedictis, E. P., Humble, T. S., & Gargini, P. A. (2018). Quantum computer scale-up. IEEE Computer, 51(10), 86–89. DeBole, M. V., Taba, B., Amir, A., Akopyan, F., Andreopoulos, A., Risk, W. P., et al. (2019). TrueNorth: Accelerating from zero to 64 million neurons in 10 years. IEEE Computer, 52(5), 20–29. Denning, P. J., & Lewis, T. G. (2017). Exponential laws of computing growth. Communications of the ACM, 60(1), 54–65. Densham, P. J., & Armstrong, M. P. (1994). A heterogeneous processing approach to spatial decision support systems. In T. C. Waugh & R. G. Healey (Eds.), Advances in GIS research (Vol. 1, pp. 29–45). London: Taylor and Francis Publishers. DiVincenzo, D. (2000). The physical implementation of quantum computation. Progress in Physics, 48.9(11), 771–783. Do, J., Sengupta, S., & Swanson, S. (2019). Programmable solid-state storage in future cloud datacenters. Communications of the ACM, 62(6), 54–62. Edwards, C. (2019). Questioning quantum. Communications of the ACM, 62(5), 15–17. Freund, R. F., & Siegel, H. J. (1993). Heterogeneous processing. IEEE Computer, 26(6), 13–17. Greengard, S. (2019). The future of data storage. Communications of the ACM, 62(4), 12–14. Hackett, R. (2019). Business bets on a quantum leap. Fortune, 179(6), 162–172. Hennessy, J. L., & Patterson, D. A. (2019). A new golden age for computer architecture. Communications of the ACM, 62(2), 48–60. Jarr, S. (2015). Fast data and the new enterprise data architecture. Sebastopol, CA: O'Reilly Media. Jouppi, N. P., Young, C., Patil, N., & Patterson, D. (2018). A domain-specific architecture for deep neural networks. Communications of the ACM, 61(9), 50–59.


Lewis, T. G., & Denning, P. J. (2018). Learning machine learning. Communications of the ACM, 61(12), 24–27. Li, Y., Zhao, X., & Cheng, T. (2016). Heterogeneous computing platform based on CPU+FPGA and working modes. In 12th International Conference on Computational Intelligence and Security (CIS). https://doi.org/10.1109/CIS.2016.0161 Lim, G. J., & Ma, L. (2013). GPU-based parallel vertex substitution algorithm for the p-median problem. Computers & Industrial Engineering, 64, 381–388. MacDougall, E. B. (1984). Surface mapping with weighted averages in a microcomputer. Spatial Algorithms for Processing Land Data with a Microcomputer, Lincoln Institute Monograph #84-2 Cambridge, MA: Lincoln Institute of Land Policy. Marciano, R. J., & Armstrong, M. P. (1997). On the use of parallel processing for interactive analysis of large GIS datasets: The effect of control point distribution on interpolation p­ erformance. Unpublished paper. (Paper was accepted for publication in a special issue of a journal. The special issue was never published). https://doi.org/10.17077/hbi4-la8x Moore, G. (1965). Cramming more components onto integrated circuits. Electronics, 38(8), 114–117. Moore, S.  K. (2019). Intel’s neuromorphic system hits 8 million neurons, 100 million coming by 2020. IEEE Spectrum. Retrieved from https://spectrum.ieee.org/tech-talk/robotics/artificialintelligence/intels-neuromorphic-system-hits-8-million-neurons-100-million-coming-by-2020 NAS (National Academies of Sciences, Engineering and Medicine). (2019). Quantum computing: progress and prospects. Washington, DC: The National Academies Press. https://doi. org/10.17226/25196 Normile, D. (2018). Three Chinese teams join race to build the world’s fastest supercomputer. Science. Retrieved from https://www.sciencemag.org/news/2018/10/ three-chinese-teams-join-race-build-world-s-fastest-supercomputer Perricone, R., Hu, X. S., Nahas, J., & Niemer, M. (2018). Can beyond-CMOS devices illuminate dark silicon? Communications of the ACM, 61(9), 60–69. Reichenbach, M., Holzinger, P., Haublein, K., Lieske, T., Blinzer, P., & Fey, D. (2018). Heterogeneous computing using FPGAs. Journal of Signal Processing Systems, 91(7), 745. https://doi.org/10.1007/s11265-018-1382-7 Rivest, R., Shamir, A., & Adleman, L. (1978). A method for obtaining digital signatures and public-­key cryptosystems. Communications of the ACM, 21(2), 120–126. Shaydulin, R., Ushijima-Mwesigwa, H., Negre, C. F. A., Safro, I., Mniszewski, S. M., & Alexeev, Y. (2019). A hybrid approach for solving optimization problems on small quantum computers. IEEE Computer, 52(6), 18–26. Siegel, H. J., Armstrong, J. B., & Watson, D. W. (1992). Mapping computer-vision related tasks onto reconfigurable parallel processing systems. IEEE Computer, 25(2), 54–63. Siegel, H. J., Dietz, H. G., & Antonio, J. K. (1996). Software support for heterogeneous computing. ACM Computing Surveys, 28(1), 237–239. Singh, S. (2011). Computing without processors: Heterogeneous systems allow us to target programming to the appropriate environment. ACM Queue, 9(6), 50. Retrieved from https://queue. acm.org/detail.cfm?id=2000516 Stone, H. S., & Cocke, J. (1991). Computer architecture in the 1990s. IEEE Computer, 24(9), 30–38. Vardi, M.  Y. (2019). Quantum hype and quantum skepticism. Communications of the ACM, 62(5), 7. Wang, S., Prakash, A., & Mitra, T. (2018). Software support for heterogeneous computing. In 2018 IEEE Computer Society Annual Symposium on VLSI. https://doi.org/10.1109/ ISVLSI.2018.00142 Zahran, M. (2016). 
Heterogeneous computing: Here to stay. ACM Queue, 14(6), 1–12. Retrieved from https://queue.acm.org/detail.cfm?id=3038873 Zahran, M. (2019). Heterogeneous computing: Hardware and software perspectives. New York: Association for Computing Machinery. https://doi.org/10.1145/3281649. Book #26.

Index

A Accelerator-based component model implementation, 179, 180 ACME Atmosphere model, 179 Activity-based modeling, 217, 218 Adaptive decomposition method, 60 Advanced computing technologies, 178 Advanced Vector Extensions (AVX), 275 Advanced Weather Interactive Processing System (AWIPS), 252 Agent-based models (ABMs), 213, 214, 217, 218, 221 advanced computing resources, 117 biophysical aspects, 117, 122 building blocks, 116 challenges, 117 cloud computing, 128, 129 high-performance and parallel computing, 129, 130 software engineering, 127, 128 code reusability and transparency, 117, 118 conceptual frameworks, 116 coupled aspects, 123, 124 cyberinfrastructure technologies, 117 functionality, 117 interconnected modules, 117 multiple domains, 116 real-world systems, 116 social aspects, 117, 122, 123 wide-spread application, 116 AI-based neural models, 180 AI-enabled in situ data analysis, 181

Amazon Elastic Cloud Computing (EC2), 259, 265 Amdahl’s Law, 12, 281 Apache Spark, 59, 63 Apache SparkSQL, 65 Application programming interfaces (APIs), 62, 161, 255 Application specific integrated circuit (ASIC), 275 Architecturally-centric approach, 274 Artificial immune system (AIS), 190 Artificial intelligence (AI) applications, 68 deep learning platforms, 68–69 definition, 68 GeoAI, 68 geospatial-oriented deep learning platforms, 68 Google Brain team, 68 object-based convolutional neural network, 68 tech giants, 68 Augmented reality (AR), 167 Aurora Project, 272 Automatic identification system (AIS), 4 advanced analysis, 228 data signal, 228 datasets, 229 GPU (see Graphics processing unit (GPU)) marine transportation analysis, 228 maritime transportation, 229 pattern analysis, 228, 229 spatiotemporal analysis, 228



286 Automatic identification system (AIS) (cont.) tracking information, ships, 228 visualization, 228 web-based prototype system, 228 wireless communication system, 228 B Bayesian statistical methods, 11 Big data, 54 Biofuels, 107 Biomass-to-biofuel supply chain optimization, 107, 111 Biophysical aspects, 122 BlueGene/Q, 279 C Cartogram, 166 Cartographic mapping, 160 Cartography and geovisualization challenges big spatial data handling, 168 mapping and visualization, 168, 169 geographic system, 160 GIS and spatial analysis, 160 GPUs, 161 HPC API, 161 applications (see HPC applications, cartography and geovisualization) computational support, 161 resources, 161 interpretation and visualization, 160 maps, 159 Cellular automata (CA), 213, 214, 217 Centralized storage and pre-processing (CSP), 107 Central processing units (CPUs), 13, 161, 254 Ceph, 155 Choropleth mapping, 165 Cloud-based Cluster Computing, 265 Cloud-based HPC environment, 65 Cloud-based HPC platform, 252 Cloud computing, 20, 125, 126, 252, 265 Clustering analysis, 234, 235, 246 Code reusability and transparency (CRaT), 117 bibliometric analysis, 119 biophysical model, 118 flow charts, 119 mathematical expressions, 119 ODD protocol, 120, 121 OOP, 118–120 software/code reuse, 117

Index systematic approach, 117 workflow documents, 119 Coding practices, 274 Combinatorial optimization, 99 Community Atmosphere Model code, 180 Community Earth System Model (CESM), 175 Computational complexity Bayesian statistical methods, 11 centralization and de-centralization, 21 cloud computing, 20, 21 conventional SIMD approach, 16 CyberGIS, 19 cyberinfrastructure, 19 data elements, 10 data streams, 14 desktop systems, 14 distributed parallelism, 18, 19 distributed processing, 21 electronic devices, 21 Flynn’s taxonomy, 15 fundamental elements, 21 geospatial analyses, 9 geospatial methods, 11 high-performance networks, 10 horizontal scalability, 18, 19 horizontal scaling, 13 instruction streams, 14 integrity, 17 manufacturer-specific systems, 10 MCMC, 11 MISD category, 14 number of operations, 11 parallel processing, 16 performance evaluation, 12, 13 processors, 17 scalar architecture, 14 Silicon Graphics shared memory system, 17 spatial analysis methods, 10 spatial optimization problems, 10 spatial statistics application, 16 system components, 13 uniprocessor, 14 vector processing, 14 vertical scaling, 13 Computational intelligence models, 217 Computational modeling, 250 Compute Unified Device Architecture (CUDA), 62, 161, 163, 166, 179, 250, 277 Container-as-a-Service (CaaS), 128, 137, 140 Container orchestration, 139, 141, 143 Conventional supercomputers, 279

Index Convolutional autoencoders (CAE), 234, 235, 247 Convolutional neural networks (CNNs), 234 Coupled earth system development, 176, 177 Coupled human and natural systems (CHANS), 123 Coupler-enabled software architecture, 179 Coupling approaches for next-generation architectures (CANGA) project, 179 Cray Research, 14 Cyber-enabled geographic information systems (CyberGIS), 19, 63, 216 Cyberinfrastructure (CI), 19, 28, 160 capabilities, 124 challenge, 125 cloud computing, 125, 126 computational environment, 125 high-performance computing resources, 126, 127 implicit effect, 125 multiple processing, 125 parallel computing, 126, 127 power and connectivity, 124 D Dark silicon, 272 Data Analytics and Storage System (DAAS), 58 Data cache unit (DCU), 273 Data collection approaches, 66 Data handling algorithms, 168 Data mining, 230, 239, 240 Data server module, 143 Data storage/management data manipulation system, 56 HPC paradigms, 56 SDA, 57 SEA, 57 SNA, 57 Data structures, 58 Database management, 231, 232 Debugging, 281 Decomposition, 59 DeepLabv3+ neural networks, 181 DeepNetsForEO, 68 Deep neural networks (DNNs), 68, 275, 282 Dennard scaling, 12, 272, 282 Density-based region of interest (ROI), 234, 244, 245 Digital elevation models (DEM), 82 Direct3D, 161 Discrete global grid system (DGGS), 66, 67 Distributed computing, 254, 256 Distributed file systems (DFSs), 254

287 Distributed memory systems, 220 Docker, 139, 144 Domain decomposition, 84 abstraction, 59 buffering, 60 communications, 60 dependence, 60, 61 dimensions, 59 divide-and-conquer approach, 59 geospatial big data, HPC, 59 parallel operations, 59 process, 59 spatial, 60 subdomains, 59–60 temporal, 60 Domain-specific computing geospatial context, 277 GPUs, 274 identification, 274 Moore’s Law, 275 software tools, 274 Dust Regional Atmospheric Model (DREAM), 256 Dust storm modeling DREAM, 256 Eta weather prediction model, 256 model parallel implementation, 256 MPICH, 256 NMM-dust, 256 numerical weather prediction models, 257 parallel implementation, 257 performance evaluation, 258, 259 D-Wave system, 102–105, 111 E EarthDB, 64 Earth observation systems, 55 Earth Observing System Data and Information System (EOSDIS), 55 Earth science modeling, 251 Earth system grid, 251 Earth System Modeling Framework, 177, 251 Earth system models (ESM) carbon cycle, 176 CESM, 175 challenges (see Exascale computing era) coupled earth system development, 176, 177 E3SM, 175 flux coupler, 175 GCMs, 176 HPC, 175 in situ data analysis, 178 I/O issues, 177

288 Earth-system Prediction Initiative (EPI), 251 Economic systems, 122 Edge computing, 10, 181 Embarrassingly parallel computing, 84 Embedded operating system, 278–279 Energy Exascale Earth System Model (E3SM), 175 Environmental modeling, HPC cloud computing, 252 cluster/grid facilities, 252 earth science applications, 251 Earth System Modeling Framework project, 251 EPI, 251 grid computing, 251 Land Information System software, 252 multiple processes/threads, 251 parallel computing, 251, 252 Eta weather prediction model, 256 Exascale computing era accelerators, 179, 180 AI-enabled in situ data analysis, 181 coupling mechanism, 179 enhanced numerical simulations, AI, 180 heterogeneity, 178 Exascale system Aurora Project, 272 Chinese chip manufacturing industry, 273 conventional advances, 282 Fugaku, 273 proof-of-concept, 273 quintillion floating-point calculations, 272 strategic initiatives, 272 toroidal connections, 273 Extreme Science and Engineering Discovery Environment (XSEDE), 126 F Field-programmable gate arrays (FPGA), 275 Findable, Accessible, Interoperable and Reusable (FAIR), 138 Fisher–Jenks optimal data classification approach, 165 Fog computing aim, 67 challenges, 67 cloud computing, 67 description, 67 edge devices, 67 filtered data, 67 geospatial data processing, 67 HPC, 67

Index IoT, 67 middle computing layer, 67 real-time data processing, 67 FORTRAN-produced Atmospheric Chemical Kinetics model, 180 Framework (VTK-m), 252 Fugaku, 273 Function decomposition, 59 G General Parallel File System (GPFS), 254 General-purpose GPU programming frameworks, 161 General-purpose parallel programming platforms, 62, 63 Geographic Information Science (GIScience), 140 Geographic information systems (GIS), 78 Geographic model, 218 GeoLife GPS trajectory data, 164 GeoMesa SparkSQL, 65 Geoscience model simulations, 55 GeoServer, 145 GeoSpark SQL, 65 Geospatial algorithms, 278 Geospatial applications, 2–4 Geospatial artificial intelligence (GeoAI), 68 Geospatial big data definition, 54 4-Vs, 54 heterogeneity, 66 HPC, 54 intrinsic feature, 54 opportunities and challenges, 54, 69 role, 69 sources earth observation, 55 geoscience model simulations, 55 IoT, 55 VGI, 56 Geospatial-oriented platforms C++ library, 63 considerations, 63 CyberGIS, 63 GISolve Toolkit, 63 Google Earth Engine, 63 Hadoop, 63 Python-based library, 63 Geospatial service chaining, 65 Geospatial software, 274 Geospatial technologies, 213 GISandbox, 129

Index GISolve Toolkit, 63 GIS software packages, 167 GIS software platforms, 163 Global circulation models (GCMs), 176 Globus Grid Toolkit, 255 GlusterFS, 155 Google BigQuery GIS, 65 Google Brain team, 68 Google Earth Engine, 63 Google File System, 254 GPU-based Cloud Computing, 255 GPU-based parallel circular cartogram algorithm, 166 GPU computing technology, 255 GPU-CPU computing, 265 Graphics processing units (GPUs), 4, 16, 57, 161, 254, 274 aggregation, 246 clustering (k-Means), 241–243 CPUs, 229 database management, 231, 232 data filtering, 240 data mining, 230, 240 environments, 240 Gulf Intracoastal Waterway, 239 implementation, 238, 239 Linux-based, 240 machine learning, 230 OmniSci database, 230, 239 parallel computing techniques, 230 parallel processing, 229, 241 parallel querying processing, 230 pattern analysis clustering analysis, 234, 235, 246 maritime transportation, 234 parallel mining algorithms, 236, 237 ROI identification, 235, 236, 246 types, 234 workflow, 230 querying functions, 233, 234, 241 querying trajectory data, 230 ROI identification, 243 spatial and temporal selection, 245 speedup ratio, 240, 241 system architecture, 231, 232 video card, 229 visualization, 237 Windows-based, 240 workflow, 244, 246 Graphic-specific GPU programming support, 161 Grid Computing infrastructure, 255

289 H Hadoop, 63, 64, 69 Hadoop-based geospatial big data platforms, 61 Hadoop Cloud Computing infrastructure, 254 Hadoop distributed file system (HDFS), 57–59, 63, 254 Hadoop-like distributed computing platforms, 63 Harvard architecture, 13 Heat maps, 166 Heterogeneity, 66 Heterogeneous multicore systems, 178 Heterogeneous processing ASIC, 275 conventional CPUs, 282 DNN, 275 factors, 277 FPGA, 275 hardware players, 277 heterogeneous software, 276 integrated approach, 275, 276 language agnostic multi-target programming environments, 277 LibHSA, 277 location-allocation model, 278 multiple target architectures, 277 non-profit HSA Foundation, 277 parallel programming, 276 problem decomposition, 277 TPU, 275 vectorization instructions development, 275 Heuristic algorithms, 186–188 Heuristic search methods, 165 High-performance computing (HPC), 28 agent-based modeling, 3 application, 2, 251 cartographic mapping, 3 computational complexity, 2 computer architectures, 2 computing paradigms, 250 CyberGIS, 2, 4 cyberinfrastructure, 1, 3 distributed computing, 250 domain decomposition, 2 earth system models, 3 geophysical and tectonic modeling cluster, 250 geospatial applications, 2–4 geospatial big data, 2 GeoWebSwarm, 3 land use planning, 4

Index

290 High-performance computing (HPC) (cont.) models and analysis, 250 parallel algorithms, 4 Pareto-based optimization models, 3 quantum computing, 2, 4 sensor technologies, 4 SLUA (see Spatial land use allocation (SLUA)) spatial modeling, 2, 3 spatial models, 3 technologies, 250 urban spatial modeling (see Urban spatial modeling) urban system model, 4 HiveSQL, 65 HPC applications, cartographic mapping cartographic generalization circle growth algorithm, 165 conditions, 164 definition, 164 MapReduce, 165 multi-scale visual curvature algorithm, 164 spatial dataset, 164 desktop-based environments, 163 mapping methods, 165–166 map projection, 163–164 map rendering, 165–166 2D/3D mapping, 162 HPC architecture, earth science application layer, 253 computing infrastructure, 254, 255 data storage and file system, 254 layers, 253 programming model, 255, 256 HPC-based spatial data processing, 67 HPC-enabled geospatial big data handling components data storage/management, 56–57 domain decomposition, 59–61 query processing, 63–65 spatial indexing, 57–59 task scheduling, 61–62 workflow-based systems, 63–65 HPC-enabled geospatial big data handling platforms general-purpose platforms, 62, 63 geospatial-oriented platforms, 63–65 programming models and languages, 62 HPC facilities, 252 HPC visualization interactive, 250 paradigm, 250 requirement, 250

HTCondor, 62 Human decision-makers, 123 Hybrid local-cloud infrastructure, 255 I IBM’s TrueNorth program, 279 IDW Spatial interpolation algorithm, 60 Initialization algorithm, 193 In-memory distributed computing framework, 59 In situ data analysis, 178 Integer programming (IP), 104 Integrated Data View (IDV), 252 Integrated land, 216, 217 Intergovernmental Panel on Climate Change (IPCC), 55 International Journal of Geographical Information Science (IJGIS), 215 International Maritime Organization (IMO), 229 Internet of Things (IoT), 55, 282 Intra-urban scale, 212 Inverse-distance-weighted (IDW), 60 IPCC Fifth Assessment Report (AR5), 55 Irregular decomposition, 60 Ising model, 102, 103, 105 J Java-based visual analytics tools, 252 JavaScript based tools, 260 K KD-tree, 58 Kendall Square Research, 17 Kernel density analysis, 61 k-means clustering, 235, 241, 243 L Land Information System software, 252 Landsat archive, 55 Landscape fragmentation analysis, 81 Land use allocation, see Spatial land use allocation (SLUA) Land use suitability, 192 Lawrence Livermore National Laboratory (LLNL), 252 Legacy Software Refactoring, 180 Library (LibHSA), 277 Light detection and ranging (LiDAR), 59–62 Linear speedup, 86

Index Load balancing algorithm, 61, 142 Location-allocation model, 278 Loihi, 279 LSTM-based approach, 180 Lustre, 254 M Machine learning, 128, 230, 231, 234, 240 MapObjects Internet Map Server, 138 Mapping methods cartogram, 166 choropleth mapping, 165 general reference mapping, 165 heat maps, 166 heuristic search methods, 165 parallel computing, 165, 166 parallel data classification, 165 Python libraries, 165 thematic mapping, 165 Map projection cloud-based virtual machines, 164 GPUs, 163 HPC cluster, 163 LiDAR dataset, 164 parallel computing, 163 reprojection, 163 spatial datasets, 163 steps, 163 supercomputers, 164 transformation, 163 Map rendering AR and VR technologies, 167 cartographic mapping, 166 GIS software, 167 GPU-enabled rendering, 167 parallel computing, 167 pre-rendering, 166 programmable rendering pipelines, 167 2D and 3D spatial data, 166 web-based visualization, 167 MapReduce, 29, 63, 250, 254, 256 Map reprojection, 163 Marine transportation analysis, 228 Maritime transportation, 229, 234 Markov Chain Monte Carlo (MCMC), 11 Massive data visualization, HPC analysis and decision making, 252 cloud computing facilities, 253 earth science domain, 252 spatiotemporal data, 253 techniques and tools, 252 Unidata program center, 252 visual analytics tools, 252

291 Massive model output visualization EC2, 259 JavaScript based tools, 260 methods, 259 parallel implementation, 260, 261 ParaView, 259 performance evaluation Amazon EC2 service, 261 connection time, 262 datasets, 262 locations, 262 multidimensional visualization, 262 multiple cores, 262 network speeds, 262 observations, 262 pipeline network, 261 ray casting algorithm, 261 unique time allocation pattern, 263 recommendation, 264, 265 test datasets, 260 tightly-coupled cloud-enabled remote visualization system, 259 Message-passing interface (MPI), 57, 62, 63, 220, 221, 250 Message-passing parallel algorithm, 196, 200, 201 Metropolis–Hastings approach, 11 Microsimulation, 217, 218 ModelE, 55 Model sharing, 116, 120, 130 Modern-Era Retrospective analysis for Research and Applications (MERRA), 59 Moore’s CMOS-based law, 280 Moore’s Law, 12, 22, 271, 275, 280 MPI-based parallel map projection software, 164 MPI programming model, 256 Multi-node scaling, 139 Multiobjective AIS algorithm encoding, 190, 192–194 heuristic algorithm, 190 initialization, 190–194 mutation, 194, 195 Multiobjective heuristic optimization algorithms, 186 Multiobjective optimization AIS, 190, 191 definition, 189 Pareto-based optimization algorithms, 190 Pareto optimal set, 190 Multithreading parallelization, 255 Mutation, 194, 195

292 N National Aeronautics and Space Administration (NASA), 58 National Quantum Initiative Act, 280 National Research Center of Parallel Computer (NRCPC), 273 National University of Defense Technology (NUDT), 273 Nervana Engine, 275 NetCDF, 177, 261 NetLogo ABM Workflow System (NAWS), 125 NetLogo Web, 126 Network file system (NFS), 145, 254 Network of Workstations (NOW) project, 18 Neuromorphic computing application domains, 279 brain functions, 279 cognitive functioning, 282 custom I/O interface, 279 DNNs, 279, 282 IBM’s TrueNorth program, 279 Loihi, 279 megawatt power consumption, 279 NMM-dust model, 258–260 Nondominated sorting genetic algorithm II (NSGA- II), 187 Non-Hydrostatic Icosahedral Model, 180 Non-hydrostatic mesoscale model (NMM), 256, 257 Non-profit HSA Foundation, 277 North Carolina Floodplain Mapping Program, 164 Not only SQL (NoSQL) databases, 57 NP-hard quadratic assignment model, 107 Numerical analysis, 250 NVIDIA’s CUDA, 178 O Oak Ridge National Laboratory (ORNL), 181 Object-oriented paradigm (OOP), 119, 120 Ocean Atmosphere Sea Ice Soil coupler version 3 (OASIS3), 177 OmniSci, 232 OpenACC, 180 Open Computing Language (OpenCL), 161 Open Geospatial Consortium (OGC), 65, 138 OpenStreetMap (OSM), 146 Operational modeling approach, 218 Optimization model, 103 Overview-Design Concepts-Detail (ODD) protocol, 119 Oyo Imperial Period, 90

Index P Parallel algorithms, 99, 196 Parallel cartographic modeling language (PCML), 63 Parallel computing, 32, 62, 250, 255 algorithms, 161 technology, 188 Parallel interactive visualization, 162 Parallelization strategy, 57, 196–198, 220 Parallel landscape visibility analysis archaeology, 80 characteristics, 80 computational algorithms, 79 computational experiments, 93 compute- and data-intensity challenge, 81 computing performance, 90–92, 94 decomposition, 80 fragmentation analysis, 89, 90, 93 GIS, 78 intervisibility factors, 78 landscape fragmentation analysis, 81 master processor, 80 methods computing performance, 86 domain decomposition, 84 fragmentation analysis, 84–86 implementation, 87 viewshed analysis, 84–86 workflow, 83 pipelining divides, 79 social and cognitive factors, 78 software system, 80 spatial resolution, 93 supercomputer/parallel computing, 79 viewshed analysis, 79–81, 87, 92, 93 visibility patterns, 93 Parallel mining algorithms, 236, 237 Parallel processing, 229, 232, 236, 241, 243 Parallel programming, 250, 276 Parallel urban models agent-based approach, 218 cluster computing, 220 communications, 220 computing elements, 220 distributed memory systems, 220 geographic models, 218, 221 hyper-models, 218 implementation, 218 message-passing paradigm, 220 MPI, 220, 221 parallel computing strategies, 220 parallelization, 219, 220 shared-memory paradigm, 220, 221 shared-memory systems, 220

Index structure, 219 urban dynamics and complexity, 218 utility level, 219 Parallel Virtual File System (PVFS), 254 Parallel Virtual Machine, 62 Parallel visualization software platforms, 162, 168 Parallel web services, 141, 142 ParaView, 259, 260 Pareto-based multiobjective algorithms, 186 Pareto-based optimization heuristic models, 187 multiobjective SLUA problems, 189 Pareto frontier, 186 scalarization, 187 SLUA (see Spatial land use allocation (SLUA)) Path-integral Monte Carlo method, 106, 111 Pattern analysis, 229 Performance metrics average leaf node depth, 42 average leaf node size, 42 cut cylinders, 40, 41 distribution, 40 execution time, 40 P-median model, 104–106 PostGIS, 64 Pre-rendering, 166 Python, 274 Q Quadratic unconstrained binary optimization (QUBO) model, 103–105 Quadtree-based approach, 60 Quantum annealing (QA) binary bit, 100 biomass-to-biofuel supply chain optimization, 107 computational paradigm, 101 computational results, 108–110 D-Wave system, 102, 103 hardware, 101, 102 Ising model, 102, 103 path-integral Monte Carlo method, 106 physical qubits, 104 p-median model, 104–106 pseudo code, 108 quadratic assignment model, 107 quantum computing approach, 100 quantum tunneling, 101 superposition, 100

293 Quantum computing, 99–102, 110, 111 barriers, 280, 281 bits/qubits, 280 capabilities, 280 computational performance, 280 criteria, 281–282 enterprise, 281 integrated circuit, 283 National Quantum Initiative Act, 280 national research centers, 280 n-qubit computers, 281 problem decomposition, 281 Quantum Initiative Act, 281 quantum scaling, 282 scalability, 281 Quantum error correction, 281 Quantum Initiative Act, 281 Quantum processing units (QPUs), 111 Quantum supremacy, 282 Quantum tunneling, 101 Querying functions, 233, 234 Query processing extended-SQL, 64 HiveSQL, 65 HPC environment, 64 PostGIS, 64 programming/scripting, 63 query analytic framework, 64 raster data processing, 64 Spark platform, 64 SQL-based, 64 systems, 65 R Radio frequency identification (RFID), 55 RasDaMan, 64 Raster-based models, 192 Raster data processing, 64 Raw AIS data files, 233 Ray casting algorithm, 261 Redundant array of independent disks (RAID), 56 Remote sensing, 55 Renewable energy, 107 ROI identification, 235, 236, 243, 246 R-tree, 58 S Scalable visual analytical system, 64 Scalarization methods, 186, 187, 189 Science Gateway for Geographic Information Science, 129

294 Scientific data processing (SciDP), 254 Shared-disk architecture (SDA), 56, 57, 62 Shared-everything architecture (SEA), 56 Shared-memory parallel algorithm, 196, 203, 204 Shared-nothing architecture (SNA), 56, 57 Shor’s quantum factoring algorithm, 99 Shuttle Radar Topography Mission (SRTM), 82 Simulated annealing algorithm, 107–109 Single-image raster maps, 138 Single instruction and single data streams (SISD), 14 Smart proxy models, 180 SNA, de-facto big data storage architecture, 57 Social aspects, 122, 123 Social-ecological systems (SES), 123, 124 Social media platforms, 56 Solid state drives (SSDs), 278 Space-time, 30–32, 47 Spatial analysis and modeling (SAM), 215 Spatial and temporal buffering, 60 Spatial decomposition, 60 SpatialHadoop, 58 Spatial heterogeneity, 85 Spatial indexing concurrent spatial data visits, 58 data structures, 58 native formats, 58 SIA, 58 spatiotemporal indexing, 58 tile-based, 59 use, 57 Spatial interaction model, 212 Spatial land use allocation (SLUA) Anyue County, Sichuan Province, 198, 199 cloning rate, 204, 205 combinatorial optimization problem, 191 heuristic algorithms, 186 heuristic models, 187, 188 high-dimensional nonlinear problems, 186 high-performance computing, 188, 189 land use data, 198, 200 land use efficiency, 186 land use patterns, 201, 202 land use suitability, 192, 201 land use types, 192, 200 message-passing parallel algorithm, 200, 201, 203, 204 multiobjective AIS, 192–195 multiobjective heuristic optimization algorithms, 186 multiobjective optimization, 188–191 multiobjective trade-off analysis, 187

Index parallel algorithm, 188 parallel computing technology, 188 parallelization strategy, 196–198 Pareto-based multiobjective algorithms, 186 Pareto-based optimization methods, 186, 187 proposed model, 199 protecting land resources, 186 scalarization, 186, 187, 189 shared-memory parallel algorithm, 203, 204 single objection function, 186 spatial land use units, 186 speedup and efficiency, parallel algorithms, 203 supercomputer, 199 trade-off analysis, 189 Spatial optimization challenges, 99 computational approaches, 98 computational complexity theory, 99 definition, 98 high-performance computing, 100 mathematical models, 98 models, 98 NP-hard problems, 99 optimal solution, 98, 99 parallel algorithms, 99 performance, classical computer, 99 potential solutions, 99 QA (see Quantum annealing (QA)) quantum computing, 99 Spatial pattern, 85 Spatiotemporal analysis, 228 Spatiotemporal complexity, 123 Spatiotemporal dependence, 61 Spatiotemporal domain decomposition method (ST-FLEX-D), 2 bandwidths, 43, 44 computational intensity, 28, 29 data, 30, 31 data collection, 28 data-driven methods, 46 divide-and-conquer strategy, 29, 30, 45 domain experts, 46 edge effects, 30 efficiency, 30 evolution of technologies, 28 fine-grained decomposition, 46 geospatial applications, 28 GIScience, 28, 29 load balancing, 29 MapReduce, 29

Index methods decomposition parameters, 37–40 performance metrics and sensitivity, 36, 37, 39 ST-FLEX-D, 34, 35 ST-STATIC-D, 32, 33 parallel algorithms and strategies, 29 parallel processing, 45 processors, 30 shared memory, 29 simulation approaches, 28 spatial and temporal grid resolutions, 46 stream, 28 Spatiotemporal indexing approach (SIA), 58 Speculative branch prediction, 272 Stream SIMD Extensions (SSE), 275 Strength Pareto evolutionary algorithm 2 (SPEA-2), 187 Sugon system, 273 Superposition, 100 System architecture, 231, 232 T Task scheduler, 62 Task scheduling data locality definition, 62 importance, 62 LiDAR, 62 requirement, 62 shared-disk architecture, 62 task scheduler, 62 definition, 61 geospatial big data processing aspects, 61 HPC programming paradigms/ platforms, 61 load balancing aim, 61 algorithm, 61 big data processing platforms, 61 cloud-based HPC environment, 62 grid computing environment, 61 parallel computing, 61 Temporal decomposition, 60 Tensor processing unit (TPU), 275 Tile-based spatial index, 59 Tobler’s First Law, 18 Trade-off analysis, 189 Traditional binary computers, 281 Traditional coordinate systems, 66 TRANSIMS, 218 Transport modeling, 216, 217 TrueNorth NS16e-4 system, 279

295 U Ultrascale Visualization Climate Data Analysis Tools (UV-CDAT), 252 Unidata program center, 252 Unified Memory and Storage Space (UNITY), 179 Unified Modeling Language (UML), 119 Urban dynamics and complexity, 213 ABM, 213 activity-based approach, 214 bottom-up models, 213 CA, 213 decision-making process, 213 disaggregate data, 214 dynamic behavior, 213 dynamic model, 213 microsimulation, 214 nonlinear growth, urban systems, 212 practical applications, 214 spatial interaction model, 212 Urban modeling computational technologies, 212 policy-making, 212 urban dynamics and complexity, 212–214 Urban spatial modeling activity-based modeling, 217, 218 computational human geography, 215 computational intelligence models, 217 CyberGIS, 216 cyberinfrastructure, 215 HPC, 215 IJGIS, 215 integrated land, 216, 217 microsimulation, 217, 218 parallel process, 215 transport modeling, 216, 217 Urban systems parallel models (see Parallel urban models) urban dynamics and complexity, 212–214 Urbanization, 212 US Department of Energy (DOE), 175–176 User-defined functions (UDFs), 65 US National Institute of Standards and Technology, 20 V Variety, 54 5-V characteristics, 54 Vector-based models, 192 Vectorization instructions, 275 Velocity, 54 Veracity, 54 “Video card”, 229

Index

296 Viewshed analysis, 78–81, 84 Virtual machines (VMs), 125, 139 Virtual reality (VR), 167 Virtuous cycle, 281 Visualization, 228, 237 Volume, 54 Volunteered geographic information (VGI), 56, 138 W Web-based interface, 126 Web-based model builder, 239 Web-based prototype system, 228 Web Coverage Service (WCS), 65, 138 Web Feature Service (WFS), 65, 138 Web GIS framework big data, 136, 153 CaaS, 137 capability, 154 client-side user activities, 147 cloud computing technologies, 137 concurrent capacity, 151 configuration and maintenance, 153 container technologies, 139, 140 cyberinfrastructure technologies, 136 data access, 155 distributed file system approach, 155 distribution of transaction time, 152 domains, 136 efficiency, 150, 151 experiment configuration, 148, 149 frequency distribution, 154 hardware configuration and data, 146, 147 high-performance computing, 136, 137, 154 histograms of transaction times, 153 implementation, 144–146 jMeter, 150 methods container orchestration module, 143 data server module, 143

GeoWebSwarm, 140 high-throughput cluster, 140 high-throughput computing, 140 load balancing, 142 parallel web services, 141, 142 network access, 155 number of transactions, 152 OpenStreetMap, 138 performance measures, 150 performance metrics, 148–150 software packages, 137 spatial data, 155 and information, 153 and maps, 136 spatial database approach, 155 spatial information, 136 variety of organizations, 138 velocity, 136 virtual machines, 137, 153 web-based computing environments, 136 Xerox PARC Map Viewer, 138 WebGL-based heat map visualization approach, 166 Web Graphic Library (WebGL), 161, 163, 166 Web Mapping Service (WMS), 65 Web Map Service (WMS), 138, 145 Web Map Tile Service (WMTS), 138, 145, 147 Web Processing Service (WPS), 65, 138 Workflow-based systems cloud-based HPC, 65 distributed computing environment, 65 geospatial big data processing, 65, 66 geospatial service chaining, 65 pipeline, 65 X XSEDE cyberinfrastructure, 164 XSEDE supercomputers, 164