Spatial Synthesis: Computational Social Science and Humanities [1st ed.] 9783030527334, 9783030527341


English Pages XV, 454 [450] Year 2020




Human Dynamics in Smart Cities Series Editors: Shih‐Lung Shaw · Daniel Sui

Xinyue Ye Hui Lin Editors

Spatial Synthesis Computational Social Science and Humanities

Human Dynamics in Smart Cities Series Editors Shih-Lung Shaw, Department of Geography, University of Tennessee, Knoxville, TN, USA Daniel Sui, Department of Geography, Ohio State University, Columbus, OH, USA

This series covers advances in information and communication technology (ICT), mobile technology, and location-aware technology, and the ways in which they have fundamentally changed how social, political, economic, and transportation systems work in today’s globally connected world. These changes have raised many exciting research questions related to human dynamics at both disaggregate and aggregate levels, which have attracted the attention of researchers from a wide range of disciplines. This book series aims to capture this emerging, dynamic, interdisciplinary field of research as a one-stop repository of our cumulative knowledge on this topic, which will have profound implications for future human life in general and urban life in particular. It covers topics ranging from theoretical perspectives, space-time analytics, modeling human dynamics, urban analytics, social media and big data, and travel dynamics to privacy issues, the development of smart cities, and the problems and prospects of human dynamics research. The series will include contributions from participants in past and future Symposia on Human Dynamics Research held at the American Association of Geographers annual meeting, as well as from other researchers with interests related to human dynamics via open submissions. The series invites contributions on theoretical, technical, or application aspects of human dynamics research from a global and interdisciplinary audience.

More information about this series at http://www.springer.com/series/15897

Xinyue Ye · Hui Lin

Editors

Spatial Synthesis Computational Social Science and Humanities


Editors Xinyue Ye Department of Landscape Architecture and Urban Planning Texas A&M University College Station, TX, USA

Hui Lin School of Geography and Environment Jiangxi Normal University Nanchang, China

ISSN 2523-7780 ISSN 2523-7799 (electronic)
Human Dynamics in Smart Cities
ISBN 978-3-030-52733-4 ISBN 978-3-030-52734-1 (eBook)
https://doi.org/10.1007/978-3-030-52734-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Introduction: Spatial Synthesis in Computational Social Science and Humanities

1. Towards Computational Spatial Social Science and Humanities

As Goodchild et al. (2000) observe, “changes in the space and place of peoples and nations have profoundly affected the spatial organization of the social, the economic, the political, and the cultural—the key domains of focus of the social sciences.” Space and place are central across the social science and humanities disciplines, serving as both inputs and outputs of empirical and theoretical investigations (Dezzani 2010). Goodchild (2020) also states that “in essence the particular form of integration that is so central to GIS practice is what we might term spatial integration.” Unlike traditional social science and humanities, computational social science and humanities adopt computation as their essential methodological foundation and platform (Cioffi-Revilla 2014). The Internet and cellular data networks have significantly changed our modes of communication and reshaped the formation of networked groups, which were previously strongly constrained by distance and location, releasing the power of social interaction and group assembly across much larger territories. Furthermore, Lazer et al. (2009) herald the coming age of computational social science, arguing that as life in the network is increasingly captured digitally, comprehensive pictures of both individuals and communities can be formed. The past decade has witnessed dramatic growth in computational social science and humanities research, spurred by increasingly available fine-scale, human-centered spatial (spatiotemporal) data. Computing environments for geo-visualization, geo-simulation, geo-collaboration, and human participation have also been developed to assist various computer-aided research tasks (Lin et al. 2013). The following major trends have been identified:

1. Spatial social science and humanities research has shifted from a data-scarce environment to a near-real-time, data-rich one.
The availability of unprecedented data sources over space, time, and social networks would facilitate the modeling of individuals’ behavior and its outcomes across spatiotemporal scales, deeply rooted in both the geographic landscape and social networks. For instance, Ye et al. (2019) integrate spatial methods and social network analytics to model the scope and sources of online transactions at the city level and to quantify their driving forces. A powerful analytical framework for identifying space-time research gaps and frontiers is fundamental to the comparative study of spatiotemporal phenomena shaped by configurations of various intertwined relationships. For example, novel research questions can be generated when we can systematically query the dynamic virtual and physical dimensions across multiple scales in socioeconomic modeling, transportation analysis, and disaster response (Ye and Rey 2013; Li et al. 2017; Wang and Ye 2017).

2. As space-time data accumulate, the rich details of spatiotemporal dynamics in computational modeling remain largely unexplored because of many binding constraints on scientific advancement, such as the computational intensity of the data and very large geo-referenced dynamic databases (Shaw and Ye 2019). In addition, such 24/7 unstructured social data need special methods (Batty 2020). The revolution in computing and information technology has further blurred the boundaries and definitions of disciplines and applications. The falling cost of computing and a lowered learning curve have also accelerated big spatiotemporal analytics studies.

3. Integration has gradually been realized among conceptualizations, analytical methods, and open-source software environments across the disciplines of social science and humanities. Such integration is needed to respond to the new data and computing environment (Liu et al. 2019). Human dynamics has been emphasized from the geospatial dimension within the context of the mobile and big data era (Shaw et al. 2016).
A virtual geographic environment has also been proposed as a computer-aided workspace for geographic experiments and analyses involving both the physical and human dimensions (Lin et al. 2013). By integrating environmental psychology theory and geospatial artificial intelligence, a framework for virtual geographic cognition experiments has further been developed to model and simulate human activity and urban context data (Zhang et al. 2018).

2. Synthesis and Convergence

Social science and humanities themes are increasingly tied to convergence and synthesis across multiple disciplines, as well as across data, computing, interactive, and collaborative environments. The Annual International Symposium of Spatially Integrated Social Science and Humanities has been held ten times to promote such practice. This book grew out of the most exciting and dominant themes in spatial synthesis research on computational social science and humanities in China. As the first English-language book of its kind, it spans most social science and humanities disciplines as well as computational science. This book is a comprehensive text on spatial and computational social science and humanities research. The development of more powerful computing technology, emerging big and open data sources, and theoretical perspectives on spatial synthesis have revolutionized the way in which we investigate social science and humanities. Given the pace of change and the prominence of human-centered computing and spatial social science/humanities research, a summary of the principles and applications of such research is urgently required and will be of great value. In the foreword of this book, Batty (2020) notes that “the continued miniaturization of computers to the point where we are now using them personally in real time to organize our lives has led to many new ways of sensing and delivering data about our social behaviours”, while Goodchild (2020) highlights core reasons supporting convergence and synthesis: “the pressing challenges faced by the earth cannot be solved by one single discipline, and hence need the collaborative work between computing experts and domains scientists for broader perspectives”. This book contains research and contributions from scholars across China and the world. The main principles and applications of spatial social science and humanities in China over the past decade are reviewed. The book provides fundamental information that will help to shape future research, and it will allow researchers, students, and policy-makers worldwide to learn about the significant achievements and applications of spatial social science and humanities research within China.

3. Spatial Synthesis in Humanities, Regional Science, and Urban Science

This volume is part of the Human Dynamics in Smart Cities book series and is composed of 25 chapters. After the forewords by Academicians Michael Batty and Michael Goodchild, the following chapters cover a variety of interesting and timely topics on spatial synthesis for computational social science and humanities. According to their different roles, pertinent issues, and corresponding solutions, the chapters are grouped into three areas: humanities, regional science, and urban science.

Spatial Synthesis in Humanities: According to Hu et al. (2020), official histories, chorographies, and family trees form the memory of China as a nation. They construct a multilevel architecture for a Family Tree Geographic Information System (FTGIS) by incorporating modern geospatial information technologies into research on family trees. Lu and Zhang (2020) build a historical geographic information system to promote research on Chinese history, drawing on literature, maps, remote-sensing images, and archeological relics. A Digital Historical Yellow River system has also been developed, containing: (1) high-precision three-dimensional micro-geomorphology; (2) a fusion scheme for historical hydraulic engineering and the terrain model; (3) restoration of the three-dimensional shape of the river channel; (4) simulation and demonstration of surface-water movement in historical periods; (5) reconstruction of rainfall characteristics in historical periods; and (6) river-water management methods in historical periods (Pan et al. 2020). Based on CHGIS (China Historical Geographic Information System), CBDB (China Biographical Database), and mapping tools, Xu Y (2020) visualizes the trajectory, activities, and social networks of Tang Xianzu, a Chinese playwright of the Ming Dynasty. Measuring cultural effects on demographic behaviors and outcomes is difficult because such influences are hard to quantify; Xu H (2020) integrates biomarker data and small area estimation techniques to identify the spatial variation of cultural tolerance. Effectively and efficiently protecting traditional cave-dwelling villages is crucial for cultural heritage conservation. Dang et al. (2020) adopt Cultural Landscape Gene theory to analyze the landscape features of cave-dwelling villages in the Wudinghe River Basin and examine their cultural values. From the perspective of cultural space, Shi (2020) explores the popularity of a Taiwanese ballad music form characterized by hybrid influences from Japan, integrating geographic information systems with qualitative interviews.

Spatial Synthesis in Regional Science: Gu et al. (2020) systematically review recent advances in spatial demography from the angles of differentiation and isolation, birth and death, migration and urbanization, regional population forecasting, population and the environment, and analytical methods and applications. Yang and Li (2020) review previous studies of air and high-speed railway networks at different spatial and temporal scales, based on various configurations of complex weighted networks.
Many residential property owners in the USA have received financial support for energy-efficient products and services through programs such as Property Assessed Clean Energy (PACE); Pan (2020) estimates the economic impacts of such residential energy efficiency programs using a metropolitan input–output model. Zhang et al. (2020) estimate the effects of changes in industrial structure, total-factor energy efficiency, and energy structure on changes in carbon (CO2) emissions at the provincial level in China, using exploratory spatial data analysis and spatial panel econometric models. Gui et al. (2020) conduct a visual analysis of smart cities and big data management in the Yangtze River Delta region, based on 30 years of company registration information. Using both ordinary least squares (OLS) and geographically weighted regression (GWR), Gao and Chen (2020) analyze the driving forces of land urbanization in China at the county level in 2000 and 2015, finding that land urbanization increased by an average of 2.77% annually during this period, with an obvious north–south disparity. Population growth, economic development, industrial structure, city/county characteristics, and geographical location are found to be significant factors shaping the geographical disparities of land urbanization. Qin et al. (2020) use Chinese General Social Survey data to explore the geographical patterns and driving forces of intergenerational education mobility via intergenerational mobility indices and a geographically weighted regression model. Using representative samples of volunteered geographic information crawled from Sina Weibo and Baby Back Home, Yang and Sui (2020) analyze the spatial distributions of child beggars and missing children in China, respectively.

Spatial Synthesis in Urban Science: To illustrate how geospatial data might be influenced by the rapid advance of artificial intelligence, Zhao et al. (2020) examine three geospatial spoofing cases: game-player trajectories generated by bots, tweeted fake location information, and a simulated image of a place produced by a deep learning algorithm. Jiang (2020) offers a complex-network perspective on wholeness to better understand the nature of order, or beauty, for sustainable design; this perspective helps demystify wholeness and enables us to appreciate Alexander’s philosophy of wholeness in its fine and deep structure. Zhou and Peng (2020) develop an analytical framework for behavior research in China, which is fundamental to comparative study as well as to dynamic and predictive research. Gao et al. (2020) draw on travelers’ perceptions of city space, as expressed in blog posts, tweets, pictures, and videos, to examine tourists’ perceived images of the city center, a historical community, and a traditional water town in Shanghai. Shen (2020) investigates patterns of distance-based accessibility from various housing types to surrounding community facilities in four counties in North Carolina, USA. In addition, taking the GPS trajectories of transportation network company vehicles in Shenzhen as a case study, Tu et al. (2020) design a data-driven framework to uncover on-demand shared mobility patterns. Zheng et al. (2020) review applications of eye-movement experiments in humanities, social science, and geospatial cognition, and they conduct two experiments on goal-searching strategies and indoor wayfinding.

4. Conclusion

Academia, decision-makers, and citizens have progressively realized the necessity of modeling, simulating, and analyzing social phenomena based on large-scale computing, in order to transform our understanding of our lives, organizations, and societies at this point in human history (Lazer et al. 2009). Because the chapters in this book do not cover the full scope of computational spatial social science and humanities research, the following research avenues are also noteworthy. First, there is a need to develop a systematic theoretical framework that characterizes such methodological integration in a way that reflects the multifaceted nature of human dynamics and social complexity. Second, data-challenged, economically depressed communities deserve attention as an emerging strand of study, because ignoring the coexistence of data-rich and data-poor environments may lead to biased model results that are meaningless, or even harmful, in policy implementation. Addressing fairness in big data analytics for social science and humanities offers new opportunities for understanding the world and harnessing data science for social good. While we may not be able to mention all relevant studies in this short introductory piece, this edited volume is among the efforts to promote spatial synthesis in human and social dynamics studies, moving towards a new generation of research environments and tools that contribute to a deeper understanding of the geographic world.

Xinyue Ye
Hui Lin

Acknowledgement
The work has been supported by the School of Geography and Environment, Jiangxi Normal University.

References

Batty, M. (2020). Foreword I: Charting computational social science from a spatial perspective. (This volume).
Cioffi-Revilla, C. (2014). Introduction to computational social science: Principles and applications. New York: Springer.
Dang, A., Zhao, D., Chen, Y., & Wang, C. (2020). Conservation of cave-dwelling village using cultural landscape gene theory. (This volume).
Dezzani, R. (2010). Spatially integrated social science. In B. Warf (Ed.), Encyclopedia of geography. doi: 10.4135/9781412939591.n1070
Gao, J., & Chen, J. (2020). Demystifying the inequality in urbanization in China through the lens of land use. (This volume).
Gao, J., Ma, J., Li, J., & Wang, L. (2020). Studies on tourists’ city space images. (This volume).
Goodchild, M. F. (2020). Foreword II: Convergence and synthesis. (This volume).
Goodchild, M. F., Anselin, L., Appelbaum, R. P., & Harthorn, B. H. (2000). Toward spatially integrated social science. International Regional Science Review, 23(2), 139–159.
Gu, H., Lao, X., & Shen, T. (2020). Research progress on spatial demography. (This volume).
Gui, R., Chen, T., & Wu, Z. (2020). Spatial visualization and analysis of the development of high-paid enterprises in the Yangtze River Delta. (This volume).
Gui, Z., Wang, Y., Li, F., Tian, S., Peng, D., & Cui, Z. (2020). High performance spatiotemporal visual analytics technologies and its applications in big socioeconomic data analysis. (This volume).
Hu, D., Cheng, X., Lü, G., Wen, Y., & Chen, M. (2020). The China family tree geographic information system. (This volume).
Jiang, B. (2020). A complex-network perspective on Alexander’s wholeness. (This volume).
Lazer, D., Pentland, A. S., Adamic, L., Aral, S., Barabasi, A. L., Brewer, D., & Jebara, T. (2009). Life in the network: The coming age of computational social science. Science, 323(5915), 721.
Li, M., Ye, X., Zhang, S., Tang, X., & Shen, Z. (2017). A framework of comparative urban trajectory analysis. Environment and Planning B. doi: 10.1177/2399808317710023
Lin, H., Chen, M., & Lu, G. (2013). Virtual geographic environment: A workspace for computer-aided geographic experiments. Annals of the Association of American Geographers, 103(3), 465–482.
Liu, X., Xu, Y., & Ye, X. (2019). Outlook and next steps: Integrating social network and spatial analyses for urban research in the new data environment. In Cities as spatial and social networks (pp. 227–238). Cham: Springer.
Lu, Y., & Zhang, P. (2020). GIS for Chinese history research. (This volume).
Pan, Q. (2020). Economic impact analysis for an energy efficient home improvement program. (This volume).
Pan, W., Su, R., Man, Z., Zhang, L., He, M., & Han, L. (2020). Digital historical Yellow River. (This volume).
Qin, K., Luo, P., Lu, B., & Lin, Z. (2020). Analyzing spatial patterns of intergenerational education mobility in China. (This volume).
Shaw, S., Tsou, M., & Ye, X. (2016). Human dynamics in the mobile and big data era. International Journal of Geographical Information Science, 30(9), 1687–1693.
Shen, G. (2020). Accessibility of residential houses to community facilities. (This volume).
Shi, C. (2020). Digitalized enka-style Taipei. (This volume).
Tu, W., Wei, C., Zhao, T., Li, Q., Zhong, C., & Li, Q. (2020). Uncovering online sharing vehicle mobility patterns from massive GPS trajectories. (This volume).
Wang, Z., & Ye, X. (2017). Social media analytics for natural disaster management. International Journal of Geographical Information Science. doi: 10.1080/13658816.2017.1367003
Xu, H. (2020). Quantifying spatial variation in aggregate cultural tolerance. (This volume).
Xu, Y. (2020). Visualizing classic Chinese literature. (This volume).
Yang, H., & Li, Y. (2020). Complex network theory on high-speed transportation systems. (This volume).
Yang, X., & Sui, D. (2020). Can social media rescue child beggars? (This volume).
Ye, X., & Rey, S. J. (2013). A framework for exploratory space-time analysis of economic data. Annals of Regional Science, 50(1), 315–339.
Ye, X., Lian, Z., She, B., & Kudva, S. (2019). Spatial and big data analytics of E-market transaction in China. GeoJournal, 1–13.
Zhang, F., Hu, M., & Lin, H. (2018). Virtual geographic cognition experiment in big data era. Acta Geodaetica et Cartographica Sinica, 47(8), 1043.
Zhang, J., Li, J., & Wang, X. (2020). Exploring the dynamics of carbon emissions in China via spatial-temporal analysis. (This volume).
Zhao, B., Zhang, S., Xu, C., & Liu, X. (2020). Spoofing in geography: Can we trust artificial intelligence to manage geospatial data? (This volume).
Zheng, S., Chen, Y., & Wang, C. (2020). Application of eye-tracking technology in humanities, social sciences and geospatial cognition. (This volume).
Zhou, S., & Peng, Y. (2020). Spatial-temporal behavior analysis in urban China. (This volume).

Contents

Part I 1

2

Foreword

Foreword I: Charting Computational Social Science from a Spatial Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Michael Batty Foreword II: Convergence and Synthesis . . . . . . . . . . . . . . . . . . . . Michael F. Goodchild

Part II

3 7

Spatial Synthesis in Humanities

3

The China Family Tree Geographic Information System . . . . . . . . Di Hu, Xinghua Cheng, Guonian Lü, Yongning Wen, and Min Chen

13

4

GIS for Chinese History Research . . . . . . . . . . . . . . . . . . . . . . . . . Yifan Lu and Ping Zhang

39

5

Digital Historical Yellow River . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wei Pan, Rao-rao Su, Zhi-min Man, Li-jie Zhang, Mi-mi He, and Li-kun Han

53

6

Visualizing Classic Chinese Literature . . . . . . . . . . . . . . . . . . . . . . Yongming Xu

65

7

Quantifying Spatial Variation in Aggregate Cultural Tolerance . . . Hongwei Xu

77

8

Conservation of Cave-dwelling Village using Cultural Landscape Gene Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anrong Dang, Dongmei Zhao, Yang Chen, and Congwei Wang

9

97

Digitalized Enka-Style Taipei . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 C. S. Stone Shih

xiii

xiv

Part III

Contents

Spatial Synthesis in Regional Science

10 Research Progress on Spatial Demography . . . . . . . . . . . . . . . . . . . 125 Hengyu Gu, Xin Lao, and Tiyan Shen 11 Complex Network Theory on High-Speed Transportation Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Haoran Yang and Yongling Li 12 Economic Impact Analysis for an Energy Efﬁcient Home Improvement Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Qisheng Pan 13 Exploring the Dynamics of Carbon Emission in China via Spatial-Temporal Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 Jin Zhang, Jinkai Li, and Xiaotian Wang 14 Spatial Visualization and Analysis of the Development of High-Paid Enterprises in the Yangtze River Delta . . . . . . . . . . . 199 RenZhou Gui, Tongjie Chen, and Zhiqiang Wu 15 High Performance Spatiotemporal Visual Analytics Technologies and Its Applications in Big Socioeconomic Data Analysis . . . . . . . . 221 Zhipeng Gui, Yuan Wang, Fa Li, Siyu Tian, Dehua Peng, and Zousen Cui 16 Demystifying the Inequality in Urbanization in China Through the Lens of Land Use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 Jinlong Gao and Jianglong Chen 17 Analyzing Spatial Patterns of Intergenerational Education Mobility in China . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 Kun Qin, Ping Luo, Binbin Lu, and Zeng Lin 18 Can Social Media Rescue Child Beggars? . . . . . . . . . . . . . . . . . . . . 303 Xining Yang and Daniel Z. Sui Part IV

Spatial Synthesis in Urban Science

19 Spoofing in Geography: Can We Trust Artificial Intelligence to Manage Geospatial Data? . . . 325
Bo Zhao, Shaozeng Zhang, Chunxu Xu, and Xiaobai Liu

20 A Complex-Network Perspective on Alexander’s Wholeness . . . 339
Bin Jiang

21 Spatial-Temporal Behavior Analysis in Urban China . . . 355
Suhong Zhou and Yinong Peng

22 Studies on Tourists’ City Space Images . . . 377
Jun Gao, Jianyu Ma, Jie Li, and Liangxu Wang


23 Accessibility of Residential Houses to Community Facilities . . . 399
Guoqiang Shen

24 Uncovering Online Sharing Vehicle Mobility Patterns from Massive GPS Trajectories . . . 413
Wei Tu, Cui Wei, Tianhong Zhao, Qiuping Li, Chen Zhong, and Qingquan Li

25 Application of Eye-Tracking Technology in Humanities, Social Sciences and Geospatial Cognition . . . 431
Shulei Zheng, Yufen Chen, and Chengshun Wang

Part V

Afterword

26 Prospects of Spatial Synthesis in Computational Social Science and Humanities: Towards a Spatial Synthetics and Synthetic Geography . . . 451
Daniel Z. Sui

Part I

Foreword

Chapter 1

Foreword I: Charting Computational Social Science from a Spatial Perspective

Michael Batty

Although digital computers emerged in the first half of the 20th century, the idea of computation had been deeply embedded in science and philosophy from the Enlightenment, certainly from the Renaissance on, and indeed as far back as the Greeks. When computers were invented, however, computation took on a new meaning, in that everything which computers were able to do first depended upon reducing a problem to its digital fundamentals and then combining and recombining its elements using logics, arithmetic, and algebras. This is the contemporary notion of ‘computation’, in contrast to the term ‘computer’, which is reserved for the hardware on which such computation takes place. In fact computation has come to dominate the myriad of applications that define the scope that computers can address, and slowly but surely over the last 80 years, the term ‘computational’ has been appended to many areas as computers increasingly penetrated social and economic life, well beyond their original applications in science. In the late 20th century, the term began to be applied to various of the social sciences. For example, 25 years ago, it was used by Hummon and Fararo (1995) in their paper on computational sociology. In the late 1980s, David Mark and his colleagues at the National Center for Geographic Information and Analysis used the term in many conversations, and in 1994 the Centre for Computational Geography was set up by Stan Openshaw at the University of Leeds (http://www.ccg.leeds.ac.uk/). This led directly to the term Geocomputation, which is still widely used to this day and whose history I recalled in an editorial to mark the 21st anniversary of the first conference (Batty, 2017). But back to social science.
M. Batty (✉)
CASA, University College London, London, UK
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020. X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_1

The publication of the path-breaking book by Epstein and Axtell (1996), Growing Artificial Societies: Social Science from the Bottom Up, was a wonderful demonstration of how computation could be employed to simulate many features of contemporary communities and their histories, showing all the features of complexity science (segregation, agglomeration, income distribution, spatial clustering) and reflecting new methods and ideas ranging from emergence, fractals, positive feedback, power laws, historical accident, path dependence and so on. It introduced agent-based modelling in contrast to much of the aggregative modelling in the social sciences that had preceded it. Thus computational social science came to define the use of computers to enable simulations to be extended to quite large systems, but more specifically to methods that enabled many different kinds of logics other than classical algebras to be applied to diverse problems in social and economic domains. A decade after the millennium, the field had matured to the point where representation as well as simulation set the boundaries on its scope, reflected in Claudio Cioffi-Revilla’s (2010) review of the field, where he states that “the main computational social science (CSS) areas are automated information extraction systems, social network analysis, social geographic information systems (GIS), complexity modelling, and social simulation models.” What marks this definition is that CSS is methodologically self-conscious: although it deals with the social and economic domains, extending as far as the behavioural sciences of individual economic and social decision-making, it does not presume to extend our knowledge of these substantive systems. CSS is not focussed on developing new social science theory per se, although it may be based on demonstrating how we can develop new methods for validating theory, and in this process there is often some focus on articulating ways of measurement and simulation which may eventually lead to new and better theory. For the last 10 years, however, there has been a sea change with respect to CSS.
The continued miniaturisation of computers to the point where we now use them personally in real time to organise our lives has led to many new ways of sensing and delivering data about our social behaviours. This is data that is captured, delivered, and often analysed and acted upon in near real time. It is data that is ‘big’ in the sense that individual behaviours are being captured 24/7, and it is voluminous in size. Because it is largely unstructured, it often requires special and very different multivariate techniques and methods even to represent, store and access it. Unlike official Census data, it is not made to measure, and it often requires the powerful tools of what is now called data science to explore whether significant patterns exist within it. In this sense, the focus within computational social science shifts towards much more inductive methods, methods that seek to extract patterns which ultimately build up to new hypotheses, rather than simulations which seek to test these hypotheses. Of course, the scientific methods in CSS are no different from any other positive philosophies, which always depend on a fusion of inductive and deductive perspectives.

In this book, Hui Lin and Xinyue Ye have put together an interesting collection of papers that deal with a very wide range of computational approaches not only to the social sciences but also to the humanities. What distinguishes this set of papers, other than its extent, is the fact that the editors’ expertise in geospatial analysis is brought to bear on the various papers. Computational geography, as we alluded to above, is an additional theme that runs throughout the collection, and this serves to ground the various chapters in quite well-developed GIS technologies. In fact, as Cioffi-Revilla (2014) notes, social GIS (geographic information systems/science) is key to his more catholic definition of CSS, and this is certainly the stance taken by the editors. The book is divided into three parts, all dealing with communicating a synthetic knowledge of computation in the humanities, regional science, and urban science, in that order. The first part deals with the humanities, covering the structure of geographic information using ideas about hierarchy, the use of GIS in Chinese historical research, the visualisation of Chinese literature, cultural landscapes, conservation, and archaeological perspectives. The second and third parts deal with changes in scale, to some extent from national concerns to the regional and then the urban. Spatial demography, network theory as in transportation, economic impact analyses, carbon emissions, the locations of firms, analysis of social and economic structure through visualisation, inequalities, educational mobility and poverty are all key dimensions in the papers developed in the second part. There is a stronger quantitative dimension to the papers here, where spatiotemporal modelling and visualisation are widely developed. The book then changes tack to deal with cities at the urban scale. Illusions and twists in geographic analysis introduce this focus, and then the tenor changes to complex networks, spatiotemporal behaviour, imageability in cities, community facilities, mobility, and tracking. All of these papers are written using spatial tools which emphasise visualisation of complex data sets. In fact, most of the data introduced in this set of strongly empirical papers do not really fall into the class of big data per se. But the tools of simulation and visualisation in computational social science are well developed here, and potential readers will be able to gain a real sense of how geospatial analysis can be used in CSS to great advantage.
Many of the examples relate to different spatial scales in Chinese cities and regions and this provides a fascinating explanation of how wide such science is and how it is being developed for important advances in our understanding of explanation and prediction in social systems from a spatial perspective.

References

Batty, M. (2017). Geocomputation. Environment and Planning B: Urban Analytics and City Science, 44, 595–597.
Cioffi-Revilla, C. (2010). Computational social science. Wiley Interdisciplinary Reviews: Computational Statistics, 2(3), 259–271.
Cioffi-Revilla, C. (2014). Introduction to computational social science: Principles and applications. New York: Springer.
Epstein, J. M., & Axtell, R. L. (1996). Growing artificial societies: Social science from the bottom up. Cambridge, MA: The MIT Press.
Hummon, N. P., & Fararo, T. J. (1995). The emergence of computational sociology. Journal of Mathematical Sociology, 20(2–3), 79–87.

Chapter 2

Foreword II: Convergence and Synthesis

Michael F. Goodchild

In their book Convergence of Knowledge, Technology, and Society, Roco et al. (2013; see also NRC 2014) argued that the history of science has been one of swings between divergence and convergence. In the divergence phase specialization flourishes, with limited interaction between specialties, while in the convergence phase the barriers between the specialties begin to weaken, and science advances through the sharing of expertise and interest between specialties. The US National Science Foundation has recognized the importance of convergence in today’s scientific enterprise, defining it as “integrating knowledge, methods, and expertise from different disciplines and forming novel frameworks to catalyze scientific discovery and innovation” (https://www.nsf.gov/od/oia/convergence/index.jsp). There are several reasons for believing in the importance of convergence and synthesis at this point in the history of science. First, the problems faced by humanity are arguably more challenging than they have ever been, as the planet becomes more crowded and its ability to sustain life is under increasing threat. Second, science today is by nature collaborative, requiring specialists in statistics, computing, and other cross-cutting disciplines in addition to the expertise in particular domain sciences that is required by the problem at hand; the days when a lone investigator working in a single discipline could isolate and study a problem and derive significant knowledge from it are probably gone. Third, many of the practices of academia are centripetal, drawing a scientist into a real or imagined core of his or her discipline; it follows that positive effort is required to encourage and reward broader perspectives. Yet anyone who has followed the history of geographic information systems (GIS) since their inception in the mid-1960s will be familiar with an earlier version of the convergence argument.
M. F. Goodchild (✉)
University of California, Santa Barbara, CA, USA
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020. X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_2

In building his school of landscape architecture at the University of Pennsylvania in the 1950s and 1960s, Ian McHarg argued that expertise in a number of disciplines (ecology, hydrology, climatology, geology, soil science) was essential to good landscape architecture. Thus in staffing the school he made sure to hire experts in each of these areas, and to insist that each of them be dedicated to synthesis, to making the school more than the sum of its disciplinary parts (McHarg 1969, 1996). In developing a plan, each discipline’s contribution could be visualized as a layer of knowledge, one of the stack of layers that now graces the front cover of many GIS textbooks and the left margin of many GIS software products. In short, GIS has always claimed a role in integration and in supporting interactions across the boundaries of disciplines. In essence, the particular form of integration that is so central to GIS practice is what we might term spatial integration. Driving a spike through all of the layers will intersect the same location on each layer, and have the effect of integrating each discipline’s data for the point in space that is represented by the spike. The ecologist’s information about that point can now be coupled with information from the geologist, the hydrologist, the climatologist, and the soil scientist. This argument raises space to a central role in integration or convergence. In the late 1990s I proposed that the National Science Foundation fund a Center for Spatially Integrated Social Science, an investment in the infrastructure of the social sciences that would explore and demonstrate the value of location in enabling conversations and synthesis between the social sciences. The center was established at UCSB in 1999, and for five years it organized a series of programs: workshops, software development, learning resources, examples of best practice, improving access to tools, and search for data based on geographic location (csiss.org; Goodchild and Janelle 2004; Goodchild et al. 2000). The work of the center continues today in UCSB’s Center for Spatial Studies (spatial.ucsb.edu).
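The spike can be sketched as a point query over a stack of raster layers. This minimal Python illustration assumes that every layer sits on one shared grid (same origin, cell size and extent), so that a single (row, col) index addresses the same patch of ground on every layer; the layer names, grid parameters and values are invented for illustration.

```python
from typing import Dict, List, Tuple

# A layer is a grid of values; all layers are assumed to share one grid,
# so the same (row, col) refers to the same patch of ground on each layer.
Layer = List[List[float]]


def world_to_cell(x: float, y: float, x0: float, y0: float,
                  cell: float) -> Tuple[int, int]:
    """Map coordinates to (row, col), with (x0, y0) the upper-left corner
    of the grid and rows counted downward (southward)."""
    col = int((x - x0) // cell)
    row = int((y0 - y) // cell)
    return row, col


def spike(layers: Dict[str, Layer], row: int, col: int) -> Dict[str, float]:
    """Collect every layer's value at one cell: the 'spike' through the stack."""
    first = next(iter(layers.values()))
    if not (0 <= row < len(first) and 0 <= col < len(first[0])):
        raise ValueError("location falls outside the shared grid")
    return {name: grid[row][col] for name, grid in layers.items()}


# Hypothetical 2 x 2 layers contributed by different disciplines.
layers = {
    "soil_ph": [[6.1, 6.4], [5.8, 7.0]],
    "rainfall_mm": [[820.0, 790.0], [910.0, 760.0]],
    "slope_deg": [[3.0, 12.5], [1.2, 8.4]],
}

row, col = world_to_cell(x=150.0, y=50.0, x0=0.0, y0=200.0, cell=100.0)
print(row, col)                 # 1 1
print(spike(layers, row, col))  # {'soil_ph': 7.0, 'rainfall_mm': 760.0, 'slope_deg': 8.4}
```

Because every layer is indexed identically, integrating the disciplines at a point reduces to reading the same index from each grid; with vector layers the equivalent step would require a point-in-polygon test against each layer.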
It is easy to see how this argument for the role of geographic space in integration can be extended to time, and to any discipline that deals with phenomena distributed in space and time. A compelling argument can even be made that space and time are unique in this respect: the processes studied largely independently in the domain sciences need be integrated only when it is necessary to study their joint impacts on a location at a specific time. Early progress on the development of GIS was slow, due at least in part to the heavy demands that it placed on very limited computing resources. Combining vector data required that intersections be computed between the lines and areas depicted on each layer, and it was not until the late 1970s that algorithms, methods of indexing, and computing resources had advanced to the point where this was feasible and reliable (Esri’s PIOS and Harvard’s ODYSSEY both emerged at about this time). Until then, a common work-around was to represent each layer as a raster, using a common, coregistered raster for every layer, despite the old adage that “raster is faster but vector is correcter”. Several systems emerged in the early 1970s to implement what was in practice a very simple raster overlay operation. Today vector overlay algorithms are fast and reliable, and many of the early raster-overlay systems have disappeared or been absorbed by the industry leaders. But another twist to this argument has emerged in recent years. Raster systems were always seen as single-scale, working at a fixed spatial resolution, yet today we have access to a great variety of raster data at a wide range of resolutions, and interesting advances have been made recently in multi-scale analysis, combining layers at different resolutions. The technology of discrete global grid systems (DGGS; Sahr et al. 2003) allows the planet’s surface to be divided into tiles that are approximately equal in size and shape, at a hierarchy of levels of resolution, with each level nesting within the level above. DGGS are superbly elegant ways of integrating multi-scale data.

This new book on spatial synthesis is one more proof of the value of this approach. Each chapter takes one area of the social sciences and humanities and shows how a spatial approach can result in significant advances in knowledge. It should be of great value to anyone interested in pursuing this approach, in developing new tools to support it, or in developing courses that can empower students. Congratulations to the organizers and editors; I look forward very much to seeing it in print.

References

Goodchild, M. F., & Janelle, D. G. (2004). Spatially integrated social science. New York: Oxford University Press.
Goodchild, M. F., Anselin, L., Appelbaum, R. P., & Harthorn, B. H. (2000). Toward spatially integrated social science. International Regional Science Review, 23(2), 139–159.
McHarg, I. (1969). Design with nature. Garden City, NY: Natural History Press.
McHarg, I. (1996). A quest for life. New York: Wiley.
NRC (National Research Council). (2014). Convergence: Facilitating transdisciplinary integration of life sciences, physical sciences, engineering, and beyond. Washington, DC: The National Academies Press.
Roco, M. C., Bainbridge, W. S., Tonn, B., & Whitesides, G. (Eds.). (2013). Convergence of knowledge, technology and society: Beyond convergence of nano-bio-info-cognitive technologies. New York: Springer.
Sahr, K., White, D., & Kimerling, A. J. (2003). Geodesic discrete global grid systems. Cartography and Geographic Information Science, 30(2), 121–134.

Part II

Spatial Synthesis in Humanities

Chapter 3

The China Family Tree Geographic Information System

Di Hu, Xinghua Cheng, Guonian Lü, Yongning Wen, and Min Chen

D. Hu · X. Cheng · G. Lü (✉) · Y. Wen · M. Chen
School of Geography Science, Nanjing Normal University, Nanjing 210023, China
D. Hu · G. Lü · Y. Wen · M. Chen
Key Laboratory of Virtual Geographic Environment, Ministry of Education, Nanjing Normal University, Nanjing 210023, China
Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
X. Cheng
Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020. X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_3

3.1 Family Tree and GIS

3.1.1 Family Tree

A family tree (also called a genealogy) is important historical material. Together with official histories and chorographies, family trees constitute China’s national history. A family tree systematically documents a clan descended from a common ancestor, and it contains a large amount of historical information about individuals, families, clans, society, ethnology, customs, economy, peoples, geography, population and culture (Ge 1996; Wang 2006). The value of family trees lies mainly in four aspects: as cultural relics, as literature, for education and for root-seeking (Ge 1996; Wang 2006). First, a family tree is a cultural relic; some family trees have existed for more than 1,000 years, and some have been edited or annotated by celebrities. Second, the family tree is an important form of literature that provides ample data for many research fields, including studies of the family, history, humankind and surnames. It is useful material for researchers who wish to investigate feudal thought and the family system of China. Furthermore, national historical events, the lives of celebrities, and the histories of minorities and families are recorded in family trees at different levels of detail, which makes them important reference materials for historical research. Third, family trees have educational value. Family tree records generally include parental instructions, clan regulations and family rules, which reflect traditional Chinese virtues. Investigating a family tree enables a researcher to trace a family’s heritage and development and to reconstruct a glorious history that can greatly inspire the family’s descendants. Fourth, family trees have root-seeking value. An increasing number of overseas Chinese are interested in identifying their ancestors, and family trees provide them with important evidence of family history. Because the origin and lineage of a family are central to its family tree, the family tree is essential material for such genealogical research.

Chinese family trees also have several advantages as sources. First, they have a long history, originating in the pre-Qin period and continuing to the present; individual family trees have been maintained over several centuries, and some for nearly 1,000 years. Second, numerous family tree records exist in China, providing researchers with abundant source material, and some have been well preserved, which enables researchers to extract useful information from them. Third, with growing interest in tracing roots and ancestors, family trees are continually consulted by both the public and research institutes. It is therefore important to investigate and mine the information contained in Chinese family trees.

3.1.2 GIS and Family Tree Research

A geographical information system (GIS) is a computer system that stores, manages, analyzes, expresses and displays geographic information about geolocation-related phenomena (Goodchild 2009). In the decades since its initial development, GIS has been applied in many fields, including environmental protection (Goodchild 1993), hydrologic modeling (Devantier and Feldman 1993), land use analysis, agriculture, public health (Nykiforuk and Flaman 2011; Higgs 2004), and transportation and urban planning (Harris and Elmes 1993). Current hot topics in GIS research include three-dimensional GIS, service-oriented GIS, the digital globe (Goodchild 2018) and smart cities (Roche 2014; Degbelo et al. 2016). Over recent decades, GIS research has focused on the organization, management and spatial analysis of geographic information about natural phenomena, and few studies have addressed geographic information in the humanities and social sciences. In recent years, however, applying GIS to problems in the humanities and social sciences has become increasingly popular, and GIS is now widely used in fields such as history, linguistics, criminology and economics. The concept of spatially integrated humanities and social sciences has been proposed (Harris 2009; Rumsey 2009; Goodchild and Janelle 2010). Many research institutions related to GIS and the humanities and social sciences have been established, and corresponding conferences have been held. Moreover, many databases and information systems have been built, such as Chinese Civilization in Time and Space (CCTS) (Liao and Fan 2012; Academia Sinica 2002), the China Historical Geographic Information System (CHGIS) (Harvard CGA 2001), the Historical GIS Database of Cotton Textile Industry on the Songjiang Region in Late Ming China (Billy 2011) and the Spatial History Project at Stanford University (White 2010).

The family tree can be considered a new source of geographic information because of its typical spatial-temporal characteristics. Family trees contain extensive information about individual, family and clan activities that occur in specific spatial-temporal contexts. The spatial information includes an individual’s places of birth and death, the location of a grave and the site of an event; the temporal information includes the times of births, deaths, migrations and other events. GIS provides a variety of functions, including spatial-temporal information modeling, analysis, expression and display, which are highly useful for family tree research. With GIS, family tree information can be stored, managed, analyzed, expressed and displayed from spatial and temporal perspectives.
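To make the spatial-temporal character of family tree records concrete, the following minimal Python sketch treats each recorded life event as a (time, place) pair and derives a residence trajectory for one individual. The `Event` schema, the event kinds, the member identifier and the place names are all hypothetical, and the sketch assumes that years have already been converted to the Christian era, which for family trees dated in the traditional calendar is itself a non-trivial step.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Event:
    """One dated, located event recorded in a family tree (hypothetical schema)."""
    person: str
    kind: str              # e.g. "birth", "migration", "death", "burial"
    year: Optional[int]    # Christian-era year; None if the record gives no date
    place: str             # place name as written in the family tree


def trajectory(events: List[Event], person: str) -> List[Tuple[int, str]]:
    """Time-ordered places for one person, collapsing consecutive repeats.

    Undated events are skipped because they cannot be placed in sequence.
    """
    dated = sorted((e for e in events if e.person == person and e.year is not None),
                   key=lambda e: e.year)
    path: List[Tuple[int, str]] = []
    for e in dated:
        if not path or path[-1][1] != e.place:
            path.append((e.year, e.place))
    return path


# Invented records for one individual; years are assumed already converted.
events = [
    Event("Member-1", "death", 1790, "Place B"),
    Event("Member-1", "birth", 1721, "Place A"),
    Event("Member-1", "migration", 1750, "Place B"),
    Event("Member-1", "burial", None, "Place C"),
    Event("Member-1", "marriage", 1745, "Place A"),
]

print(trajectory(events, "Member-1"))  # [(1721, 'Place A'), (1750, 'Place B')]
```

Run over every member of a clan, the same ordering yields the migration paths that a GIS can then map and analyze.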

3.1.3 Concept and Objectives of Family Tree GIS

The concept of the Family Tree Geographical Information System (FTGIS) was proposed by Lü et al. (2009). The FTGIS emphasizes obtaining and mining family tree information about the spatial-temporal distribution and migration of clans and then expressing the spatial-temporal clan pedigree visually. The FTGIS is dedicated to digitally storing, analyzing and expressing the spatial-temporal information in family trees. Furthermore, the FTGIS aims to construct a visible spatial-temporal network of family trees and to express spatial and genealogical relationships clearly and understandably. For Chinese family tree sources, the ultimate goal of the FTGIS is to mine the driving mechanisms of family inheritance and development and thereby reproduce the history of Chinese civilization. Specifically, the main objectives of the FTGIS are as follows:

(1) To digitize the full texts of family trees. Most family trees are stored in libraries and private homes as printed text, which makes their information difficult to analyze and share. Existing family tree information systems mainly support bibliographic search rather than full-text search. The primary objective of the FTGIS is therefore to digitize the full text of family trees, build family tree databases and establish a foundation for constructing the FTGIS platform.
(2) To make the temporal information contained in family trees comparable. Temporal information is mainly expressed in the Chinese traditional calendar, supplemented by the Christian era. These two ways of indicating time use different reference points. Hence, the FTGIS is dedicated to ensuring that temporal information can be located by using a time conversion engine.
(3) To make the spatial information contained in family trees locatable. Place names are the main spatial information in family trees. These place names are not associated with longitude and latitude coordinates, which makes them difficult to locate. In addition, place names, locations and regions often change over time. The FTGIS can map ancient place names to specific spatial locations or regions using an encoding technology based on ancient and modern place names.
(4) To build a platform for family tree information collection and sharing. In China, many family trees have never been published, which makes collecting and sharing their information difficult. The FTGIS platform aims to provide an efficient and convenient way to collect and share family tree information.
(5) To express family tree information in a dynamic and visual way. Family tree information is mostly recorded as plain text, which makes it hard to convey intuitively, especially its spatial-temporal aspects. Expressing it dynamically and visually not only helps the public understand family trees but also helps researchers analyze them. The FTGIS aims to provide various ways of displaying family tree information directly and vividly.
(6) To promote family tree information analysis with the aid of GIS. GIS offers a variety of functions, including spatial analysis, spatial positioning and multidimensional visualization, of which spatial analysis is the core. Expressing and analyzing family tree information with GIS is therefore both possible and convenient. Through the FTGIS, information on individuals, families and clans can be mined, along with the path of development and inheritance of any clan.

3.2 A Unified Spatial-Temporal Framework for Family Trees

3.2.1 Why Is a Unified Spatial-Temporal Framework Needed?

A unified spatial-temporal framework is essential for constructing the FTGIS. A large amount of spatial-temporal information is embedded in the preface, personal biographies and other content of a family tree. This information is central to studying clan lineages, migrations and the spatial distribution of families and to building the spatial-temporal pedigrees of families and clans. However, it is recorded against different spatial-temporal references. Constructing a unified spatial-temporal framework is therefore essential for mapping spatial-temporal information from different family trees for further analysis, so that GIS technologies can be used to conduct spatial analyses of family trees across time periods and geographic regions.

Time is one basic type of information contained in family trees. It is generally expressed in two forms, the Chinese traditional calendar and the Gregorian calendar, which differ in their datum. Within the Chinese traditional calendar, years are recorded in two ways: by the era name and year number of a reign, and by the sexagenary cycle of heavenly stems and earthly branches. Ancient Chinese family trees mainly employ the traditional calendar, whereas modern family trees mostly use the Gregorian calendar. However, dates in the traditional calendar usually omit the dynasty name or the era name, so the time must be estimated by interpreting the context. A computer cannot directly compare these different types of temporal information, so questions such as clan lineages, population ages and life regulations cannot be definitively resolved. Constructing a unified temporal datum and unifying the ways of expressing time are therefore essential.

Spatial information in family trees is mainly expressed by place names and simple maps that lack accurate descriptions of specific locations and regions. Ancient place names are mostly lower-level place names that omit higher-level administrative divisions, which makes them difficult for modern readers to locate. In addition, place names change frequently over time: one place may have different names in different periods, and different places may share the same name. Locating place names correctly is therefore important for further study of family trees.
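The ambiguity of stem-and-branch dates is easy to see in code. The sexagenary cycle repeats every 60 years, so a year recorded only by its stem and branch maps to many Christian-era years. This Python sketch relies on the standard correspondence that 4 CE was a jiazi (甲子) year; it maps whole years only and ignores the fact that a traditional year begins near the lunisolar New Year rather than on 1 January, so dates close to the year boundary would need finer treatment.

```python
from typing import List

# The ten heavenly stems and twelve earthly branches, in cycle order.
STEMS = "甲乙丙丁戊己庚辛壬癸"
BRANCHES = "子丑寅卯辰巳午未申酉戌亥"


def ganzhi(year: int) -> str:
    """Stem-branch name of a Christian-era year, anchored on 4 CE (甲子).

    Whole-year mapping only: a traditional year actually begins near the
    lunisolar New Year, not on 1 January.
    """
    if year < 1:
        raise ValueError("this sketch handles CE years only")
    return STEMS[(year - 4) % 10] + BRANCHES[(year - 4) % 12]


def candidate_years(name: str, start: int, end: int) -> List[int]:
    """All CE years in [start, end] whose stem-branch name equals `name`."""
    return [y for y in range(start, end + 1) if ganzhi(y) == name]


print(ganzhi(1911))                        # 辛亥
print(candidate_years("甲子", 1700, 1900))  # [1744, 1804, 1864]
```

Even within two centuries, an undated 甲子 year leaves three candidates; choosing among them requires the contextual interpretation described above.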

3.2.2 How Can a Unified Spatial-Temporal Framework Be Constructed?

Positioning time and place names correctly is the core of building a unified spatial-temporal framework. Models for time and for place names should therefore be built and then combined into a unified spatial-temporal framework. First, to address the time expression issues mentioned above, time must be located on a unified temporal datum. More importantly, the time model should cover the entire course of the spatial-temporal evolution of Chinese civilization. Second, the types of place names must be chosen appropriately. Both ancient and modern place names are considered, because each type exists at specific time points or periods. Positioning place names requires building relationships between ancient and modern place names and orienting them spatially on the basis of administrative divisions; the current locations of ancient place names can then be identified from these relationships. Moreover, place names can be abstracted as geographic regions or entities with specific spatial locations, shapes and extents. Because many place names in family trees lack longitude and latitude coordinates, a specific spatial-temporal symbol for expressing place names is also needed.
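The relationship between ancient and modern place names described above can be prototyped as a gazetteer in which each name carries a period of validity and a higher-level administrative division. This Python sketch is an assumption about how such a lookup might work, not the encoding engine used by the FTGIS; all names, divisions, years and coordinates are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NameRecord:
    """One gazetteer entry linking a historical place name to a modern place."""
    name: str                    # place name as it appears in family trees
    parent: str                  # higher-level administrative division
    valid_from: int              # first CE year the name was in use (inclusive)
    valid_to: int                # last CE year the name was in use (inclusive)
    modern_place: str            # identifier of the corresponding modern place
    lonlat: Tuple[float, float]  # representative point (longitude, latitude)


def resolve(gazetteer: List[NameRecord], name: str, year: int,
            parent: Optional[str] = None) -> List[NameRecord]:
    """Candidate locations for `name` as used in `year`.

    An empty result means the name is unknown for that date; more than one
    result means the name is ambiguous and needs the parent division.
    """
    return [r for r in gazetteer
            if r.name == name
            and r.valid_from <= year <= r.valid_to
            and (parent is None or r.parent == parent)]


# Invented entries: one place renamed over time, and one name shared by
# two different places.
gazetteer = [
    NameRecord("Name-A", "Division-1", 1368, 1644, "Modern-P", (118.8, 32.0)),
    NameRecord("Name-B", "Division-1", 1645, 1911, "Modern-P", (118.8, 32.0)),
    NameRecord("Name-A", "Division-2", 1368, 1911, "Modern-Q", (113.3, 23.1)),
]

print([r.modern_place for r in resolve(gazetteer, "Name-A", 1600)])  # ['Modern-P', 'Modern-Q']
print([r.modern_place for r in resolve(gazetteer, "Name-A", 1600, "Division-1")])  # ['Modern-P']
print(resolve(gazetteer, "Name-A", 1700, "Division-1"))  # []
```

Keying each record by both time and parent division addresses the two problems noted in the text: one place carrying several names over time, and one name denoting several places.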


D. Hu et al.

A unified spatial-temporal framework for family trees is proposed in this study. The framework uses the Christian era year, Julian date and time as the temporal datum, and Chinese historical administrative divisions together with ancient and modern place names as the spatial datum. Using spatial-temporal database technology, a time conversion engine converts Chinese traditional time to the Christian era year and Julian date and time, so that temporal information can be positioned. An ancient and modern place name encoding engine maps ancient place names to a specific location or extent, so that spatial information can also be positioned. Family tree information can then be expressed in the unified spatial-temporal framework, and researchers can apply spatial-temporal data analysis and mining methods to it.
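The kind of conversion performed by the time conversion engine can be illustrated at year level. This sketch assumes a small lookup table giving the first Gregorian year of each reign title (Kangxi began in 1662 and Qianlong in 1736) and uses the standard integer formula for the Julian Day Number of a Gregorian date. It is a simplification: because a traditional year straddles two Gregorian years, a real engine must also map lunar months and days through tables such as the YearRef and DateRef entities. The table and function names are hypothetical.

```python
# First Gregorian year of each reign title (small illustrative table).
REIGN_START = {"Kangxi": 1662, "Qianlong": 1736}

def reign_to_gregorian_year(reign: str, n: int) -> int:
    """Year n of a reign (n = 1 is the first year) as an approximate CE year."""
    if n < 1:
        raise ValueError("reign years are counted from 1")
    return REIGN_START[reign] + n - 1

def julian_day_number(year: int, month: int, day: int) -> int:
    """Julian Day Number of a (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)
```

Once both ends of an interval are expressed as day numbers, comparing dates or computing spans such as an age at death reduces to integer subtraction.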

3.3 FTGIS Data Model

3.3.1 Content and Information of Family Trees

Family trees are rich in content. Generally, a family tree contains a cover, preface, commentary, legend, catalog, compiler, details of origin, lineage chart, Zibei, honor record, biographies, clan rules and domestic discipline, and information about ancestral temples, tombs, clan property, contracts, writings and serial numbers (Ge 1996; Wang 2006). Zibei is a character used in a given name to indicate a person's generation within the clan. Modern family trees usually contain attached demographic charts, time comparison tables, and ancient and modern place name references, and some even contain audio and video materials. There is no standard for the content of family trees: some contain more information and some less, some are brief and some detailed. Family tree information can be divided into three parts: basic information, core information and other information.

Time and place are the basic information of a family tree and appear in it frequently. The birth and death information of family members, together with their activities at specific times and places, constitutes the main content of a family tree. From a family tree, we can learn when and where the ancestor of a branch clan migrated; the network of blood relationships, which implies the order of birth; when and where individuals were born and died; where their graves are located; and when and where they lived, studied and worked. This extensive time and place information expresses the inheritance relationships of a family across generations from the temporal perspective and conveys the distribution and migration of a family from the spatial perspective.

The core information of a family tree is clan information, individual information and the relationships among them. Clan information includes compiling information, the clan branch and migration. Migration information records a clan's migrations and its main migrants in detail, such as the time when a migration occurred and the origin and destination of the clan or migrants. Compiling information includes the preface, commentary, autobiographies, honor records, biographies, stylistic rules, clan rules and collection records. Individual information includes name, Zi, Hao, nation, generation, rank, birth time and place, experiences, death time and place and grave location. Zi (courtesy name) and Hao (art name) are respectful forms of address used in ancient Chinese society. Experiences include when and where the individual studied, lived and worked. Relationships include family and clan relationships; as illustrated in Fig. 3.1, they include father, mother, spouse, stepfather, stepmother, adoptive father and mother, and main member or clan member.

Other family tree information, also called bibliographic information, includes the genealogy place, genealogy name, compiler, compilation mode, version, carrier form, binding form, annotation, abstract and collection unit.

3 The China Family Tree Geographic Information System

Fig. 3.1 Relationships in a family tree
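The typed relationships of Fig. 3.1 can be read as a small graph, and tracing a lineage then becomes a walk along one relationship type. The sketch below stores relations as (person, relation, relative) triples and follows a chosen relation upward; the relation labels ("father", "mother", "adoptive_father") and person ids are hypothetical, not FTGIS identifiers.

```python
from collections import defaultdict

def build_index(relations):
    """Group (person, relation, relative) triples by person and relation type."""
    index = defaultdict(lambda: defaultdict(list))
    for person, rel, relative in relations:
        index[person][rel].append(relative)
    return index

def ancestors(index, person, rel="father", max_generations=100):
    """Follow one relation type upward; return ancestors, nearest first."""
    line = []
    current = person
    for _ in range(max_generations):  # bound the walk in case of cyclic data
        parents = index[current][rel]
        if not parents:
            break
        current = parents[0]  # assumes one recorded parent of this type
        line.append(current)
    return line
```

Keeping the relation type as a parameter lets the same walk follow a biological line or an adoptive line, which matters because family trees record both.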

3.3.2 Overview of the Models

Based on the above analysis, this study proposes the FTGIS data model, which is composed of five data models. Figure 3.2 shows the components of the FTGIS data model and their relationships. These five data models fall into two categories: the spatial-temporal framework data model and the family tree spatial-temporal data model. The time data model and the place name data model constitute the data model for the unified spatial-temporal framework. The family tree spatial-temporal data model includes the family tree bibliographic model, the family tree item content model and the family tree lineage record model.

Figure 3.3 shows the time data model, which is divided into two parts. One part contains the entities HistoricalStage, Dynasty, DynastyStage, Emperor and EmperorReignTitle. The other part is the time reference, comprising the YearRef and DateRef entities. The HistoricalStage entity contains the id, name, start and end date attributes. The Dynasty entity contains the id, the id of the HistoricalStage entity, name, start and end date attributes. The DynastyStage entity contains the id, the title of the emperor's reign, name, creator, start and end date attributes. The Emperor entity contains the id, the id of the DynastyStage entity, name, historical name, temple name, posthumous title, start and end date attributes. The EmperorReignTitle entity contains the id, the id of the Emperor entity, name, start and end date attributes.

Fig. 3.2 Components of the FTGIS data model

Fig. 3.3 Time data model
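The reverse direction, expressing a Gregorian year in traditional terms, can be sketched over a reduced version of the time data model. This sketch keeps only a dynasty and its reign titles with Gregorian start and end years, omitting the HistoricalStage, DynastyStage and Emperor layers and the exact start and end dates; the reign periods shown use the conventional year ranges (Kangxi 1662-1722, Yongzheng 1723-1735, Qianlong 1736-1795). Class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ReignTitle:
    name: str
    start_year: int  # first Gregorian year of the title (approximate)
    end_year: int    # last Gregorian year of the title (approximate)

@dataclass
class Dynasty:
    name: str
    reign_titles: list[ReignTitle] = field(default_factory=list)

def to_traditional(year: int, dynasties: list[Dynasty]):
    """Express a Gregorian year as (dynasty, reign title, nth year) tuples."""
    matches = []
    for d in dynasties:
        for r in d.reign_titles:
            if r.start_year <= year <= r.end_year:
                matches.append((d.name, r.name, year - r.start_year + 1))
    return matches

QING = Dynasty("Qing", [
    ReignTitle("Kangxi", 1662, 1722),
    ReignTitle("Yongzheng", 1723, 1735),
    ReignTitle("Qianlong", 1736, 1795),
])
```

The function returns a list because, in periods when several regimes coexisted, one Gregorian year can have more than one valid traditional expression.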


The YearRef entity contains the id, gregorian_calendar_year, traditional_calendar_year, lunar_year, gregorian_calendar_start_date, gregorian_calendar_end_date and remarks attributes. The DateRef entity contains the id, lunar_year, lunar_month, month_type, gregorian_calendar_start_date, gregorian_calendar_end_date and remarks attributes, as shown in Fig. 3.4. Figure 3.5 displays the place name data model. The AncientModernPlaceName entity is composed of the PlaceName, Type, SubordinateRelationship and SpacePos entities. The SpacePosRef entity is related to the SpacePos entity, and the Time entity is related to the Type and SubordinateRelationship entities. The Time entity holds the start and end time and comprises the time description and standard time attributes.
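Because ancient place names in family trees often omit the higher-level divisions, a time-stamped subordinate relationship is what allows a lower-level name to be placed in the administrative hierarchy of a given period. The sketch below follows parent links that are valid in a given year; the tuple layout, place names and years are hypothetical placeholders rather than the FTGIS schema.

```python
# (place, parent division, first year, last year): placeholder data.
SUBORDINATION = [
    ("County X", "Prefecture P", 1368, 1644),
    ("County X", "Prefecture Q", 1645, 1911),
    ("Prefecture P", "Province S", 1368, 1644),
    ("Prefecture Q", "Province S", 1645, 1911),
]

def parent_at(place, year, relations):
    """Parent division of `place` valid in `year`, or None."""
    for child, parent, start, end in relations:
        if child == place and start <= year <= end:
            return parent
    return None

def admin_path(place, year, relations, max_depth=10):
    """Lowest-to-highest chain of divisions containing `place` in `year`."""
    path = [place]
    for _ in range(max_depth):  # bound the walk in case of cyclic data
        parent = parent_at(path[-1], year, relations)
        if parent is None:
            break
        path.append(parent)
    return path
```

The same county name thus yields different full administrative paths for different years, which is the information needed to match it against a historical atlas of the corresponding period.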

Fig. 3.4 Year and date references in the time data model

Fig. 3.5 Place name data model


Figure 3.6 shows the family tree bibliographic model, which describes the bibliographic information of family tree sets, the relations between a set of family trees and its volumes, and the relation between bibliographic information and the clan. The FamilyTree entity contains the head_info and content attributes. The Keywords entity contains the keyword, version, introduction, create_info, modify_info, publicate_info and data_store_access attributes. The Store entity and the Access entity make up the StoreAccess entity. The Store entity contains the file_name, file_type, store_location and process_environment attributes. The Modification entity contains the modifier, modify_time, and modify_place attributes. The Publication entity contains the publicator, publicate_time, and publicate_place attributes. The Creation entity includes the creator, create_time and create_place attributes. The Keyword entity contains the content and type attributes. The KeywordType entity contains the family_name, celebrity, generation_extent and living_place attributes. Figure 3.7 shows the family tree item content model, which expresses the item content (excluding the lineage record) and the relations between item content and the clan.

Fig. 3.6 Family tree bibliographic model


Fig. 3.7 Family tree content data model

The Clan entity contains the id, ft_name, family_name, ft_item, ft_edit, clan_event, origin_text, other_text, and ft_multimedia attributes. The FTItem entity contains the type, author, time, item_text, and item_multimedia attributes. The FTItemType entity defines the values of the FTItem type attribute, including preface, genealogical_comment, genealogical_style, honor_record, biography, and clan_rules. The FamilytreeEditInfo entity contains the ft_edit type, editor and edit_time attributes. The MultiMedia entity contains the id, title, type, format, description and url attributes. The MultiMediaType entity is related to the Table of MultiMediaType entity and contains the picture_type, audio_type and video_type attributes. The description attribute of the MultiMedia entity is related to the SubmitInfo entity and the TextInfo entity. The SubmitInfo entity is related to the TextInfo entity and includes the submittor and submit_time attributes. The TextInfo entity contains the title and content attributes.

As shown in Fig. 3.8, the family tree lineage record model is the core of the family tree spatial-temporal data model. It expresses information about individuals, families and clans, the events related to them, and the relations among them. There are two types of families: the main family and affiliated families. The Family entity is composed of husband and wife attributes, and an affiliated family is related to the MainFamily entity and records the second husband and wife. The Clan entity includes the clan name and totem attributes. The ClanObjRelation entity indicates relations among individuals, clans and families. The RelationType entity distinguishes individual-individual, individual-family, individual-clan, family-family, family-clan, clan-clan and other relations. The Event entity contains the name, subject, time, place, type, and description attributes. The subject attribute refers to an individual, family or clan, and the type attribute covers events such as individual births and deaths, family formation, migrations and clan sacrifices.
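Because every Event carries a subject, a time and a place, migration paths can be read directly from the lineage record. The sketch below filters one subject's migration events and orders them by time; the field names follow the Event attributes listed above, but the field types, the numeric time value (for example a Julian Day Number from the time conversion engine, or a CE year) and the type strings are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    subject: str   # id of an individual, family or clan
    time: int      # assumed numeric time, e.g. a CE year or Julian Day Number
    place: str     # id of a place resolved through the place name model
    type: str      # e.g. "birth", "death", "family", "migration", "sacrifice"
    description: str = ""

def migration_route(events, subject):
    """Destinations of a subject's migrations, oldest first."""
    moves = sorted((e for e in events
                    if e.subject == subject and e.type == "migration"),
                   key=lambda e: e.time)
    return [e.place for e in moves]
```

The resulting ordered place list is the kind of sequence that can be drawn as a migration path on a web map.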

Fig. 3.8 Family tree lineage record model


3.4 Family Tree Information Specification and Sharing

3.4.1 Existing Specifications Associated with Family Trees

Sharing family tree information is difficult. Family trees come from different time periods, nations and families and involve multiple levels of information. Furthermore, family trees from different periods have different compilation features, rich content, and various forms of expression and preservation. This complexity makes family tree information difficult to collect and share. Although massive family tree catalog databases have been established, these systems use distinct data collection, processing, and querying procedures, which hinders the sharing of family tree information. An information specification is an essential and effective way to digitize and share family tree information. An ideal specification should fully describe the appearance, structure and content of a family tree so that information can be exchanged and shared adequately and easily. Such a specification provides a method for the standardized expression and sharing of family tree information; its design must take into account not only all of the content but also the different attributes. As an effective means of information sharing, family tree information specifications have attracted considerable attention, and several description specifications for family tree information have been developed.

The Family History Department of the Church of Jesus Christ of Latter-day Saints proposed GEDCOM (GEnealogical Data COMmunication), which aims to provide a flexible, unified family tree data interchange and presentation format that can be processed directly by computers. Currently, the widely used version is GEDCOM 5.5 (GEDCOM Team 1996), and GEDCOM 6.0, which stores data in XML format, has also been released (GEDCOM Team 2001). Based on the lineage-linked data model, GEDCOM records information on nuclear families and individuals. In general, a GEDCOM file is plain text encoded in either ANSEL (American National Standard for Extended Latin Alphabet Coded Character Set for Bibliographic Use) or ASCII (American Standard Code for Information Interchange) and is composed of three sections: the header, records and trailer. The header section defines the metadata, such as the genealogy name, founder, source and collector. A series of modified specifications based on GEDCOM have emerged, such as GeniML (Genealogical Information Markup Language), GedML (Genealogical Data in XML), and GenXML (Genealogy XML) (GEDCOM Team 2001). GEDCOM has facilitated the sharing and expression of Euro-American family tree information. However, problems remain in describing the relationships among individuals, families and clans, as well as events, spatial-temporal information and individual information.

The Shanghai Library, China, has established the Genealogy Description Metadata Specification (Shanghai Library 2005), which normatively defines and describes bibliographic resources. This specification is designed for sharing and interoperating family tree resources among digital libraries. It consists of core elements of ancient literature and individual elements and is designed mainly to standardize information such as the publisher, date, source, creator, and description. Moreover, it extracts only information about ancestors, first migrated ancestors and notable ancestors from the family tree as a content abstract. Thus, it is merely a metadata specification for bibliographic information, not a comprehensive description of family tree content. For Chinese family trees, none of the specifications mentioned above can express their unique content and elements or fully support the sharing and exchange of information. This is because Chinese family trees differ from those of other countries and have many distinctive features (e.g., a long history, clear lineage, a well-developed family system and a complicated family structure). Therefore, there is an urgent need to develop a new specification or standard for Chinese family trees.
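The header/records/trailer layout of a GEDCOM file described above can be illustrated with a small writer. Each line is a level number, an optional cross-reference id in @...@, a tag and an optional value; individuals are INDI records and nuclear families are FAM records pointing to them. This is a sketch for illustration only: the input dictionaries and the source id "FTGIS_SKETCH" are hypothetical, and it omits header items such as the submitter record that the 5.5 standard requires, so its output has not been validated against the standard.

```python
def gedcom_line(level, tag, value="", xref=""):
    """Format one GEDCOM line: level, optional @xref@, tag, optional value."""
    parts = [str(level)]
    if xref:
        parts.append(f"@{xref}@")
    parts.append(tag)
    if value:
        parts.append(value)
    return " ".join(parts)

def write_gedcom(individuals, families):
    """Serialize simple dicts as header, INDI/FAM records and trailer."""
    lines = [
        gedcom_line(0, "HEAD"),
        gedcom_line(1, "SOUR", "FTGIS_SKETCH"),  # hypothetical source id
        gedcom_line(1, "GEDC"),
        gedcom_line(2, "VERS", "5.5"),
        gedcom_line(2, "FORM", "LINEAGE-LINKED"),
        gedcom_line(1, "CHAR", "ASCII"),
    ]
    for ind in individuals:  # individual records
        lines.append(gedcom_line(0, "INDI", xref=ind["id"]))
        lines.append(gedcom_line(1, "NAME", f"{ind['given']} /{ind['surname']}/"))
        lines.append(gedcom_line(1, "SEX", ind["sex"]))
        if "birth_year" in ind:
            lines.append(gedcom_line(1, "BIRT"))
            lines.append(gedcom_line(2, "DATE", str(ind["birth_year"])))
    for fam in families:  # nuclear family records
        lines.append(gedcom_line(0, "FAM", xref=fam["id"]))
        for tag, key in (("HUSB", "husband"), ("WIFE", "wife")):
            if fam.get(key):
                lines.append(gedcom_line(1, tag, f"@{fam[key]}@"))
        for child in fam.get("children", []):
            lines.append(gedcom_line(1, "CHIL", f"@{child}@"))
    lines.append(gedcom_line(0, "TRLR"))  # trailer
    return "\n".join(lines) + "\n"
```

The fixed husband/wife/children slots of the FAM record show why the nuclear-family model fits Chinese clan structures (clans, branches, affiliated families) only loosely.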

3.4.2 Family Tree Information Specification

This study proposes a specification of Chinese family tree information based on the FTGIS data model. The specification has two parts: a metadata specification and a content specification. It is implemented in XML, and two types of elements for family tree information, simple elements and composite elements, are defined with XML Schema. Simple elements describe atomic information items of the family tree, family members, and supplementary materials. These atomic items contain no sub-items; they are declared as elements with simple content and correspond to the leaf nodes of an XML document. A composite element is composed of two or more simple elements; unlike simple elements, composite elements are declared with the XML Schema complexType construct and correspond to non-leaf nodes.
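The distinction between simple (leaf) and composite (non-leaf) elements can be shown on a small XML instance built with Python's standard xml.etree.ElementTree module. The element names below are hypothetical stand-ins loosely modeled on the FTMember description, not the tag names of the published schema.

```python
import xml.etree.ElementTree as ET

def build_member(member_id, surname, given, sex, generation):
    """Hypothetical FTMember-like instance: Name is composite, the rest simple."""
    member = ET.Element("FTMember", id=member_id)
    name = ET.SubElement(member, "Name")           # composite (non-leaf)
    ET.SubElement(name, "Surname").text = surname  # simple (leaf)
    ET.SubElement(name, "GivenName").text = given
    ET.SubElement(member, "Sex").text = sex
    ET.SubElement(member, "Generation").text = str(generation)
    return member

def classify(element):
    """Map each tag to 'simple' (no child elements) or 'composite'."""
    return {e.tag: ("composite" if len(e) else "simple") for e in element.iter()}
```

In the corresponding XML Schema, Name would be declared with an xs:complexType, while Surname, GivenName, Sex and Generation would carry simple types such as xs:string or xs:integer.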

Fig. 3.9 Entity group of family tree metadata elements


As shown in Fig. 3.9, the elements designed for family tree metadata are divided into four core entity sets and three auxiliary entity sets. The core entity sets are the header information set, bibliographic management information set, stylistic rules information set and clan information set. The auxiliary entity sets are the time information set, spatial information set, and individual information set. The header information set describes the family tree metadata file itself. The bibliographic management information set is based on the Dublin Core Metadata and family tree description metadata specifications. Both structured and unstructured records are used to record time information in family trees. Structured records mainly capture the precise time type and timekeeping method, and time information is organized at four levels: year, month, day and hour. The spatial information structure contains the existence time, administrative division, geolocation, and correspondence to the present-day place. In this specification, the new concepts of the double time tag and the period place name are used to express spatial-temporal information completely and explicitly (Hu et al. 2010, 2011). The stylistic rules information set is important for directory navigation and provides instructions for locating family tree content; it also provides useful instructions for modifying family tree content. The clan information set is the core of the family tree metadata. It is designed to describe and extract the core information that distinguishes one family or clan from another. The core element of the metadata specification, the ClanInfo element defined with XML Schema, is shown in Fig. 3.10.

The elements designed for family tree content are divided into the basic information set, member information set, time information set, site information set and other information set. As the core element, FTBasicInfo includes the id, name, surname, entry, modification, event, other text, multi-media and original text elements or attributes. Among these, the entry, modification, event, and other text elements are composite elements. The XML Schema definitions of the FTBasicInfo, FTItem and Event elements are shown in Figs. 3.11, 3.12 and 3.13, respectively. The FTMember element is composed of the id, name, sex, nation, generation, seniority, clan branch, current state, event, other text, member multi-media, and original text elements or attributes. Four of these elements (name, event, other text, and member multi-media) are composite elements made up of simple elements. The XML Schema definition of the FTMember element is shown in Fig. 3.14. More details of the XML elements of the family tree information specification can be found in Feng (2011).

3.5 Mass Family Tree Information Collection

The volunteered geographic information (VGI) approach advocates collecting and organizing geographic information through cooperation between users and information collectors (Goodchild 2008). Once organized and arranged, this geographic information becomes basic data that the public can share and apply. As historical records of ordinary people, family trees have a broad public base. Many family tree software systems have large numbers of users, who have created or integrated many family trees with them. Thus, to take advantage of the public's enthusiasm for sharing genealogical information and tracing ancestors, this study adopts VGI and explores a new mode of mass family tree information collection that takes into account factors such as the diverse age groups, education levels, and computer skills of users.

Fig. 3.10 XML Schema definition of the ClanInfo element

Fig. 3.11 XML Schema definition of the FTBasicInfo element

Fig. 3.12 XML Schema definition of the FTItem element

Fig. 3.13 XML Schema definition of the Event element

Fig. 3.14 XML Schema definition of the FTMember element


There are three ways to collect family tree information. (1) Collecting information manually in a variety of ways. First, users can enter family tree information through the FTGIS website, surname websites, family websites and personal blogs. Second, a stand-alone collection tool based on Microsoft Excel is available, with which users enter family tree information into a spreadsheet. Third, users can enter data into a Microsoft Word form that can also serve as a paper version; this option is designed for people who cannot access the web or are not familiar with computers, and the project provides the Word form for users who need to input family tree information manually. (2) Generating family trees semi-automatically and quickly. The proposed family tree collection system provides an interface for generating family trees semi-automatically: the printed family tree is scanned as images, and users then edit the scanned images and collect the information by processing them manually. (3) Converting the data formats of family trees. To support GEDCOM, a popular family tree data format, the FTGIS platform enables users to organize family tree data in the GEDCOM format and convert it into other formats, such as XML.
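A GEDCOM-to-XML conversion of the kind mentioned in item (3) can be sketched with a stack that nests each line under the most recent line one level up. This is an illustrative converter, not the FTGIS converter or the GedML mapping: it keeps GEDCOM tags as element names, stores cross-reference ids in a hypothetical `id` attribute, and ignores continuation lines (CONC/CONT) and character-set handling.

```python
import xml.etree.ElementTree as ET

def gedcom_to_xml(text):
    """Nest GEDCOM lines under a root element according to their level."""
    root = ET.Element("GEDCOM")
    stack = [root]  # stack[k] is the parent element for a line at level k
    for raw in text.splitlines():
        if not raw.strip():
            continue
        parts = raw.strip().split(" ", 2)
        level = int(parts[0])
        if len(parts) > 1 and parts[1].startswith("@"):  # "0 @I1@ INDI"
            xref = parts[1].strip("@")
            rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
            tag, value = rest[0], (rest[1] if len(rest) > 1 else "")
        else:                                            # "1 NAME Ming /Li/"
            xref, tag = "", parts[1]
            value = parts[2] if len(parts) > 2 else ""
        if level > len(stack) - 1:
            raise ValueError(f"level jumps too far: {raw!r}")
        del stack[level + 1:]  # close elements deeper than this line
        elem = ET.SubElement(stack[level], tag)
        if xref:
            elem.set("id", xref)
        if value:
            elem.text = value
        stack.append(elem)
    return root
```

Because GEDCOM already encodes a tree through its level numbers, the conversion is mostly structural; the harder part, which the sketch leaves out, is mapping tags onto a target schema such as the specification in Sect. 3.4.2.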

3.6 FTGIS Platform

3.6.1 Architecture of the FTGIS Platform

This study proposes the architecture for the FTGIS platform shown in Fig. 3.15, which provides an overview of the key components. The platform adopts a three-tier architecture consisting of a data layer, a service layer and an application layer. The data layer contains the family tree index database, family tree metadata database, family tree webpage database, family tree image database, family tree full-text database, service metadata database, time database, ancient and modern place name database and Chinese historical administrative division database. These databases are the main data carriers of the FTGIS platform. Built on the data layer, the service layer provides users with two types of web services: family tree data services and family tree function services. The application layer is the access interface provided by the platform, through which web-connected users can access the family tree web services.


Fig. 3.15 Architecture of the FTGIS platform

Fig. 3.16 Architecture of the FTGIS website group


To construct the FTGIS collaboratively and share family tree information, the FTGIS website group is proposed. As shown in Fig. 3.16, the basic three-tier architecture integrates surname websites, clan websites and individual homepages. Users can create family trees and edit and share information through a consistent interface, and any individual or group can construct a surname website or individual homepage using the web services provided by the platform. In addition, the platform provides news services that use web crawler technology to retrieve and parse family tree news from the internet. By combining this information collected from the internet, the platform promotes the social sharing of family tree information.

3.6.2 Functions of the FTGIS Platform

The FTGIS web services fall into two categories, data services and function services, which are detailed in Table 3.1. The FTGIS data services include a perpetual calendar data service, a place name data service and a genealogical data service. As shown in Figs. 3.17 and 3.18, the FTGIS data services support creating and editing individual information online as well as editing time and place information. The FTGIS function services include a full-text genealogy retrieval service, a place name encoding service, a spatial-temporal analysis service, a spatial-temporal spectrum visualization service, and a statistical analysis service. Figure 3.19 shows a query of present and past place names. As illustrated in Fig. 3.20, the GIS function service presents genealogical lineage information as a tree structure. Moreover, users can view information on individuals, families, and surnames on the web map. Based on data provided by the FTGIS data services, statistical charts and graphs are available online, as shown in Fig. 3.21.
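Table 3.1 lists fuzzy encoding among the capabilities of the place name encoding service. One simple way to sketch this, not necessarily the way FTGIS does it, is an exact dictionary lookup that falls back to similarity matching with Python's standard difflib module. The gazetteer entries and codes below are hypothetical; the names are romanized here, but difflib compares arbitrary strings, including Chinese characters.

```python
import difflib

# Hypothetical gazetteer: place name -> place code (codes are placeholders).
GAZETTEER = {
    "Nanjing": "P0001",
    "Nanchang": "P0002",
    "Nanning": "P0003",
    "Beijing": "P0004",
}

def encode_place(name, gazetteer=GAZETTEER, n=3, cutoff=0.8):
    """Exact lookup first; otherwise return fuzzy candidates, best first."""
    if name in gazetteer:
        return [(name, gazetteer[name])]
    candidates = difflib.get_close_matches(name, list(gazetteer), n=n, cutoff=cutoff)
    return [(c, gazetteer[c]) for c in candidates]
```

A misspelled input such as "Nanjin" then returns the closest entry with its code, and an input with no similar entry returns an empty list. At the scale given in Table 3.1 (millions of names), a linear difflib scan would likely be too slow, and an index such as character n-grams would probably be needed.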

Table 3.1 Family tree platform service description

Service type | Service title | Description
FTGIS data services | Perpetual calendar data service | Supports expressing temporal information in different ways
FTGIS data services | Place name data service | Supports querying past and present place names
FTGIS data services | Family tree full-text data service | Supports creating and editing family tree items and individual information online; offers multiple ways of querying and browsing family tree information
FTGIS data services | Historical geography fundamental data service | Provides web maps digitized from Chinese historical atlases, and past and present place names extracted from a dictionary of past and present Chinese place names
FTGIS data services | Maps and images service | Provides web maps and images
FTGIS data services | Family tree news data service | Provides family tree news from the internet using a web crawler
FTGIS data services | Family tree metadata service | Provides family tree metadata
FTGIS function services | Full-text family tree retrieval service | Retrieves family tree information, including individuals, families, clans, and surnames
FTGIS function services | Family tree data transformation service | Converts time data to different record formats automatically
FTGIS function services | Place name encoding service | Supports encoding of 6,000,000 modern place names and 100,000 ancient place names, as well as fuzzy encoding
FTGIS function services | Time encoding service | Supports converting time expressed in different ways, with Chinese traditional time resolved to the day
FTGIS function services | Spatial-temporal analysis service | Supports spatial-temporal analysis of family tree information
FTGIS function services | Spatial-temporal spectrum visualization service | Tree structure visualization; web map visualization
FTGIS function services | Statistical analysis service | Provides online statistical charts and graphs

3.7 Conclusions and Future Research

To make full use of family trees and help resolve related problems in the humanities and social sciences, this study proposes a strategy to construct the FTGIS by incorporating modern information technologies, such as database, GIS and web technologies, into research on family trees. In this way, family tree information can be systematically collected, arranged, analyzed, and integrated. The key FTGIS issues were discussed in detail: (1) a unified spatial-temporal framework; (2) the FTGIS data model; (3) family tree information specification and sharing; and (4) mass family tree information collection. Finally, a multilevel architecture for the FTGIS platform was proposed, and a prototype of the platform was developed. The proposal and implementation of the FTGIS promote the analysis and sharing of family tree information. Furthermore, the FTGIS provides a tool for integrating and visually expressing latent information in family trees, building spatial-temporal spectra and revealing the evolutionary process of families and clans from diverse perspectives. Although this study proposes the FTGIS and achieves many of its goals, some limitations and issues remain to be addressed. In the future, the FTGIS platform will be improved, case studies will be strengthened, and services will be provided for historical scholars who use family trees as data sources. From the perspective of historical GIS, some ideas for further research are suggested below. (1) Building a novel and unified spatial-temporal framework and then applying it to a data platform designed for spatial-temporal analysis and visual expression as part of a historical humanities knowledge system. (2) Designing a GIS data model and fundamental historical GIS software based on time, sites, individuals, events, and scenes. (3) Exploring thematic mapping and spatial-temporal analysis methods, as well as mining latent information and patterns.

Fig. 3.17 Creating and editing individual information online

Fig. 3.18 Editing time and place information


Fig. 3.19 Querying past and present place names

Fig. 3.20 Visualizing tree structure


Fig. 3.21 Online statistical charts and graphs

References

Academia Sinica. (2002). Chinese Civilization in Time and Space. Retrieved May 5, 2019, from http://ccts.ascc.net.
Billy, K. L. (2011). GIS Database of Cotton Textile Industry of the Greater Songjiang Region from the Late Ming to the mid-Qing. Retrieved May 5, 2019, from http://www.iseis.cuhk.edu.hk/songjiang/.
Degbelo, A., Granell, C., Trilles, S., et al. (2016). Opening up smart cities: Citizen-centric challenges and opportunities from GIScience. ISPRS International Journal of Geo-Information, 5(2), 16.
Devantier, B. A., & Feldman, A. D. (1993). Review of GIS applications in hydrologic modeling. Journal of Water Resources Planning and Management, 119(2), 246–261.
Feng, Y. R. (2011). The designation of Family Tree Metadata Specification and its implementation by XML. Dissertation, Nanjing Normal University.
GEDCOM Team. (1996). The GEDCOM standard release 5.5. Family and Church History Department, The Church of Jesus Christ of Latter-day Saints.
GEDCOM Team. (2001). The GEDCOM standard release 6.0. Family and Church History Department, The Church of Jesus Christ of Latter-day Saints.
Ge, J. X. (1996). The value and limitation of genealogy as historical article. History Teaching and Research, 6, 3–6.
Goodchild, M. F. (1993). The state of GIS for environmental problem-solving. In Environmental modeling with GIS (pp. 8–15).
Goodchild, M. F. (2008). Virtual geographic environments as collective constructions. Acta Geodaetica et Cartographica Sinica, 31(1), 1–6.
Goodchild, M. F. (2009). Geographic information system. In Encyclopedia of Database Systems (pp. 1231–1236). Boston, MA: Springer.
Goodchild, M. F. (2018). Reimagining the history of GIS. Annals of GIS, 1–8.
Goodchild, M. F., & Janelle, D. G. (2010). Toward critical spatial thinking in the social sciences and humanities. GeoJournal, 75(1), 3–13.
Harris, T. (2009). Conceptualizing the spatial humanities and humanities GIS. Keynote presentation at the GIS in the Humanities and Social Sciences International Conference.
Harris, T. M., & Elmes, G. A. (1993). The application of GIS in urban and regional planning: A review of the North American experience. Applied Geography, 13(1), 9–27.
Harvard CGA. (2001). China Historical GIS. Retrieved May 5, 2019, from http://sites.fas.harvard.edu/~chgis/.
Higgs, G. (2004). A literature review of the use of GIS-based measures of access to health care services. Health Services and Outcomes Research Methodology, 5(2), 119–139.
Hu, D., Lü, G. N., Wen, Y. N., et al. (2010). GIS-based family tree information sharing and service. In International Conference on Geoinformatics. IEEE.
Hu, D., Lü, G. N., Wen, Y. N., et al. (2011). GIS-based family tree system integration. In International Conference on Spatial Data Mining and Geographical Knowledge Services. IEEE.
Liao, H. M., & Fan, I. C. (2012). Chinese civilization in time and space: The design and application of China historical geographic information system. E-science Technology & Application, 3(4), 17–27.
Lü, G. N., Chen, M., Wen, Y. N., et al. (2009). Research on constructing the Family Tree GIS. In Proceedings of the 1st Spatially Integrated Humanities and Social Science Forum (pp. 114–127). Hong Kong, China.
Nykiforuk, C. I., & Flaman, L. M. (2011). Geographic information systems (GIS) for health promotion and public health: A review. Health Promotion Practice, 12(1), 63–73.
Roche, S. (2014). Geographic Information Science I: Why does a smart city need to be spatially enabled? Progress in Human Geography, 38(5), 703–711.
Rumsey, A. S. (2009). Scholarly communication institute 7: Spatial technologies and the humanities. A conference hosted by the Scholarly Communication Institute, University of Virginia, Charlottesville, VA, June 28–30.
Shanghai Library. (2005). Genealogy Description Metadata Specification. 2003 DEA4T035: CDLSS05-015. Shanghai: Shanghai Library.
Wang, H. M. (2006). The value and abuse of genealogy. Shanghai Education Research, 6, 63.
White, R. (2010). What is spatial history. Spatial History Lab: Working paper [online]. Retrieved May 5, 2019, from http://www.stanford.edu/group/spatialhistory/cgi-bin/site/pub.php.

Chapter 4

GIS for Chinese History Research

Yifan Lu and Ping Zhang

Geographic Information Systems (GIS) have supported historical research worldwide since the 1980s. They have been widely applied in Chinese history research since the early 21st century. With the rapid development of information technology, the use of GIS has recently led to many breakthroughs in historical studies. It has especially advanced research on environmental change, rivers and geomorphology, climate change, water conservancy projects, rural settlements, urban growth, the spread of disease and old maps. GIS has also helped solve some previously unsolved problems. As a result, historians have paid more attention to Geographic Information Systems and Science as a new approach capable of driving significant progress in history. On this basis, this paper reviews how GIS has been introduced into Chinese history research. In the following six parts, it summarizes the substantial changes this technology has brought to traditional Chinese historical scholarship.

Y. Lu
College of Foreign Languages, Capital Normal University, Beijing 100089, China

P. Zhang (corresponding author)
School of History, Capital Normal University, Beijing 100089, China

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_4


Y. Lu and P. Zhang

4.1 The Construction of Typical Geographic Information Systems for China Study

In recent years, several representative Geographic Information Systems have been established:

- the China Historical Geographic Information System (CHGIS), developed jointly by Harvard University and Fudan University;¹
- the Chinese Civilization in Time and Space (CCTS), built by Academia Sinica;²
- A Historical GIS Dataset of Urban Cultures in Republican Beijing, constructed by the Institute of Space and Earth Information Science of The Chinese University of Hong Kong;³
- the Silk Road Historical Geography Information Platform, developed jointly by the Center for Historical Geography of Capital Normal University and the General Publishing House of Shaanxi Normal University.

These databases represent the major progress GIS technology has made in Chinese history studies in recent years. In these datasets, changing boundaries are linked to a variety of statistical information. Historical administrative maps have also been enriched with large amounts of data on historical events and geographical elements. The datasets offer a dynamic way of presenting historical maps and textual descriptions, which has greatly improved spatial analysis and map presentation in the social sciences. They also serve as platforms for querying, integrating and exchanging information, which promotes cooperation among historians.

CHGIS was launched in 2001 to build a comprehensive geographic dataset for exploring the spatial patterns of the past and supporting further historical study. It builds on historical maps and statistics of the administrative divisions in each period, and it goes beyond simple mapping to more complex electronic visualizations. It records the continuous changes of administrative divisions and place names through history.
The system also displays a large amount of related information, such as the changing boundaries of districts and administrative divisions. CHGIS therefore offers researchers far more functionality than an electronic map. By combining mapping with the time dimension, it gives users easy access to data querying and downloading, timeline data and statistics, information retrieval tools and spatial analysis. The full period from 221 BC, when the Qin Dynasty was established, to 1911, when the Qing Dynasty fell, will be presented on the platform, with administrative changes resolved to the year.⁴ Most of the digital maps have now been published and can be downloaded from the website.

1 http://yugong.fudan.edu.cn/views/chgis_index.php?list=Y&tpid=700.
2 http://ccts.ascc.net/intro.php?lang=zh-tw.
3 http://www.iseis.cuhk.edu.hk/history/beijing/index.htm.
4 https://sites.fas.harvard.edu/~chgis/.

4 GIS for Chinese History Research


Another national historical GIS, the Chinese Civilization in Time and Space (CCTS), provides a platform with precise spatial positioning, data querying and downloading, and integrated temporal and spatial attributes that support experts' spatial analysis. CCTS was first released in 2002 and draws on many old map sources for its basic historical geographic data and thematic data. It collects integrated historical maps covering more than 2,000 years, organized by dynasty. These are based on the Historical Atlas of China edited by Tan Qixiang, which provides maps for each dynasty, and on the 1930 Shenbao Map of China edited by Ding Wenjiang.

Academia Sinica has also linked large amounts of thematic data to these historical maps. Examples include the digital literature system of Chinese works, the Qing Dynasty grain price database developed by the Institute of Modern History, and the union catalog database of Ming and Qing chorographies. The system combines these data with an ArcChina base map drawn in the 1990s at a scale of 1:1,000,000. In this way it converts traditional paper maps into a new form of visualization that shows the spatial relations among historical elements on electronic maps. Users receive search results together with the spatial relations among them. Unsolved problems in historical studies can thus be revisited from a new perspective, and new questions can be raised.

A Historical GIS Dataset of Urban Cultures in Republican Beijing was developed by The Chinese University of Hong Kong. It uses historical geographic information systems and science to examine the spatial patterns of modern urban cultural change in China. The project covers the city of Beijing from the beginning of the Republican era in 1912 to 1937.
To present these urban cultural changes, data on six cultural spheres were added to the GIS for spatial analysis and comparison. The six spheres are urban morphology, market culture, educational culture, public health and medical culture, legal culture, and religious culture. Users can also browse the digital maps online.

The Silk Road Historical Geography Information Platform was developed jointly by the Center for Historical Geography of Capital Normal University and the General Publishing House of Shaanxi Normal University, and was launched in June 2017.⁵ The system explores the major changes in the natural environment and cultural landscapes along the Silk Road. It tracks several representative elements over more than 2,000 years, such as the eco-environment, heritage sites, ethnic groups and religions, transportation and trade, and cultural transmission. The platform currently holds rich thematic information, including 300,000 records on place names along the Silk Road. From these data, experts can explore spatial patterns of the past and investigate geographic entities and phenomena in both space and time. They can then analyze the spatiotemporal relations among these entities on the platform.

The system makes up for a shortcoming of historical maps, which convey social, economic and cultural factors and spatiotemporal phenomena poorly. It focuses on showing how thematic histories evolved in particular fields. These include transportation, ethnic groups, regimes and cities from the 2nd century BC to the early 20th century, as well as hydrology and river changes over the past 300 years. The thematic information is visualized along three dimensions: time, place and event. This subject-based modeling approach gives users new ways to understand past phenomena in their own research. Other analysis tools are also available online, including 3D analysis, kernel analysis, buffer analysis and tracking analysis.

5 http://www.srhgis.com/homePage.

4.2 The Research Regarding Climate, Rivers, Hydrology and Geomorphology Through the Application of the GIS Technology

Research on historical climate, rivers, hydrology and landforms belongs to historical geography. Quantitative analysis was rarely used in the past, because accurate and suitable geographical data were lacking. The introduction of Geographic Information Systems and Science has changed this. A variety of analytical methods and rich integrated data have since advanced historical geographic research on many issues.

4.2.1 The Historical Climate Research with the GIS

Some historians have used GIS technology to study historical climate. Man (2000) combined textual historical documents with GIS to examine the severe drought of 1877. The records readily available to researchers are the official reports on the 1877 drought submitted by the governors of Shanxi and Hebei provinces. GIS lets researchers extract data from such written texts and reorganize them spatially. They can then analyze spatial relations among historical geographic factors that cannot be seen directly in the documents.

Man first organized the documentary data on disaster-affected villages. He then derived a uniform drought index for each administrative division from the number of villages affected by drought of each degree. Next, he applied Kriging interpolation to the drought index. This produced optimal estimates for villages that the records do not mention but that were nevertheless hit by the disaster, filling the data gaps in the affected area.
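The two steps described above can be sketched as follows. First, graded counts of affected villages are turned into one index per division. Second, that index is interpolated to unrecorded locations with ordinary kriging. This is a minimal illustration under stated assumptions, not Man's actual procedure. The grade weights (`GRADE_WEIGHTS`), the exponential variogram and its parameters (`sill`, `rng`, `nugget`), and the sample divisions are all hypothetical.

```python
import math

# Hypothetical weights for the drought grades recorded in the documents.
GRADE_WEIGHTS = {"light": 1.0, "moderate": 2.0, "severe": 3.0}


def drought_index(counts, total_villages):
    """Severity-weighted share of affected villages in one division.

    0 means no village affected; 3 means every village severely affected.
    """
    weighted = sum(GRADE_WEIGHTS[grade] * n for grade, n in counts.items())
    return weighted / total_villages


def exp_variogram(h, sill=1.0, rng=100.0, nugget=0.0):
    """Exponential semivariogram gamma(h); sill, range and nugget are assumed."""
    if h == 0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-3.0 * h / rng))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points, values, target, **vario):
    """Ordinary-kriging estimate at `target` from known (x, y) points."""
    n = len(points)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Semivariances between data points, plus a Lagrange row and column that
    # force the weights to sum to 1 (the unbiasedness constraint).
    a = [[exp_variogram(dist(points[i], points[j]), **vario) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [exp_variogram(dist(p, target), **vario) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * v for w, v in zip(weights, values))


if __name__ == "__main__":
    # Hypothetical divisions: centroid (km), graded counts, total villages.
    divisions = [
        ((0.0, 0.0), {"light": 10, "moderate": 5, "severe": 5}, 40),
        ((50.0, 0.0), {"light": 5, "moderate": 10, "severe": 15}, 30),
        ((0.0, 50.0), {"light": 20}, 50),
    ]
    pts = [d[0] for d in divisions]
    idx = [drought_index(c, t) for _, c, t in divisions]
    print(idx)  # about [0.875, 2.333, 0.4]
    # Estimate the index at an unrecorded village location.
    print(ordinary_kriging(pts, idx, (25.0, 25.0)))
```

With no nugget, ordinary kriging reproduces the known value exactly at a recorded location. That property is a convenient sanity check. In practice the variogram would be fitted to the data rather than assumed.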


Using this method, Zhimin Man drew a thematic map of the 1877 drought. The map shows, directly and continuously, how drought severity varied between Shanxi and Hebei provinces. Because it gives the drought grade at each location, the author could identify three drought centers and their respective durations. He also produced other thematic maps, such as maps of administrative boundaries, which helped locate the drought centers directly and accurately. From the disaster intensity index and its movement in space and time, he inferred how the rain belt and the summer monsoon moved across North China that year. The analysis also confirmed that the severe 1877 drought in northern China was linked to a strong global ENSO event. As the Asian monsoon rains weakened, the course and character of the rainfall changed.

More recently, researchers such as Wei Pan and Zhimin Man have continued to use this method. Making greater use of GIS and building new datasets, they have explored related problems. These include the frequency of floods and droughts and changes in runoff volume along the Yellow River (Pan 2011, 2013) and in the Loess Plateau area. They have also shown how these disasters, landscape change and the movement of the summer monsoon are related (Liu and Pan 2014), producing a substantial body of results (Pan 2014).
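The text describes tracking a disaster intensity index through space and time. One simple way to summarize how such a surface shifts is to compute an intensity-weighted mean center for each period and follow it through the season. The sketch below illustrates that assumption only; it is not the method used in the cited study. The periods, coordinates and index values are hypothetical. Longitude and latitude are treated as planar coordinates, which is a rough approximation over a small region.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def weighted_mean_center(samples: List[Tuple[Point, float]]) -> Point:
    """Mean (x, y) location weighted by the drought index at each sample."""
    total = sum(w for _, w in samples)
    if total == 0:
        raise ValueError("all weights are zero")
    x = sum(p[0] * w for p, w in samples) / total
    y = sum(p[1] * w for p, w in samples) / total
    return (x, y)


def center_track(by_period: Dict[str, List[Tuple[Point, float]]]) -> List[Tuple[str, Point]]:
    """Weighted mean center for each period, in the order the periods were given."""
    return [(period, weighted_mean_center(s)) for period, s in by_period.items()]


if __name__ == "__main__":
    # Hypothetical (longitude, latitude) samples with drought index values.
    obs = {
        "spring": [((112.0, 36.0), 3.0), ((114.0, 38.0), 1.0)],
        "summer": [((112.0, 36.0), 1.0), ((114.0, 38.0), 3.0)],
    }
    for period, (x, y) in center_track(obs):
        print(period, round(x, 2), round(y, 2))
```

In this toy input the center moves from (112.5, 36.5) in spring to (113.5, 37.5) in summer. The movement of the center is the kind of signal that could then be compared with the inferred movement of the rain belt.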

4.2.2 The Research of Rivers and Hydrology in History Through the GIS

Bringing historical GIS into history studies helps generate new insight into past geographies and reconstruct lost landscapes and rivers. Man (2006) used data from NASA's Shuttle Radar Topography Mission (SRTM), the highest-resolution topographic database of Earth, designed to produce a near-global digital elevation model. With these data and remote-sensing images, he identified several courses of the Yellow River from different periods. He then drew on records in historical texts and compared ancient and modern place names. This allowed him to confirm one ancient course of the Yellow River, in use from the Eastern Han Dynasty until 1034 A.D., which flowed through Henlong and entered the sea in present-day Shandong. Historical documents record this course by name as the Jingdong Gudao, but its route and direction had never been mapped before. Precisely drawing the ancient courses of the Yellow River on geomorphological maps has thus been a breakthrough in the study of historical fluvial geomorphology.


4.2.3 The Geomorphology and the Research of Environmental Changes Through the GIS

Historical GIS shows not only geographic patterns but also how they changed, and it helps historians analyze interactions between historical phenomena and geographical elements. Some researchers, such as Deng Hui, have used digital spatial simulation, an extension of GIS technology, to study desertification in the Mu Us Desert (Deng 2007). They mapped the extent of land reclamation and the land-use pattern in the Mu Us area during the Ming and Qing dynasties, especially garrison reclamation around Yulin in the Ming Dynasty. Digital maps thus make it possible to visualize changes in the southern edge of the desert (Wu 2014). Combined with historical documents, these results suggest that garrison reclamation did not push the desert southward, since its southern edge in the Ming Dynasty was nearly the same as it is today. Historical evidence also shows that the Qing-era land-use pattern in northern Shaanxi, with farming in the south and grazing in the north, benefited the ecosystem. This pattern remains in effect today (Shu 2016).

Environmental change in the Yangtze River Delta is an emerging field of research. Many historical documents show that the main cause of environmental change in the Jiangnan region was change in the river network, brought about by the expansion of polders and towns. Researchers such as Zhimin Man and Wei Pan used several large-scale maps and charts in two studies. The first described erosion and deposition intensity in the south branch of the Yangtze Estuary from 1861 to 1953 (Pan 2009). The second described channel density and its changes in the Qingpu area of Shanghai from 1918 to 1978 (Pan 2010).
In these studies, the researchers built grid systems with GIS to reorganize the map data, for example the channel density and river lengths recorded in the 1918 and 1978 military maps. Data extracted from the maps are placed into the grid, so density can be compared across equal-area cells of the delta or channel network. These comparisons make it possible to analyze the positions of deltas, river lengths, the extent of river networks and how they changed. They also allow an estimate of the river networks' impact on the environment of the Yangtze River Delta as a whole.
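A grid comparison of this kind can be sketched as follows. River polylines digitized from two map editions are cut into short pieces, and each piece's length is credited to the equal-area cell containing its midpoint. The resulting densities (length per unit area) are then differenced cell by cell. This is a minimal standard-library sketch under stated assumptions, not the procedure of the cited studies. The cell size, the densification `step` and the sample channels are hypothetical. Midpoint assignment also only approximates exact clipping of lines to cells.

```python
import math
from collections import defaultdict


def channel_density(polylines, cell_size, step=0.05):
    """Approximate channel length per unit area in each square grid cell.

    polylines: list of [(x, y), ...] in projected units (e.g. km).
    Each segment is split into pieces no longer than `step`; each piece's
    length is credited to the cell that contains the piece's midpoint.
    """
    length = defaultdict(float)
    for line in polylines:
        for (x0, y0), (x1, y1) in zip(line, line[1:]):
            seg = math.hypot(x1 - x0, y1 - y0)
            n = max(1, math.ceil(seg / step))
            for k in range(n):
                t = (k + 0.5) / n  # midpoint parameter of piece k
                mx, my = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
                cell = (math.floor(mx / cell_size), math.floor(my / cell_size))
                length[cell] += seg / n
    area = cell_size * cell_size
    return {cell: total / area for cell, total in length.items()}


def density_change(before, after):
    """Per-cell change in density; a cell missing from one date counts as 0."""
    cells = set(before) | set(after)
    return {c: after.get(c, 0.0) - before.get(c, 0.0) for c in cells}


if __name__ == "__main__":
    # Hypothetical channels (km): the earlier map has two, the later map one.
    rivers_1918 = [[(0.0, 0.5), (2.0, 0.5)], [(0.5, 0.0), (0.5, 1.0)]]
    rivers_1978 = [[(0.0, 0.5), (2.0, 0.5)]]
    d18 = channel_density(rivers_1918, cell_size=1.0)
    d78 = channel_density(rivers_1978, cell_size=1.0)
    for cell, change in sorted(density_change(d18, d78).items()):
        print(cell, round(change, 3))
```

Here the cell that lost a channel shows a negative change of about 1 km/km², while the unchanged cell shows none. Because every cell has the same area, densities from different dates or regions can be compared directly.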


4.3 The Research of Towns and Villages in History via the Application of the GIS

4.3.1 The Urban History and the Research of Urban Historical Geography Under the GIS Platforms

In the 1990s, researchers led by Li Xiaocong and Wu Honglin began to use remote-sensing aerial photographs as a supplementary source for urban history. From these images and historical records, they traced how three cities along the Yangtze River (Jiujiang, Anqing and Wuhu) relocated after the Ming Dynasty. They mapped the geomorphic conditions and urban forms of these cities and how these changed. This clarified the relationship between the cities' spatial expansion after the Ming and shifts in the course of the Yangtze River (Li 1992).

Xiaohong Zhang (2013) compared large-scale maps of Shanghai drawn between 1855 and 1990. This showed the city's spatial patterns, their changes, and how its different cultural spheres shifted over time. GIS analysis revealed how Shanghai's modern spatial pattern took shape and how administration developed in both the concessions and the Chinese city. The author then confirmed these conclusions with historical documents. Historical GIS can display how historical geographic information changed over space and time. It produces many intuitive images of these processes for dynamic demonstration and exploration, which overcomes the limits of relying on textual description alone. Wu (2008) focused on river reclamation and used GIS to show the evolution of Shanghai.
Supplemented by historical records, this approach reveals the process by which large areas of farmland became Shanghai's urban road system. Chen (2010a) used road data and documents on the transfer of land-use rights to compare city landscapes before and after the establishment of the British concession in Shanghai. Mou (2012) used GIS to trace how the French concession of Shanghai was transformed from polders into a downtown area. More recently, researchers have used city maps from the late Qing Dynasty and the Republican era to analyze urban social geography (Wang and Zhu 1999), class divisions, and the spatiotemporal features of urban crime. They have reached substantial conclusions about how the internal form and space of cities were reorganized (Zhang and Sun 2011).


4.3.2 The Research of Town Economy of Jiangnan Region in Ming and Qing Dynasty by the GIS

Specialists led by I-chun Fan have produced many works on economic development in the Yangtze River Delta during the Ming and Qing dynasties. Fan plotted every town in the Taihu area in different periods of the Ming and Qing dynasties on a 1:50,000 digital map. With GIS statistical analysis, Fan then studied how the rise and decline of towns at different levels related to the development of the Delta as a whole. The maps show that, apart from a few large towns that grew steadily, most towns in the area rose and declined unevenly during this period. In Fan's view, the number of towns grew over the past 600 years for several reasons. The most important was regional development, rather than the growth of a capitalist economy or urbanization (Fan 2002, 2004).

4.3.3 The Research on the Rural Settlements in History Through the GIS

Historical GIS advances historiography by supporting revised studies that challenge existing views. I-chun Fan (2008) digitized two valuable late Qing village maps of the Hebei area, the maps of Qingxian and Shenzhou. He then plotted the villages on large-scale GIS maps. Using spatial layer analysis, he classified the factors mentioned in the historical records and overlaid them to integrate the data, forming new layers that combine selected factors. These layers revealed features of the northern Chinese villages, including patterns of land allocation, settlement, population, education, elites, religion and markets.

Based on this comprehensive GIS analysis, Fan questioned the traditional view that villages in late Qing northern China were distributed according to the locations of periodic markets. He compared maps of town markets with maps of villages, together with population density charts derived from the layer analysis. This showed that villages and population were linked through various types of markets, and it sheds further light on the internal logic connecting markets, villages and population size in northern China.
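One simple way to relate villages, population and markets, as in the comparison described above, is to assign each village to its nearest market. Village counts and population can then be aggregated per market. The sketch below is a hypothetical illustration, not the layer-overlay procedure of the cited study. The straight-line nearest-market rule, the `Village`/`Market` records and the market categories are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Village:
    name: str
    x: float
    y: float
    population: int


@dataclass
class Market:
    name: str
    x: float
    y: float
    kind: str  # hypothetical categories, e.g. "periodic" or "standing"


def assign_to_markets(villages: List[Village], markets: List[Market]) -> Dict[str, List[Village]]:
    """Group villages under the market nearest to them in straight-line distance."""
    groups: Dict[str, List[Village]] = {m.name: [] for m in markets}
    for v in villages:
        nearest = min(markets, key=lambda m: math.hypot(m.x - v.x, m.y - v.y))
        groups[nearest.name].append(v)
    return groups


def market_summary(villages: List[Village], markets: List[Market]) -> Dict[str, dict]:
    """Village count and total population attached to each market."""
    groups = assign_to_markets(villages, markets)
    kinds = {m.name: m.kind for m in markets}
    return {
        name: {"kind": kinds[name], "villages": len(vs),
               "population": sum(v.population for v in vs)}
        for name, vs in groups.items()
    }


if __name__ == "__main__":
    markets = [Market("A", 0, 0, "periodic"), Market("B", 10, 0, "standing")]
    villages = [Village("v1", 1, 1, 300), Village("v2", 2, -1, 150), Village("v3", 9, 2, 500)]
    for name, row in market_summary(villages, markets).items():
        print(name, row)
```

Straight-line proximity ignores roads, rivers and market schedules. A fuller analysis would weigh such factors, or compare several market types as separate layers.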


4.4 The Research of the Economy and Society in History via the GIS

Irrigation has been a hotly debated issue in social history in recent years. Li (2012) studied conflicts over water resources from 1763 to 1945 in an irrigated area named Houcunzhen. The area extends from the Danshui River on the western side of the Taipei Basin to the west bank of the Dahanxi River. The author marked on a digital map the positions of the irrigated area and of the irrigation canals in the farmland in different periods. This made it possible to trace how the canal intakes related to river segments in each period. The author also marked other locations, such as places of residence, ancestral homes, and the intakes and pumping stations of water conservancy works along the Daxi River. These layers were then overlaid on Google Earth terrain.

With these maps, the author analyzed how the locations related to one another and to the areas where water conflicts occurred. He also compared traditional water development with modern water conservancy projects. In his view, the traditional water facilities were very rudimentary. Houcunzhen lay at the end of the waterway and received the least water in the area, so its disputes over water among villages were the most intense in the dry season. The water-supply method and the direction of water flow also contributed to the conflicts. In modern times, however, large water conservancy facilities came into use.
The government built these projects and protected the irrigated areas, so it became the focus of water conflicts. Water disputes consequently became a matter between the authorities and residents.

4.5 The Research of Ancient Maps and Their Digitization by the GIS Technology

In 2004, researchers in Taiwan such as Jinn-Guey Lay carried out a spatial analysis of the Taiwan Qian-Hou-Shan Map (Lay 2004). Published in 1878, the Taiwan Qian-Hou-Shan Map is one of the most important maps of its era. It was drawn from scientific surveys and carries latitude and longitude coordinates, a practice introduced in the late Qing Dynasty. The researchers scanned the map, georeferenced and digitized it, and transformed the coordinates of the spatial data. They then overlaid it on a modern map of Taiwan to build a unified spatiotemporal framework containing historical elements.

Old maps were seldom analyzed in the past, because there were no tools to convert them into digital maps that meet modern mapping standards. With GIS, different types of maps can now be converted to a common standard for quantitative spatial analysis. Researchers can then assess how people in the past perceived space, evaluate map quality, extract useful information, and supply new data and models for historical research. Lay (2004) developed a new GIS-based approach to quantitative geometric analysis. He argued that historical spatial cognition can be interpreted effectively and that such interpretation can enhance research in historical geography. Specialists have since studied ancient unified maps, town maps and cadastral maps in more depth. Many maps, such as the Qing Dynasty survey maps at scales of 1:50,000 or 1:100,000, have also been digitized and used for more detailed research.
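Georeferencing of the kind described for the Qian-Hou-Shan Map usually starts with ground control points. A transformation from scanned-map coordinates to modern coordinates is fitted to these points. The residuals of that fit give a first quantitative measure of the old map's positional accuracy. The sketch below fits a six-parameter affine transformation by least squares and reports the root-mean-square error (RMSE). It is an illustration under stated assumptions, not Lay's actual method. The control points are hypothetical.

```python
import math
from typing import List, Tuple

Pt = Tuple[float, float]


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, 3):
            f = a[r][c] / a[c][c]
            for k in range(c, 4):
                a[r][k] -= f * a[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (a[r][3] - sum(a[r][k] * x[k] for k in range(r + 1, 3))) / a[r][r]
    return x


def fit_affine(src: List[Pt], dst: List[Pt]):
    """Least-squares affine map: X = a*x + b*y + c, Y = d*x + e*y + f."""
    if len(src) < 3:
        raise ValueError("need at least 3 non-collinear control points")
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (A^T A) p = A^T t, solved separately for X and Y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * X for r, (X, _) in zip(rows, dst)) for i in range(3)]
    aty = [sum(r[i] * Y for r, (_, Y) in zip(rows, dst)) for i in range(3)]
    return _solve3(ata, atx), _solve3(ata, aty)


def apply_affine(params, p: Pt) -> Pt:
    """Transform one point with fitted parameters ((a, b, c), (d, e, f))."""
    (a, b, c), (d, e, f) = params
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)


def rmse(params, src: List[Pt], dst: List[Pt]) -> float:
    """Root-mean-square distance between transformed and target points."""
    sq = [math.dist(apply_affine(params, s), t) ** 2 for s, t in zip(src, dst)]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    # Hypothetical control points: scanned-map pixels -> modern (lon, lat).
    pixels = [(100, 200), (900, 210), (120, 1500), (880, 1480)]
    lonlat = [(120.0, 25.3), (122.0, 25.3), (120.0, 22.0), (122.0, 22.0)]
    params = fit_affine(pixels, lonlat)
    print(params)
    print("RMSE (degrees):", rmse(params, pixels, lonlat))
```

A large RMSE, or residuals that vary systematically across the sheet, can point to local distortions in the historical survey. A single affine model cannot absorb such distortions, so a study would typically inspect per-point residuals or try a higher-order transformation.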

4.6 The Exploratory Research for the Methodology for Digitizing the History Geographic Information

Digitized historical geographic information covers administrative divisions, population, economy, land use, ethnic groups, religion and culture. This information has traditionally been extracted from Chinese records and documents. Most of these sources, however, are qualitative and far from systematic. Turning such scattered information into usable geographic information has therefore become an important methodological problem for historians. Researchers have tried a series of new approaches, and some have proved useful (Jiang 2015). GIS lets historians use digital maps with three components (space, attributes and time) fully and simultaneously, and analyze them together with other historical evidence.

The grid system proposed by Zhimin Man deserves particular recommendation. Gridding is one of the principal ways of standardizing spatial data in geography. The land surface is divided into grid cells of a size chosen to suit the study. Researchers can then measure how strongly and densely each factor is present within equal areas and compare factors cell by cell. From this they can evaluate the degree and efficiency of land use in different places. Zhimin Man gave an example: a study of land use and its changes in Shanghai over a given period must compare three datasets. These cover the hydrographic network, the spatial change of settlements, and the change of the city.
These data are generally classified as river networks, settlements, and city districts and blocks. In a GIS, however, these three datasets are represented separately as points, lines and polygons, three forms that were previously hard to show on the same chart. A key problem was therefore how to compare them on one chart under a common standard. This can also be seen as a problem of data standardization. Man (2008) argued that the grid system is well suited to holding and standardizing different types of data. Once the various historical data and records are transferred spatially into the grid, they can be presented on the same plane. This makes it easy to show land-cover patterns and the human-land relationship within a small area. In short, the grid system solves the problem of accurate calculation in regional studies. It allows quantitative spatial analysis of how people used resources and how that use changed, and it has greatly improved the accuracy of research on the human-land relationship.

GIS has also been applied to many other topics in Chinese history. These include locating archaeological sites (Nishimura 2016), the Dunhuang manuscripts and their geographic information (Rong 2016), historical population (Lu 2012, 2014, 2015), historical disease (Gong 1993, 2014), disasters (Kong 2017), religion (Chen 2010b), and land use and cadastral management (Zhao 2005).
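Man's idea of placing point, line and polygon data on one grid can be sketched as follows. Settlements (points) are counted per cell. Urban blocks (polygons) are converted to a built-up share per cell by testing a regular k × k sample of locations in each cell. The result is one table keyed by cell in which both layers are directly comparable. This is a minimal sketch under stated assumptions, not Man's implementation. The cell size, the sample density `k` and the example features are hypothetical, and sub-grid sampling only approximates exact polygon-cell intersection. Line layers could be added in the same way by summing segment lengths per cell.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Pt = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(p: Pt, size: float) -> Cell:
    """Grid cell index containing point p for square cells of side `size`."""
    return (math.floor(p[0] / size), math.floor(p[1] / size))


def point_counts(points: List[Pt], size: float) -> Dict[Cell, int]:
    """Number of point features (e.g. settlements) in each cell."""
    return dict(Counter(cell_of(p, size) for p in points))


def inside(p: Pt, poly: List[Pt]) -> bool:
    """Ray-casting point-in-polygon test; poly is a list of ring vertices."""
    x, y = p
    hit = False
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        if (y0 > y) != (y1 > y):
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                hit = not hit
    return hit


def area_share(polys: List[List[Pt]], cells: List[Cell], size: float, k: int = 10) -> Dict[Cell, float]:
    """Approximate share of each cell covered by any polygon, from k x k samples."""
    out = {}
    for cx, cy in cells:
        pts = [((cx + (i + 0.5) / k) * size, (cy + (j + 0.5) / k) * size)
               for i in range(k) for j in range(k)]
        covered = sum(any(inside(p, poly) for poly in polys) for p in pts)
        out[(cx, cy)] = covered / (k * k)
    return out


def grid_table(settlements, urban_blocks, cells, size):
    """One record per cell combining the point and polygon layers."""
    counts = point_counts(settlements, size)
    share = area_share(urban_blocks, cells, size)
    return {c: {"settlements": counts.get(c, 0), "urban_share": share[c]} for c in cells}


if __name__ == "__main__":
    size = 1.0  # hypothetical 1 km cells
    cells = [(0, 0), (1, 0)]
    settlements = [(0.2, 0.3), (0.7, 0.8), (1.5, 0.5)]
    urban_blocks = [[(1.0, 0.0), (1.5, 0.0), (1.5, 1.0), (1.0, 1.0)]]  # half of cell (1, 0)
    for c, row in grid_table(settlements, urban_blocks, cells, size).items():
        print(c, row)
```

Each cell here gets both a settlement count and an urban share. Layers that originally had different geometries can therefore be read, mapped or differenced together, which is the standardization the grid approach aims for.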
Geographic Information Systems and Science have recently become an important analytical tool in Chinese history research. They are widely used in archaeology, historical geography and regional social history, and researchers in these fields have consistently promoted them. GIS has now been applied to Chinese history research for more than 20 years. As a tool for mapping, querying, analysis and research, it has already achieved initial success. In our view, the more complex a historical issue is, the more GIS can contribute to solving it. As historical spatial databases continue to develop, GIS will play an increasingly important role in historical research.

References

Chen, L. (2010a). The changes of urban and rural landscapes in modern time of Shanghai (1843–1863): According to the analysis of the data of the road system in Shanghai. Doctoral thesis, Fudan University.
Chen, Q. (2010b). Landlord, religious organization and dispersion of lower Danshui tribe in Wandan Region of Pingtung Plains (1720–1900). Research of Taiwan History, 3, 1–37.
Deng, H. et al. (2007). The changes of the south side of the Mu Us Desert since Ming Dynasty. Chinese Science Bulletin, 52(21), 2556–2563.


Fan, I.-C. (2002). The nature of the expansion of the Jiangnan market towns in the Ming-Qing dynasty. Institute of History and Philology Bulletin of Academia Sinica, 73(3), 443.
Fan, I.-C. (2004). Market towns and regional development to the east of Lake Tai during the mid-Ming dynasty. Institute of History and Philology Bulletin of Academia Sinica, 75(1), 149–221.
Fan, I.-C. (2008). Local society in late Qing Hebei as seen in village maps from two counties. Journal of New History, 19(1), 51–104.
Gong, S. (1993). A preliminary study on variations of the distribution of zhang-disease for the past 2000 years in China. Acta Geographica Sinica, 48(4), 1993.
Gong, S. et al. (2014). A geographic study of epidemic disasters of Jiangnan area in Ming dynasty (1638–1644). Geographical Research, 33(8), 1569–1578.
Jiang, W. (2015). Research on urban population of Jiangnan in the Republic of China: Based on GIS and the data of topographic maps. Researches in Chinese Economic History, 4, 39–56.
Kong, D. et al. (2017). Spatial-temporal characteristics and environmental background of locust plague in Beijing-Tianjin-Hebei region during Ming and Qing dynasties. Journal of Palaeogeography, 19(2), 383–392.
Lay, J.-G. et al. (2004). Quantitative analysis of 1878 Taiwan Qian-Hou-Shan map. Symposium of the 1st Seminar of the Toponymy in Taiwan, 253–271.
Li, C.-Y. (2012). Inside-out: The historical changes of the dispute of water conservancy in the irrigated area of Houcunzhen. Journal of Baisha Historical Geography, 2, 65–169.
Li, X. (1992). Using the remote sensing images for urban historical geography research: Taking example of the relationship between the changes of cultural landscapes and that of river courses in three cities, Jiujiang, Wuhu, Anqing. Journal of Beijing University, 37–41.
Liu, H., & Pan, W. (2014). The initial time of the rainy season on the Loess Plateau during 1766–1950 and its response to the summer monsoon. Journal of Earth Environment, 5(6), 378–382.
Lu, W. (2012). The distribution of Hui people's settlements in Shaanxi-Gansu areas and the related database construction before the Tongzhi reign of the Qing dynasty. N.W. Journal of Ethnology, 4 (Total No. 75), 37–45.
Lu, W. (2014). A research on the small probability events with tiny population base under GIS: Case study on scale and spatial distribution of Chin-shihs from Hui ethnic group in Qing dynasty. The Journal of Hui, 2, 54–61.
Lu, W. (2015). Spatial distribution of Hui Muslim Chin-shihs and population during Ming and Qing dynasty. Journal of Beifang University of Nationalities, 2 (Total No. 122), 99–105.
Man, Z. (2000). Climatic background of the severe drought in 1877. Fudan Journal (Social Science), 6, 28–35.
Man, Z. (2008). Spatial and temporal data structure for local study. Journal of Chinese Historical Geography, 23(2), 5–11.
Mou, Z. (2012). From the ancient water town to the eastern Paris: The research on the urban space changing process in the French concession of modern Shanghai. Shanghai Bookstore Publishing House.
Nishimura, Y., & Kitamoto, A. (2016). Re-identify ancient ruins on the Silk Road and establish ruins database. Journal of Shaanxi Normal University (Philosophy and Social Sciences Edition), 2, 75–85.
Pan, W. (2009). Reconstruction of erosion-deposition in Yangtze River estuary south branch and related problem study, 1861–1953. Journal of Chinese Historical Geography, 24(1), 22–29.
Pan, W. et al. (2010). The grid methods of drainage density data reconstruction in big river delta: Based on the case of Qingpu, Shanghai, 1918–1978 A.D. Journal of Chinese Historical Geography, 25(2), 5–14.
Pan, W. et al. (2011). Reconstruction of the precipitation (May–Oct) in the upper and middle reaches of the Yellow River (1766–1911). Journal of Earth Environment, 6(1), 285–290.
Pan, W. et al. (2013). The study for relationship between PDO and the streamflow of Yongdinghe River (Lugouqiao) since 1766 AD. Journal of Chinese Historical Geography, 28(1), 127–133.

4 GIS for Chinese History Research


Pan, W. et al. (2014). The changing of Chinese coastal typhoon frequency based on historical documents, 1644–1911 AD. Geographical Research, 33(11), 2196–2204.
Rong, X. (2016). A view of the historical-geographic information of the ancient Gaochang in terms of the unearthed documents in Turpan. Journal of Shaanxi Normal University (Philosophy and Social Sciences Edition), (1), 12–24.
Shu, S. et al. (2016). The temporal and spatial distribution of settlements in the area along the Great Wall in Yansui town during the late Ming dynasty. Geographical Research, 35(4), 790–802.
Wang, J., & Zhu, G. (1999). A preliminary study of the social geography of Beijing during the late Qing and the early Republican period. Acta Geographica Sinica, 54(1), 69–76.
Wu, J. (2008). From ancient water town to metropolis: The changes of the urban road system in modern Shanghai (1843–1863). Doctoral thesis, Fudan University.
Wu, C. et al. (2014). The study on the land development process in the border area between Shaanxi and Inner Mongolia. Geographical Research, 33(8), 1579–1592.
Xiaohong, Z. (2013). The maps of modern cities and the research of the urban spatial pattern in the British concession area of Shanghai since it opened to foreign traders. Journal of Historical Geography, 28(2), 2013.
Zhang, X., & Sun, T. (2011). Urban space production: Urbanization of Wujiaochang area in Jiangwan Town of Shanghai in 1900–1949. Acta Geographica Sinica, 31(10), 1181–1188.
Zhao, Y. (2005). The land utility in the area of Sunwan and its motivations. Doctoral thesis, Fudan University.
Zhimin, M. (2006). The research on the flowing route of Jingdonggudao of Yellow River in the Northern Song dynasty. Journal of Historical Geography, 21, 1–9.

Chapter 5

Digital Historical Yellow River

Wei Pan, Rao-rao Su, Zhi-min Man, Li-jie Zhang, Mi-mi He, and Li-kun Han

5.1 Digital Historical Yellow River

The development of historical geographic information requires new ideas and methods. "Digital Historical Yellow River" (DHY) is a project launched in 2017 by Yunnan University and Shaanxi Normal University. It is a pilot of the broader "Digital Historical River" concept, which comprises six components: (1) high-precision three-dimensional micro-geomorphology; (2) a scheme for fusing historical hydraulic engineering works with the terrain model; (3) restoration of the three-dimensional shape of the river channel; (4) simulation and demonstration of the movement of surface water in historical periods; (5) reconstruction of rainfall characteristics in historical periods; and (6) river-water management methods in historical periods. As an implementation of the "Digital Historical River" concept, DHY is not only a visualization of the temporal and spatial changes of the Yellow River channel in historical periods, but also a professional platform for managing historical data, combined with dedicated datasets and a series of functions for analyzing and displaying historical information (Pan et al. 2012). The basic components of DHY are the materials database, the 3-D terrain module, the Yellow River water environment event information management module, the data management platform, and the analysis-simulation module. DHY currently focuses on Yellow River-related information from the Qing Dynasty to the Republican period. So far, we have completed the design and construction of the materials database, the Yellow River basic hydrological information database, and the financial management information database.

W. Pan (B) · L. Zhang
Institute of Historical Geography, Yunnan University, Kunming 650091, Yunnan, China

R. Su · M. He · L. Han
Northwest Institute of Historical Environment & Socio-Economic Development, Shaanxi Normal University, Xian 710119, Shaanxi, China

Z. Man
Center for Historical Geography, Fudan University, Shanghai 200433, China

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_5

W. Pan et al.

The materials database is a historical information platform for the Yellow River with query, download, online browsing, annotation, and data association functions. The data are grouped into the rivers, the Qing Dynasty river archives, the Republican archives, the Republic archives, and folk literature. The database currently holds files in formats such as DOC, PDF, PPT, Excel, and JPG, and we plan to extend it to manage video and audio files. The 3-D terrain module simulates terrain at multiple scales to show the direction, speed, and extent of water movement in a virtual environment. The Yellow River water environment event information management module manages data on historical hydrological changes, flood disasters, and similar events. The data management platform and the analysis-simulation module help users observe and analyze both the data stored in the system and their own data. These two modules generate various charts and provide "deep mapping" to users (Schreibman et al. 2004). Through the work on DHY, we have made initial attempts at three-dimensional, dynamic historical hydrological simulation and historical water-scenario simulation within historical geographic information 2.0. We hope that this work can raise the level of historical geographic information research (Chen 2014; Tu 2014), combine information-processing methods with actual research, and lay a solid foundation for the development of historical geographic information, so that this direction can remain viable in the long term.
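The materials database described above is, at its core, a catalogue of typed files with query, annotation, and association functions. The sketch below illustrates that idea under stated assumptions: the class names (`Material`, `MaterialsCatalog`), the extension list for the named formats (Excel taken as `xls`/`xlsx`), and the in-memory storage are all hypothetical and do not describe DHY's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical extensions for the formats named in the text
# (DOC, PDF, PPT, Excel, JPG); video and audio are not yet supported.
SUPPORTED_EXTENSIONS = {"doc", "pdf", "ppt", "xls", "xlsx", "jpg"}


@dataclass
class Material:
    title: str
    category: str  # e.g. "Qing Dynasty river archives"
    filename: str
    annotations: list[str] = field(default_factory=list)
    related: set[str] = field(default_factory=set)  # titles of associated items

    @property
    def extension(self) -> str:
        return self.filename.rsplit(".", 1)[-1].lower()


class MaterialsCatalog:
    """In-memory stand-in for a materials database keyed by title."""

    def __init__(self) -> None:
        self._items: dict[str, Material] = {}

    def add(self, item: Material) -> None:
        # Reject file types the catalogue cannot manage yet.
        if item.extension not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"unsupported format: {item.extension}")
        self._items[item.title] = item

    def annotate(self, title: str, note: str) -> None:
        self._items[title].annotations.append(note)

    def associate(self, a: str, b: str) -> None:
        # Associations are stored in both directions.
        self._items[a].related.add(b)
        self._items[b].related.add(a)

    def query(self, category: str | None = None, keyword: str = "") -> list[Material]:
        # Filter by category (if given) and case-insensitive title keyword.
        return [
            m for m in self._items.values()
            if (category is None or m.category == category)
            and keyword.lower() in m.title.lower()
        ]
```

Storing associations symmetrically mirrors the "data association" function: an archive document and a related map can each be reached from the other.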

5.2 The Relationship Between Qing Government Finance and the Yellow River Management

Yellow River management was one of the main political affairs of the Qing Dynasty. The empire and its emperors needed the Yellow River to remain calm so that the people along the river could feel the emperors' greatness. However, these political affairs required financial power to pay for huge engineering works (Pan et al. 2020). With DHY, a large number of files, archives, and old maps can be used as research materials. The text-analysis software Voyant can be used to analyze the structural characteristics of historical documents. The analysis showed that silver supply became the most important theme from 1740 AD onward: Yellow River management turned into a financial problem around the 1740s and remained one until the collapse of the Qing Dynasty. Government officials had to provide large amounts of silver for the projects and engineering works along the river. Figure 5.1 shows the amount of silver spent in different counties along the lower reaches of the Yellow River in the 1750s. This is only one example; in fact, throughout the Qing Dynasty, the distribution of silver spent on the Yellow River changed every year.

5 Digital Historical Yellow River

Fig. 5.1 The silver distribution over the Yellow River in 1750s

To keep Yellow River affairs under the control of the central government, the general structure of Yellow River management had been settled, especially after the maintenance work by Jin Fu in the Kangxi period. A stable financial system became increasingly important for the government to carry out Yellow River maintenance (Pan 2019). The quota river maintenance fund system developed by the Qing Dynasty should not be interpreted as merely fixing the amount of the river maintenance fund; it also fixed the fund's sources, managing departments, expenditure items, and so on. However, the central government's quota design proved too difficult to realize and was never fully implemented in river maintenance. The Qing government attempted to control the cost of river maintenance with the quota management system, but actual spending remained hard to restrain. The repeated discussions of a supplementary fund from the Qianlong to the Jiaqing reign reveal that the purpose of the quota management system was hard to achieve. In practice, the quota system was continually adjusted, and even the source of the fund was not stable. The quota management system envisioned by the Qing government was built on a fiscal system based on the agricultural economy, which made it nearly impossible to keep stable, while the supplementary fund in the Qianlong and Jiaqing reigns relied on an inchoate financial system, such as the salt administration in Henan province and civil exchangers in Shandong province. Studying the quota management system reveals many inherent conflicts. To show its spatial features more clearly, thematic cartography was applied to visualize the collection of the quota river maintenance fund in Henan and Shandong. County-level administrative region data for 1820 from CHGIS were used. Using ArcGIS and DHY, the records of the quota river maintenance fund in this report were converted into geographic data that are easier to understand and analyze (Man 2002).
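The Voyant-based finding that silver became the dominant theme around the 1740s rests on tracking how often candidate terms appear over time. The following Python sketch shows that general idea under assumptions: documents are `(year, text)` pairs, terms are matched as plain character substrings (a simplification for unsegmented classical Chinese), and the function names are hypothetical. It is not Voyant's own algorithm.

```python
from collections import defaultdict


def decade_of(year: int) -> int:
    return year - year % 10


def term_rate_by_decade(docs, term):
    """Occurrences of `term` per 1,000 characters, grouped by decade.

    `docs` is an iterable of (year, text) pairs; matching is a plain
    substring count, a simplification for unsegmented classical Chinese.
    """
    hits = defaultdict(int)
    chars = defaultdict(int)
    for year, text in docs:
        d = decade_of(year)
        hits[d] += text.count(term)
        chars[d] += len(text)
    return {d: 1000 * hits[d] / chars[d] for d in sorted(chars) if chars[d]}


def dominant_term_by_decade(docs, terms):
    """Most frequent candidate term per decade (ties go to the earlier term)."""
    counts = defaultdict(lambda: dict.fromkeys(terms, 0))
    for year, text in docs:
        for t in terms:
            counts[decade_of(year)][t] += text.count(t)
    return {d: max(terms, key=lambda t: counts[d][t]) for d in sorted(counts)}
```

Normalizing per 1,000 characters keeps decades with more surviving documents from dominating the trend simply because they contain more text.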


Fewer than 50% of the counties in Henan fulfilled their fund collection tasks, and in Shandong the figure was only about 65% (Pan 2014). This shows that the collection of the quota river maintenance fund was far from satisfactory, and that the land tax with the poll tax apportioned into it could not reliably cover the annual quota cost of Eastern River maintenance. The quota river maintenance fund system was unstable from the collection stage onward, which relates not only to the uneven distribution of quota collection, borne by a minority of counties, but also to the spatial pattern of the system. For example, the zhou and counties that undertook larger quota tasks in Shandong were concentrated in the angular zone bounded by the Yellow River, Henan, and Hebei. This area lies among the Grand Canal, the Nansi Lakes, and the Yellow River, and because of its low, flat terrain, floods occurred frequently. The consequences were difficulties in fund collection and arrears during collection, and collecting the quota river maintenance fund became increasingly difficult (Zhang 2018). Based on DHY, we compiled statistics of extreme disasters in Anyang, Luoyang, Zhengzhou, Nanyang, and Xinyang in Henan province during the Qianlong reign, listing the frequency of grade 1 (the most waterlogged) and grade 5 (the driest) on the drought-flood grade scale. Figure 5.2 clearly reveals that the regions with heavier disasters undertook more responsibility for river maintenance fund collection. To measure the degree of contradiction between quota river maintenance fund collection and disasters, the concept of a contradiction index is introduced. The ideal arrangement is considered to be one in which the region suffering the most severe disasters bears the least fund collection responsibility and, conversely, the region suffering the least severe disasters bears the most.
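The chapter does not give a formula for its contradiction index. As a hypothetical stand-in, the sketch below counts extreme drought-flood grades (1 or 5) per region and then computes Spearman's rank correlation between disaster frequency and quota share: a value near +1 means the hardest-hit regions carried the heaviest burden (the contradiction described above), while a value near -1 matches the ideal arrangement. The function names are illustrative, and any region names or numbers fed to it should be treated as examples, not results.

```python
from __future__ import annotations


def count_extremes(grades_by_region: dict[str, list[int]]) -> dict[str, int]:
    """Number of years graded 1 (severe flood) or 5 (severe drought) per region."""
    return {r: sum(g in (1, 5) for g in gs) for r, gs in grades_by_region.items()}


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # undefined if either input is constant


def contradiction_index(extremes: dict[str, int], quota_share: dict[str, float]) -> float:
    """Spearman's rho (Pearson on ranks) between disaster frequency and quota share."""
    regions = sorted(extremes.keys() & quota_share.keys())
    x = average_ranks([extremes[r] for r in regions])
    y = average_ranks([quota_share[r] for r in regions])
    return pearson(x, y)
```

A rank-based measure suits the drought-flood grades, which are ordinal categories rather than physical quantities.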

Fig. 5.2 The relationship between disasters and hydrological quota funding during 1736–1795AD (Qianlong Emperor period)


In the drought-flood distribution maps of the atlas, each site represents one or two administrative regions (one or two prefectures in the historical period); that is, each dot stands for a location and its surrounding zone, and its data can be taken to describe conditions in that location and its surroundings. There was a serious conflict between the growing projected expenditure on Yellow River maintenance and the counties' ability to pay. Some riverside counties, which were more vulnerable to disasters, undertook higher quotas of fund collection, especially those responsible for the dyke works on the south bank from Yingze to Yucheng and on the north bank from Wuling to Kaocheng in Henan. Yet these were precisely the areas most frequently hit by Yellow River floods along the 350 km reach from Wuling to Xuzhou city, where more than 140 levee failures occurred between the early Qing Dynasty and the levee failure at Tongwaxiang (Pan 2014). The counties with heavier disasters bore heavier financial burdens, so the distribution of quota fund collection was seriously mismatched with the distribution of disasters. To some extent, this framework threatened the stability and effective operation of the Qing government's quota river maintenance fund system. Riverside counties were usually under greater threat from drought and flood on the Yellow River, especially flood. As the number of Yellow River maintenance projects and the projected expenditure on river works grew, river affairs management also became increasingly difficult (Piao et al. 2010). Yellow River maintenance expenditure increased in the late Qianlong reign, but the existing quota river maintenance fund system could neither be implemented as intended nor meet the requirements.
The quota river maintenance fund system in Henan was restructured from the middle and late Qianlong reign through the Jiaqing and Daoguang reigns: the supplementary fund appeared in the middle of the Qianlong reign, was abolished in its late years, and was proposed again early in the Jiaqing reign, while fundraising with interest-bearing deposits was proposed in the middle-to-late Jiaqing reign. The supplementary fund was first collected by counties, later donated by officials from their yanglian ("nourishing honesty") salary supplements, and finally stabilized by funds accumulated through fundraising and interest-bearing. Based on DHY and GIS, we conclude that the collection of the quota river maintenance fund relied to a great extent on a minority of counties, while the other counties bore only a small proportion. More importantly, the counties with heavier disasters bore heavier financial burdens. These spatial features directly affected the timely and full payment of the quota river maintenance fund. Because of these institutional problems, the quota river maintenance fund system lacked sustainability, and river maintenance projects faced increasingly severe challenges amid rising material prices and the shift from dispatched (corvée) labor to hired labor.
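The finding that collection "relied on a minority of counties" can be quantified as the smallest set of counties whose quotas together reach a given share of the total. A minimal Python sketch, assuming a dictionary of per-county quota amounts and a hypothetical 80% threshold; the function names are illustrative, not part of DHY.

```python
from __future__ import annotations


def counties_covering(quotas: dict[str, float], share: float = 0.8) -> list[str]:
    """Smallest list of counties (largest quotas first) reaching `share` of the total."""
    total = sum(quotas.values())
    chosen: list[str] = []
    running = 0.0
    for county, q in sorted(quotas.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(county)
        running += q
        if running >= share * total:
            break
    return chosen


def minority_share(quotas: dict[str, float], share: float = 0.8) -> float:
    """Fraction of all counties needed to carry `share` of the total quota."""
    return len(counties_covering(quotas, share)) / len(quotas)
```

A small `minority_share` value indicates that the burden is concentrated in a few counties; mapping the returned counties alongside disaster frequency would show whether the two patterns overlap.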


5.3 The Collapse of the Yellow River Finance in 1820–1840

5.3.1 The Changing of the Hydrological Environment Over the Yellow River in 1820–1840

In the early 18th century, the Qing government began to set up water level observation stations on rivers in China. The Wanjintan station on the Yellow River, north of Laoxiancheng, Shanxian county, Sanmenxia city, Henan province, is an important data source for monitoring the water conditions of the middle reaches of the Yellow River. Similarly, there were observation stations on the Qinhe River at Muluandian in Wuzhi County and on the Yongding River at Shijingshan-Lugouqiao. According to Qing rules, whenever the water level rose by 2 chi (a Chinese unit of length; 1 chi ≈ 0.32 m, so 2 chi ≈ 0.64 m) or more, the date and height had to be reported to the imperial government (Zhuang and Pan 2016). These reports are now scattered across several sources, including 'Extracts of the water condition of historical floods in Qing Dynasty at Wanjintan and Xiakou on the Yellow River; Muluandian on the Qinhe River; and Gongxian on the Yiluo River', compiled by the Yellow River Conservancy Commission in the 1980s as internal documents; the observation sites are shown in Fig. 5.3 (Liu et al. 2012; Wei et al. 2013). On average across the reconstruction, the flood seasons of both the Yellow River and the Qinhe River begin between July 6 and July 10, while the flood season of the Yongding River begins a little later, between July 16 and July 20.

Fig. 5.3 The Water Level spots in the basin of the Yellow River during 1766–1911AD

Here, we reconstruct the chronology of the onset of the flood season of the three rivers on a pentad (5-day) scale (Mantua et al. 1997). The flood season and its fluctuations at Sanmenxia on the Yellow River during 1766–1911 AD, on the Qinhe River during 1761–1911 AD, and on the Yongding River during 1736–1911 AD were reconstructed from the Qing water level observation reports. The 5-point smoothed chronologies show that the onset of the flood season advanced during the 1820s–1860s and was delayed during the 1870s–1880s, which correlates negatively with temperature change on the Loess Plateau. This phenomenon is especially apparent in the 1880s (Zheng et al. 2005). Using regression models based on the Qing water-level stake records at Sanmenxia in the Upper-Middle Yellow River (UMYR), the study inverts the annual flood-season runoff for 1766–1911 AD and builds and improves the annual flood-season runoff series for 1766–2000 AD at Lanzhou, Qingtongxia, and Sanmenxia. Combined with the annual runoff for 1766–1911 AD at Tangnaihai Station in the headwater reach, this yields runoff series for four stations in the headwaters and the UMYR, currently the clearest runoff curve of the Yellow River derived from historical records. The results indicate that the severe "river disasters" in the lower Yellow River in the mid-19th century were caused by abrupt changes in runoff in the Qingtongxia-Sanmenxia section. The drought period of the 1920s extended from the headwaters to the middle reaches, but it was not caused by abrupt changes. The study also reveals that the Pacific Decadal Oscillation (PDO) and UMYR runoff had a periodic antiphase relationship on the interdecadal scale. In the early and mid-20th century, runoff at the four stations was in antiphase with the PDO on the 8–16-year scale, and in the 1830s–1850s the antiphase relationship between the PDO and flow on the 4–6-year scale was more pronounced in the Lanzhou-Sanmenxia section.
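Several computations underlie the reconstruction above: converting reported rises in chi to metres, expressing flood-season onset dates as pentads, smoothing the chronology with a 5-point filter, and calibrating a regression between water-level records and runoff before applying it to Qing-era records. The sketch below assumes the common Chinese convention of six pentads per month (the sixth absorbing days 26 to month end), which is consistent with the July 6–10 and July 16–20 windows in the text; a centered 5-point moving average; and an ordinary least-squares line with a single predictor. The chapter's actual regression models may differ, and all function names are hypothetical.

```python
from __future__ import annotations

from datetime import date

CHI_IN_METRES = 0.32  # approximate Qing chi, as given in the text


def chi_to_metres(chi: float) -> float:
    """Convert a water-level rise in chi to metres."""
    return chi * CHI_IN_METRES


def pentad_of_year(d: date) -> int:
    """Pentad index 1-72 using six pentads per month.

    Days 1-5, 6-10, 11-15, 16-20 and 21-25 form pentads 1-5 of the month;
    day 26 to the month's end forms pentad 6.
    """
    return (d.month - 1) * 6 + min((d.day - 1) // 5, 5) + 1


def smooth5(series: list[float]) -> list[float | None]:
    """Centered 5-point moving average; the first and last two points stay None."""
    out: list[float | None] = [None] * len(series)
    for i in range(2, len(series) - 2):
        out[i] = sum(series[i - 2:i + 3]) / 5
    return out


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def reconstruct(proxy: list[float], slope: float, intercept: float) -> list[float]:
    """Apply a calibrated line to historical proxy values (e.g. stake readings)."""
    return [slope * p + intercept for p in proxy]
```

In use, `fit_line` would be calibrated on years in which both stake readings and gauged runoff exist, and `reconstruct` then applied to the earlier readings alone.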
Cross-wavelet analysis shows a significant inverse correlation between the PDO and the amount of water in the UMYR on the 8–16-year scale, but only in the Sanmenxia-Lanzhou section, suggesting that the relationship between summer rainfall in the UMYR and the PDO had clear temporal and spatial differences. (1) During the Qing Dynasty, runoff changes in the UMYR differed markedly between reaches: under natural conditions, flow changes in the UMYR showed no obvious consistency, and the abrupt change points did not occur synchronously. Over the long term, UMYR runoff followed a distinctive pattern; the simultaneous reduction of flow in every reach since the 1970s is unusual, being at least the only such case within the period examined in this study. (2) The correlation between the PDO and UMYR runoff is periodic, with no obvious linear relationship, while regional differences are pronounced. The inverse correlation between the PDO and runoff in the study reaches appears mainly on the decadal scale, and the Lanzhou-Sanmenxia section is relatively sensitive to decadal PDO changes. When formulating a water resources strategy for the Yellow River, the differing responses of different sections to the same environmental background should be taken into account. (3) In the mid-19th century, many large-scale floods in the lower reaches resulted from a sudden increase in runoff in the middle reaches. During the reign of the Daoguang Emperor in the mid-19th century, the Qing Dynasty declined rapidly.


During this period, large-scale floods occurred in many parts of eastern China, especially in the populous North China Plain and the Taihu Basin, causing huge financial and social losses. Eastern Henan on the North China Plain, for example, was flooded in successive years by Yellow River breaches in the 1840s, and the central government spent enormous sums on the river, which greatly aggravated the financial difficulties of the period. These large-scale floods in the lower Yellow River coincide with the period of abrupt runoff change in the Sanmenxia section revealed by this research, indicating a sudden increase in rainfall over the Loess Plateau. Climate change was thus deeply involved in China's decline and depression during the reign of the Daoguang Emperor. (4) Although some progress has been made in reconstructing multi-site, long-term runoff series of the Yellow River from different materials, further work on data analysis is needed to clarify the uncertainty of these series (Shi et al. 1990), so that the data can be integrated in the future and provide a basis for further research on long-term spatial and temporal changes in Yellow River runoff. The cold period of the Northern Hemisphere is not consistent with the warm summer temperatures on the Loess Plateau, nor with the advance of the flood season revealed in this research. The results show that the onset of the flood season of rivers in this area is related to temperature fluctuations in the Northern Hemisphere, but is likely even more closely correlated with summer temperature changes on the Loess Plateau. This phenomenon could be a multiyear response to changes in the intensity of the East Asian monsoon. The maximum Yellow River runoff of the past 300 years occurred in the 1820s–1840s, and the flood season began earliest during this runoff peak. The abrupt climatic change of the 1820s–1840s led to hydrological variation over the Yellow River.

5.3.2 The Hydrological Challenge of Daoguang Period

The Yellow River finance system was already malfunctioning before the Daoguang period (1820s–1840s). The huge amount of silver needed every year put heavy financial pressure on the Qing government, and during the Daoguang period the silver held by the imperial treasury dwindled. Given the structural flaws of Yellow River management finance discussed above, collecting silver became increasingly difficult as flood disasters worsened from the 1810s onward, and the silver could not be raised from the towns, villages, and cities along the Yellow River. First, the Hedaoku (河道库, river works treasury) originally held a certain amount of deposits for emergencies. During the Daoguang period, however, these deposits ran short. On June 18 in the seventh year of Daoguang, Yan Liang, the Governor of Hedong River Road, mentioned in a memorial that a certain amount of deposits in the Kailuan and Hebei Hedaoku (河道库) could be used for emergency purposes. However, the reserves of these two Hedaoku (河道库) had been significantly reduced in recent years. This shows that the reserves of the quota river funds were inadequate in the Daoguang period. Although the emperor repeatedly warned the ministers of river affairs that state funds had their own management system, this could not guarantee a stable supply of river funds, and the quota system was difficult to sustain at this time. In the eleventh year of Daoguang, the Bangjiayin (帮价银) was reduced from 300,000 liang to 250,000 liang. However, this limit was not seriously enforced, and river officials often circumvented it when budgeting: the river funds used to purchase materials in Henan province far exceeded 300,000 liang. This is an important manifestation of the non-binding nature of the quota system. Most importantly, changes can be seen in the way river officials applied for silver. As mentioned above, Yan Liang asked for an extra two thousand duo (stacks) of straw to meet material needs during 1820–1836 AD, and the emperor approved the application. The practice of citing precedents to request additional river funds persisted throughout the Daoguang period. It is important to note that the governor of Henan province and the emperor had discussed "increasing the river funds" several times at the turn of the Qianlong and Jiaqing reigns, and a new quota was finally set. Although the scale of the river funds was expanded, the quota system itself was retained and could still restrict the growth of river-related expenditure. During the Daoguang period, however, discussions between the emperor and his ministers about increasing Yellow River expenditure were no longer conducted within the framework of the quota system. This was closely related to the Daoguang Emperor's attitude toward the quota system.
In the early years of his reign, the Daoguang Emperor resolutely kept the quota unchanged and did not allow the river funds to be increased. On September 11 in the seventh year of Daoguang (1827 AD), the Daoguang Emperor issued an edict to Cheng Zuluo, the governor of Henan province, whose general meaning was as follows: I have reviewed the documents exchanged between the governor of Henan province and the central government in the 57th year of Qianlong (1792 AD) on increasing the river funds (Ren et al. 2008). The Qianlong Emperor clearly opposed increasing the river funds. Such an increase could only be a temporary measure under special circumstances, and it must not be mistaken for a new quota standard set by the central government. Yan Liang's request for an extra two thousand duo of straw is an arbitrary increase in materials that affects the people's daily lives (Verdon and Franks 2006). In short, the quota river funds system, which was closely tied to the Diding tax, had been operating poorly since the late Qianlong period. During the Jiaqing period, a new quota was set, and interest income was also used to make up arrears in the river funds. However, river affairs expenditure was enormous, requiring millions of taels of silver each year even in ordinary years, and a financial system still in its infancy could not afford such amounts (Lingling et al. 2007; Xiaohua et al. 2010). By the Daoguang period, the quota system had become ineffective, mainly because the supply of the fixed river funds depended increasingly on items outside the state's regular fiscal system, such as donations (捐纳). In the 1840s, the hydrological environment of the Yellow River changed abruptly, causing major disasters in the south of Henan Province. The temporary large-scale engineering works undertaken to deal with these disasters were not subject to quota control, which led to further unrestricted expenses. From the perspective of how the quota system operated, however, even before the 1840s the quota system of river funds had already broken down and could no longer play its role of "limitation".

5.4 Conclusion

The reconstructed flood-season runoff of the upper and middle reaches of the Yellow River from 1766 to 2000 AD clearly shows the changes in the Yellow River's water environment during the Qing Dynasty. On this basis, the direct cause of the successive floods in eastern Henan in the 1840s was the abrupt change of runoff in the Qingtongxia-Sanmenxia reach, when the lower reaches of the Yellow River were at their highest water level since the 30th year of Qianlong (1765). In this context, a detailed study of the Qing Dynasty's management of river works and river funds shows that river works expenditure in the Daoguang period increased further beyond the rapid expansion of the Qianlong and Jiaqing reigns, yet its efficiency did not improve significantly, and floods in the lower Yellow River were more frequent than in previous reigns. The sudden change in the water environment only triggered the river administration problems of the Daoguang period; the essential cause was the change in how river works funds were managed. The quota-based management of the river works treasury had been shaken since the late Jiaqing period. During the Daoguang period, river affairs became the heaviest financial burden of the Qing Dynasty. Faced with high river works expenditure, the Daoguang Emperor repeatedly stressed that it had to be controlled, but he never raised the quota standard for river affairs. River officials kept increasing actual expenditure by selectively citing precedents from the Qianlong and Jiaqing reigns. By the Daoguang period, the quota system of river works management had lost its ability to restrain the rapid growth of expenditure.

References

Chen, G. (2014). The study for DH and HGIS. Social Sciences in Nanjing, 3, 136–142.
Lingling, K., Yuexian, N., & Jinhua, W. (2007). Rebuilding the natural runoff series in the nearly 500 years at the Lanzhou Station in upstream of Yellow River. Journal of Water Resources and Water Engineering, 18(4), 5–8.
Liu, F., Chen, S., & Dong, P. (2012). Spatial and temporal variability of water discharge in the Yellow River Basin over the past 60 years. Journal of Geographical Sciences, 22(6), 1013–1033.
Man, Z. (2002). Entered into digital era: Methods and conceptions of GIS. Historical Geography, 18, 12–22.
Mantua, N. J., Hare, S. R., & Zhang, Y. (1997). A Pacific interdecadal climate oscillation with impacts on salmon production. Bulletin of the American Meteorological Society, 78(6), 1069–1079.
Pan, W. (2014). A preliminary study for the Yellow River finance during late Qianlong reign in Shandong. Journal of Chinese Historical Geography, 29(4), 5–12.
Pan, W. (2019). The creation of annual renovation of flood-prevention work of the Yellow River in Shunzhi reign. Shanghai Academy of Social Science, 180, 77–87.
Piao, S., Ciais, P., & Huang, Y. (2010). The impacts of climate change on water resources and agriculture in China. Nature, 467, 43–51.
Ren, G.-y., Jiang, T., & Li, W.-j. (2008). An integrated assessment of climate change impacts on China's water resources. Advances in Water Science, 19(6), 772–779.
Schreibman, S., Siemens, R., & Unsworth, J. (Eds.). (2004). A companion to digital humanities. Blackwell Publishing.
Shi, F.-c., et al. (1990). The reconstruction of run-offs in Qingtongxia Gorge during 1736–1912 AD. Yellow River, 4, 27–29.
Tu, Z. (2014). The peak of data science. Science Press.
Verdon, D. C., & Franks, S. W. (2006). Long-term behavior of ENSO: Interaction with the PDO over the past 400 years inferred from paleo-climate records. Geophysical Research Letters, 33, L06712.
Wei, P., Tao, S., & Zhi-min, M. (2012). The review of GIS entered into Chinese historical geography since 2000 and outlook. Journal of Chinese Historical Geography, 27(1), 11–18.
Wei, P., Jing-yun, Z., & Ling-bo, X. (2013). The relationship of nature run-off changes in flood season of middle Yellow River & Yongding River, 1766–2004. Acta Geographica Sinica, 68(7), 975–982.
Wei, P., Zhe, W., & Zhi-min, M. (2020). The achievement of historical GIS since 1990. Journal of Chinese Historical Geography, 35(1), 25–35.
Xiaohua, G., Yang, D., & Fahu, C. (2010). The reconstruction based on tree-rings and analysis of runoffs in the upper reaches of the Yellow River during the past 1234 years. Chinese Science Bulletin, 55, 3236–3243.
Zhang, P. (2018). The application of the geographic information system in the study of Chinese history. Historiography Quarterly, 2, 35–47.
Zheng, J., Hao, Z., & Ge, Q. (2005). The changes of precipitation over the middle and lower reaches of the Yellow River during the past 300 years. Science in China: Series D, 35(8), 765–774.
Zhuang, H., & Pan, W. (2016). The research of the flood-height recording by Qing government: Based on Wanjintan, Henan. The Qing History Journal, 2, 87–99.

Chapter 6

Visualizing Classic Chinese Literature

Yongming Xu

In the eyes of data scientists, classical Chinese literature, including both original texts and research outcomes, has great potential for analytics. Big data of this sort can be organized into databases of various types, and some of it can be represented visually. The author is not a database or computing expert; yet during his scholarly visits to Western universities, he has seen scholars and graduate students alike visualize classical Chinese literature with relevant software and databases, producing results that are clear, refreshing, and new. The author believes these databases and visualization methods can be employed in the study and teaching of classical Chinese literature, furthering its development in the big data era. He therefore presents here several relevant databases and software tools, together with their operating procedures, using as a case study Tang Xianzu, a famous playwright of the Ming Dynasty, in the hope of illustrating such geospatial humanities procedures for non-technical readers.

6.1 Visualization of Writers’ Trajectory and Activity Distribution

ArcGIS is a product of Esri. It is powerful analytical software that can be used to map anything geographic or spatial. Harvard University has licensed ArcGIS products so that its faculty and students can install and use the software on campus. In China, however, very few universities or research institutes hold a license, so the use of ArcGIS in China remains limited.

Y. Xu, School of Humanities, Zhejiang University, Hangzhou 310028, China
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_6


QGIS, originally short for “Quantum GIS”, is open-source geographic information system software developed by the QGIS Development Team. Users can download the latest version free of charge from the project website (http://www.qgis.org). Like ArcGIS, it is analytical mapping software for geographic and spatial data.

The China Historical Geographic Information System (CHGIS) is a project led by Prof. Peter K. Bol of the Department of East Asian Languages and Civilizations at Harvard University, with Lex Berman as project manager. It is an open-source Chinese historical GIS website.¹ In cooperation with Fudan University’s Center for Historical Geography, the CHGIS project vectorizes Chinese historical place-names and maps, and records the hierarchy and evolution of place-names in relational databases. Any Chinese historical place-name recorded in CHGIS can therefore be represented visually. The website provides geographic coordinates for Chinese historical place-names. However, only the vector historical map of the Qing dynasty can be downloaded; for the Ming and earlier dynasties, the coordinates of some place-names can only be looked up, and no vectorized maps of administrative units are available.

¹ Its address is: http://www.fas.harvard.edu/~chgis/.

CartoDB is a cloud-based geospatial database. Users can upload longitude and latitude data in batches to the CartoDB website and quickly create map-based visualizations, which can be saved online or published for public access. It is also an open-source website.

WorldMap is a platform for publishing and sharing the results of geographic information research worldwide; it was developed by Harvard University’s Center for Geographic Analysis. Its China component contains geographic information and maps on numerous topics, such as demographics, religion, transportation, urban studies, ethnic minorities and languages, energy, environment, education, climate, public health, economy, and history. In the literature-related area, for example, there are imperial examination distribution maps for the Song, Yuan, Ming, and Qing dynasties, as well as maps of the courier-station routes of the Ming and Qing dynasties.

After this brief introduction to geographic information systems and mapping software, the author takes Tang Xianzu as an example and uses QGIS to map his trajectory and the distribution of his activities. First, consider the finished map (Fig. 6.1), in which the place-names in red indicate where Tang traveled and was active.

Fig. 6.1 Tang Xianzu’s trajectory and activity map 1

How is such a map produced? The steps are as follows (Sheet 6.1):

(1) Install the QGIS software.
(2) Compile Tang’s trajectory and activity locations (based on A Chronological Biography of Tang Xianzu, written by Mr. Xu Shuofang).
(3) Look up the longitude and latitude of each location. This step uses CHGIS, the China Historical Geographic Information System website (http://www.fas.harvard.edu/~chgis/). Users can also use the search interface developed by Lex Berman, the CHGIS project manager, at http://maps.cga.harvard.edu/tgaz/, and copy the coordinates they find into an Excel worksheet with the fields name, X, and Y. Note that because Tang Xianzu lived in the Ming dynasty, place-names should be looked up under Ming administrative units, since the locations corresponding to some place-names changed over the course of history. The coordinates I found for Tang’s trajectory and activities are shown below.

Sheet 6.1 Tang Xianzu’s trajectory and activities (partial)

ID    Place     Place2         X (longitude)   Y (latitude)
1     臨川      Lin Chuan      116.3513        27.98478
21    南昌      Nan Chang      115.8977        28.6749
22    新建      Xin Jian       115.8977        28.6749
23    臨川      Lin Chuan      116.3513        27.98478
24    滕州      Teng Zhou      117.0657        35.06738
25    宣城      Xuan Cheng     118.7425        30.94694
26    順天府    Shuntian Fu    116.368         39.93143

(4) Save the Excel worksheet as a CSV file and load it into QGIS. To do so, click the large comma icon (the ‘Add Delimited Text Layer’ button) on the left of the QGIS window.
(5) Click OK, enter ‘Xian 1980’ in the filter bar of the coordinate reference system dialog, and double-click ‘Xian 1980’ in the list below.
(6) Go to the CHGIS website (http://www.fas.harvard.edu/~chgis/) and download ‘v4_citas90_cnty_pgn_utf_stats’. The path is: DATA > China Historical GIS > Version 4 Datasets (with descriptions) > CITAS-1990-Counties (polygons) > Data Archive > 1990 CITAS Counties (With Stats, UTF-8) > Dataset. Decompress the downloaded archive, return to the QGIS interface, click the icon on the left, and load the file with the ‘.shp’ suffix from the decompressed v4_citas90_cnty_pgn_utf_stats folder. Then drag the CSV layer above the ‘.shp’ layer.
(7) Open the properties of the CSV layer, tick ‘Label this layer with’ under ‘Labels’, choose ‘name’ from the drop-down list, and set the label color and font size below.
(8) Add a Google or Bing base map through the map link above the QGIS menu. The path is: Plugins > Manage and Install Plugins > OpenLayers Plugin, then Web > OpenLayers plugin > Google Maps > Google Physical.
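Steps (3) and (4) amount to producing a plain CSV file with one row per place and the fields name, X, and Y. The Python sketch below shows one way to write such a file from a few rows of Sheet 6.1. The function name `to_point_csv` and the coordinate range check are illustrative assumptions, not part of the QGIS or CartoDB workflow itself.

```python
import csv
import io

# A few rows from Sheet 6.1: (romanized name, X = longitude, Y = latitude).
# The header "name,X,Y" follows the worksheet layout described in step (3).
PLACES = [
    ("Lin Chuan", 116.3513, 27.98478),
    ("Nan Chang", 115.8977, 28.6749),
    ("Teng Zhou", 117.0657, 35.06738),
    ("Xuan Cheng", 118.7425, 30.94694),
    ("Shuntian Fu", 116.368, 39.93143),
]


def to_point_csv(rows):
    """Return CSV text with a name,X,Y header, rejecting impossible coordinates."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "X", "Y"])
    for name, x, y in rows:
        # Longitude must lie in [-180, 180] and latitude in [-90, 90].
        # For places in China (longitude > 90) this also catches swapped X/Y.
        if not (-180 <= x <= 180 and -90 <= y <= 90):
            raise ValueError(f"bad coordinates for {name}: {x}, {y}")
        writer.writerow([name, x, y])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_point_csv(PLACES), end="")
```

Saving the returned text to a UTF-8 file (for example, a hypothetical `tang_xianzu_places.csv`) gives a table that can be loaded in step (4) or uploaded to CartoDB.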

If a satellite map is used as the base map, the result looks different (Fig. 6.2).

Fig. 6.2 Tang Xianzu’s trajectory and activity map 2

Besides QGIS, mappers can use the CartoDB website to map a writer’s trajectory and activities free of charge. The steps are:

(1) Register at https://cartodb.com/.
(2) After logging in, click the red button, choose ‘Your dashboard’, and then choose ‘New map’.
(3) Click ‘Connect dataset’ and upload the Excel worksheet with the fields name, X, and Y.
(4) In the data view, click the ‘the_geom’ (GEO) column and choose the X and Y coordinate columns. The map view is then ready for preview, and parameters can be set in the options panel on the right.
(5) Save the finished map online, publish it, or save it to a local computer.

Below is a section of a map made with CartoDB (Fig. 6.3).

Fig. 6.3 Tang Xianzu’s trajectory and activity map 3

6.2 Visualization of the Geographical Distribution of Writers’ Social Relations with CBDB and the Aforementioned GIS Software

CBDB, the China Biographical Database project (http://isites.harvard.edu/icb/icb.do?keyword=k16229), is led by Prof. Peter K. Bol of the Department of East Asian Languages and Civilizations at Harvard University, in collaboration with the Center for Research on Ancient Chinese History at Peking University and the Institute of History and Philology of Academia Sinica. CBDB is by far the most comprehensive database of Chinese biographical materials and analyses, recording as many as 360,000 individuals across the dynasties; nearly 500,000 individuals from Chinese local gazetteers are also covered. With this database, one can look up an individual’s basic biographical information, such as birth year, nickname and alternate names, affiliation, and imperial examination results, as well as his or her kinship and social relations. Coordinates for historical place-names, such as affiliations, are also available. The database is free of charge; users can search it online or download it to a local computer. For instance, to ascertain Tang Xianzu’s kinship and social relations, we can search for them in CBDB. The figure below shows the offline search interface of CBDB (Fig. 6.4).

Fig. 6.4 The Access version of CBDB 1

Searching an individual’s social network returns relations of various types. ‘Academic’ relations, for example, cover teacher/student relationships, academic exchanges, subject appropriation, academic committees, academic patronage, literary and artistic exchanges, and academic attacks. ‘Political’ relations include equality in officialdom, superior/subordinate relations in officialdom, official support, recommendations, and political confrontation. These relations were captured by computer from massive amounts of text on the basis of predetermined relation keywords, so some of the data may be invaluable, revealing connections a human reader would be unlikely to notice. However, computer-captured data may sometimes misrepresent an individual’s actual social network. Suppose A’s anthology circulates to place B, where person C reads it and then comments on it in his own writings. The computer would naturally capture an A/C relationship. While this relationship exists in some sense, there may have been no actual interaction between the two. Hence, not all social relations found in a search are real-life relations, and users need to distinguish among the search results. The best solution is to combine the search results with the author’s chronological biography to single out the closer and more significant relations that involved actual interaction. Below is the social relation search interface of CBDB (Fig. 6.5).

Fig. 6.5 The Access version of CBDB 2
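The screening step recommended above (keep relations corroborated by the chronological biography, set the rest aside) can be sketched in Python as follows. The `Relation` record, its fields, and the rule that a name appearing in the chronology counts as evidence of interaction are assumptions for illustration; CBDB’s own export format is not reproduced here, and the names in the example are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    person: str    # associate returned by the CBDB search
    category: str  # e.g. "academic" or "political"
    kind: str      # hypothetical free-text relation label


def screen_relations(relations, chronology_names):
    """Split relations into (confirmed, to_review).

    A relation is 'confirmed' when the associate is also named in the
    chronological biography (a rough, assumed proxy for real interaction).
    Relations found only through text mining, such as C commenting on
    A's anthology, are kept for manual review rather than discarded.
    """
    confirmed, to_review = [], []
    for rel in relations:
        target = confirmed if rel.person in chronology_names else to_review
        target.append(rel)
    return confirmed, to_review


if __name__ == "__main__":
    found = [
        Relation("Person C", "literary", "commented on anthology"),
        Relation("Person D", "academic", "teacher/student"),
    ]
    ok, review = screen_relations(found, {"Person D"})
    print([r.person for r in ok], [r.person for r in review])
```

Keeping the unconfirmed relations in a separate list, rather than deleting them, leaves the final judgment to the researcher, which matches the text’s caution that computer-captured links are not automatically false.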

The table below shows Tang Xianzu’s social associations, obtained by combining CBDB search results with A Chronological Biography of Tang Xianzu, written by Mr. Xu Shuofang (徐朔方). Some of the longitude (X) and latitude (Y) values were generated automatically by CBDB; others are my additions based on CHGIS search results (Sheet 6.2). With these georeferenced data, maps of the geographical distribution of social relations can readily be made with software and websites such as ArcGIS, QGIS, and CartoDB. Since the method is similar to that for the trajectory and activity map, the procedure is not repeated here.

6.3 The Point-Line Visualization of Social Relations with Databases and Software Such as CBDB and GEPHI

After some editing, the social relations data acquired from CBDB can be visualized with GEPHI, another free, open-source network analysis program. GEPHI needs a Java 1.7 runtime environment, so Java must be installed on the computer beforehand. Two tables are needed for GEPHI to display an individual’s social relations: ‘Nodes’ and ‘Edges’. ‘Nodes’ contains two fields, ID and Label, while ‘Edges’ contains Source and Target, which record the links between individuals and so express one-to-many relationships. The ID connects the two tables: each Source or Target value refers to an ID in ‘Nodes’. Taking Tang Xianzu as an example, ‘Nodes’ and ‘Edges’ look like this (Fig. 6.6):


Sheet 6.2 Tang Xianzu’s social relations (partial)

Nodes
Id    Label
1     Tang Xianzu
2     Chen Yubi
3     Dai Xun
4     Feng Mengzhen
5     Gu Xiancheng
6     Gu Yuncheng
7     Hu Guifang
8     Hu Yinglin
9     Jiang Shichang
10    Li Weizhen
11    Li Zhi

Edges
Source    Target
68        6
68        9
68        12
68        17
68        18
68        22
68        23
68        26
68        31
68        33
68        38
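The two tables can also be generated programmatically rather than typed by hand. The minimal Python sketch below writes Nodes (Id, Label) and Edges (Source, Target) CSV text in the column layout described above. The helper name `gephi_tables` and the sequential numbering that gives the focal writer Id 1 are assumptions, so the numbers will differ from those in Sheet 6.2.

```python
import csv
import io


def gephi_tables(center, associates):
    """Build Nodes (Id, Label) and Edges (Source, Target) CSV text.

    center: name of the focal writer; associates: iterable of names.
    Ids are assigned sequentially here (focal writer = 1); this numbering
    is illustrative and is not CBDB's own person id.
    """
    # dict.fromkeys removes duplicate names while keeping their order.
    names = [center] + [a for a in dict.fromkeys(associates) if a != center]
    ids = {name: i for i, name in enumerate(names, start=1)}

    nodes = io.StringIO()
    w = csv.writer(nodes, lineterminator="\n")
    w.writerow(["Id", "Label"])
    for name in names:
        w.writerow([ids[name], name])

    edges = io.StringIO()
    w = csv.writer(edges, lineterminator="\n")
    w.writerow(["Source", "Target"])
    for name in names[1:]:
        # One-to-many: the focal writer is linked to each associate.
        w.writerow([ids[center], ids[name]])
    return nodes.getvalue(), edges.getvalue()


if __name__ == "__main__":
    n, e = gephi_tables("Tang Xianzu", ["Chen Yubi", "Dai Xun", "Feng Mengzhen"])
    print(n)
    print(e)
```

The example uses three labels from the Nodes table above only to show the output format; the resulting two files can then be imported into GEPHI as node and edge tables.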

When the two tables are imported into GEPHI, a point-line map of Tang’s social relations is generated. The result appears as follows (Fig. 6.7). GEPHI can generate point-line social relations maps not only for a single individual but also for two or more individuals and for groups.

Fig. 6.6 Tang Xianzu’s social relations map

Fig. 6.7 Tang Xianzu’s social relations

Below is a point-line relation map of Tang Xianzu and Tu Long, another playwright of the Ming Dynasty (Fig. 6.8).

Fig. 6.8 Tang Xianzu & Tu Long’s social relations

The following is the social relations network constructed among Tang Xianzu, Tu Long, and Wang Daokun (Fig. 6.9). A point-line representation of writers’ social networks is a very straightforward way to reveal each writer’s social relations and the acquaintances the writers share. Software such as UCINET, NodeXL, and Pajek can also represent relationships among data in a point-line manner; for reasons of space, they are not illustrated here.
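Finding the shared acquaintances that such a multi-writer graph makes visible reduces to intersecting neighbor sets. The Python sketch below assumes the edges are given as (person, person) pairs and treats them as undirected; the function name is an illustrative choice, and the acquaintance names in the example are placeholders, not CBDB data.

```python
from itertools import combinations


def shared_acquaintances(edges, writers):
    """Return {(w1, w2): set of people linked to both} for each pair of writers.

    edges: iterable of (person, person) pairs, treated as undirected.
    writers: the focal writers to compare, e.g. three Ming playwrights.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {
        (w1, w2): neighbors.get(w1, set()) & neighbors.get(w2, set())
        for w1, w2 in combinations(writers, 2)
    }


if __name__ == "__main__":
    demo_edges = [
        ("Tang Xianzu", "P1"), ("Tu Long", "P1"),
        ("Tang Xianzu", "P2"), ("Wang Daokun", "P2"),
        ("Tu Long", "P3"),
    ]
    result = shared_acquaintances(demo_edges, ["Tang Xianzu", "Tu Long", "Wang Daokun"])
    for pair, common in result.items():
        print(pair, sorted(common))
```

In a point-line map these shared acquaintances are the nodes connected to more than one focal writer, so the computed sets correspond to what a reader would look for visually.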

Fig. 6.9 Tang Xianzu, Tu Long and Wang Daokun’s social relations

6.4 Conclusion

This introduction to the databases and software discussed above suggests that visualization in literary studies requires, on the one hand, the support of databases and, on the other, high-quality software. Building databases of literature and history is a forward-looking undertaking that calls for computer specialists as well as long-term funding. The CHGIS and CBDB projects, led by Prof. Peter K. Bol of the Department of East Asian Languages and Civilizations at Harvard University, have become even more significant after more than a decade of development. Their prospects are bright, and as open-source databases they will, we are confident, continue to improve. For one thing, we look forward to vectorized historical maps of China for the periods before the Ming dynasty. With maps of the relevant dynasty available as base maps, maps of that dynasty’s writers would be more reliable. Moreover, we hope that Chinese academia will make an effort to build databases of literature and history and will urge the relevant administrative departments to increase funding for database construction, rather than waiting to exploit the inherited “Big Data” someday, only to find that all of the valuable databases have been developed by foreigners. Classical Chinese literary works contain much that can be visualized, such as personal names, place-names, goods, utensils, clothing, flora and fauna. How to visualize such objects encountered in reading deserves close attention and serious study. As for software, all the programs mentioned above were developed in the West, and Chinese users may encounter some restrictions. For instance, very few fonts are available, and QGIS’s map link offers only Bing Maps and Google Maps as contemporary base maps, without Baidu Maps. Our hope is that Chinese software developers will eventually produce visualization software optimized for Chinese users.

Chapter 7

Quantifying Spatial Variation in Aggregate Cultural Tolerance

Hongwei Xu

7.1 Introduction

Social scientists frequently acknowledge the significant role of cultural forces in shaping human behavior and performance with respect to cognition (DiMaggio 1997), academic achievement (Hsin and Xie 2014; Yamamoto and Sonnenschein 2016), labor force participation (Antecol 2000) and entrepreneurship (Guiso et al. 2006), marriage and family (Thornton 2005), emotional self-regulation (Varnum and Hampton 2016), and mental health (Chen et al. 2003), to name just a few areas. However, research efforts to quantify exogenous cultural influence remain very limited, because components of culture such as social norms, values, and beliefs are difficult to measure and difficult to isolate from other institutional, social, and economic confounders (Bachrach 2014). Current quantitative measures of social norms rely heavily on survey questions about respondents’ individual attitudes and beliefs (Thornton and Achen 2010; Thornton and Binstock 2012). This approach limits researchers’ capacity to correct for measurement error and reporting bias in self-reports, and to infer the causal influence of culture when individuals’ ideations and behaviors are measured contemporaneously in a cross-sectional survey. Even with longitudinal data, the individual-level approach can still be problematic, because a person’s cultural values and beliefs affect how he or she interacts with the world, and the new life experiences acquired through those interactions may in turn modify his or her prior cultural schemas. In addition, developing effective survey measures of culture generally requires extensive qualitative investigation, including ethnographic observation, semi-structured interviews, and focus group discussions with socio-demographically diverse people, as well as pilot surveys to validate the instrument (Thornton et al. 2010, 2012). Developing valid instruments suitable for international comparative studies can be even more challenging and costly.

H. Xu, Department of Sociology, Queens College - CUNY, 65-30 Kissena Blvd., Powdermaker Hall 252, Queens, NY 11367, USA
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_7

This study seeks to develop new behavioral measures of the cultural tenets that value individual autonomy and freedom of choice over conformity and deference to authority. For simplicity, these tenets are referred to as cultural tolerance in this study. Cultural tolerance has been theorized as the ideational origin of the second demographic transition in Western societies (Lesthaeghe 2010) and of similar societal shifts in East Asia (Raymo et al. 2015). It also correlates closely with the contrast between Eastern collectivistic and Western individualistic cultures (Triandis 2001). Psychological research has documented that people’s collectivistic versus individualistic cultural orientation affects their social cognition, that is, the skills that relate to interactions with other people and the social environment; further, deficits in social cognition can contribute to various mental disorders (Koelkebeck and Uwatoko 2016). New measures of cultural tolerance can help enrich research into cultural influence on social cognition and mental disorders beyond the collectivistic-individualistic dichotomy. This study constructs such measures by examining the distribution of human handedness in general populations. The approach hinges on the observation that, aside from potential genetic and pathological factors, the population distribution of left versus right handedness is affected by cultural and environmental pressures against left-handedness (Porac and Coren 1981).
Sociologists, economists, and demographers are interested in examining cultural influences on demographic behaviors and socioeconomic outcomes, but they rely heavily on attitudinal surveys to measure cultural traits (Guiso et al. 2006; Fernández and Fogli 2009; Thornton et al. 2012; Polavieja 2015). Social psychologists are interested in cultural differences in cognition and personality, but they tend either to use a person’s country of origin as a crude proxy for his or her cultural background (Markus and Kitayama 1991; Koelkebeck and Uwatoko 2016) or to design experimental tasks that tap into specific cultural traits in a laboratory setting (Masuda and Nisbett 2001; Masuda et al. 2008). On the other hand, both psychologists and epidemiologists have frequently treated handedness as a personal trait and examined its implications for an individual’s cognition, health, and socioeconomic achievement (Annett and Kilshaw 1982; Coren and Halpern 1991; Halpern and Coren 1991; Annett 1993; Bryden et al. 2005; Johnston et al. 2013). Some behavioral psychologists and neuroscientists have examined cross-country differences in the prevalence of left-handedness, as well as within-country temporal trends (Brackenridge 1981; Raymond and Pontier 2004; McManus 2009). However, the generalizability of their findings is questionable, given inconsistencies in the measurement of handedness across studies and the use of non-representative samples, including clinical patients, college students, magazine subscribers (Gilbert and Wysocki 1992), and even people depicted in artworks (Porac and Coren 1981). Meanwhile, epidemiologists and gerontologists have begun collecting nationally representative data on grip strength as a biomarker to study population aging and health around the world. My analysis bridges these research strands to address the challenge of quantifying cultural context in the study of exogenous cultural effects on social and behavioral outcomes.

My research approach is two-fold: (1) estimate geographic variation in left-hander prevalence as a proxy for between-area variation in cultural tolerance, and (2) assess the construct validity of the handedness-based measure of cultural tolerance. Drawing on data from the China Health and Retirement Longitudinal Study (CHARLS), this study applies small area estimation (SAE) methods to estimate left-hander prevalence at the provincial level (equivalent to the state level in the U.S.). It then tests whether the provincial small area estimates of left-hander prevalence (obtained from the CHARLS sample) predict individual-level cultural attitudes in an independent sample, the World Values Survey (WVS). China is selected as the research setting for both substantive and analytical reasons. First, China is a country known to be conservative, traditional, collectivistic, and culturally exclusive (Nisbett et al. 2001). Demonstrating subnational geographic variation in the new cultural measure even in China would help establish its broader utility for other countries. Second, China is known for its vast geography and large population, which imply substantial within-country cultural variation to exploit. Third, although comparable survey data on individuals’ handedness are available for many other countries, CHARLS is one of the few sources that permit unrestricted access to participants’ geographic information at a fine subnational level.
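Model-based SAE of the kind described here typically ends with a poststratification step, and the Data and Measures section notes that CHARLS covariates were coded to match census cross-tabulations for exactly that purpose. In that step, the provincial estimate is the sum over demographic cells c of N_pc × p_c divided by the sum of N_pc, where N_pc is the census count of cell c in province p and p_c is the model-predicted probability of left-handedness in that cell. The Python sketch below computes this weighted average. The model that produces the cell predictions is assumed rather than shown, the function name is hypothetical, and the numbers in the example are invented for illustration.

```python
def poststratify(cell_predictions, census_counts):
    """Census-weighted average of cell-level predicted probabilities.

    cell_predictions: {cell: predicted probability of being left-handed},
        assumed to come from a model fitted to the survey sample.
    census_counts: {cell: census population count in one province}.
    A cell is a combination of covariates such as
    (age group, sex, education, residence).
    """
    total = sum(census_counts.values())
    if total == 0:
        raise ValueError("province has no census population")
    weighted = sum(n * cell_predictions[cell] for cell, n in census_counts.items())
    return weighted / total


if __name__ == "__main__":
    # Invented two-cell example: 300 people at 2% and 100 people at 1%.
    preds = {("45-49", "men"): 0.02, ("45-49", "women"): 0.01}
    counts = {("45-49", "men"): 300, ("45-49", "women"): 100}
    print(poststratify(preds, counts))
```

Weighting by census counts, rather than by the survey’s own cell sizes, corrects for provinces where the sample over- or under-represents particular age, sex, education, or residence groups.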

7.2 Conceptual Background

Various theories have been proposed to explain the existence of a left-handed minority across space and time. These theories can be broadly grouped into three categories: genetic, pathological, and cultural factors. Many twin studies have been conducted to estimate the relative importance of genetic and environmental influences on handedness. For example, Medland and colleagues (2006, 2009), who analyzed large twin samples (N > 20,000), found that about 25% of the variance in handedness was explained by an additive genetic effect and about 75% by non-shared environmental effects. However, recent genome-wide association studies have failed to detect any genetic variant associated with handedness (Brandler et al. 2013; Armour et al. 2014). As for pathological factors, a variety of birth traumas and brain injuries have been theorized to affect handedness, on the presumption that left-handedness results from physiological or neurological insults that disrupt normal developmental processes. However, the empirical evidence remains inconclusive (Bakan 1971; Satz 1972; Hicks et al. 1978a, b; Chayatte et al. 1979). Even if some pathological factors are at work, they pertain only to certain special populations and exert little influence on the general population. On the other hand, social pressure, stigma, and discrimination have been associated with left-handedness in many cultures. Bias against left-handedness is evident in the negative connotations of the word “left” in many languages (Beidelman 1973). For example, the Latin word “sinister” means “evil” as well as “left”. The English word “left” comes from the Old English “lyft”, which means “weak” or “broken”, whereas the word “right” means “correct.” The French word “gauche” means both “left” and “awkward” or “impolite.” The German word “links” (left) is related to “linkisch”, meaning “clumsy” or “inept.”

The so-called Right-Sided World Hypothesis argues that hand preference results from a learning process influenced by social pressure and cultural bias built into the environment (Porac and Coren 1981; Porac et al. 1986). Hand preference has some plasticity, whereby an individual’s initial biological inclination can be modified through learning. In a right-sided world, regardless of its underlying mechanisms, the handedness composition of a population varies as a function of both the amount of cultural and environmental pressure applied to left- or mixed-handed persons to conform to the right-handed norm and the resistance of those persons to that pressure (Porac and Coren 1981). Where pressure is substantial, people born left-handed may effectively be forced to switch to a dominant right hand, which would presumably reduce the prevalence of left-handers in the population. Where pressure is low and cultural tolerance of left-handedness is high, we should expect a higher prevalence of left-handers, although the upper bound of the increase may be constrained by biological factors (Porac and Coren 1986; McManus 2009). Historically, in both Western and non-Western countries, using the left hand was forbidden or strongly discouraged for certain socially relevant activities such as writing and eating. Throughout the literate world, children were trained by parents and school teachers to write with their right hands, and natural left-handers were coerced, sometimes punitively, to switch hands (Harris 1990).
In the West, school teachers were still allowed to use physical punishment to enforce right-handed writing at the turn of the 20th century (Harris 1983). It was not until the early to mid-1900s that several U.S. and U.K. psychologists began to question the practice of forcing right-handed writing, linking it to interference with speech development, and through the 1950s U.S. psychologists, pediatricians, and educators continued to debate the practice (Harris 1990). Previous cohort studies have documented growing tolerance of left-handedness in “liberal” (non-traditional) Western countries. Smart and colleagues (1980) reported an increase in the percentage of left-handedness from 6.2% among grandparents to 10% among parents and 17.5% among children in Britain. Tambs and colleagues (1987) found an increase from 1.2% in the 1895–1905 cohort to 8.7% in the 1975–1985 cohort in Norway. Tolerance of left-handed writing began to increase in the U.S. from 1930 onwards, but the liberalization took some 40 years to complete (Levy 1974), compared with only 20 years in the Netherlands, where the shift started after 1945 (Beukelaar and Kroonenberg 1986). In contrast, forced hand switching persists in many “conservative” non-Western countries today, even though the original rationale for forbidding left-hand use may no longer apply. Iwasaki and colleagues (1995) found no decline in cultural censorship of left-handedness in Japan, with 15.5% of adult respondents reporting correction of hand use for writing and eating at an early age. In China, the left hand is restricted for writing and eating, and childhood intervention has been so pervasive and effective that the prevalence of left-handedness is extremely low: just 0.23% for children and adults in the mainland (Li 1983), 0.7% for grade-school and college students in Taiwan (Teng et al. 1976), and 1.6% for college students in Hong Kong (Hoosain 1990). By comparison, the prevalence of left-handed writing is 6.5% among Asian American schoolchildren in California (Hardyck et al. 1975).

Even in the absence of overt cultural pressure, covert environmental pressure against left-handedness may persist. Since the Industrial Revolution, or even earlier, many factory machines (e.g., lathes and presses), everyday tools (e.g., scissors and can openers), musical instruments (e.g., violins and guitars), sporting gear (e.g., fishing reels and bowling balls), and other equipment such as cameras and computer mice have been designed and produced for right-handed use (Porac and Coren 1981; McManus 2009). In many cases, left-handed people have adapted by learning to operate such equipment with their right hand (Coren 1989). Although biological factors may play a role in the distribution of handedness across populations, this does not preclude a significant role for cultural pressure. Porac and colleagues (1990) conducted a meta-analysis of 55 studies across 20 countries and found that environmental pressure accounts for about 8% of the within-cultural variation in adult handedness scores and 23.5% of the cross-cultural variation in the prevalence of left-handedness, whereas biological factors (using race as a proxy) explain only 1.9% of the variability. In conclusion, assuming that genetic and pathological effects have remained relatively stable, we expect the prevalence of left-handedness and/or mixed-handedness in a population to increase as overall cultural tolerance grows (assuming resultant decreases in both overt social pressure against left-handers and indirect environmental pressure).
This expectation allows us to infer the degree of cultural tolerance in a society from estimates of its rate of left-handedness.

7.3 Data and Measures

There are two sources of data in this study. Data from the 2011 national baseline of the China Health and Retirement Longitudinal Study (CHARLS) were used for SAE of left-hander prevalence. Modeled after the Health and Retirement Study in the U.S., CHARLS is a biennial survey of a nationally representative sample of Chinese residents aged 45 and older and their spouses, if available. The 2011 national baseline of CHARLS surveyed 10,287 households and 17,708 individuals living in 150 counties across 28 of the 31 provinces in mainland China, with a response rate of 80.5% (Zhao et al. 2014). Individual-level handedness was measured in two ways. The first, subjective measure is based on survey participants’ responses to the global question, “Which is your dominant hand?” (choices: right hand, left hand, or both hands equally dominant). A dichotomous variable was coded 1 for left-handed and 0 for right-handed or ambidextrous. Self-reported handedness is subject to reporting error. If left-handed people are more likely to falsely report their


H. Xu

true handedness in areas where cultural and environmental pressures are higher, sample estimates of left-handedness rates will be systematically biased downward. To address this problem, the second measure of individual-level handedness incorporated objective information on hand grip strength, measured with a hand-held dynamometer in CHARLS (Zhao et al. 2013). Two measurements were taken for each hand, and the final grip strength of each hand was the average of the two measurements. A CHARLS respondent was classified as left-handed if he or she self-reported as left-handed or if his or her grip strength was greater in the left hand than in the right hand. As a sensitivity check, one-kilogram and two-kilogram thresholds were also applied, such that respondents were classified as left-handed if their grip strength was at least 1 (or 2) kg greater in the left hand than in the right hand (Siengthai et al. 2008). To apply poststratification weights from China’s 2010 Population Census to model-based SAE (see the Methods section below), a set of individual-level covariates in CHARLS were coded into the same categories as the census statistics cross-tabulated at the provincial level. These covariates include age in 2010 (45–49, 50–54, 55–59, 60–64, 65–69, 70–74, 75–79, 80–84, or 85 and older), sex (men or women), educational attainment (none, primary school, middle school, high school, or college and above), and residence (rural or urban). Data from wave 6 of the World Values Survey (WVS) China sample were used to analyze the association between left-hander prevalence and cultural values. The WVS is a repeated cross-sectional survey of people’s values around the world. Wave 6 in China, conducted in 2012, surveyed 2,300 adult respondents recruited in 24 provinces.
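The four handedness classifications described above can be expressed as a single rule. The Python sketch below is illustrative only: the function name, the string codes for the self-report answers, and the strict inequality used when no margin is allowed are assumptions, not CHARLS variable names or the author's code. Grip strength for each hand is the mean of its two dynamometer readings in kilograms, and `margin_kg=None` stands for Measure 1 (self-report only).

```python
from statistics import mean


def classify_left_handed(self_report, left_kg, right_kg, margin_kg=0.0):
    """Return 1 if a respondent is classified as left-handed, else 0.

    self_report: "left", "right", or "both" (hypothetical codes for the answer
        to "Which is your dominant hand?").
    left_kg, right_kg: the two dynamometer readings (kg) for each hand.
    margin_kg: None -> self-report only (Measure 1); 0 -> any left-hand
        advantage counts (Measure 2); 1 or 2 -> the left hand must be at
        least 1 or 2 kg stronger (Measures 3 and 4).
    """
    if self_report == "left":
        return 1  # self-reported left-handers are left-handed under every measure
    if margin_kg is None:
        return 0  # Measure 1 ignores grip strength
    advantage = mean(left_kg) - mean(right_kg)  # left minus right, each averaged over two trials
    if margin_kg == 0:
        return int(advantage > 0)  # left hand strictly stronger
    return int(advantage >= margin_kg)


# Hypothetical respondent: self-reports right-handed, left hand 1.25 kg stronger on average.
readings = ("right", [30.0, 31.0], [29.0, 29.5])
for label, margin in [("Measure 1", None), ("Measure 2", 0), ("Measure 3", 1), ("Measure 4", 2)]:
    print(label, classify_left_handed(*readings, margin_kg=margin))
```

For this hypothetical respondent the rule yields 0, 1, 1, and 0 under Measures 1 to 4, which shows how a single self-reported right-hander can move in and out of the left-handed group as the grip-strength margin changes.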
The measures of individuals’ cultural values and attitudes are (1) self-perceived individual autonomy and (2) the overall emancipative values index and its four sub-indices: autonomy, equality, choice, and voice. Self-perceived autonomy is based on survey participants’ ratings of the statement “I see myself as an autonomous individual” on a four-point Likert scale. The emancipative values index and its sub-indices were originally constructed by Welzel (2013) and are publicly released as part of the WVS data product. All are continuous scores normalized to range from 0 to 1, based on survey participants’ responses to multiple questions about their attitudes and beliefs in each area. The autonomy sub-index measures views on individual autonomy versus obedience to authority. The equality sub-index measures views on gender equality in education, employment, and political leadership. The choice sub-index measures views on personal freedom in reproductive choices. The voice sub-index measures views on the role of people’s voice as a societal influence. The overall emancipative values index is a summary score of the four sub-indices. The control variables are age, sex, and provincial fixed effects.
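As a rough illustration of how 0 to 1 scores of this kind can be built from survey items, the Python sketch below min-max rescales each item to the 0 to 1 range, averages the items within a sub-index, and takes the unweighted mean of the four sub-indices as the overall score. The equal weighting, the item ranges, and the respondent's answers are assumptions made for illustration; this is not a reproduction of Welzel's (2013) construction, whose released scores the chapter uses directly.

```python
from statistics import mean


def rescale(value, lowest, highest):
    """Min-max rescale a raw item response onto the 0-1 range."""
    return (value - lowest) / (highest - lowest)


def emancipative_scores(items_by_subindex):
    """Build sub-index scores and an overall index from raw item responses.

    items_by_subindex maps a sub-index name to a list of
    (response, lowest_possible, highest_possible) tuples.
    Each sub-index is the mean of its rescaled items; the overall index is
    the unweighted mean of the sub-indices (an assumption of this sketch).
    """
    subs = {
        name: mean(rescale(v, lo, hi) for v, lo, hi in items)
        for name, items in items_by_subindex.items()
    }
    return subs, mean(subs.values())


# Hypothetical answers from one respondent (three items per sub-index).
respondent = {
    "autonomy": [(1, 0, 1), (0, 0, 1), (1, 0, 1)],
    "equality": [(4, 1, 4), (3, 1, 4), (2, 1, 4)],
    "choice": [(10, 1, 10), (1, 1, 10), (5.5, 1, 10)],
    "voice": [(1, 0, 1), (0, 0, 1), (0, 0, 1)],
}
subs, overall = emancipative_scores(respondent)
print({name: round(score, 3) for name, score in subs.items()}, round(overall, 3))
```

Here the sub-index scores are 2/3, 2/3, 1/2, and 1/3, so the overall score is 13/24 (about 0.54); every score stays within 0 to 1 by construction.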


7.4 Methods

CHARLS samples respondents within each province so that the overall sample is nationally representative; the sample is not representative of the province in which the sampled respondents are located. A model-based SAE strategy known as multilevel regression with poststratification (MRP) (Gelman and Little 1997; Park et al. 2004; Zhang et al. 2014) was therefore employed to obtain reliable estimates of left-hander prevalence at the provincial level. The MRP method consists of three steps. The first step is to fit a multilevel logistic model in which individual-level left-handedness (1 = yes; 0 = no) is regressed on individual demographic and socioeconomic characteristics, with provincial random intercepts. The second step is to apply the coefficient estimates from the multilevel logistic model to calculate the probability of being left-handed for each of the 5,040 age × sex × education × residence × province cross-tabulated categories (9 age × 2 sex × 5 education × 2 residence categories × 28 provincial intercepts = 5,040). The third step is to calculate provincial left-hander prevalence as the average of the predicted probabilities of being left-handed across all the cross-tabulated categories in a given province, weighted by each category’s population size in that province (known as poststratification weighting). To evaluate the construct validity of the handedness-based cultural measures, the small area estimates of provincial left-hander prevalence derived from the CHARLS data are used to predict individual-level cultural values in the WVS-China data. Each WVS-China respondent (level 1) was first assigned the CHARLS-based estimate of left-hander prevalence for his or her province of residence (level 2).
Then the WVS-China participants’ scores on self-perceived autonomy, the emancipative values index, and its four sub-indices (autonomy, equality, choice, and voice) were regressed on provincial left-hander prevalence, controlling for age, sex, and provincial fixed effects. Given the assumed role of cultural tolerance in moderating the overt social pressure and indirect environmental pressure exerted on handedness, provincial left-hander prevalence is expected to be positively correlated with residents’ scores on these indices (Thornton 2001; Lesthaeghe 2010; Kavas 2015).
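Steps 2 and 3 of MRP can be sketched in Python as below, assuming step 1 (fitting the multilevel logistic model, which is left to standard mixed-model software and not shown) has already produced an intercept, fixed-effect coefficients, and one random intercept per province. For illustration the sketch plugs in the Measure 1 intercept, sex, and education coefficients from Table 7.2 and sets the age and residence effects and all province intercepts to zero; the uniform census counts are placeholders, not the 2010 census cross-tabulations. The category labels and function names are hypothetical.

```python
import itertools
import math

AGES = ["45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85+"]
SEXES = ["women", "men"]
EDUCATION = ["none", "primary", "middle", "high", "college+"]
RESIDENCE = ["urban", "rural"]
PROVINCES = range(28)  # the 28 CHARLS provinces, indexed 0..27

# One cell per age x sex x education x residence x province combination.
CELLS = list(itertools.product(AGES, SEXES, EDUCATION, RESIDENCE, PROVINCES))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def cell_probability(cell, intercept, effects, province_intercepts):
    """Step 2: predicted probability of being left-handed for one cell.

    effects maps non-reference category labels to log-odds coefficients;
    reference categories are absent and contribute 0.
    """
    *categories, province = cell
    eta = intercept + province_intercepts[province]
    eta += sum(effects.get(label, 0.0) for label in categories)
    return inv_logit(eta)


def poststratify(province, probs, counts):
    """Step 3: census-weighted average of cell probabilities in one province."""
    cells = [c for c in counts if c[-1] == province]
    total = sum(counts[c] for c in cells)
    return sum(probs[c] * counts[c] for c in cells) / total


effects = {"men": 0.346, "primary": -0.334, "middle": -0.437, "high": -0.484, "college+": -1.204}
province_intercepts = [0.0] * len(PROVINCES)
probs = {c: cell_probability(c, -2.247, effects, province_intercepts) for c in CELLS}
counts = {c: 1000 for c in CELLS}  # placeholder census counts
print(len(CELLS))  # 5040
print(round(poststratify(0, probs, counts), 4))
```

With real inputs, `counts` would hold the provincial census population of each cell, so provinces with older or less-educated populations receive correspondingly different prevalence estimates even when the fitted coefficients are shared.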

7.5 Results

7.5.1 Descriptive Statistics

Table 7.1 presents the frequency distributions of individual-level left-handedness and the independent variables used in SAE in the CHARLS sample. The unweighted sample size is 13,022, representing 506,547,019 Chinese middle-aged and older adults after weighting. In both the unweighted and weighted samples, the prevalence of left-handers is much lower on the basis of self-reported dominant hand alone (Measure

Table 7.1 Summary statistics for the variables used in the small area estimation of left-hander prevalence in Chinese middle-aged and older adults (>=45 years)

                                      Unweighted %     Weighted %
Left-handedness
  Measure 1                                    7.4            7.9
  Measure 2                                   30.5           30.3
  Measure 3                                   22.4           22.3
  Measure 4                                   16.8           17.1
Covariates
  Age (years) in 2010
    45–49                                     19.5           21.5
    50–54                                     16.5           15.7
    55–59                                     21.5           21.6
    60–64                                     16.5           14.5
    65–69                                     11.2           10.4
    70–74                                      7.4            7.6
    75–79                                      4.7            5.1
    80–84                                      2.0            2.5
    >=85                                       0.7            1.1
  Sex
    Women                                     52.1           51.7
    Men                                       47.9           48.3
  Educational attainment
    None                                      29.1           26.0
    Primary school                            40.3           38.2
    Middle school                             19.9           22.0
    High school                                9.3           11.4
    College or above                           1.5            2.4
  Residence
    Urban                                     36.9           49.4
    Rural                                     63.1           50.6
N of observations                           13,022    506,547,019

Note Individual’s left-handedness is determined by self-reported dominant hand only in Measure 1; by both self-report and grip strength in Measure 2; by both self-report and grip strength with a 1 kg margin of error in Measure 3; and by both self-report and grip strength with a 2 kg margin of error in Measure 4


1; about 7.4–7.9%) than on the basis of self-report combined with grip strength (Measure 2; about 30%), suggesting that estimates of left-hander prevalence are sensitive to how individual handedness is measured. When a margin of error of 1 or 2 kg is allowed in comparing grip strength between the left and right hands, the prevalence of left-handers drops to about 22% (Measure 3) and 17% (Measure 4), respectively. In terms of covariates, the CHARLS sample is sex-balanced overall (approximately 52% women and 48% men) and consists largely of middle-aged (roughly 73–74% aged 45–64) and poorly educated (about two thirds completed primary school or less) respondents. The unweighted statistics are similar to the weighted statistics with one exception: the unweighted sample is dominated by rural respondents (63.1%), whereas the weighted sample is evenly split between rural (50.6%) and urban (49.4%) respondents.
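The covariate shares quoted above can be checked directly against Table 7.1. The short Python sketch below sums the relevant unweighted and weighted percentages; the dictionary layout and function name are illustrative choices, and the only data are the Table 7.1 figures.

```python
# Percentages from Table 7.1 as (unweighted %, weighted %).
AGE = {
    "45-49": (19.5, 21.5), "50-54": (16.5, 15.7), "55-59": (21.5, 21.6),
    "60-64": (16.5, 14.5), "65-69": (11.2, 10.4), "70-74": (7.4, 7.6),
    "75-79": (4.7, 5.1), "80-84": (2.0, 2.5), "85+": (0.7, 1.1),
}
EDUCATION = {
    "none": (29.1, 26.0), "primary": (40.3, 38.2), "middle": (19.9, 22.0),
    "high": (9.3, 11.4), "college+": (1.5, 2.4),
}


def combined_share(table, keys):
    """Sum the listed categories separately for the unweighted and weighted columns."""
    return tuple(round(sum(table[k][col] for k in keys), 1) for col in (0, 1))


print(combined_share(AGE, ["45-49", "50-54", "55-59", "60-64"]))  # (74.0, 73.3)
print(combined_share(EDUCATION, ["none", "primary"]))  # (69.4, 64.2)
print(combined_share(AGE, list(AGE)))  # (100.0, 100.0)
```

The age groups 45 to 64 account for 74.0% (unweighted) and 73.3% (weighted) of the sample, and respondents with primary schooling or less account for 69.4% and 64.2%, matching the "roughly 73–74%" and "about two thirds" in the text.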

7.5.2 Small Area Estimation Results

Table 7.2 reports regression coefficients estimated from the multilevel logistic models of being left-handed in the CHARLS sample. Regardless of how individual handedness is determined, men were more likely than women to be left-handed. There was also a negative educational gradient: better-educated respondents were less likely to be left-handed, irrespective of the measure of individual handedness. Rural-urban residence was not predictive of individual left-handedness, whereas age differences varied by the measure of individual handedness. When individual handedness was determined from self-reported dominant hand alone (Measure 1), respondents in some of the oldest age groups (70–74 and 85 and older) were less likely than the youngest group (45–49 years) to be left-handed, but those aged 75–79 were more likely to be left-handed (marginally significant). After grip strength was incorporated (Measures 2–4), the negative associations between the oldest age groups and left-handedness lost statistical significance, while there was some evidence that respondents aged 55–59 were more likely to be left-handed than the youngest reference group. In short, sex and educational attainment consistently predicted individual left-handedness, whereas age and rural-urban residence did not. Model-based small area estimates of left-hander prevalence in the 28 provinces surveyed in CHARLS were obtained by calculating predicted probabilities of being left-handed for all the cross-tabulated demographic categories and applying poststratification weights. Figure 7.1 compares the kernel density distributions of the four sets of estimates based on different measures of individual handedness.
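Because the Table 7.2 coefficients are on the log-odds scale, exponentiating a coefficient gives an odds ratio (holding the other covariates and the province intercept fixed), and the inverse logit of the constant gives the predicted probability for the reference group: women aged 45–49 with no schooling living in urban areas, in a province whose random intercept is zero. A small Python sketch using three Measure 1 estimates from Table 7.2 (the function and dictionary names are illustrative):

```python
import math


def odds_ratio(coef):
    """Convert a logistic regression coefficient (log-odds) into an odds ratio."""
    return math.exp(coef)


def inv_logit(x):
    """Convert log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Measure 1 estimates from Table 7.2.
MEASURE_1 = {"men": 0.346, "college_or_above": -1.204, "constant": -2.247}

print(round(odds_ratio(MEASURE_1["men"]), 2))  # 1.41: about 41% higher odds for men
print(round(odds_ratio(MEASURE_1["college_or_above"]), 2))  # 0.3: about 70% lower odds than no schooling
print(round(inv_logit(MEASURE_1["constant"]), 3))  # 0.096 for the reference group
```

Read this way, the Measure 1 model implies that men have roughly 1.4 times the odds of women of reporting left-handedness, and college-educated respondents roughly 0.3 times the odds of those with no schooling.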
Recall that Measure 1 classifies individual left-handedness purely by self-reported dominant hand; Measure 2 extends Measure 1 by also classifying as left-handed any respondent whose left-hand grip strength is greater than his or her right-hand grip strength, even if he or she self-reports being right-handed or ambidextrous; Measures 3 and 4 are similar


Table 7.2 Regression coefficients from the multilevel logistic models of being left-handed in Chinese middle-aged and older adults (>=45 years)

                                Measure 1   Measure 2   Measure 3   Measure 4
Men (ref: women)                0.346***    0.147*      0.299***    0.369***

Age group (ref: 45–49) 50–54

−0.108

0.040

55–59

−0.057

0.113

60–64

0.003

0.028

−0.010

−0.005

65–69

−0.087

0.055

−0.005

0.026

70–74

−0.268

*

0.102

0.033

75–79

0.326

†

0.183

0.193

80–84

−0.217

−0.028

−0.304

−0.319

>=85

−1.855

**

0.164

0.035

−0.312

0.014 †

0.132

0.052 *

0.102

−0.075 †

0.199

Education (ref: no schooling)
  Primary school                −0.334**    −0.188**    −0.216**    −0.263**
  Middle school                 −0.437**    −0.217**    −0.244*     −0.313**
  High school                   −0.484*     −0.331***   −0.387***   −0.452***
  >=College                     −1.204**    −0.395†     −0.276      −0.571*

Rural area (ref: urban)

−0.105

Constant                        −2.247***   −0.867***   −1.330***   −1.637***

0.075

***

0.064

***

0.053

***

0.057

***

0.076

0.091

0.044

Variance component Province level

Note ref = reference category. Individual’s left-handedness is determined by self-reported dominant hand only in Measure 1; by both self-report and grip strength in Measure 2; by both self-report and grip strength with 1 kg margin of error in Measure 3; and by both self-report and grip strength with 2 kg margin of error in Measure 4 † p < 0.1; * p < 0.05; ** p < 0.01; *** p < 0.001

to Measure 2 except that they allow a margin of error (1 and 2 kg, respectively) when comparing grip strength between the two hands. Two findings stand out in Fig. 7.1. First, consistent with the descriptive statistics reported in the previous section, combining self-reported dominant hand and grip strength to classify individual handedness leads to much higher estimates of left-hander prevalence than using self-report alone. The small area estimates based on self-reported dominant hand alone (Measure 1) range from only about 3% to 7%, whereas the corresponding distributions combining self-report and grip strength are centered above 10%, suggesting that the subjective measure of individual handedness may lead to underestimates of left-hander prevalence. Among the three measures that combine self-reported dominant hand and grip strength, the average provincial estimate of left-hander prevalence is highest when no margin of error is allowed.
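Kernel density curves like those in Fig. 7.1 can be computed with a Gaussian kernel. The NumPy sketch below assumes Silverman's rule-of-thumb bandwidth, which the chapter does not specify, and uses 28 synthetic values drawn to roughly match the Measure 2 mean and standard deviation in Table 7.3; they are not the actual provincial estimates, and the function names are illustrative.

```python
import numpy as np


def silverman_bandwidth(x):
    """Silverman's rule of thumb: 0.9 * min(SD, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-0.2)


def gaussian_kde(x, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of the sample x on grid."""
    x = np.asarray(x, dtype=float)
    h = silverman_bandwidth(x) if bandwidth is None else bandwidth
    z = (np.asarray(grid, dtype=float)[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (x.size * h * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
estimates = rng.normal(loc=26.3, scale=4.4, size=28)  # synthetic provincial estimates (%)
grid = np.linspace(-20, 80, 2001)
density = gaussian_kde(estimates, grid)
area = density.sum() * (grid[1] - grid[0])  # Riemann sum; should be close to 1
print(f"bandwidth = {silverman_bandwidth(estimates):.2f}, area = {area:.3f}")
```

Because the bandwidth scales with the spread of the estimates, a tightly clustered set such as Measure 1 would produce a narrow, tall curve, while the more dispersed grip-strength-based sets would produce wider, flatter ones.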


Fig. 7.1 Kernel densities of the small area estimates of left-hander prevalence at the provincial level in Chinese middle-aged and older adults (>=45 years) Note Individual’s left-handedness is determined by self-reported dominant hand only in Measure 1; by both self-report and grip strength in Measure 2; by both self-report and grip strength with 1 kg margin of error in Measure 3; and by both self-report and grip strength with 2 kg margin of error in Measure 4

Second, the small area estimates are more dispersed when grip strength is incorporated to determine individual handedness than when self-reported dominant hand is used alone. As shown in Table 7.3, when self-reported dominant hand is used alone (Measure 1), the standard deviation of the small area estimates is 0.9%, only about a fifth of that for the estimates using Measure 2 (4.4%). Similarly, the interquartile range of the small area estimates is 1.2% using Measure 1, about one fourth of that using Measure 2 (5.1%). After taking into account a possible margin of error in comparing grip strength between the two hands, the small area estimates are shrunk towards the center of the distribution. Specifically, the standard deviation of the small area estimates drops from 4.4% for Measure 2 to 2.8% and 2.2% for Measures 3 and 4, respectively, and the corresponding interquartile ranges decrease from 5.1% to 3.6% and 2.3%.

Table 7.3 Summary statistics for different types of small area estimates of left-hander prevalence in Chinese middle-aged and older adults (>=45 years) at the provincial level

Model-based estimates of left-hander prevalence at the provincial level (%)
              Mean     SD    Min   Median    Max     IR     N
Measure 1      5.3    0.9    3.1      5.3    7.3    1.2    28
Measure 2     26.3    4.4   12.7     27.6   35.6    5.1    28
Measure 3     17.5    2.8    9.4     18.0   23.9    3.6    28
Measure 4     11.7    2.2    6.1     12.2   16.4    2.3    28

Note SD = standard deviation; Min = minimum; Max = maximum; IR = interquartile range

Figure 7.2 depicts the spatial distributions of small area estimates of left-hander prevalence at the provincial level. Darker colors represent higher prevalence of left-handers. Although the estimates vary somewhat across measures of individual handedness, two clusters of high left-hander prevalence can be observed. The first cluster is located in Southwest China and consists of Sichuan, Yunnan, and Guizhou Provinces. The second cluster is located in Northwest China and consists of Gansu, Qinghai, and Xinjiang.
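The dispersion comparisons quoted for Table 7.3 can be reproduced with a few lines of Python. The sketch below shows one way to compute the Table 7.3 summary statistics for a vector of provincial estimates and checks the quoted ratios; the sample standard deviation (ddof=1) and NumPy's default linear interpolation for quartiles are assumptions, since the chapter does not state which conventions were used.

```python
import numpy as np


def summarize(estimates):
    """Mean, SD, min, median, max, interquartile range (IR), and N, as in Table 7.3."""
    x = np.asarray(estimates, dtype=float)
    q25, median, q75 = np.percentile(x, [25, 50, 75])
    return {
        "mean": x.mean(), "sd": x.std(ddof=1), "min": x.min(), "median": median,
        "max": x.max(), "ir": q75 - q25, "n": x.size,
    }


# (SD, IR) per measure, copied from Table 7.3.
TABLE_7_3 = {"Measure 1": (0.9, 1.2), "Measure 2": (4.4, 5.1), "Measure 3": (2.8, 3.6), "Measure 4": (2.2, 2.3)}

sd_ratio = TABLE_7_3["Measure 1"][0] / TABLE_7_3["Measure 2"][0]
ir_ratio = TABLE_7_3["Measure 1"][1] / TABLE_7_3["Measure 2"][1]
print(f"SD ratio {sd_ratio:.2f}, IR ratio {ir_ratio:.2f}")  # SD ratio 0.20, IR ratio 0.24
print(summarize([1, 2, 3, 4, 5]))
```

The ratios of about 0.20 and 0.24 correspond to the "about a fifth" and "about one fourth" comparisons between Measures 1 and 2.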

Fig. 7.2 Spatial distributions of small area estimates of left-handedness prevalence at the provincial level


Table 7.4 Summary statistics for the variables in the 2012 World Values Survey-China

Variable (definition)                                    Mean      SD    Min    Max       N
Outcome variables
  Self-perceived as an autonomous individual             3.16    0.58      1      4   1,970
    (4-point Likert scale on “I see myself as an autonomous individual”)
  Emancipative values index                              0.39    0.12      0   0.86   2,159
    (multi-point index from 0 to 1 based on 12 items)
  Autonomy sub-index                                     0.60    0.23      0      1   2,300
    (4-point index from 0 to 1)
  Equality sub-index                                     0.53    0.23      0      1   2,134
    (12-point index from 0 to 1)
  Choice sub-index                                       0.21    0.22      0      1   1,885
    (30-point index from 0 to 1)
  Voice sub-index                                        0.18    0.24      0      1   2,098
    (6-point index from 0 to 1)
Control variables
  Age                                                   43.92   14.95     18     75   2,300
    (years)
  Sex                                                    0.49    0.50      0      1   2,300
    (1 if male; 0 if female)

Note SD = standard deviation; Min = minimum; Max = maximum

7.5.3 Predicting Individual Cultural Values

Table 7.4 reports descriptive statistics for the variables in WVS-China. There are no missing data on the control variables. The full sample is evenly split between men and women, with an average age of about 44 years. The average score of self-perceived autonomy is 3.16 on a four-point Likert scale, which suggests that on average the respondents ‘agree’ with the statement “I see myself as an autonomous individual.” All the other attitudinal variables are normalized on a continuous scale ranging from 0 to 1, with a higher score representing a stronger emphasis on freedom of choice and equality of opportunities (Welzel 2013), although the maximum value of the emancipative values index in my analytical sample reaches only 0.86. The amount of missing data varies across the attitudinal measures, but even in the worst case, the choice sub-index, more than 80% of the sample (1,885 of 2,300 respondents) have valid responses. To maximize statistical power, the analytical sample size is allowed to vary with the number of valid responses for each attitudinal measure. Table 7.5 presents the coefficients from regressing the attitudinal measures on different estimates of left-hander prevalence, adjusting for age, sex, and provincial fixed effects. A significantly positive coefficient supports the theoretical expectation that left-hander prevalence is a valid measure of cultural tolerance. Across the different small area estimates, respondents living in provinces with a higher prevalence of left-handers consistently rated themselves as more autonomous. For the emancipative values index and its sub-indices, the association between left-hander prevalence at the provincial level and cultural attitudes at the individual level differs by which measure of individual handedness is used in small area estimation. When self-reported dominant hand alone is used to determine individual

(0.0003) 0.0210

(0.0006) 0.0478

(0.0005) 0.0424

Measure 2

Measure 3

Measure 4

***

***

***

***

(0.0001) 0.0030

(0.0001)

0.0033

(0.0001)

0.0015

(0.0000) −0.0199

***

***

***

***

Overall index

Emancipative values

(0.0001) 0.0030

(0.0001) 0.0034

(0.0001) 0.0015

(0.0000)

−0.0205

***

***

***

*

Autonomy sub-index

(0.0002) 0.0094

(0.0002) 0.0106

(0.0001) 0.0046

(0.0002) −0.0038

***

***

***

***

Equality sub-index

(0.0002) −0.0035

(0.0002) −0.0039

(0.0001) −0.0017

(0.0001) −0.0214

***

***

***

***

Choice sub-index

(0.0002) 0.0094

(0.0003) 0.0106

(0.0001) 0.0046

(0.0001) −0.0107

***

***

***

***

Voice sub-index

Note Individual’s left-handedness is determined by self-reported dominant hand only in Measure 1; by both self-report and grip strength in Measure 2; by both self-report and grip strength with 1 kg margin of error in Measure 3; and by both self-report and grip strength with 2 kg margin of error in Measure 4. All the models control for age, gender, and provincial fixed-effects. Robust standard errors that adjust for individuals clustered within provinces are shown in parentheses * p < 0.05; ** p < 0.01; *** p < 0.001

N of observations

(0.0002) 0.0707

Measure 1

% Left-handers at the provincial level

Self-perceived autonomous individual

Table 7.5 Regression estimates of the associations between left-hander prevalence in the 2011 China Health and Retirement Longitudinal Study and Chinese adults’ attitudes in the 2012 World Values Survey


handedness (Measure 1), a higher provincial prevalence of left-handers was negatively associated with individual-level scores on the overall emancipative values index and on all four sub-indices. In contrast, when both self-reported dominant hand and grip strength were used to classify individual handedness (Measures 2–4), a higher provincial prevalence of left-handers was positively associated with individual-level scores on the overall emancipative values index and on the autonomy, equality, and voice sub-indices, but negatively associated with scores on the choice sub-index. In addition, these associations appeared to be stronger when a margin of error was allowed in comparing grip strength between the two hands (Measures 2 and 3), as evident from the larger coefficients compared with Measure 1, which did not incorporate a margin of error. In short, using self-reported dominant hand to classify individual handedness leads to results that run largely counter to the theoretical expectation, whereas combining self-report and grip strength produces results that are largely consistent with it.
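The second-stage analysis merges each WVS-China respondent with his or her province's estimated left-hander prevalence and regresses an attitude score on it, with standard errors that adjust for clustering within provinces (see the note to Table 7.5). The NumPy sketch below illustrates that workflow with ordinary least squares and the basic CR0 cluster-robust (sandwich) variance estimator on synthetic data. It is a simplified sketch under stated assumptions: it omits the provincial fixed effects included in the chapter's models and the small-sample corrections that statistical packages usually apply, and all variable names and data are hypothetical.

```python
import numpy as np


def merge_prevalence(provinces, prevalence_by_province):
    """Attach the provincial (level-2) estimate to each respondent (level 1)."""
    return np.array([prevalence_by_province[p] for p in provinces], dtype=float)


def ols_cluster_robust(X, y, clusters):
    """OLS coefficients with CR0 cluster-robust standard errors."""
    X, y, clusters = np.asarray(X, float), np.asarray(y, float), np.asarray(clusters)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]  # summed score contribution of cluster g
        meat += np.outer(score, score)
    vcov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))


rng = np.random.default_rng(42)
provinces = np.repeat(np.arange(24), 50)  # 24 provinces x 50 respondents (synthetic)
prevalence = dict(enumerate(rng.uniform(10, 35, size=24)))  # hypothetical % left-handers
prev = merge_prevalence(provinces, prevalence)
age = rng.integers(18, 76, size=provinces.size)
male = rng.integers(0, 2, size=provinces.size)
score = 0.30 + 0.004 * prev + 0.001 * age + 0.02 * male + rng.normal(0, 0.05, size=provinces.size)

X = np.column_stack([np.ones_like(prev), prev, age, male])
beta, se = ols_cluster_robust(X, score, provinces)
for name, b, s in zip(["const", "prevalence", "age", "male"], beta, se):
    print(f"{name:>10}: {b: .4f} ({s:.4f})")
```

Clustering matters here because every respondent in a province shares the same prevalence value, so treating the respondents as independent observations would typically understate the uncertainty of the prevalence coefficient.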

7.6 Discussion

Focusing on one aspect of culture (cultural tolerance), this study conceptualizes aggregate-level handedness as a contextual indicator and infers the amount of social pressure against left-handedness in an area from the prevalence of left-handers in that area. Drawing on nationally representative data from China, this study applied a model-based SAE method to subjective (self-reported dominant hand) and objective (hand grip strength) measures of individuals’ handedness, yielding four sets of estimates of left-hander prevalence at the provincial level for 28 of the 31 provinces. The four sets of estimates generally agree with one another in relative terms, though not in absolute values. This study tested the construct validity of this new measure of cultural tolerance (the population prevalence of left-handedness) by using the SAEs of left-hander prevalence obtained from the 2011 CHARLS sample to predict individual-level cultural attitudes gleaned from the 2012 WVS-China sample. The two data sources are independent of each other. The estimates of left-hander prevalence presumably reflect the cultural environment during the early childhood of the CHARLS respondents, who were 45 years and older in 2011, while the attitudinal responses in the WVS-China sample reflect the respondents’ cultural values at the time of the interview. These analytic features help alleviate concerns about potential endogeneity in the regression model. This study found strong evidence for construct validity when grip strength was combined with self-reported dominant hand to classify individual handedness. After controlling for age, sex, and provincial fixed effects, a higher prevalence of left-handers at the provincial level predicted significantly higher individual-level scores on self-perceived autonomy, the emancipative values index, and three of its sub-indices: autonomy, equality, and voice. These indices capture favorable attitudes


toward freedom of choice and equality of opportunities. This finding is robust to the different margins of error used to classify individual handedness. The only exception is that a higher provincial prevalence of grip-strength-based left-handers predicted a significantly lower score on the choice sub-index. One possible explanation is that the choice sub-index consists of three items summarizing respondents’ attitudes toward homosexuality, abortion, and divorce, and these domains do not overlap well, at least in the Chinese context, with the cultural value placed on tolerance of left-handedness. By contrast, SAEs of left-hander prevalence based on self-reported dominant hand generally performed poorly in predicting individuals’ attitudes toward freedom of choice and equality of opportunities. This difference suggests that the self-reported dominant hand measure may be subject to reporting bias and is less reliable than objectively measured grip strength. Future research should avoid relying solely on self-reported handedness, especially in contexts where cultural pressure against left-handedness is strong. Despite these limitations, my proposed cultural measure has several potential methodological merits. First, it is behavioral in nature and hence less subject to the measurement error and reporting bias that are common in survey measures of attitudes, beliefs, and values. Even if some naturally left-handed respondents underreport their left-handedness because of cultural stigma or because they were pressured to switch hands during childhood, the problem of misclassifying handedness can be alleviated by using objectively measured grip strength data (Siengthai et al. 2008).
Second, although collected at middle age or older, the handedness data reflect the cultural pressure experienced by the adult survey respondents during early childhood, when handedness is typically established (Raymond and Pontier 2004). The new handedness-based cultural measures are retrospective in nature and yet robust to respondents’ recall bias because they are objective, behavioral measures rather than subjective, attitudinal measures. The retrospective, or time-lagged, nature of the new measures also implies that they are exogenous to the sociocultural context at the time of the survey, which gives future researchers an opportunity to investigate the long-term causal impact of the early-life cultural environment on later-life outcomes using cross-sectional data. Third, developing attitudinal survey questions, especially comparable instruments for use in different countries, can be costly. Similarly, experimental tasks developed by social psychologists to assess cultural attitudes and values in a laboratory setting are hardly applicable to large-scale data collection from representative samples of the general population. In contrast, thanks to the growing number of health and aging studies, data on self-reported dominant hand and grip strength have already been collected in many countries around the globe with consistent instruments and procedures, as shown in Table 7.6. Together, these surveys cover more than 30 countries in North America, Europe, Asia, and Africa. The new measure can be applied to quantify cultural tolerance in many countries where comparable


Table 7.6 Selected longitudinal surveys and the corresponding waves in which both self-reported dominant hand and grip strength have been measured using the same instruments HRS

CHARLS

SHARE

MHAS

KLoSA

JSTAR

WHO-SAGE

U.S.

China

20+ European countries

Mexico

Korea

Japan

China, Ghana, India, Mexico, Russia, South Africa

W1

W1

W2

W2

2004–05

W7

W1

2006–07

W8

W2

2008–09

W9

2010–11

W10

W1

W4

2012–13

W11

W2

W5

2014–15

W12

W3

W3

W3

W4

W4

W1

Note HRS = Health and Retirement Study; CHARLS = China Health and Retirement Longitudinal Study; SHARE = Survey of Health, Ageing and Retirement in Europe; MHAS = Mexican Health and Aging Study; KLoSA = Korean Longitudinal Study of Aging; JSTAR = Japanese Study on Aging and Retirement; WHO-SAGE = World Health Organization Study on global AGEing and adult health

handedness data are available. As additional waves of these longitudinal surveys are conducted and new birth cohorts are recruited, the new cultural measure can be updated periodically at relatively low cost. More broadly, this study highlights progress in spatial data analysis in social science research. In the past, research topics such as cultural values and social norms were predominantly examined with qualitative approaches owing to a lack of population-based data, let alone research into the spatial dynamics of cultural values and social norms. In addition to increased data availability, advances in statistical methods and computing power have considerably reduced the computational costs of managing large data sets and estimating complex models. For example, hierarchical generalized linear models, such as the one used in this study, were developed in the 1990s, but their applications in social science research remained rare in the early 2000s. Now, procedures for estimating hierarchical generalized linear models are a routine part of most statistical software packages. This study demonstrates how to draw on biomarker data for cultural research, but the data source is still a traditional survey. Future research on SAE in general will benefit from integrating data from different sources (e.g., surveys, administrative records, Internet searches, and social media) or in different forms (e.g., spreadsheets, text, and images). Such data fusion remains a challenging task, but new techniques are emerging, especially in the field of spatial-temporal analysis.

Acknowledgements An earlier version of this article was presented at the 2017 annual meeting of the Population Association of America in Chicago, IL. The author thanks session participants for useful comments. The author also thanks Arland Thornton for his helpful feedback and N. E. Barr for her assistance in copy-editing. This study was supported by the National Institutes of Health under a center grant to the Population Studies Center at the University of Michigan (R24 HD041028) and a Mellon Diversity Fellowship awarded to the author at Queens College.

References

Annett, M. (1993). The disadvantages of dextrality for intelligence—corrected findings. British Journal of Psychology, 84(4), 511–516.
Annett, M., & Kilshaw, D. (1982). Mathematical ability and lateral asymmetry. Cortex, 18(4), 547–568.
Antecol, H. (2000). An examination of cross-country differences in the gender gap in labor force participation rates. Labour Economics, 7(4), 409–426.
Armour, J. A., Davison, A., et al. (2014). Genome-wide association study of handedness excludes simple genetic models. Heredity, 112(3), 221–225.
Bachrach, C. A. (2014). Culture and demography: From reluctant bedfellows to committed partners. Demography, 51(1), 3–25.
Bakan, P. (1971). Handedness and birth order. Nature, 229(5281), 195.
Beidelman, T. O. (1973). Kaguru symbolic classification. In R. Needham (Ed.), Right and left: Essays on dual symbolic classification (pp. 128–166). Chicago: University of Chicago Press.
Beukelaar, L. J., & Kroonenberg, P. M. (1986). Changes over time in the relationship between hand preference and writing hand among left-handers. Neuropsychologia, 24(2), 301–303.
Brackenridge, C. J. (1981). Secular variation in handedness over ninety years. Neuropsychologia, 19(3), 459–462.
Brandler, W. M., Morris, A. P., et al. (2013). Common variants in left/right asymmetry genes and pathways are associated with relative hand skill. PLoS Genetics, 9(9), e1003751.
Bryden, P. J., Bruyn, J., et al. (2005). Handedness and health: An examination of the association between different handedness classifications and health disorders. Laterality: Asymmetries of Body, Brain and Cognition, 10(5), 429–440.
Chayatte, C., Abern, S. B., et al. (1979). Left handed people. Irish Medical Journal, 72, 511.
Chen, H., Guarnaccia, P. J., et al. (2003). Self-attention as a mediator of cultural influences on depression. International Journal of Social Psychiatry, 49(3), 192–203.
Coren, S. (1989). Left-handedness and accident-related injury risk. American Journal of Public Health, 79(8), 1040–1041.
Coren, S., & Halpern, D. F. (1991). Left-handedness: A marker for decreased survival fitness. Psychological Bulletin, 109(1), 90–106.
DiMaggio, P. (1997). Culture and cognition. Annual Review of Sociology, 23(1), 263–287.
Fernández, R., & Fogli, A. (2009). Culture: An empirical investigation of beliefs, work, and fertility. American Economic Journal: Macroeconomics, 1(1), 146–177.
Gelman, A., & Little, T. C. (1997). Poststratification into many categories using hierarchical logistic regression. Survey Methodology, 23, 127–135.
Gilbert, A. N., & Wysocki, C. J. (1992). Hand preference and age in the United States. Neuropsychologia, 30(7), 601–608.
Guiso, L., Sapienza, P., et al. (2006). Does culture affect economic outcomes? Journal of Economic Perspectives, 20(2), 23–48.
Halpern, D. F., & Coren, S. (1991). Handedness and life span. New England Journal of Medicine, 324(14), 998.
Hardyck, C., Goldman, R., et al. (1975). Handedness and sex, race, and age. Human Biology, 47(3), 369–375.

7 Quantifying Spatial Variation in Aggregate Cultural Tolerance


Harris, L. J. (1983). Laterality of function in the infant: Historical and contemporary trends in theory and research. In G. Young, S. J. Segalowitz, C. M. Corter, & S. E. Trehub (Eds.), Manual specialization and the developing brain (pp. 177–247). New York: Academic Press.
Harris, L. J. (1990). Cultural influences on handedness: Historical and contemporary theory and evidence. In S. Coren (Ed.), Left-handedness: Behavioral implications and anomalies (pp. 195–258). Amsterdam: Elsevier Science.
Hicks, R. A., Evans, E. A., et al. (1978a). Correlation between handedness and birth order: Compilation of five studies. Perceptual and Motor Skills, 46(1), 53–54.
Hicks, R. A., Pellegrini, R. J., et al. (1978b). Handedness and birth risk. Neuropsychologia, 16(2), 243–245.
Hoosain, R. (1990). Left handedness and handedness switch amongst the Chinese. Cortex, 26(3), 451–454.
Hsin, A., & Xie, Y. (2014). Explaining Asian Americans’ academic advantage over whites. Proceedings of the National Academy of Sciences, 111(23), 8416–8421.
Iwasaki, S., Kaiho, T., et al. (1995). Handedness trends across age groups in a Japanese sample of 2316. Perceptual and Motor Skills, 80(3), 979–994.
Johnston, D. W., Nicholls, M. E. R., et al. (2013). Handedness, health and cognitive development: Evidence from children in the National Longitudinal Survey of Youth. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(4), 841–860.
Kavas, S. (2015). ‘Wardrobe modernity’: Western attire as a tool of modernization in Turkey. Middle Eastern Studies, 51(4), 515–539.
Koelkebeck, K., Uwatoko, T., et al. (2016). How culture shapes social cognition deficits in mental disorders: A review. Social Neuroscience.
Lesthaeghe, R. J. (2010). The unfolding story of the second demographic transition. Population and Development Review, 36(2), 211–251.
Levy, J. (1974). Psychobiological implications of bilateral asymmetry. In S. J. Dimond & J. G. Beaumont (Eds.), Hemisphere function in the human brain (pp. 121–183). Oxford, England: John Wiley.
Li, X.-T. (1983). The distribution of left and right handedness in Chinese people. Acta Psychologica Sinica, 3, 268–276.
Markus, H. R., & Kitayama, S. (1991). Culture and the self: Implications for cognition, emotion, and motivation. Psychological Review, 98(2), 224–253.
Masuda, T., Ellsworth, P. C., et al. (2008). Placing the face in context: Cultural differences in the perception of facial emotion. Journal of Personality and Social Psychology, 94(3), 365–381.
Masuda, T., & Nisbett, R. E. (2001). Attending holistically versus analytically: Comparing the context sensitivity of Japanese and Americans. Journal of Personality and Social Psychology, 81(5), 922–934.
McManus, I. C. (2009). The history and geography of human handedness. In I. E. C. Sommer & R. S. Kahn (Eds.), Language lateralization and psychosis (pp. 37–58). New York: Cambridge University Press.
Medland, S. E., Duffy, D. L., et al. (2009). Genetic influences on handedness: Data from 25,732 Australian and Dutch twin families. Neuropsychologia, 47(2), 330–337.
Medland, S. E., Duffy, D. L., et al. (2006). Handedness in twins: Joint analysis of data from 35 samples. Twin Research and Human Genetics, 9(1), 46–53.
Nisbett, R. E., Peng, K., et al. (2001). Culture and systems of thought: Holistic versus analytic cognition. Psychological Review, 108(2), 291–310.
Park, D. K., Gelman, A., et al. (2004). Bayesian multilevel estimation with poststratification: State-level estimates from national polls. Political Analysis, 12(4), 375–385.
Polavieja, J. G. (2015). Capturing culture: A new method to estimate exogenous cultural effects using migrant populations. American Sociological Review, 80(1), 166–191.
Porac, C., & Coren, S. (1981). Lateral preferences and human behavior. New York: Springer.
Porac, C., Coren, S., et al. (1986). Environmental factors in hand preference formation: Evidence from attempts to switch the preferred hand. Behavior Genetics, 16(2), 251–261.


H. Xu

Porac, C., Rees, L., et al. (1990). Switching hands: A place for left hand use in a right hand world. In S. Coren (Ed.), Left-handedness: Behavioral implications and anomalies (pp. 259–290). Amsterdam: Elsevier Science.
Raymo, J. M., Park, H., et al. (2015). Marriage and family in East Asia: Continuity and change. Annual Review of Sociology, 41(1), 471–492.
Raymond, M., & Pontier, D. (2004). Is there geographical variation in human handedness? Laterality: Asymmetries of Body, Brain and Cognition, 9(1), 35–51.
Satz, P. (1972). Pathological left-handedness: An explanatory model. Cortex, 8(2), 121–135.
Siengthai, B., Kritz-Silverstein, D., et al. (2008). Handedness and cognitive function in older men and women: A comparison of methods. The Journal of Nutrition, Health & Aging, 12(9), 641–647.
Smart, J. L., Jeffery, C., et al. (1980). A retrospective study of the relationship between birth history and handedness at six years. Early Human Development, 4(1), 79–88.
Tambs, K., Magnus, P., et al. (1987). Left-handedness in twin families: Support of an environmental hypothesis. Perceptual and Motor Skills, 64(1), 155–170.
Teng, E. L., Lee, P.-H., et al. (1976). Handedness in a Chinese population: Biological, social, and pathological factors. Science, 193(4258), 1148–1150.
Thornton, A. (2001). The developmental paradigm, reading history sideways, and family change. Demography, 38(4), 449–465.
Thornton, A. (2005). Reading history sideways: The fallacy and enduring impact of the developmental paradigm on family life. Chicago: University of Chicago Press.
Thornton, A., Achen, A., et al. (2010). Creating questions and protocols for an international study of ideas about development and family life. In Survey methods in multinational, multiregional, and multicultural contexts (pp. 59–74). Wiley.
Thornton, A., Binstock, G., et al. (2012a). International fertility change: New data and insights from the developmental idealism framework. Demography, 49(2), 677–698.
Thornton, A., Ghimire, D. J., et al. (2012b). The measurement and prevalence of an ideational model of family and economic development in Nepal. Population Studies, 66(3), 329–345.
Triandis, H. C. (2001). Individualism-collectivism and personality. Journal of Personality, 69(6), 907–924.
Varnum, M. E. W., & Hampton, R. S. (2016). Cultures differ in the ability to enhance affective neural responses. Social Neuroscience, 1–10.
Welzel, C. (2013). Freedom rising: Human empowerment and the quest for emancipation. New York: Cambridge University Press.
Yamamoto, Y., & Sonnenschein, S. (2016). Family contexts of academic socialization: The role of culture, ethnicity, and socioeconomic status. Research in Human Development, 13(3), 183–190.
Zhang, X., Holt, J. B., et al. (2014). Multilevel regression and poststratification for small-area estimation of population health outcomes: A case study of chronic obstructive pulmonary disease prevalence using the Behavioral Risk Factor Surveillance System. American Journal of Epidemiology, 179(8), 1025–1033.
Zhao, Y., Hu, Y., et al. (2014). Cohort profile: The China Health and Retirement Longitudinal Study (CHARLS). International Journal of Epidemiology, 43(1), 61–68.
Zhao, Y., Strauss, J., et al. (2013). The China Health and Retirement Longitudinal Study (CHARLS): Users’ guide for the 2011–2012 national baseline survey. Beijing, China: National School of Development, Peking University.

Chapter 8

Conservation of Cave-dwelling Village using Cultural Landscape Gene Theory

Anrong Dang, Dongmei Zhao, Yang Chen, and Congwei Wang

8.1 Introduction

Shaped over a long history by the interaction between the natural environment and folk culture, the village cultural landscape, with its strong regional and ethnic character, is not only the crystallization of and witness to agricultural civilization but also the carrier of regional culture (Dong et al. 2019; Chen et al. 2014; Yang et al. 2013; Yang and Dang 2012; Li et al. 2010). Influenced by the environmental concept known as the “oneness of nature and man”, the cave dwelling, regarded as a building that takes root in the earth, became the principal architectural form in the Wudinghe river basin (Dang et al. 2013, 2012a, b; Ma 2012). Located in the northern Loess Plateau of China, the Wudinghe river is one of the main tributaries of the middle reaches of the Yellow River. Along both of its banks lie not only countless Neolithic Longshan culture sites but also various types of cave village cultural landscape, which together constitute an integrated system of great academic value and significance for cultural inheritance (Zhao 2010; Dang et al. 2009; Li 2007; Qin et al. 2008). Cultural landscape gene theory holds that every cultural landscape contains a cultural factor that distinguishes it from other cultural landscapes and can be inherited from generation to generation (Huo and Liu 2005; Liu 2004). A landscape can therefore be identified effectively only by grasping this gene. The theory further proposes that settlement landscape genes can be identified from five aspects: environmental factors, layout forms, totem signs, principal public buildings, and dwelling characteristics (Liu 2004). In this chapter, the authors take the concepts of cultural landscape gene theory as the starting point for analyzing five genetic characteristics of the cave-dwelling village cultural landscape (CDVCL) in the Wudinghe river basin, namely the natural gene, cultural gene, spatial gene, material gene, and intangible gene, and explore their cultural values in order to conserve the CDVCL.

A. Dang (corresponding author) · D. Zhao · Y. Chen · C. Wang
School of Architecture, Tsinghua University, Beijing 100084, People’s Republic of China
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_8

8.2 The First Genetic Characteristic of CDVCL: Natural Gene

8.2.1 The Loess Landform

The landform of the Wudinghe river basin can be roughly divided into three parts: the hinterland of the Maowusu desert in the northwest, a large area with low population density; the river source area in the southwest; and the loess hilly-gully region in the southeast. The last two are where residents have mainly settled. Covered with thick loess and cut into numerous fragments by long-term erosion from the tributaries of the Wudinghe river, the region is characterized by interlaced ravines and landforms such as tablelands, loess hills, hillocks, and gullies. With a thickness of 100–200 m, low permeability, and an exceptional ability to stand in vertical walls, the loess provides an excellent precondition for the development of cave dwellings.

8.2.2 Semi-arid Climate

With a mean annual temperature of 7.9–11.2 °C and annual rainfall of 300–550 mm, the Wuding river basin has a temperate continental monsoon climate. Precipitation is unevenly distributed: it is concentrated in summer (June to September) and often falls as rainstorms and torrential downpours. In sum, the basin's environmental conditions, namely a dry climate with little rain, cold winters, hot summers, and scarce forest cover, laid the foundation for the formation and development of the cave dwelling, which is warm in winter and cool in summer, green and economical, and requires no timber.

8 Conservation of Cave-dwelling Village …


8.2.3 Dry Farming

Because of the generally poor ecological environment and complex terrain, most of the Wudinghe river basin must practice dry farming, which relies mostly on natural precipitation. At the same time, the shortage of flat land makes large-scale irrigation and water conservancy facilities hard to construct. As a result, agricultural productivity is fairly low, and residents have always settled close to their arable land. The dry farming mode in the basin therefore determines not only the layout structure of the villages but also the choice of the most suitable building type there: the cave dwelling. Adapted to sloping topography, gaining interior space horizontally, and using undisturbed earth as its walls and roof, the cave dwelling helps protect the limited arable land. In particular, the earth dug out when caves are cut into a cliff can be used to fill and extend the slope, thereby increasing the farmland area.

8.3 The Second Genetic Characteristic of CDVCL: Cultural Gene

Rooted in the deep loess, cave dwellings are mostly built along rivers, cliffs, or slopes, forming one or more tiers of cave zones along the contour lines, or else sunken caves, which are captured by the saying “you can be in the village without seeing it; only the tops of the tree crowns are visible”.

8.3.1 Village Pattern Along the River

Influenced by traditional Chinese geomancy culture, site selection for cave dwellings generally followed a fixed rule: “face the water with the mountain at the back, carry Yin and embrace Yang”. To avoid natural disasters such as floods, debris flows, landslides, and slope slips, villages are usually located beside the river and extend along the concave and convex folds of hills and valleys, striking a balance between nature and humanity while making full use of natural conditions. The spatial layout of a village is therefore largely determined by the river morphology. More specifically, there are three layout types: a dendritic structure following the direction of the valley, a parallel structure perpendicular to the valley, and a scattered structure spreading along the branches of the valley.


8.3.2 Village Pattern Along the Cliff

Cave dwellings built into the cliff of a loess slope, using the open space in front of it, are called “backer caves”; they are normally distributed along the contour line in a curved or broken line. This type of cave dwelling receives good daylight, but it lies some distance up the slope from the river and the road, so it is not very convenient for residents to come and go, transport materials, or fetch water.

8.3.3 Village Pattern Along the Slope

This type of cave dwelling is normally built on the sunny side of a loess slope beside the river or on the upper part of a rock wall, both of which are usually too steep to cultivate. The main body of the courtyard consists of 2–5 backer caves, which, together with a pigsty or sheepfold, a toilet, walls, and gates, form a basic building unit. The caves extend along the slope in conformity with the topography and usually show a linear distribution. Seen from a distance, tiers of round arches scattered at different altitudes outline the whole village. The advantages of this type are convenient access to transport and water and shelter from sand, while its weakness is a poorer view than that of the backer cave.

8.4 The Third Genetic Characteristic of CDVCL: Spatial Gene

8.4.1 Production Space

Most land on the Loess Plateau is extremely barren, and its low crop yields can feed only a limited population. Therefore, in addition to the constraints of the landform, the size of a village there is largely determined by the quantity and quality of the arable land around it. Because of the gully topography, arable land in the Wuding River Basin consists mostly of small, scattered plots rather than large, flat fields, and the distance between the fields and the living space is quite short. As a result, most cave villages are small in scale.

8.4.2 Living Space

Generally, each courtyard houses one family. Composed of the main caves, an outdoor hearth or lean-to, a livestock shed or corn storehouse, a millstone, a pigsty or henhouse, a seepage pit or water cellar, a toilet, fences, and gates, the courtyard serves both living and production functions.

8.4.3 Mental Space

Because villagers believe in the Mountain-god, they regard the local temple as a place of spiritual sustenance, and the temple is typically the main venue for public activities. On the first day, the ceremony of “inviting the Mountain-god” is usually held, and public activities such as Yangko, the serpentine maze, and drama performances are then organized in the god's name. Spatially, therefore, the temple, the open-air stage, and the square are placed next to one another so that all these activities can proceed smoothly.

8.5 The Fourth Genetic Characteristic of CDVCL: Material Gene

8.5.1 Construction Materials

By structure and material, cave dwellings can be divided into loess caves, interface caves, brick caves, and stone caves, as well as derived types such as adobe caves, thin-shell caves, and brick-stone caves. More specifically, the adobe cave and the thin-shell cave derive from the loess cave and the brick cave, respectively, while the brick-stone cave combines two building materials.

(1) Loess cave. This is the most primitive form of cave, derived from the ancient habits of cave dwellers. A loess cave is generally 3 m wide, 3 m high, and 7–8 m deep, and the deepest can reach 20 m. The loess cave fully embodies the advantages of cave dwellings: it is warm in winter and cool in summer, saves cost and material, and is easy to build. Its main weaknesses, however, are poor daylight and air circulation, windows and walls that are hard to paint, front walls that are easily weathered and eroded by rain, and the risk of collapse caused by landslides. With the gradual improvement of residents' living standards after the founding of the People's Republic of China, loess caves have been largely abandoned.

(2) Interface cave. Building on the earlier loess cave, this type widens the original opening, extends the depth of the cave by 1 or 2 m, uses bricks or stones to build the front wall (mostly between 1.5–m), and adds a new round window and wooden door. The junction between the loess part and the stone (or brick) part is concealed with a screen so that the two appear as one. Bigger doors and windows provide a larger lighting area and let in more sunlight, and the other improvements bring better heat preservation, solidity, and appearance.

(3) Brick cave. This is an arched cave built of clay brick and mortar; it is attractive and neat, but its weaknesses are poor heat preservation and a tendency to age quickly.

(4) Stone cave. This is an arched cave built with rocks and dust. Specifically, the front wall is assembled from square or arc-shaped stones cut to accurate sizes, the inner walls are finished with white stucco, the stove is made of polished stone slabs, and the floor and the Kang (heatable brick bed) are covered with tiles. All these treatments mark the stone cave as a new generation of cave.

8.5.2 Facade Forms

Cave dwellings take different forms, including the backer cave, the sunken cave, and the detached cave. First, the backer cave is the most widely used, with terraced distribution along the slope or the edge of a tableland. Second, the sunken cave is dug into the inner walls of a square pit, forming an underground courtyard. Third, the detached cave stands independent of its surroundings and is also known as the “head cave”. The most important facade of a cave dwelling is the front one, which faces the courtyard and is often called “the cave face”, meaning that it is as important as a person's face; local residents always decorate it carefully. From top to bottom, the facade consists of the parapet, eaves, arch head line, doors, windows, and so on. The doors and windows occupy most of the “face” and sit at its center, so they are the most elaborately decorated parts. The “face” of cave dwellings in the Wuding river basin mostly takes the form of the “open, full-arch window”: the area from the arch line down to the middle of the face is entirely taken up by windows, with the door to one side, and brick, rock, or adobe is used only below the windowsill. There are also certain numerical conventions. For example, the windowsill must have an odd number of layers, 17 layers if it is made of brick or adobe. If several caves are arranged in a row, their window lattice patterns should differ from one another. Moreover, the arches are normally double, triple, or concentric, so the overall form creates an atmosphere of clarity, symmetry, and grandeur.

8.5.3 Flat Pattern

The plan layout and structural pattern of cave dwellings mostly follow the traditional Chinese residential courtyard, that is, the enclosed courtyard, whose main forms are the three-section courtyard, the quadrangle courtyard, and combinations of the two. In addition, some poor families build only one regular room and enclose the courtyard with walls. Because cave dwellings are located on the Loess Plateau, with its crisscrossing ravines and complicated topography, the terrain always imposes particular requirements on the layout and structure of the courtyard. The layout of cave dwellings usually has to abide by rules of etiquette, such as facing south and placing the living space in the southeast. Within a complete cave courtyard, only three cave rooms are dug into the main frontage. Following the concept of “virtues between father and son”, the middle cave, which is the largest, belongs to the elders, the east and west rooms next to it are assigned to the eldest and the second son respectively, and descendants and servants live in the caves on both sides (similar to wing-rooms). Furthermore, the middle cave often serves as a central hall where the spirit tablets of ancestors are placed for worship.

8.5.4 Partial Adornment

The dominant colors of cave dwellings are yellow and greenish gray. The main external decorations of caves include the material, style, craftsmanship, and color of the windows, door curtains, top bars, and so on, which show whether the owner is hardworking and how wealthy the family is. Above all, the arch curves of the doors, windows, and entrance are the most important decorative parts, and their decorative patterns embody traditional Chinese culture and regional folk ideas. The internal decoration of cave dwellings includes the interior shape, the wicket (a small door between two caves) and the curtain covering it, as well as paintings and the coverings of furniture and household appliances. These are mostly handmade by local residents, especially housewives. The tank and Kang-surround paintings are the most representative, and the latter were included in “the second batch of the national intangible cultural heritage list” in 2008. By material, cave dwelling decoration can be divided into stone, brick, wood, and paper. Stone and brick are usually used for carved lions, drums, foundations, screen walls, and arch head lines on the facade, overhangs, parapets, and so on, bearing auspicious patterns of “happiness”, “affluence”, and “longevity”. Wood is mainly used for carvings on gate heads, window lattices, and similar elements. Paper refers to window and roof paper-cuts, Kang-surround paintings, hanging curtains, door-god pictures, and the like, which can be replaced from time to time.

8.6 The Fifth Genetic Characteristic of CDVCL: Intangible Gene

8.6.1 Religion

In the Wuding river basin, a variety of religious beliefs coexist, including not only Buddhism, Taoism, and Confucianism, which are widespread in China, but also Catholicism, Christianity, and Islam. Religious belief in the Wuding River Basin is characterized by primitiveness, practicality, and diversity. At the early stage, worship centered on the system of the Three Wise Kings and Five August Emperors, which was closely connected with agricultural civilization. With the rise of the combined culture of Confucianism, Buddhism, and Taoism, temples were gradually constructed. In the Ming and Qing dynasties, temples became linked with folk associations, forming strong grassroots social organizations. Their entertainment activities effectively promoted public conservation, and the ingenious union of folk art and sacrificial rites brought more vigor to temple fairs, strongly attracting the masses to take part in religious cultural activities. Each village in the area had numerous temples of different religious beliefs, and almost every family had its own god. This is mainly due to the special location of the Wuding river basin: a crucial battlefield between the Han, the Huns, and other minorities in the transitional zone between cropping and nomadic areas. In ancient times, frequent wars and natural disasters left people struggling on the edge of starvation and in bitter suffering, so they longed for a peaceful life and turned to the gods for blessings. At the same time, because war promoted cultural communication and integration, religious beliefs became diverse.

8.6.2 Traditional Customs

As most residents in the area hold pantheistic beliefs and believe that “the gods are everywhere”, local folk customs usually show reverence for the gods. Taking cave construction as an example, from the very beginning of site selection, the geomancer plays an important part in choosing the terrain, orientation, and propitious timing. First, the geomancer helps the owner determine the orientation of the new building by means of the “compass” and tells him which day is auspicious for starting construction. On the groundbreaking day, the owner holds a series of worship rituals, including offering food and drink, lighting incense, and kowtowing to the local earth god to announce that construction is about to start and to pray for the god's blessing on the whole family in the days to come. Only then can construction begin. On the completion day, another ceremony, called ‘closing the dragon's mouth’, is held. Moreover, before moving in, there are further rituals such as window setting, god placing, and cave warming.

8.7 Conclusions

Embodying the environmental concept of the “unity of nature and man”, the cave dwelling village of the Wuding river basin is a typical representative of human settlement environments and an important part of human cultural heritage, with distinctive local characteristics. By analyzing the form and other aspects of the settlement, this chapter concludes that, from site selection and formation through development, the settlement form is the result of joint interactions among history, the natural environment, and the human environment. Conserving, developing, and passing on traditional settlements of historical, cultural, and artistic value by conserving their Cultural Landscape Genes not only helps maintain their regional characteristics and cultural continuity but also offers reference and practical value for modern architectural design.

References
Chen, Y., Dang, A., et al. (2014). Building a cultural heritage corridor based on geodesign theory and methodology. Journal of Urban Management, 2014(1–2), 121–141.
Dang, A., Zhao, D., & Cheng, Y. (2012a). Characteristics of traditional cave dwelling village cultural landscape at Yulin Prefecture. Traditional Village Conservation, 10, 128–133.
Dang, A., Ma, Q., & Lv, J. (2009). Conservation of Chinese traditional culture based on information technology. The 14th Inter-University Seminar on Asian Mega-Cities (IUSAM), Taipei City, Taiwan, China, March 12–15, 2009.
Dang, A., Ma, Q., & Zhao, J. (2012b). Conservation study on village traditional culture based on geo-information technology. Urban Flux, 1, 26–29.
Dang, A., Zhang, Y., & Chen, Y. (2013). Sustainable-oriented study on conservation planning of cave-dwelling village culture landscape. In M. Kawakami et al. (Eds.), Spatial planning and sustainable development. Springer.
Dong, Y., Fei, Y., & Dong, Y. (2019). Analysis of the cultural landscape characteristics of Hezhe traditional settlements based on the genetic method of cultural landscape. Development of Small Cities and Towns, (3), 98–105.
Huo, Y., & Liu, P. (2005). The town form and the landscape of the Loess Plateau. Journal of Architecture, 12, 42–44.
Li, Y., & Dang, A. (2010). Application of spatial statistics for village system planning. In Spatial integrated humanities and social science, (2), 257–269. China Science Press.
Li, J. (2007). Research on the ancient settlement of Mizhi cave in northern Shaanxi. Xi’an: Xi’an Academy of Fine Arts.
Liu, P. (2004). The gene expression and recognition of ancient village cultural landscape. Journal of Hengyang Normal University, 24(4), 1–8.
Ma, W. (2012). The planning and design of traditional cave village landscape in northern Shaanxi: Based on the principle of cultural ecology. Beijing: Tsinghua University.
Qin, Y., & Dong, J. (2008). The villages and natural environment of the Loess Plateau in northern Shaanxi in the late Qing and early Republic. Gansu Social Science, 2008, 210–213.
Yang, Y., & Dang, A. (2012). The cross-disciplinary research methods on village cultural landscape, taking Nuodeng village as an example. Urban Flux, 2012(1), 18–22.
Yang, Y., Zhang, D., & Dang, A. (2013). Discussion on the spatial and timing characteristics of the forming mechanism of village cultural landscape: Taking Nuodeng village as an example. Chinese Garden, 3, 60–65.
Zhao, J. (2010). The conservation planning and design of cave village landscape in northern Shaanxi: Taking Dangjiashan village as an example. Beijing: Tsinghua University.

Chapter 9

Digitalized Enka-Style Taipei

C. S. Stone Shih

9.1 Introduction

This study explores the popularity of Taiwanese ballads that reflect Japanese influences. These ballads evolved over time in Taipei’s old-town districts. Goodchild (2004), a pioneer of geographic information systems (GIS), proposed a spatially integrated approach to the humanities and social sciences. The integrity of this approach relies largely on technological progress in GIS, such as virtual reality, the Internet, and wireless terminals. Advances in GIS-based information processing have led to the regular use of time–space-oriented information in the humanities, where a cultural–spatial–analytical perspective has been adopted (Bodenhamer et al. 2010). Since 2001, sociological digital mapping has developed as a separate discipline in Taiwan. Rethinking social space and the integrity problem, Shih proposed the “GIS bridging method”, which uses digital drafting for multidimensional adjustment, interpretation, and reconstruction. The bridging method is a human-centered approach that combines multiple methods and data sources for comprehensive quantification and GIS integration (Shih et al. 2010). A researcher maps cultural space together with social space and physical space (Liechty 2003). Differentiation within and between classes leads the ascendant group to appropriate and redefine any space according to its cultural logic (Bridge 2002; Podmore 1998). Social space may be defined as a cultural site that is not only selected as the geosocial locale of the ethnographic gaze but also serves as a central location within a cultural community, where banal ritualized activity and exchanges of cultural currency converge (Alexander 2003). The main factors of social space discussed in this study are class and ethnic groups and dynamic change in Taipei’s historical landscape, traced through musical genres, especially Taiwanese ballads. In this study, we explored the cultural space of Taiwanese ballads related to Enka sources by bridging qualitative interviewing methods with GIS. All this cultural imagery and historical discourse is intertwined with cultural products and imbued with social meaning, thus providing a critical understanding of the relevance of Enka in Taipei.

C. S. Stone Shih (corresponding author)
Department of Sociology, Soochow University, Taipei 111, Taiwan
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_9

9.2 Taiwanese Enka-Style Ballad Performers in the 1960s: Two Cases¹

Enka considerably influenced many popular post-war Taiwanese songs, including those of the singing diva Lu-Shyia Chi (紀露霞) and the leading male singer Yi-Feng Hung (洪一峰) in the 1960s. Enka originated in the 1880s, during the Meiji period (1868–1912), against the background of the jiyu minken undo (Freedom and People's Rights Movement), which lasted from 1874 to 1890. Because the Meiji government restricted freedom of public expression, Japanese intellectuals used a speech–song form to express their ideas at public gatherings while avoiding government restrictions and police interference. This earliest form of Enka, half sung and half spoken, was performed by the Enkashi (Enka singers). Enka's musical style changed again in the early post-war period (the 1950s). The largest influence at that time came not from naniwa bushi, a genre of traditional Japanese narrative singing, but from the American jazz bands that thrived in major cities and around military bases. In the 1960s, however, Japanese burusu (blues) became popular again. Songs composed in the burusu style were known as mudo-kayo (mood songs). Their signature traits are a slow-to-medium tempo, the theme of lost love, and Western instruments such as the saxophone and guitar. To distinguish Japanese from Western popular music styles, the music industry of the 1970s used the label Enka to represent and emphasize popular Japanese musical styles. Enka grew out of popular song, and its independence as a genre can be traced to the 1960s: since the Columbia Record Company (コロムビア販賣株式會社) established a dedicated department in 1963, Enka has been regarded as a distinct genre of popular Japanese music (Kishi 2013). The typical accompanying instruments for Japanese Enka are the alto and tenor saxophones, trumpet, electric bass, piano, and strings.
The differences between Enka and other popular music became more obvious in the 1970s and 1980s with the emergence of nyu myujikku (new music). This genre featured brighter, lighter lyrics and modern musical styles, and it attracted a younger audience than Enka, whose older listeners preferred its sad melodies. Enka has since become a distinct genre, with subgenres defined by musical style and lyrical subject. According to Yano, Enka is now less popular than before and has been displaced by other forms of popular music (Yano et al. 2003). Because popular Japanese music frequently incorporates Western elements to produce new genres and subgenres, more varieties of popular music exist than ever before. Thus, the history, musical style, and audience of Enka are not fixed but vary across cultural contexts. People still sing and listen to Enka on various occasions in daily life.

¹ All the interviewees were interviewed by the author between 2009 and 2011.

The two singers, Yi-Feng Hung (the leading male singer) and Lu-Shyia Chi (the singing diva), were representative performers of Enka-style Taiwanese ballad in the 1960s. Hung covered most of the repertoire of the Enka bass singers of the time. For example, he covered 15 pieces by Frank Nagai (フランク永井), such as "Let's Meet at Yurakucho" (有樂町で逢いましょう). He also covered nine pieces by Ishihara Yuujirou (石原裕次郎), such as "Rusty Knife" (錆びたナイフ), two pieces by Ichiro Kanbe (神戶一郎), and several pieces by Mifune Hiroshi (三船浩), such as "Man's Blues" (男のブルース). All of these were performed by well-known bass singers, and the selected songs were closely tied to Japanese culture. Owing to his popularity, Hung traveled to Japan in 1962 and performed at the Nihon Gekijou theater in Tokyo. He had appeared on the Nihon Gekijou program and was regarded as a bass singer as good as Frank Nagai, and he became the male representative of Taiwanese ballad. He recorded many albums and more than 100 individual pieces of music, and he played leading roles in Taiwanese movies of the 1960s. His vocal performance exhibited the same bass and aura as Frank Nagai's, which helped popularize his songs. After his Nihon Gekijou appearance, he returned to Japan eight times in succession and taught Chinese folk songs in Shinjuku. He divided his time among major Japanese cities, including Tokyo, Nagoya, Sendai, Yokohama, Osaka, and Kyoto.
Lu-Shyia Chi adapted songs from music-school-trained Japanese divas. Misora Hibari's (美空ひばり) repertoire was the primary source of Chi's adaptations, such as "Hill at Dusk" (夕やけ峠) and "Hibari the Flower Girl" (ひばりの花売娘). Whereas Hung adapted songs from a limited selection of singers, Chi's sources were more varied. They also included pieces by Awaya Noriko (淡谷のり子), Hirano Aiko (平野愛子), and Takamine Mieko (高峰三枝子), and by tenor singers such as Haida Katsuhiko (灰田勝彦). Despite adapting songs from various Japanese singers, Chi displayed her own singing skill. She used the resonance of her head and chest cavities to create a soft, mellow tone, which she had developed through bel canto training. This personal style earned her the title of "Taiwanese Misora Hibari." Whereas Chi sang Mandarin songs as well as Taiwanese ballads, Hung sang only Taiwanese ballads. Nevertheless, their work was similar in content. The songs they selected were generally well-known classics in Japan, and this distinguished them from most Enka songs adapted into Taiwanese, which typically became classics in Taiwanese rather than in Japanese. Although Chi earned the title of "Taiwanese Misora Hibari" through the popularity of her hit "Hill at Dusk," her version differed from Misora Hibari's original and could not compare with Hibari's classic.


The original Japanese version of "Hill at Dusk" was composed in 1957, during Japan's second wave of rural-to-urban migration. With the rise of a cultural industry driven by broadcasting and television, Japan entered the "era of singer adoration and popular music fervor" (愛唱歌時代) after the 1950s. Taiwan, for its part, had experienced an ethnic conflict in 1947: the massacre of the 228 incident left the Taiwanese longing for a way out of trauma. Taiwan also entered its own era of singer adoration and popular music fervor in the 1960s (Shih 2016). Taiwan's post-war rural-to-urban migration took place from the 1950s to the 1960s, and Chi's version of the song responded to this urbanizing social background. The country girl in the lyrics of "Hill at Dusk," for example, moves to a factory in the Taipei metropolitan area to find work. Working day and night, she misses her mother in a village in southern Taiwan and imagines herself standing on a hill at dusk, gazing toward her distant hometown. This scenario illustrates the social and political process by which Japanese urban songs were transformed into Taiwanese ballads. The Taiwanese localized the lyrics of Enka songs. This practice had often occurred in the early Showa period, and in the 1960s it echoed Taiwan's social situation. As an urban popular song genre, Taiwanese Enka took shape through stories of misfortune that reflected the backwardness of Taiwan's industrialization and the distress of its political environment.

9.3 Digitalized Enka Pertaining to Taipei that Reflected a Mixed-Race Cultural Space

Lu-Shyia Chi's Taiwanese covers of Japanese songs can be defined as "mixed-race songs." In the 1960s, in the years after the 228 incident, Taiwan's economy had not yet matured into global circulation. Political control of popular music was not strict, and the relevant legal norms had not yet been institutionalized. In this situation, local lyricists added lyrics to existing songs, reinterpreted foreign songs, and localized them in multiple ways to give them a Taiwanese style. I consider this process "quasi-globalization" (Shih 2014), a cultural rather than economic globalization, and it accounts for the success of Taiwanese ballad in the 1960s. The incorporation of Enka into Taiwanese songs by singers such as Lu-Shyia Chi and Yi-Feng Hung was only one form of mixed-race song. Singers incorporated pieces not only from Japan but also from China (especially Shanghai), Hong Kong, Vietnam, South Korea, the United States, and Italy. This research analyzes the mixed-race songs it studies, especially those drawing on Enka, in two ways. The first is historical period, from the Japanese colonial era (1895–1945) to Kuomintang rule in the early post-war period. The second is the distribution of musical media venues in the cultural space. The venues for popular music performance were mainly in the old district of Taipei, the western district near the Tamsui River. Before Japanese colonization, the Qing government had already established Taipei City, and the Japanese later called its main administrative center the "inner city" (城內). Traditionally, the inner city, Dadaocheng (大稻埕), and Monga (艋舺) in Taipei's western district were together known as the "three prosperous streets" (三市街). After occupying Taiwan, the Japanese established Seimonchō (西門町) close to the inner city. They modeled it on Asakusa in Tokyo, a center of recreation and business. As displayed in Fig. 9.1, four districts of Taipei constitute the main cultural space in this study: Inner city, Dadaocheng, Monga, and Seimonchō.

Regarding the space of popular songs, Jones (2004) proposed the concept of the "media loop" to explain the circulation of pop music in Shanghai in the 1930s. He took as an example the work of Jing-hui Li (黎錦暉), the founding father of Chinese yellow music (modern song). In a sense, the film Peach Blossom Dream (1935) represented the creation of a new media loop at that time. The kind of urban milieu in which Li's yellow music first gained popularity became the object of filmic representation in movies about the lives of the sing-song girls who performed the music. The screen songs from such movies were published in turn in sheet-music collections and film magazines, made into gramophone records, broadcast, and ultimately emulated by sing-song girls in the dance halls. Figure 9.1 displays the cultural space of Taipei's musical genres and media (1930–1970). The digitally mapped venues, such as cinemas, dance halls, radio stations, and recording shops, were used for playing music and staging dramas in Taipei.
These venues can be classified into six categories according to the musical style they hosted, from the construction of the city to the initial post-war period: (1) Beijing opera/nanguan/stage play, (2) Taiwanese puppet show/Taiwanese opera, (3) popular music from Shanghai and Japanese movies, (4) popular music from Taiwanese movies and ballads, (5) occidental movies, and (6) outdoor cabaret. Some venues fall into several categories. This indicates that a single venue could host performances of diverse genres and could be regarded as a mixed hall. Eirakuza (永樂座, #13) hosted Beijing operas, stage plays, and Japanese and Shanghai movies before the war and screened Taiwanese movies after it. The First Theater (第一劇場, #12) screened occidental, Japanese, and Taiwanese movies and hosted Beijing opera performances after the war. Yoshino Kan (芳乃館, #17) in Seimonchō hosted Beijing opera before the war and screened Japanese, occidental, and Chinese movies after it. The varied performances in these mixed halls indicate that Taipei's musical space from the Japanese colonial period to the initial post-war period was hybrid and complex. Several Japanese-managed venues, such as Taipeiza (台北座, #29), Niitaka Kan (新高館, #30), and Yoshiaki Kan (芳明館, #28), hosted traditional Japanese song and dance forms such as noh (能劇). Taiwanese people made up the majority of spectators for Taiwanese puppet shows (budaisi, 布袋戲) and Taiwanese opera.
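The mixed-hall criterion in the preceding paragraph can be stated as a simple coding rule. The Python sketch below does not reproduce the author's GIS workflow. It assumes a hypothetical `Venue` record holding a map number, name, district, and set of category codes, and it treats any venue coded into more than one of the six categories as a mixed hall. The codes for Eirakuza, The First Theater, and Yoshino Kan are an illustrative reading of the prose above. The sketch assumes that Yoshino Kan's "Chinese movies" fall under category 3, and the single-genre venue is invented for contrast.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The six genre categories listed in Sect. 9.3 for the Fig. 9.1 venues.
CATEGORIES = {
    1: "Beijing opera/nanguan/stage play",
    2: "Taiwanese puppet show/Taiwanese opera",
    3: "Popular music from Shanghai and Japanese movies",
    4: "Popular music from Taiwanese movies and ballads",
    5: "Occidental movies",
    6: "Outdoor cabaret",
}


@dataclass
class Venue:
    number: int          # map label in Fig. 9.1 (e.g., 13 for #13)
    name: str
    district: str
    categories: set[int] = field(default_factory=set)

    @property
    def is_mixed_hall(self) -> bool:
        # A venue coded into more than one genre category counts as a mixed hall.
        return len(self.categories) > 1


def mixed_halls(venues: list[Venue]) -> list[str]:
    """Return the names of mixed halls, ordered by map number."""
    return [v.name for v in sorted(venues, key=lambda v: v.number) if v.is_mixed_hall]


def category_labels(venue: Venue) -> list[str]:
    """Translate a venue's category codes into readable labels."""
    return [CATEGORIES[c] for c in sorted(venue.categories)]


# Illustrative codes read off the text, not the author's dataset.
# Assumption: "Chinese movies" at Yoshino Kan are coded as category 3.
# The last venue is hypothetical and exists only for contrast.
VENUES = [
    Venue(13, "Eirakuza", "Dadaocheng", {1, 3, 4}),
    Venue(12, "The First Theater", "Dadaocheng", {1, 3, 4, 5}),
    Venue(17, "Yoshino Kan", "Seimonchō", {1, 3, 5}),
    Venue(0, "Hypothetical single-genre venue", "Monga", {2}),
]

print(mixed_halls(VENUES))
# ['The First Theater', 'Eirakuza', 'Yoshino Kan']
```

In a GIS layer, the same flag could be stored as an attribute on each venue point so that mixed halls can be symbolized separately. That step is not shown here.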


Fig. 9.1 Cultural Space of Taipei’s Musical Genres and Media Venues (1930–1960)


During the Qing Dynasty, immigrants from Changchou (漳州) and Quanzhou (泉州) in Fukien, China, sailed across the sea to Taiwan and initially settled in Monga. Because of mob violence, they then moved to Dadaocheng and cultivated the bank of the Tamsui River. Owing to its dense population, the area came to be known as the "Taiwanese street" (臺灣人市街). Before 1920, Dadaocheng, Monga, and Dalongdong (大龍峒) belonged to the Taipei Prefecture (台北州). After the Japanese government restructured the administrative area, these regions were known collectively as "Taipei" under the local government system. Monga was inhabited by both Taiwanese and Japanese people. It contained the Wan Hua pleasure quarter (萬華遊廓), which offered Japanese prostitution and entertainment services during the colonial period. Japanese administrators designated Seimonchō as the downtown area. Following the model of the Asakusa district in Tokyo, the Japanese government filled the Monga depression with soil in October 1914. After construction was completed, Japanese people became the principal residents of Seimonchō and formed the so-called "Inlander's street" (Japanese street, 內地人市街). This recreation and business area flourished until the 1960s under the Kuomintang government (Gao 2004). Class and ethnic divisions were the subject of a report titled "Three Main Problems of Taipei," published by the Taiwan Hsin Minpao (臺灣新民報) on August 2, 1930. The report stated: "In the 5th year of the Showa period (1930), the total population of Taipei City was 233,340, the number of Japanese was 64,800, and the number of Taiwanese was 164,400. The areas inhabited by the Japanese and the Taiwanese were clearly divided. The Japanese mostly lived in the Inner city, and some lived in Monga. The Taiwanese lived in Dadaocheng and Monga. Therefore, the Japanese and Taiwanese people were divided in space." This spatial division persisted from the 1930s to the late 1960s.
The language and reading habits of the ethnic groups also differed. Figure 9.1 illustrates the clustering of venues by music genre. The ethnic groups living in Monga were mixed, and so were the songs the residents heard and the movies they watched. As displayed in Fig. 9.1, the audience of the Monga theater (#25) included both Taiwanese and Japanese people. Its performances included Taiwanese puppet shows (budaisi), Taiwanese operas (歌仔戲), stage plays (新劇), and Japanese, Taiwanese, and Chinese movies. In Dadaocheng, the theaters mainly staged music and drama such as Taiwanese ballad, Taiwanese movies, Taiwanese operas, Taiwanese puppet shows, stage plays, Shanghai movies, Beijing operas, nanguan (南管), beiguan (北管), and Xiao-Qu (小曲). For stage plays, the Kosei Theater Society (厚生演劇研究會) was established in 1943 by intellectuals with a strong interest in drama, such as Chuan-Sheng Lu, San-Lang Yang, and Tuan-Chiu Lin. The society held its first performance at Eirakuza (#13) in Taipei on September 3. The program included "A Capon," "Takasago Kan," "Terrestrial Heat," and "The City Lights We Look Down Above from the Mountain," performed in Japanese and written and directed by Lin Po Chiu. These plays packed Eirakuza to an unprecedented degree (Yeh 1990; Shih 2011). After the war, new venues in Dadaocheng included the Greater China Theater (#14), Da Qiao Theater (#11), Guosheng Cinema (#2), Golden Dragon Hall (#5), and Xiaokilin (#4). Before the war, the general public watched Taiwanese puppet shows and listened to Taiwanese ballad, whereas intellectuals favored nanguan, beiguan, and Beijing opera. As a "Taiwanese street," Dadaocheng was home to residents who spoke Taiwanese yet yearned for Chinese culture and favored Chinese opera and music. During Japanese colonial rule, therefore, Dadaocheng was culturally closer to China than to Japan. However, the post-war 228 incident broke out in Dadaocheng itself, and this event changed the story completely. The Kuomintang government deliberately suppressed Taiwanese ballad. Eventually, the Taiwanese people rejected Chinese musical genres and replaced them with Japanese songs and movies. After the war, the Kuomintang controlled the Inner city.
Seimonchō, formerly a Japanese entertainment center, was transformed to serve mainlanders, and Japanese pop culture was pushed out to Dadaocheng. In the early 1950s, the Mandarin-oriented music and film genres favored by Taiwanese audiences indicated that, under Japanese colonial rule, they had identified culturally as Chinese. After the war, however, the 228 incident completely dissolved the cultural identification with China among the Taiwanese of Dadaocheng. Dadaocheng remained a center for performances of Taiwanese ballads and of Taiwanese films adapted from Xiao-Qu and Taiwanese opera. Before the war, Seimonchō was the entertainment center of the Japanese ruling class. It exhibited many types of movies from Japan, China, the United States, and Europe, all of them first-run releases. After the war, the Kuomintang government directly took over this Japanese entertainment space. People in Seimonchō continued to watch first-run movies and listen to pop songs, but the language changed from Japanese to Mandarin. After being defeated by the Communist Party in China, the Kuomintang government retreated to Taiwan with some two million soldiers and civilians, historically known as "mainlanders" (外省人). One owner who had operated a dance hall in Shanghai's entertainment industry reopened a similar hall in Seimonchō and banned Japanese songs there. The cultural landscape of the district thus changed completely.


Performances in Seimonchō took place at the Sekai Kan (#18), Shinsekai Kan (#19), and Daisekai Kan (#20). Before the war, these venues mainly presented Japanese movies, noh dramas, and stage plays, and occasionally Chinese operas. The Inner city area adjacent to Seimonchō was one of the places where senior Japanese officials congregated. Its main performance venues were the Taihoku City Public Auditorium (#21) and the Kikumoto Department Store (#15), which showed Japanese movies as well as first-run European and American movies. After the war, the entire situation changed. The Chinese replaced the Japanese as the masters of the presidential palace in Inner city, secured political power, and became the main consumers of entertainment in Seimonchō and Inner city.

9.4 Interviews and Digital Mapping Interpretation

During the Japanese colonial era, a Shanghai Beijing opera troupe staged a program at Eirakuza (永樂座) in Dadaocheng in 1923. Before 1937, Shanghai, Japanese, and occidental movies were often screened at Eirakuza, whereas Taiwanese operas and new dramas were performed only occasionally. The Mandarin films were mostly produced by the Lian Hua (聯華) and Mingxing (明星影業) film companies in Shanghai, and their plots were adapted from romances set in both new and old Chinese society. Popular examples include "The Broken Zither Loft" and "Peach Blossom Village," starring Hu Die (胡蝶), and "Love and Duty," starring Ling-Yu Ruan (阮玲玉). Occidental movies were also screened at The First Theater (第一劇場) in Dadaocheng. This theater was built in 1935 for "The Taiwan Exhibition of the Fortieth Anniversary of Japanese Colonial Governance" with the backing of the renowned tea merchant Tian-Lai Chen (陳天來). Dong-Cheng Wong (翁東成), a 90-year-old elder who once lived in Dadaocheng, noted:

My wife was recruited to be an accountant at The First Theater by Lin-Qiu Li. That's why I often went to the movies… (What was The First Theater's program?) Movies, it was entirely movies. (Where were those movies from?) Most of them were from America. (Were there any Japanese movies?) Yes, but there were fewer of them than American movies. (Were there any Taiwanese movies?) No! At that time, I went to the theater but I never watched Taiwanese movies. Nobody wanted to see Taiwanese movies. We all loved watching American and Japanese movies.

Gradually, the center of Beijing opera performance shifted to Eirakuza in Dadaocheng, which opened in February 1924. Although initially prosperous, Beijing opera quickly declined in popularity.
Beijing opera performances ceased entirely until the end of World War II, with only two exceptions: the Feng Yi Beijing opera troupe (鳳儀京班) performed at The First Theater in 1935, and the Shanghai Tian Chan Big Beijing opera troupe (天蟾大京班) performed at Eirakuza. Describing the programs of the New Stage and The First Theater, Yun-Diao Wang (王雲雕), a well-educated 80-year-old resident of Dadaocheng, recalled:

At that time, Taiwanese opera was usually performed on the New Stage (#26). I didn't want to see Taiwanese operas at all, but my oldest sister often saw the plays… The First Theater was opened on October 10. (What was its first program?) The Beijing opera from Shanghai!

During the post-war period, The First Theater became the most important space for Enka and for screenings of Japanese movies. Famous Taiwanese ballad singers, such as Yi-Feng Hung and Lu-Shyia Chi, performed there. Jing-Shang Li (李錦祥), the manager of the First Record Store (第一唱片行), recalled:

In the late 1960s, it was good business to sell records. Because my company was located in front of The First Theater, audiences constantly came to my place to purchase records, including Enka movie theme songs such as "Love in May Flower" (愛染かつらを), "My Darling on the Bridge" (あの橋の畔で), and "Where My Darling Is" (あの波の果てまで).

The Third Sekai Kan theater, renamed the Da Guang Ming Theater (#9) after the war, was also located in Dadaocheng, and its main program was Taiwanese movies. Notably, as a "Taiwanese street," Dadaocheng embodied not only Taiwanese culture but also Chinese opera and music. Overall, Dadaocheng's culture was much closer to Chinese culture than to Japanese culture. For instance, intellectuals such as Wei-Shuei Jiang (蔣渭水), who founded the Taiwan People's Party in 1927, frequently attended Jiang Shan Dinners (#3), Peng Lai Pavilion (#27), and Dong Hui Fang (#8). These venues retained Beijing opera programs, including the nanguan and beiguan styles. The programs at Chun Feng De Yi Dinners (#10), which Jiang opened, were similar to those of these three venues.
Chun Feng De Yi Dinners showcased operas such as "Baffling Case in Fuzhou," "Nine Interlocking Rings," and the "Crab Song." This thriving Beijing opera scene indicates the strong relationship between Dadaocheng and Chinese culture in the Japanese colonial period. The Taiwanese Xiao-Qu style included contributions from artists such as Jun-Yu Chen (陳君玉) and offered a means of pursuing a traditional Taiwanese identity. This style became increasingly important when Dadaocheng's relationship with China collapsed following the decline of Beijing opera and the beiguan style, particularly after the 228 incident. Taiwanese identity persisted through the creation of Taiwanese ballads and movies, which drew heavily on Xiao-Qu pop and Taiwanese opera. Moreover, the Kuomintang government's outright neglect of Taiwanese ballad triggered a marked change among popular music audiences. Taiwanese movies and songs began circulating in movie theaters, cabarets, and dance halls, including the Da Qiao Theater (#11) and Mayflower Land (originally Chun Feng De Yi Dinners). By contrast, Chinese drama and music faded from Dadaocheng and circulated only in venues patronized by the ruling class in Seimonchō. The example of Dadaocheng indicates that Japanese and Taiwanese music audiences were spatially separated before the war, and this separation was reflected in the cultural divide between highly and less educated Taiwanese people.

Seimonchō, too, had its own complicated pattern of establishment and group distribution. It was the main Japanese community and recreational area in Taipei during the Japanese colonial period, and its programs consisted mainly of Japanese film screenings, noh performances, and stage plays. However, "sometimes, they also rented out to Chinese troupes performing Beijing operas" (Ye 1997). Seimonchō was also connected to the performance venues in Inner city, such as the Taihoku City Public Auditorium (#21) and the Kikumoto Department Store (#15). A variety of Japanese and popular international films were screened at Seimonchō's theaters, including the Daisekai Kan (#20), Sekai Kan (#18), Shinsekai Kan (#19), and Yoshino Kan (#28). Although the audiences were primarily Japanese, Taiwanese people also watched these movies. Rong-Liu Yan (顏榮柳), a 90-year-old member of the Monga gentry, recalled:

When I was at the Kai-Nan High School of Commerce and Industry, I always skipped class to go to the Daisekai Kan and watch movies. They were all in Japanese, and the audiences were mainly Japanese people. The tickets were 3 or 5 dollars. I bought the half-priced student ticket and went to see the occidental movies. I can remember watching "Giant," "High Noon," "The Bark of the Gun," and "Bump, Bump!" It was so exciting, and I still remember it.

Interaction between Shanghai and Taiwan resumed when mainlanders migrated to Taipei in the post-war period under the Kuomintang government. After this migration from China to Taiwan, the government inherited the tradition initiated by the Japanese and designated Seimonchō as the major entertainment space.
The popular entertainment programs then comprised three strands. The first was Mandarin movies from Shanghai. The second was new "national language" popular music sung by singers such as Xuan Zhou (周璇) and Guang Bai (白光). The third was Beijing opera and music, which had originated among the intellectuals of Dadaocheng. This shift in entertainment habits reflected the transformation of political power in Taiwan and produced different audiences and varied performance locations in the Taipei music scene. Jing-Mei Li (李靜美), a Taiwanese ballad singer, spoke vividly of meeting the diverse ethnic groups that appeared in this musical space. She also revealed that Mandarin songs became mainstream during this period, as she experienced when touring the cabarets in the late 1960s:

I gained my fame in Guo Sheng Cinema (#2), Zhenshanmei Hall (#1), and Xiaokilin (#4). I always wanted to sing in Seimonchō because you can record if you sing in Seimonchō. At that time, singing Mandarin songs was the mainstream performance as a result of the decline in Taiwanese songs. Other singers, such as Ni Zhen (甄妮) and Ya You (尤雅),… we were all of the same generation.

According to Li's recollection, the programs in the Dadaocheng cabarets during the 1970s consisted primarily of Taiwanese and Japanese songs. She wanted to sing in Seimonchō for two reasons: (1) audience standards there were higher, so she could achieve fame more easily, and (2) singing in a cabaret there offered opportunities for recording and television appearances.

Several theaters and cabarets changed their names after the war. For example, Sakaeza (#16, built in 1900) was renamed the Wan Guo Theater and then Cinema. Yoshino Kan (#28, built in 1908) was renamed first the Mei Du Li Theater and later the Ambassador Theater. Daisekai Kan (#20, built in 1935) was renamed the Da Shi Jie Theater, and Kokusai Kan (#22, built in 1935) became the Wan Nian International Commercial Building. The Taiwan Theater (#23) was renamed The China Theater, and the Kikumoto Department Store (#15) was later renamed Seventh Heaven. Notably, Seventh Heaven was the main venue for popular Mandarin music performances. Lu-Shyia Chi, whose songs are analyzed in this study, recalled performing there. Her close friends Hua-Shi Guan and Zhi Shen also performed there, and Zhi Shen produced and hosted the first Mandarin singing program on Taiwan Television in 1962.

Besides performance venues, record stores were essential to the circulation of Taiwanese ballad. During the post-war period, the record stores in the four districts of Taipei gradually joined the "media loop," whose circulation originated in the areas formerly under Japanese colonial power. As bases for selling records, these stores were the channel through which Taiwanese ballad entered the media loop, and some even functioned as major hubs for records. The production of records using Taiwanese materials and recording techniques was not revived until 1952. In that year, Shi Xu (許石), the pioneer of Taiwanese ballad, established the China Record Company. The company's factory was in Sanchong district (三重縣), adjacent to Taipei City across the Tamsui River. The district became the base of the prosperous record companies of the 1960s and an extension of Taipei's media loop (Shih 2009). Sanchong was thus the headquarters of the record industry.
Of the 72 record companies in Taiwan in 1967, more than 40 were located in Sanchong district, and the records manufactured there were distributed throughout Taiwan. Notably, the Zhong Hua merchandise market (中華商場) in Seimonchō contained several record warehouses, such as Universal, Metro-Goldwyn-Mayer, Columbia, Xin Xing, Nana, Jinmen, and Ge Wang (Fig. 9.1). These seven warehouses were crucial for sellers from central and southern Taiwan, who bought records there wholesale and earned their money from retail sales. Jing-Shang Li, the manager of the First Record Store, stated:

In the late 1960s, if you wanted to listen to Enka records, the best place to buy them was at my store. You could also find records at the Zhong Hua merchandise market in Seimonchō. Enka movies, like Love in May Flower (愛染かつらを), were also very popular at that time. The movies were all screened first in Seimonchō and then in Dadaocheng. Taiwanese people, especially the working class, liked to watch movies at The First Theater and then come to my store to buy the records because it was cheaper. You could also find the Enka-style records of Lu-Shyia Chi, Shia Wun, and Yi-Feng Hung at that time, which were very, very popular…

On this basis, we can reasonably estimate that half of all record stores in Taiwan during the post-war period were located in Taipei. In the 1960s, Enka-style mixed-race pop remained one of the principal memories in the cultural imagination of Taiwanese people, owing to the popularity of Japanese movies and Enka-style records and songs. Lu-Shyia Chi and Yi-Feng Hung were the two main singers who combined Enka and Taiwanese music, which made them very popular at that time. After the war, the locations of Enka-style performance halls and theaters shifted from Inner city and Seimonchō to Dadaocheng.

9.5 Concluding Remarks

In the Japanese colonial era, the center of Beijing opera performance gradually shifted to Eirakuza in Dadaocheng. Although initially prosperous, Beijing opera quickly declined in popularity. Japanese people were the primary residents of Seimonchō and formed the so-called “Inlander’s street,” which became a recreation and business area that flourished until the 1960s under the rule of the Kuomintang government. Beijing opera and music faded from Dadaocheng and circulated only in venues patronized by the ruling class in Seimonchō. The example of Dadaocheng indicates the formation of an ethnic–class cultural space. Before the war, the audiences of Japanese songs and of Taiwanese songs were spatially separated, a separation mirrored in the cultural divide between more and less educated Taiwanese people. During the post-war period, Dadaocheng’s First Theater became the most important space for Enka as well as for the screening of Japanese movies. The Kuomintang government inherited the tradition initiated by the Japanese and designated Seimonchō as the major entertainment space after Chinese people migrated to Taiwan. Beijing opera and music, which originated favored by Dadaocheng’s intellectuals. This change in entertainment habits reflected the transformation of political power in Taiwan, led to different audiences and performance locations in the Taipei music scene, and resulted in diverse ethnic groups frequenting different musical spaces. In the 1960s, Enka-style music was still at the forefront of the cultural imagination of Taiwanese people due to the popularity of Japanese movies and Enka-style records. Lu-Shyia Chi and Yi-Feng Hung were the two main singers who combined Enka and Taiwanese music to form mixed-race songs, which made them very popular in the 1960s. By focusing on cultural space, this study describes the popularity of the Taiwanese ballad as a music form.
The compositions of Lu-Shyia Chi and Yi-Fong Hung were taken as examples; they feature mixed-race influences from Japan. This music evolved in Taipei’s four districts, namely, the inner city, Monga, Dadaocheng, and Seimonchō, from 1930 to 1960. The cultural imagery and historical discourses entangled with these cultural products, and the social meanings they carried, provide a critical understanding of what Enka meant in the lives of its listeners. As indicated by the bridging interviewing method and GIS digital mapping, differences in musical language over time eventually produced a racial geospatial division in the examined districts. This historical study agrees with Brown and Knopp (2008), who argue that an epistemologically plural approach is possible from a perspective that embraces tensions and conflicts as opportunities to advance knowledge rather than viewing them as obstacles.


C. S. Stone Shih

References

Alexander, B. K. (2003). Fading, twisting, and weaving: An interpretive ethnography of the black barbershop as cultural space. Qualitative Inquiry, 9, 105–125.
Bodenhamer, D. J., Corrigan, J., & Harris, T. M. (2010). The spatial humanities: GIS and the future of humanities scholarship. Indiana University Press.
Bridge, G. (2002). Bourdieu, rational action and time-space strategy of gentrification. Transactions of the Institute of British Geographers, New Series, 205–216.
Brown, M., & Knopp, L. (2008). Queering the map: The productive tensions of colliding epistemologies. Annals of the Association of American Geographers, 98(1), 40–58.
Gao, T. C. (2004). Watching Taipei through time and space: The 120th anniversary of city founding: Ancient maps and old image literature and cultural relics exhibition. Taipei: Taipei City Government Press.
Goodchild, M. F. (2004). GIScience, geography, form, and process. Annals of the Association of American Geographers, 94(4), 709–714.
Jones, A. F. (2004). Yellow music: Media culture and colonial modernity in the Chinese jazz age. Durham and London: Duke University Press.
Kishi, T. 貴志俊彥 (2013). 東アジア流行歌アワー：越境する音 交錯する音樂人 [East Asian popular song hour: Border-crossing sounds, intersecting musicians]. Tokyo: Iwanami Shoten (岩波書店).
Liechty, M. (2003). Suitably modern: Making middle-class culture in a new consumer society. Princeton: Princeton University Press.
Podmore, J. (1998). (Re)reading the ‘loft living’ habitus in Montreal’s inner city. International Journal of Urban and Regional Research, 283–301.
Shih, C. S. Stone, Chi, C. L., & Huang, Y. L. (2009). A spatial excavation on the media loop of Taiwan’s folk song in the greater Taipei area, 1960–80. In J.-G. Lay et al. (Eds.), Digital Archives GIScience (pp. 1–22). Taipei: Department of Geography, National Taiwan University.
Shih, C. S. Stone, Chi, C. L., & Huang, Y. L. (2010). Representation, bridging and interpretation: On the realization of social geographic information systems. In L. Hue et al. (Eds.), Spatially Integrated Humanities and Social Sciences (pp. 17–32). Beijing: Science Publication.
Shih, C. S. Stone (2011). Taiwan’s ballad as a mainstream song of the period: The Shanghai and other mixed-blood influence of music Taipei, 1930–1960. Taiwanese Journal of Sociology, 47, 91–141.
Shih, C. S. Stone (2014). Modern song: Lu-Shyia Chi and Taiwanese ballad’s era. Taipei: Tonshan Publication Inc.
Shih, C. S. Stone (2016). 「歌謡、歌謡曲集と雑誌の流通：中野忠晴、「日本歌謡学院」の戦後初期台日に対する文化を越えた影響」 [The circulation of songs, songbooks and magazines: The cross-cultural influence of Tadaharu Nakano and the “Japan Song Academy” on Taiwan and Japan in the early post-war period]. In 『台湾のなかの日本記憶』 [Japanese memories in Taiwan] (pp. 101–132). Japan: 三元社 (Sangensha).
Yano, K., Nakaya, T., & Isoda, Y. (2007). Virtual Kyoto: Exploring the past, present and future of Kyoto. Nakanishiya.
Yano, K., Nakaya, T., Kawasumi, T., & Tanaka, S. (2011). Historical GIS of Kyoto. Nakanishiya.
Ye, L. Y. (1997). Taiwan’s earliest theaters and movies. Taipei Literature, Straight (122).
Yeh, C. F. (1990). Taiwan’s early post-war drama. Taipei: Taiyuan.

Part III

Spatial Synthesis in Regional Science

Chapter 10

Research Progress on Spatial Demography

Hengyu Gu, Xin Lao, and Tiyan Shen

H. Gu · T. Shen (corresponding author)
School of Government, Peking University, Beijing 100871, China
e-mail: [email protected]
H. Gu
e-mail: [email protected]
X. Lao
School of Economics and Management, China University of Geosciences, Beijing 100083, China
e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_10

10.1 Introduction

The healthy development of all human beings has always been an important standard for measuring the sustainable development of a region or a country. The Human Development Report 1990, released by the United Nations, put forward the Human Development Index (HDI), which assesses the development level of a region from three aspects: life expectancy, knowledge, and standard of living. Population change is therefore closely related to the sustainable development of a country or a region. Population change has three components: birth, death, and migration. With improvements in healthcare and the collapse of birth rates, migration has become a direct driver of changes in both the total population and its structure. According to The International Migration Report 2017 issued by the UN Population Division, the world’s international migrant stock reached 258 million, or 3.4% of the world population. Of these international migrants, 57% migrated to developed countries, while the remaining 43% moved to developing countries. In developed countries, 61% of immigrants come from developing countries, whereas in developing countries, immigrants from developed countries account for only 13% of all immigrants. International migration thus exacerbates the imbalance of the global population distribution and further influences cultural exchange, trade, and resource allocation between regions. Although migration contributes to the flow of production factors, it has widened regional imbalances, which threaten the order and development of regions. In other words, an uneven population pattern that results partly from migration exacerbates imbalanced regional development, which manifests as spatial heterogeneity in social characteristics such as urbanization and public health. In demography, space is an inherent dimension of research and a core concept in applied analysis. A population is defined as a group with certain characteristics in a geographic region at a particular time (Zeng et al. 2011), so population data share almost all the characteristics of spatial data. Spatial synthesis and computational methods are therefore crucial for advancing the social sciences and humanities, especially demography: spatial methods help to better visualize, analyze, and predict the distribution of demographic characteristics across geographic units. Without the concept of space, much demographic research could hardly have been carried out. In the second half of the twentieth century, social scientists began to pay attention to spatial problems. Giddens (1984) pointed out that traditional theories ignored spatial factors that are instrumental in building a sound social theory. Additionally, with the growing maturity of GIS technology, the techniques, models, and theories of spatial analysis have developed rapidly, accelerating the “spatial turn” in the social sciences. Against this background, Spatial Demography has emerged as an interdisciplinary field and has received increasing attention from scholars in demography, geography, and regional science.
Compared with other demographic methods, Spatial Demography is, to some degree, better able to handle demographic issues related to space, such as the distribution of urbanization rates and the characteristics of interprovincial migration. Supported by spatial methods, some traditional demographic theories can also be better validated. Furthermore, spatial demographic analysis can display population data (e.g., distribution, migration) more intuitively, providing scientific evidence for urban population management and governance. In recent years, research on Spatial Demography has thrived with the application of many emerging techniques and methods of spatial analysis. However, few papers have systematically reviewed the research themes and methodology of previous studies. Moreover, as an emerging subject, Spatial Demography still faces many open questions. For example, how should “space” in Spatial Demography be understood systematically? Is Spatial Demography equivalent to the spatial analysis of demographic data? This paper clarifies the core concepts of Spatial Demography, traces its development, and introduces recent advances in Spatial Demography research, in the hope of promoting the construction of the discipline and of related fields.


10.2 Core Concept of Spatial Demography

10.2.1 The Definition of Spatial Demography

The concept of Spatial Demography first emerged as studies of population migration increased (Clarke 1984). According to the book Spatial Demography by the Japanese demographer Suzuki Keisuke (1980), population issues can be explained in terms of population size and population distribution, which depend not only on birth and death rates but also on interregional migration. The American demographer Voss (2007), the founder of Spatial Demography, published the paper “Demography as a Spatial Social Science,” which has been highly influential in the field. He regarded Spatial Demography as a new discipline that offers a regional perspective on traditional demographic questions. Other demographers, such as Matthews and Parker (2013), define Spatial Demography as “the spatial analysis on the issues and process of population,” emphasizing the significance of spatial analysis for its development. In short, existing definitions can be summarized as follows: in terms of scale, Spatial Demography focuses on the aggregate presentation of population phenomena at the regional level rather than on individual behavior; in terms of method, it is inseparable from spatial analysis techniques. Some scholars even contend that Spatial Demography is simply the application of spatial analysis techniques to traditional demographic issues (Matthews and Parker 2013). In this paper, Spatial Demography is defined as a discipline that studies population (birth, death, migration) at the regional level using spatial data and techniques (cartography, visualization, pattern recognition, and mechanism analysis). In general, Spatial Demography has the following four characteristics:
(1) Collection of population spatial data. Access to population spatial data is the prerequisite for analyzing problems in Spatial Demography. The data should contain not only demographic characteristics but also spatial information such as latitude and longitude.
(2) Application of spatial analysis methods. In addition to traditional spatial analysis based on geographical cognition, such as buffer analysis and overlay analysis, spatial relationship modeling and spatial statistical analysis are emphasized.
(3) Model spatialization. The concept of space should be integrated into the analytical models of Spatial Demography. Space should be embedded in a model through transport cost (New Economic Geography) or a distance parameter (gravity model), or the model should reflect the process of spatial interaction (multiregional population projection).
(4) Regionalization of the analytical perspective. Spatial Demography focuses on demographic phenomena in a particular region and explores demographic issues at the macro level; to some extent, it belongs to macro-demography (Voss 2007).
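Characteristic (3) mentions embedding space through a distance parameter, as in the gravity model. As a minimal illustration (not a model taken from the studies reviewed here), the sketch below computes predicted flows with the classic form M_ij = k · P_i^α · P_j^β / d_ij^γ. The function name gravity_flows, the use of Euclidean distance between centroids, and the default parameter values are assumptions for illustration; in practice k, α, β, and γ are estimated from observed migration data.

```python
import math


def gravity_flows(populations, coords, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Predicted migration flows between regions under a simple gravity model.

    flows[i][j] = k * P_i**alpha * P_j**beta / d_ij**gamma, where d_ij is the
    Euclidean distance between the centroids of regions i and j. The distance
    exponent gamma is the "distance parameter" that embeds space in the model.
    Flows within a region (i == j) are left at 0. Centroids must be distinct.
    """
    n = len(populations)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            flows[i][j] = (k * populations[i] ** alpha * populations[j] ** beta
                           / d ** gamma)
    return flows


# Three hypothetical regions: populations (thousands) and centroid coordinates (km).
pops = [100, 200, 50]
centroids = [(0, 0), (3, 4), (6, 8)]
flows = gravity_flows(pops, centroids)
print(flows[0][1])  # 100 * 200 / 5**2 = 800.0
print(flows[0][2])  # 100 * 50 / 10**2 = 50.0
```

With γ = 2, doubling the distance between two regions of fixed size divides the predicted flow by four, which is the distance-decay behavior the chapter refers to when it says space enters the model as a parameter.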


10.2.2 Space and Spatial Analysis in Spatial Demography

Space is an essential element of demographic research. When space is studied as the background and object of certain demographic phenomena, attention should go to the spatial distribution pattern of those phenomena and the mechanisms behind it, for example, the spatial characteristics of a country’s labor market and the reasons they formed. When space is regarded as the subject that affects demographic phenomena, or as their underlying driver, attention should go to the effect of space on those phenomena; for instance, the way the urban built environment affects residents’ trips shapes the overall pattern of population distribution within a city. Understanding the relationship between space and demographic phenomena from these two perspectives, object and subject, is the logical starting point for analyzing demographic issues in Spatial Demography. Spatial analysis is grounded in the first law of geography, which states that geographic objects are interrelated and that the closer they are to each other, the more related they are (Tobler 2004). Advances in spatial analysis techniques are an important driving force for the development of Spatial Demography: as the techniques advance, more detailed geospatial data and more problem-oriented GIS analysis methods become available, which has considerably raised attention to space in recent years. Techniques such as spatial econometrics, Geographically Weighted Regression (GWR), multilevel modeling, and spatial pattern analysis are all influential for the future development of Spatial Demography (Matthews and Parker 2013). However, as Goodchild and Janelle (2004) state, related models lack a theoretical interpretation of space.
For example, in the spatial interaction model, space can be interpreted either as transport cost or as the relationship between group communication and distance. Thus, Spatial Demography cannot simply be equated with the spatial analysis of demographic issues. The most urgent need for Spatial Demography is to combine advanced spatial analysis techniques with space-based demographic theories.
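Tobler’s first law is usually operationalized in spatial analysis through a spatial weight matrix W, in which nearer units receive larger weights. The sketch below is an illustrative assumption rather than a method from the cited studies: it builds inverse-distance weights w_ij = 1/d_ij^p (the decay exponent p and the function names are hypothetical), row-standardizes them so that each row sums to 1, and computes the spatial lag Wx, the distance-weighted average of neighboring values that spatial regression models use to capture spillovers.

```python
import numpy as np


def inverse_distance_weights(coords, power=1.0):
    """Row-standardized inverse-distance spatial weight matrix.

    w_ij = 1 / d_ij**power for i != j and w_ii = 0; each row is then divided
    by its sum. Points are assumed to have distinct coordinates.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # pairwise Euclidean distances
    off_diag = ~np.eye(len(coords), dtype=bool)     # a unit is not its own neighbor
    w = np.zeros_like(dist)
    w[off_diag] = 1.0 / dist[off_diag] ** power     # nearer units weigh more
    return w / w.sum(axis=1, keepdims=True)


def spatial_lag(w, x):
    """Spatial lag Wx: for each unit, the weighted average of its neighbors' values."""
    return w @ np.asarray(x, dtype=float)


# Three hypothetical units on a line at positions 0, 1, and 3.
w = inverse_distance_weights([(0, 0), (1, 0), (3, 0)])
print(w.round(3))
print(spatial_lag(w, [10, 20, 40]))  # approximately [25. 20. 16.]
```

Row-standardization is a common choice because it makes the spatial lag an average rather than a sum, so lagged values stay on the same scale as the original variable; other schemes (binary contiguity, k-nearest neighbors) can be substituted without changing the lag computation.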

10.2.3 The Relations and Differences Between Spatial Demography and Related Disciplines

To distinguish the concepts and meanings of related theories and to lay a qualitative foundation for technical analysis, this paper compares the focus of demographic research in Spatial Demography with that in population geography, regional science, and regional demography (shown in Table 10.1): ➀ Compared with population geography, Spatial Demography is more concerned with demographic issues, which it approaches from the perspective of geographic space. ➁ Compared with regional science, Spatial Demography focuses on demographic and economic issues in a specific spatial unit, applying descriptive analysis instead of explanatory modeling. ➂ Compared with regional demography, Spatial Demography places more emphasis on the application of spatial analysis techniques. However, the difference between Spatial Demography and regional demography is so small that, to some degree, Spatial Demography can be regarded as the dominant theory and method of regional demography (Wang 2017).

Table 10.1 The comparison of population studies between Spatial Demography and related disciplines
Discipline | Population studies in the related discipline | Population studies in Spatial Demography
Population geography | Studies the geographic distribution of population and its relationship with the environment; belongs to the category of geography | Studies the laws of population development by combining geographical theories with techniques such as spatial statistics; belongs to the category of demography
Regional science | Studies the economic population structure in abstract geographical units and emphasizes establishing explanatory economic models, with population as one of the important elements of a region | Studies the economic population structure in specific geographical units, focusing on the description and mechanisms of spatial economic population phenomena, with population as the main subject
Regional demography | Studies the spatial change of regional (multiregional) demographic phenomena, focusing on comparative study from the regional (multiregional) perspective and the synthesis of those analytical perspectives | Studies demographic phenomena at a technical level in terms of visual representation, pattern recognition, and the analysis and modeling of driving forces, focusing on the application of quantitative analysis

10.3 The Course of Development in Spatial Demography

Signs of Spatial Demography can be seen in early demographic studies, dating back to 1855, when Snow (1855) used cartography to analyze the causes of cholera deaths in London, England. Spatial Demography is characterized by the spatial nature of its population data and the regional perspective of its research. By this criterion, most demographic research would be classified as Spatial Demography, because the census data used in such research are aggregated to a particular geographical level or unit for the analysis of demographic change (Voss 2007). Although Voss (2007) admits that this classification may be unfounded, it at least shows an early understanding of Spatial Demography in the demographic community: in traditional demographic studies, all demographic analyses with a spatial character can be classified as Spatial Demography. Indeed, before the middle of the last century, this space-based analytical model was applied in many demographic studies, and data aggregated by geographic unit were also widely used in disciplines related to demography (Theodorson 1961). From the middle of the last century, with the emergence of the “ecological fallacy” and the mass popularization of statistical surveys in micro-demography, Western demographers turned their attention to micro-demographic research, emphasizing social demography focused on families and individuals. At the same time, some demographers continued to pay attention to spatial demographic issues concerning urban demography, population migration, and population prediction. In urban demography, some scholars focused on urban function, urban hierarchy, urban structure, and the spatial distribution of ethnic groups within cities, and extended the measurement methods of early studies on residential differentiation. The primary concerns in population migration were the measurement of interregional migration and its influencing mechanisms (Shryock and Eldridge 1947). In population prediction, the main task was to estimate and predict the population of a particular region, an important component of applied demography. At the beginning of the 1980s, with the rapid advancement of GIS spatial analysis technology, Spatial Demography began to receive great attention from the academic community. Population geographers such as Rees, Congdon, and Batey were the first to apply spatial analysis technology to demographic research. Rees and Wilson (1977) emphasized that the main direction of population geography is to analyze population issues with methods of spatial analysis and demographic statistics. As a result, the literature on Spatial Demography has grown steadily.
Congdon and Batey (1989) published a volume on Spatial Demography that grouped demographic papers into four areas: spatial planning based on population information, residence and redistribution, population migration, and population forecasting. During this period, major achievements in related disciplines such as geography, regional science, and spatial econometrics were valued and absorbed by the social sciences, represented by demography. Compared with traditional spatial demographic research, Spatial Demography in this period showed more explicitly spatial characteristics in its models, data, and analysis. Research topics gradually diversified, and interdisciplinary comprehensive research in demography began to emerge. This tendency continues today (Table 10.2).

Table 10.2 The development phases, characteristics and causes of Spatial Demography
Development phase | Time | Characteristics of the research | Causes
Phase 1: Origin stage | Before the 1950s | A general, macro-level demographic interpretation based on demographic data at the geographic level | Space is an important element of demographic research; demographic data are integrated at a certain geographical level
Phase 2: Slow development stage | The 1950s–1990s | Despite the shift to micro-demography, some scholars still paid consistent attention to spatial demographic issues concerning urban demography, population migration, and population prediction | The popularization of micro-demographic data; the ecological fallacy
Phase 3: Leaping development stage | The 1990s to today | Applying spatial analytical techniques to demographic research, stressing the spatial interpretation of demographic issues | The rapid development of GIS spatial techniques

10.4 The Trans-Century Research Focuses in Spatial Demography

Spatial Demography did not enter a period of rapid development until the turn of the century. Many papers published in top international demographic journals have pointed out promising directions in which Spatial Demography can proceed, in both theory and applications (Voss 2007; Matthews and Parker 2013). The journal Spatial Demography was first published in 2013, marking the maturity and systematization of the field. Focusing on advanced achievements in Spatial Demography, this section reviews the related literature since 2000, using a classification system consisting of “differentiation and isolation,” “birth and death,” “migration and urbanization,” “population and the environment,” “regional population forecasting,” and “methodology research.” This classification attempts to integrate traditional demographic topics (birth, death, and migration) with new topics (differentiation and isolation, urbanization, population and environment, and population forecasting) that have emerged with the development of cross-disciplinary research and technology.

10.4.1 Differentiation and Isolation

Spatial differentiation and isolation by demographic characteristics such as ethnicity, social stratum, income, and educational level have always been important research topics in demography. Such differentiation often leads to imbalanced regional development of demographic and economic factors. Demographic and economic variables typically exhibit spatial heterogeneity and spatial dependence (spatial autocorrelation). For instance, the poverty rate of a region is influenced not only by economic and environmental variables in that region but also by relevant variables in adjacent regions. However, the spatial spillover effects of population variables are difficult to estimate with traditional regression models, which calls for more advanced tools. With the development of spatial econometrics, spatial regression models that introduce a spatial weight matrix have become useful for dealing with spatial heterogeneity and spatial dependence. When it comes to the mechanisms behind population variables, spatial econometric models with a spatial weight matrix can quantify spatial interaction effects and thus provide a more accurate evaluation. Previous demographic studies on spatial differentiation and isolation mainly focus on the following three aspects. ➀ Residential differentiation: this strand stresses the spatial differentiation of residents in a region by ethnicity, educational level, and age structure, and the relationship between the spatial differentiation of residential environments and the demographic characteristics of residential zones. From a spatial demographic perspective, Harris et al. (2007) employed multilevel modeling to study whether students in Birmingham, UK, prefer to choose state-funded secondary schools closer to them, and found that the ethnic composition of the residential area has a clear influence on students’ school choice. From the perspective of age differentiation in communities, Winkler (2013) used US county census data from 1990, 2000, and 2010 to examine the residential segregation of older people (aged 60 and above) from young people (aged 20–34), and found that age differentiation among residential communities is widespread in the USA, especially among Hispanics and non-Hispanic whites. Duncan et al.
(2012), from the perspective of spatial differentiation in community walkability, measured the spatial variation of walkability across Boston communities and its relation to spatial demographic variables (such as the proportion of ethnic minorities and the proportion of poor households), using Moran’s I, OLS regression, and spatial autoregression. They found that although residential segregation exists in Boston, spatial demographic variables have no significant effect on residential walkability. ➁ Regional income disparity: this strand emphasizes regional differences in economic development and poverty levels, and the causes of such differences, with the child poverty rate as the main focus. Beginning in 2006, Voss et al. computed the spatial autocorrelation and spatial spillover effects of child poverty rates in US counties using exploratory spatial data analysis and spatial regression analysis (Voss et al. 2006). More recently, this research team applied spatial econometrics to child poverty, taking into account additional independent variables such as ethnicity and regional economic composition, and revealed that regional ethnic agglomeration and industrial restructuring both have a marked impact on regional poverty levels (Curtis et al. 2012). Laurini (2016) employed DMSP-OLS nighttime light data to estimate spatial and temporal disparities in resident income levels in Brazil, demonstrating the feasibility of this method in the absence of census information. ➂ Spatial variation of other social problems: this strand focuses on the spatial variation of social problems such as the crime rate and the unemployment rate, and on their causes. Arnio and Baumer (2012) studied the spatial variation of crime rates in Chicago based on OLS and GWR and found that variables such as the proportion of Black residents in the community, the concentration of immigrants, and foreclosures significantly influence the spatial patterns of crime rates. Using the Theil index, Thiede and Monnat (2016) evaluated spatial differences in unemployment rates at the county and state levels during the American financial crisis (2007–2009). After identifying clusters of areas with similar trajectories of change using spatial statistics, they used a spatial regression model to assess the factors influencing the unemployment rate. The results show that imbalances in labor markets at the county level were exacerbated, and that some counties whose unemployment rates surged during the financial crisis were affected by similar factors, such as lower educational investment.
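The Theil index is useful in studies like Thiede and Monnat (2016) because it splits total inequality into a between-group component (for example, differences between states) and a within-group component (differences among counties inside each state). The chapter does not say which variant they used, so the sketch below assumes the basic unweighted Theil T index, T = (1/N) Σ (x_i/μ) ln(x_i/μ), and its standard additive decomposition; population weighting, which is common for rates, is omitted for brevity. The function names and data are hypothetical.

```python
import math
from collections import defaultdict


def theil_t(values):
    """Unweighted Theil T index; 0 means perfect equality. Values must be > 0."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values) / n


def theil_decomposition(values, groups):
    """Split Theil T into (within-group, between-group) components.

    Each group g is weighted by s_g = (N_g / N) * (mu_g / mu):
      within  = sum over g of s_g * T_g
      between = sum over g of s_g * ln(mu_g / mu)
    """
    n = len(values)
    mu = sum(values) / n
    by_group = defaultdict(list)
    for x, g in zip(values, groups):
        by_group[g].append(x)
    within = between = 0.0
    for members in by_group.values():
        mu_g = sum(members) / len(members)
        share = (len(members) / n) * (mu_g / mu)
        within += share * theil_t(members)
        between += share * math.log(mu_g / mu)
    return within, between


# Hypothetical county unemployment rates (%) grouped by state.
rates = [4.0, 5.0, 9.0, 11.0]
states = ["A", "A", "B", "B"]
within, between = theil_decomposition(rates, states)
print(round(within, 4), round(between, 4), round(theil_t(rates), 4))
```

Because the two components sum to the total index, between / theil_t(rates) gives the share of county-level inequality that lies between states.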

10.4.2 Birth and Death

Birth and death are traditional topics in demographic studies and hot issues in Spatial Demography, and they have been discussed in depth using spatial analytical techniques. Most research on fertility is conducted against the background of fertility decline across the world. Based on county-level economic and demographic data in the USA, Porter (2017) studied the relationship between regional economic development and fertility. Meanwhile, emerging topics such as childbearing within cohabitation have attracted the attention of spatial demographers. Vitali et al. (2015) studied the spatial distribution of childbearing within cohabitation in Norway from 1988 to 2011. Their findings show spatial heterogeneity and autocorrelation, and the results of their spatial panel model indicate that the unemployment rate and female educational level are the main determinants of childbearing within cohabitation. In addition, antenatal care has become a hot issue in recent years. Gayawan (2014) investigated the explanatory factors and spatial effects of antenatal care services in Nigeria with a Poisson regression model, finding that antenatal care shows spatial heterogeneity and that factors such as childbearing age, spouse’s age, and length of marriage strongly influence it. Research on mortality mainly focuses on developing countries, where the prediction and estimation of child mortality is an important topic. Balk et al. (2004) paid early attention to child mortality in developing countries, studied determinants of child mortality in 10 West African countries, and found that child mortality is closely correlated with the geographic environment. Later, Storeyard et al. (2008) explored the spatial distribution of global child mortality and spatialized child mortality data using grid statistics. Yang et al.
(2015) employed Spatial Durbin Model to reveal the spatial distribution of mortality at county level in the USA with noticeable spatial spillover effects. Jankowska et al. (2013) constructed an estimation model based on child ( 0, it shows the adjacent region is positively correlated, which means the high value is adjacent to high value and the low value is adjacent to low value. If I < 0, it shows the adjacent region is negatively correlated, which means the high value is adjacent to low value adjacent or low value is adjacent to high value. If I → 0, it shows the sample is randomly distributed or there is no spatial autocorrelation. Local autocorrelation mainly explores the distribution characteristics of local subsystems of spatial data, and is used to study spatial local agglomeration patterns and spillover effects. It uses the local Moran index I (formula 13.2), also known as LISA (local indicator of spatial association), to test whether similar or different observations are concentrated in local regions.


J. Zhang et al.

Table 13.2 Global Moran's I of dependent and independent variables

Variables | 2000 | 2005 | 2010 | 2015
CO2 | 0.236 (0.02)** | 0.295 (0.005)** | 0.272 (0.009)** | 0.246 (0.015)**
Ind_str | −0.016 (0.365) | −0.0093 (0.335) | −0.016 (0.365) | −0.0317 (0.418)
En_str | 0.182 (0.03)** | 0.254 (0.014)** | 0.164 (0.047)** | 0.187 (0.036)**
En_TFP | 0.115 (0.09)* | 0.175 (0.052)* | 0.216 (0.033)** | 0.172 (0.061)*

Notes: Results based on 999 random permutations; values in parentheses are p-values; * p < 0.1, ** p < 0.05, *** p < 0.01

$$I_i = \frac{x_i - \bar{x}}{S^2} \sum_{j \neq i} w_{ij}\,(x_j - \bar{x}) \qquad (13.2)$$

If the LISA value is greater than 0, a high-value region is surrounded by high-value regions (H-H) or a low-value region by low-value regions (L-L). If the LISA value is less than 0, a high (low) value region is surrounded by low (high) value regions (H-L or L-H). Here, $S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is the variance of the sample data; $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean; i and j denote different regions; n is the total number of regions in the study; and $w_{ij}$ is an element of the spatial weight matrix. This paper builds the spatial weight matrix using the Rook adjacency rule.1 Table 13.2 shows the results of the global autocorrelation test for the dependent and independent variables. Global Moran's I for carbon dioxide is significantly positive in every sample year, indicating significant spatial autocorrelation in China's carbon emissions. Overall, carbon emission hotspots tend to cluster with other hotspots, which supports the analytical assumptions of Fig. 13.1. Energy efficiency and energy structure also exhibit similar spatial clustering. However, the global spatial autocorrelation of industrial structure is not significant. This suggests that the spatial clustering of industrial structure is highly dynamic and does not form a stable, significant pattern in any single cross-section. This not only agrees with the evolution of the industrial structure in Fig. 13.2 but also reveals the complexity of the industrial structure variable.

1 Although there are other ways of constructing a weight matrix, such as economic distance or political connection, this paper adopts the classical Rook adjacency to allow comparison with previous studies.
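The global test summarized in Table 13.2 can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' code: it uses the standard statistic I = (n / S0) Σi Σj wij zi zj / Σi zi², where zi = xi − x̄ and S0 is the sum of all weights, and replaces China's provinces with a hypothetical regular grid so that Rook adjacency simply means sharing an edge. The pseudo p-value is one-sided (testing for positive autocorrelation) and uses 999 random permutations, mirroring the table note; the helper names are hypothetical.

```python
# Sketch (not the authors' code): global Moran's I with rook weights and a permutation test.
import random


def rook_weights(nrows, ncols):
    """Row-standardized rook weights on a hypothetical regular grid.

    Cells that share an edge are neighbors; returns {i: {j: w_ij}}.
    """
    w = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs = [(r + dr) * ncols + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            w[r * ncols + c] = {j: 1.0 / len(nbrs) for j in nbrs}
    return w


def morans_i(x, w):
    """I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z_i = x_i - mean."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row.values()) for row in w.values())
    cross = sum(w_ij * z[i] * z[j] for i, row in w.items() for j, w_ij in row.items())
    return (n / s0) * cross / sum(v * v for v in z)


def permutation_test(x, w, permutations=999, seed=0):
    """One-sided pseudo p-value: (hits + 1) / (permutations + 1)."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    shuffled = list(x)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # randomly reassign the values to regions
        if morans_i(shuffled, w) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical 4 x 4 grid: high values in the two left columns, low values on the right
values = [10, 10, 1, 1] * 4
moran, p_value = permutation_test(values, rook_weights(4, 4))
print(round(moran, 4), p_value < 0.05)  # 0.7083 True
```

A checkerboard of high and low values on the same grid gives I = −1, the negative-autocorrelation case described earlier.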

13 Exploring the Dynamics of Carbon Emission in China …


The global Moran's I can only test whether there is clustering in space; it does not determine where the clusters are. In other words, the global Moran's I answers "whether", and the local Moran's I answers "where". If global autocorrelation is significant, local autocorrelation should be tested further. LISA cluster maps2 are generally used to show the unevenness of the spatial distribution and the pattern of local clustering. According to the LISA analysis, the local clustering of energy efficiency and energy structure changed little: in both 2000 and 2015, both variables show high-value clustering in the eastern region and low-value clustering in the western region. The local autocorrelation of carbon emissions is also relatively stable: from 2000 to 2015, it is characterized by high-value clustering in northern China and by both high- and low-value clustering in Sichuan. The local autocorrelation of industrial structure remains complex. These results show that spatial differences are significant in this analysis and cannot be ignored. In particular, because the observations are spatially dependent, the sample violates the independence assumption of classical regression, so a traditional regression would not yield reliable estimates.
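A LISA cluster map labels each region by the sign of its own deviation from the mean and the sign of its spatial lag. The sketch below computes Eq. (13.2) directly, with S² using divisor n as defined above, and assigns the H-H, L-L, H-L and L-H labels. It is an illustration under stated assumptions, not the authors' code: the five-region chain and its values are hypothetical, and it omits the conditional permutation test that cluster maps normally use to display only significant regions.

```python
# Sketch (not the authors' code): local Moran's I (Eq. 13.2) and LISA quadrant labels.


def local_morans_i(x, w):
    """I_i = (x_i - mean) / S^2 * sum_{j != i} w_ij (x_j - mean), S^2 = sum (x - mean)^2 / n."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s2 = sum(v * v for v in z) / n
    return [z[i] / s2 * sum(w_ij * z[j] for j, w_ij in w[i].items()) for i in range(n)]


def lisa_quadrants(x, w):
    """Label each region by its own deviation and its spatial lag: H-H, L-L, H-L or L-H."""
    n = len(x)
    mean = sum(x) / n
    labels = []
    for i in range(n):
        lag = sum(w_ij * (x[j] - mean) for j, w_ij in w[i].items())
        own = "H" if x[i] >= mean else "L"
        nbr = "H" if lag >= 0 else "L"
        labels.append(own + "-" + nbr)
    return labels


# Hypothetical chain of five regions (0-1-2-3-4) with row-standardized rook weights
w_chain = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5},
           3: {2: 0.5, 4: 0.5}, 4: {3: 1.0}}
x_demo = [9, 8, 7, 1, 2]
print(lisa_quadrants(x_demo, w_chain))  # ['H-H', 'H-H', 'H-L', 'L-L', 'L-L']
```

Region 2 is the only one with a negative local I: its value is above the mean while its neighbors' average deviation is negative, which is exactly the H-L case described in the text.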

13.5 Spatial Econometrics Analysis

The panel data regression model combines time-series and cross-sectional information, so it contains more variability and usually suffers less from collinearity between variables, allowing the relationships between variables to be estimated more reliably. The traditional panel data model assumes that the observations are obtained by random sampling. However, as shown in the exploratory spatial analysis section, the units of analysis exhibit significant spatial dependence and spatial heterogeneity. The traditional panel regression model therefore cannot be used, and a spatial panel regression model is needed. In recent years, spatial econometric methods for the specification and estimation of panel data have matured, providing a solid foundation for the analysis in this paper. Spatial econometric models consider three kinds of interaction effects: endogenous interaction effects among the dependent variable, exogenous interaction effects among the explanatory variables, and interaction effects among the error terms. The commonly used spatial panel models are the Spatial Lag Model (also known as the spatial autoregressive model, SAR), the Spatial Error Model (SEM) and the Spatial Durbin Model (SDM). The spatial lag model (SAR) contains endogenous interaction effects: the dependent variable of a given unit depends on the dependent variables of other units. When spillovers caused by technology diffusion, resource flows and similar processes link the spatial units, the spatial lag model is specified as follows:

2 Due to space constraints, the LISA cluster maps are not shown; please contact the authors for these results if needed.


$$Y_{it} = \alpha + \rho \sum_{j=1}^{30} W_{ij} Y_{jt} + X_{it}\beta_{it} + \gamma_i + \delta_t + \mu_{it}$$

where ρ is the spatial lag regression coefficient; $Y_{it}$ is the dependent (explained) variable; $X_{it}$ is the $1 \times m$ vector of explanatory variables; $\beta_{it}$ is the corresponding $m \times 1$ coefficient vector; $W_{ij}$ is an element of the normalized spatial weight matrix; $\mu_{it}$ is an independently and identically distributed error term; and $\gamma_i$ and $\delta_t$ denote regional and temporal effects, respectively. If these two effects are ignored, the estimates will be inaccurate. The spatial error model (SEM) contains interaction effects among the error terms. Such effects arise when omitted variables are spatially correlated or when unobservable shocks follow a spatial interaction pattern. The SEM assumes that interregional effects are generated by unknown variables, so that a disturbance in one region affects the disturbances in other regions. The model is as follows:

$$Y_{it} = \alpha + X_{it}\beta_{it} + \gamma_i + \delta_t + \varepsilon_{it}, \qquad \varepsilon_{it} = \rho \sum_{j=1}^{30} W_{ij}\,\varepsilon_{jt} + \varphi_{it}$$

where ρ is the spatial autoregressive coefficient; $\gamma_i$ and $\delta_t$ represent the regional and temporal effects, respectively; $\varepsilon_{it}$ is the spatially autocorrelated error term; and $\varphi_{it}$ is the random component of the error, with $\varphi_{it} \sim N(0, \sigma^2 I)$. The spatial Durbin model includes both endogenous and exogenous interaction effects. Exogenous interaction means that the dependent variable of a given unit depends on the explanatory variables of other units. The model is as follows:

$$Y_{it} = \alpha + \eta \sum_{j=1}^{30} W_{ij} Y_{jt} + X_{it}\beta_{it} + \sum_{j=1}^{30} W_{ij} X_{jt}\,\theta + \gamma_i + \delta_t + \mu_{it}$$

where η is the spatial autoregressive coefficient and θ is the coefficient vector of the spatially lagged explanatory variables. Model selection has been discussed by Anselin (2005) and LeSage and Pace (2010); it is mainly based on Lagrange multiplier (LM) tests. The dependent variable, independent variables and control variables were introduced earlier. Table 13.3 shows the results of the spatial econometric analysis. The core independent variables are positive and significant in all three models, indicating that the regression results are consistent: industrial structure, energy efficiency and energy structure have a significant positive impact on carbon emissions during the sample period. Energy prices have a weakly significant positive association with carbon emissions, and the institutional factor has a significant negative association with carbon emissions. The government intervention coefficient is also negative but is not statistically significant in any of the three models.
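The relationship between the three specifications can be made concrete with a small numerical sketch. For a single period, the SDM reads Y = ηWY + Xβ + WXθ + φ. Setting θ = 0 gives the SAR, whose reduced form Y = (I − ηW)⁻¹(Xβ + φ) shows how a shock in one region spills over to its neighbors. Setting θ = −ηβ instead gives the SEM with autoregressive coefficient η, because then (I − ηW)Y = (I − ηW)Xβ + φ (the common-factor restriction). The code below is a hypothetical pure-Python illustration with one regressor and no intercept or fixed effects, not the estimation procedure behind Table 13.3; it solves the reduced form by fixed-point iteration, which converges for |η| < 1 when W is row-standardized.

```python
# Sketch (not the authors' estimation code): one period, one regressor, no intercept
# or fixed effects, hypothetical three-region example.


def spatial_lag(W, v):
    """(W v)_i = sum_j W_ij * v_j."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def row_standardize(A):
    """Divide each row of a binary adjacency matrix by its row sum (no isolated regions)."""
    return [[a / sum(row) for a in row] for row in A]


def solve_autoregressive(W, rho, b, tol=1e-12, max_iter=10_000):
    """Solve y = rho * W y + b, i.e. y = (I - rho W)^(-1) b, by fixed-point iteration.

    Converges when |rho| < 1 and W is row-standardized.
    """
    y = list(b)
    for _ in range(max_iter):
        new = [rho * lagged + b_i for lagged, b_i in zip(spatial_lag(W, y), b)]
        if max(abs(a - c) for a, c in zip(new, y)) < tol:
            return new
        y = new
    raise RuntimeError("no convergence; check that |rho| < 1")


def sdm_outcome(W, eta, x, beta, theta, phi):
    """SDM: y = eta * W y + x * beta + (W x) * theta + phi (theta = 0 gives the SAR)."""
    wx = spatial_lag(W, x)
    b = [x_i * beta + wx_i * theta + p for x_i, wx_i, p in zip(x, wx, phi)]
    return solve_autoregressive(W, eta, b)


def sem_outcome(W, lam, x, beta, phi):
    """SEM: y = x * beta + eps, with eps = lam * W eps + phi."""
    eps = solve_autoregressive(W, lam, phi)
    return [x_i * beta + e for x_i, e in zip(x, eps)]


# Three regions in a row (0-1-2); only region 0 receives a direct input
W = row_standardize([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
y_sar = sdm_outcome(W, eta=0.5, x=[1.0, 0.0, 0.0], beta=1.0, theta=0.0, phi=[0.0, 0.0, 0.0])
print([round(v, 4) for v in y_sar])  # [1.1667, 0.3333, 0.1667]

# Common-factor restriction: theta = -eta * beta turns the SDM into the SEM
x, beta, phi = [1.0, 2.0, 4.0], 0.8, [0.1, -0.2, 0.05]
sem = sem_outcome(W, 0.4, x, beta, phi)
sdm = sdm_outcome(W, 0.4, x, beta, -0.4 * beta, phi)
print(max(abs(a - c) for a, c in zip(sem, sdm)) < 1e-9)  # True
```

In the SAR case, region 2 ends up with a nonzero outcome even though it has no direct input; that indirect effect is the spillover the spatial lag term is meant to capture.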

Table 13.3 Results of spatial panel models

Variables | SDM | SAR | SEM
En_TFP | 0.694*** (7.04) | 0.689*** (7.58) | 0.666*** (7.50)
En_str | 0.833*** (11.92) | 0.855*** (12.14) | 0.827*** (11.73)
En_price | 0.457* (2.42) | 0.577** (3.00) | 0.582** (3.04)
Ind_str | 0.642*** (8.58) | 0.671*** (9.47) | 0.642*** (8.74)
Institution | −0.203*** (−5.23) | −0.110** (−3.13) | −0.126*** (−3.32)
Gov_int | −0.057 (−0.85) | −0.057 (−0.86) | −0.044 (−0.66)
rho | 0.058 (0.89) | 0.147** (2.84) |
lambda | | | 0.120 (1.73)
sigma2_e | 0.011*** (15.48) | 0.011*** (15.46) | 0.012*** (15.46)
N | 480 | 480 | 480
LM | 412.17 | 391.15 | 388.7

Notes: t statistics in parentheses; *p …

… you can use "a/@href".

14 Spatial Visualization and Analysis of the Development …

14.2.1.3 MongoDB

With the rapid development of computing technology, high-performance hardware and software have made it possible to process large amounts of data efficiently. At the same time, this has created demand for low-cost, high-performance storage technology with fast queries. Traditional relational databases struggle to respond quickly when serving large, highly concurrent dynamic websites. To overcome the low speed, low efficiency and high resource requirements of traditional relational databases, NoSQL databases were proposed for big data processing, breaking away from the traditional relational model. Data are stored more freely: they do not rely on a fixed table structure or on relationships between records, so documents can be read, written and queried quickly and efficiently. NoSQL databases are non-relational and are commonly divided into key-value, column-family, document and graph databases. In this platform, we adopt the document-oriented NoSQL database MongoDB as the data store; it holds the collected data and also supports the subsequent visualization. MongoDB is based on distributed file storage, has a powerful query language, and can store complex data types as documents in BSON (Binary JSON) format. It is lightweight, efficient, portable, high-performance and convenient to use.

14.2.1.4 The Method of Crawling Data in This Chapter

As shown in Fig. 14.4, we collected registration information for all companies in the Yangtze River Delta of China. The company types we classify include electronic computers, real estate, high and new technology, internet, finance, tourism, software development and communication. The crawler command used to collect the data is shown in Fig. 14.5. The configuration in settings.py is as follows:

```python
# settings.py
BOT_NAME = 'myspider'  # name of the crawler project
SPIDER_MODULES = ['myspider.spiders']
NEWSPIDER_MODULE = 'myspider.spiders'

# Wait about 0.5 s between requests; with randomization enabled, Scrapy
# waits a random 0.5x to 1.5x of DOWNLOAD_DELAY
DOWNLOAD_DELAY = 0.5
RANDOMIZE_DOWNLOAD_DELAY = True

# Crawl breadth first (early Scrapy releases used SCHEDULER_ORDER = 'BFO';
# current releases use FIFO scheduler queues with a positive DEPTH_PRIORITY)
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleFifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.FifoMemoryQueue'

# Browser information sent in the request header
USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36'
```

Next, the data item class is defined in items.py as follows:


R. Gui et al.

Fig. 14.5 Crawler command and the data display during acquisition. a Start crawler command. b Real-time display of data acquisition

```python
import scrapy


class MyspiderItem(scrapy.Item):
    Name = scrapy.Field()        # company name
    Type = scrapy.Field()        # company type
    SearchType = scrapy.Field()  # product type
    EcoScale = scrapy.Field()    # registered capital scale
    Produce = scrapy.Field()     # products
    Address = scrapy.Field()     # company address
    lat = scrapy.Field()         # latitude of the company location
    lnt = scrapy.Field()         # longitude of the company location
    Province = scrapy.Field()    # province
    FoundYear = scrapy.Field()   # registration time
```


The part of the code that obtains the URLs of the company information pages is as follows:

```python
def parse(self, response):
    # Callback for the list pages (Scrapy's default parse method is assumed)
    # XPath of the links to each company's detail page
    found_urls = response.xpath('//div[@class="itemblocks"]/h3/a/@href').getall()
    for item_url in found_urls:
        # Request each detail page and pass the response to parse_item
        yield scrapy.Request(url=response.urljoin(item_url), callback=self.parse_item)
```

Part of the code that extracts and processes the information on each company's page:

```python
def parse_item(self, response):
    # Fields from the company-information table (div class="dn_more")
    name = response.xpath('//div[@class="dn_more"]/table/tbody/tr[1]/td/text()').get()
    company_type = response.xpath('//div[@class="dn_more"]/table/tbody/tr[2]/td/text()').get()
    produce = response.xpath('//div[@class="dn_more"]/table/tbody/tr[4]/td/text()').get()
    address = response.xpath('//div[@class="dn_more"]/table/tbody/tr[5]/td/text()').get()
    # Fields from the business-detail table (div id="busDetail")
    ecoscale = response.xpath('//div[@id="busDetail"]/table/tbody/tr[5]/td/text()').get()
    year = response.xpath('//div[@id="busDetail"]/table/tbody/tr[8]/td/text()').get()
    # Encode the address as GBK
    address2 = address.encode('gbk')
    # Call the authors' latitude/longitude query function (map.getLocation),
    # limiting the search to the target province
    latlnt = map.getLocation(address2, 'XXProvince')
```

We then integrate the data, convert the addresses into precise coordinates through the Baidu Map API, and save the results in JSON format. JSON is a lightweight data-interchange format that is easy to parse and generate. After crawling, all data in the JSON files are imported into MongoDB.
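The JSON step can be sketched with Python's standard json module. The sketch assumes one JSON object per line (JSON Lines), which the chapter does not specify; the helper names, file name and record values are hypothetical, and the field names follow MyspiderItem. A list of dicts read back this way has the shape accepted by bulk-insert calls such as pymongo's Collection.insert_many, which would be the natural next step for loading into MongoDB.

```python
# Sketch: store crawled records as JSON Lines and load them back as dicts.
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line; ensure_ascii=False keeps Chinese text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Hypothetical record using MyspiderItem field names; the values are made up
    sample = [{"Name": "示例公司", "Province": "Shanghai", "lat": 31.23, "lnt": 121.47, "FoundYear": "2015"}]
    path = os.path.join(tempfile.mkdtemp(), "companies.jsonl")
    write_jsonl(sample, path)
    print(read_jsonl(path) == sample)  # True
```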

14.2.2 Data Visualization

Visual analysis is an important method of big data analysis, and data visualization is a relatively advanced set of techniques and methods. These techniques use graphics, image processing, computer vision and user interfaces to present data visually. The visualized data can then be further handled through reporting, mapping, display or statistical processing. It is often estimated that about 80% of the information humans obtain from the outside world comes through the visual system, so data visualization supports better human-computer interaction. In this section, visualization is implemented by embedding ArcGIS Engine in a C# application that is connected to MongoDB.

14.2.2.1 ArcGIS

A Geographic Information System (GIS) is a special and important spatial information system. Supported by computer hardware and software, it can collect, store, manage, compute, analyze, display and describe geographically distributed data over all or part of the earth's surface (including the atmosphere), as shown in Fig. 14.6. Its data share a common geographic reference and can be displayed after coordinate transformation. GIS has strong capabilities for comprehensive spatial analysis and dynamic prediction and can generate high-level geographic information. For geographic research and decision making, it serves as an effective interactive spatial decision support system. It can therefore be applied widely to smart-city development, for example to observe how data change over time and space. ArcGIS Engine is a set of embeddable developer components for building custom GIS applications: through simple interfaces, developers can combine arbitrary GIS functions into specialized solutions in C++, COM, .NET and Java environments.

Fig. 14.6 The Yangtze River Delta region displayed in ArcGIS

14.2.2.2 The Method of Data Visualization

We first use the engine's axMapControl to load the map file into the system and then run a preliminary query for the required data in MongoDB. The main code for loading the map file is as follows:

```csharp
// Open the file dialog box
OpenFileDialog openFileDialog = new OpenFileDialog();
openFileDialog.Title = "Open the map document";
// Only show .mxd map documents
openFileDialog.Filter = "map documents(*.mxd)|*.mxd";
if (openFileDialog.ShowDialog() != DialogResult.OK) return; // user cancelled
string filePath = openFileDialog.FileName; // map path
if (axMapControl.CheckMxFile(filePath))
{
    // Show an hourglass cursor while the map loads
    axMapControl.MousePointer = esriControlsMousePointer.esriPointerHourglass;
    // Load the first map (index 0) of the document, no password
    axMapControl.LoadMxFile(filePath, 0, Type.Missing);
    axMapControl.MousePointer = esriControlsMousePointer.esriPointerDefault;
}
else
{
    MessageBox.Show(filePath + " is not a valid map document");
}
```

Once the data are imported into the visualization platform, they can be displayed on the map control. However, because of the way the data are stored in the database, we need to extract and process them further so that they can be filtered by additional criteria such as time, registered capital and company type. We also transform the coordinates of the filtered data: the longitude and latitude of each record are projected into the map's plane coordinate system, as shown in Fig. 14.7. Finally, the data for different times are displayed at the corresponding positions on the map. The code that converts longitude and latitude into map coordinates is as follows:

Fig. 14.7 Data processing before Visualization


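```csharp
// Convert a longitude/latitude pair (x, y) into the coordinate system of the map
private IPoint GetProject(IActiveView pActiveView, double x, double y)
{
    IMap pMap = pActiveView.FocusMap;
    IPoint pt = new PointClass();
    ISpatialReferenceFactory pfactory = new SpatialReferenceEnvironmentClass();
    // Target: the spatial reference of the current map
    ISpatialReference flatref = pMap.SpatialReference;
    // Source: the Beijing 1954 geographic coordinate system
    ISpatialReference earthref = pfactory.CreateGeographicCoordinateSystem(
        (int)esriSRGeoCSType.esriSRGeoCS_Beijing1954);
    pt.PutCoords(x, y);          // x = longitude, y = latitude
    IGeometry geo = (IGeometry)pt;
    geo.SpatialReference = earthref;
    geo.Project(flatref);        // projects pt in place
    return pt;                   // the converted coordinate point
}
```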

After the map is loaded, the coordinate points need to be displayed on it, and the point symbols must be configured first. In this chapter, red points with a black outline are used for clear display, and each point is added to the map as a graphic element. Note that the map should be refreshed once, after all point elements have been added: refreshing after every addition slows the display considerably when there are many data points. Part of the code is as follows:

```csharp
IMap map = axMapControl.Map;
activeView = map as IActiveView;
graphicContainer = map as IGraphicsContainer;

// Point style: small red circle with a black outline; configure the colors
// (including transparency) before assigning them to the symbol
IColor fillColor = getRGB(255, 0, 0);
fillColor.Transparency = 150;
IColor outlineColor = getRGB(0, 0, 0);
outlineColor.Transparency = 80;
ISimpleMarkerSymbol simpleMark = new SimpleMarkerSymbol();
simpleMark.Style = esriSimpleMarkerStyle.esriSMSCircle;
simpleMark.Size = 3;
simpleMark.Color = fillColor;
simpleMark.Outline = true;
simpleMark.OutlineColor = outlineColor;
simpleMark.OutlineSize = 1;

// Load the coordinates and wrap the point in a marker element
IPoint point = new ESRI.ArcGIS.Geometry.Point();
point.PutCoords(pointX, pointY);
IMarkerElement markElement = new ESRI.ArcGIS.Carto.MarkerElement() as IMarkerElement;
markElement.Symbol = simpleMark;
IElement element = markElement as IElement;
element.Geometry = point;
// Add the point to the map's graphics container
graphicContainer.AddElement(element, 0);

// Refresh the graphics layer (once all points have been added)
activeView.PartialRefresh(esriViewDrawPhase.esriViewGraphics, null, null);
```

In addition, we developed regional statistics and ranking functions. When a rectangular area is selected with the mouse, the Statistics panel on the right of the software shows the number of companies, the company density, the total registered capital and a ranking of company counts by type. Displaying the data as points clearly shows the spatial distribution of the various companies, but it does not convey the density of an area well: where points are highly concentrated, they overlap, and information about the density of the area is lost (Fig. 14.8). A density map, also known as a heat map, shows the density of the statistics in each region, helps people understand the characteristics of the distribution and avoids the overlap problem. In this chapter, the map area is divided into many small cells. For each cell, we count the companies (or sum their registered capital) and divide by the cell's area to obtain the corresponding density. After the density of every cell and its corresponding color range have been calculated, the cells are drawn on the map. The code for calculating density is as follows:

Fig. 14.8 Spatial visualization system


```csharp
// Loop over all query results
foreach (Record record in result)
{
    // Keep records whose longitude/latitude lie inside the current display box
    // (tempPoint1 = lower-left corner, tempPoint2 = upper-right corner) and that
    // meet the selected time, capital-scale and company-type requirements
    bool inBox = !(record.lnt < tempPoint1.X || record.lat < tempPoint1.Y ||
                   record.lnt > tempPoint2.X || record.lat > tempPoint2.Y);
    if (inBox && ifMeetDrawMaprequirement(record.FoundYear, record.EcoScale,
                                          record.SearchType, cYear, cMonth))
    {
        // Convert longitude/latitude to map coordinates
        tempPoint3 = GetProject(axMapControl.ActiveView, record.lnt, record.lat);
        // Cell indices in the density matrix; clamp so points on the upper/right edge stay in range
        MatrixX = Math.Min((int)((tempPoint3.X - XMin) / DensityXUnit), DensityMatrix.GetLength(0) - 1);
        MatrixY = Math.Min((int)((tempPoint3.Y - YMin) / DensityYUnit), DensityMatrix.GetLength(1) - 1);
        // Registered capital is the number before the first space in EcoScale
        int tempindex = record.EcoScale.IndexOf(' ');
        string scaleText = tempindex >= 0 ? record.EcoScale.Substring(0, tempindex) : record.EcoScale;
        int tempscale = Convert.ToInt32(scaleText);
        DensityMatrix[MatrixX, MatrixY] += 1;
        DensityScaleMatrix[MatrixX, MatrixY] += tempscale;
    }
}
```
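For readers who want to experiment outside ArcGIS, the same binning can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: points are assumed to be already projected into meters (so a cell's area in km² is width × height / 10⁶), EcoScale strings are assumed to hold the capital amount before the first space, as the C# code expects, and the function names and sample values are hypothetical. Unlike the C# version, points outside the grid are skipped rather than clamped.

```python
# Sketch: grid-based count and capital density, assuming projected coordinates in meters.


def parse_capital(eco_scale):
    """Registered capital is the number before the first space (as in the C# code)."""
    return int(eco_scale.split(" ", 1)[0])


def density_grid(points, xmin, ymin, cell_w, cell_h, ncols, nrows):
    """points: iterable of (x, y, eco_scale); returns (count, capital) densities per km^2."""
    counts = [[0] * ncols for _ in range(nrows)]
    capital = [[0] * ncols for _ in range(nrows)]
    for x, y, eco_scale in points:
        col = int((x - xmin) // cell_w)
        row = int((y - ymin) // cell_h)
        if 0 <= col < ncols and 0 <= row < nrows:  # ignore points outside the grid
            counts[row][col] += 1
            capital[row][col] += parse_capital(eco_scale)
    area_km2 = (cell_w / 1000) * (cell_h / 1000)  # cell area in km^2
    count_density = [[c / area_km2 for c in r] for r in counts]
    capital_density = [[c / area_km2 for c in r] for r in capital]
    return count_density, capital_density


# Hypothetical points on a 2 x 2 grid of 1 km cells; the last point lies outside the grid
pts = [(500, 500, "800 thousand"), (1500, 400, "200 thousand"),
       (1800, 1900, "1000 thousand"), (5000, 5000, "50 thousand")]
cd, kd = density_grid(pts, 0, 0, 1000, 1000, ncols=2, nrows=2)
print(cd)  # [[1.0, 1.0], [0.0, 1.0]]
print(kd)  # [[800.0, 200.0], [0.0, 1000.0]]
```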

After the density of each cell of the matrix has been calculated, a color template can be set that runs from dark green for the lowest density through light yellow to dark red for the highest density. Each density range is assigned a color, and the colored squares are drawn on the screen in turn to form the density map.
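The color template can be implemented as a lookup over the class breaks listed in Table 14.2. The sketch below is hypothetical: the table does not say which class a boundary value such as 0.02 belongs to, so it assumes each class includes its lower bound, and it uses Python's bisect_right to find the class index.

```python
# Sketch: look up the Table 14.2 color for a cell density.
from bisect import bisect_right

# Upper bounds of the classes in Table 14.2 (the last class is open-ended)
COUNT_BREAKS = [0.02, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]  # count/km^2
CAPITAL_BREAKS = [2, 300, 600, 900, 1200, 1500, 1800, 2100, 2400, 2700, 3000, 3300, 3600, 3900, 4200]  # RMB thousand/km^2
COLORS = [
    (0, 150, 20), (0, 180, 10), (0, 210, 0), (0, 240, 0),
    (30, 250, 0), (60, 250, 0), (90, 255, 0), (120, 255, 0),
    (150, 250, 0), (200, 250, 0), (250, 250, 0), (250, 200, 0),
    (250, 150, 0), (250, 100, 0), (250, 50, 0), (255, 0, 0),
]


def density_color(value, breaks=COUNT_BREAKS):
    """Return the RGB color of the class containing value.

    Assumption: each class includes its lower bound, so a value equal to a
    break (e.g. 0.02) falls into the next class up.
    """
    return COLORS[bisect_right(breaks, value)]


print(density_color(0.95))                  # (250, 250, 0)
print(density_color(2500, CAPITAL_BREAKS))  # (200, 250, 0)
```

Keeping the breaks and colors in parallel lists means the same function serves both the count and the capital density maps.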

14.3 Realization

Using the platform built with ArcGIS Engine and C# and the data collected from the internet, we obtain the results shown in Fig. 14.9. In the UI design, the large central area displays the map control and the data visualization. The left side and the bottom hold the filters for the display conditions, including company type, registered capital and the time axis. The right side contains the map-control and visualization function areas, which operate on the central display and provide density display and statistical processing. Besides the time axis, the bottom bar also shows the map coordinates and the longitude and latitude of the mouse position. The dynamic display animates the distribution of data points on the map control along the time axis. We chose electronics companies and all companies in the Yangtze River Delta as examples. Electronics companies include computers, high technology, internet, software and communications, and all ranges of registered capital are included.

Fig. 14.9 UI interface of the platform

By controlling the display area of the map, we can observe the distribution of the data from a spatial perspective. As Fig. 14.10a shows, Shanghai's electronics companies are concentrated in the center of Shanghai in a tilted rectangular pattern that radiates outward like the sun; companies in this area are thriving. Fig. 14.10b shows that most companies in Zhejiang Province are distributed in coastal areas in a semicircular pattern. Companies in Hangzhou, Ningbo, Shaoxing, Jiaxing, Wenzhou, Jinhua and other cities are relatively dense, especially around Hangzhou, and spread outward from there, while the other, mainly inland, areas are comparatively sparse. Fig. 14.10c shows that most companies in Jiangsu Province are located close to the Yangtze River, extending all the way to Shanghai, in cities such as Nanjing, Yangzhou, Changzhou, Wuxi and Suzhou. Fig. 14.10d shows that companies in Anhui Province are relatively sparse at present, with a few scattered, star-like clusters; the denser places are mostly Hefei and several areas along the Yangtze River, and these clusters have not spread far.

Fig. 14.10 Distribution of electronics companies in each province in 2015: a Shanghai. b Zhejiang. c Jiangsu. d Anhui

Fig. 14.11 shows that across the whole Yangtze River Delta (both electronics companies and all types of companies), companies are mostly distributed in the coastal areas and along the Yangtze River, especially in Shanghai, forming a small triangle nested within a large triangle. The small triangle around Shanghai drives the development of the whole delta. At the same time, the whole delta keeps converging toward the small triangle, and clustered companies form company communities that strengthen regional interaction within the delta. The pattern also shows a tendency to spread further. Shanghai can therefore be regarded as the development center of the Yangtze River Delta (Cheng and LeGates 2018; Lin et al. 2015). Selecting the bar-chart function on the right side of the UI plots the number of companies in different rectangular areas, as shown in Fig. 14.12; the number on each blue bar is the count of companies in the selected rectangle.

Fig. 14.11 Company distribution in the Yangtze River Delta region in 2015. a Electronic companies. b All types of companies

Fig. 14.12 The number distribution of electronic companies in some cities in the Yangtze River Delta in 2015

The spatial distribution of companies not only reflects regional economic development but can also be combined with other fields. For example, we can observe how companies interact with transportation development. Fig. 14.13 shows how the distribution of all companies in Shanghai changed before and after the completion of Shanghai Metro Lines 1, 4 and 11. Dense clusters of companies already existed near the routes before the corresponding lines were completed, which suggests that regional economic development influenced where the metro lines were built. After each line was completed, the number of companies near the line continued to grow rapidly while the pattern remained roughly unchanged, which suggests that metro construction also had a positive effect on the distribution of companies. Transportation and economic distribution thus complement and reinforce each other.

Fig. 14.13 The interplay between company distribution and metro distribution in Shanghai (all companies in Shanghai), taking metro line 1 (built in 1993), metro line 4 (built in 2005) and metro line 11 (built in 2009) as examples. a distribution map of metro line 1 in Shanghai. b company registration distribution map of Shanghai in 1990–1993. c company registration distribution map of Shanghai in 1994–1996. d company registration distribution map of Shanghai in 1994–1996. e distribution map of metro line 4 in Shanghai. f company registration distribution map of Shanghai during 2000–2005. g company registration distribution map of Shanghai during 2006–2010. h company registration distribution map of Shanghai during 2011–2015. i distribution map of metro line 11 in Shanghai. j company registration distribution map of Shanghai during 2004–2009. k company registration distribution map of Shanghai during 2010–2014. l company registration distribution map of Shanghai during 2015–2018

As the leading city in the Yangtze River Delta, what other economic characteristics does Shanghai itself show? Fig. 14.14 gives a temporal view: comparing the distributions in different years shows that electronics companies remain concentrated in the center and became increasingly dense from 2000 to 2015. Fig. 14.15 shows that electronics companies in central Shanghai are very densely distributed and account for 49.91% of the Shanghai total in these statistics. Among the company types, communications companies are the most numerous, followed by computers and software development.


Fig. 14.14 Point display of electronics companies in Shanghai in different years: a 2000. b 2005. c 2010. d 2015

Fig. 14.15 Company counts: a 4624 in central Shanghai. b 9265 in the whole of Shanghai

The density maps are shown in Fig. 14.16. Figures 14.16a and c show count density: the more companies in a unit area, the redder the cell, and the fewer, the greener. Figures 14.16b and d show registered capital density. All four maps show that company density increases toward Shanghai and peaks in central Shanghai, the most economically developed area. The results indicate that Shanghai remains the center of the company distribution in the Yangtze River Delta and can lead the development of the whole delta and the coordinated development of its industries. The color ranges used for the density maps are listed in Table 14.2.

Fig. 14.16 Data density map display in 2015: a Quantitative density in the Yangtze River Delta region. b Registered capital density in the Yangtze River Delta region. c Quantitative density in Shanghai. d Registered capital density in Shanghai

14.4 Conclusion

With the arrival of the big data era, the requirements for data collection, storage and visual analysis keep rising, and the rise of smart cities cannot be separated from the development of big data. This chapter provides spatio-temporal visualization and analysis of enterprises in the Yangtze River Delta region and offers a data and image basis for the region's development. Several practical conclusions can be drawn from the data and visual analysis. By developing a big data acquisition and visualization platform, this chapter achieved effective data acquisition, storage and visual analysis. The main work is summarized as follows:

(1) A web crawler written in Python collects information from the internet. It uses XPath to extract text from tree nodes, applies effective countermeasures against anti-crawler mechanisms, and collects data in parallel with multiple threads. The method is efficient and flexible and obtains comprehensive data.

(2) For data management, the NoSQL database MongoDB is used to overcome the storage bottleneck of traditional databases. The collected information is organized, sorted and stored in the database, enabling efficient queries.

Table 14.2 Density map color range of quantity

RGB color         Quantity density              Capital density
                  (number of companies/km²)     (RMB thousand/km²)
(0, 150, 20)      0–0.02                        0–2
(0, 180, 10)      0.02–0.1                      2–300
(0, 210, 0)       0.1–0.2                       300–600
(0, 240, 0)       0.2–0.3                       600–900
(30, 250, 0)      0.3–0.4                       900–1200
(60, 250, 0)      0.4–0.5                       1200–1500
(90, 255, 0)      0.5–0.6                       1500–1800
(120, 255, 0)     0.6–0.7                       1800–2100
(150, 250, 0)     0.7–0.8                       2100–2400
(200, 250, 0)     0.8–0.9                       2400–2700
(250, 250, 0)     0.9–1                         2700–3000
(250, 200, 0)     1–1.1                         3000–3300
(250, 150, 0)     1.1–1.2                       3300–3600
(250, 100, 0)     1.2–1.3                       3600–3900
(250, 50, 0)      1.3–1.4                       3900–4200
(255, 0, 0)       More than 1.4                 More than 4200

Remarks: the color scale runs from aqua at the lowest densities through yellow green and yellow to red at the highest densities.

(3) A visual interface was developed in C# on top of ArcGIS. The mined data are further analyzed from temporal and spatial perspectives to obtain detailed statistics, distributions and changes of the data and to explore how development has changed. The combination of big data and smart city development will attract increasing attention; big data is still in its infancy, and demand for its visualization and analysis keeps growing. In urban development in particular, analyzing and mining spatiotemporal views of data across regions can reveal useful information that helps overcome the shortcomings of traditional urban planning. The method can support the development of the Yangtze River Delta region and can also be extended to the Pearl River Delta and other cities, providing better tools for national development and construction. It also lays a foundation for future functions and development.
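Point (1) of the summary can be illustrated with a small sketch of XPath-based extraction combined with multi-threaded collection. It is a sketch under explicit assumptions, not the chapter's crawler: `FAKE_PAGES`, `fetch` and the `company`/`name`/`city` markup are hypothetical, pages are assumed to be well-formed XHTML so that the standard library's ElementTree (which supports a subset of XPath) can parse them, and a random pause stands in for anti-crawler countermeasures. Real, messy HTML would normally need a tolerant parser such as lxml instead.

```python
import random
import time
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for HTTP responses; a real crawler would download
# each page here (e.g. with urllib.request.urlopen).
FAKE_PAGES = {
    "page1": ("<html><body><div class='company'><span class='name'>A Co.</span>"
              "<span class='city'>Shanghai</span></div></body></html>"),
    "page2": ("<html><body>"
              "<div class='company'><span class='name'>B Co.</span>"
              "<span class='city'>Hangzhou</span></div>"
              "<div class='ad'><span class='name'>Ignored</span></div>"
              "<div class='company'><span class='name'> C Co. </span>"
              "<span class='city'>Suzhou</span></div>"
              "</body></html>"),
}


def fetch(url, max_delay=0.0):
    """Return the page for `url` after a random pause (a simple politeness measure)."""
    time.sleep(random.uniform(0, max_delay))
    return FAKE_PAGES[url]


def extract_companies(html):
    """Select company nodes with an XPath-style path and pull text from their child nodes."""
    root = ET.fromstring(html)
    records = []
    for node in root.iterfind(".//div[@class='company']"):
        records.append({
            "name": node.findtext("span[@class='name']", default="").strip(),
            "city": node.findtext("span[@class='city']", default="").strip(),
        })
    return records


def crawl(urls, workers=4, max_delay=0.0):
    """Download pages on a thread pool and flatten the extracted records in URL order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(lambda url: fetch(url, max_delay), urls)
        return [rec for html in pages for rec in extract_companies(html)]


print(crawl(["page1", "page2"]))
# [{'name': 'A Co.', 'city': 'Shanghai'}, {'name': 'B Co.', 'city': 'Hangzhou'},
#  {'name': 'C Co.', 'city': 'Suzhou'}]
```

Threads suit this step because downloading is I/O-bound, so the workers spend most of their time waiting on the network rather than competing for the CPU.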


Chapter 15

High Performance Spatiotemporal Visual Analytics Technologies and Its Applications in Big Socioeconomic Data Analysis

Zhipeng Gui, Yuan Wang, Fa Li, Siyu Tian, Dehua Peng, and Zousen Cui

15.1 Introduction

Spatial computing methods are crucial for advancing social science and humanities research. With the development of Volunteered Geographic Information (VGI) and the Internet of Things (IoT), spatial social science and humanities research has shifted from a data-scarce to a data-rich environment. The boom in big spatiotemporal socioeconomic data facilitates the modelling of socioeconomic behavior in space and time, at the macro level as well as the individual level. Such models can analyze socioeconomic activities quantitatively using full samples, and they create new opportunities for revealing socioeconomic trends across spatial scales. To study such spatiotemporal phenomena comparatively, a powerful visual analytical framework for effectively identifying interesting events and discovering hidden patterns, anomalies, and relations in datasets is fundamental.

However, building such a visual analytics framework faces tough technical challenges in the era of big data. As space-time data accumulate, the rich details of spatiotemporal dynamics remain largely unexplored because of binding constraints on data management, computing and visualization. The heterogeneity and streaming nature of multisource data, the extremely large data volume, and the intensive computation impede the efficiency and computability of visual analytics. Therefore, a novel analytical framework with a flexible system architecture that integrates the latest computing technologies is highly desired. In this chapter, we introduce a multi-tier computing framework that supports web-based visual analytics of big socioeconomic data. The framework is designed as a full-stack solution whose enabling technologies cover the major steps of the data analysis workflow: storage, preprocessing, computing, transmission and visualization. The architecture of the proposed framework is illustrated in Fig. 15.1. It consists of a storage layer, a computing layer and a web visualization layer, which respectively handle big heterogeneous data management, high-performance data analysis and interactive visualization.

Z. Gui (corresponding author) · S. Tian · D. Peng
School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China

Y. Wang · F. Li · Z. Cui
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430079, China

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_15

Z. Gui et al.

Fig. 15.1 Generic computing framework for web-based high-performance big data visual analytics. [Figure: a three-layer architecture. Web visualization layer: web frameworks (Django, Angular, Vue.js, React) and visualization libraries (ECharts, D3.js, Kepler.gl, ...). Computing layer: communication components and Web API; HPC frameworks and technologies (Hadoop/Spark/Flink, MPI/OpenMP, CUDA). Storage layer: SQL databases, NoSQL databases, data cubes, spatial indexes, distributed file system.]

15 High Performance Spatiotemporal Visual Analytics Technologies …


Data storage is critical for the analysis and visualization operations that follow. Cutting-edge data storage technologies, such as NoSQL databases, spatial and full-text indexes, and data cubes, provide approaches for handling data heterogeneity and improve the I/O performance of data query and access. High performance computing (HPC) technologies in the computing layer, such as Apache Spark and CUDA, enable data transformation, processing and analysis in a streaming, real-time fashion, which makes interactive analysis possible. HPC accelerates computing by decomposing data and scheduling computing tasks in parallel or distributed computing environments. With state-of-the-art HPC frameworks, advanced analysis and computing functions, such as spatial autocorrelation and machine learning algorithms, can be easily implemented or integrated. The communication components and Web APIs between the client side (visualization layer) and the server side (computing and storage layers) handle data transmission and optimize communication. Web frameworks and visualization libraries are essential for building the web client: they provide the capabilities and flexibility needed to design web graphical user interfaces (GUIs) and offer rich visualization effects.

The rest of the chapter is organized as follows: Sect. 15.2 describes spatial index and storage mechanisms for efficient spatial data access. Section 15.3 presents HPC methods and frameworks for accelerating spatial processing. Section 15.4 introduces web-based visualization technologies that provide interactive visual spatial analytics functions. Section 15.5 demonstrates an HPC-supported visual analytics application using big enterprise registration data as an example. Section 15.6 concludes the chapter with discussions.

15.2 Spatial Index and Storage Mechanisms

The amount of socioeconomic and social media data grows explosively every day with the advances of the Internet of Things. Compared with traditional spatial data, these data have larger volume, higher complexity, and greater heterogeneity in data modes and relations. These innate characteristics of big data challenge traditional data storage methods and thus limit the capacity for efficient analysis and visualization. This section introduces the fundamental technologies and state-of-the-art databases for spatial big data storage that tackle these problems.

15.2.1 Spatial Indexing

Because spatial operations such as spatial queries are complex, spatial data need sophisticated indexing mechanisms for accurate retrieval and efficient processing. A spatial index is a data structure arranged in a certain order according to the spatial distribution of the data. By construction principle, spatial indexes can be categorized into space-driven and data-driven structures. Space-driven structures are based on partitioning the space into rectangular cells, independently of the distribution of the spatial objects, whereas data-driven structures are organized by partitioning the set of spatial objects and therefore adapt to the objects' distribution (Rigaux et al. 2002).

15.2.1.1 Space-Driven Structures

Space-driven structures partition the embedding space into cells and map the Minimum Bounding Rectangles (MBRs) of objects to the cells according to spatial relationships, e.g., overlap or intersection. Depending on how the space is divided, space-driven structures include grid indexes, quadtrees and geohashes, as shown in Fig. 15.2. A grid index divides space into an array of cells and associates each cell with the spatial objects that intersect or overlap it. A geohash is a geocoding system that recursively divides space into longitude-latitude rectangles and encodes each rectangle as a binary string. A quadtree is a tree structure in which each node represents a bounding box covering part of the space, and each internal node splits its box into four quadrants. The concepts, partitioning and encoding methods of space-driven structures are straightforward, and they have become building blocks of GIS data structures. Databases such as IBM DB2, Microsoft SQL Server and ESRI geodatabases adopt this indexing method.
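The geohash idea can be made concrete with a short sketch of the standard encoding: longitude and latitude bits are interleaved (longitude first) by repeatedly halving the current interval, and every 5 bits become one character of the geohash base-32 alphabet. The function name and default precision are illustrative choices, not part of any particular database's API.

```python
# Geohash base-32 alphabet (omits a, i, l and o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=11):
    """Encode a WGS84 point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid   # upper half -> bit 1
            else:
                bits, lon_hi = bits << 1, mid         # lower half -> bit 0
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # 5 bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744))     # u4pruydqqvj
print(geohash_encode(57.64911, 10.40744, 5))  # u4pru
```

Because each extra character only refines the cell, the short code of a point is a prefix of its longer code, which is why prefix matching on geohash strings works as a coarse spatial filter (points on opposite sides of a cell boundary are the known exception).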

Fig. 15.2 Examples of Grid, Geohash and Quad-Tree Index: (a) Grid Index; (b) Geohash Index; (c) Quad-Tree Index

Fig. 15.3 An example of R-tree Index [Figure: twelve spatial objects (m1–m12) enclosed in MBRs R1–R7, which are organized as a hierarchy of tree nodes.]

15.2.1.2 Data-Driven Structures

Data-driven structures rely on the spatial containment relationships of the objects rather than on an ordering of the space. These structures, such as the R-tree, adapt themselves to the MBRs of the spatial objects (Zhang et al. 2017). An R-tree is a hierarchical index on the MBRs of the geometries, as shown in Fig. 15.3: each leaf entry stores the MBR of an object, and each internal node stores an MBR that encloses all entries of its child node. The hierarchy is built by heuristically optimizing the area of the MBRs in each node to improve access efficiency (Theodoridis et al. 2000). Advanced data-driven index structures have been developed and are widely adopted in GIS tools and spatial databases for efficient data queries, as discussed in the following section.
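The MBR hierarchy of an R-tree can be sketched in a few dozen lines. The original R-tree inserts objects one at a time and splits full nodes; to stay short, this sketch instead bulk-loads ("packs") the tree with Sort-Tile-Recursive (STR), a well-known packing method, and then answers window queries by pruning every subtree whose MBR misses the query window. Rectangles are assumed to be `(min_x, min_y, max_x, max_y)` tuples, and the names `build`, `search` and `str_groups` are hypothetical.

```python
import math
from dataclasses import dataclass


def union(rects):
    """Smallest rectangle (min_x, min_y, max_x, max_y) enclosing all rects."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: tuple
    entries: list  # (mbr, payload): object ids in leaves, child Nodes otherwise
    leaf: bool


def str_groups(entries, capacity):
    """STR grouping: cut x-sorted entries into vertical slices, then y-sort each slice into runs."""
    n_groups = math.ceil(len(entries) / capacity)
    slice_size = math.ceil(math.sqrt(n_groups)) * capacity
    by_x = sorted(entries, key=lambda e: e[0][0] + e[0][2])  # sort by x-center
    groups = []
    for i in range(0, len(by_x), slice_size):
        by_y = sorted(by_x[i:i + slice_size], key=lambda e: e[0][1] + e[0][3])
        groups.extend(by_y[j:j + capacity] for j in range(0, len(by_y), capacity))
    return groups


def build(objects, capacity=4):
    """Bulk-load a packed R-tree bottom-up from (mbr, object_id) pairs."""
    if not objects or capacity < 2:
        raise ValueError("need at least one object and a node capacity of 2 or more")
    level = [Node(union([m for m, _ in g]), g, True) for g in str_groups(objects, capacity)]
    while len(level) > 1:  # group nodes into parents until a single root remains
        parents = str_groups([(n.mbr, n) for n in level], capacity)
        level = [Node(union([m for m, _ in g]), g, False) for g in parents]
    return level[0]


def search(node, window):
    """Return ids of objects whose MBR intersects the query window."""
    if not intersects(node.mbr, window):
        return []  # prune the whole subtree
    if node.leaf:
        return [oid for mbr, oid in node.entries if intersects(mbr, window)]
    return [oid for _, child in node.entries for oid in search(child, window)]


# Usage: twelve small squares m1..m12 along a diagonal.
objects = [((i, i, i + 0.5, i + 0.5), f"m{i}") for i in range(1, 13)]
root = build(objects)
print(sorted(search(root, (2.2, 2.2, 4.1, 4.1))))  # ['m2', 'm3', 'm4']
```

Packing fills nodes almost completely and keeps spatially close objects together, which suits read-heavy datasets that change rarely; datasets with frequent updates are better served by the dynamic insertion and splitting of the original design and its variants.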

15.2.1.3 Advanced Spatial Indexes

In recent decades, advanced spatial indexes have been developed, including R-tree variants (Balasubramanian and Sugumaran 2013) and dynamic indexes (Kamel et al. 2017). The R-tree variants shown in Fig. 15.4 improve the original R-tree by changing the indexing process, by combining it with other methods, or by enhancing it with extensions. Process changes alter how the R-tree is constructed, usually to minimize the overlap between tree nodes. Hybrid indexes combine the R-tree with other spatial indexing techniques, such as the Hilbert curve, the k-d tree and hashing, to support advanced capabilities. Extensions store additional information in the R-tree so that particular types of queries can be processed more effectively and extended applications can be supported.
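As an example of a hybrid, the Hilbert R-tree orders objects by the position of their MBR centers along a Hilbert space-filling curve, which keeps neighbors in space close together in the ordering. The sketch below uses the well-known iterative algorithm for converting grid cell `(x, y)` to its distance along the curve; the function names, the 2^order grid resolution and the assumption that `bounds` has non-zero width and height are illustrative choices.

```python
def hilbert_index(n, x, y):
    """Distance of cell (x, y) along a Hilbert curve filling an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant, in curve order
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_sort(rects, bounds, order=16):
    """Order rectangles by the Hilbert index of their centers, as a Hilbert-packed R-tree would."""
    n = 2 ** order
    min_x, min_y, max_x, max_y = bounds

    def key(r):
        cx, cy = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
        gx = min(n - 1, int((cx - min_x) / (max_x - min_x) * n))
        gy = min(n - 1, int((cy - min_y) / (max_y - min_y) * n))
        return hilbert_index(n, gx, gy)

    return sorted(rects, key=key)


print([hilbert_index(4, x, y) for x, y in [(0, 0), (1, 0), (1, 1), (0, 1)]])  # [0, 1, 2, 3]
```

Cutting the sorted list into consecutive runs of the node capacity gives the leaves of a Hilbert-packed R-tree, with the same bottom-up parent construction as any packed R-tree.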

Fig. 15.4 The family of R-Tree variants [Figure: variants grouped into process changes, hybrids and extensions, including the R+ Tree, R* Tree, Packed R-Tree, Buffer R-Tree, Priority R-Tree, X Tree, Hilbert R-Tree, R k-d Tree, R*Q Tree, Q+R Tree, Multi Small Index, DR Tree, RT Tree, HR Tree, 3D R-Tree, Historical R-Tree, Partially Persistent R-Tree, Vo R-Tree, Parametric R-Tree and FNR R-Tree.]

15.2.2 Spatial Databases

A spatial database provides an “all-in-one” solution for storing and accessing spatial data. In addition to the well-known relational databases, NoSQL databases have become increasingly popular.

15.2.2.1 Relational Databases

Relational databases, or SQL databases, use the relational model to store spatial data. In SQL spatial databases, geometric features are represented in records by multiple key values, including coordinates and associated attributes. Mainstream SQL spatial databases fall into two groups: traditional SQL databases with spatial extensions, and SQL databases with built-in (stock) support for spatial data types. The first group includes IBM DB2 with Spatial Extender, Oracle Database with Oracle Spatial and Graph, PostgreSQL with the PostGIS extension, and SQLite with SpatiaLite. The second group includes Microsoft SQL Server, MySQL, Teradata Geospatial, and Boeing’s Spatial Query Server. Open-source or free-license options such as PostgreSQL/PostGIS and MySQL are widely applied.

15.2.2.2 NoSQL Databases

NoSQL databases emerged from the unsatisfactory performance of relational databases in scenarios that query huge volumes of data and require instant queries and fast updates rather than strict consistency. Originally, NoSQL meant “non-SQL” or “non-relational”, but the term has since been extended to “Not Only SQL”, denoting a new generation of database designs that emphasize scalability and availability for emerging applications. To improve performance, NoSQL databases relax consistency to eventual consistency: a query might not return the latest update immediately, or might read data that are temporarily inconsistent with their real state, but all replicas converge once updates stop. To handle different application scenarios and heterogeneous data, NoSQL databases with different data models have been developed, including key-value databases like Aerospike, MemcacheDB and Dynamo; column-family databases like HBase and Cassandra, which is used by Facebook to store social media data; graph databases like Neo4j; and document databases like CouchDB. Another important issue is how to store huge and growing datasets on commodity devices while guaranteeing high availability. The next section introduces distributed databases as a solution to this issue.
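Eventual consistency can be illustrated with a toy two-replica key-value store. This is a simplified model, not how any named database is implemented: each write carries a logical timestamp, replicas exchange their entries in an "anti-entropy" step, and conflicts are resolved by last-writer-wins, which is one common convergence rule. `Replica` and `anti_entropy` are hypothetical names.

```python
class Replica:
    """One copy of a key-value store; every value carries a logical timestamp."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def put(self, key, value, ts):
        current = self.data.get(key)
        if current is None or ts > current[0]:  # keep only the newest write
            self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange all entries so that both replicas converge (last-writer-wins)."""
    for key, (ts, value) in list(a.data.items()):
        b.put(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.put(key, value, ts)


r1, r2 = Replica("r1"), Replica("r2")
r1.put("user:42", "Wuhan", ts=1)
print(r2.get("user:42"))  # None: the update has not propagated yet (a stale read)
anti_entropy(r1, r2)
print(r2.get("user:42"))  # Wuhan: the replicas have converged
```

Reads between a write and the next synchronization can be stale, which is exactly the behavior the paragraph above describes; in exchange, every replica can accept writes without coordinating first. Equal timestamps would need a deterministic tie-breaker (such as the replica name), which this sketch omits.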

15.2.2.3 Distributed Databases

Distributed database technologies were developed to handle growing data volumes by distributing database systems across physically dispersed hardware. A distributed database can be regarded as a collection of separate database systems that communicate with each other (Fig. 15.5). The major advantages of a distributed database system are flexibility and scalability; the trade-off is the extra communication and computation cost of data synchronization and validation, since all storage nodes need to keep their local data up to date. Currently, both mainstream SQL and NoSQL databases, such as Oracle, DB2, MongoDB and Cassandra, highlight their support for distributed storage. Research on distributed spatial indexes is also emerging, e.g., using distributed indexes to manage geospatial data in the IoT (Fox et al. 2013; Fathy et al. 2017; Zhang et al. 2016). Although distributed databases are a good solution for big data storage, in some scenarios, such as when many operations must be performed directly on the raw data, a file system is a simpler way to store datasets in their original raw structure.

Fig. 15.5 An exemplary physical architecture of Distributed Database System [Figure: a database and memory at each of Locations 1, 2 and 3, connected through a communication channel.]

15.2.3 Distributed File System

Distributed file systems¹ (DFSs) are also widely used for big data management, especially in I/O-intensive and high-performance computing applications. Like a distributed database, a DFS hides block-level access details from end users and is thus “transparent” to them. The major difference between the two is that a distributed database uses different APIs to extract datasets or views with different semantics, whereas a DFS allows files to be accessed with the same interfaces and semantics as local files. The Google File System (GFS) and the Parallel Virtual File System (PVFS) are two representative distributed file systems. The Hadoop Distributed File System (HDFS), maintained by Apache, is the most popular open-source file system modeled on GFS. HDFS is widely used in both academia and industry to store many types of data, such as imagery, user profiles, web pages and web logs. Other DFSs include Lustre, the Andrew File System (OpenAFS) and Microsoft DFS. Using DFSs to manage big and streaming geospatial data has become increasingly popular, and several relevant studies are worth attention (Zhang et al. 2016; Fahmy et al. 2017; Hu et al. 2018). DFSs such as HDFS are well supported by high-performance computing frameworks as the basic data storage mode for distributed computing. The next section introduces the latest high-performance computing technologies and frameworks.
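To make the storage model of HDFS more concrete: it splits each file into large fixed-size blocks (128 MB by default in current Hadoop releases) and stores every block on several DataNodes (three replicas by default). The sketch below shows only that bookkeeping. The round-robin placement is a deliberate simplification of HDFS's rack-aware placement policy, and the function and node names are hypothetical.

```python
import math

MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return (offset, length) pairs covering a file of `file_size` bytes."""
    n_blocks = math.ceil(file_size / block_size)
    return [(i * block_size, min(block_size, file_size - i * block_size))
            for i in range(n_blocks)]


def place_replicas(n_blocks, datanodes, replication=3):
    """Assign every block to `replication` distinct DataNodes, round-robin (simplified)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
            for b in range(n_blocks)}


blocks = split_into_blocks(300 * MB)
print([length // MB for _, length in blocks])  # [128, 128, 44]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

Large blocks keep the metadata small and favor long sequential reads, and spreading replicas over different nodes is what lets a computing framework schedule a task on whichever node already holds a copy of the block it needs.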

1 Distributed file systems. In Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Distributed_file_systems&oldid=869574529. Accessed 21 Feb 2019.


15.3 High Performance Computing Technologies

Over the last few decades, massive volumes of disparate, dynamic and distributed spatiotemporal data, generated by ubiquitous Earth observation systems and the IoT, have become significant research material in the spatial humanities and social sciences. However, the limitations of traditional computing methods impede the exploration of the spatiotemporal patterns and dynamics hidden in these massive datasets. In this context, high performance computing technologies have made great progress in tackling these challenges. This section introduces the basic computing paradigms of high-performance computing, the mainstream frameworks, and their applications in the spatial humanities and social sciences.

15.3.1 Computing Paradigms

Big data processing can be both computing-intensive and data-intensive. Big data computation usually requires massive computing resources, such as CPU and memory, and long processing times because of the complexity of the analytical algorithms. A computation may also involve intensive I/O and large network communication overheads because of the massive data volume. To address these demands, task decomposition and scheduling mechanisms divide the processing operations into pieces and allocate them to different processing units. Likewise, large datasets are subdivided into many segments, which are cached and processed on nearby processing units to avoid frequent data transmission and reduce I/O workloads. According to their computing environments and modes, high-performance computing technologies can generally be categorized into parallel computing and distributed computing.
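The decompose-then-combine pattern described above can be sketched with Python's standard thread pool: the data are cut into near-equal segments, each worker computes a partial result on its own segment, and the partial results are merged. The names `split`, `partial_stats` and `parallel_mean` are hypothetical. Note that in CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock; the sketch shows the structure of decomposition, and `ProcessPoolExecutor` (or a native library) would be swapped in for real CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Cut data into n_parts contiguous segments of near-equal size."""
    size, rest = divmod(len(data), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < rest else 0)  # first `rest` parts get one extra item
        parts.append(data[start:end])
        start = end
    return parts


def partial_stats(segment):
    """Work done independently by one processing unit on its own segment."""
    return len(segment), sum(segment)


def parallel_mean(data, workers=4):
    segments = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_stats, segments))
    # Combine step: merge the partial counts and sums into one result.
    count = sum(c for c, _ in results)
    total = sum(s for _, s in results)
    return total / count


print(split(list(range(10)), 3))           # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(parallel_mean(list(range(1, 101))))  # 50.5
```

Returning (count, sum) pairs instead of per-segment means is the key design choice: partial results of that form can be merged exactly, however unevenly the data were split.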

15.3.1.1 Parallel Computing

Parallel computing handles intensive computing tasks by dividing them into a set of sub-tasks that can be solved concurrently. The procedures vary with the hardware, and parallel computing is accordingly divided into CPU-based parallel computing and general-purpose computing on GPUs (GPGPU). CPU-based parallel computing is the classical form, using multi-core CPUs or multiple CPUs. Its major goal is to maximize the speedup ratio with high parallel efficiency, which depends on the relative proportions of parallelizable and serial code and on the communication overhead between cores or CPUs. Data and control-flow synchronization on distributed memory must be handled properly; otherwise, the modifications made for parallel processing may introduce inconsistencies and fatal errors. In general, CPU-based parallel computing has low hardware requirements, but its acceleration is strongly limited by the computing capacity of a single machine.

Fig. 15.6 The GPGPU Pipeline in four steps

GPGPU, which results from advances in GPU hardware, refers to using GPUs for general-purpose processing rather than graphics processing. A GPU usually contains hundreds or even thousands of tightly coupled cores and can achieve significant performance gains. A GPGPU pipeline generally runs between CPUs and GPUs: the CPUs move the data to the GPUs, and the GPUs process the data simultaneously on a large number of cores. As shown in Fig. 15.6, the pipeline can generally be completed in four steps. First, the CPU initializes the environment and allocates memory for the input datasets. Second, the input datasets are transferred to the GPU. Third, the GPU cores execute the iterations simultaneously. Finally, the results are retrieved from the GPU. For simple but highly repetitive operations, such as matrix computations, GPGPU offers advantages over CPU-based computing. However, the computing capacity and applications of parallel computing technologies are limited by the constrained computing resources of a single machine.
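The dependence of the speedup ratio on the proportion of serial code, mentioned above for CPU-based parallel computing, is usually quantified with Amdahl's law. In the block below, T_1 is the running time on one processor, T_p the time on p processors, and f the parallelizable fraction of T_1. The law ignores communication overhead, which lowers the achievable speedup further in practice.

```latex
S(p) = \frac{T_1}{T_p} = \frac{1}{(1 - f) + f/p},
\qquad E(p) = \frac{S(p)}{p},
\qquad \lim_{p \to \infty} S(p) = \frac{1}{1 - f}

% Worked example: f = 0.9, p = 8
S(8) = \frac{1}{0.1 + 0.9/8} = \frac{1}{0.2125} \approx 4.71,
\qquad E(8) \approx 0.59,
\qquad \lim_{p \to \infty} S(p) = 10
```

Even with 90% of the work parallelized, eight cores give less than a fivefold speedup, and no number of cores can exceed tenfold, which is why reducing the serial part matters as much as adding processors.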

15.3.1.2 Distributed Computing

As data volumes increase, a single machine can no longer store and process them efficiently; hence, distributed computing on computer clusters has developed. In distributed computing, instructions are usually sent to the computing nodes where the data are located, rather than transferring the datasets to other nodes, so every node focuses on processing its local data. As shown in Fig. 15.7, batch processing and stream processing are the two major types, distinguished by when the data are processed.

Fig. 15.7 Distributed computing paradigm: (a) Batch processing; (b) Stream processing

In batch processing, datasets are collected over time and fed to the processing engine in batches (Fig. 15.7a). Processing can be triggered in various ways, such as at fixed time intervals or when the data reach a given size. Because batch processing is carried out on distributed computers, it requires fault-tolerance mechanisms that avoid single points of failure, in addition to data partitioning strategies, task schedulers and communication mechanisms. In scenarios such as offline geospatial data analysis, batch processing is the right choice. In stream processing, data are fed directly to the engine piece by piece as they arrive (Fig. 15.7b). Ideally, the time at which data are produced (event time) and the time at which they are processed (processing time) would be equal. In reality, however, the skew between them is highly variable, because of input sources, stream processing engines and hardware. The skew may introduce latency into the processing pipeline and, in turn, affect the correctness and completeness of the computation. To deal with this problem, native stream processing systems offer diverse mechanisms, including windowing, watermarks, triggers and accumulation. For applications such as real-time data analysis, warning systems and financial transactions, stream processing is the better choice.
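Windowing and watermarks can be illustrated with a simplified sketch, written in the spirit of these mechanisms rather than against the API of any particular engine. Events carry event times and arrive out of order; they are counted in fixed-size (tumbling) event-time windows; the watermark is assumed to be the largest event time seen so far minus an allowed lateness, a common heuristic; and a window is emitted once the watermark passes its end. Events that arrive after their window has been emitted are reported as dropped. The function name is hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(event_times, window_size, allowed_lateness=0):
    """Count events per tumbling event-time window, consuming events in arrival order."""
    open_windows = defaultdict(int)   # window start -> count
    closed, dropped = {}, []
    watermark = float("-inf")
    for t in event_times:
        start = t - t % window_size
        if start + window_size <= watermark:
            dropped.append(t)         # too late: its window has already been emitted
            continue
        open_windows[start] += 1
        watermark = max(watermark, t - allowed_lateness)
        # Emit every window whose end the watermark has passed.
        for w in [w for w in open_windows if w + window_size <= watermark]:
            closed[w] = open_windows.pop(w)
    closed.update(open_windows)       # end of input: flush the remaining windows
    return dict(sorted(closed.items())), dropped


# Event times in arrival order; 4, 14 and 2 arrive out of order.
counts, late = tumbling_window_counts([1, 3, 12, 4, 25, 14, 2], window_size=10, allowed_lateness=5)
print(counts)  # {0: 3, 10: 1, 20: 1}
print(late)    # [14, 2]
```

Raising `allowed_lateness` to 15 keeps the window starting at 10 open long enough to count the late event 14, at the cost of emitting results later: the watermark setting trades latency against completeness, which is the tension described above.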


15.3.2 Mainstream Frameworks

Because of the strong demand for parallel and distributed computing technologies, many commercial and open-source HPC frameworks have been developed. With these frameworks, developers can implement parallel programs much more easily than by coding them from scratch. This section introduces some widely used frameworks corresponding to the computing paradigms explained above.

15.3.2.1 OpenMP

Open Multi-Processing (OpenMP²) is an API, consisting of compiler directives, library routines and environment variables, for CPU-based parallel computing on standalone computers. It is based on the fork/join programming model: a program starts as a single master thread and forks additional threads where operations must be executed in parallel; when the parallel operations are finished and synchronized, the threads join back into the master thread. The advantages of OpenMP include: (1) parallel programs are easy to implement with only a few code modifications; (2) an OpenMP program can still be compiled and run as serial code; (3) the code is easier to understand and maintain. However, there are also disadvantages: (1) programs can only run on shared-memory computers; (2) compiler support is required; (3) its application is limited to a few program structures, e.g., loops.

15.3.2.2 MPI

The Message Passing Interface (MPI) is one of the major programming models and specifications for CPU-based parallel computing on computer clusters. The most commonly used implementation is Open MPI³, which was derived from several earlier projects, such as FT-MPI, LA-MPI, LAM/MPI and PACX-MPI. MPI hides differences in hardware architecture, performs the necessary data conversion and switches communication protocols automatically. Therefore, MPI programs can run on heterogeneous systems and on groups of processors with distinct architectures. MPI runs on both shared-memory and distributed-memory architectures and has wider applications than OpenMP. However, MPI programs incur relatively high programming and debugging costs, and their performance is limited by the communication network between nodes.

2 The OpenMP API specification for parallel programming. https://www.openmp.org. Accessed 21 Feb 2019.
3 Open MPI: Open Source High Performance Computing. https://www.open-mpi.org. Accessed 21 Feb 2019.


Fig. 15.8 Programming model of CUDA

15.3.2.3 CUDA

Compute Unified Device Architecture⁴ (CUDA) is a GPGPU platform and API, based on the C programming language, exclusively for NVIDIA GPUs. Figure 15.8 shows the CUDA programming model. In this model, the host is the CPU that controls the computing procedure, and the device is the GPU that executes the computing tasks. Different tasks are expressed as different kernel functions and assigned to different grids. Within each grid, the tasks are divided into thread blocks, each of which is handled by a streaming multiprocessor (SM), and the threads in a block execute the tasks physically on the processing cores. Threads in the same block synchronize and communicate through shared memory. The CUDA model therefore mirrors the hardware design of NVIDIA GPUs. CUDA can dramatically speed up massively parallel jobs, especially in applications such as image processing, model simulation and machine learning. Nevertheless, CUDA has high hardware requirements for intensive computation. Moreover, GPUs lack the sophisticated branch prediction of CPUs: if a program contains highly irregular instruction flows, a GPU can become slower than a CPU.

4 NVIDIA Corporation. (2019). Develop, Optimize and Deploy GPU-accelerated Apps. https://developer.nvidia.com/cuda-toolkit. Accessed 21 Feb 2019.

Fig. 15.9 MapReduce process taking Word Count as an example [Figure: the input “Dear Bear River / Car Car River / Dear Car Bear” passes through splitting (K1,V1), mapping (List(K2,V2)), shuffling (K2,List(V2)) and reducing (List(K3,V3)), giving the final result Bear,2 Car,3 Dear,2 River,2.]
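How CUDA maps its grid of thread blocks onto data can be shown with a small emulation. In CUDA C, a one-dimensional kernel usually computes its element index as `blockIdx.x * blockDim.x + threadIdx.x` and guards it with `if (i < n)`, because the last block may contain surplus threads. The Python sketch below reproduces that index arithmetic sequentially; `launch_1d` and `saxpy_kernel` are hypothetical names, and on a real GPU all threads of the grid would run concurrently rather than in a loop.

```python
def launch_1d(kernel, n, block_dim, *args):
    """Emulate a 1-D CUDA launch sequentially: one kernel call per (block, thread) pair."""
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, n, *args)
    return grid_dim


def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y, out):
    """Each emulated thread computes one element of out = a * x + y."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:                               # surplus threads in the last block do nothing
        out[i] = a * x[i] + y[i]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0] * 5
out = [0.0] * 5
blocks = launch_1d(saxpy_kernel, len(x), 2, 2.0, x, y, out)
print(blocks, out)  # 3 [12.0, 14.0, 16.0, 18.0, 20.0]
```

Because every thread runs the same instructions on a different element, this kind of uniform, branch-free kernel is where GPUs excel; kernels whose threads take different branches lose much of that advantage.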

15.3.2.4 Hadoop

Apache Hadoop⁵ is an open-source framework for scalable, reliable and distributed computing. The core computing mechanism of Hadoop is MapReduce, which consists of two functions: a map function that applies a specific operation to the input datasets on distributed nodes and produces a set of key/value pairs, and a reduce function that merges all intermediate values associated with the same key. Word count, illustrated in Fig. 15.9, is a classic example involving both map and reduce. Hadoop takes care of data partitioning, scheduling, load balancing, fault tolerance and network communication, so programmers can focus on the design of distributed applications. Several spatial extension libraries exist for Hadoop. For example, SpatialHadoop (Eldawy 2014; Eldawy and Mokbel 2015) adapts Grid File, R-tree and R+-tree indexes to partition data across nodes and organize local records inside each node. These enhancements accelerate typical spatial and geometric operations, including range queries, k-nearest-neighbor (kNN) queries and spatial joins. SpatialHadoop has been applied in traffic data processing, as well as in map and satellite data analysis.
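The word-count example of Fig. 15.9 can be traced step by step in plain Python. This single-process sketch only mirrors the data flow (map to (word, 1) pairs, shuffle by key, reduce by summing); it is not Hadoop's Java API, and the function names are hypothetical. In Hadoop, each split would be mapped on a different node and the shuffle would move pairs across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: merge the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)  # one mapper per split
    grouped = shuffle_phase(mapped)
    return dict(sorted(reduce_phase(k, v) for k, v in grouped.items()))


print(word_count(["Dear Bear River", "Car Car River", "Dear Car Bear"]))
# {'Bear': 2, 'Car': 3, 'Dear': 2, 'River': 2}
```

Since the mapper and reducer only see their own inputs, the framework is free to run them on whichever nodes hold the data, which is what makes the model scale.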

15.3.2.5 Spark

Apache Spark6 is a unified computing engine for distributed clusters that offers libraries for SQL, stream computation, graph computation and machine learning. Its core programming model, the Resilient Distributed Dataset (RDD), provides transformations (e.g., distinct, filter, map and sort) and actions (e.g., reduce, count, first and take) that overcome the limitations of Hadoop. In Spark, a job consists of multiple transformations and an action: the transformations build up a Directed Acyclic Graph (DAG) of instructions, and the action triggers the execution of the graph. Spark keeps intermediate data in memory where possible, so the I/O cost of writing and reading data is much lower than for Hadoop jobs. Furthermore, high-level APIs such as DataFrames and Datasets have been developed, making it more convenient to process and manipulate big data. The speed, ease of use, generality and compatibility of Spark have attracted large numbers of developers to solve geospatial problems with it. Frameworks and libraries such as GeoSpark (Yu et al. 2018) and GeoTrellis (Kini et al. 2014), which provide geospatial data I/O, geospatial indexes and geospatial operations, are applied to process big geospatial datasets in humanities and social sciences research.

5 The Apache Software Foundation. (2018). Apache Hadoop. https://hadoop.apache.org. Accessed 21 Feb 2019.
6 Apache Spark. https://spark.apache.org. Accessed 21 Feb 2019.

15 High Performance Spatiotemporal Visual Analytics Technologies …
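The split between lazy transformations and the action that triggers execution can be illustrated with a toy, single-process stand-in for an RDD. `LazyDataset` is a hypothetical teaching class, not part of PySpark (whose `RDD.map`, `RDD.filter`, `collect` and `count` behave analogously), and it records only a linear chain of operations rather than a general DAG.

```python
class LazyDataset:
    """Hypothetical RDD stand-in: transformations are recorded, actions execute them."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)  # recorded lineage of transformations

    # Transformations: return a new dataset and compute nothing.
    def map(self, f):
        return LazyDataset(self._source, self._ops + (("map", f),))

    def filter(self, pred):
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    # Actions: run the recorded chain over the source data.
    def _run(self):
        for item in self._source:
            keep = True
            for kind, f in self._ops:
                if kind == "map":
                    item = f(item)
                elif kind == "filter" and not f(item):
                    keep = False
                    break
            if keep:
                yield item

    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []


def square(x):
    calls.append(x)  # record each time the function actually runs
    return x * x


ds = LazyDataset(range(6)).map(square).filter(lambda v: v % 2 == 0)
pending_calls = len(calls)  # 0: building the chain did no work
result = ds.collect()       # the action runs the whole chain
```

Because nothing runs until `collect()`, an engine like Spark can inspect the whole chain first and pipeline the steps without materializing intermediate results.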

15.3.2.6 Flink

Apache Flink7 is another open-source unified computing framework. Based on Google's Dataflow model, it processes data as native streams rather than as the micro-batches used by Spark Streaming. Flink uses custom memory management and serialization methods to avoid the cost of JVM garbage collection, which makes it easier to achieve lower latency and higher throughput than micro-batch processing (Karimov et al. 2018). The core component of Flink is a distributed system that accepts stream programs and executes them on a cluster with fault tolerance, i.e., the “Flink runtime” in Fig. 15.10. Flink also provides a wide range of high-level, user-friendly APIs for developing flexible programs, including the DataStream API, the DataSet API and the Table API. These features have attracted a broader community to Flink than to other stream processing frameworks. Flink is still an emerging framework, and its geospatial extensions are not yet mature. Nevertheless, it has been applied to real-time transport analytics and spatial semantics processing (Hennig et al. 2016). As demand for low latency grows, Flink is likely to play a key role in real-time data processing for the foreseeable future.

15.3.3 Applications in Spatial Humanities and Social Sciences

High performance computing technologies have attracted many researchers seeking to handle massive geospatial datasets. Here we introduce several applications in the spatial humanities and social sciences.

7 Apache Flink - Stateful Computations over Data Streams. https://flink.apache.org. Accessed 21 Feb 2019.


Fig. 15.10 Key components of Flink Stack (Friedman and Tzoumas, 2016)

15.3.3.1 Social Phenomenon Analytics

Investigating the spatiotemporal distribution of, and spatial interactions between, social events can benefit governmental policy making and individual-level activity planning. However, computing multi-level space-time interactions is time-consuming and slows research progress. For instance, computing a series of space-time interactions over 32,505 crime event records for 1,000 runs takes around 48 min on a desktop GPU. In contrast, the Keeneland system, using MPI and GPGPU technologies, needs only 264 s for the same task (Ye et al. 2017), roughly an 11-fold speedup (2,880 s versus 264 s).
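The chapter does not name the space-time interaction statistic used by Ye et al. (2017). As an assumed stand-in, the sketch below implements the classic Knox test: count event pairs that are close in both space and time, then compare that count with Monte Carlo runs in which timestamps are randomly permuted over the locations. Each run compares n(n − 1)/2 pairs, about 5.3 × 10^8 for 32,505 records, and the runs are independent of each other, which is why MPI/GPU parallelism pays off. Thresholds, toy data and function names are illustrative.

```python
import math
import random


def knox_statistic(points, times, max_dist, max_lag):
    """Count event pairs that are close in both space and time."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            near_space = math.dist(points[i], points[j]) <= max_dist
            near_time = abs(times[i] - times[j]) <= max_lag
            if near_space and near_time:
                close += 1
    return close


def knox_test(points, times, max_dist, max_lag, runs=999, seed=0):
    """Monte Carlo Knox test: permute timestamps over locations `runs` times."""
    rng = random.Random(seed)
    observed = knox_statistic(points, times, max_dist, max_lag)
    shuffled = list(times)
    as_extreme = 0
    for _ in range(runs):
        rng.shuffle(shuffled)  # breaks any space-time link, keeps both marginals
        if knox_statistic(points, shuffled, max_dist, max_lag) >= observed:
            as_extreme += 1
    p_value = (as_extreme + 1) / (runs + 1)
    return observed, p_value


# Toy data: two tight spatial groups whose events are also close in time.
pts = [(0.1 * i, 0.0) for i in range(5)] + [(10 + 0.1 * i, 10.0) for i in range(5)]
ts = list(range(5)) + [100 + i for i in range(5)]
observed, p = knox_test(pts, ts, max_dist=1.0, max_lag=10, runs=199)
```

Here all 20 within-group pairs are close in space and time, and a small p-value indicates that such space-time clustering is unlikely under random timing.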

15.3.3.2 Urban Mobility Simulation

Cities are complex systems, and agent-based models (ABMs) of urban mobility are an effective approach to discovering urban patterns. Nevertheless, the intensive computation and extremely long running times of urban simulations create challenges for researchers. HPC technologies such as MPI have been used to implement ABMs, and simulation frameworks such as Repast HPC speed up agent-based geosimulation. However, experiments show that approximating Point of Attraction (PoA) information effectively boosts efficiency but lowers simulation accuracy (Zia et al. 2013), so the trade-off between efficiency and accuracy must be considered carefully.

15.3.3.3 Social Sensing

Many spatiotemporal data sources now capture daily human behaviors, giving rise to the new field of social sensing. However, the large volume and dynamic attributes of spatiotemporal data make efficient and scalable spatial operations challenging. MPI and Spark have been adopted to accelerate spatial join processing over large trajectory datasets and road network data for map matching (Stojanovic and Stojanovic 2013). Yet different data-splitting strategies produce different I/O and computing workloads and, as a result, different performance. The efficiency of uniform splitting by a fixed grid size decreased rapidly as the number of processors increased (only 18% with 16 processors). When the splitting accounts for the spatial distribution of the data, efficiency remains high (about 95%) even with 16 processors. Therefore, the skewness of spatial data must be handled properly when using HPC technologies.
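The effect of skew on splitting strategies can be reproduced qualitatively with a small model. The sketch below compares a fixed 4 × 4 grid with a distribution-aware recursive median split (KD-tree style) on synthetic points, 90% of which fall in one hotspot. It assumes each processor's time is proportional to its point count, so parallel efficiency equals the ideal load divided by the heaviest load; the data and that cost model are assumptions, and the sketch does not reproduce the 18% and 95% figures reported above.

```python
import random


def uniform_grid_loads(points, k, bbox):
    """Fixed grid: split the bounding box into k x k equal cells and count points per cell."""
    xmin, ymin, xmax, ymax = bbox
    loads = [0] * (k * k)
    for x, y in points:
        col = min(max(int((x - xmin) / (xmax - xmin) * k), 0), k - 1)
        row = min(max(int((y - ymin) / (ymax - ymin) * k), 0), k - 1)
        loads[row * k + col] += 1
    return loads


def median_split_loads(points, parts, axis=0):
    """Distribution-aware split: recursive median cuts on alternating axes."""
    if parts == 1:
        return [len(points)]
    ordered = sorted(points, key=lambda p: p[axis])
    left_parts = parts // 2
    cut = len(ordered) * left_parts // parts  # point share proportional to partition share
    return (median_split_loads(ordered[:cut], left_parts, 1 - axis)
            + median_split_loads(ordered[cut:], parts - left_parts, 1 - axis))


def balance(loads):
    """Ideal per-processor load / heaviest load (1.0 means perfectly balanced)."""
    return (sum(loads) / len(loads)) / max(loads)


rng = random.Random(42)
hotspot = [(rng.gauss(20, 2), rng.gauss(20, 2)) for _ in range(1440)]
background = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(160)]
points = hotspot + background

grid_balance = balance(uniform_grid_loads(points, 4, (0, 0, 100, 100)))  # 16 cells
median_balance = balance(median_split_loads(points, 16))                # 16 partitions
```

Under this model almost all work lands in one grid cell, so 15 of the 16 processors sit mostly idle, while the median split gives every processor the same number of points.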

15.4 Web-Based Visualization

Visualization helps people make sense of information much more rapidly (Wang et al. 2020). For socioeconomic data, effective visual analytics can better assist analysts in exploring spatiotemporal relationships and in mining the influencing factors and underlying rules of human social phenomena, or the processes hidden behind the data. Today, the Internet is the most important channel for acquiring information, and web visualization has become more popular than traditional stand-alone or client/server-based visualization (Bender et al. 2000). The wide adoption of cloud computing has triggered a trend toward data processing and management on the cloud or server side. Cloud computing resources accomplish analytics tasks for massive data, thus avoiding the risks associated with transferring sensitive or large volumes of raw data to the client side. In addition, with the pervasive use of mobile devices and heterogeneous terminals, loosely coupled software architectures with good portability and scalability are highly desirable. To create sophisticated visual analytics applications with such web-based architectures, developers use open-source visualization tools and web application frameworks to make programming more succinct and efficient. Optimized data models and communication technologies can improve the user experience by alleviating the problems caused by large data transmission volumes.

15.4.1 JavaScript-Based Visualization Libraries

In recent years, various dynamic web page and visualization technologies have emerged, including JavaScript, Java Applet, JavaServer Pages (JSP), Flash, Flex, Silverlight and the Web Graphics Library (WebGL). With the trend toward standardization, major Internet vendors have gradually reached a consensus on web standards, and HTML5 has become the de facto standard. In this context, JavaScript-based visualization has become mainstream, because it is compatible with various browsers without the installation of any plugins.


JavaScript-based visualization technologies and libraries have become pervasive. WebGL is a JavaScript API for 3D graphics based on OpenGL ES 2.0. Developers can use WebGL to drive graphics processing units (GPUs) to render 3D scenes in the browser and to achieve complex scene navigation and interaction for data visualization. Advanced graphics libraries, such as D3.js, Deck.gl, Kepler.gl, Plotly.js, Three.js and Leaflet, provide many visualization forms, which not only make programming much easier but also make the results more professional and eye-catching.

15.4.1.1 D3.js

D3.js8 is a data-driven JavaScript library that binds arbitrary data to the Document Object Model (DOM) to achieve data-driven transformations. Unlike most open-source chart libraries, it allows users to customize the style of charts freely. D3 provides a convenient way to set the attributes or styles of nodes, called selections, which are based on the W3C Selectors API rather than the traditional W3C DOM API. Because it reduces the workload of DOM manipulation, D3 has been widely used to visualize the distribution patterns of geographical phenomena, and it offers a variety of visualization methods for this purpose. Figure 15.11a is a choropleth map of US unemployment rates in August 2016, based on data from the Bureau of Labor Statistics and the Census Bureau. Figure 15.11b uses histograms to present the medical cost of hip replacement by state. Figure 15.11c presents a hexagonal heatmap, and Fig. 15.11d illustrates the topography of Maungawhau with d3-contour and d3-hsv. D3.js provides many chart types for two-dimensional visualization, while other libraries, e.g., Deck.gl, are dedicated to map visualization and provide professional, scenario-specific visualization effects.

15.4.1.2 Deck.gl

Deck.gl9 is a WebGL-powered framework for visualizing large datasets. It grew out of an Uber project to better understand human travel behavior, which used maps to present big data on passenger and driver movements, including pick-up and drop-off locations, to further optimize service quality. Deck.gl is good not only at the static presentation of different types of maps, such as 3D histograms and migration maps, but also at state-of-the-art animations that reveal the spatiotemporal dynamics behind datasets such as huge numbers of taxi trajectories. Using Deck.gl, users can easily design attractive 3D histograms, scatter plots and dynamic trajectories. Figure 15.12a reveals road safety in the UK by counting personal injury road accidents in Great Britain from 1979 to 2017.

8 D3: Data Driven Document. https://d3js.org. Accessed 21 Feb 2019.
9 Deck.gl. http://deck.gl. Accessed 21 Feb 2019.

Fig. 15.11 Four examples of D3.js, a unemployment rates of the US (Mike Bostock. (2017). D3 Choropleth Unemployment rate by county. https://beta.observablehq.com/@mbostock/d3-choropleth. Accessed 21 Feb 2019), b medical cost of hip replacement in US (Phuoc Do. (2015). Medical Cost of Hip Replacement by State. https://vida.io/documents/s5qo5Gwrct5HNxAD2. Accessed 21 Feb 2019), c hexagonal heatmap (D3: Data Driven Document. https://www.visualcinnamon.com/2013/07/self-organizing-maps-creating-hexagonal.html), d the topography of Maungawhau (D3: Data Driven Document. https://observablehq.com/@d3/volcano-contours)

Figure 15.12b shows the flight paths of London Heathrow Airport in a 6-hour window. Figure 15.12c shows yellow cab and green cab trips in Manhattan, New York City, using a dynamic track display. Figure 15.12d shows highway safety in the US. Owing to the success of Deck.gl, many libraries have been built on it, such as Kepler.gl, which provides more user customization functions.

15.4.1.3 Kepler.gl

Kepler.gl10 is a powerful open-source graphics library based on Deck.gl. Unlike Deck.gl, it provides various built-in visualization GUIs and data analysis functions, and it lets users upload their own data for visualization, which dramatically reduces the programming workload. Meanwhile, Kepler.gl can easily be embedded into users' web applications using web frameworks and libraries such as React and Redux. Furthermore, Kepler.gl can display complex 3D scenes smoothly because it is WebGL-based, and WebGL provides hardware-accelerated 3D rendering in an HTML5 canvas. Figure 15.13a shows a small sample of taxi trip records in New York City; even though this sample contains 100,000 rows, the layer still renders quickly. Figure 15.13b shows the elevation contours of the San Francisco mainland and Treasure Island/Yerba Buena Island. Figure 15.13c shows the congestion of every street in San Francisco using a 3D density map. Figure 15.13d is an origin-destination map that shows the commuting patterns of residential areas in England and Wales using 3D arcs. Libraries such as Kepler.gl and Deck.gl provide powerful functionality for map-based big data visualization, but statistical charts are also indispensable, and here Plotly.js is very powerful.

10 Kepler.gl. https://kepler.gl. Accessed 21 Feb 2019.

Fig. 15.12 Four examples of Deck.gl, a road safety in UK, b flight paths of Heathrow Airport, c cab trips in Manhattan, d highway safety in the US (Deck.gl examples overview. (2018). http://deck.gl/#/examples/overview. Accessed 21 Feb 2019)

15.4.1.4 Plotly.js

Plotly.js11 is based on D3.js and Stack.gl. It uses JavaScript to bring MATLAB- and matplotlib-style graphical presentations to the web, and it supports more than 20 chart types, including 2D and 3D visualizations. It is dedicated to the visualization of statistical charts, and its interactive effects are rich enough to meet the needs of statistical analysis for many data types. Plotly is available not only for web development but also for other languages such as R, Python and MATLAB, which makes it convenient to integrate code written in different programming languages. Plotly.js supports both map-based visualization and 3D statistical charts. Figure 15.14a is a bubble map in which bubble size represents the populations of US cities. Figure 15.14b is a choropleth map whose color depth shows US agricultural exports by state, and Fig. 15.14c uses color depth to show North America precipitation. Figure 15.14d is a 3D scatter plot for visualizing high-dimensional data. Plotly.js provides statistical charts to meet users' visualization requirements, while Three.js is more effective for creating 3D visualizations.

11 Plot.ly. https://plot.ly/javascript. Accessed 21 Feb 2019.

Fig. 15.13 Four examples of Kepler.gl, a taxi trip in NYC, b elevation contours of San Francisco, c street congestion in San Francisco, d commute patterns of England and Wales residence (Kepler.gl demo. https://kepler.gl/demo. Accessed 21 Feb 2019)

Fig. 15.14 Four examples of Plotly.js, a 2014 US city populations (Bubble Maps. https://plot.ly/javascript/bubble-maps. Accessed 21 Feb 2019), b 2011 US agriculture exports by state (USA Choropleth Map. https://plot.ly/javascript/choropleth-maps/#usa-choropleth-map. Accessed 21 Feb 2019), c North America precipitation (North America Precipitation. https://plot.ly/javascript/scatter-plots-on-maps/), d a 3D scatter plot (3d Scatter Plots. https://plot.ly/javascript/3d-scatter-plots. Accessed 21 Feb 2019)

15.4.1.5 Three.js

Three.js12 is a JavaScript 3D library based on WebGL. With Three.js, developers can implement 3D visualizations that run smoothly in the browser without writing C++ programs. Three.js provides strong capabilities for creating a variety of 3D scenes, including cameras, lights and materials. However, the engine is still under development and lacks complete API documentation, which creates a steep learning curve for beginners. Even so, many excellent applications are emerging. For instance, Owen Cornec built a web globe to show the scope, variety and inequality of world economies; it was featured in The Best American Infographics 2016 and won the IEEE Vis and IIB awards.13 As Fig. 15.15 shows, it supports several forms of visualization, including a globe view, a map view, country stacks and a 3D product space. The dynamic and interactive effects are intuitive and helpful for detecting economic growth and exploring the economic differences between countries. Three.js thus makes it much easier to build 3D visualization scenes. In addition to the visualization libraries on the front end, the communication technologies between front end and back end also play an important role.

12 Three.js. https://threejs.org. Accessed 21 Feb 2019.
13 Mariner Books, Houghton Mifflin Harcourt. (2016). The Best American Infographics 2016. https://sfpl.bibliocommons.com/item/show/3273821093_the_best_american_infographics_2016. Accessed 21 Feb 2019.

Fig. 15.15 The globe of economic complexity, a 3D version product stacks, b all products by category, c 3D version of product space, d stacks products by category (Center for International Development, (2016). The globe of economic complexity. http://globe.cid.harvard.edu. Accessed 21 Feb 2019)

15.4.2 Web Framework and Communication Technologies

A systematic web visualization solution should consider not only graphic rendering with visualization libraries on the front end, but also front-end/back-end communication issues, such as optimized data transfer, network session and transaction management, security, and authentication. Therefore, sophisticated web application frameworks with flexible embedding mechanisms for integrating the function modules and technologies that handle such issues are widely adopted. A web application framework is essential for web application development. It provides self-contained templates and methods for developers to create web pages and deploy web servers. Especially in web applications with complex functions and rich interaction, web application frameworks can package complicated operations in encapsulated functions, thus alleviating the workload of development and deployment. With web application frameworks, developers can easily implement functional requirements such as database access, data validation and dynamic interaction by using library functions (Sun et al. 2005). Web application frameworks can also optimize the system architecture and achieve loose coupling between the front end and back end, in turn improving development efficiency and increasing portability (Deeb et al. 2015). There are many JavaScript-based web application frameworks and related tools, such as Node.js, Vue.js, React.js, jQuery and Express.js. Network transport protocols and data format standards are critical for data transmission across different platforms and programming languages. Web service technology is a standardized, platform- and language-independent way of integrating web applications (Mockford et al. 2004). Conventionally, the Simple Object Access Protocol (SOAP) or HTTP-based Representational State Transfer (REST) is used for data transmission, with human- and machine-readable formats such as XML, JSON, GeoJSON, WKT, GML and other formats that comply with web service standards. Optimized data transmission mechanisms reduce transmission latency and improve the user experience when a large amount of data is fetched. A widely used method is progressive transmission by window scope or by levels of detail (LOD) (Levenberg et al. 2002). Progressive transmission requires specific data structure models that organize data by blocks or levels. Asynchronous data prefetching and caching on the front end are also important for reducing data latency and network load. For example, Google Earth uses quad-tree encoding to organize map data, and prefetches and caches the necessary data on the client to keep interaction smooth. Moreover, data compression can reduce the volume of data transmitted over the network, although the encoding and decoding algorithms must be efficient enough to avoid excessive computing time.
For example, Deck.gl loads trip data by compressing the series consisting of longitude, latitude and timestamps into an encoded polyline format that includes arrays of turn points.14
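The exact trip format used by deck.gl is not reproduced here. To illustrate the general idea of polyline compression, the sketch below implements the widely used Google encoded polyline algorithm: coordinates are scaled by 10^5, delta-encoded against the previous point, and each signed delta is packed into printable ASCII in 5-bit chunks. Allowing tuples with more than two values (e.g., appending a timestamp) is an assumption for illustration; the standard format covers latitude/longitude pairs only.

```python
def _encode_value(delta):
    """Encode one signed integer delta with the polyline varint scheme."""
    v = delta << 1
    if delta < 0:
        v = ~v  # fold the sign into the lowest bit
    chunks = []
    while v >= 0x20:
        chunks.append(chr((0x20 | (v & 0x1F)) + 63))  # 5 data bits + continuation flag
        v >>= 5
    chunks.append(chr(v + 63))
    return "".join(chunks)


def encode_polyline(points, precision=5):
    """Delta-encode a sequence of equal-length coordinate tuples into ASCII."""
    factor = 10 ** precision
    prev = [0] * len(points[0])
    out = []
    for point in points:
        for dim, value in enumerate(point):
            scaled = int(round(value * factor))
            out.append(_encode_value(scaled - prev[dim]))
            prev[dim] = scaled
    return "".join(out)


def decode_polyline(encoded, dims=2, precision=5):
    """Invert encode_polyline and rebuild the list of coordinate tuples."""
    deltas, result, shift = [], 0, 0
    for ch in encoded:
        b = ord(ch) - 63
        result |= (b & 0x1F) << shift
        shift += 5
        if b < 0x20:  # last chunk of this value
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
            result, shift = 0, 0
    factor = 10 ** precision
    points, running = [], [0] * dims
    for i in range(0, len(deltas), dims):
        for d in range(dims):
            running[d] += deltas[i + d]
        points.append(tuple(r / factor for r in running))
    return points


path = [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
encoded = encode_polyline(path)
decoded = decode_polyline(encoded)
```

Three points that would take dozens of characters as JSON text compress to a 27-character string, and because consecutive points along a trip are close together, most deltas fit in only a few chunks.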

15.5 Enterprise Registration Data Visual Analytics as a Use Case

To demonstrate the power of the aforementioned technologies in big data visual analytics, in this section we take big enterprise registration data as an example. Enterprise registration data provide fine-grained registration information about each individual enterprise, offering a promising data source for economic geography and regional studies (Duranton and Overman 2005; Marcon and Puech 2010). Analyzing the spatiotemporal distribution of industries at multiple scales can advance studies of urban spatial structure, urban agglomerations, industrial aggregations and socioeconomic activities (Li et al. 2018). As shown in Fig. 15.16, an HPC-supported web visual analytics framework was developed to store, preprocess and visually analyze big enterprise registration data using the technologies and tools described above (Li et al. 2018; Gui et al. 2020b; Wang et al. 2020).

14 ibgreen. (2018). building-apps.md. https://github.com/uber/deck.gl/blob/master/docs/developerguide/building-apps.md. Accessed 21 Feb 2019.

Fig. 15.16 High-performance visual analytics of big enterprise registration data for economic geographic studies (web visualization layer: visualization libraries such as ECharts, D3.js and Kepler.gl, web frameworks such as Django, Angular, Vue.js and React, and analyses of spatiotemporal distribution trends, spatial clustering patterns, correlation and networks; computing layer: data query services, computation services, spatial query and analysis components, and an HPC framework for big data computing and spatial statistics based on Apache Spark; storage layer: MySQL, Neo4J, Nanocubes, HDFS, and spatial and full-text indexes; data preprocessing and management of big enterprise registration data by location, time, industrial type and other attributes)

15.5.1 HPC-Accelerated Data Preprocessing

Big, fine-grained enterprise registration data that include time and location information enable us to quantitatively analyze, visualize and understand the patterns of industries at multiple scales across time and space. However, data quality issues such as non-standardization, duplication, incompleteness and ambiguity hinder such analysis. Data preprocessing becomes challenging when the volume of data is immense and constantly growing, and may lead to out-of-memory errors and intolerable computation times. HPC technologies can be used to tackle such big data computational issues. We use HPC technologies to impute the missing industry category and location attributes of enterprise registration data. Industry categorization is imperative for analyzing the development of different industrial categories, while location accuracy is critical for spatial analysis. In the dataset we collected, which contains about 17 million enterprise registration records for mainland China, 43.64% of the records have no industrial category value. Approximately 30% of the records have only a street-level address and do not include the province or city to which the enterprise belongs. This address ambiguity seriously impedes effective geocoding. A big data imputation workflow based on cluster computing technologies is used to impute the enterprise registration data (Li et al. 2018), as illustrated in Fig. 15.17. Industrial category imputation is treated as a short text classification problem consisting of input vector construction and classification, and is solved in Apache Spark. Location imputation uses a bare-metal computing cluster and consists of three steps: postcode imputation, Administrative Division (AD) imputation and geocoding. Experiments demonstrate the feasibility and efficiency of the proposed imputation framework for big geotagged text data. Industrial category imputation took about 1,600 s and achieved an average accuracy of about 77.4% using logistic regression. With the proposed location imputation method, 97% of all records were geocoded, compared with only 53% without location imputation. The HPC-based framework efficiently handles the data incompleteness and location ambiguity problems and makes further spatiotemporal visual analysis of industries possible.

Fig. 15.17 Workflow and computational framework for imputation of incomplete enterprise registration data (incomplete records are filtered into data for industrial category imputation, i.e., input vector construction and classification-based filling supported by an Apache Spark cluster, and data for location imputation, i.e., postcode imputation, AD imputation and geocoding supported by a bare-metal cluster)
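The two steps of category imputation, input vector construction and classification, can be sketched on a single machine. The sketch below is not the authors' Spark pipeline. Character bigrams as features (convenient for unsegmented Chinese names), plain stochastic gradient descent, the learning settings and the toy enterprise names and labels are all assumptions for illustration. At cluster scale, Spark MLlib provides a distributed `LogisticRegression` estimator.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Input vector construction: sparse bag of character n-grams."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def _scores(model_parts, x):
    classes, weights, bias = model_parts
    return [bias[c] + sum(weights[c][f] * v for f, v in x.items()) for c in classes]


def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Multinomial logistic regression trained with stochastic gradient descent."""
    classes = sorted(set(labels))
    weights = {c: Counter() for c in classes}  # sparse weight vector per class
    bias = {c: 0.0 for c in classes}
    feats = [char_ngrams(t) for t in texts]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            probs = softmax(_scores((classes, weights, bias), x))
            for c, p in zip(classes, probs):
                grad = p - (1.0 if c == y else 0.0)  # d(cross-entropy)/d(score_c)
                bias[c] -= lr * grad
                for f, v in x.items():
                    weights[c][f] -= lr * grad * v
    return classes, weights, bias


def predict(model, text):
    scores = _scores(model, char_ngrams(text))
    return model[0][scores.index(max(scores))]


# Hypothetical toy records: enterprise name -> industrial category.
texts = ["Sunrise Software Technology", "Cloud Data Software", "Golden Rice Farm",
         "Green Valley Fruit Farm", "Harbor Seafood Fishery", "Blue Sea Fishery"]
labels = ["IT", "IT", "Agriculture", "Agriculture", "Fishery", "Fishery"]
model = train_logreg(texts, labels)
```

With the model trained, a record whose category is missing would be filled with `predict(model, name)`. In practice, accuracy would be measured on held-out labeled records, as in the 77.4% figure above.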

15.5.2 HPC-Enabled Dynamic Visual Analytics

Based on the imputed data, we developed a web-based visual analytics system that integrates the aforementioned data storage, indexing, computing and web-based visualization technologies under the proposed computing framework. Supported by these enabling technologies, the system can provide four types of analysis on the fly: spatiotemporal distribution trends, clustering patterns, spatial correlations, and network relations.


Fig. 15.18 The spatiotemporal distribution of industries in selected cities of China

15.5.2.1 Spatiotemporal Distribution Trend

Analysis of industrial spatiotemporal distribution has been emphasized in economic geography, studies of urban spatial structure, and regional policy studies (Li et al. 2015; Parr 2014). To analyze the overall spatiotemporal density distribution trends of different kinds of industries visually, Nanocubes are used to store and index the big enterprise registration data. Through data-cube mechanisms, Nanocubes slice and dice data with respect to space, time or other attributes, supporting real-time viewing of tens of billions of points as heatmaps on Leaflet maps in a web browser. In Fig. 15.18, the spatial distribution of all industries in China in 2015 is visualized along with the spatiotemporal distribution of industries in different cities. The yellow line is Hu's (Heihe-Tengchong) line (Hu 1935), which is often used in population geography. About 94.4% of China's population lived southeast of this line in 2010 (Chen et al. 2016), and about 92% of enterprises were located southeast of it, revealing a spatial correlation between population and economic activity as well as a sharp east-west gap in industrial development in China. The movement of the economic barycenter and changes in industrial distributions reflect the rise and spatial expansion of industries (Wang et al. 2006). To explore the spatiotemporal movement of the economic barycenter, standard deviation ellipses and centroid trajectories are visualized (Fig. 15.19). Computing the centroids and standard deviation ellipse parameters within arbitrary regions containing millions of points is computationally intensive. Apache Spark is used to accomplish such computation tasks on the fly to support real-time visualization, and a RESTful API is used to query the computed results (Song et al. 2017). Trajectories and ellipses are visualized using Leaflet and Baidu ECharts.15

15 Echarts. https://echarts.baidu.com. Accessed 21 Feb 2019.
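The per-region quantities behind Fig. 15.19 can be written down compactly. The sketch below computes a mean center (centroid), a covariance-based standard deviation ellipse (major-axis angle and one-standard-deviation lengths along the principal axes) and a centroid trajectory ordered by year. It assumes projected planar coordinates and unweighted points. GIS packages differ in scaling conventions (e.g., √2 or n − 2 corrections), so this is a sketch of the principle rather than the exact formula used in the chapter. Every term is a plain sum over points, which is what makes the computation easy to distribute in Spark.

```python
import math


def mean_center(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def std_ellipse(points):
    """Return (center, major-axis angle in radians, sigma_major, sigma_minor)."""
    cx, cy = mean_center(points)
    n = len(points)
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Principal-axis angle of the 2x2 covariance matrix, measured from the x-axis.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    sigma_major = math.sqrt(half_trace + spread)
    sigma_minor = math.sqrt(max(half_trace - spread, 0.0))  # guard tiny negative round-off
    return (cx, cy), angle, sigma_major, sigma_minor


def centroid_trajectory(points_by_year):
    """Mean centers ordered by year, tracing the moving barycenter."""
    return [(year, mean_center(pts)) for year, pts in sorted(points_by_year.items())]


center, angle, major, minor = std_ellipse([(0, 0), (1, 1), (2, 2), (3, 3)])
trajectory = centroid_trajectory({2002: [(2, 2), (4, 2)], 2001: [(0, 0), (2, 0)]})
```

For points lying on the diagonal y = x, the ellipse degenerates to a line: the major axis points at 45° and the minor axis has zero length.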


Fig. 15.19 Standard deviation ellipses and centroid trajectory of industries in Chongqing, China

15.5.2.2 Spatial Clustering and Aggregation Pattern

In studies of agglomeration economies, the spatial concentration of industries may affect the competitive advantage of regional economies (Porter 2014; Tian et al. 2017). To explore the clustering patterns of millions of enterprises, we used HPC-supported spatial clustering technologies. Apache Spark graph computing and KD-trees have been used to accelerate DBSCAN (Gao et al. 2017), but further progress is needed to support real-time clustering. We therefore developed a grid-based multiscale clustering method (Gui et al. 2020a), because grid-based clustering has lower computational complexity and advantages in analyzing clustering patterns across multiple scales (Dan et al. 2006). The grid-based multiscale clustering can be performed in near real time. Rendering millions of clusters on the client side is also a non-negligible challenge. As shown in Fig. 15.20, we used Kepler.gl to visualize the clustering results for the enterprise registration data. Figure 15.20a shows hundreds of thousands of clusters across China, while Fig. 15.20b illustrates coexisting clusters in the Yangtze River Delta area. We also used Nanocubes and Apache Spark to analyze industrial spatial agglomeration qualitatively and quantitatively (Wang et al. 2020). As shown in Fig. 15.21, Nanocubes depict the spatial distribution of different industries in 2013. We found that different industries have different spatial cluster patterns: the industries shown in Fig. 15.21b are geographically concentrated in the capital or large cities, while those in Fig. 15.21a are more evenly dispersed, revealing the patterns of industrial aggregation visually. To reveal the geographic concentration of economic activities quantitatively, we compared the aggregation of these industries in Guangdong, China, using distributed Ripley's K functions that we developed on Apache Spark to accelerate spatial point pattern analysis (Gui et al. 2020b). As shown in Fig. 15.21c, d, the industries in the left panel are geographically concentrated within a larger space (up to a distance of 520 km) than the industries in the right panel (170 km) in Guangdong. This result confirms the visual finding from Fig. 15.21a, b that social service industries are more geographically concentrated than agriculture, forestry, animal husbandry and fishery industries. Visual analysis of enterprise clustering patterns helps us explore the aggregation of industries at multiple scales. In the following sub-section, we demonstrate how visualization can facilitate the study of spatial correlations between different industries across space and time.

Fig. 15.20 Visualization of spatial clustering of enterprises in mainland China using Kepler, a country-level clustering in mainland China, b regional-level clustering in Yangtze River delta area

Fig. 15.21 Visualization and quantitative analysis of enterprise spatial cluster patterns using Ripley's K, a visualization of AFAHF, Guangdong, China, b visualization of social service, Guangdong, China, c Ripley's K result of AFAHF, Guangdong, China, d Ripley's K result of social service, Guangdong, China (AFAHF denotes industries of agriculture, forestry, animal husbandry, and fishery)
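Ripley's K underlying Fig. 15.21c, d can be stated in a few lines. The sketch below is the naive single-machine estimator without edge correction: K(d) is the study area times the fraction of ordered point pairs within distance d, and the common transformation L(d) − d is positive when points are more clustered than complete spatial randomness at scale d. The pair loop is the part a distributed implementation would split across workers. This is not the authors' Spark implementation (Gui et al. 2020b), and the toy inputs are illustrative.

```python
import math
from itertools import combinations


def ripley_k(points, distances, area):
    """Naive Ripley's K (no edge correction) for each search distance."""
    n = len(points)
    pair_d = [math.dist(p, q) for p, q in combinations(points, 2)]
    ks = []
    for d in distances:
        close = sum(1 for pd in pair_d if pd <= d)
        ks.append(area * 2 * close / (n * (n - 1)))  # each unordered pair counts twice
    return ks


def ripley_l_minus_d(points, distances, area):
    """L(d) - d with L(d) = sqrt(K(d) / pi); values above 0 suggest clustering."""
    ks = ripley_k(points, distances, area)
    return [math.sqrt(k / math.pi) - d for k, d in zip(ks, distances)]


two_points = ripley_k([(0, 0), (1, 0)], [1.5], 4.0)
cluster = [(50 + 0.1 * i, 50 + 0.1 * j) for i in range(5) for j in range(5)]
cluster_signal = ripley_l_minus_d(cluster, [1.0], 10000.0)
```

The 25 tightly packed points in a 100 × 100 study area give a strongly positive L(d) − d at d = 1, the signature of clustering.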

15.5.2.3 Spatial Autocorrelation and Correlation

Spatial autocorrelation of industries indicates how industries of the studied area are related to industries of neighbored regions (Cui et al. 2017). To reveal the spatial autocorrelation of different categorical enterprises, we calculated the z-values for the Getis-Ord General G (Dubin 1998) of different industries, based on space gridding using Apache Spark. Hot-spot distributions of different kinds of industries were visualized using Leaflet and ECharts. As shown in Fig. 15.22a, b, different kinds of industries display different spatial patterns. Primary industries are more dispersed in space while the second industries are geographically concentrated in the main urban areas. The z-values on the right part of each figure illustrate the spatial autocorrelation variations across different spatial scales. Figure 15.22c, d show that the spatial distributions of industries and their spatial autocorrelation change over years, and the

(a) primary industry during 2011-2015

(b) second industry during 2011-2015

(c) primary industry during 2001-2005

(d) primary industry during 2006-2010

Fig. 15.22 Spatial autocorrelation and hot-spots visualization of enterprises in Chongqing, China

15 High Performance Spatiotemporal Visual Analytics Technologies …


Fig. 15.23 Province level correlation exploration between number of industries, population, GDP, and GDP per capita by associating the map with scatter diagrams

z-scores in the figures indicate the degree of aggregation for different industry categories in different periods. In addition to spatial autocorrelation analysis, we also explored correlations between industries and other statistical indexes using visual analytics. As shown in Fig. 15.23, the correlations among the number of industries, population, GDP, and GDP per capita were visualized using a choropleth map and scatter diagrams built with D3.js. Because the map and the scatter diagrams are linked, correlations between different indexes and regions can be compared and analyzed interactively. The aforementioned visual analyses are conducted in physical or geographical space; in the following subsection, we further investigate enterprise relations in network space.
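The Getis-Ord General G statistic behind these z-values compares a weighted sum of products of neighboring values with the unweighted sum over all pairs: G = Σ_{i≠j} w_ij x_i x_j / Σ_{i≠j} x_i x_j, whose expectation under spatial randomness is W / (n(n − 1)), where W is the sum of all off-diagonal weights. The sketch below is illustrative only: it assumes `x` holds enterprise counts per grid cell and `w` is a user-supplied spatial weights matrix (for example, binary adjacency), and it derives the z-score from random permutations rather than the analytical variance formula. The chapter’s Spark implementation is not shown, and the function names are hypothetical.

```python
import random


def general_g(x, w):
    """Getis-Ord General G: sum_{i!=j} w_ij x_i x_j / sum_{i!=j} x_i x_j."""
    n = len(x)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                prod = x[i] * x[j]
                num += w[i][j] * prod
                den += prod
    return num / den


def expected_g(w):
    """E[G] under spatial randomness: W / (n(n-1)), W = sum of off-diagonal weights."""
    n = len(w)
    total = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def general_g_z(x, w, n_perm=999, seed=42):
    """Observed G and a permutation z-score (cell values shuffled across cells)."""
    rng = random.Random(seed)
    g_obs = general_g(x, w)
    values = list(x)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(values)
        sims.append(general_g(values, w))
    mean = sum(sims) / n_perm
    sd = (sum((s - mean) ** 2 for s in sims) / (n_perm - 1)) ** 0.5
    return g_obs, (g_obs - mean) / sd


# Ten grid cells in a row with binary adjacency; high counts sit at one end.
cells = 10
chain = [[1 if abs(i - j) == 1 else 0 for j in range(cells)] for i in range(cells)]
counts = list(range(9, -1, -1))
print(general_g(counts, chain), expected_g(chain), general_g_z(counts, chain)[1])
```

A positive z-score indicates that high values sit next to each other more often than random arrangements of the same values would produce; fixing `seed` makes the permutation result reproducible.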

15.5.2.4 Enterprise Network Relation

Network visual analysis makes it possible to explore relations such as supply chains between upstream and downstream industries, as well as cooperation and competition between enterprises. As shown in Fig. 15.24, we represented enterprises as nodes of networks and connected two enterprises with an edge if they cooperate. We applied a community detection algorithm to the constructed networks and uncovered the most influential enterprise communities for different industrial categories. In this exploration, Neo4J is used to store and manage the enterprise relation data as a big graph, and the graph algorithms included in Neo4J help detect hard-to-find patterns and structures in the connected data. A force-directed graph from D3.js is used to display the discovered patterns.
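The chapter does not name the community detection algorithm it ran with Neo4J’s graph algorithms. Purely to illustrate the idea, the sketch below substitutes a naive greedy agglomerative modularity maximization, in the spirit of Clauset–Newman–Moore but recomputing modularity from scratch at every step, so it only suits toy graphs. Enterprises are nodes and each cooperation is assumed to be an unweighted, undirected edge; all function names are hypothetical.

```python
from itertools import combinations


def modularity(edges, communities):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for an undirected,
    unweighted graph without self-loops."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    label = {node: k for k, comm in enumerate(communities) for node in comm}
    intra = sum(1 for u, v in edges if label[u] == label[v])
    expected = sum((sum(degree[n] for n in comm) / (2 * m)) ** 2 for comm in communities)
    return intra / m - expected


def linked(edges, a, b):
    """True if at least one edge joins community a and community b."""
    return any((u in a and v in b) or (u in b and v in a) for u, v in edges)


def greedy_communities(edges):
    """Start from singletons; repeatedly apply the merge that most increases Q."""
    nodes = sorted({n for edge in edges for n in edge})
    comms = [frozenset([n]) for n in nodes]
    best_q = modularity(edges, comms)
    while True:
        best = None
        for i, j in combinations(range(len(comms)), 2):
            if not linked(edges, comms[i], comms[j]):
                continue  # only merge communities that actually cooperate
            trial = [c for k, c in enumerate(comms) if k not in (i, j)]
            trial.append(comms[i] | comms[j])
            q = modularity(edges, trial)
            if q > best_q + 1e-12:
                best_q, best = q, trial
        if best is None:
            return comms, best_q
        comms = best


# Two cooperating triangles of enterprises joined by a single bridge (2-3).
cooperation = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(greedy_communities(cooperation))
```

On this toy network the merges stop at the two triangles, because joining them across the single bridge would lower modularity.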


Fig. 15.24 Exploring network relation of enterprises by using force-directed graph and Neo4J graph database

15.6 Conclusions

In this chapter, we introduce high-performance visual analytics technologies for enabling big socioeconomic data analysis from the perspective of system architecture. The depicted technologies and application demonstrations may benefit researchers and developers who are struggling with big data computing and analysis issues in policy making and interdisciplinary research, and may offer insight into how to use the latest technologies, software packages, and frameworks for data storage, computing, and web visualization to build such a visual analytics system. We take enterprise registration data as an example to demonstrate the capabilities and potential applications of the proposed high-performance visual analytics framework. HPC-based data imputation methods show their power in enabling and accelerating the preprocessing of large data volumes. The feasibility of web-based dynamic visual analytics is verified by four exemplary case studies: (1) spatiotemporal distribution trend analysis using heatmaps and histograms provided by Nanocubes, as well as economic barycenter trajectories and standard deviational ellipses; (2) spatial clustering patterns using grid-based multi-scale clustering, and aggregation patterns using Ripley’s K function; (3) spatial autocorrelation using the Getis-Ord General G statistic, and correlation exploration among different statistical indexes by linking maps with statistical plots; (4) network relation analysis using a community detection algorithm supported by a graph database and a force-directed graph. The introduced technologies and proposed framework are not limited to the illustrated case study; they can be adapted to visual analytics scenarios involving other big socioeconomic data. To fully exploit the power of these technologies, the entire technology stack needs to be designed thoroughly. The data indexing and storage methods need to be carefully investigated in terms of the concrete data models and application scenarios. HPC-supported analysis algorithms are capable of leveraging cutting-edge computing technologies for computing-intensive applications. The application framework, visualization libraries, and data transmission and rendering strategies are indispensable for developing intuitive visualization effects and user-friendly interaction functions that better support exploratory visual analytics. With the wide adoption of cloud computing technologies and increasing demands for online fusion and analysis of big data, web-based visual analytics functions might evolve into cloud services in the near future, i.e., Visual Analytics as a Service (VAaaS). To achieve this goal, generic visual analytics capacity must be carefully divided into fine-grained, application- and domain-specific cloud service components or functions. Meanwhile, issues relating to service delivery, computing resource provisioning, payment, and data privacy protection also need to be carefully designed.

References

Balasubramanian, L., & Sugumaran, M. (2013). A state-of-art in r-tree variants for spatial indexing. International Journal of Computer Applications, 42(20), 35–41.
Bender, M., Klein, R., Disch, A., & Ebert, A. (2000). A functional framework for web-based information visualization systems. IEEE Transactions on Visualization & Computer Graphics, 6(1), 8–23.
Chen, M., Gong, Y., Li, Y., Lu, D., & Zhang, H. (2016). Population distribution and urbanization on both sides of the Hu Huanyong Line: Answering the Premier’s question. Journal of Geographical Sciences, 26(11), 1593–1610.
Cui, Z., Xie, G., Gui, Z., & Wu, H. (2017). Analyzing the spatiotemporal distribution of different industries in Wuhan city using enterprise registration data. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLII-2/W7, 5–10.
Dan, K., Galun, M., & Brandt, A. (2006). Fast multiscale clustering and manifold identification. Pattern Recognition, 39(10), 1876–1891.
Deeb, R., Ooms, K., Brychtová, A., Van Eetvelde, V., & De Maeyer, P. (2015). Background and foreground interaction: influence of complementary colors on the search task. Color Research & Application, 40(5), 437–445.
Dubin, R. A. (1998). Spatial autocorrelation: A primer. Journal of Housing Economics, 7(4), 304–327.
Duranton, G., & Overman, H. G. (2005). Testing for localization using micro-geographic data. Review of Economic Studies, 72(4), 1077–1106.
Eldawy, A. (2014, June). SpatialHadoop: towards flexible and scalable spatial processing using mapreduce. In Proceedings of the 2014 SIGMOD PhD symposium (pp. 46–50). ACM.
Eldawy, A., & Mokbel, M. F. (2015, April). SpatialHadoop: A mapreduce framework for spatial data. In Data Engineering (ICDE), 2015 IEEE 31st International Conference on (pp. 1352–1363). IEEE.
Fahmy, M. M., Elghandour, I., & Nagi, M. (2017). CoS-HDFS: co-locating geo-distributed spatial data in hadoop distributed file system. IEEE/ACM International Conference on Big Data Computing Applications and Technologies (pp. 123–132). IEEE.
Fathy, Y., Barnaghi, P., & Tafazolli, R. (2017). Distributed spatial indexing for the Internet of Things data management. IEEE: Integrated Network and Service Management.
Fox, A., Eichelberger, C., Hughes, J., & Lyon, S. (2013, October). Spatio-temporal indexing in nonrelational distributed databases. In Big Data, 2013 IEEE International Conference on (pp. 291–299). IEEE.
Friedman, E., & Tzoumas, K. (2016). Introduction to Apache Flink: Stream Processing for Real Time and Beyond. O’Reilly Media, Inc.
Gao, X., Gui, Z., Long, X., Li, F., Wu, H., & Qin, K. (2017). KDSG-DBSCAN: A High Performance DBSCAN Algorithm Based on KD-Tree and Spark GraphX. Geography and Geo-Information Science, 33(6), 1–7.
Gui, Z., Peng, D., Wu, H., & Long, X. (2020a). MSGC: Multi-Scale Grid Clustering via Analytical Granularity and Visual Cognition for Detecting Hierarchical Spatial Patterns. Future Generation Computer Systems, 112, 1038–1056.
Gui, Z., Wang, Y., Cui, Z., Peng, D., Wu, J., Ma, Z., Luo, S., & Wu, H. (2020b). Developing Apache Spark based Ripley’s K Functions for Accelerating Spatiotemporal Point Pattern Analysis. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B4-2020, 545–552.
Hennig, L., Thomas, P., Ai, R., Kirschnick, J., Wang, H., Pannier, J., … & Uszkoreit, H. (2016). Real-Time Discovery and Geospatial Visualization of Mobility and Industry Events from Large-Scale, Heterogeneous Data Streams. Proceedings of ACL-2016 System Demonstrations, 37–42.
Hu, H. Y. (1935). The distribution of population in China, with statistics and maps. Acta Geographica Sinica, 15(2), 1–24.
Hu, F., Yang, C., Jiang, Y., Song, W., Duffy, D., Schnase, J., & Lee, T. (2018). A hierarchical indexing strategy for optimizing Apache Spark with HDFS to efficiently query big geospatial raster data. International Journal of Digital Earth.
Kamel, I., Talha, A. M., & Aghbari, Z. A. (2017). Dynamic spatial index for efficient query processing on the cloud. Journal of Cloud Computing, 6(1), 5.
Karimov, J., Rabl, T., Katsifodimos, A., Samarev, R., Heiskanen, H., & Markl, V. (2018). Benchmarking Distributed Stream Processing Engines. arXiv preprint arXiv:1802.08496.
Kini, A., & Emanuele, R. (2014). GeoTrellis: Adding geospatial capabilities to Spark. Spark Summit.
Levenberg, J. (2002). Fast view-dependent level-of-detail rendering using cached geometry. Visualization, 2002. Vis (pp. 259–266). IEEE.
Li, F., Gui, Z., Wu, H., Gong, J., Wang, Y., Tian, S., et al. (2018). Big enterprise registration data imputation: supporting spatiotemporal analysis of industries in China. Computers, Environment and Urban Systems, 70, 9–23.
Li, J., Zhang, W., Chen, H., & Yu, J. (2015). The spatial distribution of industries in transitional China: A study of Beijing. Habitat International, 49, 33–44.
Marcon, E., & Puech, F. (2010). Measures of the geographic concentration of industries: improving distance-based methods. Journal of Economic Geography, 10(5), 745–762.
Mockford, K. (2004). Web services architecture. BT Technology Journal, 22(1), 19–26.
Parr, J. B. (2014). The regional economy, spatial structure and regional urban systems. Regional Studies, 48(12), 1926–1938.
Porter, M. E. (2014). Competitive advantage, agglomeration economies, and regional policy. International Regional Science Review, 19(1), 85–90.
Rigaux, P., Scholl, M., & Voisard, A. (2002). Spatial Databases: with application to GIS (p. 410). San Francisco: Morgan Kaufmann.
Song, Y., Gui, Z., Wu, H., & Wei, Y. (2017). A web-based framework for visualizing industrial spatiotemporal distribution using standard deviational ellipse and shifting routes of gravity centers. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLII-2/W7, 129–135.
Stojanovic, N., & Stojanovic, D. (2013). High-performance computing in GIS: techniques and applications. International Journal of Reasoning-based Intelligent Systems, 5(1), 42–49.
Sun, L., Lu, B., & Sun, J. (2005). Design and study of web application framework based on struts. Computer Engineering, 31(8), 57–60.
Theodoridis, Y., Stefanakis, E., & Sellis, T. (2000). Efficient cost models for spatial queries using r-trees. Knowledge & Data Engineering IEEE Transactions on, 12(1), 19–32.
Tian, S., Wang, J., Gui, Z., Wu, H., & Wang, Y. (2017). A case study: exploring industrial agglomeration of manufacturing industries in Shanghai using Duranton and Overman’s k-density function. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLII-2/W7, 149–154.
Wang, X., Dian-Ting, W. U., & Xiao, M. (2006). Industrial development and moving of Chinese economic barycenter. Economic Geography.
Wang, Y., Gui, Z., Wu, H., Peng, D., Wu, J., & Cui, Z. (2020). Optimizing and Accelerating Space-Time Ripley’s K Function based on Apache Spark for Distributed Spatiotemporal Point Pattern Analysis. Future Generation Computer Systems, 105, 96–118.
Ye, X., Shi, X., & Chen, Z. (2017). Scalable near-repeat and event chain calculations over heterogeneous computer architecture and systems. Big Earth Data, 1(1–2), 191–203.
Yu, J., Zhang, Z., & Sarwat, M. (2018). Spatial data management in apache spark: the GeoSpark perspective and beyond. GeoInformatica, 1–42.
Zhang, F., Zheng, Y., Xu, D., Du, Z., Wang, Y., Liu, R., et al. (2016). Real-time spatial queries for moving objects using storm topology. ISPRS International Journal of Geo-Information, 5(10), 178.
Zhang, X., & Du, Z. (2017). Spatial Indexing. The Geographic Information Science & Technology Body of Knowledge (4th Quarter 2017 Edition), John P. Wilson (ed). https://doi.org/10.22224/gistbok/2017.4.12.
Zia, K., Farrahi, K., Riener, A., & Ferscha, A. (2013). An agent-based parallel geo-simulation of urban mobility during city-scale evacuation. Simulation, 89(10), 1184–1214.

Chapter 16

Demystifying the Inequality in Urbanization in China Through the Lens of Land Use

Jinlong Gao and Jianglong Chen

J. Gao · J. Chen (✉)
Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing 210008, China
e-mail: [email protected]

J. Gao
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_16

16.1 Introduction

Regional inequality is an important subject of academic inquiry and one of the major concerns facing governments, as it may threaten national unity and social stability (Ravallion 2014; Wei 2015; Iammarino et al. 2018). The trends and driving forces of regional inequality have been the subject of heated debate, especially since the late 1980s (e.g., Liu 2006; Florida and Mellander 2016; Paredes et al. 2016; Lee et al. 2018). As the neoclassical growth model predicts, poor nations/regions tend to catch up with rich ones in terms of per capita product or income, because of the relative homogeneity of technology, preferences, and institutions (Martin and Sunley 1998; Scott 2000; Wei 2000; Rey and Janikas 2005). While some studies support neoclassical convergence, others find a lack of convergence, and even increasing regional inequality, in developing economies such as China and India (Liao and Wei 2012; Ravallion 2014; Xie and Zhou 2014; Wei 2017; Yenneti et al. 2017). From a methodological perspective, the commonly used Gini coefficient, Theil index, and coefficient of variation, which can track the temporal variation of regional inequality well based on socioeconomic data, have been criticized for ignoring geographical space: one can tell that a gap exists, but hardly where exactly it lies (Li et al. 2015; Gao et al. 2019a). As Li and Gibson (2013) argued, much of the apparent increase in inter-provincial inequality was a statistical artifact caused by the distortion of non-hukou migrants


J. Gao and J. Chen

in China. Others have also argued that gross domestic product (GDP) statistics are distorted by political achievement competition among local government officials in China (Liu et al. 2013), and have even suggested abandoning GDP as a measure of national success (Costanza et al. 2014). Fortunately, relatively stable urban land use statistics can be derived from remote sensing images. Sawyer (1975) pointed out four decades ago that urban form flows out of, and must remain consistent with, the basic economic structure of the society of which it is a part. Urban land can thus be employed as an important indicator of regional/urban development, particularly against the background of new-type urbanization (Lin 2014; Gao et al. 2015; Lee et al. 2016; Li et al. 2018). Specifically, the accretion and replacement of urban land can characterize the flow and transformation of population and energy, and can provide clues for further measuring the magnitude and evolution of inequality (Bai et al. 2014; Ding and Zhao 2014; Gao et al. 2017, 2020a). Scholars have argued that a polarized pattern of demographic urbanization has been forming around China’s mega-city regions (Fang et al. 2015), and have claimed that a generally universal framework of urbanization should apply across the country (Wang and Liu 2015; Chen et al. 2018); from the perspective of land urbanization, however, these claims are far from the truth (Li et al. 2017; Wei et al. 2017; Li et al. 2018). Indeed, relatively little is known about regional disparities in land use in China, and more effort is still needed to examine urban inequality and differentiation (Lang et al. 2018; Zeng et al. 2018; Gao et al. 2019a). Coincidentally, the unfolding complexity of land urbanization makes it clear that the quantitative understanding, optimization, and adjustment of urban land use patterns is a major issue for sustainable land use (Verburg et al. 2004).
Nevertheless, the lack of a solid understanding of these patterns makes it difficult to address the ongoing challenges posed by the volatility and complexity of land use policy in China (Long 2014). The situation consequently poses a number of challenging questions to the country: (1) How can the general situation of land urbanization and the resulting spatial phenomena in China be correctly depicted? (2) How can a quantitative approach be adopted to address the distinctive patterns of land urbanization? and (3) What are the underlying drivers of land urbanization patterns? With these introductory remarks in mind, we analyze the patterns of inequality in urbanization for the period 2000–2015 from a land use perspective, following the agenda below. The next section presents a brief discussion of data and methodology. We then examine the pattern and evolution of land urbanization at the county level. Thereafter, we model the determinants of land urbanization within the given analytic framework. Finally, we conclude with major findings and policy implications.
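The three aspatial inequality measures named above can be sketched in a few lines. This is an illustrative version, not the authors’ code: the coefficient of variation uses the population standard deviation (an assumption; the sample standard deviation is also common), the Gini coefficient is the mean absolute difference over all ordered pairs divided by twice the mean, and the Theil T index treats 0 · ln 0 as 0. The input is assumed to be a list of non-negative regional values with a positive mean.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def gini(values):
    """Gini coefficient: sum of |x_i - x_j| over all ordered pairs / (2 n^2 mean)."""
    n = len(values)
    mean = statistics.fmean(values)
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean)


def theil(values):
    """Theil T index: (1/n) sum (x/mean) ln(x/mean), taking 0 * ln 0 as 0."""
    mean = statistics.fmean(values)
    return sum((x / mean) * math.log(x / mean) for x in values if x > 0) / len(values)


# One region holds everything: maximal inequality for n = 4.
extreme = [0, 0, 0, 1]
print(gini(extreme), theil(extreme), coefficient_of_variation(extreme))
```

None of these functions takes coordinates, so shuffling the same values among counties leaves every index unchanged; this is the blind spot the introduction describes.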

16 Demystifying the Inequality in Urbanization …


16.2 Data and Methodology

16.2.1 Data Sources

Using data on urban and rural construction land (including urban, industrial and mining, rural residential, and transportation land) derived from remote sensing images, this study analyzes the spatial differentiation of land urbanization in China at the county level since 2000. Socioeconomic data on the influencing factors were extracted from the Statistical Yearbook of Social Economy of Counties (cities) in China (2001) and the Statistical Yearbook of Counties in China (for counties and cities) (2016). For spatial representation, data on traffic networks, terrain conditions, precipitation, and administrative boundary vectors were provided by the Resource and Environment Data Cloud Platform of the Chinese Academy of Sciences (http://www.resdc.cn/). Given the particularities of land urbanization in developing counties and the availability of data, the land urbanization rate (LUR) was employed as the dependent variable. Independent variables were selected to cover population size, economic level, industrial structure, urban characteristics, and geographical location (see Table 16.1). Variable selection was based on the following assumptions: (1) Urban population growth is the main source of demand for urban land (Wu et al. 2015; Chen et al. 2016), and the larger the population, the higher the corresponding level of land urbanization (Deng et al. 2008; Wu and Zhang 2012; Gao et al. 2015). (2) Economic development can effectively increase the income of urban residents, improve living conditions in cities and towns, and stimulate the transfer of agricultural populations to cities and towns, thus increasing the demand for land for housing, industry, and transportation (Deng et al. 2010; Chen et al. 2016; Li et al. 2018). (3) Land urbanization inevitably promotes the transformation of industrial structure by reducing the share of the agricultural sector (Liu et al. 2014; Chen et al. 2016). The development of the service industry and the increasing intensity of the manufacturing sector may have a negative impact on regional land urbanization. (4) Urban characteristics, including administrative level and population density, have also been recognized as having a significant impact on urban land expansion (Gao et al. 2014; Li et al. 2015). The higher the administrative level or density of a city, the stronger its ability to agglomerate resources and the higher its level of development, which will unsurprisingly accelerate land urbanization to a certain extent. (5) Favorable physical conditions (i.e., geographical location and terrain) better meet the requirements of urban land expansion (Liao and Wei 2014; Chen et al. 2016) and are thus conducive to land urbanization.


Table 16.1 Influencing factors of land urbanization at county level

Category | Variable | Definition
Population growth | Ratio of demographic urbanization (DUrban) | Urban population/permanent population
Economic development | Per capita GDP (PGDP) | Gross domestic product (GDP)/permanent population
Economic development | Fixed investment (FInvest) | Total amount of fixed investment/GDP
Economic development | Fiscal revenue (Finance) | Budget revenue/GDP
Industrial structure | Industrialization (Indust) | Non-agricultural value added/GDP
Industrial structure | Intensification (ADSIndust) | Gross industrial output value above designated size/GDP
Industrial structure | Development of service (Service) | Added value of tertiary industry/non-farming gross product
Urban characteristics | Administrative hierarchy (Admin) | Districts of province-level municipalities, sub-provincial cities, provincial capital cities, and general cities, and counties (county-level cities), are assigned values of 5 to 1, respectively
Urban characteristics | Population density (PDen) | Permanent population/area of district or county
Geographical features | Topographic relief (Terrain) | Derived from Feng et al. (2007)
Geographical features | Annual precipitation (Precipit) | County average annual precipitation
Geographical features | Density of roads (Roads) | Total road mileage/area of district or county
Geographical features | Central region (Central) | Dummy variable: 1 for counties in the central region, 0 otherwise
Geographical features | Western region (West) | Dummy variable: 1 for counties in the western region, 0 otherwise
Geographical features | Northeastern region (NEast) | Dummy variable: 1 for counties in the northeastern region, 0 otherwise

16.2.2 Methodology

Drawing on the land conversion index (Lin et al. 2018) and land urbanization quality (Zhang and Wang 2018), we calculated LUR for each county as the proportion of land used in cities and towns (urban, industrial and mining, and transportation land) relative to the total urban and rural construction land (Yang et al. 2018). This index not only describes the level of land urbanization but also reflects changes in land use during the urbanization process. LUR is calculated as follows:

$$\mathrm{LUR} = \frac{ul + il + tl}{ul + il + tl + rl}$$


where $ul$ denotes the scale of urban land, $il$ the scale of industrial and mining land, $tl$ the scale of transportation land, and $rl$ the scale of rural residential land. Using the Spatial Analyst tools of ArcGIS 10.2, we mapped the land urbanization pattern in China at the county level during 2000–2015, taking the county as the basic research unit. Because the linear regression model (LRM) produces only “global” estimates and cannot capture spatial variation in the effects of the independent variables, this chapter combined ordinary least squares (OLS) and geographically weighted regression (GWR) models to measure the influence of the above-mentioned factors on land urbanization. Given observed values of the explanatory variables $x_{ij}$ and the explained variable $y_i$, with $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, the classical global regression model is:

$$y_i = \beta_0 + \sum_{j=1}^{n} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n$$

where $\varepsilon_i$ denotes the random error term and the regression coefficients $\beta_j$ are assumed to be constant across space. OLS is generally used to estimate the parameters of this model. GWR extends the OLS model: the regression coefficients are no longer constants estimated from global information, but are obtained by local regression on the subset of observations near each location, so they vary with geographic location. The GWR model can be written as:

$$y_i = \beta_0(m_i, n_i) + \sum_{j=1}^{n} \beta_j(m_i, n_i)\, x_{ij} + \varepsilon_i$$

where $(m_i, n_i)$ denotes the central geographic coordinates of the $i$th county, and $\beta_j(m_i, n_i)$ is the value of the continuous function $\beta_j(m, n)$ for variable $x_{ij}$ at the $i$th county.
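To make the OLS/GWR contrast concrete, here is a minimal sketch of local estimation at one location: a weighted least-squares fit in which each county’s weight decays with its distance from the target through a fixed Gaussian kernel, w_i = exp(−0.5 (d_i / h)²). The chapter does not report its kernel or bandwidth, so the kernel form, the bandwidth h, and all function names here are assumptions; setting every weight to 1 recovers the global OLS estimate.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_ls(X, y, w):
    """Coefficients [b0, b1, ...] minimising sum_i w_i (y_i - b0 - sum_j b_j x_ij)^2."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    xtwx = [[sum(wi * r[p] * r[q] for wi, r in zip(w, rows)) for q in range(k)] for p in range(k)]
    xtwy = [sum(wi * r[p] * yi for wi, r, yi in zip(w, rows, y)) for p in range(k)]
    return solve(xtwx, xtwy)


def ols(X, y):
    """Global model: every county gets weight 1, so the betas are constants."""
    return weighted_ls(X, y, [1.0] * len(y))


def gwr_at(coords, X, y, target, bandwidth):
    """Local coefficients beta_j(m_i, n_i) at `target` with a fixed Gaussian kernel."""
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    return weighted_ls(X, y, w)


# Hypothetical data: the effect of x is 2 in the west and 5 in the east.
coords = [(0, 0), (0, 1), (1, 0), (100, 0), (100, 1), (101, 0)]
X = [[1], [2], [3], [1], [2], [3]]
y = [2, 4, 6, 5, 10, 15]
print(ols(X, y))                               # one pooled slope for everyone
print(gwr_at(coords, X, y, (0, 0), 1.0))       # local slope close to 2
print(gwr_at(coords, X, y, (100, 0), 1.0))     # local slope close to 5
```

The pooled OLS slope blends the two regimes, while the local fits recover each region’s own coefficient, which is the spatial non-stationarity GWR is designed to reveal.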

16.3 Spatial Inequality in Land Urbanization

16.3.1 Land Urbanization Patterns by County in 2000

According to the remote sensing data, LUR in China was 26.33% in 2000. The eastern coastal regions and the old industrial bases of northeast China had the highest land urbanization levels, at 31.26% and 26.03%, respectively, while the central and western regions had relatively low levels of 20.74% and 23.64%. This is broadly consistent with the regional pattern of population urbanization (Fang et al. 2015). Land urbanization levels in counties are generally low: in over 75% of the 4342 counties, LUR is


lower than 50% (Fig. 16.1). With reference to the stages of overall urbanization in China, land urbanization levels were classified into five grades: low (≤10%), medium-low (10–30%), medium (30–50%), medium-high (50–70%), and high (>70%). As Fig. 16.1 indicates, the number of districts/counties with low land urbanization is roughly equal to the number with medium-low land urbanization, each accounting for about 30% of the total. Conversely, relatively few districts/counties are at the medium level or above, with proportions of 17.87%, 10.69%, and 14.03% for the medium, medium-high, and high grades, respectively. Geographically, the north-south differentiation of land urbanization levels in Chinese counties is more apparent than the east-west or coast-inland differentiation. Taking the Qinling Mountains-Huaihe River line as the boundary, land urbanization levels in southern counties are clearly higher than those in northern counties. Urban agglomerations such as the Yangtze River Delta, Pearl River Delta, West Coast of the Taiwan Straits, Chengdu-Chongqing, and Middle Reaches of the Yangtze River stand out as areas with high land urbanization levels, followed by the East Liaoning Peninsula and the Shandong Peninsula. In contrast, traditional agricultural areas such as Huang-Huai-Hai, Northeast China, and Shaanxi-Gansu-Ningxia have low levels of land urbanization (Fig. 16.2).

Fig. 16.1 Lorenz curve of land urbanization at the county level in 2000 (x-axis: ratio of land urbanization (LUR), 0–1; y-axis: accumulation of counties, 0–1)
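The LUR index defined in Sect. 16.2.2 and the five grades above can be combined in a short sketch. The function names are hypothetical, the four land areas are assumed to be in the same unit, and the grade boundaries are assumed to be right-closed (for example, exactly 10% counts as low), since the text does not say how boundary values are assigned. `share_at_or_below` returns one point on a cumulative curve like Fig. 16.1, i.e., the share of counties whose LUR is at or below a threshold.

```python
from bisect import bisect_left


def lur(ul, il, tl, rl):
    """Land urbanization rate: (urban + industrial/mining + transport) / all construction land."""
    urbanized = ul + il + tl
    return urbanized / (urbanized + rl)


# Upper bounds of the first four grades (assumed right-closed).
GRADE_BOUNDS = [0.10, 0.30, 0.50, 0.70]
GRADE_NAMES = ["low", "medium-low", "medium", "medium-high", "high"]


def grade(rate):
    """Map an LUR value in [0, 1] to one of the five grades."""
    return GRADE_NAMES[bisect_left(GRADE_BOUNDS, rate)]


def share_at_or_below(rates, threshold):
    """Cumulative share of counties with LUR <= threshold."""
    return sum(1 for r in rates if r <= threshold) / len(rates)


# Hypothetical county with 30 + 10 + 10 units of urbanized land and 50 of rural residential land.
rate = lur(30, 10, 10, 50)
print(rate, grade(rate))
print(share_at_or_below([0.1, 0.2, 0.6, 0.9], 0.5))
```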


Fig. 16.2 Spatial patterns of land urbanization at the county level in 2000

16.3.2 Land Urbanization Patterns by County in 2015

With the further acceleration of population urbanization, the level of land urbanization increased correspondingly. East China has the highest level, 43.29%, followed by the western and central regions with 42.22% and 34.57%, respectively. However, owing to continuous population shrinkage and economic decline, land urbanization in northeast China has been relatively slow, with LUR increasing by only 4.56 percentage points in 15 years. During this period, the nation proposed the new-type urbanization strategy of “coordinating urban and rural development, and promoting urbanization actively and steadily,” to slow rapid urbanization and narrow the differences in land urbanization levels among regions. The coefficient of variation of county LUR decreased from 0.775 in 2000 to 0.584 in 2015, indicating that regional differences in land urbanization tended to converge (Fig. 16.3). As Fig. 16.4 maps, the number of districts/counties at the low or medium-low levels fell from 2493 in 2000 to 1632 in 2015, a drop of nearly 20 percentage points in their share. Districts/counties at or above the medium-high level accounted for over 40% of all study units. The number of districts/counties with LUR over 70% increased by 11.32% compared with 2000, and overall LUR in counties improved significantly. Regions with


Fig. 16.3 Lorenz curve of land urbanization at the county level in 2015 (x-axis: ratio of land urbanization (LUR), 0–1; y-axis: accumulation of counties, 0–1)

high land urbanization levels include the expanded region south of the Qinling Mountains-Huaihe River line and the southeastern coastal areas, such as the Pearl River Delta and the west coast of the Taiwan Straits. The main urban agglomerations along the Yangtze River Economic Belt became polar nuclei of land urbanization. Moreover, large areas of high land urbanization appeared in the northwestern regions and the Inner Mongolia-Shanxi area, probably because many petroleum- or coal-based resource cities are concentrated there and large-scale resource exploitation increased industrial and mining land use.


Fig. 16.4 Spatial patterns of land urbanization at the county level in 2015

16.3.3 Evolution of Land Urbanization Patterns in Chinese Counties

From 2000 to 2015, LUR in China increased from 26.33% to 39.63%, an average annual growth rate of 2.77%. Owing to their lower base, counties in central and western China witnessed the fastest growth, whereas growth in the northeast was the slowest, which might be a result of economic recession and population decline (Table 16.2).

Table 16.2 Regional difference of land urbanization, 2000–2015

Region | 2000 (%) | 2015 (%) | Change, 2000–2015 (percentage points) | Average annual growth rate (%)
Eastern | 31.26 | 43.25 | 11.99 | 2.19
Central | 20.74 | 34.57 | 13.83 | 3.47
Western | 23.64 | 42.22 | 18.58 | 3.94
Northeastern | 25.03 | 30.59 | 5.56 | 1.35

Based on the growth rate of demographic urbanization, and taking into account existing studies on the coupling relationship between population and land urbanization, we divide the growth rate of land urbanization into five groups: decrease (≤0), slow increase (0–1%), increase (1–3%), rapid increase (3–5%), and fantastic increase (>5%). Results of spatial statistics and linear interpolation show that the average annual growth rate of land urbanization exceeds 5% in over 20% of counties, followed by counties with an average annual growth rate of 3–5%, which account for 18.42%. Only 14.03% of counties have an average annual growth rate below 1%, indicating a relatively high overall pace of land urbanization. Using the “Hu Line” as the dividing line, LUR in the northwestern part increased from 26.23% in 2000 to 38.39% in 2015, an annual growth rate of 2.57%, while LUR in the southeastern part increased from 21.33% to 40.77%, an annual growth rate of 4.41%. As Fig. 16.5 maps, counties with higher LUR are primarily concentrated in the middle reaches of the Yangtze River, the Wanjiang region along the Yangtze River in Anhui province, Nanchang-Jiujiang in Jiangxi province, central Yunnan province, the Gansu-Ningxia region, central Inner Mongolia, and central Xinjiang. Regions surrounding provincial capitals such as Nanjing, Ji’nan, Hefei, Nanchang, Taiyuan, Hohhot, and Guiyang stand out as hotspots of land urbanization over the past 15 years.
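The average annual growth rates reported in Table 16.2 and for the two sides of the Hu Line appear consistent with compound growth over 15 years, r = (LUR_2015 / LUR_2000)^(1/15) − 1. Assuming that definition (the chapter does not state it), the sketch below computes the rate and assigns it to the five growth groups, again assuming right-closed group boundaries; rounded to two decimals it reproduces, for example, the Eastern (2.19%), Western (3.94%), and Northeastern (1.35%) entries of Table 16.2. The function names are hypothetical.

```python
def annual_growth_rate(start, end, years):
    """Compound average annual growth rate, as a fraction (0.0219 means 2.19%)."""
    return (end / start) ** (1 / years) - 1


def growth_group(rate_pct):
    """Assign an annual growth rate (in percent) to one of the five groups."""
    if rate_pct <= 0:
        return "decrease"
    if rate_pct <= 1:
        return "slow increase"
    if rate_pct <= 3:
        return "increase"
    if rate_pct <= 5:
        return "rapid increase"
    return "fantastic increase"


eastern = annual_growth_rate(31.26, 43.25, 15) * 100
print(round(eastern, 2), growth_group(eastern))
```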

Fig. 16.5 Changing patterns of land urbanization at the county level, 2000–2015

16 Demystifying the Inequality in Urbanization …


16.3.4 Land Urbanization Types in Chinese Counties

By overlaying the map of land urbanization in 2000, classified into three base levels (low, comprising low and medium-low; medium; and high, comprising medium-high and high), with the map of land urbanization change from 2000 to 2015 (five growth groups), we define 15 types of land urbanization. As Fig. 16.6 shows, counties with a medium base level and high growth (increase, rapid increase, or fantastic increase) account for as much as 44.29% of the total; they are mainly distributed on the peripheries of the Yangtze River Delta, the Pearl River Delta, and other urban agglomerations named in the National New-type Urbanization Plan (Fang et al. 2015). In addition, 715 counties (about 16.45% of the total) with a low LUR in 2000 saw land urbanization grow by more than 1% per year up to 2015. In particular, counties in the Huang-Huai-Hai Plain and central Inner Mongolia show a clear trend of catching up in LUR. By contrast, about 16% of all counties saw a gradual decrease in LUR during 2000–2015; most of these counties had a high base level and are concentrated in northeastern China. On the whole, land urbanization in Chinese counties is converging, following the pattern “the lower the base, the faster the growth; the higher the base, the slower the growth”.

Fig. 16.6 Development types of land urbanization at the county level. Note: L, M, and H denote low, medium, and high levels of land urbanization in 2000; D, S, I, R, and F denote decrease, slow increase, increase, rapid increase, and fantastic increase during 2000–2015
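The 15 types in Fig. 16.6 are the Cartesian product of the three base levels and the five growth groups. The sketch below, using hypothetical county names and codes, shows how the overlay codes and the share of one family of types (medium base with at least “increase”) could be tabulated; in practice the codes would come from the two classified county maps.

```python
# Sketch of the 3 x 5 overlay typology of Section 16.3.4.
# Assumption: each county already carries a 2000 base-level code (L, M, H) and a
# growth-group code (D, S, I, R, F). County names and codes below are hypothetical.
from collections import Counter
from itertools import product

BASE_LEVELS = ("L", "M", "H")
GROWTH_GROUPS = ("D", "S", "I", "R", "F")
ALL_TYPES = [b + g for b, g in product(BASE_LEVELS, GROWTH_GROUPS)]  # 15 types

def overlay_type(base: str, growth: str) -> str:
    """Combine a base-level code and a growth-group code into one type code."""
    if base not in BASE_LEVELS or growth not in GROWTH_GROUPS:
        raise ValueError(f"unknown code: {base!r}, {growth!r}")
    return base + growth

counties = {  # hypothetical example data
    "county_a": ("M", "R"),
    "county_b": ("L", "I"),
    "county_c": ("H", "D"),
    "county_d": ("M", "F"),
}

types = {name: overlay_type(*codes) for name, codes in counties.items()}
shares = {t: n / len(types) for t, n in Counter(types.values()).items()}
# Share of counties with a medium base and growth of "increase" or faster
medium_fast = sum(1 for t in types.values() if t[0] == "M" and t[1] in "IRF") / len(types)
print(len(ALL_TYPES), types, shares, medium_fast)
```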

16.4 Determinants of Spatial Inequality in Land Urbanization

16.4.1 Comprehensive Analysis of Elements Based on the OLS Model

Using the Z-score method, we standardized the 15 index variables listed in Table 16.1 in SPSS. The variance inflation factor (VIF) was then used to test for multicollinearity. The VIFs of all variables in 2000 and 2015 are smaller than 3, indicating no multicollinearity among the variables (Table 16.3). According to the OLS fitting results, these variables explain the inequality patterns of county land urbanization well in both years, and overall both models reach an extremely significant level (p …

… 1, then the actual number of observations in which the father’s education level is i and the child’s education level is j is greater than the theoretical expectation; so when the father’s education level is i, the child’s education level is more likely to be j, and vice versa. From the intergenerational education index, we can calculate the intergenerational education inflow index (IEII) and the intergenerational education outflow index (IEOI). The IEII is defined as:

$$ I_j = \sum_{i \neq j} \frac{e_{ij}}{m - 1} \tag{17.2} $$

where m is the number of education grades (five in this study).

The IEII reflects the likelihood that the child’s education level is j when the father’s education level is not j. The smaller the IEII, the lower the intergenerational mobility into level j, and the greater the barriers for children to reach level j when their father’s level is not j. The IEOI is defined as:

$$ O_i = \sum_{j \neq i} \frac{e_{ij}}{m - 1} \tag{17.3} $$

The IEOI reflects the likelihood that the child’s education level is not i when the father’s education level is i. The greater the IEOI, the greater the intergenerational mobility out of level i, and the more likely children are to reach a different education level when their father’s level is i.
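Equations (17.2) and (17.3) are averages of the off-diagonal index values along a row and a column. The sketch below stores the matrix as in Table 17.2 (rows: children’s grade j; columns: father’s grade i), so the IEII averages each row and the IEOI each column, excluding the diagonal. The definition of the index itself falls in a part of the text missing here; the helper education_index assumes, from the surviving description, that e_ij is the ratio of observed to expected counts under independence. Applied to the published Table 17.2 values, inflow_outflow reproduces the reported IEII and IEOI to within rounding.

```python
import numpy as np

def education_index(counts):
    """Assumed form of the intergenerational education index: observed / expected.

    counts[j, i] is the number of pairs whose child has grade j (row) and whose
    father has grade i (column); expected counts assume independence.
    Rows or columns with zero totals would cause division by zero.
    """
    counts = np.asarray(counts, dtype=float)
    expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / counts.sum()
    return counts / expected

def inflow_outflow(e):
    """IEII (Eq. 17.2, one value per child grade j) and IEOI (Eq. 17.3, per father grade i)."""
    e = np.asarray(e, dtype=float)
    m = e.shape[0]                          # number of education grades
    off_diag = e - np.diag(np.diag(e))      # drop the inheritance (diagonal) terms
    ieii = off_diag.sum(axis=1) / (m - 1)   # average over fathers i != j, for each row j
    ieoi = off_diag.sum(axis=0) / (m - 1)   # average over children j != i, for each column i
    return ieii, ieoi

# Index values from Table 17.2 (rows: children's grade; columns: father's grade)
table_17_2 = np.array([
    [2.193, 0.405, 0.133, 0.092, 0.026],
    [1.506, 0.995, 0.389, 0.298, 0.101],
    [0.853, 1.330, 0.971, 0.649, 0.411],
    [0.516, 1.134, 1.582, 1.453, 1.122],
    [0.182, 0.715, 1.913, 2.818, 3.973],
])
ieii, ieoi = inflow_outflow(table_17_2)
print(np.round(ieii, 3), np.round(ieoi, 3))
```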

17.3.2 Indices to Estimate the Intergenerational Mobility

This paper uses four indices, the IGE, the inertia rate (IR), the upward flow rate (UFR), and the downward flow rate (DFR), to estimate the intergenerational mobility of Chinese provinces. Following the log-linear intergenerational income regression proposed by Becker and Tomes (1979) for estimating the IGE, we replace the income variables with education variables and estimate the IGE as follows:

$$ y_c = \alpha + \beta y_f + \varepsilon \tag{17.4} $$

17 Analyzing Spatial Patterns of Intergenerational …


where $y_c$ is the log of the child’s education level and $y_f$ is the log of the father’s education level. The coefficient β is the IGE: the larger the IGE, the lower the intergenerational mobility in a given society. IR, UFR, and DFR are calculated as follows:

$$ \mathrm{IR} = \frac{n_1}{n} \tag{17.5} $$

$$ \mathrm{UFR} = \frac{n_2}{n} \tag{17.6} $$

$$ \mathrm{DFR} = \frac{n_3}{n} \tag{17.7} $$

where n is the total number of samples in a province, $n_1$ is the number of children whose education level equals their father’s, $n_2$ is the number of children whose education level is higher than their father’s, and $n_3$ is the number of children whose education level is lower than their father’s. Because every child falls into exactly one of these three groups, IR + UFR + DFR = 1.
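A minimal Python sketch of Eq. (17.4) and Eqs. (17.5)–(17.7), assuming education grades are coded 1–5 as in Section 17.4 (the logarithm requires strictly positive codes) and that the OLS fit is done with numpy.polyfit. The function names and the example pairs are illustrative, not the authors’ code or data.

```python
import numpy as np

def ige(child_grade, father_grade):
    """Fit Eq. (17.4), log(child) = alpha + beta * log(father), by OLS; beta is the IGE."""
    y = np.log(np.asarray(child_grade, dtype=float))
    x = np.log(np.asarray(father_grade, dtype=float))
    beta, alpha = np.polyfit(x, y, 1)  # for degree 1, polyfit returns [slope, intercept]
    return alpha, beta

def flow_rates(child_grade, father_grade):
    """IR = n1/n, UFR = n2/n, DFR = n3/n from Eqs. (17.5)-(17.7)."""
    c = np.asarray(child_grade)
    f = np.asarray(father_grade)
    n = c.size
    ir = np.count_nonzero(c == f) / n   # same grade as the father
    ufr = np.count_nonzero(c > f) / n   # higher grade than the father
    dfr = np.count_nonzero(c < f) / n   # lower grade than the father
    return ir, ufr, dfr

# Hypothetical father/child grade pairs (1 = no education ... 5 = university and above)
father = [1, 1, 2, 2, 3, 4, 5, 5]
child = [1, 2, 3, 2, 4, 3, 5, 5]
print(ige(child, father), flow_rates(child, father))
```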

17.3.3 Geographically Weighted Regression

Geographically weighted regression (GWR) is an important local technique for exploring spatial heterogeneity, or non-stationarity, in data relationships (Brunsdon et al. 1996). In this paper, it is used to explore how the relationships between the influencing factors and children’s education level vary across space. A basic GWR model can be expressed as follows:

$$ y_i = \beta_{i0} + \sum_{k=1}^{p} \beta_{ik} x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n \tag{17.8} $$

where $y_i$ is the dependent variable at location i, $x_{ik}$ is the kth independent variable at location i, $\beta_{i0}$ is the intercept at location i, $\beta_{ik}$ is the local regression parameter for the kth independent variable at location i, p is the number of independent variables, and $\varepsilon_i \sim N(0, \sigma^2)$ is the random error at location i.
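To make Eq. (17.8) concrete, the sketch below estimates the local coefficients by weighted least squares at each observation location, using a Gaussian kernel with a fixed, user-supplied bandwidth. This is a simplified stand-in: the chapter uses the GWmodel R package, and the kernel type and bandwidth it used are not reported here, so both choices are assumptions, as is the function name.

```python
import numpy as np

def gwr_fixed_gaussian(coords, X, y, bandwidth):
    """Local coefficients [beta_i0, beta_i1, ..., beta_ip] of Eq. (17.8) at every location.

    coords: (n, 2) point coordinates; X: (n, p) predictors without an intercept
    column; y: (n,) response; bandwidth: fixed Gaussian kernel bandwidth (assumed).
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # intercept column
    betas = np.empty((len(y), X1.shape[1]))
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights
        XtW = X1.T * w                                   # X^T W without building W
        betas[i] = np.linalg.solve(XtW @ X1, XtW @ y)    # (X^T W X)^-1 X^T W y
    return betas
```

If the true relationship is identical everywhere, every local estimate recovers the global coefficients; the local estimates differ across locations only when the relationship itself varies in space, which is what Fig. 17.5 maps for the chapter’s data.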

17.4 The Overall Situation of Intergenerational Education Mobility in China

For this part, education level is divided into five grades: no education (grade 1); old-style private school and primary school (grade 2); junior high school (grade 3); high school (grade 4); and university and above (grade 5). The frequency distributions of the education grades of fathers and children are shown in Fig. 17.1.

K. Qin et al.

Fig. 17.1 The frequency distribution of the education level grade

Fig. 17.1 shows that most fathers have a low education level and that their number declines steadily toward higher grades. By contrast, children’s education levels form a symmetrical distribution centered on grade 3 (junior high school), indicating that children’s education has improved greatly compared with their fathers’ and reflecting the great achievements of China’s education development. However, the intergenerational mobility of education between fathers and children needs further exploration. This study uses the conversion matrix and the intergenerational education index to estimate the overall situation of intergenerational mobility in China. A conversion matrix describes transitions from one state to another; here, the intergenerational education conversion matrix shows how children’s education levels change relative to their fathers’. The results are shown in Table 17.1.
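The conversion matrix of Table 17.1 is a column-normalized cross-tabulation: each column (father’s grade) sums to 100%. The sketch below gives a function that builds such a matrix from (father, child) grade pairs, then uses the published Table 17.1 percentages to reproduce figures quoted in the text: 87.28% (children reaching grades 1–3 when the father is at grade 1) and 2.63% (children at grades 1–2 when the father is at grade 5). The mean of the rounded diagonal comes out at about 34.14%, close to the reported 34.13%; the small gap is presumably rounding. The function name is illustrative.

```python
import numpy as np

def conversion_matrix(father_grade, child_grade, m=5):
    """Column-percentage matrix: entry [j-1, i-1] is the share (%) of children at
    grade j among fathers at grade i (the layout of Table 17.1)."""
    counts = np.zeros((m, m))
    for f, c in zip(father_grade, child_grade):
        counts[c - 1, f - 1] += 1
    col_totals = counts.sum(axis=0)
    # Empty columns are divided by 1 instead of 0, so they stay all zero
    return 100 * counts / np.where(col_totals == 0, 1, col_totals)

# Column percentages reported in Table 17.1 (rows: child grade; columns: father grade)
table_17_1 = np.array([
    [28.00, 5.18, 1.70, 1.18, 0.33],
    [34.10, 22.52, 8.82, 6.74, 2.30],
    [25.18, 39.26, 28.65, 19.14, 12.13],
    [9.79, 21.53, 30.04, 27.59, 21.31],
    [2.93, 11.51, 30.79, 45.35, 63.93],
])
inheritance = np.diag(table_17_1)   # child at the same grade as the father
print(table_17_1.sum(axis=0))       # each column sums to 100
print(table_17_1[:3, 0].sum(), table_17_1[:2, 4].sum(), inheritance.mean())
```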
Table 17.1 shows that as the father’s education level rises, the probability that the child receives higher education (grade 5) increases steadily, indicating that the father’s educational advantage substantially improves the child’s educational opportunities. When the father has no education (grade 1), the probability that the child attains junior high school or below (the first three grades) is 87.28%, and the probability of reaching grade 5 is only 2.93%; children of fathers without education thus find it difficult to attain a high education level, especially higher education. By contrast, when the father has received higher education (grade 5), the probability that the child also receives higher education is 63.93%, and the probability that the child falls into the lowest two grades is only 2.63%: such children are very likely to attain a high education level and very unlikely to end up with a low one. The diagonal entries, where father and child have the same education level, give the inheritance rate. The average inheritance rate is 34.13%, whereas the inheritance rate of grade 5 is 63.93%, far above the average, indicating that a high education level is very likely to be inherited. Because education strongly shapes a person’s social status and occupation, this result is consistent with social phenomena in China such as the “rich second generation” and the “official second generation”.

Table 17.1 Intergenerational education conversion matrix (%; rows: children’s education grade; columns: father’s education grade)

           Grade 1   Grade 2   Grade 3   Grade 4   Grade 5
Grade 1    28.00     5.18      1.70      1.18      0.33
Grade 2    34.10     22.52     8.82      6.74      2.30
Grade 3    25.18     39.26     28.65     19.14     12.13
Grade 4    9.79      21.53     30.04     27.59     21.31
Grade 5    2.93      11.51     30.79     45.35     63.93
Total      100.00    100.00    100.00    100.00    100.00

The results for the intergenerational education index are shown in Table 17.2. The diagonal entries are the intergenerational education inheritance indices: grade 5 has the largest inheritance index, followed by grade 1, and grade 3 has the lowest. The IEOI rises from the extreme grades toward the middle grade, and grade 3 has the highest IEOI. Fathers with a medium education level therefore show the greatest intergenerational mobility, whereas higher or lower education levels show less, which corresponds to the increasingly serious polarization of Chinese society. The off-diagonal entries are the intergenerational education mobility indices: most entries above the diagonal are smaller than 1, while most entries below the diagonal are greater than 1, indicating that the overall trend of education in China is toward higher levels. The IEII shows this more clearly: it is smaller than 1 for the first three grades and greater than 1 for grades 4 and 5, and the higher the grade, the higher the IEII.

Table 17.2 Calculation results of intergenerational education index (rows: children’s education grade; columns: father’s education grade)

           Grade 1   Grade 2   Grade 3   Grade 4   Grade 5   IEII
Grade 1    2.193     0.405     0.133     0.092     0.026     0.164
Grade 2    1.506     0.995     0.389     0.298     0.101     0.574
Grade 3    0.853     1.330     0.971     0.649     0.411     0.811
Grade 4    0.516     1.134     1.582     1.453     1.122     1.088
Grade 5    0.182     0.715     1.913     2.818     3.973     1.407
IEOI       0.764     0.896     1.004     0.964     0.415

The spatial pattern of intergenerational education mobility in China

This paper uses four indices (IGE, IR, UFR, and DFR) to estimate the intergenerational mobility of Chinese provinces. The results are shown in Table 17.3: the first column gives the region; columns 2–4 give the results of the bivariate regression used to calculate the IGE, where the IGE is the regression coefficient of the log of the father’s education level (column 3); the next three columns give IR, UFR, and DFR; and the last column gives the sample size. The spatial distribution of the indices is mapped in Fig. 17.2. Table 17.3 and Fig. 17.2 show that China’s overall IGE is 0.404, indicating limited intergenerational mobility. Provincial IGEs range from 0.245 to 0.479 and differ greatly across space: Tianjin, Liaoning, Heilongjiang, Beijing, and Shanghai have the smallest IGEs, while Chongqing, Jiangsu, Gansu, Anhui, and Henan have the largest. China’s overall IR is 0.249 (provincial range 0.144–0.407), its overall UFR is 0.679 (0.462–0.835), and its overall DFR is 0.072 (0.021–0.143). China’s UFR is thus very large while its IR and DFR are small, indicating that China’s education has achieved great success and that the overall education level is improving.
IR, UFR, and DFR have similar spatial distributions. Beijing, Tianjin, Qinghai, Guangdong, and Shanghai all have a small IR and a large UFR, indicating that children’s education levels move upward easily in these regions, whereas Gansu, Anhui, Ningxia, and Yunnan all have a large IR, a small UFR, and a large DFR, indicating that upward movement of children’s education levels is difficult in these regions.

17.5 Analysis of Influencing Factors of Children’s Education Level and Their Spatial Distribution

This paper analyzes the influencing factors of children’s education level and their spatial distribution by constructing a GWR model. The average education level of children in each Chinese province is the dependent variable. Considering both family and individual factors that may affect children’s education level, we choose the provincial averages of children’s age, sex, and Hukou and of the father’s education level, political status, and ISEI as potential independent variables. The variables are defined in Table 17.4.


Table 17.3 Calculation results of IGE, IR, UFR and DFR

Region          Constant          Father's education  Adj. R²  IR     UFR    DFR    Sample size
All regions     1.473*** (0.011)  0.404*** (0.007)    0.246    0.249  0.679  0.072  9979
Beijing         1.196*** (0.044)  0.278*** (0.022)    0.228    0.144  0.835  0.021  514
Tianjin         2.025*** (0.040)  0.245*** (0.021)    0.263    0.146  0.831  0.023  390
Hebei           1.553*** (0.069)  0.328*** (0.043)    0.18     0.238  0.669  0.092  260
Shanxi          1.771*** (0.072)  0.289*** (0.040)    0.187    0.214  0.705  0.080  224
Inner Mongolia  1.489*** (0.114)  0.284** (0.084)     0.099    0.305  0.611  0.084  95
Liaoning        1.844*** (0.050)  0.264*** (0.028)    0.209    0.201  0.732  0.067  343
Jilin           1.543*** (0.051)  0.345*** (0.035)    0.192    0.260  0.653  0.087  415
Heilongjiang    1.713*** (0.042)  0.268*** (0.028)    0.143    0.256  0.697  0.047  535
Shanghai        1.961*** (0.045)  0.278*** (0.022)    0.25     0.192  0.752  0.057  475
Jiangsu         1.346*** (0.058)  0.472*** (0.035)    0.282    0.251  0.669  0.081  459
Zhejiang        1.614*** (0.053)  0.349*** (0.032)    0.217    0.231  0.693  0.075  424
Anhui           1.186*** (0.062)  0.453*** (0.045)    0.213    0.338  0.552  0.110  373
Fujian          1.430*** (0.068)  0.424*** (0.048)    0.237    0.254  0.702  0.044  248
Jiangxi         1.350*** (0.054)  0.402*** (0.038)    0.205    0.286  0.615  0.099  423
Shandong        1.451*** (0.049)  0.404*** (0.032)    0.228    0.264  0.656  0.079  541
Henan           1.254*** (0.049)  0.434*** (0.040)    0.173    0.322  0.604  0.074  566
Hubei           1.432*** (0.045)  0.406*** (0.034)    0.209    0.255  0.687  0.058  537
Hunan           1.562*** (0.052)  0.341*** (0.034)    0.192    0.265  0.678  0.057  419
Guangdong       1.842*** (0.058)  0.300*** (0.030)    0.263    0.171  0.758  0.071  269
Guangxi         1.620*** (0.053)  0.321*** (0.032)    0.226    0.226  0.666  0.109  350
Chongqing       1.188*** (0.066)  0.479*** (0.055)    0.232    0.320  0.628  0.053  247
Sichuan         1.602*** (0.044)  0.308*** (0.028)    0.17     0.226  0.715  0.060  571
Guizhou         1.441*** (0.069)  0.432*** (0.043)    0.282    0.245  0.693  0.062  257
Yunnan          1.344*** (0.059)  0.319*** (0.043)    0.133    0.316  0.575  0.109  358
Shaanxi         1.372*** (0.067)  0.372*** (0.048)    0.159    0.284  0.619  0.097  310
Gansu           1.065*** (0.101)  0.469*** (0.063)    0.225    0.323  0.534  0.143  189
Qinghai         1.589*** (0.133)  0.371*** (0.080)    0.177    0.167  0.771  0.063  96
Ningxia         1.060*** (0.138)  0.425*** (0.098)    0.164    0.407  0.462  0.132  91

Note: 1. The dependent variable of the bivariate regression in columns 2–4 is the log of children’s education level; Father’s education = the log of father’s education level. 2. CGSS provides data for 28 regions, with no data for Xinjiang, Tibet, Hainan, Hong Kong, Macau, and Taiwan. ** p < 0.01; *** p < 0.001

We used functions from the GWmodel package in R (Lu et al. 2014a; Gollini et al. 2015). The independent variables were chosen with a forward model-specification approach (see Lu et al. 2014b for details). The selection process is visualized in Fig. 17.3, and the corrected Akaike Information Criterion (AICc) values are plotted in Fig. 17.4. Fig. 17.3 shows the 21 regressions run during variable selection: the dependent variable is at the center, and the other nodes, of different shapes and colors, represent the independent variables. Fig. 17.4 shows how the AICc changes during the process. Figs. 17.3 and 17.4 show that the AICc decreases gradually as independent variables are added. The first variable to enter the model is fa_edu (the 6th regression), and the AICc drops sharply when it is included, indicating that the father’s education level is the factor most strongly related to the child’s education level. The second is hukou (the 11th regression), which lowers the AICc somewhat, indicating that the child’s Hukou is also related to the child’s education level. Age (the 15th regression), ISEI (the 18th regression), party (the 20th regression), and sex (the 21st regression) then enter in turn, but the AICc changes very little, indicating that these factors are only weakly related to children’s education level.
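The forward procedure described above can be sketched as follows: in each round, every remaining candidate is added to the current model in turn, the candidate giving the lowest AICc is kept, and the process repeats. With six candidates this needs 6 + 5 + 4 + 3 + 2 + 1 = 21 fits, matching the 21 regressions in Fig. 17.3. For brevity the sketch scores global OLS models with the standard OLS form of AICc; the chapter’s procedure scores GWR models instead, so this illustrates only the search strategy, and the function names are hypothetical.

```python
import numpy as np

def ols_aicc(X, y):
    """AICc of an OLS fit with an intercept (Gaussian likelihood, up to a constant)."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = float(np.sum((y - X1 @ beta) ** 2))
    k = X1.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def forward_select(X, y, names):
    """Greedy forward selection by AICc; returns the entry order and every AICc computed."""
    chosen, remaining, history = [], list(range(X.shape[1])), []
    while remaining:
        scores = {j: ols_aicc(X[:, chosen + [j]], y) for j in remaining}
        history.extend(scores.values())      # one AICc per candidate fit
        best = min(scores, key=scores.get)   # candidate giving the lowest AICc
        chosen.append(best)
        remaining.remove(best)
    return [names[j] for j in chosen], history
```

Plotting the history values in order gives a curve of the kind shown in Fig. 17.4, although the chapter’s values come from GWR fits rather than OLS.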

Fig. 17.2 The visualization of the spatial distribution of indices: (a) IGE, (b) IR, (c) UFR, (d) DFR

Fig. 17.4 shows that the AICc reaches its minimum at the 18th regression and barely changes thereafter. The optimal model therefore includes fa_edu, hukou, age, and ISEI as independent variables:

$$ \text{child\_edu}_i = \beta_{i0} + \beta_{i1}\,\text{fa\_edu}_i + \beta_{i2}\,\text{hukou}_i + \beta_{i3}\,\text{age}_i + \beta_{i4}\,\text{ISEI}_i + \varepsilon_i \tag{17.9} $$

Table 17.4 Definition of variables

Type                    Variable                                   Symbol
Dependent variable      The average educational level of children  child_edu
Independent variables   The average educational level of father    fa_edu
                        The average sex of children                Sex
                        The average age of children                Age
                        The average Hukou of children              hukou
                        The average political status of father     Status
                        The average ISEI index of father           ISEI

Fig. 17.3 The visualization of the process of variable selection

where $\beta_{i0}$ is the intercept at location i, $\beta_{ik}$ is the local regression parameter for the kth independent variable at location i, and $\varepsilon_i$ is the random error at location i. Table 17.5 compares the model-fit diagnostics of the GWR and OLS models. The adjusted R² of the GWR model reaches 96.1%, indicating a very good fit, and compared with the OLS model, GWR has a lower RSS and AICc and a higher R² and adjusted R², indicating that the GWR model outperforms the OLS model.

Fig. 17.4 The visualization of the sorted AICc outputs

Table 17.5 The comparison of model fit diagnostic parameter of GWR and OLS model

Diagnostic parameter   GWR      OLS
RSS                    1.978    3.764
AICc                   14.563   35.270
R²                     0.978    0.958
Adjusted R²            0.961    0.950

We analyzed the spatial pattern of the relationship between children’s education level and the two influencing factors most closely related to it, the father’s education level and Hukou; Fig. 17.5 maps the spatial distribution of their local coefficients. The father’s education level is positively related to the child’s education level, with local coefficients ranging from 0.989 to 1.296. Hukou is also positively related to the child’s education level, with local coefficients ranging from 2.131 to 4.317. Holding the other variables constant, a one-year increase in the father’s education level is therefore associated with an increase of 0.989 to 1.296 years in the child’s education level, and children with non-agricultural Hukou have on average 2.131 to 4.317 more years of education than children with agricultural Hukou. Because GWR assumes that nearby observations are related and estimates each local coefficient from distance-weighted neighboring observations, the coefficient of a province is tied to those of adjacent provinces; differences between neighboring provinces are therefore smoothed out, and only broad regional patterns can be observed.

Fig. 17.5 The visualization of the spatial distribution of the correlation coefficient of two influencing factors: (a) fa_edu, (b) hukou

Fig. 17.5a shows that the influence of the father’s education level is slightly larger in the north than in the south and larger in the west than in the east, gradually decreasing from northwest to southeast. East China and South China have the smallest coefficients, indicating a weak influence of the father’s education level there, while North China, Northwest China, and Southwest China have the largest, indicating a strong influence. Fig. 17.5b shows that the influence of Hukou is similar in the north and the south but weakens gradually from west to east. Northwest China has the largest coefficient, indicating that Hukou constrains children’s educational attainment most strongly there and that urban children obtain more educational opportunities than rural children, while in North China, Central China, East China, and other regions the influence of Hukou is weak.

17.6 Conclusion and Discussion

Using 2013 CGSS data, this paper analyzed the overall situation of intergenerational education mobility in China and discussed its spatial pattern and underlying mechanisms. The main conclusions are as follows:

(1) Children’s education level is strongly related to their father’s education level, and high education levels are more likely to be inherited.
(2) The overall trend of Chinese education is toward higher grades: the higher the grade, the higher the IEII. Fathers with a medium education level show the greatest intergenerational mobility, whereas higher or lower education levels show less.
(3) Intergenerational education mobility in China is very unevenly distributed in space. China’s overall IGE is 0.404, indicating limited intergenerational mobility, while its UFR is very large and its IR and DFR are small, indicating that the overall education level of China is improving.
(4) The father’s education level and Hukou are the factors most closely related to children’s education level, followed by the child’s age, the father’s occupation (ISEI), and the child’s gender.
(5) The influence of the father’s education level weakens from northwest to southeast and is weakest in East China. The influence of Hukou weakens from west to east and is strongest in Northwest China.

This paper has made progress in the spatial analysis of intergenerational education mobility in China, but it has some limitations. First, because the sample size is limited, we cannot conduct more detailed analyses by sex, age group, and so on. Second, this paper offers only a preliminary speculation on the causal mechanism of intergenerational education mobility in China, which needs further study. Third, the current method cannot explain why the coefficients show the spatial characteristics seen in Fig. 17.5, which requires further exploration.

References

Becker, G. S., & Tomes, N. (1979). An equilibrium theory of the distribution of income and intergenerational mobility. Journal of Political Economy, 87(6), 1153–1189.
Benabou, R., & Ok, E. A. (2001). Social mobility and the demand for redistribution: The POUM hypothesis. Quarterly Journal of Economics, 116(2), 447–487.
Björklund, A., & Jäntti, M. (1997). Intergenerational income mobility in Sweden compared to the United States. The American Economic Review, 87(5), 1009–1018.
Black, S., & Devereux, P. (2010). Recent developments in intergenerational mobility. Institute for the Study of Labor (IZA), IZA Discussion Papers.
Blau, P. M., & Duncan, O. D. (1967). The American occupational structure. New York: Free Press.
Bloome, D., & Western, B. (2011). Cohort change and racial differences in educational and income mobility. Social Forces, 90(2), 375–395.
Bowles, S., & Gintis, H. (2002). The inheritance of inequality. Journal of Economic Perspectives, 16(3), 3–30.
Brunsdon, C., Fotheringham, A. S., & Charlton, M. E. (1996). Geographically weighted regression: A method for exploring spatial nonstationarity. Geographical Analysis, 28(4), 281–298.
Chen, M. (2013). Intergenerational mobility in contemporary China. Chinese Sociological Review, 45(4), 29–53.
Chen, Y., & Cowell, F. A. (2015). Mobility in China. Review of Income & Wealth, 63(2), 203–218.
Dunn, C. E. (2007). The intergenerational transmission of lifetime earnings: Evidence from Brazil. B.E. Journal of Economic Analysis & Policy, 7(2), 1782–1782.
Escobar, L. D. P., & Izquierdo, M. G. (2016). Intergenerational educational and occupational mobility in Spain: Does gender matter? British Journal of Sociology of Education, 37(5).
Gibbons, M. (2010). Income and occupational intergenerational mobility in New Zealand. New Zealand: Treasury Working Paper.
Gollini, I., Lu, B., Charlton, M., Brunsdon, C., & Harris, P. (2015). GWmodel: An R package for exploring spatial heterogeneity using geographically weighted models. Journal of Statistical Software, 63(17).
Gong, H., Leigh, A., & Meng, X. (2012). Intergenerational income mobility in urban China. Review of Income & Wealth, 58(3), 481–503.
Haider, S., & Solon, G. (2006). Life-cycle variation in the association between current and lifetime earnings. American Economic Review, 96(4), 1308–1320.
Jantti, M., Bratsberg, B., Roed, K., Raaum, O., Naylor, R., & Osterbacka, E., et al. (2005). American exceptionalism in a new light: A comparison of intergenerational earnings mobility in the Nordic countries, the United Kingdom and the United States. Warwick Economics Research Paper, 6(2), e1000778.
Labar, K. (2011). Intergenerational mobility in China. Retrieved from https://halshs.archives-ouvertes.fr/halshs-00556982.
Lin, H., Lai, J. G., & Zhou, C. H. (2010). Spatially integrated humanities and social science. Beijing, China: Science Press. (in Chinese).
Lin, Z., & Palmer, J. C. (2016). A critical review of sociological dialogue between China and the West. Chinese Sociological Dialogue, 1(1), 3–14.
Lin, H., Zhang, J., Yang, P., & Liu, J. (2006). Development on spatially integrated humanities and social science. Geo-Information Science, 2, 30–37. (in Chinese).
Liu, Z. G., & Fan, Y. J. (2013). Research on the factors which influencing the intergenerational mobility of education. Education Science, 29(1), 1–5. (in Chinese).
Lu, B., Harris, P., Charlton, M., & Brunsdon, C. (2014a). The GWmodel R package: Further topics for exploring spatial heterogeneity using geographically weighted models. Geo-Spatial Information Science, 17(2), 85–101.
Lu, B., Charlton, M., Harris, P., & Fotheringham, A. S. (2014b). Geographically weighted regression with a non-Euclidean distance metric: A case study using hedonic house price data. International Journal of Geographical Information Science, 28(4), 660–681.
Luo, J. J., Feng, S. Z., Liang, Y. C., & Chen, Y. S. (2016). Computational social science in the era of big data. Journal of Guizhou Normal University (Social Sciences), 6, 25–26. (in Chinese).
Mazumder, B. (2005). Fortunate sons: New estimates of intergenerational mobility in the United States using social security earnings data. Review of Economics and Statistics, 87(2), 235–255.
Minello, A., & Blossfeld, H. P. (2014). From mother to daughter: Changes in intergenerational educational and occupational mobility in Germany. International Studies in Sociology of Education, 24(1), 65–84.
Nimubona, A. D., & Vencatachellum, D. (2007). Intergenerational education mobility of black and white South Africans. Journal of Population Economics, 20(1), 149–182.
Piketty, T. (1995). Social mobility and redistributive policies. Quarterly Journal of Economics, 110(3), 551–584.
Qin, K., & Kang, C. G. (2016). Theories and methods of spatiotemporal analysis in computational social science. Journal of Guizhou Normal University (Social Sciences), 6, 46–48. (in Chinese).
Solon, G. (1992). Intergenerational income mobility in the United States. The American Economic Review, 393–408.
Solon, G. (2002). Cross-country differences in intergenerational earnings mobility. The Journal of Economic Perspectives, 16(3), 59–66.
Zhang, Y. Z. (2016). Research on the elasticity of intergenerational education mobility and its change. Journal of Fujian Provincial Committee Party School of CPC, 5, 63–69. (in Chinese).
Zhao, H. X., & Feng, X. N. (2016). A comparative study on the intergenerational mobility and regional difference in China’s education: Based on CHARLS 2013 data. China Youth Study, 8, 54–58. (in Chinese).
Zhou, X., & Zhang, P. (2014). Intergenerational occupational mobility and intergenerational income mobility: An empirical research about urban and families in China. China Economic Quarterly, 14(01), 351–372. (in Chinese).
Zimmerman, D. J. (1992). Regression toward mediocrity in economic stature. The American Economic Review, 409–429.

Chapter 18

Can Social Media Rescue Child Beggars?

Xining Yang and Daniel Z. Sui

18.1 Introduction

As the world’s most populous country, with growing disparities between regions and social groups, China has seen widespread reports of child begging and missing children, which have become a major social issue attracting growing national attention. A report by the National Working Committee on Children and Women indicates that China has approximately 1 to 1.5 million homeless children.1 Among these homeless children, an estimated 20,000 are victims of abduction every year.2 Even worse, some are forced to beg by criminal organizations, while others are sold to couples who cannot have children or made to work in factories as child laborers. The Chinese government has been actively pursuing programs to address the child beggar issue. In 2003, the central government issued a national policy on the Management of Street Beggars in Cities, after which assistance stations were established in major cities to provide food and shelter for the homeless. The government has launched five major crackdowns on human trafficking since the 1990s. In 2010, a national DNA database was established to assist in identifying and rescuing child beggars. Blood samples from parents of children reported missing and from children of unclear background were collected and stored for automatic matching.

1 http://cnzgw.org/2012/0629/8761.html.
2 http://www.foreignpolicy.com/articles/2011/10/06/china_missing_children.

X. Yang (corresponding author), Department of Geography and Geology, Eastern Michigan University, Ypsilanti, MI 48197, USA
D. Z. Sui, Vice President for Research and Innovation, Virginia Tech, Blacksburg, VA 24061, USA
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_18

X. Yang and D. Z. Sui

These government actions have had very positive impacts on the child beggar issue. According to a report by the Ministry of Public Security, more than 9,300 kidnapped children in China have been rescued since the nationwide campaign to crack down on human trafficking was launched in April 2009.3 However, the Ministry noted in the report that children kidnapped to become beggars account for only a small portion of all child beggar cases. In a growing number of cases, children were taken to beg along with their parents or relatives. The government has therefore encouraged citizens to provide clues to help the police rescue children, especially those who are abused and forced to beg on the streets. Child beggars, usually referred to in the literature as street children more generally, have gained scholarly attention as a research topic, yet traditional studies of street children have certain limitations. First, these studies are often based on very small samples that can hardly represent the entire population of street children. Second, previous studies of street children focus mainly on developing theories and conceptual frameworks, with less emphasis on empirical work. Third, traditional studies rely primarily on interviews and surveys to collect data on street children, which often face issues such as privacy concerns and unwillingness to report. At the same time, child trafficking, long recognized internationally as a serious crime, also afflicts Chinese society (Wang et al. 2018), with tens of thousands of children trafficked every year.4 Many children are sold or abandoned by their biological parents for a variety of reasons, including poverty, financial profit, or China’s one-child policy. Missing children form a large population, many of whom would like to find their way back home.
Unlike the traditional approaches to studying street children and missing children reported in the literature so far (Bromley and Mackie 2009; Panter-Brick 2002; Strehl 2010; Cheng and Lam 2010), this paper proposes a new way to study the child beggar and missing children issues. Instead of collecting data through individual surveys and interviews, we collect information from social media websites. Using volunteered geographic information (VGI) harvested from social media, we explore the geographic patterns of child beggars and missing children and discuss how social media and VGI have empowered citizens to help rescue child beggars and help missing children find their way back home. We first review the emergence of VGI and social media and their recent applications in spatial social science. Two pioneering Chinese social media movements related to child beggars and missing children are then introduced. Data collection and exploratory data analysis are described next, followed by the results and discussion.

3 http://www.legaldaily.com.cn/index_article/content/2011-02/11/content_2467637.htm?node=5954.
4 http://www.xinhuanet.com/info/2015-08/31/c_134572911.htm.


18.2 Volunteered Geographic Information (VGI), Social Media, Child Beggars, and Missing Children

Until recently, virtually all geographic information was produced through a top-down process by government mapping agencies or by corporations in the mapping industry. By the mid-1990s, however, new technologies were emerging that would fundamentally change these arrangements (Goodchild and Glennon 2010). First, it has become possible for the average citizen to determine positions accurately without formal training. Second, almost anyone with access to the Internet can now develop the skills to make maps from acquired data, cartographic design skills previously possessed only by trained cartographers. Google’s Maps service, for example, allows anyone to produce a decent-looking map from custom data, and OpenStreetMap renders raw data provided by users into a cartographically acceptable street map. Increasingly, citizens with no expertise in the mapping sciences were suddenly able to perform many of the complex mapping tasks that had previously been the preserve of experts. Volunteered geographic information (VGI) (Goodchild 2007), also known as neogeography (Turner 2006) or neocartography (Liu and Palen 2010), is the term coined to describe this phenomenon. The world of volunteered geography is breaking down the traditional distinction between expert and non-expert in the specific context of creating geographic information, since many traditional forms of mapping expertise can now be exercised through various online services and hand-held devices. With the rapid development of hardware, network infrastructure, and mobile technology, location-based social media applications have been widely adopted by individuals across the world (Sui and Goodchild 2011).
Many location-based social media websites and applications have emerged in the past few years to encourage and facilitate the activities of neogeographers (Gong and Yang 2020). In essence, these applications make it possible for user-generated content on the Web to become useful and usable geographic information for various mapping applications. Popular sites include Flickr and its georeferenced photographs, OpenStreetMap, Wikimapia and its large collection of user-described features, and numerous citizen science websites that collect georeferenced observations of plants, animals, and birds. Furthermore, it is increasingly common for content on Twitter, Facebook, and many other social media to be georeferenced, which gives us the opportunity to explore human activities in both space and place (Yang et al. 2016). As social media websites become increasingly location-based, most now allow users to create and share geotagged information about their daily lives, so that, as Sui and Goodchild (2011) discuss, we now have deep data about many people for social and behavioral study. Using data harvested from location-based social media represents a major methodological advance in spatial social science research (Gupta et al. 2012; Tsou 2012; Chen and Yang 2014; Ye et al. 2016, 2018).

Street children, usually defined as children who live permanently on the street, to the extent of even sleeping there at night, are often excluded from mainstream society and are off the radar of government censuses. These children are often out of school, lack access to basic health services or the protection of an adult, and work, beg, and steal for a living on the street (De Venanzi 2003). Because children living on the street have drawn emotive public concern and media coverage, the study of street children has become a priority for national and international child welfare organizations and has thus received attention from academic researchers (Panter-Brick 2002). For example, Bromley and Mackie (2009) explored the experiences of over a hundred child street traders in Peru and suggested international policy changes to better protect street child workers. Young and Barrett (2001) studied the interactions between street children and their socio-spatial environment using four visual methods: mental and depot maps, thematic and non-thematic drawings, daily time lines, and photo diaries. In an examination of street children in China, sociologists found that street children tend to have lower subjective well-being than other children (Cheng and Lam 2010).

Recent efforts to rescue child beggars and to help missing children find their way back home have been noted on Chinese social media websites. These efforts provide an unprecedented opportunity to advance methods in spatial social science and the humanities. In the context of this paper, we found the following sites of particular relevance.

Street Photos to Rescue Child Beggars. The child begging phenomenon is drawing national attention owing to the growing popularity of social media among Internet users in China. Photos posted on Sina Weibo,5 also known as the Chinese version of Twitter, have raised public awareness of the child beggar issue in China since the winter of 2011. The issue first gained attention through a request from a mother who had lost her child in 2009 and sought help from the public via social media.
Based on clues from several photos posted voluntarily on Sina Weibo, the lost child was found begging in the central business district of Xiamen, China. As this success story spread on the Internet, it caught the attention of Yu Jian-Rong, a professor at the Institute of Rural Development, Chinese Academy of Social Sciences. Motivated by the huge potential of crowdsourcing to rescue child beggars on the street and to help parents find their missing children, Yu created a public account on Sina Weibo called “Street Photos to Rescue Child Beggars”6 to encourage citizens to photograph child beggars who appeared to have been kidnapped and to spread these photos through the account. The account received a large volume of volunteer responses in a short period: more than 200,000 followers and 7,000 posts as of August 31, 2012.

Baby Back Home. The growing number of missing children in China has become a serious social problem. China’s Ministry of Public Security has set up a national DNA database for missing children, connecting 236 DNA centers nationwide, to help track children abducted by human traffickers. However, due to confidentiality concerns, this database is not publicly available. The idea of an Internet database of missing children came from Zhang Baoyan, who built a non-governmental organization website called “BaoBeiHuiJia”,7 which aims to help biological families find their missing children by publishing information online. As of 2012, when the data for this article were collected, Baobeihuijia.com had more than 10,000 registered users who had voluntarily provided information about missing children online, and it had helped more than 500 children find their parents since 2007. Baobeihuijia means “baby back home”; the site contains a large amount of volunteered geographic information about missing children that can be mined and analyzed in our study.

These two social movements dedicated to children’s issues in China provide relevant, timely, and loosely coupled social media data with geographic information, an unprecedented opportunity to understand the geographic patterns of child beggars and missing children. We examine three research questions in the following sections. First, given that child beggars and missing children are reported from many different locations, how are they spatially distributed at the national scale? Second, using exploratory spatial data analysis, we explore the socio-economic factors that could account for the distribution of child beggars and missing children reported on social media. Third, we visualize the diffusion of social media awareness of child beggars and missing children in space and time. Following Wang et al. (2018), we assume that the majority of the information on these social media applications and websites was truthfully reported, as truthful reporting maximizes the likelihood of rescuing child beggars and helping missing children.

5 http://www.weibo.com.
6 http://www.weibo.com/jiejiuqier

18.3 Data and Methods

This paper relies on a combination of quantitative and qualitative data from social media, supplemented by socio-economic data from government sources. We also used a combination of quantitative and qualitative analysis methods to examine the spatial and temporal distribution and diffusion patterns of child beggars and missing children in China.

Volunteered Geographic Information. We have two sources of volunteered geographic information and developed a corresponding collection strategy for each. We accessed the two social media sites in August 2011 and compiled the datasets for our analysis; the first eight months of 2011 were chosen as the time span of our exploratory study. On the “Street Photos to Rescue Child Beggars” account, each sighting of a street child beggar is reported as a post. The raw data are usually a paragraph of text giving the location at which the photo was taken and a description of the observation; some posts also include one or two photos. These raw data were collected and interpreted via a Java application written using the Sina Weibo API: we built a crawler to retrieve all the posts, and we refer to this set of data as dataset 1. For the “BaoBeiHuiJia” website, we used its built-in search engine to retrieve all reported missing children. Records were retrieved into a database with the attributes “ID”, “name”, “sex”, “birthday”, “native place”, “missing place”, “missing time”, and “description”. We refer to this set of data as dataset 2.

7 http://www.baobeihuijia.com.

Socio-economic Data. To better understand the relationship between socio-economic factors and the distribution of child beggars and missing children, we used the provincial administrative boundary map of China from the National Fundamental Geographic Information System of China, with the province as the geographic unit for summarizing socio-economic indicators. All child beggar and missing children records were geocoded and aggregated to the province level for further analysis. To examine the possible socio-economic factors that influence the observed distribution of child beggars, we collected four independent variables for each province. A join was performed in ArcGIS to append these variables to the boundary data for the exploratory spatial data analysis (Table 18.1).

Exploratory Spatial Data Analysis. Exploratory spatial data analysis (ESDA) is a set of techniques aimed at describing and visualizing spatial distributions, identifying atypical locations or spatial outliers, detecting patterns of spatial association, clusters, or hot spots, and suggesting spatial regimes or other forms of spatial heterogeneity (Haining 1990; Bailey and Gatrell 1995; Anselin 1998a, b). These methods provide measures of global and local spatial autocorrelation, which can reveal spatial clustering patterns.
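The geocode, aggregate, and join step described under Socio-economic Data can be illustrated without GIS software. The chapter performed the join in ArcGIS; the plain-Python sketch below only shows the logic, assuming each geocoded record already carries a province name. The field names (`province`, `hdi`, `weibo_ct`), the helper functions, and all values are hypothetical toy data, not the chapter’s.

```python
"""Minimal sketch: tally geocoded reports per province, then attach the tallies
to a per-province attribute table (a left join keyed on province name).

Record fields, attribute names, and values are hypothetical toy data.
"""
from collections import Counter
from typing import Dict, Iterable, Mapping


def count_by_province(records: Iterable[Mapping[str, str]], key: str = "province") -> Counter:
    """Count records per province, skipping records that could not be geocoded."""
    return Counter(r[key] for r in records if r.get(key))


def attribute_join(counts: Mapping[str, int],
                   attributes: Mapping[str, Mapping[str, float]],
                   count_field: str) -> Dict[str, Dict[str, float]]:
    """Left join: every province in `attributes` keeps its row; missing counts become 0."""
    joined: Dict[str, Dict[str, float]] = {}
    for province, attrs in attributes.items():
        row = dict(attrs)                          # copy so the input table is untouched
        row[count_field] = counts.get(province, 0)  # provinces with no reports get 0
        joined[province] = row
    return joined


# Hypothetical geocoded posts; the last one could not be geocoded.
posts = [{"province": "A"}, {"province": "A"}, {"province": "B"}, {"province": ""}]
# Hypothetical socio-economic attributes for three provinces.
attrs = {"A": {"hdi": 0.80}, "B": {"hdi": 0.75}, "C": {"hdi": 0.70}}

table = attribute_join(count_by_province(posts), attrs, "weibo_ct")
for name, row in table.items():
    print(name, row)
# A {'hdi': 0.8, 'weibo_ct': 2}
# B {'hdi': 0.75, 'weibo_ct': 1}
# C {'hdi': 0.7, 'weibo_ct': 0}
```

Keeping every province from the attribute side, with a count of 0 where no reports were found, matches the way Table 18.1 lists provinces with zero counts rather than omitting them.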
To answer our first two research questions, we first performed ESDA (Anselin 1998) to identify the global trend and pattern in the distribution of child beggar observation counts by province. To test whether the count of child beggars in each province exhibits spatial dependence, Moran’s I (Moran 1950) was then calculated for the count using the definition

$$
I = \frac{N}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(X_i - \bar{X})(X_j - \bar{X})}{\sum_{i} (X_i - \bar{X})^2}
$$

where $N$ is the total number of provinces (34), indexed by $i$ and $j$; $X_i$ is the child beggar count of province $i$ and $\bar{X}$ is the mean count over all provinces; and $w_{ij}$ is an element of the spatial weights matrix, which we generated using conventional queen contiguity. For the second research question, a multivariate regression analysis was performed to examine the factors that may be associated with the distribution of child beggars.

Space-time path approach. To answer our third research question, we use the space-time path approach to visualize the diffusion of social media awareness of child beggars and missing children in space and time. Time geography was first proposed by Torsten Hägerstrand in the late 1960s (Hägerstrand 1968) to study human activities and their constraints in the context of space and time (Hägerstrand 1970). The space-time system is represented as a three-dimensional (3D) system consisting of two spatial dimensions and one temporal dimension. A space-time path, defined as a trajectory that records an individual’s movements in space and over time, is a critical concept in time geography. Significant progress has been made over the years in implementing time-geographic concepts in GIS environments (Kwan and Hong 1998). For visualization purposes, a generalized space-time path tool was developed by Shih-Lung Shaw and his team (Shaw et al. 2008). The space-time path approach provides an efficient method of organizing individual mobility histories in an integrated space-time environment. In this paper, we used the space-time path tool developed by Shaw et al. to visualize the mobility patterns of child beggars.8

Table 18.1 Socio-economic variables and distribution of child beggars

Province | Area (sq mi) | Weibo_CT | BBHJ_CT | Pop_Density (per sq km) | HDI | Internet_P (%) | GDP_P (Chinese Yuan)
Beijing | 6384.08 | 204 | 639 | 1186.13 | 0.891 | 60.0 | 73856
Tianjin | 4347.51 | 37 | 77 | 1149.04 | 0.897 | 43.5 | 72994
Hebei | 72079.46 | 54 | 667 | 384.89 | 0.810 | 19.2 | 28668
Shanxi | 60284.69 | 26 | 394 | 228.722 | 0.775 | 24.1 | 26283
Neimenggu | 443520.03 | 40 | 162 | 21.508 | 0.803 | 16.0 | 47347
Liaoning | 56656.08 | 59 | 193 | 298.123 | 0.835 | 26.5 | 42355
Jilin | 73910.31 | 4 | 194 | 143.46 | 0.776 | 19.0 | 31599
Heilongjiang | 173834.03 | 19 | 256 | 85.095 | 0.786 | 16.2 | 27076
Shanghai | 2249.30 | 247 | 221 | 3951.32 | 0.908 | 59.7 | 76074
Jiangsu | 38442.57 | 233 | 917 | 790.025 | 0.805 | 27.3 | 52840
Zhejiang | 38505.62 | 101 | 430 | 545.744 | 0.841 | 41.7 | 51711
Anhui | 54145.16 | 97 | 551 | 424.288 | 0.727 | 11.8 | 20888
Fujian | 46718.84 | 246 | 811 | 304.901 | 0.801 | 38.5 | 40025
Jiangxi | 64814.30 | 45 | 488 | 265.489 | 0.735 | 14.0 | 21253
Shandong | 59994.23 | 85 | 858 | 616.488 | 0.828 | 21.2 | 41106
Henan | 64113.88 | 97 | 1328 | 566.219 | 0.787 | 13.7 | 24446
Hubei | 71855.26 | 125 | 646 | 307.556 | 0.755 | 18.4 | 27906
Hunan | 81604.74 | 237 | 509 | 310.772 | 0.751 | 15.7 | 24719
Guangdong | 68154.91 | 774 | 1001 | 590.881 | 0.828 | 48.2 | 44736
Guangxi | 91514.70 | 73 | 405 | 194.186 | 0.741 | 15.4 | 20219
Hainan | 13166.73 | 97 | 37 | 254.283 | 0.762 | 25.6 | 23831
Chongqing | 31815.96 | 136 | 600 | 350.062 | 0.675 | 21.2 | 27596
Sichuan | 217542.02 | 172 | 1517 | 142.729 | 0.728 | 13.6 | 21182
Guizhou | 68065.30 | 97 | 1318 | 197.099 | 0.647 | 11.5 | 13119
Yunnan | 148249.58 | 71 | 365 | 119.714 | 0.672 | 12.1 | 15752
Xizang | 438126.19 | 4 | 3 | 2.646 | 0.621 | 16.4 | 17027
Shaanxi | 78577.56 | 54 | 511 | 183.412 | 0.775 | 21.1 | 27133
Gansu | 146829.66 | 22 | 147 | 67.252 | 0.681 | 12.5 | 16113
Qinghai | 283435.53 | 42 | 51 | 7.665 | 0.684 | 23.6 | 24115
Ningxia | 20041.91 | 8 | 71 | 118.996 | 0.724 | 16.6 | 26860
Xinjiang | 629224.88 | 19 | 96 | 13.385 | 0.757 | 27.1 | 25034
Taiwan | 14058.33 | 4 | 0 | 641.533 | 0.932 | 70.0 | 117263
Hong Kong | 400.50 | 4 | 0 | 6807.35 | 0.862 | 70.0 | 209901
Macau | 1.73 | 0 | 0 | 123281 | 0.940 | 66.0 | 287404

Weibo_CT: count of child beggar reports from Sina Weibo (dataset 1); BBHJ_CT: count of missing children records from BaoBeiHuiJia (dataset 2); GDP_P: Gross Domestic Product per capita; HDI: Human Development Index; Internet_P: Internet Penetration Rate

Fig. 18.1 Spatial Distribution of Child Beggars in China, 2011
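The global Moran’s I test defined earlier in this section, and the OLS regression used once no spatial autocorrelation is found, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the software or data the authors used: hypothetical grid cells stand in for provinces, queen contiguity is taken to mean sharing an edge or a corner, the weights are binary rather than row-standardized, and OLS coefficients are solved from the normal equations. The function names `queen_weights`, `morans_i`, and `ols` are illustrative, and the demo values are toy data rather than the counts in Table 18.1.

```python
"""Minimal sketch: global Moran's I with binary queen-contiguity weights, then OLS.

Grid cells stand in for provinces (hypothetical); all values are toy data.
"""
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) of a hypothetical areal unit


def queen_weights(cells: Sequence[Cell]) -> List[List[float]]:
    """w_ij = 1 if units i and j share an edge or a corner (queen contiguity), else 0."""
    n = len(cells)
    w = [[0.0] * n for _ in range(n)]
    for i, (ri, ci) in enumerate(cells):
        for j, (rj, cj) in enumerate(cells):
            if i != j and max(abs(ri - rj), abs(ci - cj)) == 1:
                w[i][j] = 1.0
    return w


def morans_i(x: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """I = N / S0 * sum_ij w_ij z_i z_j / sum_i z_i^2, where z_i = x_i - mean(x)."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in w)  # total weight S0 = sum_ij w_ij
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Coefficients [b0, b1, ..., bk] solving the normal equations (X'X) b = X'y."""
    X = [[1.0, *r] for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(xr[i] * xr[j] for xr in X) for j in range(k)]
        + [sum(xr[i] * yr for xr, yr in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Toy 1 x 4 strip: similar values sit next to each other, so I is positive.
strip = [(0, 0), (0, 1), (0, 2), (0, 3)]
print(round(morans_i([1, 1, 0, 0], queen_weights(strip)), 4))  # 0.3333

# Toy data generated exactly by y = 1 + 2*x1 - 3*x2, so OLS recovers [1, 2, -3].
coefs = ols([(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)], [1, 3, -2, 0, 2])
print([round(b, 6) for b in coefs])  # [1.0, 2.0, -3.0]
```

With real provincial polygons, contiguity would be derived from shared boundaries, and an island province such as Hainan or Taiwan would have no queen neighbours, contributing an all-zero row to the weights matrix. The sketch reports the statistic only; an analytic or permutation-based significance test is not included.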

8 http://web.utk.edu/~sshaw/NSF-Project-Website/download.htm.

18.4 Results

Geographic Distribution. The spatial distribution of child beggars was mapped using ESRI’s ArcGIS. To examine the spatial distribution of child beggars and missing children, we created two maps of child beggars and missing children with population density as the base layer (Figs. 18.1 and 18.2). The results from both the Weibo data and the BBHJ data show that the child beggars and missing children observed and posted on the web are concentrated mostly in the southeastern part of China, indicating either that child beggar cases are more common in these areas or that more people in these areas are willing to contribute voluntarily. In contrast, western China (Xinjiang, Xizang, Gansu, and so on) shows only a limited number of child begging reports. In addition, provinces with high population density tend to have more child beggars, and vice versa.

Fig. 18.2 Spatial Distribution of Missing Children in China, 2011

Exploratory Statistical Analysis. We then tested our children datasets for spatial dependence in order to choose a suitable regression model. The univariate global Moran’s I (Fig. 18.3) shows no spatial autocorrelation, so we used ordinary least squares (OLS) for the multivariate regression analysis of the relationship between the number of child beggars/missing children and several socio-economic variables. The number of child beggars and missing children aggregated at the province level is the dependent variable, and Gross Domestic Product (GDP), Human Development Index (HDI), Population Density (PD), and Internet Penetration Rate (IPR) are the four independent variables. Although none of the four variables is statistically significant at the 0.05 level, the results in Table 18.2 indicate that HDI is positively correlated with the number of child beggars in each province, while the other three variables are negatively correlated with the number of social media reports in each province.

Spatial Mobility Trends. Human mobility is the physical movement of people from one area to another. We extracted the successful cases from dataset 2, in which each case records the place where a missing child was lost and the place where the child was found. We were able to depict the lost and found link of each child (Fig. 18.4) and visualize them on the


Fig. 18.3 Moran’s I values of −0.0883 and 0.0692 both indicate very weak spatial autocorrelation

Table 18.2 Results of regression analysis

Variable | Coefficient | Std. Error | t-Statistic | Probability
CONSTANT | −76.0989 | 1059.728 | −0.07181 | 0.943244
HDI | 995.3628 | 1547.934 | 0.643027 | 0.525259
POP_DENSIT | −0.00393 | 0.009494 | −0.41438 | 0.681644
INTERNET_P | −6.81648 | 8.838534 | −0.77122 | 0.446814
GDP | −0.00114 | 0.002259 | −0.50437 | 0.617811

national map. The map confirms that in most cases the home places of missing children are located in western and central China, while the children are often found in southern and eastern China. This finding indicates a general mobility trend from the less developed regions of China in the west and north to the well-developed areas in the east and south.

Temporal Distribution Patterns. To analyze the social impact of the “Street Photos to Rescue Child Beggars” activity, we counted the total number of postings in each month since its launch in January 2011. As Fig. 18.5 shows, the number of cases posted was relatively steady over the period, at 100 to 400 postings in most months, except February. The number of cases soared to 3,464 entries in February, which marks the beginning of the Chinese lunar year and the Spring Festival. This spike can be explained along multiple lines. It is quite possible that during the month of the Chinese New Year, the most important holiday in China (equivalent to Christmas in the West), most people are keen on family reunion and thus become more sensitive to, and earnest about, finding their own ways of addressing the child beggar and missing children issue. Another possible reason is that most people in China are on break during the month of the Chinese New Year,


Fig. 18.4 Lost-Found places of missing children in China

Fig. 18.5 Number of postings by month, January to August 2011

so they have the luxury of time to go out, take photos, post them voluntarily, and respond to calls for action from various social media sites. We also conducted a temporal analysis to capture the behavioural patterns of the contributors. Figure 18.6 summarizes the number of contributions received on weekdays versus weekends. We also compared the number of


Fig. 18.6 Temporal Distributions of VGI Postings (chart title: “Temporal patterns of contributions”; weekday vs. weekend postings, split into working and non-working hours)

contributions made during working hours with the number made during non-working hours. Roughly three-quarters of the postings were made on weekdays and only about a quarter on weekends. It is also interesting to note that the number of postings made during working hours is equivalent to the number made on weekends plus those made during non-working hours on weekdays.

The “Street Photos to Rescue Child Beggars” account drew attention from different cities at different times. Besides the unevenness of the spatial distribution across the country, the timing of microblog postings also varies from city to city. For example, the first post came from Beijing on January 25, 2011, and a volunteer from Wenzhou contributed a photo on the same day. Within 10 days, the account had received posts from 71 different cities in China. To visualize the space-time trend of the VGI distribution, the 71 county-level cities were ordered sequentially by the time of their first contribution. To simplify the timestamp information, we denote the sequential time using relative time stamps T1, T2, …, T71. Together with their locational information, a space-time path was mapped using Shaw’s Extended Time-Geographic Framework Tools Extension in ESRI ArcGIS 9.3 (Fig. 18.7). Although the rhythms of VGI contributions may not be obvious in our case because of the relatively short time span, the space-time path approach nonetheless demonstrates great potential as a tool for visualizing the spatial-temporal patterns of VGI contributions via social media sites.

Fig. 18.7 The space-time path of VGI contribution on social media sites

Narrative Analysis. As a useful qualitative approach, narrative analysis or inquiry is “the study of experience as story” (Connelly and Clandinin 2006). This research method can not only capture people’s experience as stories but also reflect the social, cultural, and institutional contexts in which the experiences were constituted (Clandinin and Rosiek 2007). Kwan and Ding (2008) put forward a geo-narrative approach, which integrates qualitative geographic information systems and narrative analysis. Our study explores the potential of applying narrative analysis to successful cases from the “BaoBeiHuiJia” website. Taking the descriptions written by missing children who seek help to find their biological parents as narrative material, we can not only capture geographical information from the text but also infer the social and cultural context of the places and circumstances that led to the cases of missing children. Figure 18.8 shows an example of narrative analysis for a child, “San Wa”, who was abducted from his hometown in 1996. Four different colours mark temporal references, locational references, actions, and feelings, respectively. In this example, the child was abducted on February 22, 1996, when he was five years old. He did not remember the exact name of his hometown, but he recalled that it was “Wei” county. He was first taken by a bearded man to a valley and then sent to his foster parents’ home in Xingtai city, Hebei province. He vaguely remembered the names of his father and brother. He also described some features of his hometown, such as a white elephant statue over a school. With this narrative information, volunteers from the web community successfully helped “San Wa” find his biological parents in Yunnan province.


Fig. 18.8 Narrative analysis from description material of “San Wa”
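The temporal tallies behind Figs. 18.5 and 18.6, and the first-contribution ordering used for the space-time path in Fig. 18.7, can be sketched as follows. This is not the Extended Time-Geographic Framework Tools Extension the authors used; it only prepares the kind of (x, y, t) sequence such a tool would draw. The 09:00 to 17:00 working-hour window is an assumption, since the chapter does not define working hours. The post times are hypothetical; only the date of the first Beijing and Wenzhou posts (January 25, 2011) comes from the text, and the city coordinates are approximate.

```python
"""Minimal sketch: classify posts by weekday/weekend and working hours, count
posts by month, and order cities by first contribution (T1, T2, ...) to obtain
(x, y, t) vertices for a space-time path.

Assumptions (not from the chapter): working hours are 09:00-17:00; the post
times and approximate city coordinates below are hypothetical.
"""
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Mapping, Tuple

Post = Tuple[str, datetime]  # (city, time of posting)


def time_class(ts: datetime, start_hour: int = 9, end_hour: int = 17) -> str:
    """Return 'weekend', 'working', or 'non-working' for one timestamp."""
    if ts.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "weekend"
    return "working" if start_hour <= ts.hour < end_hour else "non-working"


def tally(posts: Iterable[Post]) -> Counter:
    """Number of posts in each time class."""
    return Counter(time_class(ts) for _, ts in posts)


def monthly_counts(posts: Iterable[Post]) -> Counter:
    """Number of posts per (year, month)."""
    return Counter((ts.year, ts.month) for _, ts in posts)


def first_contribution_path(
    posts: Iterable[Post], coords: Mapping[str, Tuple[float, float]]
) -> List[Tuple[str, str, float, float]]:
    """Order cities by their earliest post and label them T1, T2, ... in that order."""
    first: Dict[str, datetime] = {}
    for city, ts in posts:
        if city not in first or ts < first[city]:
            first[city] = ts
    ordered = sorted(first, key=first.__getitem__)
    return [(f"T{k}", city, *coords[city]) for k, city in enumerate(ordered, start=1)]


posts = [
    ("Beijing", datetime(2011, 1, 25, 10, 0)),  # Tuesday, within working hours
    ("Wenzhou", datetime(2011, 1, 25, 20, 0)),  # Tuesday evening
    ("Beijing", datetime(2011, 1, 29, 11, 0)),  # Saturday
]
coords = {"Beijing": (116.4, 39.9), "Wenzhou": (120.7, 28.0)}  # approx. lon, lat

print(tally(posts))
print(monthly_counts(posts))
print(first_contribution_path(posts, coords))
# [('T1', 'Beijing', 116.4, 39.9), ('T2', 'Wenzhou', 120.7, 28.0)]
```

Relabelling each city by the rank of its first post, rather than by clock time, mirrors the chapter’s choice of relative time stamps T1 to T71, which keeps the vertical axis of the path readable when many first contributions fall within a few days.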

18.5 Discussion

Studies of children and childhood have become significant research topics for social scientists in general and geographers in particular (Abebe 2009). In geography, a social study of the future of children was conducted in Bunge’s (1971) classic “Fitzgerald: The Geography of a Revolution” some 40 years ago. Hart (1979) conducted a naturalistic and descriptive study of the life experiences of a small group of children and identified some common characteristics in both space and place. To better understand the lives of young people under the age of 25 from a geographic perspective, an interdisciplinary journal devoted to issues that significantly affect children’s lives was launched in 2003. Children can be involved in the research process in different ways (Abebe 2009). Christensen and James (2000) identified the different methods of conducting research with children; according to their classification, based on levels of participation, children may be treated as objects, subjects, or participants. This study extends children’s geography by deploying data from social media, coupled with both quantitative (GIS-based spatial analysis) and qualitative (narrative analysis) approaches. Despite the preliminary nature of our results, we detect multiple stories behind them, which may point to directions for future studies in this area.

Growing economic and regional disparities in China. As the results of our analysis indicate, the overwhelming majority of child beggar cases reported by volunteers on Chinese social media take place in cities in the eastern part of China, but the children themselves come from rural areas or western provinces. Under the “three economic belts” scheme of the Seventh Five-Year Plan (1986–1990), eastern China tends to be economically more developed than western China, and the gap between urban and rural areas in China has tended to widen rather than narrow in recent years. Since a common hypothesis is that street child beggars prefer locations with higher levels of economic development, the spatial distribution of child beggars in fact reveals the growing regional and urban-rural disparities in China. Despite improvements in economic development levels across China in recent years (Fan and Sun 2008), the Gini coefficient in China, one of the leading indicators of regional and social disparity, has reached a dangerous level. According to the World Bank (2005), the Gini coefficient rose from 0.33 in 1980 to 0.47 in 2005. Many other studies using socio-economic statistics have also provided strong evidence of growing regional inequality in China (Wei and Ma 1996; Fan and Sun 2008). The geographical distribution of child beggars and missing children is a manifestation of these growing pains in China, an important issue China must confront in the coming years to achieve the sustainable development goals it has set for itself.

VGI, Social Media, and People-based GIS. Methodologically, this paper aims to bridge the quantitative-qualitative chasm by using volunteered geographic information (VGI) harvested from social media sites. Through a combination of GIS-based quantitative spatial analysis and a narrative-based qualitative approach, this paper revealed multiple dimensions of the child beggar and missing children issues in China that would otherwise have been almost impossible to study owing to the lack of needed data. We are fully aware that multiple challenges must be addressed before the full potential of VGI for geographic research can be realized (Elwood et al. 2012). Our paper is a modest step in that direction. Furthermore, as more detailed data become available at the individual level, tagged with explicit spatial and temporal information, we are moving closer to what Miller (2003) envisioned as “people-based GIS.” With location-awareness technologies increasingly embedded in mobile phones, cameras, pedometers, and other hand-held devices, citizens can now record their daily activities with accurate locations in real time and share them instantaneously with the rest of the world.
These massive individual level data with spatial and temporal information is rapidly merging with various other streams of big data and will serve as a gold mine for GIScience researchers to practice people-based GIS. In addition, with the ever expanding blogosphere and various social media, we will be accumulating a massive amount of “deep” qualitative data and stories about people’s lives as well (Manovich 2011). The emergence and growing popularity of social media could give us the chance to have deep data for many (Sui and Goodchild 2011) to practice the so-called people-based GIS. The fusion of GIS and social media could help us not only map the distribution of spatial phenomena, such as the uneven distribution of child beggars in China, but also trace the relevant trajectories of certain people or events. For example, when a child beggar is posted on social media sites, concerned citizens will help to spread the word so that they can gain more public attention. Sometimes, people may also add comments or evidences to the original post which may help to rescue the children. Successful cases have been reported through this thematic account that some begging children are recognized by their biological parents later or rescued by the local police.

18 Can Social Media Rescue Child Beggars?


18.6 Conclusions

The goal of this paper was to explore the spatial and temporal patterns of child beggars and missing children in China using volunteered geographic information as our primary data source. Volunteered geographic information harvested from the leading Chinese social media website, Sina Weibo, and from “Baobeihuijia” was used for our empirical analysis. Exploratory spatial data analysis (ESDA) was used to explore the spatial patterns of child beggars in China. Because no spatial autocorrelation was detected using univariate Moran’s I, we implemented a standard (non-spatial) ordinary least squares (OLS) linear regression model to identify which socio-economic variables could be used to fit the model. The results show that child beggars are mostly observed in cities in Southeast China, the most prosperous and well-developed regions in China. Development levels as measured by HDI are found to be two critical factors that explain the spatial distribution of child beggars in China. The study also finds that VGI on the children issue varies across time, and a space-time path is built to support our hypothesis. Our preliminary results reveal interesting spatial and temporal distribution and mobility patterns for street child beggars in China. The geographical patterns are consistent with the growing regional and social disparities in China. By working with more detailed spatial and temporal data at the individual level, we are moving a step closer to people-based GIS. This project has limitations, and further studies are warranted. First, the volunteered geographic information was collected and aggregated at the provincial level, which served our goals well because we wanted to understand the macro spatial and temporal mobility patterns of child beggars. We realize that such bottom-up data may show a different distribution at an alternative scale, and this needs further clarification.
We also assume that each entry in our dataset is contributed independently, without taking the author of each contribution into account. Given that some entries may have been contributed by the same author, our results could differ. For future study, since we only tested for spatial autocorrelation at the national level using the global univariate Moran’s I, a sub-regional study could be conducted and compared with our current results. In addition, since the count is a raw observation of events, a further study could examine the rate for each province rather than just the raw count. Another improvement would be to include more socio-economic factors and variables to make the regression model more robust. We could also account for the author and content of each contribution to conduct a network analysis of this subject. In conclusion, future research on this topic will be in line with the overall trend in spatial social science and humanities research, which has now entered a data-rich environment. Spatial synthesis and computational methods are crucial to advancing the social sciences and humanities (Okabe 2016; Ye et al. 2016). Therefore, a powerful analytical framework for identifying space-time research gaps and frontiers in the child beggars and missing children issue is fundamental to the comparative study of spatiotemporal phenomena. This will also call for inter-disciplinary efforts that include experts from computational science, social science, and the humanities.
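The global univariate Moran’s I referred to in these conclusions can be sketched in a few lines. The Java below is a minimal illustration, not the authors’ implementation: the weight matrix and the input values are hypothetical, and it computes only the statistic itself, with no permutation or normal-approximation significance test. It follows Moran (1950): I = (n / S0) · Σi Σj wij zi zj / Σi zi², with zi = xi − x̄ and S0 the sum of all weights.

```java
import java.util.Arrays;

/** Minimal global univariate Moran's I; a sketch, not the chapter authors' code. */
final class MoransI {

    /**
     * @param x values for n areal units (e.g. report counts per province; hypothetical input)
     * @param w n-by-n spatial weights; w[i][j] > 0 for neighbours, w[i][i] = 0
     * @return I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z_i = x_i - mean
     */
    static double compute(double[] x, double[][] w) {
        int n = x.length;
        double mean = Arrays.stream(x).average().getAsDouble();
        double cross = 0.0; // sum_i sum_j w_ij * z_i * z_j
        double sq = 0.0;    // sum_i z_i^2 (zero, and I undefined, if all values are equal)
        double s0 = 0.0;    // S0: sum of all weights
        for (int i = 0; i < n; i++) {
            double zi = x[i] - mean;
            sq += zi * zi;
            for (int j = 0; j < n; j++) {
                cross += w[i][j] * zi * (x[j] - mean);
                s0 += w[i][j];
            }
        }
        return (n / s0) * (cross / sq);
    }

    public static void main(String[] args) {
        // Hypothetical layout: four units in a row (1-2-3-4) with binary contiguity weights.
        double[][] w = {
            {0, 1, 0, 0},
            {1, 0, 1, 0},
            {0, 1, 0, 1},
            {0, 0, 1, 0}
        };
        System.out.println(compute(new double[] {1, 2, 8, 9}, w)); // similar neighbours: positive I
        System.out.println(compute(new double[] {1, 9, 1, 9}, w)); // alternating values: negative I
    }
}
```

For this toy layout the clustered series gives I = 0.4 and the alternating series I = −1.0, compared with the expectation E[I] = −1/(n − 1) = −1/3 under no spatial autocorrelation. Whether an observed value differs meaningfully from that expectation has to be judged against a reference distribution before concluding, as the chapter does, that no autocorrelation is present.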

References

Abebe, T. (2009). Multiple methods, complex dilemmas: Negotiating socio-ethical spaces in participatory research with disadvantaged children. Children’s Geographies, 7(4), 451–465. https://doi.org/10.1080/14733280903234519.
Anselin, L. (1998a). Exploratory spatial data analysis in a geocomputational environment. In P. A. Longley, S. Brooks, B. Macmillan, & R. McDonnell (Eds.), Geocomputation: A primer (pp. 77–94). New York: John Wiley.
Anselin, L. (1998b). Interactive techniques and exploratory spatial data analysis. In P. A. Longley, M. F. Goodchild, D. J. Maguire, & D. W. Rhind (Eds.), Geographical information systems: Principles, techniques, management and applications. New York: Wiley.
Bailey, T., & Gatrell, A. C. (1995). Interactive spatial data analysis. Harlow: Longman.
Bromley, R. D., & Mackie, P. K. (2009). Child experiences as street traders in Peru: Contributing to a reappraisal for working children. Children’s Geographies, 7(2), 141–158.
Bunge, W. (1971). Fitzgerald: Geography of a revolution. Cambridge, Morristown: Schenkman Pub. Co.; distributed by General Learning Press.
Chen, X., & Yang, X. (2014). Does food environment influence food choices? A geographical analysis through “tweets”. Applied Geography, 51, 82–89.
Cheng, F., & Lam, D. (2010). How is street life? An examination of the subjective wellbeing of street children in China. International Social Work, 53(3), 353–365. https://doi.org/10.1177/0020872809359863.
Christensen, P. M., & James, A. (2000). Research with children: Perspectives and practices. Psychology Press.
Clandinin, D. J., & Rosiek, J. (2007). Mapping a landscape of narrative inquiry. In Handbook of narrative inquiry: Mapping a methodology (pp. 35–75).
Connelly, F. M., & Clandinin, D. J. (2006). Narrative inquiry. In Handbook of complementary methods in education research (Vol. 3, pp. 477–487).
De Venanzi, A. (2003). Street children and the excluded class. International Journal of Comparative Sociology, 44(5), 98–114.
Elwood, S., Goodchild, M. F., & Sui, D. Z. (2012). Researching volunteered geographic information: Spatial data, geographic research, and new social practice. Annals of the Association of American Geographers, 102(3), 571–590.
Fan, C., & Sun, M. (2008). Regional inequality in China, 1978–2006. Eurasian Geography and Economics, 49(1), 1–20.
Gong, X., & Yang, X. (2020). Social media platforms. In J. P. Wilson (Ed.), The geographic information science & technology body of knowledge (3rd Quarter 2020 ed.). https://doi.org/10.22224/gistbok/2020.3.2.
Goodchild, M. F. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal, 69(4), 211–221.
Goodchild, M. F., & Glennon, J. A. (2010). Crowdsourcing geographic information for disaster response: A research frontier. International Journal of Digital Earth, 3(3), 231–241.
Gupta, D., Spitzberg, B., Tsou, M., Gawron, M., & An, L. (2012). Revolution in social science methodology and pitfalls. Available online at http://mappingideas.sdsu.edu/publications/Revolution_final_version.pdf. Accessed Sept. 15.
Haining, R. (1990). Spatial data analysis in the social and environmental sciences. Cambridge: Cambridge University Press.


Hägerstrand, T. (1968). Innovation diffusion as a spatial process.
Hägerstrand, T. (1970). What about people in regional science? Papers of the Regional Science Association, 24, 1–12.
Hart, R. (1979). Children’s experience of place. New York: Irvington Publishers; distributed by Halsted Press.
Kwan, M.-P., & Ding, G. (2008). Geo-narrative: Extending geographic information systems for narrative analysis in qualitative and mixed-method research. The Professional Geographer, 60(4), 443–465.
Kwan, M.-P., & Hong, X. (1998). Network-based constraints-oriented choice set formation using GIS. Geographical Systems, 5, 139–162.
Liu, S. B., & Palen, L. (2010). The new cartographers: Crisis map mashups and the emergence of neogeographic practice. Cartography and Geographic Information Science, Special Issue: New Directions in Hazards and Disaster Research, 37(1), 69–90.
Manovich, L. (2011). Trending: The promises and the challenges of big social data [online]. Retrieved May 10, 2011, from http://www.manovich.net/DOCS/Manovich_trending_paper.pdf.
Miller, H. J. (2003). What about people in geographic information science? Computers, Environment and Urban Systems, 27(4), 447–453.
Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37, 17–33.
Okabe, A. (2016). GIS-based studies in the humanities and social sciences. CRC Press.
Panter-Brick, C. (2002). Street children, human rights, and public health: A critique and future directions. Annual Review of Anthropology, 31, 147–171.
Shaw, S.-L., Yu, H. B., & Bombom, L. S. (2008). A space-time GIS approach to exploring large individual-based spatiotemporal datasets. Transactions in GIS, 12(4), 425–441.
Strehl, T. (2010). The risks of becoming a street child: Working children on the streets of Lima and Cusco. In G. K. Lieten (Ed.), Hazardous child labour in Latin America (pp. 43–65). Dordrecht: Springer Netherlands. Retrieved from http://www.springerlink.com/content/q9j56112161q6148/.
Sui, D., & Goodchild, M. (2011a). The convergence of GIS and social media: Challenges for GIScience. International Journal of Geographical Information Science, 25(11), 1737–1748.
Sui, D. Z., & Goodchild, M. F. (2011). The convergence of GIS with social media: New challenges for GIScience. International Journal of Geographic Information Science (in press).
Tsou, M. H., & Yang, J. A. (2012, September). Spatial analysis of social media content (tweets) during the 2012 US Republican Presidential Primaries. In Proc. GIScience.
Turner, A. (2006). Introduction to neogeography. Sebastopol, CA: O’Reilly.
Wang, Z., Wei, L., Peng, S., Deng, L., & Niu, B. (2018). Child-trafficking networks of illegal adoption in China. Nature Sustainability, 1(5), 254.
Wei, Y., & Ma, L. J. C. (1996). Changing patterns of spatial inequality in China, 1952–1990. Third World Planning Review, 18(2), 177–191.
World Bank. (2005). World Development Report 2006: Equity and development. New York, NY: Oxford University Press.
Yang, X., Ye, X., & Sui, D. Z. (2016). We know where you are: In space and place-enriching the geographical context through social media. International Journal of Applied Geospatial Research (IJAGR), 7(2), 61–75.
Ye, X., Li, S., Yang, X., & Qin, C. (2016a). Use of social media for the detection and analysis of infectious diseases in China. ISPRS International Journal of Geo-Information, 5(9), 156.
Ye, X., Huang, Q., & Li, W. (2016b). Integrating big social data, computing and modeling for spatial social science. Cartography and Geographic Information Science, 43(5), 377–378.
Ye, X., Li, S., Yang, X., Lee, J., & Wu, L. (2018). The fear of Ebola: A tale of two cities in China. In Big data support of urban planning and management (pp. 113–132). Springer, Cham.
Young, L., & Barrett, H. (2001). Adapting visual methods: Action research with Kampala street children. Area, 33(2), 141–152. https://doi.org/10.1111/1475-4762.00017.

Part IV

Spatial Synthesis in Urban Science

Chapter 19

Spoofing in Geography: Can We Trust Artificial Intelligence to Manage Geospatial Data?
Bo Zhao, Shaozeng Zhang, Chunxu Xu, and Xiaobai Liu

B. Zhao (corresponding author), Department of Geography, University of Washington, Seattle, WA 98195, USA
S. Zhang, Program of Anthropology, Oregon State University, Corvallis, OR 97330, USA
C. Xu, College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, OR 97330, USA
X. Liu, Department of Computer Science, San Diego State University, San Diego, CA 92182, USA
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_19

19.1 Introduction

Ever since the turn of the century, the popularity of Artificial Intelligence (AI) has been driven by concurrent advances in computer science and information technologies. As an increasingly significant component of modern technology, AI has upgraded or replaced, and will continue to upgrade or replace, various aspects of traditional industry, further freeing people from monotonous tasks. Today, deep learning, a core branch of AI built on deep artificial neural networks, has challenged other AI systems and significantly improved on their accuracy and efficiency. Hence, innovations have emerged in various applied realms, such as behavioral simulation, image recognition, natural language processing, and content recommendation, to name a few. In retrospect, GIScience has incorporated early versions of AI, such as agent-based modeling, object detection in aerial imagery, and cellular automaton simulation, for more than three decades. Since 2015, a newly emerged technique, GeoAI, has been celebrated for its potential to provide tremendous opportunities to leverage GIScience with a series of AI advances (Sirosh 2018; Boulos et al. 2019; Hu et al. 2019; Janowicz et al. 2019). However, we should not neglect the negative impacts and problematic uses of AI techniques, such as falsifying transmitted GPS signals to change the planned course of a yacht (Kerns et al. 2014), creating bots to crawl private Internet information, and spreading fake news to targeted groups on Facebook. In light of the controversy surrounding this technology, we encourage geographers, GIScientists, and researchers in other fields to critically explore its complicated implications for individuals and society. Therefore, in this chapter, we paint a holistic picture of the convergence of AI and GIScience and its impacts on society by examining AI-involved “spoofing” in geography. In the remaining sections of this chapter, we discuss the connotations of spoofing and the three types of AI. We then analyze three case studies to demonstrate the extensive and significant effects of AI-involved spoofing on different aspects of geography. To cope with the complex status quo, we discuss its impacts on data quality and suggest methods for detecting spoofing. We conclude this chapter by envisioning the potential impacts of AI on society and humans.

19.2 Related Works

In this section, we provide a brief introduction to AI and give an overview of spoofing in geography, before we present and analyze the cases of AI-involved spoofing later in this chapter.

19.2.1 What Is Spoofing?

The term “spoof” was coined by the British comedian Arthur Roberts (1852–1933) to describe a poker game with a hoaxing trait. It is often used to characterize the deceptive features of a phenomenon. In cartography and geovisualization, Monmonier (1996) describes several approaches to lying, or spoofing, with maps, and further argues that the lies inherent in maps are an unavoidable cartographic paradox. In other words, “to present a useful and truthful picture, an accurate map must tell white lies.” Monmonier even contends that map generalization can be viewed as a type of “spoofing,” mainly because mapmakers need to generalize the visual representation of geographic objects to create an efficient and legible map at a predetermined scale. In the context of geography, spoofing entails a deliberate inconsistency between the reported location and the actual geographical location. To qualify as spoofing, a phenomenon must have two indispensable components: positioning inconsistency and spoofing motivation (Zhao and Sui 2017). The former can be accomplished through various types of media, including, for example, the ‘HTML5 Geolocation’ Application Programming Interface (API) (Casario et al. 2011), fake location apps (Wehner 2016), a Virtual Private Network (VPN) to update the system’s locational information (Anderson 2016), or GPS spoofing to affect the accuracy of the GPS signal in the surrounding environment (Grant et al. 2009). Spoofing motivation refers to the intentions behind creating a piece of geographic information that contains positioning inconsistency. Spoofing, despite its often-negative connotation, can be used in a positive way. For example, people spoof locations for many reasons, such as satisfying curiosity, research interests, entertainment, or protecting privacy (Zhao and Sui 2017). Today, the proliferation of geospatial big data shifts the focus of data quality away from traditional surveying and mapping toward a more human-centric one (Flanagin and Metzger 2008; Goodchild 2008). Geospatial data are being generated by ordinary people for various purposes, such as gaming (Zhao and Zhang 2018). This trend has raised new research questions regarding producers’ motivations for generating such inconsistent data.

19.2.2 AI, Machine Learning, and Deep Learning

In this section, we review the development of AI techniques, especially two of its major subfields: machine learning and deep learning. Machine learning is a subset of AI, while deep learning is a specific type of machine learning that utilizes deep neural networks (see Fig. 19.1).

Fig. 19.1 AI versus machine learning versus deep learning


AI, as defined by one of its early pioneers, McCarthy (1989), is the science and engineering of making intelligent machines. It is also referred to as machine intelligence, since it is intelligence demonstrated by machines rather than by humans. AI denotes any computer or machine program that behaves like a human, including in analytical reasoning, knowledge representation, classification, learning, natural language processing, perception, and robotics (Poole et al. 1998; Luger 2005; Russell and Norvig 2016). As an increasingly significant component of modern technology, AI has been taking over many sectors of traditional industry and further freeing people from monotonous and repetitive tasks. In the past five years, AI has experienced a resurgence following advances in computer processors (e.g., dual-core, quad-core, and many-core processors and graphics processing units) and big data analytics (e.g., parallel computing), and it has moved to the forefront of today’s industrial development and social progress. Machine learning is a subset of AI that enables computers or machines to learn without being explicitly programmed, by focusing on the automated extraction of patterns from data sets (Samuel 1959). In other words, it mainly uses statistical techniques to enable computer systems to learn from input data and make predictions. Because machine learning can move beyond strictly coded algorithms and incorporate data-driven decision-making processes, it has been utilized in a range of computing-related tasks, such as content filtering, network detection, agent-based planning, and computer vision. In geospatial analysis in particular, machine learning has been widely used in land cover classification (Bruzzone et al. 2006) and geostatistical modeling and simulation (Recknagel 2001), and it contributes to many fields, such as urban planning (Chan et al. 2001), disaster management (Ofli et al. 2016), and climate change adaptation (Tripathi et al. 2006).
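The contrast between explicitly programmed rules and patterns extracted from data can be made concrete with a small supervised classifier of the kind used for land cover classification. The chapter does not name a specific algorithm; the Java sketch below uses k-nearest neighbours only because it is short, and the two-band (red, near-infrared) reflectance values are invented for illustration, not drawn from any cited study.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/** Toy k-nearest-neighbour classifier: class boundaries come from labelled samples, not hand-written rules. */
final class KnnLandCover {

    /** A labelled pixel; bands are hypothetical (red, near-infrared) reflectances. */
    record Sample(double[] bands, String label) {}

    private final Sample[] training;
    private final int k;

    KnnLandCover(Sample[] training, int k) {
        this.training = training;
        this.k = k;
    }

    /** Euclidean distance in band space. */
    private static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Majority vote among the k training samples closest to the pixel. */
    String predict(double[] pixel) {
        Sample[] sorted = training.clone();
        Arrays.sort(sorted, Comparator.comparingDouble((Sample s) -> distance(s.bands(), pixel)));
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < Math.min(k, sorted.length); i++) {
            votes.merge(sorted[i].label(), 1, Integer::sum);
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow();
    }

    /** Made-up training data for three classes (hypothetical values). */
    static KnnLandCover demo() {
        Sample[] train = {
            new Sample(new double[] {0.05, 0.03}, "water"),
            new Sample(new double[] {0.06, 0.04}, "water"),
            new Sample(new double[] {0.04, 0.02}, "water"),
            new Sample(new double[] {0.08, 0.45}, "vegetation"),
            new Sample(new double[] {0.07, 0.50}, "vegetation"),
            new Sample(new double[] {0.09, 0.40}, "vegetation"),
            new Sample(new double[] {0.25, 0.30}, "urban"),
            new Sample(new double[] {0.28, 0.32}, "urban"),
            new Sample(new double[] {0.22, 0.27}, "urban")
        };
        return new KnnLandCover(train, 3);
    }

    public static void main(String[] args) {
        KnnLandCover model = demo();
        System.out.println(model.predict(new double[] {0.06, 0.47})); // vegetation
        System.out.println(model.predict(new double[] {0.26, 0.29})); // urban
    }
}
```

Changing the labelled samples changes the decision boundaries without touching `predict`, which is the sense in which such a program learns from data rather than being explicitly programmed.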
Deep learning is a specific type of machine learning that makes use of complex neural networks. The term “deep” refers to the significant number of layers within the neural network. These multiple layers allow deep neural networks to execute a task in a hierarchical manner. A deep learning algorithm usually requires high-performance computational support; therefore, multi-core processors are utilized to speed up the mapping of inputs through the hierarchical layers. Like other techniques in machine learning, deep learning relies on data analysis instead of task-specific programs. It utilizes, but is not limited to, artificial neural networks, convolutional and recursive neural networks, long short-term memory (LSTM) networks, and deep belief networks, which have produced results comparable, and in some cases superior, to those of human experts. For example, AlphaGo has demonstrated “wisdom” that exceeds that of its human competitors. The “deep layers” facilitate feature extraction, which is a major challenge in traditional machine learning, and deep learning has therefore achieved massive success in tasks related to image analysis, such as scene and object detection, segmentation, and classification (Zou et al. 2015; Nogueira et al. 2017; Zhu et al. 2017; Qiu et al. 2019).
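The layered, hierarchical processing described above can be illustrated with a forward pass through three stacked layers, where each layer consumes the output of the previous one. This Java sketch uses arbitrary, untrained weights chosen only to keep the arithmetic easy to follow; a real deep network learns its weights from data, has far more layers and units, and for imagery would typically use convolutional rather than fully connected layers.

```java
import java.util.Arrays;

/** Forward pass through stacked dense layers; the weights are arbitrary, untrained numbers. */
final class TinyDeepNet {

    /** One fully connected layer: out[r] = act(b[r] + sum_c w[r][c] * in[c]). */
    static double[] dense(double[][] w, double[] b, double[] in, boolean relu) {
        double[] out = new double[w.length];
        for (int r = 0; r < w.length; r++) {
            double z = b[r];
            for (int c = 0; c < in.length; c++) {
                z += w[r][c] * in[c];
            }
            out[r] = relu ? Math.max(0.0, z) : z; // ReLU on hidden layers, identity on the output
        }
        return out;
    }

    /** 3 inputs -> 2 hidden -> 2 hidden -> 1 output; each layer builds on the previous one. */
    static double[] forward(double[] x) {
        double[] h1 = dense(new double[][] {{1, 0, -1}, {0.5, 0.5, 0.5}}, new double[] {0, -1}, x, true);
        double[] h2 = dense(new double[][] {{1, -1}, {1, 1}}, new double[] {0, 0}, h1, true);
        return dense(new double[][] {{2, 1}}, new double[] {0.5}, h2, false);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(forward(new double[] {3, 1, 2}))); // prints [3.5]
    }
}
```

For the input (3, 1, 2), the first layer yields (1, 2), the second (0, 3), and the output layer 2·0 + 1·3 + 0.5 = 3.5.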


19.3 Three Spoofing Cases

19.3.1 Bots in Location-Based Gaming

Ingress is a location-based mobile game in which the movements of its players are synchronized with the digital game in real time. The game frame is built from geographical data sets from Google Maps or OpenStreetMap. While playing, a player’s avatar is located on the digital game frame at the exact physical place of the player in the real world. Portals are the most important game resource; they are made up of locations of interest, such as monuments, historic buildings, and unique architecture of a city. These portals are visible only by activating the augmented reality feature of the mobile phone. Once a portal is scanned by a mobile phone, a player can claim, modify, or enhance the portal and also interact with other players who have visited the portal before. Moreover, players can gain game items (e.g., resonators, bursters) by moving a certain distance in the game, such as by walking, cycling, or driving. These items help a player capture a portal faster. It is quite time-consuming and labor-intensive to exercise in real space; another way to avoid such tedious exercise is to use game bots, which automate the above-mentioned tasks by simulating how an actual human player performs in the game. The game bot is a classic AI robot that can overcome both human physiological and geographic barriers: it can play for its user all day long because it does not need to sleep or replenish its energy. Table 19.1 shows a code snippet of LocationRunner.java¹ from an openly accessible GitHub repository, ingress-bot. The code programs a game bot to imitate a human player’s movement between two locations, currentLocation and newLoc, in the real world, no matter how far apart they are. In this way, a bot allows a player to visit any portal on Earth, thereby accelerating the player’s upgrade. A game bot of this kind can cheat for game rewards.
To prevent the use of game bots, Niantic, the company behind Ingress, has enacted a measure to terminate a player’s account if the player is caught violating the terms of use. Although game bots in location-based games have been viewed as game-breakers that make the game unfair, Zhao and Zhang (2019) reveal that, in broader social contexts, such bots have also been used to challenge the unequal spatial distribution of access to game resources and to resist the extremely profit-oriented gaming industry.

1 Refer to the code snippet at https://github.com/Maome/ingress-bot/blob/master/LocationRunner.java.


Table 19.1 Code snippet for player walking simulation

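```java
// Excerpt from LocationRunner.java in the ingress-bot repository (not self-contained):
// curLine, currentLocation, gui, S2LatLng, S2Wrapper, TransitHandler and DebugHandler
// are declared outside this excerpt.
String[] newLocation = curLine.split(",");    // curLine is expected to hold "lat,lng"
S2LatLng newLoc = S2LatLng.fromDegrees(Double.parseDouble(newLocation[0]),
                                       Double.parseDouble(newLocation[1]));
// get distance in meters
Double dist = S2Wrapper.GreatEarthDistance(currentLocation, newLoc);
TransitHandler th = new TransitHandler(currentLocation, newLoc, gui);
th.start();                                   // start the transit thread
int waitTimeSeconds = (int) (dist / 5.0);     // travel time at 5 meters per second
System.out.println("Waiting " + waitTimeSeconds + " seconds to arrive.");
DebugHandler.debugInfo("Waiting " + waitTimeSeconds + " seconds to arrive.");
th.join();                                    // block until the transit thread finishes
```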
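Because the ingress-bot classes in Table 19.1 are not self-contained, the behaviour the snippet relies on can be sketched independently: compute the great-circle distance between two points, derive a travel time from an assumed speed, and emit intermediate positions. The Java below is our own illustration, not code from ingress-bot; the class and method names are hypothetical, the 5 m/s speed mirrors the 5.0 divisor in Table 19.1, and feeding the generated fixes to a game client is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a bot "walking" between two coordinates at a fixed speed. */
final class WalkSimulator {

    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius

    /** Great-circle distance in metres (haversine formula). */
    static double haversine(double lat1, double lon1, double lat2, double lon2) {
        double p1 = Math.toRadians(lat1);
        double p2 = Math.toRadians(lat2);
        double dp = p2 - p1;
        double dl = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dp / 2) * Math.sin(dp / 2)
                + Math.cos(p1) * Math.cos(p2) * Math.sin(dl / 2) * Math.sin(dl / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    /**
     * One {lat, lon} fix per simulated second while travelling at speedMps.
     * Linear interpolation in degrees: adequate for short hops, not for long routes.
     */
    static List<double[]> path(double lat1, double lon1, double lat2, double lon2, double speedMps) {
        double dist = haversine(lat1, lon1, lat2, lon2);
        int steps = Math.max(1, (int) Math.ceil(dist / speedMps));
        List<double[]> fixes = new ArrayList<>();
        for (int s = 0; s <= steps; s++) {
            double f = (double) s / steps;
            fixes.add(new double[] {lat1 + f * (lat2 - lat1), lon1 + f * (lon2 - lon1)});
        }
        return fixes;
    }

    public static void main(String[] args) {
        // 0.001 degrees of longitude at the equator is about 111 m; at 5 m/s that takes about 23 s.
        List<double[]> fixes = path(0.0, 0.0, 0.0, 0.001, 5.0);
        System.out.println(fixes.size() + " fixes"); // prints "24 fixes"
    }
}
```

Nothing in such a path is physically impossible for a pedestrian, which is why a bot of this kind is hard to distinguish from a human player on movement data alone.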

19.3.2 Location Spoofing on Twitter

In the wake of Iran’s 2009 presidential election, the Iranian government began to monitor activities on social media regularly (Ansari 2012). Online participatory political activity became a target of harsh criticism from the government; therefore, social media platforms were frequently shut down during the election campaign. In order to resist this surveillance and to protect Iranian protesters, many international supporters used location spoofing on social media, including Twitter, to mislead the Iranian government. These supporters set their locations and time zones to Tehran, Iran, in order to protect local opposition leaders who were physically in Tehran from government harassment and persecution. In their posts, as shown in Fig. 19.2, Twitter participants were open about their participation in this spoofing movement and called on more Twitter users to change their location and time zone to Tehran. This appeal reveals that the participants were not necessarily driven by dishonest intentions.

Fig. 19.2 Twitter users openly called for location spoofing

To understand the diffusion of location spoofing on social media, we need to look closely at social media platforms’ content filtering algorithms, which are classic machine learning algorithms. Content filtering algorithms sift through a massive amount of information to offer information of potential interest to individual users. The filters are dynamically updated through a set of evolving rules that are continuously learned from timely inputs. Specifically, Twitter’s timeline function uses a content filtering algorithm to determine the order in which tweets appear on a user’s news feed page. Two ordering options are offered. The “top Tweets first” option places the tweets that a user is likely to care about most at the top; a content filtering algorithm identifies which tweets these are. The second, the “latest Tweets first” option, lists tweets by the time they were created, with the latest tweet at the top.² In other words, if the first option is selected, the timeline function creates the illusion of a chronological list of events as they occur; in fact, it is a list of tweets selected and then ordered by a machine learning algorithm. The timeline function leads to the increasingly recognized “filter bubble effect” on social media (Gross 2017), in which users are selectively fed opinions they are likely to agree with and news stories they are interested in reading. In this way, a social media platform’s content filtering function directly shapes the diffusion of online information. As a result, in this case, Twitter’s content filtering algorithm effectively amplified the location spoofing campaign during the 2009 Iranian election.

2 Refer to https://help.twitter.com/en/using-twitter/twitter-timeline.

This location spoofing case from social media brings to light the unexpected uses and impacts of machine learning algorithms. Although the machine learning algorithm per se does not generate location spoofing, it did accelerate the viral spread of spoofing across the Twittersphere. Moreover, the social implications of machine learning-facilitated spoofing go beyond deception. These location-spoofing tweets were not created to deceive other Twitter users; rather, they were meant to confuse government agencies by taking advantage of Twitter’s content filtering algorithm. This is not an isolated case. Civil disobedience in the cyber world, or “hacktivism,” has proven to be powerful. An early example dates back to 1998, when hacktivists blockaded the Pentagon website via the Electronic Disturbance Theater (EDT) in a “virtual sit-in protest” (Meikle 2002). With the widespread use of online social media, ordinary users have turned these platforms into open platforms and generated bottom-up influence (Selwyn 2012; Correa 2013). Social media collective actions have become remarkable bottom-up initiatives that lead to the creation of public value (Hansen et al. 2010). The use of machine learning algorithms for location spoofing in cyber-based civil disobedience is relatively recent but undoubtedly proliferating, as in the Standing Rock protest in 2016. Machine learning algorithms play a powerful and effective role in spreading the practice of location spoofing and amplifying overall non-violent resistance against technological and political authorities.
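The difference between the “top Tweets first” and “latest Tweets first” options described above can be sketched as two sort keys over the same tweets. In the Java below, `predictedRelevance` is a hypothetical field standing in for the output of Twitter’s proprietary ranking model, whose inputs and form are not public; only the contrast between ranking by a learned score and ranking by creation time is meant to carry over.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

/** Two hypothetical timeline orderings over the same tweets. */
final class TimelineOrdering {

    /** predictedRelevance stands in for an unspecified, learned ranking score. */
    record Tweet(String id, Instant createdAt, double predictedRelevance) {}

    /** "Latest Tweets first": newest creation time on top. */
    static List<Tweet> latestFirst(List<Tweet> tweets) {
        return tweets.stream()
                .sorted(Comparator.comparing(Tweet::createdAt).reversed())
                .toList();
    }

    /** "Top Tweets first": highest score on top, regardless of when a tweet was posted. */
    static List<Tweet> topFirst(List<Tweet> tweets) {
        return tweets.stream()
                .sorted(Comparator.comparingDouble(Tweet::predictedRelevance).reversed())
                .toList();
    }

    public static void main(String[] args) {
        List<Tweet> tweets = List.of(
                new Tweet("a", Instant.parse("2009-06-13T10:00:00Z"), 0.9),
                new Tweet("b", Instant.parse("2009-06-13T12:00:00Z"), 0.2),
                new Tweet("c", Instant.parse("2009-06-13T11:00:00Z"), 0.5));
        System.out.println(latestFirst(tweets).stream().map(Tweet::id).toList()); // [b, c, a]
        System.out.println(topFirst(tweets).stream().map(Tweet::id).toList());    // [a, c, b]
    }
}
```

Because `topFirst` ignores `createdAt`, a tweet the model scores highly can sit above newer ones, which is how a score-driven timeline can keep a widely engaged appeal, such as the call to set one’s location to Tehran, in front of more users.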

19.3.3 Simulated Image of a Place

Place is one of the most important concepts in human geography. According to John Agnew (1987), a place has three fundamental components: location, locale, and sense of place. While in a place, an individual can see, hear, touch, and smell it. Indeed, a holistic understanding of a place needs to explore it through multiple senses. Computer vision techniques have frequently been used to explore the context of a place from a visual perspective. For example, the objects in an image of a place can be identified using image recognition tools such as PlaceNet, ImageNet, and YOLO. As a result, images can be a very important resource for understanding the context of a place. However, images can also be falsified. Especially with deep learning algorithms such as Generative Adversarial Networks (GANs), a generated image can be uncannily real. As a class of deep learning models, GANs are designed to learn data patterns (Goodfellow et al. 2014; Salimans et al. 2016): the generator learns to produce data that share the statistical traits of the training set. In cartography and GIScience, previous studies have introduced the potential of transferring map styles (Isola et al. 2017), conducting spatial interpolation, and simulating satellite images (Xu and Zhao 2018). To show the utility of a GAN in generating an image of a place, we use the TensorFlow port of pix2pix (Isola et al. 2017). The pix2pix model works with pairs of images. For example, if a model has been trained on a large number of pairs of place images and their structures, we can use a new structure to automatically generate a new place. To show how it works, we generate the facade of a building from an abstract structure (see Fig. 19.3). The input consists of structural labels such as background, wall, door, window, balcony, and entrance, while the output is an image of a building facade. The image looks remarkably realistic. We should therefore be more cautious whenever we interpret the context of a place using images.

Fig. 19.3 Simulated image of place
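For reference, the objective that pix2pix optimizes (Isola et al. 2017) combines a conditional GAN loss with an L1 reconstruction term. Here x is the input structure map (in the facade example, the labelled wall, door, window, and balcony regions), y is the corresponding real photograph, z is random noise, G is the generator, and D is the discriminator; the weight λ is a tunable hyperparameter. The discriminator is trained to tell real pairs (x, y) from generated pairs (x, G(x, z)), while the L1 term keeps the generated image close to the real one pixel by pixel.

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x,z}\left[\log\bigl(1 - D(x, G(x, z))\bigr)\right]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_{1}\right]

G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)
```

The adversarial term is what pushes the output toward images the discriminator cannot distinguish from photographs, which is precisely why such simulated images of place can be mistaken for real ones.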

19.4 Spoofing Detection Confronted with AI’s touted capacity and the accompanying spoofing issues, it is evident that we cannot entirely rely on AI to generate, process or interpret geospatial data. The technical nuisance and human aspects of AI involvement in geospatial spoofing are troubling to, yet not sufficiently recognized by, GIScientists as well as other data users. Spoofing, unlike other types of data inconsistencies, is intentionally generated. However, this critical difference has been largely ignored in the simple habitual use of the conventional methodology of data quality determination and data cleaning. Therefore, we urge that the trustworthiness of geospatial data should be examined by two organically related strategies—the detection of geospatial inconsistency and an investigation into the motivations for spoofing. Specifically, scholars can carry out surveys, questionnaires and various kinds of participatory observations to qualitatively interpret motivation. However, it is often difficult and sometimes even impossible to learn of real intentions through self-reporting measures if the selfreporting may violate professional ethics, game rules, or social norms. For example, in the Ingress case, players would be reluctant to discuss their reasons for using bots due to fear of losing the advantages gained by spoofing, acquiring a bad reputation by breaking game rules, or punishment by the game company. Compared with the spoofing motivation investigation, the underlying geospatial inconsistency can be quantitatively detected. From a theoretical perspective, any


B. Zhao et al.

geospatial inconsistency detection is supposed to check whether the examined data conform to common geographic knowledge. If they do not, it is highly likely that the geospatial data were spoofed. Throughout human history, a great deal of geographic knowledge has been discovered and tested. For example, Tobler’s first law of geography describes the inherent spatial correlation of geographic phenomena. If a geographic phenomenon strongly violates this law, it is highly likely to be inconsistent with the ground truth. Among the potentially useful approaches based on common geographic knowledge, time geography (Hägerstrand 1970) can provide theoretical support for detecting inconsistencies in user-generated data sets. In time geography, each individual follows a unique life path in space-time, which is limited by three constraints: (1) capability constraints, i.e., limitations on an individual’s activity due to physical makeup and available resources; (2) coupling constraints, i.e., spatiotemporal limitations that arise from having to rely on other people or materials in order to produce, consume, and transact; and (3) authority constraints, i.e., restrictions on the space-time an individual is permitted to access. Over the years, several geometric tools (e.g., the space-time path, prism, and cone) have been developed to measure and gauge human activities in a spatiotemporal context. Based on these categories from time geography, we propose a set of constraint rules derived from common-sense geographic knowledge, as listed in Table 19.2. If any of the rules is violated, the examined data can be considered spoofed.

Table 19.2 Detection principles based on human behavioral constraints

Category: human behavioral constraints derived from geographic knowledge

Capability
R1: Cannot exceed the maximum speed of a civil airliner (approx. 1,000 km/h) when in the air
R2: Cannot exceed the maximum speed of the fastest available train (e.g., Amtrak in the U.S., high-speed trains in China, the Shinkansen in Japan)
R3: Cannot exceed the highway speed limit when in an urban area
R4: Cannot exceed the speed of the fastest available ship when on the ocean or a river
R5: Cannot publish posts on a social media platform from precisely the same location, given human behavioral patterns and GPS noise

Coupling
R6: Cannot travel to places other than an airport immediately before or after an air trip
R7: Cannot leave road, air, or shipping routes while inside a vehicle
R8: One’s activities on the Earth’s surface should form a clustered pattern rather than a regular or a completely spatially random pattern
R9: One seldom visits a place that is rarely visited by others

Authority
R10: Cannot appear in military, fishing, or other forbidden areas
R11: Cannot publish posts on a social media platform where the service is blocked (e.g., Twitter in North Korea)
R12: Cannot publish social media content from a place without Internet access (e.g., some depopulated areas of the Sahara Desert or the Cascade Range)

19 Spoofing in Geography: Can We Trust Artificial Intelligence …

In practice, we can model these rules as mathematical conditions to identify suspicious cases quantitatively. Notably, because these rules are derived from time geography, they are not applicable to detecting environmental inconsistencies or, in particular, simulated images of place. More importantly, GAN-generated spoofing is even more challenging to detect, because the simulated geographic data, including images, can mimic the geographic environment in an incredibly authentic way. To address this, we could examine the traits of data flaws caused by GANs (Marra et al. 2018; McCloskey and Albright 2018; Zhang et al. 2019) to detect simulated geographic data. In addition, we argue that local knowledge of the geographic context is useful in checking data trustworthiness. For example, if we know that the facades in Fig. 19.3 do not exist in a certain place, this is evidence that the examined image is not authentic.
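The following Python is a minimal sketch of how the rules in Table 19.2 might be written as mathematical conditions. It checks one user's geotagged posts against two rules:

- R1: no implied speed above roughly 1,000 km/h.
- R5: no repeated posting from exactly identical coordinates.

It assumes a spherical Earth and uses the haversine great-circle distance. The `Post` record, the function names, and the sample coordinates are hypothetical illustrations, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumes a spherical Earth)
MAX_AIRLINER_KMH = 1000.0  # R1 threshold taken from Table 19.2


@dataclass
class Post:
    """Hypothetical geotagged post: timestamp in hours, position in degrees."""
    t_hours: float
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def check_r1_speed(posts: List[Post],
                   max_kmh: float = MAX_AIRLINER_KMH) -> List[Tuple[int, float]]:
    """Flag consecutive posts whose implied speed exceeds max_kmh (R1).

    Returns (index, speed_kmh) pairs, where index refers to the later post
    of each pair in the time-sorted list.
    """
    ordered = sorted(posts, key=lambda p: p.t_hours)
    violations = []
    for i in range(1, len(ordered)):
        prev, cur = ordered[i - 1], ordered[i]
        dist = haversine_km(prev.lat, prev.lon, cur.lat, cur.lon)
        dt = cur.t_hours - prev.t_hours
        if dt <= 0:
            # Being in two different places at the same moment implies infinite speed.
            if dist > 0:
                violations.append((i, math.inf))
        elif dist / dt > max_kmh:
            violations.append((i, dist / dt))
    return violations


def check_r5_identical(posts: List[Post]) -> List[Tuple[float, float]]:
    """Return coordinates that occur in more than one post (R5).

    Genuine GPS fixes carry noise, so exact repeats are treated as suspicious.
    """
    counts: Dict[Tuple[float, float], int] = {}
    for p in posts:
        counts[(p.lat, p.lon)] = counts.get((p.lat, p.lon), 0) + 1
    return [coord for coord, n in counts.items() if n > 1]


posts = [
    Post(0.0, 40.7128, -74.0060),   # New York
    Post(1.0, 51.5074, -0.1278),    # London, one hour later
    Post(30.0, 51.5074, -0.1278),   # exactly the same coordinates again
]
print(check_r1_speed(posts))      # flags index 1 (about 5,570 km in one hour)
print(check_r5_identical(posts))  # [(51.5074, -0.1278)]
```

Comparing only consecutive posts keeps the R1 check linear after sorting. On real data, the thresholds would presumably need tolerances for timestamp and GPS error before a post is labeled spoofed.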

19.5 Concluding Remarks: Fake Geography?
Technologies, more recently including AI, have been among the major driving forces of human civilization. Geospatial technological advances, for example, enable us to deepen our understanding of the spatial dimension of our society and environments. This chapter has revealed the divergent impacts of geospatial technologies. Instead of repeatedly praising the new opportunities AI would bring to GIScience, we have attempted to raise public awareness of how the convergence of AI and GIScience may transform the trustworthiness of geospatial data as well as our perceptions of geographies. To enable a holistic evaluation, we presented three spoofing phenomena generated by a generic AI, a machine learning system, and a deep learning algorithm (GANs) to explain how AI and its derivatives might falsify geospatial data. If we do not develop new cognitive strategies to investigate the social impact of spoofing, along with concomitant detection approaches, today’s rare cases of AI-involved spoofing may develop into a “fake geography” dystopia (Maclenan 2018). Nevertheless, neither extreme optimism nor extreme pessimism is a reasonable response to the encounter between AI and geospatial data. Indeed, none of the three kinds of AI-involved spoofing is purely positive or purely negative; all of them are simply part of human life. The game bots are tools for cheating and breaking the game rules, but they are also tools for challenging the unequal geospatial order of access to game resources and the extremely profit-oriented gaming industry. In the Iran protest, AI-involved location spoofing served as a crucial means of civil disobedience and political resistance against political and technological powers. Simulated images of place can be misleading, but they can also be very useful, for example, in imagining a vanished place or preserving our memory of a place undergoing rapid urban transformation.
We therefore conclude that the convergence of AI and GIScience is neither a utopia nor a dystopia. If so, what we need is neither to embrace it without question nor to reject it as a dystopia or a source of random uncertainty. Accordingly, the proposed spoofing detection strategies are important because they enable us to understand the



AI-involved spoofing better and, hopefully, inform the ongoing process of human civilization, in which spoofing (like lies, fake news, etc.) plays a significant role. Overall, during today’s ongoing shift to a data-intensive environment, a critical lens on geospatial technologies is not just helpful but urgently required. Such a critical lens enables us to examine spoofed geospatial data, interpret spoofing behavior, and apprehend the social implications of GeoAI.

References
Agnew, J. (1987). The United States in the world-economy: A regional geography. Cambridge, UK: Cambridge University Press.
Anderson, M. (2016). How to defeat VPN location-spoofing by mapping network delays. The Stack. Retrieved Feb 20, 2016, from https://thestack.com/cloud/2016/02/16/vpn-network-time-delayabdelrahman-abdou-cpv/.
Ansari, A. (2012). The role of social media in Iran’s Green Movement (2009–2012). Global Media Journal-Australian Edition, 6(2), 1–6.
Boulos, M. N. K., Peng, G., & Vopham, T. (2019). An overview of GeoAI applications in health and healthcare. BioMed Central.
Bruzzone, L., Chi, M., & Marconcini, M. (2006). A novel transductive SVM for semisupervised classification of remote-sensing images. IEEE Transactions on Geoscience and Remote Sensing, 44(11), 3363–3373.
Casario, M., Elst, P., Brown, C., Wormser, N., & Hanquez, C. (2011). HTML5 Geolocation API. In HTML5 Solutions: Essential Techniques for HTML5 Developers (pp. 263–280). Springer.
Chan, J. C.-W., Chan, K.-P., & Yeh, A. G.-O. (2001). Detecting the nature of change in an urban environment: A comparison of machine learning algorithms. Photogrammetric Engineering and Remote Sensing, 67(2), 213–226.
Correa, T. (2013). Bottom-up technology transmission within families: Exploring how youths influence their parents’ digital media use with dyadic data. Journal of Communication, 64(1), 103–124.
Flanagin, A. J., & Metzger, M. J. (2008). The credibility of volunteered geographic information. GeoJournal, 72(3–4), 137–148.
Goodchild, M. F. (2008). Spatial accuracy 2.0. In Proceedings of the eighth international symposium on spatial accuracy assessment in natural resources and environmental sciences (pp. 1–7).
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680).
Grant, A. et al. (2009). GPS jamming and the impact on maritime navigation. Journal of Navigation, 62(2), 173–187.
Hägerstrand, T. (1970). What about people in regional science? Papers in Regional Science, 24(1), 7–24.
Hansen, D. L., Shneiderman, B., & Smith, M. (2010). Visualizing threaded conversation networks: Mining message boards and email lists for actionable insights. In International Conference on Active Media Technology (pp. 47–62).
Hu, Y., Gao, S., Newsam, S. D., & Lunga, D. (2019). GeoAI 2018 workshop report: The 2nd ACM SIGSPATIAL international workshop on GeoAI: AI for geographic knowledge discovery, Seattle, WA, USA, November 6, 2018. SIGSPATIAL Special, 10(3), 16.



Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1125–1134).
Janowicz, K., Gao, S., McKenzie, G., Hu, Y., & Bhaduri, B. (2019). GeoAI: Spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond. Taylor & Francis.
Kerns, A. J., Shepard, D. P., Bhatti, J. A., & Humphreys, T. E. (2014). Unmanned aircraft capture and control via GPS spoofing. Journal of Field Robotics, 31(4), 617–636.
Luger, G. F. (2005). Artificial intelligence: Structures and strategies for complex problem solving. Pearson Education.
Maclenan, A. (2018). Fake geography. GeoConnexion International, January, 18.
Marra, F., Gragnaniello, D., Cozzolino, D., & Verdoliva, L. (2018). Detection of GAN-generated fake images over social networks. In IEEE Conference on Multimedia Information Processing and Retrieval (MIPR) (pp. 384–389).
McCarthy, J. (1989). Artificial intelligence, logic and formalizing common sense. In Philosophical logic and artificial intelligence (pp. 161–190). Springer.
McCloskey, S., & Albright, M. (2018). Detecting GAN-generated imagery using color cues. arXiv preprint arXiv:1812.08247.
Meikle, G. (2002). Future active: Media activism and the Internet. Routledge.
Monmonier, M. (1996). How to lie with maps. Chicago, IL: University of Chicago Press.
Nogueira, K., Penatti, O. A., & Dos Santos, J. A. (2017). Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognition, 61, 539–556.
Ofli, F. et al. (2016). Combining human computing and machine learning to make sense of big (aerial) data for disaster response. Big Data, 4(1), 47–59.
Poole, D., Mackworth, A., & Goebel, R. (1998). Computational intelligence: A logical approach. Oxford: Oxford University Press.
Qiu, C., et al. (2019). Local climate zone-based urban land cover classification from multi-seasonal Sentinel-2 images with a recurrent residual network. ISPRS Journal of Photogrammetry and Remote Sensing, 154, 151–162.
Recknagel, F. (2001). Applications of machine learning to ecological modelling. Ecological Modelling, 146(1–3), 303–310.
Russell, S. J., & Norvig, P. (2016). Artificial intelligence: A modern approach. Malaysia: Pearson Education Limited.
Salimans, T. et al. (2016). Improved techniques for training GANs. In Advances in neural information processing systems (pp. 2234–2242).
Samuel, A. (1959). Some studies in machine learning using the game of checkers. Reprinted in E. A. Feigenbaum & J. Feldman (Eds.) (1963), Computers and thought. McGraw-Hill.
Selwyn, N. (2012). Making sense of young people, education and digital technology: The role of sociological theory. Oxford Review of Education, 38(1), 81–96.
Sirosh, J. (2018). Microsoft and Esri launch Geospatial AI on Azure [online]. Microsoft Azure. Retrieved Dec 3, 2018, from https://azure.microsoft.com/en-us/blog/microsoft-and-esri-launchgeospatial-ai-on-azure/.
Tripathi, S., Srinivas, V., & Nanjundiah, R. S. (2006). Downscaling of precipitation for climate change scenarios: A support vector machine approach. Journal of Hydrology, 330(3–4), 621–640.
Xu, C., & Zhao, B. (2018). Satellite image spoofing: Creating remote sensing dataset with generative adversarial networks (short paper). In 10th International conference on geographic information science (GIScience 2018).
Zhang, S., Zhao, B., & Ventrella, J. (2018). Towards an archaeological-ethnographic approach to big data: Rethinking data veracity. In Ethnographic Praxis in Industry Conference Proceedings (pp. 62–85).
Zhang, X., Karaman, S., & Chang, S.-F. (2019). Detecting and simulating artifacts in GAN fake images. arXiv preprint arXiv:1907.06515.



Zhao, B., & Sui, D. Z. (2017). True lies in geospatial big data: Detecting location spoofing in social media. Annals of GIS, 23(1), 1–14.
Zhao, B., & Zhang, S. (2019). Rethinking spatial data quality: Pokémon Go as a case study of location spoofing. The Professional Geographer, 71(1), 96–108.
Zhu, X. X., Tuia, D., Mou, L., Xia, G. S., Zhang, L., Xu, F., et al. (2017). Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geoscience and Remote Sensing Magazine, 5(4), 8–36.
Zou, Q., Ni, L., Zhang, T., & Wang, Q. (2015). Deep learning based feature selection for remote sensing scene classification. IEEE Geoscience and Remote Sensing Letters, 12(11), 2321–2325.

Chapter 20

A Complex-Network Perspective on Alexander’s Wholeness
Bin Jiang

Nature, of course, has its own geometry. But this is not Euclid’s or Descartes’ geometry. Rather, this geometry follows the rules, constraints, and contingent conditions that are, inevitably, encountered in the real world. Alexander et al. (2012).

B. Jiang, Faculty of Engineering and Sustainable Development, Division of GIScience, University of Gävle, 801 76 Gävle, Sweden
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_20

20.1 Introduction
Nature, or the real world, is governed by immense orderliness. The order in nature is essentially the same as the order in what we build or make, and the underlying order-creating processes of building or making in architecture and design are no less important than those of physics and biology. This is probably the single most important statement made by Alexander (2002–2005) in his theory of centers, in which he addressed the fundamental phenomenon of order, the processes of creating order, and even a new cosmology: a new conception of how the physical universe is put together. In the theory of centers, or living geometry (Alexander et al. 2012), wholeness captures the meaning of order and is defined as a life-giving or living structure that appears to some degree in every part of space and matter; see Sect. 20.3 for an introduction to wholeness and related terms. As the building blocks of wholeness, centers are identifiable coherent entities or sets that overlap and nest within each other inside a larger whole. Unlike earlier conceptions of wholeness that focused on the gestalt of things (Köhler 1947), the wholeness of Alexander (2002–2005) is not just about cognition and psychology, but something that exists in space and matter. And unlike the wholeness of quantum physics, which serves mainly for understanding (Bohm 1980), Alexander’s wholeness aims not only to understand the phenomenon of order, but also to create order in the built world or in art. The wholeness is defined as a recursive structure. Based on this definition, Jiang (2015b) developed a mathematical model of wholeness as a hierarchical graph, with indices for measuring the degree of life or beauty of both individual centers and the whole. This model helps address not only why a design is beautiful, but also how much beauty it has. However, that study left some fundamental issues concerning the notions of centers and wholeness unaddressed. Specifically, what are the centers, how are they created, and how do they work together to contribute to the life of the wholeness? In addition, wholeness remains somewhat mysterious, particularly within our current mechanistic worldview (Alexander 2002–2005). To address these fundamental issues, this paper develops a complex-network perspective on wholeness. A complex network is a graph consisting of numerous nodes and links, with unique structures that differentiate it from its simple counterparts, such as regular and random networks (Newman 2010). Simple networks have a simple structure. In a regular network, all nodes have a uniform degree of connectivity. In a random network, the degree of connectivity varies only slightly from one node to another. As a consequence, a random network hardly contains any clusters, let alone overlapping or nested ones. In contrast, complex networks, such as small-world and scale-free networks (Watts and Strogatz 1998; Barabási and Albert 1999), tend to contain many overlapping and nested clusters that constitute a scaling hierarchy (Jiang and Ma 2015; see a working example in Sect. 20.2). The scaling hierarchy is a distinguishing feature of complex networks, or of complex systems in general. For example, a city is a complex system, and so is a set of cities (Jacobs 1961; Alexander 1965; Salingaros 1998; Jiang 2015c); both exhibit the scaling hierarchy seen in many other biological, social, informational, and technological systems.
This paper demonstrates that wholeness bears the same scaling hierarchy as complex networks, or complex systems in general. Relying on the complex-network perspective, this paper aims to demonstrate that wholeness is not just in cognition and psychology, but something that exists in space and matter. It also aims to show that the concept of wholeness is important not just for understanding the phenomenon of order, but also for creating order with a high degree of wholeness through two major structure-preserving transformations: differentiation and adaptation. I argue that there are major differences between the whole as a vague term and wholeness as a recursive structure. Wholeness comprises recursively defined centers induced by itself, whereas the whole, as we commonly perceive it, comprises pre-existing parts. The mantra that the whole is more than the sum of its parts would be more accurately rephrased as: wholeness is more than the sum of its centers. This paper examines the notions of wholeness and centers from the perspective of complexity science. It also discusses two types of coherence, created respectively by differentiation and adaptation processes, which are consistent with the spatial properties of heterogeneity and dependence for understanding the nature of geographic space. The remainder of this paper is structured as follows. Section 20.2 briefly introduces complex networks and scaling hierarchy using head/tail breaks, a classification scheme and visualization tool for data with a heavy-tailed distribution (Jiang 2013a, 2015a). Section 20.3 compares related concepts, such as whole and parts versus

20 A Complex-Network Perspective on Alexander’s Wholeness


wholeness and centers, and discusses the theory of centers using two examples of a cow and an IKEA desk. Section 20.4 presents three case studies to show how wholeness emerges from space and how it can be generated through the two major structure-preserving transformations of differentiation and adaptation. Section 20.5 further discusses implications of the complex-network perspective and wholeness. Finally, Sect. 20.6 draws a conclusion and points to future work.
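The roadmap above names head/tail breaks as the classification scheme used in Sect. 20.2. It can be sketched in a few lines of Python, under two assumptions:

- The data are split at their arithmetic mean.
- Recursion continues while the head stays below 40 percent of the values being split.

The 40 percent threshold, the function names, and the second (toy) data set are illustrative assumptions, not a transcription of Jiang (2013a).

```python
from typing import List, Sequence


def head_tail_breaks(values: Sequence[float], head_limit: float = 0.4) -> List[float]:
    """Return the break values found by head/tail breaks.

    Each iteration splits the current data at its arithmetic mean into a head
    (values above the mean) and a tail (the rest). The split is kept, and
    repeated on the head, only while the head remains a minority, i.e. its
    share of the current data is below head_limit.
    """
    breaks: List[float] = []
    data = list(values)
    while len(data) > 1:
        m = sum(data) / len(data)
        head = [v for v in data if v > m]
        if not head or len(head) / len(data) >= head_limit:
            break
        breaks.append(m)
        data = head
    return breaks


def classify(values: Sequence[float], breaks: Sequence[float]) -> List[int]:
    """Class index per value: 0 for the lowest tail, len(breaks) for the top head."""
    return [sum(v > b for b in breaks) for v in values]


# Community sizes of the Karate Club network as listed in Sect. 20.2.
sizes = [28, 15, 10, 6, 5, 5, 3, 2, 2, 2, 1, 1, 1, 1]
brks = head_tail_breaks(sizes)
print(brks)                   # one break, at the mean 82 / 14
print(classify(sizes, brks))  # the four largest communities form the head

# A more strongly heavy-tailed toy example yields two nested heads.
toy = [1] * 20 + [10] * 4 + [100]
print(head_tail_breaks(toy))  # [6.4, 28.0]
```

The number of resulting classes, `len(breaks) + 1`, indicates how many times the pattern of far more small values than large ones recurs in the data.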

20.2 Complex Networks and the Underlying Scaling Hierarchy
Small-world and scale-free networks are two typical examples of complex networks, which fundamentally differ from their regular and random counterparts. A small-world network lies midway between regular and random networks, so it inherits desirable properties of both: the local efficiency of regular networks, characterized by a high clustering coefficient, and the global efficiency of random networks, measured by a short average path length (Watts and Strogatz 1998). Scale-free networks are a special type of complex network whose degree of connectivity follows a power-law distribution, indicating far more less-connected nodes than well-connected ones (Barabási and Albert 1999). In other words, very few nodes, or hubs, have the highest degree of connectivity, many nodes have the lowest degree, and some lie in between. Complex networks are so called precisely because of their complex structure, which involves a large number of nodes and components. The nodes are the basic units, while the components are built from the basic units. Complex networks tend to contain many components, termed communities or clusters (Newman 2004). A cluster has many internal links and few external links, so it constitutes a coherent sub-whole or sub-structure that adapts to its context. Clusters within a complex network nest within one another, forming a scaling hierarchy of far more small communities than large ones (Jiang and Ma 2015). To illustrate, we adopt the Karate Club network (Zachary 1977), widely studied in the literature of social and complex networks. This network, consisting of 34 nodes and 78 links, can be broken down into 14 communities of different sizes: 28, 15, 10, 6, 5, 5, 3, 2, 2, 2, 1, 1, 1, and 1 (Fig. 20.1). Clearly, there are far more small communities than large ones. There are also many nested relationships.
For example, the community of size 28 contains a community of size 15, which is further broken down into communities of sizes 6, 5, 2, 1, and 1. The scaling hierarchy is formed from the clusters, which are similar to human organs that function as independent coherent parts but still fit into the context of the human body. More generally, a complex network is not assembled from mechanical parts, but is more like a tree or a human body growing or unfolding from a seed or an embryo. From the design point of view, clusters as coherent parts are integrated into their contexts as a whole (Alexander 1964). This introduces the notion of adaptation or fit, which recurs between adjacent elements and systems, and



Fig. 20.1 The Karate Club network broken into mutually nested communities (Note Panel a shows the social network, consisting of 34 nodes and 78 links. Panel b depicts nested relationships of the 14 communities of sizes 28, 15, 10, 6, 5, 5, 3, 2, 2, 2, 1, 1, 1 and 1 with far more small communities than large ones. These communities were detected by an algorithm inspired by head/tail breaks (Jiang 2013a, 2015b))

helps create harmony or coherence at individual local scales. Clusters resemble the concept of centers (see Sect. 20.3 for more details) as the building blocks of wholeness. The underlying scaling hierarchy is an important property of complex networks, or of complex systems in general (Simon 1962). It appears in a variety of natural and societal complex systems, such as social, biological, technological, and informational ones. This scaling hierarchy is not tree-like, but rather a complex network with many redundant and overlapping links. This insight was originally observed in naturally evolved cities, as Alexander (1965) nicely articulated in the classic A City is Not a Tree. He used the term semi-lattice to refer to the scaling hierarchy simply because the concept of complex networks was not known then. In this respect, Alexander was far ahead of his time, long before complex networks were understood. More importantly, his thinking is not limited to understanding existing things, as current complex network theory is, but aims to create things with living structure (i.e., buildings, communities, cities, or artifacts). The scaling hierarchy can also be seen in the relevance or importance of individual nodes within a complex network. Google’s PageRank (Page and Brin 1998) computes the relevance or importance of nodes, and the results show far more less-important nodes than more-important ones. PageRank is recursively defined and resembles wholeness as a recursive structure. This will be discussed further in the following section.
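The recursive definition of PageRank mentioned above can be made concrete with a short power iteration. The pure-Python function below is a minimal sketch, not Google's production algorithm or the computation behind Fig. 20.1. It makes two assumptions:

- The damping factor is the conventional 0.85.
- The score of nodes without out-links is spread evenly over all nodes.

The five-node graph is a hypothetical example.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-12, max_iter: int = 1000) -> Dict[str, float]:
    """PageRank by power iteration on a graph given as out-link lists.

    A node's score is defined recursively: it is fed by the scores of the
    nodes linking to it, each split evenly over that node's out-links. Nodes
    without out-links spread their score evenly over all nodes, and a
    (1 - damping) share is always spread uniformly, so scores sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for v in nodes:
            out = graph[v]
            for w in out:
                new[w] += damping * rank[v] / len(out)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical undirected graph (each link listed in both directions):
# "a" is a hub, "b" and "c" form a small cluster with it,
# and "d" and "e" hang off the hub.
toy = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
    "e": ["a"],
}
scores = pagerank(toy)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Each score depends on the scores of neighboring nodes, so the computation has to be repeated until it settles. This is the sense in which the text likens PageRank to wholeness as a recursive structure.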

20.3 The Wholeness and the Theory of Centers
The general idea of wholeness, or seeing things holistically, can be traced back to the Chinese philosopher Chuang Tzu (360 BC), who saw the structure of a cow as a complex whole, in which some parts were more connected (or coherent) than others.



The butcher who understands the cow’s structure always cuts through the soft spots and crevices of the meat, so that the meat falls apart according to its own structure. The butcher can therefore keep his knife sharp for a hundred years. In the 20th century, wholeness was extensively discussed by many prominent writers in Gestalt psychology (Köhler 1947), quantum physics (Bohm 1980), and many other sciences, such as biology, neurophysiology, medicine, cosmology, and ecology. However, none of these writers prior to Alexander (2002–2005) showed how to represent or formulate wholeness in precise mathematical language. According to the Merriam-Webster dictionary, a whole is something complete, without any missing parts, whereas wholeness is the quality of something considered as a whole. However, both whole and wholeness have deeper meanings and implications in the theory of centers (Alexander 2002–2005). In particular, the notion of wholeness is so subtle and profound that Alexander (1979) previously referred to it as ‘the quality without a name’. He struggled with different names, such as alive, comfortable, exact, egoless, and eternal, but none of these captured the true meaning of the quality. In this paper, as in Alexander (2002–2005), the three terms wholeness, life, and beauty are used interchangeably when appropriate to indicate order or coherence. Things with a high degree of wholeness are called living structure. Wholeness is defined as a recursive life-giving structure that exists in space and matter, and it can be described in precise mathematical language (Alexander 2002–2005). Although whole and wholeness seem different, they are closely related and sometimes refer to the same thing. The whole is on the surface and is referred to informally, while wholeness is below the surface and is referred to formally (Table 20.1). The term whole can be both a noun and an adjective. For example, a cow is a whole, and a cow is more whole than a desk.
Wholeness, on the other hand, has two different meanings: a recursive structure, and a measure of the degree of wholeness or coherence. A whole is a relatively coherent spatial set and is easy to see, because it appears on the surface. For example, the whole of a cow is the cow itself. Wholeness as a recursive structure is hard to see, since it lies deeper, consisting of atoms, molecules, cells, tissues, and organs that form a scaling hierarchy. It is usually difficult to sense that one thing has a higher degree of wholeness, or is more whole, than another. However, things assembled from parts are often less whole than things grown from embryos. For example, a cow has a higher degree of wholeness than a desk. We can also compare the two related terms parts and centers, of which the whole and wholeness, respectively, consist. Parts usually pre-exist in the whole and refer to mechanical pieces that are easy to see because they appear on the surface; they are non-recursive and simple. Centers are mainly created by the wholeness and refer to organic pieces that are hard to see because they exist deeper below the surface; they are recursive and complex. The differences between parts and centers reflect two different worldviews: the mechanistic, to which we are accustomed, and the organic, which underlies Alexander’s radical thought. To further elaborate on wholeness, let us examine which of the two carpets in Fig. 20.2 is more whole, or has a higher degree of wholeness. Both carpets possess a high degree of wholeness, but the left one has a higher degree, or is more whole, than the one on the right. This phenomenon of wholeness can be captured



Table 20.1 Comparison of whole and wholeness using two examples of a cow and an IKEA desk

Whole (noun + adjective) | Wholeness (structure + measure)
On the surface | Deeper below the surface
Consists of parts (see below subsection) | Consists of centers (see below subsection)
Easy to see, the whole of a cow is the cow itself | Hard to see as a recursive structure, the wholeness of a cow
Hard to sense, one thing is more whole than another | Hard to sense as the degree of coherence, somewhat like temperature
Easy to sense, a cow is more whole than a desk | Easy to sense, a cow has a higher wholeness than a desk
A whole assembled from parts like a desk | A desk has low wholeness or low coherence
A whole unfolded from an embryo like a cow | A cow has high wholeness or high coherence

Parts | Centers
Pre-existing in the whole | Induced by the wholeness
On the surface | Deeper below the surface
Easy to see | Hard to see
Mechanical | Organic
Non-recursive | Recursive
Simple | Complex

Fig. 20.2 Two carpets with different degrees of wholeness (Alexander 1993) (Note The left is more whole than the right, or the left has a higher degree of wholeness than the right)

through the mirror-of-the-self experiment (Alexander 2002–2005). Two objects, or their images, are placed side by side in front of you, and you are asked to choose the one that better represents a picture of your own deepest or truest self as a whole. The experiment is not about choosing the one you prefer, since preference is likely to be idiosyncratic and accounts for 10 percent of our feelings. Instead, you must pick the one that reflects your own inner self. This part of human experience, accounting for 90 percent of our feelings,

Table 20.2 The 15 fundamental properties of wholeness

Levels of scale | Good shape | Roughness
Strong centers | Local symmetries | Echoes
Thick boundaries | Deep interlock and ambiguity | The void
Alternating repetition | Contrast | Simplicity and inner calm
Positive space | Gradients | Not separateness

is shared among people regardless of their faiths, ethics, and cultures (Alexander 2002–2005). Wholeness exists both physically in the world and psychologically inside the human self. Physically, the phenomenon of wholeness is characterized by 15 fundamental properties, such as levels of scale, strong centers, boundaries, and local symmetries (see Table 20.2). The left carpet exhibits more of the 15 properties than the right, or more of the mirror-of-the-self qualities. The mirror-of-the-self experiment is somewhat like relying on our senses to compare two temperatures: our feelings cannot provide a precise measurement of temperature. Equally, it is usually hard to compare the wholeness of two complex things, such as a cow and a tree. In this regard, the mathematical model of wholeness (Jiang 2015b), which relies on a hierarchical graph to represent wholeness, can accurately measure the degree of wholeness or life. The model is essentially a complex-network perspective on wholeness.

20.4 The Wholeness from a Complex-Network Perspective

This section presents three case studies to show how wholeness is represented as a complex network of its centers, and how centers are created by wholeness. We begin with the simplest case, a sheet of paper with a tiny dot, followed by studies of the Alhambra plan and the Sierpinski carpet. Through these case studies, we demonstrate that wholeness exists in space. More importantly, we elaborate on how a whole, or wholeness, can be created step by step by following the 15 fundamental properties, or structure-preserving transformations.

20.4.1 A Paper with a Tiny Dot

Wholeness is very subtle, but also very concrete. To discuss wholeness, Alexander (2002–2005) presented a simple example involving a blank sheet of paper with a tiny dot (Panels a and b of Fig. 20.3). I use the same example to examine how wholeness emerges and changes before and after a tiny dot is placed on the paper. A blank sheet of paper is a whole, which is easy to see. The wholeness of the blank paper is not hard to


B. Jiang

Fig. 20.3 A simple wholeness that emerges from a blank sheet of paper with a dot placed on it (Note The dot on the sheet of paper creates at least 20 entities, listed according to their relative strength: (1) the paper itself, (2) the dot, (3) the halo, (4) the bottom rectangle, (5) the left-hand rectangle, (6) the right-hand rectangle, (7) the top rectangle, (8) the top left corner, (9) the top right corner, (10) the bottom left corner, (11) the bottom right corner, (12) the ray going up, (13) the ray going down, (14) the ray going left, (15) the ray going right, (16) the white cross formed by these four rays, (17) the diagonal ray to the bottom right corner, (18) the diagonal ray to the bottom left corner, (19) the diagonal ray to the top right corner, and (20) the diagonal ray to the top left corner)

imagine, because it is quite simple, consisting of the four corners and the gravitational center of the paper. However, after a tiny dot (no more than 0.0001 of the sheet) is placed on it, the whole remains largely unchanged on the surface, but its wholeness changes dramatically. There are at least 20 latent centers, such as a halo around the dot, the four rectangles around the dot, the four corners, and different rays from the dot to the corners (Panels c–h of Fig. 20.3). Among these centers, only the paper and the dot pre-exist; all others are induced by the overall configuration of the space, or wholeness. As seen in Fig. 20.3, these centers are real, not merely cognitive or psychological. There are also supporting relationships among the centers, which together constitute a complex network as a whole (Fig. 20.4a). The whole, or the overall configuration of the space, is the source of the centers’ strength, as indicated by the dot sizes. The paper with the dot is more whole than the blank paper. The wholeness of the original blank paper constitutes a simple network containing only five nodes (Fig. 20.4b), whereas the network representing the wholeness of the paper with the dot is far more complex, or whole (Fig. 20.4a). First, the whole of the blank paper contains only five nodes, whereas the whole of the paper with the dot contains 20 nodes. Second, the 20 nodes form a striking scaling hierarchy, i.e., there are far more less-connected nodes than well-connected ones. Third, the


Fig. 20.4 The networks of the wholeness that emerge from (a) the paper with the dot and (b) the blank paper (Note The first network consists of 20 nodes, while the second consists of only five nodes, both with dot sizes indicating their strength. The paper with the dot is clearly more whole, or has a higher degree of wholeness, than the blank paper)

20 nodes constitute different communities, such as the four corners, the four rays, and the top three nodes. As seen in Fig. 20.4a, the strongest center, number 1, is followed by centers 2, 3, and so on. There are several sub-wholes, such as centers 8, 9, 10, and 11 as corners; 12, 13, 14, and 15 as rays; and 17, 18, 19, and 20 as diagonal rays. These sub-wholes constitute implicit relationships within the complex network. Unlike the complex networks presented in Sect. 20.2, the networks of wholeness are directed. As a rule, weak centers tend to support strong ones, as shown in Fig. 20.4. This support relationship is cumulative or iterative. This is why center 1 is the strongest: it has nine in-links, and the centers pointing to it are themselves reinforced by their own in-links in an iterative manner. Although center 2 also has nine in-links, it is weaker than center 1 because of the iterative nature of the support relationships. Centers 16, 17, 18, 19, and 20 are the weakest, because they have only out-links. The hierarchy shown in Fig. 20.4a is very steep, while the one in Fig. 20.4b is very flat.
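The cumulative, iterative support described above can be illustrated with a PageRank-style computation. The sketch below is an assumption, not the author’s implementation: it supposes that the degree of wholeness of each center can be approximated by its PageRank score (Page and Brin 1998, listed in the references) on a directed graph whose links point from supporting centers to the centers they support, in the spirit of the hierarchical-graph model of Jiang (2015b). The function name, the five-node graph, and its node names are hypothetical and do not reproduce the 20-node network of Fig. 20.4a.

```python
# Minimal PageRank sketch (power iteration), assuming the degree of wholeness
# is approximated by PageRank on a directed "support" graph. Hypothetical data.

def pagerank(nodes, edges, damping=0.85, tol=1e-12, max_iter=1000):
    """Return a dict node -> score; the scores sum to 1.

    edges: (source, target) pairs meaning `source` supports `target`.
    Dangling nodes (no out-links) spread their score evenly over all nodes.
    """
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        # Teleportation term plus the evenly spread dangling mass.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each supporting center passes its score, split evenly, to the centers it supports.
        for source in nodes:
            targets = out_links[source]
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy example: a halo and two corners support both the dot and
# the paper, and the dot supports the paper. The supporters have only out-links.
NODES = ["paper", "dot", "halo", "corner_tl", "corner_br"]
EDGES = [
    ("dot", "paper"),
    ("halo", "dot"), ("halo", "paper"),
    ("corner_tl", "dot"), ("corner_tl", "paper"),
    ("corner_br", "dot"), ("corner_br", "paper"),
]

if __name__ == "__main__":
    scores = pagerank(NODES, EDGES)
    for node in sorted(scores, key=scores.get, reverse=True):
        print(f"{node:10s} {scores[node]:.3f}")
```

With these hypothetical links, the paper ends up strongest (about 0.44), followed by the dot (about 0.24), while the three supporters, which have only out-links, share the lowest score (about 0.105 each). The paper is strengthened not only by its own in-links but also through the dot, which is itself supported, which mirrors qualitatively the iterative support described above. The damping factor 0.85 is the conventional PageRank default, not a value taken from this chapter.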

20.4.2 The Alhambra Plan

The Alhambra plan possesses many of the 15 fundamental properties, which help identify many latent centers. Thick boundaries, local symmetries, levels of scale, and strong centers are probably the most salient properties. There are at least 720 positive spaces or centers, and 880 relationships among them (Jiang 2015b; Fig. 20.5). Many more centers are created by the wholeness, or the overall configuration of the space. For example, the whole of the plan consists of many nested sub-wholes or centers defined


Fig. 20.5 The complex network of the 720 centers and 880 relationships of the Alhambra plan (Note It is essentially the density of the centers and their complex relationships that make the plan alive and beautiful, forming so-called living structure or structural beauty. Every center is well adapted to its surroundings, and there are far more small centers than large ones, which arises from continuous or iterative differentiation processes. The complex network as a whole is no less ordered than the rigid tree (Alexander 1965))

at many different levels. At the first level, it consists of three sub-wholes, which can be named the left, middle, and right sub-wholes. Each of the three sub-wholes is symmetrical and contains a further three sub-wholes, so there are nine sub-wholes at the second level. The left sub-whole comprises three sub-wholes: left, middle, and right. Each of the middle and right sub-wholes comprises three sub-wholes: top, middle, and bottom. The nine sub-wholes at the second level can be further broken down, recursively, into sub-wholes or centers at the third or fourth level. The whole and its numerous sub-wholes (or centers) constitute mutually reinforcing and supporting relationships, usually with small surrounding centers pointing to big central ones. A scaling hierarchy eventually forms, and it is the source of the life or beauty of the building complex. The plan lacks global symmetry, but it is full of local symmetries at different levels of scale, such as the three sub-wholes at the first level and the nine sub-wholes at the second level. These local symmetries are mainly created by numerous walls, or thick boundaries, one of the 15 properties. Each local symmetry adapts to its own local context or need to make a better space. Overall, the plan has a good shape, because it comprises many good shapes at different levels of scale in an iterative or recursive manner. Some of the shapes echo each other in the overall configuration of the space. Eventually, the large number of local symmetries and many of the other 15 properties recurring in the plan contribute a great deal to the life of the individual centers and of the plan as a whole. The plan was likely designed unselfconsciously (Alexander 1964), but no one can deny the presence of the 15 fundamental properties. In this regard, the plan can be thought of as generated by iteratively applying the 15 properties, namely as transformation properties, to the whole


by differentiation and adaptation in a step-by-step fashion. The space continues to be differentiated across a wide range of scales, from the smallest to the largest. Each step creates the context for the next one, and each wholeness develops, or more truly unfolds, from the previous one. This generation process underlies the structure-preserving or wholeness-extending transformations (Alexander 2002–2005).

20.4.3 The Sierpinski Carpet

The Sierpinski carpet, although one of many strictly defined fractals (Mandelbrot 1982), possesses a high degree of wholeness. The wholeness, or coherence, of the carpet can be assessed from two aspects: across all scales, and at each scale. Across all scales, the number of squares follows a power-law relationship with the levels of scale (Salingaros and West 1999), a relationship that should be generalized to far more small things than large ones (see Sect. 20.5 for further discussion of spatial heterogeneity). At each scale, all pieces are the same size, which should be generalized to more or less similar sizes (see Sect. 20.5 for further discussion of spatial dependence). To be more specific, there are three scales, 1/3, 1/9, and 1/27, which respectively contain one, eight, and 64 squares, giving a fractal dimension of log(8)/log(3) ≈ 1.89 (Mandelbrot 1982). The 73 squares at the three scales constitute a complex network as a whole (Panel b of Fig. 20.6). Within the complex network, there are two kinds of relationships: undirected ones among squares at the same scale, and directed ones between two consecutive scales. The surrounding centers usually support the central ones; in other words, the central ones are enhanced by neighboring centers. In addition to the support relationships, there are also nested relationships among the 73 induced centers (Panel c of Fig. 20.6). Because of these relationships, the central square has the highest degree of wholeness, which is accumulated from the eight middle squares and the 64 smallest squares. Eventually, the underlying

Fig. 20.6 A complex-network perspective on the Sierpinski carpet (Note (a) the carpet with three scales, (b) its complex network of 73 pre-existing centers, in which the dot sizes represent degrees of in-links, (c) nested relationships among the 73 created centers, and (d) the scaling hierarchy of the complex network, in which the dot size represents the strength of the 73 pre-existing centers, or their degrees of life)


scaling hierarchy of the Sierpinski carpet is shown as a tree structure (Panel d of Fig. 20.6). It should be noted that the whole is not a tree, but a complex network. Despite its high degree of wholeness, the Sierpinski carpet is not exactly the type of pattern we look for in practical designs; instead, we seek patterns like the Alhambra plan. There are several major differences between the two patterns. The Sierpinski carpet results from creation, while the Alhambra results from step-by-step generation. The Sierpinski carpet has both local and global symmetries, while the Alhambra lacks global symmetry and has only local symmetries. All centers (or squares) of the Sierpinski carpet are the same size at each scale, while they are only more or less similar in the Alhambra. All shapes are squares in the Sierpinski carpet, while the Alhambra has different shapes. In addition, the Sierpinski carpet is too rigid: the scaling ratio is precisely 1/3, and the number of squares increases by exactly a factor of eight between two consecutive scales. Nevertheless, the two patterns are the same at the fundamental level in terms of wholeness, or coherence at both local and global scales. The wholeness of the Alhambra can only be obtained through step-by-step creation or unfolding: differentiation and adaptation. In the course of design, a whole is continuously divided and differentiated to meet the scaling hierarchy, and newly added centers or scales are created to fit their local contexts. Design or making is much more challenging than understanding, but the 15 properties (Alexander 2002–2005; Salingaros 2013, 2015) provide guidance for structure-preserving transformations toward a whole with a higher degree of wholeness. Through these case studies, the seemingly abstract concepts of wholeness and centers become more concrete and visible, helping to demystify wholeness.
The wholeness comprises many recursively defined centers, and the centers are induced or created by the wholeness. For the sake of simplicity, we adopted these simple cases to illustrate the subtlety of the wholeness. The next section further discusses the implications of the complex network perspective to argue why the wholeness or living geometry is powerful and unique in terms of planning and repairing our environment or making the Earth’s surface more whole or more beautiful.
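The counts and the fractal dimension given for the Sierpinski carpet in Sect. 20.4.3 can be checked with a short script. The sketch below is illustrative and rests on assumptions of our own: the 73 squares are taken to be the central squares of a three-level carpet on a unit square, and only the directed links between consecutive scales are built, with each smaller square pointing to the larger square it surrounds. The undirected same-scale links and the nested relationships of Fig. 20.6 are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter
from fractions import Fraction


def sierpinski_network(levels=3):
    """Build the squares of a Sierpinski carpet and their cross-scale support links.

    Returns (squares, edges): squares is a list of (level, x, y, side) using
    exact fractions on a unit square; edges holds (smaller_id, larger_id)
    pairs, where a level-(k+1) square points to the level-k square it surrounds.
    """
    squares, edges = [], []
    # Cells still to subdivide: (x, y, side, id of the square they surround).
    cells = [(Fraction(0), Fraction(0), Fraction(1), None)]
    for level in range(1, levels + 1):
        next_cells = []
        for x, y, side, parent in cells:
            third = side / 3
            square_id = len(squares)
            # The central third of each cell is the square at this level.
            squares.append((level, x + third, y + third, third))
            if parent is not None:
                edges.append((square_id, parent))
            # The eight surrounding cells are subdivided at the next level.
            for i in range(3):
                for j in range(3):
                    if (i, j) != (1, 1):
                        next_cells.append((x + i * third, y + j * third, third, square_id))
        cells = next_cells
    return squares, edges


def fractal_dimension(squares):
    """Similarity dimension from the count ratio and size ratio of levels 1 and 2."""
    counts = Counter(level for level, *_ in squares)
    sides = {level: side for level, _, _, side in squares}
    count_ratio = counts[2] / counts[1]        # 8 in the standard carpet
    size_ratio = float(sides[1] / sides[2])    # 3 in the standard carpet
    return math.log(count_ratio) / math.log(size_ratio)


if __name__ == "__main__":
    squares, edges = sierpinski_network(3)
    counts = Counter(level for level, *_ in squares)
    print("squares per level:", [counts[k] for k in (1, 2, 3)])
    print("total squares:", len(squares), "cross-scale links:", len(edges))
    print("in-links of the central square:", Counter(t for _, t in edges)[0])
    print("fractal dimension:", round(fractal_dimension(squares), 2))
```

Running it reports 1, 8, and 64 squares per level (73 in total), 72 cross-scale links, eight in-links for the central square, and a fractal dimension of 1.89. Note that the dimension is log(8)/log(3), the logarithm of the count ratio over the logarithm of the inverse scaling ratio; using log(1/3) in the denominator would give a negative value.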

20.5 Implications of the Complex-Network Perspective and Wholeness

The complex-network perspective enables us to see things in their wholeness, rather than as parts or fragments. Complex networks provide a powerful means to further study wholeness and to understand the kind of problem a city is (Jacobs 1961): a city is essentially a problem of organized complexity, like those in biology. The notion that a city is more whole than another, or that a city has a high degree of wholeness, is the same as one city being more imageable or legible than another (Lynch 1960; Jiang 2013b). From the perspective of wholeness, or the theory of centers, it is the city itself, or its underlying living structure, that determines the image of the city. The image of the


city may vary from one individual to another, but there is a shared image of the city for all people. This shared image is the most interesting part for urban design theory, which has been persistently criticized for lacking a solid and robust scientific underpinning (Jacobs 1961; Marshall 2012). The complex-network perspective, or more truly wholeness itself, will inject scientific elements into urban planning and design. Metaphorically, the Sierpinski carpet is an image of the Earth’s surface. This is because there are far more small things than large ones across all scales (the spatial property of heterogeneity), and related things are more or less similar in magnitude at each scale (the spatial property of dependence). Heterogeneity and dependence are commonly referred to as spatial properties of geographic space, or the Earth’s surface (Anselin 1989; Goodchild 2004). The carpet is also an ideal metaphoric image of how our built environment should be made. It implies that any space ought to be continuously differentiated to retain the scaling hierarchy across all scales, ranging from the smallest to the largest, and that any building or city ought to be adapted to its natural and built surroundings at each scale. The two processes of differentiation and adaptation enable us to create a whole with a high degree of wholeness. In this regard, wholeness or the theory of centers would have enormous effects on geography, not only for better understanding geographic forms and processes, but also for planning and repairing geographic space or the Earth’s surface (Mehaffy and Salingaros 2015; Mehaffy 2007). Built environments must adapt to nature, and new buildings must adapt to their surroundings. The two spatial properties of heterogeneity and dependence constitute a true image of the Earth’s surface at both global and local scales.
Globally, spatial phenomena vary dramatically, with far more small things than large ones across all scales, while locally they tend to be dependent or autocorrelated, with more or less similar things nearby. These two properties are the source of two kinds of harmony or coherence, across all scales and at every scale respectively. For example, there are far more small cities than large ones across all scales globally (Zipf 1949), whereas nearby cities tend to be more or less similar, as described by central place theory (Christaller 1933, 1966). However, the geography literature has focused heavily on dependence, formulated as the first law of geography (Tobler 1970), but very little on heterogeneity. More critically, spatial heterogeneity is mainly defined for spatial regression with limited variation, governed by Gaussian thinking (Jiang 2015d). This understanding of spatial heterogeneity is flawed, given the fractal nature of geographic space and features. Spatial heterogeneity should be formulated as a scaling law because it is universal and global. Wholeness, or the theory of centers in general, also brings new perspectives to the science of complex networks. For example, the detection of communities in complex networks can benefit from viewing wholeness as a recursive structure. Instead of a flat hierarchy of community structures, a complex network contains numerous nested communities of different sizes, with far more small communities than large ones (Tatti and Gionis 2013; Jiang and Ma 2015). This insight about nested communities can be extended to classification and clustering. Current classification methods are suited to mechanical assemblies, which can be decomposed into separate mechanical parts. For


a complex entity such as a complex network, the parts often overlap and are nested inside each other. This resembles wholeness and centers as a recursive structure. From a dynamic point of view, a complex network evolves by itself through differentiation and adaptation. A complex network has the power to evolve toward more coherence at both global and local scales, and is self-organized from the bottom up, rather than imposed from the top down. This insight about self-organization reinforces Alexander’s piecemeal design approach of step-by-step unfolding or transformations toward a living structure with a high degree of wholeness.
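The scaling pattern of far more small things than large ones, which this section treats as the essence of spatial heterogeneity, can be checked on data with head/tail breaks (Jiang 2013a, listed in the references). The sketch below assumes a common formulation: split the values at their mean, keep the head (values above the mean), and repeat while the head remains a minority, here taken as less than 40 percent of the current values. The threshold, the function name, and both sets of city sizes are illustrative assumptions, not data from this chapter.

```python
from statistics import mean


def head_tail_breaks(values, head_limit=0.4):
    """Recursively split values at the mean while the head stays a minority.

    Returns the list of break values (the successive means). The number of
    resulting classes is len(breaks) + 1.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        m = mean(data)
        head = [v for v in data if v > m]
        # Stop once the head is no longer clearly a minority of the current data.
        if not head or len(head) / len(data) >= head_limit:
            break
        breaks.append(m)
        data = head
    return breaks


if __name__ == "__main__":
    # Hypothetical Zipf-like city sizes: the r-th largest city has size 1000 / r.
    zipf_sizes = [1000 / r for r in range(1, 101)]
    # Hypothetical evenly spread sizes, lacking far more small things than large ones.
    even_sizes = list(range(1, 101))
    for name, sizes in [("zipf", zipf_sizes), ("even", even_sizes)]:
        breaks = head_tail_breaks(sizes)
        print(name, "classes:", len(breaks) + 1, "breaks:", [round(b, 1) for b in breaks])
```

For the Zipf-like sizes the pattern of a small head and a long tail recurs, giving three classes with breaks near 51.9 and 186.7, whereas the evenly spread sizes stop at the first split and form a single class. The 40 percent threshold, and whether the comparison is strict, are modeling choices rather than fixed parts of the idea.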

20.6 Conclusion

This paper develops a complex-network perspective on wholeness to make it more accessible to both designers and scientists. I discussed and compared related concepts, such as whole and parts versus wholeness and centers, to make them explicitly clear. A whole is a relatively coherent spatial set, while wholeness is a life-giving structure: not something about the way things are seen, but something about the way they are (Alexander 2003, p. 14). Wholeness is made of centers rather than arbitrarily identified parts. The centers are created or induced by the wholeness and are made of other centers, rather than being just those pre-existing in the whole. Given these differences, the mantra that the whole is more than the sum of its parts should be more correctly rephrased as: the wholeness is more than the sum of its centers. I demonstrated that the complex-network perspective enables us to see things in their wholeness. More importantly, I elaborated on designing or making living structure through wholeness-extending transformations, or unfolding, in step-by-step differentiation and adaptation. Although understanding the nature of wholeness is essential, the ultimate goal of Alexander’s concept of wholeness, or living geometry, is to allow us to make artifacts and built environments with the same order and beauty as nature itself. The complex-network perspective enables us to develop new insights into planning and repairing geographic space. For example, differentiation and adaptation are two major processes for making living structures, and geographic space in particular. This study showed that these processes underlie the two unique properties of geographic space, spatial heterogeneity and spatial dependence, which are formulated respectively as the scaling law and the first law of geography. Globally, across all scales, there are far more small things than large ones.
In contrast, locally at every scale, things tend to depend on each other, with more or less similar sizes. These two spatial properties are the source of the two types of coherence at global and local scales, respectively. In this connection, the theory of centers, or living geometry, would contribute significantly to understanding and making geographic space. Our future work points in this direction: how to rely on wholeness-extending transformations to create geographic features with a high degree of wholeness.

Acknowledgements This chapter is a reprint of the journal paper (Jiang 2016) with permission of the publisher Elsevier. This work is substantially inspired by the life’s work of Christopher


Alexander, which aims to develop a solid and robust scientific underpinning for architecture. Unfortunately, the mainstream architecture profession has yet to grasp, or put to use, the more profound aspects of his thought. I would like to thank Michael Mehaffy for his constructive comments and Maggie Moore Alexander for sharing some of the unpublished manuscripts of Alexander. I would also like to thank the four anonymous referees for their useful comments, and Zheng Ren for helping with some of the figures.

References

Alexander, C. (1964). Notes on the synthesis of form. Cambridge, MA: Harvard University Press.
Alexander, C. (1965). A city is not a tree. Architectural Forum, 122(1+2), 58–62.
Alexander, C. (1979). The timeless way of building. Oxford: Oxford University Press.
Alexander, C. (1993). A foreshadowing of 21st century art: The color and geometry of very early Turkish carpets. New York: Oxford University Press.
Alexander, C. (2002–2005). The nature of order: An essay on the art of building and the nature of the universe. Berkeley, CA: Center for Environmental Structure.
Alexander, C. (2003). New concepts in complexity theory: Arising from studies in the field of architecture, an overview of the four books of The Nature of Order with emphasis on the scientific problems which are raised. http://natureoforder.com/library/scientific-introduction.pdf.
Alexander, C., Neis, H., & Alexander, M. M. (2012). The battle for the life and beauty of the earth. Oxford: Oxford University Press.
Anselin, L. (1989). What is special about spatial data: Alternative perspectives on spatial data analysis. Santa Barbara, CA: National Center for Geographic Information and Analysis.
Barabási, A., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
Bohm, D. (1980). Wholeness and the implicate order. London and New York: Routledge.
Christaller, W. (1933, 1966). Central places in southern Germany. Englewood Cliffs, NJ: Prentice Hall.
Goodchild, M. (2004). The validity and usefulness of laws in geographic information science and geography. Annals of the Association of American Geographers, 94(2), 300–303.
Jacobs, J. (1961). The death and life of great American cities. New York: Random House.
Jiang, B. (2013a). Head/tail breaks: A new classification scheme for data with a heavy-tailed distribution. The Professional Geographer, 65(3), 482–494.
Jiang, B. (2013b). The image of the city out of the underlying scaling of city artifacts or locations. Annals of the Association of American Geographers, 103(6), 1552–1566.
Jiang, B. (2015a). Head/tail breaks for visualization of city structure and dynamics. Cities, 43, 69–77. Reprinted in C. Capineri, M. Haklay, H. Huang, V. Antoniou, J. Kettunen, F. Ostermann, & R. Purves (Eds., 2016), European handbook of crowdsourced geographic information. London: Ubiquity Press.
Jiang, B. (2015b). Wholeness as a hierarchical graph to capture the nature of space. International Journal of Geographical Information Science, 29(9), 1632–1648.
Jiang, B. (2015c). A city is a complex network. In M. W. Mehaffy (Ed.), Christopher Alexander, A city is not a tree: 50th anniversary edition (pp. 89–98). Portland, OR: Sustasis Press.
Jiang, B. (2015d). Geospatial analysis requires a different way of thinking: The problem of spatial heterogeneity. GeoJournal, 80(1), 1–13. Reprinted in M. Behnisch & G. Meinel (Eds., 2017), Trends in spatial analysis and modelling: Decision-support and planning strategies (pp. 23–40). Berlin: Springer.
Jiang, B. (2016). A complex-network perspective on Alexander’s wholeness. Physica A: Statistical Mechanics and its Applications, 463, 475–484.


Jiang, B., & Ma, D. (2015). Defining least community as a homogeneous group in complex networks. Physica A: Statistical Mechanics and its Applications, 428, 154–160.
Köhler, W. (1947). Gestalt psychology: An introduction to new concepts in modern psychology. New York: Liveright.
Lynch, K. (1960). The image of the city. Cambridge, MA: The MIT Press.
Mandelbrot, B. (1982). The fractal geometry of nature. New York: W. H. Freeman and Co.
Marshall, S. (2012). Science, pseudo-science and urban design. Urban Design International, 17, 256–271.
Mehaffy, M. W. (2007). Notes on the genesis of wholes: Christopher Alexander and his continuing influence. Urban Design International, 12, 41–49.
Mehaffy, M. W., & Salingaros, N. A. (2015). Design for a living planet: Settlement, science, and the human future. Portland, OR: Sustasis Press.
Newman, M. E. J. (2004). Detecting community structure in networks. European Physical Journal B, 38, 321–330.
Newman, M. E. J. (2010). Networks: An introduction. Oxford: Oxford University Press.
Page, L., & Brin, S. (1998). The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the Seventh International Conference on World Wide Web (pp. 107–117).
Salingaros, N. A. (1998). Theory of the urban web. Journal of Urban Design, 3, 53–71.
Salingaros, N. A., & West, B. J. (1999). A universal rule for the distribution of sizes. Environment and Planning B: Planning and Design, 26(6), 909–923.
Salingaros, N. A. (2013). Algorithmic sustainable design: Twelve lectures on architecture. Portland, OR: Sustasis Press.
Salingaros, N. A. (2015). Unified architectural theory: Form, language and complexity. Kathmandu, Nepal: Vajra Books.
Simon, H. A. (1962). The architecture of complexity. Proceedings of the American Philosophical Society, 106, 468–482.
Tatti, N., & Gionis, A. (2013). Discovering nested communities. In H. Blockeel, K. Kersting, S. Nijssen, & F. Železný (Eds.), Machine learning and knowledge discovery in databases (pp. 32–47).
Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(2), 234–240.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393, 440–442.
Zachary, W. W. (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33(4), 452–473.
Zipf, G. K. (1949). Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley.

Chapter 21

Spatial-Temporal Behavior Analysis in Urban China

Suhong Zhou and Yinong Peng

S. Zhou (corresponding author)
School of Geography and Planning, Guangdong Provincial Engineering Research Center for Public Security and Disaster, Sun Yat-Sen University, Guangzhou 510275, China

Y. Peng
School of Geography and Planning, Guangdong Provincial Engineering Research Center for Public Security and Disaster, Guangzhou Urban Planning and Design Survey Research Institute, Guangzhou 510275, China

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_21

21.1 Introduction

Behavior study has a long history in social science and humanities research. It emphasizes individuals’ subjective cognition and choices: how they make behavioral decisions and what the environmental consequences are. Therefore, as a medium linking individuals and the urban environment, it helps people understand urban structure from a new angle. In the 1960s, behavioral geography emphasized the role of the individual and explained the formation and development of space from a micro perspective, becoming the theoretical foundation of research on the interaction between space and behavior (Chai et al. 2017). In the 1970s, Hägerstrand put forward the theoretical framework of time geography (Hägerstrand 1975), discussing the relationship between individuals and environmental space-time processes. Since then, spatial-temporal behavior analysis has become an effective way to understand the mechanisms of cities. Space and time are two important dimensions of individual behavior. When examining individual behavior, visualization, geo-computing, spatial forecasting, and other spatial synthesis methods help us understand how humans conceive of space, the laws of human behavior, the effects of the urban environment, and so on. In particular, the development of internet technologies and applications generates massive spatial-temporal behavior trajectory data, which makes it possible to study human


S. Zhou and Y. Peng

Fig. 21.1 Hot keywords and their relations in different years (Source Gu et al. 2013)

behavior in a more dynamic and integrated way. Spatial social science and humanities research has shifted from a data-scarce to a data-rich environment, calling for more computer-assisted methods. To summarize research progress, Gu et al. (2013) set up a database containing 2263 related papers indexed in the Web of Science from 1999 to 2013, together with 78,139 papers cited by them, to explore the trends and hot issues in the field of spatial-temporal behavior objectively. Using this database, they visualized historical citation relationships, literature hot spots, and keywords. Figure 21.1 shows that the main research fields were Geography, Transportation and Land Use, as well as GIS and Computer Science. The most frequent keywords gradually shifted from describing space-time behavior to interpreting it. The implications of spatial-temporal behavior for social equality and social segregation were explored after the 2000s, extending the research toward the integration of behavior and social issues. After 2012, “trajectories” appeared as a new hot keyword in this field. Since then, new methods and new spatial-temporal big data have been widely applied in spatial-temporal behavior research. Spatial-temporal behavior analysis has attracted wide attention in China since the late 20th century and has developed rapidly over the past twenty years. Multiple kinds of activities, such as living, working (Sun and Wei 2014), shopping (Chai et al. 2008), and recreation, have been studied, revealing many spatial-temporal patterns and the economic and social mechanisms of cities.

21 Spatial-Temporal Behavior Analysis in Urban China


In the initial period of Chinese behavioral geography research in the 1990s, articles mainly focused on literature reviews and introductions to related theories, frameworks, and methods, trying to understand cities from a micro perspective and to analyze urban issues through individual behavior (Chai et al. 2013); some studies also described the spatial-temporal characteristics of different groups of people. For example, the shopping activity of residents in Tianjin was analyzed (Wu et al. 2000), and the characteristics of recreation activity in Dalian were compared between weekdays and weekends (Li and Chai 1999). In the 21st century, the rapid development of urban transportation and environment has brought many new opportunities and problems, making the development mechanisms of Chinese cities differ considerably from those of Western cities. Chinese geographers began to develop Western space-time geography theory to answer new questions in urban China. With the rapid development of the internet, Zhen et al. (2009) discussed the guiding and substitution effects of the internet on individual travel behavior. Sun et al. (2009) studied the e-shopping behavior of Shenzhen residents and found that e-shopping activity is closely related to the distance between their homes and retail centers. Faced with a large increase in residents’ recreation demand, Zhang and Li (2006) studied the shopping preferences and location choices of Beijing residents. Topics related to social equity have also been discussed in behavior research. Gender differences in individual recreation behavior were analyzed, showing that the quality of women’s recreation activity is lower than that of men’s (Huang and He 2007). In the 2010s, improvements in economic and social status led to a rapid increase in individual mobility and in demand for multiple activities. Residents’ behavior in urban China is becoming increasingly complex. In this period, different kinds of data have been used in spatial-temporal behavior analysis.
Previous behavioral studies were limited by the scale, accuracy or timeliness of the available statistical data. The development of location-based services has brought big data to behavior research: cell phone data, ID card data and other data carrying spatial-temporal information help to analyze individual behavior more precisely and effectively. In addition, the activity diary, a traditional method for recording individuals' 24-hour activities, is now combined with GPS services to provide more precise data (Chai and Ta 2013). Different methods of measuring individual behavior with big data have been summarized, and many studies have used big data to analyze human behavior (Liu et al. 2014; Zhou 2015). For consumption behavior, Wang et al. (2015) used cell phone signaling data to study the shopping behavior of Shanghai residents and then delineated the market areas of three retailing centers. Qin et al. (2014) used data from the Dianping website, a volunteered geographic information platform, to study the pyramid pattern of restaurants in Nanjing through recreational activity. Long et al. (2012) used bus smart card data to identify typical commuting patterns in Beijing. For travel behavior, Zhang et al. (2015) analyzed road network capacity based on smartphone data. After 30 years of development, spatial-temporal behavior research in China has become increasingly comprehensive, producing a series of studies closely tied to China's social context and covering different kinds of behavior such as working, consumption and travel. According to their research context, these studies can be divided into three main research fields (Fig. 21.2). Spatial-temporal agglomeration discusses the spatial-temporal characteristics of a single activity, while spatial-temporal correlation focuses on the relationships among several activities. Space-time process further examines how these relationships unfold over space and time. The combination of spatial-temporal patterns and behavior classification forms the research network of spatial-temporal behavior in China. In this chapter, we summarize the work that has been done within this framework.

S. Zhou and Y. Peng

Fig. 21.2 Spatial temporal analytical framework in Urban China

21.2 Spatial-Temporal Agglomeration and Heterogeneity

Spatial agglomeration and heterogeneity are two core topics in geography for describing a specific kind of behavior. They concern the distribution of similar or different groups of people and lead to further research issues such as social structure, industrial structure and urban spatial structure. Previous studies focused mainly on spatial effects and seldom considered time. With the introduction of time geography and new methods, it is now possible to consider temporal effects as well when studying individual behavior. In the studies below, urban structure is therefore analyzed more accurately from the perspective of consumption behavior and outdoor behavior.


21.2.1 Spatial-Temporal Agglomeration and Urban Structure

The study of intra-urban spatial structure has attracted wide attention in urban studies for decades. Most of the relevant literature to date has used comparatively static data, such as resident questionnaires, economic censuses and land use data, which are limited in revealing the actual spatial structure (Nong et al. 2017). Because urban structure is deeply shaped by socioeconomic activity, it reflects both agglomeration and heterogeneity. For example, residents' spatial-temporal patterns of outdoor behavior can reveal interesting characteristics of retailing centers. The following case study comes from our recent work (see Nong et al. 2017). Using a large amount of GPS-enabled taxi data, the research studies the structure of retailing centers in Guangzhou, China: the boundaries of retailing centers are delimited and their hierarchical characteristics are analyzed. The analysis used taxi GPS positioning data from May 2, 2009, a national holiday. The data record the latitude, longitude and vehicle status of each taxi every 20 seconds, which effectively shows whether the taxi is carrying a passenger. By recording the first location with "pick-up" status as the origin and the last location before the "drop-off" as the destination, each passenger trip can be identified. Based on passengers' destinations, kernel density estimation is used to detect hot spot areas. Since there is little commuting or business travel on a holiday, destination hot spots can be used to identify retailing centers. With a search radius of 1000 m, 4 high-level retailing centers are recognized; with a radius of 500 m, 11 more retailing centers are recognized. To check the reliability of these retailing centers, the traditional functional unit method is also applied to identify retailing centers.
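The origin/destination rule above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: it assumes each taxi's records are tuples (timestamp in seconds, latitude, longitude, occupied flag), and both this record layout and the function name `extract_trips` are hypothetical. An occupied run still open at the end of the records is dropped, because its drop-off was never observed.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (timestamp_s, lat, lon, occupied)
Record = Tuple[int, float, float, bool]
Location = Tuple[float, float]
Trip = Tuple[Location, Location]  # (origin, destination)


def extract_trips(records: List[Record]) -> List[Trip]:
    """Split one taxi's GPS records into passenger trips.

    The first record of each run of occupied records is the pick-up
    (origin); the last occupied record before the taxi becomes vacant
    again is the drop-off (destination).
    """
    trips: List[Trip] = []
    origin: Optional[Location] = None
    last: Optional[Location] = None
    for _, lat, lon, occupied in sorted(records):  # sort by timestamp
        if occupied:
            if origin is None:
                origin = (lat, lon)  # vacant -> occupied: passenger picked up
            last = (lat, lon)
        elif origin is not None:
            trips.append((origin, last))  # occupied -> vacant: passenger dropped off
            origin = last = None
    return trips  # an unfinished run at the end is deliberately discarded


# Records every 20 s for one taxi (made-up coordinates).
records = [
    (0, 23.10, 113.30, False),
    (20, 23.11, 113.31, True),
    (40, 23.12, 113.32, True),
    (60, 23.13, 113.33, True),
    (80, 23.14, 113.34, False),
    (100, 23.15, 113.35, True),
]
destinations = [dest for _, dest in extract_trips(records)]
print(destinations)  # [(23.13, 113.33)]
```

Pooling the destinations of all taxis gives the point set on which the kernel density hot spot detection is run.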
Based on the 2008 economic census data, the scale of entertainment sectors and the total number of employees are calculated. Comparing the boundaries of retailing centers identified from GPS-enabled taxi data with those from economic census data shows high consistency: 11 of the 15 retailing centers identified from taxi data are also identified by the traditional functional unit method. The overlap area accounts for 34.17% of the total retailing area identified from taxi data and 57.11% of that identified from census data, so the taxi-based retailing centers cover a larger total area. This is because the taxi-based boundaries rely mainly on consumer behavior, while the boundaries from the traditional method follow administrative community boundaries set by government policy. Among the 4 missing retailing centers, Yide Road is a wholesale commercial street with a 100-year history, and Changgang Road is a community center serving the nearby neighborhood. In general, identifying retailing centers from GPS taxi data is reliable in most cases, although bias may exist for retailing centers with small-scale commercial activities or special functions. On the basis of these retailing centers, their spatial-temporal hierarchy was further analyzed, indicating that the hierarchy of retailing centers in Guangzhou is pronounced and changes dynamically. By studying passengers' holiday shopping trips with taxi GPS data, and thereby analyzing the boundaries and hierarchy of retailing centers, the study shows that the spatial agglomeration pattern of urban structure can be recognized from new sources of spatial-temporal big data, extending our understanding of urban structure from the perspective of both space and time.
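Because each overlap percentage is normalized by a different total, the two figures also fix the ratio of the two total areas: overlap = 0.3417 × A_taxi = 0.5711 × A_census, so A_taxi / A_census = 0.5711 / 0.3417 ≈ 1.67. The sketch below illustrates both computations. It assumes, as a simplification rather than the study's procedure, that each set of boundaries has been rasterized onto a common grid and stored as a set of cell IDs; the function names are hypothetical.

```python
from typing import Hashable, Set, Tuple


def overlap_shares(taxi_cells: Set[Hashable], census_cells: Set[Hashable]) -> Tuple[float, float]:
    """Overlap as a share of the taxi-based area and of the census-based area."""
    overlap = len(taxi_cells & census_cells)
    return overlap / len(taxi_cells), overlap / len(census_cells)


def area_ratio(share_of_taxi: float, share_of_census: float) -> float:
    """A_taxi / A_census implied by two shares of the same overlap area."""
    # overlap = share_of_taxi * A_taxi = share_of_census * A_census
    return share_of_census / share_of_taxi


# Toy grid: 10 taxi-based cells, 6 census-based cells, 4 shared.
print(overlap_shares(set(range(10)), set(range(6, 12))))  # (0.4, 0.6666666666666666)
# Shares reported for Guangzhou:
print(round(area_ratio(0.3417, 0.5711), 2))  # 1.67
```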

21.2.2 Social and Spatial Heterogeneity

Spatial heterogeneity and segregation are also discussed in research on social segregation. With China's rapid urbanization driven by fast economic growth, widening socio-spatial inequalities in cities have been widely studied. The following case study comes from our recent work (see Zhou and Deng 2010; Zhou 2015). It is based on a random-sample household survey conducted from May to August 2007 in 11 typical neighborhoods, each covering about 1 km2, in the old urban area, the transitional area and the marginal area of Guangzhou. A total of 800 households and 1006 people were sampled, of whom 982 gave valid responses. The survey covered the respondents' basic attributes; the space-time information in their travel log for the most recent working day, including trip purpose and mode of transportation; and the locations of residence and employment together with the reasons for choosing them. The spatial-temporal patterns of residents' activities differ among social classes (Fig. 21.3). The activity space of the lower class is smaller, located mostly in the inner city and around their residential communities, and their spending on transportation is the lowest. The activity space of the upper class is larger, located mostly around the new city center; their time spent on out-of-home activities is well regulated, and their spending on transportation is the highest. These close relationships between residents' behavior and the internal spatial structure of the city can provide a reliable basis for urban planning and management. Going beyond earlier studies of residential space, Zhou et al. (2015) also explored socio-spatial differentiation based on the spaces of people's out-of-home activities.
Fig. 21.3 The distribution of residents' density surveyed from different classes at typical hours of one day. Note: Density grades are classified by equal intervals; a larger number indicates a higher density. (Source Zhou et al. 2010, translated by the author)

A spatial-temporal autocorrelation coefficient, $GT_i$, is constructed to examine the spatial and temporal clustering of the activities of high-income and low-income groups. Because space and time are treated as two continuous dimensions with equal influence, the spatial autocorrelation coefficient $G_i^*$ and the temporal autocorrelation coefficient $T_i$ are given equal weight in $GT_i$. Both $G_i^*$ and $T_i$ are first standardized to $NG_i^*$ and $NT_i$, and $GT_i$ is then calculated from them. Taking the $GT_i$ of out-of-home activity points as the variable, a density map with a search radius of 1500 m was generated to identify typical areas where the activities of high- and low-income groups cluster. The high-income group's space-time activity clusters are found around education and scientific research work units, the CBD, the suburbanized advanced and scientific industrial park, and the Higher Education Mega Centre. The low-income group's clusters are located in the traditional old city centre, around traditional workers' Danwei (work unit) compounds and social housing districts, and in urban villages.

The daily out-of-home activities of the high-income group are closely related to their workplaces: the Chinese Danwei (work unit) system still has a dominant impact on this group's daily activity space, and the new industrial spaces are another important location of their activity clustering. For the low-income group, low living costs and institutional factors are the main driving forces of activity clustering; the varied, cheap and highly accessible opportunities in the traditional city centre attract them. By examining socio-spatial differentiation through daily space-time activities, this study complements previous research on socio-spatial differentiation based on residential communities and offers a new perspective for revealing the socio-spatial patterns and mechanisms of Chinese cities.
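The chapter does not give the formula for the temporal coefficient T_i or for combining the standardized scores, so the following is only a sketch under stated assumptions: Getis-Ord G_i* with binary distance-band weights (self included) computed separately in space and in time of day, z-score standardization, and an equal-weight average GT_i = 0.5 NG_i* + 0.5 NT_i. The attribute value (here 1 for a high-income activity point, 0 otherwise), the 1500 m band (borrowed from the density-map radius purely for illustration) and the 2-hour time window are hypothetical choices.

```python
import math
from typing import Callable, List, Sequence, Tuple


def gi_star(values: Sequence[float], is_neighbour: Callable[[int, int], bool]) -> List[float]:
    """Getis-Ord Gi* z-scores with binary weights (w_ii = 1, w_ij = 1 for neighbours)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    scores = []
    for i in range(n):
        w = [1.0 if j == i or is_neighbour(i, j) else 0.0 for j in range(n)]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        num = sum(wj * xj for wj, xj in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        scores.append(num / den if den > 0 else 0.0)
    return scores


def standardize(xs: Sequence[float]) -> List[float]:
    """z-score standardization (assumed; the chapter does not name the method)."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / sd if sd > 0 else 0.0 for x in xs]


def gt_scores(points: Sequence[Tuple[float, float, float]], values: Sequence[float],
              radius_m: float = 1500.0, window_h: float = 2.0) -> List[float]:
    """GT_i = 0.5 * NG_i* + 0.5 * NT_i for activity points (x_m, y_m, hour_of_day)."""
    def near_in_space(i: int, j: int) -> bool:
        return math.dist(points[i][:2], points[j][:2]) <= radius_m

    def near_in_time(i: int, j: int) -> bool:
        return abs(points[i][2] - points[j][2]) <= window_h

    ng = standardize(gi_star(values, near_in_space))
    nt = standardize(gi_star(values, near_in_time))
    return [0.5 * a + 0.5 * b for a, b in zip(ng, nt)]


# Hypothetical points: three high-income (value 1) morning activities clustered in
# one district, three low-income (value 0) evening activities in another.
pts = [(0, 0, 8), (100, 0, 8), (0, 100, 9), (5000, 5000, 20), (5100, 5000, 21), (5000, 5100, 22)]
vals = [1, 1, 1, 0, 0, 0]
print([round(g, 2) for g in gt_scores(pts, vals)])  # [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
```

Positive GT_i marks points that are high-valued and surrounded by high values in both space and time; a density surface of these scores would then be mapped as described above.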

21.3 Spatial-Temporal Correlation and Attenuation

Correlation and attenuation are central concepts when discussing the relationships among several kinds of activities. For example, the near-repeat phenomenon, in which behaviors with similar characteristics occur at nearby locations within a short time, is a kind of correlation. Attenuation is the phenomenon whereby one kind of activity gradually gives way to another as distance in space and time increases. The three studies below discuss correlation and attenuation in consumption behavior and criminal behavior.

21.3.1 Spatial Attenuation Tested by Spatial-Temporal Floating Car GPS Big Data

Spatial-temporal big data with space-time labels provide a data basis for testing the spatial attenuation of human behavior. The following case study comes from our recent work (see Zhou 2015). Using floating car GPS data, the research aims to verify the spatial decay law of retailing centers in Shenzhen. Huaqiang North and Dongmen are the two main retailing centers in Shenzhen. We select trips with origins or destinations in these two retailing centers and extract their corresponding destinations or origins. Kernel analysis with a search radius of 500 m is used to calculate the density of these destinations and origins in the surrounding areas. The O/D points attracted by floating cars differ spatially between the two retailing centers (Fig. 21.4). In terms of quantity, Dongmen, a traditional retailing center, attracts many more floating cars than Huaqiang North. The hot spot areas of both commercial centers are concentrated in particular regions, and travel density generally spreads outward from the retailing centers. Moreover, there is some mutual attraction between the two retailing centers, especially for Huaqiang North, which attracts a large number of consumers from the Dongmen commercial center.

Fig. 21.4 The O/D density of floating cars visiting two commercial centers in Shenzhen (Source Zhou et al. 2014, translated by the author)

Furthermore, the spatial-temporal attraction of the two retailing centers is analyzed to verify the gravity model. We fit the distribution of floating cars with a negative power function (Formula 21.1):

$$Y = aX^{-m}$$

(21.1)

Here Y is the number of floating cars attracted by a retailing center and X is the distance to the retailing center. The R² values for Dongmen and Huaqiang North are 0.88 and 0.87, indicating that the attraction of a retailing center follows a negative power relationship with distance from it, consistent with the gravity model (Figs. 21.5 and 21.6). Notably, the decline slows between 500 and 1500 m, which departs from the gravity model, because residents prefer walking to taking a taxi over such distances. The attraction of a retailing center drops to zero beyond 6–7 km, because 6–7 km is the furthest distance people are willing to pay to travel. For further comparison, we establish a 250 m × 250 m fishnet and calculate the service areas of the retailing centers with the gravity model (Formula 21.2):

$$R_{ij} = \frac{P_i \cdot P_j}{d_{ij}^{\alpha}}$$

(21.2)


Fig. 21.5 The floating car attracted by Dongmen (Source Zhou et al. 2014, translated by the author)

Fig. 21.6 The floating car attracted by Huaqiang North (Source Zhou et al. 2014, translated by the author)

Here $R_{ij}$ is the attraction, $P_i$ and $P_j$ are the populations of the retailing centers, measured by GPS taxi data, $d_{ij}$ is the distance between $i$ and $j$, and $\alpha$ is the distance-decay exponent. On this basis, the service area according to the gravity model is derived. The actual service area of each retailing center is calculated from the kernel analysis result, again on a 250 m × 250 m fishnet: if the kernel density of Dongmen in a cell is larger than that of Huaqiang North, the cell is assigned to the service area of Dongmen, and vice versa. As Figs. 21.7 and 21.8 show, the service area of Huaqiang North lies mainly in the west of Shenzhen, while that of Dongmen lies in the east. In general, therefore, the gravity model is valid for the attraction of retailing centers. However, some biases exist for various reasons. For example, in the area between Dongmen and Huaqiang North, the attraction of Dongmen is larger than that of Huaqiang North, because Dongmen is not only a retailing center but also a tourist destination with a long history, which clearly enhances its attraction.
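Both service-area maps can be sketched on a 250 m fishnet. This is an illustration under assumptions, not the study's code: coordinates are planar metres; the "actual" map uses a quartic (biweight) kernel, a common choice in GIS kernel-density tools, with the 500 m search radius used earlier in this case study; and the gravity version uses a hypothetical exponent α = 2 and one mass per center. All function names are hypothetical. When comparing centers at a given cell, the cell's own mass P_i multiplies every candidate equally and therefore cancels.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # planar coordinates in metres


def quartic_density(points: List[Point], at: Point, radius: float = 500.0) -> float:
    """Quartic-kernel density (points per square metre) at `at`."""
    r2 = radius * radius
    total = 0.0
    for x, y in points:
        d2 = (x - at[0]) ** 2 + (y - at[1]) ** 2
        if d2 < r2:
            total += (1.0 - d2 / r2) ** 2
    return 3.0 / (math.pi * r2) * total


def fishnet(xmin: float, ymin: float, xmax: float, ymax: float, cell: float = 250.0) -> List[Point]:
    """Centres of square cells of side `cell` covering the bounding box."""
    nx = math.ceil((xmax - xmin) / cell)
    ny = math.ceil((ymax - ymin) / cell)
    return [(xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell) for j in range(ny) for i in range(nx)]


def actual_service_area(od_points: Dict[str, List[Point]], cells: List[Point],
                        radius: float = 500.0) -> Dict[Point, Optional[str]]:
    """Assign each cell to the center with the highest O/D kernel density there."""
    result: Dict[Point, Optional[str]] = {}
    for c in cells:
        dens = {name: quartic_density(pts, c, radius) for name, pts in od_points.items()}
        best = max(dens, key=dens.get)
        result[c] = best if dens[best] > 0 else None  # None: no center reaches this cell
    return result


def gravity_service_area(centres: Dict[str, Tuple[Point, float]], cells: List[Point],
                         alpha: float = 2.0) -> Dict[Point, str]:
    """Assign each cell to the center with the largest P_j / d_ij**alpha."""
    def pull(cell: Point, name: str) -> float:
        loc, mass = centres[name]
        d = max(math.dist(cell, loc), 1.0)  # guard against d = 0 at the centre itself
        return mass / d ** alpha

    return {c: max(centres, key=lambda name: pull(c, name)) for c in cells}
```

Comparing the two dictionaries cell by cell highlights where the observed service areas depart from the gravity-model prediction, such as the zone between the two centers discussed above.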


Fig. 21.7 Simulation of service area of two retailing centers in Shenzhen (Source Zhou et al. 2014, translated by the author)

Fig. 21.8 The actual service area of Shenzhen two commercial centers (Source Zhou et al. 2014, translated by the author)
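Formula 21.1 becomes linear after taking logarithms, log Y = log a − m log X, so one common way to fit it, and the one assumed here because the chapter does not describe its fitting procedure, is ordinary least squares on log-transformed data. An R² computed this way is on the log scale and may differ from one computed on raw counts; distance bands with zero cars (for example beyond 6–7 km) must be dropped first because log 0 is undefined. The distances and counts in the usage lines are made up, not the Shenzhen data.

```python
import math
from typing import Sequence, Tuple


def fit_power_decay(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit Y = a * X**(-m) by least squares on log Y = log a - m * log X.

    Returns (a, m, r2), with r2 computed on the log scale.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((v - intercept - slope * u) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - my) ** 2 for v in ly)
    return math.exp(intercept), -slope, 1.0 - ss_res / ss_tot


# Made-up distance bands (m) and floating-car counts.
dist = [250, 750, 1250, 2000, 3000, 4500, 6000]
cars = [920, 300, 210, 95, 60, 30, 18]
a, m, r2 = fit_power_decay(dist, cars)
print(f"a={a:.0f}, m={m:.2f}, R2={r2:.2f}")
```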


The case study above shows that big data with space-time labels provide a new data basis for testing spatial and temporal laws of human behavior, including attenuation and scaling laws, and that they raise new research questions for discussion.

21.3.2 Spatial-Temporal Autocorrelation and Hot Spot Recognition

Besides spatial attenuation and scaling laws, spatial-temporal autocorrelation is another theoretical issue that attracts attention in spatial-temporal behavior research. Criminal behavior, which is strongly related to time, can also be simulated and predicted. The following case study comes from our recent work (see Xu et al. 2016): it produces crime hot spot density maps that take autocorrelation into account and uses them to forecast criminal behavior. Crime hot spot mapping is a method for forecasting where offenses are likely to occur and for directing police activities to deter them; it reveals the spatial pattern of offenses from historical records of where crimes occurred. Among hot spot mapping methods, density estimation has proved to be the most reliable way to estimate the distribution of crime, especially street crime. However, how to make spatial crime estimates based on historical data more accurate remains a research focus. When crime hot spot patterns are determined from historical data covering a long period, changes in urban physical space cannot be ignored, especially in China, where urbanization is proceeding rapidly. The environmental characteristics that generate or attract crime change as urban physical space changes. From the offender's perspective, a criminal usually learns from previous successful crimes and tends to settle on a particular criminal method, since there is no reason to risk a different approach; but as environmental characteristics change, that method is adjusted accordingly. Considering time as an important factor in analyzing crime patterns may therefore increase the accuracy of predictions.
When time effects are analyzed within the spatial autocorrelation formula, urban development and change in China are included as an influencing factor. We propose that the weight of time is also subject to distance attenuation, so that the older a case is, the less it affects the current crime pattern. In addition, earlier analysis of the spatial-temporal characteristics of criminal cases found that various types of cases exhibit a near-repeat phenomenon in time and space, which attenuates as the time interval and spatial distance increase. The research therefore takes temporal attenuation into account when calculating the time weights.
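The chapter states only that time weights decay with the age of a case; it gives neither the decay function nor its parameters. The sketch below therefore assumes an exponential age weight exp(−age/τ) with a hypothetical τ of 180 days, multiplied into a quartic spatial kernel with an assumed 500 m radius. It illustrates temporal attenuation in density estimation and is not the authors' estimator; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Case = Tuple[float, float, float]  # (x_m, y_m, day_number)


def time_decayed_density(cases: Sequence[Case], at: Tuple[float, float], today: float,
                         radius: float = 500.0, tau_days: float = 180.0) -> float:
    """Quartic spatial kernel multiplied by an exponential age weight exp(-age / tau).

    tau_days = math.inf gives every past case weight 1, i.e. an ordinary
    density surface without temporal weighting.
    """
    r2 = radius * radius
    total = 0.0
    for x, y, day in cases:
        age = today - day
        if age < 0:
            continue  # a case after the forecast date cannot inform the forecast
        d2 = (x - at[0]) ** 2 + (y - at[1]) ** 2
        if d2 < r2:
            total += (1.0 - d2 / r2) ** 2 * math.exp(-age / tau_days)
    return 3.0 / (math.pi * r2) * total


# Two robberies at the same spot, 30 days and 600 days before the forecast date.
recent = time_decayed_density([(0.0, 0.0, 700.0)], (0.0, 0.0), today=730.0)
old = time_decayed_density([(0.0, 0.0, 130.0)], (0.0, 0.0), today=730.0)
print(recent > old)  # True
```

Setting `tau_days` to `math.inf` reproduces the non-temporal baseline, which makes it easy to compare the two kinds of hot spot map on the same grid.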


In contrast to density estimation without spatial or temporal autocorrelation, this study considers temporal autocorrelation, spatial autocorrelation and spatial-temporal autocorrelation among offense points. Using street robbery data for DP Peninsula from 2006 to 2007, four crime hot spot maps are produced, based respectively on no spatial-temporal autocorrelation, spatial autocorrelation, temporal autocorrelation and spatial-temporal autocorrelation. Then, using the 2008 cases as validation data, two classification methods, Natural breaks (Jenks) classification (Fig. 21.9) and equal-proportion selection by area (Fig. 21.10), are used to delimit the hot areas, and the Prediction Accuracy Index (PAI) scores of the four maps are compared across different numbers of natural-breaks classes (Table 21.1) and different area proportions selected as hot areas (Table 21.2). The results show that the density estimation method based on spatial-temporal autocorrelation has significant advantages in predicting potential offenses, especially when the research area is comparatively small.
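The PAI is commonly defined as the hit rate n/N (the share of validation cases falling in the hot area) divided by the area share a/A of the hot area. The chapter does not report N, but every value in Table 21.2 is consistent with N = 71 cases in 2008; for example, (4/71)/0.005 ≈ 11.27. The sketch below assumes that definition; `pai_top_share` mimics equal-proportion selection by ranking grid cells by density, and its name and grid representation are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def pai(hits: int, total_cases: int, area_share: float) -> float:
    """Prediction Accuracy Index: hit rate (n/N) divided by hot-area share (a/A)."""
    return (hits / total_cases) / area_share


def pai_top_share(density: Dict[Hashable, float], case_cells: Sequence[Hashable],
                  share: float) -> float:
    """Flag the top `share` of grid cells by density as hot, then score the map."""
    ranked = sorted(density, key=density.get, reverse=True)
    k = max(1, round(share * len(ranked)))
    hot = set(ranked[:k])
    hits = sum(1 for cell in case_cells if cell in hot)
    return pai(hits, len(case_cells), k / len(ranked))


# With N = 71, the spatial-temporal row of Table 21.2 is reproduced.
for hits, share in [(6, 0.005), (13, 0.01), (28, 0.05), (43, 0.10)]:
    print(f"{share:.1%}: PAI = {pai(hits, 71, share):.2f}")
# 0.5%: PAI = 16.90
# 1.0%: PAI = 18.31
# 5.0%: PAI = 7.89
# 10.0%: PAI = 6.06
```

For the natural-breaks comparison in Table 21.1, the area share would instead be the number of hot grid cells divided by the total number of grid cells.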

Fig. 21.9 The results of third class Natural breaks (Jenks) with different methods (Source Xu et al. 2016, translated by the author)

Fig. 21.10 The smooth maps of different density methods (Source Xu et al. 2016, translated by the author)

21.3.3 Spatial-Temporal Near-Repeat

In addition to the global characteristics of space and time, the combination of and relationship between time and space also deserve attention. Near-repeat is one of the most studied phenomena in crime research. The following case study comes from our recent work (see Xu et al. 2015). Criminal behavior also follows certain objective patterns. Taking DP peninsula in downtown H as the research area, 373 street robbery cases on the island from 2006 to 2011 are studied to explore their spatial-temporal distribution. The analysis framework for crime near-repeat is shown in Fig. 21.11. Calculating the Manhattan distances between all cases indicates that the near-repeat phenomenon occurs in street robbery at multiple spatial-temporal scales. After a street robbery, there is a high risk of a second robbery within 400 m and 42 days. The probability of recurrence is highest within 1–200 m and 15–28 days, where it is 68% higher than expected, passing the significance test at the 0.001 level. This is therefore identified as the most sensitive and intense space-time scale of street robbery in the study area. Table 21.3 also shows that the frequency of street robbery attenuates significantly with space and time. Since 1–200 m and 15–28 days is the most sensitive scale of the near-repeat phenomenon, it is used for further analysis. The 373 cases were divided into 4 types: cases without the near-repeat phenomenon, earlier cases in the near-repeat phenomenon, later cases in the near-repeat phenomenon, and cases that are both earlier and later. 96 cases fall into the last three types. More specifically, the number of earlier cases is 39, the number of later cases is 18, and the number of cases that are both earlier and later is 18, accounting for 18.8%. This means that there are multiple "near-repeat chains" in the street robberies on DP peninsula. If earlier

PAI

29

21

Spatial autocorrelation

Spatial-temporal autocorrelation

(Source Xu et al. 2016, translated by the author)

17

Temporal autocorrelation 1141

1674

1380 11.71

11.02

7.84

3.65

17

21

13

17

08 cases

Number of grids 3834

08 cases

22

Non-spatial-temporal autocorrelation

4 classes

3 classes

Table 21.1 The results of PAI with Natural breaks (Jenks) Number of grids

662

1173

1110

1180

16.34

11.39

7.45

9.17

PAI

13

12

5

13

08 cases

5 classes Number of grids

450

692

256

683

18.37

11.03

12.43

12.11

PAI


Table 21.2 The results of PAI with equal proportion area

Method | 0.5%: 08 cases, PAI | 1%: 08 cases, PAI | 5%: 08 cases, PAI | 10%: 08 cases, PAI
Non-spatial-temporal autocorrelation | 4, 11.27 | 10, 14.08 | 21, 5.92 | 35, 4.93
Temporal autocorrelation | 5, 14.08 | 7, 9.86 | 22, 6.20 | 32, 4.51
Spatial autocorrelation | 2, 5.63 | 4, 5.63 | 23, 6.48 | 46, 6.48
Spatial-temporal autocorrelation | 6, 16.90 | 13, 18.31 | 28, 7.89 | 43, 6.06

*"08 cases" represents the number of 2008 cases in the hot area. (Source Xu et al. 2016, translated by the author)

Fig. 21.11 The analysis framework of crime near-repeat (Source Xu et al. 2015, translated by the author)

cases are successfully prevented and controlled, the police can cut off the near-repeat chains, which could reduce the number of robbery cases by 57 in total. In addition, kernel analysis reveals significant spatial clustering among the 57 earlier cases. These hot spot areas are usually on main roads and in areas of high land-use mix and high accessibility (Fig. 21.12). Counting all street robbery cases from 2006 to 2011, the number of crimes peaked in 2007 at 100 cases and was lowest in 2010 at 22 cases (Fig. 21.13). One important reason for the abrupt decline after 2007 is the special action taken by the public security system in H city. The number of cases with the near-repeat phenomenon declined over the six years in a similar way to the total, not only because of the general reduction but also because of the police "Crackdown Action" since 2007. The proportion of cases with the near-repeat phenomenon experiences a peak of


Table 21.3 Observed frequencies and significance levels

Spatial distance (m) / Time distance (day) | 0–14 | 15–28 | 29–42 | 43–56 | >56
1–200 | 1.61** | 1.68** | 1.41* | 1.20 | 0.97
201–400 | 1.37* | 1.52** | 1.35* | 1.12 | 0.98
401–600 | 1.00 | 0.92 | 0.95 | 1.08 | 1.00
601–800 | 0.98 | 0.98 | 1.04 | 0.96 | 1.00
801–1000 | 0.70 | 0.92 | 0.86 | 1.09 | 1.01*
1001–1200 | 0.90 | 0.81 | 0.89 | 0.83 | 1.01*
1201–1400 | 0.88 | 0.69 | 0.93 | 0.83 | 1.01**
1401–1600 | 0.90 | 0.91 | 0.73 | 0.76 | 1.01**
1601–1800 | 1.02 | 1.11 | 1.06 | 1.25* | 0.99
1801–2000 | 1.17 | 0.94 | 1.02 | 0.97 | 1.00
>2000 | 1.01 | 1.03 | 1.04 | 1.04 | 1.00

*p < 0.05
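Ratios such as those in Table 21.3 are typically produced by a Knox-type near-repeat test: case pairs are counted in each space-time band, and the counts are compared with those expected when case dates are randomly permuted across locations (a Monte Carlo approach). The following is a generic sketch of that approach using Manhattan distance, as the study did; it is not the software the authors used, and the band edges, permutation count and seed are illustrative. With the table's full bands, `space_edges` would be [200, 400, ..., 2000] and `time_edges` [14, 28, 42, 56].

```python
import random
from typing import List, Sequence, Tuple

Event = Tuple[float, float, int]  # (x_m, y_m, day_number)


def band(value: float, upper_edges: Sequence[float]) -> int:
    """Index of the first band whose upper edge is >= value; len(edges) means 'beyond'."""
    for k, upper in enumerate(upper_edges):
        if value <= upper:
            return k
    return len(upper_edges)


def knox_counts(xy: Sequence[Tuple[float, float]], days: Sequence[int],
                space_edges: Sequence[float], time_edges: Sequence[float]) -> List[List[int]]:
    """Count case pairs by (Manhattan distance band, time-gap band)."""
    counts = [[0] * (len(time_edges) + 1) for _ in range(len(space_edges) + 1)]
    for i in range(len(xy)):
        for j in range(i + 1, len(xy)):
            d = abs(xy[i][0] - xy[j][0]) + abs(xy[i][1] - xy[j][1])
            t = abs(days[i] - days[j])
            counts[band(d, space_edges)][band(t, time_edges)] += 1
    return counts


def near_repeat(events: Sequence[Event], space_edges: Sequence[float],
                time_edges: Sequence[float], n_perm: int = 99, seed: int = 0):
    """Observed counts, observed/expected ratios and one-sided Monte Carlo p-values."""
    xy = [(x, y) for x, y, _ in events]
    days = [d for _, _, d in events]
    observed = knox_counts(xy, days, space_edges, time_edges)
    rows, cols = len(observed), len(observed[0])
    expected = [[0.0] * cols for _ in range(rows)]
    as_large = [[0] * cols for _ in range(rows)]
    rng = random.Random(seed)
    for _ in range(n_perm):
        shuffled = list(days)
        rng.shuffle(shuffled)  # breaks the space-time link, keeps both distributions
        sim = knox_counts(xy, shuffled, space_edges, time_edges)
        for r in range(rows):
            for c in range(cols):
                expected[r][c] += sim[r][c] / n_perm
                as_large[r][c] += int(sim[r][c] >= observed[r][c])
    ratios = [[observed[r][c] / expected[r][c] if expected[r][c] > 0 else float("nan")
               for c in range(cols)] for r in range(rows)]
    p_values = [[(as_large[r][c] + 1) / (n_perm + 1) for c in range(cols)] for r in range(rows)]
    return observed, ratios, p_values


# Four made-up cases with coarsened bands.
events = [(0, 0, 0), (100, 50, 10), (5000, 5000, 100), (5100, 5000, 300)]
obs, ratio, p = near_repeat(events, space_edges=[200, 400], time_edges=[14, 28])
print(obs)  # [[1, 0, 1], [0, 0, 0], [0, 0, 4]]
```

A ratio of 1.68 in the 1–200 m, 15–28 day cell, as in Table 21.3, would mean 68% more pairs than expected under random timing.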

Fixation duration of shops (s)

166.92 ± 61.62

115.01 ± 43.27

9.89

p < 0.05*

Fixation duration of elevator (s)

43.57 ± 19.13

61.19 ± 26.15

6.78

p < 0.05*

Other fixation duration (s)

67.67 ± 38.82

55.73 ± 28.92

1.59

p > 0.05

Wayfinding process

Overall performance Total duration (s)

265.49 ± 59.93

243.22 ± 47.16

8.12

p < 0.05*

Average duration (s)

39.18 ± 25.65

33.14 ± 18.53

6.32

p < 0.05*

Average error times (s)

1.40 ± 0.91

1.00 ± 0.83

1.03

p > 0.05

Fixation duration of shops (s)

185.33 ± 69.11

127.58 ± 50.31

11.51

p < 0.05*

Fixation duration of elevator (s)

51.25 ± 23.16

67.57 ± 21.22

7.04

p < 0.05*

Other fixation duration (s)

58.91 ± 33.95

48.07 ± 21.66

0.89

p > 0.05

Wayfinding process

M = mean, SD = standard deviation. ** p < 0.01, * p < 0.05

Figure 25.4 shows that, when using indoor 2D and 3D navigation maps, subjects differ significantly only in the average stay time at turning positions. There are no significant differences in total time, error times at key points, AOI fixation time on shops, AOI fixation time on elevators, or other AOI fixation time.

25.3.3.4 Discussions

In this experiment, eye movement tracking is used to test whether non-cartographic and cartographic subjects show different wayfinding behaviors when using indoor 2D and 3D navigation maps (Liao et al. 2017). The experiment also shows that significant differences exist between expert and non-expert users. Based on the eye movement indicators and behavioral data from the statistics above, two rules can be identified for wayfinding with indoor navigation maps. (1) Whether the indoor navigation map is 2D or 3D has no significant influence on users' indoor wayfinding. According to Fig. 25.4, apart from the average residence time at turning positions, there is no significant difference in users' destination-finding behavior between 2D and 3D navigation maps. Thus, the map type (2D or 3D) has no significant impact on indoor wayfinding.

S. Zheng et al.

Fig. 25.4 Difference of wayfinding between Indoor 2D and 3D navigation maps. ** p < 0.01, * p < 0.05

25 Application of Eye-Tracking Technology in Humanities …


(2) Expert users perform better than non-expert (non-cartographic) users in indoor wayfinding, and experts pay less attention to shop signs. Table 25.3 shows significant differences between experts and non-experts in total fixation time and task completion time. Experts take less time to complete the route memory task and less time to find the destination. At the same time, experts spend more time in the route memory task than non-experts, but less time in the route searching process.

25.4 Conclusion

Eye movement experiments are an effective, real-time and non-invasive means of monitoring the map cognition process and identifying personalized map search strategies. In this article, applications of eye movement experiments in the humanities, social sciences and geospatial cognition are systematically reviewed. Two experiments are introduced to explore differences in goal search strategies and in indoor wayfinding between expert and non-expert map users. As the basis of map design and evaluation, map visual cognition is an integrated process involving bottom-up perception and top-down interference. The results show that such experiments have great potential for exploring and revealing human visual cognitive processes in various disciplines. However, the details of the map visual cognition process remain to be clarified. Personalized map cognition, which relates to users' age, gender, occupation, hobbies and other characteristics, is also a frontier area. More work in this area will be conducted in the future.

References Ai, T. (2016). The development of cartography driven by big data[J]. Journal of Geomatics, 41(2), 1–7. Altmann, G. T. M., & Kamide, Y. (2007). The real-time mediation of visual attention by language and world knowledge: Linking anticipatory (and other) eye movements to linguistic processing[J]. Journal of Memory and Language, 57(4), 502–518. Brodersen, A.L., H H K., et al. (2005). Applying eye-movement tracking for the study of map perception and map design[C]. 2005. IEEE Symposium on Visual Languages and Human-Centric Computmy. Burian, J., Popelka, S., & Beitlova, M. (2018). Evaluation of the cartographical quality of urban plans by eye-tracking. ISPRS Int. J. Geo-Inf. 7, 192. Castner, H. W., & Eastman, R. J. (1985). Eye-Movement Parameters and Perceived Map Complexity—I[J]. American Cartographer, 11(2), 107–117. Coltekin, A., Fabrikant, S. I., & Lacayo, M. (2010). Exploring the efficiency of users’ visual analytics strategies based on sequence analysis of eye movement recordings. Dong, W., Ran, J., & Wang, J. (2012). Effectiveness and efficiency of map symbols for dynamic geographic information visualization[J]. Cartography and Geographic Information Science, 39(2), 98–106.

446

S. Zheng et al.

Dong, W., Liao, H., Roth, R. E., et al. (2014a). Eye tracking to explore the potential of enhanced imagery basemaps in web mapping [J]. The Cartographic Journal, 51(4), 313–329.
Dong, W., Liao, H., Xu, F., et al. (2014b). Using eye tracking to evaluate the usability of animated maps [J]. Science China Earth Sciences, 57(3), 512–522.
Dong, W., & Liao, H. (2016). Eye tracking to explore the impacts of photorealistic 3D representations in pedestrian navigation performance [J]. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLI-B2, 641–645.
Dong, W., Zheng, L., Liu, B., et al. (2018). Using eye tracking to explore differences in map-based spatial ability between geographers and non-geographers [J]. ISPRS International Journal of Geo-Information, 7(9), 337.
Dong, W., Liao, H., Zhan, Z., et al. (2019). New research progress of eye tracking-based map cognition in cartography since 2008 [J]. Journal of Geographical Sciences, 74(3).
Dong, W., Yang, T., et al. (2020). How does map use differ in virtual reality and desktop-based environments? [J]. International Journal of Digital Earth.
Duchowski, A. T. (2003). Eye Tracking Methodology: Theory and Practice [M].
Fabrikant, S. I., Rebich-Hespanha, S., Andrienko, N., et al. (2008). Novel method to measure inference affordance in static small-multiple map displays representing dynamic processes [J]. Cartographic Journal, 45(3), 201–215.
Fabrikant, S. I., Hespanha, S. R., & Hegarty, M. (2010). Cognitively inspired and perceptually salient graphic displays for efficient spatial inference making [J]. Annals of the Association of American Geographers, 100(1), 13–29.
Fabrikant, S. I., Christophe, S., Papastefanou, G., et al. (2012). Emotional response to map design aesthetics [C]. Proceedings of the GIScience Conference, Columbus, Ohio, USA.
Fiori, M., & Antonakis, J. (2012). Selective attention to emotional stimuli: What IQ and openness do, and emotional intelligence does not [J]. Intelligence, 40(30), 245.
Gao, J. (1991). Map spatial cognition and cognitive cartography [M]. Chinese Yearbook of Cartography. Beijing: Sinomaps Press, 220.
Gao, J. (2003). An interpretation of cartography in the digital age [J]. Maps, 3, 5.
Gao, J., Gong, J., Lu, X., et al. (2008). Spatial cognition studies in geoscience [J]. Journal of Remote Sensing, 12(2), 338.
Giannopoulos, I., Kiefer, P., & Raubal, M. (2013). The influence of gaze history visualization on map interaction sequences and cognitive maps [C]. 21st SIGSPATIAL International Conference on Advances in Geographic Information Systems, MapInteract workshop, Orlando, FL, USA.
Hermans, D., Vansteenwegen, D., & Eelen, P. (1999). Eye movement registration as a continuous index of attention deployment: Data from a group of spider anxious students [J]. Cognition and Emotion, 13(4), 419–434.
Jacob, R., & Karn, K. (2003). Eye tracking in human computer interaction and usability research: Ready to deliver the promises [M]. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research (pp. 573–605). Amsterdam, The Netherlands: Elsevier.
Just, M. A., & Carpenter, P. A. (1984). Using eye fixations to study reading comprehension [M]. In D. E. Kieras & M. A. Just (Eds.), New methods in reading comprehension research (pp. 151–182). Hillsdale: Erlbaum.
Havelková, L., & Gołębiowska, I. M. (2020). What went wrong for bad solvers during thematic map analysis? Lessons learned from an eye-tracking study [J]. ISPRS International Journal of Geo-Information, 9, 9.
Gołębiowska, I., Opach, T., & Rød, J. K. (2017). For your eyes only? Evaluating a coordinated and multiple views tool with a map, a parallel coordinated plot and a table using an eye-tracking approach [J]. International Journal of Geographical Information Science, 31(2), 237–252.
Keskin, M., Ooms, K., Dogru, A. O., & De Maeyer, P. (2019). EEG & eye tracking user experiments for spatial memory task on maps [J]. ISPRS International Journal of Geo-Information, 8, 546.
Kiefer, P., Giannopoulos, I., Raubal, M., et al. (2013). Eye Tracking for Spatial Research [C]. 1st International Workshop on Eye Tracking for Spatial Research.

25 Application of Eye-Tracking Technology in Humanities …


Kulke, L., Atkinson, J., & Braddick, O. (2017). Neural mechanisms of attention become more specialised during infancy: Insights from combined eye tracking and EEG [J]. Developmental Psychobiology, 59(2), 250–260.
Leonards, U., Sunaert, S., Van Hecke, P., et al. (2000). Attention mechanisms in visual search—an fMRI study [J]. Journal of Cognitive Neuroscience, 12, 61–75.
Li, W., & Chen, Y. (2012). Cartography eye movements study and experimental parameter analysis [J]. Bulletin of Surveying and Mapping, 10, 16–20.
Liao, H., Dong, W., Peng, C., et al. (2017). Exploring differences of visual attention in pedestrian navigation when using 2D maps and 3D geo-browsers [J]. Cartography and Geographic Information Science, 44(6), 474–490.
Liao, H., Dong, W., Huang, H., et al. (2019). Inferring user tasks in pedestrian navigation from eye movement data in real-world environments [J]. International Journal of Geographical Information Science, 33(4), 739–763.
Liao, Y. (2011). Research on eye movement characteristics of volleyball athletes' cognition [M]. Beijing: Beijing Sports University Press, 4–9, 33.
Liu, B., Dong, W., & Meng, L. (2017). Using eye tracking to explore the guidance and constancy of visual variables in 3D visualization [J]. ISPRS International Journal of Geo-Information, 6(9), 1–18.
Meng, L. (2005). Egocentric design of map-based mobile services [J]. The Cartographic Journal, 42(1), 5–13.
Meng, L. (2006). Some theoretical considerations on the development of cartography technology [J]. Journal of Geomatics Science and Technology, 23(2), 89–96.
Montello, D. R., Lovelace, K. L., Golledge, R. G., & Self, C. M. (1999). Sex-related differences and similarities in geographic and environmental spatial abilities [J]. Annals of the Association of American Geographers, 89(3), 515–534.
Montello, D. R. (2009). Cognitive research in GIScience: Recent achievements and future prospects [J]. Geography Compass, 3(5), 1824–1840.
Nielsen, J., & Pernice, K. (2011). Eye tracking web usability [M]. Pearson Education, Inc.
Ooms, K., De Maeyer, P., Fack, V., et al. (2012). Interpreting maps through the eyes of expert and novice users [J]. International Journal of Geographical Information Science, 26(10), 1773–1788.
Pereira, M. L., Mv, C., Aprahamian, I., et al. (2014). Eye movement analysis and cognitive processing: Detecting indicators of conversion to Alzheimer's disease [J]. Neuropsychiatr Dis Treat, 1273–1285.
Popelka, S., Vondrakova, A., & Hujnakova, P. (2019). Eye-tracking evaluation of weather web maps [J]. ISPRS International Journal of Geo-Information, 8, 256.
Rayner, K. (2009). Eye movements and attention in reading, scene perception, and visual search [J]. Quarterly Journal of Experimental Psychology, 62(8), 1457–1506.
Schwering, A., Wang, J., Chipofya, M., et al. (2014). SketchMapia: Qualitative Representations for the Alignment of Sketch and Metric Maps [J]. Spatial Cognition & Computation, 14(3), 220–254.
Shao, Z. (2006). Cognitive psychology [M] (pp. 75–79). Shanghai: Shanghai Educational Publishing House.
Snopková, D., Švedová, H., Kubíček, P., & Stachoň, Z. (2019). Navigation in Indoor Environments: Does the Type of Visual Learning Stimulus Matter? [J]. ISPRS International Journal of Geo-Information, 8, 251.
Sun, Q., Xia, J., Nadarajah, N., et al. (2016). Assessing drivers' visual-motor coordination using eye tracking, GNSS and GIS: A spatial turn in driving psychology [J]. Surveyor, 61(2), 299–316.
Sun, R., Zhang, X., Slusarz, P., et al. (2007). The interaction of implicit learning, explicit hypothesis testing learning and implicit-to-explicit knowledge extraction [J]. Neural Networks, 20(1), 34–47.
Swienty, O., Zhang, M., & Reichenbacher, T. (2006). Attention guiding visualization of geospatial information [J]. Geoinformatics—Geospatial Information Technology.
Swienty, O., Reichenbacher, T., Reppermund, S., et al. (2008). The role of relevance and cognition in attention-guiding geovisualisation [J]. Cartographic Journal, 45(3), 227–238.
Tao, Y., et al. (2003). Different presentation and difficulty affected by the text of real-time processing research [J], 23(2), 26–30.


Van Tilborg, M. M., Murphy, P. J., & Evans, K. S. (2017). Impact of dry eye symptoms and daily activities in a modern office [J]. Optometry and Vision Science: Official Publication of the American Academy of Optometry, 94(6), 688.
Vanclooster, A., Ooms, K., Viaene, P., et al. (2014). Evaluating suitability of the least risk path algorithm to support cognitive wayfinding in indoor spaces: An empirical study [J]. Applied Geography, 53, 128–140.
Wang, S., Chen, Y., Yuan, Y., et al. (2016). Visualizing the Intellectual Structure of Eye Movement Research in Cartography [J]. ISPRS International Journal of Geo-Information, 5(10), 168.
Wang, C., Chen, Y., & Zheng, S. (2018). User interest analysis method of web map point symbol considering eye movement data [J]. Geomatics and Information Science of Wuhan University, 9, 1429–1437.
Wei, W. (2013). Eye-tracking assessment of advertisement choice effect on web advertising effectiveness [D]. Shanghai Jiao Tong University, 23.
Zheng, S. (2015). Research on personalized map cognition mechanism [D]. Zhengzhou: Information Engineering University.
Zheng, S. (2020). Personalized map cognition and eye movement analysis method [M]. Beijing: Publishing House of Electronics Industry.

Part V

Afterword

Chapter 26

Prospects of Spatial Synthesis in Computational Social Science and Humanities: Towards a Spatial Synthetics and Synthetic Geography

Daniel Z. Sui

I hope you have enjoyed reading this volume, co-edited by Professors Hui Lin and Xinyue Ye. The chapters throughout the volume provide new and intriguing examples from a variety of social science and humanities disciplines. They show why we need spatial synthesis and how it can be accomplished: by drawing on data from different sources, within a plethora of conceptual and theoretical frameworks, and through an artful mix of quantitative and qualitative approaches. As Professors Batty and Goodchild perceptively pointed out in their forewords, spatial synthesis not only reflects the broader research community's trend toward convergence research but is also a necessity in practicing computational social science and humanities.

Having reached this page, you may wonder what is next for spatial synthesis. In this brief afterword, I would like to share a few thoughts, and I hope you will continue your exploration of spatial synthesis in your future work. The framework of data analytics has served the goal of data analysis well during the past two decades. In the same way, I strongly believe that it is time, as a natural next step, to start discussing spatial synthetics as the organizing framework to guide spatial synthesis efforts in the coming years. Spatial synthetics refers to the systematic study of the different approaches by which we can best synthesize data, methods, and theoretical/conceptual frameworks. Its purpose is to address the challenging issues facing the world today, which transcend traditional disciplinary boundaries.

What will the study of spatial synthetics actually entail? At a minimum, I think it should include the two dimensions of convergence research defined by the U.S. National Academies of Sciences (NAS 2005). According to NAS (2005), convergence research must have two primary characteristics:

D. Z. Sui
Vice President for Research and Innovation, Virginia Tech, Blacksburg, VA 24061, USA
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
X. Ye and H. Lin (eds.), Spatial Synthesis, Human Dynamics in Smart Cities, https://doi.org/10.1007/978-3-030-52734-1_26


• Transdisciplinarity: deep integration across disciplines. As experts from different disciplines pursue common research challenges, their knowledge, theories, methods, data, research communities, and languages become increasingly intermingled or integrated. New frameworks, paradigms, or even disciplines can form from sustained interactions across multiple communities.
• Stakeholder synergy: to have broader impacts, research should draw together academic researchers, policy makers, and industry partners.

Convergence research is generally inspired by the need to address a specific challenge or opportunity, whether it arises from deep scientific questions or from pressing societal needs. Our goal is to maintain our current momentum and accelerate spatial synthesis. Developing spatial synthetics, much as we have recently developed data analytics, should help us reach that goal faster.

Spatial synthetics should, first and foremost, entail deep transdisciplinary integration across multiple disciplines. The emerging data-driven fourth paradigm of basic research (Hey et al. 2010) provides new opportunities to facilitate spatial synthetics. Through convergence research, the rapidly emerging field of data-intensive science (also known as eScience) will continue to transform the world's scientific and computing research communities. It will also inspire the next generation of researchers across almost all scholarly fields. The fourth paradigm of discovery, based on data-intensive science, offers insights into how the potential of convergence research can be fully realized. Several recent advances will accelerate transdisciplinary research on spatial synthetics in the coming years. These include developing synthetic data (https://en.wikipedia.org/wiki/Synthetic_data), mixing qualitative and quantitative methods (Sui 2014), and blending a plethora of theoretical and conceptual frameworks (Shaw and Sui 2019).
Spatial synthetics should also include stakeholder synergy, so that spatial synthesis creates broader public impacts. Stakeholder synergy, the integration of academia, industry, and government, is critically important for the success of spatial synthetics. By its nature, seeking stakeholder synergy requires the synthesis of creative works by multiple teams with diverse backgrounds. A team science approach is therefore needed to develop spatial synthetics in the context of stakeholder synergy. In general, team science is a collaborative effort to address a scientific challenge that leverages the strengths and expertise of professionals trained in different fields (NAS 2005). Team science-based spatial synthetics requires moving spatial synthesis beyond the academic comfort zone to address pressing issues facing society today. Over the past two decades, there has been a growing emphasis on scientifically addressing multi-factorial problems. Examples include climate change, global terrorism, the rise of infectious and chronic diseases, the health impacts of social stratification, and growing concerns about social disparity. This emphasis has contributed to a surge of interest and investment in team science. Increasingly, scientists across many disciplines and settings are engaging in team-based research initiatives. These include small and large teams, uni- and multi-disciplinary groups, and efforts that engage multiple stakeholders such as scientists, community members, and policy makers.

Spatial synthetics could also benefit from the science of team science (SciTS) (Stokols et al. 2008), a rapidly emerging field focused on understanding and enhancing the processes and outcomes of team science. Among the many insights gained from SciTS research, we now know that interpersonal dynamics among team members are key to the success of a team science project. Team members' collaborative skills and experiences can help guide our future data science-driven convergence research conducted through team science. In addition, a variety of contextual and environmental factors influence the success of team science (Börner et al. 2010). These factors affect each stage of a scientific initiative, with implications for efficiency, productivity, and overall effectiveness.

If we can move forward with a research agenda for spatial synthetics in the spirit of convergence research, I am cautiously optimistic that a new field, synthetic geography, may eventually develop. Synthetic biology redesigns organisms for useful purposes by engineering them to have new abilities. In a similar way, synthetic geography will focus on geo-designing the environment to make the world a better place to live, using new advances in spatial synthetics. Earlier synthesis efforts in geography relied primarily on qualitative and descriptive approaches (Turner 1989). Synthetic geographers, by contrast, can harness the power of spatial synthetics to address wide-ranging environmental, urban, social, and economic problems.
As Berlin (1993) described so eloquently in his classic The Hedgehog and the Fox, all writers and thinkers throughout human history can be divided into two categories. Hedgehogs view the world through the lens of a single defining idea (Berlin's examples include Plato, Dante, Pascal, Hegel, Dostoevsky, Nietzsche, and Proust). Foxes draw on a wide variety of experiences, and for them the world cannot be boiled down to a single idea (examples include Aristotle, Shakespeare, Montaigne, Goethe, Pushkin, Balzac, and Joyce). Cognitively, Berlin's hedgehogs and foxes can be related to Gardner's (2009) distinction between laser intelligence and searchlight intelligence. Laser intelligence probes deeply into a topic but ignores opportunities to cross-pollinate, and it is often best suited to work requiring discipline. Searchlight intelligence, on the other hand, may not probe as deeply, but it is always scanning the environment. It may therefore more readily discern connections (and identify differences) across spheres. Both hedgehogs with laser intelligence and foxes with searchlight intelligence may have a proclivity to synthesize. However, what they synthesize and their criteria for success may differ widely. It remains a challenge for educators to develop effective strategies and methods for teaching synthesis to both hedgehogs and foxes. Most pedagogic materials developed in geography and GIS still focus on analytical skills despite the importance of synthesis skills. Like spatial analysis, spatial synthesis is an integral part of geographical scholarship that transcends subdisciplinary specializations. Indeed, analysis and synthesis are two inseparable aspects of a holistic geographic methodology. Analysis and synthesis always go hand in hand. In most cases they complement one another, although in some important situations one method can be regarded as more suitable than the other. Meaningful synthesis is built upon the results of preceding analysis, and meticulous analysis makes full sense only when a subsequent synthesis is achieved. Successful geographic research typically rests upon an artful combination of analysis and synthesis.

I sincerely hope this book will stimulate more innovative work in spatial synthesis and synthetics in the coming years. In the post-COVID-19 world, we need both hedgehogs and foxes. An explorer in the real world is better off equipped with both a laser and a searchlight. Likewise, we can make better sense of the world if we train our students to be fluent in both spatial analysis and synthesis. Developing spatial synthetics, and eventually synthetic geography, in the spirit of convergence research will help us continue to improve our craft of spatial synthesis and cope with an uncertain world.

References

Berlin, I. (1993). The Hedgehog and the Fox: An Essay on Tolstoy's View of History. Chicago: Ivan R. Dee.
Börner, K., Contractor, N., Falk-Krzesinski, H. J., Fiore, S. M., Hall, K. L., Keyton, J., Spring, B., Stokols, D., Trochim, W., & Uzzi, B. (2010). A multi-level systems perspective for the science of team science. Science Translational Medicine, 2(49), 49cm24.
Gardner, H. (2009). Five minds for the future. Boston: Harvard Business Press.
Hey, T., Tansley, S., & Tolle, K. (Eds.) (2010). The fourth paradigm: Data-intensive scientific discovery. Microsoft Research. Available online at: https://www.microsoft.com/en-us/research/publication/fourth-paradigm-data-intensive-scientific-discovery (last accessed 4/15/2020).
National Academies of Sciences (NAS). (2005). Enhancing the Effectiveness of Team Science. Washington, D.C.: National Academies Press.
National Academies of Sciences (NAS). (2014). Convergence: Facilitating Transdisciplinary Integration of Life Sciences, Physical Sciences, Engineering, and Beyond. Washington, D.C.: National Academies Press.
Shaw, S. L., & Sui, D. Z. (2019). Understanding the New Human Dynamics in Smart Spaces and Places: Towards a splatial framework. Annals of the American Association of Geographers, 110(2), 339–348. Available online at: https://www.tandfonline.com/doi/full/10.1080/24694452.2019.1631145.
Stokols, D., Hall, K. L., Taylor, B. K., & Moser, R. P. (2008). The Science of Team Science: Overview of the Field and Introduction to the Supplement. American Journal of Preventive Medicine, 35, S77–S89.
Sui, D. Z. (2014). Information synthesis. The International Encyclopedia of Geography: People, the Earth, Environment, and Technology. New York, N.Y.: Wiley-Blackwell.
Turner, B. L. (1989). The specialist–Synthesis approach to the revival of geography: The case of cultural ecology. Annals of the Association of American Geographers, 79(1), 88–100.