Proceedings of the First International Forum on Financial Mathematics and Financial Technology (Financial Mathematics and Fintech) 9811583722, 9789811583728

This book contains high-quality papers presented at the First International Forum on Financial Mathematics and Financial


English Pages 246 [238] Year 2021


Table of contents :
Preface
Contents
1 The Practice and Development of Digital Inclusive Finance in China
1.1 Overview of Fintech Development
1.1.1 Overview of Global Fintech Development
1.1.2 Overview of China Fintech Development
1.2 Application of Digital Industry of Fintech
1.2.1 Innovation and Developments of Fintech Industry in Banking
1.2.2 Inclusive Finance in Digitalization Background
1.2.3 Overview of the Development of Digital Inclusive Finance
1.2.4 China Digital Inclusive Financial Practice
1.3 Trends in Fintech and Digital Inclusive Finance in China
2 On Arbitrage-Free Pricing in Numeraire-Free Markets: With Applications to Forex and Cryptocurrency
2.1 Introduction
2.2 Definitions and Notation
2.3 Two Preliminary Pricing Methodologies
2.3.1 The Adequate Ordering
2.3.2 The Successive Pricing Methodology
2.3.3 The Total Repricing Methodology
2.4 Pricing for the Practitioner
2.5 Deducing Exchange Rates
2.5.1 Some Mathematical Results
2.5.2 Computing Exchange Rates
2.6 Conclusions
References
3 A Survey on Deep Learning in Financial Markets
3.1 Introduction
3.2 Deep Learning Models
3.2.1 Convolutional Neural Network
3.2.2 Recurrent Neural Network
3.2.3 Deep Belief Networks
3.2.4 Comparison of Deep Learning Models
3.3 Application of Deep Reinforcement Learning in Financial Industries
3.3.1 Credit Scoring
3.3.2 Stock Prediction
3.3.3 Market Trading
3.3.4 Portfolio Management
3.4 Conclusions
References
4 Information Transition in Trading and Its Effect on Market Efficiency: An Entropy Approach
4.1 Introduction
4.2 Information Entropy
4.2.1 Entropy Measures
4.2.2 Propositions of Causal Relationships in Transfer Entropy
4.3 Methodology
4.3.1 Measuring Causality Using Entropy
4.3.2 Entropy Calibration
4.4 Data
4.4.1 Market Index Price
4.4.2 News Sentiment Data
4.5 Results
4.5.1 The Distributions and the Shannon Entropy
4.5.2 Results of Self-causality
4.5.3 Results of Cross-Sectional Causality
4.5.4 Market Inefficiency and Information Entropy
4.6 Conclusions
References
5 Survey of Lattice-Based Group Signature
5.1 Introduction
5.1.1 Backgrounds
5.1.2 Outline of This Chapter
5.2 Traditional Group Signature Algorithm and Its Research Progress
5.2.1 Traditional Group Signature in the Random Oracle Model (ROM)
5.2.2 Group Signature Scheme Under Standard Model
5.3 Lattice-Based Group Signature
5.3.1 Preliminaries
5.3.2 Definition of Group Signature Based on Lattice
5.3.3 Improved Group Signature Based on Lattice
5.4 Conclusions
References
6 Insight on Hybrid Organizational Performance: A Systematic Review
6.1 Introduction
6.2 Methodology
6.3 Results
6.3.1 Profile of Literatures
6.3.2 Hybrid Organizing
6.3.3 Hybrid Impact
6.3.4 Paradox
6.3.5 Hybrid Performance
6.4 Discussions
6.5 Conclusions
References
7 The Complex Systems' Methods in Financial Science and Technology
7.1 Introduction
7.2 Some Methods for the Complexity of Financial Systems
7.2.1 Hierarchical Structure in Financial Systems
7.2.2 Multiscale Analysis
7.2.3 The Causality Analysis of Financial Markets
7.2.4 The Challenge of the Analysis on Financial Systems
7.3 The New Strategy for Modelling the Evolution of Financial Systems
7.3.1 The Basic Consideration About the Model
7.3.2 The Self-Interactive Process
7.3.3 Self-Interactive Process with External Factors
7.3.4 The New Model of Financial Systems Based on the Self-Interactive Process with External Factors
7.4 Some of the Relevant Methods Proposed for the Further Analysis
7.4.1 The Nonlinear Tracking-Differentiators
7.4.2 The New Control Strategy for Uncertain Systems
7.4.3 The Use of Internal Model Principle
7.4.4 Some Applications of the Control Methods
7.5 Call for Advanced Method on Machine Learning
7.6 Conclusions
References
8 Estimating the Number of Fork Projects of Bitcoin Based on a Birth-Death-Immigration Process
8.1 Introduction
8.2 Fork Project
8.3 Population Model
8.4 Parameter Estimation
8.5 Numerical Results
8.6 Conclusions
References
9 Patterns Versus Spatial Heterogeneity—From a Variational Viewpoint
9.1 Introduction
9.2 Pattern Formation in Single Equations
9.2.1 Allen-Cahn Equation with Variable Coefficients
9.2.2 A Mono-Stable Nonlinearity
9.3 Pattern Formation in Systems of Equations
9.3.1 An Activator-Inhibitor System
9.3.2 Two-Stage Model
References
10 A Summary: Quantifying the Complexity of Financial Markets Using Composite and Multivariate Multiscale Entropy
10.1 Introduction
10.2 Quantifying the Complexity with Composite Multiscale Entropy Algorithm
10.2.1 Composite Multiscale Entropy Algorithm
10.2.2 The Combined Application of EEMD Method and CMSE Method
10.3 Quantifying the Complexity Using Multivariate Multiscale Entropy Analysis
10.3.1 Definition of Multivariate Sample Entropy
10.3.2 Quantitative Analysis of the Complexity of China's Stock Market
10.3.3 An Attempt to Quantify the Complexity of the Global Stock Market
10.4 Conclusions
References
11 Operator-Valued Dirichlet Forms and Module Operator Markov Semigroups
11.1 Introduction
11.2 Preliminaries
11.3 Operator-Valued Dirichlet Forms
11.4 Beurling-Deny Type Criterion
References
12 Dynamics in a Quasilinear Parabolic-Elliptic Keller-Segel System with Generalized Logistic Source and Nonlinear Secretion
12.1 Introduction and Statement of Main Results
12.2 Local Existence and Preliminaries
12.3 Uniform Boundedness and Global Existence
12.4 Long Time Dynamics for Chemotaxis-Growth Models
References
13 Li-Yau Gradient Estimate on Graphs
13.1 Introduction and Notations
13.2 Curvature Dimension Inequalities
13.3 Li-Yau Inequality and Its Applications
13.3.1 Li-Yau Inequality for Bounded Laplacians
13.3.2 Li-Yau Inequality for Unbounded Laplacians
13.3.3 Gradient Estimate on General Graphs
References
14 Iterative Learning Control for FinTech
14.1 Introduction
14.2 Basic Principles of Iterative Learning Control
14.2.1 Structure of ILC
14.2.2 Formulation of ILC
14.3 Recent Progresses and Potential Applications
14.3.1 ILC for Multi-agent System
14.3.2 Fuzzy ILC
14.3.3 Predictive ILC
14.3.4 Other Application Scenarios of ILC
14.4 Example: Application of MAS Leader-Following Problem
14.4.1 Backgrounds
14.4.2 An Application of MAS Leader-Following Problem
14.4.3 Extensions
14.5 Conclusions
References
Appendix Index
Index

Financial Mathematics and Fintech

Zhiyong Zheng Editor

Proceedings of the First International Forum on Financial Mathematics and Financial Technology

Financial Mathematics and Fintech

Series Editors
Zhiyong Zheng, Renmin University of China, Beijing, China
Alan Peng, University of Toronto, Toronto, ON, Canada

This series addresses emerging advances in mathematical theory related to finance and in application research from all fintech perspectives. It is a series of monographs and contributed volumes focusing on the in-depth exploration of financial mathematics, such as applied mathematics, statistics, optimization, and scientific computation, and of fintech applications, such as artificial intelligence, blockchain, cloud computing, and big data. The series features a comprehensive understanding and practical application of financial mathematics and fintech, and it covers cutting-edge applications in practical programs and companies. The Financial Mathematics and Fintech book series promotes the exchange of emerging theory and technology between academia and financial practitioners. It aims to provide a timely reflection of the state of the art in mathematics and computer science as applied to finance. As a collection, the series provides valuable resources to a wide audience in academia, the finance community, government employees working in finance, and anyone else looking to expand their knowledge of financial mathematics and fintech.

The key words in this series include but are not limited to:
a) Financial mathematics
b) Fintech
c) Computer science
d) Artificial intelligence
e) Big data

More information about this series at http://www.springer.com/series/16497


Editor Zhiyong Zheng Renmin University of China Beijing, China

ISSN 2662-7167 ISSN 2662-7175 (electronic) Financial Mathematics and Fintech ISBN 978-981-15-8372-8 ISBN 978-981-15-8373-5 (eBook) https://doi.org/10.1007/978-981-15-8373-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

Digital technologies such as big data, AI, cloud computing, and blockchain are profoundly changing the innovation capability, service quality, and risk control of the financial market. With the rapid development of FinTech, the in-depth integration of mathematics, finance, and advanced technology has become the general trend. In this context, the School of Mathematics, together with several other faculties of Renmin University of China and the Zhongguancun Internet Finance Center, held the first International Forum on Financial Mathematics and FinTech at the Suzhou Campus from June 29 to July 2, 2019, and invited distinguished scholars from all over the world who specialize in mathematics, statistics, IT, and the interdisciplinary study of finance.

The forum held exchanges and discussions about the bottlenecks faced by emerging technologies such as big data, AI, cloud computing, and blockchain, with the aim of improving financial analysis and decision-making and enhancing the quality of financial services and risk control. Through these discussions, new ideas and new directions were put forward to promote the upgrading and development of the digital financial industry and to make full use of the innovation brought about by the underlying technologies of fintech. Through this forum, we gained a better understanding of the frontiers and research hotspots of financial mathematics and financial technology, and strengthened the contact between our institute and research institutes at home and abroad.

The proceedings focus on selected aspects of current and upcoming trends in FinTech. In detail, the included scientific papers focus on financial mathematics and FinTech, presenting innovative mathematical models and state-of-the-art technologies such as deep learning, with the aim of improving financial analysis and decision-making and enhancing the quality of financial services and risk control. The variety of the papers delivers added value for both scholars and practitioners, who will find in them an integration of elegant mathematical models and up-to-date data mining technologies for financial market analysis.


Chapter 1 provides a general overview of the practice and development of digital inclusive finance in China. The applications of the digital industry of Fintech are elaborated from various aspects. In addition, the authors provide a cutting-edge outlook on the trends in Fintech and digital inclusive finance in China.

Chapter 2 deals with two problems inherent in today’s cryptocurrency market: the lack of a standardized methodology to convert the value of trades, and the discrepancy on cryptocurrency markets between the theoretical opportunity for arbitrage and the practical impediments to it. It presents several mechanisms to calculate indicative prices for forex and cryptocurrency markets in terms of a numeraire. A general theorem that guarantees the inability to induce an arbitrage for all the pricing mechanisms is presented.

Chapter 3 surveys deep learning in financial markets, covering both the theory and the applications of deep learning models. The applications focus on financial prediction and quantitative trading, such as sentiment prediction, index prediction, intraday data prediction, financial distress prediction, and event prediction. The markets covered are stock markets, futures markets, exchange rate markets, and energy markets. Finally, the chapter discusses some innovative deep reinforcement learning methods for applications in financial fields.

Chapter 4 investigates the time-series properties of information adopted in the intraday market, in particular the causality effects. In terms of intraday information efficiency, it is worthwhile to adopt both types of information. Furthermore, there is still room for improving the price discovery process to reveal such information more effectively.

Chapter 5 reviews the research progress on traditional group signatures, summarizes the main progress on lattice-based group signature schemes in recent years, and analyzes the tools used for designing signature schemes.

Chapter 6 addresses the assessment of hybrid organizational performance and identifies a path for further study. Its systematic review points out that a mathematical understanding of the quality of performance is necessary for performance assessment.

Chapter 7 models the evolution of the main financial indexes and proposes a novel fractal structure model that is iterated by a self-interactive process with external factors. The relevant methods proposed for the further analysis of financial systems are also reviewed. Finally, from the viewpoint of the future development of financial technology, the chapter raises some issues that need to be addressed, for example, methods of advanced learning and intelligent automation or optimization for complex systems.

Chapter 8 estimates the number of fork projects of Bitcoin based on a birth-death-immigration process.

Chapter 9 generalizes the notion of pattern to spatially heterogeneous environments and builds a unified theory of the spontaneous emergence of patterns against spatially homogeneous or heterogeneous backgrounds.

Chapter 10 quantifies the complexity of financial markets using composite and multivariate multiscale entropy.


Chapter 11 extends noncommutative symmetric Dirichlet forms to the operator-valued setting based on the framework of the order of Hilbert W*-bimodules, and establishes the Beurling-Deny criterion between operator-valued Dirichlet forms and the associated module operator Markov semigroups.

Chapter 12 studies the dynamical properties of nonnegative solutions of the quasilinear parabolic-elliptic Keller–Segel chemotaxis system with generalized logistic source and nonlinear secretion.

Chapter 13 provides the Li-Yau gradient estimate and its applications for graphs.

Chapter 14 introduces the potential application of an intelligent control approach, iterative learning control (ILC), in FinTech and investigates how ILC can be applied in financial or economic models.

We would like to take this opportunity to thank all the participants in the first International Forum on Financial Mathematics and FinTech. We also gratefully acknowledge the support of the School of Mathematics, Renmin University of China, and the Engineering Research Center of Finance Computation and Digital Engineering, Ministry of Education.

Beijing, China
February 2020

Zhiyong Zheng

Contents

1 The Practice and Development of Digital Inclusive Finance in China
  Yong Liu and Yanhong Shen
2 On Arbitrage-Free Pricing in Numeraire-Free Markets: With Applications to Forex and Cryptocurrency
  Jonathan Mostovoy, Tomás Domínguez, and Luis Seco
3 A Survey on Deep Learning in Financial Markets
  Junhuan Zhang, Jinrui Zhai, and Huibo Wang
4 Information Transition in Trading and Its Effect on Market Efficiency: An Entropy Approach
  Anqi Liu, Jing Chen, Steve Y. Yang, and Alan G. Hawkes
5 Survey of Lattice-Based Group Signature
  Lei Zhang, Zhiyong Zheng, and Wei Wang
6 Insight on Hybrid Organizational Performance: A Systematic Review
  Yuting Wu and Yonghong Long
7 The Complex Systems’ Methods in Financial Science and Technology
  Wei Wang
8 Estimating the Number of Fork Projects of Bitcoin Based on a Birth-Death-Immigration Process
  Wei Dai
9 Patterns Versus Spatial Heterogeneity—From a Variational Viewpoint
  Izumi Takagi
10 A Summary: Quantifying the Complexity of Financial Markets Using Composite and Multivariate Multiscale Entropy
  Yunfan Lu and Zhiyong Zheng
11 Operator-Valued Dirichlet Forms and Module Operator Markov Semigroups
  Lunchuan Zhang
12 Dynamics in a Quasilinear Parabolic-Elliptic Keller-Segel System with Generalized Logistic Source and Nonlinear Secretion
  Xin Wang, Tian Xiang, and Nina Zhang
13 Li-Yau Gradient Estimate on Graphs
  Yong Lin and Shuang Liu
14 Iterative Learning Control for FinTech
  Kun Zeng
Index

Chapter 1

The Practice and Development of Digital Inclusive Finance in China

Yong Liu and Yanhong Shen

Abstract The world is currently facing a new round of technological and industrial revolution. New technologies and new business patterns emerge endlessly. An information wave characterized by digitalization, networking, and intelligence is booming, and the digital economy and the sharing economy are developing rapidly on a global scale. As an important part of this trend, fintech has become a hot spot in global financial innovation. In 2017 and 2018, global fintech financing kept heating up, and the scale of the fintech industry expanded rapidly. The contribution of technology to finance is no longer limited to aspects such as channels, but has extended to the deep integration of finance and technology. In the future, fintech and digital inclusive finance will further exploit their advantages and provide new means to address the imbalance and inadequacy in financial development.

Y. Liu (✉) · Y. Shen
Zhongguancun Internet Finance Institute, Beijing, China
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_1

1.1 Overview of Fintech Development

1.1.1 Overview of Global Fintech Development

The fintech industry is growing rapidly worldwide, with the United States and China leading the global fintech market. According to 2017 research from Ernst & Young, the average global penetration rate of fintech is 33%, more than double the 2015 figure. The penetration rate of fintech in emerging countries such as China is 46%, higher than the global average. Global fintech investment increased from $3.9 billion and 608 deals in 2013 to $17.9 billion and 1350 deals in 2017. Among the top 100 global fintech companies in 2018, the US has the most companies on the list, with 18, accounting for three of the top 10, and China has 11, accounting for three of the top five. Of the 34 fintech unicorns in the world in 2018, 18 are in the US and 8 are in China.

The top two countries for fintech investment and financing are China and the United States. In 2018, there were at least 1,097 fintech investment and financing deals around the world, with a financing amount of about RMB 436.09 billion, of which RMB 121.53 billion was raised in June, the highest amount since 2016. Regionally, North America has the largest financing amount and number of deals, followed by Asia, and the United States and China have become global leaders in fintech. The global fintech sector is dominated by early-stage investment, but the proportion of enterprises receiving early-stage financing decreased quarter by quarter, while the proportion of mid-stage financing increased quarter by quarter.

Fintech not only helps financial institutions improve the efficiency of resource allocation and risk management, but also brings challenges to the current regulatory system. Financial risks in the era of fintech need to be prevented and resolved by effective financial supervision. Regulators in various countries have reached a consensus on balancing supervision and innovation: on the one hand, encouraging and guiding the development of fintech; on the other hand, paying close attention to and analyzing the development and application of emerging technologies, and strengthening the supervision of new financial innovation models. In practice, most countries and regions believe that the business activities of fintech should be integrated into the existing regulatory system to ensure the consistency of regulatory principles. In terms of the design of regulatory departments, it is generally accepted that existing facilities can be used to realize “sub-link” and “infiltration” supervision of fintech activities according to the division of functions of the relevant departments. In terms of the design of the regulatory framework, a subdivision catalogue can be created on top of the existing framework so that dynamic supervision can be realized.

The “regulatory sandbox” is a widely used new regulatory tool. The UK Financial Conduct Authority (FCA) pioneered the “regulatory sandbox” programme in 2015, which aims to simplify market access standards and procedures for fintech products and to facilitate the rapid introduction of new products without compromising consumer rights. Subsequently, Singapore, Australia, Hong Kong, Canada and other countries and regions followed and applied the regulatory sandbox to the regulation of fintech.

Judging from the regulatory policies and measures of 2017 and 2018, different countries have different priorities in fintech regulation. The United States and Mexico adhere to the principle of functional regulation and integrate various innovative businesses into the corresponding regulatory system according to their financial functions. In addition, the United States tries to promote the safe development of fintech through the application of regulatory sandboxes, and also seeks to incorporate emerging models into the existing regulatory system. The UK and Singapore focus on exploring the use of technology to enhance regulatory effectiveness and to address the regulatory challenges posed by fintech. The EU emphasizes creating a favorable environment for innovators and eliminating non-technical barriers for fintech companies entering the market.


At present, the United States, the UK, Singapore, Japan and Hong Kong have become fintech centers with international influence, thanks to their large professional talent pools, rich investment resources and financial infrastructure, and the full support of their governments. The United States is an innovator and pioneer in fintech. It has rich talent resources, ample corporate financing, and traditional financial institutions with a high degree of technology adoption, supplemented by preferential fiscal and tax policies. It values and encourages fundamental scientific and technological research and develops fintech in an all-round way. As a global fintech center and a model of fintech regulation, the UK has a good talent pool and a good investment climate. It has introduced a number of innovation policies to support the development of fintech companies. The government of Singapore is fully involved in the development of the fintech industry, and has successfully built a fintech center in Southeast Asia by relying on tax incentives, legal incentives and extensive international cooperation. Japan has a relatively active investment environment, a multi-level capital market and a relatively complete guarantee system, which have laid a good foundation for the development of fintech there. Hong Kong has a complete financial ecosystem, high-quality commercial infrastructure, and diversified financial support from the market and the government. It is located in the Guangdong-Hong Kong-Macao Greater Bay Area, which is an ideal area for the combination of finance and technology.

1.1.2 Overview of China Fintech Development

In 2017, China entered the era of fintech 4.0, marked by a high degree of integration between finance and technology. The explosive development of technology and its deep adoption by users have enabled China’s fintech to break through the original framework of financial instruments, financial channels and financial services and to keep making breakthroughs on this new track.

The development of China’s fintech industry ranks among the top in the world. In terms of the number of leading companies, according to KPMG’s Global Finance and Technology 100 released in 2018, the Chinese companies Ant Financial, JD.com and LU.com are in the top five, reflecting China’s leading position in the global fintech industry. The revenue of the fintech industry has maintained steady growth, and there is still large room for growth in market size. In 2017, the total revenue of China’s fintech enterprises reached RMB 654.1 billion, an increase of 55.2%, and it is expected to reach RMB 1970.5 billion in 2020.

Great progress has been made in technological innovation. In 2017, China’s patent applications in the five major technologies of big data, cloud computing, artificial intelligence, biometrics and blockchain exceeded those of the United States in absolute terms, and were far ahead of those of traditional technology powers such as the United Kingdom, France, Germany and Japan. The growth of the talent pool, however, lags far behind the development of fintech itself, resulting in a serious shortage of talent. According to Michael Page’s 2018 China Fintech Employment Report, 92% of the fintech companies surveyed found that China is facing a severe shortage of fintech professionals.

As shown in Fig. 1.1 (Source: 01caijing.com and Zhongguancun Internet Finance Institute), after high growth in 2013–2015, China’s fintech financing slowed in 2016 and 2017, and investment gradually returned to rationality. The significant increase in financing in 2018 was due to a large financing round of $14 billion by Ant Financial and one of RMB 13 billion by JD.com.

Fig. 1.1 China fintech financing amount in 2012–2018 (US $ 100 million)

In terms of the investment cycle, long-term and mature investment is relatively active, and long-term investment in particular is growing quickly. Fintech companies are maturing, and investors tend to back early start-ups that can demonstrate their business models and have the ability to expand. The keys to attracting investment are corporate profitability and a realized business model. In 2018, the comprehensive finance sector received a total of RMB 246.435 billion in financing, far more than the amount raised in other sectors. Consumer finance is also one of the sectors with the highest financing, followed by internet finance and robo-advisers. In terms of fintech technology, blockchain financing is far ahead of the other technologies (see Fig. 1.2).

Fig. 1.2 Amount of financing in China fintech segment 2018 (RMB 100 million)

In terms of the number of deals, the institutions that have made the most investments in the fintech industry in recent years include Sequoia Capital, Matrix China and IDG Capital. In terms of the investment amount, Sequoia Capital, SCGC and Tencent Collaboration Fund have committed the most (see Table 1.1).

Table 1.1 Most active institutional investors in China’s fintech industry as of 2018

Name of the institution | Investment deals | Total amount of investment (US $ 10,000)
Sequoia Capital | 68 | 2761658
Matrix China | 61 | 168388
IDG Capital | 44 | 228438
ZhenFund | 39 | 29794
SCGC | 37 | 1723021
Tencent Collaboration Fund | 35 | 1224302
China Broadband Capital | 33 | 281234
Legend Capital | 32 | 59140
Fortune VC | 31 | 16436

Source: China Financial Technology Investment and Financing Report, jointly published by Zhongguancun Internet Finance Institute and InnoTREE in 2017, and 01caijing.com

Regulations were introduced intensively in 2017 and 2018, and the supervision of China’s fintech industry gradually entered a phase of refinement. In April 2016, the General Office of the State Council officially issued the Implementation Plan for the Special Rectification of Internet Financial Risks, which marked the official launch of the special rectification work on internet financial risks in China. The supervision of China’s fintech industry entered a new stage in 2017 with the introduction of a new phase of internet financial risk control, supervision of the internet lending industry, supervision of the payment industry, and other related policies. Supervision has gradually been refined (Table 1.2). In May 2017, the People’s Bank of China established the Fintech Committee, which aims to strengthen theoretical and applied research, deploy regulatory technology, and identify and prevent new types of financial risks. The China Bank-


Table 1.2 China fintech regulatory documents in 2017 and 2018

The field of remediation | Release time | Publishers | Name of the regulation documents
Internet Finance Risks | 2017.5 | 17 ministries and commissions including the People’s Bank of China | Notice on Further Improving the Special Rectification of Internet Financial Risks
Internet Finance Risks | 2017.6 | Internet finance regulation work leading group office | Notice on Cleaning up and Rectifying Illegal Businesses Jointly Conducted by Internet Platforms and Various Trading Venues
Internet Finance Risks | 2017.8 | Office of leading group for special rectification of Internet financial risks | Notice on the Implementation of Work Requirements for the Next Stage of the Clean-up and Rectification
Internet Lending | 2017.2 | Former China Banking Regulatory Commission (CBRC) | Guidance on Online Lending Fund Depository Business
Internet Lending | 2017.8 | Former China Banking Regulatory Commission (CBRC) | Guidance on Information Disclosure of Business Activities of Online Lending Information Intermediaries
Internet Lending | 2017.9 | National Internet Finance Association of China | Standards for the Depository System of Individual Online Loan Funds in Internet Finance (draft for comments) and Standards for the Depository Business of Individual Online Loan Funds in Internet Finance (draft for comments)
Internet Lending | 2017.10 | National Internet Finance Association of China | Internet Financial Information Disclosure Individual Online Lending (T/NIFA 1-2017) group standard

(continued)

1 The Practice and Development of Digital Inclusive Finance in China Table 1.2 (continued) The field of Release time remediation 2017.11

2017.12

2017.12

2017.12

Payday loan

2017.5

2017.12

Network loan

2017.11

2017.12

7

Publishers

Name of the regulation documents

P2P Network Lending Risk Special Rectification Leading Group Office, National Internet Finance Association of China National Internet Finance Association of China

Notice on the Assessment of the Depository Management of Online Lending Funds

P2P Network Lending Risk Special Rectification Leading Group National Internet Finance Association of China

Former China Banking Regulatory Commission, Ministry of Education, Ministry of Human Resources and Social Security Internet Financial Risk Special Control Leading Group, P2P Network Lending Risk Special Rectification Leading Group P2P Network Lending Risk Special Rectification Leading Group

Standards for the Depository System of Online Lending Funds by Individuals in Internet Finance, and Standards for the Depository Business of Online Lending Funds by Individuals in Internet Finance Notice on Special Rectification and Acceptance of P2P Online Lending Risks Contract Elements of Internet Finance for Individual Online Lending and Loan (T/NIFA 5-2017) Notice on Further Strengthening the Standardized Management of Campus Loans Notice on Standardizing and Rectifying “Payday Loan” Business

Notice on the Immediate Suspension of Approval of Online Small Loan Companies Former China Banking Implementation Plan Regulatory for Risk Rectification Commission of Online Small Loan Business of Small Loan Companies (continued)

8

Y. Liu and Y. Shen

Table 1.2 (continued) The field of Release time remediation Third-party payments

Virtual currency

Publishers

Name of the regulation documents

2017.12

People’s Bank of China

2017.12

People’s Bank of China

2017.12

People’s Bank of China

2018.6

People’s Bank of China

2017.9

7 ministries and commissions including the People’s Bank of China People’s Bank of China

Notice on Regulating Payment Innovation Business Notice on Adjustment of Centralized Deposit Ratio of Payment Institutions’ Customer Provisions Bar Code Payment Business Specification (Trial) supporting the issuance of the Bar Code Payment Security Technical Specification (Trial) and Bar Code Payment Receiving Terminal Technical Specification (Trial) Notice of the Office of the China People’s Bank on the Central Deposit of All the Provisions of the Payment Institution’s Customer Funds Announcement on Preventing the Risk of Financing of Token Issue Notice on the Conduct of Self-examination and Rectification of Payment Services for Illegal Virtual Currency Transactions Notice on Strengthening the Management Business and Carrying out Acceptance Work Through the Internet Guidance on Further Regulating Internet Sales and Redemption of Money Market Funds

2018.1

Internet Asset Management

2018.3

Internet Financial Risk Special Control Leading Group

2018.5

The former securities regulatory commission of China, the People’s Bank of China

1 The Practice and Development of Digital Inclusive Finance in China

9

ing Regulatory Commission and the China Insurance Regulatory Commission have merged to form the China Banking Insurance Regulatory Commission to adapt to the cross-border operating environment of financial industry. In March 2017, Futian District of Shenzhen issued “Several Opinions on Promoting the Rapid, Healthy and Innovative Development of Fintech in Futian District”, launching a number of innovative initiatives in the field of fintech. In October 2018, Zhongguancun Science and Technology Park Management Committee, Beijing Municipal Financial Work Bureau, Beijing Municipal Science and Technology Commission jointly issued the Beijing Municipal Development Plan for the Promotion of Fintech (2018–2022), proposing to create an industrial clustering pattern with financial supervision technology as the core and financial innovation application as the support. At the same time, the Beijing Municipal Finance Bureau, Zhongguancun Management Committee, Xicheng District Government and Haidian District Government jointly issued the Guidance on the Development of Fintech Innovation in the Capital, which provides support for the research and development of fintech technology, construction of facilities, strengthening of the application of fintech, and the construction of a fintech demonstration zone and so on. In terms of the number of fintech companies and the development of the underlying technology, industrial ecology and financing, Beijing, Shanghai, Shenzhen, Hangzhou has become China fintech development leading area. Beijing is home to China financial management headquarters and the financial supervision headquarters. Top universities and research institutes are gathered in Beijing. The construction of the national technology innovation center, represented by Zhongguancun, has spawned a number of excellent enterprises. Shanghai is the commercial and financial center of China. 
As a special economic zone of China, Shenzhen has its own innovation genes and abundant talents and technical resources to support the development of fintech. Hangzhou government has always attached great importance to the “Internet+” industry, and Hangzhou is the home of ant financial, the head of fintech, which has promoted the rapid development of fintech in Hangzhou.

1.2 Application of Digital Industry of Fintech

Fintech emphasizes the combination of finance and technology, with the focus on technology. Fintech technologies mainly include big data, artificial intelligence, interconnection technology, cloud computing, blockchain and security technology. Banks, insurers and other traditional financial industries rely on these technologies to achieve their own transformation and development. Technological innovation has given birth to intelligent investment, supply chain finance, consumer finance, third-party payment, regulatory technology and other emerging fields. Technology’s contribution to finance is no longer limited to business channels but extends to the full integration of “finance and technology”.


1.2.1 Innovation and Developments of Fintech Industry in Banking

1.2.1.1 Overview of Digital Transformation in Banking

With the development of digital technology, physical interaction in social life is gradually being replaced by digital media, and the spread of internet technology and mobile devices is accelerating digitization. Digitization has brought three important changes. First, users’ behavior and expectations have changed dramatically, and their demand for immediate and convenient products and services is increasing. Second, consumer loyalty depends greatly on whether a product or service meets their needs. Third, in the financial world, digitization has created opportunities for non-bank fintech companies that take full advantage of technology.

The impact of the internet on banks consisted mainly of extending online business to offline channels, optimizing business processes and making customer acquisition more convenient. Fintech, by contrast, has penetrated the risk control systems and the underlying logical structure of banks. This change has shifted banks’ success factors from the number of branches, the amount of assets and the ability to mass-produce standardized products to the amount of data they hold and their ability to use it. Banks need to provide personalized services for customers, use technology to create new value, connect their services to the complex digital value chain, and reshape the way they connect with customers and gain customers’ trust.

At present, banks pursue innovation mainly in two ways: internal innovation and external embedded development. Internal innovation includes changing business strategies, building financial services platforms and establishing their own fintech companies. External embedding is mainly cross-industry cooperation between banks and fintech companies. According to the direction and content of cooperation, it can be divided into product cooperation, service cooperation and strategic cooperation.

1.2.1.2 Open Banking

Open banking is emerging to integrate data sharing and application scenarios among banks, between banks and non-bank financial institutions, and even with cross-industry enterprises. The concept was proposed in the UK: in September 2015, the UK Treasury set up an Open Banking Working Group to study how data could be used to help people manage their finances, and the Open Banking Standard was published at the end of the year. Also at the end of 2015, the European Union adopted the Payment Services Directive 2 (PSD2), under which European banks “must” open payment services and related customer data to third-party service providers authorized by customers, with the main objective of promoting innovation in the payment industry and promoting market competition and dynamism through the introduction of third-party payment service providers. The Directive requires member states to transpose it into national law by 13 January 2018. These initiatives in the European Union and the United Kingdom have been influential worldwide, with Singapore, Hong Kong, Australia and other countries and regions actively promoting open banking in various ways.

Open banking practice in the UK and Europe is dominated by opening customer data to qualified third parties with customer authorization, largely because of regulatory requirements. In China, the rapid development of fintech in recent years and the active exploration of open banking by privately owned banks have led some banks to put open banking into practice, but China’s open banking mainly opens up functions, including payment settlement, credit and risk control, and is spontaneous and market-driven. Whether it is the opening of customer data abroad or the current opening of banking services in China, the ultimate goal is to encourage market competition, promote innovation in financial products and services, and improve the quality of financial services.

The practice of open banking in China can be traced back to 2012, when the Bank of China proposed the concept of an open platform. Since the second half of 2018, the concept of open banking has become popular in China, and several joint-stock banks and large state-owned commercial banks have accelerated their open banking practice (Table 1.3).

1.2.1.3 Innovation and Development of Fintech in Insurance Industry

In 2017 and 2018, the number of internet insurance deals and the amount of premium income moved in opposite directions. The internet property insurance market accelerated, and the importance of non-car insurance became more prominent. The internet life insurance market gradually lost growth momentum. Health insurance developed rapidly, and its financing volume rose steadily. The business-to-business (2B) model became the main pattern. Technologies such as big data and artificial intelligence are constantly being applied to the insurance industry, and new products and models keep emerging. At the same time, fintech has had a great impact on the theoretical basis of insurance. As a result, the traditional insurance company system cannot fully adapt to the development of technology, and business models need to be reformed.

Analyzing customer needs and precision marketing are the main ways in which big data is used to drive the development of insurance. For example, as early as 2013, Ping An Insurance began working with Baidu to use big data to build a “car owner ecosystem”. At present, “big data and insurance” is moving towards predicting agents’ sales rates, automating the underwriting process and identifying high-risk insurance policies.

The application of blockchain in the insurance industry is mainly reflected in the use of smart contracts, fraud prevention and the management of user information.


Table 1.3 Open banking practices of banks in China

| Time | Banks | Open banking practices |
|---|---|---|
| 2012 | Bank of China | Launched the BOC Open Platform, opening more than 1600 APIs |
| 2017 | Huarui Bank | Bank for “Friends Circle” and Huarui Bank Open Platform for Enterprises (Extreme SDK) |
| 2018.7 | Shanghai Pudong Development Bank | Launched the industry’s first API Bank, fully opening banking services, integrating them seamlessly into all aspects of social life, production and management, and embedding scenario-based finance in the internet ecosystem. Centered on customer demand and experience, customers can access the bank’s APIs through enterprise portals, enterprise resource planning management systems, WeChat programs, partner apps and other channels, forming ready-to-use cross-industry services |
| 2018.8 | Industrial and Commercial Bank of China | Fully implemented the e-ICBC 3.0 Internet Finance Development Strategy, transitioning to a “Smart Bank” |
| 2018.9 | China Construction Bank | Launched an open banking management platform |
| 2018.9 | China Merchants Bank | Announced the launch of version 7.0 of two apps, China Merchants Bank App 7.0 and Palm Life App 7.0, turning the mobile apps into an open platform on which API, H5 and app-jump connections link financial and everyday-life scenarios |
| 2018.10 | Wuhan Zhongbang Bank | Launched the “Zhongbang Bank Open Platform”, which will focus on supply chain financing, investment, accounts, payment and other fields |

The application of cloud computing in insurance is mainly reflected in optimizing internal management approval processes, running automated marketing engines, assisting agents in managing advertising and providing cross-selling recommendations.

Combining the Internet of Things (IoT) with traditional insurance products makes it possible not only to develop usage-based insurance (UBI) products that track user behavior through devices, but also to achieve deep integration with other industries. In March 2016, Kulong Insurance, China’s first IoT-based insurance company, was established. Kulong Insurance serves equipment and equipment-manufacturing enterprises, customizing insurance products and services for customers by using IoT technology for risk identification and precision marketing. “IoT and insurance” developed later in China than in other countries, but there are still many development opportunities. In other countries, the IoT was first applied in health and property insurance. In the smart home field, the IoT has been used to monitor the security of home devices in real time. In the smart wearable device field, it has been used to collect user data and to provide health management and other value-added services.

1.2.1.4 Innovation and Development of Fintech in Other Segments

China’s robo-adviser industry is still in its infancy, and its business models lack distinctive characteristics. In 2017 and 2018, China’s wealth management market expanded further, and robo-advisers grew in both assets under management and market penetration. There are three main types of market players: banks, fund and securities companies, and third-party wealth management institutions. Most robo-adviser products are aimed at individual customers. As a result of license restrictions, banks, funds and securities companies have become the main players in the industry. At present, the artificial intelligence adopted by robo-advisers is still at the stage of weak artificial intelligence. With the further development of the industry and the progress of technology, highly personalized programs will appear.

In 2017 and 2018, China’s supply chain finance market continued to expand. Among the market participants, technology enterprises are playing an increasingly important role. Among the business models, accounts receivable financing attracted the largest amount and pure credit financing the least. New technologies represented by blockchain, big data and the IoT are driving the transformation of supply chain finance towards greater digitalization and intelligence.

In 2017 and 2018, market participants in consumer finance developed well. By May 2018, the number of licensed consumer finance companies had reached 26, and these companies showed an obvious Matthew effect in their operations. Big data marketing, big data risk control and scenario expansion have the greatest impact on the development of consumer finance. The proportion of consumer credit in loans is rising, and consumer credit is becoming more important to economic development. From 2012 to 2017, the balance of consumer loans of Chinese households grew by 24.7% annually on average, pointing to broad prospects and a huge consumer finance market.

For third-party payment, the overall environment is improving and supervision is tightening. Transaction volume is expanding, while growth is leveling off. In 2017 and 2018, artificial intelligence, biometrics and blockchain technology were more widely used. At present, Alipay, WeChat Pay and other third-party payment platforms have introduced fingerprint and face payment. Blockchain is gradually being applied in cross-border payment, payment network construction and digital bills.
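
To give a sense of scale for the household consumer loan growth cited above, the short sketch below compounds the 24.7% average annual rate over the five yearly steps from 2012 to 2017. It assumes the quoted average behaves like a compound annual rate, which the text does not state; the function names and the starting index of 100 are illustrative and are not figures from the source. Under that assumption, the balance roughly triples over the period.

```python
def cumulative_growth(annual_rate: float, years: int) -> float:
    """Return the total growth multiple after compounding annual_rate for `years` periods."""
    return (1.0 + annual_rate) ** years


def implied_annual_rate(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# 24.7% per year over the five yearly steps from 2012 to 2017 (rate taken from the text;
# treating it as a compound rate is an assumption).
multiple = cumulative_growth(0.247, 5)
print(f"Balance multiple after 5 years: {multiple:.2f}x")

# Round trip: recover the annual rate from a hypothetical start and end balance.
start_balance = 100.0  # hypothetical index value, not a figure from the source
end_balance = start_balance * multiple
print(f"Implied annual rate: {implied_annual_rate(start_balance, end_balance, 5):.1%}")
```

The same two functions apply to any compound annual growth rate: `implied_annual_rate` is the inverse of `cumulative_growth`, so either the rate or the end value can be recovered from the other.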


At present, suptech in China is at an early stage of development. Research on suptech is mainly academic, and few enterprises actually apply it. Suptech in China is still a market to be developed.

1.2.2 Inclusive Finance in Digitalization Background

1.2.3 Overview of the Development of Digital Inclusive Finance

In recent years, inclusive finance has developed rapidly, and the government has issued several policies to promote it. At present, digital inclusive finance in China is at the stage of internet inclusive finance and is making great strides towards the stage of highly integrated development of technology and inclusive finance. China’s digital inclusive financial infrastructure has been constantly improved and features extensive service coverage, a mass-market customer base, data-driven risk management and low transaction costs. At the same time, digital inclusive finance faces challenges in information security, technological innovation and credit system construction.

The essence of inclusive finance is to provide comprehensive services to all sectors and groups of society. This broad inclusiveness means that inclusive finance is characterized by high risk, high cost and low return, which leaves financial institutions with little incentive to promote it. The development of digital technologies such as big data and artificial intelligence helps to eliminate the constraints of time and space, promote information sharing, and reduce transaction costs and the threshold for financial services, effectively expanding the coverage of financial services. Digital inclusive finance is therefore characterized by wide coverage, a mass-market customer base, data-driven risk management and low transaction costs.

Digital inclusive finance is supported by infrastructure such as communications, the network environment, the payment system and the credit system, and is further combined with advanced technologies such as artificial intelligence, big data and cloud computing to build a digital inclusive financial system that promotes rural revitalization, targeted poverty alleviation, micro-enterprise financing and so on through various channels and methods (Fig. 1.3).

As for China’s digital inclusive financial infrastructure, the number of internet users, especially rural internet users, is increasing. The communication and network environment has been further optimized. The scale and utilization rate of online payment are relatively high, the proportion of rural internet users among them is constantly increasing, and the payment system covers a wider range.


Fig. 1.3 Digital inclusive financial operations logic diagram

1.2.4 China Digital Inclusive Financial Practice

With the continuous integration of digital technology and the idea of inclusive finance, digital inclusive financial practice has emerged all over the world. The application of technologies such as big data and artificial intelligence is driving the transition from finance to intelligent finance, which benefits payment, credit, capital management and insurance and promotes the sustainable development of inclusive finance.

Fintech has been widely used in inclusive finance. New models and experiences continue to emerge in promoting the rural revitalization strategy, winning the fight against poverty and optimizing the financing environment for small and micro enterprises. Digital inclusive finance not only establishes a multi-dimensional credit management and risk evaluation system, but also significantly improves the breadth, depth and precision of financial services. It not only brings new market growth to agriculture and other key areas of inclusive finance, but also opens up new markets for the optimization of the industry ecology.

The 19th National Congress of the Communist Party of China set goals for China’s future development that are premised on eliminating poverty, narrowing the gap between urban and rural areas and consolidating the foundation of the market economy. Therefore, the development of digital inclusive finance should start from serving “agriculture, rural areas and farmers” in order to promote the rural revitalization strategy, win the battle against poverty and optimize the financing environment for small and micro enterprises.


Fig. 1.4 Financial technology industry revenue and the number of newly registered enterprises from 2013 to 2020 (RMB 100 million; number of enterprises)

1.3 Trends in Fintech and Digital Inclusive Finance in China

A. The fintech industry is entering a phase of structural optimization, and its function as an engine of the digital economy will continue to emerge.

From 2008 to 2015, the number of fintech companies continued to grow, with a compound annual growth rate of 44.6%. However, the number of newly registered companies declined sharply in 2015 and returned to triple digits in 2017. The number of fintech start-ups is expected to remain low in the future. At the same time, fintech revenue grew from 2013 to 2017 at a compound annual growth rate of 56.6% and is expected to reach RMB 1.97 trillion by 2020. At this stage, excess capacity will be continuously cleared, and the quality of development will continue to rise. The whole fintech industry will enter a phase of structural optimization (Fig. 1.4).

Knowledge and information have become the key factors of production, and digital technology innovation has become the core driving force of this era. The development of fintech will accelerate the integration of digital technology and the real economy and raise the digital and intelligent level of traditional industries, and its function as an engine of the digital economy will continue to emerge.

B. The digital inclusive finance market has huge room to grow. Targeted poverty alleviation, rural revitalization and financing for small and micro businesses will become the focus of its business.

Fintech has brought new models, new methods and new paths to traditional inclusive finance. There will be more and more solutions to practical demands in large-scale scenarios, and digital inclusive finance will enter a period of rapid development. All kinds of large institutions and suppliers will flood into the digital inclusive finance market, creating a “big sailing era” in the field of digital inclusive finance.

With the continuous development of the economy and society, targeted poverty alleviation, rural revitalization and the financing of small and micro businesses have become increasingly urgent and important. The 19th National Congress of the Communist Party of China (CPC) set major goals for 2020, 2035 and 2050, all of which require China to eliminate poverty, narrow the gap between urban and rural areas and consolidate the foundation of the market economy. As a result, digital inclusive finance will focus on these three areas in the coming years and beyond.

C. Empowering other financial entities will become the mainstream business model of the future.

With the loss of the demographic dividend, rising product expectations, increasingly fierce competition, sharply rising customer acquisition costs, and declining profitability and room for market expansion, business models that target other enterprises are gradually standing out. Fintech companies that serve other enterprises are expected to grow faster than those that serve individuals, and empowering other financial entities will become the mainstream business model of the future.

At present, Chinese fintech enterprises that empower other companies are still in a slow-growth period, and the industry remains much smaller than that of the United States. In the next ten years, China will enter a period in which technology penetrates deeply throughout the industry.

D. Strong supervision of fintech will become the norm, and new supervision models such as supervision technology will become mainstream.

With the continuous development of fintech, the regulation of fintech will become more stringent.
Although it has been nearly a decade since the 2008 financial crisis, its destructive impact on the world economy persists, and sensitivity to financial risk will increase further. In the new technological environment, the seven major financial risks faced by financial institutions, namely credit risk, operational risk, market risk, liquidity risk, compliance risk, reputational risk and systemic risk, will appear in more subtle, volatile and challenging forms. Therefore, strengthening financial supervision and promoting the transformation of finance from virtual to real is the main direction for the future development of fintech.

In 2018, the China Securities Regulatory Commission officially issued the “Overall Construction Plan for Supervision Technology of the China Securities Regulatory Commission”, marking the completion of the top-level design for supervision technology and the start of full implementation. As a new type of supervision, supervision technology has a great deal of room to develop. It is itself a high-tech industry, with the advantages of real-time tracking, end-to-end coordination and technology-based regulation. It not only conforms to the development trend of technology and finance, but is also highly practicable and delivers outstanding results. The popular “regulatory sandbox” is one way of implementing supervision technology. In the next few years, regulatory technology will become the mainstream mode of supervision, accompanied by the rapid development and iteration of new technologies such as artificial intelligence and big data.

E. The “ABCD” technologies (Artificial intelligence, Blockchain, Cloud computing, Big Data) will keep contributing to the further development of fintech, and the empowering properties of artificial intelligence and big data will be further enhanced.

Based on Gartner’s global emerging technology maturity curve and a systematic analysis of fintech technologies, “ABCD”, i.e. artificial intelligence (AI), blockchain, cloud computing and big data, will keep leading the development of fintech in depth and breadth. On this basis, an “ABCD plus” technology ecosystem will emerge, and the structure of the entire market will become clearer.

The empowering properties of artificial intelligence and big data will be further enhanced. In the next few years, artificial intelligence will penetrate the whole of society, from the core layer to the expansion layer, forming an artificial intelligence ecosystem. Big data can not only fully integrate network computing centers and mobile network technology with the Internet of Things, cloud computing and other cutting-edge network technologies, but also promote multidisciplinary integration and bring out the new functions and benefits of interdisciplinary and marginal disciplines in the new era.

The iteration of blockchain technology will accelerate further, and its integration with other technologies such as cloud computing will deepen. Cross-chain technology will thrive, and public chain and alliance chain technology will grow rapidly. Among them, the alliance chain will become the mainstream model of commercial banks’ blockchain innovation.

Security and capacity improvement are the development priorities for cloud computing in the next five years.
2017 was the year with more cyber attacks than any other year in history, and cyber attacks will only increase in the future. Under this expectation, government, public and private sector security analysts have to become more sophisticated vigilant in designing methods of detecting and preventing attacks. F. Transforming to open banking will be a necessary stage for the future development of the banking industry. Over the past few years, banks, fintech start-ups and third-party service companies have been working together to launch new applications and services. Large banks are also creating internal API and working with small fintech start-ups to integrate innovative technologies. In the future, the continuous development of fintech will make the traditional banking value chain vertically integrated fragmented. The trend of financial disintermediation is becoming more and more obvious, the transformation of open banking will become an important strategy for banking industry to cope with the impact of fintech. Open banking is a brand new bank form, service mode, development concept and cooperative relationship, and its huge potential for development is emerging. By 2022, open banking is expected to create at least70 billion in revenue in the retail and SME (Small and medium-sized enterprises) markets. Chen Liwu, deputy director of the Science and Technology Department of the Chinese People’s Bank of China, has said that the People’s Bank of China will balance the relationship


between security and innovation, draw on international experience, establish and improve open banking rules and a regulatory framework on the basis of the current development of China’s banking industry, speed up the introduction of guidance, set open banking service red lines for different types of banking financial institutions and different kinds of financial services, and clarify the types of open information interface, the service scope and other key elements. Transforming to open banking will be an important trend for the future development of the banking industry.
G. Supply chain finance presents three major trends, namely data networking, transaction standardization and service refinement, and the “supply chain finance as a service” model is gradually taking shape. In the future, the development of supply chain finance will show three major trends: data networking, transaction standardization and service refinement. Data networking means that, on the basis of gradually digitalized information such as order status, transaction history and subject qualification, data nodes such as e-commerce, payment, logistics, banks, government and customs will be connected to form a panoramic information picture of the prospective financing customer and so reduce financing risks. Transaction standardization refers to the gradual standardization of the data, transaction interfaces, contracts and processes of each participant in the supply chain to improve the efficiency with which data and information are interconnected. Service refinement means that more and more professional and vertical supply chain solution providers will offer individualized and customized financing services. On the basis of these three trends, a new type of supply chain financial platform will be formed.
H. There is a shortage of fintech talents, and the construction of talent training will be specialized and systematic.
The rapid development of fintech and the shortage of professional talent have opened a huge gap between the two, and human capital has become an important factor for fintech enterprises to gain competitive advantage. The fintech industry will enter a period of structural optimization in the next few years, which will place higher requirements on the professionalism, versatility and practical experience of talent. The market will have a more vigorous demand for professionals who can adapt to the development of the fintech industry. At present, the governments and relevant enterprises of cities with a developed fintech sector have adopted various measures to cultivate financial talent. The “Beijing promote financial science and technology development plan (2018–2022)” explicitly states that the Beijing government proposes to establish a fintech research base, establish fintech majors or departments, and cultivate a group of top financial and technological talents. Futian district of Shenzhen was awarded the first batch of fintech youth talent training bases in March 2018. With the continuous development of fintech and the further expansion of the demand for professional talent, fintech talent training will become specialized and systematic.

Chapter 2

On Arbitrage-Free Pricing in Numeraire-Free Markets: With Applications to Forex and Cryptocurrency

Jonathan Mostovoy, Tomás Domínguez, and Luis Seco

Abstract Our work presents several mechanisms to calculate indicative prices for forex and cryptocurrency markets in terms of a numeraire. One of the mechanisms is tailored for the practitioner and is thus accompanied by analytic estimates that maximize its computational efficiency. Additionally, we discuss how to leverage the prices provided by the different mechanisms in terms of a numeraire to deduce pairwise prices between all currencies. Finally, we prove a general theorem that guarantees the inability to induce an arbitrage for all the pricing mechanisms presented.

J. Mostovoy (B) · L. Seco
Department of Mathematics, RiskLab, University of Toronto, 40 St George St, Toronto, ON M5S 2E4, Canada
T. Domínguez
Mathematical Institute, University of Oxford, Oxford OX1 3LB, UK
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_2

2.1 Introduction
The motivation for writing this chapter was derived from two problems inherent in today’s cryptocurrency market. The first is the lack of a standardized methodology to convert the value of trades, observed from exchanges, in terms of traded currencies, to a more tractable quantity in terms of a numeraire. The second is the discrepancy present on cryptocurrency markets between the theoretical opportunity and the practical impediments to arbitrage. In fact, the cryptocurrency market allows one, for instance, to trade Bitcoin for USD, Bitcoin for Ethereum, and Ethereum for USD on a particular exchange without the former two trades replicating the latter. Of course, this induces an arbitrage opportunity which is practically prevented, among other reasons, by long settlement periods and lack of liquidity [1].


Given the motivating factor for this chapter, we have deemed it best to implement a notation tailored towards cryptocurrency markets. As the reader will notice, said notation coincides with that of forex markets. In fact, for the purposes of this chapter cryptocurrencies are considered equivalent to any other currency. As such, the results, as they are presented, hold both for the cryptocurrency market and the forex market. It is however important to note that they also apply, mutatis mutandis, to any transnational economy, be it a contemporary financial market or a barter economy.
The remainder of the introduction explains the structure of the chapter, while also stating certain general implications of each section. Section 2.2 introduces preliminary definitions and notation. In Sect. 2.3, we describe two mechanisms to price all currencies in terms of an arbitrary numeraire. Due to the practical limitations present in these pricing methodologies, most importantly their computational intensity, we dedicate Sect. 2.4 to the presentation of an algorithm that amalgamates the mechanisms discussed in Sect. 2.3 to achieve the same goal in a computationally feasible way. Section 2.4 also includes the proof of an estimate relevant to the practitioner that aims to minimize the computational requirements of said algorithm. In Sect. 2.5, we assume one of the pricing methodologies discussed in Sects. 2.3 and 2.4 has been implemented and present a way to produce precise exchange rates between any two currencies, regardless of whether they are traded against one another. We then use some elementary graph theory to prove the impossibility of any arbitrage under our pricing mechanism [2]. This is without a doubt the main finding for the reader intrigued by the cryptocurrency market and its modelling, as it provides a solution to the two motivating problems of the chapter. Section 2.6 summarizes the chapter.
Notice that in the more general context previously discussed, the methodologies presented in Sects. 2.3 and 2.4 can be used to price any asset in an economy in terms of a single numeraire, while the results presented in Sect. 2.5 can be applied to determine pairwise prices between any two assets.

2.2 Definitions and Notation
Before delving into the crux of the matter we shall take a moment to present the definitions and the notation assumed throughout the chapter. We begin by defining what it is that we mean when we talk of a trade and of an exchange in the world of currencies. When asked to define the notion of a trade, most people would opt for a response along the lines of the action of buying and selling currencies. However, in the present chapter we require a more mathematical framework with which we can actually work. We therefore define a trade between two currencies by its two characterizing quantities.

Definition 2.1 A trade between two currencies $C_i$ and $C_j$ is the exchange of a particular quantity of the base-currency, $C_i$, for a particular quantity of the quote-currency, $C_j$. We store this information by the tuple $(V, P)$, where $V$ is the quantity of $C_j$ traded (equivalently, the volume traded), and $P$ is the ratio $C_j/C_i$ traded (equivalently, the price traded at).

Notable is the omission of a time-stamp for a trade. Since the motivation of our analysis is to generate prices over a slice of time, we suppose all trades in consideration occur over a fixed time interval. We do not consider the time at which a trade occurs within such an interval in this chapter. When one thinks of an exchange, one usually envisions a system in which currency transactions can be carried out. We will yet again impose more rigidity in our definition of an exchange, using the information present on this system that we are naturally led to think of to define an exchange in terms of the two entities that characterize it.

Definition 2.2 An exchange $E$ is a set $E = \{C^E, T^E\}$, where $C^E = \{C_i^E\}_{i=1}^n$ is the set of coins listed on exchange $E$ and $T^E = \{(V_T, P_T)\}_{T \in I(E)}$ is the set of trades occurring on exchange $E$ between currencies in $C^E$. Here, $I(E)$ is an index set for the set of all trades on $E$, and we shall denote by $I_{ij}(E) \subset I(E)$ the index set corresponding to the trades in $E$ whose base currency is $C_i^E$ and whose quote currency is $C_j^E$. We also suppose $I_{ij}(E) \cap I_{ji}(E) = \emptyset$, i.e., each trade uniquely belongs to one of $I_{ij}$ or $I_{ji}$ based on the order of base and quote currencies.

We now turn our attention to the main piece of notation that we will be utilizing. Given a set of exchanges, $e = \{E\}_{E \in e}$, we denote the set of currencies listed on $e$ by,
$$C = \bigcup_{E \in e} C^E$$
and the set of trades occurring on $e$ by,
$$T = \bigcup_{E \in e} T^E$$

Finally, we introduce the two formulae that govern the mathematics of the chapter.

Definition 2.3 Given a set of exchanges, $e = \{E\}_{E \in e}$, and two currencies $C_i, C_j \in C$ we define the total volume traded between $C_i$ and $C_j$ in terms of $C_j$ to be,
$$V_{ij} = \sum_{E \in e} \left( \sum_{T \in I_{ij}(E)} V_T + \sum_{T' \in I_{ji}(E)} V_{T'} P_{T'}^{-1} \right)$$
and we denote the total price-volume traded between $C_i$ and $C_j$ in terms of $C_j$ to be,
$$(PV)_{ij} = \sum_{E \in e} \left( \sum_{T \in I_{ij}(E)} V_T P_T + \sum_{T' \in I_{ji}(E)} V_{T'} P_{T'}^{-2} \right)$$
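To make Definition 2.3 concrete, the following is a minimal sketch of how the two aggregates could be accumulated from a pooled list of trades. The function name `aggregate_trades`, the tuple layout `(base, quote, volume, price)` and the sample trade are assumptions made for illustration; they do not come from the chapter. Trades from all exchanges are assumed to be pooled into one list, which corresponds to the outer sum over $E \in e$.

```python
from collections import defaultdict


def aggregate_trades(trades):
    """Return dictionaries V[(i, j)] and PV[(i, j)], both expressed via C_j.

    `trades` is an iterable of (base, quote, volume, price) tuples, where
    `volume` is the quantity of the quote currency traded and `price` is
    the quote/base ratio, following Definition 2.1.
    """
    V = defaultdict(float)
    PV = defaultdict(float)
    for base, quote, volume, price in trades:
        # The trade lies in I_ij with i = base and j = quote.
        V[(base, quote)] += volume
        PV[(base, quote)] += volume * price
        # Seen from the pair (j, i), the same trade enters through the
        # second sum, converted with P^-1 and P^-2 respectively.
        V[(quote, base)] += volume / price
        PV[(quote, base)] += volume / price ** 2
    return dict(V), dict(PV)


# Hypothetical data: 20000 USD traded for BTC at 10000 USD per BTC.
V, PV = aggregate_trades([("BTC", "USD", 20000.0, 10000.0)])
```

In this example the single trade contributes 20000 (in USD) to the BTC/USD volume and 2 (in BTC) to the USD/BTC volume, which is the unit conversion that the factor $P_{T'}^{-1}$ performs.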


2.3 Two Preliminary Pricing Methodologies
In this section we fix a set of exchanges $e$ as well as a currency $F \in C$ and discuss two methodologies to calculate the price of every currency in $C$ in terms of $F$ using the information provided by $T$. To simplify notation we will write $C = \{C_i\}_{i=1}^n$, with $F = C_1$, and denote by $R_{ij}$ the exchange rate between $C_i$ and $C_j$, or equivalently the price of $C_i$ in terms of $C_j$, for $i, j \in \{1, \dots, n\}$. The main challenge one faces when pricing the set of currencies in $C$ in terms of $C_1$, or equivalently calculating $R_{i1}$ for $i \in \{2, \dots, n\}$, is that exchanges list the volume and the price of trades in terms of other currencies being traded, with no reference to currency $C_1$. At first, one might consider a pricing methodology that looks only at the direct trades between $C_1$ and other currencies. However, this is a very restrictive view as it uses only a small part of all the available information. Moreover, this approach might leave certain currencies, namely those that are not traded against $C_1$, unpriced. A better-founded approach is to exploit the notion of volume-weighting being an accurate representation of market value through iterative methods that use more of the available information at each of their iterations. We will propose two different iterative methods to price the currencies in $C$, the second being an extension of the first. However, before discussing said methods let us take a moment to comment on the order in which the currencies should be priced.

2.3.1 The Adequate Ordering
To determine an appropriate order in which to price the currencies, let us momentarily assume that we know all the exchange rates between currencies in $C$. The price of a currency in $C$ in terms of $C_1$ would then be taken to be the volume-weighted average of all the trades concerning that currency, expressed in terms of $C_1$ by using the appropriate exchange rate. It therefore seems natural to price the currencies in an order that aims to minimize the error between the estimated price of a currency and the value that would be obtained through a volume-weighting if all exchange rates were known. Consequently, the first currency to be priced should be that for which trades with $C_1$ constitute the greatest volume-percentage of all its trades. As such, let $C_i \in C$ be such that,
$$\frac{V_{1i}}{\sum_{j=1}^{n} V_{ji}} \ge \frac{V_{1l}}{\sum_{j=1}^{n} V_{jl}} \qquad \forall l \in \{2, \dots, n\}$$
Renumbering the currencies if necessary we may assume without loss of generality that $i = 2$. Of course, the remaining coins should be ordered using the same idea. To render this statement more precise, let us suppose that the currencies in the set $\tilde{C} \subset C$ have been priced in terms of $C_1$ and let us assume without loss of generality that $\tilde{C} = \{C_i\}_{i=1}^{k}$ for some $k \in \{2, \dots, n-1\}$. Then, let $C_i \in C \setminus \tilde{C}$ be such that,
$$\frac{\sum_{j=1}^{k} V_{ji}}{\sum_{j=1}^{n} V_{ji}} \ge \frac{\sum_{j=1}^{k} V_{jl}}{\sum_{j=1}^{n} V_{jl}} \qquad \forall l \in \{k+1, \dots, n\}$$

Renumbering the currencies if necessary we may assume without loss of generality that i = k + 1. The coins in C are now numbered so that pricing them one after the other has the effect of minimizing the error between the prices calculated and the prices we would obtain through a volume-weighting if all exchange rates were known.
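The greedy selection just described can be sketched as follows. This is an illustration under stated assumptions rather than the chapter's own implementation: the function name `adequate_ordering`, the zero-based indexing with the numeraire at index 0, the nested-list layout `V[j][l]` for $V_{jl}$, and the sample matrix are all hypothetical. A currency with no traded volume is given share 0, which is also an assumption.

```python
def adequate_ordering(V, numeraire=0):
    """Return the currency indices in the order in which to price them.

    V[j][l] holds V_jl, the volume traded between j and l in terms of l.
    At each step the unpriced currency whose volume against the already
    priced set is the largest share of its total volume is chosen.
    """
    n = len(V)
    priced = [numeraire]
    remaining = [l for l in range(n) if l != numeraire]

    def share(l):
        total = sum(V[j][l] for j in range(n))
        with_priced = sum(V[j][l] for j in priced)
        return with_priced / total if total > 0 else 0.0

    while remaining:
        best = max(remaining, key=share)
        priced.append(best)
        remaining.remove(best)
    return priced


# Hypothetical volumes for three currencies; index 0 is the numeraire.
V = [[0.0, 10.0, 50.0],
     [5.0, 0.0, 50.0],
     [25.0, 90.0, 0.0]]
order = adequate_ordering(V)
```

Here currency 2 trades half of its volume against the numeraire while currency 1 trades only a tenth, so currency 2 is priced first.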

2.3.2 The Successive Pricing Methodology
Having adequately ordered the set of currencies $C$ we may now take a simple volume-weighted average to successively determine the price of $C_i$ for $i \in \{2, \dots, n\}$ to be,
$$R_{i1} = \frac{\sum_{j=1}^{i-1} (PV)_{ij} \left(R_{j1}\right)^2}{\sum_{j=1}^{i-1} V_{ij} R_{j1}}$$

For this iterative formula to make sense we must of course recall that $R_{11} = 1$. Also, observe that this formula is coherent in terms of units since the numerator has units $(C_j^2/C_i) \cdot (C_1^2/C_j^2) = C_1^2/C_i$ while the denominator has units $C_j \cdot (C_1/C_j) = C_1$, yielding units of $C_1/C_i$ for $R_{i1}$, as desired.
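A minimal sketch of the successive formula is given below, assuming the aggregates are stored as nested lists `V[i][j]` and `PV[i][j]` and that `order` lists the currencies in the adequate order with the numeraire first. The function name and the USD/BTC/ETH figures are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def successive_prices(V, PV, order):
    """Price each currency in terms of the numeraire order[0].

    V[i][j] and PV[i][j] hold V_ij and (PV)_ij. Currency order[p] is
    priced from its trades with order[0], ..., order[p - 1] only.
    """
    R = {order[0]: 1.0}  # R_11 = 1
    for pos in range(1, len(order)):
        i = order[pos]
        earlier = order[:pos]
        num = sum(PV[i][j] * R[j] ** 2 for j in earlier)
        den = sum(V[i][j] * R[j] for j in earlier)
        if den == 0:
            raise ValueError(f"currency {i} has no volume against priced currencies")
        R[i] = num / den
    return R


# Hypothetical market: 0 = USD (numeraire), 1 = BTC, 2 = ETH.
# BTC trades against USD at 10000; ETH trades only against BTC at 0.05.
V = [[0.0, 0.0, 0.0],
     [20000.0, 0.0, 0.0],
     [0.0, 2.0, 0.0]]
PV = [[0.0, 0.0, 0.0],
      [2e8, 0.0, 0.0],
      [0.0, 0.1, 0.0]]
R = successive_prices(V, PV, [0, 1, 2])
```

In this example ETH is never traded against USD, yet it receives the price 0.05 × 10000 = 500 USD through its trades with BTC.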

2.3.3 The Total Repricing Methodology
Having presented a methodology to price all currencies in $C$ we may now wonder how the calculated price of a given currency $C_i \in C$ would compare to its price obtained through a volume-weighted average of all the trades it is involved in, expressed in terms of $C_1$ through the adequate exchange rate for which we now have an estimate. It seems plausible that the result obtained for the currencies that were first priced will be substantially different to that calculated through the Successive Pricing Methodology, since a lot of new information is being used. This issue leads to a new pricing mechanism that aims to overcome the problem by repricing previously priced currencies after each iteration. Since a given currency will be attributed a number of intermediate prices, one after each iteration that follows its initial pricing, let us introduce the following notation. Given a set $\tilde{C} = \{\tilde{C}_i\}_{i=1}^{n}$ of currencies and two integers $j, k \in \{1, \dots, n\}$ such that $k \ge j$, we denote by $R_{j1}^{(k)}$ the price of currency $\tilde{C}_j$ calculated using its trades with the set of currencies $\{\tilde{C}_i\}_{i=1}^{k}$.


Remark 2.1 This definition may seem to lack precision as it does not present an explicit formula for $R_{j1}^{(k)}$. However, this omission is purposeful as this notation shall also be used in Sect. 2.4 and the formula will be different to the one we are about to give. That is, the explicit formula for $R_{j1}^{(k)}$ depends on the repricing mechanism used.

With this notation at hand we are now in a position to describe the Total Repricing Methodology inductively. To do so, let us suppose that we have adequately ordered the currencies in $C$ as in Sect. 2.3.1 and that we have priced $C_2$ as in Sect. 2.3.2. This initial pricing of $C_2$ will be seen to be necessary to initiate the Total Repricing Methodology once the formulae that govern it are introduced. Now, let us assume that the Total Repricing Methodology has been used to price currencies $\{C_i\}_{i=1}^{k-1}$ for some $k \in \{3, \dots, n\}$ and that it has associated the price $R_{i1}^{(k-1)}$ to currency $C_i \in \{C_i\}_{i=1}^{k-1}$. Then, let $R_{k1}^{(k-1)}$ be the price of currency $C_k$ calculated as in Sect. 2.3 to be,
$$R_{k1}^{(k-1)} = \frac{\sum_{j=1}^{k-1} (PV)_{kj} \left(R_{j1}^{(k-1)}\right)^2}{\sum_{j=1}^{k-1} V_{kj} R_{j1}^{(k-1)}}$$
and for $i \in \{1, \dots, k\}$ define,
$$R_{i1}^{(k)} = \frac{\sum_{j=1}^{i-1} (PV)_{ij} \left(R_{j1}^{(k)}\right)^2 + \sum_{j=i+1}^{k} (PV)_{ij} \left(R_{j1}^{(k-1)}\right)^2}{\sum_{j=1}^{i-1} V_{ij} R_{j1}^{(k)} + \sum_{j=i+1}^{k} V_{ij} R_{j1}^{(k-1)}}$$
Finally, let the price of currency $C_i \in C$ calculated by the Total Repricing Methodology be $R_{i1}^{(n)}$.
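The induction above can be sketched as an in-place sweep: once the newest currency has received its first price, each earlier currency is repriced in turn, so that currencies with a smaller index already carry their updated price while those with a larger index still carry the previous one. The sketch below makes two assumptions that are not in the text: indices are zero-based with the numeraire at index 0, and the numeraire is held at 1 rather than repriced (the formula above lets $i$ start at 1). The function name and the sample market are hypothetical.

```python
def total_repricing(V, PV):
    """Prices R[i] of currency i in terms of the numeraire (index 0).

    V[i][j] and PV[i][j] hold V_ij and (PV)_ij, with currencies assumed
    to be already arranged in the adequate order of Sect. 2.3.1.
    """
    n = len(V)
    R = [1.0] + [0.0] * (n - 1)

    def price(i, others):
        num = sum(PV[i][j] * R[j] ** 2 for j in others)
        den = sum(V[i][j] * R[j] for j in others)
        if den == 0:
            raise ValueError(f"currency {i} has no volume against priced currencies")
        return num / den

    for k in range(1, n):
        # First price of the newest currency from those priced so far.
        R[k] = price(k, range(k))
        # Sweep: because R is updated in place, entries j < i already hold
        # the new prices and entries j > i still hold the previous ones.
        # Assumption: the numeraire (index 0) stays at 1 and is skipped.
        for i in range(1, k + 1):
            R[i] = price(i, [j for j in range(k + 1) if j != i])
    return R


# Hypothetical, internally consistent market: 0 = USD, 1 = BTC, 2 = ETH,
# with every trade executed at BTC = 10000 USD and ETH = 500 USD.
V = [[0.0, 2.0, 2.0],
     [20000.0, 0.0, 40.0],
     [1000.0, 2.0, 0.0]]
PV = [[0.0, 2e-4, 0.004],
      [2e8, 0.0, 800.0],
      [5e5, 0.1, 0.0]]
R = total_repricing(V, PV)
```

When all trades agree with a single set of prices, as in this example, the sweep leaves those prices unchanged. The nested loops also make the cost visible: each new currency triggers a full pass over all previously priced ones.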

2.4 Pricing for the Practitioner
The Total Repricing Methodology is in theory more satisfactory than the Successive Pricing Methodology as it uses a lot more information to price currencies. However, it requires an exorbitant amount of computational power to be implemented in practice. We therefore require a more tractable alternative suitable for the practitioner. The key idea behind the practical pricing methodology that we propose is to find a subset of $C$ that adequately summarizes the market and to focus on pricing this set accurately while pricing all other currencies based solely on their trades with the currencies in this set. Before picking the set of currencies that reflects the holistic behaviour of the market let us reorder the coins in $C$ so that such a selection can be done inductively. Since our desire is to summarize the market, it only seems natural to order the currencies in $C$ by total traded volume. However, when trying to implement such an idea, we run into the problem that traded volumes are listed in terms of the base currency as opposed to a universal currency. It is therefore not possible to


compare the total traded volume between different currencies by looking solely at the information provided by $T$. To overcome this difficulty let us use the Successive Pricing Methodology to obtain initial prices for all the currencies in $C$ in terms of $C_1$. Observe that the reason we use the Successive Pricing Methodology as opposed to the Total Repricing Methodology is that we are trying to keep computational cost to a minimum. Now, let us inductively reorder $C$. To do so, let $C_i \in C$ be such that,
$$\sum_{j=1}^{n} V_{ij} R_{j1} \ge \sum_{j=1}^{n} V_{lj} R_{j1} \qquad \forall l \in \{1, \dots, n\}$$
Renumbering the currencies if necessary we may assume without loss of generality that $i = 1$. Let us now suppose that we have ordered $k$ of the currencies in the set $C$ and that without loss of generality we have obtained the ordered set $\{C_i\}_{i=1}^{k}$. Now, let $i \in \{k+1, \dots, n\}$ be such that,
$$\sum_{j=1}^{n} V_{ij} R_{j1} \ge \sum_{j=1}^{n} V_{lj} R_{j1} \qquad \forall l \in \{k+1, \dots, n\}$$

Renumbering the currencies if necessary we may assume without loss of generality that $i = k + 1$. Having ordered the set $C$ so that currencies with the greatest traded volume appear first, let us fix some threshold $\gamma \in [0, 1]$ and define,
$$k = \min \left\{ i \in \{1, \dots, n\} : \min_{l=1}^{n} \frac{\sum_{j=1}^{i} V_{jl}}{\sum_{j=1}^{n} V_{jl}} \ge \gamma \right\}$$
so that the set of currencies $\{C_i\}_{i=1}^{k}$ accounts for at least a fraction $\gamma$ of all the volume traded for any coin in $C$. Under the assumption that $\gamma$ is sufficiently large and in the hope that $k$ is moderately small due to our initial reordering, we can interpret the set of currencies $\{C_i\}_{i=1}^{k}$ as our desired holistic reflection of the market.
We are finally in a position to describe the applicable pricing methodology that we are after. To this end, let us use the Total Repricing Methodology to calculate the price $R_{i1}^{(k)}$ of $C_i$ in terms of $C_1$ for $i \in \{1, \dots, k\}$ and let us then proceed inductively to price currency $C_j$ for $j \in \{k+1, \dots, n\}$ using the formula,
$$R_{j1} = \frac{\sum_{l=1}^{k} (PV)_{jl} \left(R_{l1}^{(j-1)}\right)^2}{\sum_{l=1}^{k} V_{jl} R_{l1}^{(j-1)}}$$
and after pricing $C_j$, but before pricing $C_{j+1}$, let us reprice the set of currencies $\{C_i\}_{i=1}^{k}$ by setting,




$$R_{i1}^{(j)} = \frac{\sum_{l=1}^{i-1} (PV)_{il} \left(R_{l1}^{(j)}\right)^2 + \sum_{l=i+1}^{j} (PV)_{il} \left(R_{l1}^{(j-1)}\right)^2}{\sum_{l=1}^{i-1} V_{il} R_{l1}^{(j)} + \sum_{l=i+1}^{j} V_{il} R_{l1}^{(j-1)}}$$
where to prevent an overly long formula we take $R_{l1}^{(j-1)} = R_{l1}$ for $l \in \{k+1, \dots, n\}$. Since one of the aims of this pricing methodology is to minimize computational cost, a fundamental question that we must address is that of stopping the repricing of the currencies in $\{C_i\}_{i=1}^{k}$ after the pricing of each new currency. In fact, this is where the majority of the computational power is being used. A natural stopping mechanism would consist in fixing a tolerance $\epsilon > 0$ and ceasing the repricing after pricing coin $C_j$, where $j$ is the first natural number in the set $\{k+1, \dots, n\}$ such that,
$$\max_{i=1}^{k} \left| R_{i1}^{(j)} - R_{i1}^{(j-1)} \right| \le \epsilon$$
Under such a stopping mechanism we have the following estimate.

Theorem 2.1 Let $m \in \{j+1, \dots, n\}$ be some fixed integer, let $R_{m1}$ be the price of $C_m$ calculated using the set of prices $\{R_{i1}^{(j-1)}\}_{i=1}^{k}$ and let $\tilde{R}_{m1}$ be the price of $C_m$ calculated using the set of prices $\{R_{i1}^{(j)}\}_{i=1}^{k}$. Then,
$$\left| R_{m1} - \tilde{R}_{m1} \right| \le \epsilon M_m \left( 2 + \frac{\epsilon + H}{h} \right)$$
where $M_m$, $H$ and $h$ are constants.

Proof The proof of this theorem being slightly laborious we postpone it to the appendix.

Thus if $\epsilon$ is picked to be sufficiently small, repricing the set of currencies $\{C_i\}_{i=1}^{k}$ using the information acquired by pricing $C_j$ has a negligible impact on the price of subsequent coins. Although this does not confirm our intuition about the stopping mechanism being coherent, it certainly supports it.
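The three ingredients of this section that lend themselves to direct computation are the choice of the core size $k$ for a threshold $\gamma$, the stopping test, and the right-hand side of Theorem 2.1. The sketch below illustrates them under stated assumptions: the function names, the zero-based nested-list layout `V[j][l]` for $V_{jl}$, and the sample numbers are hypothetical, and currencies with no traded volume at all are skipped when taking the inner minimum, which the text does not specify.

```python
def core_size(V, gamma):
    """Smallest i such that the first i currencies carry at least a
    fraction gamma of the volume traded against every currency."""
    n = len(V)
    for i in range(1, n + 1):
        ratios = []
        for l in range(n):
            total = sum(V[j][l] for j in range(n))
            if total > 0:  # assumption: ignore currencies with no volume
                ratios.append(sum(V[j][l] for j in range(i)) / total)
        if ratios and min(ratios) >= gamma:
            return i
    return n


def should_stop(previous, current, eps):
    """Stopping test: the largest change among core prices is at most eps."""
    return max(abs(a - b) for a, b in zip(previous, current)) <= eps


def theorem_bound(eps, M_m, H, h):
    """Right-hand side of Theorem 2.1."""
    return eps * M_m * (2 + (eps + H) / h)


# Hypothetical volumes for three currencies, already ordered by volume.
V = [[0.0, 10.0, 50.0],
     [5.0, 0.0, 50.0],
     [25.0, 90.0, 0.0]]
k_loose = core_size(V, 0.05)   # a low threshold is met by two currencies
k_strict = core_size(V, 0.5)   # a high threshold needs all three
```

The bound is linear in $\epsilon$ to leading order, so halving the tolerance roughly halves the guaranteed effect on subsequent prices, at the cost of repricing the core set for longer.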

2.5 Deducing Exchange Rates
Having discussed three different methodologies to complete the vector,
$$R^{(1)} = \left( R_{11}, R_{21}, \dots, R_{n-1,1}, R_{n1} \right)$$
a natural question that arises is how to use the information provided by $R^{(1)}$ to compute the exchange rates between all coins in $C$. As discussed in the introduction, we would like the exchange rates to be arbitrage-free and satisfy the desirable consistency relation,


Definition 2.4 Let $\tilde{C} = \{\tilde{C}_i\}_{i=1}^{m}$ be a set of currencies and let $\tilde{R}_{ij}$ denote the exchange rate between $\tilde{C}_i$ and $\tilde{C}_j$ for $i, j \in \{1, \dots, m\}$. We say that the exchange rates between currencies in $\tilde{C}$ are consistent if,
$$\tilde{R}_{ij} \tilde{R}_{ji} = 1 \qquad \forall i, j \in \{1, \dots, m\}$$

Observe that we can rephrase our objective by introducing the matrix,
$$R = \begin{bmatrix} R_{11} & R_{12} & \dots & R_{1n} \\ R_{21} & R_{22} & \dots & R_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ R_{n1} & R_{n2} & \dots & R_{nn} \end{bmatrix}$$
and saying that our goal is to complete $R$ under the assumption that we know only its first column to obtain arbitrage-free exchange rates that satisfy the consistency relation. We will actually prove the stronger and more general result that such a completion can be made knowing only one element per row in the lower triangular part of $R$ and no other element of $R$. To achieve our aim we take a mathematical detour which is essential in rendering precise the notion of exchange rates being arbitrage-free in the present context.

2.5.1 Some Mathematical Results
We will need a link between the two seemingly unrelated realms of matrix completion and graph theory. We therefore begin with some definitions which culminate in a precise notion of arbitrage-free exchange rates and then present an important lemma.

Definition 2.5 Let $A \in M_{n \times n}(\mathbb{R})$ be a partially filled matrix such that $a_{ij} a_{ji} = 1$ for all $i, j \in \{1, \dots, n\}$. We define the graph associated to $A$, denoted by $G_A$, to be the graph on $n$ labelled vertices such that vertex $i$ is connected by an edge to vertex $j$ if and only if $a_{ij}$ is a known value of $A$.

Remark 2.2 The graph $G_A$ is well defined because $a_{ij}$ is known if and only if $a_{ji}$ is known due to the relation $a_{ij} a_{ji} = 1$ for all $i, j \in \{1, \dots, n\}$.

Definition 2.6 Let $A \in M_{n \times n}(\mathbb{R})$ be a partially filled matrix such that $a_{ij} a_{ji} = 1$ for all $i, j \in \{1, \dots, n\}$ and let $i$ and $j$ be two vertices of $G_A$ connected by an edge $E(i, j)$. We define the value of the edge $E(i, j)$ to be $a_{ij}$.

Definition 2.7 Let $A \in M_{n \times n}(\mathbb{R})$ be a partially filled matrix such that $a_{ij} a_{ji} = 1$ for all $i, j \in \{1, \dots, n\}$ and let $P(k_1, \dots, k_m)$ be a path in $G_A$ that traverses vertices $(k_1, \dots, k_m)$ through the edges $(E(k_1, k_2), \dots, E(k_{m-1}, k_m))$. We define the value of the path $P(k_1, \dots, k_m)$ to be $\prod_{l=1}^{m-1} a_{k_l k_{l+1}}$.


Definition 2.8 Let $\tilde{C} = \{\tilde{C}_i\}_{i=1}^{m}$ be a set of currencies, let $\tilde{R}_{ij}$ denote the exchange rate between $\tilde{C}_i$ and $\tilde{C}_j$ for $i, j \in \{1, \dots, m\}$ and let $\tilde{R}$ be the matrix,
$$\tilde{R} = \begin{bmatrix} \tilde{R}_{11} & \tilde{R}_{12} & \dots & \tilde{R}_{1m} \\ \tilde{R}_{21} & \tilde{R}_{22} & \dots & \tilde{R}_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{R}_{m1} & \tilde{R}_{m2} & \dots & \tilde{R}_{mm} \end{bmatrix}$$
We say that the exchange rates between currencies in $\tilde{C}$ are arbitrage-free if the value of any two paths between any two vertices of $G_{\tilde{R}}$ is the same.

Lemma 2.1 Let $A \in M_{n \times n}(\mathbb{R})$ be a matrix for which only one element per row of its lower triangular part is known and whose upper triangular part is completed using the relation $a_{ij} a_{ji} = 1$ for all $i, j \in \{1, \dots, n\}$. Then, $G_A$ is a tree.

Proof By recalling that a graph with $n$ vertices, $n - 1$ edges and no simple cycles is a tree, we notice that it is sufficient to show that $G_A$ has no cycles. To this end, suppose for contradiction that $G_A$ has a cycle and let $i$ denote the highest numbered vertex in the cycle, with $j$ and $k$ being its two adjacent vertices. Then, by construction of $G_A$ from $A$, it follows that both $a_{ij}$ and $a_{ik}$ are known values. As these two entries lie on the same row in the lower triangular part of $A$ this yields a contradiction.
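Lemma 2.1 can be checked mechanically on small cases. The sketch below is an illustration, not part of the proof: it represents each known lower-triangular entry $a_{ij}$ as an edge `(i, j)` with zero-based indices and uses a union-find structure to test that the resulting graph has $n - 1$ edges and no cycle. The function name `is_tree` and the sample edge lists are hypothetical.

```python
def is_tree(n, edges):
    """Return True if the graph on vertices 0..n-1 with the given
    undirected edges has exactly n - 1 edges and contains no cycle."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the representative, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # u and v were already connected: a cycle
        parent[root_u] = root_v
    return len(edges) == n - 1


# One known entry per row below the diagonal (rows 1, 2, 3): a tree.
one_per_row = is_tree(4, [(1, 0), (2, 0), (3, 1)])
# Two known entries in row 2 close a cycle, as in the proof's contradiction.
two_in_a_row = is_tree(3, [(1, 0), (2, 1), (2, 0)])
```

The second example mirrors the argument of the proof: the highest numbered vertex of a cycle needs two known entries in its own row.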

2.5.2 Computing Exchange Rates
Using the mathematical results we have just developed we are in a position to complete the matrix $R$ given only one element per row in its lower triangular part to obtain arbitrage-free and consistent exchange rates. To see this, we begin by imposing the formula $R_{ij} R_{ji} = 1$ for all $i, j \in \{1, \dots, n\}$ and then use Lemma 2.1 to say that $G_R$ is a tree, so that any two of its vertices can be connected by a unique simple path. Using this idea we determine entry $R_{ij}$ for $i > j$ by denoting by $(i, k_1, k_2, \dots, k_m, j)$ the unique simple path between vertices $i$ and $j$, where $\{k_l\}_{l=1}^{m}$ are the vertices traversed throughout the path, and set,
$$R_{ij} = R_{i k_1} \left( \prod_{l=1}^{m-1} R_{k_l k_{l+1}} \right) R_{k_m j}$$
The remainder of the matrix is completed by imposing once again,
$$R_{ij} R_{ji} = 1 \qquad \forall i, j \in \{1, \dots, n\}$$

Theorem 2.2 The exchange rates obtained by completing the matrix $R$ are arbitrage-free and consistent.


Proof The exchange rates obtained by completing $R$ are clearly consistent as they are deduced by imposing the consistency relation,
$$R_{ij} R_{ji} = 1 \qquad \forall i, j \in \{1, \dots, n\}$$
To see that they are also arbitrage-free, fix two vertices $i, j \in \{1, \dots, n\}$ of $G_R$ and let $S$ denote the simple path from $i$ to $j$ and $P$ denote any non-simple path between these two vertices. Now, let $v$ be the first vertex along $S$ and $P$ at which the two paths bifurcate. Since $G_R$ has no cycles it is clear that $P$ cannot arrive at $j$ without first returning to $v$. Moreover, $P$ must return to $v$ by going along a path to some given vertex and then returning to $v$ by taking this same path in the opposite direction. Given that we have imposed the relation $R_{ij} R_{ji} = 1$ this means that the deviation that $P$ takes relative to $S$ has no effect on the value of the path. Thus, all paths from $i$ to $j$ have the same value and so the exchange rates calculated are indeed arbitrage-free.
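The completion procedure of Sect. 2.5.2 can be sketched as a traversal of the tree $G_R$: starting from each vertex, the value of the unique simple path to every other vertex is accumulated edge by edge. The sketch below is an illustration under assumptions. The function name `complete_rates`, the zero-based indexing, and the sample prices are hypothetical, and instead of filling the lower triangle first and inverting it afterwards, every entry is computed directly from its path, which gives the same values up to floating-point rounding.

```python
from collections import deque


def complete_rates(n, known):
    """Complete the n x n exchange-rate matrix from tree-shaped input.

    `known` maps (i, j) with j < i to R_ij, one entry per row i = 1..n-1,
    so that the associated graph is a tree by Lemma 2.1.
    """
    adjacency = {v: [] for v in range(n)}
    for (i, j), rate in known.items():
        adjacency[i].append((j, rate))        # edge value R_ij
        adjacency[j].append((i, 1.0 / rate))  # consistency: R_ji = 1 / R_ij

    R = [[None] * n for _ in range(n)]
    for source in range(n):
        R[source][source] = 1.0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, rate in adjacency[u]:
                if R[source][v] is None:
                    # Extend the unique simple path source -> u by the edge (u, v).
                    R[source][v] = R[source][u] * rate
                    queue.append(v)
    return R


# Hypothetical first column: BTC = 10000 and ETH = 500 in the numeraire.
R = complete_rates(3, {(1, 0): 10000.0, (2, 0): 500.0})
```

In this example the BTC/ETH rate is deduced as 10000 / 500 = 20 although the two currencies share no known entry, and multiplying rates around any closed loop returns 1 up to rounding.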

2.6 Conclusions
This chapter tries to provide a standardization in the pricing methods used in forex and cryptocurrency markets. In doing so, it presents one particular mechanism that strives for computational efficiency through analytic estimates and is thus tailored towards the practitioner. It also provides a theorem that guarantees the inability to generate an arbitrage for the pricing mechanisms discussed.

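Appendix

Theorem 2.1 Let $m \in \{j+1, \dots, n\}$ be some fixed integer, let $R_{m1}$ be the price of $C_m$ calculated using the set of prices $\{R_{i1}^{(j-1)}\}_{i=1}^{k}$ and let $\tilde{R}_{m1}$ be the price of $C_m$ calculated using the set of prices $\{R_{i1}^{(j)}\}_{i=1}^{k}$. Then,
$$\left| R_{m1} - \tilde{R}_{m1} \right| \le \epsilon M_m \left( 2 + \frac{\epsilon + H}{h} \right)$$
where $M_m$, $H$ and $h$ are constants.

Proof We begin by defining the quantities,
$$H^{(j)} = \max_{l=1}^{k} R_{l1}^{(j)}, \qquad H = \max\left\{ H^{(j-1)}, H^{(j)} \right\}$$
$$h^{(j)} = \min_{l=1}^{k} R_{l1}^{(j)}, \qquad h = \min\left\{ h^{(j-1)}, h^{(j)} \right\}$$
$$M_m = \max_{l=1}^{k} \max_{E \in e} \max\left\{ \max_{T \in I_{ml}(E)} P_T, \ \max_{T \in I_{lm}(E)} P_T^{-1} \right\}$$
and proceed in two steps.

Step 1 Firstly, recalling that $\max_{i=1}^{k} \left| R_{i1}^{(j)} - R_{i1}^{(j-1)} \right| \le \epsilon$ we may say:
$$R_{m1} - \tilde{R}_{m1} = \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j-1)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)}} - \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j)}}$$
$$\le \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j-1)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)}} - \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)} + \epsilon \sum_{l=1}^{k} V_{ml}}$$
Upon which cross multiplication and use of the triangle inequality yields:
$$R_{m1} - \tilde{R}_{m1} \le \frac{\sum_{l=1}^{k} (PV)_{ml} \left| \left(R_{l1}^{(j-1)}\right)^2 - \left(R_{l1}^{(j)}\right)^2 \right|}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)} + \epsilon \sum_{l=1}^{k} V_{ml}} + \epsilon \, \frac{\sum_{l=1}^{k} V_{ml}}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)}} \cdot \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j-1)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)} + \epsilon \sum_{l=1}^{k} V_{ml}}$$
However, by positivity of the quantities at hand this simplifies to:
$$R_{m1} - \tilde{R}_{m1} \le \frac{\sum_{l=1}^{k} (PV)_{ml} \left| R_{l1}^{(j-1)} - R_{l1}^{(j)} \right| \left( R_{l1}^{(j-1)} + R_{l1}^{(j)} \right)}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)}} + \epsilon \, \frac{\sum_{l=1}^{k} V_{ml}}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)}} \cdot \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j-1)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)}}$$
Finally, observing that $(PV)_{ml} \le M_m V_{ml}$ for all $l \in \{1, \dots, k\}$ and using the inequality $\max_{i=1}^{k} \left| R_{i1}^{(j)} - R_{i1}^{(j-1)} \right| \le \epsilon$ once again, we obtain:
$$R_{m1} - \tilde{R}_{m1} \le \epsilon M_m \frac{\sum_{l=1}^{k} V_{ml} \left( R_{l1}^{(j-1)} + R_{l1}^{(j)} \right)}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)}} + \epsilon M_m \frac{H^{(j-1)}}{h^{(j-1)}}$$
$$\le \epsilon M_m \left( 2 + \frac{\epsilon}{h^{(j-1)}} + \frac{H^{(j-1)}}{h^{(j-1)}} \right)$$
$$\le \epsilon M_m \left( 2 + \frac{\epsilon + H}{h} \right)$$

Step 2 An identical reasoning leads us to the sequence of inequalities,
$$\tilde{R}_{m1} - R_{m1} = \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j)}} - \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j-1)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j-1)}}$$
$$\le \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j)}} - \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j-1)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j)} + \epsilon \sum_{l=1}^{k} V_{ml}}$$
$$\le \frac{\sum_{l=1}^{k} (PV)_{ml} \left| \left(R_{l1}^{(j)}\right)^2 - \left(R_{l1}^{(j-1)}\right)^2 \right|}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j)} + \epsilon \sum_{l=1}^{k} V_{ml}} + \epsilon \, \frac{\sum_{l=1}^{k} V_{ml}}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j)}} \cdot \frac{\sum_{l=1}^{k} (PV)_{ml} \left(R_{l1}^{(j)}\right)^2}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j)}}$$
$$\le \epsilon M_m \frac{\sum_{l=1}^{k} V_{ml} \left( R_{l1}^{(j-1)} + R_{l1}^{(j)} \right)}{\sum_{l=1}^{k} V_{ml} R_{l1}^{(j)}} + \epsilon M_m \frac{H^{(j)}}{h^{(j)}}$$
$$\le \epsilon M_m \left( 2 + \frac{\epsilon}{h^{(j)}} + \frac{H^{(j)}}{h^{(j)}} \right)$$
$$\le \epsilon M_m \left( 2 + \frac{\epsilon + H}{h} \right)$$
From which we conclude,
$$\left| R_{m1} - \tilde{R}_{m1} \right| \le \epsilon M_m \left( 2 + \frac{\epsilon + H}{h} \right)$$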

References
1. Rauchs, M., Hileman, G.: Global Cryptocurrency Benchmarking Study. Cambridge Centre for Alternative Finance Reports, Cambridge Centre for Alternative Finance, Cambridge Judge Business School, University of Cambridge (2017)
2. Bondy, J.A., Murty, U.S.R.: Graph Theory, 1st edn. Springer Publishing Company (2008)

Chapter 3

A Survey on Deep Learning in Financial Markets

Junhuan Zhang, Jinrui Zhai, and Huibo Wang

Abstract Recently, deep learning has become a frontier in the study of financial markets. In this article, we survey its applications. Firstly, we review three deep learning models: convolutional neural networks, recurrent neural networks, and deep belief networks. Secondly, we summarize the applications of these three models in financial markets. The applications focus on financial prediction and quantitative trading, such as sentiment prediction, index prediction, intraday data prediction, financial distress prediction, and event prediction. The markets covered are mainly stock markets, futures markets, exchange rate markets, and energy markets. Finally, there are also some innovative methods in deep reinforcement learning for applications in financial fields.

J. Zhang (B) · J. Zhai · H. Wang
School of Economics and Management, Beihang University, Beijing 100191, China
Key Laboratory of Complex System Analysis, Management and Decision (Beihang University), Ministry of Education, Beijing, China
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_3

3.1 Introduction
Deep learning is a recent concept in artificial neural network research. It was initially proposed by Hinton and Salakhutdinov [1]. Deep learning is in essence a kind of machine learning, which mainly simulates the human brain to analyze and interpret information and data, and forms new interpretation mechanisms through supervised and unsupervised learning. Although the concept of deep learning appeared just over ten years ago, it has rapidly become one of the most important concepts in artificial intelligence and computer technology and has made significant progress in several fields. For example, speech recognition, face recognition, computer image analysis and other techniques all rely on deep learning. Deep learning promotes the development of artificial intelligence to a large extent, which has won the technology and the concept popular support and changes the way people

35

36

J. Zhang et al.

work, study and live deeply and rapidly. At the same time, the integration of financial industry and deep learning has also become a hot issue in recent years, which not only attracts attention of theoretical community, but also has significant progress in practical applications [2–4]. As a method applying neural network model to learn rules of mass data, deep learning has become one of the most important techniques in the field of artificial intelligence currently. It also shows strong application vitality constantly in the development history of more than ten years. According to the basic development history of deep learning, its evolution has gone through the following key periods. Lecun et al. [5] proposed a new method of deep learning and revitalized research on neural network. After that, deep learning continued to heat up. Therefore, the year 2006 can be basically regarded as the beginning year of deep learning and since then the enthusiasm for deep learning had been set off among the academic community. DARPA, an agency of the United States Department of Defense, planned to fund deep learning project for the first time in 2010, which promotes the application and popularity of deep learning to a large extent. The application in military field is an important process in the development of many cutting-edge technologies, and deep learning is certainly no exception. Deep learning made a breakthrough in speech recognition in 2011. Now speech recognition has become a relatively matured technology. It has not only made progress in t e-commerce, financial customer service, transaction assistance and so on, but also brings more convenience to peoples life [6–10]. In 2012, Google Brain comprehended and recognized cats face from mass images. At the same time, the application of deep learning to the prediction of drug activity obtained the best result around the world. Face recognition based on deep learning makes breakthroughs from 2012. 
By 2017 these technologies had been applied to many more everyday scenarios, which greatly improved people's experience. In 2014, the DeepFace system raised the accuracy of face recognition to 97.35%, closely approaching human performance [11]. The spread and improved efficiency of face recognition have made it a reliable technology, and matching or exceeding human ability in such tasks means that deep learning has begun to deliver practical value [12–14]. NVIDIA and Google successively developed dedicated deep learning processors in 2015, which promoted deep learning from both the hardware and the software side [15]. In 2016, AlphaGo, which applies multiple deep learning techniques, defeated Lee Sedol, one of the best human Go players, an event that the media generally described as “the real conquest of the most difficult and intelligent Go project” [16]. In 2017, AlphaGo defeated Ke Jie, then the world No. 1 player, which suggested that humans could no longer defeat deep-learning-based artificial intelligence at Go and made deep learning and artificial intelligence a focus of worldwide attention.


3.2 Deep Learning Models
3.2.1 Convolutional Neural Network
A convolutional neural network (CNN) is a feedforward neural network (FFNN) that serves as an outstanding recognition technology. Its artificial neurons respond to surrounding units within a restricted region, and it performs very well in large-scale image processing and object detection [17, 18]. CNNs have therefore played a prominent role in image recognition and analysis in recent years and have attracted growing research and development effort [19–22]. As shown in Fig. 3.1, a CNN takes the FFNN as its technical core and collects, analyzes, and integrates image information through layer-by-layer neuron responses; its recognition efficiency and quality are reliable. The figure shows that a CNN consists of an input layer, convolution layers, pooling layers, and fully connected layers. The input image first passes through a convolution layer and a pooling layer, then through a second convolution layer and pooling layer, and finally enters the fully connected layers. The width of the fully connected layers decreases from 1024 to 512 and finally to 10 outputs. Through these local neuron responses, a CNN makes large-scale image processing easier, and it also performs well in speech recognition. Li [24] studies the value of CNN applications, starting from breakthroughs in key computer vision technologies. He identifies object recognition, scene labeling, and scene recognition as the representative technologies at three levels of CNN-based computer vision research, and finds that a feature learning strategy based on multi-scale salient regions can effectively improve the accuracy of scene recognition.
Chen [25] argues that improving and applying CNN-based stereo matching can further upgrade modern binocular stereo vision and that this research has significant practical value. In addition, Zhou [26] recently proved the universality of CNNs, that is, their ability to approximate any continuous function. In short, CNNs have a wide application space.

Fig. 3.1 Convolutional neural network [23]
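The layer sequence described above (convolution, pooling, convolution, pooling, then fully connected layers of width 1024, 512, and 10) can be traced with a few lines of shape arithmetic. The following is only a sketch: the 28 × 28 single-channel input, the 5 × 5 kernels with padding 2, the 2 × 2 pooling, and the channel counts 32 and 64 are assumptions chosen for illustration and are not taken from Fig. 3.1; only the fully connected widths come from the text.

```python
def window_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after sliding a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height, width, channels):
    """Follow the tensor shape through conv -> pool -> conv -> pool -> FC."""
    shapes = [("input", (height, width, channels))]
    # Assumed settings: 5x5 convolutions with padding 2, 2x2 pooling, stride 2.
    for index, out_channels in enumerate((32, 64), start=1):
        height = window_output_size(height, kernel=5, padding=2)
        width = window_output_size(width, kernel=5, padding=2)
        channels = out_channels
        shapes.append((f"conv{index}", (height, width, channels)))
        height = window_output_size(height, kernel=2, stride=2)
        width = window_output_size(width, kernel=2, stride=2)
        shapes.append((f"pool{index}", (height, width, channels)))
    shapes.append(("flatten", (height * width * channels,)))
    # Fully connected widths named in the text: 1024 -> 512 -> 10.
    for name, units in (("fc1", 1024), ("fc2", 512), ("output", 10)):
        shapes.append((name, (units,)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes(28, 28, 1):
        print(f"{name:8s} {shape}")
```

Under these assumed settings each pooling layer halves the spatial size, so the feature map is flattened into a single vector before it reaches the fully connected layers.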


Many scholars, in China and abroad, have studied how to improve financial prediction with CNNs. Ding et al. [27] predict the S&P 500 index and the prices of selected individual stocks with a combined model that processes text information using a neural tensor network and a deep CNN. Their model has two parts. First, a neural tensor network is trained to learn event embeddings for events extracted from news text; then a deep CNN captures the influence of events and models both their short-term and long-term impact. Compared with the baseline method (a standard FFNN), their model improves the prediction accuracy for the S&P 500 index and for individual stocks by almost 6%. Chen et al. [28] propose a CNN-based method for financial time series analysis. The method uses moving average mapping (MAM) and double moving average mapping (DMAM), and they apply it to predictive analysis of one-minute trading data of Taiwan index futures (from January 2, 2001 to April 24, 2015). Vargas et al. [29] propose a new deep learning model, the RCNN model, use 106,494 Reuters financial news articles (from October 20, 2006 to November 21, 2013) to predict the stock market, and compare its prediction accuracy with that of traditional CNN/NN models. Korczak and Hernes [30] propose a deep learning method for financial time series prediction, developed for the multi-agent stock trading system A-Trader, using the S&P 500 index, the FTSE 100 index, and petroleum and gold index data. They also find that the CNN significantly reduces the error rate of time series prediction. Sohangir et al. [31] analyze StockTwits investment sentiment data (from January 1, 2015 to June 30, 2015) to determine whether deep learning models can improve sentiment analysis performance on StockTwits.
The results show that deep learning models can be applied effectively to financial sentiment analysis and that the CNN is the best model for predicting the sentiment of StockTwits authors. CNNs also play an important role in quantitative trading. A typical quantitative trading system has three parts: trading signals (pattern recognition), position control, and asset management. Using CNNs for recognition, information gathering, image analysis, and related tasks in these three parts significantly helps final decision making. To help researchers understand the state of research in recent years, Table 3.1 lists these studies for comparison.
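The method of Chen et al. [28] is built on moving averages of the price series. The text above does not specify how they map these averages into CNN inputs, so the following sketch shows only the underlying arithmetic: a simple moving average, and a moving average applied twice, which is one common reading of "double moving average". The function names, the window length, and the price values are hypothetical, and whether Chen et al. use exactly this definition is an assumption.

```python
def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running_sum = sum(values[:window])
    averages = [running_sum / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running_sum += values[i] - values[i - window]
        averages.append(running_sum / window)
    return averages


def double_moving_average(values, window):
    """Moving average of the moving average (assumed reading of 'double')."""
    return moving_average(moving_average(values, window), window)


if __name__ == "__main__":
    prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # made-up one-minute prices
    print(moving_average(prices, 3))
    print(double_moving_average(prices, 3))
```

Smoothing twice removes more short-term noise but shortens the series further, which matters when the smoothed values are later arranged into fixed-size inputs.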

Table 3.1 Related research on financial application of CNNs

| Model | Research object | Market | Data | Results | Innovation | References |
|---|---|---|---|---|---|---|
| CNN and novel neural tensor network | Stock price prediction | Stock | Financial news headlines of Reuters and Bloomberg News (from October 2006 to November 2013) | The prediction accuracy is improved by almost 6% | Deep learning models are used for the first time in event-driven stock market prediction | Ding et al. 2015 |
| CNN | Financial time series method | Futures | One-minute trading data of Taiwan index futures (from January 2, 2001 to April 24, 2015) | The CNN model has a certain feature acquisition and classification ability for the futures market | A new financial time series analysis method based on deep learning is proposed | Chen et al. 2016 |
| CNN, RNN, RCNN | Index trend prediction | Stock | 106,494 Reuters financial news articles (from October 20, 2006 to November 21, 2013) | CNN is good at capturing text semantics and RNN excels at capturing context information and complex temporal characteristics of the stock market | A new deep learning model, the RCNN model, is proposed and compared with traditional CNN/NN and other models on prediction accuracy | Vargas et al. 2017 |
| CNN | Financial time series method | Stock, gold and petroleum | S&P 500 index, FTSE 100 index, petroleum and gold index (hourly, daily, weekly and monthly summary data) | The prediction error rate of CNN is significantly decreased | A financial time series prediction method related to A-Trader and the investment strategy of the A-Trader system are proposed | Korczak and Hernes 2017 |
| CNN, LSTM (RNN) | Investment sentiment prediction | Stock | StockTwits investment sentiment data (from January 1, 2015 to June 30, 2015) | CNN can overcome the problems of data mining methods in stock sentiment analysis | CNN is the best model to predict sentiment of investors | Sohangir et al. 2018 |

3.2.2 Recurrent Neural Network
The term recurrent neural network (RNN) is not a simple concept: it covers two kinds of artificial neural networks, the temporal recurrent neural network and the structural recursive neural network. In a temporal RNN, the connections among neurons form a directed graph, whereas a structural recursive neural network applies a similar network structure recursively to build a more complex deep network. The training algorithms of the two differ in some respects but basically belong to the same family of algorithms, so the two are closely related. In terms of operating mechanism, a recursive network is constructed by structural recursion; for example, the recursive autoencoder is commonly used to parse sentences in neural natural language analysis. An RNN is usually used to describe dynamic temporal behavior: it passes its state around its own network in a loop and can accept a broad range of sequence-structured input. Unlike a feedforward deep neural network, an RNN emphasizes feedback within the network. Because the current state is connected to the previous state, an RNN has a certain memory capability. Representative RNNs include the traditional RNN model, the long short-term memory (LSTM) network, and the gated recurrent unit (GRU) model. An LSTM cell has three gates: an input gate, a forget gate, and an output gate. These gates decide whether to let new input in (input gate), whether to delete information because it is unimportant (forget gate), and whether to let the information affect the output at the current time step (output gate). Fig. 3.2 shows an example with these three gates.

Fig. 3.2 Schematic diagram of RNN model [32]

In terms of specific applications, Graves et al. [33] argue that RNN-based models, starting from speech recognition, can extend their application scope and value and can shape application scenarios better. Whether applied to sequence modeling of natural language or to speech analysis and understanding, such neural network models demonstrate their value and can lead to further technical innovation. In finance, research on and application of financial market prediction with RNN models has become increasingly active, and work on the stock, futures, and crude oil markets shows a promising trend. Yoshihara et al.
[34] use 834,882 financial news articles published in the Nikkei newspaper over the ten years from 1999 to 2008 and are the first to use a deep learning model that combines an RNN with a restricted Boltzmann machine to process the text of news events and thereby predict stock price trends. Xiong et al. [35] use an LSTM neural network to model S&P 500 volatility, taking Google Domestic Trends data (from October 19, 2004 to July 24, 2015), which represent public sentiment and macroeconomic factors, as the starting point for studying how such factors influence S&P 500 volatility. Heaton et al. [36] analyze the S&P 500 index and the stock prices of 20 constituent companies and propose a hierarchical decision-making model for financial prediction and classification problems that can improve on the prediction performance of traditional financial applications. Singh and Srivastava [37] show that combining (2D)²PCA with a deep neural network (DNN) improves stock (chart) prediction on a Google data set; compared with traditional neural network methods, its accuracy is significantly higher. Their results show that the correlation coefficient between actual and predicted returns for the DNN is 17.1% higher than for the radial basis function neural network (RBFNN) and 43.4% higher than for the RNN. Deng et al. [38] use an RNN to process real-time financial signal features and build a trading strategy on this structure that aims to outperform experienced financial traders. Their model is inspired by deep learning and reinforcement learning: deep learning automatically learns and extracts features from dynamic market conditions, and reinforcement learning makes trading decisions in an unknown environment. The model performs better because it has both deep and recurrent structures. Tests show that it is robust in both the stock market and the commodity futures market.
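The three gates described earlier in this section (input, forget, and output) can be illustrated with one update step of a single-unit LSTM cell. This is a minimal sketch using the standard LSTM equations with scalar input and state; the parameter dictionary `params` and its keys are hypothetical names introduced here, and none of the surveyed papers is claimed to use this exact form.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    params maps each of "input", "forget", "output", "candidate" to a
    (w_x, w_h, b) triple: input weight, recurrent weight, bias.
    """
    def unit(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    i = unit("input", sigmoid)        # how much new information to let in
    f = unit("forget", sigmoid)       # how much of the old cell state to keep
    o = unit("output", sigmoid)       # how much of the state to expose
    g = unit("candidate", math.tanh)  # proposed new content

    c = f * c_prev + i * g            # updated cell state (the memory)
    h = o * math.tanh(c)              # output at the current time step
    return h, c


if __name__ == "__main__":
    # Hypothetical parameters, chosen only for illustration.
    params = {name: (0.5, 0.1, 0.0)
              for name in ("input", "forget", "output", "candidate")}
    h, c = 0.0, 0.0
    for x in (1.0, -0.5, 0.25):       # a made-up input sequence
        h, c = lstm_step(x, h, c, params)
        print(round(h, 4), round(c, 4))
```

The forget gate multiplies the previous cell state, which is why the cell can carry information across many time steps when that gate stays close to 1.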
The application of RNN models in financial markets has expanded considerably in research over the last two years, as reflected in a variety of market prediction methods. Karaoglu et al. [39] improve the RNN to make it better suited to time series data; their model can also detect excessive movements in noisy time series data streams. Sezer et al. [40] propose a neural-network-based stock price prediction and trading system that uses technical analysis indicators. Bao et al. [41] take six market indexes as examples, namely the Shanghai and Shenzhen 300 (CSI 300) index of the mainland Chinese A-share market, the Nifty 50 index of the Indian stock market, the Hang Seng index of the Hong Kong market, the Nikkei 225 index of Tokyo, and the S&P 500 and DJIA indexes of the New York stock exchange, and show that their deep learning model outperforms similar models in prediction accuracy and profitability. Yan and Ouyang [42] apply the LSTM to predicting the daily closing price of the Shanghai Composite Index (from January 4, 2012 to June 31, 2017). They show that the LSTM achieves better prediction accuracy and performs well in both static prediction and dynamic trend prediction of financial time series, which demonstrates its applicability and effectiveness for financial time series prediction. In addition, wavelet decomposition and reconstruction of the financial time series can improve the generalization ability of the LSTM prediction model and its accuracy in predicting long-term dynamic trends. Fischer and Krauss [43] apply LSTM networks to time series prediction for the S&P 500 index (from December 1989 to September 2015) and find that they outperform memory-free classification methods, namely random forest (RAF), deep neural network (DNN), and logistic regression classifier (LOG). Chen et al. [44] use 2409 price observations from the WTI crude oil market (from July 23, 2007 to February 24, 2017), propose a new hybrid crude oil price prediction model based on deep learning, and use it to analyze and simulate crude oil price trends. The proposed model is evaluated on the WTI crude oil price data, and the empirical results show that it meets expectations. To help researchers understand the state of research in recent years, Tables 3.2 and 3.3 list these studies for comparison.

Table 3.2 Related research on financial application of RNNs (1)

| Model | Research object | Market | Data | Results | Innovation | References |
|---|---|---|---|---|---|---|
| RNN, DBN | Stock price tendency prediction | Stock | 834,882 financial news articles of the Nikkei newspaper (from 1999 to 2008) | The error rate of the new DBN combination proposed in the paper is the lowest | The RNN is combined with a restricted Boltzmann machine to predict stock market trend | Yoshihara et al. 2014 |
| LSTM (RNN) | Predict stock index | Stock | Google Domestic Trends data (from October 19, 2004 to July 24, 2015) | Mean absolute percentage error (MAPE) is 24.2% | The influence of public sentiment and macroeconomic factors on the volatility of the S&P 500 is studied | Xiong et al. 2015 |
| RNN | Index and stock price tendency prediction | Stock | S&P 500 index and stock prices of 20 included companies | The predictability of traditional financial application is improved | A hierarchical decision-making model of financial prediction and classification is proposed | Heaton et al. 2016 |
| RNN, DNN | Stock price prediction | Stock | Multi-media data of Google stock price on NASDAQ | The correlation coefficient between the actual and predicted return of the DNN is 17.1% higher than that of the RBFNN and 43.4% higher than that of the RNN | Compared with traditional neural networks, the accuracy of (2D)²PCA + DNN on the Google data set is improved | Singh and Srivastava 2016 |
| RNN | Establish real-time financial trading system based on deep learning | Stock and futures | Stock-index futures contract IF (the first index-based futures contract in China) and silver (AG) and sugar (SU) contracts in the commodity market (2014.1–2015.9) | The model has good effect and robustness in both the stock market and the commodity futures market | A model constituted by deep learning and reinforcement learning is proposed | Deng et al. 2016 |
| RNN | Detect performance of model in periodic interval time series data | Stock | Fixed time interval data of the Istanbul security exchange | The model has good performance and succeeds in trading data | The RNN is improved to be more suitable for time series data, and excessive movement in noisy time series data flow is detected | Karaoglu et al. 2017 |

Table 3.3 Related research on financial application of RNNs (2)

| Model | Research object | Market | Data | Results | Innovation | References |
|---|---|---|---|---|---|---|
| RNN | Stock price prediction | Stock | Daily stock prices of all Dow 30 stocks during the period from 1997 to 2007 | Results comparable to a buy-and-hold strategy are achieved under most circumstances | A stock price prediction and trading system using technical analysis indexes based on neural network is proposed | Sezer et al. 2017 |
| CNN, RNN, RCNN | Index tendency prediction | Stock | 106,494 Reuters financial news articles (from October 20, 2006 to November 21, 2013) | CNN is good at capturing text semantics and RNN excels at capturing context information and complex temporal characteristics of the stock market | A new deep learning model, the RCNN model, is proposed and compared with traditional CNN/NN and other models on prediction accuracy | Vargas et al. 2017 |
| LSTM (RNN)/WT/SAEs | Stock price and index prediction | Stock | Shanghai and Shenzhen 300 index, Nifty 50 index, Hang Seng index, Nikkei 225 index, S&P 500 index and DJIA index | The proposed model is superior to other similar models in prediction accuracy and profitability | A new deep learning framework combining wavelet transformation, stacked autoencoders and LSTM is proposed to predict stock price | Bao et al. 2017 |
| LSTM (RNN) | Predict daily closing price of composite index of Shanghai Stock Exchange | Stock | Daily closing price of the Shanghai Composite Index (from January 4, 2012 to June 31, 2017) | LSTM has better effect on static and dynamic trend prediction of financial time series | A new time series prediction model is proposed to capture complex characteristics of financial time series | Yan and Ouyang 2017 |
| LSTM (RNN), DNN | Predict stock price change | Stock | S&P 500 index (from December 1989 to September 2015) | The performance of the LSTM network is superior to RAF, DNN and LOG | The LSTM network is applied to financial time series prediction and a portfolio strategy based on the LSTM model is designed | Fischer and Krauss 2017 |
| LSTM (RNN), DBN | Predict crude oil price | Crude oil | 2409 price data of the WTI crude oil market (from July 23, 2007 to February 24, 2017) | The model improves the prediction accuracy | A new mixed crude oil price prediction model based on deep learning is proposed | Chen et al. 2017 |

3.2.3 Deep Belief Networks
The deep belief network (DBN) was proposed by Geoffrey Hinton in 2006 [45]. It is a generative model: by training the weights among its neurons, the whole network can be made to generate the training data with maximum probability. A DBN can therefore be used not only to recognize features and classify data but also to generate data. Researchers use DBNs to interpret data in the financial industry, to help investors and analysts make clearer judgments, and to promote the development of financial investment and trading. In essence, the DBN is a fast learning algorithm built on a generative model, with an emphasis on fast processing and integration. DBNs also represent the forms and characteristics of data well in financial applications, which gives the integration of the two a natural entry point. As shown in Fig. 3.3, a DBN is built from restricted Boltzmann machines (RBMs). Under the neuron structure of the RBM, information spreads through the DBN in an orderly way, and the structure operates as a form of machine learning. This form matches the concept of deep learning naturally and reflects the technical advantages of the DBN [46].

Fig. 3.3 Schematic diagram of DBN model [47]

Hinton and Salakhutdinov [1] first propose the concept of a simple belief network and give a prototype model. Their research shows that DBN-based training of neuron weights can generate data with maximum probability to meet the needs of computation and practical application. In financial applications, research using DBNs focuses mainly on stock price prediction and enterprise finance prediction. Kuremoto et al. [48] propose a new neural network model for time series prediction with higher accuracy. Zhu et al. [49] build an automatic stock decision support system by combining a DBN with oscillation box theory; their results show that trading with the system outperforms a basic buy-and-hold strategy. Batres-Estrada [50] uses the S&P 500 index (from January 1, 1985 to December 31, 2006) and applies a new deep learning algorithm to the prediction of financial stock data. Lanbouri and Achchab [51] use financial data from 966 French companies and combine deep learning with a support vector machine in a financial distress prediction (FDP) model. Shen et al. [52] improve the typical DBN by constructing it from continuous restricted Boltzmann machines, which makes the model applicable to continuous data, and then use the model to predict the weekly GBP/USD, INR/USD, and BRL/USD exchange rates. Sharang and Rao [53] design an intermediate-frequency trading strategy using daily and biweekly average prices of 5-year and 10-year US Treasury futures. They use a DBN composed of stacked restricted Boltzmann machines to predict the weekly direction of the portfolio price and apply the prediction model in a trading strategy. The strategy is hedged: it buys one Treasury futures contract and sells another to reduce risk. After a period of trading, its profit is compared with the profit obtained using a random classifier. Excluding trading costs and taxes, the profit for a trade size of 10 units of the portfolio is about 90 thousand dollars. Zeng et al.
[54] propose an improved method, based on a DBN and a decision algorithm, for modeling and analyzing financial time series data; their training sample consists of the closing prices of all stocks on the Shanghai and Shenzhen stock markets over the 100 business days before October 20, 2012. Lu [55] studies a sample of 2913 listed A-share companies with financial statements covering 1993 to 2015 and builds a credit risk measurement system suited to the practice of commercial banks. He argues that the use of credit risk measurement models is becoming an irresistible trend, and he builds a financial crisis warning model based on the financial data of single-account enterprises. To help researchers understand the state of research in recent years, Tables 3.4 and 3.5 list these studies for comparison.
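Several of the studies above build their DBNs from stacked restricted Boltzmann machines. As a rough illustration of the building block, the sketch below computes the standard conditional probabilities of a binary RBM, p(h_j = 1 | v) = sigmoid(b_j + Σ_i v_i W_ij) and p(v_i = 1 | h) = sigmoid(a_i + Σ_j W_ij h_j), and performs one Gibbs sampling step. It is not the training procedure of any cited paper; the function names and the example weights are hypothetical, and training (for example by contrastive divergence) is deliberately left out.

```python
import math
import random


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(visible, weights, hidden_bias):
    """p(h_j = 1 | v) for each hidden unit; weights[i][j] links v_i to h_j."""
    return [
        sigmoid(hidden_bias[j] + sum(v * weights[i][j] for i, v in enumerate(visible)))
        for j in range(len(hidden_bias))
    ]


def visible_probabilities(hidden, weights, visible_bias):
    """p(v_i = 1 | h) for each visible unit."""
    return [
        sigmoid(visible_bias[i] + sum(h * weights[i][j] for j, h in enumerate(hidden)))
        for i in range(len(visible_bias))
    ]


def sample(probabilities, rng):
    """Draw one binary value per unit from its Bernoulli probability."""
    return [1 if rng.random() < p else 0 for p in probabilities]


def gibbs_step(visible, weights, visible_bias, hidden_bias, rng):
    """Sample the hidden layer from the data, then reconstruct the visible layer."""
    hidden = sample(hidden_probabilities(visible, weights, hidden_bias), rng)
    reconstruction = sample(visible_probabilities(hidden, weights, visible_bias), rng)
    return reconstruction, hidden


if __name__ == "__main__":
    rng = random.Random(0)                           # fixed seed for repeatability
    weights = [[0.8, -0.4], [0.2, 0.6], [-0.5, 0.3]]  # 3 visible x 2 hidden, made up
    print(hidden_probabilities([1, 0, 1], weights, [0.0, 0.0]))
    print(gibbs_step([1, 0, 1], weights, [0.0, 0.0, 0.0], [0.0, 0.0], rng))
```

In a DBN, the hidden activations of one trained RBM serve as the visible input of the next, which is what "stacked" refers to.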

Table 3.4 Application of DBN in financial markets (1)

| Model | Research object | Market | Data | Results | Innovation | References |
|---|---|---|---|---|---|---|
| DBN | Stock price trend prediction | Stock | Smoothed data | The combined error rate of the new DBN proposed in the paper is the lowest | A new neural network model of time series prediction with higher accuracy is proposed | Kuremoto et al. 2014 |
| DBN | Predict stock price | Stock | Historical trading data of 400 stocks in the S&P 500 | System trading constructed by the model is superior to a basic buy-and-hold strategy | An automatic stock decision support system is established by combining DBN with oscillation box theory | Zhu et al. 2014 |
| DBN | Predict stock price | Stock | S&P 500 index (from January 1, 1985 to December 31, 2006) | Results obtained by the deep neural network are better and more stable than the baseline | A new deep learning algorithm is adopted to predict financial stock data | Batres-Estrada 2015 |
| DBN | Financial distress prediction | Enterprise finance | Financial data of 966 French companies | The proposed model classifies 76.8% of examples correctly | Deep learning is combined with a support vector machine for the first time in a financial distress prediction (FDP) model | Lanbouri and Achchab 2015 |
| DBN | Exchange rate prediction | Exchange rate | Weekly data of the GBP/USD, INR/USD and BRL/USD exchange rates | Compared with traditional methods, FFNN is more applicable to the prediction of foreign exchange rates | An improved DBN is proposed to predict exchange rates | Shen et al. 2015 |

Table 3.5 Application of DBN in financial markets (2)

| Model | Research object | Market | Data | Results | Innovation | References |
|---|---|---|---|---|---|---|
| DBN | Weekly movement direction prediction of treasury portfolio | Bond | Daily average price and biweekly average price data of 5-year and 10-year US treasury futures | A trade size of 10 units of the portfolio makes a profit of about 90 thousand dollars | An intermediate-frequency trading strategy is designed using a DBN composed of stacked restricted Boltzmann machines to predict the weekly movement direction of the portfolio price | Sharang and Rao 2015 |
| LSTM (RNN), DBN | Predict crude oil price | Crude oil | 2409 price data of the WTI crude oil market (from July 23, 2007 to February 24, 2017) | The model improves the prediction accuracy | A new mixed crude oil price prediction model based on deep learning is proposed | Chen et al. 2017 |
| DBN | Financial time series method | Stock | Financial sequence data sample including the closing prices of all stocks in the Shanghai and Shenzhen stock markets in the 100 business days before October 20, 2012 | The accuracy of financial data samples selected by the DBN model can reach 90.54% in decision analysis of quantized financial time series data | An improved financial sequence data modeling and analysis method based on DBN and a decision algorithm is proposed | Zeng et al. 2017 |
| DBN | Financial distress prediction | Enterprise finance | Financial statements of 2913 listed A-share companies (from 1993 to 2015) | The predicted financial statements have smaller absolute error and higher reliability | A credit risk measurement system suited to commercial bank practice and a financial crisis warning model based on financial data of single-account enterprises are established | Lu 2017 |

3.2.4 Comparison of Deep Learning Models
As an important branch of modern artificial intelligence, deep learning reflects a major trend of the intelligent era that follows the deep integration and development of information technology. It has been applied to speech recognition, face recognition, and other specific everyday scenarios, with good practical results. Deep learning and its derivative methods still have many shortcomings, but they have already shown their importance for production, daily life, and even human ways of thinking. The deep learning models discussed above each have their own advantages, disadvantages, and application differences. A brief comparative analysis is given in Table 3.6.

Table 3.6 Comparison of deep learning models

| Model types | Property/characteristics | Advantages | Disadvantages | Application scope | Algorithmic characteristics |
|---|---|---|---|---|---|
| CNN | Deep feedforward artificial neural network | Excellent recognition ability; processes large images and speech recognition | Application scope is relatively narrow | Recognition of large images | The convolutional network is trained on known patterns so that it acquires a mapping ability from input to output |
| RNN | General term for temporal RNN and structural RNN | Large parallel processing ability; covers the two dimensions of time and structure | The algorithm is very complex and the calculation is very complicated | Language modeling and text generation, machine translation, speech recognition | Error backpropagation algorithm |
| DBN | Generative model of neural network | Generates training data with maximum probability | The amount of calculation is large and the cost is higher | Recognizing data characteristics; classifying, integrating and generating data | Training algorithm of neuron weights |

Table 3.6 shows the properties, advantages, disadvantages, application scope, and algorithmic characteristics of each deep learning model. The three models differ in application scope and preference, and the comparison shows that they have not developed at the same pace within artificial intelligence; a certain gap exists among them, which reflects the current state of deep learning and artificial intelligence. Taking the CNN as an example, it performs excellently in processing large images and can also be used in speech recognition, visual analysis, and other fields. It has strong adaptive and mapping ability and can play a role in intelligent robots, analysis and prediction, and so on [56]. The financial industry is built on massive flows of data, information, and assets, which matches the core functions and application characteristics of the three models. Each of the three models fits financial scenarios in its own subfield and plays a specific role. In the stock market, the RNN is the dominant model, followed by the DBN, and current research focuses mostly on index and stock price prediction using historical data or text. In the futures market, the RNN and CNN dominate, and the research focus is the prediction of future prices.
In addition, DBN, the earliest deep learning model, is mainly applied to enterprise finance and stock price prediction; it usually uses historical financial statement data of enterprises. Details are shown in Table 3.7. In general, the above models overlap in the scenarios and fields they serve, but each has its own advantages and areas where it is applied most intensively. A better command of the technical characteristics and application value of these models will greatly benefit the development of the financial industry, especially the prosperity of financial markets, and is in line with the general trend in the application of artificial intelligence.

Table 3.7 Comparison of applications of deep learning in finance

| Market | Model | Data | Research focus |
| Stock market | Dominated by RNN, followed by DBN model | Mainly index or daily closing price of individual stocks; news or report text information | Using historical data or text for index and stock price prediction |
| Futures market | Dominated by RNN and CNN | Mainly historical daily price data of futures | Using historical data to predict future prices |
| Enterprise finance | Dominated by DBN model | Historical financial statement data of enterprises | Using enterprise statement data for future financial prediction |
| Others | Mainly CNN, RNN and DBN | Historical data of all markets (hourly and weekly data) | Using historical data of all markets for future price prediction |

3.3 Application of Deep Reinforcement Learning in Financial Industries

The integration of finance and deep learning has been extremely successful, a point on which many scholars, experts and entrepreneurs in China and abroad agree. Deep reinforcement learning has also been evolving and improving in recent years. With the global economic recovery after the 2008 financial crisis and the growing interest of China and other emerging market countries, artificial intelligence, deep learning and the financial industry face a great development opportunity. Many countries, industries and enterprises are accelerating their investment in artificial intelligence, and finance is one of the key areas [57]. For example, Google, Microsoft, IBM, Facebook and the BAT group of enterprises in China (Baidu, Alibaba and Tencent) treat artificial intelligence based on deep learning as a major field of development and have made remarkable progress in finance. An OCR system based on deep learning shortened certificate checking time from 1 day to 1 s and raised the pass rate by 30% at the same time [58]. Machine learning algorithms and big data are changing all industries, including finance and securities investment management, and deep reinforcement learning (DRL) has gradually entered daily life. DRL is an artificial intelligence method that is closer to human thinking, combining the perception ability of deep learning with the decision-making ability of reinforcement learning. We discuss how deep reinforcement learning is applied in the financial field in four aspects.

3.3.1 Credit Scoring

In the financial field, expert-based credit risk models still dominate, but as technology continues to develop, traditional credit rating methods are gradually failing to meet the changing needs of the industry. Introducing new methods has become the focus of more scholars, and the prerequisite is the ability to establish meaningful benchmarks and comparisons between machine learning methods and the models of human experts. Munkhdalai et al. [59] compared many machine learning models with the traditional FICO scoring model and found that the XGBoost model performed well; the flaw is that it cannot be used to evaluate the merits of the scoring model. Mancisidor et al. [60] developed two new semi-supervised Bayesian models for reject inference using deep generative models, opening up a new approach to big data processing. In contrast to traditional single-stage models, Bastani et al. [61] proposed a new two-stage scoring method that combines wide and deep learning with the instance hardness threshold (IHT) to better identify loans that will default. In recent years, peer-to-peer (P2P) lending platforms, which differ from traditional bank lending, have developed rapidly, but defaults occur from time to time. Wang et al. [62] proposed a consumer credit scoring method that combines an attention mechanism with an LSTM model, an innovative application of deep learning. Ha et al. [63] introduced an effective method based on feature selection combined with restricted Boltzmann machines (RBM), which fully exploits the advantage of the RBM in determining nonlinear transformations of the feature space in unsupervised learning.

3.3.2 Stock Prediction

Mer et al. [64] proposed a new method that integrates knowledge graph embedding with stock market forecasting and obtained good predictions for a well-known company. However, research on knowledge graphs for feature sets is still in its infancy and carries great uncertainty. Eapen et al. [65] constructed a stock price index prediction model based on multiple pipelines of convolutional neural networks and bidirectional long short-term memory (LSTM) units. Also using a CNN model, Hoseinzade and Haratizadeh [66] developed the CNNpred framework, which extracts higher-level features from a rich initial variable set for prediction. In addition, Lv et al. [67] applied 12 machine learning algorithms from an industry perspective to show that the best predictive models can be constructed for most industries.

Volatility is related to financial risk. Research on stock prices often requires volatility prediction, for which the support vector machine (SVM) is a very popular method. However, Liu [68] found that an LSTM RNN model is superior to SVM. It can learn from a large amount of raw data and can run with many hidden layers and neurons on a GPU. Building on LSTM, Moon et al. [69] proposed using a mixed momentum as the target variable, which further improved prediction accuracy. Nikou et al. [70] compared the popular machine learning algorithms above and pointed out that the support vector regression method performed best. In addition, new algorithms continue to appear, such as the multi-filters neural network (MFNN) [71] and the order-encoding method [72].

3.3.3 Market Trading

Jeonga and Kim [73] designed an automated system that determines the number of shares to trade by adding a deep neural network (DNN) regressor to a deep Q-network, combining reinforcement learning with a DNN to identify which action strategies are conducive to profit in a chaotic market. Zbikowski et al. [74] proposed a system for identifying credit product customers based on random forest classification and deep neural networks, which extracts important patterns from customers’ historical transfer and transaction data and predicts the likelihood of a credit purchase. Zarkias et al. [75] addressed the turbulence and instability of financial markets and proposed a new price trailing method based on Q-learning models. Li et al. [76] extended the value-based deep Q-network so that models can make trading decisions and earn profits autonomously in dynamic financial markets. Ma and Liu [77] used deep reinforcement learning to implement a limit order strategy and also concluded that DRL is conducive to trading in the cryptocurrency market.

In addition, market transactions inevitably involve fraud committed to maximize profits, such as money laundering, bank fraud and credit card fraud. As deep learning is applied more widely, it offers more possibilities for identifying fraud. Lebichot et al. [78] and Singh et al. [79] both dealt with deep transfer learning approaches for credit card fraud detection, while Weber et al. [80] focused on money laundering, framing the prediction of illicit transactions as a binary classification task and comparing variants of logistic regression (LR), random forest (RF), multilayer perceptron (MLP) and graph convolutional network (GCN) models.
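As a rough illustration of the value-based reinforcement learning idea behind the Q-learning and deep Q-network studies above, the following minimal sketch trains a tabular Q-learning agent on simulated returns. It is not the method of any cited paper: the two-state, two-action setup, the simulated momentum series, the names `simulate_returns` and `q_learning`, and all parameter values are assumptions made for illustration, and a lookup table stands in for the deep network.

```python
import numpy as np


def simulate_returns(n_steps, persistence=0.9, step=0.01, seed=0):
    """Hypothetical toy market: each return keeps the sign of the previous
    one with probability `persistence` (an assumed momentum effect)."""
    rng = np.random.default_rng(seed)
    signs = np.empty(n_steps, dtype=int)
    signs[0] = 1
    for t in range(1, n_steps):
        stay = rng.random() < persistence
        signs[t] = signs[t - 1] if stay else -signs[t - 1]
    return step * signs


def q_learning(returns, alpha=0.02, gamma=0.5, epsilon=0.2, seed=1):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    State:  0 if the last return was negative, 1 if it was positive.
    Action: 0 = stay flat, 1 = hold one unit long.
    Reward: position times the next return (no transaction costs assumed).
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((2, 2))  # rows index states, columns index actions
    for t in range(len(returns) - 1):
        state = int(returns[t] > 0)
        if rng.random() < epsilon:
            action = int(rng.integers(2))      # explore
        else:
            action = int(np.argmax(q[state]))  # exploit
        reward = action * returns[t + 1]
        next_state = int(returns[t + 1] > 0)
        # Standard Q-learning temporal-difference update
        target = reward + gamma * q[next_state].max()
        q[state, action] += alpha * (target - q[state, action])
    return q


returns = simulate_returns(20_000)
q_table = q_learning(returns)
policy = q_table.argmax(axis=1)  # greedy action for each state
print(q_table)
print(policy)
```

In this toy setting the greedy policy read from `q_table` is expected to stay flat after a down move and go long after an up move, because the simulated returns were built with momentum; real market data offer no such guarantee. A deep Q-network replaces the table with a neural network so that richer, continuous market states can be handled.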

3.3.4 Portfolio Management

The portfolio is an integral part of the financial market and the primary means of hedging risk and conducting arbitrage. From portfolio theory and the capital asset pricing model (CAPM) to the later arbitrage pricing theory (APT) model, scholars have kept enriching and supplementing portfolio theory as technology develops. Tsang and Wong [81] proposed a deep neural network (DNN) architecture and numerical algorithm for multi-period portfolio optimization and demonstrated the accuracy and convergence of the algorithm. Emerson et al. [82] introduced the application of MLP, SVM and SLM models in quantitative finance, especially risk modeling and portfolio construction. Zhang and Zhou [83] used feedforward neural networks to solve the problem of portfolio selection with proportional transaction costs and found the best buy and sell boundaries. Hu and Lin [84] suggested using a deep RNN (GRU) for deep learning and policy gradients for reinforcement learning to find the best portfolio and better optimize financial management. Huck [85] developed several independent portfolio arbitrage strategies based on Random Forest, Deep Belief Network and Elastic Net Regression models. Jiang and Liang [86] argue that decisions can be made through “trial and error” learning and feedback on the agent’s behavior, which makes machine learning more flexible in portfolio allocation. Buehler et al. [87] proposed a new framework for hedging portfolios of derivatives in the presence of market frictions using modern deep reinforcement learning methods. Zhang and Huang [88] modified the method of Buehler et al. [87] and put forward an optimal hedging strategy under market frictions based on LSTM-RNN, obtaining better results.

3.4 Conclusions

With the development of the times and social progress, modern artificial intelligence has changed the way people think and live to a great extent. It makes people’s lives more colorful, but it also brings many uncertainties. As an important branch of modern artificial intelligence, deep learning reflects the linearity, diversity, multiple branches and adaptive ability of intelligent learning activities in the post-modern era. New technologies based on deep learning have emerged rapidly in recent years and appear repeatedly in everyday work and life, which promotes social progress to some extent. For example, intelligent face recognition technology, which attracted attention in 2017, is not only applied in the iPhone X and other smartphones but has also been adopted in metro ticket checks, civil aviation security checks and other scenarios, where it has greatly improved efficiency. In the financial field and financial markets, the value and technical advantages of deep learning are even more obvious. Financial markets consist of massive data, information and assets, and deep learning technologies have natural advantages in computing and processing them.

One of the most important features of deep neural networks is the large number of adjustable free parameters, which makes the model highly flexible. On the other hand, deep learning lacks strong theoretical guidance and support, and in most cases still depends on experience. With such a complex model it is easy to obtain ideal fitting results on a specific data set, but often difficult to guarantee generalization performance. To prevent overfitting, the scale of the data and the structure of the network must be considered so that the deep neural network performs better on practical problems.

This chapter focuses on three deep learning models, namely CNN, RNN and DBN, which have specific application value in modern finance. Relying on these models and the related technical methods of deep learning, efficiency and quality can be greatly improved in financial exploration, image recognition, intelligent customer service, financial time series analysis, financial opinion analysis, intelligent financial advising (intelligent financial management) and other activities in modern financial markets. Traditional time series analysis is the mainstream method of quantitative financial analysis, and deep learning can now be applied to this field, including the prediction of the volatility of stock prices or indices. Research models in recent years include many kinds of neural networks, such as DBN and LSTM, which perform well. In addition, image recognition, natural language processing, intelligent customer service and other branches have shown bright prospects in financial practice and exceed what manual work can achieve. In the future, relying on the rapid development of deep learning and artificial intelligence, the financial industry and financial markets will see a new technical revolution, which will trigger a new wave of research and promote substantial new development of the whole industry.

Acknowledgements The authors are grateful for support from the National Natural Science Foundation of China (# 71801008) and Beihang University (# KG12113201).

References

1. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(504) (2006). https://doi.org/10.1126/science.1127647
2. Kolm, P.N., Ritter, G.: Modern perspectives on reinforcement learning in finance. J. Mach. Learn. Financ. (2019)
3. Huang, J., Chai, J., Cho, S.: Deep learning in finance and banking: a literature review and classification. Front. Bus. Res. China 14, 1–24 (2020)
4. Emerson, S., Kennedy, R., O’Shea, L., O’Brien, J.: Trends and applications of machine learning in quantitative finance. In: 8th International Conference on Economics and Finance Research (2019)
5. Lecun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
6. Zhang, B., Ni, J., Lou, Y., Chen, H., Zhang, S.: Research on speech emotion recognition based on deep belief network. In: Proceedings of the 13th National Man-Machine Speech Communication Conference (NCMMSC2015) (2015)
7. Tu, Y.H., Du, J., Lee, C.H.: Speech enhancement based on teacher–student deep learning using improved speech presence probability for noise-robust speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 27(12), 2080–2091 (2019)
8. Wen, M., Vasthimal, D.K., Lu, A., Wang, T., Guo, A.: Building large-scale deep learning system for entity recognition in E-Commerce search. In: Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, pp. 149–154 (2019)
9. Maeda, I., deGraw, D., Kitano, M., Matsushima, H., Sakaji, H., Izumi, K., Kato, A.: Deep reinforcement learning in agent based financial market simulation. J. Risk Financ. Manage. 13(4), 71 (2020)
10. Gao, W., Su, C.: Analysis on block chain financial transaction under artificial neural network of deep learning. J. Comput. Appl. Math. (2020)
11. Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708 (2014)
12. Ubbens, J., Cieslak, M., Prusinkiewicz, P., Stavness, I.: The use of plant models in deep learning: an application to leaf counting in rosette plants. Plant Methods 14(1), 6 (2018)
13. Moen, E., Bannon, D., Kudo, T., Graf, W., Covert, M., Van Valen, D.: Deep learning for cellular image analysis. Nat. Methods 1–14 (2019)
14. Li, X., Zhang, W., Ding, Q.: Deep learning-based remaining useful life estimation of bearings using multi-scale feature extraction. Reliab. Eng. Syst. Safe. 182, 208–218 (2019)
15. Feng, R., Badgeley, M., Mocco, J., Oermann, E.K.: Deep learning guided stroke management: a review of clinical applications. J. Neurointerventional Surg. neurintsurg-2017-013355 (2017)
16. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Driessche, G.V.D., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
17. Lecun, Y., Jackel, L.D., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., et al.: Learning algorithms for classification: a comparison on handwritten digit recognition. Neural Networks the Statistical Mechanics Perspective, pp. 261–276 (1995)
18. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. (2015)
19. Albarqouni, S., Baur, C., Achilles, F., Belagiannis, V., Demirci, S., Navab, N.: Aggnet: deep learning from crowds for mitosis detection in breast cancer histology images. IEEE Trans. Med. Imaging 35(5), 1313–1321 (2016)
20. Sarıgül, M., Ozyildirim, B.M., Avci, M.: Differential convolutional neural network. Neural Netw.
21. Wieland, M., Li, Y., Martinis, S.: Multi-sensor cloud and cloud shadow segmentation with a convolutional neural network. Remote Sens. Environ. 230, 111203 (2019)
22. Cao, J., Wang, J.: Stock price forecasting model based on modified convolution neural network and financial time series analysis. Int. J. Commun. Syst. 32(12), e3987 (2019)
23. Chen, Y., Fan, R., Wang, J., Wu, Z., Sun, R., Geomatics, S.O., et al.: High resolution image classification method combining with minimum noise fraction rotation and convolution neural network. Laser Optoelectron. Prog. 54(10), 102801 (2017)
24. Li, Y.: Research on key technologies of computer vision based on convolutional neural network. University of Electronic Science and Technology (2017)
25. Chen, T.: Stereo Matching Technology Based on Convolutional Neural Network. Zhejiang University (2017)
26. Zhou, D.X.: Universality of deep convolutional neural networks. Appl. Comput. Harmon. Anal. 48(2), 787–794 (2020)
27. Ding, X., Zhang, Y., Liu, T., Duan, J.: Deep learning for event-driven stock prediction, pp. 2327–2333 (2015)
28. Chen, J.F., Chen, W.L., Huang, C.P., Huang, S.H., Chen, A.P.: Financial time-series data analysis using deep convolutional neural networks, pp. 87–92 (2016)
29. Vargas, M.R., Lima, B.S.L.P.D., Evsukoff, A.G.: Deep learning for stock market prediction from financial news articles. In: IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications, pp. 60–65 (2017)
30. Korczak, J., Hernes, M.: Deep learning for financial time series forecasting in A-Trader system. In: Federated Conference on Computer Science and Information Systems, pp. 905–912 (2017)
31. Sohangir, S., Wang, D., Pomeranets, A., Khoshgoftaar, T.M.: Big data: deep learning for financial sentiment analysis. J. Big Data 5(1), 3 (2018)

32. Graves, A.: Supervised sequence labelling with recurrent neural networks. Studies in Computational Intelligence, vol. 385 (2008)
33. Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 38, pp. 6645–6649 (2013)
34. Yoshihara, A., Fujikawa, K., Seki, K., Uehara, K.: Predicting Stock Market Trends by Recurrent Deep Neural Networks, pp. 759–769. Springer International Publishing (2014)
35. Xiong, D., Zhang, M., Wang, X.: Topic-based coherence modeling for statistical machine translation. IEEE/ACM Trans. Audio Speech Lang. Process. 23(3), 483–493 (2015)
36. Heaton, J.B., Polson, N.G., Witte, J.H.: Deep Learning in Finance (2016)
37. Singh, R., Srivastava, S.: Stock prediction using deep learning. Multimedia Tools Appl. 1–16 (2016)
38. Deng, Y., Bao, F., Kong, Y., Ren, Z., Dai, Q.: Deep direct reinforcement learning for financial signal representation and trading. IEEE Trans. Neural Netw. Learn. Syst. 28(3), 653–664 (2017)
39. Karaoglu, S., Arpaci, U., Ayvaz, S.: A deep learning approach for optimization of systematic signal detection in financial trading systems with big data. SpecialIssue (SpecialIssue), 31–36 (2017)
40. Sezer, O.B., Ozbayoglu, A.M., Dogdu, E.: An artificial neural network-based stock trading system using technical analysis and big data framework. In: Southeast Conference, pp. 223–226 (2017)
41. Bao, W., Yue, J., Rao, Y.: A deep learning framework for financial time series using stacked autoencoders and long-short term memory. Plos One 12(7), e0180944 (2017)
42. Yan, H., Ouyang, H.: Financial time series prediction based on deep learning. Wirel. Pers. Commun. 1–18 (2017)
43. Fischer, T., Krauss, C.: Deep learning with long short-term memory networks for financial market predictions. FAU Discussion Papers in Economics (2017)
44. Chen, Y., He, K., Tso, G.K.F.: Forecasting crude oil prices: a deep learning based model. Procedia Comput. Sci. 122, 300–307 (2017)
45. Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
46. Salakhutdinov, R., Mnih, A., Hinton, G.: Restricted Boltzmann machines for collaborative filtering. In: International Conference on Machine Learning, vol. 227, pp. 791–798 (2007)
47. Salakhutdinov, R.: Learning deep generative models. Annu. Rev. Stat. Appl. 2(1), 361–385 (2009)
48. Kuremoto, T., Kimura, S., Kobayashi, K., Obayashi, M.: Time series forecasting using a deep belief network with restricted Boltzmann machines. Neurocomputing 137(15), 47–56 (2014)
49. Zhu, C., Yin, J., Li, Q.: A Stock Decision Support System Based on ELM, vol. 10(2), pp. 67–79. Springer International Publishing (2014)
50. Batres-estrada, G.: Deep Learning for Multivariate Financial Time Series (2015)
51. Lanbouri, Z., Achchab, S.: A hybrid deep belief network approach for financial distress prediction. In: International Conference on Intelligent Systems: Theories and Applications, vol. 131(2), pp. 1–6 (2015)
52. Shen, F., Chao, J., Zhao, J.: Forecasting exchange rate using deep belief networks and conjugate gradient method. Neurocomputing 167(C), 243–253 (2015)
53. Sharang, A., Rao, C.: Using machine learning for medium frequency derivative portfolio trading. Papers (2015)
54. Zeng, Z., Xiao, H., Zhang, X.: Modeling and decision of financial time series data based on DBN. Comput. Technol. Dev. 27(4), 1–5 (2017)
55. Lu, M.: Empirical Research on Commercial Bank Credit Risk Prediction Based on Deep Credit Network. Taiyuan University of Technology (2017)
56. Russell-Smith, J., Monagle, C., Jacobsohn, M., Beatty, R.L., Bilbao, B., Millán, A., et al.: Can savanna burning projects deliver measurable greenhouse emissions reductions and sustainable livelihood opportunities in fire-prone settings? Climatic Change 140(1), 47–61 (2017)

57. Katragkou, E., García-Díez, M., Vautard, R., Sobolowski, S., Zanis, P., Alexandri, G., et al.: Regional climate hindcast simulations within EURO-CORDEX: evaluation of a WRF multiphysics ensemble. Geoscientific Model Dev. 8(3), 603–618 (2015)
58. Soshinskaya, M., Crijns-Graus, W.H.J., Guerrero, J.M., Vasquez, J.C.: Microgrids: experiences, barriers and success factors. Renew. Sustain. Energy Rev. 40, 659–672 (2014)
59. Munkhdalai, L., Munkhdalai, T., Namsrai, O.-E., Lee, J.Y., Ryu, K.H.: An empirical comparison of machine-learning methods on bank client credit assessments. Sustainability 11(3) (2019)
60. Mancisidor, R.A., Kampffmeyer, M., Aas, K., Jenssen, R.: Deep generative models for reject inference in credit scoring. Quantitative Finance (2019)
61. Bastani, K., Asgari, E., Namavari, H.: Wide and deep learning for peer-to-peer lending. Expert Syst. Appl. 134, 209–224 (2019)
62. Wang, C., Han, D., Liu, Q., Luo, S.: A deep learning approach for credit scoring of peer-to-peer lending using attention mechanism LSTM. IEEE Access 7, 2161–2168 (2018)
63. Ha, V., Lu, D., Choi, G.S., Nguyen, H., Yoon, B.: Improving credit risk prediction in online peer-to-peer (p2p) lending using feature selection with deep learning. In: 2019 21st International Conference on Advanced Communication Technology (2019)
64. Mer, J.O., Liu, Y., Zeng, Q., Yang, H.: Anticipating stock market of the renowned companies: a knowledge graph approach. Complexity (2019)
65. Eapen, J., Verma, A., Bein, D.: Novel deep learning model with CNN and bi-directional LSTM for improved stock market index prediction. In: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (2019)
66. Hoseinzade, E., Haratizadeh, S.: CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst. Appl. 129, 273–285 (2019)
67. Lv, D., Huang, Z., Li, M., Xiang, Y.: Selection of the optimal trading model for stock investment in different industries. Plos One (2019)
68. Liu, Y.: Novel volatility forecasting using deep learning-long short term memory recurrent neural networks. Expert Syst. Appl. 132, 99–109 (2019)
69. Moon, K.S., Kim, H.: Performance of deep learning in prediction of stock market volatility. Econ. Comput. Econ. Cybern. Stud. Res. 53, 77–92 (2019)
70. Nikou, M., Mansourfar, G., Bagherzadeh, J.: Stock price prediction using DEEP learning algorithm and its comparison with machine learning algorithms. Intell. Syst. Account Finance Manag. 26(4), 164–174 (2019)
71. Long, W., Lu, Z., Cui, L.: Deep learning-based feature engineering for stock price movement prediction. Knowl. Based Syst. 164, 163–173 (2019)
72. Tashiro, D., Matsushima, H., Izumi, K., Sakaji, H.: Encoding of high-frequency order information and prediction of short-term stock price by deep learning. Quant. Financ. 19(9), 1499–1506 (2019)
73. Jeonga, G., Kim, H.Y.: Improving financial trading decisions using deep Q-learning: predicting the number of shares, action strategies, and transfer learning. Expert Syst. Appl. 117, 125–138 (2019)
74. Zbikowski, K., Adyzynski, P., Gawrysiak, P.: Direct marketing campaigns in retail banking with the use of deep learning and random forests. Expert Syst. Appl. 134, 28–35 (2019)
75. Zarkias, K.S., Passalis, N., Tsantekidi, A., Tefas, A.: Deep reinforcement learning for financial trading using price trailing. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (2019)
76. Li, Y., Zheng, W.S., Zheng, Z.B.: Deep robust reinforcement learning for practical algorithmic trading. IEEE Access 7, 108014–108022 (2019)
77. Ma, L.X., Liu, Y.: Application of a deep reinforcement learning method in financial market trading. In: 2019 11th International Conference on Measuring Technology and Mechatronics Automation (2019)
78. Lebichot, B., Le Borgne, Y.A., He-Guelton, L., Oblé, F., Bontempi, G.: Deep-learning domain adaptation techniques for credit cards fraud detection. In: INNS Big Data and Deep Learning Conference, Springer, Cham, pp. 78–88 (2019)

79. Singh, A., Jain, A.: An Empirical study of AML approach for credit card fraud detection–financial transactions. Int. J. Comput. Commun. Control 14(6), 670–690 (2020)
80. Weber, M., Domeniconi, G., Chen, J., Weidele, D.K.I., Bellei, C., Robinson, T., Leiserson, C.E.: Anti-money laundering in bitcoin: experimenting with graph convolutional networks for financial forensics (2019)
81. Tsang, K.H., Wong, H.Y.: Deep-Learning Solution to Portfolio Selection with Serially-Dependent Returns. SSRN (2019)
82. Emerson, S., Kennedy, R., O’Shea, L., O’Brien, J.: Trends and Applications of Machine Learning in Quantitative Finance. SSRN (2019)
83. Zhang, W.W., Zhou, C.: Deep learning algorithm to solve portfolio management with proportional transaction cost. In: 2019 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (2019)
84. Hu, Y.J., Lin, S.J.: Deep reinforcement learning for optimizing finance portfolio management. In: 2019 Amity International Conference on Artificial Intelligence (2019)
85. Huck, N.: Large data sets, machine learning: applications to statistical arbitrage. Eur. J. Oper. Res. 278, 330–342 (2019)
86. Jiang, Z.Y., Liang, J.J.: Modern Perspective on Reinforcement Learning in Finance. SSRN (2019)
87. Buehler, H., Gonon, L., Teichmann, J., Wood, B.: Deep hedging. Quant. Financ. 19, 1271–1291 (2019)
88. Long, W., Lu, Z., Zhang, J., Huang, W.: Option hedging using Lstm-Rnn: an empirical analysis. Available at SSRN (2019)

Chapter 4

Information Transition in Trading and Its Effect on Market Efficiency: An Entropy Approach

Anqi Liu, Jing Chen, Steve Y. Yang, and Alan G. Hawkes

Abstract The Efficient Market Hypothesis (EMH) has been well explored in terms of daily responses to market movements and financial reports. However, there is a lack of evidence about information efficiency since intraday trading became popular. We investigate the time series properties of the information used in the intraday market, in particular its causality effects. We use 30-min market price data and news data to represent past market data and public information respectively, so that our analysis is in line with the EMH framework. Traders’ responses to such information are associated with the financial crisis: there was strong overreaction to market data right before the 2008 crisis, and traders tended to rely more on news data during the crisis. We confirm that, in terms of intraday information efficiency, it is worthwhile to adopt both types of information. Furthermore, there is still room to improve the price discovery process so that it reveals such information more effectively.

A. Liu (corresponding author) · J. Chen
School of Mathematics, Cardiff University, Senghennydd Road, Cardiff CF24 4AG, UK

S. Y. Yang
School of Business, Stevens Institute of Technology, 1 Castle Point on Hudson, Hoboken, NJ 07030, USA

A. G. Hawkes
School of Management, Swansea University, Bay Campus, Fabian Way, Swansea SA1 8EN, UK

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_4

4.1 Introduction

Over the past 20 years, financial markets have entered the electronic age. Trading relies heavily on advanced computing techniques such as big data and text mining. In particular, the increasing popularity of intraday trading requires quick reactions to business and market information. In this chapter, we explore the information that is widely used in intraday trading, in particular its causality relationships. These relationships indicate the channels of information transition that shape market movements and, in turn, affect market efficiency.

The motivation of this study hinges on the Efficient Market Hypothesis (EMH) [6]. It involves three forms of information efficiency that reflect investors’ responses to past market data, public information (i.e. companies’ financial reports), and non-public information. It is widely accepted that most developed markets achieve semi-strong form efficiency [2, 7, 8]. However, most studies that examined this argument date from the 1980s and focus only on daily market responses. Such a data frequency is too low to fully represent current market conditions. The increasing volume of high-frequency trading has sharply reduced the time markets take to respond to information [4]. Moreover, the development of information technology has led to an “information revolution” in the investment industry. Traders tend to explore broader categories of information to seek profits in fast-moving electronic markets. Apart from the traditional fundamental analysis of financial statements, investors also attempt to “decode” textual information into trading signals [3, 14]. For example, traders in hedge funds track breaking news to respond to shocks quickly. Business news vendors, such as Bloomberg and Thomson Reuters, have started providing news analysis solutions that allow investors to develop new strategies. Business news has thus replaced fundamental information as the “public information” of intraday markets. In this study, we fill a gap in the literature by broadening the limited types of information considered in studies of price discovery and market efficiency.
We investigate the two types of information that mainly drive fast reactions in modern financial markets: (1) the “instant” market price changes, representing past market data; and (2) business news, representing public information. Our objective is to analyze the dynamics of information transition in order to bring new insights into information efficiency in financial markets. Intuitively, the two types of information should be causally related to each other. Price changes caused by news can be explained by traders’ responses to business news. Conversely, the conversion of market information into news means that the public is informed about market conditions through the news media in a timely manner; this effect is expected to be more significant when there are shocks to the markets. Furthermore, both types of information should also show a “self-causality effect”. The “habit” of publishing follow-up stories in the news industry naturally results in a feedback pattern in news information. The rationale for self-causality in price information is that technical trading relies heavily on past market data. Like the causality from news to market, this reflects traders’ responses to market information. We believe that, for the market as a whole, there is an optimal level of reaction to each type of information such that the rest of the market cannot discover a profitable strategy using the same information. In other words, the causality from news to market and from market to market should stabilize at values that indicate sufficient responses; too high and too low causalities
mean overreaction and underreaction, respectively. Our results on the causality relationships and their regressions against market efficiency confirm this point.

In traditional financial studies, a causality relationship is usually validated by the Granger causality test. However, this approach has a few limitations that do not fit our context. First, the Granger test is based on linear regression. Price movements caused by diversified trading strategies can hardly be reduced to a linear model. For instance, technical analysis involves both momentum trading and reversal trading, so that a price increase (or decrease) may be driven by past price movements in either direction. This is a non-linear causal relationship that cannot be identified by linear regression. Second, the Granger test is built on the assumption of Gaussian distributed variables. Its validity for non-Gaussian financial data is questionable. Third, the Granger test is designed for cross-sectional causality analysis and is not appropriate for self-causality.

Therefore, in this study, we use entropy measures to identify the causality relationships. This approach has been adopted in a few financial market studies, in particular to handle non-linear causality relationships [5, 10]. Besides overcoming the limitations of the Granger test mentioned above, the rationale for using entropy as an alternative approach is two-fold. From the perspective of multivariate stochastic modeling, causality can be interpreted as a problem of conditional distributions: the entropy measures indicate whether the distribution of one variable is conditional on the process itself or on other time series. Also, as a well-developed approach in information theory, entropy provides valuable insights into quantifying information in financial markets.
In this study, we adopt two entropy measures, the conditional block entropy and the transfer entropy, to tackle the self-causality and the cross-sectional causality of the two types of information, respectively. We use the S&P 500 index to represent market information and the Thomson Reuters news analytics data to reflect news information. We track the changes in causality relationships using a 3-year rolling window and find that all these causalities are closely associated with market conditions. As mentioned above, we highlight the causalities from market to market and from news to market because they reflect the predictability of price movements through different types of information. We observe a sharp increase in market self-causality before the 2008 crisis, which is due to price-chasing activity; it also indicates an overuse of market information. At that time, news had not yet been widely adopted in trading. It was not until the financial crisis that news information started to be involved in trading strategies and partially replaced the impact of market information. We find that both causalities stabilized after the market recovered from the double crisis. This confirms our view that the market eventually learns to use different information efficiently.

Another contribution of this study is to examine how information entropy affects market efficiency. We find that, apart from the causality from market to news, all entropy measures have significant linear relationships with market efficiency. The results indicate that both technical analysis and news-based trading are closely associated with market efficiency. Based on the regression coefficients and the observation of calmer responses to information, we think it is worthwhile to adopt these two types of information in the price discovery process.


4.2 Information Entropy

The concept of entropy was introduced by Claude E. Shannon in 1948 to quantify how much information is contained in a signal process. Through decades of research, several entropy measures have been proposed to analyze more complex information systems. This approach has also been applied to examine lagged impacts among financial time series. For example, [5] used the transfer entropy between the VIX and the iTraxx Europe index to examine the relative power of market risk and credit risk. We also show the equivalence of transfer entropy and the Granger test for Gaussian distributed variables.

4.2.1 Entropy Measures

In information theory, entropy measures the uncertainty of a process. From the trading perspective, higher uncertainty means less predictability. We therefore use entropy to analyze whether new information helps traders to “predict” price changes. Three entropy measures are applied in this study. The first, Shannon entropy, is defined on a probability event space $X$ (see Eq. 4.1). If the event space $X$ is a time series, we are interested in the joint probability spaces of its subseries. We denote by $k$ the number of consecutive observations up to time $t$, $x_t^{(k)} = (x_t, x_{t-1}, \ldots, x_{t-k+1})$. Conditional block entropy and transfer entropy are defined using this notation (see Eqs. 4.2 and 4.3).

• Shannon entropy: It measures the amount of information in a random process. A larger entropy indicates a less informative process, i.e. one with higher uncertainty [12].

$$H(X) = -\sum_{x} p(x) \log_2 p(x) \tag{4.1}$$

• Conditional block entropy: It measures the amount of information in a subseries that affects the subsequent observation [10].

$$h_X(k) = -\sum p\bigl(x_{t+1}, x_t^{(k)}\bigr) \log_2 p\bigl(x_{t+1} \mid x_t^{(k)}\bigr) \tag{4.2}$$

• Transfer entropy: It measures the amount of information in one process that affects the subsequent observation in another process. It examines the asymmetric dynamics of two processes [11].

$$T_{Y \to X}(k, l) = \sum_{x, y} p\bigl(x_{t+1}, x_t^{(k)}, y_t^{(l)}\bigr) \log_2 \frac{p\bigl(x_{t+1} \mid x_t^{(k)}, y_t^{(l)}\bigr)}{p\bigl(x_{t+1} \mid x_t^{(k)}\bigr)} \tag{4.3}$$
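The three measures in Eqs. 4.1 to 4.3 can be estimated from discrete data by replacing probabilities with relative frequencies. The following is a minimal sketch under that assumption (a plain "plug-in" estimator, without the bias correction discussed later in the chapter); the function names and the toy series are illustrative and are not taken from the original study.

```python
import random
from collections import Counter
from math import log2


def shannon_entropy(x):
    """Plug-in estimate of H(X) in bits (Eq. 4.1)."""
    n = len(x)
    return -sum(c / n * log2(c / n) for c in Counter(x).values())


def _block(series, t, k):
    """Block (s_t, s_{t-1}, ..., s_{t-k+1}) as a hashable tuple."""
    return tuple(series[t - i] for i in range(k))


def conditional_block_entropy(x, k):
    """Plug-in estimate of h_X(k) in bits (Eq. 4.2)."""
    joint, past = Counter(), Counter()
    for t in range(k - 1, len(x) - 1):
        b = _block(x, t, k)
        joint[(x[t + 1], b)] += 1
        past[b] += 1
    n = sum(joint.values())
    return -sum(c / n * log2(c / past[b]) for (_, b), c in joint.items())


def transfer_entropy(x, y, k=1, l=1):
    """Plug-in estimate of T_{Y->X}(k, l) in bits (Eq. 4.3); x is the target."""
    full, ctx_xy, next_and_x, ctx_x = Counter(), Counter(), Counter(), Counter()
    for t in range(max(k, l) - 1, len(x) - 1):
        bx, by = _block(x, t, k), _block(y, t, l)
        full[(x[t + 1], bx, by)] += 1
        ctx_xy[(bx, by)] += 1
        next_and_x[(x[t + 1], bx)] += 1
        ctx_x[bx] += 1
    n = sum(full.values())
    te = 0.0
    for (nxt, bx, by), c in full.items():
        p_given_xy = c / ctx_xy[(bx, by)]            # p(x_{t+1} | x_t^(k), y_t^(l))
        p_given_x = next_and_x[(nxt, bx)] / ctx_x[bx]  # p(x_{t+1} | x_t^(k))
        te += c / n * log2(p_given_xy / p_given_x)
    return te


# Toy data (made up): y is an i.i.d. three-state source, x copies y with a one-step lag.
rng = random.Random(0)
y = [rng.choice((-1, 0, 1)) for _ in range(5000)]
x = [0] + y[:-1]
te_y_to_x = transfer_entropy(x, y)   # large: y_t fully determines x_{t+1}
te_x_to_y = transfer_entropy(y, x)   # near zero, up to small-sample bias
```

In this toy setting the asymmetry of transfer entropy is visible directly: information flows from `y` to `x` but not back, which is the property the chapter relies on when comparing the news-to-market and market-to-news directions.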


4.2.2 Propositions of Causal Relationships in Transfer Entropy

In finance, conditional block entropy and transfer entropy can be used to explain causal types of relationships. We provide a few propositions to establish this link, in particular between Granger causal relationships and the transfer entropy measure.

Proposition 4.1 If $X$ is a sequence of i.i.d. random variables, then there is no self information flow within the series $X$, i.e. the conditional block entropy equals the Shannon entropy.

Proof For an i.i.d. sequence $X$, we have

$$p\bigl(x_{t+1} \mid x_t^{(k)}\bigr) = p(x_{t+1}) \tag{4.4}$$

and

$$p\bigl(x_{t+1}, x_t^{(k)}\bigr) = p\bigl(x_t^{(k)}\bigr)\, p\bigl(x_{t+1} \mid x_t^{(k)}\bigr) = p\bigl(x_t^{(k)}\bigr)\, p(x_{t+1}). \tag{4.5}$$

Then from Eq. (4.2)

$$\begin{aligned}
h_X(k) &= -\sum p\bigl(x_{t+1}, x_t^{(k)}\bigr) \log_2 p\bigl(x_{t+1} \mid x_t^{(k)}\bigr) \\
&= \sum_{x_t^{(k)}} p\bigl(x_t^{(k)}\bigr) \Bigl\{ -\sum_{x_{t+1}} p(x_{t+1}) \log_2 p(x_{t+1}) \Bigr\} \\
&= \sum_{x_t^{(k)}} p\bigl(x_t^{(k)}\bigr)\, H_X \\
&= H_X,
\end{aligned} \tag{4.6}$$

since $\sum p\bigl(x_t^{(k)}\bigr) = 1$. Thus, the conditional block entropy is the same as the unconditional entropy; in other words, the past provides no information about the future.

Proposition 4.2 For two independent series $X$ and $Y$, the transfer entropy between them is zero (i.e. there is no causal relationship between $X$ and $Y$).

Proof For the two series $X$, $Y$, the transfer entropy satisfies Eq. (4.3). If the two series are independent, we have $p\bigl(x_{t+1} \mid x_t^{(k)}, y_t^{(l)}\bigr) = p\bigl(x_{t+1} \mid x_t^{(k)}\bigr)$. Then, for all possible series values, the logarithmic term in Eq. (4.3) becomes $\log_2(1) = 0$. So $T_{Y \to X} = 0$ for any positive integers $k$ and $l$. Similarly, $T_{X \to Y} = 0$ as well.

This proposition shows that a non-zero transfer entropy between two time series indicates the existence of a dependent relationship. We therefore take $T_{Y \to X} > 0$ as an indication of a causal relationship from the source series ($Y$) to the target series ($X$).


Proposition 4.3 Granger causality and transfer entropy are equivalent if all variables involved are distributed as multivariate normal distributions.

Proof This is a more succinct proof of a result of [1]. For any random vector $Z$ with probability density $f(Z)$ the entropy is defined as

$$H(Z) = -\int f(z) \ln f(z)\, dz = -E[\ln f(Z)]. \tag{4.7}$$

Note that we are using “natural” logarithms rather than the base 2 logarithms that are common in information theory. If $Z$ has a multivariate Normal distribution, $Z \sim MN(\mu, \Sigma(Z))$, the probability density is

$$f(z) = (2\pi)^{-\frac{1}{2} d_Z}\, |\Sigma(Z)|^{-\frac{1}{2}} \exp\Bigl\{ -\tfrac{1}{2} (z - \mu)^{\top} \Sigma(Z)^{-1} (z - \mu) \Bigr\}, \tag{4.8}$$

where $d_Z$ is the dimension of $Z$. Then

$$H(Z) = \tfrac{1}{2} d_Z \ln(2\pi) + \tfrac{1}{2} \ln|\Sigma(Z)| + E\Bigl[ \tfrac{1}{2} (Z - \mu)^{\top} \Sigma(Z)^{-1} (Z - \mu) \Bigr]. \tag{4.9}$$

But the quadratic form in the final term has a chi-squared distribution with $d_Z$ degrees of freedom, and so has expectation $d_Z$. Therefore

$$\begin{aligned}
H(Z) &= \tfrac{1}{2} \ln|\Sigma(Z)| + \tfrac{1}{2} d_Z \ln(2\pi) + \tfrac{1}{2} d_Z \\
&= \tfrac{1}{2} \ln|\Sigma(Z)| + \tfrac{1}{2} d_Z \ln(2\pi e).
\end{aligned} \tag{4.10}$$

Now let $Z = \begin{pmatrix} X \\ W \end{pmatrix}$; then Eq. (4.8) can be written as

$$f(z) = f(w)\, f(x \mid w), \tag{4.11}$$

where, similar to Eq. (4.8),

$$f(w) = (2\pi)^{-\frac{1}{2} d_W}\, |\Sigma(W)|^{-\frac{1}{2}} \exp\Bigl\{ -\tfrac{1}{2} (w - \mu_W)^{\top} \Sigma(W)^{-1} (w - \mu_W) \Bigr\}. \tag{4.12}$$

The conditional density is

$$f(x \mid w) = (2\pi)^{-\frac{1}{2} d_X}\, |\Sigma(X \mid W)|^{-\frac{1}{2}} \exp\Bigl\{ -\tfrac{1}{2} (x - \mu_{X \mid W})^{\top} \Sigma(X \mid W)^{-1} (x - \mu_{X \mid W}) \Bigr\}, \tag{4.13}$$

where the conditional dispersion matrix is

$$\Sigma(X \mid W) = \Sigma(X) - \Sigma(X, W)\, \Sigma(W)^{-1}\, \Sigma(W, X) \tag{4.14}$$

with

$$\Sigma(Z) = \Sigma\begin{pmatrix} X \\ W \end{pmatrix} = \begin{pmatrix} \Sigma(X) & \Sigma(X, W) \\ \Sigma(W, X) & \Sigma(W) \end{pmatrix}. \tag{4.15}$$

Note that, from Eqs. (4.8) and (4.11) to (4.13),

$$|\Sigma(Z)| = |\Sigma(W)|\, |\Sigma(X \mid W)|. \tag{4.16}$$

Since $H(X \mid W) = H(Z) - H(W)$, Eqs. (4.10) and (4.16) give $H(X \mid W) = \tfrac{1}{2} \ln|\Sigma(X \mid W)| + \tfrac{1}{2} d_X \ln(2\pi e)$, with $d_X = 1$ for a scalar $X$.

Let $x_{t+1}$, $x_t^{(k)}$, $y_t^{(l)}$ have a multivariate Normal distribution. Then the transfer entropy is

$$\begin{aligned}
T_{Y \to X}(k, l) &= H\bigl(x_{t+1} \mid x_t^{(k)}\bigr) - H\bigl(x_{t+1} \mid x_t^{(k)}, y_t^{(l)}\bigr) \\
&= \tfrac{1}{2} \ln\bigl|\Sigma\bigl(x_{t+1} \mid x_t^{(k)}\bigr)\bigr| + \tfrac{1}{2} \ln(2\pi e) - \tfrac{1}{2} \ln\bigl|\Sigma\bigl(x_{t+1} \mid x_t^{(k)}, y_t^{(l)}\bigr)\bigr| - \tfrac{1}{2} \ln(2\pi e) \\
&= \tfrac{1}{2} \ln \frac{\Sigma\bigl(x_{t+1} \mid x_t^{(k)}\bigr)}{\Sigma\bigl(x_{t+1} \mid x_t^{(k)}, y_t^{(l)}\bigr)}.
\end{aligned} \tag{4.17}$$

The argument of the logarithm is just the ratio of the variance of $x_{t+1}$ conditional on $x_t^{(k)}$ to the variance of $x_{t+1}$ conditional on both $x_t^{(k)}$ and $y_t^{(l)}$. As we are dealing with the multivariate Normal, these are calculated by appropriate forms of Eq. (4.14), which is a standard result for linear regression (whether or not the distributions are Normal). This is therefore exactly the criterion that is used to determine whether $Y$ Granger causes $X$, and so Granger causality and transfer entropy are equivalent if all variables involved are distributed as multivariate Normal.

From the above three propositions, we conclude that the entropy method is sufficient for identifying causal relationships that are relevant and important for many fundamental financial problems.
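As a numerical illustration of Eqs. (4.14) and (4.17), the sketch below computes the Gaussian transfer entropy for $k = l = 1$ directly from a covariance matrix. The data-generating process, its parameter values, and all function names are hypothetical choices made for this example only; the linear solver is a small stand-in so that the block needs nothing beyond the standard library.

```python
from math import log


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def conditional_variance(cov, target, given):
    """Scalar case of Eq. 4.14: S(X) - S(X,W) S(W)^{-1} S(W,X)."""
    s_ww = [[cov[i][j] for j in given] for i in given]
    s_wx = [cov[i][target] for i in given]
    z = solve(s_ww, s_wx)
    return cov[target][target] - sum(s * w for s, w in zip(s_wx, z))


def gaussian_transfer_entropy(cov):
    """Eq. 4.17 for k = l = 1, in nats; cov is ordered as (x_{t+1}, x_t, y_t)."""
    restricted = conditional_variance(cov, 0, [1])       # given x_t only
    unrestricted = conditional_variance(cov, 0, [1, 2])  # given x_t and y_t
    return 0.5 * log(restricted / unrestricted)


# Hypothetical process: x_{t+1} = a x_t + b y_t + eps,
# with y_t i.i.d. N(0, 1) and eps ~ N(0, s2), independent of y.
a, b, s2 = 0.5, 0.3, 1.0
v = (b * b + s2) / (1 - a * a)   # stationary variance of x
cov = [[v, a * v, b],
       [a * v, v, 0.0],
       [b, 0.0, 1.0]]
te = gaussian_transfer_entropy(cov)   # equals 0.5 * ln((b^2 + s2) / s2)
```

The two conditional variances are the residual variances of the restricted and unrestricted regressions, which is why the ratio inside the logarithm coincides with the quantity examined by a Granger test.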


4.3 Methodology

In this section, we outline the use of entropy measures to explore empirical features of intraday market price information and news information. The purpose is to show the dynamics of information transition at the “high-frequency” level, in particular treating news as the main source of public information for intraday trading.

4.3.1 Measuring Causality Using Entropy

We investigate the self-causality and cross-sectional causality of price and news information.

4.3.1.1 Self-causality of Information

The first property we examine is the self-causality effect. We think both types of information should carry some memory. The impact of price on itself is due to the use of past market information in trading, which is consistent with the understanding of technical analysis. The existence of technical trading is also crucial to ensure, at least, weak form market efficiency. The memory of news information is often called “news of news”. For instance, one important Fed announcement could be followed by hundreds of news articles. In addition, it often hinges on the more complicated nature of news, namely the speed of news publication, the contents of news, news sources and their validity, etc.

Self-causality can be interpreted as the statistical property that the probability distribution at time $t$ is conditional on the filtration of previous observations $\mathcal{F}_{t-1}$. This is slightly different from serial correlation, as it may involve non-linear relationships. As its definition suggests, conditional block entropy quantifies the uncertainty of signals given known information, which makes it well suited to identifying self-causality. We denote by $\Delta_X(k)$ the contribution of the memory $x_t^{(k)}$ (see Eq. 4.18). The larger the block size $k$, the longer the memory available to estimate $x_{t+1}$ and the larger $\Delta_X(k)$.

$$\Delta_X(k) = H_X - h_X(k) \tag{4.18}$$

In other words, $\Delta_X(k)$ increases until $k$ reaches the memory length $k_X$, and is bounded by $0 \le \Delta_X(k) \le H_X$.
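The behavior of $\Delta_X(k)$ around the memory length can be checked on a toy series. The sketch below is illustrative only: it uses plug-in frequency estimates and a made-up deterministic series whose next value depends on exactly the two previous values, so the memory length is 2 by construction.

```python
from collections import Counter
from math import log2


def shannon_entropy(x):
    """Plug-in estimate of H_X in bits."""
    n = len(x)
    return -sum(c / n * log2(c / n) for c in Counter(x).values())


def conditional_block_entropy(x, k):
    """Plug-in estimate of h_X(k) in bits."""
    joint, past = Counter(), Counter()
    for t in range(k - 1, len(x) - 1):
        block = tuple(x[t - i] for i in range(k))
        joint[(x[t + 1], block)] += 1
        past[block] += 1
    n = sum(joint.values())
    return -sum(c / n * log2(c / past[b]) for (_, b), c in joint.items())


def memory_contribution(x, k):
    """Delta_X(k) = H_X - h_X(k), Eq. 4.18."""
    return shannon_entropy(x) - conditional_block_entropy(x, k)


# Toy three-state series with memory length 2: x_{t+1} = (x_t + x_{t-1}) mod 3.
x = [0, 1]
while len(x) < 800:
    x.append((x[-1] + x[-2]) % 3)

deltas = {k: memory_contribution(x, k) for k in (1, 2, 3)}
```

Here `deltas[1]` is strictly below `deltas[2]`, while `deltas[2]` and `deltas[3]` coincide with the Shannon entropy of the series: one lag explains only part of the uncertainty, two lags explain all of it, and a third adds nothing.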


We standardize this measure through Eq. 4.19, which maps the value to [0, 1], in order to keep comparisons consistent. It can also be interpreted as “the contribution of previous information in percentage”.

$$\frac{\Delta_X(k)}{H_X} = 1 - \frac{h_X(k)}{H_X} \tag{4.19}$$

4.3.1.2 Cross-Sectional Causality of Information

The causality from price to news does occur in practice. For instance, if a sharp price increase or decrease occurs in the market index, this information is usually spread to the public immediately as breaking news. Intuitively, however, this effect should be weaker than that in the opposite direction, i.e. from news information to price movements. As news is nowadays considered a reliable input for price discovery, this causality relates directly to trading dynamics. Furthermore, we regard it as a supplement to the EMH, which failed to consider the potential of using varied types of public information, apart from companies’ reports, in rational trading practice.

According to the definition in [11], transfer entropy $T_{Y \to X}$ measures the amount of information in $Y$ that helps to “forecast” $X$, excluding the information of $X$ itself used in the self-causality process. We adopt transfer entropy to investigate the causality between different types of information. Unlike Granger causality, it accommodates non-linearity and non-Normality. We also verified in Sect. 4.2.2 that the two measures are equivalent for Gaussian distributed variables. As with the self-causality, we standardize the transfer entropy $T_{Y \to X}$ by its upper bound $h_X(k)$, i.e. $T_{Y \to X} / h_X(k)$.

4.3.2 Entropy Calibration

The entropy calibration involves modeling the probability distributions of the observed processes. Even though our data for both price and news information are continuous, we use discrete distributions in this study. The first reason is that the computational complexity is too high for continuous distributions. More importantly, we think discrete states of information are associated with investors’ trading philosophy. As trading decisions are usually based on an optimistic or pessimistic outlook, we define three states for each type of information (see Eq. 4.20).

$$L(t) = \begin{cases} -1, & x(t) < \mu - d \\ 0, & \mu - d \le x(t) \le \mu + d \\ 1, & x(t) > \mu + d \end{cases} \tag{4.20}$$

in which $x(t)$ is the observation at time $t$ and $\mu$ is the mean. The threshold $d$ is determined by the “even” distribution criterion

$$\Pr(-1) \approx \Pr(0) \approx \Pr(+1) \approx \tfrac{1}{3}.$$

This also provides the largest Shannon entropy among all three-state probability distributions.

To measure the self-causality, it is crucial to choose an appropriate block size for the conditional block entropy. The principle is to make the block size as large as possible so that the useful information in previously observed $X$ is fully extracted. Ideally, $\Delta_X(k)$ stops increasing once $k$ is large enough, which means that the process $X$ does not hold a memory longer than $k$ time intervals. However, due to small sample bias in practice, we may not be able to get an accurate estimate of $\Delta_X(k)$ once $k$ increases beyond a certain point. This issue is also mentioned by [10]. They introduced a solution called “effective transfer entropy”, which removes the noise by shuffling the process $X$. We adopt this modification in our study. The block size determined for $\Delta_X(k)$ is applied in the estimation of the cross-sectional causality $T_{Y \to X}$, and the block size for $Y$ in this transfer entropy is set to 1.
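The three-state mapping of Eq. 4.20 can be sketched as follows. The chapter states the “even” distribution criterion but not the procedure used to find $d$; taking $d$ as the empirical one-third quantile of $|x - \mu|$ is an assumption made here for illustration, and it balances the two outer states only when the data are roughly symmetric around the mean.

```python
from statistics import fmean


def calibrate_threshold(values):
    """Return (mu, d) so that about one third of observations lie in mu +/- d.

    Assumption: d is the empirical 1/3-quantile of |x - mu|.
    """
    mu = fmean(values)
    deviations = sorted(abs(v - mu) for v in values)
    d = deviations[max(len(deviations) // 3 - 1, 0)]
    return mu, d


def discretize(values, mu, d):
    """Three-state mapping of Eq. 4.20."""
    return [-1 if v < mu - d else (1 if v > mu + d else 0) for v in values]


# Made-up symmetric sample, used only to show the even split.
values = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
mu, d = calibrate_threshold(values)
states = discretize(values, mu, d)
```

For skewed data, a calibration based on the lower and upper terciles would balance the three states more closely, at the cost of thresholds that are no longer symmetric around $\mu$ as Eq. 4.20 requires.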

4.4 Data

In this study, market information and news information are represented by the market index price and news sentiment, respectively. We use 30-min time intervals to observe the intraday dynamics. The dataset runs from January 1, 2003 to December 31, 2014, excluding non-trading hours.

4.4.1 Market Index Price

Stock market indexes are proxies for equity market performance. In this study we use the S&P 500 (.SPX), a capitalization-weighted market index, to represent the U.S. stock market. We collect 30-min intraday prices of the market index from Thomson Reuters Tick History™ (TRTH). We use the 30-min return for the entropy calculation, as the return is stationary:

$$r_t = \log \frac{P_t}{P_{t - \Delta t}},$$

in which $P_t$ denotes the index price and $\Delta t$ is 30 min.
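A minimal sketch of the return calculation, assuming the prices are already sampled on a regular 30-min grid; the price levels below are made up.

```python
from math import log


def log_returns(prices):
    """r_t = log(P_t / P_{t - dt}) for consecutive equally spaced prices."""
    return [log(curr / prev) for prev, curr in zip(prices, prices[1:])]


prices = [100.0, 101.0, 100.5, 100.5]   # made-up index levels, one per 30-min bar
returns = log_returns(prices)
```

Log returns add up over time, so the sum of the 30-min returns equals the log return over the whole span, which is convenient when intervals are aggregated.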


4.4.2 News Sentiment Data

The news data is provided by Thomson Reuters News Analytics™ (TRNA). It is a professional news sentiment database that has been adopted by previous studies [15]. The database contains over 80 metadata fields about financial news. News sentiment is the tone of news articles, i.e. good news or bad news. In the context of financial news, it indicates the prospect of bull or bear markets. The advantage of the TRNA database is that, apart from the sentiment of each piece of news, it provides a “relevance score” for each company mentioned in the news article to show the relevance of the news to individual stocks. The metadata fields we used for sentiment calibration in this chapter are listed below.

– datetime: The date and time of a news article.
– ric: Reuters Instrument Code (RIC) of a stock for which the sentiment scores apply.
– pos, obj, neg: Positive, neutral, and negative sentiment probabilities (i.e., pos + obj + neg = 1).
– relevance: A real-valued number between 0 and 1 indicating the relevance of a piece of news to a stock. One news article may refer to multiple stocks. A stock with more mentions will be assigned a higher relevance.

To evaluate the sentiment score of each record in the database, we calculate the standardized expectation of the sentiment probabilities adjusted by the relevance value (see Eq. 4.21).

$$\text{Sentiment} = \text{relevance} \times (\text{pos} - \text{neg}) \times (1 - \text{obj}) \tag{4.21}$$

Then we calculate the 30-min time-weighted-average news sentiment. News published in non-trading hours is counted into the first 30 min of the following trading day.
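The scoring in Eq. 4.21 and the grouping into 30-min intervals can be sketched as below. Two simplifications are assumed and are not taken from the chapter: the interval average is a plain unweighted mean rather than the time-weighted average, and the handling of non-trading hours is left out. The records are made up, and the tuple layout only mirrors the field names listed above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sentiment_score(pos, obj, neg, relevance):
    """Eq. 4.21: relevance x (pos - neg) x (1 - obj)."""
    return relevance * (pos - neg) * (1.0 - obj)


def bucket_start(ts, minutes=30):
    """Floor a timestamp to the start of its 30-min interval."""
    return ts - timedelta(minutes=ts.minute % minutes,
                          seconds=ts.second, microseconds=ts.microsecond)


def average_by_interval(records):
    """Unweighted mean score per interval (a simplification, see lead-in)."""
    buckets = defaultdict(list)
    for ts, pos, obj, neg, relevance in records:
        buckets[bucket_start(ts)].append(sentiment_score(pos, obj, neg, relevance))
    return {start: sum(s) / len(s) for start, s in sorted(buckets.items())}


# Made-up records: (datetime, pos, obj, neg, relevance)
records = [
    (datetime(2014, 3, 3, 10, 5), 0.7, 0.2, 0.1, 1.0),
    (datetime(2014, 3, 3, 10, 20), 0.1, 0.2, 0.7, 0.5),
    (datetime(2014, 3, 3, 10, 40), 0.2, 0.6, 0.2, 0.8),
]
averages = average_by_interval(records)
```

The factor `(1 - obj)` shrinks the score of articles that are mostly neutral, and `relevance` shrinks the score of articles that mention the stock only in passing, so a strongly worded but barely relevant article contributes little.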

4.5 Results

The objective of this study is to investigate the non-linear statistical relationships between the two types of information. More precisely, we would like to show how the self- and cross-sectional causalities change over time. Based on this, we then explore some insights into the information efficiency of intraday markets. In general, the calibration of entropy needs large samples. We use a 3-year daily rolling window in this study. Every sample contains almost 10,000 observations.


Table 4.1 The distributions of market and news information

                    Mean μ    Threshold d
S&P500 return       0.00      0.0006
News sentiment      0.05      0.0291

Fig. 4.1 Shannon entropy

4.5.1 The Distributions and the Shannon Entropy

We use the full dataset to calibrate the threshold defined in Eq. 4.20 (see Table 4.1). As mentioned in Sect. 4.3.2, the criterion for the data partition is to achieve equal probabilities for the 3 states, for which the entropy value is 1.585. This is a benchmark and an upper bound for any 3-state probability space. The further the entropy falls below this value, the more the observations deviate from equiprobability. However, it does not tell whether the values are biased to the positive or the negative side.

In the S&P500 returns, we observe a “shock” in Shannon entropy (see Fig. 4.1). We conjecture that these entropy changes are closely associated with the formation of the price bubble that burst in 2008, thereby triggering the crisis. The full period of the recent liquidity crisis is reflected in the low entropy observed during late 2009.

The Shannon entropy of news sentiment is less stable, apart from the years after the 2008 financial crisis. For most of the time, the news sentiment entropy lies far from the benchmark 1.585, which may suggest that, within a 3-year time window, news articles usually indicate a strong positive or negative market outlook. News information in the early years is somewhat “shaky”, so that the entropy is volatile, showing a lack of consistent indication over short time periods. We observe a drop-then-rise pattern before and after the financial crisis. We think this is because good information shifted to bad information and news eventually became neutral after the shock in the market. In recent years, the recovery of financial markets has brought some good news, which explains the decrease in news sentiment entropy after 2012. Even though we again observe unevenly distributed news sentiment, we believe the news information is not heavily biased, as the entropy is still much higher than before the financial crisis.
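For reference, the benchmark value follows directly from Eq. 4.1 with three equiprobable states. The second distribution below is a hypothetical skewed case, included only to show how a departure from equiprobability lowers the entropy.

```latex
\[
H_{\max} = -\sum_{s \in \{-1, 0, 1\}} \tfrac{1}{3} \log_2 \tfrac{1}{3}
         = \log_2 3 \approx 1.585,
\qquad
H\bigl(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\bigr)
  = \tfrac{1}{2} \cdot 1 + 2 \cdot \tfrac{1}{4} \cdot 2 = 1.5 .
\]
```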

4.5.2 Results of Self-causality

We introduced the rationale for self-causality in both market and news information in Sect. 4.3.1.1. In general, it can be regarded as the “memory” of a time series. The memory length (i.e. the optimized block size) and strength (i.e. the conditional block entropy) are meaningful features of both types of information. Recall that we use a 3-year rolling window to incorporate sufficient data to obtain the optimal memory length reflecting the impact on the market. The information flow at each time $t$ therefore represents a cumulative effect over the 3 years prior to $t$.

According to Fig. 4.2, the memory lengths of both types of information cluster into three time periods: pre-crisis (before 2008), crisis (2008–2013, covering both the 2008 liquidity crisis and the European debt crisis) and post-crisis (after 2013). This pattern is clearer for news information. News tends to update more slowly than returns, and we think this is also related to the memory length and strength. We observe that the memory of news increases from 2 blocks (i.e. 1 h) to 6 blocks (i.e. 3 h). As discussed in Sect. 4.3.1.1, a piece of news often has many follow-up stories. This could be due to the varied broadcasting speeds of different news vendors. Generally, big business news vendors tend to gather and broadcast information faster than others, and they can also follow the development of a story that may last for hours. Sometimes, even for the same piece of news, the subjective opinions and reporting styles of different vendors may cause differences in the outlook conveyed. The number of business news vendors has increased during the past decades. The increasing memory length indicates that, as a consequence of “news of news”, the timeline of follow-up news became longer. Furthermore, we find that the increasing news memory length does not result in stronger memory. Instead, the self-causality of news information decreased. This confirms the diverse opinions of, or subjects covered by, different news vendors, which result in less repetition of news information in recent years (Fig. 4.3).

Fig. 4.2 Memory length of market and news information

Fig. 4.3 Self-causality of market and news information

In contrast to the increasing news memory length, the memory length of market information decreases from 3 h to a volatile stage of 1.5–3 h during the crisis, then stabilizes at 1.5 h after 2013. We think this indicates more efficient technical trading in the intraday market, which leads to a faster price discovery process. The self-causality of market information is much weaker than that of news information. This supports the argument of weak form market efficiency: market movements should in general be hard to predict from past market information, which results in a very low self-causality. However, we also observe a sharp increase in memory before the financial crisis. We believe this is an indication of price-chasing trading; many investors were seeking profits in the bull market without investigating fundamental values, and drove up prices. Eventually, the overreaction to good market information ended in a crash. After the financial crisis, the market returned to a rational environment in which technical trading immediately incorporated past market information into price discovery. This is supported by the stabilized but low self-causality of market information. In other words, the market reaches weak form efficiency, such that price movements are not predictable through technical analysis alone (Fig. 4.3).

4.5.3 Results of Cross-Sectional Causality

Causality between the two types of information is measured by transfer entropy. It tells how much an extra source of information contributes to predictability. As with the self-causality, we observe the three periods for cross-sectional causality in both directions. The causality from market information to news information tells the story of the development of the financial news industry. Back in the 2000s, the news industry was undergoing a transformation from paper-based to online media. We can see that before 2008 almost no news reported market conditions in a timely manner, at least within 1 block (i.e. 30 min). However, things began to change during the financial crisis. The increasingly popular online news media were responding to the shocks in the market. This naturally led to a sharp increase in the causality from market to news. It also explains why this causality effect died down in recent years; “no news is good news” when the market is stable (Fig. 4.4).

Fig. 4.4 Cross-sectional causality of market and news information

On the other hand, the causality from news information to market information tells the story of news-based trading strategies. This direction of causality is more insightful, as it reflects the efficiency of public information in the intraday market. We observe a very clear indication of the use of news information during the double crisis periods. In particular, the causality during the 2008 crisis is stronger than during the European debt crisis. This makes sense, as the market we examine is the U.S. market, which experienced its severest crash in 2008. Recall that we also observed stronger self-causality of market information in the same period. This means that, although traders’ reaction to the bear market helped to restore rational prices, it was not sufficient, so trading based on news information also contributed. The predictability of news also dropped in recent years, sometimes even to zero. We think this shows a semi-strong form of market efficiency. In general, intraday traders have explored news information well enough to reveal new public information in price movements, eliminating the potential profitability.

4.5.4 Market Inefficiency and Information Entropy

One ultimate goal of this study is to understand, after investigating the statistical properties of both types of information, how market efficiency can be affected. We again restrict the problem to the intraday market. We want to know when the market becomes more efficient or inefficient and whether the causalities of information appear to be systematic. These are very important questions, as the overuse of information would result in overreactions and then harm information efficiency in financial markets.

Fig. 4.5 Market (in)efficiency index

In this study, we explore the relation between information entropy and market efficiency through regression analysis. We adopt the efficiency index (EI) proposed by [9] to estimate financial market efficiency (see Eq. 4.22).

$$EI = \sqrt{\sum_{i=1}^{n} \left( \frac{\hat{M}_i - M_i^{*}}{R_i} \right)^{2}} \tag{4.22}$$

where $\hat{M}_i$ is the $i$th efficiency measure, $M_i^{*}$ is the expected value of the $i$th measure for the efficient market, and $R_i$ is the range of the $i$th measure. EI = 0 for an efficient market, and the higher the EI, the stronger the inefficiency. Following [9], the Hurst exponent, the fractal dimension, and the first order autocorrelation are selected as the three efficiency measures in this index. For the efficient market, the market return follows Brownian motion, so that the expected values of the three measures are 0.5, 1.5 and 0.0, respectively; the ranges of the three measures are 1.0, 1.0 and 2.0, respectively.

To be consistent with the evaluation of information entropy, we also use a 3-year rolling window for the efficiency index. By definition, this index actually measures the level of market inefficiency. In other words, the smaller the index value, the higher the market efficiency. Overall, the efficiency did not change much during the sample period: the value ranged between 0.08 and 0.11 (see Fig. 4.5). An interesting observation is the increasing efficiency before the financial crisis, when the market was undoubtedly in an irrational period with a lot of positive feedback trading [13]. We know that, according to the EMH, rational investment leads to market efficiency. Meanwhile, according to the efficiency index, the inverse (i.e. irrational investment leads to market inefficiency) does not hold.

We examine the following two linear models for information entropy and market efficiency. In terms of notation, EI is the efficiency index, $R$ denotes the market return and $S$ denotes the news sentiment. Recall that we use entropy to measure causalities; for example, $I_{S \to R}$ is the causality from news to market.

Model I: Market efficiency versus all information entropy.

$$EI(t) = \beta_0 + \beta_1 EI(t-1) + \beta_2 I_{R \to R}(t) + \beta_3 I_{S \to S}(t) + \beta_4 I_{S \to R}(t) + \beta_5 I_{R \to S}(t)$$

Model II: Market efficiency versus trading related causalities.

$$EI(t) = \beta_0 + \beta_1 EI(t-1) + \beta_2 I_{R \to R}(t) + \beta_3 I_{S \to R}(t)$$

Table 4.2 Market inefficiency versus information entropy

                               Model I               Model II
Dependent variable             EI(t)                 EI(t)
Const.                         0.0223 (21.955)       0.0202 (21.317)
EI(t − 1)                      0.8009 (87.906)       0.8182 (95.131)
I_{R→R}(t)                     −0.1779 (−10.463)     −0.1793 (−13.318)
I_{S→S}(t)                     −0.0035 (−5.469)
I_{S→R}(t)                     0.1945 (7.556)        0.1969 (7.613)
I_{R→S}(t)                     0.0608 (0.923)
F-statistic                    2994.                 4910.
Adj. R-squared                 0.881                 0.880
Number of observations         2014                  2014
Residual degrees of freedom    2009                  2011

Note We show the t-statistics in parentheses. Significance level code: 0 ‘ ’ 0.001 ‘ ’ 0.01 ‘ ’ 0.05

In both models, we use the lag-1 efficiency index as a control variable. This is to ensure the validity of the linear regressions, as market efficiency is highly autocorrelated. Model I involves all information entropy measures, even though some are not directly associated with market movements. The purpose of this model is to validate the understanding of the causalities among market and news information that we have developed throughout this chapter. The two trading related causalities (i.e. $I_{R \to R}$ and $I_{S \to R}$) show strong and significant linear relationships with market efficiency (see Table 4.2). The coefficient of $I_{S \to S}$ is also significant; however, the relation is much weaker. It indicates that the “news of news” does trigger some trading reactions in the market, but not as strongly as “first-hand” information. $I_{R \to S}$ is the least relevant to trading activities and, as expected, its coefficient is not significant.
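Returning to the efficiency index of Eq. 4.22, its calculation can be sketched as follows, assuming the index is the Euclidean distance of the range-scaled measures from their efficient-market values. The benchmark values and ranges are the ones quoted in the text; the three input estimates are made up, and estimating the Hurst exponent, fractal dimension and autocorrelation themselves is outside the scope of this sketch.

```python
from math import sqrt

# Efficient-market benchmarks and ranges quoted in the text for the
# Hurst exponent, fractal dimension and first-order autocorrelation.
EXPECTED = (0.5, 1.5, 0.0)
RANGES = (1.0, 1.0, 2.0)


def efficiency_index(measures, expected=EXPECTED, ranges=RANGES):
    """Eq. 4.22: distance of the scaled measures from the efficient case."""
    return sqrt(sum(((m - e) / r) ** 2
                    for m, e, r in zip(measures, expected, ranges)))


# Made-up estimates for one rolling window:
# (Hurst exponent, fractal dimension, first-order autocorrelation)
ei = efficiency_index((0.56, 1.45, -0.08))
```

Dividing by the range puts the three measures on a comparable scale, so that the autocorrelation, which spans an interval twice as wide as the other two, does not dominate the index.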

We further examine the two trading-related information entropy measures in Model II. The coefficients are similar to those in Model I. It is clear that the impacts of the two corresponding trading strategies are opposite: an increase in $I_{R\to R}$ is associated with higher efficiency, whereas an increase in $I_{S\to R}$ is associated with lower efficiency. We think this can be explained by the different ways in which the information is used. As mentioned earlier, technical trading leads to non-linear self-causality in market information, whereas reactions to news are usually linear: positive news should cause a price increase and vice versa. Therefore, $I_{R\to R}$ does not necessarily imply predictability of price movements, as it does not reveal their direction. Considering that technical analysis has been broadly explored for decades, we think this result shows that technical trading facilitates price discovery from market information. This argument is consistent with the EMH in terms of weak-form efficiency. On the other hand, $I_{S\to R}$ is associated with the predictability of the market from news and indicates that news information is not fully reflected in prices. Hence, the coefficient of $I_{S\to R}$ is positive. It also suggests that sufficient use of news information would eliminate the predictability associated with this causality and improve market efficiency. This is another argument that is consistent with the EMH.

4.6 Conclusions

The objective of our study is to identify information causality in the intraday market. We follow the EMH, so market data and news data (i.e. public information) are selected for this analysis. We find that the self-causality of both types of information declined over the entire sample period. For news data, this indicates more diversified news publishers. The self-causality of market data, on the other hand, tends to converge after 2012. We think this shows that there is no overreaction in the market. The changes in cross-sectional causalities match the financial crises. In both the 2008 global crisis and the Euro Dollar crisis, we observe more news following up on financial market conditions and more market movements caused by news updates. Both causalities diminished after the crises. First, traders are less sensitive to breaking news under good market conditions. Second, and more importantly, news data is not fully exploited by traders. To examine the EMH in modern, fast-moving financial markets, we also present two regression models of the market (in)efficiency index. In the first model, we use all causality measures as explanatory variables and find that their associations with market efficiency are all linearly significant, apart from the causality from market to news. To confirm that traders' use of information in price discovery improves market efficiency, we run a second linear model that involves only the self-causality of market data (i.e. traders' use of past market data) and the causality from news to market (i.e. traders' use of public information). The coefficients are consistent with those in the first model and are much larger than the coefficient of the self-causality of news.

In summary, we verify that appropriate use of information can improve market efficiency. More precisely, sufficient use of market data based on technical analysis improves market efficiency. We think the market is somewhat conservative, given the relatively low level of market-to-market causality. News was overused during the financial crisis, as it served as an alternative to market data. We think that was a special case caused by unreliable price movements. In general, news has not been well exploited, and there is no sign of overreaction after the financial crisis.

References

1. Barnett, L., Barrett, A.B., Seth, A.K.: Granger causality and transfer entropy are equivalent for Gaussian variables. Phys. Rev. Lett. 103, 238701 (2009)
2. Basu, S.: The relationship between earnings’ yield, market value and return for NYSE common stocks: further evidence. J. Financ. Econ. 12(1), 129–156 (1983)
3. Boulland, R., Degeorge, F., Ginglinger, E.: News dissemination and investor attention. Rev. Financ. 21(2), 761–791 (2017)
4. Brogaard, J., Hendershott, T., Riordan, R.: High-frequency trading and price discovery. Rev. Financ. Stud. 27(8), 2267–2306 (2014)
5. Dimpfl, T., Peter, F.J.: Using transfer entropy to measure information flows between financial markets. Stud. Nonlinear Dyn. Econom. 17(1), 85–102 (2013)
6. Fama, E.F.: Efficient capital markets: a review of theory and empirical work. J. Financ. 25(2), 383–417 (1970)
7. Goss, B.A.: The semi-strong form efficiency of the London metal exchange. Appl. Econ. 15(5), 681–698 (1983)
8. Groenewold, N., Kang, K.C.: The semi-strong efficiency of the Australian share market. Econ. Rec. 69(4), 405–410 (1993)
9. Kristoufek, L., Vosvrda, M.: Measuring capital market efficiency: global and local correlations structure. Phys. A: Stat. Mech. Appl. 392(1), 184–193 (2013)
10. Marschinski, R., Kantz, H.: Analysing the information flow between financial time series. Eur. Phys. J. B 30(2), 275–281 (2002)
11. Schreiber, T.: Measuring information transfer. Phys. Rev. Lett. 85(2), 461 (2000)
12. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
13. Shleifer, A.: Inefficient Markets: An Introduction to Behavioral Finance. Oxford University Press (2000)
14. Sutton, S.G., Arnold, V., Bedard, J.C., Phillips, J.R.: Enhancing and structuring the MD&A to aid investors when using interactive data. J. Inf. Syst. 26(2), 167–188 (2012)
15. Yang, S.Y., Song, Q., Mo, S.Y.K., Datta, K., Deane, A.: The impact of abnormal news sentiment on financial markets. J. Bus. Econ. 6(10), 1682–1694 (2015)

Chapter 5

Survey of Lattice-Based Group Signature

Lei Zhang, Zhiyong Zheng, and Wei Wang

Abstract Group signatures have two basic properties: anonymity and traceability. Owing to these properties, they have many applications in economics, politics, electronic voting, privacy protection, anonymous authentication and so on. However, traditional group signatures cannot resist quantum attacks. Lattice theory is regarded as the most promising foundation for post-quantum cryptography, because lattices have a linear structure and most lattice operations are linear. Moreover, lattice-based schemes have better asymptotic efficiency than the alternatives. Lattice-based group signatures not only retain the original security properties but also resist quantum attacks, and they have therefore become a research hot spot. We thus think it is necessary to survey the achievements in lattice-based group signatures in recent years. In this chapter, we first briefly review the research progress on traditional group signatures, and then summarize the main progress on lattice-based group signature schemes in recent years. We then analyse the tools used in designing these schemes and compare the schemes in terms of functionality and security assumptions. Finally, we put forward further research directions and development trends.

L. Zhang (B) · Z. Zheng · W. Wang
School of Mathematics, Renmin University of China, Beijing, People’s Republic of China

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_5

5.1 Introduction

5.1.1 Background

Since the digital signature was put forward by Diffie and Hellman [1] in 1976, it has become an important tool for constructing cryptographic algorithms and security protocols. Chaum and van Heyst [2] proposed group signatures for the first time. A group signature allows group members to sign anonymously on behalf of the group. In case of dispute, the group manager can trace the signer. At present, group signatures are widely used in finance, economics, politics, electronic voting [3], privacy protection [4–6], internet of vehicles protection [7, 8] and other specific scenarios. However, with the progress of quantum algorithms, these schemes are becoming increasingly insecure, because they cannot resist quantum attacks: traditional signature schemes are based on integer factorization and discrete logarithms. In contrast, schemes based on post-quantum cryptography [9], such as lattice-based, hash-based, code-based and multivariate cryptography, can resist quantum attacks. For years, researchers have been studying the design of group signature schemes based on post-quantum cryptography. Among the candidates, lattice theory has some advantages over the others: for example, its operations are linear and its algebraic structure is simple. Group signatures based on hard lattice problems can therefore be both efficient and secure, which has attracted much attention and produced a number of research results. We therefore think it is necessary to summarize the achievements of recent years.

5.1.2 Outline of This Chapter

In the following sections, we first review traditional group signature schemes and their construction methods in Sect. 5.2. We then recall some basic definitions about lattices, and summarize and compare the achievements in lattice-based group signatures in terms of efficiency, practicability and security in Sect. 5.3. Finally, further research directions for lattice-based group signatures are proposed in Sect. 5.4.

5.2 Traditional Group Signature Algorithm and Its Research Progress

5.2.1 Traditional Group Signature in the Random Oracle Model (ROM)

Chaum and van Heyst [2] proposed group signatures for the first time. Subsequently, Park et al. [10] proposed the first identity-based group signature scheme, in which a signature is verified using the identities of the group members. In 2003, Bellare et al. [11] gave the first formal definition of static group signatures and introduced their security requirements. They formally defined full anonymity and full traceability, and showed that these two properties imply the multiple security requirements considered previously. Their security model is static, so it does not allow new members to be added dynamically.

We recall the definition of group signature presented by Bellare et al. [11]. A group signature scheme GS is a tuple of four probabilistic polynomial-time (PPT) algorithms (KeyGen, Sign, Verify, Open):

KeyGen: on input $1^n$ and $1^N$, where $n$ is the security parameter and $N$ is the number of group members, output the group public key gpk, the group manager's secret key gmsk, and the users' secret keys $gsk = \{gsk[i]\}_{i \in \{0,1,\ldots,N-1\}}$, where $gsk[i]$ is the $i$-th user's secret key.

Sign: on input the $i$-th user's secret key $gsk[i]$ and a message $M \in \{0,1\}^*$, output a signature $\sigma$ on $M$.

Verify: on input the group public key gpk, a message $M \in \{0,1\}^*$ and a signature $\sigma$, output 1 if $\sigma$ is a valid signature on $M$, and 0 otherwise.

Open: on input the group manager's secret key gmsk, a message $M \in \{0,1\}^*$ and a valid signature $\sigma$ on $M$, output an index $i \in \{0,\ldots,N-1\}$ if opening succeeds, and $\perp$ otherwise.

The scheme must be correct: for all $n$, $N$, all (gpk, gmsk, gsk) output by KeyGen, all $i \in \{0,1,\ldots,N-1\}$ and all $M \in \{0,1\}^*$, a signature $\sigma$ produced by Sign($gsk[i]$, $M$) satisfies, with overwhelming probability, Verify(gpk, $M$, $\sigma$) = 1 and Open(gmsk, $M$, $\sigma$) = $i$.
To meet practical needs, more and more practical group signatures have since been proposed. The first group signature supporting revocation was proposed by Boneh et al. [12] at an ACM conference. The verifier can update the group information at any time. The core of the scheme is a zero-knowledge proof protocol based on the strong Diffie-Hellman (SDH) assumption, from which the group signature is obtained. Member revocation is realized by the group administrator updating the public key and the group members updating their private keys locally. Although signing and verification in this scheme are very efficient, the scheme has the defect that it does not allow group members to join dynamically. Shortly afterwards, Bellare et al. [13] formally proposed the security model of dynamic group signature schemes together with its formal definition. The scheme not only

includes a group manager who decides whether a user may join, but also includes a tracing manager (TM) who can open signatures. The scheme consists of the following algorithms: group setup, key generation, member join, member update, sign, verify, trace and judge. It must satisfy correctness, anonymity, non-frameability, traceability and tracing soundness. In the same year, Kiayias et al. [14] put forward the first efficient group signature that permits prospective users to join concurrently. It uses a simple joining protocol based on a single message and signature response exchanged between the prospective user and the GM. A dynamic, fully anonymous short group signature was proposed by Delerablée et al. [15] in 2006. Shortly after that, Khader [16] proposed the first attribute-based group signature, and Nakanishi et al. [17] proposed a group signature scheme with constant complexity for both signing and verification. This scheme supports group member revocation, but the size of the public key grows linearly with the total number of group members, which must be fixed in advance. To better balance accountability and privacy protection, Sakai et al. [18] proposed a group signature with message-dependent opening. After that, Bootle et al. [19] proposed a fully dynamic group signature scheme.

5.2.2 Group Signature Scheme Under Standard Model

Most schemes are proven secure under hardness assumptions in the random oracle model; only a few have been proposed in the standard model. In 2005, Ateniese [20] proposed the first group signature in the standard model. Then, Boyen et al. [21] proposed a compact group signature in the standard model, and Groth [22] proposed a fully anonymous group signature in the standard model.

5.3 Lattice-Based Group Signature

5.3.1 Preliminaries

In 1996, Ajtai [23] showed that, for certain lattice problems, average-case hardness is equivalent to worst-case hardness. This result opened up the possibility of constructing secure cryptography based on lattices and was a milestone development. We first briefly recall the definition of a lattice and some hard lattice problems.

Definition 5.1 (Lattice) Let $a_1, a_2, \ldots, a_m \in \mathbb{R}^n$ be linearly independent vectors and let $A = [a_1, a_2, \ldots, a_m]$. The set $\Lambda = \mathcal{L}(A) = \{\sum_{i=1}^{m} x_i a_i \mid x_i \in \mathbb{Z}\}$ is called a lattice, and $m$ is the order (rank) of $\Lambda$. Any set of linearly independent vectors that generates a lattice is called a basis of the lattice. A lattice $\Lambda$ is a discrete additive subgroup of $\mathbb{R}^n$.

Definition 5.2 (Shortest Vector problem, SVP) Find a vector $v \in \Lambda$, $v \neq 0$, such that $\|v\| \le \|u\|$ for every $u \in \Lambda$, $u \neq 0$. A lattice has more than one shortest vector: if $v$ is a shortest vector, so is $-v$.

Definition 5.3 (Shortest Independent Vectors problem, SIVP) Let $\Lambda$ be a lattice of order $n$. Find linearly independent vectors $v_i \in \Lambda$, $i = 1, 2, \ldots, n$, such that $\|v_i\| \le \lambda_n(\Lambda)$, where $\lambda_i(\Lambda) = \inf\{r \mid \dim(\mathrm{span}(\Lambda \cap B_n(r))) \ge i\}$ for $i = 1, 2, \ldots, n$.

Definition 5.4 (Closest Vector problem, CVP) Given a target vector $t \in \mathbb{R}^n$ with $t \notin \Lambda$, find a vector $v \in \Lambda$ such that $\|v - t\| \le \|u - t\|$ for every $u \in \Lambda$.

Definition 5.5 (Short Integer Solution problem, SIS) Given a uniformly random matrix $A \in \mathbb{Z}_q^{n \times m}$ and a real number $\beta$, find a vector $x \in \mathbb{Z}^m$, $x \neq 0$, such that $\|x\|_\infty \le \beta$ and $Ax = 0 \bmod q$.
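To make Definition 5.5 concrete, the following is a minimal brute-force sketch on toy parameters. The parameter values and the function name `solve_sis_bruteforce` are assumptions chosen for illustration; exhaustive search is only feasible at this scale and says nothing about cryptographic parameter sizes. With $\beta = 1$, a solution is guaranteed here by a pigeonhole argument: there are $2^m = 64$ binary vectors but only $q^n = 49$ possible images, so two distinct binary vectors collide and their difference is a nonzero solution with entries in $\{-1, 0, 1\}$.

```python
import itertools
import numpy as np

def solve_sis_bruteforce(A, q, beta):
    """Return a nonzero integer x with max|x_i| <= beta and A x = 0 (mod q), or None."""
    n, m = A.shape
    values = range(-beta, beta + 1)
    for cand in itertools.product(values, repeat=m):
        x = np.array(cand)
        # x must be nonzero, and every coordinate of A x must vanish mod q
        if x.any() and not ((A @ x) % q).any():
            return x
    return None

rng = np.random.default_rng(1)
n, m, q, beta = 2, 6, 7, 1          # toy sizes, assumed for illustration only
A = rng.integers(0, q, size=(n, m))
x = solve_sis_bruteforce(A, q, beta)
print(A, x, (A @ x) % q)
```

The search space has $(2\beta+1)^m$ candidates, which is why the problem is only interesting once $m$ is large; the hardness results cited in this section concern that regime.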

5.3.2 Definition of Group Signature Based on Lattice

After Regev et al. [24] proposed the learning with errors (LWE) problem and Gentry et al. [25] proposed trapdoors for hard lattice problems, researchers began to focus on lattice-based group signatures. Gordon et al. [26] proposed the first lattice-based group signature in the random oracle model, also called GKV. The scheme is based on the LWE problem, and the size of a group signature is linear in the maximum number of group members. The tracing key is generated by the trapdoor sampling function, the signing keys of group members are generated by orthogonal sampling, and the message is signed with the signature scheme of Gentry et al. [25]. The encryption scheme of Regev et al. [24] is then used to encrypt the signature and obtain the ciphertext C. A non-interactive witness-indistinguishable (NIWI) proof in the ROM is constructed for a gap version of CVP. Following this signature, encryption and NIWI proof construction, the scheme is as follows:

G.KeyGen($1^n$, $1^N$): For all users $i = 1, 2, \ldots, N$, use TrapSamp($1^n$, $1^m$, $q$) to compute $(B_1, S_1), \ldots, (B_N, S_N)$, and use OrthoSamp($1^n$, $1^m$, $q$, $B_i$) to compute $(A_i, T_i)$. Output $PK = ((A_i, B_i))_{i=1}^{N}$ as the group public key, $TK = (S_i)_{i=1}^{N}$ as the tracing key, and $gsk = (T_i)_{i=1}^{N}$ as the users' secret keys.

G.Sign($gsk[j]$, $M$): Choose a random $r \leftarrow \{0,1\}^n$, let $\bar{M} = M \| r$, and compute $h_i = H(\bar{M} \| i)$ for each $1 \le i \le N$, where $H : \{0,1\}^* \to \mathbb{Z}_q^n$ is a hash function modelled as a random oracle. Compute $e_j \leftarrow$ GPVInvert($A_j$, $T_j$, $s$, $h_j$). For $i \neq j$, choose $e_i \in \mathbb{Z}_q^m$ uniformly subject to $A_i e_i = h_i \pmod q$. For all $i$, sample $s_i \leftarrow \mathbb{Z}_q^n$ and compute $z_i = B_i^T s_i + e_i \pmod q \in \mathbb{Z}_q^m$. Finally, construct a NIWI proof $\pi$ for the gap language $L_{s,\gamma}$ and output the signature $(r, z_1, z_2, \ldots, z_N, \pi)$.

G.Verify($PK$, $M$, $\sigma$): For a signature $\sigma = (r, z_1, z_2, \ldots, z_N, \pi)$, let $\bar{M} = M \| r$. Output 1 iff $\pi$ is correct and $A_i z_i = H(\bar{M} \| i) \pmod q$ for all $i$.

G.Open($TK$, $M$, $\sigma$): For a signature $\sigma = (r, z_1, z_2, \ldots, z_N, \pi)$, use $\{S_i\}$ to output the smallest index $i$ that satisfies $\mathrm{dist}(\mathcal{L}(B_i^T), z_i) \le s\sqrt{m}$.

In the random oracle model, if $\mathrm{LWE}_{m,q,\alpha}$ is hard, where $\alpha = s/(q\sqrt{2})$, then the proof system is witness indistinguishable and the scheme is anonymous. If $\mathrm{GapSVP}_\gamma$ is hard for $\gamma = O(n \log^4 n)$, then the scheme is traceable. However, the scheme has the following disadvantages. The signature size is $\tilde{O}(N)$, where $N$ is the number of group members, so the scheme is inefficient when $N$ is large. In the key generation stage, the group manager holds the signing keys of all members, so a dishonest group manager may forge signatures of group members, which leads to framing attacks. The scheme also has defects when group members are added or deleted. For example, when members are added, the group key must be recomputed and redistributed, which is inefficient. Moreover, the scheme gives no specific method for deleting group members, so it is not suitable for dynamic groups in practice.
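The role of the components $z_i = B_i^T s_i + e_i$ in opening can be illustrated numerically. The sketch below is a simplification under loudly stated assumptions: it uses a single random matrix as a stand-in for some $B_i$, toy parameter values, and it measures the residual using the very $s_i$ that produced $z_i$, whereas the real Open algorithm does not know $s_i$ and decodes with the trapdoor $S_i$. All names (`centered`, `make_component`, `residual_norm`) are hypothetical. It only shows that a short $e_j$ (the actual signer) stays within $s\sqrt{m}$, while a uniformly random $e_i$ (a decoy) does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, q, s = 8, 32, 4093, 3.0     # toy sizes, far too small for any security

def centered(v, q):
    """Map residues mod q to centered representatives in [-q/2, q/2]."""
    v = np.mod(v, q)
    return np.where(v > q // 2, v - q, v)

B = rng.integers(0, q, size=(n, m))   # stand-in for one matrix B_i (no trapdoor here)

def make_component(e):
    """Return (s_i, z_i) with z_i = B^T s_i + e (mod q)."""
    s_i = rng.integers(0, q, size=n)
    return s_i, (B.T @ s_i + e) % q

def residual_norm(s_i, z_i):
    """Norm of z_i - B^T s_i in centered form; assumes s_i is known (a shortcut)."""
    return float(np.linalg.norm(centered(z_i - B.T @ s_i, q)))

# Short error for the real signer: rounded Gaussian with parameter s,
# i.e. standard deviation s / sqrt(2*pi) (an assumed normalisation).
e_short = np.rint(rng.normal(0.0, s / np.sqrt(2 * np.pi), size=m)).astype(np.int64)
# Decoy error: uniform over Z_q^m (the constraint A_i e_i = h_i is ignored here).
e_decoy = rng.integers(0, q, size=m)

bound = s * np.sqrt(m)
signer_norm = residual_norm(*make_component(e_short))
decoy_norm = residual_norm(*make_component(e_decoy))
print(bound, signer_norm, decoy_norm)
```

For the decoy, the printed value is only the residual relative to the $s_i$ used, not the true distance to the lattice; computing that distance is exactly what requires the tracing key.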

5.3.3 Improved Group Signature Based on Lattice

After the pioneering work of Gordon et al. [26], many improved lattice-based group signature schemes have been proposed. According to their functionality, we divide these improved schemes into schemes for static groups and schemes for non-static groups. Schemes for non-static groups are further distinguished by whether they support group member revocation, message-dependent opening, dynamic groups, etc.

5.3.3.1 Static Group

In 2012, Camenisch et al. [27] extended the GKV scheme and proposed a general method for constructing anonymous attribute tokens (AAT), a generalization of group signatures. Users are issued credentials containing attributes, which they can use to sign messages anonymously and to generate tokens that reveal only a subset of their attributes. The work puts forward two new schemes based on lattice theory. According to whether the group administrator has opening permission, they are divided into two types: those without opening (AAT−O) and those with opening (AAT+O). A special case of the signature scheme without open permission to the administrator is the first group

Fig. 5.1 Security analysis of the scheme

signature based on lattices, proposed by Gordon et al. [26]. The anonymity, traceability and unforgeability of the scheme depend on different hard lattice problems, as shown in Fig. 5.1. Camenisch et al. also extended the scheme to protect users against attacks by group managers. The scheme further includes a new lattice-based signature aggregation tool and a verifiable CCA2-secure encryption scheme. However, its signature size is still linear in N, so the efficiency remains low when the number of group members is large. In 2013, Laguillaumie et al. [28] presented the first lattice-based group signature scheme in which the length of a group signature is no longer linear in the number of group members. This scheme builds on the group signature scheme of Boyen et al. [21]. Unlike Boyen et al. [21], however, each group member is given a complete short basis of a public lattice associated with his identity instead of a single vector, which allows the signer to generate a temporary certificate. The scheme improves Lyubashevsky's [29] zero-knowledge proof protocol for the ISIS problem in the ROM. The length of its public key and signature is $\tilde{O}(n^2 \cdot \log N)$ bits, and its anonymity and traceability rely on $\mathrm{SIVP}_{\log N \cdot \tilde{O}(n^{8})}$ and $\mathrm{SIVP}_{\log N \cdot \tilde{O}(n^{5})}$, respectively. The scheme satisfies only a relaxed anonymity notion (CPA-anonymity), but it can be extended to a fully anonymous group signature (CCA-anonymity). The difficulty in achieving CCA-anonymity is to find a way to open adversarially chosen signatures; the technique used differs from that of Camenisch et al. [27]. It uses a more economical scheme to replace the identity-based construction of Agrawal et al. [30], relying on the verification key VK of a one-time signature. In 2015, Ling San et al. [31] came up with a simpler and shorter lattice-based group signature, together with a ring-based variant.
In this scheme, a new lattice-based cryptographic tool was established: an efficient statistical zero-knowledge proof of knowledge of a valid message-signature pair for Boyen's [32] signature scheme. This tool is the key to the group signature scheme and can be used to design various privacy-preserving systems, such as anonymous credentials [33]. An identity-based encryption scheme is used to achieve higher efficiency in the ROM. Compared with previous schemes, the construction is simpler and the hardness assumption is weaker. The group public key and signature sizes are $\log N \cdot \tilde{O}(n^2)$ and $\log N \cdot \tilde{O}(n)$, respectively. If OTS is a strongly unforgeable one-time signature, the scheme is CCA-anonymous in the ROM if $\mathrm{LWE}_{n,q,\chi}$ is hard, and it is fully traceable if $\mathrm{SIVP}_{\ell \cdot \tilde{O}(n^2)}$ is hard (here $\ell = \log N$). Moreover, this

scheme can be extended to the ring setting, resulting in a group signature based on ideal lattices whose group public key and signature both have bit size $\tilde{O}(n \cdot \log N)$. It is CPA-anonymous in the ROM if the worst-case $\mathrm{SIVP}_{\ell \cdot \tilde{O}(n^{3.5})}$ problem is hard on ideal lattices in the ring R, and it is traceable if the worst-case $\mathrm{SIVP}_{\ell \cdot \tilde{O}(n^{2})}$ problem is hard on ideal lattices in the ring R (here $\ell = \log N$).

In 2015, Nguyen et al. [34] proposed a simpler and more efficient lattice-based group signature scheme at PKC, which was more efficient than the previous lattice-based schemes. A new family of hash functions is obtained from the split-SIS problem. Using the Fiat-Shamir transform, the protocol yields a NIZKPoK, which saves an $\tilde{O}(\log N)$ factor in both the group public key and the signature size. The encoding technique is built on the identity-based encryption of Agrawal et al. [30]. Like the scheme of Laguillaumie et al. [28], this scheme supports CPA-anonymity; CCA security can be obtained from IBE by applying the CHK transformation [35], so when the CPA-secure encryption is replaced with CCA-secure encryption, the scheme can be extended to support CCA-anonymity. In the random oracle model, the scheme achieves CPA-anonymity under the LWE assumption and full traceability under the SIS assumption. However, the scheme cannot revoke members.

In 2016, Libert et al. [36] put forth a more efficient lattice-based group signature without trapdoors. The construction boils down to proving, in zero knowledge, that a value has been accumulated in a Merkle tree accumulator based on SIS, which eliminates the need for GPV trapdoors. The Merkle tree accumulator is based on a family of collision-resistant hash functions. The zero-knowledge proof system uses the statistically hiding and computationally binding string commitment scheme proposed by Kawachi et al. [37]. In the signature construction, the scheme uses the multi-bit version of the encryption scheme of Regev et al. [24], which is presented in Kawachi et al.'s paper [38]. The full anonymity of the scheme is based on the hardness of the $\mathrm{LWE}_{n,q,\chi}$ problem and on the simulation soundness of the argument system. The full traceability is based on the hardness of the $\mathrm{SIVP}_{\tilde{O}(n)}$ problem. Moreover, the group members' keys in this scheme are generated by running the accumulator algorithm rather than by the earlier generation method. Key generation proceeds as follows:

Sample a uniformly random matrix $A \leftarrow \mathbb{Z}_q^{n \times m}$. For each user $j$, sample a random binary vector $x_j \leftarrow \{0,1\}^m$ and derive the corresponding leaf value $d_j$. Build a Merkle tree over $R = (d_0, \ldots, d_{N-1})$ and obtain the root $u$. Use TWitness($R$, $d_j$) to output a witness

$w^{(j)} = \left((j_1, j_2, \ldots, j_\ell) \in \{0,1\}^{\ell},\ (w_\ell^{(j)}, w_{\ell-1}^{(j)}, \ldots, w_1^{(j)}) \in (\{0,1\}^{nk})^{\ell}\right)$

to the fact that $d_j$ is accumulated in $u$, where $(j_1, j_2, \ldots, j_\ell)$ is the binary representation of $j$. Define $gsk[j] = (x_j, d_j, w^{(j)})$.

Then sample $B \xleftarrow{\$} \mathbb{Z}_p^{n \times m_E}$. For $i \in \{1, 2\}$, sample $S_i \leftarrow \chi^{n \times \ell}$ and $E_i \leftarrow \chi^{\ell \times m_E}$, and compute $P_i = S_i^T \cdot B + E_i \in \mathbb{Z}_p^{\ell \times m_E}$.

We obtain $gpk := \{A, u, B, P_1, P_2\}$; $gmsk := S_1$; $gsk := (gsk[0], \ldots, gsk[N-1])$.
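The accumulator idea behind this key generation (a root $u$ committing to all leaves $d_j$, and a logarithmic-size witness $w^{(j)}$ per member) can be sketched with an ordinary Merkle tree. As a loudly stated assumption, the sketch swaps the SIS-based hash of the scheme for SHA-256 and uses arbitrary byte strings as leaves; the function names `build_tree`, `witness` and `verify` are hypothetical, and no zero-knowledge layer is included.

```python
import hashlib

def h(left: bytes, right: bytes) -> bytes:
    """Two-to-one hash; SHA-256 is an assumed stand-in for the SIS-based hash."""
    return hashlib.sha256(left + right).digest()

def build_tree(leaves):
    """Return all levels: levels[0] are the leaves, levels[-1] is [root].
    The number of leaves is assumed to be a power of two."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def witness(levels, j):
    """Sibling hashes along the path from leaf j up to the root (bottom-up)."""
    path = []
    for level in levels[:-1]:
        path.append(level[j ^ 1])   # sibling index differs in the lowest bit
        j >>= 1
    return path

def verify(root, leaf, j, path):
    """Recompute the root from a leaf, its index bits and its witness."""
    node = leaf
    for sibling in path:
        node = h(sibling, node) if j & 1 else h(node, sibling)
        j >>= 1
    return node == root

N = 8  # toy group size, so the depth is log2(N) = 3
leaves = [hashlib.sha256(b"hypothetical d_%d" % j).digest() for j in range(N)]
levels = build_tree(leaves)
root = levels[-1][0]
w5 = witness(levels, 5)
print(len(w5), verify(root, leaves[5], 5, w5))
```

The witness contains one hash per tree level, which is where the logarithmic dependence on $N$ comes from; the bits of the index $j$ decide on which side each sibling is hashed.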

5.3.3.2 Non-static Group

In real life, group membership changes constantly. The original schemes cannot guarantee that group members are added or revoked in time. To handle dynamic environments and cover more scenarios, many scholars have put forward strategies.

In 2014, Langlois et al. [39] proposed the first lattice-based group signature with verifier-local revocation (VLR) at the PKC conference. Previously, lattice-based group signatures did not support revocation. The advantages of the scheme are support for group member revocation, logarithmic signature size and a weaker security assumption. For security parameter $n$ and maximum number of group members $N$, the group public key and signature sizes are $\tilde{O}(n^2 \cdot \log N)$ and $\tilde{O}(n \cdot \log N)$, respectively. Compared with the previous algorithms, the efficiency is greatly improved, and the construction does not rely on LWE-based encryption. The main component is an interactive protocol that allows a prover to convince the verifier that he is a certified group member and has not been revoked. The protocol is repeated many times to make the soundness error very small, and it is transformed into a signature scheme by the Fiat-Shamir heuristic. The scheme uses the Bonsai tree [40] structure in hard random lattices, and the string commitment scheme proposed by Kawachi et al. [37] is used in the protocol to support the revocation mechanism. In the random oracle model, the scheme is selflessly anonymous provided that COM is a statistically hiding string commitment scheme, and its traceability is based on the hardness of the $\mathrm{SIS}^{\infty}_{n,(\ell+1)m,q,2\beta}$ problem (with $\ell = \log N$), which can in turn be reduced to the worst-case hardness of the $\mathrm{SIVP}_{\tilde{O}(n^{1.5})}$ problem.

In 2016, Libert et al. [41] put forward the first group signature scheme based on standard lattice assumptions that supports message-dependent opening (MDO). This scheme builds on the scheme of Ling et al. [31], adding a second layer of encryption with IBE, tied together with appropriate zero-knowledge proofs. Because the opening ability is split into two parts, anonymity is divided accordingly: anonymity against the admitter and anonymity against the opener. Full anonymity against the admitter is based on the hardness of the LWE assumption in the ROM and the strong unforgeability of the one-time signature. Anonymity against the opener is based on the hardness of the LWE assumption in the ROM. The full traceability of the scheme is based on the hardness of the SIS assumption in the ROM. In this scheme, a TrapGen(gpk, $msk_{ADM}$, $M$) step is added to generate the token $t_M$ of the message $M$. A signature on $M$ can be opened only by using both the opening key and $t_M$. The scheme has also been shown to be usable in dynamically growing groups.

To address dynamic updates of users, in the same year Libert et al. [42] put forward the first lattice-based group signature for dynamic groups. By designing a signature scheme with efficient protocols for multi-block messages and developing a zero-knowledge argument system for the dynamic joining protocol, the scheme remains secure when many users join concurrently. The protocol lets the group manager sign the user's public key to generate a membership certificate when new members join,

thus the protocol is round-optimal. The scheme's signature size is $O(\log N)$ for groups of up to $N$ members. Its security against framing and misidentification attacks reduces to the hardness of the SIS assumption. Assuming that the one-time signature is strongly unforgeable, the scheme achieves CCA-anonymity under the LWE assumption in the random oracle model.

In 2017, Ling et al. [43] presented, at the ACNS conference, the first lattice-based group signature scheme in the ROM that easily achieves full dynamicity, that is, users can flexibly join and leave the group. This scheme builds on the work of Libert et al. [36]. Achieving full dynamicity adds little cost, and the scheme does not depend on lattice trapdoors. It uses a simple and efficient update algorithm to improve the Merkle hash tree proposed by Libert et al. [36], so that the accumulated value can be changed continually without reconstructing the whole tree. To prevent framing attacks, users compute their own key pair $(x_i, p_i)$. Enrollment and revocation are handled as follows: each potential user is assigned a leaf of the tree, and the update algorithm maintains the whole system. If the user has not joined the group or has been revoked, the value of his leaf is 0; if the user has joined the group, the value is his public key $p_i$. The signature size is $\tilde{O}(\lambda \log N)$, the group public key size is $\tilde{O}(\lambda^2 + \lambda \log N)$, and the user's private key has bit length $\tilde{O}(\lambda) + \log N$. In the random oracle model, the scheme is anonymous if the $\mathrm{LWE}_{n,q,\chi}$ problem is hard, and it is traceable if the $\mathrm{SIS}^{\infty}_{n,m,q,1}$ problem is hard.

In 2018, Ling San et al. [44] proposed, at PKC, the first constant-size lattice-based group signature: the signature size is independent of the number of group members $N$ and depends only on the security parameter $n$, and $N$ does not need to be fixed in advance. The signature size, public key length and user's private key length are all $\tilde{O}(n)$. The scheme supports dynamic enrollment of group members (i.e. adding new group members). It is fully anonymous if $\mathrm{RLWE}_{n,\ell,q,\chi}$ is hard, and its traceability is based on the hardness of $\mathrm{RSIS}_{n,\bar{m},q,\tilde{O}(n^2)}$ in the random oracle model. The core of the scheme is an improved ZKAoK of a valid message-signature pair for the Ducas-Micciancio [45] signature scheme, extended so that the prover can additionally convince the verifier of two facts: he knows the private key corresponding to the public key, and he has encrypted correctly and output the ciphertext under the public key. The protocol can be transformed into a non-interactive zero-knowledge argument of knowledge (NIZKAoK) by the Fiat-Shamir heuristic.

In 2019, Xie et al. [46] proposed an improved lattice-based dynamic group signature scheme, which allows any user to join the group dynamically while realizing effective revocation. In addition, it ensures that the signature of any user in the system cannot be forged by other users, which effectively prevents the framing of innocent members. The security of the scheme is based on hard lattice problems in the random oracle model. According to the authors' analysis, the scheme is effective in practical applications.
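The update mechanism described above for the fully dynamic scheme of Ling et al. [43] (leaf value 0 for users who have not joined or were revoked, the public key $p_i$ for active members, and an accumulated value that changes without rebuilding the tree) can be sketched as follows. This is a simplified illustration under stated assumptions: SHA-256 replaces the lattice-based hash, the all-zero byte string plays the role of the value 0, and the class name `UpdatableTree` and its methods are hypothetical.

```python
import hashlib

ZERO = bytes(32)   # assumed encoding of the leaf value 0 (not joined / revoked)

def h(left: bytes, right: bytes) -> bytes:
    """Two-to-one hash; SHA-256 is an assumed stand-in for the lattice-based hash."""
    return hashlib.sha256(left + right).digest()

def full_root(leaves):
    """Recompute the root from scratch (used only to cross-check updates)."""
    level = list(leaves)
    while len(level) > 1:
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

class UpdatableTree:
    def __init__(self, depth):
        self.depth = depth
        self.levels = [[ZERO] * (1 << depth)]          # every leaf starts at 0
        for _ in range(depth):
            prev = self.levels[-1]
            self.levels.append([h(prev[i], prev[i + 1])
                                for i in range(0, len(prev), 2)])

    @property
    def root(self):
        return self.levels[-1][0]

    def update(self, j, value):
        """Set leaf j and recompute only the `depth` hashes on its path to the root."""
        self.levels[0][j] = value
        for k in range(1, self.depth + 1):
            j >>= 1
            self.levels[k][j] = h(self.levels[k - 1][2 * j],
                                  self.levels[k - 1][2 * j + 1])

tree = UpdatableTree(depth=3)                  # room for 8 potential users
root_empty = tree.root
pk5 = hashlib.sha256(b"hypothetical public key p_5").digest()
tree.update(5, pk5)                            # user 5 joins
root_joined = tree.root
tree.update(5, ZERO)                           # user 5 is revoked
root_revoked = tree.root
print(root_joined != root_empty, root_revoked == root_empty)
```

Each join or revocation touches one leaf and one hash per level, so the cost of an update is logarithmic in the number of potential users rather than linear.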


5.4 Conclusions

In terms of group structure, group signatures have evolved from static to dynamic schemes. They have gradually become more practical and have been adapted to different scenarios. Chronologically, research first relied on traditional number theory and then turned to group signatures that resist quantum attacks, built on lattices, hash functions, coding theory, multivariate cryptography, and so on. This chapter focused on the development of lattice-based group signatures. Lattice-based group signatures have great theoretical potential and practical value, but progress in constructing efficient schemes has been slow. Up to now, only a few improved schemes have been proposed, and scholars should pay more attention to improved schemes in this field. Based on the analysis and investigation in this chapter, we put forward several directions for further research. Firstly, most current lattice-based group signature schemes are constructed in the random oracle model, which means they need a strong security assumption. Hence, constructing simpler and more efficient lattice-based group signatures under weaker security assumptions is an urgent problem. Secondly, existing lattice-based group signatures are generally inefficient in practice, which limits their application. How to improve efficiency while taking functionality and security into account is a problem worthy of further study. Thirdly, since many improvements of lattice-based group signatures have been proposed, we should strive to explore their application scenarios and to improve their usability. Last but not least, studying the relationships among different kinds of signatures is beneficial, since advances in one can boost the others. The techniques used to construct lattice-based group signatures can be tried in other signatures, such as lattice-based blind signatures, ring signatures and so on. How to define and build them within one large framework is also worth studying.

References

1. Diffie, W., Hellman, M.: New directions in cryptography. IEEE Trans. Inf. Theory 22(6), 644–654 (1976)
2. Chaum, D., Van Heyst, E.: Group signatures. In: Workshop on the Theory and Application of Cryptographic Techniques, pp. 257–265. Springer, Berlin, Heidelberg (1991)
3. Malina, L., Smrz, J., Hajny, J., et al.: Secure electronic voting based on group signatures, pp. 6–10 (2015)
4. Wang, B., Li, B., Li, H.: Knox: Privacy-preserving auditing for shared data with large groups in the cloud. In: Bao, F., Samarati, P., Zhou, J. (eds.) Applied Cryptography and Network Security. ACNS 2012. Lecture Notes in Computer Science, vol. 7341. Springer, Berlin, Heidelberg (2012)


5. Wang, B., Li, H., Li, M.: Privacy-preserving public auditing for shared cloud data supporting group dynamics. In: 2013 IEEE International Conference on Communications (ICC), Budapest, pp. 1946–1950 (2013)
6. Yu, Y., Mu, Y., Ni, J., Deng, J., Huang, K.: Identity privacy-preserving public auditing with dynamic group for secure mobile cloud storage. In: Au, M.H., Carminati, B., Kuo, C.C.J. (eds.) Network and System Security. NSS 2015. Lecture Notes in Computer Science, vol. 8792. Springer, Cham (2014)
7. Guo, J., Baugh, J.P., Wang, S.: A group signature based secure and privacy-preserving vehicular communication framework. In: 2007 Mobile Networking for Vehicular Environments, Anchorage, AK, pp. 103–108 (2007)
8. Malina, L., Vives-Guasch, A., Castella-Roca, J., et al.: Telecommun. Syst. 58, 293 (2015). https://doi.org/10.1007/PS1235-014-9878-3
9. Bernstein, D.J.: Introduction to post-quantum cryptography. In: Bernstein, D.J., Buchmann, J., Dahmen, E. (eds.) Post-Quantum Cryptography. Springer, Berlin, Heidelberg (2009)
10. Park, S., Kim, S., Won, D.: ID-based group signature. Electron. Lett. 33(19), 1616–1617 (1997)
11. Bellare, M., Micciancio, D., Warinschi, B.: Foundations of group signatures: formal definitions, simplified requirements, and a construction based on general assumptions. In: Biham, E. (eds.) Advances in Cryptology - EUROCRYPT 2003. EUROCRYPT 2003. Lecture Notes in Computer Science, vol. 2656. Springer, Berlin, Heidelberg (2003)
12. Boneh, D., Boyen, X., Shacham, H.: Short group signatures. In: Franklin, M. (eds.) Advances in Cryptology - CRYPTO 2004. CRYPTO 2004. Lecture Notes in Computer Science, vol. 3152. Springer, Berlin, Heidelberg (2004)
13. Bellare, M., Shi, H., Zhang, C.: Foundations of group signatures: the case of dynamic groups. In: Menezes, A. (eds.) Topics in Cryptology - CT-RSA 2005. CT-RSA 2005. Lecture Notes in Computer Science, vol. 3376. Springer, Berlin, Heidelberg (2005)
14. Kiayias, A., Yung, M.: Group signatures with efficient concurrent join. In: Cramer, R. (eds.) Advances in Cryptology - EUROCRYPT 2005. EUROCRYPT 2005. Lecture Notes in Computer Science, vol. 3494. Springer, Berlin, Heidelberg (2005)
15. Delerablee, C., Pointcheval, D.: Dynamic fully anonymous short group signatures. In: Nguyen, P.Q. (eds.) Progress in Cryptology - VIETCRYPT 2006. VIETCRYPT 2006. Lecture Notes in Computer Science, vol. 4341. Springer, Berlin, Heidelberg (2006)
16. Khader, D.: Attribute based group signatures. IACR Cryptology ePrint Archive 2007, 159 (2007)
17. Nakanishi, T., Fujii, H., Hira, Y., Funabiki, N.: Revocable group signature schemes with constant costs for signing and verifying. In: Jarecki, S., Tsudik, G. (eds.) Public Key Cryptography - PKC 2009. PKC 2009. Lecture Notes in Computer Science, vol. 5443. Springer, Berlin, Heidelberg (2009)
18. Sakai, Y., Emura, K., Hanaoka, G., Kawai, Y., Matsuda, T., Omote, K.: Group signatures with message-dependent opening. In: Abdalla, M., Lange, T. (eds.) Pairing-Based Cryptography - Pairing 2012. Pairing 2012. Lecture Notes in Computer Science, vol. 7708. Springer, Berlin, Heidelberg (2013)
19. Bootle, J., Cerulli, A., Chaidos, P., Ghadafi, E., Groth, J.: Foundations of fully dynamic group signatures. In: Manulis, M., Sadeghi, A.R., Schneider, S. (eds.) Applied Cryptography and Network Security. ACNS 2016. Lecture Notes in Computer Science, vol. 9696. Springer, Cham (2016)
20. Ateniese, G., Camenisch, J., Hohenberger, S., et al.: Practical group signatures without random oracles. IACR Cryptology ePrint Archive 2005, 385 (2005)
21. Boyen, X., Waters, B.: Compact group signatures without random oracles. In: Vaudenay, S. (eds.) Advances in Cryptology - EUROCRYPT 2006. EUROCRYPT 2006. Lecture Notes in Computer Science, vol. 4004. Springer, Berlin, Heidelberg (2006)
22. Groth, J.: Fully anonymous group signatures without random oracles. In: Kurosawa, K. (eds.) Advances in Cryptology - ASIACRYPT 2007. ASIACRYPT 2007. Lecture Notes in Computer Science, vol. 4833. Springer, Berlin, Heidelberg (2007)


23. Ajtai, M.: Generating hard instances of lattice problems. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, pp. 99–108. ACM (1996)
24. Regev, O.: On lattices, learning with errors, random linear codes, and cryptography. J. ACM (JACM) 56(6), 34 (2009)
25. Gentry, C., Peikert, C., Vaikuntanathan, V.: Trapdoors for hard lattices and new cryptographic constructions. In: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pp. 197–206. ACM (2008)
26. Gordon, S.D., Katz, J., Vaikuntanathan, V.: A group signature scheme from lattice assumptions. In: Abe, M. (eds.) Advances in Cryptology - ASIACRYPT 2010. ASIACRYPT 2010. Lecture Notes in Computer Science, vol. 6477. Springer, Berlin, Heidelberg (2010)
27. Camenisch, J., Neven, G., Ruckert, M.: Fully anonymous attribute tokens from lattices. In: Visconti, I., De Prisco, R. (eds.) Security and Cryptography for Networks. SCN 2012. Lecture Notes in Computer Science, vol. 7485. Springer, Berlin, Heidelberg (2012)
28. Laguillaumie, F., Langlois, A., Libert, B., Stehle, D.: Lattice-based group signatures with logarithmic signature size. In: Sako, K., Sarkar, P. (eds.) Advances in Cryptology - ASIACRYPT 2013. ASIACRYPT 2013. Lecture Notes in Computer Science, vol. 8270. Springer, Berlin, Heidelberg (2013)
29. Lyubashevsky, V.: Lattice-based identification schemes secure under active attacks. In: Cramer, R. (eds.) Public Key Cryptography - PKC 2008. PKC 2008. Lecture Notes in Computer Science, vol. 4939. Springer, Berlin, Heidelberg (2008)
30. Agrawal, S., Boneh, D., Boyen, X.: Efficient lattice (H)IBE in the standard model. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 553–572. Springer, Berlin, Heidelberg (2010)
31. Ling, S., Nguyen, K., Wang, H.: Group signatures from lattices: simpler, tighter, shorter, ring-based. In: Katz, J. (eds.) Public-Key Cryptography – PKC 2015. PKC 2015. Lecture Notes in Computer Science, vol. 9020. Springer, Berlin, Heidelberg (2015)
32. Boyen, X.: Lattice mixing and vanishing trapdoors: a framework for fully secure short signatures and more. In: Nguyen, P.Q., Pointcheval, D. (eds.) Public Key Cryptography - PKC 2010. PKC 2010. Lecture Notes in Computer Science, vol. 6056. Springer, Berlin, Heidelberg (2010)
33. Camenisch, J., Lysyanskaya, A.: An efficient system for non-transferable anonymous credentials with optional anonymity revocation. In: Pfitzmann, B. (eds.) Advances in Cryptology - EUROCRYPT 2001. EUROCRYPT 2001. Lecture Notes in Computer Science, vol. 2045. Springer, Berlin, Heidelberg (2001)
34. Nguyen, P.Q., Zhang, J., Zhang, Z.: Simpler efficient group signatures from lattices. In: Katz, J. (eds.) Public-Key Cryptography – PKC 2015. PKC 2015. Lecture Notes in Computer Science, vol. 9020. Springer, Berlin, Heidelberg (2015)
35. Canetti, R., Halevi, S., Katz, J.: Chosen-ciphertext security from identity-based encryption. In: Cachin, C., Camenisch, J.L. (eds.) Advances in Cryptology - EUROCRYPT 2004. EUROCRYPT 2004. Lecture Notes in Computer Science, vol. 3027. Springer, Berlin, Heidelberg (2004)
36. Libert, B., Ling, S., Nguyen, K., Wang, H.: Zero-knowledge arguments for lattice-based accumulators: logarithmic-size ring signatures and group signatures without trapdoors. In: Fischlin, M., Coron, J.S. (eds.) Advances in Cryptology - EUROCRYPT 2016. EUROCRYPT 2016. Lecture Notes in Computer Science, vol. 9666. Springer, Berlin, Heidelberg (2016)
37. Kawachi, A., Tanaka, K., Xagawa, K.: Concurrently secure identification schemes based on the worst-case hardness of lattice problems. In: Pieprzyk, J. (eds.) Advances in Cryptology - ASIACRYPT 2008. ASIACRYPT 2008. Lecture Notes in Computer Science, vol. 5350. Springer, Berlin, Heidelberg (2008)
38. Kawachi, A., Tanaka, K., Xagawa, K.: Multi-bit cryptosystems based on lattice problems. In: Okamoto, T., Wang, X. (eds.) Public Key Cryptography - PKC 2007. PKC 2007. Lecture Notes in Computer Science, vol. 4450. Springer, Berlin, Heidelberg (2007)
39. Langlois, A., Ling, S., Nguyen, K., Wang, H.: Lattice-based group signature scheme with verifier-local revocation. In: Krawczyk, H. (eds.) Public-Key Cryptography - PKC 2014. PKC 2014. Lecture Notes in Computer Science, vol. 8383. Springer, Berlin, Heidelberg (2014)


40. Cash, D., Hofheinz, D., Kiltz, E., Peikert, C.: Bonsai trees, or how to delegate a lattice basis. In: Gilbert, H. (eds.) Advances in Cryptology - EUROCRYPT 2010. EUROCRYPT 2010. Lecture Notes in Computer Science, vol. 6110. Springer, Berlin, Heidelberg (2010)
41. Libert, B., Mouhartem, F., Nguyen, K.: A lattice-based group signature scheme with message-dependent opening. In: Manulis, M., Sadeghi, A.R., Schneider, S. (eds.) Applied Cryptography and Network Security. ACNS 2016. Lecture Notes in Computer Science, vol. 9696. Springer, Cham (2016)
42. Libert, B., Ling, S., Mouhartem, F., Nguyen, K., Wang, H.: Signature schemes with efficient protocols and dynamic group signatures from lattice assumptions. In: Cheon, J., Takagi, T. (eds.) Advances in Cryptology - ASIACRYPT 2016. ASIACRYPT 2016. Lecture Notes in Computer Science, vol. 10032. Springer, Berlin, Heidelberg (2016)
43. Ling, S., Nguyen, K., Wang, H., Xu, Y.: Lattice-based group signatures: achieving full dynamicity with ease. In: Gollmann, D., Miyaji, A., Kikuchi, H. (eds.) Applied Cryptography and Network Security. ACNS 2017. Lecture Notes in Computer Science, vol. 10355. Springer, Cham (2017)
44. Ling, S., Nguyen, K., Wang, H., Xu, Y.: Constant-size group signatures from lattices. In: Abdalla, M., Dahab, R. (eds.) Public-Key Cryptography - PKC 2018. PKC 2018. Lecture Notes in Computer Science, vol. 10770. Springer, Cham (2018)
45. Ducas, L., Micciancio, D.: Improved short lattice signatures in the standard model. In: Garay, J.A., Gennaro, R. (eds.) Advances in Cryptology - CRYPTO 2014. CRYPTO 2014. Lecture Notes in Computer Science, vol. 8616. Springer, Berlin, Heidelberg (2014)
46. Xie, R., He, C., Xu, C., et al.: Ann. Telecommun. 74, 531 (2019). https://doi.org/10.1007/PGLLS8243-019-00705-x

Chapter 6

Insight on Hybrid Organizational Performance: A Systematic Review

Yuting Wu and Yonghong Long

Abstract The development of hybrid organizations, especially social enterprises (SEs), has attracted worldwide attention. Although abundant research has focused on hybrid strategy and performance improvement, the evaluation of hybrid organizational performance has not been well defined. Moreover, reviews from the perspective of performance, especially systematic reviews, remain sparse. Performance assessment is important for understanding how productive a hybrid organization's strategy and operations are. This article therefore aims to find an answer to the question of how to assess hybrid organizational performance and to identify a path for further study. The authors used a systematic review approach. After a strict literature selection procedure, 41 articles were retained for the review. By answering five research questions, the authors conclude that balancing dual goals is critical for a hybrid organization to achieve satisfactory performance. However, accountability for dual performance objectives is one of the greatest challenges for hybrid organizational governance. The authors suggest that a mathematical understanding of the quality of performance is necessary for performance assessment.

Y. Wu (corresponding author) · Y. Long
School of Mathematics, Renmin University, No. 59 Zhongguancun Street, Haidian District, Beijing, China
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_6

6.1 Introduction

The topic of hybrid organizations emerged in the 1980s and soon became popular worldwide [1–3]. Scholars have studied the topic along various dimensions, such as environment [4], organizational boundaries [5], hybrid impact [6–11], intrinsic properties [1, 3, 12–19], and hybrid performance [2, 4, 20–38]. Although abundant research has focused on hybrid strategy and performance improvement, research on how to assess hybrid organizational performance is lacking. This article explores the existing research to find an answer for hybrid organizational performance assessment and to identify a path for further study. More specifically, the authors study this issue by answering the following five questions:

1. What is a hybrid organization?
2. Why did hybrid organizations develop?
3. What is the challenge for hybrid organizational strategic management?
4. How can a hybrid organization achieve successful performance?
5. What are the criteria for defining successful performance?

To make the research procedure transparent and replicable, the authors used a systematic review approach. The research method was designed on the basis of Saunders et al. [39] (2012, pp. 112–115) and Moher et al. [40] (2009). The relevant literature from the last 10 years went through a strict selection process consisting of definition, identification, screening, eligibility and inclusion. In the end, 41 articles were retained for this review. The main body has four parts. The authors first introduce the methodology to show the research process. They then present the results from six aspects: profile of the literature, hybrid organizing, hybrid impact, paradox, hybrid performance, and discussion. After that, the authors discuss the results synthetically. Finally, the authors conclude the research. The authors find that balancing dual goals is critical for a hybrid organization to achieve satisfactory performance. However, accountability for dual performance objectives is one of the greatest challenges for hybrid organizational governance. Performance quantification is an essential topic for hybrid organizations, but it has seldom been researched. Mathematics would make the strategic management of hybrid organizations more rational and rigorous. Thus, a mathematical understanding of the quality of performance is not only beneficial for evaluating organizational performance in this complex and paradoxical context, but also meaningful for guiding improvements in the quality of organizational strategy.

6.2 Methodology

In this research, the authors used a systematic review approach for literature selection, as this approach is suitable for forming a general picture of the research evidence and for guiding future research. The literature selection procedure was designed on the basis of the work of Saunders et al. [39] (2012, pp. 112–115) and Moher et al. [40] (2009) to ensure that it is transparent and replicable. The procedure had five steps: definition, identification, screening, eligibility and inclusion. Firstly, the authors formulated the review questions and defined the parameters of the research. To keep the quality of the review high, only academic journals from the last 10 years were selected. Journals were selected within the subject areas of economics and strategic management, because the hybrid organization is a topic that concerns not only strategic management researchers but also economists who focus on social welfare issues. Besides, to explore the topic on a more


general level, there was no geographic limitation; the authors try to give a worldwide view of hybrid organizational performance. However, due to the authors' language limitations, only English and Chinese publications were considered for selection. The search keywords were generated from the research questions. All relevant words were combined using Boolean logic (Saunders et al. [39], 2012, p. 100) and searched with a meta-search engine (TUTTO), which gave the authors access to as much relevant literature as possible. More specifically, keywords were written with an asterisk (*) to cover the possible forms of a word and with a question mark (?) to pick up different spellings. The authors adopted two kinds of keywords:

1. On the theoretical level, the terms used in the review questions, such as hybrid*, organi?ation, and social enterprise.
2. On a more practical level, the high-frequency vocabulary of this research topic: privat*, market*, institutional logic, multiple discipline, and paradox.

The authors searched the keywords in a range of combinations following Boolean search strings: every two words from different levels were linked by “or”, and every two words from the same level were linked by “and”. The meta-search engine (TUTTO) gave access to a wide range of well-known academic databases, such as Airitilibrary, Academic Search Complete (EBSCO), ABI/INFORM Trade and Industry (ProQuest), Academic Search Premier (EBSCO), ABI/INFORM Global (ProQuest), ABI/INFORM Dateline (ProQuest), ABI/INFORM Archive Complete (ProQuest), Business Source Complete (EBSCO), Business Source Premier (EBSCO), Cambridge University Press, Communication Source (EBSCO), CNPeReading, Chinese Social Sciences Citation Index, Cnki.net, Database of Renmin University, EconLit with FullText (EBSCO), Elsevier ScienceDirect, Emerald-Management Xtra, EIU-ViewsWire (ProQuest), IMF E-Library, JSTOR Journals, Kluwer Online Journals, OCLC FirstSearch, Oxford University Press, ProQuest European Business, ProQuest Asian Business and Reference, PNAS, Sage Online Journals, Scopus, Springer Online Journals, Taylor and Francis Journals, SourceOECD Online Library, Superstar Journals Database, Vip Database, Wiley Online Library and Wanfang Data.

Secondly, in the identification step, 988 related articles were found, of which 437 remained after duplicates were removed. Thirdly, in the screening step, each of the 437 articles was screened by title and abstract to ensure relevance to the research questions. 352 articles were excluded in this step, leaving 85 articles for the next step. Then, in the eligibility step, the authors read the full text of the 85 articles to ensure they were fully relevant to the topic of hybrid organization management. After full-text reading, 62 articles remained for analysis. Moreover, to keep the research rational, rigorous and up to date, the authors refined the research questions after this extensive reading.
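The keyword combination rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' tooling: the query syntax accepted by TUTTO is not given in the text, so the quoted `AND`/`OR` strings are an assumption, and the function names are hypothetical. The wildcard check uses the standard-library `fnmatch` module, whose `*` (any run of characters) and `?` (exactly one character) happen to match the meaning the text assigns to these symbols.

```python
from fnmatch import fnmatchcase
from itertools import combinations

# Keyword levels exactly as listed in the text.
LEVELS = {
    "theoretical": ["hybrid*", "organi?ation", "social enterprise"],
    "practical": ["privat*", "market*", "institutional logic",
                  "multiple discipline", "paradox"],
}
LEVEL_OF = {term: level for level, terms in LEVELS.items() for term in terms}


def pair_query(a: str, b: str) -> str:
    # Rule stated in the text: same level -> "and", different levels -> "or".
    operator = "AND" if LEVEL_OF[a] == LEVEL_OF[b] else "OR"
    return f'"{a}" {operator} "{b}"'


def build_queries() -> list:
    # One query per unordered pair of keywords.
    return [pair_query(a, b) for a, b in combinations(LEVEL_OF, 2)]


def wildcard_matches(pattern: str, word: str) -> bool:
    # Case-insensitive check of a word against a wildcard keyword.
    return fnmatchcase(word.lower(), pattern.lower())


if __name__ == "__main__":
    for query in build_queries()[:5]:
        print(query)
    print(wildcard_matches("organi?ation", "organisation"))
```

With eight keywords there are 28 pairwise queries; `organi?ation` covers both "organisation" and "organization", and `hybrid*` covers forms such as "hybridity".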


Finally, the remaining articles were analyzed and synthesized to ensure that they were not only strongly relevant to the research topic but also suitable for reporting. The 62 articles were analyzed from multiple aspects, such as research question, study context, industry sector, methodology, sample size and key findings. In the end, 41 research articles are reported in this article.
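The selection flow can be checked with a short calculation. The stage counts below (988, 437, 85, 62, 41) are taken from the text; the stage labels and the helper `exclusions` are hypothetical names introduced for this sketch. Only the 352 exclusions at the screening step are stated explicitly in the text, and the other differences are derived here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    remaining: int  # number of articles left after this stage


# Counts as reported in the methodology section.
FLOW = [
    Stage("identified", 988),
    Stage("after duplicate removal", 437),
    Stage("after title/abstract screening", 85),
    Stage("after full-text eligibility check", 62),
    Stage("included in the review", 41),
]


def exclusions(flow):
    """Number of articles dropped between each pair of consecutive stages."""
    return [
        (later.name, earlier.remaining - later.remaining)
        for earlier, later in zip(flow, flow[1:])
    ]


if __name__ == "__main__":
    for name, dropped in exclusions(FLOW):
        print(f"{name}: {dropped} removed")
```

The screening difference reproduces the 352 exclusions stated in the text, and the stage-by-stage removals add up to the gap between the 988 identified and the 41 included articles.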

6.3 Results

6.3.1 Profile of Literatures

The development of hybrid organizations has become a popular phenomenon that attracts attention globally. Table 6.1 below presents the details of the reviewed articles. Having reviewed 41 research articles, the authors conclude that both developing and developed countries are highly concerned with this issue. Scholars have used qualitative methods, quantitative methods and mixed methods to explore the topic. They have also used various research strategies, such as case study, survey, archival research, grounded theory and literature review. However, the authors find that most scholars prefer qualitative methods, especially case studies, to quantitative methods. Where quantitative methods are used, it is mostly for correlation analysis. None of the reviewed studies establishes a mathematical model, even though the assessment of strategic performance needs to be defined clearly. So far, academics have focused on different dimensions of this issue. The authors classified the reviewed articles into four dimensions according to their research questions: hybrid organizing, hybrid impact, paradox, and performance. Table 6.2 shows the number of reviewed articles in each dimension. The four dimensions are explained as follows:

1. Hybrid organizing refers to research articles that aim to give a general idea of hybrid organizations in terms of features, operating model, organizational environment and definition. Three of the reviewed articles are about hybrid organizing.
2. Hybrid impact concerns the influence that hybrid organizations have on the market, on society and on themselves. Six articles are relevant to hybrid impact.
3. Paradox is an inherent characteristic of a hybrid organization and an issue of great concern to a wide range of stakeholders. Reviewed articles in this dimension mainly focus on the influence of paradox on hybrid organizational performance and on the management of paradox. Ten of the reviewed articles focus on paradox.
4. Hybrid performance refers to research articles that concentrate on the criteria of hybrid organizational performance and on the impact of strategy on performance. There are 22 articles in this dimension.


Table 6.1 Profile of hybridity research

| Focus | Author(s) | Methodological choice | Research strategy | Geographic |
|---|---|---|---|---|
| Hybrid Organizing | Battilana and Lee (2014); Luke and Chu (2013); Rivera-Santos et al. (2015) | Qualitative method | Case Study; Survey; Literature Review | Africa; Sub-Saharan Africa; Vietnam |
| Hybrid Impact | Jay (2013); Khieng and Dahles (2014); Mikolajczak (2019); Meyer et al. (2012); Raišienė and Urmanavičienė (2018); Zhao and Han (2019) | Qualitative method; mixed-method; quantitative method | Case Study; Survey; Archival Research; Literature Review | Austria; Cambodia; Poland; USA |
| Paradox | Boitier (2018); Battilana and Dorado (2010); Ciambotti and Pedrini (2019); Fossestol et al. (2015); Greenwood (2011); Ismail and Johnson (2019); Mason and Doherty (2016); Pache and Santos (2010); Pache and Santos (2013); Perkmann et al. (2019); Zoogah (2015) | Qualitative method | Case Study; Survey; Grounded Theory; Literature Review | Africa; Bolivia; EU; France; Kenya; Middle East and North Africa; Norway |
| Hybrid Performance | Bagnoli and Megali (2011); Battilana et al. (2013); Bhattarai et al. (2019); Carnochan et al. (2014); Cornforth (2014); Dacin et al. (2010); Defourny and Nyssens (2017); Davies and Doherty (2018); Ebrahim and Rangan (2010); Ebrahim et al. (2014); Florin and Schmidt (2011); Liu et al. (2014); Liu et al. (2015); Lashitew et al. (2018); Powell et al. (2018); Puspadewi et al. (2019); Santos et al. (2015); Tracey et al. (2011); Taysir and Taysir (2012); Weerawardena and Mort (2012); Zhang and Swanson (2013) | Quantitative method; Qualitative method | Survey; Case Study; Grounded Theory; Archival Research; Literature Review | Australia; Baltic States; Canada; France; Indonesia; Japan; Kenya; United Kingdom; USA |


Table 6.2 Distribution of reviewed articles

| Focus | 2010–2014 | 2015–2019 | Total |
|---|---|---|---|
| Hybrid Organizing | 2 | 1 | 3 |
| Hybrid Impact | 3 | 3 | 6 |
| Paradox | 4 | 6 | 10 |
| Hybrid Performance | 14 | 8 | 22 |
| Total | 23 | 18 | 41 |
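As a quick consistency check, the margins of Table 6.2 can be recomputed from its cells. The cell values and reported totals below are copied from the table; the variable and function names are hypothetical and exist only for this sketch.

```python
# Cells of Table 6.2 as (2010-2014, 2015-2019) counts per focus.
TABLE_6_2 = {
    "Hybrid Organizing": (2, 1),
    "Hybrid Impact": (3, 3),
    "Paradox": (4, 6),
    "Hybrid Performance": (14, 8),
}

# Totals as printed in the table.
REPORTED_ROW_TOTALS = {
    "Hybrid Organizing": 3,
    "Hybrid Impact": 6,
    "Paradox": 10,
    "Hybrid Performance": 22,
}
REPORTED_COLUMN_TOTALS = (23, 18)
REPORTED_GRAND_TOTAL = 41


def row_totals(table):
    return {focus: sum(counts) for focus, counts in table.items()}


def column_totals(table):
    # zip(*...) turns the per-focus tuples into per-period columns.
    return tuple(sum(column) for column in zip(*table.values()))


if __name__ == "__main__":
    print(row_totals(TABLE_6_2) == REPORTED_ROW_TOTALS)
    print(column_totals(TABLE_6_2) == REPORTED_COLUMN_TOTALS)
```

Both the row and the column totals recomputed this way agree with the printed margins, and either set sums to the 41 reviewed articles.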

6.3.2 Hybrid Organizing

A hybrid organization incorporates pluralistic institutional logics under a “single organizational roof” [6]. Social enterprises (SEs), which combine social welfare logic and commercial logic in the same organization, are often considered an ideal sample for studying “a creative variety of hybrid” [17, 41]. For both profit-oriented and non-profit organizations, the boundary between social and commercial goals is becoming blurred, which provides a chance for the development of social enterprises [2, 13]. Defourny and Nyssens (2017) classify social enterprises into four types: entrepreneurial non-profits, social businesses, social cooperatives, and public-sector social enterprises. Although they differ, they are all supposed to fulfill a social mission [5]. Luke and Chu (2013) explain that social enterprise derives from a non-profit background. The enterprise engages in commercial activities and uses market-based techniques to fulfill its social mission. Unlike a for-profit organization, which perceives social value as a by-product of profit creation, a social enterprise tries to realize its social mission and sustain itself through commercial activities [4, 26]. This phenomenon is defined as “the marketization of welfare” by [42]. Social enterprises face a complex external and internal environment. Externally, SEs compete with each other for funds, donations and government grants on the one hand, and with commercial companies for market share and self-sufficiency on the other [24]. Internally, SEs are required to reconcile multiple logics; [6, 17] explain that a hybrid organization should combine commercial logic and social logic through selective coupling, which helps to integrate the multiple logics and to avoid the risk of merely faking compliance. Due to its pluralistic institutional logics, social enterprise stands at the intersection of the market, public policies and public welfare, which is also a point of opportunity, creativity and risk [20].


6.3.3 Hybrid Impact

Social enterprise was born to generate creative solutions for a complex environment [6]. Multiple logics enable SEs to achieve capacities that a single-logic entity could not access [18, 23]. As the public sector faced the challenges of low organizational capacity, increasing demand and declining public funding, market orientation was introduced in the hope that hybridity would help to ease social problems and create social value [8, 17, 25, 27]. Battilana and Lee [43] suggest that, under some conditions, combining logics provides the organization with more opportunities, autonomy and flexibility. Researchers have shown that the financial returns a social enterprise creates not only allow it to secure organizational capital, but also protect it from being heavily controlled by its supporters. Mikolajczak [8] investigated the influence of commercial logic on non-governmental organizations (NGOs) and sought to identify the key drivers of NGO marketization. The research used a mixed-method design and collected data from 3800 Polish NGOs. It concludes that commercial logic offers social enterprises an extra chance to gain funds, which makes them more independent from their donors; in these circumstances, they care less about donors' interests than before. Khieng and Dahles [7], who explored the commercialization of the non-profit sector in Cambodia, likewise hold that financial returns rescue social enterprises from being driven largely by donors' agendas and conditions. They explain that self-generated income gives social enterprises in Cambodia bargaining power to negotiate with donors. These enterprises may be more likely to develop innovative strategies for dealing with donors' needs. Commercial logic also helps to advance the sustainability of social enterprise, as a higher organizational legitimacy is established. Legitimacy, which refers to an organization's effectiveness, efficiency, innovation and responsiveness to stakeholders' needs, helps to protect an enterprise from crises and from the influence of unfavorable decisions; [7, 9] argue that business-like behavior helps non-profit organizations to advance their governance, as the financial and administrative systems become more transparent and accountable. Social enterprises show more professional and standardized management and daily operations. Mikolajczak [8] stresses that, compared with NGO management, social enterprises have a well-defined strategy for future development. The researchers also find that the commercial logic of social enterprises results in more regular operations, more flexible working time and more thorough financial calculation. Besides, [9] infer that social enterprises improve their performance when business-like behavior drives them to meet the market's needs. However, according to [26], although governments continuously highlight the importance of SEs in public service delivery, little empirical evidence is available. Santos et al. [4] indicate that although SEs promise to fulfill a social mission, mission drift often happens. Raisiene and Urmanaviciene [11] also conclude that mission drift is inherent even in countries where hybrid organizations are mature. Lashitew et al. [27] show that social enterprises change their social mission

100

Y. Wu and Y. Long

more often when they capture revenue via commercial activities. The commercial logic that embedded in social enterprise always result in redefining of social mission. Mission drift would happen under the market and financial pressure if the company fails to find an equilibrium between welfare realization and profit earning [7, 38] also find that both financial stability and potential for mission drift increased when the company conducts commercial and social charity activities. Zhao and Han [10] studied the risks and tensions of Chinese social enterprise at the perspective of scaling strategy. The researchers demonstrate that social enterprise, which takes the social responsibility to serve more public and cover a wider geographic region (scaling wide), might encounter cognitive legitimacy risk and legal legitimacy risk. In other words, stakeholders find social enterprise as an organizational form that out of established social categories. Therefore stakeholders would perceive this form as neither feasible nor legal. Moreover, when social enterprise intends to address social problems more effectively or more deeply (scaling deep), the social enterprise might suffer from not only cognitive legitimacy risk but also financial self-sufficiency risk and operational risk. This is because the stakeholder is unwilling to provide funds for these less economic rationale activities. The company may also face the risk of service delivery failure in its operation process.

6.3.4 Paradox

Greenwood et al. [14] suggest that the logics embedded in a hybrid organization may not always be compatible and that the pattern of institutional complexity is always dynamic. Some researchers also point out that the different institutional logics within a hybrid organization contradict each other by nature [3, 44]. The commercial and social dimensions directly lead to tensions for social enterprises, as the multiple institutional logics result in novel combinations of tacit knowledge, capital, and regimes of justification (means and ends) [6, 13, 31, 45]. The conflicts are reflected in organizational identities, resource allocation, strategic positioning, employment, socialization, organizational design, governance, and stakeholder management [12, 13, 15, 26]. Due to the inherent paradox, members of SEs have to contend with competing external demands and internal organizational identities [16]. The paradox might cause organizational instability and change, and therefore lead to low capacity in strategic decision-making or even to the collapse of hybridity [6]. Ways of managing the paradox are still under continual experimentation [46]. Due to the dual goals and the inherently resource-scarce environment, the problem of resource constraints is amplified in social enterprise, especially in developing countries [7, 13]. Resources such as skilled people, raw materials, and technologies are especially scarce in hybrid organizations, and social enterprises face tradeoffs in beneficiaries and labor allocation across social and commercial activities [19, 21, 47]. Zhang and Swanson [28] admit that, in the social enterprise context, tradeoffs must be made between social mission and business management. However, they consider the dual objectives not as a paradox but as convergent. They insist that a positive correlation exists between economic value creation and the social objective. In other words, the financial returns of a social enterprise help to maintain or expand the company's social objectives. Moreover, the study of [28] shows that the more social objectives an enterprise achieves, the greater the possibility that the company will define itself as socially entrepreneurial. Some scholars point out that, in a dual-mission context, the conflicts can be defused to some extent with suitable strategies. Perkmann et al. [18], who conducted an in-depth case study of eight universities in the EU, indicate that actual structural hybrids, which highly integrate the competing logics only within a space and maintain a dominant logic, are less likely to suffer external legitimacy problems and have less demand for conciliatory mechanisms. Ciambotti and Pedrini [13] also indicate that, by using hybrid harvesting strategies, the resource constraints caused by the paradox can be overcome. Hybrid harvesting strategies allow SEs to access resources that are normally hidden, unavailable, or expensive.

6.3.5 Hybrid Performance

The hybrid organization faces multiple criteria for performance [45]. The organization should keep a balance between social mission and financial goals to achieve sustainability [4, 27, 32, 33]. Some researchers also stress that managing paradox is the key to successful performance [20, 28, 35]. Puspadewi et al. [20] indicate the importance of managing the paradox for social enterprise performance. They also infer that, due to the complexity of the logics, managers in social enterprises are required to understand information from multiple angles to achieve dual goals. They also found that both commercial behavior and social behavior have a significantly positive effect on a social enterprise's sustainability. However, tension and mission drift may occur when a hybrid organization tries to balance the dual goals [31, 45]. Due to the competing logics within a hybrid social enterprise, members of a hybrid organization usually find their performance ambiguous and paradoxical [6]. In other words, they may succeed when the outcome is interpreted in terms of social logic but fail in terms of commercial logic. However, mission drift occurs when SEs become more focused on financial returns. As [6, 37] explain, it is difficult for social enterprises to integrate social mission and financial needs in the decision-making process. In this circumstance, scholars try to find the elements of organizational strategy that affect organizational performance. Many scholars emphasize the role of social identity in avoiding mission drift [21, 27], while [26] underlines the importance of understanding the needs of multiple stakeholder groups. The results of the study in [21] indicate that high organizational productivity and the social logic embedded in an organization both benefit social performance. However, social logic is negatively correlated with organizational productivity. Bhattarai et al. [22] used a quantitative method to explore the factors that influence SEs' performance.


By investigating 164 UK social enterprises, they proposed that a strong positive correlation exists between market orientation and social enterprises' performance, which refers to both financial performance and social performance. This result is also supported by the studies of [29, 30] and Morgan et al. (2009). However, market disruptiveness capability, which refers to the ability to innovate radically in products and services, only has a positive effect on commercial performance. The researchers considered this phenomenon the result of an investment mistake, because the radical innovations of SEs always ignore the social mission, namely that the product or service is designed for its beneficiaries. Thus, to add social value to market disruptiveness capability, the research urged social enterprises to keep their social mission in mind. This research also showed that, by using market orientation and market disruptiveness capability simultaneously, the social enterprise would be stuck in the middle and would not gain advantages from either strategy. Scholars have heavily researched the factors that influence organizational performance. However, a quantitative standard of strategic performance is still missing. Ebrahim and Rangan [36] find that the evaluation of social performance lacks standardization and comparability, while financial assessment is well established. The study of [37] also concludes that accountability for dual performance objectives is one of the greatest challenges for hybrid organizational governance.

6.4 Discussions

By analyzing 41 articles, the authors answered five questions:

1. What is a hybrid organization?
2. Why are hybrid organizations developed?
3. What are the challenges for hybrid organizational strategic management?
4. How could a hybrid organization achieve successful performance?
5. What are the criteria for defining successful performance?

The section on hybrid organizing answered the first research question. A hybrid organization incorporates multiple institutional logics within one organization. The social enterprise is a typical type of hybrid organization. By participating in commercial activities, it is supposed to be self-sufficient in fulfilling its social mission. Thus, a social enterprise has two organizational goals: one is the social welfare mission goal, the other is the profit goal. Due to their pluralistic institutional logics, SEs face a complex environment. They not only have to compete with each other for funds, donations, and government grants, but also fight against for-profit organizations for market share. The section on hybrid impact answered the second and third questions. To survive in the changing environment, market orientation was adopted by the public sector. Business-like behavior allows non-profit organizations to access more opportunities, autonomy, and flexibility. Operating as a social enterprise also enables them to gain higher financial stability, higher organizational legitimacy, and more market recognition. However, social enterprises create problems while attempting to address problems. Mission drift always happens because of commercial activities.


Some researchers also find that legal legitimacy risk, cognitive legitimacy risk, financial self-sufficiency risk, and operational risk occur when a social enterprise intends to scale up its impact on social welfare. The section on paradox also answered the third question. This section gave a critical view of the challenges of hybrid organizations. The combined logics in a hybrid organization are inherently paradoxical. The paradox leads to competing external demands and internal organizational identities. This might cause organizational instability and change, and therefore lead to low capacity in strategic decision-making or even to the collapse of hybridity. However, the institutional logics embedded in a hybrid organization do not completely conflict with each other. Moreover, the problems that result from the paradox can be eliminated by some strategies, such as hybrid harvesting strategies and actual structural hybrids. The section on hybrid performance answered the fourth and last questions. The authors find that managing the paradox is the key to hybrid organizational strategy. The hybrid organization needs to achieve both the social welfare mission goal and the profit goal simultaneously to realize satisfactory organizational performance. Existing research prefers to analyze the correlation of performance with various factors. However, the quantification of social enterprises' performance has not been studied sufficiently. Performance quantification is critical to research on hybrid organizations, as the boundary between non-profit and for-profit organizations, as well as the boundary between the commercial goal and the social welfare mission, is becoming more and more ambiguous. Understanding the quality of performance mathematically not only helps to evaluate organizational performance in this complex and paradoxical context, but can also guide the improvement of organizational strategy.

6.5 Conclusions

In this chapter, the authors explored existing research to find an answer for the assessment of hybrid organizational performance and to find a path for further study. The authors used a systematic review approach to keep the research transparent and replicable. They designed the methodology on the basis of [39, 40]. The relevant literature from the last 10 years went through a strict screening process comprising definition, identification, screening, eligibility, and inclusion. Finally, 41 articles were adopted to answer the research questions. More specifically, the authors answered:

1. What is a hybrid organization?
2. Why are hybrid organizations developed?
3. What are the challenges for hybrid organizational strategic management?
4. How could a hybrid organization achieve successful performance?
5. What are the criteria for defining successful performance?

The research answered most of the questions. However, the authors find that although abundant research on hybrid organizational performance exists, accountability for the dual performance objectives in the hybrid organizational context is still an unsolved problem. Performance quantification is an important topic for the hybrid organization, but it has seldom been researched by scholars. Mathematics would empower the strategic management of hybrid organizations to be more rational and rigorous. Thus, understanding the quality of performance mathematically not only helps to evaluate organizational performance in this complex and paradoxical context, but can also guide the improvement of organizational strategy. Although there was no geographic constraint on the article selection, due to language limitations the authors only included articles in English or Chinese. This may limit the reliability of the chapter's attempt to give a global view of hybrid organizational performance. By reviewing 41 research articles, the authors find that accountability of hybrid organizational performance is a gap in the research field. Further study will pay more attention to the performance quantification of the hybrid organization.

References

1. Fossestol, K., Breit, E., Andreassen, T.A., Klemsdal, L.: Managing institutional complexity in public sector reform: hybridization in front-line service organizations. Public Adm. 93(2), 290–306 (2015)
2. Defourny, J., Nyssens, M.: Fundamentals for an international typology of social enterprise models. Voluntas 28(6), 2469–2497 (2017)
3. Battilana, J., Dorado, S.: Building sustainable hybrid organizations: the case of commercial microfinance organizations. Acad. Manag. J. 53(6), 1419–1440 (2010)
4. Santos, F., Pache, A.-C., Birkholz, C.: Making hybrids work: aligning business models and organizational design for social enterprises. Calif. Manag. Rev. 57(3), 36–58
5. Luke, B.G., Chu, V.: Social enterprise versus social entrepreneurship: an examination of the 'why' and 'how' in pursuing social change. Int. Small Bus. J. 31(7), 764–784 (2013)
6. Jay, J.: Navigating paradox as a mechanism of change and innovation in hybrid organizations. Acad. Manag. J. 56(1), 137–159 (2013)
7. Khieng, S., Dahles, H.: Commercialization in the non-profit sector: the emergence of social enterprise in Cambodia. J. Soc. Entrep. 6(2), 218–243 (2015)
8. Mikolajczak, P.: Becoming business-like: the determinants of NGOs' marketization turning into social enterprises in Poland. Oeconomia Copernicana 10(3), 537–559 (2019)
9. Meyer, M., Buber, R., Aghamanoukjan, A.: In search of legitimacy: managerialism and legitimation in civil society organizations. Voluntas 24(1), 167–193 (2013)
10. Zhao, M., Han, J.: Tensions and risks of social enterprises' scaling strategies: the case of microfinance institutions in China. J. Soc. Entrep. 1–21 (2019)
11. Raisiene, A.G., Urmanaviciene, A.: Mission drift in a hybrid organization: how can social business combine its dual goals? Ekonomski Vjesnik 30(2), 301–310 (2017)
12. Boitier, M., Riviere, A., Wenzlaff, F., Hattke, F.: Hybrid organizational responses to institutional complexity: a cross-case study of three European universities. Manag. Int. 22(4), 121–135 (2018)
13. Ciambotti, G., Pedrini, M.: Hybrid harvesting strategies to overcome resource constraints: evidence from social enterprises in Kenya. J. Bus. Ethics (1) (2019)
14. Greenwood, R., Raynard, M., Kodeih, F., Micelotta, E., Lounsbury, M.: Institutional complexity and organizational responses. Acad. Manag. Ann. 5(1), 317–371 (2011)
15. Ismail, A., Johnson, B.: Managing organizational paradoxes in social enterprises: case studies from the MENA region. Voluntas 30(3), 516–534 (2019)
16. Pache, A., Santos, F.M.: When worlds collide: the internal dynamics of organizational responses to conflicting institutional demands. Acad. Manag. Rev. 35(3), 455–476 (2010)
17. Pache, A., Santos, F.M.: Inside the hybrid organization: selective coupling as a response to competing institutional logics. Acad. Manag. J. 56(4), 972–1001 (2013)
18. Perkmann, M., McKelvey, M., Phillips, N.: Protecting scientists from Gordon Gekko: how organizations use hybrid spaces to engage with multiple institutional logics. Organ. Sci. 30(2), 298–318 (2019)
19. Zoogah, D.B., Peng, M.W., Woldu, H.: Institutions, resources, and organizational effectiveness in Africa. Acad. Manag. Perspect. 29(1), 7–31 (2015)
20. Puspadewi, I., Soetjipto, B.W., Wahyuni, S., Wijayanto, S.H.: Managing paradox for the sustainability of social enterprises: an empirical study of forestry community cooperatives in Indonesia. J. Soc. Entrep. 10(2), 177–192 (2019)
21. School, H.B.: Keeping a foot in both camps: understanding the drivers of social performance in hybrid organizations. Harvard Bus. Rev
22. Bhattarai, C., Kwong, C., Tasavori, M.: Market orientation, market disruptiveness capability and social enterprise performance: an empirical study from the United Kingdom. J. Bus. Res. 96, 47–60 (2019)
23. Tracey, P., Phillips, N., Jarvis, O.: Bridging institutional entrepreneurship and the creation of new organizational forms: a multilevel model. Organ. Sci. 22(1), 60–80 (2011)
24. Weerawardena, J., Mort, G.S.: Competitive strategy in socially entrepreneurial nonprofit organizations: innovation and differentiation. J. Public Policy Mark. 31(1), 91–101
25. Carnochan, S., Samples, M., Myers, M., Austin, M.J.: Performance measurement challenges in nonprofit human service organizations. Nonprofit Volunt. Sect. Q. 43(6), 1014–1032 (2013)
26. Powell, M., Gillett, A., Doherty, B.: Sustainability in social enterprise: hybrid organizing in public services. Public Manag. Rev. 21(2), 159–186 (2019)
27. Lashitew, A.A., Lydia, B., van Tulder, R.: Inclusive business at the base of the pyramid: the role of embeddedness for enabling social innovations. J. Bus. Ethics
28. Zhang, D.D., Swanson, L.A.: Social entrepreneurship in nonprofit organizations: an empirical investigation of the synergy between social and business objectives. J. Nonprofit Public Sect. Mark. 25(1), 105–125
29. Liu, G., Takeda, S., Ko, W.W.: Strategic orientation and social enterprise performance. Nonprofit Volunt. Sect. Q. 43(3), 480–501 (2014)
30. Liu, G., Eng, T.-Y., Takeda, S.: An investigation of marketing capabilities and social enterprise performance in the UK and Japan. Entrep. Theory Pract. 39(2), 267–298
31. Cornforth, C.: Understanding and combating mission drift in social enterprises. Soc. Enterp. J. 10(1), 3–20 (2014)
32. Bagnoli, L., Megali, C.: Measuring performance in social enterprises. Nonprofit Volunt. Sect. Q. 40(1), 149–165 (2011)
33. Taysir, E.A., Taysir, N.K.: Measuring effectiveness in nonprofit organizations: an integration effort. J. Transnatl. Manag. 17(3), 220–235
34. Florin, J., Schmidt, E.: Creating shared value in the hybrid venture arena: a business model innovation perspective. J. Soc. Entrep. 2(2), 165–197 (2011)
35. Dacin, P.A., Dacin, M.T., Matear, M.: Social entrepreneurship: why we don't need a new theory and how we move forward from here. Acad. Manag. Perspect. 24(3), 37–57 (2010)
36. Ebrahim, A., Rangan, V.K.: The limits of nonprofit impact: a contingency framework for measuring social performance. Social Science Electronic Publishing (10-099) (2010)
37. Ebrahim, A., Battilana, J., Mair, J.: The governance of social enterprises: mission drift and accountability challenges in hybrid organizations. Res. Organ. Behav. 34, 81–100 (2014)
38. Davies, I.A., Bob, D.: Balancing a hybrid business model: the search for equilibrium at Cafédirect. J. Bus. Ethics
39. Catterall, M.: Research methods for business students 3(4), 215–218 (2003)
40. Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G.: Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ 339(7), 1006–1012 (2009)
41. Soteri-Proctor, A.: Hybrid organizations and the third sector: challenges for practice, theory and policy – edited by David Billis 45(3), 328–330
42. Salamon, L.M.: The marketization of welfare: changing nonprofit and for-profit roles in the American welfare state. Soc. Serv. Rev. 67(1), 16–39 (1993)
43. Battilana, J., Lee, M.: Advancing research on hybrid organizing – insights from the study of social enterprises. Acad. Manag. Ann. 8(1), 397–441 (2014)
44. Kinch, M.S.: The rise (and decline?) of biotechnology. Drug Discov. Today 19(11), 1686–1690 (2014)
45. Stark, D.: The Sense of Dissonance (2009)
46. Mason, C., Doherty, B.: A fair trade-off? paradoxes in the governance of fair-trade social enterprises. J. Bus. Ethics 136(3), 451–469
47. Rivera-Santos, M., Holt, D., Littlewood, D., Kolk, A.: Social entrepreneurship in sub-Saharan Africa. Acad. Manag. Perspect. 29(1), 72–91 (2015)

Chapter 7

The Complex Systems' Methods in Financial Science and Technology

Wei Wang

Abstract Financial systems are determined by the activities of thousands of people, corporations, and countries, each with different wishes or demands. Complexity is therefore inevitable, not only for financial markets but also for financial supervision and service. In this chapter, some existing methods are summarized, such as the hierarchical structure of financial markets, multiscale analysis, and causality analysis of financial factors. For modelling the evolution of the main financial indexes, a novel fractal structure model, which is iterated by a self-interactive process with external factors, is proposed. The relevant methods proposed for the further analysis of financial systems are also reviewed. Finally, from the viewpoint of the future development of financial technology, we propose some issues that need to be dealt with, for example, methods for advanced learning and for intelligent automation or optimization of complex systems.

7.1 Introduction

With the development of information science and technology, the intension and extension of the word "finance" have been extended. The combination of finance with science and technology has become an essential feature of finance. There will be new demands not only for traditional financial analysis, such as market analysis, decision-making, supervision, and risk control, but also for the related analysis in financial services, inclusive finance, etc. Because financial systems are composed of the activities of thousands of people, corporations, and countries, each with different wishes or demands, complexity is inevitable not only for financial markets but also for financial supervision and service [1, 2]. This brings new challenges for the related analytical methods.

W. Wang, School of Mathematics, Renmin University of China, No. 59, Zhongguancun, Beijing, China
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_7


As a matter of fact, many methods have been proposed for the analysis of financial systems, for example, the hierarchical structure in financial markets [3], multiscale analysis [4, 5], and causality analysis of financial factors [6]. However, many of these methods are based on regression analysis of the relevant factors of financial phenomena [7, 8]. That is to say, they concern statistical analysis or the analysis of the relevance of different factors. There is little concern about the complexity of the internal mechanism of financial phenomena or financial systems [9, 10]. In this chapter, some of these methods, such as the hierarchical structure of financial markets, multiscale analysis, and causality analysis of financial factors, are summarized. For modelling the evolution of the main financial indexes, a novel fractal structure model, which is iterated by a self-interactive process with external factors, is proposed. The relevant methods proposed for the further analysis of financial systems are also reviewed. Finally, in our view, the future development of financial technology will strongly call for advanced learning and intelligent automation or optimization. The arrangement of this chapter is as follows. In Sect. 7.2, some of the complex systems' methods proposed for the analysis of financial systems are reviewed briefly. In Sect. 7.3, for modelling the evolution of financial processes derived from the internal mechanism, the complex systems' method, which is iterated by the self-interactive process with external factors, is considered. In Sect. 7.4, some of the methods related to further financial analysis are summarized. In Sect. 7.5, we argue that, in order to better understand the complexity of financial systems, the urgent tasks are to develop methods for advanced learning or intelligent optimization.

7.2 Some Methods for the Complexity of Financial Systems

As mentioned above, the complexity of financial systems is inevitable. Some methods have already been proposed to address it. Here, we give a very brief review of those methods.

7.2.1 Hierarchical Structure in Financial Systems

Owing to the hierarchical structure of societies or countries, a hierarchical structure of financial systems arises naturally [1, 3]. It indicates that financial systems are complex systems. The hierarchical structure of financial systems has been analyzed by defining a correlation coefficient, a distance matrix, and a graph structure in a topological space [3].
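The pipeline of correlation coefficient, distance matrix, and graph can be sketched in a few lines. The sketch below is illustrative only: it assumes the commonly used distance d_ij = sqrt(2(1 - rho_ij)) and a minimum spanning tree as the graph structure, which may differ in detail from the construction in [3]. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def correlation_distance(returns):
    """Map a (T, N) array of returns to an (N, N) distance matrix.

    Assumes the distance d_ij = sqrt(2 * (1 - rho_ij)).
    """
    rho = np.corrcoef(returns, rowvar=False)
    rho = np.clip(rho, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    return np.sqrt(2.0 * (1.0 - rho))

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns (i, j, d_ij) edges."""
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link from the tree to each node
    parent = np.zeros(n, dtype=int)  # tree node that realizes that cheapest link
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(dist[parent[j], j])))
        in_tree[j] = True
        update = (dist[j] < best) & ~in_tree
        best[update] = dist[j][update]
        parent[update] = j
    return edges

# Synthetic example: 4 hypothetical assets driven by two common factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
loadings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
returns = factors @ loadings.T + 0.3 * rng.normal(size=(500, 4))
tree = minimum_spanning_tree(correlation_distance(returns))
for i, j, d in tree:
    print(f"asset {i} - asset {j}: distance {d:.3f}")
```

Strongly correlated assets end up close together in the tree, which gives the intuitive picture of relationships that the text describes.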


This kind of analysis gives a more intuitive description of the relationships among the different factors or components. It is helpful for analyzing financial systems from the top down or from the bottom up. However, this kind of relevance analysis can no longer meet the needs of problems with huge amounts of data, especially the quantitative analysis of complex financial systems. To obtain a more powerful analysis, multiscale analysis has become a useful method.

7.2.2 Multiscale Analysis

Because of the complexity of financial systems, and with the development of information science and technology, the data concerning financial activities have become huge. That makes the analysis of financial markets very difficult. Besides the analysis of the hierarchical structure, multiscale analysis is an important method [4, 5]. This kind of analysis allows the data to be processed more efficiently and helps to grasp the main contradiction of the analysis. However, in terms of accuracy, multiscale analysis is more like traditional statistical analysis. It is concerned more with the data of financial markets at a certain scale, without considering the internal mechanism of the financial systems.
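As a minimal illustration of looking at the same data at several scales, the sketch below aggregates log returns over non-overlapping windows of increasing length and compares their volatility. This is only one simple reading of "multiscale analysis" and is not the specific method of [4, 5]. The function names and the simulated price series are assumptions made for the example.

```python
import numpy as np

def returns_at_scale(log_prices, scale):
    """Non-overlapping log returns over windows of `scale` time steps."""
    log_prices = np.asarray(log_prices, dtype=float)
    return np.diff(log_prices[::scale])

def volatility_by_scale(log_prices, scales=(1, 2, 4, 8, 16)):
    """Standard deviation of returns at each aggregation scale."""
    return {s: float(np.std(returns_at_scale(log_prices, s))) for s in scales}

# Hypothetical data: a Gaussian random walk standing in for a log-price series.
rng = np.random.default_rng(1)
log_prices = np.cumsum(rng.normal(0.0, 0.01, size=4097))
for scale, vol in volatility_by_scale(log_prices).items():
    print(f"scale {scale:2d}: volatility {vol:.4f}")
```

For independent increments the volatility grows roughly like the square root of the scale, so systematic departures from that growth in real data hint at scale-dependent structure.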

7.2.3 The Causality Analysis of Financial Markets

The analytical methods mentioned above are based on statistical analysis. They depend on the samples obtained. At the same time, they cannot reflect the master-slave relationship among the factors, and they cannot reflect the motivation for the evolution of financial systems. This means that these methods do not reflect the causality of financial systems. In fact, there are many kinds of causality; for example, the supply-demand relationship of money, human resources, natural resources, etc., is the main factor in the evolution of financial systems. In financial systems, demand is the cause and supply is the result. For this kind of analysis of financial markets, it is well known that Brownian motion is the basic mathematical model used to describe financial markets [11]. The phrase Brownian motion is used to describe phenomena that have considerable detail and that serve as approximations of other stochastic motion patterns. The mathematical motion is related to, but more structured than, the random walk, in which the displacement of a particle is entirely randomized. This model is widely used in financial analysis. However, it has shortcomings, because the main actors in financial activities are people, who are not the same as molecules. People have their own wishes and demands. Their activities are more likely determined by internal and external factors. As a result, financial processes are more complex than the motion of molecules or even a diffusion process with jumps [12].
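For reference, the baseline Brownian model discussed above can be simulated as follows. This is a minimal sketch: the geometric form used for prices is a standard textbook choice assumed here, not something specified in the chapter, and the parameter values are arbitrary.

```python
import numpy as np

def brownian_path(n_steps, dt, rng):
    """Standard Brownian motion on a uniform grid, starting at 0."""
    increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))

def geometric_brownian_path(s0, mu, sigma, n_steps, dt, rng):
    """Price path S_t = s0 * exp((mu - sigma^2 / 2) t + sigma W_t)."""
    w = brownian_path(n_steps, dt, rng)
    t = dt * np.arange(n_steps + 1)
    return s0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * w)

# One year of daily steps with assumed drift 5% and volatility 20%.
rng = np.random.default_rng(2)
prices = geometric_brownian_path(100.0, 0.05, 0.2, 252, 1.0 / 252, rng)
print(f"final simulated price: {prices[-1]:.2f}")
```

Every increment here is independent of the past, which is exactly the feature the chapter argues is unrealistic for markets driven by people.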

7.2.4 The Challenge of the Analysis on Financial Systems

Among the challenges in the analysis of financial systems, the first is that there is no proper model reflecting the essential characteristics that arise from the internal mechanism of financial systems [13, 14]. The second is that the data in financial computation or financial engineering all carry serious uncertainties. Models based on traditional time series or stochastic methods often depend on past samples, which seriously limits the adaptability of the model to drastic changes in the process and also affects the accuracy of estimation or prediction [15]. At the same time, with the development of science and technology, especially information technology, the appearance of big data, machine learning, artificial intelligence, blockchain, etc., provides new possibilities for financial activities. These technologies also bring new problems for the innovation of financial analysis. How can we deal with the new challenges in financial technology? We need to find new methods. For example, besides powerful methods for dealing with causality, learning methods for complex processes are necessary. From the viewpoint of machine learning, developing advanced learning methods and analyzing the law of evolution through causality analysis are the urgent tasks at present. Based on the analysis above, we propose a novel model for the evolution of financial systems based on the evolution process of polymers [19–21], which features a self-interactive process that can reflect the characteristics of the people engaged in it.

7.3 The New Strategy for Modelling the Evolution of Financial Systems

To solve the problem of modelling the evolution of financial systems from their internal mechanism, we intend to replace the basic, conventional model of Brownian motion by introducing a self-interactive process with external factors.


7.3.1 The Basic Consideration About the Model

Financial systems are composed of a series of businesses. Those businesses mostly focus on one or more industries and aim at supporting the development of certain industries. The engagement of all kinds of people is the essential characteristic of these processes. In the early period, the evolution of financial systems was modeled by Brownian motion [11]. After that, it was modeled by a multifractal structure, i.e., Brownian motion iterated by Brownian motion [16, 17]. That is to say, it is a diffusion process iterated by Brownian motion, which is a fractal structure. An example is a Brownian motion iterated by many Brownian motions. The drawback of Brownian motion was pointed out earlier. How can we improve the model? Here, we approach the problem from another point of view. Imagine a scenario in which a large group of people gathers in a huge hall or a large square. They talk freely with friends or relatives nearby. What is the total sound? In our view, it has a fractal structure [18]. That is to say, at the lower scale, the model is a self-interactive process with certain external factors, and at the higher scale, the model is still a self-interactive process with certain external factors, iterated by the lower-scale one. To better understand the novel model, we first give an interpretation of the self-interactive process.
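One common reading of "Brownian motion iterated by Brownian motion" is the composition Z(t) = X(Y(t)), where Y is a Brownian motion acting as a random clock and X is an independent two-sided Brownian motion. The sketch below simulates this composition on a grid. It is an assumption that this matches the construction in [16, 17], and the grid spacing and linear interpolation are approximations introduced only for the illustration.

```python
import numpy as np

def brownian_path(n_steps, dt, rng):
    """Standard Brownian motion on a uniform grid, starting at 0."""
    increments = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))

def iterated_brownian_path(n_steps, dt, rng, grid_step=1e-3):
    """Approximate Z(t) = X(Y(t)) with X a two-sided Brownian motion."""
    y = brownian_path(n_steps, dt, rng)            # inner process (random clock)
    reach = np.max(np.abs(y)) + grid_step          # range the outer process must cover
    m = int(np.ceil(reach / grid_step))
    scale = np.sqrt(grid_step)
    right = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, scale, size=m))))
    left = np.cumsum(rng.normal(0.0, scale, size=m))   # X(-h), X(-2h), ..., X(-mh)
    x_grid = grid_step * np.arange(-m, m + 1)
    x_vals = np.concatenate((left[::-1], right))       # X on the grid, with X(0) = 0
    return np.interp(y, x_grid, x_vals)                # linear interpolation between grid points

rng = np.random.default_rng(3)
z = iterated_brownian_path(1000, 0.001, rng)
print(f"simulated {len(z)} points, range [{z.min():.3f}, {z.max():.3f}]")
```

The resulting path is rougher than an ordinary Brownian path, which is one way to see why such compositions are used to produce fractal-like structure.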

7.3.2 The Self-Interactive Process

Self-interacting diffusions were first introduced by Durrett and Rogers [19] under the name of Brownian polymers. They proposed a model for the shape of a growing polymer. This process has been studied by different authors [21], who show in particular that, in the self-attracting case, x converges almost surely. Another model of polymers was proposed by Benaïm, Ledoux and Raimond [20], who studied, in the compact case, self-interacting diffusions depending on the empirical measure. The model was proposed as follows.

$$\dot{x}_t = -\int_0^t f[a(x_t - x_s)]\,ds + d_t + u_t \tag{7.1}$$

where $x_t$ refers to the state of the index, $d_t$ is the random factor of the index, $u_t$ is the exogenous disturbance, and $f$ and $a$ are proper functions. Since the seminal work of Norris, Rogers and Williams [21], self-interacting stochastic processes, that is, processes with path interaction, have been studied for many years from different aspects, and this has been an intensive research area. The work ranges from the modelling and asymptotic analysis of polymers to more abstract self-interaction processes.

7.3.3 Self-Interactive Process with External Factors

In the field of finance, the subjects are people with all kinds of emotions, desires and requirements. It is human beings who make financial systems more and more complex. To describe the evolution of society, the model of self-interactive systems with external factors was proposed in [22], based on the idea of self-interactive systems and on the dialectics of philosophy, as follows.

$$\dot{x}_t = -\int_0^t f[a(x_t - x_s - Y_s)]\,ds + d_t + u_t \tag{7.2}$$

where $x_t$ is the state of the index, $d_t$ is the random factor of the index, $Y_t$ is the external factor acting on the index (for example, political or natural factors), $u_t$ is the exogenous disturbance, and $f$ is a proper function. Of course, the function $f$ may differ between persons or groups. Much remains to be done, especially since, with the development of information technology, personalization is becoming more common. The model proposed above is an original mathematical model of the evolution of persons based on the dialectical relationship between internal and external factors.
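To make the structure of (7.2) concrete, the following is a minimal numerical sketch, not the author's implementation. It assumes an explicit Euler scheme with step h, approximates the memory integral by a left Riemann sum over the stored path, and treats the random factor as independent Gaussian draws. The function name `simulate_self_interactive`, the choice f = tanh and all parameter values are illustrative assumptions.

```python
import math
import random


def simulate_self_interactive(n_steps, h, f, a=1.0, x0=0.0,
                              external=None, control=None,
                              noise_std=0.0, seed=0):
    """Euler sketch of dx/dt = -int_0^t f(a*(x_t - x_s - Y_s)) ds + d_t + u_t.

    external(s) plays the role of Y_s and control(t) the role of u_t; both
    default to zero. d_t is drawn as Gaussian noise (an assumption).
    """
    rng = random.Random(seed)
    path = [x0]
    for k in range(n_steps):
        t = k * h
        x_now = path[-1]
        # Memory term: left Riemann sum over the past states x_0, ..., x_{k-1}.
        memory = 0.0
        for j in range(k):
            y_j = external(j * h) if external else 0.0
            memory += f(a * (x_now - path[j] - y_j)) * h
        d_t = rng.gauss(0.0, noise_std) if noise_std > 0.0 else 0.0
        u_t = control(t) if control else 0.0
        path.append(x_now + h * (-memory + d_t + u_t))
    return path


if __name__ == "__main__":
    # A constant external factor Y = 1 pulls the state away from its past.
    pulled = simulate_self_interactive(200, 0.01, math.tanh,
                                       external=lambda s: 1.0)
    print(round(pulled[-1], 4))
```

The cost is quadratic in the number of steps because every step revisits the whole past path, which is the price of the path dependence in (7.2). With Y set to zero the sketch reduces to a discretisation of (7.1).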

7.3.4 The New Model of Financial Systems Based on the Self-Interactive Process with External Factors

The evolution of financial systems is more complex than that of persons, and there are many indexes to be considered. The principle of divide and rule is therefore suitable. In this section, we consider the model for a single index: the analysis first pays attention to each of the main indexes, and the indexes are then combined by a proper method. Unfortunately, the models proposed for market evaluation with Brownian motion or diffusion processes with jumps do not take into account the factors or causality behind the changes. Here we propose a new model of financial markets that contains internal and external factors. That is to say, besides the conventional consideration of self-interactive processes, we also consider the possible evolution of the process under certain impulses from outside. The motivation is that we need to deal with financial processes that are influenced not only by themselves but also by exogenous factors [22, 23].

As mentioned above, financial systems have a fractal structure. The difference from the former models is that the basic unit has the characteristics of a self-interactive system with external factors. The systems are more like a self-interactive process with external factors iterated by self-interactive processes with external factors. That is to say, at a lower scale, it is a diffusion process formed by self-interactive processes with external factors, which is a fractal structure. An example is a self-interactive process with external factors composed of many self-interactive processes with external factors. Further details of the model will be considered in future work.

7.4 Some of the Relevant Methods Proposed for the Further Analysis

The modelling method proposed above is suited to modelling the evolution of financial processes. However, because of the complexity and uncertainty of financial markets, the model should be adjusted to adapt to changes in the markets. How can this be done? Control theory can be used not only for the control of cars, trains, etc., but also for the control of models [23], so that the models adjust to new inputs adaptively. In this section, we give a brief summary of the methods we have proposed for improving the adaptability of financial systems [22, 24]. They are as follows:

• The nonlinear tracking-differentiator (TD) [25];
• The new separation principle for PID control [26];
• The stripping principle of networked control systems [26];
• The internal model principle for the extension of time series models to an integrated model for big data [27].

7.4.1 The Nonlinear Tracking-Differentiators

To obtain derivatives of signals that contain noise or are non-differentiable, the system called the tracking-differentiator (TD) was proposed [25]. For an observation y(t) that contains noise or non-differentiable components, we can choose the TD in the following form:

$$\begin{cases} \dot{y}_1 = y_2 \\ \dot{y}_2 = -R\,\mathrm{sat}\!\left(y_1 - y(t) + \dfrac{|y_2|\,y_2}{2R},\ \delta\right) \end{cases} \tag{7.3}$$

where sat is the saturation function, δ is a design parameter, and R is a design parameter that determines the tracking velocity. The TD computes generalized derivatives dynamically and automatically for signals with disturbances. The relevant theories and methods have been widely used in recent years, especially in control engineering. It is the TD that makes PID control effectively realizable. It has also made some new kinds of control methods possible, for example, active disturbance rejection control (ADRC). In financial analysis, it may be used to obtain the derivatives needed for reconstruction or prediction of financial indexes.
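A discrete-time sketch of (7.3) may help to see how the TD produces a derivative estimate. It assumes an explicit Euler step of size h and the common convention sat(A, δ) = sign(A) for |A| ≥ δ and A/δ otherwise; the text does not spell out sat, so that definition, the function names and the default values of R and δ are assumptions for illustration only.

```python
import math


def sat(value, delta):
    """Saturation: linear inside (-delta, delta), sign(value) outside (assumed form)."""
    if abs(value) >= delta:
        return math.copysign(1.0, value)
    return value / delta


def tracking_differentiator(samples, h, R=100.0, delta=0.01):
    """Euler discretisation of (7.3).

    samples: observations y(t) taken every h time units.
    Returns a list of (y1, y2): y1 tracks the signal, y2 estimates its derivative.
    """
    y1, y2 = samples[0], 0.0
    estimates = []
    for v in samples:
        switching = y1 - v + abs(y2) * y2 / (2.0 * R)
        dy1 = y2
        dy2 = -R * sat(switching, delta)
        y1 += h * dy1
        y2 += h * dy2
        estimates.append((y1, y2))
    return estimates


if __name__ == "__main__":
    h = 1e-4
    signal = [math.sin(k * h) for k in range(60000)]
    y1, y2 = tracking_differentiator(signal, h)[-1]
    print(round(y1, 3), round(y2, 3))  # compare with sin and cos near t = 6
```

A larger R makes the tracker faster but also lets more measurement noise through, so in practice R and δ are tuned against the noise level of the index under study.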

7.4.2 The New Control Strategy for Uncertain Systems

The new strategy for uncertain control systems, and even complex systems, consists of the following [26]. First, the new separation principle of PID control, which is used for uncertain systems. Second, the double regulation mechanism, which is used for high-order uncertain systems. Third, the stripping principle, which can be used for networked control systems.

7.4.2.1 The Separation Realization Method of PID Control

The new separation principle of PID control allows the three parameters of the original PID control to be tuned separately by introducing operator variables, which simplifies parameter selection and provides a scientific design method. With the new separation principle, PID control changes from a control technology into an efficient control theory.

7.4.2.2 The Double Regulation Mechanism Strategy

The new separation principle of the PID controller allows the control method to be extended to higher-order systems by combining binary control with PID control, which we call the double regulation mechanism.

7.4.2.3 The Stripping Principle of Disturbance for Networked Systems

The new separation principle of the PID controller allows the control method to be extended to networked control systems by rejecting all the interconnected parts among the nodes. The synthesis of networked systems can thus be carried out by stripping the internal disturbances.

7.4.3 The Use of Internal Model Principle

After the extension of PID control to networked systems, we need to consider modelling for big data. The internal model principle is an important method for this. That is to say, starting from an existing model of a certain index, we can use the internal model principle to incorporate new features and integrate them into a new model.

7.4.4 Some Applications of the Control Methods

All of the methods mentioned in this section can be used for the control of uncertain systems, and even complex systems, for purposes such as stabilization and the tracking of unusual motions. These methods can also be used for the regulation of models, especially where traditional statistical methods or even machine learning do not work well. The control methods can be used to improve the adaptability of time series models and, with the internal model principle, for the analysis of big data. Furthermore, the methods can be used for prediction and multi-step prediction, especially in the modelling and analysis of financial data. The main aspects are as follows.

• Model improvements for statistical models: the robustness of the control methods ensures that the proposed method can be used in modelling, prediction and control, especially for financial market data [27].
• Reconstruction and prediction: the proposed method is based on the combination of the TD with Taylor's formula.
• Social calculations: a model has been proposed that is an original mathematical model of the dialectical relationship between internal and external causes, or internal and external factors [28].
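One possible reading of the combination of the TD with Taylor's formula, stated here only as an assumption since the text gives no formula, is a first-order extrapolation from the two TD outputs of (7.3). With $y_1(t)$ tracking the index and $y_2(t)$ estimating its derivative, a prediction over a horizon $\tau > 0$ would be

$$\hat{y}(t+\tau) \approx y_1(t) + \tau\, y_2(t).$$

The truncation order and the symbol $\tau$ are illustrative choices; higher-order terms would need higher-order derivative estimates, for instance from cascaded TDs.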


W. Wang

7.5 Call for Advanced Method on Machine Learning

With the popularity of information science and technology, machine learning and artificial intelligence will be applied more and more in financial technology. However, current machine learning mainly relies on existing statistical methods to carry out correlation analysis for classification or feature extraction, and deep learning relies on convolution and other complex mathematical operations to achieve more accurate classification and recognition. We call this elementary learning. At the same time, new events appear almost all the time. Learning results based on former samples cannot reflect the new features; in financial markets especially, new samples appear very often. In these situations, machine learning and even deep learning cannot work well. That is the drawback of these learning methods. How can we reflect the new events? This is becoming a more and more urgent task. Facing the dynamic feature recognition of a changing world is therefore an objective requirement, and it calls for more powerful learning methods, which we call methods of advanced learning. It is the control of complex systems that makes the new learning method possible. The essential feature of advanced learning is that it can reflect newly arriving events or features quickly. Advanced machine learning may be used for the learning of complex processes. Fortunately, the control methods proposed above open new prospects for financial analysis and for the analysis of financial science and technology. In this field, there is still a long way to go. At the same time, although artificial intelligence can carry out programming tasks, most of the procedures need optimization. That is to say, we also need better optimization methods for artificial intelligence. Much remains to be done.

7.6 Conclusions

In this chapter, some complex systems methods, such as hierarchical structure analysis, multiscale analysis, and causality analysis of financial systems, are summarized. For modelling the evolution of financial indexes, a novel fractal structure model, iterated by self-interactive processes with external factors, is proposed. The relevant methods proposed for the further analysis of financial systems are also reviewed. Finally, from the viewpoint of the future development of financial technology, we point out some directions that need work, namely methods of advanced learning and intelligent automation or optimization for complex systems.


References

1. Mantegna, R.N., Stanley, H.E.: An Introduction to Econophysics: Correlations and Complexity in Finance. RUC (Renmin University of China) Press, Beijing, China (2006)
2. Galanis, S.: Financial complexity and trade. Games Econ. Behav. 112, 219–230 (2018)
3. Mantegna, R.N.: Hierarchical structure in financial markets. Eur. Phys. J. B 11, 193–197 (1999)
4. Ahn, Y.-Y., Bagrow, J.P., Lehmann, S.: Link communities reveal multiscale complexity in networks. Nature 466, 761–765 (2010)
5. Zhang, Y., Wang, J.: Linkage influence of energy market on financial market by multiscale complexity synchronization. Physica A 516, 254–266 (2019)
6. Stavroglou, S.K., Pantelous, A.A., Stanley, H.E., Zuev, K.M.: Hidden interactions in financial markets. PNAS 116, 10646–10651 (2019)
7. Kappeler, P.M.: A framework for studying social complexity. Behav. Ecol. Sociobiol. 73, 13 (2019)
8. Gai, P., Haldane, A., Kapadia, S.: Complexity, concentration and contagion. J. Monetary Econ. 58, 453–470 (2011)
9. Herring, R.J.: The evolving complexity of capital regulation. J. Financ. Serv. Res. 53, 183–205 (2018)
10. McKibbin, W.J., Stoeckel, A.: Modelling a complex world: improving macro-models. Oxf. Rev. Econ. Policy 34, 329–347 (2018)
11. Stroock, D.W., Varadhan, S.R.S.: Multidimensional Diffusion Processes. World Book Inc., Beijing, China (2009)
12. Li, C., Shang, P.: Complexity analysis based on generalized deviation for financial markets. Physica A 494, 118–128 (2018)
13. Wang, Y., Shang, P., Liu, Z.: Analysis of time series through complexity-entropy curves based on generalized fractional entropy. Nonlinear Dyn. 96, 585–599 (2019)
14. Leventides, J., Loukaki, K., Papavassiliou, V.G.: Simulating financial contagion dynamics in random interbank networks. J. Econ. Behav. Organ. 158, 500–525 (2019)
15. Oh, G.: Multifractals of investor behavior in stock market. J. Korean Phys. Soc. 71, 19–27 (2017)
16. Wu, Y., Shang, P., Chen, S.: Modified multifractal large deviation spectrum based on CID for financial market system. Physica A 523, 1331–1342 (2019)
17. Wang, Y., Zheng, S., Zhang, W., et al.: Fuzzy entropy complexity and multifractal behavior of statistical physics financial dynamics. Physica A 506, 486–498 (2018)
18. Falconer, K.: Fractal Geometry, Mathematical Foundations and Applications, 2nd edn. Wiley (2003)
19. Durrett, R.T., Rogers, L.C.G.: Asymptotic behavior of Brownian polymers. Prob. Theory Relat. Fields 92, 337–349 (1992)
20. Benaïm, M., Ledoux, M., Raimond, O.: Self-interacting diffusions. Prob. Theory Relat. Fields 122, 1–41 (2002)
21. Norris, J.R., Rogers, L.C.G., Williams, D.: Self-avoiding random walk: a Brownian motion model with local time drift. Prob. Theory Relat. Fields 74, 271–287 (1987)
22. Wang, W.: The method on modifying the dynamic properties of self-interactive systems by external factors. Appl. Mech. Mater. 109, 410–414 (2012)
23. Bardoscia, M., Battiston, S., Caccioli, F., Caldarelli, G.: Pathways towards instability in financial networks. Nat. Commun. https://doi.org/10.1038/ncomms14416
24. Silva, T.C., Alexandre da Silva, M., Miranda Tabak, B.: Systemic risk in financial systems: a feedback approach. J. Econ. Behav. Organ. 144, 97–120 (2017)
25. Han, J.Q., Wang, W.: Nonlinear tracking-differentiators. J. Syst. Sci. Math. Sci. 14, 177–183 (1994). (In Chinese)
26. Wang, W.: The New Design Strategy on PID controllers. In: Panda, R.C. (ed.) Introduction to PID Controllers - Theory, Tuning and Application to Frontier Areas. InTech Publisher, Croatia (2012)
27. Wang, W.: The method on improving the adaptability of time series models based on dynamical innovation. Commun. Comput. Inf. Sci. Ser. 210–217 (2012)
28. Wang, W.: The conceptual models for the growth of individuals based on the viewpoint of philosophy. In: Proceedings of the 2014 Asia-Pacific Humanities and Social Sciences Conference, Shanghai, China (2014)

Chapter 8

Estimating the Number of Fork Projects of Bitcoin Based on a Birth-Death-Immigration Process Wei Dai

Abstract Since the first cryptocurrency, Bitcoin, was invented in 2008, 105 Bitcoin fork projects have been created in total, and their number is still rising. Whether it will keep increasing, and at what rate, are important and interesting questions. However, there is no model to answer these questions. This chapter therefore proposes a population model, using a birth-death-immigration process, to estimate the number of Bitcoin fork projects.

8.1 Introduction

On October 31st, 2008, a link to a paper authored by Satoshi Nakamoto, titled Bitcoin: A Peer-to-Peer Electronic Cash System [1], was posted to a cryptography mailing list, and the first cryptocurrency, Bitcoin, was invented. It is a decentralized digital currency without a central bank or single administrator that can be sent from user to user on the peer-to-peer bitcoin network. Since then, many cryptocurrencies have been created [2]. Some cryptocurrency projects, using the Bitcoin source code, issue coins via some inheritance of the state of the Bitcoin ledger. They are called fork projects of Bitcoin, or Bitcoin “forks”. These new projects run on their own sets of rules, similar to but different from those Bitcoin runs on. When a “fork” happens, holders of Bitcoin receive new ‘forked’ coins on the new blockchain, simply because that chain is derived from the Bitcoin blockchain. This is quite a special behavior that would never happen in the stock market or the currency market. It implies that the total supply of the Bitcoin family, that is, bitcoin-like cryptocurrencies, is continuously rising. Though the Bitcoin supply itself may not increase, the rising supply of Bitcoin-like cryptocurrencies may cause the price of bitcoin to go down slightly.

W. Dai (B) School of Finance, Central University of Finance and Economics, Beijing, China e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_8


Thus, it is important and urgent to build models that describe how the number of these projects grows. While collecting data, we found that there are 105 Bitcoin fork projects in total. Of those, 68 are considered active projects relevant to holders of Bitcoin. The remaining ones are considered historic and are no longer developed. It seems that enough data have been observed to build a model for estimating the number of fork projects in the future. In this chapter, we propose a simple stochastic model, using a birth-death-immigration process, to show how Bitcoin forks. Population processes under the influence of various types of catastrophes have been studied by Brockwell et al. [3] and Kyriakidis [4]. We add some parameters to this model to make it work well in the cryptocurrency scenario. This chapter is organized as follows. In Sect. 8.2, we introduce some basic knowledge of fork projects. We then introduce our model in Sect. 8.3, estimate the parameters in Sect. 8.4, and show numerical results in Sect. 8.5. Finally, we give a brief conclusion.

8.2 Fork Project

In this section, we recall some basic knowledge of Bitcoin and how a fork of its blockchain works, which helps in understanding the forking model. The Bitcoin blockchain is a public ledger that records bitcoin transactions. It is implemented as a chain of blocks, each block containing a hash of the previous block, back to the genesis block of the chain. A network of communicating nodes running bitcoin software maintains the blockchain. On many occasions recently, a group of individuals has decided, for a variety of reasons, that it would prefer a different ruleset for bitcoin; the reasons are often described on the project's homepage or in the project announcement post. Implied is a declaration of a different ruleset imposed by a different set of code, often a change to a different proof of work, consensus algorithm or block size (Fig. 8.1).

Fig. 8.1 Forked chain

According to the Bitcoin blockchain, these rule changes are invalid, and mining under them would not produce a valid block to add to the bitcoin blockchain. Miners and nodes governed by the rules of the forked project are able to produce blocks that are valid for them and, in turn, consider new bitcoin blocks to be invalid and do not add them to their chain. This is where a fork happens: what was the bitcoin blockchain is now split into two separate and incompatible projects.
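The mechanism described above can be illustrated with a toy hash chain. This is a deliberately simplified sketch and not Bitcoin's actual data structures or consensus code: the "ruleset" is reduced to a label carried by each block, and the names `block_hash`, `make_block`, `accepts` and `FORK_HEIGHT` are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 of the block's canonical JSON form (toy stand-in for a block hash)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(prev_block, ruleset, data):
    """Create a block that commits to its predecessor through prev_hash."""
    prev_hash = block_hash(prev_block) if prev_block is not None else None
    return {"prev_hash": prev_hash, "ruleset": ruleset, "data": data}


def accepts(chain, ruleset, fork_height):
    """Would a node running `ruleset` accept this chain?

    Every block must link to its predecessor, and every block at or above
    fork_height must have been produced under the node's own ruleset.
    """
    for height, block in enumerate(chain):
        if height > 0 and block["prev_hash"] != block_hash(chain[height - 1]):
            return False
        if height >= fork_height and block["ruleset"] != ruleset:
            return False
    return True


genesis = make_block(None, "bitcoin", "genesis")
block_1 = make_block(genesis, "bitcoin", "tx batch 1")
shared_history = [genesis, block_1]
FORK_HEIGHT = len(shared_history)

# Both projects extend the same parent block but under different rules.
bitcoin_chain = shared_history + [make_block(block_1, "bitcoin", "tx batch 2")]
forked_chain = shared_history + [make_block(block_1, "fork", "tx batch 2")]

if __name__ == "__main__":
    print(accepts(bitcoin_chain, "bitcoin", FORK_HEIGHT))
    print(accepts(forked_chain, "bitcoin", FORK_HEIGHT))
    print(accepts(forked_chain, "fork", FORK_HEIGHT))
```

Because both chains share the history below the fork height, balances recorded there exist on both chains, which is why holders of Bitcoin end up with coins on the forked chain as well.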

8.3 Population Model

In this section, we introduce our population model based on the birth-death-immigration process. The process is formulated by letting $N(t)$ represent the number of fork projects, regarded as the population, at time $t$. Define $P_n(t)$ as the probability that $N(t)$ equals $n$ at time $t$:

$$P_n(t) = \Pr\{N(t) = n \mid N(0) = 0\}.$$

The expected value of $N(t)$ is defined as

$$M(t) = E[N(t)],$$

and the probability-generating function $G(z,t)$ is defined as

$$G(z,t) = \sum_{k=0}^{\infty} P_k(t)\, z^k.$$

A new project can be derived not only from the original Bitcoin project but also from previous fork projects. We also know that new technology brings up new projects. Thus, we consider that births of new projects occur at a rate proportional to the population size and depending on other factors such as the market. We set the birth rate as

$$\lambda_n = f(n, p_t)\,\lambda + g(t),$$

where $\lambda$ is a constant (a fixed birth rate), $n$ is the population size, $p_t$ is the price of Bitcoin, $f(x,y)$ is the function that combines these factors, and $g(t)$ is the birth rate caused by technology upgrading, that is, the immigration rate. Further, we set the death rate as

$$\mu_n = f(n, p_t)\,\mu,$$

where $\mu$ is a fixed death rate. According to the definition of the classical BDI process, in a small interval $h$ we have

$$E[N(t+h) \mid N(t)] - N(t) = (\lambda_n - \mu_n)h + o(h) = \big(f(n,p_t)(\lambda-\mu) + g(t)\big)h + o(h).$$


For a simple case, assume that $f(x,y) = x$ and $g(t) = \alpha$. Taking the expected value of both sides, we have

$$\frac{M(t+h) - M(t)}{h} = (\lambda-\mu)M(t) + \alpha + \frac{o(h)}{h}.$$

Letting $h$ tend to 0, we have

$$M'(t) = (\lambda-\mu)M(t) + \alpha.$$

Given $N(0) = n_0$, the solution is

$$M(t) = \frac{\alpha}{\lambda-\mu}\left(\exp\{(\lambda-\mu)t\} - 1\right) + n_0 \exp\{(\lambda-\mu)t\}.$$

It is clear that $M(t)$ is the estimate of the number of fork projects at time $t$ for given parameters $\mu$, $\lambda$ and $g$.
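As a sanity check on the simple case, one can simulate the process event by event and compare the sample mean with a numerical solution of the mean equation. The sketch below assumes the rates $n\lambda + \alpha$ for a birth and $n\mu$ for a death, as in the simple case above, and uses a standard Gillespie-type simulation. The function names and the parameter values in the demo are illustrative assumptions, not the estimates of this chapter.

```python
import random


def simulate_bdi(n0, lam, mu, alpha, t_end, rng):
    """One path of the linear birth-death-immigration process; returns N(t_end)."""
    n, t = n0, 0.0
    while True:
        birth_rate = n * lam + alpha   # births from existing projects plus immigration
        death_rate = n * mu
        total = birth_rate + death_rate
        if total <= 0.0:
            return n
        t += rng.expovariate(total)    # exponential waiting time to the next event
        if t > t_end:
            return n
        if rng.random() * total < birth_rate:
            n += 1
        else:
            n -= 1


def mean_by_ode(n0, lam, mu, alpha, t_end, dt=1e-3):
    """Euler integration of M'(t) = (lam - mu) * M(t) + alpha with M(0) = n0."""
    m = float(n0)
    for _ in range(int(round(t_end / dt))):
        m += dt * ((lam - mu) * m + alpha)
    return m


def simulated_mean(n0, lam, mu, alpha, t_end, runs=2000, seed=1):
    """Average of N(t_end) over independent simulated paths."""
    rng = random.Random(seed)
    total = sum(simulate_bdi(n0, lam, mu, alpha, t_end, rng) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    print(mean_by_ode(10, 0.8, 0.3, 1.0, 1.0))
    print(simulated_mean(10, 0.8, 0.3, 1.0, 1.0))
```

The two numbers should agree up to Monte Carlo error, which shrinks like the inverse square root of the number of runs.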

8.4 Parameter Estimation

In this section, we propose a maximum likelihood estimation of the parameters of the simple case of our model from the observed data. The forward Kolmogorov equation shows that

$$P_n'(t) = (n+1)\mu P_{n+1}(t) + ((n-1)\lambda + \alpha)P_{n-1}(t) - (n(\mu+\lambda) + \alpha)P_n(t).$$

Thus, the generating function

$$G(z,t) = \sum_{k=0}^{\infty} P_k(t)\, z^k$$

satisfies the equation

$$\frac{\partial G(z,t)}{\partial t} = (\lambda z - \mu)(z-1)\frac{\partial G(z,t)}{\partial z} + \alpha(z-1)G(z,t).$$

Assuming that $\lambda \neq \mu$, we have

$$G(z,t) = \frac{1}{\alpha(\lambda-\mu)} \left(\frac{\mu-\lambda}{\mu - \lambda z + \lambda(z-1)e^{(\lambda-\mu)t}}\right)^{\alpha/\lambda} \cdot {}_2F_1\!\left(\frac{\alpha}{\lambda},\, 1,\, 1+\frac{\alpha}{\lambda},\, (\mu-\lambda z)e^{-(\lambda-\mu)t}\right),$$

where ${}_2F_1(a,b;c,z)$ is the hypergeometric function defined by

$${}_2F_1(a,b;c,z) = \sum_{k=0}^{\infty} \frac{(a)_k (b)_k}{(c)_k}\,\frac{z^k}{k!}.$$

Here $(q)_n$ is the (rising) Pochhammer symbol, defined by $(q)_n = q(q+1)\cdots(q+n-1)$. Then we obtain

$$P_n(t) = \frac{1}{n!}\left.\frac{d^n G(z,t)}{dz^n}\right|_{z=0}.$$

If a series of data $\{(N_1,t_1), (N_2,t_2), \cdots, (N_k,t_k)\}$ is observed, the likelihood function is

$$L(N_1, N_2, \cdots, N_k, t_1, t_2, \cdots, t_k; \lambda, \mu, \alpha) = \prod_{j=0}^{k} P_{N_{j+1}-N_j}(t_{j+1}-t_j).$$

With the maximum likelihood method, using a genetic algorithm, we can estimate the parameters.
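The probabilities that enter a likelihood can also be obtained without the closed form, by integrating the forward Kolmogorov equation of this section numerically on a truncated state space. The following is a hedged sketch of that alternative, not the procedure used in this chapter. The truncation level `n_max`, the Euler step `dt` and the function name are assumptions, and births out of the last retained state are switched off so that total probability is conserved.

```python
def transition_probabilities(n0, lam, mu, alpha, t_end, n_max=200, dt=1e-3):
    """Approximate P_n(t_end), n = 0..n_max, for N(0) = n0 by explicit Euler steps."""
    p = [0.0] * (n_max + 1)
    p[n0] = 1.0
    for _ in range(int(round(t_end / dt))):
        new_p = p[:]
        for n in range(n_max + 1):
            # Probability mass leaving state n during one step.
            up = (n * lam + alpha) * p[n] * dt if n < n_max else 0.0
            down = n * mu * p[n] * dt
            new_p[n] -= up + down
            if n < n_max:
                new_p[n + 1] += up
            if n > 0:
                new_p[n - 1] += down
        p = new_p
    return p


if __name__ == "__main__":
    probs = transition_probabilities(5, 0.771, 0.222, 0.998, 1.0)
    mean = sum(n * q for n, q in enumerate(probs))
    print(round(sum(probs), 6), round(mean, 3))
```

A log-likelihood for observed counts could then be assembled from such probabilities and handed to any optimiser; the genetic algorithm mentioned above is one option among several.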

8.5 Numerical Results

To obtain an empirical result, we collected data from the website http://forkdrop.io and fitted our model. It is easy to mark a fork project's birth: it is the time the project started. It is hard, however, to mark a fork project's death. Many projects make no progress in development, yet the founders remain active on social networks and make people believe that the project is still alive and worth investing in. In the end, we set our criteria as follows: we consider a fork project dead when (1) its homepage is down, (2) its developer/founder is out of contact, and (3) the project never launched. We obtained 118 observations of this birth-death process, shown in Fig. 8.2. We then estimate the parameters with the maximum likelihood method and obtain the numerical result α = 0.998, μ = 0.222 and λ = 0.771. With these parameters, we can calculate the expected number of fork projects in the next year by the following formula; the estimated curve is shown in Fig. 8.3.

$$M(t) = \frac{\alpha}{\lambda-\mu}\left(\exp\{(\lambda-\mu)t\} - 1\right) + n_0 \exp\{(\lambda-\mu)t\}.$$

We have $M(0) = n_0 = 68$ and $M(1) = 119.07$, where $n_0 = 68$ is the current number of active projects. This implies that in the next year about 50 more fork projects will appear and stay active.
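The reported value of M(1) can be reproduced directly from the formula with the estimated parameters. The short sketch below does this; the function name `expected_forks` and the handling of the limiting case λ = μ are additions for illustration.

```python
import math


def expected_forks(t, lam, mu, alpha, n0):
    """Closed-form mean M(t) of the simple birth-death-immigration model."""
    rate = lam - mu
    if abs(rate) < 1e-12:
        # Limit of the formula as lam -> mu: linear growth from immigration only.
        return n0 + alpha * t
    growth = math.exp(rate * t)
    return alpha / rate * (growth - 1.0) + n0 * growth


if __name__ == "__main__":
    # Estimated parameters and initial count as reported in this section.
    for years in (0.0, 0.5, 1.0):
        print(years, round(expected_forks(years, 0.771, 0.222, 0.998, 68), 2))
```

Since λ − μ > 0 for these estimates, the projection grows exponentially, so extrapolating far beyond the one-year horizon used here should be done with caution.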


Fig. 8.2 Birth-death process of fork projects

Fig. 8.3 Estimated growth of fork projects

8.6 Conclusions

In this chapter, we proposed a birth-death-immigration process to estimate the number of Bitcoin fork projects and estimated its parameters. With this model, we predict that the number of fork projects will increase to 119 in the next year.


References

1. Nakamoto, S.: Bitcoin: A Peer-to-Peer Electronic Cash System. bitcoin.org (2008)
2. Greenberg, A.: Crypto Currency. Forbes. Archived from the original on 31 August 2014. Retrieved 8 August 2014
3. Brockwell, P.J., Gani, J., Resnick, S.I.: Birth, immigration and catastrophe processes. Adv. Appl. Probab. 14(4), 709–731 (1982)
4. Kyriakidis, E.G.: Stationary probabilities for a simple immigration-birth-death process under the influence of total catastrophes. Statist. Probab. Lett. 20(3), 239–240 (1994)

Chapter 9

Patterns Versus Spatial Heterogeneity—From a Variational Viewpoint Izumi Takagi

Abstract By a pattern we usually mean a spatially nontrivial structure and hence its antonym is spatial homogeneity. Alan Turing found that, in a reaction-diffusion system of two species, different diffusion rates can destabilize a spatially uniform state, leading to spontaneous formation of a pattern. This chapter proposes to generalize the notion of pattern to that of spatially heterogeneous environments and to build a unified theory of spontaneous emergence of patterns against spatially homogeneous or heterogeneous backgrounds.

9.1 Introduction

In the pioneering paper [21], Turing proposed that diffusion-driven instability (DDI) might account for the spontaneous formation of patterns in developmental biology. DDI is the destabilization of a spatially homogeneous state caused by the reaction of two chemical substances, called morphogens, with different diffusivities. This is a counter-intuitive idea since diffusion is a flattening process. By linearized stability analysis he showed that DDI does occur under an appropriate condition. This has made the paper cited often in the context of linearized instability of constant steady states. However, he proposed many other things: he introduced a hypothetical chemical system and used a system of nonlinear diffusion equations to obtain spatial patterns by computer simulations. The paper includes a one-dimensional pattern and a two-dimensional pattern. He discussed the possibility of applying his morphogen theory to phyllotaxis, tentacle formation of hydra, and so on. In the final section of [21] Turing wrote:

The ‘wave’ theory which has been developed here depends essentially on the assumption that the reaction rates are linear functions of the concentrations, an assumption which is justifiable in the case of a system just beginning to leave a homogeneous condition. Such systems certainly have a special interest as giving the first appearance of a pattern, but they are the exception rather than the rule.

I. Takagi (B) Institute of Mathematical Sciences, Renmin University of China, No. 59 Zhongguancun Street, Haidian District, Beijing, China e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_9

The linear theory is not a goal, only the beginning. He continues:

Most of an organism, most of the time, is developing from one pattern into another, rather than from homogeneity into a pattern. One would like to be able to follow this more general process mathematically also. The difficulties are, however, such that one cannot hope to have any very embracing theory of such processes, beyond the statement of the equations. It might be possible, however, to treat a few particular cases in detail with the aid of a digital computer. … It might even be possible to take the mechanical aspects of the problem into account as well as the chemical, when applying this type of method. … The morphogen theory of phyllotaxis, to be described, as already mentioned, in a later paper, will be covered by this computational method. Nonlinear equations will be used.

Unfortunately the paper on phyllotaxis never appeared due to his death. However, it is clear that he was interested in the morphogen theory applicable to entire embryonic development: pattern formation in spatially heterogeneous environments. The first stage may be the formation of a pattern from the uniform state, but the second and later stages need to be pattern formation against a spatially heterogeneous background. Spatial patterns are not limited to developmental biology, but are observed in many occasions such as in fluid dynamics, in chemical reactions, and in ecological systems and so on (see [13] for various aspects of biological pattern formation). These patterns emerge from more or less spatially heterogeneous background and we have already developed theories to investigate pattern formation by perturbation of spatially uniform environments (see, e.g., [11]). It is to be emphasized that there is a sizable amount of literature on population dynamics in spatially heterogeneous environments (see, e.g., [2, 4, 8] for systematic studies in this direction), which deals with large deviation from spatial homogeneity. In this article we are interested in the emergence of new patterns from spatially heterogeneous environments. It is natural to expect the existence of stable steady states subordinate to the given spatial heterogeneity; we would like to call such stable steady states backgrounds. We explore the possibility of building a theory to deal with the existence of patterns (i.e., stable steady states) other than backgrounds.

9.2 Pattern Formation in Single Equations

Let $\Omega$ be a bounded domain in $\mathbb{R}^N$ with smooth boundary $\partial\Omega$. Let $\nu = (\nu_1, \ldots, \nu_N)$ denote the unit outer normal to $\partial\Omega$. Let $(a_{ij}(x))_{1\le i,j\le N}$ be an $N \times N$ symmetric matrix defined on $\Omega$ such that

$$\lambda_0 |\xi|^2 \le \sum_{i,j=1}^{N} a_{ij}(x)\,\xi_i \xi_j \le \lambda_0^{-1} |\xi|^2 \quad \text{for all } \xi \in \mathbb{R}^N,\ x \in \Omega.$$

9 Patterns Versus Spatial Heterogeneity—From a Variational Viewpoint


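We assume that the $a_{ij}(x)$ are sufficiently smooth on Ω. This section reviews some known results on pattern formation in the initial-boundary value problem for a single parabolic equation:

\[ \frac{\partial u}{\partial t} = D\mathcal{A}(x)u + f(u, x) \quad \text{for } x \in \Omega,\ t > 0, \tag{9.1} \]
\[ \mathcal{B}(x)u = 0 \quad \text{for } x \in \partial\Omega,\ t > 0, \tag{9.2} \]
\[ u(x, 0) = u_0(x) \quad \text{for } x \in \Omega, \tag{9.3} \]

where D is a positive constant,

\[ \mathcal{A}(x)u = \sum_{i,j=1}^{N} \frac{\partial}{\partial x_i}\Bigl( a_{ij}(x) \frac{\partial u}{\partial x_j} \Bigr), \qquad \mathcal{B}(x)u = \sum_{i,j=1}^{N} a_{ij}(x)\,\nu_i\,\frac{\partial u}{\partial x_j}, \]

and the initial function $u_0(x)$ is assumed to be sufficiently smooth.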

9.2.1 Allen-Cahn Equation with Variable Coefficients

In this subsection we consider the case

\[ f(u, x) = r(x)\,(u - a(x))\,(u - b(x))\,(c(x) - u), \tag{9.4} \]

where a(x), b(x), c(x) and r(x) are sufficiently smooth functions defined on Ω and a(x) < b(x) < c(x), r(x) > 0 for all x ∈ Ω. When $a(x) \equiv a_0$, $b(x) \equiv b_0$, $c(x) \equiv c_0$, $r(x) \equiv r_0$ and $a_{ij}(x) = \delta_{ij}$, we immediately see that (9.1)–(9.2) has three constant steady-state solutions: $u(x) = a_0$, $u(x) = b_0$ and $u(x) = c_0$. Moreover, it is easy to check that $u(x) = a_0$ and $u(x) = c_0$ are asymptotically stable, whereas $u(x) = b_0$ is unstable. By the classical result of Casten and Holland [3], there is no stable nonconstant steady-state solution if Ω is convex. On the other hand, it is known that there is a stable nonconstant steady-state solution for some non-convex domains (Matano [10]). Therefore, in the case of constant coefficients, we can say that patterns appear only for a special type of (non-convex) domains, if we regard a stable nonconstant steady-state solution as a pattern.

In one spatial dimension, a pioneering paper by Hale and Sakamoto [7] investigated the general case of nonconstant coefficients and proved the existence of steady-state solutions with transition layers for D > 0 sufficiently small. In fact, they proved the following:

Theorem 9.1 (Hale-Sakamoto) Let Ω = (0, 1). Assume that $a'(0) = a'(1) = 0$, $c'(0) = c'(1) = 0$. Let


I. Takagi

\[ J(x) = \int_{a(x)}^{c(x)} f(s, x)\, ds \quad \text{for } x \in [0, 1]. \]

Assume that there exist n points $x_j \in (0, 1)$, $x_j < x_{j+1}$ for j = 1, ..., n − 1, such that

(i) $J(x_j) = 0$, j = 1, ..., n;
(ii) $J'(x_j) \ne 0$, j = 1, ..., n;
(iii) $\int_{a(x_j)}^{u} f(s, x_j)\, ds < 0$ for u in the open interval $(a(x_j), c(x_j))$, j = 1, ..., n.

Then there exists a constant $D_0 > 0$ for which the following statements (a) and (b) hold:

(a) There exist two families of steady-state solutions $a_D(x)$ and $c_D(x)$ of (9.1)–(9.2) for $0 < D < D_0$ such that

\[ \max_{0 \le x \le 1} |a_D(x) - a(x)| \to 0, \qquad \max_{0 \le x \le 1} |c_D(x) - c(x)| \to 0 \quad \text{as } D \to 0. \]

Moreover, $a_D(x)$ and $c_D(x)$ are asymptotically stable.

(b) There exist two families of steady-state solutions $u_{n,\pm}(x, D)$ of (9.1)–(9.2) for $0 < D < D_0$ such that

\[ \lim_{D \to 0} u_{n,+}(x, D) = \begin{cases} a(x) & \text{on } \Omega_0, \\ c(x) & \text{on } \Omega_1, \end{cases} \qquad \lim_{D \to 0} u_{n,-}(x, D) = \begin{cases} c(x) & \text{on } \Omega_0, \\ a(x) & \text{on } \Omega_1, \end{cases} \]

uniformly on compact subsets. Here $\Omega_0$ and $\Omega_1$ are defined by

\[ \Omega_0 = \begin{cases} [0, x_1) \cup (x_2, x_3) \cup \dots \cup (x_n, 1] & \text{if } n \text{ is even}, \\ [0, x_1) \cup (x_2, x_3) \cup \dots \cup (x_{n-1}, x_n) & \text{if } n \text{ is odd}, \end{cases} \]
\[ \Omega_1 = \begin{cases} (x_1, x_2) \cup \dots \cup (x_{n-1}, x_n) & \text{if } n \text{ is even}, \\ (x_1, x_2) \cup \dots \cup (x_{n-2}, x_{n-1}) \cup (x_n, 1] & \text{if } n \text{ is odd}. \end{cases} \]

Therefore, we obtain a steady-state solution which has n interior transition layers. We call $\{x_1, x_2, \dots, x_n\}$ the layer positions of $u_{n,\pm}(x, D)$. It is clear that a layer position belongs to the set {x ∈ (0, 1) | 2b(x) = a(x) + c(x)}.

In addition, they considered the eigenvalue problem for the linearized problem around $u_{n,\pm}(x, D)$:

\[ \mathcal{L}_{n,\pm,D}\,φ = D\mathcal{A}(x)φ + f_u(u_{n,\pm}(x, D), x)\,φ = λφ \quad \text{in } \Omega, \]


under the non-flux boundary condition $\mathcal{B}(x)φ = 0$ on ∂Ω. Let $\{\lambda_{j,\pm,D}\}_{j=1}^{\infty}$ be the eigenvalues of $\mathcal{L}_{n,\pm,D}$. Since N = 1, we see that all eigenvalues are simple and may be numbered as $\lambda_{1,\pm,D} > \lambda_{2,\pm,D} > \dots > \lambda_{j,\pm,D} > \lambda_{j+1,\pm,D} > \dots \downarrow -\infty$. They proved that the first n eigenvalues $\lambda_{1,\pm,D} > \lambda_{2,\pm,D} > \dots > \lambda_{n,\pm,D}$ tend to zero as D → 0, while there exists a positive constant $\mu_0$ such that $\lambda_{j,\pm,D} \le -\mu_0$ for all j ≥ n + 1. Moreover, they gave a formula to compute the limit of $\lambda_{j,\pm,D}/\sqrt{D}$ as D → 0 (see Sect. 5 of [7]). For instance, consider the case where a(x) ≡ 0, c(x) ≡ 1, r(x) ≡ 1 and $a_{11}(x) \equiv 1$, i.e., the equation reduces to $u_t = D u_{xx} + u(u - b(x))(1 - u)$. If there exists a unique $x_1 \in (0, 1)$ such that $b(x_1) = 1/2$ and if in addition $b'(x_1) < 0$, then the monotone increasing solution $u_{1,+}(x, D)$ is asymptotically stable. We remark that Ai, Chen and Hastings [1] proved the existence of steady states with multiple layers and spikes and determined the Morse index of such solutions. For more on the nonlinearity (9.4), see [1] and the references therein.

Let us call the solutions $a_D(x)$ and $c_D(x)$ in Theorem 9.1 the primary patterns or the backgrounds. These are asymptotically stable steady-state solutions which are close to a(x) and c(x), respectively. If there exists an asymptotically stable steady-state solution which is different from the primary patterns, we would like to call it a secondary pattern or de novo pattern. In one-dimensional domains, there are no secondary patterns for equations with constant coefficients; however, there does exist a secondary pattern if the coefficients depend on x in an appropriate way (Fig. 9.1).

Fig. 9.1 Above: coefficients a(x), b(x), c(x); Left: primary pattern a D (x), Center: primary pattern c D (x), Right: secondary pattern
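The statement in Sect. 9.2.1 that layer positions lie in {2b = a + c} can be checked directly. For the cubic (9.4), integrating in s gives J(x) = r(x)(c(x) − a(x))³(a(x) + c(x) − 2b(x))/12, so J vanishes exactly where 2b(x) = a(x) + c(x). The following is a minimal sketch of that check, not part of the original analysis: it evaluates J by Simpson quadrature, compares it with the closed form, and locates the zero by bisection. The coefficient choices (a ≡ 0, c ≡ 1, r ≡ 1, b(x) = 1/2 + 0.3 cos πx) and all function names are illustrative assumptions.

```python
import math

# Hypothetical coefficients chosen only for illustration (not from the text).
a = lambda x: 0.0
c = lambda x: 1.0
r = lambda x: 1.0
b = lambda x: 0.5 + 0.3 * math.cos(math.pi * x)  # b(1/2) = 1/2, b'(1/2) < 0

def f(u, x):
    """Bistable nonlinearity (9.4)."""
    return r(x) * (u - a(x)) * (u - b(x)) * (c(x) - u)

def J(x, n=200):
    """J(x) = integral of f(s, x) ds over [a(x), c(x)], composite Simpson rule (n even)."""
    lo, hi = a(x), c(x)
    h = (hi - lo) / n
    total = f(lo, x) + f(hi, x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h, x)
    return total * h / 3.0

def J_closed(x):
    """Closed form of J for the cubic nonlinearity (9.4)."""
    return r(x) * (c(x) - a(x)) ** 3 * (a(x) + c(x) - 2.0 * b(x)) / 12.0

def bisect(g, lo, hi, tol=1e-12):
    """Plain bisection for a sign change of g on [lo, hi]."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

x1 = bisect(J, 0.2, 0.9)                      # candidate layer position
slope = (J(x1 + 1e-6) - J(x1 - 1e-6)) / 2e-6  # finite-difference J'(x1)
print(f"layer position x1 = {x1:.6f}, J'(x1) = {slope:.4f}")
```

With these choices the zero of J coincides with the point where b = 1/2, and J′ is nonzero there, so conditions (i) and (ii) of Theorem 9.1 hold for this example; condition (iii) is not checked here.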


9.2.2 A Mono-Stable Nonlinearity

Next, in this subsection we consider the case

\[ f(u, x) = -b(x)u + c(x)u_+^{p} + \delta\sigma(x), \tag{9.5} \]

where $u_+ = \max\{u, 0\}$; b(x), c(x) are positive smooth functions on Ω, whereas σ(x) is a smooth function satisfying 0 ≤ σ(x) ≤ 1 on Ω and $\max_{x \in \Omega} \sigma(x) = 1$; δ is a nonnegative constant. The exponent p is assumed to satisfy 1 < p < (N + 2)/(N − 2) if N > 2 and 1 < p < ∞ if N = 1, 2. Here we are interested in the stationary problem:

\[ D\mathcal{A}(x)u - b(x)u + c(x)u_+^{p} + \delta\sigma(x) = 0 \quad \text{in } \Omega, \tag{9.6} \]
\[ \mathcal{B}(x)u = 0 \quad \text{on } \partial\Omega. \tag{9.7} \]

We introduce an energy functional associated with (9.6):

\[ J_D(u) = \int_{\Omega} \Bigl\{ \frac{D}{2} \sum_{i,j=1}^{N} a_{ij}(x) \frac{\partial u}{\partial x_i} \frac{\partial u}{\partial x_j} + \frac{b(x)}{2} u^2 - \frac{c(x)}{p+1} u_+^{p+1} - \delta\sigma(x) u \Bigr\}\, dx. \tag{9.8} \]

It is known that a critical point $u \in H^1(\Omega)$ of $J_D(u)$ is a weak solution of (9.6)–(9.7), and by the standard elliptic regularity theory, it is a classical solution. We observe that, if δ = 0, then u ≡ 0 is a solution of (9.6)–(9.7). Moreover, $f_u(0, x) = -b(x) \le -\min_{x \in \Omega} b(x)$. By making use of this property we can prove that there exists a unique solution $u_{m,D}(x)$ satisfying $0 < u_{m,D}(x) \le \delta / \min_{x \in \Omega} b(x)$, provided that δ > 0 is sufficiently small. In the case δ = 0 we define $u_{m,D}(x) \equiv 0$, and call $u_{m,D}(x)$ the minimum solution of (9.6)–(9.7). It is clear that the minimum solution is asymptotically stable, and the functional $J_D(u)$ attains a strict local minimum at $u_{m,D}$.

Let us put $I_D(v) = J_D(u_{m,D} + v) - J_D(u_{m,D})$. Then $I_D(v)$ attains a strict local minimum at v = 0. For any nonnegative function $φ \in H^1(\Omega) \setminus \{0\}$, we see that the function $t \mapsto I_D(tφ)$ for t ≥ 0 has exactly two critical points t = 0 and t = T, and $I_D(tφ) \to -\infty$ as t → +∞. Therefore, there exists an $e \in H^1(\Omega) \setminus \{0\}$ such that $I_D(e) = 0$; moreover, one can check that $I_D(v)$ satisfies the Palais-Smale condition. Hence, by the Mountain Pass Lemma we obtain a positive critical value $c_D$ of $I_D(v)$. Furthermore, $c_D$ can be characterized by

\[ c_D = \inf_{φ \in H^1(\Omega) \setminus \{0\},\ φ(x) \ge 0}\ \max_{t \ge 0} I_D(tφ). \tag{9.9} \]


We can prove that there exists a positive constant $\alpha_0$ such that $\alpha_0 D^{N/2} \le c_D \le \alpha_0^{-1} D^{N/2}$ (see Sects. 2.2 and 2.4 of [20]). Let $v_D$ be a critical point of $I_D(v)$ with critical value $c_D$. Then $u_D = u_{m,D} + v_D$ is a critical point of $J_D(u)$, that is, $u_D$ is a classical solution of (9.6)–(9.7) other than the minimum solution $u_{m,D}$. We call $u_D$ a ground-state solution and $v_D$ the mountain pass part of $u_D$. In [20] we obtained a detailed description of the asymptotic profile of $u_D$ as D → 0. We introduce the primary locator function Φ(Q) for Q ∈ Ω by

\[ \Phi(Q) = b(Q)^{2/(p-1) + 1 - N/2}\, c(Q)^{-2/(p-1)}\, \bigl( \det(a_{ij}(Q)) \bigr)^{1/2}, \tag{9.10} \]

which treats the case δ = 0. In order to handle the case δ > 0, we introduce a few quantities. Let $u_m(Q)$ be the smaller root of the algebraic equation f(ζ, Q) = 0 in ζ > 0, so that $f(u_m(Q), Q) = 0$. Note that $u_m(Q) = O(\delta)$ as δ → 0. Put $\gamma(Q) = (c(Q)/b(Q))^{1/(p-1)}\, u_m(Q)$. For γ ≥ 0 sufficiently small, let $w_\gamma(z)$ be the unique positive solution of the following boundary value problem in $\mathbb{R}^N$:

\[ \Delta w - w + (\gamma + w)^p - \gamma^p = 0 \quad \text{in } \mathbb{R}^N, \tag{9.11} \]
\[ \lim_{|z| \to \infty} w(z) = 0, \tag{9.12} \]
\[ w(0) = \max_{z \in \mathbb{R}^N} w(z). \tag{9.13} \]

We define the energy of $w_\gamma$ by

\[ I(\gamma) = \frac{1}{2} \int_{\mathbb{R}^N} \bigl( |\nabla w_\gamma|^2 + w_\gamma^2 \bigr)\, dz - \frac{1}{p+1} \int_{\mathbb{R}^N} \bigl\{ (\gamma + w_\gamma)^{p+1} - \gamma^{p+1} - (p+1)\gamma^p w_\gamma \bigr\}\, dz. \]

Finally the locator function Λ(Q) is defined by

\[ \Lambda(Q) = I(\gamma(Q))\, \Phi(Q). \tag{9.14} \]

Theorem 9.2 ([20]) Let δ ≥ 0 be sufficiently small. Suppose that $\{u_D\}_{D>0}$ is a family of ground-state solutions of (9.6)–(9.7). Then for D sufficiently small, $u_D$ has exactly one local maximum, hence the global maximum. Let $P_D$ be the maximum point of $u_D$ and assume that $P_{D_j} \to P_0 \in \overline{\Omega}$ as $D_j \downarrow 0$.

(i) If $\min_{\overline{\Omega}} \Lambda(Q) < \frac{1}{2} \min_{\partial\Omega} \Lambda(Q)$, then $P_0 \in \Omega$ and $\Lambda(P_0)$ is the global minimum of Λ(Q) over $\overline{\Omega}$.
(ii) If $\min_{\overline{\Omega}} \Lambda(Q) > \frac{1}{2} \min_{\partial\Omega} \Lambda(Q)$, then $P_0 \in \partial\Omega$ and $\Lambda(P_0)$ is the minimum of Λ(Q) over ∂Ω.


Let us call $P_0$ in the theorem above the concentration point for $\{u_D\}$. If δ = 0, then γ(Q) ≡ 0 and hence Λ(Q) = I(0)Φ(Q), where I(0) is a positive constant. Therefore, in this case we can replace Λ(Q) in Theorem 9.2 with the primary locator function Φ(Q), and the concentration point is either a global minimum point or a minimum point over ∂Ω of Φ(Q).

Remark 9.1 Ren [19] considered Problem (9.6)–(9.7) in the case where only c(x) depends on x. Wang and Zeng [22] considered semilinear elliptic equations $h^2 \Delta u - V(x)u + K(x)|u|^{p-1}u + Q(x)|u|^{q-1}u = 0$, with $1 < q < p < (N + 2)/(N - 2)_+$, in $\mathbb{R}^N$ and abstracted a function called the ground-energy function, which is the same as the primary locator function in the case Q(x) ≡ 0.

Remark 9.2 In the case where all coefficients are constant, the ground-state solution is concentrated around a maximum point of the mean curvature function H(Q), Q ∈ ∂Ω, of the boundary ∂Ω as D → 0 (see [14, 15]). Hence, in this case we may regard H(Q) as the primary locator function. It is also known that in the constant coefficient case, for a given positive integer K, there exists a family of solutions which are concentrated around K points $\{P_1, P_2, \dots, P_K\}$ in the interior of the domain. The concentration points $P_j$ appear as the centers of K congruent, nonoverlapping spheres $S_R(P_j)$ of the maximum radius R such that $\bigcup_{j=1}^{K} S_R(P_j) \subset \Omega$ (see, e.g., [6]). Therefore, we can say that in uniform environments patterns are determined by the geometry of the domain. Problem (9.6)–(9.7) was considered also by Omel’chenko and Recke [18], but their approach is not variational.

Remark 9.3 For the mono-stable nonlinearity (9.5), we have at least two solutions: the minimum solution $u_{m,D}(x)$ and the ground-state solution $u_D(x)$ for any D > 0. The minimum solution is asymptotically stable; on the other hand, the ground-state solution is unstable, since the linearized operator $D\mathcal{A}(x) + f_u(u_D(x), x)$ has a positive eigenvalue. We may call $u_{m,D}(x)$ a primary pattern or a background state, and this is the only stable steady-state solution.
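The interior-versus-boundary dichotomy of Theorem 9.2 in the case δ = 0 (where Λ may be replaced by Φ) can be illustrated numerically. The following is a minimal sketch under stated assumptions, not an implementation from [20]: it works on the interval (0, 1) with N = 1, p = 2, b ≡ 1, a₁₁ ≡ 1, and a hypothetical coefficient c(x) = 1 + h·exp(−50(x − 1/2)²); the function names and the grid are illustrative choices.

```python
import math

def primary_locator(b, c, det_a, p, N):
    """Phi(Q) from (9.10), evaluated from the pointwise values b(Q), c(Q), det(a_ij(Q))."""
    return (b ** (2.0 / (p - 1) + 1.0 - N / 2.0)
            * c ** (-2.0 / (p - 1))
            * math.sqrt(det_a))

def classify(xs, phi):
    """Apply the criterion of Theorem 9.2 (delta = 0) on the grid xs over [xs[0], xs[-1]]."""
    i_all = min(range(len(xs)), key=lambda i: phi[i])      # minimiser over the closure
    i_bdry = min((0, len(xs) - 1), key=lambda i: phi[i])   # minimiser over the boundary
    if phi[i_all] < 0.5 * phi[i_bdry]:
        return "interior", xs[i_all]
    if phi[i_all] > 0.5 * phi[i_bdry]:
        return "boundary", xs[i_bdry]
    return "borderline", None  # the theorem makes no statement in this case

xs = [i / 200 for i in range(201)]

def locate(height, p=2.0, N=1):
    """Hypothetical bump in c(x) of the given height, centred at x = 1/2."""
    c = lambda x: 1.0 + height * math.exp(-50.0 * (x - 0.5) ** 2)
    phi = [primary_locator(1.0, c(x), 1.0, p, N) for x in xs]
    return classify(xs, phi)

strong = locate(1.0)   # max c = 2 exceeds sqrt(2) times the boundary value
weak = locate(0.2)     # max c = 1.2 does not
print(strong, weak)
```

In this setting Φ = c⁻², so the interior case requires the maximum of c to exceed √2 times its boundary maximum; a mild bump is not enough to pull the concentration point away from the boundary.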

9.3 Pattern Formation in Systems of Equations

9.3.1 An Activator-Inhibitor System

In Turing’s morphogen theory, finding nonlinearities that lead to pattern formation is the most important task. Twenty years after Turing’s paper, Gierer and Meinhardt [5] proposed a few types of reaction-diffusion systems which are capable of producing patterns. Among them is the activator-inhibitor model consisting of

• a slowly diffusing activator (auto- and cross-catalytic enhancer)
• and a rapidly diffusing inhibitor (cross-catalytic suppressor).


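Let A(x, t) and H(x, t) denote the respective concentrations of the activator and the inhibitor. Then the activator-inhibitor model reads

\[ \frac{\partial A}{\partial t} = D_a \Delta A - \mu_a A + \rho_a \Bigl( \frac{c_a A^p}{H^q (1 + \kappa A^p)} + \rho_0 \Bigr) \quad \text{for } x \in \Omega,\ t > 0, \tag{9.15} \]
\[ \frac{\partial H}{\partial t} = D_h \Delta H - \mu_h H + \rho_h \frac{c_h A^r}{H^s} \quad \text{for } x \in \Omega,\ t > 0, \tag{9.16} \]
\[ \frac{\partial A}{\partial \nu} = \frac{\partial H}{\partial \nu} = 0 \quad \text{for } x \in \partial\Omega,\ t > 0, \tag{9.17} \]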

where $D_a$, $D_h$, $c_a$, $c_h$ are positive constants, κ is a nonnegative constant, and $\mu_a(x)$, $\mu_h(x)$, $\rho_a(x)$, $\rho_h(x)$ are positive functions, while $\rho_0(x)$ is a nonnegative function. The exponents p > 0, q > 0, r > 0, s ≥ 0 are assumed to satisfy

\[ 0 < \frac{p - 1}{q} < \frac{r}{s + 1}. \tag{9.18} \]

(… > 0, a different type of pattern appears, see, e.g., [12]).

In the extreme case $D_h \to +\infty$ the situation becomes much simpler: the inhibitor H(x, t) converges to a spatial constant ξ(t) as $D_h \to +\infty$. Integrating the second equation over Ω and using the non-flux boundary condition to get $\int_{\Omega} \mathcal{A}(x)H\, dx = 0$, we are led to the shadow system:

\[ \frac{\partial A}{\partial t} = D_a \mathcal{A}(x)A - \mu_a(x)A + \rho_a(x) \Bigl( \frac{c_a A^p}{\xi^q} + \rho_0(x) \Bigr), \]
\[ \frac{d\xi}{dt} = -\frac{1}{|\Omega|} \int_{\Omega} \mu_h(x)\, dx\; \xi + \frac{1}{\xi^s |\Omega|} \int_{\Omega} \rho_h(x) c_h A^r\, dx, \]

under the boundary condition $\mathcal{B}(x)A = 0$ on ∂Ω. Let (A(x), ξ) be a stationary solution of the shadow system. If we put $A(x) = \xi^{q/(p-1)} u(x)$, then the new unknown (u(x), ξ) satisfies

\[ D_a \mathcal{A}(x)u - \mu_a(x)u + \rho_a(x) \bigl( c_a u^p + \rho_0(x)\, \xi^{-q/(p-1)} \bigr) = 0 \quad \text{in } \Omega, \qquad \mathcal{B}(x)u = 0 \quad \text{on } \partial\Omega, \]
\[ \xi^{-\mu} = \frac{\int_{\Omega} \rho_h(x) c_h u^r\, dx}{\int_{\Omega} \mu_h(x)\, dx}, \quad \text{where } \mu = \frac{qr}{p - 1} - (s + 1) \ (> 0 \text{ by (9.18)}). \]

In the case $\rho_0(x) \equiv 0$, the problem reduces to finding a solution u of (9.6)–(9.7) with δ = 0, and ξ is determined uniquely for each solution u. By Theorem 9.2, we know that the shadow system (with $\rho_0 \equiv 0$) has a family of steady-state solutions $\{(A_{D_a}(x), \xi_{D_a})\}_{D_a > 0}$ which exhibits a point-condensation phenomenon: for sufficiently small $D_a > 0$, $A_{D_a}(x)$ has exactly one local maximum point $P_{D_a} \in \overline{\Omega}$. Let $P_0 \in \overline{\Omega}$ be an accumulation point of $\{P_{D_a}\}$. Then $P_0$ is either (i) a global minimum point of the primary locator function Φ(Q) or (ii) a minimum point of Φ(Q) over the boundary ∂Ω, depending on the structure of Φ(Q). Note also that $\xi_{D_a} \to +\infty$ as $D_a \downarrow 0$. Moreover, we can prove that $A_{D_{a,j}}(x) \to 0$ locally uniformly in $\overline{\Omega} \setminus \{P_0\}$, along any sequence $D_{a,j} \downarrow 0$. The stability question of point-condensation solutions is treated in [16, 23] in the case of constant coefficients; the method in [16] can be applied to variable coefficients [17].

It is interesting to consider the case where $a_{ij}(Q) \equiv \delta_{ij}$, $\mu_a(Q) \equiv 1$, p = 2, δ = 0. Then $\Phi(Q) = \rho_a(Q)^{-2}$. Therefore, in order to have an interior maximum point (for the ground-state solution), we require

\[ \min_{Q \in \overline{\Omega}} \rho_a(Q)^{-2} < \frac{1}{2} \min_{Q \in \partial\Omega} \rho_a(Q)^{-2}, \quad \text{that is,} \quad \max_{Q \in \overline{\Omega}} \rho_a(Q) > \sqrt{2} \max_{Q \in \partial\Omega} \rho_a(Q). \]

This seems to explain why Gierer and Meinhardt had to assume strong heterogeneity in the coefficient $\rho_a(x)$ in their simulation of the hydra transplantation experiment.
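The passage from the locator-function condition to the factor √2 is a one-line computation, spelled out here for convenience. It uses only the positivity of $\rho_a$ and the fact that minimizing $\rho_a^{-2}$ is the same as maximizing $\rho_a$:

```latex
\begin{align*}
\min_{Q \in \overline{\Omega}} \rho_a(Q)^{-2} < \frac{1}{2} \min_{Q \in \partial\Omega} \rho_a(Q)^{-2}
&\iff \frac{1}{\bigl(\max_{\overline{\Omega}} \rho_a\bigr)^{2}} < \frac{1}{2\,\bigl(\max_{\partial\Omega} \rho_a\bigr)^{2}} \\
&\iff \bigl(\max_{\overline{\Omega}} \rho_a\bigr)^{2} > 2\,\bigl(\max_{\partial\Omega} \rho_a\bigr)^{2} \\
&\iff \max_{\overline{\Omega}} \rho_a > \sqrt{2}\, \max_{\partial\Omega} \rho_a .
\end{align*}
```

In words, an interior peak of the ground state requires the production coefficient to exceed its largest boundary value by more than about 41% somewhere in the domain.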

9.3.2 Two-Stage Model

When a piece of tissue near the head of another hydra is grafted onto the middle of the body column of a hydra, a new head is formed in the body column [5]. In the activator-inhibitor model of Gierer and Meinhardt, the activator is a substance which promotes the formation of a head. Therefore, to simulate the transplantation experiment by the activator-inhibitor model, it is important that the model have a stable steady-state solution with a local maximum in the interior of the interval. From the analysis of the shadow system, we know that the locator function plays an essential role in controlling the location of local maxima. In an ongoing project with Maini and Yamamoto [9], we propose a modified activator-inhibitor model, which has two stages of processes:

Stage 1 Quick enhancement of the pre-pattern stored in the basic production term $\sigma_a(x)$. Small changes in $\sigma_a$ may be attributed to the cell age. This rapid process adjusts the internal condition of each cell and maximizes the reactions between activator and inhibitor (intra-cellular process).

Stage 2 The classical pattern formation via the reaction-diffusion mechanism (inter-cellular process).

We did simulations by using the following model system:


Fig. 9.2 Patterns by the activator-inhibitor model. Left: Two stage model. Right: Constant coefficients. Φ(x) is the primary locator function. Both are obtained as the limit of the solution of the initial-boundary value problem with the same initial condition

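\[ \frac{\partial A}{\partial t} = D_a \frac{\partial^2 A}{\partial x^2} - \mu_a(x, t)A + \rho_a(x, t)\, c_a \frac{A^2}{H} + \sigma_a(x), \tag{9.19} \]
\[ \tau \frac{\partial H}{\partial t} = D_h \frac{\partial^2 H}{\partial x^2} - \mu_a(x, t)H + \rho_a(x, t)\, c_a A^2, \tag{9.20} \]
\[ \tau_a \frac{\partial \mu_a}{\partial t} = \sigma_a(x) - \gamma_{\mu_a} \log(\mu_{a0} + \mu_a(x, t)), \tag{9.21} \]
\[ \tau_a \frac{\partial \rho_a}{\partial t} = \sigma_a(x) - \gamma_{\rho_a} \log(\rho_{a0} + \rho_a(x, t)). \tag{9.22} \]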

In the simulations we take τ = 0.5, $\gamma_{\mu_a} = 0.2$, $\gamma_{\rho_a} = 0.15$, $\tau_a = 0.5$ (Fig. 9.2). Our simulations suggest the following:

(1) Uniform media slow down the formation of patterns.
(2) The two-stage model produces a pattern much faster than the single-stage model.
(3) In the two-stage model, a small built-in pattern in the basic production term can determine the final pattern.
(4) Local minimum points of the locator function trigger the formation of heterogeneity in the activator concentration, but cell-to-cell communication by way of diffusion regulates the distance between adjacent peaks.
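Stage 1 of the model can be examined in isolation, because (9.21) and (9.22) involve neither A nor H: at each point x they are scalar relaxation equations whose equilibrium is exp(σ_a(x)/γ) minus the offset, so a small difference in σ_a is amplified exponentially. The sketch below is an illustration under stated assumptions, not the simulation code behind Fig. 9.2: it integrates (9.21) with an explicit Euler scheme, using γ_{μa} = 0.2 and τ_a = 0.5 from the text, while the offset μ_{a0} = 1, the two σ_a values, the initial value, and the step size are hypothetical.

```python
import math

def relax(sigma, gamma, offset, tau_a, t_end=200.0, dt=0.01, y0=1.0):
    """Explicit Euler for tau_a * dy/dt = sigma - gamma * log(offset + y), cf. (9.21)-(9.22)."""
    y = y0
    for _ in range(int(round(t_end / dt))):
        y += dt * (sigma - gamma * math.log(offset + y)) / tau_a
    return y

def equilibrium(sigma, gamma, offset):
    """Steady state of the relaxation equation: log(offset + y) = sigma / gamma."""
    return math.exp(sigma / gamma) - offset

GAMMA_MU, TAU_A = 0.2, 0.5      # values quoted in the text
MU_A0 = 1.0                     # hypothetical offset (not given in the text)
SIGMA_LOW, SIGMA_HIGH = 0.30, 0.33  # hypothetical pre-pattern values, 10% apart

mu_low = relax(SIGMA_LOW, GAMMA_MU, MU_A0, TAU_A)
mu_high = relax(SIGMA_HIGH, GAMMA_MU, MU_A0, TAU_A)
print(f"mu_a(low) = {mu_low:.4f}, mu_a(high) = {mu_high:.4f}, "
      f"ratio = {mu_high / mu_low:.3f}")
```

Under these assumptions the relative difference between the two limiting values of μ_a is larger than the relative difference in σ_a, which is one way to read the phrase "enhancement of the pre-pattern" in Stage 1.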

References

1. Ai, S., Chen, X., Hastings, S.P.: Layers and spikes in non-homogeneous bistable reaction-diffusion equations. Trans. Am. Math. Soc. 358, 3169–3206 (2006)
2. Cantrell, R.S., Cosner, C.: Spatial Ecology via Reaction-Diffusion Equations. Wiley, Chichester (2003)
3. Casten, R.G., Holland, C.J.: Instability results for reaction diffusion equations with Neumann boundary conditions. J. Differ. Equ. 27, 266–273 (1978)
4. Cosner, C., Lou, Y.: Does movement toward better environments always benefit a population? J. Math. Anal. Appl. 277, 489–503 (2003)
5. Gierer, A., Meinhardt, H.: A theory of biological pattern formation. Kybernetik (Berlin) 12, 30–39 (1972)
6. Gui, C., Wei, J.: Multiple interior peak solutions for some singularly perturbed Neumann problems. J. Differ. Equ. 158, 1–27 (1999)
7. Hale, J., Sakamoto, K.: Existence and stability of transition layers. Jpn. J. Appl. Math. 5, 367–405 (1988)
8. Hutson, V., Lou, Y., Mischaikow, K.: Spatial heterogeneity of resources versus Lotka-Volterra dynamics. J. Differ. Equ. 185, 97–136 (2002)
9. Maini, P., Takagi, I., Yamamoto, H.: A two-stage model of activator-inhibitor type in developmental biology (in preparation)
10. Matano, H.: Asymptotic behavior and stability of solutions of semilinear diffusion equations. Publ. RIMS, Kyoto Univ. 15, 401–454 (1979)
11. Mimura, M., Nishiura, Y.: Spatial patterns for an interaction-diffusion equation in morphogenesis. J. Math. Biol. 7, 243–263 (1979)
12. Mimura, M., Tabata, M., Hosono, Y.: Multiple solutions of two point boundary value problems of Neumann type with a small parameter. SIAM J. Math. Anal. 11, 613–631 (1980)
13. Murray, J.D.: Mathematical Biology, vol. II, 3rd edn. Springer, New York/Berlin/Heidelberg (2003)
14. Ni, W.-M., Takagi, I.: On the shape of least-energy solutions to a semilinear Neumann problem. Commun. Pure Appl. Math. 44, 819–851 (1991)
15. Ni, W.-M., Takagi, I.: Locating the peaks of least-energy solutions to a semilinear Neumann problem. Duke Math. J. 70, 247–281 (1993)
16. Ni, W.-M., Takagi, I., Yanagida, E.: Stability of least energy patterns of the shadow system for an activator-inhibitor model. Jpn. J. Indust. Appl. Math. 18, 259–272 (2001)
17. Ni, W.-M., Takagi, I., Yanagida, E.: Stability analysis of point-condensation solutions to a reaction-diffusion system (in preparation)
18. Omel’chenko, O., Recke, L.: Existence, local uniqueness and asymptotic approximation of spike solutions to singularly perturbed elliptic problem. Hiroshima Math. J. 45, 35–89 (2015)
19. Ren, X.: Least-energy solution to a nonautonomous semilinear problem with small diffusion coefficient. Electron. J. Differ. Equ. No. 05, approx. 21 pp. (1993)
20. Takagi, I., Yamamoto, H.: Locator function for concentration points in a spatially heterogeneous semilinear Neumann problem. Indiana Univ. Math. J. 68, 63–103 (2019)
21. Turing, A.M.: The chemical basis of morphogenesis. Phil. Trans. Roy. Soc. B 237, 37–72 (1952)
22. Wang, X., Zeng, B.: On concentration of positive bound states of nonlinear Schrödinger equations with competing potential functions. SIAM J. Math. Anal. 28, 633–655 (1997)
23. Wei, J., Winter, M.: Mathematical Aspects of Pattern Formation in Biological Systems. Springer, London (2014)

Chapter 10

A Summary: Quantifying the Complexity of Financial Markets Using Composite and Multivariate Multiscale Entropy

Yunfan Lu and Zhiyong Zheng

Abstract This is a summary that introduces composite multiscale entropy analysis and multivariate multiscale entropy analysis as two new attempts to measure the overall complexity of the stock market; the results also provide new input dimensions for measuring financial risk. According to the combined results of the ensemble empirical mode decomposition and the composite multiscale entropy analysis, the investment risk in the Chinese stock market may be relatively low, possibly because of the Chinese government’s supervision of the stock market. The multivariate multiscale sample entropy is improved to quantify the complexity of multi-channel data over different time scales. By extending this method to the financial field, the complexity of the four ternary return sequences generated by each stock trading time in the Chinese stock market is quantified for the first time. We find that as the stock trading time increases, the complexity of the three-variable return series per hour shows a significant downward trend. As another new attempt, the complexity of the global stock market (Asia, Europe and the United States) is quantified by analyzing the multivariate returns of the global stock market.

Y. Lu (B)
School of Mathematics, Renmin University of China, Beijing, China
e-mail: [email protected]

Y. Lu · Z. Zheng
Engineering Research Center of Financial Computing and Digital Engineering, Ministry of Education, Beijing, China

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_10

10.1 Introduction

Modern finance is a fast-developing scientific field that integrates multiple disciplines. Correspondingly, in recent years, the demand for financial engineering, financial mathematics and financial statistics has increased in the financial field. In these subject areas, it is generally believed that a more intuitive understanding of standard mathematical theory is needed to better train scientists


Y. Lu and Z. Zheng

and financial engineers responsible for financial risks and derivative product pricing. The advantage of the mathematical framework is that it makes risk issues more transparent. Currently, much activity is dedicated to creating and developing new methods to measure and control financial risks, to price derivative products, and to assist decision-making in transaction design. For example, the treatment of joint behaviors of asset returns, including volatility clustering, extreme correlations, and cross-sectional changes in returns, plays an important role in risk management [1]. As a special complex evolutionary nonlinear system, the financial market exhibits empirically observed stylized facts, such as the fat-tail phenomenon and the power law of logarithmic returns [2–5], long-term memory and volatility clustering [6], autocorrelation and cross-correlation [7, 8], multifractality of return volatility [9–11], complexity of the financial system [12], etc. These studies, which aim to expand and integrate the known stylized facts, are also essential for risk management and have attracted wide attention from researchers in nonlinear financial dynamic systems. Today, there are excellent research results on the nonlinear nature of financial markets [5, 9, 10, 13–17]. One direction that has received much attention is to quantify the complexity of financial markets by analyzing financial time series. The multiscale entropy (MSE) method proposed by Costa et al. [18] has become a useful and popular method to quantify signal complexity in different research fields, such as biomedical and physiological time series [26], vibration of rotating machinery [25], gait patterns of people of different ages [19, 20] and financial time series [12]. The MSE method evaluates the sample entropy of a univariate time series coarse-grained on multiple scales, which reveals long-term correlations over a range of time scales in complex systems; it has been shown to distinguish between different degrees of complexity in physiological time series. However, research also shows that the MSE method has certain limitations in different applications. In order to solve such problems, some derived methods based on the MSE method have been proposed. The composite multiscale entropy (CMSE) algorithm was proposed to improve the reliability of distinguishing time series generated by different systems. In order to extend univariate MSE to the multivariate case, Ahmed and Mandic introduced multivariate sample entropy (MSampEn) and evaluated its evolution over multiple time scales to perform multivariate multiscale entropy (MMSE) analysis [21]. MMSE has shown great advantages in handling any number of data channels at the same time, and provides a dynamic complexity measure for multivariate data observed from the same system, especially when the system has large uncertainty or potential coupling. This method has been further validated on synthetic and real-world multivariate processes, such as gait, wind and physiological data [21–23]. In this article, we summarize the application of composite multiscale entropy (CMSE) and multivariate multiscale entropy (MMSE) to quantifying the complexity of financial markets by analyzing stock returns. This is the first application of MMSE to the analysis of multivariate time series from financial markets.

10 A Summary: Quantifying the Complexity of Financial Markets …


10.2 Quantifying the Complexity with Composite Multiscale Entropy Algorithm

10.2.1 Composite Multiscale Entropy Algorithm

The modern financial market is a complex evolutionary nonlinear dynamic system, and the complexity of its time series has long been a focus in financial economics. Multiscale entropy (MSE) analysis proposed by Costa et al. [18] has become an important method to evaluate the complexity of time series by quantifying the sample entropy on multiple time scales. The MSE method has been successfully applied in different research fields, including biomedical signals, rainfall time series, electric shock time series, vibration of rotating machinery, and financial time series [12, 24–29]. The conventional MSE algorithm includes two steps: coarse-graining the time series, and computing the sample entropy of the coarse-grained series. For detailed steps, see [16]. Since the length of each coarse-grained time series is equal to the length of the original time series divided by the scale factor τ, the variance of the entropy estimate increases as the coarse-grained series becomes shorter. Under large scale factors, the estimation error of the conventional MSE algorithm can therefore be very large, which reduces the reliability of distinguishing time series generated by different systems. In order to overcome this limitation, the composite multiscale entropy (CMSE) algorithm modifies the first step. For a given one-dimensional discrete time series $x = \{x_1, x_2, \dots, x_N\}$, the kth coarse-grained time series of scale factor τ is defined as $y_k^{(\tau)} = \{y_{k,1}^{(\tau)}, y_{k,2}^{(\tau)}, \dots, y_{k,p}^{(\tau)}\}$ [12, 30], where

\[ y_{k,j}^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau+k}^{j\tau+k-1} x_i, \qquad 1 \le j \le N/\tau,\ 1 \le k \le \tau. \tag{10.1} \]

For a clearer explanation, Fig. 10.1a shows the first step of the CMSE procedure. In the second step of the algorithm, the sample entropy of every coarse-grained time series is calculated for the given scale factor τ. The CMSE value is then the average of these entropy values:

\[ \mathrm{CMSE}(x, \tau, m, \gamma) = \frac{1}{\tau} \sum_{k=1}^{\tau} \mathrm{SampEn}\bigl(y_k^{(\tau)}, m, \gamma\bigr). \tag{10.2} \]

Figure 10.1b shows the flow chart of the CMSE algorithm. It has been demonstrated in reference [12] that for white noise and 1/f noise, the error bars (variances) obtained by CMSE are shorter than those obtained by MSE. This shows that the CMSE method has advantages in effectively detecting the behavior of noise signals and improving the estimation accuracy for short time series.
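The two steps described above, coarse-graining by (10.1) and averaging sample entropies by (10.2), can be sketched in a few lines. This is a minimal illustration rather than the implementation used in [12] or [30]. Several details are assumptions of the sketch: only averaging windows that lie entirely inside the series are kept (so the series for k > 1 may be one point shorter), two templates are counted as matching when their Chebyshev distance is at most γ, and NaN is returned when no matches are found. The function names are hypothetical.

```python
import math
import random

def coarse_grain(x, tau, k):
    """k-th coarse-grained series at scale tau as in (10.1); k is 1-based.
    Assumption: windows that would run past the end of x are dropped."""
    start = k - 1
    n_windows = (len(x) - start) // tau
    return [sum(x[start + j * tau: start + (j + 1) * tau]) / tau
            for j in range(n_windows)]

def sample_entropy(y, m, gamma):
    """SampEn(y, m, gamma) = -ln(A / B), where B counts pairs of length-m templates
    within Chebyshev distance gamma and A counts pairs that also match at length m + 1."""
    n = len(y)
    a = b = 0
    for i in range(n - m):
        for j in range(i + 1, n - m):
            if all(abs(y[i + t] - y[j + t]) <= gamma for t in range(m)):
                b += 1
                if abs(y[i + m] - y[j + m]) <= gamma:
                    a += 1
    if a == 0 or b == 0:
        return float("nan")  # entropy undefined without matches
    return -math.log(a / b)

def cmse(x, tau, m, gamma):
    """Composite multiscale entropy (10.2): mean SampEn over the tau coarse-grained series."""
    values = [sample_entropy(coarse_grain(x, tau, k), m, gamma)
              for k in range(1, tau + 1)]
    return sum(values) / tau

# Demonstration on synthetic white noise (not market data).
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(800)]
mean = sum(noise) / len(noise)
sigma = math.sqrt(sum((v - mean) ** 2 for v in noise) / len(noise))
gamma = 0.15 * sigma  # tolerance tied to the standard deviation of the input series
curve = {tau: cmse(noise, tau, 3, gamma) for tau in (1, 2, 3)}
print(curve)
```

The tolerance γ is computed once from the original series and reused at every scale, which follows the parameter choice γ = 0.15σ quoted in the empirical part of this section. The quadratic pair loop is adequate for series of a few thousand points; longer series would need a faster neighbour search.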


Fig. 10.1 a Schematic diagram of the coarse-grained process in the CMSE algorithm. b A simple flowchart of the CMSE algorithm

For empirical research, we selected the daily closing prices of seven global stock indexes, namely: Dow Jones Industrial Average (DJI, United States), Financial Times Stock Exchange (FTSE, United Kingdom), Cotation Assistée en Continu (CAC, France), Deutscher Aktienindex (DAX, Germany), Hang Seng Index (HSI, Hong Kong, China), Shanghai Stock Exchange (SSE, China) and Shenzhen Stock Exchange (SZSE, China). The data cover September 2004 to September 2014 (no data on weekends and holidays), and each stock index has about 2500 closing prices. By applying the CMSE algorithm, we calculated the CMSE values of the returns of the seven stock indexes (DJI, FTSE, CAC, DAX, HSI, SSE and SZSE) for scale factors from 1 to 60. According to Ref. [30], the two parameters in the algorithm are set to m = 3 and γ = 0.15σ, where σ is the standard deviation of the input time series. Table 10.1 and Fig. 10.2 show the experimental results of applying the CMSE algorithm to the seven return time series.

Table 10.1 CMSE results for seven real stock returns (DJI, FTSE, CAC, DAX, HSI, SSE and SZSE)

Scale   1       3       5       7       10      13      17      20      25      30      40      60
DJI     1.7498  1.2219  1.0277  0.8703  0.7227  0.6608  0.5592  0.5007  0.4285  0.3578  0.2456  0.1675
FTSE    1.8521  1.2911  1.0583  0.8950  0.7752  0.6813  0.5617  0.4884  0.4240  0.3737  0.2992  0.2279
CAC     2.1216  1.4737  1.1939  1.0474  0.8966  0.8297  0.7250  0.6513  0.5646  0.4935  0.4120  0.3318
DAX     2.0594  1.5393  1.3012  1.1458  0.9834  0.8838  0.7793  0.7011  0.6119  0.5348  0.4273  0.3057
HSI     2.2009  1.6705  1.4750  1.2880  1.1126  0.9725  0.8982  0.8034  0.7170  0.6340  0.5404  0.3747
SSE     2.3872  1.9302  1.7274  1.5337  1.3818  1.3371  1.2574  1.1589  1.0461  0.9954  0.8111  0.6300
SZSE    2.6002  2.0945  1.8572  1.7036  1.5414  1.5023  1.4074  1.3105  1.1703  1.0501  0.9689  0.7577

To better characterize the financial market, we should consider not only the specific CMSE values but also the trend of the CMSE curve [12, 18]. At the first scale, the CMSE value of each return series attains its maximum. When the time scale factor increases from 1 to 40, each CMSE value decreases monotonically overall, and then gradually stabilizes to a constant value. This result indicates that, unlike 1/f noise [12, 18], the analyzed stock indexes do not contain complex structures spanning multiple time scales over the range from 1 to 40. However, the tendency to stabilize at large scales indicates that they may contain complex structures on large time scales. Interestingly, the CMSE curves of these seven stock indexes have a clear order, which depends on the size of the CMSE value on multiple time scales. The CMSE curve of the Shenzhen Stock Exchange is significantly higher than all other curves, which indicates that it has the highest complexity. The second is SSE, followed by HSI, DAX, CAC, FTSE and DJI. There are also two pairs of curves, DJI and FTSE, and CAC and DAX, which are close to each other across multiple time scales. A unifying concept is that physiological complexity is fundamentally related to the adaptability of organisms, which requires comprehensive, multi-scale functionality [18, 24]. Mapping this result to the financial market, we believe that when a return time series shows higher complexity, the stock market it represents has stronger system stability. The above results also indicate that China’s stock indexes have more pronounced complexity than the indexes of other countries, which may be because the Chinese government’s supervision of the stock market is stronger than that in other countries. In the results, DJI and FTSE, and CAC and DAX, have similar complexity attributes, respectively. This finding indicates that the curve of the CMSE value can be used as a useful tool to help separate time series representing different financial market indices.

10.2.2 The Combined Application of EEMD Method and CMSE Method

Empirical mode decomposition (EMD) [31], an adaptive and efficient method for time-frequency data analysis, has proven to be versatile in a wide range of applications. This method can extract a complete set of almost orthogonal intrinsic mode functions (IMFs) from complex signals generated by noisy nonlinear and nonstationary processes [31]. However, it has the disadvantage that mode mixing occurs frequently. To alleviate this defect, Wu and Huang [32] proposed a noise-assisted data analysis method, the ensemble empirical mode decomposition (EEMD). This method defines the true IMF component as the mean of an ensemble of trials, each consisting of the signal plus white noise of finite amplitude [31, 33]. By adding finite-amplitude noise, EEMD largely eliminates the mode mixing problem and retains the physical uniqueness of the decomposition. For detailed steps of EEMD, please refer to [16]. We selected the first six EEMD components


Y. Lu and Z. Zheng

Fig. 10.2 CMSE results for seven real stock returns (DJI, FTSE, CAC, DAX, HSI, SSE and SZSE)

Fig. 10.3 a, b The CMSE values and box plots of the first six IMFs for DJI and SZSE respectively

(IMF1 to IMF6) for CMSE analysis to study their complexity. We use the EEMD method to decompose the seven stock market return series, and then perform CMSE analysis on the resulting IMFs to investigate whether the component IMFs still have the complex properties of the original return series. The experimental results for DJI and SZSE are shown in Fig. 10.3a, b, respectively. The results clearly follow a similar trend; in particular, at the first scale the CMSE values decrease in order from IMF1 to IMF6. From IMF2 to IMF6, the CMSE curve develops a pronounced bend, unlike the curves of the original data and of IMF1, and the scale at which this first bend occurs increases with the IMF index. To further study the complexity of the IMFs, we re-tested the IMFs of the seven real stock index returns. The parameters γ used were obtained corresponding to


their own standard deviations. Figure 10.4 and Table 10.2 show the results of the CMSE analysis of the IMF sequences of the seven real stock index returns. In Fig. 10.4a, all CMSE curves of the IMF1s attain their maximum at the first scale and then show a clear monotonic downward trend. This is similar to the trend of the CMSE curves of the original data in Fig. 10.2. In Fig. 10.4b, unlike the original data and IMF1, the CMSE curves of the IMF2s bend and reach their peaks at scale 2, then gradually decrease. In Fig. 10.4c, all CMSE curves of the IMF3s bend (like those of the IMF2s) and reach their peaks at scale 4, then roughly decrease. In Fig. 10.4d, all CMSE curves of the IMF4s bend (like those of the IMF2s and IMF3s) and reach their peaks at scale 7 or 8, and then also decrease substantially. The peaks of the CMSE curves of the IMF5s in Fig. 10.4e occur later, in the scale range [12, 21], after which the curves begin to show small fluctuations. Similarly, the CMSE curves of the IMF6s in Fig. 10.4f reach their peaks no earlier than scale 24, and then show more obvious fluctuations than those of the IMF5s. At the same time, there is an interesting phenomenon in the above results: the order of the CMSE curves of all IMFs is roughly consistent with the order for the original data in Fig. 10.2. Therefore, the IMFs of the stock return series still have complex properties and roughly maintain the same order as the original data. However, the way their complexity is distributed across scales may differ from that of the original series.

10.3 Quantifying the Complexity Using Multivariate Multiscale Entropy Analysis

10.3.1 Definition of Multivariate Sample Entropy

Recently, Ahmed and Mandic [21] proposed multivariate sample entropy (MSampEn), which calculates the entropy of multi-channel data by considering both intra-channel and inter-channel dependencies, and introduced it into multivariate multiscale entropy (MMSE) analysis. We present MSampEn in Algorithm 1, which extends the standard univariate sample entropy of Ref. [18], and MMSE analysis in Algorithm 2. In Algorithm 1, the multivariate sample entropy is based on estimating the following conditional probability: that two similar sequences remain similar when the next data point is included. This is achieved by calculating the average number of neighboring delay vectors for a given tolerance level r and repeating the process after increasing the embedding dimension from m to m + 1. Figure 10.5 shows the principle of the multivariate sample entropy calculation for two high-frequency (five-minute interval) return series of the Shanghai Stock Exchange (SSE) and the Shenzhen Stock Exchange (SZSE), denoted x(t) and y(t), respectively. The data run from February 16, 2015 to February 18, 2015. For the sake of clarity, only 100 points are shown for each sequence in Fig. 10.5a. To illustrate the principle, when the embedding dimension m = 2, we assume the time lag vector τ = [1, 1] and the embedding


Fig. 10.4 a–f The CMSE values of IMF1s, IMF2s, IMF3s, IMF4s, IMF5s and IMF6s from seven real yield series respectively

Table 10.2 Scales of peak points of IMFs for the real data

Data   DJI   FTSE   CAC   DAX   HSI   SSE   SZSE
IMF1   1     1      1     1     1     1     1
IMF2   2     2      2     2     2     2     2
IMF3   4     4      4     4     4     4     4
IMF4   7     8      7     8     7     8     8
IMF5   16    14     15    12    15    17    21
IMF6   24    24     33    31    28    34    26

vector M = [1, 1], then the composite bivariate delay vector is [x(t), y(t)], as shown in Fig. 10.5b, where t represents time as the sample index. During the calculation of MSampEn, for any such vector (e.g., [x(42), y(42)]), we need to count the neighbors within the distance r (tolerance level), represented in Fig. 10.5b by a circle of radius r centered at [x(42), y(42)]. Then, when the embedding dimension is increased from m = 2 to m = 3, the embedding vector M evolves into two new embedding vectors, M = [2, 1] and M = [1, 2], according to Step 4 in Algorithm 1. This is why we have two possible subspaces: (I) the subspace of all vectors [x(t), x(t + 1), y(t)], shown in Fig. 10.5c; (II) the subspace of all vectors [x(t), y(t), y(t + 1)], shown in Fig. 10.5d. Similarly, for any such vector (e.g., [x(42), x(43), y(42)]), we also need to count the neighbors within the distance r (tolerance level), as shown in Fig. 10.5c by the sphere of radius r centered at [x(42), x(43), y(42)]. For any such vector (e.g., [x(42), y(42), y(43)]) in Fig. 10.5d, we perform a similar count. We adopt this strict approach and compare the composite delay vectors (to find neighbors) not only within each subspace but also across all subspaces, so as to fully account for within- and cross-channel correlations. From Algorithm 2, we obtain a multivariate MSE (MMSE) curve as the experimental result, in which MSampEn (as a function of the scale factor λ) is used to evaluate the relative complexity of normalized multi-channel time series. MMSE curves are interpreted as follows [21]:

(i) If, for most time scales, the MSampEn values of the multivariate time series X are greater than those of the multivariate time series Y, then the signal X has greater dynamic complexity than the signal Y.
(ii) A monotonic decrease of the multivariate MSE curve indicates that the multivariate time series X contains useful information in the smaller scale range, which is a typical feature of completely random time series.
(iii) If the MSampEn values remain constant across scales, or the multivariate MSE curve increases monotonically, the multivariate time series X exhibits long-range correlations and the multivariate system exhibits complex dynamics.

Algorithm 1. Multivariate sample entropy (MSampEn)

Step 1: For a $p$-variate time series $\{x_{k,i}\}_{i=1}^{N}$ ($k = 1, 2, \ldots, p$), where $N$ is the number of samples in every channel, the embedding vector is $M = [m_1, m_2, \ldots, m_p] \in \mathbb{R}^p$ and the delay vector is $\tau = [\tau_1, \tau_2, \ldots, \tau_p]$. By forming $N - \nu_m$ composite delay vectors,


Fig. 10.5 a The two high-frequency time series of SSE and SZSE returns (five-minute interval), represented as x(t) and y(t), respectively, each with a length of 100. b 2D graph of the composite delay vectors [x(t), y(t)]. c 3D graph of the vectors [x(t), x(t + 1), y(t)]. d 3D graph of the vectors [x(t), y(t), y(t + 1)]

the multivariate embedded reconstruction [36] is defined as

\[
X_m(i) = [x_{1,i}, x_{1,i+\tau_1}, \ldots, x_{1,i+(m_1-1)\tau_1}, x_{2,i}, x_{2,i+\tau_2}, \ldots, x_{2,i+(m_2-1)\tau_2}, \ldots, x_{p,i}, x_{p,i+\tau_p}, \ldots, x_{p,i+(m_p-1)\tau_p}], \quad i = 1, 2, \ldots, N - \nu_m, \tag{10.3}
\]

where $m = \sum_{k=1}^{p} m_k$ and $\nu_m$ is defined as $\nu_m = \max\{M\} \times \max\{\tau\}$.

Step 2: Referring to the maximum norm [34], we define the distance $d[X_m(i), X_m(j)]$ between any two composite delay vectors $X_m(i)$ and $X_m(j)$ as

\[
d[X_m(i), X_m(j)] = \max_{l = 1, \ldots, m} \{ |x(i + l - 1) - x(j + l - 1)| \}. \tag{10.4}
\]

Step 3: For a given composite delay vector $X_m(i)$ and a given threshold $r$, the number of instances $P_i^m$ satisfying $d[X_m(i), X_m(j)] \le r$ ($1 \le j \le N - \nu_m$, $j \ne i$) is counted, and its frequency of occurrence is calculated by

\[
B_i^m(r) = \frac{1}{N - \nu_m - 1} P_i^m.
\]

By taking the average value over all $i$, $B^m(r)$ is defined as

\[
B^m(r) = \frac{1}{N - \nu_m} \sum_{i=1}^{N - \nu_m} B_i^m(r). \tag{10.5}
\]

Step 4: The dimension in Step 1 is increased from $m$ to $m + 1$. Starting from the previous space with the embedding vector $M = [m_1, m_2, \ldots, m_k, \ldots, m_p]$ ($k = 1, 2, \ldots, p$), the system can evolve into $p$ different spaces with the new embedding vectors $[m_1, m_2, \ldots, m_k + 1, \ldots, m_p]$, obtained by letting the index $k$ run from 1 to $p$ while keeping the dimensions of the other variables unchanged. If $m_k = \max\{M\}$, then $\nu_{m+1}^k = (m_k + 1) \times \max\{\tau\} = (\max\{M\} + 1) \times \max\{\tau\} = \nu_m + \max\{\tau\}$; if $m_k \ne \max\{M\}$, then $\nu_{m+1}^k = \nu_m$. Thus a total of $\sum_{k=1}^{p} (N - \nu_{m+1}^k)$ vectors $X_{m+1}(i) \in \mathbb{R}^{m+1}$ ($i = 1, 2, \ldots, \sum_{k=1}^{p} (N - \nu_{m+1}^k)$) are obtained, and the overall embedding dimension of the system changes from $m$ to $m + 1$.

Step 5: For a given $X_{m+1}(i)$, the number of instances $P_i^{m+1}$ for which $d[X_{m+1}(i), X_{m+1}(j)] \le r$ ($1 \le j \le \sum_{k=1}^{p} (N - \nu_{m+1}^k)$, $j \ne i$) is counted, and its frequency of occurrence $B_i^{m+1}(r)$ is defined as

\[
B_i^{m+1}(r) = \frac{1}{\sum_{k=1}^{p} (N - \nu_{m+1}^k) - 1} P_i^{m+1}. \tag{10.6}
\]

Finally, by taking the average over all $i$ in the $(m + 1)$-dimensional space, $B^{m+1}(r)$ is defined as

\[
B^{m+1}(r) = \frac{1}{\sum_{k=1}^{p} (N - \nu_{m+1}^k)} \sum_{i=1}^{\sum_{k=1}^{p} (N - \nu_{m+1}^k)} B_i^{m+1}(r). \tag{10.7}
\]

Step 6: For the tolerance level $r$, MSampEn is estimated as

\[
\mathrm{MSampEn}(M, \tau, r, N) = -\ln \left[ \frac{B^{m+1}(r)}{B^m(r)} \right]. \tag{10.8}
\]
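The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, takes the data as an array of shape (p, N) that the caller has already normalized, uses 0-based indexing, and the function names (`msampen`, `_delay_vectors`, `_match_fraction`) are hypothetical. Neighbors are counted with the maximum norm over all components of the composite delay vectors, and at dimension m + 1 the vectors of all p subspaces are pooled before counting, as in Steps 4 and 5.

```python
import numpy as np


def _delay_vectors(X, M, tau, n_vec):
    """Stack n_vec composite delay vectors (Eq. 10.3) as rows of a 2-D array."""
    cols = []
    for k in range(len(M)):
        for j in range(M[k]):
            start = j * tau[k]  # offset of the j-th delayed copy of channel k
            cols.append(X[k, start:start + n_vec])
    return np.column_stack(cols)


def _match_fraction(V, r):
    """Average over i of B_i(r): fraction of other rows within max-norm distance r."""
    n = V.shape[0]
    matches = 0
    for i in range(n):
        dist = np.max(np.abs(V - V[i]), axis=1)  # maximum norm (Eq. 10.4)
        matches += np.count_nonzero(dist <= r) - 1  # exclude the self-match
    return matches / (n * (n - 1))


def msampen(X, M, tau, r):
    """Multivariate sample entropy of X (shape (p, N)); M and tau are length-p ints."""
    X = np.asarray(X, dtype=float)
    M, tau = list(M), list(tau)
    p, N = X.shape

    # Steps 1-3: N - nu_m vectors of total dimension m = sum(M).
    nu_m = max(M) * max(tau)
    B_m = _match_fraction(_delay_vectors(X, M, tau, N - nu_m), r)

    # Step 4: extend one channel at a time and pool the p subspaces.
    blocks = []
    for k in range(p):
        M_k = M.copy()
        M_k[k] += 1
        nu_k = max(M_k) * max(tau)
        blocks.append(_delay_vectors(X, M_k, tau, N - nu_k))

    # Step 5: count neighbors within and across subspaces.
    B_m1 = _match_fraction(np.vstack(blocks), r)

    # Step 6 (Eq. 10.8); the result is infinite if no neighbors exist at m + 1.
    return -np.log(B_m1 / B_m)
```

For a bivariate series with M = [1, 1] and τ = [1, 1], the pooled set built in Step 4 consists of the two subspaces illustrated in Fig. 10.5c and d. The pairwise loop costs on the order of the squared number of vectors, which is acceptable for series of a few thousand points but would need a faster neighbor search for longer data.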

Algorithm 2. Multivariate multiscale entropy (MMSE) analysis

Step 1: For a $p$-variate time series $\{x_{k,i}\}_{i=1}^{N}$ ($k = 1, 2, \ldots, p$), where $N$ is the number of samples in every channel and $\lambda$ is a scale factor, the corresponding coarse-grained multivariate time series is defined as

\[
y_{k,l}^{\lambda} = \frac{1}{\lambda} \sum_{i = \lambda(l-1)+1}^{\lambda l} x_{k,i}, \quad 1 \le l \le \frac{N}{\lambda}, \ k = 1, 2, \ldots, p. \tag{10.9}
\]

Step 2: By using the steps in Algorithm 1, the multivariate sample entropy MSampEn is evaluated for each coarse-grained multivariate time series $y_{k,l}^{\lambda}$, and MSampEn can then be expressed as a function of the scale factor $\lambda$.
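The coarse-graining in Eq. (10.9) and the loop over scale factors can be sketched as below. This is an illustrative sketch that assumes NumPy and data of shape (p, N); the names `coarse_grain` and `mmse_curve` are hypothetical, and the entropy estimator is passed in as a function so that the sketch does not depend on any particular implementation of Algorithm 1. When N is not a multiple of λ, the trailing samples that do not fill a complete window are dropped, which is one common reading of the bound l ≤ N/λ.

```python
import numpy as np


def coarse_grain(X, scale):
    """Average non-overlapping windows of length `scale` in each channel (Eq. 10.9)."""
    X = np.asarray(X, dtype=float)
    p, N = X.shape
    n_out = N // scale  # number of complete windows
    trimmed = X[:, :n_out * scale]
    return trimmed.reshape(p, n_out, scale).mean(axis=2)


def mmse_curve(X, scales, entropy_fn):
    """Evaluate entropy_fn on the coarse-grained series for every scale factor."""
    return [entropy_fn(coarse_grain(X, s)) for s in scales]
```

With scale 1 the series is returned unchanged, so the first point of the curve is the entropy of the original data.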


Fig. 10.6 a A schematic diagram illustrating the grouping process of 12 groups of five-minute returns. b The display of X 1 s for SSE and SZSE and the box plot of all 12 groups of five-minute returns for SSE and SZSE

10.3.2 Quantitative Analysis of the Complexity of China’s Stock Market

In this section [35], we apply the MMSE method to analyze the complexity of financial time series. Inspired by Ref. [12], we chose the high-frequency (five-minute interval) returns of the Shanghai Stock Exchange (SSE) and Shenzhen Stock Exchange (SZSE) from November 1, 2013 to February 18, 2016. Each series contains 26784 five-minute returns. The Chinese stock market trades 4 h a day, from 9:35 a.m. to 11:30 a.m. and from 1:00 p.m. to 3:00 p.m., so there are 48 high-frequency (five-minute interval) returns per trading day. We divide the 48 five-minute returns of each day into 12 groups of 20 minutes each, so that each group has 4 five-minute returns. Then we concatenate each group with the group covering the same time of day on the following trading days. Finally, 12 sequences of five-minute returns are generated, denoted X1, X2, ..., X12, each of length 2232. To better illustrate the grouping process, a visual display is shown in Fig. 10.6a: the pink $D_{k_1}$ is the trading day, the black $x_{k_2,k_3}$ is the five-minute return on each trading day, and the blue $X_{k_4}$ is a group of five-minute returns. The colored boxes clearly show the grouping results. To show the values of the five-minute returns of each group, the X1 series of SSE and SZSE were selected as representatives and are shown in Fig. 10.6b. The box plots of the 12 groups of five-minute returns of SSE and SZSE are also shown in Fig. 10.6b. The simple statistical properties expressed by these box plots clearly show the similarities (e.g. the mean of the data) and differences (e.g. the range of the data) between the groups of five-minute returns. Then we generate 4 three-variable time series to represent each trading hour: Y1 = [X1, X2, X3] for the 1st hour, Y2 = [X4, X5, X6] for the 2nd hour, Y3 = [X7, X8, X9] for the 3rd hour, and Y4 = [X10, X11, X12] for the 4th hour.
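The grouping described above can be sketched in Python. This is an illustrative sketch, not the authors' code: it assumes NumPy, assumes the returns are stored as one flat array ordered by trading day and then by time of day with exactly 48 values per day, and the function names are hypothetical.

```python
import numpy as np


def group_intraday_returns(returns, per_day=48, n_groups=12):
    """Split each day into n_groups consecutive blocks and join blocks across days.

    Returns an array of shape (n_groups, n_days * block_size); row j is X_{j+1}.
    """
    r = np.asarray(returns, dtype=float)
    if r.size % per_day != 0:
        raise ValueError("length must be a multiple of the returns per day")
    n_days = r.size // per_day
    block = per_day // n_groups  # 4 five-minute returns per 20-minute block
    by_day = r.reshape(n_days, n_groups, block)
    # Bring the group axis first, then flatten days and within-block positions.
    return by_day.transpose(1, 0, 2).reshape(n_groups, n_days * block)


def hourly_series(groups, per_hour=3):
    """Bundle consecutive groups into trivariate series Y1, Y2, ... (one per hour)."""
    return [groups[h:h + per_hour] for h in range(0, groups.shape[0], per_hour)]
```

For 26784 returns this gives 558 trading days and 12 rows of length 2232, matching the lengths quoted in the text.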
In order to study the complexity of the stock market in different trading hours, we applied MMSE analysis to study the four three-variable time series Y1 , Y2 , Y3 , Y4


Fig. 10.7 MMSE results of Y1 , Y2 , Y3 , Y4 from SSE and SZSE

(corresponding to 4 different trading hours) from the Shanghai Stock Exchange and Shenzhen Stock Exchange. The MMSE results are shown in Fig. 10.7, using equally scaled subplots for better presentation. Analyzing the MMSE curves in Fig. 10.7 together with the numerical MMSE results in Table 10.3, we find that all MMSE curves show an overall monotonic decrease, which means that the multivariate return series contain useful information at smaller scales. Across the time scales (especially the smaller ones), the MSampEn values of the four trivariate time series from the Shenzhen Stock Exchange are generally larger than the corresponding values from the Shanghai Stock Exchange. This indicates that the complexity of the hourly stock returns of the Shenzhen Stock Exchange is higher than that of the Shanghai Stock Exchange. In Fig. 10.7, for both SZSE and SSE, the MMSE values of Y1, Y2, Y3 and Y4 gradually decrease at smaller scales (scale factor τ ∈ [1, 5]), and tend to fall below 0.1 at larger scales (scale factor τ ∈ [15, 20]). The error bars (SD) of the mean curve in the last subplot show that as the scale factor increases, the difference in complexity between trading hours gradually decreases. This is also clear from Table 10.3: when τ = 1, the MSampEn values of Y1, Y2, Y3 and Y4 from SSE are 0.5766, 0.5297, 0.4428 and 0.3935, respectively, and those from SZSE are 0.7528, 0.6917, 0.5658 and 0.4366, respectively. These results show that the complexity of the stock market’s hourly returns differs between trading hours and shows a clear downward trend as the trading day progresses. From the results for the 4 trading hours, it can be seen that the system complexity of the stock market in the morning is significantly higher than that in the afternoon.


Table 10.3 MSampEn values of Y1, Y2, Y3, Y4 from SSE and SZSE

Data: SSE
Scale factor (τ)   1        2        3        5        7        10       12       15       20
Y1                 0.5766   0.4007   0.2806   0.1760   0.1567   0.1207   0.0938   0.0706   0.0638
Y2                 0.5297   0.4104   0.2928   0.2069   0.1646   0.1306   0.1210   0.0902   0.0699
Y3                 0.4428   0.3442   0.2548   0.1632   0.1373   0.1243   0.1136   0.0935   0.0959
Y4                 0.3935   0.2686   0.2078   0.1466   0.1029   0.1067   0.0992   0.0872   0.0772
Mean               0.4856   0.3560   0.2590   0.1732   0.1404   0.1206   0.1069   0.0854   0.0767
SD                 0.0717   0.0564   0.0326   0.0221   0.0238   0.0087   0.0109   0.0088   0.0120

Data: SZSE
Scale factor (τ)   1        2        3        5        7        10       12       15       20
Y1                 0.7528   0.5456   0.3935   0.2463   0.2149   0.1394   0.1038   0.0891   0.0754
Y2                 0.6917   0.5508   0.4003   0.2643   0.2025   0.1613   0.1446   0.0919   0.0701
Y3                 0.5658   0.4206   0.3092   0.2090   0.1465   0.1365   0.1185   0.1067   0.1047
Y4                 0.4366   0.3351   0.2668   0.1740   0.1306   0.1171   0.0956   0.0925   0.0762
Mean               0.6118   0.4630   0.3424   0.2234   0.1736   0.1386   0.1156   0.0950   0.0816
SD                 0.1215   0.0904   0.0565   0.0348   0.0358   0.0156   0.0186   0.0069   0.0135
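As a quick consistency check on Table 10.3, the Mean and SD rows can be recomputed from the four hourly values in a column. The sketch below assumes NumPy and uses the τ = 1 columns; the helper name `column_summary` is hypothetical, and the observation that the SD row appears to match the population standard deviation (dividing by 4 rather than 3) is inferred from the numbers, not stated in the source.

```python
import numpy as np

# MSampEn values of Y1..Y4 at scale factor 1, copied from Table 10.3.
sse_scale1 = np.array([0.5766, 0.5297, 0.4428, 0.3935])
szse_scale1 = np.array([0.7528, 0.6917, 0.5658, 0.4366])


def column_summary(values):
    """Return the mean and the population standard deviation (ddof=0)."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=0)


for name, column in (("SSE", sse_scale1), ("SZSE", szse_scale1)):
    mean, sd = column_summary(column)
    print(f"{name}: mean={mean:.4f}, sd={sd:.4f}")
```

Using `ddof=1` instead gives noticeably larger values than the tabulated SD, which is why the population form is assumed here.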

10.3.3 An Attempt to Quantify the Complexity of the Global Stock Market

By analyzing multiple variables simultaneously, the multivariate MSE method can provide a more comprehensive and systematic analysis of complex systems. In this section, we use the MMSE method in a new way to analyze the complexity of the stock markets in three different regions (Asia, Europe and the Americas), and try to distinguish them by studying the daily returns of stock indexes. For Asia, we choose HSI (China’s Hang Seng Index), KOSPI (South Korea’s Korea Composite Stock Price Index) and Nikkei (Japan’s Nikkei Index). For Europe, we choose DAX (Germany’s DAX Index), CAC (France’s CAC Index) and FTSE (the UK’s Financial Times Stock Exchange Index). For the Americas, we choose three United States indexes: S&P (Standard & Poor’s Index), NASDAQ (Nasdaq Index) and DJI (Dow Jones Index). Their daily closing prices all cover the same period, from January 2, 1991 to July 1, 2016, with about 6,000 data points each (the lengths differ slightly because of the different non-trading periods of these markets and other issues). Then, we obtain 9 return time series and combine them into three multivariate time series that represent Asia, Europe and America, respectively. The MMSE algorithm shows that if a single variable is input into the method, the method is equivalent to the MSE method and is still valid [21, 22]. Therefore, we first perform MMSE analysis on each univariate return series (HSI, KOSPI, Nikkei), and then analyze the corresponding multivariate return series (Asia). The results are shown in Fig. 10.8a. The data from Europe and the Americas were analyzed in the


Fig. 10.8 a MMSE results for each univariate return (HSI, KOSPI, Nikkei) and corresponding multiple return in Asia. b MMSE results for each univariate return (DAX, CAC, FTSE) and corresponding multiple returns in Europe. c MMSE results of the US univariate returns (S&P, NASDAQ, DJI) and corresponding multiple returns. d MMSE results for multiple returns in Asia, Europe and America for comparison

same process, and the results are shown in Fig. 10.8b and Fig. 10.8c, respectively. In Fig. 10.8, it can easily be seen that the MMSE curves of all returns show the same trend, that is, they decrease as the scale factor λ increases. This means that all returns contain useful information at smaller scales. They differ only in the range of values, and bars of different heights are used to express their complexity so that the differences are easier to see. From the MMSE results at small scales, we obtain an interesting finding: the MMSE values of the multivariate returns of Asia in Fig. 10.8a and of Europe in Fig. 10.8b are lower than those of the corresponding univariate returns, whereas the MMSE value of America’s multivariate returns in Fig. 10.8c is higher than those of its corresponding univariate returns. For a clearer comparison, the MMSE results of the multivariate returns are shown together in Fig. 10.8d, which shows that the multivariate returns from the United States exhibit the highest complexity, followed by Asia and Europe.


10.4 Conclusions

In this article, we summarize the application of composite multiscale entropy (CMSE) analysis and multivariate multiscale entropy (MMSE) analysis in quantifying the complexity of financial markets. The conclusions may help with measuring and controlling financial risks, pricing derivatives and other areas of concern in modern financial markets. The experimental results of the CMSE analysis lead to an interesting finding: stock markets with higher complexity may be relatively more stable, so that investors face relatively low investment risks. Compared with other countries, China’s two classic stock markets may have relatively low investment risks, which may be attributed to the supervision of the stock market by the Chinese government. We also conducted a new experiment that applies the MMSE method to quantify the complexity of the four three-variable time series generated from the trading hours of the Shanghai Stock Exchange and Shenzhen Stock Exchange. This experiment yielded three new findings: the multivariate return series from the Chinese stock market contain useful information at small scales; the stock return series of each trading hour of the Shenzhen Stock Exchange are more complex than those of the Shanghai Stock Exchange; and as the trading day progresses, the complexity of stock returns shows a clear downward trend. In addition, in another new attempt, the MMSE method is used to analyze the complexity of the stock markets in the three major regions of Asia, Europe and the Americas. The empirical results show that the multivariate stock returns from the United States have the highest complexity, followed by Asia and then Europe.

References

1. Bouchaud, J.P., Potters, M.: Theory of Financial Risk and Derivative Pricing from Statistical Physics to Risk Management. Cambridge University Press, Cambridge (2003)
2. Gabaix, X., Gopikrishnan, P., Plerou, V., Stanley, H.E.: A theory of power-law distributions in financial market fluctuations. Nature 423, 267–270 (2003)
3. Drozdz, S., Forczek, M., Kwapien, J., Oswiecimka, P., Rak, R.: Stock market return distributions: from past to present. Physica A 383, 59–64 (2007)
4. Drozdz, S., Kwapien, J., Gruemmer, F., Ruf, F., Speth, J.: Are the contemporary financial fluctuations sooner converging to normal? Acta Phys. Pol. B 34, 4293–4306 (2003)
5. Lux, T.: Financial Power Laws: Empirical Evidence, Models and Mechanisms. Cambridge University Press, Cambridge (2008)
6. Lo, A.W.: Long-term memory in stock market prices. Econometrica 59, 1279–1313 (1991)
7. Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L., Stanley, H.E.: Econophysics: financial time series from a statistical physics point of view. Physica A 279, 443–456 (2000)
8. Mantegna, R.N., Stanley, H.E.: An Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge University Press, Cambridge (2000)
9. Hong, W.J., Wang, J.: Multiscale behavior of financial time series model from Potts dynamic system. Nonlinear Dyn. 78, 1065–1077 (2014)
10. Drozdz, S., Kwapien, J., Oswiecimka, P., Rak, R.: Quantitative features of multifractal subtleties in time series. Europhys. Lett. 88, 60003 (2009)
11. Fang, W., Wang, J.: Statistical properties and multifractal behaviors of market returns by Ising dynamic systems. Int. J. Mod. Phys. C 23, 1250023 (2012)
12. Niu, H.L., Wang, J.: Quantifying complexity of financial short-term time series by composite multiscale entropy measure. Commun. Nonlinear Sci. Numer. Simul. 22, 375–382 (2015)
13. Lopes, A.M., Machado, J.A.T.: Analysis of temperature time-series: embedding dynamics into the MDS method. Commun. Nonlinear Sci. Numer. Simulat. 19, 851–871 (2014)
14. Machado, J.A.T., Duarte, F.B., Duarte, G.M.: Analysis of stock market indices through multidimensional scaling. Commun. Nonlinear Sci. Numer. Simulat. 16, 4610–4618 (2011)
15. Kantz, H., Schreiber, T.: Nonlinear Time Series Analysis. Cambridge University Press, Cambridge (1997)
16. Lu, Y.F., Wang, J.: Nonlinear dynamical complexity of agent-based stochastic financial interacting epidemic system. Nonlinear Dyn. 86, 1823–1840 (2016)
17. Machado, J.A.T.: Complex dynamics of financial indices. Nonlinear Dyn. 74, 287–296 (2013)
18. Costa, M., Goldberger, A.L., Peng, C.K.: Multiscale entropy analysis of complex physiologic time series. Phys. Rev. Lett. 89(6), 068102 (2002)
19. Costa, M., Peng, C.K., Goldberger, A.L., Hausdorff, J.M.: Multiscale entropy analysis of human gait dynamics. Physica A 330, 53–60 (2003)
20. Bisi, M.C., Stagni, R.: Complexity of human gait pattern at different ages assessed using multiscale entropy: from development to decline. Gait Posture 47, 37–42 (2016)
21. Ahmed, M.U., Mandic, D.P.: Multivariate multiscale entropy: a tool for complexity analysis of multichannel data. Phys. Rev. E 84, 3067–3076 (2011)
22. Ahmed, M.U., Mandic, D.P.: Multivariate multiscale entropy analysis. IEEE Signal Process. Lett. 19, 91–94 (2012)
23. Gao, Z.K., Ding, M.S., Geng, H., Jin, N.D.: Multivariate multiscale entropy analysis of horizontal oil-water two-phase flow. Physica A 417, 7–17 (2015)
24. Costa, M., Goldberger, A.L., Peng, C.K.: Multiscale entropy analysis of biological signals. Phys. Rev. E 71, 021906 (2005)
25. Wu, S.D., Wu, P.H., Wu, C.W., Ding, J.J., Wang, C.C.: Bearing fault diagnosis based on multiscale permutation entropy and support vector machine. Entropy 14(8), 1343–1356 (2012)
26. Humeau, A., Mahe, G., Chapeau-Blondeau, F., Rousseau, D., Abraham, P.: Multiscale analysis of microvascular blood flow: a multiscale entropy study of laser Doppler flowmetry time series. IEEE Trans. Biomed. Eng. 58, 2970–2973 (2011)
27. Humeau, A., Mahe, G., Durand, S., Abraham, P.: Multiscale entropy study of medical laser speckle contrast images. IEEE Trans. Biomed. Eng. 60, 872–879 (2013)
28. Chou, C.M.: Wavelet-based multi-scale entropy analysis of complex rainfall time series. Entropy 13, 241–253 (2011)
29. Guzman-Vargas, L., Ramirez-Rojas, A., Angulo-Brown, F.: Multiscale entropy analysis of electroseismic time series. Nat. Hazards Earth Syst. Sci. 8, 855–860 (2008)
30. Wu, S.D., Wu, C.W., Lin, S.G., Wang, C.C., Lee, K.Y.: Time series analysis using composite multiscale entropy. Entropy 15(3), 1069–1084 (2013)
31. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., Yen, N.C., Tung, C.C., Liu, H.H.: The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 454, 903–995 (1998)
32. Wu, Z., Huang, N.E.: Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv. Adapt. Data Anal. 1, 1–41 (2009)
33. Sharma, G.K., Kumar, A., Jayakumar, T., Rao, B.P., Mariyappa, N.: Ensemble Empirical Mode Decomposition based methodology for ultrasonic testing of coarse grain austenitic stainless steels. Ultrasonics 57, 167–178 (2015)
34. Grassberger, P., Schreiber, T., Schaffrath, C.: Nonlinear time sequence analysis. Int. J. Bifurcat. Chaos 1, 521 (1991)
35. Lu, Y.F., Wang, J.: Multivariate multiscale entropy of financial markets. Commun. Nonlinear Sci. Numer. Simulat. 52, 77–90 (2017)
36. Cao, L., Mees, A., Judd, K.: Dynamics from multivariate time series. Physica D: Nonlinear Phenom. 121(1–2), 75–88 (1998)

Chapter 11

Operator-Valued Dirichlet Forms and Module Operator Markov Semigroups

Lunchuan Zhang

Abstract In this chapter, we extend noncommutative symmetric Dirichlet forms to the operator-valued setting within the framework of order Hilbert $W^*$-bimodules, and establish the Beurling-Deny criterion relating operator-valued Dirichlet forms to the associated module operator Markov semigroups. This setting contains all of the scalar-valued Dirichlet forms previously studied on various noncommutative probability spaces as special cases. Finally, an example of an operator-valued Dirichlet form is given by a module derivation in operator-valued free probability theory.

L. Zhang
School of Mathematics, Renmin University of China, Beijing 100872, China
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_11

11.1 Introduction

The theory of commutative Dirichlet forms began with the work of Beurling and Deny [1]. The study of noncommutative Dirichlet forms originated with the work of L. Gross [2]. The general analysis of Dirichlet forms and the associated Markov semigroups in the noncommutative setting of a $C^*$-algebra with a semifinite trace $\tau$ on it was pioneered by S. Albeverio and R. Hoegh-Krohn [3], who, in particular, obtained the generalization of the Beurling-Deny criterion for Markov semigroups in terms of Dirichlet forms. The theory of noncommutative Dirichlet forms and associated Markov semigroups was subsequently developed by J.L. Sauvageot [4], E.B. Davies and O.S. Rothaus [5], E.B. Davies and J.M. Lindsay [6], S. Goldstein and J.M. Lindsay [7], D. Guido, T. Isola and S. Scarlatti [8], F. Cipriani [9], Y.M. Park [10], F. Cipriani and J.L. Sauvageot [11], etc., based on various noncommutative probability spaces. It has been recognized that the theory of noncommutative Dirichlet forms shares a flavor of geometry in the sense of Connes’ noncommutative geometry [12]. The noncommutative Dirichlet forms mentioned above are all scalar-valued. It is natural to expect an extension of the theory of noncommutative


L. Zhang

Dirichlet forms to the operator-valued case, and to find a unified way to deal with the noncommutative Dirichlet forms based on the above kinds of noncommutative probability spaces. In fact, in [13, 14] we already tried to characterize a class of operator-valued Dirichlet forms and associated module operator Markov semigroups based on the special order Hilbert W*-bimodules l2 ⊗ Mn(C) and l2 ⊗ A, respectively, where Mn(C) is the n × n matrix algebra and A is a II1-factor. Based on the above observations, in this chapter we develop operator-valued Dirichlet forms and module operator Markov semigroups on an order Hilbert W*-bimodule over a finite von Neumann algebra, so as to cover the various scalar-valued noncommutative Dirichlet forms above. This chapter is organized as follows. In Sect. 11.2 we first briefly recall the main concepts and results on order Hilbert spaces and standard forms of von Neumann algebras, and then collect definitions and facts about scalar-valued noncommutative symmetric Dirichlet forms and associated strongly continuous Markov semigroups. Section 11.3 is devoted to characterizing operator-valued Dirichlet forms based on an order Hilbert W*-bimodule over a finite von Neumann algebra. In Sect. 11.4 we introduce weak* continuous module operator Markov semigroups. As the main result of this chapter, we establish the Beurling-Deny type criterion in our noncommutative context. Finally, we give an example of an operator-valued Dirichlet form within the framework of operator-valued free probability.

11.2 Preliminaries

In this section we first briefly collect the definitions and facts about order Hilbert spaces and standard forms of von Neumann algebras, and then recall the scalar-valued noncommutative symmetric Dirichlet forms and associated Markov semigroups that we need in the sequel, referring to [9, 10, 15] for proofs and further results. We point out that throughout this chapter inner products in Hilbert spaces and Hilbert $C^*$-modules are assumed to be linear in the right-hand entry and conjugate linear in the left-hand one. Given a separable complex Hilbert space $H$, if there exists a selfdual positive cone $H^+$ in $H$ such that $H = H_h \oplus iH_h$, then $H$ is called an order Hilbert space, where $H_h = \{x \in H : \langle x, y \rangle \in \mathbb{R}, \ \forall y \in H^+\}$ is the real subspace of $H$ associated to $H^+$, and selfduality of $H^+$ means that it satisfies $\{x \in H : \langle x, y \rangle \ge 0, \ \forall y \in H^+\} = H^+$. It follows that $H^+$ is a closed convex cone in $H$. Thus, $H^+$ gives rise to an order relation on $H_h$: $x \le y$ if and only if $y - x \in H^+$ for $x, y \in H_h$. Furthermore, from the definition of the order Hilbert space $H$, there exists an antiunitary $J$ on $H$ which preserves $H^+$ and $H_h$: $J : H \to H, \ x + iy \mapsto x - iy, \ \forall x, y \in H_h$,


and each element of $H_h$ has a Jordan decomposition, that is, for every $x \in H_h$ there exist two unique orthogonal elements $x^+, x^-$ in $H^+$ such that $x = x^+ - x^-$, where $x^+$ and $x^-$ are the positive part and the negative part of $x$, respectively.

As a classical example, given a locally compact Hausdorff space $X$ equipped with a positive Radon measure $m$, the space $L^2(X, m)$ of all square integrable functions on $X$ is an order Hilbert space. Its selfdual positive cone $L^2_+(X, m)$ is the set of all $m$-a.e. nonnegative functions in $L^2(X, m)$, and the conjugation $J$ is the pointwise complex conjugation of functions in $L^2(X, m)$.

It is well known that the classical Dirichlet forms and the associated Markov semigroups rely on the lattice operations in the ordered function space $L^2(X, m)$, but a general Hilbert space is not a lattice unless it is a function space as above. Hence, the generalizations of the various noncommutative Dirichlet forms mentioned above work on order Hilbert spaces associated to standard forms of von Neumann algebras. Given a von Neumann algebra $A$ acting faithfully on an order Hilbert space $H$, the quadruple $(A, H, H^+, J)$ is called a standard form of $A$ if it satisfies the following conditions: (1) $JAJ = A'$; (2) $JaJ = a^*$ for all $a \in A \cap A'$; (3) $Jx = x$ for all $x \in H^+$; (4) $aJaJ(H^+) \subset H^+$ for all $a \in A$, where $A'$ is the commutant of $A$. For example, $(L^\infty(X, m), L^2(X, m), L^2_+(X, m), J)$ is the standard form of the commutative von Neumann algebra $L^\infty(X, m)$ of all essentially bounded functions on $(X, m)$. Hence, the classical Dirichlet forms are actually based on the standard form $(L^\infty(X, m), L^2(X, m), L^2_+(X, m), J)$.
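The Jordan decomposition in the classical example can be checked numerically. The following is a minimal sketch, assuming a finite model of $L^2(X, m)$ in which $X$ is a five-point set with counting measure (this model and all function names are illustrative assumptions, not part of the chapter); it only shows that $x = x^+ - x^-$ with $x^+$ and $x^-$ nonnegative and orthogonal.

```python
# Finite model of L^2(X, m): X = {0, ..., 4} with counting measure (an assumption
# made only for illustration), so real vectors are plain lists of floats.

def positive_part(x):
    """x^+ : pointwise max(x, 0), the part of x lying in the nonnegative cone."""
    return [max(v, 0.0) for v in x]

def negative_part(x):
    """x^- : pointwise max(-x, 0), chosen so that x = x^+ - x^-."""
    return [max(-v, 0.0) for v in x]

def inner(x, y):
    """Real inner product <x, y> with respect to the counting measure."""
    return sum(a * b for a, b in zip(x, y))

x = [1.5, -2.0, 0.0, 3.0, -0.5]
xp, xm = positive_part(x), negative_part(x)

# Jordan decomposition: x = x^+ - x^-, with x^+ and x^- orthogonal.
reconstructed = [a - b for a, b in zip(xp, xm)]
print("x recovered:", reconstructed == x)
print("<x^+, x^-> =", inner(xp, xm))
```

In this function-space model the two parts have disjoint supports, which is why their inner product vanishes.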
Following this line, the noncommutative Dirichlet form pioneered by Albeverio and Hoegh-Krohn [3] is based on Segal's standard form $(L^\infty(A, \tau), L^2(A, \tau), L^2_+(A, \tau), J)$ of the von Neumann algebra $L^\infty(A, \tau)$, where $A$ is a $C^*$-algebra with a densely defined, faithful, semifinite, lower semicontinuous trace $\tau$ on it, $L^2(A, \tau)$ is the order Hilbert space of the GNS-representation $\pi_\tau$ associated to $\tau$, and $L^\infty(A, \tau)$ is the von Neumann algebra $\pi_\tau(A)''$ in the algebra $B(L^2(A, \tau))$ of all bounded linear operators on $L^2(A, \tau)$. The selfdual positive cone $L^2_+(A, \tau)$ induces an anti-linear isometry $J$ (the modular conjugation) on $L^2(A, \tau)$ which is the extension of the involution $a \mapsto a^*$ of $A$. The subspace of $J$-invariant elements (called real) is $L^2_h(A, \tau)$. Since then, noncommutative Dirichlet forms have been studied in the context of Segal's standard form (see [3–6, 8]), Haagerup's standard form (see [7]), or the Tomita-Takesaki standard form (see [9–11]). In particular, Cipriani [9] extended the noncommutative symmetric Dirichlet form and its associated Markov semigroup to general $\sigma$-finite von Neumann algebras in the framework of the Tomita-Takesaki standard form, as follows. Let $A$ be a $\sigma$-finite von Neumann algebra acting faithfully on a separable Hilbert space $H$, and let $\Omega$ be a cyclic and separating unit vector for $A$; then $\phi(a) = \langle \Omega, a\Omega\rangle$ for all $a \in A$ is a faithful normal state on $A$. Let $S = J\Delta^{1/2}$ denote, as usual, the closure of the conjugate linear operator $a\Omega \mapsto a^*\Omega$ for all $a \in A$, where the antilinear


partial isometry $J$ and the positive operator $\Delta$ in the polar decomposition of $S$ are the modular conjugation and the modular operator associated to $\Omega$, respectively. By the Tomita-Takesaki theory (see [15]) it follows that $J\Omega = \Omega$, $J^2 = I$, $JAJ = A'$ and $\Delta^{it} A \Delta^{-it} = A$ for all $t \in \mathbb{R}$. Therefore, one obtains the modular automorphism group $\{\sigma_t\}_{t\in\mathbb{R}}$ on $A$: $\sigma_t(a) = \Delta^{it} a \Delta^{-it}$ for all $a \in A$ and $t \in \mathbb{R}$, where $I$ is the unit of $A$ and $A'$ is the commutant of $A$. Moreover, the closure $H^+ = \overline{\Delta^{1/4} A_+ \Omega}$ is the selfdual positive cone in $H$, so that $H$ becomes an order Hilbert space and $(A, H, H^+, J)$ is the standard form of the $\sigma$-finite von Neumann algebra $A$. The real subspace $H_h$ is the set of all $J$-invariant elements in $H$.

Let $x_0$ be a fixed vector in $H^+$. For a given $x \in H_h$, the symbols $x \wedge x_0$ and $x \vee x_0$ denote the Hilbert projections of $x$ onto the closed convex subsets $x_0 - H^+$ and $x_0 + H^+$, respectively. Indeed, if the order Hilbert space $H$ reduces to a function space, then $x \wedge x_0$ and $x \vee x_0$ equal $\inf(x, x_0)$ and $\sup(x, x_0)$, respectively.

Definition 11.1 (cf. [9], Definition 4.8) A closed, densely defined, nonnegative quadratic form $(\mathcal{E}, D(\mathcal{E}))$ on $H$ is said to be
(1) real if $x \in D(\mathcal{E})$ implies $J(x) \in D(\mathcal{E})$ and $\mathcal{E}[J(x)] = \mathcal{E}[x]$;
(2) a Dirichlet form if it is real and $\mathcal{E}[x \wedge x_0] \le \mathcal{E}[x]$ for $x \in D_h(\mathcal{E}) := D(\mathcal{E}) \cap H_h$;
(3) a completely Dirichlet form if the canonical extension $(\mathcal{E}^n, D(\mathcal{E}^n))$ to $H \otimes M_n(\mathbb{C})$, given by $\mathcal{E}^n[[a_{ij}]_{i,j=1}^n] := \sum_{i,j=1}^n \mathcal{E}[a_{ij}]$, is a Dirichlet form for all $n \ge 1$, where $[a_{ij}]_{i,j=1}^n \in D(\mathcal{E}^n) := D(\mathcal{E}) \otimes M_n(\mathbb{C})$.

Definition 11.2 Let $\{T_t\}_{t\ge 0}$ be a strongly continuous semigroup of bounded linear operators defined on $H$.
(1) It is symmetric if $\langle T_t(x), y\rangle = \langle x, T_t(y)\rangle$ for all $x, y \in H$ and $t \ge 0$;
(2) it is Markovian if $0 \le x \le x_0$ implies $0 \le T_t(x) \le x_0$;
(3) it is completely Markovian if $\{T_t \otimes I_n\}$ is Markovian on $H \otimes M_n(\mathbb{C})$ for all $n \in \mathbb{N}$, where $I_n$ is the identity map on $M_n(\mathbb{C})$.
Remark 11.1 Consider the above standard form $(A, H, H^+, J)$ of a $\sigma$-finite von Neumann algebra $A$, and assume that the set of all entire analytic elements for the modular automorphism group $\{\sigma_t\}_{t\in\mathbb{R}}$, denoted by $A_0$, is weakly dense in $A$. Let $x_0$ be the cyclic and separating vector in $H^+$ associated to the faithful normal state $\phi$ on $A$. Then by [15] Lemma 2.5.40 (see also [9] Proposition 1.1) the map $i : A \to H$, $i(a) = \Delta^{1/4} a x_0$ for all $a \in A$, is injective with dense range and $\sigma(A, A_*)$-$\sigma(H, H)$ continuous; in particular, $i$ maps $[0, 1]$ onto $[0, x_0]$, where $1$ is the unit of $A$. Thus we can identify $x$ with $i(x)$ in $H$ for all $x \in A$. Hence, by [9] Theorem 2.11, a weak* continuous $\phi$-symmetric Markov semigroup $\{T_t\}_{t\ge 0}$ on $A$ can be extended to a strongly continuous symmetric Markov semigroup in the sense of Definition 11.2, where a $\phi$-symmetric Markov semigroup on $A$ means that $\phi(T_t(x)\sigma_{-i/2}(y)) = \phi(\sigma_{+i/2}(x)T_t(y))$, $\forall x, y \in A_0$, $t \ge 0$, and $0 \le T_t(x) \le 1$ whenever $0 \le x \le 1$,


where $1$ is the unit of $A$.

At the end of this section, we state the well-known Beurling-Deny type criterion in the framework of the Tomita-Takesaki standard form.

Theorem 11.1 (cf. [9] Theorem 4.11) Let $\{T_t\}_{t\ge 0} = \{e^{-tL}\}_{t\ge 0}$ be a strongly continuous symmetric semigroup with infinitesimal generator $L$, and let $\mathcal{E}[x] = \langle \sqrt{L}x, \sqrt{L}x\rangle$ for $x \in D(\mathcal{E}) = D(\sqrt{L})$ be the associated quadratic form. Then the following statements are equivalent:
(1) The form $\mathcal{E}$ is a (completely) Dirichlet form;
(2) The semigroup $\{T_t\}_{t\ge 0}$ is (completely) Markovian.

11.3 Operator-Valued Dirichlet Forms

The main theme of this section is to characterize operator-valued Dirichlet forms based on an order Hilbert $W^*$-bimodule over a finite von Neumann algebra. First we briefly recall the concepts and facts about Hilbert $C^*$-modules used in this chapter, referring to [16–18] for more details.

A right $A$-module $E$ over a $C^*$-algebra $A$ is said to be a Hilbert $C^*$-module over $A$ provided that it is equipped with an $A$-valued inner product $\langle\cdot,\cdot\rangle : E \times E \to A$, $(x, y) \mapsto \langle x, y\rangle$, for which $E$ is complete in the associated norm $x \mapsto \|\langle x, x\rangle\|^{1/2}$. When $A$ is a von Neumann algebra, the Hilbert $C^*$-module $E$ is called a Hilbert $W^*$-module if $E$ is selfdual, that is, for each bounded $A$-linear mapping $\phi : E \to A$ there exists an element $\xi_\phi \in E$ such that $\phi(x) = \langle \xi_\phi, x\rangle$ for all $x \in E$. Recall that the weak* topology on a Hilbert $C^*$-module $E$ over a von Neumann algebra $A$ is the locally convex vector topology induced by the family of seminorms $\{p_{\varphi,x}(\cdot) = |\varphi(\langle x, \cdot\rangle)| : \varphi \in A_*,\ x \in E\}$, where $A_*$ is the predual of $A$. By [19] Proposition 2.9, a Hilbert $C^*$-module $E$ over a von Neumann algebra $A$ is selfdual if and only if the unit ball of $E$ is complete with respect to the weak* topology. Given two Hilbert $W^*$-modules $E$ and $F$ over a von Neumann algebra $A$, the set of all bounded module maps from $E$ to $F$ is denoted by $B(E, F)$, and the set of all adjointable maps from $E$ to $F$ is denoted by $L(E, F)$. In particular, if $E = F$ then $L(E) = B(E)$ is a von Neumann algebra.

It is well known that each closed subspace of a Hilbert space is orthogonally complemented, but a closed submodule of a Hilbert $C^*$-module need not be orthogonally


complemented. This is one of the essential differences between Hilbert spaces and Hilbert $C^*$-modules, and the cause of the difficulties that follow. In the Hilbert space case, a densely defined closed operator has a densely defined adjoint operator, but we cannot expect this to hold for module operators on a Hilbert $C^*$-module. Hence, the property of having a densely defined adjoint has to be built into the definition for further analysis.

Definition 11.3 (see [17] Chap. 9) Given a Hilbert $C^*$-module $E$, a densely defined closed module operator $T : D(T) \to E$ is called a regular module operator if $T^*$ is also densely defined and $I + T^*T$ has dense range, where $I$ is the identity module operator on $E$, $T^*$ is the adjoint of $T$, and $D(T)$ is the domain of $T$.

Regular module operators retain almost all the important properties of closed densely defined operators on Hilbert spaces. By [17] Proposition 10.6, a closed densely defined symmetric module operator $L$ on $E$ is regular if and only if the submodules $R(L \pm iI)$ are orthogonally complemented in $E$. In particular, if $L$ is selfadjoint, then by [17] Lemma 9.8 it is regular if and only if the module operators $L \pm iI$ are surjective.

In the following we consider the spectral calculus of a regular selfadjoint module operator $L$ on a Hilbert $W^*$-module $E$ over a von Neumann algebra $A$. The Cayley transform of $L$ is constructed as
$$C_L x = (L - iI)(L + iI)^{-1}x, \quad x \in D(L),$$
and the corresponding inverse Cayley transform is
$$Lx = i(I + C_L)(I - C_L)^{-1}x, \quad x \in D(L).$$
Therefore, $L$ is selfadjoint if and only if $C_L$ is unitary. Since $L(E)$ is a von Neumann algebra, using the spectral decomposition of $C_L$ in $L(E)$ (see [20]), there exists a spectral family $\{E_\lambda : -\infty < \lambda < +\infty\} \subset L(E)$ such that
$$C_L x = \int_{-\infty}^{+\infty} \frac{\lambda - i}{\lambda + i}\, dE_\lambda x, \quad x \in E,$$
in the sense of weak* convergence of approximating Riemann sums, that is,
$$f(\langle C_L x, y\rangle) = \int_{-\infty}^{+\infty} \frac{\lambda - i}{\lambda + i}\, d f(\langle E_\lambda x, y\rangle), \quad \forall y \in E,\ \forall f \in A_*.$$
Combining this with the inverse Cayley transform, as in the Hilbert space case, one obtains the spectral decomposition of $L$:
$$Lx = \int_{-\infty}^{+\infty} \lambda\, dE_\lambda x, \quad x \in D(L),$$
in the sense of weak* convergence of approximating Riemann sums. Moreover, if $L$ is nonnegative, then $\sqrt{L} = \int_0^{+\infty} \sqrt{\lambda}\, dE_\lambda$.

On this basis, we can now introduce the concepts of order $W^*$-bimodule and operator-valued Dirichlet form.
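On the spectral side, the Cayley transform and its inverse act on each spectral value $\lambda$ through the scalar function $\lambda \mapsto (\lambda - i)/(\lambda + i)$. The sketch below is a scalar illustration only (the function names and sample values are assumptions introduced here): it checks that real $\lambda$ is mapped to the unit circle and that the inverse transform recovers $\lambda$.

```python
def cayley(lam):
    """Scalar Cayley transform c(lam) = (lam - i) / (lam + i) for real lam."""
    return (lam - 1j) / (lam + 1j)

def inverse_cayley(c):
    """Inverse transform lam = i (1 + c) / (1 - c), defined for c != 1."""
    return 1j * (1 + c) / (1 - c)

samples = [-10.0, -1.0, 0.0, 0.5, 3.0]   # hypothetical spectral values
for lam in samples:
    c = cayley(lam)
    recovered = inverse_cayley(c)
    # |c| should be 1 (unitarity on the spectrum) and recovered should equal lam
    print(lam, abs(c), recovered)
```

The value $c = 1$ is never attained for finite real $\lambda$, which is why the inverse transform is well defined on the samples.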


Definition 11.4 Let $E$ be an $A$-$A$-bimodule over a finite von Neumann algebra $A$ with a faithful, finite, normal trace $\tau$ on it. If $E$ is a Hilbert $W^*$-module as a right $A$-module and satisfies $\langle ax, y\rangle = \langle x, a^*y\rangle$ for all $x, y \in E$, $a \in A$, then it is called a Hilbert $W^*$-bimodule. Furthermore, if there exists a positive cone $E^+$ which is selfdual in the sense that $E^+ = \{x \in E : \tau(\langle x, y\rangle) \ge 0,\ \forall y \in E^+\}$, and such that $E = E_h \oplus iE_h$, where $E_h = \{x \in E : \tau(\langle x, y\rangle) \in \mathbb{R},\ \forall y \in E^+\}$, that is, $E$ is the complexification of the real submodule $E_h$, then $E$ is called an order Hilbert $W^*$-bimodule.

It follows that $E^+$ induces an order relation on $E_h$: $x \le y$ if and only if $y - x \in E^+$, and each element of $E_h$ has a Jordan decomposition, that is, for $x \in E_h$ there exist two unique orthogonal elements $x^+$ and $x^-$ in $E^+$ such that $x = x^+ - x^-$, where $x^+$ and $x^-$ are called the positive part and the negative part of $x$, respectively. Moreover, as in the order Hilbert space case, there exists an antiunitary $J$ on $E$ which preserves $E^+$ and $E_h$: $J : E \to E$, $x + iy \mapsto x - iy$ for all $x, y \in E_h$.

Remark 11.2 All order Hilbert $W^*$-bimodules appearing in the rest of this chapter are over a finite von Neumann algebra.

Example 11.1 Consider an operator-valued probability space $(A, B, \Phi)$, where $A$ is a finite von Neumann algebra with a faithful, finite, normal trace $\tau$ on it, and $B$ is a von Neumann subalgebra of $A$. Then there exists a trace preserving faithful normal conditional expectation $\Phi : A \to B$ (see [21]). Hence, $A$ becomes a pre-Hilbert $C^*$-module over $B$ with the $B$-valued inner product $\langle x, y\rangle = \Phi(x^*y)$ for all $x, y \in A$. Thus, the closure $E$ of $A$ with respect to the weak* topology is a Hilbert $W^*$-module over $B$. Since $A$ is a $B$-$B$-bimodule and satisfies $\langle ba, a'\rangle = \langle a, b^*a'\rangle$ for all $a, a' \in A$, $b \in B$, it is easy to check that $E$ is a $B$-$B$-bimodule and $\langle bx, y\rangle = \langle x, b^*y\rangle$ for all $x, y \in E$, $b \in B$.
Hence, $E$ is a Hilbert $W^*$-bimodule over $B$. Since the canonical map $i : A \to E$ is injective, we can identify $x$ with $i(x)$ in $E$ for all $x \in A$. Define the positive cone $E^+$ to be the closure of $A^+$ with respect to the weak* topology. By the properties of the trace $\tau$ we have $E^+ = \{x \in E : \tau(\langle x, y\rangle) \ge 0,\ \forall y \in E^+\}$. This shows that $E^+$ is selfdual, which implies that $E = E_h \oplus iE_h$, where $E_h = \{x \in E : \tau(\langle x, y\rangle) \in \mathbb{R},\ \forall y \in E^+\}$. Hence, $E$ is an order Hilbert $W^*$-bimodule. Indeed, $E_h$ is the closure of $A_{sa}$ in $E$ with respect to the weak* topology, where $A_{sa}$ is the selfadjoint part of $A$.

Example 11.2 Given a separable Hilbert space $H$ and a finite von Neumann algebra $A$ with a faithful, finite, normal trace $\tau$ on it, the algebraic tensor product $H \otimes_{alg} A$ is a pre-Hilbert $C^*$-bimodule with respect to the following $A$-valued inner product: $\langle e \otimes a, f \otimes b\rangle = \langle e, f\rangle a^*b$, $\forall e \otimes a, f \otimes b \in H \otimes_{alg} A$.


Similar to Example 11.1, it is easy to check that the closure $H \bar{\otimes} A$ of $H \otimes_{alg} A$ with respect to the weak* topology is a Hilbert $W^*$-bimodule over $A$. Furthermore, define the positive cone $(H \bar{\otimes} A)^+$ to be the weak* closure of the algebraic positive cone $H^+ \otimes_{alg} A^+ = \{\sum_{i=1}^n \lambda_i e_i \otimes a_i : e_i \in H^+,\ a_i \in A^+,\ \lambda_i \ge 0,\ i = 1, 2, \cdots, n,\ n \in \mathbb{N}\}$, where $A^+$ is the positive part of $A$. By the properties of the trace $\tau$ combined with the definition of the order Hilbert space $H$, it is easy to check that $(H \bar{\otimes} A)^+$ is selfdual, and $H \bar{\otimes} A = (H \bar{\otimes} A)_h \oplus i(H \bar{\otimes} A)_h$, where $(H \bar{\otimes} A)_h = \{x \in H \bar{\otimes} A : \tau(\langle x, y\rangle) \in \mathbb{R},\ \forall y \in (H \bar{\otimes} A)^+\}$. Hence, $H \bar{\otimes} A$ is an order Hilbert $W^*$-bimodule. Indeed, one can easily prove that $(H \bar{\otimes} A)_h = H_h \bar{\otimes} A_{sa}$.

On this basis, we can now introduce the concepts of $A$-valued symmetric quadratic form and operator-valued symmetric Dirichlet form on an order Hilbert $W^*$-bimodule $E$ over a finite von Neumann algebra $A$ with a faithful, finite, normal trace $\tau$ on it.

Definition 11.5 An $A$-valued sesquilinear form $\mathcal{E}(\cdot,\cdot)$ on the order Hilbert $W^*$-bimodule $E$ is a map $\mathcal{E} : D(\mathcal{E}) \times D(\mathcal{E}) \to A$ with the following properties:
(1) $\mathcal{E}(x, ya + zb) = \mathcal{E}(x, y)a + \mathcal{E}(x, z)b$;
(2) $\mathcal{E}(xa + yb, z) = a^*\mathcal{E}(x, z) + b^*\mathcal{E}(y, z)$;
(3) $\mathcal{E}(ax, y) = \mathcal{E}(x, a^*y)$,
for all $x, y, z \in D(\mathcal{E})$, $a, b \in A$, where $D(\mathcal{E})$ is a $J$-invariant submodule of $E$, that is, $JD(\mathcal{E}) \subset D(\mathcal{E})$. Moreover, $\mathcal{E}(\cdot,\cdot)$ is called a symmetric sesquilinear form provided that $\mathcal{E}(x, y)^* = \mathcal{E}(y, x)$ for all $x, y \in D(\mathcal{E})$; correspondingly, $\mathcal{E}[x] = \mathcal{E}(x, x)$ is said to be an $A$-valued symmetric quadratic form. $\mathcal{E}[x] = \mathcal{E}(x, x)$ is called an $A$-valued non-negative definite quadratic form if $\mathcal{E}[x] \ge 0$ for all $x \in D(\mathcal{E})$. In addition, $\mathcal{E}(\cdot,\cdot)$ is called $\tau$-real sesquilinear if it satisfies $\tau(\mathcal{E}(x, y)) = \overline{\tau(\mathcal{E}(J(x), J(y)))}$ for all $x, y \in D(\mathcal{E})$, where the bar denotes complex conjugation.
Remark 11.3 (1) In particular, the $A$-valued inner product on $E$ is an $A$-valued positive definite sesquilinear form; (2) As in the Hilbert space case, it is easy to check that if the sesquilinear form $\mathcal{E}(\cdot,\cdot)$ is non-negative definite, then it is symmetric and the Cauchy-Schwarz type inequality holds: $\mathcal{E}(y, x)\mathcal{E}(x, y) \le \|\mathcal{E}(x, x)\|\,\mathcal{E}(y, y)$.

In view of Definition 11.5, an $A$-valued non-negative definite quadratic form $\mathcal{E}[x] = \mathcal{E}(x, x)$ determines an $A$-valued inner product $\mathcal{E}_1(x, y) = \mathcal{E}(x, y) + \langle x, y\rangle$, $\forall x, y \in D(\mathcal{E})$, on $D(\mathcal{E})$, which induces the norm $\|x\|_1 = \|\mathcal{E}_1[x]\|^{1/2}$ on $D(\mathcal{E})$. It is clear that $\|\cdot\|_1$ is stronger than $\|\cdot\|$. If the unit ball of $D(\mathcal{E})$ is complete in the weak* topology induced by the family of seminorms $\{p_{\varphi,x} = |\varphi(\mathcal{E}_1(x, \cdot))| : \varphi \in A_*,\ x \in D(\mathcal{E})\}$, then $D(\mathcal{E})$ is itself an order Hilbert $W^*$-bimodule, and in this case $\mathcal{E}$ is called a closed form. Furthermore, if $D(\mathcal{E})$ is dense in $E$ with respect to the weak* topology on $E$, then $\mathcal{E}$ is called a densely defined closed form. Similar to the proof of Theorem 3.4 in [13] we have the following result:


Proposition 11.1 There is a one-to-one correspondence between the family of closed densely defined non-negative definite $A$-valued quadratic forms and the family of closed densely defined selfadjoint non-negative module operators on $E$.

Definition 11.6 Let $E$ be an order Hilbert $W^*$-bimodule over a finite von Neumann algebra $A$ with a faithful, finite, normal trace $\tau$ on it. For $x, y \in E_h$, define $x \vee y = y + (x - y)^+$ and $x \wedge y = y - (x - y)^-$.

It is easy to prove that the above operations have the following basic properties:

Proposition 11.2 (1) $x \vee y = y \vee x$, $x \wedge y = y \wedge x$; (2) $x + y = x \vee y + y \wedge x$; (3) $|x - y| = x \vee y - x \wedge y$.

Remark 11.4 In the framework of the function space $L^2(X, m)$, where $X$ is a locally compact space with a $\sigma$-finite measure $m$ on it, we have $x \vee y = \sup(x, y)$ and $x \wedge y = \inf(x, y)$, $\forall x, y \in L^2_h(X, m)$. In the framework of the standard form $(A, H, H^+, J)$ of a von Neumann algebra $A$ as described in Sect. 11.2, $x \vee y$ and $x \wedge y$ equal the projections of the vector $x$ onto the closed convex cones $y + H^+$ and $y - H^+$, respectively (see [9], Lemma 4.4).

Definition 11.7 Let $(\mathcal{E}[\cdot], D(\mathcal{E}))$ be a closed densely defined non-negative definite $A$-valued $\tau$-real quadratic form on the order Hilbert $W^*$-bimodule $E$, and let $x_0 \in E^+$ be a fixed non-zero element. Then $\mathcal{E}$ is called an $A$-valued Dirichlet form if $x \in D(\mathcal{E})_h := D(\mathcal{E}) \cap E_h$ implies $x \wedge x_0 \in D(\mathcal{E})$ and $\tau(\mathcal{E}[x \wedge x_0]) \le \tau(\mathcal{E}[x])$.

Remark 11.5 Suppose that $E$ is $H \bar{\otimes} A$ and the von Neumann algebra $A$ reduces to the complex field $\mathbb{C}$. If $H$ is associated to the Tomita-Takesaki standard form, then Definition 11.7 is Definition 4.8 in [9]. If $H$ is associated to Segal's standard form of a semifinite von Neumann algebra and $x_0$ is its unit, then by [6] Proposition 2.12 Definition 11.7 is equivalent to the noncommutative cases [3, 4, 6, 7].
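The operations of Definition 11.6 and the identities of Proposition 11.2 can be checked in the function-space setting of Remark 11.4. The following is a minimal sketch, assuming a four-point space with counting measure so that elements of $E_h$ are real lists (the helper names `pos`, `neg`, `join`, `meet` are illustrative assumptions); there $x \vee y$ and $x \wedge y$ reduce to the pointwise maximum and minimum.

```python
def pos(x):
    """Positive part x^+ (pointwise)."""
    return [max(v, 0.0) for v in x]

def neg(x):
    """Negative part x^- (pointwise), so that x = x^+ - x^-."""
    return [max(-v, 0.0) for v in x]

def join(x, y):
    """x v y = y + (x - y)^+ as in Definition 11.6."""
    d = pos([a - b for a, b in zip(x, y)])
    return [b + c for b, c in zip(y, d)]

def meet(x, y):
    """x ^ y = y - (x - y)^- as in Definition 11.6."""
    d = neg([a - b for a, b in zip(x, y)])
    return [b - c for b, c in zip(y, d)]

x = [1.0, -2.0, 4.0, 0.5]
y = [3.0, -1.0, 2.0, 0.5]

print("x v y:", join(x, y))   # coincides with the pointwise maximum
print("x ^ y:", meet(x, y))   # coincides with the pointwise minimum
# Proposition 11.2 (2): x + y = x v y + x ^ y
print([a + b for a, b in zip(x, y)] == [a + b for a, b in zip(join(x, y), meet(x, y))])
```

The sample values are exactly representable in binary floating point, so the equalities can be compared without a tolerance.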
The above $A$-valued Dirichlet form possesses the following properties:

Proposition 11.3 (1) $x \in D(\mathcal{E})_h \Longrightarrow x^+ \in D(\mathcal{E})$ and $\tau(\mathcal{E}[x^+]) \le \tau(\mathcal{E}[x])$; (2) $x \in D(\mathcal{E})_h \Longrightarrow |x| \in D(\mathcal{E})$ and $\tau(\mathcal{E}[|x|]) \le \tau(\mathcal{E}[x])$.

Proof (1) Since the trace $\tau$ is faithful, $E$ is an inner product space with respect to the inner product $(x, y) = \tau(\langle x, y\rangle)$ for all $x, y \in E$.


The completion of $E$ in the corresponding norm is a Hilbert space, denoted by $H_\tau$. Since the map $i : E \to H_\tau$, $x \mapsto x$, is a linear order preserving injection, $H_\tau$ is an order Hilbert space, and the selfdual positive cone $H_\tau^+$ and the real subspace $H_\tau^h$ in $H_\tau$ are the closures of $E^+$ and $E_h$, respectively. Hence, the $A$-valued Dirichlet form $\mathcal{E}$ on $E$ can be extended to a scalar-valued Dirichlet form $\mathcal{E}_\tau$ on $H_\tau$ such that $\mathcal{E}_\tau(x, y) = \tau(\mathcal{E}(x, y))$ for all $x, y \in D(\mathcal{E})$. For each $x \in D(\mathcal{E})_h$, by Definition 11.6 we have $x \wedge x_0 = x_0 - (x - x_0)^-$, and Definition 11.7 gives
$$\tau(\mathcal{E}[x_0 - (x - x_0)^-]) = \tau(\mathcal{E}[x \wedge x_0]) \le \tau(\mathcal{E}[x]).$$
Thus, similar to the proof of [9] Proposition 4.10, it follows that
$$\tau(\mathcal{E}[\lambda x_0 - (x - \lambda x_0)^-]) \le \tau(\mathcal{E}[x]) \quad \text{for all } \lambda > 0.$$
Now we regard $x$ and $x_0$ as elements of $H_\tau$, so that $-x^-$ is the projection of $x$ onto the closed convex cone $-H_\tau^+$. Using the continuity of the projection on Hilbert spaces together with the lower semicontinuity of the Dirichlet form $\mathcal{E}_\tau$ on $H_\tau$, we obtain
$$\tau(\mathcal{E}[x^-]) = \mathcal{E}_\tau[x^-] \le \liminf_{\lambda\to 0} \mathcal{E}_\tau[\lambda x_0 - (x - \lambda x_0)^-] = \liminf_{\lambda\to 0} \tau(\mathcal{E}[\lambda x_0 - (x - \lambda x_0)^-]) \le \tau(\mathcal{E}[x]).$$
Replace $x$ in the above inequality by $\alpha x^+ - x^-$ for $\alpha > 0$, whose negative part is again $x^-$, and notice that $\mathcal{E}$ is $\tau$-real and symmetric, so that $\tau(\mathcal{E}(x^+, x^-)) = \overline{\tau(\mathcal{E}(x^+, x^-))} = \tau(\mathcal{E}(x^-, x^+))$. Hence
$$\tau(\mathcal{E}[x^-]) \le \tau(\mathcal{E}[\alpha x^+ - x^-]) = \tau(\mathcal{E}[x^-]) + \alpha^2\tau(\mathcal{E}[x^+]) - 2\alpha\tau(\mathcal{E}(x^+, x^-)),$$
that is, $2\tau(\mathcal{E}(x^+, x^-)) \le \alpha\tau(\mathcal{E}[x^+])$ for every $\alpha > 0$. Letting $\alpha \to 0$ yields $\tau(\mathcal{E}(x^+, x^-)) \le 0$. Therefore
$$\tau(\mathcal{E}[x]) = \tau(\mathcal{E}[x^+ - x^-]) = \tau(\mathcal{E}[x^+]) - 2\tau(\mathcal{E}(x^+, x^-)) + \tau(\mathcal{E}[x^-]) \ge \tau(\mathcal{E}[x^+]).$$
(2) Notice that $|x| = x^+ + x^-$. By (1) we have
$$\tau(\mathcal{E}[|x|]) - \tau(\mathcal{E}[x]) = \tau(\mathcal{E}[x^+ + x^-]) - \tau(\mathcal{E}[x^+ - x^-]) = 4\tau(\mathcal{E}(x^+, x^-)) \le 0.$$
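The inequalities of Proposition 11.3 can be illustrated in the simplest commutative case. The sketch below assumes $A = \mathbb{C}$ (so $\tau$ is the identity), $E = \mathbb{R}^3$ with pointwise order, $x_0 = (1, 1, 1)$, and a graph-type form $\mathcal{E}(x, y) = \sum_{i<j} w_{ij}(x_i - x_j)(y_i - y_j)$ with hypothetical weights; none of these choices come from the chapter, they only provide a concrete Dirichlet form to test against.

```python
# Hypothetical symmetric edge weights w_ij on a 3-point space (illustration only).
W = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 2.0}

def form(x, y):
    """Scalar model of E(x, y): sum over edges of w_ij (x_i - x_j)(y_i - y_j)."""
    return sum(w * (x[i] - x[j]) * (y[i] - y[j]) for (i, j), w in W.items())

def quad(x):
    """Quadratic form E[x] = E(x, x)."""
    return form(x, x)

x = [2.0, -1.0, 0.5]
xp = [max(v, 0.0) for v in x]        # x^+
xm = [max(-v, 0.0) for v in x]       # x^-
absx = [abs(v) for v in x]           # |x| = x^+ + x^-
x_cut = [min(v, 1.0) for v in x]     # x ^ x0 with x0 = (1, 1, 1)

print("E[x]      =", quad(x))
print("E[x^+]    =", quad(xp))       # expected to be <= E[x]
print("E[|x|]    =", quad(absx))     # expected to be <= E[x]
print("E[x ^ x0] =", quad(x_cut))    # Dirichlet property of Definition 11.7
print("E(x^+,x^-) =", form(xp, xm))  # expected to be <= 0
```

The identity $\mathcal{E}[|x|] - \mathcal{E}[x] = 4\mathcal{E}(x^+, x^-)$ used in part (2) of the proof holds exactly for these values.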


11.4 Beurling-Deny Type Criterion

In this section, we first introduce weak* continuous module operator semigroups on an order Hilbert $W^*$-bimodule $E$ over a finite von Neumann algebra $A$ with a faithful, finite, normal trace $\tau$ on it, and then characterize module operator Markov semigroups on $E$. As the main result of this chapter, we obtain the Beurling-Deny criterion relating operator-valued Dirichlet forms and the associated module operator Markov semigroups. We conclude this section with an example of an operator-valued Dirichlet form and a module operator Markov semigroup associated with a module derivation in the framework of operator-valued free probability.

Definition 11.8 A one-parameter family $\{T(t)\}_{t\ge 0} \subseteq L(E)$ is called a weak* continuous module operator semigroup on $E$ if it satisfies the following conditions:
(1) $T(0) = I$ ($I$ is the identity module operator);
(2) (the semigroup property) $T(t + s) = T(t)T(s)$, $\forall t, s \ge 0$;
(3) the map $t \mapsto T(t)x$, $[0, +\infty) \to E$, is weak* continuous for all $x \in E$, i.e., $t \mapsto \phi(\langle T(t)x, y\rangle)$ is continuous, $\forall \phi \in A_*$, $\forall y \in E$;
(4) $x \mapsto T(t)x$, $E \to E$, is weak*-weak* continuous, $\forall t \ge 0$.

Given a weak* continuous module operator semigroup $\{T(t)\}_{t\ge 0}$ on $E$, the linear module operator $L$ defined by
$$D(L) = \Big\{x \in E : \lim_{t\to +0} \frac{T(t)x - x}{t} \text{ exists with respect to the weak* topology}\Big\}$$
and
$$Lx = \lim_{t\to +0} \frac{T(t)x - x}{t} = \frac{d^+ T(t)x}{dt}\Big|_{t=+0}, \quad \forall x \in D(L),$$
is called the infinitesimal generator of the semigroup $\{T(t)\}_{t\ge 0}$, and $D(L)$ is the domain of $L$. By the semigroup property of $\{T(t)\}_{t\ge 0}$ and the uniform boundedness principle, it is easy to prove that there exist constants $\alpha \ge 0$ and $M \ge 1$ such that $\|T(t)\| \le Me^{\alpha t}$, $t \ge 0$. Furthermore, if each $T(t)$ ($t \ge 0$) is selfadjoint and satisfies $\langle T(t)x, T(t)x\rangle \le \langle x, x\rangle$, $\forall t \ge 0$, $\forall x \in E$, then the semigroup is called a symmetric contraction module operator semigroup.
In addition, if the infinitesimal generator $L$ is regular, then $\{T(t)\}_{t\ge 0}$ is called a regular module operator semigroup.

In order to reconstruct a contraction module operator semigroup from its generator, as in the classical Banach space case, we have the following theorem.

Theorem 11.2 (Hille-Yosida type Theorem) A linear (unbounded) module operator $L$ is the infinitesimal generator of a weak* continuous contraction module operator semigroup $\{T(t)\}_{t\ge 0}$ if and only if the following statements hold:


(1) $L$ is weak* densely defined and weak* closed;
(2) $(0, +\infty) \subset \rho(L)$ and $\|\lambda(\lambda I - L)^{-1}\| \le 1$, $\forall \lambda > 0$, where $\rho(L)$ is the resolvent set of $L$.

The proof of Theorem 11.2 is parallel to that of the classical Hille-Yosida Theorem (see [15], Theorem 3.1.10), and we omit the details.

In the following we consider weak* continuous symmetric contraction module operator semigroups on $E$. As in the classical Hilbert space case, the generators and their resolvents have the following basic properties.

Proposition 11.4 Let $\{T(t)\}_{t\ge 0}$ be a weak* continuous symmetric contraction module operator semigroup on $E$ with infinitesimal generator $L$. Then
(1) $L$ is a weak* closed and weak* densely defined, non-positive selfadjoint module operator;
(2) $\frac{d}{dt}T(t)x = LT(t)x = T(t)Lx$, $\forall x \in D(L)$;
(3) set $G_\alpha = (\alpha - L)^{-1}$, $\alpha > 0$. Then $G_\alpha x = \int_0^\infty e^{-\alpha t}T(t)x\,dt$, $x \in E$;
(4) $T(t)x = \lim_{\alpha\to +\infty} e^{-\alpha t}\sum_{n=0}^\infty \frac{(t\alpha)^n}{n!}(\alpha G_\alpha)^n x$, $x \in E$.
Furthermore, $\{G_\alpha\}_{\alpha>0}$ is a weak* continuous symmetric contraction resolvent of $\{T(t)\}_{t\ge 0}$, and $L$ is also the generator of $\{G_\alpha\}_{\alpha>0}$.

The proof of the above proposition is standard and omitted (cf. [22], Proposition 1.10). Proposition 11.4(3) says that $\{G_\alpha\}_{\alpha>0}$ is the Laplace transform of $\{T(t)\}_{t\ge 0}$. Formally solving the ordinary differential equation in Proposition 11.4(2) yields $T_t = e^{tL}$, $t \ge 0$. As is well known, in the classical Hilbert space case the spectral calculus for (unbounded) closed selfadjoint operators gives a one-to-one correspondence between the family of strongly continuous symmetric contraction semigroups and the family of closed densely defined selfadjoint non-positive operators. This conclusion can be generalized to regular module operator semigroups:

Proposition 11.5 Let $E$ be the above order Hilbert $W^*$-bimodule over $A$.
Then there exists a one-to-one correspondence between the family $\mathcal{T}$ of weak* continuous symmetric contraction regular module operator semigroups and the family $\mathcal{L}$ of closed densely defined selfadjoint non-positive regular module operators on $E$.

Proof Let $T_t = e^{tL}$, $t \ge 0$, be a weak* continuous symmetric contraction regular module operator semigroup with infinitesimal generator $L$. By Proposition 11.4(1), $L$ is a closed densely defined selfadjoint non-positive regular module operator on $E$. Then we define the map
$$\mathcal{T} \to \mathcal{L}, \quad e^{tL} \mapsto L.$$
Since $\{T_t\}_{t\ge 0} = \{e^{tL}\}_{t\ge 0}$ and $L$ determine each other, the above map is well defined and injective. Next, we prove that it is surjective. Let $L$ be a closed densely defined selfadjoint non-positive definite regular module operator on $E$. In order to prove that $L$ is the infinitesimal generator of a weak* continuous


symmetric contraction regular module operator semigroup, by Theorem 11.2 it only remains to prove the inequality $\|\alpha(\alpha I - L)^{-1}\| \le 1$. Indeed, by Sect. 11.3 there exists a spectral family $\{E_\lambda : 0 \le \lambda < +\infty\} \subset L(E)$ such that $-Lx = \int_{[0,+\infty)} \lambda\, dE_\lambda x$, $x \in D(L)$, in the sense of weak* convergence of approximating Riemann sums. Therefore, for every normal state $f \in A_*$ and every $x \in E$,
$$f(\langle \alpha(\alpha I - L)^{-1}x, \alpha(\alpha I - L)^{-1}x\rangle) = \int_{[0,+\infty)} \frac{\alpha^2}{(\alpha + \lambda)^2}\, d f(\langle E_\lambda x, x\rangle) \le \|x\|^2.$$
This yields $\|\alpha(\alpha I - L)^{-1}\| \le 1$.

Remark 11.6 Let $T_t = e^{tL}$, $t \ge 0$, be a weak* continuous symmetric contraction regular module operator semigroup with infinitesimal generator $L$ and associated resolvents $\{G_\alpha, \alpha > 0\}$. Define the approximating symmetric forms $\mathcal{E}^{(t)}$ and $\mathcal{E}^{(\beta)}$ ($\beta > 0$) as follows:
(1) $\mathcal{E}^{(t)}(x, y) = \frac{1}{t}\langle x - T_t x, y\rangle$, $x, y \in E$;
(2) $\mathcal{E}^{(\beta)}(x, y) = \beta\langle x - \beta G_\beta x, y\rangle$, $x, y \in E$.
Using the spectral family as in the proof of Proposition 11.5, similar to [4] Lemma 1.3.4 we have
$$\mathcal{E}[x] = \lim_{t\to 0} \mathcal{E}^{(t)}[x], \qquad \mathcal{E}[x] = \lim_{\beta\to\infty} \mathcal{E}^{(\beta)}[x],$$
in the sense of weak* convergence, respectively.

Based on the above material, we can now introduce the concept of module operator Markov semigroup:

Definition 11.9 Let $x_0$ be a fixed nonzero vector in $E^+$. A weak* continuous, symmetric contraction module operator semigroup $\{T_t = e^{tL}\}_{t\ge 0} \subset L(E)$ is called a Markov module operator semigroup provided that $0 \le T_t x \le x_0$ for any $x \in E_h$ with $0 \le x \le x_0$. In this case, every $T_t$ ($t \ge 0$) is called a Markov module operator.

From Definition 11.9 combined with Proposition 11.4 we directly obtain the following conclusion:

Proposition 11.6 With the notation of Proposition 11.4, the following are equivalent: (1) $\{T_t\}_{t\ge 0}$ is a module operator Markov semigroup; (2) $\alpha G_\alpha$ is a Markov module operator for all $\alpha > 0$.

The following example shows that a completely positive Markov semigroup on a $\sigma$-finite von Neumann algebra $A$ can be characterized by a module operator Markov semigroup on the order Hilbert $W^*$-bimodule generated by the tensor product of an order Hilbert space and the hyperfinite $II_1$-factor.
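Proposition 11.6 and Remark 11.6 can be illustrated on a two-state model. The sketch below assumes $A = \mathbb{C}$, $E = \mathbb{R}^2$ with pointwise order, $x_0 = (1, 1)$ and the hypothetical generator $L = \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix}$, for which $\mathcal{E}[x] = \langle -Lx, x\rangle = (x_1 - x_2)^2$; these choices are assumptions for illustration only. It checks that $T_t = e^{tL}$ and $\alpha G_\alpha$ keep $0 \le x \le x_0$, and that $\mathcal{E}^{(\beta)}[x]$ approaches $\mathcal{E}[x]$ for large $\beta$.

```python
import math

def semigroup(t):
    """T_t = exp(tL) for L = [[-1, 1], [1, -1]], written in closed form."""
    a = 0.5 * (1.0 + math.exp(-2.0 * t))
    b = 0.5 * (1.0 - math.exp(-2.0 * t))
    return [[a, b], [b, a]]

def scaled_resolvent(alpha):
    """alpha * G_alpha = alpha * (alpha*I - L)^{-1}, via an explicit 2x2 inverse."""
    m = [[alpha + 1.0, -1.0], [-1.0, alpha + 1.0]]          # alpha*I - L
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]
    return [[alpha * v for v in row] for row in inv]

def mat_vec(m, x):
    """Apply a 2x2 matrix to a vector of length 2."""
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]

def approx_form(beta, x):
    """E^(beta)[x] = beta * <x - beta*G_beta x, x> from Remark 11.6."""
    gx = mat_vec(scaled_resolvent(beta), x)
    return beta * sum((xi - gi) * xi for xi, gi in zip(x, gx))

x = [0.2, 0.9]                       # satisfies 0 <= x <= x0 = (1, 1)
for t in (0.1, 1.0, 10.0):
    print("T_t x      :", mat_vec(semigroup(t), x))
for alpha in (0.5, 2.0, 50.0):
    print("alpha*G x  :", mat_vec(scaled_resolvent(alpha), x))
print("E^(beta)[x]:", approx_form(1e4, x), "vs E[x] =", (x[0] - x[1]) ** 2)
```

In this model both $T_t$ and $\alpha G_\alpha$ are symmetric matrices with nonnegative entries and unit row sums, which is the finite-dimensional form of the Markov property.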


Example 11.3 Let $A$ be a $\sigma$-finite von Neumann algebra acting faithfully on a separable Hilbert space $H$, and let $\Omega$ be a cyclic and separating unit vector for $A$. It follows that $\phi(a) = \langle \Omega, a\Omega\rangle$, $\forall a \in A$, is a faithful normal state on $A$. Consider a $\sigma$-weakly continuous and $\phi$-symmetric contractive semigroup $\{P_t\}_{t\ge 0}$ of everywhere defined maps on $A$, where $\phi$-symmetry has the same meaning as in Remark 11.1. Let $T_t^{(n)} = P_t \otimes I_n : A \otimes M_n(\mathbb{C}) \to A \otimes M_n(\mathbb{C})$, where $M_n(\mathbb{C})$ is the $n \times n$ matrix algebra and $I_n$ is the identity map on $M_n(\mathbb{C})$. Then $A \otimes M_n(\mathbb{C})$ is a pre-Hilbert $C^*$-bimodule over $M_n(\mathbb{C})$ with the matrix-valued inner product $\langle a \otimes e, b \otimes f\rangle = \phi((\Delta^{1/4}a)^*\Delta^{1/4}b)\,e^*f$, $\forall a \otimes e, b \otimes f \in A \otimes M_n(\mathbb{C})$. The closure of $A \otimes M_n(\mathbb{C})$ is the order Hilbert $W^*$-bimodule $H \otimes M_n(\mathbb{C})$. By a routine calculation, the module operator semigroup $\{T_t^{(n)}\}_{t\ge 0}$ can be weak* continuously extended to a symmetric contractive module operator semigroup on $H \otimes M_n(\mathbb{C})$; when no confusion can arise, we use the same symbol to denote the extension.

Recall that a $\phi$-symmetric semigroup $\{P_t\}_{t\ge 0}$ on $A$ is Markovian provided that $0 \le P_t x \le 1$ whenever $0 \le x \le 1$, and the map $t \mapsto P_t(a)$ from $[0, +\infty)$ to $A$ is $\sigma$-weakly continuous for all $a \in A$, where $1$ is the unit of $A$. Furthermore, $\{P_t\}_{t\ge 0}$ is an $n$-positive Markov semigroup if the canonical extension $\{P_t \otimes I_n\}_{t\ge 0}$ is Markovian. If this holds for each $n \in \mathbb{N}$, then $\{P_t\}_{t\ge 0}$ is a completely positive Markov semigroup.

By the above observations, the weak* continuous extension $\{T_t^{(n)}\}_{t\ge 0}$ of an $n$-positive Markov semigroup $\{P_t\}_{t\ge 0}$ is a module operator Markov semigroup on $H \otimes M_n(\mathbb{C})$ associated to the cyclic vector $\Omega$. Moreover, let $B$ be the hyperfinite $II_1$-factor generated by $\cup_n M_{2^n}(\mathbb{C})$, with its unique normal trace $\tau$. Hence, if $\{P_t\}_{t\ge 0}$ is a completely positive Markov semigroup on $A$, then it can be weak* continuously extended to a module operator Markov semigroup on the order Hilbert $W^*$-bimodule $H \bar{\otimes} B$.

In what follows, we characterize the Beurling-Deny correspondence between an operator-valued Dirichlet form and its associated module operator Markov semigroup on an order Hilbert $W^*$-bimodule $E$ over a finite von Neumann algebra $A$ with a faithful, finite, normal trace $\tau$ on it.

Theorem 11.3 Let $\{T_t\}_{t\ge 0} \subseteq L(E)$ be a weak* continuous symmetric contractive regular module operator semigroup. Then $\{T_t\}_{t\ge 0}$ is Markovian if and only if $\mathcal{E}[x] = \langle Gx, x\rangle$, $x \in D(G)$, is an $A$-valued Dirichlet form, where $-G$ is the infinitesimal generator of $\{T_t\}_{t\ge 0}$.
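A key ingredient in proving this kind of criterion is a variational description of the resolvent: for fixed $y$ and $\alpha > 0$, the element $x = \alpha G_\alpha y = (I + \alpha^{-1}G)^{-1}y$ minimizes $F(z) = \alpha^{-1}\mathcal{E}[z] + \langle z - y, z - y\rangle$. The following is a minimal sketch of that fact in a scalar two-point model, assuming $A = \mathbb{C}$, $E = \mathbb{R}^2$, $x_0 = (1, 1)$, $\alpha = 2$ and the hypothetical operator $G = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$; all names and values are illustrative assumptions.

```python
ALPHA = 2.0
G = [[1.0, -1.0], [-1.0, 1.0]]       # hypothetical nonnegative operator; E[z] = <Gz, z>

def mat_vec(m, v):
    """Apply a 2x2 matrix to a vector of length 2."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def dot(u, v):
    """Real inner product on R^2."""
    return u[0] * v[0] + u[1] * v[1]

def F(z, y):
    """F(z) = E[z] / alpha + <z - y, z - y>."""
    d = [z[0] - y[0], z[1] - y[1]]
    return dot(mat_vec(G, z), z) / ALPHA + dot(d, d)

def minimizer(y):
    """x = alpha * G_alpha y = (I + G / alpha)^{-1} y, via an explicit 2x2 inverse."""
    b = [[1.0 + G[0][0] / ALPHA, G[0][1] / ALPHA],
         [G[1][0] / ALPHA, 1.0 + G[1][1] / ALPHA]]
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    inv = [[b[1][1] / det, -b[0][1] / det], [-b[1][0] / det, b[0][0] / det]]
    return mat_vec(inv, y)

y = [0.2, 0.9]                       # satisfies 0 <= y <= x0 = (1, 1)
x = minimizer(y)
grid = [[a / 4.0, b / 4.0] for a in range(-4, 9) for b in range(-4, 9)]

print("x = alpha*G_alpha y:", x)
print("F(x):", F(x, y))
print("min of F on grid:", min(F(z, y) for z in grid))   # never below F(x)
```

In this model the minimizer also satisfies $0 \le x \le x_0$, in line with the Markov property of $\alpha G_\alpha$.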


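Proof Suppose that $\mathcal{E}[x] = \langle Gx, x\rangle$ is an $A$-valued Dirichlet form. In order to prove that $\{T_t\}_{t\ge 0}$ is a Markovian module operator semigroup, by Proposition 11.6 it suffices to prove that each $\alpha G_\alpha$ ($\alpha > 0$) is a Markov module operator, that is, $0 \le \alpha G_\alpha y \le x_0$ whenever $0 \le y \le x_0$. Write $x = \alpha G_\alpha y$. The proof is divided into two steps.

Step 1. We prove $x \ge 0$, that is, $x = x^+$. Introduce the $A$-valued function
$$F : D(\mathcal{E}) \to A^+, \quad F(z) = \alpha^{-1}\mathcal{E}[z] + \langle z - y, z - y\rangle, \quad \forall z \in D(\mathcal{E}).$$
We first show that $x = \alpha G_\alpha y$ is the unique element of $D(\mathcal{E})$ minimizing $F$, that is, $F(z) \ge F(x)$, $\forall z \in D(\mathcal{E})$, and $F(z) = F(x) \Longleftrightarrow z = x$. Observe that $x = \alpha G_\alpha y$ and $\alpha G_\alpha = (I + \alpha^{-1}G)^{-1}$, where $I$ is the unit of $L(E)$, so that $y = (I + \alpha^{-1}G)x$. We have
$$\begin{aligned}
F(z) - F(x) &= \alpha^{-1}\mathcal{E}[z] - \alpha^{-1}\mathcal{E}[x] + \langle z, z\rangle - \langle z, y\rangle - \langle y, z\rangle - \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle \\
&= \langle (I + \alpha^{-1}G)z, z\rangle - \langle (I + \alpha^{-1}G)x, x\rangle - \langle z, y\rangle - \langle y, z\rangle + \langle x, y\rangle + \langle y, x\rangle \\
&= \langle (I + \alpha^{-1}G)(z - x), z\rangle - \langle z - x, y\rangle \\
&= \langle (I + \alpha^{-1}G)(z - x), z - x\rangle \ge 0. \qquad (4.1)
\end{aligned}$$
Therefore, (4.1) directly implies that $F(z) \ge F(x)$, $\forall z \in D(\mathcal{E})$, and $F(z) = F(x) \Longleftrightarrow z = x$. Thus, to prove $x = x^+$ it suffices to prove $F(x^+) \le F(x)$. Assume on the contrary that $F(x^+) \ne F(x)$; then by (4.1) $F(x^+) - F(x)$ is a nonzero positive element of $A$, and the faithfulness of $\tau$ yields $\tau(F(x^+) - F(x)) > 0$. Since $\mathcal{E}$ is a Dirichlet form, by Proposition 11.3 we have $\tau(\mathcal{E}[x^+]) \le \tau(\mathcal{E}[x])$, and it follows that
$$\tau(F(x^+)) = \alpha^{-1}\tau(\mathcal{E}[x^+]) + \tau(\langle x^+ - y, x^+ - y\rangle) \le \alpha^{-1}\tau(\mathcal{E}[x]) + \tau(\langle x^+ - y, x^+ - y\rangle).$$
Observe that
$$\begin{aligned}
&\tau(\langle x^+ - y, x^+ - y\rangle - \langle x - y, x - y\rangle) \\
&\quad= \tau(\langle x^+, x^+\rangle - \langle x^+, y\rangle - \langle y, x^+\rangle - \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle) \\
&\quad= \tau(\langle x^+, x^+\rangle - \langle x^+, y\rangle - \langle y, x^+\rangle - \langle x^+, x^+\rangle - \langle x^-, x^-\rangle + \langle x^+, y\rangle - \langle x^-, y\rangle + \langle y, x^+\rangle - \langle y, x^-\rangle) \\
&\quad= -\tau(\langle x^-, x^-\rangle + \langle x^-, y\rangle + \langle y, x^-\rangle) \le 0.
\end{aligned}$$
Hence, $\tau(F(x^+)) \le \tau(F(x))$. This contradicts the above hypothesis, so that $x = x^+$.

Step 2. We prove $x = x \wedge x_0$. Since $\tau$ is faithful, it suffices to prove that $\tau(\langle x - x \wedge x_0, x - x \wedge x_0\rangle) = 0$. Observe that
$$\begin{aligned}
&\tau(\langle x - x \wedge x_0, x - x \wedge x_0\rangle + \langle x - x \wedge x_0, x \wedge x_0 - y\rangle) \\
&\quad= \tau(\langle x - x \wedge x_0, x - y\rangle) = \tau(\langle x - x \wedge x_0, (I - \alpha^{-1}G_\alpha^{-1})x\rangle) \\
&\quad= -\alpha^{-1}\tau(\langle x - x \wedge x_0, (G_\alpha^{-1} - \alpha I)x\rangle) = -\alpha^{-1}\tau(\mathcal{E}(x - x \wedge x_0, x)) \\
&\quad= -(2\alpha)^{-1}\tau(\mathcal{E}(x - x \wedge x_0, x + x \wedge x_0) + \mathcal{E}(x - x \wedge x_0, x - x \wedge x_0)). \qquad (4.2)
\end{aligned}$$
For the first term of (4.2) we have
$$\begin{aligned}
\tau(\mathcal{E}(x - x \wedge x_0, x + x \wedge x_0)) &= \tau(\mathcal{E}[x] - \mathcal{E}[x \wedge x_0] - \mathcal{E}(x \wedge x_0, x) + \mathcal{E}(x, x \wedge x_0)) \\
&= \tau(\mathcal{E}[x] - \mathcal{E}[x \wedge x_0]) - \tau(\mathcal{E}(x \wedge x_0, x)) + \tau(\mathcal{E}(x, x \wedge x_0)) \\
&= \tau(\mathcal{E}[x]) - \tau(\mathcal{E}[x \wedge x_0]) \ge 0,
\end{aligned}$$
where we used that the Dirichlet form $\mathcal{E}$ is $\tau$-real and symmetric. It is clear that the second term of (4.2) is also nonnegative. Therefore
$$\tau(\langle x - x \wedge x_0, x - x \wedge x_0\rangle) + \tau(\langle x - x \wedge x_0, x \wedge x_0 - y\rangle) \le 0.$$
Consequently
$$\tau(\langle x - x \wedge x_0, x \wedge x_0 - y\rangle) \le -\tau(\langle x - x \wedge x_0, x - x \wedge x_0\rangle) \le 0.$$
On the other hand,
$$\begin{aligned}
\tau(\langle x - x \wedge x_0, x \wedge x_0 - y\rangle) &= \tau(\langle x - x \wedge x_0, x \wedge x_0 - x_0 + x_0 - y\rangle) \\
&= \tau(\langle x - x \wedge x_0, x \wedge x_0 - x_0\rangle) + \tau(\langle x - x \wedge x_0, x_0 - y\rangle) \\
&\ge \tau(\langle x - x \wedge x_0, x \wedge x_0 - x_0\rangle) = -\tau(\langle (x_0 - x)^-, (x_0 - x)^+\rangle) = 0,
\end{aligned}$$
so that $\tau(\langle x - x \wedge x_0, x \wedge x_0 - y\rangle) = 0$.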


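Combining the above, we get $\tau(\langle x - x\wedge x_0, x - x\wedge x_0\rangle) = 0$, which implies $x = x\wedge x_0$.

Conversely, suppose that $\{T_t\}_{t\ge 0}$ is Markovian. We have to show that $x\wedge x_0 \in D(\mathcal{E})_h$ and $\tau(\mathcal{E}[x\wedge x_0]) \le \tau(\mathcal{E}[x])$ whenever $x \in D(\mathcal{E})_h$. For this purpose, put

$$\mathcal{E}^{(\beta)}(y, z) = \beta\langle y, z - \beta G_\beta z\rangle, \quad \forall y, z \in E,\ \forall \beta > 0.$$

Then

$$\begin{aligned}
\tau(\mathcal{E}^{(\beta)}((x - \alpha x_0)^+, x\wedge\alpha x_0)) &= \tau(\beta\langle (x - \alpha x_0)^+, x\wedge\alpha x_0 - \beta G_\beta(x\wedge\alpha x_0)\rangle) \\
&= \tau(\beta\langle (x - \alpha x_0)^+, x\wedge\alpha x_0\rangle) - \tau(\beta\langle (x - \alpha x_0)^+, \beta G_\beta(x\wedge\alpha x_0)\rangle).
\end{aligned} \tag{4.3}$$

It is easy to prove that the first term on the right-hand side of (4.3) satisfies

$$\tau(\beta\langle (x - \alpha x_0)^+, x\wedge\alpha x_0\rangle) \ge \alpha\beta\,\tau(\langle (x - \alpha x_0)^+, x_0\rangle),$$

while the second term of (4.3) satisfies

$$\tau(\beta\langle (x - \alpha x_0)^+, \beta G_\beta(x\wedge\alpha x_0)\rangle) \le \alpha\beta\,\tau(\langle (x - \alpha x_0)^+, x_0\rangle).$$

Therefore

$$\tau(\mathcal{E}^{(\beta)}((x - \alpha x_0)^+, x\wedge\alpha x_0)) \ge 0.$$

Consequently

$$\begin{aligned}
\tau(\mathcal{E}^{(\beta)}((x - \alpha x_0)^+, (x - \alpha x_0)^+)) &= \tau(\mathcal{E}^{(\beta)}((x - \alpha x_0)^+, x - x\wedge\alpha x_0)) \\
&= \tau(\mathcal{E}^{(\beta)}((x - \alpha x_0)^+, x)) - \tau(\mathcal{E}^{(\beta)}((x - \alpha x_0)^+, x\wedge\alpha x_0)) \\
&\le \tau(\mathcal{E}^{(\beta)}((x - \alpha x_0)^+, x)).
\end{aligned} \tag{4.4}$$

Moreover, by the Cauchy-Schwarz inequality, the right-hand side of (4.4) is bounded by $(\tau(\mathcal{E}^{(\beta)}[x]))^{1/2}\cdot(\tau(\mathcal{E}^{(\beta)}[(x - \alpha x_0)^+]))^{1/2}$. Hence

$$\tau(\mathcal{E}^{(\beta)}[(x - \alpha x_0)^+]) \le \tau(\mathcal{E}^{(\beta)}[x]).$$

Letting $\beta \to \infty$ in the above inequality, we have $\tau(\mathcal{E}[(x - \alpha x_0)^+]) \le \tau(\mathcal{E}[x])$, which shows that $(x - \alpha x_0)^+ \in D(\mathcal{E})_h$. Hence, we obtain that $x^-, x^+, x\wedge x_0 \in D(\mathcal{E})_h$ by taking $\alpha = 0, 1$, respectively. Finally, we prove $\tau(\mathcal{E}[x\wedge x_0]) \le \tau(\mathcal{E}[x])$. Since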


$$\begin{aligned}
\tau(\mathcal{E}[x] - \mathcal{E}[x\wedge x_0]) &= \tau(\mathcal{E}(x - x\wedge x_0, x + x\wedge x_0)) \\
&= \tau(\mathcal{E}(x - x\wedge x_0, x - x\wedge x_0)) + 2\,\tau(\mathcal{E}(x - x\wedge x_0, x\wedge x_0)),
\end{aligned} \tag{4.5}$$

where we used the $\tau$-real symmetric property of the Dirichlet form in the second line of (4.5), it remains to check that both terms are non-negative. It is clear that the first term of (4.5) is non-negative. In view of the above inequality,

$$\tau(\mathcal{E}^{(\beta)}(x - x\wedge\alpha x_0, x\wedge\alpha x_0)) = \tau(\mathcal{E}^{(\beta)}((x - \alpha x_0)^+, x\wedge\alpha x_0)) \ge 0.$$

Letting $\beta \to \infty$, we have $\tau(\mathcal{E}(x - x\wedge\alpha x_0, x\wedge\alpha x_0)) \ge 0$. Setting $\alpha = 1$ in this inequality, we see that the second term of (4.5) is non-negative as well. Therefore $\tau(\mathcal{E}[x\wedge x_0]) \le \tau(\mathcal{E}[x])$.

Remark 11.7 In the above order Hilbert $W^*$-bimodule $E$ over a finite von Neumann algebra $A$, if $A$ reduces to the matrix algebra $M_n(\mathbb{C})$, then every closed submodule of $E$ is complemented (see [23]), and it follows from [17, Proposition 10.6] that any closed densely defined selfadjoint module operator is regular. In particular, consider the case in which $A$ reduces to the complex field $\mathbb{C}$. If $E$, as an order Hilbert space, is associated to the Tomita-Takesaki standard form of a $\sigma$-finite von Neumann algebra, then Theorem 11.1 is a direct consequence of Theorem 11.3. If $E$, as an order Hilbert space, is associated to the Segal standard form of a semifinite von Neumann algebra and $x_0$ is its unit, then the generalizations of the Beurling-Deny criterion in [3, 4, 6, 7] are also direct consequences of Theorem 11.3.

Finally, we construct an operator-valued Dirichlet form and a module operator Markov semigroup using a module derivation in the framework of operator-valued free probability.

Example 11.4 Let $(A, B, \Phi)$ be an operator-valued probability space, where $A$ is a finite von Neumann algebra with a faithful finite normal trace $\tau$, $B$ is a von Neumann subalgebra of $A$, and $\Phi$ is a trace-preserving faithful normal conditional expectation from $A$ onto $B$.
Let $C$ be a von Neumann subalgebra of $A$ satisfying $B \subset C$, and let $X = X^* \in A$ be a noncommutative random variable such that $B[X]$ and $C$ are algebraically free modulo $B$, where $B[X] \subset A$ is the $*$-subalgebra generated by $B$ and $X$ (i.e. the canonical homomorphism $B[X] \otimes_B C \to B[X] \vee C$ has trivial kernel; for the notation and related details we refer to [24–26]). In the following we consider the order Hilbert $W^*$-bimodule $L^2_B(C[X])$ over $B$ which is generated by $C[X]$ with the $B$-valued inner product $\langle x, y\rangle = \Phi(x^* y)$ for all $x, y \in C[X]$. Define the $B$-module derivation $\partial_B : L^2_B(C[X]) \to L^2_B(C[X]) \otimes_B L^2_B(C[X])$ such that

$$\partial_B(X) = 1 \otimes 1, \qquad \partial_B(c) = 0, \quad \forall c \in C,$$


and

$$\partial_B(c_0 X c_1 X \cdots X c_n) = \sum_{j=1}^{n} (c_0 X \cdots X c_{j-1}) \otimes_B (c_j X \cdots X c_n)$$

for all $c_0, c_1, \ldots, c_n \in C$. Under the assumption that $1 \otimes 1 \in D(\partial_B^*)$, it is easy to check that $\partial_B$ is closable and its closure is regular. In fact, by [27], if $X$ is a standard semicircular element, then $\partial_B^*(1 \otimes 1) \in L^2_B(C[X])$. In what follows, set $\Delta_B = \partial_B^* \circ \partial_B$. Using the properties of regular module operators (see [17], Chaps. 9 and 10) and arguing as in the proof of [28, Theorem 4.5], one can show that

$$\mathcal{E} : L^2_B(C[X]) \to B, \qquad \mathcal{E}[x] = \langle \Delta_B x, x\rangle \quad \text{for all } x \in D(\Delta_B),$$

is a $B$-valued Dirichlet form, so that by Theorem 11.3 its associated module operator semigroup $T_t = e^{-t\Delta_B}$, $t \ge 0$, is Markovian.

References
1. Beurling, A., Deny, J.: Espaces de Dirichlet I: le cas élémentaire. Acta Math. 99, 203–224 (1958)
2. Gross, L.: Hypercontractivity and logarithmic Sobolev inequalities for the Clifford Dirichlet forms. Duke Math. J. 42(3), 383–396 (1975)
3. Albeverio, S., Hoegh-Krohn, R.: Dirichlet forms and Markovian semigroups on C∗-algebras. Comm. Math. Phys. 56, 173–187 (1977)
4. Sauvageot, J.L.: Quantum Dirichlet forms, differential calculus and semigroups. In: Accardi, L., von Waldenfels, W. (eds.) Quantum Probability and Applications V. Lecture Notes in Mathematics, vol. 1442. Springer, Berlin (1988)
5. Davies, E.B., Rothaus, O.S.: Markov semigroups on C∗-bundles. J. Funct. Anal. 85, 264–286 (1989)
6. Davies, E.B., Lindsay, J.M.: Non-commutative symmetric Markov semigroups. Math. Z. 210, 379–411 (1992)
7. Goldstein, S., Lindsay, J.M.: Beurling-Deny conditions for KMS symmetric dynamical semigroups. C. R. Acad. Sci. Paris Ser. 317, 1053–1057 (1993)
8. Guido, D., Isola, T., Scarlatti, S.: Non-symmetric Dirichlet forms on semifinite von Neumann algebras. J. Funct. Anal. 135, 50–75 (1996)
9. Cipriani, F.: Dirichlet forms and Markovian semigroups on standard forms of von Neumann algebras. J. Funct. Anal. 147, 259–300 (1997)
10. Park, Y.M.: Construction of Dirichlet forms on standard forms of von Neumann algebras. Infinite Dim. Anal., Quant. Prob. Relat. Top. 3, 1–14 (2000)
11. Cipriani, F., Sauvageot, J.L.: Derivations as square roots of Dirichlet forms. J. Funct. Anal. 201, 78–120 (2003)
12. Connes, A.: Noncommutative Geometry. Academic Press, San Diego, CA (1994)
13. Zhang, L., Guo, M.: The characterization of a class of quantum Markov semigroups and the associated operator-valued Dirichlet forms based on Hilbert C∗-module l2(A). Sci. China Math. 57(2), 377–387 (2014)
14. Zhang, L., Guo, M.: The characterization of a class of quantum Markov semigroups and the associated operator-valued Dirichlet forms based on Hilbert W∗-modules. Acta Math. Sinica (English series) 29(5), 857–866 (2013)
15. Bratteli, O., Robinson, D.W.: Operator Algebras and Quantum Statistical Mechanics-I. Springer, Berlin (1979)
16. Paschke, W.: Inner product modules over B∗-algebras. Trans. Am. Math. Soc. 182, 443–468 (1973)
17. Lance, E.C.: Hilbert C∗-Modules. Cambridge University Press, London (1995)
18. Zhang, L.: Hilbert C∗-Module Theory and Applications. China Science Press, Beijing (2014)
19. Schweizer, J.: Hilbert C∗-modules with a predual. J. Oper. Theory 48, 621–632 (2002)
20. Kadison, R.V., Ringrose, J.R.: Fundamentals of the Theory of Operator Algebras. Academic Press, New York (1983)
21. Takesaki, M.: Conditional expectations in von Neumann algebras. J. Funct. Anal. 9, 306–321 (1972)
22. Ma, Z.M., Röckner, M.: An Introduction to the Theory of (Non-symmetric) Dirichlet Forms. Springer, Berlin (1992)
23. Magajna, B.: Hilbert C∗-modules in which all closed submodules are complemented. Proc. AMS 123(3), 849–852 (1997)
24. Voiculescu, D.: Operations on certain noncommutative operator-valued random variables. Astérisque 232, 243–275 (1995)
25. Speicher, R.: Combinatorial theory of the free product with amalgamation and operator-valued free probability theory. Mem. AMS 627 (1998)
26. Nica, A., Shlyakhtenko, D., Speicher, R.: Operator-valued distributions 1. Characterizations of freeness. Int. Math. Res. Not. 29, 1509–1538 (2002)
27. Meng, B., Guo, M., Cao, X.: Operator-valued free Fisher information and modular frames. Proc. AMS 133(10), 3087–3096 (2005)
28. Cipriani, F.: Dirichlet Forms on Noncommutative Spaces. Quantum Potential Theory. Lecture Notes in Mathematics, vol. 1954, pp. 161–276. Springer (2008)

Chapter 12

Dynamics in a Quasilinear Parabolic-Elliptic Keller-Segel System with Generalized Logistic Source and Nonlinear Secretion

Xin Wang, Tian Xiang, and Nina Zhang

Abstract In this chapter, we study dynamical properties of nonnegative solutions for the following quasilinear parabolic-elliptic Keller-Segel chemotaxis system with generalized logistic source and nonlinear secretion:

$$\begin{cases}
u_t = \nabla\cdot(D(u)\nabla u) - \nabla\cdot(S(u)\nabla v) + f(u), & x\in\Omega,\ t>0,\\
0 = \Delta v - v + u^{\kappa}, & x\in\Omega,\ t>0,\\
u(x,0) = u_0(x), & x\in\Omega,
\end{cases} \tag{$\ast$}$$

with homogeneous Neumann boundary conditions in a bounded domain $\Omega\subset\mathbb{R}^n$ ($n\ge 2$) with smooth boundary, where $\kappa>0$ and the parameter functions $D$ and $S$ are smooth and, for some $d,\chi>0$ and $\alpha,\beta\in\mathbb{R}$, satisfy $D(u)\ge du^{-\alpha}$ and $S(u)\le\chi u^{\beta}$ for all $u>1$, and the logistic source $f(u)$ fulfills $f(0)\ge 0$ as well as $f(u)\le a_0 - bu^{\gamma}$ with $a_0\ge 0$, $b>0$, $\gamma>1$. We first establish a boundedness principle for the chemotaxis system ($\ast$) asserting that blow-up of the solution is impossible if $\|u(\cdot,t)\|_{L^q(\Omega)}$ is bounded for some $q>\max\{\frac{n}{2}(\alpha+\beta+\kappa-1),\,0\}$. Then, with the aid of this criterion, we show the uniform-in-time $L^\infty$-boundedness of solutions under any one of the following conditions:

X. Wang · T. Xiang (corresponding author): Institute for Mathematical Sciences, Renmin University of China, Beijing 100872, China
N. Zhang: Institute of Psychology, Chinese Academy of Sciences, Beijing 10083, China
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_12


(B1) $\beta+\kappa<\max\{\gamma,\ 1+\frac{2}{n}-\alpha\}$,

(B2) $\beta+\kappa=\gamma$ and
$$b>b^*=\begin{cases}\dfrac{n(\alpha+\gamma-1)-2}{n(\alpha+\gamma-1)+2(\beta-1)}\,\chi & \text{for } \beta>0,\\[2mm] \chi & \text{for } \beta\le 0,\end{cases}$$

(B3) $\beta>0$, $\beta+\kappa=\gamma$, $b=b^*$ and either $a_0=0$ or
$$\begin{cases}\alpha\le 1 & \text{for } \gamma>1,\\[1mm] 1<\alpha\le\frac12+\frac2n & \text{for } 1-\alpha+\frac{2}{n+2-n\alpha}<\gamma\le 1-\alpha+\frac4n,\\[1mm] \frac12+\frac2n<\alpha<1+\frac2n & \text{for } \gamma>1-\alpha+\frac4n,\end{cases}$$

(B4) $\beta=0$, $\kappa=\gamma>1$, $b=b^*=\chi$ and either $a_0=0$ or $\alpha<1+\frac2n$.

Our results capture the effects of the net proliferation rate of cells (whether $a_0=0$ or not) and of weak chemotaxis ($\beta\le 0$); they encompass and extend the existing boundedness results, and hence enlarge the parameter range of boundedness. Finally, for the prototypical choices $D(u)=(u+1)^{-\alpha}$, $S(u)=\chi u(u+1)^{\beta-1}$ for $\beta<1$ or $S(u)=\chi u^{\beta}$ for $\beta\ge 1$, and $f(u)=au-bu^{\gamma}$ for some $a\in\mathbb{R}$, $b>0$, the global stabilities of the equilibria $\big((a/b)^{\frac{1}{\gamma-1}},(a/b)^{\frac{\kappa}{\gamma-1}}\big)$ and $(0,0)$ are investigated in detail and their respective convergence rates are computed explicitly. These stabilization results exhibit the effect of each ingredient in ($\ast$) and, in particular, illustrate that no pattern formation can arise for small chemosensitivity $\chi$ or large damping $b$.

12.1 Introduction and Statement of Main Results

Chemotaxis, the active movement of cells (or organisms) toward higher concentrations of diffusible signalling substances, has received great attention in both the biological and mathematical communities. A classical and fundamental mathematical system modelling chemotaxis was proposed by Keller and Segel in 1970 [18]. In its most commonly used formulation, the cell density $u(x,t)$ at position $x\in\mathbb{R}^n$ and time $t>0$ satisfies a diffusion-advection-reaction equation whose nonlinear advective term is its most characteristic ingredient, modelling chemotactic cell aggregation; this equation is coupled to the chemical concentration $v(x,t)$, which typically solves a parabolic or elliptic (for a chemoattractant with "fast" diffusion) equation with a reaction term describing production and degradation of the chemoattractant. The pioneering work of Keller and Segel initiated, and continues to stimulate, fast-growing research on the subject, and a great number of chemotaxis systems have been proposed and investigated both theoretically and experimentally; we refer to the review articles [3, 12, 14, 40, 46] for progress on and applications of chemotaxis. In this chapter, we focus on the following quasilinear parabolic-elliptic Keller-Segel chemotaxis system


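$$\begin{cases}
u_t = \nabla\cdot(D(u)\nabla u) - \nabla\cdot(S(u)\nabla v) + f(u), & x\in\Omega,\ t>0,\\
0 = \Delta v - v + u^{\kappa}, & x\in\Omega,\ t>0,\\
\dfrac{\partial u}{\partial\nu} = \dfrac{\partial v}{\partial\nu} = 0, & x\in\partial\Omega,\ t>0,\\
u(x,0) = u_0(x), & x\in\Omega,
\end{cases} \tag{12.1}$$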

where $\kappa>0$, $\Omega\subset\mathbb{R}^n$ ($n\ge 2$) is a bounded domain with smooth boundary $\partial\Omega$ and $\frac{\partial}{\partial\nu}$ denotes the outward normal derivative on $\partial\Omega$. In the first equation of (12.1), $D(u)$ describes the random nonlinear diffusivity of the cells, $S(u)$ signifies the chemotactic sensitivity (the second term is known as the chemotaxis term, the defining term of chemotaxis models), accounting for the assumption that cells partially orient their motion to migrate toward increasing chemical concentrations, and the kinetic term $f(u)$ describes cell proliferation and cell death. We assume that the nonlinear diffusion $D(u)$ and chemosensitivity $S(u)$ satisfy

$$D, S\in C^{1+\theta}([0,\infty)) \text{ for some } \theta\in(0,1), \quad D(u)>0 \text{ and } S(u)\ge 0 \text{ for all } u\ge 0. \tag{12.2}$$

Moreover, based on the usual growth restrictions on $D$ and $S$ in the literature, cf. [5, 33, 49, 51], we shall assume that there exist some constants $\alpha,\beta\in\mathbb{R}$ and $d,\chi>0$ such that

$$D(u)\ge d(u+1)^{-\alpha}, \quad S(u)\le\chi(u+1)^{\beta} \quad \text{for all } u\ge 0. \tag{12.3}$$

Finally, to match our assumptions imposed on $D$ and $S$ as in (12.2) and (12.3), we suppose that the kinetic term $f\in W^{1,\infty}_{loc}(\Omega)$ satisfies $f(0)\ge 0$ and

$$f(u)\le a_0 - b(u+1)^{\gamma}, \quad u\ge 0, \quad \text{with some } a_0\ge 0,\ b>0,\ \gamma>1. \tag{12.4}$$

Here, we would like to comment that the conditions in (12.3) and (12.4), in particular (12.3), are biologically motivated. Indeed, for our mathematical reasoning, we only need the following set of conditions, cf. Remark 12.3:

$$D(u)\ge du^{-\alpha}, \quad S(u)\le\chi u^{\beta}, \quad f(u)\le a_0 - bu^{\gamma}, \quad u>0. \tag{12.5}$$

In the case of $\alpha>0$ (resp. $\beta<0$), when $u\approx 0$, the condition on $D$ (resp. $S$) in (12.5) is not biologically reasonable. Therefore, we shall adopt the set of conditions in (12.3) and (12.4), even though they are not as mathematically simple as the set of conditions in (12.5). Since we are interested in the behavior of classical solutions, we impose a non-degeneracy condition on the nonlinear diffusion coefficient $D$ (if the diffusion of cells is degenerate at $u=0$, one can modify the approximation procedures in [36, 39] to study the global existence and boundedness of weak solutions). We mention that the assumptions in (12.2), (12.3) and (12.4), or their convenient version in (12.5), on $D$, $S$ and $f$ cover a variety of biological choices; for instance, they include, but are not limited to, the following prototypical choices:

$$D(u)=(u+1)^{-\alpha}, \quad S(u)=\chi\begin{cases}u(u+1)^{\beta-1} & \text{if } \beta<1,\\ u^{\beta} & \text{if } \beta\ge 1,\end{cases} \quad f(u)=au-bu^{\gamma}. \tag{12.6}$$

In the second equation of the KS system (12.1), for definiteness, we have specified the chemical secretion function to be the power-like production $u^{\kappa}$, which is motivated by the choice $u^2$ used in the monograph [25, Chap. 5] to model the aggregation patterns formed by bacterial chemotaxis. The arguments presented here can, however, be readily adapted to slightly more general nonlinear secretion functions as in, e.g., [27, 28]. Strikingly, even the simple chemotaxis term $-\chi\nabla\cdot(u\nabla v)$ is widely known to have a strong influence on the boundedness and blow-up of solutions of the underlying models [2, 11, 19, 26, 30, 38–40, 42, 43, 45]. The system (12.1), by contrast, encompasses four ingredients: nonlinear diffusion, nonlinear chemotaxis, logistic source and nonlinear secretion. So far, numerous variants of (12.1) and of its fully parabolic version, in which the second equation is replaced with $v_t=\Delta v-v+u^{\kappa}$, have been investigated to provide various conditions for boundedness, blow-up and other dynamical properties of the proposed models, cf. [3, 5, 7–10, 12, 14, 22, 32, 33, 35, 36, 41, 44, 47–49, 51] and the references therein. The purpose of this chapter is to understand the interplay of the full combination of the four ingredients on the boundedness and large time behavior of solutions in the parabolic-elliptic setting. This project stems primarily from [9, 16, 35, 47, 50, 51] and aims to unify and extend them. The minimal model with $D(u)=1$, $S(u)=\chi u$, $\gamma=2$ and $\kappa=1$ was initially studied in [35], wherein boundedness is ensured under $b>\frac{(n-2)}{n}\chi$. Moreover, for $f(u)=bu(1-u)$ and $b>2\chi$, the solution of (12.1) fulfills

$$\lim_{t\to\infty}\Big(\|u(\cdot,t)-1\|_{L^\infty(\Omega)}+\|v(\cdot,t)-1\|_{L^\infty(\Omega)}\Big)=0. \tag{12.7}$$

These results were first extended in [5, 36] to a system with nonlinear diffusion and logistic source, namely, (12.1) with $D$, $f$ as in (12.2)–(12.4) or equivalently (12.5), $S(u)=\chi u$ and $\kappa=1$, and then were further generalized in [51] to a model with nonlinear diffusion, nonlinear chemosensitivity and logistic damping, precisely, (12.1) with $D$, $S$, $f$ as given in (12.2) and (12.5) for $\alpha\le 0$, $\beta>0$ and $\kappa=1$; with such specifications, it is shown in [51] that blow-up in (12.1) is ruled out provided

$$\text{either } \beta+1<\max\Big\{\gamma,\ 1+\frac2n-\alpha\Big\} \quad \text{or} \quad b>b^*=\frac{n(\alpha+\beta)-2}{n(\alpha+\beta)+2(\beta-1)}\,\chi \ \text{ if } \gamma=\beta+1. \tag{12.8}$$

Here, we wish to employ simpler arguments than those of [51] to improve the results therein for (12.1) with the additional nonlinear secretion $u^{\kappa}$ ($\kappa\ne 1$) to the borderline


case $b=b^*$. As a matter of fact, for the particularized choices $D(u)=1$, $S(u)=\chi u$ and $f$ as given in (12.5), this kind of extension has been progressively realized in [17, 47]: global existence and boundedness (and hence no blow-up) are ensured under

$$\text{either } 1+\kappa<\max\Big\{\gamma,\ 1+\frac2n\Big\} \quad \text{or} \quad b\ge b^*=\frac{(\kappa n-2)}{\kappa n}\,\chi \ \text{ if } \gamma=\kappa+1. \tag{12.9}$$

Extensions in the direction of [35, 36] were first made in [9] for the specialized choices $D(u)=1$, $S(u)=\chi u^{\beta}$, $\beta\ge 1$, $f(u)=b(u-u^{\gamma})$, $\gamma>1$ and $\kappa\ge 1$: global existence and boundedness are obtained under

$$\text{either } \beta+\kappa<\gamma \quad \text{or} \quad b>b^*=\frac{n(\beta+\kappa-1)-2}{n(\beta+\kappa-1)+2(\beta-1)}\,\chi \ \text{ if } \gamma=\beta+\kappa. \tag{12.10}$$

Under further conditions, the comparison argument of [35] was extended there to show that the constant equilibrium $(1,1)$ is globally stable and obeys (12.7). Recently, with $D(u)=(u+1)^{1-\alpha}$, $S(u)=\chi u$, $f(u)=b(u-u^{\gamma})$, $\alpha\le 0$, $\gamma>1$ and $\kappa\ge 1$, non-borderline boundedness (see Remark 12.1 below) was derived, and convergence to the constant equilibrium was then shown by a Lyapunov functional argument in [50]. Finally, we note that, with $D(u)=1$, $S(u)=\chi u^{\beta}$, $f(u)=bu(1-u^{\gamma-1})$, $\beta\ge 1$, $\gamma>1$, the non-borderline boundedness of (12.10) was extended to the borderline case in [16]:

$$\text{either } \beta+\kappa<\gamma \quad \text{or} \quad b\ge b^*=\frac{n(\beta+\kappa-1)-2}{n(\beta+\kappa-1)+2(\beta-1)}\,\chi \ \text{ if } \gamma=\beta+\kappa. \tag{12.11}$$

We observe that there are several additional restrictions on $\alpha$ and $\beta$ in some of the papers mentioned above; for instance, $\alpha\le 0$ and $\beta>0$ were required in [51], $\alpha\le 0$, $\beta=1$, $\kappa\ge 1$ were required in [50], and both $\beta\ge 1$ and $\kappa\ge 1$ were required in [9]. Motivated by these observations, in this project we systematically study the boundedness and convergence of solutions of the IBVP (12.1) to extend and unify the results of [9, 16, 35, 47, 50, 51]. More specifically, we first give a set of criteria ensuring uniform-in-time boundedness, and thus global existence, under any one of the following conditions:

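(B1) $\beta+\kappa<\max\{\gamma,\ 1+\frac{2}{n}-\alpha\}$,

(B2) $\beta+\kappa=\gamma$ and
$$b>b^*=\begin{cases}\dfrac{n(\alpha+\beta+\kappa-1)-2}{n(\alpha+\beta+\kappa-1)+2(\beta-1)}\,\chi & \text{for } \beta>0,\\[2mm] \chi & \text{for } \beta\le 0,\end{cases} \tag{12.12}$$

(B3) $\beta>0$, $\beta+\kappa=\gamma$, $b=b^*$ and either $a_0=0$ or
$$\begin{cases}\alpha\le 1 & \text{for } \gamma>1,\\[1mm] 1<\alpha\le\frac12+\frac2n & \text{for } 1-\alpha+\frac{2}{n+2-n\alpha}<\gamma\le 1-\alpha+\frac4n,\\[1mm] \frac12+\frac2n<\alpha<1+\frac2n & \text{for } \gamma>1-\alpha+\frac4n,\end{cases} \tag{12.13}$$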


(B4) $\beta=0$, $\kappa=\gamma>1$, $b=b^*=\chi$ and either $a_0=0$ or $\alpha<1+\frac2n$.

Remark 12.1 These boundedness principles cover (12.8), (12.9), (12.10) and (12.11) as special cases, as can be easily seen for the modelling choices in (12.6). Also, (B1) and (B2) with $(\alpha,\beta)=(1-m,1)$ give precisely the non-borderline boundedness obtained in [50], wherein the factor $\chi$ was missed. It is worth noting that the admissibility of positive $\alpha$, for instance $\alpha\le 1$, implies that, in particular for $D(u)=(1+u)^{-\alpha}$, even though cells diffuse very slowly at points of high density, no blow-up of cells can occur. This is not the case in the absence of a growth source; see a simplified variant of (12.1) in [6], wherein the condition $\alpha>\frac2n-1$ can even induce finite time blow-up. Moreover, our boundedness results relax the restrictions on $\alpha$, $\beta$ and $\kappa$, and extend and unify those in [9, 16, 35, 47, 50, 51] in an obvious way. Besides, our boundedness results also

(R1) reveal that weak chemotaxis allows for less damping and strong production; for example, as a result of (B1), the chemotaxis model
$$u_t=\Delta u-\chi\nabla\cdot\Big(\frac{u}{(u+1)^2}\nabla v\Big)+au-bu^{1+\frac{1}{2n}}, \qquad 0=\Delta v-v+u^{2+\frac1n} \quad \text{in } \Omega\subset\mathbb{R}^n$$
has no blow-up for any $a\ge 0$, $b,\chi>0$, even though "$\beta+\kappa>\gamma$";

(R2) show that any presence of a logistic type source can prevent critical mass blow-up in the case of $b^*=0$; for instance, consider the critical exponent chemotaxis model
$$u_t=\nabla\cdot\big(\nabla u-u^{\frac2n}\nabla v\big)+au-bu^{1+\frac2n}, \qquad 0=\Delta v-v+u \quad \text{in } \Omega\subset\mathbb{R}^n.$$
When $a=b=0$, the exponent $\frac2n$ is widely known to be critical for the occurrence of blow-up [8, 13, 23], and the model exhibits critical mass blow-up [24]. However, such blow-up is fully suppressed as soon as the logistic source $au-bu^{1+\frac2n}$, $a\in\mathbb{R}$, $b>0$, is present, thanks to (B2). As another example, consider the following Keller-Segel chemotaxis model
$$u_t=\nabla\cdot\big(u^{-(\frac2n-1)}\nabla u-u\nabla v\big)+au-bu^{2}, \qquad 0=\Delta v-v+u \quad \text{in } \Omega\subset\mathbb{R}^n.$$
When $a=b=0$, the exponent $-(\frac2n-1)$ is also well known to be critical for the system, which exhibits critical mass blow-up [4, 14, 21]. Our result (B2) shows, however, that the absorptive character of the logistic kinetics ($b>0$) is sufficient to enforce boundedness and hence preclude any blow-up;

(R3) capture the effect of the net proliferation rate of cells (whether $a_0=0$ or not) in the case of $b^*>0$. To our knowledge, this kind of effect has not been detected in the literature before. As an illustration of (B3), the chemotaxis model
$$u_t=\nabla\cdot(u^{-\alpha}\nabla u)-\chi\nabla\cdot(u^{\beta}\nabla v)-b^*u^{\beta+\kappa}, \qquad 0=\Delta v-v+u^{\kappa} \quad \text{in } \Omega\subset\mathbb{R}^n$$


has no blow-up for any $\alpha\in\mathbb{R}$, $\beta,\chi,\kappa>0$ with $\beta+\kappa>1$ and $\alpha+\beta+\kappa>1+\frac2n$. We note that (B1) is violated for such parameters. Here, $b^*$ is defined by (12.12).

Apart from the uniform boundedness and global existence, which will be shown in Sect. 12.3, for the prototypical choices in (12.6) we shall, in Sect. 12.4, utilize energy functional methods inspired by [1, 10, 15] to give a systematic treatment of the global stabilities of the homogeneous steady states $\big((a/b)^{\frac{1}{\gamma-1}},(a/b)^{\frac{\kappa}{\gamma-1}}\big)$ and $(0,0)$. Under explicit conditions, their global stabilities are derived, cf. Theorems 12.3 and 12.4. As a particular consequence, no pattern formation can arise for small chemosensitivity $\chi$ or large damping $b$. Moreover, their respective exponential ($a\ne 0$) and algebraic ($a=0$) convergence rates are computed explicitly. These stabilization results refine the uniform convergence in [9, 35, 50], achieved via comparison arguments, to exponential convergence, extend the convergence in [10, Theorems 1 and 2] to nonlinear diffusion, chemosensitivity and secretion, and extend the convergence of [47, 50] to nonlinear diffusion and chemosensitivity. These asymptotic behaviors exhibit the effect of each ingredient in the underlying model on its large time behavior.
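The threshold $b^*$ in (12.12) and condition (B1) are easy to evaluate for concrete parameters. The following is a minimal sketch in exact rational arithmetic; the function names `b_star` and `b1_holds` are hypothetical helpers introduced here for illustration only, and the sample parameters are those of the examples in (R1) and (R2) with an arbitrarily chosen dimension $n$.

```python
from fractions import Fraction


def b_star(n, alpha, beta, kappa, chi):
    """Threshold b* from (12.12), assuming beta + kappa = gamma.

    Hypothetical helper; all arguments are converted to exact rationals.
    """
    n, alpha, beta, kappa, chi = (Fraction(x) for x in (n, alpha, beta, kappa, chi))
    if beta <= 0:
        return chi
    s = n * (alpha + beta + kappa - 1)
    return (s - 2) / (s + 2 * (beta - 1)) * chi


def b1_holds(n, alpha, beta, kappa, gamma):
    """Check condition (B1): beta + kappa < max{gamma, 1 + 2/n - alpha}."""
    n, alpha, beta, kappa, gamma = (Fraction(x) for x in (n, alpha, beta, kappa, gamma))
    return beta + kappa < max(gamma, 1 + 2 / n - alpha)


# First example in (R2) with n = 3: alpha = 0, beta = 2/n, kappa = 1 gives b* = 0.
print(b_star(3, 0, Fraction(2, 3), 1, 1))
# Minimal model of [35]: alpha = 0, beta = kappa = 1 gives b* = (n - 2)/n * chi.
print(b_star(4, 0, 1, 1, 1))
# Example in (R1) with n = 3: beta = -1, kappa = 2 + 1/n, gamma = 1 + 1/(2n).
print(b1_holds(3, 0, -1, Fraction(7, 3), Fraction(7, 6)))
```

In the (R1) example, $\beta+\kappa=\frac43$ exceeds $\gamma=\frac76$ but stays below $1+\frac2n=\frac53$, which is why (B1) still applies.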

12.2 Local Existence and Preliminaries

Throughout this chapter, we shall abbreviate $\int_\Omega f(x)\,dx$ as $\int_\Omega f$ for simplicity. Moreover, we shall use $c_i$ (numbered within sections) or $C_i$ ($i=1,2,3,\ldots$) to denote generic constants which may vary from line to line. The following basic statement on local existence and extensibility of classical solutions can be obtained via a suitable fixed point framework by adapting the well-established arguments for the quasilinear chemotaxis model with logistic source; hence we omit the details of the proof and refer to the similar reasoning in [13, 22, 33, 51], for instance.

Lemma 12.1 (Local existence) Let $\Omega\subset\mathbb{R}^n$ ($n\ge 2$) be a bounded domain with smooth boundary and let the initial datum $u_0\in W^{1,\infty}(\Omega)$ be nonnegative. Suppose that $D(u)$, $S(u)$ satisfy (12.2) and $f\in W^{1,\infty}_{loc}(\Omega)$ with $f(0)\ge 0$. Then there exist a maximal existence time $T_{max}\in(0,\infty]$ and a pair of nonnegative functions $(u,v)\in C^0(\bar\Omega\times[0,T_{max}))\cap C^{2,1}(\bar\Omega\times(0,T_{max}))$ classically solving (12.1). Moreover, if $T_{max}<\infty$, then $\|u(\cdot,t)\|_{L^\infty(\Omega)}\to\infty$ as $t\nearrow T_{max}$.

Lemma 12.2 Assume that $f$ satisfies (12.4) and $(u,v)$ is the local solution of system (12.1). Then there exists a constant $c_1>0$ such that

$$\|u(\cdot,t)\|_{L^1}\le c_1 \quad \text{for all } t\in(0,T_{max}). \tag{12.14}$$

Proof. Integrating the first equation of (12.1) over $\Omega$ and using (12.4), we obtain

$$\frac{d}{dt}\int_\Omega u=\int_\Omega f(u)\le\int_\Omega(a_0-bu^{\gamma})\le -b\int_\Omega u+c_2|\Omega|, \tag{12.15}$$

where $c_2=\max\{a_0-bu^{\gamma}+bu : u\ge 0\}<\infty$ due to the fact that $b>0$, $\gamma>1$. Then, solving the differential inequality (12.15), one can derive (12.14) directly.

The following variant of the standard Gagliardo-Nirenberg inequality will be frequently used in our upcoming discussions; for details, we refer the reader to [29, 31, 37].

Lemma 12.3 (Gagliardo-Nirenberg inequality) Let $\Omega$ be a bounded domain in $\mathbb{R}^n$ with smooth boundary. Let $1\le p\le\frac{2n}{(n-2)_+}$ and $q\in(0,p)$. Then there exist two positive constants $c_3$ and $c_4$ depending only on $\Omega$, $p$, $q$ and $n$ such that

$$\|\phi\|_{L^p}\le c_3\|\nabla\phi\|_{L^2}^{\theta}\|\phi\|_{L^q}^{1-\theta}+c_4\|\phi\|_{L^q}, \quad \forall\phi\in W^{1,2}(\Omega)\cap L^q(\Omega),$$

where $\theta\in(0,1)$ fulfills

$$\frac1p=\theta\Big(\frac12-\frac1n\Big)+(1-\theta)\frac1q \iff \theta=\frac{\frac nq-\frac np}{1-\frac n2+\frac nq}\in(0,1).$$

For convenience, we list the next simple comparison result, which will be used later.

Lemma 12.4 ([34]) Let $y(t)$ be a solution of the problem

$$\begin{cases} y'(t)+Ay^{p}\le B, & t>0,\\ y(0)=y_0, \end{cases}$$

with $A>0$, $p>0$ and $B\ge 0$. Then we have

$$y(t)\le\max\Big\{y_0,\ \Big(\frac BA\Big)^{\frac1p}\Big\}, \quad t>0.$$
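As an informal numerical illustration of Lemma 12.4 (not a proof), one can integrate the extremal case $y'=B-Ay^{p}$ and compare the trajectory with the bound $\max\{y_0,(B/A)^{1/p}\}$. The sketch below uses a plain explicit Euler scheme; the function names, the step size and the sample parameters are assumptions made here for illustration only.

```python
def comparison_bound(y0, A, p, B):
    """Upper bound max{y0, (B/A)^(1/p)} stated in Lemma 12.4."""
    return max(y0, (B / A) ** (1.0 / p))


def euler_trajectory(y0, A, p, B, t_end=20.0, steps=20000):
    """Explicit Euler for the equality case y' = B - A*y**p with y(0) = y0.

    Hypothetical helper: a small fixed step is assumed so that the scheme
    stays monotone for the sample parameters used below.
    """
    dt = t_end / steps
    y = y0
    traj = [y]
    for _ in range(steps):
        y = y + dt * (B - A * y ** p)
        traj.append(y)
    return traj


# Sample data (y0, A, p, B): starting below and above the equilibrium (B/A)^(1/p).
for params in [(0.5, 1.0, 2.0, 4.0), (5.0, 2.0, 3.0, 1.0), (1.0, 1.0, 0.5, 3.0)]:
    traj = euler_trajectory(*params)
    print(params, max(traj) <= comparison_bound(*params) + 1e-9)
```

Starting below the equilibrium, the trajectory increases toward $(B/A)^{1/p}$ without crossing it; starting above, it decreases from $y_0$, so in both cases the bound of the lemma is respected.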

12.3 Uniform Boundedness and Global Existence

To derive the uniform boundedness and global existence of solutions for the system (12.1) announced in the Introduction, motivated by [44, 47], we first reduce the hard task of proving the $L^\infty$-boundedness of $u$ to proving $L^q$-boundedness of $u$ for some suitable finite $q$. Along this line, the next boundedness principle for (12.1) extends that of [47] to a system with nonlinear diffusion and nonlinear chemosensitivity.

Theorem 12.1 (Boundedness criterion) In addition to the conditions in Lemma 12.1, let $D$, $S$ and $f$ further satisfy (12.3) and (12.4). Suppose that $(u,v)$ is the unique maximal solution of (12.1) defined on $[0,T_{max})$. If, for some

$$q>\max\Big\{\frac n2(\alpha+\beta+\kappa-1),\ 0\Big\}, \tag{12.16}$$

there exists a constant $M=M(q,n,\Omega)>0$ such that $\|u(\cdot,t)\|_{L^q(\Omega)}\le M$ for all $t\in(0,T_{max})$, then $T_{max}=\infty$, i.e., $(u,v)$ exists globally in time. Moreover, there is a $C>0$ such that $\|u(\cdot,t)\|_{L^\infty(\Omega)}+\|v(\cdot,t)\|_{W^{1,\infty}(\Omega)}\le C$ for all $t\in(0,\infty)$.

Remark 12.2 When $\alpha=0$, $\beta=1$, this criterion reduces to [47, Theorem 3.1].

Proof. For our later purpose, we choose $p$ suitably large to satisfy

$$p>\max\{1,\ \alpha,\ 1-\beta\} \tag{12.17}$$

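and

$$p>\max\Big\{q-\beta-\kappa+1,\ \alpha+\frac{(n-2)q}{n},\ \frac{n\alpha}{2}+\frac12(n-2)(\beta+\kappa-1)\Big\}. \tag{12.18}$$

Then we test the first equation in (12.1) by $(u+1)^{p-1}$ and integrate by parts over $\Omega$ to deduce that

$$\begin{aligned}
&\frac1p\frac{d}{dt}\int_\Omega(u+1)^{p}+(p-1)\int_\Omega D(u)(u+1)^{p-2}|\nabla u|^2\\
&\quad=(p-1)\int_\Omega S(u)(u+1)^{p-2}\nabla u\cdot\nabla v+\int_\Omega(u+1)^{p-1}f(u).
\end{aligned} \tag{12.19}$$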

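First, the conditions on $D$ and $f$ in (12.3) and (12.4) and the fact that $p>\alpha$ in (12.17) readily give

$$\int_\Omega D(u)(u+1)^{p-2}|\nabla u|^2\ge d\int_\Omega(u+1)^{p-2-\alpha}|\nabla u|^2=\frac{4d}{(p-\alpha)^2}\int_\Omega\big|\nabla(u+1)^{\frac{p-\alpha}{2}}\big|^2 \tag{12.20}$$

as well as

$$\int_\Omega f(u)(u+1)^{p-1}\le a_0\int_\Omega(u+1)^{p-1}-b\int_\Omega(u+1)^{p-1+\gamma}. \tag{12.21}$$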

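Second, because of $p>1-\beta$, the condition imposed on $S$ in (12.3) along with the second equation in (12.1) implies that

$$\begin{aligned}
\int_\Omega S(u)(u+1)^{p-2}\nabla u\cdot\nabla v&=\int_\Omega\nabla\mathcal{S}(u)\cdot\nabla v=-\int_\Omega\mathcal{S}(u)\Delta v\\
&=\int_\Omega\mathcal{S}(u)(u^{\kappa}-v)\le\frac{\chi}{p-1+\beta}\int_\Omega(u+1)^{p-1+\beta+\kappa},
\end{aligned} \tag{12.22}$$

where we have used the fact that

$$0\le\mathcal{S}(u):=\int_0^{u}S(z)(z+1)^{p-2}\,dz\le\frac{\chi}{p-1+\beta}(u+1)^{p-1+\beta}.$$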

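Substituting (12.20), (12.21) and (12.22) into (12.19), and letting $w=u+1$, we end up with

$$\begin{aligned}
&\frac1p\frac{d}{dt}\int_\Omega w^{p}+\frac{4(p-1)d}{(p-\alpha)^2}\int_\Omega\big|\nabla w^{\frac{p-\alpha}{2}}\big|^2\\
&\quad\le -b\int_\Omega w^{p-1+\gamma}+\frac{(p-1)\chi}{p-1+\beta}\int_\Omega w^{p-1+\beta+\kappa}+a_0\int_\Omega w^{p-1}\\
&\quad\le -\frac b2\int_\Omega w^{p-1+\gamma}+\frac{(p-1)\chi}{p-1+\beta}\int_\Omega w^{p-1+\beta+\kappa}+c_1|\Omega|,
\end{aligned} \tag{12.23}$$

where, due to $b>0$, $\gamma>1$,

$$c_1=\max\Big\{-\frac b2z^{p-1+\gamma}+a_0z^{p-1} : z\ge 1\Big\}<\infty.$$

For a fixed $q$ fulfilling (12.16), applying the Gagliardo-Nirenberg interpolation inequality, cf. Lemma 12.3, and using our assumption that $\|w\|_{L^q}=\|u+1\|_{L^q}$ is bounded, we then deduce

$$\begin{aligned}
\frac{(p-1)\chi}{p-1+\beta}\|w\|_{L^{\beta+p+\kappa-1}}^{\beta+p+\kappa-1}
&=\frac{(p-1)\chi}{p-1+\beta}\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{\frac{2(\beta+p+\kappa-1)}{p-\alpha}}}^{\frac{2(\beta+p+\kappa-1)}{p-\alpha}}\\
&\le c_2\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^{\frac{2(\beta+p+\kappa-1)\theta_1}{p-\alpha}}\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{\frac{2q}{p-\alpha}}}^{\frac{2(\beta+p+\kappa-1)(1-\theta_1)}{p-\alpha}}+c_2\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{\frac{2q}{p-\alpha}}}^{\frac{2(\beta+p+\kappa-1)}{p-\alpha}}\\
&\le c_3\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^{\frac{2(\beta+p+\kappa-1)\theta_1}{p-\alpha}}+c_3,
\end{aligned} \tag{12.24}$$

where we used the choice of $p$ in (12.18) to ensure

$$\theta_1=\frac{\frac{p-\alpha}{2q}-\frac{p-\alpha}{2(\beta+p+\kappa-1)}}{\frac{p-\alpha}{2q}+\frac1n-\frac12}\in(0,1).$$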

On the other hand, by the restriction on $q$ in (12.16), one can easily check that

$$\frac{2(\beta+p+\kappa-1)\theta_1}{p-\alpha}=\frac{\frac{\beta+p+\kappa-1}{q}-1}{\frac{p-\alpha}{2q}+\frac1n-\frac12}<2.$$

Then an application of Young's inequality to (12.24) gives rise to

$$\frac{(p-1)\chi}{\beta+p-1}\|w\|_{L^{\beta+p+\kappa-1}}^{\beta+p+\kappa-1}\le\frac{4(p-1)d}{(p-\alpha)^2}\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^{2}+c_4. \tag{12.25}$$
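The role of the threshold in (12.16) can be checked by hand for concrete numbers: the exponent of the gradient term in (12.24) drops below 2 exactly when $q>\frac n2(\alpha+\beta+\kappa-1)$. The following sketch evaluates $\theta_1$ and that exponent in exact rational arithmetic; the function names and the sample values ($n=3$, $\alpha=0$, $\beta=\kappa=1$, $p=3$) are assumptions chosen here for illustration, with $p$ satisfying (12.17) and (12.18) for the admissible value $q=2$.

```python
from fractions import Fraction as F


def theta1(n, alpha, beta, kappa, p, q):
    """Interpolation parameter theta_1 used in (12.24), in exact arithmetic."""
    big_p = F(p) + beta + kappa - 1          # beta + p + kappa - 1
    num = F(p - alpha) / (2 * q) - F(p - alpha) / (2 * big_p)
    den = F(p - alpha) / (2 * q) + F(1, n) - F(1, 2)
    return num / den


def gradient_exponent(n, alpha, beta, kappa, p, q):
    """Exponent 2(beta + p + kappa - 1) * theta_1 / (p - alpha) of the gradient norm."""
    big_p = F(p) + beta + kappa - 1
    return 2 * big_p * theta1(n, alpha, beta, kappa, p, q) / F(p - alpha)


# Threshold in (12.16) for n = 3, alpha = 0, beta = kappa = 1 is q > 3/2.
for q in (F(2), F(3, 2), F(1)):
    print(q, gradient_exponent(3, 0, 1, 1, 3, q))
```

For these sample values the exponent is below 2 for $q=2$, equals 2 at the borderline $q=\frac32$, and exceeds 2 for $q=1$, which is the dichotomy that makes Young's inequality applicable in (12.25) only above the threshold.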

Substituting (12.25) into (12.23), one obtains

$$\frac1p\frac{d}{dt}\int_\Omega w^{p}\le-\frac b2\int_\Omega w^{p+\gamma-1}+c_1|\Omega|+c_4\le-\int_\Omega w^{p}+(c_1+c_5)|\Omega|+c_4 \tag{12.26}$$

for $p$ satisfying (12.17) and (12.18), where, due to $b>0$, $\gamma>1$,

$$c_5=\max\Big\{-\frac b2z^{p+\gamma-1}+z^{p} : z\ge 1\Big\}<\infty.$$

Solving the differential inequality (12.26) and applying the Hölder inequality, we conclude that

$$\|u(\cdot,t)\|_{L^p}^{p}\le\|w(\cdot,t)\|_{L^p}^{p}\le\|u_0+1\|_{L^p}^{p}+(c_1+c_5)|\Omega|+c_4 \quad \text{for all } 1\le p<\infty.$$

This, along with the elliptic estimate applied to the second equation in (12.1), yields

$$\|v(\cdot,t)\|_{W^{2,p/\kappa}}\le c_6\|u(\cdot,t)\|_{L^p}^{\kappa}\le c_7 \quad \text{for all } 1\le p<\infty,$$

which implies $\|v(\cdot,t)\|_{W^{1,\infty}}\le c_8$ by Sobolev embedding. With these bounds at hand, we can perform the well-known Moser iteration technique as in [33, Lemma A.1] (see also [3, 44]) to obtain the $L^\infty$-boundedness of $u$. Finally, the extensibility criterion provided by Lemma 12.1 shows that $T_{max}=\infty$, and then $(u,v)$ is bounded in the topology stated in Theorem 12.1.

With the help of the boundedness principle provided by Theorem 12.1, we are now in a position to prove our main assertions on the boundedness of solutions of system (12.1), as announced in the Introduction.

Theorem 12.2 Let the basic assumptions in Lemma 12.1 hold and let $D(u)$, $S(u)$ and $f$ satisfy (12.2)–(12.4). If one of the items (B1)–(B4) listed in the Introduction holds, then, for any nonnegative initial datum $u_0\in W^{1,\infty}(\Omega)$, there exists a unique pair $(u,v)\in C^0(\bar\Omega\times[0,\infty))\cap C^{2,1}(\bar\Omega\times(0,\infty))$ which solves (12.1) classically. Moreover, there exists a constant $C>0$ independent of $t$ such that

$$\|u(\cdot,t)\|_{L^\infty(\Omega)}+\|v(\cdot,t)\|_{W^{1,\infty}(\Omega)}\le C.$$


For clarity, we shall prove Theorem 12.2 case by case. The boundedness criterion in Theorem 12.1 plays a crucial role, especially in the borderline case (B3).

Proof Case 1: Boundedness of solutions under (B1).

Case 1.1: $\beta+\kappa<\gamma$. In this case, from (12.23) one has

$$\frac1p\frac{d}{dt}\int_\Omega w^{p}\le-\int_\Omega w^{p}+(c_1+c_8)|\Omega|,\tag{12.27}$$

where

$$c_8=\max\Big\{-\frac b2z^{p-1+\gamma}+\frac{(p-1)\chi}{p-1+\beta}z^{p-1+\beta+\kappa}+z^{p}:\ z\ge1\Big\}<\infty$$

due to the facts that $b>0$, $\beta+\kappa<\gamma$ and $\gamma>1$. Then applying the Gronwall inequality to (12.27), one has

$$\|u(\cdot,t)\|_{L^p}^{p}\le\|w(\cdot,t)\|_{L^p}^{p}\le\|u_0+1\|_{L^p}^{p}+(c_1+c_8)|\Omega|$$

for any $p>\max\{1,1-\beta\}$.

Case 1.2: $\beta+\kappa<1+\frac2n-\alpha$. This is the same as $\frac n2(\alpha+\beta+\kappa-1)<1$. Recall that $\|u(\cdot,t)\|_{L^1}$ is uniformly bounded, cf. Lemma 12.2.

Therefore, in both cases, $\|u(\cdot,t)\|_{L^q}$ is uniformly bounded for some $q$ satisfying (12.16), and then a simple application of Theorem 12.1 ends the proof of the theorem in the case of (B1).

Case 2: Boundedness of solutions under (B2). In this case, since $\beta+\kappa=\gamma$, we have from (12.23) that

$$\frac1p\frac{d}{dt}\int_\Omega w^{p}+\frac{4(p-1)d}{(p-\alpha)^2}\int_\Omega|\nabla w^{\frac{p-\alpha}{2}}|^2\le-\Big(b-\frac{(p-1)\chi}{p-1+\beta}\Big)\int_\Omega w^{p-1+\gamma}+a_0\int_\Omega w^{p-1}\tag{12.28}$$

for $p>\max\{1,1-\beta,\alpha\}$. Notice also that we may assume without loss of generality that

$$p_c:=\frac n2(\alpha+\beta+\kappa-1)=\frac n2(\alpha+\gamma-1)\ge1,\tag{12.29}$$

since otherwise we are through by (B1). Then we define

$$b_*=\inf\Big\{\frac{(p-1)\chi}{p-1+\beta}:\ p>\max\{\alpha,1-\beta,p_c\}\Big\}=\begin{cases}\dfrac{(p_c-1)\chi}{p_c-1+\beta}&\text{for }\beta>0,\\[2mm]\chi&\text{for }\beta\le0.\end{cases}\tag{12.30}$$

This recovers the definition of $b_*$ given by (12.12). Then, for any $b>b_*$, we can pick a fixed $q$ satisfying $q>\max\{\alpha,1-\beta,p_c\}$ for which $b-\frac{(q-1)\chi}{q-1+\beta}>0$. This in conjunction with the fact that $\gamma>1$ further entails

$$c_9=\max\Big\{-\Big(b-\frac{(q-1)\chi}{q-1+\beta}\Big)z^{q-1+\gamma}+a_0z^{q-1}+z^{q}:\ z\ge1\Big\}<\infty.$$

Then (12.28) enables us to derive an ordinary differential inequality for $\int_\Omega w^{q}$:

$$\frac1q\frac{d}{dt}\int_\Omega w^{q}\le-\int_\Omega w^{q}+c_9|\Omega|,$$

and hence $\|u(\cdot,t)\|_{L^q}^{q}\le\|w(\cdot,t)\|_{L^q}^{q}\le\|u_0+1\|_{L^q}^{q}+c_9|\Omega|$. As $q>p_c$, the desired result follows from the boundedness criterion as before.

Case 3: Boundedness of solutions under (B3). We shall divide the proof into two steps for the purpose of clarity.

Step 1: We show that $\|u(\cdot,t)\|_{L^{p_c}}$ is uniformly bounded with $p_c$ given by (12.29). Thanks to (B1) and (B2), we need only to proceed with $p_c>1$, and hence (12.30) gives $b_*>0$. Then the definitions of $p_c$ and $b_*$ in (12.29) and (12.30) further show that

$$p_c=\frac n2(\alpha+\gamma-1)>\max\Big\{1,\ \alpha,\ \frac n2\alpha,\ \alpha+1-\frac2n\Big\}\tag{12.31}$$

and $b_*-\frac{(p_c-1)\chi}{p_c-1+\beta}=0$ for $\beta>0$. Hence, we infer from (12.28) and substitute $b=b_*$ that

$$\frac1{p_c}\frac{d}{dt}\int_\Omega w^{p_c}+\frac{4(p_c-1)d}{(p_c-\alpha)^2}\int_\Omega|\nabla w^{\frac{p_c-\alpha}{2}}|^2\le a_0\int_\Omega w^{p_c-1}.\tag{12.32}$$

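In the case of $a_0=0$, an integration of (12.32) directly yields the boundedness of $\|u\|_{L^{p_c}}$. Next, we focus on the case $a_0>0$. In this case, we aim to absorb the integral on the right-hand side of (12.32) into the dissipation term on its left-hand side. To this end, we employ the Gagliardo-Nirenberg inequality and the fact that $\|w\|_{L^1}=\|u+1\|_{L^1}$ is bounded to deduce that

$$\begin{aligned}\|w\|_{L^{p_c}}^{p_c}=\big\|w^{\frac{p_c-\alpha}{2}}\big\|_{L^{\frac{2p_c}{p_c-\alpha}}}^{\frac{2p_c}{p_c-\alpha}}&\le c_{10}\big\|\nabla w^{\frac{p_c-\alpha}{2}}\big\|_{L^2}^{\frac{2p_c\theta_2}{p_c-\alpha}}\big\|w^{\frac{p_c-\alpha}{2}}\big\|_{L^{\frac{2}{p_c-\alpha}}}^{\frac{2p_c(1-\theta_2)}{p_c-\alpha}}+c_{10}\big\|w^{\frac{p_c-\alpha}{2}}\big\|_{L^{\frac{2}{p_c-\alpha}}}^{\frac{2p_c}{p_c-\alpha}}\\&\le c_{11}\big\|\nabla w^{\frac{p_c-\alpha}{2}}\big\|_{L^2}^{\frac{2p_c\theta_2}{p_c-\alpha}}+c_{12},\end{aligned}\tag{12.33}$$

where we have employed (12.31) to guarantee

$$\theta_2=\frac{\frac{p_c-\alpha}{2}-\frac{p_c-\alpha}{2p_c}}{\frac{p_c-\alpha}{2}+\frac1n-\frac12}\in(0,1).\tag{12.34}$$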

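Then an application of the Hölder inequality together with (12.33) leads to

$$a_0\|w\|_{L^{p_c-1}}^{p_c-1}\le a_0|\Omega|^{\frac1{p_c}}\big(\|w\|_{L^{p_c}}^{p_c}\big)^{\frac{p_c-1}{p_c}}\le c_{13}\big\|\nabla w^{\frac{p_c-\alpha}{2}}\big\|_{L^2}^{\frac{2(p_c-1)\theta_2}{p_c-\alpha}}+c_{14}.\tag{12.35}$$

Now, we wish

$$\frac{2(p_c-1)\theta_2}{p_c-\alpha}=\frac{2(p_c-1)^2}{\big(p_c-\alpha+\frac2n-1\big)p_c}<2,$$

which is the case if

$$\alpha<1+\frac2n\quad\text{and}\quad p_c>\frac{1}{1+\frac2n-\alpha}.\tag{12.36}$$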
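The equivalence between the exponent requirement and (12.36) is elementary algebra, and it can also be spot-checked numerically. The sketch below is illustrative only: the function names are hypothetical, the sample values in the usage note are arbitrary, and the check presumes that the denominator $p_c-\alpha+\frac2n-1$ is positive, as guaranteed by (12.31).

```python
def theta2(pc, alpha, n):
    """theta_2 from (12.34)."""
    num = (pc - alpha) / 2.0 - (pc - alpha) / (2.0 * pc)
    den = (pc - alpha) / 2.0 + 1.0 / n - 0.5
    return num / den


def young_exponent(pc, alpha, n):
    """Power of the gradient norm in (12.35): 2 (pc - 1) theta_2 / (pc - alpha)."""
    return 2.0 * (pc - 1.0) * theta2(pc, alpha, n) / (pc - alpha)


def young_exponent_closed_form(pc, alpha, n):
    """Simplified form 2 (pc - 1)^2 / ((pc - alpha + 2/n - 1) pc)."""
    return 2.0 * (pc - 1.0) ** 2 / ((pc - alpha + 2.0 / n - 1.0) * pc)


def condition_12_36(pc, alpha, n):
    """Both requirements of (12.36)."""
    gap = 1.0 + 2.0 / n - alpha
    return gap > 0.0 and pc > 1.0 / gap
```

As an illustration with `n = 3` and `alpha = 1.5`, the threshold in (12.36) is `pc > 6`: the exponent is about 2.08 at `pc = 4` (too large for Young's inequality) and about 1.99 at `pc = 8`.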

Accordingly, under (12.36), we can apply Young's inequality to (12.35) to obtain

$$a_0\|w\|_{L^{p_c-1}}^{p_c-1}\le\frac{2d(p_c-1)}{(p_c-\alpha)^2}\big\|\nabla w^{\frac{p_c-\alpha}{2}}\big\|_{L^2}^2+c_{15}.\tag{12.37}$$

On the other hand, we obtain from (12.33) that

$$\big\|\nabla w^{\frac{p_c-\alpha}{2}}\big\|_{L^2}^2\ge c_{16}\big(\|w\|_{L^{p_c}}^{p_c}\big)^{\frac{p_c-\alpha}{p_c\theta_2}}-c_{17}.\tag{12.38}$$

Thus, if the key inequality (12.36) holds, then we can substitute (12.37), (12.38) into (12.32) and set $y(t):=\|w\|_{L^{p_c}}^{p_c}$ to conclude that

$$y'+\frac{2(p_c-1)dp_cc_{16}}{(p_c-\alpha)^2}\,y^{\frac{p_c-\alpha}{p_c\theta_2}}\le\frac{2(p_c-1)dp_cc_{17}}{(p_c-\alpha)^2}+p_cc_{15},\qquad y(0)=\|u_0+1\|_{L^{p_c}}^{p_c},\tag{12.39}$$

whereupon a simple application of Lemma 12.4 gives the boundedness of $\|w\|_{L^{p_c}}=\|u+1\|_{L^{p_c}}$. In the sequel, we shall show the validity of (12.36) in the first two cases of (B3).

Case 3.1: $\alpha\le1$ and $\gamma>1$. Since $\alpha<1+\frac2n$, we only need to check that the second condition in (12.36) holds. When $\alpha\le\frac2n$, one has $\frac{1}{1+\frac2n-\alpha}\le1<p_c=\frac n2(\alpha+\gamma-1)$, i.e., the second inequality in (12.36) holds; when $\frac2n<\alpha\le1$, it follows from plain calculations that

$$\frac{2}{n+2-n\alpha}-\alpha\le0<\gamma-1,$$

which implies $\frac n2(\alpha+\gamma-1)>\frac{1}{1+\frac2n-\alpha}$. All in all, (12.36) holds in the case $\alpha\le1$ and $\gamma>1$.

Case 3.2: $1<\alpha\le\frac12+\frac2n$ and $1-\alpha+\frac{2}{n+2-n\alpha}<\gamma\le1-\alpha+\frac4n$. In this case, $1-\alpha+\frac{2}{n+2-n\alpha}<\gamma$ alone entails $\frac n2(\alpha+\gamma-1)>\frac{1}{1+\frac2n-\alpha}$. Hence, (12.36) simply holds.

Case 3.3: $\frac12+\frac2n<\alpha<1+\frac2n$ and $\gamma>1-\alpha+\frac4n$. In this case, the restriction on $\gamma$ implies $p_c=\frac n2(\alpha+\gamma-1)>2$. Then the interpolation inequality and (12.33) enable us to improve (12.35) to

$$\begin{aligned}a_0\|w\|_{L^{p_c-1}}^{p_c-1}&\le a_0\|w\|_{L^1}^{\frac1{p_c-1}}\big(\|w\|_{L^{p_c}}^{p_c}\big)^{\frac{p_c-2}{p_c-1}}\le c_{18}\big\|\nabla w^{\frac{p_c-\alpha}{2}}\big\|_{L^2}^{\frac{2p_c(p_c-2)\theta_2}{(p_c-\alpha)(p_c-1)}}+c_{19}\\&\le\frac{2d(p_c-1)}{(p_c-\alpha)^2}\big\|\nabla w^{\frac{p_c-\alpha}{2}}\big\|_{L^2}^2+c_{20},\end{aligned}\tag{12.40}$$

where we have applied Young's inequality due to the fact that

$$\frac{2p_c(p_c-2)\theta_2}{(p_c-\alpha)(p_c-1)}=\frac{2(p_c-2)}{p_c-\alpha+\frac2n-1}<2$$

[…] $p_c$. Then the desired boundedness of solution will be obtained directly from Theorem 12.1.

In the case of $\beta>0$, to proceed, we recall $b=b_*$ and then denote

$$\delta(p)=\frac{(p-1)\chi}{p-1+\beta}-b=\frac{(p-1)\chi}{p-1+\beta}-b_*.$$

Then the definition of $b_*$ in (12.30) shows that $\delta(p_c)=0$ and $\delta(p)>0$ for any $p>p_c$. Now, for $p\in(p_c,p_c+1]$, we use Step 1 and substitute (12.20) into (12.28) to infer that

$$\frac1p\frac{d}{dt}\int_\Omega w^{p}+\frac{4d(p_c-1)}{(p_c+1-\alpha)^2}\int_\Omega|\nabla w^{\frac{p-\alpha}{2}}|^2\le\delta(p)\int_\Omega w^{p+\gamma-1}+c_{21}.\tag{12.41}$$

The fact that $p_c>\max\{1,\alpha\}$ shows that the coefficient of the second integral on the left-hand side of (12.41) is positive. In the sequel, we wish to control the first term on the right-hand side of (12.41) by the second integral on its left-hand side. To this end, we first observe from the definition of $p_c$ in (12.29) that $\gamma-1=\frac2np_c-\alpha$. Next, in the case of $n>2$, the Hölder inequality along with the boundedness of $\|w\|_{L^{p_c}}$ enables us to estimate

$$\int_\Omega w^{p+\gamma-1}=\int_\Omega w^{\frac{n(p-\alpha)}{n-2}\cdot\frac{n-2}{n}}\cdot w^{p_c\cdot\frac2n}\le\Big(\int_\Omega w^{\frac{n(p-\alpha)}{n-2}}\Big)^{\frac{n-2}{n}}\Big(\int_\Omega w^{p_c}\Big)^{\frac2n}\le c_{22}\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{\frac{2n}{n-2}}}^2.$$

As such, the Sobolev embedding $W^{1,2}\hookrightarrow L^{\frac{2n}{n-2}}$ implies

$$\|w\|_{L^{p+\gamma-1}}^{p+\gamma-1}\le c_{22}\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{\frac{2n}{n-2}}}^2\le c_{23}\Big(\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^2+\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^2}^2\Big).\tag{12.42}$$

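Young's inequality with epsilon then easily yields, for any $\epsilon\in(0,1)$, that

$$\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^2}^2=\int_\Omega w^{p-\alpha}\le\epsilon\int_\Omega w^{p-\alpha+\frac2np_c}+M_{\epsilon,\alpha}|\Omega|=\epsilon\int_\Omega w^{p+\gamma-1}+M_{\epsilon,\alpha}|\Omega|,\tag{12.43}$$

where

$$M_{\epsilon,\alpha}=\sup\bigg\{\Big(\epsilon\,\frac{p-\alpha+\frac2np_c}{p-\alpha}\Big)^{-\frac{p-\alpha}{\frac2np_c}}\frac{\frac2np_c}{p-\alpha+\frac2np_c}:\ p\in(p_c,p_c+1]\bigg\}\le\epsilon^{-\frac{n(p_c+1-\alpha)}{2p_c}}.$$

A substitution of (12.43) with $\epsilon=\frac12\min\{c_{23}^{-1},1\}$ into (12.42) produces the existence of $c_{24}$ independent of $p$ such that

$$\|w\|_{L^{p+\gamma-1}}^{p+\gamma-1}\le c_{24}\Big(\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^2+1\Big).\tag{12.44}$$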

In the case $n=2$, setting

$$q=\Big(2+\frac{p_c}{p_c-\alpha}\Big)(p-\alpha):=m(p-\alpha),$$

we see that $q>p+\gamma-1=p-\alpha+\frac2np_c$ for all $p>p_c$. Hence, we can select $\sigma\in(0,1)$ such that

$$\sigma:=\frac{p+\gamma-1-p_c}{q-p_c}\iff p+\gamma-1=m(p-\alpha)\sigma+p_c(1-\sigma).\tag{12.45}$$

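Then the Hölder inequality entails

$$\int_\Omega w^{p+\gamma-1}=\int_\Omega w^{m(p-\alpha)\sigma}\cdot w^{p_c(1-\sigma)}\le\Big(\int_\Omega w^{m(p-\alpha)}\Big)^{\sigma}\Big(\int_\Omega w^{p_c}\Big)^{1-\sigma}\le c_{25}\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{2m}}^{2m\sigma}.$$

We thus conclude from the Gagliardo-Nirenberg inequality that

$$\begin{aligned}\int_\Omega w^{p+\gamma-1}&\le c_{26}\Big(\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^{2m\sigma\theta_3}\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{\frac{2p_c}{p-\alpha}}}^{2m\sigma(1-\theta_3)}+\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{\frac{2p_c}{p-\alpha}}}^{2m\sigma}\Big)\\&\le c_{27}\Big(\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^{2m\sigma\theta_3}+1\Big)=c_{27}\Big(\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^{2}+1\Big),\end{aligned}\tag{12.46}$$

where we have used the following facts:

$$\theta_3=\frac{\frac{p-\alpha}{2p_c}-\frac1{2m}}{\frac{p-\alpha}{2p_c}+\frac1n-\frac12}\in(0,1),\qquad2m\sigma\theta_3=2;\tag{12.47}$$

the latter is due to (12.45). Then (12.46) implies (12.47). Notice that $p_c>\alpha$ and thus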

$$\frac{2p_c}{p_c+1-\alpha}\le\frac{2p_c}{p-\alpha}<\frac{2p_c}{p_c-\alpha},\qquad\forall p\in(p_c,p_c+1],$$

which implies that $c_{26}$ and $c_{27}$ can be uniformly bounded in $p\in(p_c,p_c+1]$ and then can be chosen independent of such $p$. That is, (12.44) is also true in the case of $n=2$. In summary, we obtain from (12.44) and (12.46) that

$$\delta(p)\int_\Omega w^{p+\gamma-1}\le\max\{c_{24},c_{27}\}\,\delta(p)\Big(\int_\Omega|\nabla w^{\frac{p-\alpha}{2}}|^2+1\Big).\tag{12.48}$$

Since $\delta(p_c)=0$ and $c_{24}$, $c_{27}$ are independent of $p$, there exists $p_0>p_c$ such that

$$\delta(p_0)\max\{c_{24},c_{27}\}\le\frac{2d(p_c-1)}{(p_c+1-\alpha)^2},$$

and so a substitution of (12.48) with $p=p_0$ into (12.41) with $p=p_0$ yields

$$\frac1{p_0}\frac{d}{dt}\int_\Omega w^{p_0}+\frac{2d(p_c-1)}{(p_c+1-\alpha)^2}\int_\Omega|\nabla w^{\frac{p_0-\alpha}{2}}|^2\le\frac{2d(p_c-1)}{(p_c+1-\alpha)^2}+c_{21}:=c_{27}.\tag{12.49}$$

Keeping in mind $p_0>p_c$ and repeating the arguments used for (12.33) and (12.38), one can easily derive that

$$\big\|\nabla w^{\frac{p_0-\alpha}{2}}\big\|_{L^2}^2\ge c_{28}\big(\|w\|_{L^{p_0}}^{p_0}\big)^{\frac{p_0-\alpha}{p_0\theta_2(p_0)}}-c_{29},\tag{12.50}$$

where $\theta_2(p_0)$ is defined by (12.34) with $p_c$ replaced by $p_0$. Finally, we combine (12.49) and (12.50) to get a differential inequality for $z(t):=\|w\|_{L^{p_0}}^{p_0}$:

$$z'+\frac{2d(p_c-1)p_0}{(p_c+1-\alpha)^2}c_{28}\,z^{\frac{p_0-\alpha}{p_0\theta_2(p_0)}}\le\frac{2d(p_c-1)p_0}{(p_c+1-\alpha)^2}c_{29}+p_0c_{27},\qquad z(0)=\|u_0+1\|_{L^{p_0}}^{p_0},$$

which, upon comparison, directly shows that $\|u+1\|_{L^{p_0}}=\|w\|_{L^{p_0}}$ is uniformly bounded. Because of $p_0>p_c$, the desired assertion then follows from the boundedness principle provided by Theorem 12.1.

Case 4: Boundedness of solutions under (B4). In the case of $\beta=0$, we see from the definition of $b_*$ in (12.30) that $b=b_*=\chi$ and so $b-\frac{(p-1)\chi}{p-1+\beta}=0$ for any $p>1$. Consequently, it follows from (12.28) that

$$\frac1p\frac{d}{dt}\int_\Omega w^{p}+\frac{4(p-1)d}{(p-\alpha)^2}\int_\Omega|\nabla w^{\frac{p-\alpha}{2}}|^2\le a_0\int_\Omega w^{p-1},\qquad p>\max\{1,\alpha\}.\tag{12.51}$$

In the case of $a_0=0$, an integration shows that $\|w\|_{L^p}=\|u+1\|_{L^p}$ is bounded for any $p>\max\{1,\alpha\}$.

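In the case of $a_0>0$, we are in a situation similar to Case 3.3 or (12.49). Indeed, for

$$p>\max\Big\{2,\ \alpha,\ \frac n2\alpha,\ \alpha+1-\frac2n\Big\},\tag{12.52}$$

we use the Gagliardo-Nirenberg inequality and the fact that $\|w\|_{L^1}$ is bounded to infer

$$\begin{aligned}\|w\|_{L^{p}}^{p}=\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{\frac{2p}{p-\alpha}}}^{\frac{2p}{p-\alpha}}&\le c_{30}\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^{\frac{2p\theta_4}{p-\alpha}}\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{\frac{2}{p-\alpha}}}^{\frac{2p(1-\theta_4)}{p-\alpha}}+c_{30}\big\|w^{\frac{p-\alpha}{2}}\big\|_{L^{\frac{2}{p-\alpha}}}^{\frac{2p}{p-\alpha}}\\&\le c_{31}\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^{\frac{2p\theta_4}{p-\alpha}}+c_{32},\end{aligned}\tag{12.53}$$

where we have applied the choice of $p$ in (12.52) to guarantee

$$\theta_4=\frac{\frac{p-\alpha}{2}-\frac{p-\alpha}{2p}}{\frac{p-\alpha}{2}+\frac1n-\frac12}\in(0,1).$$

Then we use the interpolation inequality and (12.53) to bound

$$\begin{aligned}a_0\|w\|_{L^{p-1}}^{p-1}&\le a_0\|w\|_{L^1}^{\frac1{p-1}}\big(\|w\|_{L^{p}}^{p}\big)^{\frac{p-2}{p-1}}\le c_{33}\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^{\frac{2p(p-2)\theta_4}{(p-\alpha)(p-1)}}+c_{34}\\&\le\frac{2d(p-1)}{(p-\alpha)^2}\big\|\nabla w^{\frac{p-\alpha}{2}}\big\|_{L^2}^2+c_{35},\end{aligned}\tag{12.54}$$

where we have applied Young's inequality with epsilon due to the fact that

$$\frac{2p(p-2)\theta_4}{(p-\alpha)(p-1)}=\frac{2(p-2)}{p-\alpha+\frac2n-1}<2$$

[…]

$$\begin{cases}u_t=\nabla\cdot\big((u+1)^{-\alpha}\nabla u\big)-\chi\nabla\cdot\big(u(u+1)^{\beta-1}\nabla v\big)+au-bu^{\gamma},&x\in\Omega,\ t>0,\\0=\Delta v-v+u^{\kappa},&x\in\Omega,\ t>0,\\\dfrac{\partial u}{\partial\nu}=\dfrac{\partial v}{\partial\nu}=0,&x\in\partial\Omega,\ t>0,\\u(x,0)=u_0(x),&x\in\Omega,\end{cases}\tag{12.55}$$

where $\alpha,\beta,a\in\mathbb{R}$, $\chi,\kappa,b>0$ and $\gamma>1$. For the simplest case that $\alpha=0$, $\beta=\kappa=1$, $\gamma=2$ and $b=a$, under the assumption $b>2\chi$, Tello and Winkler [35] used a comparison argument to show that the solution of (12.55) converges uniformly to its constant steady state $(1,1)$. On the other hand, for $\alpha=0$, $\beta=\kappa=1$ and $\gamma=2$, He and Zheng [10] modified the energy functional method from [1, 15] to investigate the stability of the constant equilibria $(0,0)$ and $(a/b,a/b)$ with convergence rate estimates. This energy method was again used to study the case that $\alpha=0$, $\beta=1$ and $\gamma=\kappa+1$, $\kappa>0$, cf. [47, 50]. Here, we further extend the energy functional method to carry out a comprehensive analysis of the global stability of the constant steady states $\big((a/b)^{\frac1{\gamma-1}},(a/b)^{\frac{\kappa}{\gamma-1}}\big)$ and $(0,0)$ of (12.55) as well as their precise convergence rates. The first set of long time dynamics for (12.55) as $t$ tends to infinity reads as follows.

Theorem 12.3 Let $(u,v)$ be the global-in-time bounded smooth solution of (12.55) gained from Theorem 12.2.

(C1) In the case of $a>0$, assume

$$\beta\le1-\frac{\alpha}{2},\qquad\gamma\ge2\kappa,\qquad b>\Big(\frac{\chi^2}{16}z_0\,a^{\frac{2\kappa-\gamma+1}{\gamma-1}}\Big)^{\frac{\gamma-1}{2\kappa}},\tag{12.56}$$

where

$$z_0=\sup_{z\in(0,1)\cup(1,\infty)}\frac{(z^{\kappa}-1)^2}{(z-1)(z^{\gamma-1}-1)}.\tag{12.57}$$

Then the global solution $(u,v)$ of (12.55) converges exponentially:

$$\Big\|u(\cdot,t)-\Big(\frac ab\Big)^{\frac1{\gamma-1}}\Big\|_{L^\infty(\Omega)}+\Big\|v(\cdot,t)-\Big(\frac ab\Big)^{\frac{\kappa}{\gamma-1}}\Big\|_{L^\infty(\Omega)}\le Ce^{-\frac{(\gamma-1)a\eta}{(n+2)b}t}\tag{12.58}$$

for all $t\ge0$ and some large constant $C=C(u_0,v_0,\kappa,\gamma)$ independent of $t$. Here

$$\eta=b-\frac{\chi^2}{16}z_0\Big(\frac ab\Big)^{\frac{2\kappa+1-\gamma}{\gamma-1}}=b^{-\frac{2\kappa+1-\gamma}{\gamma-1}}\Big(b^{\frac{2\kappa}{\gamma-1}}-\frac{\chi^2}{16}z_0\,a^{\frac{2\kappa-\gamma+1}{\gamma-1}}\Big)>0.\tag{12.59}$$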

(C2) In the case of $a=0$, the global solution $(u,v)$ of (12.55) converges algebraically:

$$\|u(\cdot,t)\|_{L^\infty(\Omega)}\le C(t+1)^{-\frac{1}{(\gamma-1)(n+1)}}\tag{12.60}$$

and

$$\|v(\cdot,t)\|_{L^\infty(\Omega)}\le C\begin{cases}(t+1)^{-\frac{\kappa}{(\gamma-1)(n+1)}},&\text{if }0<\kappa\le1,\\[1mm](t+1)^{-\frac{\kappa+n}{(\gamma-1)(n+1)^2}},&\text{if }\kappa>1\end{cases}\tag{12.61}$$

for all $t\ge0$ and some large constant $C$ independent of $t$.

(C3) In the case of $a<0$, the global solution $(u,v)$ of (12.55) converges exponentially:

$$\|u(\cdot,t)\|_{L^\infty(\Omega)}\le Ce^{\frac{a}{n+1}t},\qquad\|v(\cdot,t)\|_{L^\infty(\Omega)}\le C\begin{cases}e^{\frac{a\kappa}{n+1}t},&\text{if }0<\kappa\le1,\\[1mm]e^{\frac{a(n+\kappa)}{(n+1)^2}t},&\text{if }\kappa>1\end{cases}\tag{12.62}$$

for all $t\ge0$ and some large constant $C$ independent of $t$.

Remark 12.4 For the special case of $a=b:=\mu$, the item (C1) reduces to

$$\beta\le1-\frac{\alpha}{2},\qquad\gamma\ge2\kappa,\qquad\mu>\frac{\chi^2}{16}\sup_{z\in(0,1)\cup(1,\infty)}\frac{(z^{\kappa}-1)^2}{(z-1)(z^{\gamma-1}-1)}.$$
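The quantity $z_0$ in (12.57) and the rate constant $\eta$ in (12.59) are explicit and can be approximated numerically. The following is a rough sketch under stated assumptions rather than a rigorous computation: the function names, the logarithmic grid and its range are hypothetical choices, and a grid maximum only gives a lower estimate of the supremum.

```python
def ratio(z, kappa, gamma):
    """Quotient inside the supremum in (12.57), for z > 0 and z != 1."""
    return (z ** kappa - 1.0) ** 2 / ((z - 1.0) * (z ** (gamma - 1.0) - 1.0))


def estimate_z0(kappa, gamma, decades=3.0, num=4000):
    """Grid estimate of z_0 on [10^-decades, 10^decades], skipping z = 1.
    This is only a lower estimate of the true supremum."""
    best = 0.0
    for i in range(num + 1):
        z = 10.0 ** (-decades + 2.0 * decades * i / num)
        if abs(z - 1.0) < 1e-12:
            continue  # the quotient is undefined at z = 1
        best = max(best, ratio(z, kappa, gamma))
    return best


def eta(a, b, chi, kappa, gamma, z0):
    """Rate constant from (12.59); condition (12.56) requires eta > 0."""
    return b - chi ** 2 / 16.0 * z0 * (a / b) ** ((2.0 * kappa + 1.0 - gamma) / (gamma - 1.0))
```

In the classical case `kappa = 1`, `gamma = 2` the quotient is identically 1, so `estimate_z0(1.0, 2.0)` returns 1 up to rounding, and with `a = b = mu` the condition in Remark 12.4 becomes `mu > chi**2 / 16`.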

The proof of Theorem 12.3 relies on finding a so-called Lyapunov functional, similar to [1, 10, 15, 47]. We will nevertheless present all the necessary details, for the sake of clarity in deriving the explicit convergence rates.

Lemma 12.5 In the case of (C1) of Theorem 12.3, the solution $(u,v)$ of (12.55) satisfies

$$\int_\Omega\Big(u-\Big(\frac ab\Big)^{\frac1{\gamma-1}}\Big)^2\to0,\qquad\int_\Omega\Big(v-\Big(\frac ab\Big)^{\frac{\kappa}{\gamma-1}}\Big)^2\le\int_\Omega\Big(u^{\kappa}-\Big(\frac ab\Big)^{\frac{\kappa}{\gamma-1}}\Big)^2\to0\quad\text{as }t\to\infty.\tag{12.63}$$

Proof Inspired by [1, 10, 15, 47], we consider the functional

$$G(t)=\int_\Omega\Big(u-c-c\ln\frac uc\Big),\qquad c=\Big(\frac ab\Big)^{\frac1{\gamma-1}},\quad t>0.\tag{12.64}$$

Since $g(s)=s-c-c\ln(\frac sc)$, $s>0$, is decreasing on $(0,c)$ and increasing on $(c,\infty)$, it achieves its global minimum zero at $s=c$. Thus, $G(t)=\int_\Omega g(u)\ge0$ for all $t\ge0$. Setting $D(u)=(u+1)^{-\alpha}$ and $S(u)=u(u+1)^{\beta-1}$, we obtain from the first equation in (12.55), integration by parts, the Cauchy-Schwarz inequality and (12.56) that

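$$\begin{aligned}\frac{d}{dt}G(t)&=\int_\Omega\frac{u-c}{u}u_t=\int_\Omega\frac{u-c}{u}\Big[\nabla\cdot\big(D(u)\nabla u-\chi S(u)\nabla v\big)+u(a-bu^{\gamma-1})\Big]\\&=-c\int_\Omega\frac{D(u)}{u^2}|\nabla u|^2+c\chi\int_\Omega\frac{S(u)}{u^2}\nabla u\cdot\nabla v-b\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\\&=-c\int_\Omega\Big|\frac{\sqrt{D(u)}}{u}\nabla u-\frac{\chi S(u)}{2u\sqrt{D(u)}}\nabla v\Big|^2+\frac{c\chi^2}{4}\int_\Omega\frac{S^2(u)}{u^2D(u)}|\nabla v|^2-b\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\\&\le\frac{c\chi^2}{4}\int_\Omega|\nabla v|^2-b\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1}).\end{aligned}\tag{12.65}$$

Testing the second equation in (12.55) by $(v-c^{\kappa})$, we discover

$$\int_\Omega|\nabla v|^2=-\int_\Omega(v-c^{\kappa})^2+\int_\Omega(u^{\kappa}-c^{\kappa})(v-c^{\kappa}).\tag{12.66}$$

Substituting (12.66) into (12.65), we end up with

$$\begin{aligned}\frac{d}{dt}G(t)&\le-\frac{c\chi^2}{4}\int_\Omega(v-c^{\kappa})^2+\frac{c\chi^2}{4}\int_\Omega(u^{\kappa}-c^{\kappa})(v-c^{\kappa})-b\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\\&\le\frac{c\chi^2}{16}\int_\Omega(u^{\kappa}-c^{\kappa})^2-b\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1}).\end{aligned}\tag{12.67}$$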

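Since $\gamma\ge2\kappa$ and $u>0$, from (12.57), we have, for $u\ne c$, that

$$\begin{aligned}\frac{\frac{c\chi^2}{16}(u^{\kappa}-c^{\kappa})^2}{(u-c)(u^{\gamma-1}-c^{\gamma-1})}&=\frac{c^{2\kappa+1-\gamma}\chi^2}{16}\cdot\frac{[(\frac uc)^{\kappa}-1]^2}{(\frac uc-1)[(\frac uc)^{\gamma-1}-1]}\\&\le\frac{c^{2\kappa+1-\gamma}\chi^2}{16}\sup_{z\in(0,1)\cup(1,\infty)}\frac{(z^{\kappa}-1)^2}{(z-1)(z^{\gamma-1}-1)}=\frac{\chi^2}{16}z_0\Big(\frac ab\Big)^{\frac{2\kappa+1-\gamma}{\gamma-1}}<b.\end{aligned}\tag{12.68}$$

Combining (12.67) and (12.68), we obtain

$$\frac{d}{dt}G(t)\le-\eta\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\tag{12.69}$$

with $\eta$ given in (12.59). Now, for any $t_0\ge0$, an integration of (12.69) from $t_0$ to $t$ yields

$$G(t)-G(t_0)\le-\eta\int_{t_0}^{t}\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1}),$$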

and then the nonnegativity of $G$ and the positivity of $\eta$ by (12.59) show

$$\int_{t_0}^{\infty}\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\le\frac{G(t_0)}{\eta}<\infty.$$

Thanks to Theorem 12.2, we know that $(u,v)$ is a global bounded classical solution of (12.55). This implies that the integrand $\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})$ is globally bounded and uniformly continuous with respect to $t$. Therefore,

$$\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\to0\quad\text{as }t\to\infty,$$

and then (12.68) entails

$$\int_\Omega(u^{\kappa}-c^{\kappa})^2\le z_0c^{2\kappa-\gamma}\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\to0\quad\text{as }t\to\infty.\tag{12.70}$$

On the other hand, the Hölder inequality applied to (12.66) gives rise to

$$\int_\Omega|\nabla v|^2\le-\frac12\int_\Omega(v-c^{\kappa})^2+\frac12\int_\Omega(u^{\kappa}-c^{\kappa})^2$$

and so

$$\int_\Omega(v-c^{\kappa})^2\le\int_\Omega(u^{\kappa}-c^{\kappa})^2\to0\quad\text{as }t\to\infty.\tag{12.71}$$

Since $u$ is bounded, we can choose $R>c$ such that $u\le R$ on $\bar\Omega\times[0,\infty)$, and then obtain through a simple analysis that

$$\begin{cases}\displaystyle\int_\Omega(u-c)^2\le\Big(\frac{R-c}{R^{\kappa}-c^{\kappa}}\Big)^2\int_\Omega(u^{\kappa}-c^{\kappa})^2\to0\ \text{ as }t\to\infty,&\text{if }0<\kappa\le1,\\[3mm]\displaystyle\int_\Omega(u-c)^2\le c^{-2(\kappa-1)}\int_\Omega(u^{\kappa}-c^{\kappa})^2\to0\ \text{ as }t\to\infty,&\text{if }\kappa>1.\end{cases}\tag{12.72}$$

With $c$ given in (12.64), the $L^2$-stabilization in (12.63) follows from (12.72) and (12.71).

Proof (Proof of (C1) of Theorem 12.3) The fact that $(u,v)$ is the global-in-time bounded smooth solution of (12.55), say $u\le R$ on $\bar\Omega\times[0,\infty)$, ensures that $v$ is bounded by the $v$-equation and that $(u+1)^{-\alpha}\ge\min\{1,(R+1)^{-\alpha}\}>0$, and hence the standard parabolic regularity for quasilinear equations (cf. [20]) shows the existence of $\sigma\in(0,1)$ and $C$ such that

$$\|u\|_{C^{2+\sigma,1+\frac{\sigma}{2}}(\bar\Omega\times[t,t+1])}+\|v\|_{C^{2+\sigma,1+\frac{\sigma}{2}}(\bar\Omega\times[t,t+1])}\le C,\qquad\forall t\ge1.\tag{12.73}$$

Consequently, we deduce from the Gagliardo-Nirenberg interpolation inequality and (12.63), (12.72) and (12.73) that

$$\begin{aligned}\|u(\cdot,t)-c\|_{L^\infty(\Omega)}&\le C_{GN}\|u(\cdot,t)-c\|_{W^{1,\infty}}^{\frac{n}{n+2}}\|u(\cdot,t)-c\|_{L^2}^{\frac{2}{n+2}}\\&\le C\|u(\cdot,t)-c\|_{L^2}^{\frac{2}{n+2}}\\&\le C_{\kappa}\|u^{\kappa}(\cdot,t)-c^{\kappa}\|_{L^2}^{\frac{2}{n+2}}\to0\quad\text{as }t\to\infty.\end{aligned}\tag{12.74}$$

In light of (12.69) and the definition of $G$ in (12.64), we calculate

$$\lim_{u\to c}\frac{g(u)}{(u-c)(u^{\gamma-1}-c^{\gamma-1})}=\lim_{u\to c}\frac{u-c-c\ln(\frac uc)}{(u-c)(u^{\gamma-1}-c^{\gamma-1})}=\frac{1}{2(\gamma-1)c^{\gamma-1}}.$$

This in conjunction with (12.74) allows one to find $t_1\ge0$ such that

$$\frac{1}{4(\gamma-1)c^{\gamma-1}}(u-c)(u^{\gamma-1}-c^{\gamma-1})\le g(u)\le\frac{1}{(\gamma-1)c^{\gamma-1}}(u-c)(u^{\gamma-1}-c^{\gamma-1}),\qquad t\ge t_1,$$

and thus

$$\frac{1}{4(\gamma-1)c^{\gamma-1}}\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\le G(t)=\int_\Omega g(u)\le\frac{1}{(\gamma-1)c^{\gamma-1}}\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1}),\qquad t\ge t_1.\tag{12.75}$$

From (12.69) and (12.75), we derive a Gronwall differential inequality for $G$:

$$\frac{d}{dt}G(t)\le-(\gamma-1)\eta c^{\gamma-1}G(t),\qquad t\ge t_1,$$

immediately yielding

$$G(t)\le G(t_1)e^{-(\gamma-1)\eta c^{\gamma-1}(t-t_1)},\qquad t\ge t_1.\tag{12.76}$$
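The limit used above to compare $g(u)$ with the dissipation term is a second-order Taylor computation, and it can be sanity-checked numerically. The snippet below is a small sketch with hypothetical function names and arbitrary sample values for $c$ and $\gamma$; it is not part of the proof.

```python
import math


def lyapunov_density(u, c):
    """g(u) = u - c - c ln(u/c), the integrand of G in (12.64)."""
    return u - c - c * math.log(u / c)


def dissipation(u, c, gamma):
    """(u - c)(u^(gamma-1) - c^(gamma-1)), the integrand on the right of (12.69)."""
    return (u - c) * (u ** (gamma - 1.0) - c ** (gamma - 1.0))


def limit_ratio(c, gamma):
    """Claimed limit of g(u) / dissipation(u) as u -> c."""
    return 1.0 / (2.0 * (gamma - 1.0) * c ** (gamma - 1.0))


def ratio_near_equilibrium(c, gamma, rel_step=1e-3):
    """Evaluate the quotient at u = c (1 + rel_step), close to the equilibrium c."""
    u = c * (1.0 + rel_step)
    return lyapunov_density(u, c) / dissipation(u, c, gamma)
```

With, say, `c = 1.5` and `gamma = 3`, `ratio_near_equilibrium` agrees with `limit_ratio` to within about 0.1 percent, and the value lies between half and twice the limit, which is the two-sided comparison used in (12.75).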

Consequently, we deduce from (12.68) or (12.70), (12.74), (12.75) and (12.76) that

$$\|u(\cdot,t)-c\|_{L^\infty}\le C_{\kappa}\|u^{\kappa}(\cdot,t)-c^{\kappa}\|_{L^2}^{\frac{2}{n+2}}\le C_{\kappa}\big(4(\gamma-1)z_0c^{2\kappa-1}G(t_1)\big)^{\frac{1}{n+2}}e^{-\frac{(\gamma-1)\eta c^{\gamma-1}}{n+2}(t-t_1)},\qquad t\ge t_1.\tag{12.77}$$

Keeping (12.71), (12.77) and (12.73) in mind and then repeating for $v$ the reasoning applied to $u$, we obtain

$$\|v(\cdot,t)-c^{\kappa}\|_{L^\infty}\le C_{\kappa}\big(4(\gamma-1)z_0c^{2\kappa-1}G(t_1)\big)^{\frac{1}{n+2}}e^{-\frac{(\gamma-1)\eta c^{\gamma-1}}{n+2}(t-t_1)},\qquad t\ge t_1.\tag{12.78}$$

Now, substituting the definitions of $\eta$ and $c$ into (12.77) and (12.78) and then taking suitably large $C$, we conclude the stabilization estimate (12.58).

Proof (Proof of (C2) and (C3) of Theorem 12.3) From Sect. 12.3, one can easily see that the sign of $a$ does not affect the boundedness and global existence. Hence, $(u,v)$ is still a global-in-time bounded classical solution. In the case of $a=0$, we integrate the first equation in (12.55) and employ the Hölder inequality to obtain

$$\frac{d}{dt}\int_\Omega u=-b\int_\Omega u^{\gamma}\le-b|\Omega|^{-(\gamma-1)}\Big(\int_\Omega u\Big)^{\gamma},\qquad t>0,$$

which gives

$$\int_\Omega u\le\Big[\Big(\int_\Omega u_0\Big)^{-(\gamma-1)}+b(\gamma-1)|\Omega|^{-(\gamma-1)}t\Big]^{-\frac{1}{\gamma-1}},\qquad t>0.\tag{12.79}$$
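The bound (12.79) is the explicit solution of the extremal ODE $y'=-b|\Omega|^{-(\gamma-1)}y^{\gamma}$ with $y(0)=\int_\Omega u_0$. As a sanity check, the sketch below compares the closed form with a classical fourth-order Runge-Kutta integration of that ODE. The function names, the step count and the sample parameters are illustrative assumptions, not taken from the text.

```python
def mass_bound(t, m0, b, gamma, vol):
    """Right-hand side of (12.79), with m0 the initial mass and vol = |Omega|."""
    k = b * vol ** (-(gamma - 1.0))
    return (m0 ** (-(gamma - 1.0)) + k * (gamma - 1.0) * t) ** (-1.0 / (gamma - 1.0))


def integrate_extremal_ode(t_end, m0, b, gamma, vol, steps=2000):
    """Classical RK4 for y' = -b vol^(-(gamma-1)) y^gamma, y(0) = m0."""
    k = b * vol ** (-(gamma - 1.0))

    def rhs(y):
        return -k * y ** gamma

    h = t_end / steps
    y = m0
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y
```

For example, with `m0 = 2`, `b = 1`, `gamma = 2` and `vol = 1` the closed form is `1 / (0.5 + t)`, so the mass decays like $t^{-1/(\gamma-1)}$, in line with the algebraic rate in (12.60).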

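Then the Gagliardo-Nirenberg inequality together with the boundedness of $u$ shows that

$$\|u(\cdot,t)\|_{L^\infty}\le C_{GN}\|u(\cdot,t)\|_{W^{1,\infty}}^{\frac{n}{n+1}}\|u(\cdot,t)\|_{L^1}^{\frac{1}{n+1}}\le C\Big[\Big(\int_\Omega u_0\Big)^{-(\gamma-1)}+b(\gamma-1)|\Omega|^{-(\gamma-1)}t\Big]^{-\frac{1}{(\gamma-1)(n+1)}},\qquad t>0.\tag{12.80}$$

An integration of the second equation in (12.55) shows

$$\int_\Omega v=\int_\Omega u^{\kappa}\le\begin{cases}|\Omega|^{1-\kappa}\big(\int_\Omega u\big)^{\kappa},&\text{if }0<\kappa\le1,\\[1mm]\|u(\cdot,t)\|_{L^\infty}^{\kappa-1}\int_\Omega u,&\text{if }\kappa>1.\end{cases}$$

This in conjunction with (12.79) and (12.80) gives rise to

$$\int_\Omega v\le\begin{cases}|\Omega|^{1-\kappa}\Big[\big(\int_\Omega u_0\big)^{-(\gamma-1)}+b(\gamma-1)|\Omega|^{-(\gamma-1)}t\Big]^{-\frac{\kappa}{\gamma-1}},&\text{if }0<\kappa\le1,\\[2mm]C\Big[\big(\int_\Omega u_0\big)^{-(\gamma-1)}+b(\gamma-1)|\Omega|^{-(\gamma-1)}t\Big]^{-\frac{\kappa+n}{(\gamma-1)(n+1)}},&\text{if }\kappa>1.\end{cases}$$

Then we infer from (12.79) and (12.80) with $u$ replaced by $v$ that, for $t>0$,

$$\|v(\cdot,t)\|_{L^\infty(\Omega)}\le C\begin{cases}\Big[\big(\int_\Omega u_0\big)^{-(\gamma-1)}+b(\gamma-1)|\Omega|^{-(\gamma-1)}t\Big]^{-\frac{\kappa}{(\gamma-1)(n+1)}},&\text{if }0<\kappa\le1,\\[2mm]\Big[\big(\int_\Omega u_0\big)^{-(\gamma-1)}+b(\gamma-1)|\Omega|^{-(\gamma-1)}t\Big]^{-\frac{\kappa+n}{(\gamma-1)(n+1)^2}},&\text{if }\kappa>1.\end{cases}\tag{12.81}$$

In the case of $a<0$, we integrate the first equation in (12.55) to obtain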

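$$\frac{d}{dt}\int_\Omega u=a\int_\Omega u-b\int_\Omega u^{\gamma}\le a\int_\Omega u,\qquad t>0,$$

and so

$$\int_\Omega u\le e^{at}\int_\Omega u_0,\qquad t>0.$$

Then, as in (12.80), one has

$$\|u(\cdot,t)\|_{L^\infty(\Omega)}\le Ce^{\frac{a}{n+1}t}\Big(\int_\Omega u_0\Big)^{\frac{1}{n+1}},\qquad t>0.\tag{12.82}$$

With this decay estimate at hand, using the discussions leading to (12.81), we derive

$$\|v(\cdot,t)\|_{L^\infty(\Omega)}\le C\begin{cases}e^{\frac{a\kappa}{n+1}t}\big(\int_\Omega u_0\big)^{\frac{\kappa}{n+1}},&\text{if }0<\kappa\le1,\\[2mm]e^{\frac{a(n+\kappa)}{(n+1)^2}t}\big(\int_\Omega u_0\big)^{\frac{n+\kappa}{(n+1)^2}},&\text{if }\kappa>1.\end{cases}\tag{12.83}$$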

Extracting the essential ingredients of the estimates (12.80), (12.81), (12.82) and (12.83), we can easily infer the stabilization estimates (12.60), (12.61) and (12.62).

When $\alpha\ge0$, the validity of (12.56) directly gives $\beta\le1-\frac{\alpha}{2}\le1$; that is, the chemosensitivity is allowed to grow at most linearly. A natural question then arises: what about the large time behavior when the chemosensitivity grows superlinearly, i.e., $\beta>1$? In this case, the prototypical choice for $S(u)$ is $u^{\beta}$. Thus, in the rest of this chapter, we explore the long time behavior for the following Keller-Segel chemotaxis model:

$$\begin{cases}u_t=\nabla\cdot\big((u+1)^{-\alpha}\nabla u\big)-\chi\nabla\cdot(u^{\beta}\nabla v)+au-bu^{\gamma},&x\in\Omega,\ t>0,\\0=\Delta v-v+u^{\kappa},&x\in\Omega,\ t>0,\\\dfrac{\partial u}{\partial\nu}=\dfrac{\partial v}{\partial\nu}=0,&x\in\partial\Omega,\ t>0,\\u(x,0)=u_0(x),&x\in\Omega,\end{cases}\tag{12.84}$$

where $\alpha,a\in\mathbb{R}$, $\chi,\kappa,b>0$ and $\beta,\gamma>1$. For the case that $\alpha=0$, $\beta\ge1$, $\kappa\ge1$, $\gamma\ge\beta+\kappa$ and $b=a$, an extension of the comparison argument from [35] was made in [9]. Therein, the uniform convergence to the constant equilibrium $(1,1)$ was shown. Here, we will respectively show the exponential convergence and algebraic convergence to the constant equilibria $\big((a/b)^{\frac1{\gamma-1}},(a/b)^{\frac{\kappa}{\gamma-1}}\big)$ and $(0,0)$ as well as their explicit convergence rates for the KS chemotaxis-growth system (12.84). From the proofs of (C2) and (C3) in Theorem 12.3, we need only to study the large time dynamics of (12.84) for the case of $a>0$.

Theorem 12.4 Let $(u,v)$ be the global-in-time bounded smooth solution of (12.84) obtained in Theorem 12.2. Suppose that

$$\beta>1,\qquad\gamma\ge2\max\{\kappa,\beta-1\}\tag{12.85}$$

and

$$b>\begin{cases}\Big(\dfrac{\chi}{\kappa}z_0\,a^{\frac{2\kappa+1-\gamma}{\gamma-1}}\Big)^{\frac{\gamma-1}{2\kappa}}&\text{for }\beta-1=\kappa,\\[3mm]\Big(\dfrac{\chi}{\beta-1}\big(z_1+\sqrt{z_0z_2}\big)a^{\frac{\kappa+\beta-\gamma}{\gamma-1}}\Big)^{\frac{\gamma-1}{\kappa+\beta-1}}&\text{for }\beta-1\ne\kappa,\end{cases}\tag{12.86}$$

where $z_0$ is defined in (12.57), and $z_1$ and $z_2$ are defined by

$$z_1=\sup_{z\in(0,1)\cup(1,\infty)}\frac{(z^{\beta-1}-1)(z^{\kappa}-1)}{(z-1)(z^{\gamma-1}-1)},\qquad z_2=\sup_{z\in(0,1)\cup(1,\infty)}\frac{(z^{\beta-1}-1)^2}{(z-1)(z^{\gamma-1}-1)}.\tag{12.87}$$

In the case of $a>0$, the global bounded solution $(u,v)$ of (12.84) converges exponentially:

$$\Big\|u(\cdot,t)-\Big(\frac ab\Big)^{\frac1{\gamma-1}}\Big\|_{L^\infty(\Omega)}+\Big\|v(\cdot,t)-\Big(\frac ab\Big)^{\frac{\kappa}{\gamma-1}}\Big\|_{L^\infty(\Omega)}\le C\begin{cases}e^{-\frac{(\gamma-1)a\mu}{(n+2)b}t}&\text{for }\beta-1=\kappa,\\[1mm]e^{-\frac{(\gamma-1)a\sigma}{(n+2)b}t}&\text{for }\beta-1\ne\kappa,\end{cases}\tag{12.88}$$

for all $t\ge0$ and some large constant $C$ independent of $t$. Here, due to (12.86),

$$\mu=b-\frac{\chi}{\kappa}z_0\Big(\frac ab\Big)^{\frac{2\kappa+1-\gamma}{\gamma-1}}=b^{-\frac{2\kappa+1-\gamma}{\gamma-1}}\Big(b^{\frac{2\kappa}{\gamma-1}}-\frac{\chi}{\kappa}z_0\,a^{\frac{2\kappa-\gamma+1}{\gamma-1}}\Big)>0\tag{12.89}$$

and

$$\sigma=b-\frac{\chi}{\beta-1}\big(z_1+\sqrt{z_0z_2}\big)\Big(\frac ab\Big)^{\frac{\kappa+\beta-\gamma}{\gamma-1}}=b^{-\frac{\kappa+\beta-\gamma}{\gamma-1}}\Big(b^{\frac{\kappa+\beta-1}{\gamma-1}}-\frac{\chi}{\beta-1}\big(z_1+\sqrt{z_0z_2}\big)a^{\frac{\kappa+\beta-\gamma}{\gamma-1}}\Big)>0.\tag{12.90}$$

Remark 12.5 In the case of $\beta>1$, the nonlinear diffusion exponent $\alpha$ plays no explicit role in the stabilization of the bounded solutions of (12.55).

Proof With $G$ defined in (12.64), we apply integration by parts to (12.84) to get

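$$\begin{aligned}\frac{d}{dt}G(t)&=\int_\Omega\frac{u-c}{u}u_t=\int_\Omega\frac{u-c}{u}\Big[\nabla\cdot\big((u+1)^{-\alpha}\nabla u-\chi u^{\beta}\nabla v\big)+u(a-bu^{\gamma-1})\Big]\\&\le c\chi\int_\Omega u^{\beta-2}\nabla u\cdot\nabla v-b\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\\&=\frac{c\chi}{\beta-1}\int_\Omega\nabla(u^{\beta-1}-c^{\beta-1})\cdot\nabla v-b\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\\&=\frac{c\chi}{\beta-1}\int_\Omega(u^{\beta-1}-c^{\beta-1})(-v+u^{\kappa})-b\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\\&=\frac{c\chi}{\beta-1}\int_\Omega(u^{\beta-1}-c^{\beta-1})(u^{\kappa}-c^{\kappa})-\frac{c\chi}{\beta-1}\int_\Omega(u^{\beta-1}-c^{\beta-1})(v-c^{\kappa})\\&\quad-b\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1}).\end{aligned}\tag{12.91}$$

When $\beta-1=\kappa$, we derive from (12.66) that

$$\int_\Omega(u^{\beta-1}-c^{\beta-1})(v-c^{\kappa})=\int_\Omega(u^{\kappa}-c^{\kappa})(v-c^{\kappa})=\int_\Omega|\nabla v|^2+\int_\Omega(v-c^{\kappa})^2\ge0.\tag{12.92}$$

By the assumption $\gamma\ge2\kappa$, cf. (12.85), we deduce as in (12.68) that

$$\frac{c\chi}{\beta-1}\int_\Omega(u^{\kappa}-c^{\kappa})^2\le\frac{c^{2\kappa-\gamma+1}\chi}{\kappa}z_0\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1}).\tag{12.93}$$

Substituting (12.92) and (12.93) into (12.91) and dropping some non-positive terms, we conclude that

$$\frac{d}{dt}G(t)\le-\Big(b-\frac{c^{2\kappa-\gamma+1}\chi}{\kappa}z_0\Big)\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})=-\mu\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1}),\tag{12.94}$$

where $\mu$ is defined by (12.89).

When $\beta-1\ne\kappa$, using $\gamma\ge2\max\{\kappa,\beta-1\}\ge\beta-1+\kappa$, we infer as in (12.68) that

$$\frac{c\chi}{\beta-1}\int_\Omega(u^{\beta-1}-c^{\beta-1})(u^{\kappa}-c^{\kappa})\le\frac{c^{\kappa+\beta-\gamma}\chi}{\beta-1}z_1\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1}),$$

where $z_1$ is given by (12.87). In view of the Hölder inequality and (12.71), we estimate, for any $\epsilon>0$,

$$-\frac{c\chi}{\beta-1}\int_\Omega(u^{\beta-1}-c^{\beta-1})(v-c^{\kappa})\le\frac{c\chi\epsilon}{\beta-1}\int_\Omega(u^{\beta-1}-c^{\beta-1})^2+\frac{c\chi}{4\epsilon(\beta-1)}\int_\Omega(u^{\kappa}-c^{\kappa})^2.$$

Again since $\gamma\ge2(\beta-1)$, we deduce as in (12.68) that

$$\frac{c\chi\epsilon}{\beta-1}\int_\Omega(u^{\beta-1}-c^{\beta-1})^2\le\frac{c^{2\beta-1-\gamma}\chi\epsilon}{\beta-1}z_2\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1}),$$

where $z_2$ is given by (12.87). Combining these estimates with (12.93) and dropping the non-positive term, we conclude that

$$\begin{aligned}\frac{d}{dt}G(t)&\le\Big(-b+\frac{c^{\kappa+\beta-\gamma}\chi}{\beta-1}z_1+\frac{c^{2\beta-1-\gamma}\chi\epsilon}{\beta-1}z_2+\frac{c^{2\kappa-\gamma+1}\chi}{4\epsilon(\beta-1)}z_0\Big)\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\\&=\Big(-b+\frac{c^{\kappa+\beta-\gamma}\chi}{\beta-1}z_1+\frac{c^{\kappa+\beta-\gamma}\chi}{\beta-1}\sqrt{z_0z_2}\Big)\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1})\\&:=-\sigma\int_\Omega(u-c)(u^{\gamma-1}-c^{\gamma-1}),\end{aligned}\tag{12.95}$$

where $\sigma$ is defined by (12.90) and we have substituted the choice of $\epsilon$ in accordance with

$$\frac{c^{2\beta-1-\gamma}\chi\epsilon}{\beta-1}z_2=\frac{c^{2\kappa-\gamma+1}\chi}{4\epsilon(\beta-1)}z_0\iff\epsilon=\frac{c^{\kappa-\beta+1}}{2}\sqrt{\frac{z_0}{z_2}}.$$

With the key inequalities (12.94) and (12.95) at hand, we can readily adapt the proof of (C1) of Theorem 12.3 to achieve the exponential convergence (12.88).

Acknowledgements T. Xiang is funded by the NSF of China (No. 12071476 and 11871226) and the Research Funds of Renmin University of China (No. 2018030199). He also thanks his good friend Hai-Yang Jin for stimulating discussions and interactions on certain parts of an earlier version of this manuscript.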

References

1. Bai, X., Winkler, M.: Equilibration in a fully parabolic two-species chemotaxis system with competitive kinetics. Indiana Univ. Math. J. 65, 553–583 (2016)
2. Biler, P.: Global solutions to some parabolic-elliptic systems of chemotaxis. Adv. Math. Sci. Appl. 9, 347–359 (1999)
3. Bellomo, N., Bellouquid, A., Tao, Y., Winkler, M.: Toward a mathematical theory of Keller-Segel models of pattern formation in biological tissues. Math. Models Methods Appl. Sci. 25(9), 1663–1763 (2015)
4. Blanchet, A., Carrillo, J., Laurençot, P.: Critical mass for a Patlak-Keller-Segel model with degenerate diffusion in higher dimensions. Calc. Var. Partial Differ. Equ. 35, 133–168 (2009)
5. Cao, X., Zheng, S.: Boundedness of solutions to a quasilinear parabolic-elliptic Keller-Segel system with logistic source. Math. Methods Appl. Sci. 37, 2326–2330 (2014)
6. Cieślak, T., Winkler, M.: Finite-time blow-up in a quasilinear system of chemotaxis. Nonlinearity 21, 1057–1076 (2008)
7. Cieślak, T., Stinner, C.: Finite-time blowup and global-in-time unbounded solutions to a parabolic-parabolic quasilinear Keller-Segel system in higher dimensions. J. Differ. Equ. 252, 5832–5851 (2012)
8. Cieślak, T., Stinner, C.: New critical exponents in a fully parabolic quasilinear Keller-Segel system and applications to volume filling models. J. Differ. Equ. 258, 2080–2113 (2015)
9. Galakhov, E., Salieva, O., Tello, J.: On a parabolic-elliptic system with chemotaxis and logistic type growth. J. Differ. Equ. 261, 4631–4647 (2016)
10. He, X., Zheng, S.: Convergence rate estimates of solutions in a higher dimensional chemotaxis system with logistic source. J. Math. Anal. Appl. 436, 970–982 (2016)
11. Herrero, M., Velázquez, J.: Singularity patterns in a chemotaxis model. Math. Ann. 306, 583–623 (1996)
12. Hillen, T., Painter, K.: A user's guide to PDE models for chemotaxis. J. Math. Biol. 58, 183–217 (2009)
13. Horstmann, D., Winkler, M.: Boundedness vs. blow-up in a chemotaxis system. J. Differ. Equ. 215(1), 52–107 (2005)
14. Horstmann, D.: From 1970 until now: the Keller-Segel model in chemotaxis and its consequence I. Jahresber DMV 105, 103–165 (2003)
15. Hsu, S.: Limiting behavior for competing species. SIAM J. Appl. Math. 34, 760–763 (1978)
16. Hu, B., Tao, Y.: Boundedness in a parabolic-elliptic chemotaxis-growth system under a critical parameter condition. Appl. Math. Lett. 64, 1–7 (2017)
17. Kang, K., Stevens, A.: Blowup and global solutions in a chemotaxis-growth system. Nonlinear Anal. 135, 57–72 (2016)
18. Keller, E., Segel, L.: Initiation of slime mold aggregation viewed as an instability. J. Theoret. Biol. 26, 399–415 (1970)
19. Jäger, W., Luckhaus, S.: On explosions of solutions to a system of partial differential equations modelling chemotaxis. Trans. Am. Math. Soc. 329, 819–824 (1992)
20. Ladyzhenskaya, O., Solonnikov, V., Uralceva, N.: Linear and Quasilinear Equations of Parabolic Type. AMS, Providence, RI (1968)
21. Laurençot, P., Mizoguchi, N.: Finite time blowup for the parabolic-parabolic Keller-Segel system with critical diffusion. Ann. Inst. H. Poincaré Anal. Non Linéaire 34, 197–220 (2017)
22. Li, X., Xiang, Z.: Boundedness in quasilinear Keller-Segel equations with nonlinear sensitivity and logistic source. Discrete Contin. Dyn. Syst. 35, 3503–3531 (2015)
23. Mizoguchi, N., Souplet, P.: Nondegeneracy of blow-up points for the parabolic Keller-Segel system. Ann. Inst. H. Poincaré Anal. Non Linéaire 31, 851–875 (2014)
24. Montaru, A.: A semilinear parabolic-elliptic chemotaxis system with critical mass in any space dimension. Nonlinearity 26, 2669–2701 (2013)
25. Murray, J.D.: Mathematical Biology. I. An Introduction, 3rd edn. Interdisciplinary Applied Mathematics, vol. 17. Springer, New York (2002)
26. Nagai, T.: Blow-up of radially symmetric solutions to a chemotaxis system. Adv. Math. Sci. Appl. 5, 581–601 (1995)
27. Nakaguchi, E., Osaki, K.: Global existence of solutions to a parabolic-parabolic system for chemotaxis with weak degradation. Nonlinear Anal. 74, 286–297 (2011)
28. Nakaguchi, E., Osaki, K.: Global solutions and exponential attractors of a parabolic-parabolic system for chemotaxis with subquadratic degradation. Discrete Contin. Dyn. Syst. Ser. B 18, 2627–2646 (2013)
29. Nirenberg, L.: An extended interpolation inequality. Ann. Scuola Norm. Sup. Pisa 20, 733–737 (1966)
30. Senba, T., Suzuki, T.: Parabolic system of chemotaxis: blowup in a finite and the infinite time. Methods Appl. Anal. 8, 349–367 (2001)
31. Tao, Y., Wang, Z.: Competing effects of attraction vs. repulsion in chemotaxis. Math. Models Methods Appl. Sci. 23, 1–36 (2013)
32. Tao, X., Zhou, S., Ding, M.: Boundedness of solutions to a quasilinear parabolic-parabolic chemotaxis model with nonlinear signal production. J. Math. Anal. Appl. 474, 733–747 (2019)
33. Tao, Y., Winkler, M.: Boundedness in a quasilinear parabolic-parabolic Keller-Segel system with subcritical sensitivity. J. Differ. Equ. 252, 692–715 (2012)
34. Temam, R.: Infinite-Dimensional Dynamical Systems in Mechanics and Physics. Applied Mathematical Sciences, 2nd edn. Springer, New York (1997)
35. Tello, J., Winkler, M.: A chemotaxis system with logistic source. Comm. Partial Differ. Equ. 32, 849–877 (2007)
36. Wang, L., Mu, C., Zheng, P.: On a quasilinear parabolic-elliptic chemotaxis system with logistic source. J. Differ. Equ. 256, 1847–1872 (2014)
37. Winkler, M.: A critical exponent in a degenerate parabolic equation. Math. Methods Appl. Sci. 25, 911–925 (2002)
38. Winkler, M.: Aggregation vs. global diffusive behavior in the higher-dimensional Keller-Segel model. J. Differ. Equ. 248, 2889–2905 (2010)
39. Winkler, M.: Blow-up in a higher-dimensional chemotaxis system despite logistic growth restriction. J. Math. Anal. Appl. 384, 261–272 (2011)
40. Winkler, M.: Finite-time blow-up in the higher-dimensional parabolic-parabolic Keller-Segel system. J. Math. Pures Appl. 100, 748–767 (2013)
41. Winkler, M.: How far can chemotactic cross-diffusion enforce exceeding carrying capacities? J. Nonlinear Sci. 24, 809–855 (2014)
42. Winkler, M.: A critical blow-up exponent in a chemotaxis system with nonlinear signal production. Nonlinearity 31, 2031–2056 (2018)
43. Winkler, M.: Finite-time blow-up in low-dimensional Keller-Segel systems with logistic-type superlinear degradation. Z. Angew. Math. Phys. 69, 69 (2018)
44. Xiang, T.: Boundedness and global existence in the higher-dimensional parabolic-parabolic chemotaxis system with/without growth source. J. Differ. Equ. 258, 4275–4323 (2015)
45. Xiang, T.: How strong a logistic damping can prevent blow-up for the minimal Keller-Segel chemotaxis system? J. Math. Anal. Appl. 459, 1172–1200 (2018)
46. Xiang, T.: Chemotactic aggregation versus logistic damping on boundedness in the 3D minimal Keller-Segel model. SIAM J. Appl. Math. 78, 2420–2438 (2018)
47. Xiang, T.: Dynamics in a parabolic-elliptic chemotaxis system with growth source and nonlinear secretion. Commun. Pure Appl. Anal. 18, 255–284 (2019)
48. Yang, C., Cao, X., Jiang, Z., Zheng, S.: Boundedness in a quasilinear fully parabolic Keller-Segel system of higher dimension with logistic source. J. Math. Anal. Appl. 430, 585–591 (2015)
49. Zhang, Q., Li, Y.: Boundedness in a quasilinear fully parabolic Keller-Segel system with logistic source. Z. Angew. Math. Phys. 66, 2473–2484 (2015)
50. Zhao, J., Mu, C., Wang, L., Lin, K.: A quasilinear parabolic-elliptic chemotaxis-growth system with nonlinear secretion. Appl. Anal. https://doi.org/10.1080/00036811.2018.1489955
51. Zheng, J.: Boundedness of solutions to a quasilinear parabolic-elliptic Keller-Segel system with logistic source. J. Differ. Equ. 259, 120–140 (2015)

Chapter 13

Li-Yau Gradient Estimate on Graphs

Yong Lin and Shuang Liu

Abstract In this chapter, we review some discrete notions of Ricci curvature and summarize our research on the Li-Yau gradient estimate and its applications on graphs.

Y. Lin
Yau Mathematical Science Center, Tsinghua University, Beijing 100084, China
e-mail: [email protected]

S. Liu (B)
School of Mathematics, Renmin University of China, Beijing 100872, China
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_13

13.1 Introduction and Notations

In the past two decades, geometry and analysis on graphs have developed rapidly, with remarkable progress; see [1, 2, 6–9, 11, 13, 15] and the references therein. Because graphs are discrete, two difficulties arise when one tries to establish geometric notions and derive geometric and analytic results. First, the chain rule fails on graphs. Second, and more challenging, Ricci curvature must be defined in a meaningful way. Numerous definitions and approaches have been put forward, motivated by specific curvature properties of Riemannian manifolds. Two popular ways to define Ricci curvature are based on optimal transportation of probability measures (Ollivier curvature) and on the Bochner formula (Bakry-Émery curvature), see [13, 15]. These notions of curvature are used in many applied fields, such as finance [18], computer science [20] and biology [16, 17]. For example, [18] regards Ollivier curvature as a quantitative indicator of systemic risk in financial networks and of the fragility of financial markets.

In this chapter, we focus on Bakry-Émery curvature and the Li-Yau inequality. The Li-Yau inequality is one of the most fundamental results in geometric analysis and a very powerful tool for studying positive solutions to the heat equation $\partial_t u = \Delta u$ on manifolds. In its simplest case, it states that a positive solution $u$ on a compact $n$-dimensional manifold with non-negative Ricci curvature satisfies

\[ \frac{|\nabla u|^2}{u^2} - \frac{\partial_t u}{u} \le \frac{n}{2t}, \qquad t > 0. \]
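As a sanity check on the inequality above, one can evaluate its left-hand side on the Euclidean heat kernel. This is only an illustrative computation and not part of the chapter: $\mathbb{R}^n$ is not compact, so it relies on the standard fact that the Li-Yau estimate also holds on complete manifolds with non-negative Ricci curvature. The kernel attains equality, which indicates that the constant $n/(2t)$ cannot be improved.

```latex
\begin{align*}
u(t,x) &= (4\pi t)^{-n/2}\, e^{-|x|^2/(4t)}, \qquad x \in \mathbb{R}^n,\ t > 0, \\
\log u &= -\frac{n}{2}\log(4\pi t) - \frac{|x|^2}{4t}, \\
\frac{|\nabla u|^2}{u^2} &= |\nabla \log u|^2 = \left|\frac{x}{2t}\right|^2 = \frac{|x|^2}{4t^2}, \\
\frac{\partial_t u}{u} &= \partial_t \log u = -\frac{n}{2t} + \frac{|x|^2}{4t^2}, \\
\frac{|\nabla u|^2}{u^2} - \frac{\partial_t u}{u} &= \frac{n}{2t}.
\end{align*}
```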

Variants of the Li-Yau inequality have proven to be an important tool in non-Riemannian settings as well. Our purpose in this chapter is to summarize, without proofs, the relevant material on the Li-Yau inequality and its applications on graphs.

We now introduce the notation used in this chapter. Let $G = (V, E)$ be a finite or infinite graph with vertex set $V$ and edge set $E$, a symmetric subset of $V \times V$. Two vertices are called neighbours if they are connected by an edge $\{x, y\} \in E$, which is denoted by $x \sim y$. We allow loops, that is, edges $\{x, x\} \in E$. We only consider connected graphs, that is, for any distinct $x, y \in V$ there is a finite path connecting $x$ and $y$. On $(V, E)$ we assign a measure on vertices by a function $m : V \to \mathbb{R}^+$ and a weight on edges by a function $\omega : E \to \mathbb{R}^+$: the edge $\{x, y\} \in E$ has weight $\omega_{xy} > 0$, and the weight function is symmetric, i.e. $\omega_{xy} = \omega_{yx}$. We extend the edge weight function to every pair of vertices by letting $\omega_{xy} = 0$ when $\{x, y\} \notin E$. We call the quadruple $G = (V, E, m, \omega)$ a weighted graph. In this chapter, we restrict our interest to locally finite graphs, in which each vertex has finitely many neighbours.

We denote by $V^{\mathbb{R}}$ the set of real-valued functions on $V$, and by $C_0(V)$ the set of finitely supported functions on $V$. We denote by $\ell^p_m$, $p \in [1, \infty]$, the $\ell^p$ spaces of functions on $V$ with respect to the measure $m$, and by $\|\cdot\|_{\ell^p_m}$ the corresponding $p$-norm.

To a weighted graph $G$, one associates a Dirichlet form $Q : D(Q) \times D(Q) \to \mathbb{R}$,

\[ Q(f, g) := \frac{1}{2} \sum_{x, y \in V} \omega_{xy} (f(y) - f(x))(g(y) - g(x)), \]

where $D(Q)$ is defined as the completion of $C_0(V)$ under the $Q$-norm $\|\cdot\|_Q$ given by

\[ \|f\|_Q = \sqrt{\|f\|^2_{\ell^2_m} + \frac{1}{2} \sum_{x, y \in V} \omega_{xy} (f(y) - f(x))^2}. \]

We denote by $L$ the infinitesimal generator of the Dirichlet form $Q$ and by $P_t f = e^{-tL} f$ the corresponding semigroup. For a locally finite graph, the generator $L$ coincides with the Laplacian $\Delta$ on the domain $D(L) = \{ f \in \ell^2_m \mid Lf \in \ell^2_m \}$, i.e. $-Lf = \Delta f$ for any $f \in D(L)$. Recall that the Laplacian $\Delta$ is defined by

\[ \Delta f(x) = \frac{1}{m(x)} \sum_{y \sim x} \omega_{xy} (f(y) - f(x)), \qquad f \in V^{\mathbb{R}}. \]

Obviously, the measure $m$ plays an important role in the definition of the Laplacian. Given the weight $\omega$ on $E$, there are two typical choices of Laplacian:

• $m(x) = \deg(x) := \sum_{y \sim x} \omega_{xy}$ for all $x \in V$, which gives the normalized graph Laplacian;
• $m(x) \equiv 1$ for all $x \in V$, which gives the combinatorial graph Laplacian.

Throughout the chapter, we assume one of the following conditions:

(A1) The Laplacian $\Delta$ is bounded on $\ell^2_m$, which is equivalent to

\[ D_m := \sup_{x \in V} \frac{\deg(x)}{m(x)} < +\infty. \]

(A2) $G$ is complete, that is, there exists a non-decreasing sequence $\{\eta_k\}_{k=0}^{\infty} \subset C_0(V)$ such that

\[ \lim_{k \to \infty} \eta_k = 1 \quad \text{and} \quad \Gamma(\eta_k) \le \frac{1}{k}, \]

where $\Gamma$ is the gradient form of Definition 13.1. Moreover, the measure $m$ is non-degenerate, i.e.

\[ \inf_{x \in V} m(x) = \delta > 0. \]

Note that the normalized Laplacian is bounded, and the combinatorial Laplacian may be unbounded but is non-degenerate.
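The definitions above translate directly into a few lines of code. The following is a minimal sketch in Python, assuming a finite weighted graph stored as a symmetric dictionary of dictionaries `w[x][y]` holding $\omega_{xy}$; the function names and the three-vertex path graph are illustrative choices, not part of the chapter.

```python
# Illustrative sketch: helper names and the example graph are assumptions,
# not taken from the chapter.


def degree(w, x):
    """deg(x): the sum of the edge weights at vertex x."""
    return sum(w[x].values())


def laplacian(w, m, f):
    """Return the function x -> (1/m(x)) * sum_{y~x} w_xy * (f(y) - f(x)) as a dict."""
    return {
        x: sum(wxy * (f[y] - f[x]) for y, wxy in w[x].items()) / m[x]
        for x in w
    }


def d_m(w, m):
    """D_m = sup_x deg(x)/m(x); on a finite graph the sup is a max."""
    return max(degree(w, x) / m[x] for x in w)


# Path graph 0 - 1 - 2 with symmetric weights w_01 = 1 and w_12 = 2.
w = {0: {1: 1.0}, 1: {0: 1.0, 2: 2.0}, 2: {1: 2.0}}
f = {0: 0.0, 1: 1.0, 2: 4.0}

m_normalized = {x: degree(w, x) for x in w}   # m(x) = deg(x)
m_combinatorial = {x: 1.0 for x in w}         # m(x) = 1

print(d_m(w, m_normalized))                   # 1.0
print(d_m(w, m_combinatorial))                # 3.0
print(laplacian(w, m_combinatorial, f)[1])    # 1*(0-1) + 2*(4-1) = 5.0
```

On this example the normalized choice gives $D_m = 1$, while the combinatorial choice gives $D_m$ equal to the largest weighted degree, which can grow without bound on an infinite graph.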

13.2 Curvature Dimension Inequalities

Definition 13.1 The gradient form $\Gamma$ and the iterated gradient form $\Gamma_2$ associated with the Laplacian are defined by

\[ 2\Gamma(f, g)(x) = (\Delta(f \cdot g) - f \cdot \Delta g - \Delta f \cdot g)(x) = \frac{1}{m(x)} \sum_{y \sim x} \omega_{xy} (f(y) - f(x))(g(y) - g(x)), \]

\[ 2\Gamma_2(f, g) = \Delta\Gamma(f, g) - \Gamma(f, \Delta g) - \Gamma(\Delta f, g). \]

We write $\Gamma(f) = \Gamma(f, f)$ and $\Gamma_2(f) = \Gamma_2(f, f)$.

Definition 13.2 The graph $G$ satisfies the curvature dimension inequality $CD(n, K)$ if, for any function $f \in V^{\mathbb{R}}$ and at every vertex $x \in V$,

\[ \Gamma_2(f) \ge \frac{1}{n} (\Delta f)^2 + K \Gamma(f). \tag{13.1} \]
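Definition 13.1 and inequality (13.1) can be explored numerically. The sketch below is an illustration under stated assumptions: it uses a cycle graph of length 8 with unit edge weights and $m \equiv 1$, hypothetical helper names, and randomly chosen test functions. It checks the product-rule identity for $\Gamma$ and evaluates the defect $\Gamma_2(f) - \frac{1}{2}(\Delta f)^2$, which a direct hand computation shows to be non-negative on this particular graph (so $CD(2, 0)$ holds there). It is a numerical check, not a proof for general graphs.

```python
import random

# Illustrative sketch: helper names, the cycle graph and the random test
# functions are assumptions, not taken from the chapter.


def laplacian(w, m, f):
    """Graph Laplacian: (1/m(x)) * sum_{y~x} w_xy * (f(y) - f(x))."""
    return {
        x: sum(wxy * (f[y] - f[x]) for y, wxy in w[x].items()) / m[x]
        for x in w
    }


def gamma(w, m, f, g):
    """Gradient form: (1/(2 m(x))) * sum_{y~x} w_xy (f(y)-f(x)) (g(y)-g(x))."""
    return {
        x: sum(wxy * (f[y] - f[x]) * (g[y] - g[x]) for y, wxy in w[x].items())
        / (2.0 * m[x])
        for x in w
    }


def gamma2(w, m, f):
    """Iterated gradient form: Gamma_2(f) = (1/2) Lap(Gamma(f)) - Gamma(f, Lap f)."""
    lap_f = laplacian(w, m, f)
    lap_gamma = laplacian(w, m, gamma(w, m, f, f))
    cross = gamma(w, m, f, lap_f)
    return {x: 0.5 * lap_gamma[x] - cross[x] for x in w}


def cycle(n):
    """Cycle graph on n vertices with unit edge weights."""
    return {i: {(i - 1) % n: 1.0, (i + 1) % n: 1.0} for i in range(n)}


w = cycle(8)
m = {x: 1.0 for x in w}          # combinatorial Laplacian
rng = random.Random(0)
f = {x: rng.uniform(-1.0, 1.0) for x in w}
g = {x: rng.uniform(-1.0, 1.0) for x in w}

# Product rule: 2 Gamma(f, g) = Lap(fg) - f Lap(g) - g Lap(f).
lap_f = laplacian(w, m, f)
lap_g = laplacian(w, m, g)
lap_fg = laplacian(w, m, {x: f[x] * g[x] for x in w})
gam_fg = gamma(w, m, f, g)
product_rule_gap = max(
    abs(2.0 * gam_fg[x] - (lap_fg[x] - f[x] * lap_g[x] - g[x] * lap_f[x]))
    for x in w
)

# Defect of CD(2, 0): Gamma_2(f) - (1/2) (Lap f)^2 at every vertex.
g2 = gamma2(w, m, f)
cd_defect = {x: g2[x] - 0.5 * lap_f[x] ** 2 for x in w}

print(product_rule_gap)
print(min(cd_defect.values()))
```

Changing the weights, the measure or the dimension parameter in `cd_defect` gives a quick way to see for which $n$ and $K$ inequality (13.1) can fail on a given small graph.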

On graphs, the $CD(n, 0)$ inequality alone seems insufficient to prove a generalization of the Li-Yau inequality. In [2], the authors prove a discrete analogue of the Li-Yau inequality by modifying the classical curvature notion; they introduce the exponential curvature dimension inequalities $CDE$ and $CDE'$, both of which we recall below.

Definition 13.3 We say that a graph $G$ satisfies $CDE(x, n, K)$ if for any positive function $f : V \to \mathbb{R}^+$ such that $\Delta f(x) < 0$, we have

\[ \widetilde{\Gamma}_2(f)(x) := \Gamma_2(f)(x) - \Gamma\left(f, \frac{\Gamma(f)}{f}\right)(x) \ge \frac{1}{n} (\Delta f)(x)^2 + K \Gamma(f)(x). \tag{13.2} \]

We say that $CDE(n, K)$ is satisfied if $CDE(x, n, K)$ is satisfied for all $x \in V$.

Definition 13.4 We say that a graph $G$ satisfies $CDE'(x, n, K)$ if for any positive function $f : V \to \mathbb{R}^+$, we have

\[ \widetilde{\Gamma}_2(f)(x) \ge \frac{1}{n} f(x)^2 (\Delta \log f)(x)^2 + K \Gamma(f)(x). \tag{13.3} \]

We say that $CDE'(n, K)$ is satisfied if $CDE'(x, n, K)$ is satisfied for all $x \in V$.

Remark 13.1 The dimension parameter $n$ can be chosen to be an arbitrary positive real number, including $\infty$. If the semigroup generated by $\Delta$ is a diffusion semigroup (e.g. the Laplacian on a manifold), then $CD(n, K)$ and $CDE'(n, K)$ are equivalent. On graphs, $CDE'(n, K)$ implies $CDE(n, K)$. Moreover, [14] showed that $CDE'(n, K)$ yields $CD(n, K)$ but the converse is not true. In [2], the authors proved that lattices, and more generally Ricci-flat graphs in the sense of Chung and Yau [3], which include the abelian Cayley graphs, satisfy $CDE(n, 0)$ and $CDE'(n', 0)$ for some $n, n'$.

13.3 Li-Yau Inequality and Its Applications

In this section, we present the Li-Yau inequality and its main applications on graphs with non-negative curvature, for both bounded and unbounded Laplacians. Furthermore, we give a brief exposition of gradient estimates on more general graphs without any curvature condition.

13.3.1 Li-Yau Inequality for Bounded Laplacians

In this subsection, we assume the graph satisfies condition (A1). Using the maximum principle, [2] proved the Li-Yau inequality for positive solutions to the heat equation on graphs under $CDE(n, 0)$ with finite $n$. We further assume that

\[ D_\omega := \sup_{x, y \in V,\ y \sim x} \frac{\deg(x)}{\omega_{xy}} < +\infty. \]

Theorem 13.1 ([2], Theorem 4.20) Suppose the weighted graph $G$ satisfies $CDE(n, 0)$. For any $R > 0$ and $x_0 \in V$, let $u$ be a positive solution to the heat equation on $B(x_0, 2R)$. Then

\[ \frac{\Gamma(\sqrt{u})}{u} - \frac{\partial_t \sqrt{u}}{\sqrt{u}} \le \frac{n}{2t} + \frac{n(1 + D_\omega) D_m}{R} \quad \text{on } B(x_0, R). \]

As applications, the above Li-Yau inequality leads to the Harnack inequality ([2], Corollary 5.3) and hence to heat kernel bounds ([2], Theorem 7.6) and the polynomial volume growth property ([2], Corollary 7.8).

Next, we consider a group with a finite, symmetric generating set $S$ and the Cayley graph associated with this generating set. The following result is a direct consequence of Gromov's theorem [5] and the polynomial volume growth property under the assumption of $CDE(n, 0)$ in [2].

Theorem 13.2 If the Cayley graph of a group with respect to a finite, symmetric generating set $S$ satisfies $CDE(n, 0)$, then the group is virtually nilpotent, that is, it has a finite-index nilpotent subgroup.

However, the above Li-Yau inequality is insufficient to derive the three equivalent properties known from the manifold setting: volume doubling together with the Poincaré inequality, Gaussian heat kernel bounds, and the strongest form of the Harnack inequality. This failure arises from the term containing $R$ on the right-hand side of the above inequality: in the manifold case, $1/R^2$ occurs instead of $1/R$. In [8], we use semigroup methods to prove these three equivalent properties under the assumption of $CDE'(n, 0)$. In the rest of this subsection, we only consider normalized graph Laplacians.

Definition 13.5

(DV) A graph $G$ satisfies the volume doubling property $DV(C)$ for a constant $C > 0$ if for all $x \in V$ and all $r > 0$:

\[ V(x, 2r) \le C\, V(x, r). \]

(P) A graph $G$ satisfies the Poincaré inequality $P(C)$ for a constant $C > 0$ if

\[ \sum_{x \in B(x_0, r)} m(x) |f(x) - f_B|^2 \le C r^2 \sum_{x, y \in B(x_0, 2r)} \omega_{xy} (f(y) - f(x))^2 \]

for all $f \in V^{\mathbb{R}}$, all $x_0 \in V$ and all $r \in \mathbb{R}^+$, where

\[ f_B = \frac{1}{V(x_0, r)} \sum_{x \in B(x_0, r)} m(x) f(x). \]

(H) Fix $\eta \in (0, 1)$, $0 < \theta_1 < \theta_2 < \theta_3 < \theta_4$ and $C > 0$. $G$ satisfies the continuous-time Harnack inequality $H(\eta, \theta_1, \theta_2, \theta_3, \theta_4, C)$ if for all $x_0 \in V$ and $s, R \in \mathbb{R}^+$, and every positive solution $u(t, x)$ to the heat equation on $Q = [s, s + \theta_4 R^2] \times B(x_0, R)$, we have

\[ \sup_{Q^-} u(t, x) \le C \inf_{Q^+} u(t, x), \]

where $Q^- = [s + \theta_1 R^2, s + \theta_2 R^2] \times B(x_0, \eta R)$ and $Q^+ = [s + \theta_3 R^2, s + \theta_4 R^2] \times B(x_0, \eta R)$.

($\mathcal{H}$) Fix $\eta \in (0, 1)$, $0 < \theta_1 < \theta_2 < \theta_3 < \theta_4$ and $C > 0$. $G$ satisfies the discrete-time Harnack inequality $\mathcal{H}(\eta, \theta_1, \theta_2, \theta_3, \theta_4, C)$ if for all $x_0 \in V$ and $s, R \in \mathbb{R}^+$, and every positive solution $u(t, x)$ to the heat equation on $Q = ([s, s + \theta_4 R^2] \cap \mathbb{Z}) \times B(x_0, R)$, the conditions $(n^-, x^-) \in Q^-$, $(n^+, x^+) \in Q^+$ and $d(x^-, x^+) \le n^+ - n^-$ imply

\[ u(n^-, x^-) \le C\, u(n^+, x^+), \]

where $Q^- = ([s + \theta_1 R^2, s + \theta_2 R^2] \cap \mathbb{Z}) \times B(x_0, \eta R)$ and $Q^+ = ([s + \theta_3 R^2, s + \theta_4 R^2] \cap \mathbb{Z}) \times B(x_0, \eta R)$.

(G) Fix positive constants $c_l, C_l, C_r, c_r > 0$. The graph $G$ satisfies the Gaussian estimate $G(c_l, C_l, C_r, c_r)$ if, whenever $d(x, y) \le n$,

\[ \frac{c_l\, m(y)}{V(x, \sqrt{n})}\, e^{-C_l \frac{d(x, y)^2}{n}} \le p_n(x, y) \le \frac{C_r\, m(y)}{V(x, \sqrt{n})}\, e^{-c_r \frac{d(x, y)^2}{n}}. \]

We further need the following rather mild condition.

Definition 13.6 Let $\alpha > 0$. $G$ satisfies $\Delta(\alpha)$ if
(1) $x \sim x$ for every $x \in V$, and
(2) $\omega_{xy} \ge \alpha\, m(x)$ whenever $x, y \in V$ and $x \sim y$.

Theorem 13.3 ([2], Theorem 2.2) If the graph satisfies $CDE'(n_0, 0)$ and $\Delta(\alpha)$, then the following four properties hold.
(1) There exist $C_1, C_2, \alpha > 0$ such that $DV(C_1)$, $P(C_2)$ and $\Delta(\alpha)$ are true.
(2) There exist $c_l, C_l, C_r, c_r > 0$ such that $G(c_l, C_l, C_r, c_r)$ is true.
(3) For any $\eta \in (0, 1)$ and $0 < \theta_1 < \theta_2 < \theta_3 < \theta_4$, there exists $C_H$ such that $H(\eta, \theta_1, \theta_2, \theta_3, \theta_4, C_H)$ is true.
(3') For any $\eta \in (0, 1)$ and $0 < \theta_1 < \theta_2 < \theta_3 < \theta_4$, there exists $C_{\mathcal{H}}$ such that $\mathcal{H}(\eta, \theta_1, \theta_2, \theta_3, \theta_4, C_{\mathcal{H}})$ is true.

A function $u$ on $G$ is called harmonic if $\Delta u = 0$. A harmonic function $u$ on $G$ has polynomial growth if there is a positive number $d$ such that

\[ \exists x_0 \in V,\ \exists C > 0,\ \forall x \in V: \quad |u(x)| \le C\, d(x_0, x)^d. \]

Combining Theorem 13.3 and Delmotte's Theorem 3.2 from [4], we obtain the following result, which confirms the analogue of Yau's conjecture [19] on graphs.

Theorem 13.4 ([2], Theorem 2.3) If the graph satisfies $CDE'(n_0, 0)$ and $\Delta(\alpha)$, then the dimension of the space of harmonic functions on $G$ with polynomial growth is finite.

13.3.2 Li-Yau Inequality for Unbounded Laplacians

In [7], we consider unbounded Laplacians and, following the method of [8], prove the same version of the Li-Yau inequality as in the bounded setting under the assumption of $CDE'$. In this subsection, we assume the infinite graph satisfies (A2).

Theorem 13.5 ([7], Theorem 1.1) If $G$ satisfies $CDE'(n, K)$ with $K \in \mathbb{R}$, then for any $0 \le f \in \ell^p_m$ with $p \in [1, \infty]$ and any constant $b \ge 1$, we have

\[ \frac{\Gamma(\sqrt{P_t f})}{P_t f} \le \frac{1}{2} \left(1 - \frac{2Kt}{2b + 1}\right) \frac{\Delta P_t f}{P_t f} + \frac{n}{2} \left(\frac{b^2}{(2b - 1)t} + \frac{K^2 t}{2b + 1} - K\right). \]

Similarly, the above Li-Yau inequality leads to the Harnack inequality ([7], Corollary 1.2), a heat kernel diagonal upper bound ([7], Theorem 1.3) and Cheng's eigenvalue estimate ([7], Theorem 1.4) for unbounded Laplacians.

13.3.3 Gradient Estimate on General Graphs

In this subsection, we impose no curvature condition on graphs and present two kinds of gradient estimates that hold for general positive functions, not only for positive solutions to the heat equation.

Theorem 13.6 ([10], Theorem 2.1) Let $G = (V, E)$ satisfy assumption (A1). If $u : V \to \mathbb{R}$ is a positive function, then

\[ \frac{\Gamma(\sqrt{u})(x)}{u(x)} - \frac{\Delta u(x)}{2u(x)} \le D_m, \qquad \forall x \in V. \]

In particular, if $u : V \times \mathbb{R} \to \mathbb{R}$ is a positive solution to the heat equation $(\Delta - \partial_t) u = 0$, then

\[ \frac{\Gamma(\sqrt{u})(x)}{u(x)} - \frac{\partial_t \sqrt{u}(x)}{\sqrt{u}(x)} \le D_m, \qquad \forall x \in V. \]

Theorem 13.7 ([12], Theorem 1) Suppose that

\[ d := \sup_{x \in V,\ x \sim y} \frac{m(x)}{\omega_{xy}} < +\infty. \]

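Then for any positive function $u : V \to \mathbb{R}$, there holds

\[ \frac{\sqrt{2\Gamma(u)}}{u} \le \sqrt{d}\, \frac{\Delta u}{u} + \sqrt{d}\, D_m + D_m. \]

Several special cases are listed below:

(i) If $u$ is a positive solution to the differential inequality $\Delta u - qu \le 0$ on $V$, where $q : V \to \mathbb{R}$ is a function, then there holds

\[ \frac{\sqrt{2\Gamma(u)}}{u} - \sqrt{d}\, q \le \sqrt{d}\, D_m + D_m. \]

(ii) If $u$ is a positive solution to the differential inequality $\Delta u - h u^{\alpha} \le 0$, where $\alpha \in \mathbb{R}$ and $h : V \to \mathbb{R}$ is a function, then there holds

\[ \frac{\sqrt{2\Gamma(u)}}{u} - \sqrt{d}\, h u^{\alpha - 1} \le \sqrt{d}\, D_m + D_m. \]

(iii) If $u$ is a positive solution to the differential inequality $\Delta u - \partial_t u \le qu$, where $q : V \times \mathbb{R} \to \mathbb{R}$ is a function, then there holds

\[ \frac{\sqrt{2\Gamma(u)}}{u} - \sqrt{d}\, \frac{\partial_t u}{u} - \sqrt{d}\, q \le \sqrt{d}\, D_m + D_m. \]

(iv) If $u$ is a positive solution to the differential inequality $\Delta u - \partial_t u + a u \log u \le 0$, where $a \in \mathbb{R}$ is a constant, then there holds

\[ \frac{\sqrt{2\Gamma(u)}}{u} - \sqrt{d}\, \frac{\partial_t u}{u} + \sqrt{d}\, a \log u \le \sqrt{d}\, D_m + D_m. \]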
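The first estimate of Theorem 13.6 is easy to test numerically because it requires no curvature condition. The following sketch assumes a small weighted graph with arbitrarily chosen weights and measure (all names and numbers are illustrative and not taken from [10]) and evaluates $\Gamma(\sqrt{u})/u - \Delta u/(2u)$ for random positive functions $u$, comparing the largest value found with $D_m$.

```python
import math
import random

# Illustrative sketch: helper names, the example graph and the sampling
# range for u are assumptions, not taken from the chapter or from [10].


def laplacian(w, m, f):
    """Graph Laplacian: (1/m(x)) * sum_{y~x} w_xy * (f(y) - f(x))."""
    return {
        x: sum(wxy * (f[y] - f[x]) for y, wxy in w[x].items()) / m[x]
        for x in w
    }


def gamma(w, m, f):
    """Gradient form Gamma(f)(x) = (1/(2 m(x))) * sum_{y~x} w_xy (f(y)-f(x))^2."""
    return {
        x: sum(wxy * (f[y] - f[x]) ** 2 for y, wxy in w[x].items()) / (2.0 * m[x])
        for x in w
    }


def gradient_gap(w, m, u):
    """Largest value over x of Gamma(sqrt(u))(x)/u(x) - Lap(u)(x)/(2 u(x))."""
    root = {x: math.sqrt(u[x]) for x in w}
    gam = gamma(w, m, root)
    lap = laplacian(w, m, u)
    return max(gam[x] / u[x] - lap[x] / (2.0 * u[x]) for x in w)


# Triangle 0-1-2 with a pendant vertex 3 attached to 1; weights are symmetric.
w = {
    0: {1: 1.0, 2: 0.5},
    1: {0: 1.0, 2: 2.0, 3: 3.0},
    2: {0: 0.5, 1: 2.0},
    3: {1: 3.0},
}
m = {0: 1.0, 1: 2.0, 2: 0.5, 3: 4.0}

D_m = max(sum(w[x].values()) / m[x] for x in w)

rng = random.Random(1)
worst = max(
    gradient_gap(w, m, {x: rng.uniform(0.01, 10.0) for x in w})
    for _ in range(1000)
)

print(D_m)
print(worst <= D_m)
```

A random search of this kind can only fail to find a counterexample; it does not replace the proof in [10], but it is a convenient way to check an implementation of $\Gamma$ and $\Delta$ against a known inequality.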


References

1. Bauer, F., Chung, F., Lin, Y., Liu, Y.: Curvature aspects of graphs. Proc. Am. Math. Soc. 145(5), 2033–2042 (2017)
2. Bauer, F., Horn, P., Lin, Y., Lippner, G., Mangoubi, D., Yau, S.-T.: Li-Yau inequality on graphs. J. Differ. Geom. 99, 359–405 (2015)
3. Chung, F., Yau, S.-T.: Logarithmic Harnack inequalities. Math. Res. Lett. 3, 793–812 (1996)
4. Delmotte, T.: Harnack inequalities on graphs. Séminaire de Théorie spectrale et géométrie, tome 16, 217–228 (1997–1998)
5. Gromov, M.: Groups of polynomial growth and expanding maps. Inst. Hautes Études Sci. Publ. Math. 53, 53–73 (1981)
6. Lippner, G., Liu, S.: Li-Yau inequality on virtually Abelian groups. arXiv:1610.05227
7. Gong, C., Lin, Y., Liu, S., Yau, S.-T.: Li-Yau inequality for unbounded Laplacian on graphs. Adv. Math. https://doi.org/10.1016/j.aim.2019.106822
8. Horn, P., Lin, Y., Liu, S., Yau, S.-T.: Volume doubling, Poincaré inequality and Gaussian heat kernel estimate for nonnegative curvature graphs. J. Reine Angew. Math. 757, 89–130 (2019)
9. Liu, S.: Buser's inequality on infinite graphs. J. Math. Anal. Appl. 475(2), 1416–1426 (2019)
10. Lin, Y., Liu, S., Yang, Y.: Global gradient estimate on graph and its applications. Acta Mathematica Sinica, English Series 32(11), 1350–1356 (2016)
11. Lin, Y., Lu, L., Yau, S.-T.: Ricci curvature of graphs. Tohoku Math. J. 63(4), 605–627 (2011)
12. Lin, Y., Liu, S., Yang, Y.: A gradient estimate for positive functions on graphs. J. Geom. Anal. 27(2), 1667–1679 (2017)
13. Lin, Y., Yau, S.-T.: Ricci curvature and eigenvalue estimate on locally finite graphs. Math. Res. Lett. 17(2), 343–356 (2010)
14. Münch, F.: Remarks on curvature dimension conditions on graphs. Calc. Var. Partial Differ. Equ. 56(1) (2017)
15. Ollivier, Y.: Ricci curvature of Markov chains on metric spaces. J. Funct. Anal. 256, 810–864 (2009)
16. Pouryahya, M., Mathews, J., Tannenbaum, A.: Comparing three notions of discrete Ricci curvature on biological networks. arXiv:1712.02943
17. Sandhu, R., Georgiou, T., Reznik, E., Zhu, L., Kolesov, I., Senbabaoglu, Y., Tannenbaum, A.: Graph curvature for differentiating cancer networks. Sci. Rep. 5, 12323 (2015)
18. Sandhu, R.S., Georgiou, T.T., Tannenbaum, A.R.: Ricci curvature: an economic indicator for market fragility and systemic risk. Sci. Adv. 2(5) (2016)
19. Yau, S.-T.: Nonlinear analysis in geometry. L'Enseignement Mathématique, SRO-KUNDIG, Genève (1986)
20. Zhou, X., Liang, X., Zhu, X., Tang, Z.: Convex edges in social networks. IEEE Trans. Comput. Soc. Syst., in press

Chapter 14

Iterative Learning Control for FinTech

Kun Zeng

K. Zeng (B)
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
e-mail: [email protected]

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5_14

14.1 Introduction

Iterative Learning Control (ILC) is a control approach for complex systems that operate in a repetitive tracking mode. It was first introduced by Uchiyama in 1978 [1], but the work did not spread widely because it was written in Japanese. In 1984, the paper by Arimoto et al. [2] attracted the attention of the control community, and research on ILC has grown rapidly since then. In the first decade of this century in particular, ILC saw significant developments in both theory and engineering practice.

Two salient features distinguish ILC from other control methodologies. The first is that ILC exploits past information to optimize the current control input, so that all the past information is reflected in the current control signal. Such a learning pattern leads to quick and precise tracking. The second is that ILC is designed for the same or similar trajectory-tracking tasks over a finite time interval, in contrast to the infinite time interval of traditional control methodologies. In short, ILC is characterized by learning, tracking, high precision and finite-time operation; these features also define its typical application scenarios [3].

Given these characteristics, it is natural to expect that ILC is widely used in engineering. Indeed, ILC plays an important role in diverse branches of control engineering, especially where both speed and high precision are required, such as welding, injection molding, component assembly, disk drives, computer numerical control, and machine tool operation. These applications have greatly promoted theoretical development and have in turn benefited from it.

In the control field, ILC, as a new intelligent control algorithm, has been investigated under various conditions, such as multiple connected control objectives [4, 5], passive incomplete control information [6], batch-varying or time-varying tracking objects [7, 8, 50], and periodic or non-periodic disturbances [3]. Meanwhile, recent research also investigates ILC in various formulations or in combination with other control approaches, for example fuzzy ILC [9, 10], predictive ILC [11–13], adaptive ILC [14, 15], point-to-point ILC [16, 17], and robust ILC [3, 18]. Among these advances are several remarkable results, including [3, 4, 14, 16]. All these results, in engineering and in theory, lay the foundation for extending ILC to other areas such as economics and finance.

The application of control theory in economics has developed steadily, though with twists and turns. As early as the 1940s, when cybernetics was first being investigated [19], Norbert Wiener referred to its potential influence on the social sciences. Owing to Wiener's work [19], cybernetics became known to a wider public as a comprehensive method for understanding the essence of practical systems. Later on, cybernetics began to bring developments in theory and applications. In the early 1950s, cybernetics was introduced into economics by Tustin [20] and Phillips [21]. Tustin managed to combine feedback with Keynesian macroeconomics, which was still a newly emerged discipline at that time. Although the work of Tustin and Phillips seemed likely to start a trend, it did not make a significant impact on economics in the 1950s. According to Cochrane's research [22], the emergence of digital computers, economists' misunderstanding of the effectiveness of cybernetics in Western economics, competition with other mathematical techniques, the immaturity of Keynesian theory, the atmosphere in the economics community and the historical background of the Space Race all contributed to the fact that Tustin's research did not bring a revolution to economics. Generally speaking, it was a historical process [23].

Later, in the 1960s, aerospace continued to draw attention from all over the world. Astronautics took the dominant place, and the most talented engineers devoted themselves to it, so little attention was paid to economics. It was not until the end of the Age of Sputnik that engineers began to turn their attention to economics. In the early 1970s, along with the development of the digital computer, cybernetics finally found its way into the mainstream of economics, and research in this interdisciplinary field started to boom. Results from that time include the linear quadratic tracking problem applied to fiscal policy [24, 25] and monetary stock [26], adaptive control for parameter uncertainty in economic models [27, 28], stochastic control for commodity market disturbances [29] and macroeconomics [30], optimal control for advertising [31], and the Kalman filter applied to volatility coefficient estimation [32, 33]. These are only a small part of the research of the early 1970s. During this period, adaptive control, stochastic control and optimal control were most frequently combined with both microeconomics and macroeconomics. The Kalman filter was exploited for accurate estimation (in portfolio selection and volatility coefficients) and began to show its power. The research of Phillips and Tustin was extended by Cooper [34]. The large body of research from this period laid the foundation for subsequent investigation in the interdisciplinary field of economics and cybernetics. Its profound influence lasts until now, and many methods, including the Kalman filter and the linear quadratic model, are still in use today.

Roughly speaking, financial or economic models can be divided into two types: time-dependent dynamics and time-independent dynamics. Time-dependent dynamics involve changing trends in a sequence or inherent patterns in data over a time interval. Time-independent dynamics concern relations among variables or the system structure. For time-independent dynamics, the model can be viewed as a black box that links input variables with output variables, as shown in Fig. 14.1.

Fig. 14.1 Conceptual diagram of time-independent economic system

An economic process or model with at least one time-dependent parameter is a time-dependent dynamics model. However, the major difficulty lies in distinguishing time-independent parameters from time-dependent ones. Situations vary from process to process, and classification criteria can differ markedly depending on the specific features of an economic system. In the modeling process, it is important to determine how many time-dependent parameters suffice for a problem and how far the model can be simplified without undermining its validity.

In this chapter, we introduce the potential application of an intelligent control approach, ILC, in FinTech and investigate how ILC can be applied to financial or economic models. The chapter is organized as follows. We first introduce the basic structure of ILC. Then, we present current theoretical achievements of ILC. Following that, we discuss how ILC could be used in financial models. An example is elaborated to demonstrate the promising effect of ILC.

14.2 Basic Principles of Iterative Learning Control

This section introduces the basic concept of ILC. We first present the principles and the block diagram of ILC, and then describe the basic principle in detail using state-space equations.

14.2.1 Structure of ILC

As mentioned in [3], ILC is designed for multi-pass tracking processes. Each pass of the operation process is called an "iteration" or a "batch", which is the origin of the name "iterative learning control". Each iteration operates over a finite time interval. The control structure of ILC is shown in Fig. 14.2. In each iteration, the controller exploits the past output and error signals to correct the input signal for the current iteration and thereby reduces the difference between $y_k(t)$ and $y_d$. The schematic diagram in Fig. 14.2 illustrates the basic structure of the ILC algorithm. Note that Fig. 14.2 shows the control structure in the $(k + 1)$th iteration. The controller receives $u_k$ from a memory and the tracking error $e_k$ from the plant to generate the control signal $u_{k+1}(t)$. Then $u_{k+1}(t)$ is stored in the memory for the next update and sent to the plant for the subsequent iteration.

14.2.2 Formulation of ILC In this part we will discuss the formulation of ILC. It should be emphasized that all formulations in this chapter are given in the state-space form, an internal description of a system proposed by Kalman in 1960. In particular, the state-space equation is exploited to describe relations among the input signal, output signal and inner state information of an involved system. In this formulation, the dynamics can be investigated thoroughly as the internal information is revealed. In addition, the statespace formulation takes the system as a white box contrary to the black-box in Fig. 14.1. With this formulation, economists usually assume main state variables can be accurately measured [35]. The structure of state-space equation could be represented in discrete-time form or continuous-time form. In consideration that both discrete and continuous forms

Fig. 14.2 Control diagram of ILC

14 Iterative Learning Control for FinTech

221

have been adopted in certain problems of financial analysis, they will be introduced in the following. Discrete-time state-space system can be represented as: xk (t + 1) = At xk (t) + Bt u k (t), yk (t) = Ct xk (t)

(14.1)

where x ∈ R n , y ∈ R p , u ∈ R q are the state, output, and input, respectively. k = 1, 2, . . . is iteration number and t = 0, 1, . . . , N is the time instance during an iteration, where N is the length of iteration. A, B, C are the time-varying system matrices with proper dimensions. The corresponding continuous-time system can be represented as: x˙k (t) = Axk (t) + Bu k (t), yk (t) = C xk (t)

(14.2)

where we usually denote the time t in an interval [0, T ], where T is the length of iteration. ILC Control law is generally represented as follows: u k+1 (t) = h(u k (·), . . . , u 1 (·), ek (·), . . . , e0 (·))

(14.3)

where u k (t) is the control input signal and ek (t) is the tracking error. L is the learning matrix. Here, the tracking error ek (t) is given by the following equation: ek (t) = yd (t) − yk (t)

(14.4)

The control objective is to generate control input sequence such that the limitation of the output yk (t) can track yd (t) as closely as possible, where yd (t) denotes the tracking reference. Usually, we hope that ek (t) converges to zero if the desired reference is realizable. The realizability of the reference yd (t) indicates that there exists certain input signal such that yd (t) can be generated completely by the state-space system. Remark 14.1 Equations (14.1) and (14.2) are expressed in the time domain. This is because the dynamics in economics system is divided in time-dependent and time-independent parts. For the time-independent dynamics, the parameters in the equations are constants. For time-independent dynamics, parameters are partially or completely variable. In addition, the time-domain expressions are explicit. Remark 14.2 As is illustrated in (14.1) and (14.2), the difference lies in the state x. The dynamics of the state in the discrete-time form is described by xk (t + 1), the signal in the next time instance, whereas in the continuous-time case it is replaced by x˙k (t), the differential of the state x. The discrete-time form can be used in time series analysis (TSA) for the case that data is given in a certain time instance. The

K. Zeng

continuous-time form can be used in pricing models because the variables require a differential form. It should be pointed out that the analyses of the two models are rather different.

Remark 14.3 Unlike most traditional control methodologies, ILC is a typical feedforward control law. The main difference between feedforward and feedback control strategies is whether the current error signal is used directly to correct the current input signal. As shown in Fig. 14.2, the tracking error is used to correct the input signal for the following iteration. Thus, ILC is feedback control in the iteration domain and feedforward control in the time domain.

Remark 14.4 The ILC algorithm is designed for repetitive tracking tasks, which implies that the state $x$ should be reset to the same value at the start of each iteration. This reset condition is given as follows:

$$x_k(0) \equiv x_0, \quad \forall k \tag{14.5}$$

This reset condition, also called the identical initialization condition (i.i.c.), is vital in ILC analysis. In other words, it is a premise of precise tracking. Although this condition can be relaxed or even learned, it is worth pointing out that perfect tracking performance can never be accomplished without the i.i.c.

Remark 14.5 The ILC algorithm uses information from previous iterations to correct the input signal. Owing to this feature, the ILC algorithm requires little information about the system. ILC is capable of achieving precise tracking performance even if the system is nonlinear and the model is uncertain. For example, the system matrix $A$ in (14.1) and (14.2) could be a nonlinear function of the state. This feature allows ILC algorithms to be applied in complicated situations.
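The ingredients of this section (a discrete-time state-space system, the tracking error (14.4), and the reset condition (14.5)) can be put together in a short simulation. The Python sketch below is illustrative only. The scalar plant parameters `a`, `b`, `c`, the value of `gain` (which plays the role of the learning matrix $L$), the sinusoidal reference, and the particular update $u_{k+1}(t) = u_k(t) + L e_k(t+1)$ are all assumptions chosen for the example rather than values given in the text.

```python
import math


def run_trial(u, a=0.3, b=1.0, c=1.0, x0=0.0):
    """Simulate one iteration of x(t+1) = a*x(t) + b*u(t), y(t) = c*x(t).

    The state always starts from x0, which mimics the reset condition (14.5).
    Returns the outputs y(0), ..., y(N) for an input sequence u(0), ..., u(N-1).
    """
    x = x0
    y = [c * x]
    for u_t in u:
        x = a * x + b * u_t
        y.append(c * x)
    return y


def p_type_ilc(y_d, iterations=30, gain=0.7):
    """Refine the input over repeated trials: u_{k+1}(t) = u_k(t) + gain * e_k(t+1)."""
    n = len(y_d) - 1
    u = [0.0] * n  # first trial: no prior knowledge of the right input
    max_errors = []
    for _ in range(iterations):
        y = run_trial(u)
        e = [yd - yk for yd, yk in zip(y_d, y)]  # tracking error (14.4)
        max_errors.append(max(abs(v) for v in e[1:]))
        # The error of this trial only changes the input of the next trial:
        # feedforward in time, feedback across iterations (Remark 14.3).
        u = [u[t] + gain * e[t + 1] for t in range(n)]
    return u, max_errors


N = 20
reference = [math.sin(2 * math.pi * t / N) for t in range(N + 1)]
u_final, max_errors = p_type_ilc(reference)
print(f"max |e| in trial 1:  {max_errors[0]:.3e}")
print(f"max |e| in trial 30: {max_errors[-1]:.3e}")
```

With these assumed values, $|1 - L c b| = 0.3 < 1$, which is the usual contraction-type requirement for an update of this kind, so the maximum error shrinks from trial to trial. The update never uses the plant parameters directly, which is the point made in Remark 14.5.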

14.3 Recent Progress and Potential Applications

In this section we discuss potential applications of ILC. After more than three decades of development, ILC has been investigated not only under various conditions but also in combination with other control techniques. Besides engineering applications, ILC has been applied in economics and finance. We are not concerned with the specific mechanisms or theorems of economics; those belong to the field of economics. Rather, the extended applications of ILC offer another solution to the same economic questions. ILC can provide a complementary perspective, either by treating the economic problem from a cybernetic viewpoint or by analyzing the information contained in the input signal. After all, different perspectives expose different features of the same issue. This is one motivation for applying ILC to FinTech. In the rest of this section, we introduce several specific algorithms with potential applications in financial engineering and try to make extensions. Our attempt to extend current theoretical work on ILC to economic models is intended to provide new perspectives for academic research and practical applications.

14 Iterative Learning Control for FinTech

14.3.1 ILC for Multi-agent System

The multi-agent system (MAS) stems from the application of distributed systems. An MAS is a group system consisting of several connected agents. An individual agent in the system could be any controllable or partially controllable object, such as a person, an animal, or a robot. Complicated communication and relatively independent individuals make an MAS an effective instrument for analyzing group behavior or accomplishing control objectives that are difficult for a single agent. Here, each agent has its own motion pattern, while all agents communicate through the connecting network to share information and complete control tasks collectively.

MAS has long been used in the financial field to solve various financial problems. The most influential applications include building trading systems, risk management, portfolio selection, economic simulation experiments, and time series prediction. In financial applications, an MAS is usually related to a real system: each agent corresponds to an entity in the real world.

When combined with ILC, an MAS is mainly designed for coordination control. Coordination means synchronized behavior of the whole system. Owing to the application potential of MAS and the good features of ILC, their combination has drawn much attention. Representative works include [36–39]. Unlike the single-agent situation, where the tracking trajectory is known, an MAS has several agents and not all of them have access to the desired trajectory. Agents that know the desired trajectory are called leaders (see agents 1 and 6 in Fig. 14.3) and the others are called followers (see agents 2, 3, 4, 5, and 7 in Fig. 14.3). Another tricky part is that there can be more than one trajectory, since each leader may be related to its own trajectory.

According to the number of leaders, coordination control of MAS can be divided into three types: the consensus problem (no leader and no trajectory), the leader-following problem (a unique leader with one trajectory), and the containment problem (multiple leaders with multiple trajectories). Consensus means

Fig. 14.3 An illustrative example of MAS structure

all agents asymptotically converge to a stable equilibrium state at the terminal time as the iteration number increases [40]. Leader-following means all followers are able to follow the leader and asymptotically converge to the leader's desired trajectory. Containment means all followers are able to follow at least one leader, and ultimately each follower converges to a trajectory. Figure 14.3 shows a containment situation. Currently, except for containment control with ILC, the consensus and leader-following problems have been investigated from many different perspectives. In the financial area, the coordination problem is usually investigated through simulations of economic models or economic behaviors, which makes it plausible to explain past financial processes or to predict future investment strategies. An agent in the system usually refers to an investor, an account, or financial data in a process. Current research on ILC-based MAS problems focuses on the simulation case. Driven by a specific problem, ILC may well be combined with other questions in MAS. In Sect. 14.4, we give a plausible example in which MAS with ILC could explain certain market dynamics.

14.3.2 Fuzzy ILC

Fuzzy set theory was initially proposed by Zadeh [41] to describe the imprecision of behaviors and decision making. By expressing a property of the dynamics with a linguistic feature instead of an exact number, data or dynamics are classified roughly yet described easily. In practice, properties are often described vaguely, which makes fuzzy set theory pragmatic in certain application scenarios. Fuzzy control has been an effective application of fuzzy theory since the 1970s. Because economic models are intended to describe individual behaviors, and most actual behaviors can only be described vaguely, fuzzy control is suitable for analyzing individual economic behaviors. A classical application of fuzzy control is capital budgeting, because budgets are apt to be expressed in linguistic form. Fuzzy theory and fuzzy control have been applied widely, for example in risk analysis, portfolio selection, the Black-Scholes model, regression analysis, and insurance.

In fuzzy control, the first step is to transform the data and dynamics into a fuzzy set using fuzzy membership functions. That is, the specific numbers are divided into several classes according to the membership functions. Then, each element in the fuzzy set is processed according to a set of fuzzy inference rules. Finally, the result in fuzzy form is transformed back into its original form through a defuzzification interface. Because specific data are transformed into fuzzy form, and elements of a fuzzy set are much simpler to handle than the original data, computational complexity decreases drastically when fuzzy control methods are introduced. In addition, fuzzy control is an important approach to approximating nonlinear systems (Fig. 14.4).

Let us focus on the membership function and the defuzzification function. A membership function could be a predetermined piecewise function or a linear combination of several functions with different coefficients. The membership function and defuzzification rule

Fig. 14.4 An illustrative example of fuzzy control

are set according to experience. Ideally, empirical knowledge is enough for classification. However, this is a very demanding condition. The parameters of the membership functions may turn out to be improper during the control process and need to be corrected. Besides that, the introduction of fuzzy rules also reduces precision. To deal with these two disadvantages, fuzzy control is usually applied together with other control methods. In this section, we introduce its combination with ILC. In repeatable control scenarios or multi-pass processes, parameters can be adjusted based on past error information. Provided that the convergence of ILC is established, applying ILC to parameter adjustment yields suitable parameters quickly. Such a situation is typical in economic or financial models. Chien et al. [42] proposed a method to approximate the nonlinear term in the system with a fuzzy system. Later on, Chien continued to focus on the application of fuzzy systems in ILC and conducted much research. As can be seen in [9], the nonlinear term is approximated by the fuzzy system and the parameters are learned by ILC. This method is a simple yet effective way to deal with a system containing nonlinear terms.
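The three steps of fuzzy control described in this subsection (fuzzification through membership functions, rule-based inference, and defuzzification) can be sketched in a few lines of Python. Everything specific here is hypothetical: the linguistic classes `loss`, `flat`, and `gain`, the triangular membership parameters, the rule outputs, and the weighted-average defuzzification are assumptions made for illustration and are not taken from [42] or [9].

```python
def triangular(x, left, peak, right):
    """Degree of membership (0 to 1) of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic classes for a daily return, in percent.
# Each tuple is (left, peak, right) of a triangular membership function.
RETURN_SETS = {
    "loss": (-6.0, -3.0, 0.0),
    "flat": (-3.0, 0.0, 3.0),
    "gain": (0.0, 3.0, 6.0),
}

# Hypothetical inference rules: class of return -> portfolio adjustment in percent.
RULE_OUTPUT = {"loss": -10.0, "flat": 0.0, "gain": 10.0}


def fuzzify(x):
    """Step 1: map a crisp number to membership degrees in each class."""
    return {name: triangular(x, *params) for name, params in RETURN_SETS.items()}


def fuzzy_adjustment(daily_return):
    """Steps 2 and 3: fire the rules and defuzzify by a weighted average."""
    x = max(-3.0, min(3.0, daily_return))  # saturate inputs outside the covered range
    memberships = fuzzify(x)
    weight_sum = sum(memberships.values())
    fired = sum(memberships[name] * RULE_OUTPUT[name] for name in RULE_OUTPUT)
    return fired / weight_sum


print(fuzzify(1.5))
print(fuzzy_adjustment(1.5))
```

The membership parameters in `RETURN_SETS` are exactly the kind of experience-based settings that the text proposes to correct from past error information, for example with an ILC-style update across repeated passes.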

14.3.3 Predictive ILC

As mentioned in the first section, financial models can be divided into time-dependent models and time-independent models. Time series analysis is an important branch of time-dependent modeling. It investigates the pattern in past data and predicts future data in the sequence. This process, called prediction or estimation, is the main instrument for analyzing a numerical sequence. In cybernetics, the estimation problem has been investigated in depth, and various algorithms have been proposed to solve it. The most commonly used methods are the ARMA model and the Kalman filter. Besides these two famous models, other predictive algorithms have been proposed. Predictive ILC is one of them.

The ARMA model integrates the autoregressive (AR) method and the moving average (MA) method. The AR part analyzes the relationship between current state information and past state information in the series. Then future state information is

given by this relation. The MA part analyzes the relation between past white noise signals and the current white noise signal; the white noise signal at the next time instant is then obtained in a similar way. White noise may seem redundant in prediction. In fact, white noise in a time sequence represents new information or shocks that emerge at different time instants, or the result of collective random choices. This white noise cannot be ignored and affects the future tendency of the time sequence.

Time-dependent dynamics are represented in discrete form. Although time is continuous, it is sampled into a discrete sequence, and the discrete form is easier to transfer or store. In the following, time series analysis is therefore represented in discrete form. Consider the discrete-time system:

$$y(k) + a_1 y(k-1) + \cdots + a_n y(k-n) = b_1 u(k-1) + \cdots + b_p u(k-p) + d_1 e(k-1) + \cdots + d_q e(k-q) \tag{14.6}$$

where $k$ is the time instant, $y(k)$ is the output at time $k$, $u(k)$ is the input at time $k$, and $e(k)$ is white noise. This equation shows the relations among input, output, and noise. In particular models, some coefficients or variables in (14.6) are zero and the corresponding part is excluded from the model. For example, the ARMA model does not contain an input signal, so all $u$ are zero. Different combinations of coefficient values result in different models. We mentioned that the ARMA model is an integration of the AR (autoregressive) model and the MA (moving average) model. If all inputs $u$ are zero and $d_2, d_3, \ldots, d_q$ are zero, so that the equation only describes a linear relation among output data with white noise as an error, the model is called an autoregressive model. If all inputs $u$ are zero and $a_1, a_2, \ldots, a_n$ are zero, so that the equation describes a linear relation between the current output $y(k)$ and past white noise $e(k-1), \ldots, e(k-q)$ with white noise as an error, the model is called a moving average model. The ARMA model is represented by the following equation:

$$y(k) = -a_1 y(k-1) - \cdots - a_n y(k-n) + d_1 e(k-1) + d_2 e(k-2) + \cdots + d_q e(k-q). \tag{14.7}$$

The ARMA model describes the linear relation among the current output, past outputs, and past white noise, with white noise as an error. It has been used to analyze the stock market. In applications, the orders $n$, $p$, and $q$ in (14.6) still need to be determined. How to determine them involves another body of theory, which is beyond the scope of this chapter. The research and textbook of Ruey S. Tsay [43] are recommended.

Another prediction algorithm is the Kalman filter, one of the most commonly used data fusion algorithms to date. The Kalman filter exploits past information and noisy measurements to provide the most probable estimate of the current unobservable state. It is the optimal estimator [44] for a linear system in the minimum mean square error sense [45].
The Kalman filter is applied widely because of its

reduced computational complexity, simple recursive form, and minimum mean square error under Gaussian uncertainty. Because the current state estimate is obtained from prediction and measurement based on past information, the Kalman filter can also be applied to predict the future state from current information. Therefore, the Kalman filter can be used to predict future values of a time series.

The ARMA model and the Kalman filter are effective tools in time series analysis, and both are applied in practical prediction. However, as mentioned in their introduction, both are designed for linear systems and linear relations within a sequence, whereas the actual system is likely to be nonlinear. Extensions of the existing algorithms have been proposed to deal with nonlinearity. Predictive ILC is a method for dealing with a nonlinear system or nonlinear model. It is worth noting that ILC itself is not designed for prediction or estimation. To deal with the estimation problem, it is necessary to combine ILC with other predictive techniques. According to [46], existing research on ILC with predictive techniques includes the T-S fuzzy model [47] and the CARMAX model [48]. By introducing these predictive algorithms, an estimate of the future output is obtained and the ILC algorithm can be adjusted in real time. The CARMAX model here is an extended form of the ARMA model in which the control input $u$ in (14.6) is not zero.

Predictive ILC could be applied to the analysis of many time-dependent financial dynamics. By introducing predictive techniques, it is plausible that ILC can serve as a time series analysis instrument in FinTech. Further research and applications in these fields are promising.
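The two linear predictors discussed in this subsection can be illustrated with a small Python sketch. It is a toy under stated assumptions: the ARMA orders are fixed to $n = q = 1$, the coefficients `a1` and `d1` are assumed known rather than estimated, the current shock $e(k)$ is added explicitly as in the standard ARMA convention, and the Kalman filter is the textbook scalar version for a random state observed in noise. None of the numerical values come from the chapter.

```python
import random


def simulate_arma11(n, a1=-0.8, d1=0.3, seed=0):
    """Generate y(k) = -a1*y(k-1) + d1*e(k-1) + e(k) with unit-variance white noise e."""
    rng = random.Random(seed)
    y_prev, e_prev, series = 0.0, 0.0, []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        y = -a1 * y_prev + d1 * e_prev + e
        series.append(y)
        y_prev, e_prev = y, e
    return series


def arma11_one_step_forecasts(series, a1=-0.8, d1=0.3):
    """Forecast y(k) from data up to k-1; past shocks are estimated by forecast errors."""
    forecasts, y_prev, e_hat_prev = [], 0.0, 0.0
    for y in series:
        y_hat = -a1 * y_prev + d1 * e_hat_prev
        forecasts.append(y_hat)
        y_prev, e_hat_prev = y, y - y_hat
    return forecasts


def scalar_kalman_filter(measurements, a=0.95, q=0.1, r=1.0):
    """Estimate x(t) for x(t+1) = a*x(t) + w(t), z(t) = x(t) + v(t), Var w = q, Var v = r."""
    x_est, p = 0.0, 1.0
    estimates = []
    for z in measurements:
        x_pred, p_pred = a * x_est, a * a * p + q  # time update (prediction)
        k_gain = p_pred / (p_pred + r)             # Kalman gain
        x_est = x_pred + k_gain * (z - x_pred)     # measurement update
        p = (1.0 - k_gain) * p_pred
        estimates.append(x_est)
    return estimates


def mean_square(values):
    return sum(v * v for v in values) / len(values)


# ARMA(1,1): compare forecast error power with the power of the series itself.
series = simulate_arma11(2000)
forecasts = arma11_one_step_forecasts(series)
arma_mse = mean_square([y - f for y, f in zip(series, forecasts)])
series_power = mean_square(series)

# Kalman filter: compare filtered error with raw measurement error.
rng = random.Random(1)
state, states, measurements = 0.0, [], []
for _ in range(2000):
    state = 0.95 * state + rng.gauss(0.0, 0.1 ** 0.5)
    states.append(state)
    measurements.append(state + rng.gauss(0.0, 1.0))
estimates = scalar_kalman_filter(measurements)
raw_mse = mean_square([z - x for z, x in zip(measurements, states)])
kalman_mse = mean_square([xh - x for xh, x in zip(estimates, states)])

print(f"ARMA forecast MSE {arma_mse:.3f} vs series power {series_power:.3f}")
print(f"Kalman MSE {kalman_mse:.3f} vs raw measurement MSE {raw_mse:.3f}")
```

Both predictors rely on the linear structure of the model, which is the limitation that motivates combining a predictor with ILC when the underlying dynamics are nonlinear.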

14.3.4 Other Application Scenarios of ILC

There are still many classical applications of ILC that have not been mentioned in the previous sections. These applications rest on certain premises that hamper their direct extension to financial models or FinTech. However, they are related to financial models to some extent. We give a brief introduction to these algorithms in the hope of stimulating further research or more insightful viewpoints.

In engineering, it is rather common for the control unit and the plant to be located in different places. This situation creates a great demand for communication networks. Introducing a network enables large-scale applications and portability. Meanwhile, a network can also bring drawbacks to the system, such as network congestion, broken links, and transmission errors [49]. The network in an ILC control system is shown in Fig. 14.5. It should be emphasized that ILC is immune to these network drawbacks to some extent. Random data dropout, signal fading, and noisy channels have little influence on the control performance as long as certain ILC conditions are satisfied, which is a reliable feature in practical applications. Related work can be found in [49]. In financial applications, networked communication structures are also widely used, and communication breakdown or incomplete transmission of information is common in the financial system. If the situation fits, ILC is an effective instrument to cope with this problem.

Fig. 14.5 Networked Structure of ILC

Another potential application is point-to-point ILC, which is an effective way to reduce the information transferred in communication. Unlike traditional ILC, which requires information along the whole trial, point-to-point ILC requires information only at certain time instants. In engineering, a classical application scenario is the 'pick and place' task, in which only the result matters. However, specific financial application scenarios of point-to-point ILC have not yet been found or proposed. More research on point-to-point ILC can be found in [51–54]. In financial systems or economic applications, this kind of control task also exists, since data at a certain time instant are often what we are interested in. Probably because of the repetitiveness requirement and other constraints, point-to-point ILC has not been used in FinTech.

Stochastic ILC is designed for systems with stochastic signals or processes, measurement noise, random asynchronous signals, etc. [55]. In contrast to deterministic systems, the analysis of stochastic systems is always formulated in terms of probability. The basic formulation of stochastic ILC is as follows:

$$x_k(t+1) = A_t x_k(t) + B_t u_k(t) + \omega(t), \quad y_k(t) = C_t x_k(t) + \upsilon(t) \tag{14.8}$$

where $\omega(t)$ and $\upsilon(t)$ are system noise and measurement noise, respectively.
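To make the role of the two noise terms in (14.8) concrete, the following Python sketch simulates repeated trials of a scalar instance of the system and applies a simple learning update. The constant coefficients `a`, `b`, `c`, the Gaussian noise with standard deviation `noise_std`, the sinusoidal reference, and the constant-gain update are assumptions made for this example; the stochastic ILC schemes surveyed in [55] are not reproduced here.

```python
import math
import random


def noisy_trial(u, rng, a=0.3, b=1.0, c=1.0, noise_std=0.05):
    """One iteration of (14.8) for a scalar plant with constant a, b, c.

    System noise omega(t) and measurement noise upsilon(t) are drawn as
    independent zero-mean Gaussians (an assumption of this sketch).
    """
    x = 0.0
    y = [c * x + rng.gauss(0.0, noise_std)]
    for u_t in u:
        x = a * x + b * u_t + rng.gauss(0.0, noise_std)  # system noise
        y.append(c * x + rng.gauss(0.0, noise_std))      # measurement noise
    return y


def stochastic_ilc(y_d, iterations=40, gain=0.7, seed=0):
    """Apply u_{k+1}(t) = u_k(t) + gain * e_k(t+1) to noisy trials."""
    rng = random.Random(seed)
    n = len(y_d) - 1
    u = [0.0] * n
    mean_abs_errors = []
    for _ in range(iterations):
        y = noisy_trial(u, rng)
        e = [yd - yk for yd, yk in zip(y_d, y)]
        mean_abs_errors.append(sum(abs(v) for v in e[1:]) / n)
        u = [u[t] + gain * e[t + 1] for t in range(n)]
    return mean_abs_errors


N = 20
reference = [math.sin(2 * math.pi * t / N) for t in range(N + 1)]
errors = stochastic_ilc(reference)
print(f"mean |e| in trial 1:  {errors[0]:.3f}")
print(f"mean |e| in trial 40: {errors[-1]:.3f}")
```

In this sketch the measured error drops quickly and then fluctuates around a level set by the noise instead of vanishing, which is why results for stochastic ILC are stated in probabilistic terms.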

14.4 Example: Application of MAS Leader-Following Problem

14.4.1 Backgrounds

In practical human decision-making, individual behaviors are influenced by the current state and by information received from the external world. As a result, an individual's decision-making process and behavior can, to some extent, be described by the dynamics of a large-scale multi-agent system. Leaders in the MAS are the key opinion leaders (KOLs) or influential people, while the followers are the remaining people. The tricky part is that the number of agents can be tremendous and the decision-making process probably contains randomness. These two difficulties can be handled with digital computers and probabilistic techniques. For simplicity of discussion, we introduce the analogous MAS in the simplest case, namely a small-scale system with deterministic dynamics.

MAS has profound applications in finance. Research on the MAS tracking problem starts from static leader(s), a special case of the dynamics. Here, we would like to clarify that the setting remains applicable even if the leader(s) are static. Take the stock market as an example. A suitable case is one in which the influential investors, i.e., the leaders in the system, are cornerstone investors and the followers are private investors. Cornerstone investors are those who commit to investing in an IPO at the outset. A cornerstone investor needs to be confirmed before the IPO is issued, and its shares are locked up for a certain period, such as three to six months, after the IPO is issued. This institution was designed in Hong Kong to provide a model for the market. Cornerstone investors can be large institutional investors such as sovereign wealth funds or private equity companies. Their investments are characteristically long-term, low-frequency, and large-scale. In comparison, private investors trade more frequently and take the leaders as a reference.

Cornerstone investors and private investors thus resemble the leader-follower model in MAS. In the following, we attempt to construct this structure and show how MAS works and how ILC is used. In the simplest situation, a cornerstone investor is seen as a stationary leader in the MAS and private investors as followers. If the model is applied to a more common situation, the secondary market, the problem becomes a leader-following or containment control problem with moving leaders.

14.4.2 An Application of MAS Leader-Following Problem

The simplest case of the leader-following problem is following a stationary leader in one dimension. Our discussion of the financial application of ILC also starts from there. With one investment cycle of a cornerstone investor viewed as an iteration, the cornerstone investor can be viewed as a stationary leader. Private investors adjusting their portfolios according to the leader every day or every week can be viewed as

the following behavior of the followers. Our target is to design a control law that guarantees the followers converge to the desired trajectory. The state-space equation is used to formulate the model as follows:

$$x_k(t+1) = x_k(t) + B u_k(t), \quad y_k(t) = C x_k(t) \tag{14.9}$$

where $x$ is the state of the portfolio. It is assumed that $x$ is reset to 0 at the start of each iteration. The output $y$ also describes the state of the portfolio and is linearly related to $x$; in this case $y$ equals $x$ because $y$ is one-dimensional, i.e., $C$ equals 1. Equation (14.3) gives a general form of the control law but not a specific formula. In this problem, the control law is given in P-type form as follows:

$$u_{k+1}(t) = u_k(t) + K e_k(t+1)$$

where $K$ is the learning gain. The control law adjusts $u$ to guarantee that the state of the followers converges to the desired state of the leader. Such a problem is solved and an algorithm is given in [56]. In the end, all followers reach the same state as the leader.
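The model (14.9) and the P-type law can be tried out on a toy leader-following network. The Python sketch below is not the algorithm of [56]. It assumes a hypothetical chain of three followers in which only the first follower observes the stationary leader and every other follower observes the follower before it, so each follower learns from a locally available error. The values `leader_state = 1.0`, `b = 1.0`, `gain = 0.5`, and a horizon of three time steps are likewise chosen purely for illustration.

```python
def simulate_follower(u, b=1.0, x0=0.0):
    """One iteration of (14.9) with C = 1: x(t+1) = x(t) + b*u(t), y(t) = x(t)."""
    x = x0  # the portfolio state is reset at the start of every iteration
    y = [x]
    for u_t in u:
        x = x + b * u_t
        y.append(x)
    return y


def leader_following_ilc(leader_state=1.0, n_followers=3, horizon=3,
                         gain=0.5, iterations=100):
    """P-type ILC on a chain: follower 0 sees the leader, follower i sees follower i-1."""
    inputs = [[0.0] * horizon for _ in range(n_followers)]
    worst_gaps = []
    outputs = []
    for _ in range(iterations):
        outputs = [simulate_follower(u) for u in inputs]
        # Largest distance of any follower from the leader (monitoring only).
        worst_gaps.append(max(abs(leader_state - y[t])
                              for y in outputs for t in range(1, horizon + 1)))
        updated = []
        for i, u in enumerate(inputs):
            observed = [leader_state] * (horizon + 1) if i == 0 else outputs[i - 1]
            local_error = [observed[t] - outputs[i][t] for t in range(horizon + 1)]
            # u_{k+1}(t) = u_k(t) + K * e_k(t+1), using the locally observed error.
            updated.append([u[t] + gain * local_error[t + 1] for t in range(horizon)])
        inputs = updated
    return outputs, worst_gaps


outputs, worst_gaps = leader_following_ilc()
print(f"worst gap in iteration 1:   {worst_gaps[0]:.3e}")
print(f"worst gap in iteration 100: {worst_gaps[-1]:.3e}")
```

In this toy setting the gap can grow for a few iterations before it decays, because followers further down the chain first imitate neighbors that have not yet converged. After enough iterations all three followers hold the leader's state over the whole horizon.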

14.4.3 Extensions

In the previous section, we discussed the simplest case. However, the real structure is more complicated. To begin with, there is usually more than one leader in the real market and the leaders' states differ. This condition gives rise to another problem called containment control. In a containment control task, the followers are required to converge to the convex hull spanned by the leaders, meaning that the dynamics of the followers end up surrounded by the leaders in a certain sense. In addition, the state may be a vector. For example, it is common for a leader to carry information about more than one stock. After all, diversification is an important part of investment. The scenario then becomes a containment problem in which each agent's state is a vector with more than one dimension. Containment control tasks and multi-dimensional leaders require the state equations to be formulated in matrix form, and the method of analysis differs from that of the simplest case. Besides, ILC requires the initial state to be the same in each iteration. However, this condition may not be satisfied in some applications. This issue is addressed in [42].

Remark 14.6 Another way to approach the problem is to view the trajectory of the leaders as a point-to-point trial. By considering the states of the leader at a sequence of specific time instants and the followers as a portfolio to be adjusted by a robo-advisor algorithm, we can choose the iteration number arbitrarily.

Remark 14.7 In stock markets, leaders may change their behavior frequently instead of holding the same shares. In this case, the iteration length could be rather short. As long as the iteration length does not change along the iteration axis, it will not affect the convergence result.

Remark 14.8 If the leader's state is multi-dimensional, the trajectory has the same dimension. In this case, the trajectory could lie in a high-dimensional space instead of a plane, i.e., a two-dimensional coordinate system.

Remark 14.9 Because the analysis needs to be carried out within the same framework, the input and output usually need to be normalized or mapped to the same domain. For example, a follower may use a thousand as a unit while a leader may use a million. After normalization, all investors correspond to agents in the MAS.
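The containment extension and the normalization of Remark 14.9 can be combined in one small Python sketch. It is a simplified illustration under several assumptions: two stationary leaders holding two stocks each, a single follower that observes both leaders directly, a hypothetical `attention` weight per leader, holdings normalized to portfolio weights, and the integrator dynamics of (14.9) with $B = C = 1$ applied to every stock separately. The containment algorithms cited in the chapter are not reproduced.

```python
def normalize(holdings):
    """Map raw holdings (in any unit) to portfolio weights that sum to one."""
    total = sum(holdings)
    return [h / total for h in holdings]


def containment_ilc(leaders, attention, horizon=3, gain=0.5, iterations=100):
    """Follower state after learning, with x(t+1) = x(t) + u(t) in every dimension.

    leaders:   list of leader state vectors (stationary during an iteration)
    attention: nonnegative weights saying how strongly the follower heeds each leader
    """
    dim = len(leaders[0])
    total_attention = sum(attention)
    u = [[0.0] * dim for _ in range(horizon)]
    trajectory = []
    for _ in range(iterations):
        x = [0.0] * dim  # identical initial state in every iteration
        trajectory = [x]
        for t in range(horizon):
            x = [x[d] + u[t][d] for d in range(dim)]
            trajectory.append(x)
        for t in range(horizon):
            for d in range(dim):
                # Local error: attention-weighted average gap to the leaders.
                gap = sum(w * (leader[d] - trajectory[t + 1][d])
                          for w, leader in zip(attention, leaders)) / total_attention
                u[t][d] += gain * gap
    return trajectory[-1]


# The two leaders report holdings in different units; normalization removes the unit.
leaders = [normalize([6_000_000, 4_000_000]),  # reported in currency units
           normalize([200, 800])]              # reported in thousands
final_state = containment_ilc(leaders, attention=[1.0, 1.0])
print(final_state)
```

With equal attention the follower settles at the midpoint of the two normalized leader portfolios, which lies inside their convex hull. Unequal attention weights would move it along the segment between the leaders.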

14.5 Conclusions

In this chapter, we introduced the fundamental principles of ILC and discussed several potential applications of ILC in financial models and FinTech. When combined with other techniques, ILC is capable of treating various problems in financial applications while taking advantage of both techniques. With proper application scenarios, ILC could be used to cope with problems that are difficult for traditional control methods, such as nonlinearity. An example is given to show the potential application of ILC with MAS. Research in this field is still limited. As more research and analysis are published, ILC is likely to find a place in financial modeling.

References

1. Uchiyama, M.: Formation of high-speed motion pattern of a mechanical arm by trial. Trans. Soc. Instrum. Control Eng. 14(6), 706–712 (1978)
2. Arimoto, S., Kawamura, S., Miyazaki, F.: Bettering operation of robots by learning. J. Robot. Syst. 1(2), 123–140 (1984)
3. Ahn, H.S., Chen, Y., Moore, K.L.: Iterative learning control: brief survey and categorization. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 37(6), 1099–1121 (2007)
4. Liu, Y., Jia, Y.: An iterative learning approach to formation control of multi-agent systems. Syst. Control Lett. 61(1), 148–154 (2012)
5. Ahn, H.S., Chen, Y.: Iterative learning control for multi-agent formation. In: 2009 ICCAS-SICE, pp. 3111–3116 (2009)
6. Shen, D.: Iterative learning control with incomplete information: a survey. IEEE/CAA J. Automatica Sinica 5(5), 885–901 (2018)
7. Meng, D., Jia, Y., Du, J., Yu, F.: Tracking control over a finite interval for multi-agent systems with a time-varying reference trajectory. Syst. Control Lett. 61(7), 807–818 (2012)
8. Xiao, T.F., Li, X.D., Ho, J.K.: An adaptive discrete-time ILC strategy using fuzzy systems for iteration-varying reference trajectory tracking. Int. J. Control Autom. Syst. 13(1), 222–230 (2015)

9. Chien, C.J.: A combined adaptive law for fuzzy iterative learning control of nonlinear systems with varying control tasks. IEEE Trans. Fuzzy Syst. 16(1), 40–51 (2008)
10. Li, J., Li, J.: Adaptive fuzzy iterative learning control with initial-state learning for coordination control of leader-following multi-agent systems. Fuzzy Sets Syst. 248, 122–137 (2014)
11. Liu, X., Kong, X.: Nonlinear fuzzy model predictive iterative learning control for drum-type boiler turbine system. J. Process Control 23(8), 1023–1040 (2013)
12. Amann, N., Owens, D.H., Rogers, E.: Predictive optimal iterative learning control. Int. J. Control 69(2), 203–226 (1998)
13. Chen, C., Xiong, Z., Zhong, Y.: Design and analysis of integrated predictive iterative learning control for batch process based on two-dimensional system theory. Chin. J. Chem. Eng. 22(7), 762–768 (2014)
14. Tayebi, A.: Adaptive iterative learning control for robot manipulators. Automatica 40(7), 1195–1203 (2004)
15. Norrlof, M.: An adaptive iterative learning control algorithm with experiments on an industrial robot. IEEE Trans. Robot. Autom. 18(2), 245–251 (2002)
16. Xu, J.X., Chen, Y., Lee, T.H., Yamamoto, S.: Terminal iterative learning control with an application to RTPCVD thickness control. Automatica 35(9), 1535–1542 (1999)
17. Freeman, C.T., Tan, Y.: Iterative learning control with mixed constraints for point-to-point tracking. IEEE Trans. Control Syst. Technol. 21(3), 604–616 (2012)
18. Xu, J.X., Qu, Z.: Robust iterative learning control for a class of nonlinear systems. Automatica 34(8), 983–988 (1998)
19. Wiener, N.: Cybernetics or Control and Communication in the Animal and the Machine. Technology Press (1948)
20. Tustin, A.: The Mechanism of Economic Systems. Harvard University Press, Cambridge (1953)
21. Phillips, A.W.: The stabilization policy in a closed economy. Econ. J. 64(6), 290–323 (1954)
22. Cochrane, J.L., Graham, J.A.: Cybernetics and macroeconomics. Econ. Inq. 14(2), 241–250 (1976)
23. Georgescu-Roegen, N.: The Entropy Law and the Economic Process. Harvard University Press, Cambridge, MA (1971)
24. Pindyck, R.: An application of the linear quadratic tracking problem to economic stabilization policy. IEEE Trans. Autom. Control 17(3), 287–300 (1972)
25. Pindyck, R.S.: Optimal Planning for Economic Stabilization (1973)
26. Pindyck, R.S., Roberts, S.M.: Studies of economic problems: optimal policies for monetary control. Ann. Econ. Soc. Meas. 3(1), 207–237 (1974)
27. Taylor, J.B.: A Criterion for Multiperiod Controls in Economic Models with Unknown Parameters. Department of Economics, Columbia University (1973)
28. Rausser, G.C., Freebairn, J.W.: Approximate adaptive control situations to US beef trade policy. Ann. Econ. Soc. Meas. 3(1), 177–204 (1974)
29. Kim, H.K., Goreux, L.M., Kendrick, D.A.: Feedback stochastic decision rule for commodity stabilization: an application of control theory to the world cocoa markets. In: Proceedings of the 1972 IEEE Conference on Decision and Control and 11th Symposium on Adaptive Processes, pp. 690–695 (1972)
30. Barrett, J.F., Coales, J.F., Ledwich, M.A., Naughton, J.J., Young, P.C.: Macroeconomic modelling: a critical appraisal. In: Proceedings of the IFAC/IFORS Conference on Dynamic Modelling and Control of National Economies (1973)
31. Ireland, N.J., Jones, H.G.: Optimality in advertising: a control theory approach. In: Proceedings of the IFORS/IFAC International Conference, Coventry, England (1973)
32. Szeto, M.W.: Estimation of the volatility of securities in the stock market by Kalman filtering techniques. In: Joint Automatic Control Conference (1973)
33. Sarris, A.H.: A Bayesian approach to estimation of time-varying regression coefficients. Ann. Econ. Soc. Meas. 2(4), 501–523 (1973)
34. Sandblom, C.L.: On Control Theory and Economic Stabilization. Lund University, Sweden (1970)

35. Athans, M., Kendrick, D.: Control theory and economics: a survey, forecast, and speculations. IEEE Trans. Autom. Control 19(5), 518–524 (1974)
36. Hu, W., Liu, L., Feng, G.: Consensus of linear multi-agent systems by distributed event-triggered strategy. IEEE Trans. Cybern. 46(1), 148–157 (2015)
37. Cheng, L., Hou, Z.G., Tan, M., Lin, Y., Zhang, W.: Neural-network-based adaptive leader-following control for multiagent systems with uncertainties. IEEE Trans. Neural Netw. 21(8), 1351–1358 (2010)
38. Galbusera, L., Ferrari-Trecate, G., Scattolini, R.: A hybrid model predictive control scheme for containment and distributed sensing in multi-agent systems. Syst. Control Lett. 62(5), 413–419 (2013)
39. Dimarogonas, D.V., Egerstedt, M., Kyriakopoulos, K.J.: A leader-based containment control strategy for multiple unicycles. In: Proceedings of the 45th IEEE Conference on Decision and Control, pp. 5968–5973 (2006)
40. Liu, H., Xie, G., Wang, L.: Necessary and sufficient conditions for containment control of networked multi-agent systems. Automatica 48(7), 1415–1422 (2012)
41. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965)
42. Chien, C.J., Hsu, C.T., Yao, C.Y.: Fuzzy system-based adaptive iterative learning control for nonlinear plants with initial state errors. IEEE Trans. Fuzzy Syst. 12(5), 724–732 (2004)
43. Tsay, R.S.: Financial Time Series. Wiley StatsRef: Statistics Reference Online 1–23 (2014)
44. Anderson, B.D., Moore, J.B.: Optimal Filtering. Courier Corporation (2012)
45. Bibby, J., Toutenburg, H.: Prediction and Improved Estimation in Linear Models. Wiley (1977)
46. Anwaar, H., Xin, Y.Y., Ijaz, S.: A comprehensive survey on recent developments in iterative learning control algorithms and applications. In: 2017 29th Chinese Control and Decision Conference, pp. 3282–3289 (2017)
47. Xi, K., Liu, X.: A modified design framework for nonlinear model predictive iterative learning control. In: Proceedings of the 33rd Chinese Control Conference, pp. 7752–7757 (2014)
48. Shi, J., Jiang, Q., Cao, Z., Zhou, H., Yang, Y.: Design method of PID-type model predictive iterative learning control based on the two-dimensional generalized predictive control scheme. In: 2012 12th International Conference on Control Automation Robotics & Vision (ICARCV), pp. 452–457 (2012)
49. Shen, D.: Iterative Learning Control with Passive Incomplete Information. Springer (2018)
50. Meng, D., Jia, Y.: Iterative learning approaches to design finite-time consensus protocols for multi-agent systems. Syst. Control Lett. 61(1), 187–194 (2012)
51. Chu, B., Freeman, C.T., Owens, D.H.: A novel design framework for point-to-point ILC using successive projection. IEEE Trans. Control Syst. Technol. 23(3), 1156–1163 (2014)
52. Chi, R., Hou, Z., Jin, S., Huang, B.: An improved data-driven point-to-point ILC using additional on-line control inputs with experimental verification. IEEE Trans. Syst. Man Cybern. Syst. 49(4), 687–696 (2017)
53. Zhou, Y., Yin, Y., Zhang, Q., Gan, W.: Model-free iterative learning control for repetitive impulsive noise using FFT. In: International Symposium on Neural Networks, pp. 461–467. Springer, Berlin, Heidelberg (2012)
54. Wei, Q., Liu, D., Shi, G.: A novel dual iterative Q-learning method for optimal battery management in smart residential environments. IEEE Trans. Ind. Electron. 62(4), 2509–2518 (2014)
55. Shen, D., Xiong, G.: Discrete-time stochastic iterative learning control: a brief survey. In: Proceedings of the 10th World Congress on Intelligent Control and Automation, pp. 2624–2629. IEEE (2012)
56. Cao, Y., Ren, W., Egerstedt, M.: Distributed containment control with multiple stationary or dynamic leaders in fixed and switching directed networks. Automatica 48(8), 1586–1597 (2012)

Index

A Advanced learning, 116 Artificial intelligence, 18

B Bakry-Emery curvature, 207 Big data, 18 Bitcoin, 119 Blockchain, 18, 120

C Causality analysis, 109 Cloud computing, 18 Complexity analysis, 150 Complexity of financial systems, 108 Composite multiscale entropy, 140 Conditional block entropy, 62 Convolutional neural network, 37 Credit scoring, 50

D Deep belief network, 42 Deep learning, 35 Diffusion-driven instability, 127 Diffusion process, 110 Digital economy, 16 Digital inclusive finance, 1, 14 Digital industry of fintech, 9 Digital transformation, 10 Digitization, 10 Dirichlet form, 157, 159

E Efficient market hypothesis, 60 Exchange rate, 28

F Financial system, 107, 108, 110 Fintech, 1 Fintech regulation, 2 Fractal structure, 111 Fuzzy control, 224

G Gaussian heat kernel, 211 Global well-posedness, 181 Group signature, 79

H Heat equation, 207 Hilbert W*-bimodules, 157

I Information entropy, 62 Insurance industry, 11 Intelligent finance, 15 Intelligent optimization, 108 Iterative learning control, 217

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 Z. Zheng (ed.), Proceedings of the First International Forum on Financial Mathematics and Financial Technology, Financial Mathematics and Fintech, https://doi.org/10.1007/978-981-15-8373-5


L Lattice-based group signature, 82 Leader-following problem, 229 Li-Yau inequality, 207 Long short-term memory, 40

M Machine learning, 35 Markov semigroups, 158 Multi-agent system, 223 Multiscale entropy, 140 Multivariate multiscale entropy, 140

N Nonlinear tracking-differentiator, 113 Numeraire-free market, 21

O Open banking, 10 Operator-valued Dirichlet form, 158 Order Hilbert space, 158

P Parameter estimation, 122 Population model, 121 Pricing methodology, 22

Q Quasi-linear parabolic equation, 177

R Random oracle model, 81 Recurrent neural network, 38 Regulatory sandbox, 2 Reinforcement learning, 49 Ricci curvature, 207

S Self-causality, 66 Self-interactive process, 111 Shannon entropy, 62 Social enterprise, 98 Stock prediction, 50 Supply chain, 19

T Transfer entropy, 62

V Von Neumann algebra, 158

Y Young’s inequality, 191