

MODELING IN MATHEMATICS

MODELING IN MATHEMATICS

Edited by: Olga Moreira

Arcler Press

www.arclerpress.com

Modeling in Mathematics
Olga Moreira

Arcler Press
2010 Winston Park Drive, 2nd Floor
Oakville, ON L6H 5R7 Canada
www.arclerpress.com
Tel: 001-289-291-7705, 001-905-616-2116
Fax: 001-289-291-7601
Email: [email protected]

e-book Edition 2020
ISBN: 978-1-77407-376-6 (e-book)

This book contains information obtained from highly regarded resources. Reprinted material sources are indicated. Copyright for individual articles remains with the authors as indicated and published under a Creative Commons License. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data; the views articulated in the chapters are those of the individual contributors and not necessarily those of the editors or publishers. The editors and publishers are not responsible for the accuracy of the information in the published chapters or for the consequences of their use. The publisher assumes no responsibility for any damage or grievance to persons or property arising out of the use of any materials, instructions, methods, or thoughts in the book. The editors and the publisher have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission has not been obtained. If any copyright holder has not been acknowledged, please write to us so we may rectify it.

Notice: Registered trademarks of products or corporate names are used only for explanation and identification, without intent of infringement.

© 2020 Arcler Press
ISBN: 978-1-77407-143-4 (Hardcover)

Arcler Press publishes a wide variety of books and eBooks. For more information about Arcler Press and its products, visit our website at www.arclerpress.com

DECLARATION

Some content or chapters in this book are open-access, copyright-free published research works, published under a Creative Commons License and indicated with their citations. We are thankful to the publishers and authors of this content and these chapters, as without them this book would not have been possible.

ABOUT THE EDITOR

Olga Moreira obtained her Ph.D. in Astrophysics from the University of Liege (Belgium) in 2010 and her BSc. in Physics and Applied Mathematics from the University of Porto (Portugal). Her post-graduate travels and international collaborations with the European Space Agency (ESA) and the European Southern Observatory (ESO) led to great personal and professional growth as a scientist. Currently, she is working as an independent researcher, technical writer, and editor in the fields of Mathematics, Physics, Astronomy, and Astrophysics.

TABLE OF CONTENTS



List of Contributors
List of Abbreviations
Preface

Chapter 1  A Statistical and Spectral Model for Representing Noisy Sounds with Short-Time Sinusoids
    Abstract
    Introduction
    Background
    Justification of a Spectral And Statistical Model For Noises
    The CNSS Model
    Analysis
    Synthesis
    Application and Conclusion
    Conclusion
    Acknowledgments
    References

Chapter 2  Robust Bayesian Regularized Estimation Based on Regression Model
    Abstract
    Introduction
    T Regression Model and L1 Norm Regularization
    Bayesian Formulation of Robust T Adaptive Lasso
    Simulation Studies
    Land Rent Data
    Conclusion Remarks
    Appendices
    Acknowledgment
    References

Chapter 3  Robust Quadratic Regression and Its Application to Energy-Growth Consumption Problem
    Abstract
    Introduction
    Robust Quadratic Regression Models
    Robust Energy-Growth Regression Models
    Numerical Experiments
    Conclusions and Future Works
    Acknowledgment
    References

Chapter 4  Intuitionistic Fuzzy Weighted Linear Regression Model with Fuzzy Entropy under Linear Restrictions
    Abstract
    Introduction
    Restricted IFWLR Model With Fuzzy Entropy
    Estimation of Regression Coefficients
    Numerical Examples
    Conclusions
    References

Chapter 5  A New Method of Hypothesis Test for Truncated Spline Nonparametric Regression Influenced by Spatial Heterogeneity and Application
    Abstract
    Introduction
    Truncated Spline Nonparametric Regression Influenced by Heterogeneity Spatial
    Method
    Parameter Estimation Under Space H0 and Space Population in The Model
    Statistics Test For Truncated Spline Nonparametric Regression With Spatial Heterogeneity
    Distribution of Test Statistic and Critical Area of Hypothesis
    Empirical Study on Unemployment Rate in Java Indonesia
    Conclusion
    Acknowledgments
    References

Chapter 6  A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements
    Abstract
    Introduction
    Literature Review
    Empirical Analysis
    Conclusion and Suggestion
    References

Chapter 7  Dynamical Analysis in Explicit Continuous Iteration Algorithm and its Applications
    Abstract
    Introduction
    Construction of Explicit Continuous Iterative Algorithm
    Error Analysis
    Numerical Experiments
    Conclusion
    Acknowledgements
    References

Chapter 8  A New Stability Analysis of Uncertain Delay Differential Equations
    Abstract
    Introduction
    Uncertain Delay Differential Equation
    Almost Sure Stability
    Stability Theorem
    Comparison
    Conclusion
    Appendix
    Acknowledgments
    References

Chapter 9  Dynamical Analysis and Chaos Control of a Discrete SIS Epidemic Model
    Abstract
    Introduction
    Analysis of Equilibria
    Analysis of Bifurcation
    Numerical Simulation
    Chaos Control
    Conclusion
    Acknowledgements
    Authors’ Contributions
    References

Chapter 10  Applications of Parameterized Nonlinear Ordinary Differential Equations and Dynamic Systems: An Example of the Taiwan Stock Index
    Abstract
    Introduction
    Literature Review
    Methodology
    Empirical Study
    Conclusions
    References

Chapter 11  Electricity Market Stochastic Dynamic Model and Its Mean Stability Analysis
    Abstract
    Introduction
    The Stochastic Dynamic Modeling of Electricity Market
    Stochastic Differential Equation Theories
    Mean Stability of Electricity Market Stochastic Model
    The Numerical Examples
    Conclusions
    Acknowledgment
    References

Chapter 12  Using Artificial Neural Networks to Predict Direct Solar Irradiation
    Abstract
    Introduction
    Literature Review of Estimation of Direct Solar Radiation
    Test Area and Data
    Feedforward Neural Network
    Experimental Procedure
    Results And Discussions
    Conclusions
    References

Chapter 13  Modeling of Relative Intensity Noise And Terminal Electrical Noise of Semiconductor Lasers Using Artificial Neural Network
    Abstract
    Introduction
    Artificial Neural Network Modeling
    Results And Discussions
    Conclusion
    References

Chapter 14  Quantum-Like Bayesian Networks for Modeling Decision Making
    Introduction
    Violations of The Sure Thing Principle
    Violation of The Sure Thing Principle: Classical Approaches
    Violation of The Sure Thing Principle: Quantum-Like Approaches
    Problems With Current Classical and Quantum-Like Approaches
    A Quantum-Like Bayesian Network For Decision And Cognition
    Experimental Results
    Discussion and Conclusion
    Acknowledgments
    References

Index

LIST OF CONTRIBUTORS Pierre Hanna SCRIME, Laboratoire Bordelais de Recherche en Informatique (LaBRI), Universite Bordeaux 1, 33405 Talence Cedex, France Myriam Desainte-Catherine SCRIME, Laboratoire Bordelais de Recherche en Informatique (LaBRI), Universite Bordeaux 1, 33405 Talence Cedex, France Zean Li School of Computer Science and Technology, Nantong University, Nantong 226019, China Weihua Zhao School of Science, Nantong University, Nantong 226019, China Yongzhi Wang College of Instrumentation & Electrical Engineering, Jilin University, Changchun 130061, China Yuli Zhang Department of Automation, TNList, Tsinghua University, Beijing 100084, China Fuliang Zhang Development and Research Center of China Geological Survey, Beijing 100037, China Jining Yi Development and Research Center of China Geological Survey, Beijing 100037, China School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, China

Gaurav Kumar Singhania University, Pacheri Bari, Jhunjhunu, Rajasthan 333515, India  Rakesh Kumar Bajaj Jaypee University of Information Technology, Waknaghat 173234, India Sifriyani Department of Mathematics, Faculty of Mathematics and Natural Sciences, Mulawarman University, Samarinda, Indonesia I. N. Budiantara Department of Statistics, Faculty of Mathematics, Computing and Data Sciences, Sepuluh Nopember Institute of Technology, Surabaya, Indonesia S. H. Kartiko Department of Mathematics, Faculty of Mathematics and Natural Sciences, Gadjah Mada University, Yogyakarta, Indonesia Gunardi Department of Mathematics, Faculty of Mathematics and Natural Sciences, Gadjah Mada University, Yogyakarta, Indonesia Suduan Chen Department of Accounting Information, National Taipei University of Business, 321 Jinan Road, Section 1, Taipei 10051, Taiwan Yeong-Jia James Goo Department of Business Administration, National Taipei University, No. 67, Section 3, Ming-shen East Road, Taipei 10478, Taiwan Zone-De Shen Department of Business Administration, National Taipei University, No. 67, Section 3, Ming-shen East Road, Taipei 10478, Taiwan Qingyi Zhan College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, P.R. China

Zhifang Zhang Department of Sciences and Education, Fujian Center for Disease Control and Prevention, Fuzhou, P.R. China Xiangdong Xie Ningde Normal University, Ningde, P.R. China Xiao Wang School of Economics and Management, Beijing Institute of Petrochemical Technology, Beijing 102617, China Beijing Academy of Safety Engineering and Technology, Beijing 102617, China Yufu Ning School of Information Engineering, Shandong Youth University of Political Science, Jinan 250103, China Zengyun Hu State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Beijing Road, Urumqi, 830011, China College of Mathematics and System Sciences, Xinjiang University, Shengling Road, Urumqi, 830046, China Zhidong Teng College of Mathematics and System Sciences, Xinjiang University, Shengling Road, Urumqi, 830046, China Chaojun Jia State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Beijing Road, Urumqi, 830011, China Chi Zhang State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Beijing Road, Urumqi, 830011, China Long Zhang College of Mathematics and System Sciences, Xinjiang University, Shengling Road, Urumqi, 830046, China xvii

Meng-Rong Li Department of Mathematical Sciences, National Chengchi University, Taipei 116, Taiwan Tsung-Jui Chiang-Lin Graduate Institute of Finance, National Taiwan University of Science and Technology, Taipei 106, Taiwan Yong-Shiuan Lee Department of Statistics, National Chengchi University, Taipei 116, Taiwan Zhanhui Lu School of Mathematics and Physical Science, North China Electric Power University, Beijing 102206, China Weijuan Wang School of Mathematics and Physical Science, North China Electric Power University, Beijing 102206, China Gengyin Li School of Electrical and Electronic Engineering, North China Electric Power University, Beijing 102206, China  Di Xie School of Mathematics and Physical Science, North China Electric Power University, Beijing 102206, China James Mubiru Department of Physics, Makerere University, Kampala, Uganda A. Rezaei Electrical Engineering Department, Kermanshah University of Technology, Kermanshah, Iran L. Noor Electrical Engineering Department, Kermanshah University of Technology, Kermanshah, Iran

Catarina Moreira  Department of Computer Science, Instituto Superior Técnico, University of Lisbon, INESC-ID, Lisbon, Portugal Andreas Wichert Department of Computer Science, Instituto Superior Técnico, University of Lisbon, INESC-ID, Lisbon, Portugal

LIST OF ABBREVIATIONS

ACF - Autocorrelation function
ANN - Artificial neural network
BALS.L - Bayesian adaptive Lasso
CPAs - Certified public accountants
LS-CQR - Classical least square quadratic regression
DT - Decision Tree
DAE - Differential/algebraic equations
EC - Energy consumption
ERB - Equivalent rectangular band
EBP - Error based pruning
EM - Euler-Maruyama
ECI - Explicit continuous iterative
FFS - Fraudulent financial statement
FLR - Fuzzy linear regression
GWR - Geographically Weighted Regression
IFS - Intuitionistic fuzzy set
IFWLR - Intuitionistic fuzzy weighted linear regression
LARS - Least angle regression
LAD - Least absolute deviance
LS-RQR - Least square quadratic regression
LPC - Linear predictive coding
MAPE - Mean Absolute Percentage Error
MLE - Maximum Likelihood Estimator
MLRT - Maximum likelihood ratio test
MBE - Mean bias error
MMAD - Median of mean absolute deviations
MLP - Multi-layer perceptron
NN - Neural network
OUR - Open Unemployment Rate
ODEs - Ordinary differential equations
PACF - Partial autocorrelation function
PEP - Pessimistic error pruning
RIN - Relative intensity noise
RMSE - Root mean square error
SCAD - Smoothly clipped absolute deviation
SOCP - Second-order cone programming
STFT - Short-time Fourier transform
STAR - Smooth transition autoregressive
SMS - Spectral representations of sounds
SCM - Subcarrier multiplexed
SVM - Support vector machine
TAIEX - Taiwan Stock Exchange Capitalization Weighted Stock Index
TEN - Terminal electrical noise

PREFACE

The development of mathematical models consists of describing natural phenomena or observed patterns as a system defined by a set of governing equations. “Modeling in Mathematics” is a selection of contemporaneous open-access papers featuring three classes of mathematical models: statistical, dynamical, and machine learning models.

Statistical models are used for testing hypotheses by comparing theoretical mathematical models with experimental measurements, establishing a relationship between random variables and physical (non-random) variables. Regression analysis is often used in statistical modeling for estimating relationships among variables, focusing on the relationship between dependent variables (output variables representing the outcome of an experiment) and independent variables (input variables representing the cause or potential reason for an observed pattern). Chapters 1 to 6 include several open-access papers about statistical modeling, focusing mainly on regression and Bayesian estimation methods.

Dynamical models are developed for predicting the stability of a system over time and are usually described as a set of differential equations. The stability of a dynamical system usually entails the analysis of solutions of differential equations under small perturbations of initial conditions. It involves the study of equilibrium points, bifurcations, and chaotic mathematical models. Chapters 7 to 11 include several open-access papers on dynamical stability modeling and its applications.

With the increase in computing power, artificial neural networks are becoming a powerful tool for solving a variety of complex problems. Artificial neural networks have been successfully used for exploring, classifying, and identifying patterns in data. The last group of selected open-access papers (Chapters 12 to 14) is focused on machine learning methods, particularly the use of artificial neural networks in the mathematical modeling of physical and biological systems as well as in human decision-making theory.

The intended audience of this book is scientists, engineers, and graduate students who are familiar with statistical inference, dynamical systems, stability theory, and neural network concepts.

CHAPTER 1

A Statistical and Spectral Model for Representing Noisy Sounds with Short-Time Sinusoids

Pierre Hanna and Myriam Desainte-Catherine

SCRIME, Laboratoire Bordelais de Recherche en Informatique (LaBRI), Universite Bordeaux 1, 33405 Talence Cedex, France

ABSTRACT

We propose an original model for noise analysis, transformation, and synthesis: the CNSS model. Noisy sounds are represented with short-time sinusoids whose frequencies and phases are random variables. This spectral and statistical model represents information about the spectral density of frequencies. This perceptually relevant property is modeled by three mathematical parameters that define the distribution of the frequencies. This model also represents the spectral envelope. The mathematical parameters are defined and the analysis algorithms to extract these parameters from sounds are introduced. Then algorithms for generating sounds from the parameters of the model are presented. Applications of this model include tools for composers, psychoacoustic experiments, and pedagogy.

Keywords and phrases: stochastic part of sounds, analysis and real-time synthesis of noisy sounds, spectral models, spectral density, musical transformations of sounds.

Citation (APA): Hanna, P., & Desainte-Catherine, M. (2005). A statistical and spectral model for representing noisy sounds with short-time sinusoids. EURASIP Journal on Advances in Signal Processing, 2005(12), 182056 (13 pages). DOI: https://doi.org/10.1155/ASP.2005.1794

Copyright: This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

Computers offer new possibilities for sound processing. Applications are numerous in the musical field. Digital sound models are developed to represent signals with mathematical parameters in order to allow composers to transform the original sound in a musical way. Noises are used more and more frequently in contemporary music, especially in electroacoustic music. A new vocabulary describing noisy sound properties has been proposed during the twentieth century [1]. We consider as noisy sounds the natural sounds such as rubbing or scratching, but also some parts of instrumental sounds like the breath of a saxophone, and speech phonemes such as consonants or whispered voices.

Existing models only represent sounds composed of low noise levels. They consider natural sounds as mixes of sinusoids (deterministic part) and noise (stochastic part). They first extract sinusoids and model the residual using a low noise model such as LPC [2] or piecewise-linear spectral envelopes [3, 4]. In such approaches, the noisy part is assumed to be the parts of the signal that cannot be represented with sinusoids whose amplitude and frequency slowly vary with time, and it is implicitly defined as whatever is left after the sinusoidal analysis/synthesis. These approximations lead to audible artifacts and explain why such models are of limited use for the analysis and the synthesis of purely noisy signals. Our research concerns improvements of the modeling of this noisy part. The goals are to extract the structure of the pseudoperiodic components and to propose a reasonable approximation of the residual relying on psychoacoustics. We focus on robust stand-alone noise modeling.

In this article, we present an original noise model to analyze, transform, and synthesize such noisy signals in real time. This spectral model represents noisy signals with short-time sinusoids whose frequency values are randomly chosen according to statistical parameters. Modifying these mathematical parameters extracted from sounds leads to original transformations which cannot be performed using the previously described models. Some of these transformations are related to the modification of the distribution of the frequency values of the sinusoids in the spectra. Psychoacoustic experiments show that this distribution is perceptually relevant and mainly depends on the number of sinusoidal components. For this reason, we focus on the spectral density and the mathematical parameters related to it.

After presenting existing models and their limitations in Section 2, we present the theory behind the representation of noise with short-time sinusoids in Section 3. The new model and its mathematical parameters are defined in Section 4. Then, in Section 5, we propose an original method to extract these parameters from analyzed sounds. In Section 6, the synthesis algorithms are detailed before presenting the limitations of this model and two applications in Section 7.

BACKGROUND

Many model types have been considered for music synthesis. In this section, we summarize previous approaches to analyzing, transforming, and synthesizing noise-like signals and indicate their limitations.

Temporal Models

The existing models for analyzing, transforming, and synthesizing noisy sounds are temporal or spectral models. Temporal models generate noises by randomly drawing samples using a standard distribution (uniform, normal, etc.). Then, they may be filtered (subtractive synthesis). The main temporal models use linear predictive coding (LPC) to color a white noise source. These approaches are common in speech research but are less closely linked to perception and are less flexible [5, 6].
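
As a concrete illustration of this temporal approach (a minimal sketch of subtractive synthesis, not code from the chapter; the filter coefficients are arbitrary placeholders):

```python
import numpy as np
from scipy.signal import lfilter

fs = 44100                                              # sample rate in Hz
white = np.random.default_rng(0).standard_normal(fs)    # one second of Gaussian white noise

# Color the white noise with a simple one-pole low-pass IIR filter (subtractive synthesis).
b, a = [0.1], [1.0, -0.9]
colored = lfilter(b, a, white)
```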

Spectral Models

We are particularly interested in spectral models because they are useful for (mostly) harmonic sounds [7]. These sinusoidal models are very accurate for sounds with low noise levels and are intuitively controlled by users [8]. Therefore, it seems interesting to adapt them to the representation of more complex sounds. Several advances have been proposed in the area of sinusoid-plus-noise models [9]. Macon has extended the ABS/OLA model [10] to enable time-scale and pitch modifications to unvoiced and noise-like signals by randomizing phases [11]. Another extension proposes to modulate the frequencies and/or amplitudes with a lowpass-filtered noise [12].

In 1989, research led to hybrid models, which decompose the original sounds into two independent parts: the sinusoidal part and the stochastic part [13]. Extensions have been proposed to consider transients separately [14, 15]. The stochastic part corresponds to the noisy part of the original sound. It is entirely defined by the time-varying spectral envelopes [6]. Other methods use piecewise-linear spectral envelopes [3], LPC [2], or DCT modeling of the spectrum [16]. Another residual model related to the properties of the auditory system is proposed in [4]. The noisy part of any sound is represented by the time-varying energy in each equivalent rectangular band (ERB). However, because of such approximations, artifacts may result if this model is applied to sounds with high-level stochastic components. Hybrid models considerably improve the quality of synthesized sounds, but it is desirable to present more parameters for the user to control musical sounds. The only musical parameters presented to the composers are amplitude (related to the volume) and spectral envelope (related to the color). We propose to develop a robust noise model that allows the largest possible number of high-fidelity transformations on the analysis data before synthesis. Experiments have demonstrated the importance of the spectral density of sinusoidal components [17, 18]. A spectral model for noisy sounds is adequate to control these parameters. Furthermore, the color of the noise, related to the spectral envelope, is intuitively represented on the frequency scale. For these reasons, the modeling of noisy sounds in the spectral domain is justified. However, the mathematical justification of the representation of any random signal by a sum of sinusoids with timeconstant amplitude, frequency, and phase, is not straightforward. Similar models have been developed with theory in physics [19].

JUSTIFICATION OF A SPECTRAL AND STATISTICAL MODEL FOR NOISES

In this section, we present the justifications from the fields of statistics, physics, perception, and music, for the proposed spectral model of noisy sounds.

Thermal Noise Model

Thermal noises can be described in terms of a Fourier series [19]:



X(t) = Σ_{n=1}^{N} Cn cos(ωn t + Φn),    (1)

where N is the number of frequencies, n is an index, ωn are equally spaced component frequencies, Cn are random variables distributed according to a Rayleigh distribution, and Φn are random variables uniformly distributed between 0 and 2π. The samples X defining the signal are distributed according to a normal law. This definition is the starting point of our work. This definition represents a noise by a finite sum of sinusoids. It justifies a spectral model for the noisy sounds and is the central point of the model presented in this article. Nevertheless, the thermal noise model does not specify the number of sinusoids and the difference between frequencies. It is obvious that choosing N = 2 sinusoids in a frequency band whose width is 20 kHz is not sufficient to synthesize a white noise that is perceptually equivalent to a white noise synthesized by randomly distributing samples according to a Gaussian law. So long as the number of sinusoids is not small, the synthesized samples are normally distributed because of the law of large numbers [19]. Nevertheless, this normal distribution is not sufficient to define colored noise. The perception is sensitive to the number N of sinusoids (for a given bandwidth). It is detailed in the next section.
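
A minimal numerical sketch of this construction (my own illustration under the stated assumptions, not code from the paper): N equally spaced frequencies with Rayleigh-distributed amplitudes and uniformly distributed phases are summed, and the resulting samples tend toward a normal distribution as N grows.

```python
import numpy as np

def thermal_noise(n_samples, n_sines, fs=44100, fmax=22050.0, seed=0):
    """Thermal noise model: sum of sinusoids with Rayleigh amplitudes and uniform phases."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    freqs = np.linspace(fmax / n_sines, fmax, n_sines)      # equally spaced frequencies omega_n / (2*pi)
    amps = rng.rayleigh(scale=1.0, size=n_sines)            # Rayleigh-distributed C_n
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_sines)    # Phi_n uniform in [0, 2*pi)
    x = (amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None])).sum(axis=0)
    return x / np.sqrt(n_sines)                             # normalization keeps the variance bounded

samples = thermal_noise(n_samples=44100, n_sines=512)       # the histogram of `samples` is close to Gaussian
```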

Figure 1: Illustration of the spectral density: spectral differences between (a) Gaussian white noise and (b) white noise whose spectral density is low (synthesized with the CNSS model). The black parts indicate gaps of energy. These gaps are perceived and allow human ears to differentiate these two sounds.

It is important to note that the representation of the stochastic part for the hybrid models (e.g., the SMS model) is implicitly based on the thermal noise model. The only difference comes from the deterministic definition of the amplitudes of the sinusoids. Indeed, the resynthesis part [3] generates short-time spectra from the spectral envelopes by randomly distributing phase values according to a uniform law. Then an inverse Fourier transform is computed. This mathematical operation consists of summing a fixed number of sinusoids whose frequencies are equally spaced and whose amplitudes are fixed. We denote by Fs the sample rate and by Ws the size of the synthesis window. The difference between successive frequencies is Fs/Ws, and the number of sinusoids is Ws/2 (the number of sinusoids in the audible frequency range). This synthesis model can be described by the equation

X(t) = Σ_{n=1}^{Ws/2} an cos(2π n (Fs/Ws) t + Φn).    (2)

The necessary number of sinusoids needs to be discussed. Even if experiments confirm that the number implicitly used when computing an inverse Fourier transform appears to be sufficient [6], the question is to know whether it is necessary. Another question would be to know whether it is necessary to define the amplitudes of sinusoids in a random way. Here again, experiments seem to indicate that fixed amplitudes do not introduce audible artifacts [20].
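
For concreteness, this inverse-FFT resynthesis of the stochastic part can be sketched as follows (an illustration under the stated assumptions, not the SMS implementation): a short-time spectrum is built from a fixed magnitude envelope and uniformly random phases, and one frame is obtained with an inverse Fourier transform, which is precisely a sum of Ws/2 equally spaced sinusoids.

```python
import numpy as np

def ifft_noise_frame(envelope, rng):
    """One synthesis frame: fixed magnitudes, uniformly random phases, inverse FFT.

    `envelope` is assumed to hold Ws/2 + 1 magnitudes on the bins k*Fs/Ws.
    """
    phases = rng.uniform(0.0, 2.0 * np.pi, size=envelope.shape)
    spectrum = envelope * np.exp(1j * phases)
    spectrum[0] = 0.0                         # no DC component
    spectrum[-1] = spectrum[-1].real          # Nyquist bin must be real
    return np.fft.irfft(spectrum)             # real frame of length Ws

ws = 1024
frame = ifft_noise_frame(np.ones(ws // 2 + 1), np.random.default_rng(0))  # flat envelope: white-noise-like frame
```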

Perception of the Spectral Density

Psychoacoustic Experiments

For psychoacoustic experiments, noise can be synthesized in two different ways. The first one filters white noise computed by randomly distributing samples according to a normal or a uniform law [5]. The second one requires the desired noise spectrum and synthesizes sound by summing sinusoids whose amplitudes depend on this desired noise spectrum [19]. The spectral method is based on the thermal noise model. It is generally preferred because it directly controls the spectrum [21]. This approach raises the question of the number of sinusoids necessary to generate a noise which cannot be discriminated from noises synthesized by random distribution of samples. Gerzso did the first experiments in [17]. These experiments have been improved by Hartmann et al. [18] to study the human ability to discriminate bands of noise composed of different numbers of sinusoids.

Results of these experiments are numerous. In the case of a narrow band of noise, the mechanism of discrimination is related to the intensity fluctuations [21]. In the case of a broad band of noise, it is related to the spectral resolution [18]. This result leads to the fact that humans would perceive the energy variations in the short-time spectrum corresponding to wide intervals between two neighboring frequencies. Figure 1 shows two synthetic sounds: the second one is characterized by a low spectral density that is indicated by black points corresponding to spectral gaps. These experiments show that the human auditory system is sensitive to the number of sinusoids used to synthesize bands of noise. In the following, this number is thus assumed to be a perceptual characteristic of sounds. It is related to spectral gaps or intensity fluctuations. The control of the number of sinusoids is thus related to the perception of sounds. Moreover, these experiments confirm that a spectral approach to noise synthesis is possible. Indeed, it is now possible to compute an adequate number of oscillators to synthesize white noise whose spectral density and bandwidth are at their maximum. This case corresponds to the highest computational cost.

Definition of the Spectral Density

Psychoacoustic experiments show that the spectral density is used by the auditory system to discriminate bands of noise. Nevertheless, giving an exact and complete definition of the spectral density is difficult. Gerzso related the spectral density to the ratio of the number of sinusoids to the width of the frequency band [17]. For a band of noise of width ∆F with N sinusoids, the spectral density ρ is defined as

ρ = N / ∆F.    (3)

We believe that this definition is not strictly correct, because it does not take into account the distribution of the sinusoidal component frequencies [20] and the duration of the band of noise. Perception of pitch depends on the duration of sounds [22]. The experiments we have done confirm that the use of successive short-time windows may cancel the sensation related to a low spectral density. Indeed, the difference between a thermal noise and a harmonic sound comes from the value of the difference between successive frequencies. This difference corresponds to the fundamental frequency. A periodic sound wave can have a pitch only if it has a sufficient duration. This duration depends on the periodicity. Psychoacoustic experiments indicate that the number of cycles necessary lies in the range of tens of cycles [23]. This observation is also confirmed by the usual method based on the inverse Fourier transform. This method considers the number of sinusoids as a function of the number of samples, and thus as a function of the duration of the synthesized sound. In the following, we consider two independent parameters: the number of sinusoids and the duration of the sound.

Statistical Model

The spectral model we propose is based on the thermal noise model. This model defines the frequencies of the sinusoids as equally spaced. The study of the perception of the spectral density shows that humans can perceive spectral gaps or intensity fluctuations. These phenomena can be due to one or more missing sinusoids. We thus propose to define frequencies as random variables which are controlled by mathematical parameters. The random property of the frequencies is justified because the ear is not sensitive to the precise information about the intensity fluctuations or the spectral gaps, but to their statistical properties. It is useful to represent the probabilities, but it seems useless to retain exact information about these properties. Moreover, the study of the intensity fluctuations shows that they are dependent on the distribution of the phases of sinusoids [20]. This dependency is illustrated by the two limits: phases with the same values and uniformly distributed values. In the first case, intensity fluctuations grow as the number of sinusoids increases [19], because the corresponding waveforms are composed of one or more intensity peaks that are audible. Conversely, uniformly distributed phases correspond to the thermal noise model and lead to weak intensity fluctuations. Therefore, it appears useful to control this phase distribution in order to modify the audible properties related to the intensity variations. The thermal noise model considers the amplitudes of the sinusoids as random variables distributed according to a Rayleigh law. Practically, fixed amplitudes lead to bands of noise that cannot be discriminated from bands of noise synthesized with sinusoids whose amplitudes are randomly determined [19]. Moreover, we have not managed to relate the distribution of the amplitudes to a perceptual property. For these reasons, we restrict our spectral model to fixed amplitudes determined from spectral envelopes.

Mathematical Justification

The distribution of the frequencies and the phases of sinusoids that compose bands of noise are perceptually relevant. The synthesized signal can thus be described from (1) by

X(t) = Σ_{n=1}^{N} an cos(2π Fn t + Φn),    (4)

where Fn and Φn are random variables, and an are fixed values.

We can mathematically show that this spectral and statistical approach, based on the thermal noise model, defines a white noise in the case of constant amplitudes (for all n, an = a0). White noise satisfies equation (5), where E denotes the expectation [24].

By writing, for all (p, q), the expectation of the product of Xp and X(p + q), we have



(6)

Since this expectation is defined by integrating over the phases, which are assumed to be uniformly distributed in the interval [0; 2π), the equation reduces to (7) and, for l ≠ n, (8)

It leads to



(9)

We conclude that for all integers (p, q),

These equations correspond to the definition of white noise.

Figure 2: Parameters for the control of the frequency distribution: the number of bins defines the edge of the bins and the bin width is determined by the parameter L.

Concerning the white noise, two assumptions are thus imposed by this definition. The first one concerns phases which have to be uniformly distributed between 0 and 2π. The second one concerns frequencies which also have to be uniformly distributed all over the audible frequency range.
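
As a quick numerical check of this property (my own verification sketch, not from the paper), one can synthesize such a sum with uniformly distributed frequencies and phases and estimate its autocorrelation, which stays close to zero at every nonzero lag:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, ws, n = 44100, 4096, 512
t = np.arange(ws) / fs
freqs = rng.uniform(0.0, fs / 2, size=n)           # frequencies uniform over the audible range
phases = rng.uniform(0.0, 2.0 * np.pi, size=n)     # phases uniform in [0, 2*pi)
x = np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None]).sum(axis=0)

x = x - x.mean()
acf = np.correlate(x, x, mode="full")[ws - 1:] / (x @ x)   # normalized sample autocorrelation
print(acf[:5])                                             # lag 0 equals 1; the other lags are near zero
```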

THE CNSS MODEL

In the previous section, we have shown that a band of noise can be represented by a sum of sinusoids. This representation is the starting point of the statistical and spectral model we present in this paper: the CNSS model.

Definition

The CNSS model (colored noise by sum of sinusoids) defines sounds as random processes Xk. They are represented by a fixed sum of sinusoids whose amplitudes an are fixed and whose phases Φn and frequencies Fn are random variables. Phases Φn are distributed according to a uniform law in the interval [0; 2π) and frequencies Fn are distributed in a band whose width is denoted ∆F. Therefore, (4) defines the CNSS model.

Short-time Frames

Practically, the sound is analyzed and synthesized by overlapping and adding two (or more) temporal frames. Each frame is defined by different sets of parameters. This approach does not appear in the definition of the thermal noise model. However, it can be justified. Indeed, as previously seen, the number of sinusoids depends on the size of the synthesis window. This number can be reduced by considering successive short-time signals: the duration is too short for ears to perceive the low spectral density. Furthermore, real-time synthesis requires successive short-time windows in order to enable modifications of the parameters from frame to frame.

Parameters

The CNSS model represents sounds by analyzing successive temporal frames. Each frame is modeled by many mathematical parameters. The duration of these frames is denoted by Ws. It is a positive integer and is expressed in samples. Concerning the distribution of frequencies, M equally spaced bins (M ≥ N) divide the frequency bandwidth. In each successive frame, N frequency values are drawn into these bins from a uniform distribution. These parameters are illustrated by Figure 2 and are detailed below.

Bandwidth

The signal represented is supposed to be a band of noise. One of the parameters of the CNSS model is the width of this band. It defines the interval of the probability density function of the frequencies. It is denoted by ∆F and is defined by a maximum frequency Fmax and a minimum frequency Fmin:

∆F = Fmax − Fmin.    (10)

Since we assume ∆F > 0, we also assume Fmin < Fmax.

This parameter is obviously constrained by the resolution of the auditory system (20 − 20 000 Hz). However, due to the Nyquist criterion (sample rate Fs = 44 100 Hz), the interval is [0; Fs/2 = 22 050] Hz. It also corresponds to the interval implicitly considered when an inverse Fourier transform is computed.

Number of bins

In order to describe the probability density function of frequencies, we propose to define bins, whose sizes are constant, covering the entire bandwidth ∆F. The number of bins is a parameter and is denoted by M. Each bin, denoted by Bi (i ∈ [0; M − 1]), defines an interval of the bandwidth ∆F. The width of every bin, denoted by ∆B, is constant:

∆B = ∆F / M.    (11)

The interval IBi defined by each bin Bi is

IBi = [Fmin + i ∆B ; Fmin + (i + 1) ∆B).    (12)

The number of bins is positive and is not bounded:

M ≥ 1.    (13)

Inside each bin, at most one frequency is randomly chosen according to a distribution law defined by the other parameters N and L.
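
For instance, with a full band ∆F = 22 050 Hz divided into M = 512 bins, each bin has width ∆B = 22 050/512 ≈ 43 Hz (numbers chosen here only for illustration).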

Number of sinusoids

The number of sinusoids, denoted by N, appears in (4) of the CNSS model. This number is linked to the number of bins M. It is not possible to define more than one frequency in the same bin. At the opposite extreme, at least one sinusoid composes the signal:

1 ≤ N ≤ M.    (14)

As previously seen in Section 3.2, the influence of the number of sinusoids is not perfectly understood yet. It is linked to the duration of the synthesis frames [20]. Nevertheless, we present approximations about the linear variations of this number as a function of the bandwidth and the duration. If N equally spaced frequencies are defined in a band whose width is Fs/2 Hz, the difference between successive frequencies is Fs/2N Hz. In order to be perceived, the minimum duration is 2N/Fs seconds, which corresponds to 2N samples. Therefore, Ws/2 sinusoids have to be used to define a noise with maximum spectral density. It is important to note that this value is the number of sinusoids used when computing an inverse Fourier transform of size Ws. The usual technique for the synthesis of the stochastic part of hybrid models [3] requires a number of sinusoids corresponding to the maximum spectral density. Therefore, filtered white noises that can be synthesized applying this technique are always characterized by a maximum spectral density. In the case of white noise (bandwidth Fs/2 Hz), the maximum number is half the synthesis frame duration Ws:

Nmax = Ws / 2.    (15)

As a conclusion, a band of noise defined by a width ∆F, a duration Ws, and a maximum spectral density is represented by

N = ∆F · Ws / Fs    (16)

sinusoids. The number N of sinusoids is thus defined in the interval [0; ∆F · Ws/Fs].
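
As a quick numerical check of these bounds (values chosen here for concreteness), a frame of Ws = 1024 samples at Fs = 44 100 Hz needs Ws/2 = 512 sinusoids to reach maximum spectral density over the full band, whereas a band of width ∆F = 5 512.5 Hz (one eighth of Fs/2) needs only ∆F · Ws/Fs = 128 of them.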

Width of the PDF of frequencies

Inside each selected bin, one frequency is randomly determined according to a uniform law. One parameter, denoted by L, defines the relative width of this law. Its value is in the interval [0; 1]. When it is null, the probability density function is a delta function, and the frequency is the upper boundary of the bin. In the bin Bi, the frequency would be Fmin + (i + 1)((Fmax − Fmin)/M). At the opposite extreme, if the parameter L is 1, the probability density function is a rectangular function: all the frequencies have the same probability to be chosen. In the case when the number of sinusoids equals the number of bins N = M, we write

(17) The probability density function, denoted by ρ and associated with the bin Bi, is, for L ≠ 0,



(18)

This parameter defines the regularity of the differences between the frequencies composing modeled sounds. For example, when L = 0 and N = M, all frequencies are equally spaced:

Fn = Fmin + (n + 1) ∆F / M,  n ∈ [0; N − 1].    (19)
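
As an illustration of this limiting case (numbers chosen here for concreteness), with Fmin = 0 Hz, Fmax = 22 050 Hz, and N = M = 512, the frequencies fall every 22 050/512 ≈ 43.07 Hz, which is exactly the bin spacing Fs/Ws of a 1024-point Fourier transform at Fs = 44 100 Hz; this is the situation discussed below in the generalization to filtered white noise models.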

Width of phase PDF

The thermal noise model defines the phases of each sinusoid composing sounds as random variables distributed in the interval [0; 2π) according to a uniform law. The CNSS model allows the modification of this law by limiting the interval [0; 2π). The relative width of the probability density function is a real number in the interval [0; 1] and is denoted by P. When it is null, the probability density function is a delta function and all the phases are the same. This results in an intensity peak occurring at periodic times and depending on the duration of the frames. When this parameter P is 1, the phases are uniformly distributed according to the thermal noise model. By considering the phases of the sinusoids at the time t0, the relation between the parameter P and these phases is

(20)

We thus write the phases of sinusoids at the time t = 0, as a function of their frequency Fi: (21)

Color

The color is a parameter already used in other spectral representations of sounds (SMS [6], STN [14], etc.) and refers to the spectral envelope. The SAS model [7] also introduces this parameter. Its name is due to the analogy between audible and visible spectra [8]. In the CNSS model, the color is defined by smoothed spectral envelopes. It is denoted by C and represents the variation of the amplitude as a function of the frequency. It is theoretically a continuous function, but it is modeled as a finite number of points. This representation allows manipulations that are more intuitive than the manipulation of filters [3]. Here, the main point is the independence between the spectral envelope and the spectral density. Existing models only consider the spectral envelope: the information related to the spectral density is either contained in the spectral envelope or not taken into account. The CNSS model allows independent manipulations of the spectral envelope and the spectral density.

Generalization of the Filtered White Noise Models

The CNSS model is essentially a generalization of the filtered white noise models. The mathematical parameters of the model enable control of the frequency distribution. Nevertheless, it is possible to define the frequencies of the sinusoids as fixed values, as in the existing models. By choosing a frequency band whose width is half of the sample rate (∆F = Fs/2 = 22 050 Hz), with the number of frequencies N equal to the number of bins M and with the relative width L null, the frequencies are no longer random variables. They are also equally spaced:

(22)

When the number of frequencies is half of the length of the frame, the synthesis is equivalent to an inverse Fourier transform.

(23)

ANALYSIS

The CNSS model represents noisy sounds with two perceptual parameters: the spectral density and the spectral envelope. The analysis stage consists of approximating these two parameters and estimating the related mathematical parameters that are described in the previous section.

Approximation of the Spectral Density

As previously seen, psychoacoustic experiments show that energy gaps in the spectrum of noisy sounds are perceptually relevant. These energy gaps are related to intensity fluctuations [21]. We have proposed an original method [25] to analyze these properties. It is based on the statistical study of these fluctuations.

Limitations of the Usual Techniques

The study of the energy distribution is usually based on the short-time Fourier transform (STFT) [26]. Two main limitations led us to choose another way. The first limitation concerns the resolution of this discrete transform and the usual time-versus-frequency tradeoff. The second is related to the analysis algorithm. One basic idea would be to detect gaps in the amplitude spectrum, but this requires thresholds, and their determination must rely on psychoacoustic research. Furthermore, approximations of the short-time Fourier transform lead to amplitude gaps that are due to the analysis windows applied to the sound [27]. For these reasons, we applied another method based on the statistical analysis of the intensity fluctuations.

Statistical Analysis of the Intensity Fluctuations

The intensity fluctuations have been studied and modeled in order to explain the ability of humans to discriminate noises with different spectral densities [18]. Another theoretical study of these intensity fluctuations leads to comparable results [28, 29]. We relate the variance of the envelope power of any signal to the number of sinusoidal components composing this signal. We define VNEP as the ratio of this variance to the average envelope power:

(24)

We consider a narrow frequency band. This condition allows us to assume that the amplitudes of the sinusoidal components that compose the signal within this band are equal. We show that VNEP is directly linked to the number of sinusoidal components of the analyzed signal. In this case, the theoretical relation between the measure VNEP and the number of sinusoidal components is

(25)

The method we propose consists of producing several values obtained by successively computing the measure VNEP on the filtered signal. The consecutive calculations of VNEP lead to an approximation of the intensity fluctuations and thus to the presence of energy gaps. Indeed, if the analyzed band is composed of noise that is modeled by many sinusoids (N ≫ 1), the measure VNEP is high. At the opposite extreme, if the band is composed of very few sinusoidal components (N ≈ 1), the measure VNEP becomes low.


The analysis method consists of the following operations (a code sketch follows the list).

• Bandpass filtering: this first stage is basic and consists of bandpass filtering the original sound in order to generate signals for the estimation of the intensity fluctuations.
• Calculation of the measure VNEP: the measure VNEP is computed from the envelope power of the signal, as given in (25).
• Thresholding: once the values of VNEP have been computed, the next stage consists of counting the number N of values of VNEP inside a frequency band which are below the selected threshold th. This threshold th is one parameter of the analysis method. After this stage, a number N is associated with each frequency value F, the center of each studied frequency band.
• Maximization: an iteration on the width of the frequency band used to compute VNEP leads to the maximum value of N. This maximum is assumed to be the difference between two sinusoids inside the analyzed band or, in other words, the size of the spectral gap in the analyzed band [20].

This method leads to an approximation of the differences between sinusoids as a function of the frequency. Several experimental examples are shown in [20].
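A minimal numpy/scipy sketch of this procedure is given below. It assumes the envelope power is obtained from the analytic signal (Hilbert transform); the function names, the Butterworth filter, and the way bands are scanned are illustrative choices, not the authors' implementation.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vnep(x):
    """Ratio of the variance of the envelope power to its mean (cf. (24))."""
    p = np.abs(hilbert(x)) ** 2            # envelope power of the signal
    return np.var(p) / np.mean(p)

def band_vnep(x, fs, f_lo, f_hi, order=6):
    """Bandpass-filter x and compute VNEP on the filtered band."""
    sos = butter(order, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    return vnep(sosfiltfilt(sos, x))

def count_low_vnep(x, fs, centers, width, th):
    """Thresholding step: count how many band-VNEP values fall below th."""
    values = [band_vnep(x, fs, c - width / 2, c + width / 2) for c in centers]
    return sum(v < th for v in values)

The maximization step would then repeat count_low_vnep for several band widths and keep the maximum count, as described above.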

Figure 3: Principle of the analysis of the statistical parameters of the model.

Assumptions

The method relies on two main assumptions. The first one concerns the variations of the spectral envelope. The analysis stage of this spectral envelope consists of smoothing it using a lowpass filter and compressing it. For this reason, we assume that, in a narrow frequency band, the spectral envelope does not vary enough to introduce variations in the values of VNEP inside this band. The spectral envelope can thus be assumed to be constant. Nevertheless, this assumption obviously depends on the width of the frequency band used. Choosing a width that is too large would result in variations of the spectral envelope and thus in errors in the estimation of VNEP.

The second assumption concerns the approximation of the spectral density of the frequency bands studied. We assume that the spectral density is constant in this frequency band. Of course, the validity of this assumption depends on the width of the frequency bands studied: the larger this band, the weaker the probability for the spectral density to be constant.

Extraction of the Parameters of the Model

We represent the distribution of the frequencies with the parameters N, M, and L, where N is the number of sinusoids, M is the number of bins, and L is the width of the PDF of frequencies. We estimate these parameters by analyzing the approximation of the spectral density using the previously discussed method. Therefore, this stage consists of linking the approximation of the spectral density to these mathematical parameters. The different steps of this part of the analysis algorithm are illustrated in Figure 3. The first part consists of estimating the number of bins M because it is directly related to the maximum of the probability density function of the difference between frequencies. The second part tests if the number of frequencies N is different from the number of bins M. If it is different, this number of sinusoids is estimated. Then the width of the probability density function inside each bin is approximated.

Estimation of the Probability Density Function of the Frequency Difference

We denote by q the probability density function of the difference between two successive frequencies (or the width of a spectral gap). This function is obtained from the results of the approximation of the variations of the spectral density as a function of frequency. These variations have the same characteristics as the probability density function q. The properties of this function give the estimation of the parameters of the CNSS model.

Estimation of the Number of Bins M

Figure 4 shows an experimental probability density function of the difference between two successive frequencies. It has been computed with a high number (10 000) of outcomes of frequency drawings, for different values of M. This figure shows that the most probable value corresponds to the value ∆F/M. The number of bins is thus directly linked to the most probable difference between two successive frequencies:

(26)
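As an illustration, the following sketch estimates M from the mode of the differences between successive analyzed frequencies, under the assumption (suggested by the text around (26)) that M is obtained as ∆F divided by the most probable difference; the histogram resolution is an arbitrary choice.

import numpy as np

def estimate_num_bins(freqs, delta_f, nbins=200):
    """Estimate M from the most probable difference between successive frequencies."""
    d = np.diff(np.sort(freqs))                  # successive frequency differences
    hist, edges = np.histogram(d, bins=nbins)
    i = np.argmax(hist)
    d_max = 0.5 * (edges[i] + edges[i + 1])      # most probable difference
    return int(round(delta_f / d_max))           # assumed reading of (26): M = delta_f / d_max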

Equality between N and M

Once the number of bins has been determined, the analysis method uses the symmetry of the extracted distribution. Indeed, theory shows that the probability density function is symmetric around the value ∆F/M in the case N = M, as opposed to the case M > N. These two cases are represented in Figure 4. The algorithm proposed to determine whether the number of frequencies N and the number of bins M are the same tests the maximum value of q that is not zero. We denote this value by qm. If N = M, this value is slightly less than twice the number of bins M. Otherwise, the number of frequencies N is less than the number of bins M:

(27)

Estimation of the Number of Frequencies N

In the case when N is different from M, an algorithm is proposed to estimate the number of frequencies needed. This algorithm counts the number of values of q that are greater than a fixed frequency difference. Indeed, if M is greater than N, the number of wide intervals between neighboring frequencies becomes higher. For a fixed number of bins M, the higher the number of frequencies, the higher this number of wide intervals. However, this method is slightly more complex than the ones used for the determination of M and of the equality between M and N, because it requires a calibration stage [20]. This calibration stage is necessary because the number of wide intervals detected between neighboring frequencies depends on the analysis parameters.


Figure 4: Experimental illustration (∆F = Fs/2 Hz) of the probability density function of the difference between two successive frequencies in the cases of N = 128 and M = 128 (right), M = 171 (center), and M = 256 (left), and in the cases of N = 171 and M = 256 (dotted lines). The maximum is dFmax = ∆F/M (resp., 2, 3, and 4 bins, which are equivalent to 86 Hz, 129 Hz, and 172 Hz). Only the curve corresponding to the case N = M is symmetric.

Figure 5: Synthesis block diagram.

Estimation of the Harmonicity Coefficient L

The parameter L is correlated with the harmonicity of the sound and thus with its periodicity. A low value of L (near 0) imposes a fixed difference between successive frequencies, whereas a high value (near 1) implies a distribution of the differences between 0 and ∆F/M. For this reason, we compute the autocorrelation function to extract the value of L from the analyzed sound. We measure the ratio of the second maximum to the first point (the zero-lag peak) of the autocorrelation function, which is the total energy of the signal and its maximum. This ratio is used as a discrimination function between voiced and unvoiced sounds in speech [30].
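A simple numpy sketch of this measurement follows; the way the second maximum is located (after the initial decay of the autocorrelation) is an illustrative choice.

import numpy as np

def autocorrelation_ratio(x):
    """Ratio of the second maximum of the autocorrelation to its zero-lag value."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags only
    r0 = r[0]                                          # total energy (zero-lag peak)
    k = 1
    while k < len(r) - 1 and r[k + 1] < r[k]:          # skip the initial decay
        k += 1
    return r[k:].max() / r0                            # high ratio: periodic (L near 0)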

Extraction of the Spectral Envelope

The spectral envelope (also named the color [7]) is estimated using the same process as in the other spectral models [3]. A short-time Fourier transform is performed, and the classical methods usually applied to the residual (spline interpolation, line-segment approximation, etc.) can be used to find a function that matches the amplitude spectrum. The CNSS model needs an adapted analysis method to be able to estimate the spectral density of natural noisy sounds. The limitations of the short-time Fourier transform necessitate the use of new approaches. The proposed method has been successfully tested on synthetic and natural sounds [20, 25]. It is difficult to compare its accuracy because, to our knowledge, there are no comparable alternatives. The proposed technique is still experimental and will certainly be improved in the future. But, for now, it is the only method that allows the estimation of the spectral density of the sinusoidal components and enables the extraction of the CNSS parameters.
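For completeness, a rough sketch of one way to obtain a smoothed amplitude envelope from the STFT is shown below; the window length and the moving-average smoothing are assumptions standing in for the interpolation methods mentioned above.

import numpy as np
from scipy.signal import stft

def spectral_envelope(x, fs, nperseg=1024, smooth=32):
    """Rough color estimate: average magnitude spectrum, smoothed by a moving average."""
    f, _, Z = stft(x, fs=fs, nperseg=nperseg)
    mag = np.abs(Z).mean(axis=1)                       # average magnitude over frames
    kernel = np.ones(smooth) / smooth
    return f, np.convolve(mag, kernel, mode="same")    # smoothed envelope sampled on f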

SYNTHESIS

In this part, we present the algorithms used to synthesize sounds from the statistical and the mathematical parameters. The first part consists of generating the oscillators for each successive frame. The frequency, amplitude, and phase of each sinusoid are computed using the model parameters. Then the temporal samples are generated in each successive frame to produce the synthesized sound. Figure 5 shows the general diagram for the synthesis.

Determination of Sinusoid Frequencies

The frequency values of the sinusoidal components of each frame have to be computed from the statistical parameters. The first step consists in defining M bins (denoted by Bi) from the bandwidth values:


(28)

Then N frequencies are determined from these M bins. N bins have to be drawn from the M possibilities. A statistically correct algorithm to choose these bins is the classic algorithm for randomly defining a permutation. One bin i is randomly drawn, then bins M − 1 and i are interchanged; another bin is then randomly chosen from the remaining M − 1 bins, and so on until all N bins are chosen. This algorithm has a large cost if M is large compared to N, because one large array has to be initialized and manipulated in each frame. But experiments show that defining M more than 100 times larger than N amounts to a uniform draw of frequencies:

(29)

We could consider the special case where there are the same number of frequency bins as sinusoids (N = M). The calculation would be more efficient in that case, because we could directly associate sinusoid i with bin Bi (i ∈ {0, ... , N − 1}). But we know that most of the time spent in the synthesis is spent in partial synthesis, so improving the algorithm for that special case may not pay off enough. Once the bins have been chosen, frequency values have to be determined from the parameter L. Another uniform draw is made in a band which is defined by the upper bound of the bin Bj (j ∈ {0, ... , M − 1}) and whose length is L multiplied by the bin length (Fmax − Fmin)/M:

(30)

Therefore, the following operations have to be done in sequence for each frame of the temporal signal.

(1) Define an array b of the integers [0, M − 1].
(2) For n ∈ [0, N − 1],
    (a) draw an integer k uniformly among the first M − n entries of b,
    (b) draw a real r in [0; L],
    (c) calculate fn = Fmin + ((1 − r) + b[k]) ∗ (Fmax − Fmin)/M,
    (d) set b[k] = b[M − n − 1].
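A direct Python transcription of these steps is given below (variable names and the use of numpy's random generator are ours):

import numpy as np

def draw_frequencies(N, M, L, f_min, f_max, rng=np.random.default_rng()):
    """Draw N frequencies in [f_min, f_max] split into M bins, with relative width L."""
    assert 1 <= N <= M
    bin_width = (f_max - f_min) / M
    b = np.arange(M)                       # array of available bin indices
    freqs = np.empty(N)
    for n in range(N):
        k = rng.integers(M - n)            # draw among the remaining M - n bins
        r = rng.uniform(0.0, L)            # relative position inside the bin
        freqs[n] = f_min + ((1.0 - r) + b[k]) * bin_width
        b[k] = b[M - n - 1]                # replace the chosen bin by the last unused one
    return freqs

With L = 0 this reduces to choosing the upper boundaries of the drawn bins, as described earlier.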


Determination of Phases

The model of thermal noise described in [19] requires the phase of each component to be uniformly distributed:

(31)

However, noise synthesized with sinusoids having equal phases results in intensity peaks. These peaks can be periodic, depending on the length of the synthesis window. Such noises are described as impulsive noises. By changing the width of the probability density function of the phases, users can control the amplitude of these peaks. The proposed synthesis model introduces a new parameter by controlling the relative width P (P ∈ ℝ, 0 ≤ P ≤ 1) of the probability density function of the phase:

(32)

Determination of Amplitudes

The amplitudes are simply defined from the frequency values and the spectral envelope, by linearly interpolating the smoothed spectral envelope of the model. However, other types of interpolation (splines, LPC, etc.) are possible.

Additive Synthesis of Frames

Once the frequency, amplitude, and phase values are calculated, temporal samples are generated with additive synthesis. An efficient algorithm is presented in [31]. This algorithm can generate approximately 2 partials per sample for each MHz of CPU clock speed. The algorithms we present have been implemented to create a real-time sound synthesizer. CPU consumption is highest in the case of white noise (or filtered white noise). Synthesizing sounds with more sinusoidal components is useless, because the difference obtained by increasing N cannot be heard. For this reason, we define a maximum value for N depending on the synthesis window size: N cannot be greater than half of the synthesis window size (Ws in samples). This limit corresponds to the inverse Fourier transform technique [32, 33, 34, 35]:

(33)
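The following sketch synthesizes one frame from these parameters. It assumes that the phase law of (31)-(32) amounts to drawing phases uniformly in an interval of relative width P, and it uses linear interpolation of a sampled spectral envelope for the amplitudes; both are readings of the text, not the authors' exact formulas.

import numpy as np

def synthesize_frame(freqs, env_f, env_a, Ws, fs, P, rng=np.random.default_rng()):
    """Additive synthesis of one frame of Ws samples from the drawn frequencies."""
    N = min(len(freqs), Ws // 2)                 # cap N at half the frame length (cf. (33))
    freqs = np.asarray(freqs)[:N]
    amps = np.interp(freqs, env_f, env_a)        # color: interpolate the spectral envelope
    phases = 2.0 * np.pi * P * rng.random(N)     # assumed law: uniform over a width of 2*pi*P
    t = np.arange(Ws) / fs
    frame = (amps[:, None] *
             np.sin(2.0 * np.pi * freqs[:, None] * t + phases[:, None])).sum(axis=0)
    return frame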


OLA of Frames

Spectral synthesis techniques often use the overlap-add (OLA) method. The resulting temporal signal does not taper to 0 at the boundaries of each frame because of the random values of the phase spectrum. This may also be the case when analyzed sounds are transformed. This is the reason why the synthesis method uses the OLA technique. But in the case of noise synthesis, both experiments and theory show that this method introduces intensity fluctuations which result in audible artifacts [36]. Indeed, the statistical moments are not preserved. We have proposed new methods to avoid these variations. We next describe a method which involves time shifting the sinusoids.

Figure 6: Spectrographic plots of sounds: (a) an original whispered French sentence; (b) the sentence analyzed then synthesized using the CNSS model.

This method is applied to N sinusoids (denoted by sn) with random phase. It consists in shifting the start of each sinusoid in each frame in order to distribute the intensity variations introduced by the weighting windows. Thus each component starting time (dn with n ∈ {0, ... , N − 1}) is set to a different value before being multiplied by the weighting window. The resulting signal x' can be written as

(34)

where the sn are sinusoids. By choosing the dn equally spaced over the half window, one can show that this sum of sinusoids leads to noise with constant statistical properties. There are many ways to determine the offset values. For example, they can be randomly drawn according to a uniform distribution. But this method may lead to artifacts because many offsets may have the same value, which introduces variance fluctuations [36]. To avoid this, we prefer to choose these values by dividing the half window into bins. As a conclusion, the following operations have to be done in sequence for each partial of each frame of the temporal signal:

• draw an offset off,
• synthesize the current partial,
• multiply it by the weighting window,
• offset the output buffer by off,
• add the partial buffer to the output buffer.
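A compact sketch of this offset overlap-add is shown below. The Hann window, the hop size of Ws/2, and the equally spaced offsets over the half window are assumptions consistent with the description above, not the authors' implementation.

import numpy as np

def ola_noise(frames_params, Ws, fs, rng=np.random.default_rng()):
    """Overlap-add synthesis where each partial is time-shifted inside its frame
    so that the intensity stays statistically constant."""
    hop = Ws // 2
    window = np.hanning(Ws)
    n_frames = len(frames_params)
    out = np.zeros(hop * (n_frames - 1) + Ws + Ws // 2)
    t = np.arange(Ws) / fs
    for m, (freqs, amps, phases) in enumerate(frames_params):
        N = len(freqs)
        offsets = (np.arange(N) * (Ws // 2)) // max(N, 1)   # spread over the half window
        start = m * hop
        for f, a, ph, off in zip(freqs, amps, phases, offsets):
            partial = a * np.sin(2.0 * np.pi * f * t + ph) * window
            out[start + off:start + off + Ws] += partial
    return out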

APPLICATION AND CONCLUSION

This noise model is still being tested, especially with respect to the analysis method. We now present applications and details about the implementation.

Implementation

In order to test the real-time capabilities of the model, we developed the synthesis part of the model on one of the existing free software tools for real-time audio. The objective was to control all the synthesis parameters as fast as possible while the sound is rendered. The first target was jMax (see http://www.ircam.fr/jmax) because we have used it successfully with the SAS sound model [7] for a long time. The libcnss library and its jMax extension are free software developed on the GNU/Linux platform. They are available at http://scrime.labri.fr.

Applications

We can consider two uses for the CNSS model. The first one is synthesis using composer-specified control parameters. The synthesis-based applications do not rely on the analysis process. They consist in changing parameters in real time in order to modify synthetic sounds. This approach will be very useful to understand the perception of noisy sounds. Applications for this noise model have been developed. One major application is the pedagogic tool Dolabip [37]. This application uses the two sound models SAS and CNSS to help children understand sound phenomena. The second use for the CNSS model is analysis followed by synthesis, with or without modification. Parameters of the CNSS model are extracted from natural noisy sounds. The main interest is to be able to perform original transformations on analyzed sounds. Users can then modify the analyzed parameters before resynthesizing transformed versions of the original sounds. Figure 6 shows an example of a whispered voice that is analyzed and then resynthesized. The transformations allowed by the CNSS model are perceptually and musically relevant.

• Time scaling. In [38] we presented a method to perform time transformations without changing the statistical properties of noises. Our first experiments show the limitations of the analysis methods. We hope to considerably improve these results by developing better analysis methods.
• Spectral density transformations. A key original aspect of the model we have developed is the ability to control the spectral density by modifying parameters such as the number of sinusoids and the distribution of these sinusoids.
• Harmonicity. For sounds with low spectral density, users can control the difference between successive frequencies and thus the periodicity of the temporal envelope. This characteristic is perceptually relevant.
• Color. As in existing spectral models, the spectral envelope can be modified.

Another application is a musical tool for electroacoustic composers. Composers use the software developed under jMax based on the CNSS model to synthesize original sounds which can be incorporated in musical pieces.

Future Work

The approach we described here is original because we analyze a new parameter, the spectral density, which has been experimentally determined to be perceptually essential for noises. The analysis method is very complex, and the approach we present can certainly be improved in the future. But it already permits the development of psychoacoustic experiments on the perception of the spectral density, which is not completely understood. The results of these experiments and, in particular, the resolution of the human auditory system will give important data to improve the model. The analysis method proposed here is limited to sounds which do not contain any transient or any sinusoid whose amplitude and frequency vary slowly with time. We are developing methods to detect fast energy variations (transients) and stable sinusoids. Several methods for transient detection have been proposed (e.g., [39] or [40]). These methods will soon be incorporated in the analysis stage to prevent extracting information that is related to transients or sinusoids and that is currently assumed to belong to the noise part of the analyzed signal. Furthermore, we limit the model to the analysis and synthesis of one band of noise. However, a polyphonic signal can be composed of several bands. Each band can be analyzed, transformed, and synthesized independently. One of the improvements of the analysis method is to be able to discriminate independent noisy bands: their perceptual properties (spectral density, harmonicity, etc.) may be totally different.

CONCLUSION

In this paper, we propose the study of the representation of noisy sounds with short-time sinusoids. No complete justification had been proposed for this representation, whereas many models apply it implicitly. This study leads to the CNSS model, a spectral and statistical model for the analysis, the musical transformation, and the synthesis of noisy sounds. It is appropriate for representing the noisy part of natural sounds, and it allows new high-fidelity transformations (e.g., modifications of the spectral density). The quality of the classical transformations is also at least as good as that of transformations performed with the existing models (e.g., the time-scale operations [38]). For now, the CNSS model assumes that the modeled sound does not contain any stable sinusoid or any transient. This may lead to audible artifacts in the case of transformations of complex sounds, contrary to models such as [12], for example. The CNSS model we have developed is still experimental: the values of the parameters of the model have to be refined using psychoacoustic tests. But the model already shows considerable promise for musical creation, psychoacoustic experimentation, and pedagogy. Several sound examples can be found at http://www.labri.fr/Perso/hanna/sounds.html.

ACKNOWLEDGMENTS

This research was carried out in the context of the SCRIME project, which is funded by the DMDTS of the French Culture Ministry, the Aquitaine Regional Council, the General Council of the Gironde Department, and IDDAC of the Gironde Department. The SCRIME project is the result of a cooperation convention between the Conservatoire National de Région of Bordeaux, ENSEIRB (School of Electronic and Computer Science Engineers), and the University of Sciences of Bordeaux. It is composed of electroacoustic music composers and scientific researchers. It is managed by the LaBRI (Laboratory of Computer Science of Bordeaux). Its main missions are research and creation, diffusion, and pedagogy.


REFERENCES

1. P. Schaeffer, Traité des objets musicaux, Seuil, 1966.
2. B. Edler and H. Purnhagen, "Parametric audio coding," in Proc. 5th International Conference on Signal Processing (WCCC-ICSP '00), vol. 1, pp. 21–24, Beijing, China, August 2000.
3. X. Serra and J. Smith, "Spectral modeling synthesis: a sound analysis/synthesis system based on a deterministic plus stochastic decomposition," Computer Music Journal, vol. 14, no. 4, pp. 12–24, 1990.
4. M. Goodwin, "Residual modeling in music analysis-synthesis," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '96), vol. 2, pp. 1005–1008, Atlanta, Ga, USA, May 1996.
5. F. R. Moore, Elements for Computer Music, Prentice Hall, Englewood Cliffs, NJ, USA, 1990.
6. X. Serra, "Musical sound modeling with sinusoids plus noise," in Musical Signal Processing, pp. 91–122, Swets & Zeitlinger, Lisse, The Netherlands, 1997.
7. S. Marchand, Sound models for computer music: analysis, transformation, synthesis of musical sound, Ph.D. thesis, LaBRI, Université Bordeaux I, Talence, France, 2000.
8. M. Desainte-Catherine and S. Marchand, "Structured additive synthesis: towards a model of sound timbre and electroacoustic music forms," in Proc. International Computer Music Conference (ICMC '99), pp. 260–263, Beijing, China, October 1999.
9. H. Purnhagen, "Advances in parametric audio coding," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA '99), pp. 31–34, New Paltz, NY, USA, October 1999.
10. E. B. George and M. J. T. Smith, "Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model," IEEE Trans. Speech Audio Processing, vol. 5, no. 5, pp. 389–406, 1997.
11. M. W. Macon and M. A. Clements, "Sinusoidal modeling and modification of unvoiced speech," IEEE Trans. Speech Audio Processing, vol. 5, no. 6, pp. 557–560, 1997.
12. K. Fitz and L. Haken, "Bandwidth enhanced sinusoidal modeling in Lemur," in Proc. International Computer Music Conference (ICMC '95), pp. 154–157, Banff Centre, Alberta, Canada, 1995.
13. X. Serra, A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition, Ph.D. thesis, CCRMA, Stanford University, Stanford, Calif, USA, 1989.
14. T. S. Verma and T. H. Y. Meng, "Extending spectral modeling synthesis with transient modeling synthesis," Computer Music Journal, vol. 24, no. 2, pp. 47–59, 2000.
15. S. Levine, Audio representations for data compression and compressed domain processing, Ph.D. thesis, CCRMA, Stanford University, Stanford, Calif, USA, 1998.
16. H. Purnhagen and N. Meine, "HILN—the MPEG-4 parametric audio coding tools," in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS '00), vol. 3, pp. 201–204, Geneva, Switzerland, May 2000.
17. A. Gerzso, "Density of spectral components: preliminary experiments," report, Ircam, 1978.
18. W. M. Hartmann, S. McAdams, A. Gerzso, and P. Boulez, "Discrimination of spectral density," Journal of the Acoustical Society of America, vol. 79, no. 6, pp. 1915–1925, 1986.
19. W. M. Hartmann, Signals, Sound, and Sensation, Modern Acoustics and Signal Processing, AIP Press, New York, NY, USA, 1997.
20. P. Hanna, Modélisation statistique de sons bruités: étude de la densité spectrale, analyse, transformation musicale et synthèse, Ph.D. thesis, LaBRI, Université Bordeaux I, Talence, France, 2003, http://www.labri.fr/Perso/~hanna/these.html.
21. W. M. Hartmann, "Temporal fluctuations and the discrimination of spectrally dense signals by human listeners," in Auditory Processing of Complex Sounds, W. A. Yost and C. S. Watson, Eds., pp. 126–135, Erlbaum, Hillsdale, NJ, USA, 1987.
22. E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models, Springer-Verlag, New York, NY, USA, 1999.
23. J. Pierce, "Introduction to pitch perception," in Music, Cognition, and Computerized Sound, P. R. Cook, Ed., chapter 5, pp. 57–70, MIT Press, Cambridge, Mass, USA, 1999.
24. S. J. Orfanidis, Introduction to Signal Processing, Prentice Hall, Upper Saddle River, NJ, USA, 1996.
25. P. Hanna and M. Desainte-Catherine, "Analysis method to approximate the spectral density of noises," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA '03), pp. 201–204, New Paltz, NY, USA, October 2003.
26. J. Allen, "Short term spectral analysis, synthesis, and modification by discrete Fourier transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. 25, no. 3, pp. 235–238, 1977.
27. A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, Englewood Cliffs, NJ, USA, 1989.
28. P. Hanna and M. Desainte-Catherine, "Detection of sinusoidal components in sounds using statistical analysis of intensity fluctuations," in Proc. International Computer Music Conference (ICMC '02), pp. 100–103, Göteborg, Sweden, September 2002.
29. P. Hanna and M. Desainte-Catherine, "Using statistical analysis of the intensity fluctuations to detect sinusoids in noisy signals," Tech. Rep., LaBRI, University of Bordeaux 1, Talence, France, 2003, http://www.labri.fr/Labri/Publications/Rapports-internes/.
30. A. Zolnay, R. Schlüter, and H. Ney, "Extraction methods of voicing feature for robust speech recognition," in Proc. European Conference on Speech Communication and Technology (EUROSPEECH '03), vol. 1, pp. 497–500, Geneva, Switzerland, September 2003.
31. R. Strandh and S. Marchand, "Real-time generation of sound from parameters of additive synthesis," in Proc. Journées d'Informatique Musicale (JIM '99), pp. 83–88, Paris, France, May 1999.
32. R. J. McAulay and T. F. Quatieri, "Speech analysis/synthesis based on a sinusoidal representation," IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, no. 4, pp. 744–754, 1986.
33. R. J. McAulay and T. F. Quatieri, "Computationally efficient sine-wave synthesis and its application to sinusoidal transform coding," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '88), vol. 1, pp. 370–373, New York, NY, USA, April 1988.
34. M. Tabei and M. Ueda, "FFT multi-frequency synthesizer," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '88), vol. 3, pp. 1431–1434, New York, NY, USA, April 1988.
35. X. Rodet and P. Depalle, "A new additive synthesis method using inverse Fourier transform and spectral envelopes," in Proc. International Computer Music Conference (ICMC '92), pp. 410–411, San Jose, Calif, USA, October 1992.
36. P. Hanna and M. Desainte-Catherine, "Adapting the overlap-add method to the synthesis of noise," in Proc. International Conference on Digital Audio Effects (DAFx '02), pp. 101–104, University of the Federal Armed Forces, Hamburg, Germany, September 2002.
37. M. Desainte-Catherine, G. Kurtag, S. Marchand, C. Semal, and P. Hanna, "Playing with sounds as playing video games," Computers in Entertainment, vol. 2, no. 2, pp. 16–38, 2004.
38. P. Hanna and M. Desainte-Catherine, "Time scale modification of noises using a spectral and statistical model," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '03), vol. 6, pp. 181–184, Hong Kong, China, April 2003.
39. P. Masri and A. Bateman, "Improved modelling of attack transients in music analysis-resynthesis," in Proc. International Computer Music Conference (ICMC '96), pp. 100–103, Hong Kong, China, August 1996.
40. X. Rodet and F. Jaillet, "Detection and modeling of fast attack transients," in Proc. International Computer Music Conference (ICMC '01), pp. 30–33, Havana, Cuba, September 2001.

CHAPTER 2

Robust Bayesian Regularized Estimation Based on Regression Model

Zean Li¹ and Weihua Zhao²

¹ School of Computer Science and Technology, Nantong University, Nantong 226019, China
² School of Science, Nantong University, Nantong 226019, China

ABSTRACT

The t distribution is a useful extension of the normal distribution, which can be used for statistical modeling of data sets with heavy tails, and provides robust estimation. In this paper, in view of the advantages of Bayesian analysis, we propose a new robust coefficient estimation and variable selection method based on Bayesian adaptive Lasso t regression. A Gibbs sampler is developed based on the Bayesian hierarchical model framework, where we treat the t distribution as a mixture of normal and gamma distributions and put different penalization parameters on different regression coefficients. We also consider the Bayesian t regression with adaptive group Lasso and obtain the Gibbs sampler from the posterior distributions. Both simulation studies and a real data example show that our method performs well compared with other existing methods when the error distribution has heavy tails and/or outliers.

Citation (APA): Li, Z., & Zhao, W. (2015). Robust Bayesian Regularized Estimation Based on Regression Model. Journal of Probability and Statistics, 2015. (9 pages). DOI: http://dx.doi.org/10.1155/2015/989412

Copyright: 2015 Zean Li and Weihua Zhao. This is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

Following the pioneering work of Tibshirani [1], the Lasso (least absolute shrinkage and selection operator) has generated much interest in the statistical literature [2–5]. For the Gaussian linear regression,

(1)

where x𝑖’s are either fixed or random covariates, 𝜖1,...,𝜖𝑛, i.i.d. ∼ 𝑁(0, 𝜎2 ). The Lasso estimates are viewed as 𝐿1 regularized least squares estimates, which are defined as

(2)

for 𝜆 ≥ 0, with the 𝑛×1 vector y = (𝑦1,...,𝑦𝑛)ᵀ, the 𝑛×𝑝 matrix X = (x1,...,x𝑛)ᵀ, and the 𝑝×1 vector 𝛽. The key advantage of the Lasso lies in its ability to do simultaneous parameter estimation and variable selection, and the entire path of Lasso estimates for all values of 𝜆 can be efficiently computed through a modification of the LARS (least angle regression) algorithm of Efron et al. [5].

The Gaussian assumption is not crucial in model (1), but it is useful to make connections to the likelihood framework for regularized 𝑡 regression in Section 2. The Lasso estimator in (2) is equivalent to minimizing the penalized negative log-likelihood 𝑙(𝛽; y) as a function of the regression coefficients 𝛽 with the 𝐿1-penalty ‖𝛽‖₁ = Σⱼ|𝛽ⱼ|; equivalence means here that we obtain the same estimator for a potentially different tuning parameter. However, the Lasso estimator in (2) does not provide an estimate of the variance parameter 𝜎², which plays an important role, especially when the error in the regression model has high variance.
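As a quick illustration of (2) and the LARS-based path computation, the following hedged sketch uses scikit-learn; note that sklearn's Lasso minimizes (1/2n)‖y − X𝛽‖² + α‖𝛽‖₁, so its α corresponds to 𝜆 only up to scaling, and the data here are simulated for illustration.

import numpy as np
from sklearn.linear_model import Lasso, lars_path

rng = np.random.default_rng(0)
n, p = 50, 8
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

fit = Lasso(alpha=0.1).fit(X, y)          # L1-regularized least squares
print(fit.coef_)                          # shrunken, partly zero coefficients

alphas, _, coefs = lars_path(X, y, method="lasso")   # full Lasso path via LARS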

On the other hand, the Lasso estimate in (2), based on the least squares loss, may suffer from poor performance when the error distribution has tails heavier than the normal distribution, when the error has large variance, or when the linear regression model contains outliers, which has motivated much research on regularized estimation based on robust methods. Most existing robust regularized estimation methods mainly replace the least squares loss function in (2) by some robust loss function, such as the Huber loss, the L1 loss, or the quantile loss function. Li and Zhu [6] considered quantile regression with the Lasso penalty and developed its piecewise linear solution path. Wang et al. [7] investigated the least absolute deviation (LAD) estimate with the adaptive Lasso penalty (LAD-Lasso) and proved its oracle property. Wu and Liu [8] further discussed the oracle properties of the SCAD (smoothly clipped absolute deviation) and adaptive Lasso regularized quantile regression. Zou and Yuan [9] proposed a new efficient estimation and variable selection procedure called composite quantile regression. Chen et al. [10] studied the robust Lasso in the presence of noise with large variance; they considered two versions of the robust Lasso, which use a convex combined loss and the Huber loss criterion instead of the least squares criterion. Besides the above robust regularized methods, which, in essence, replace the L2 loss by other loss criteria, we can also directly consider robust estimation from the standpoint of the error distribution in model (1). It is well known that when the error in the linear regression follows a normal distribution, the estimates are sensitive to outliers, which motivates us to consider a robust error distribution in the linear regression model (1). The t distribution, as an alternative to the normal distribution, has frequently been suggested in the literature as a robust extension of the traditional normal models. The t distribution provides a convenient description for regression analysis when the residual term has a density with heavy tails and excess kurtosis. Since the Lasso estimate in (2) can be interpreted as a Bayesian posterior mode estimate when the regression parameters have independent Laplace priors, motivated by this connection, Park and Casella [11] discussed Bayesian Lasso estimation for linear regression based on the Gibbs sampler. The Bayesian Lasso has several advantages: it can conveniently provide interval estimates (Bayesian credible intervals) that can guide variable selection, the structure of the hierarchical model provides methods for selecting the Lasso parameter, and its performance sometimes outperforms non-Bayesian methods. Therefore, several authors subsequently studied the variable selection problem based on Bayesian methods, such as Park and Casella [11], Hans [12], Li et al. [13], and Kyung et al. [14]. The main goal of this paper is to extend the Bayesian regularized method to the t regression model (4), giving a robust Bayesian regularized estimation method whose performance is expected to be better than that of other regularized methods when the error distribution is symmetric and there are outliers in the regression model.

The rest of the paper is organized as follows. In Section 2, we introduce the t regression with Lasso penalty. In Section 3, we build the framework of the Bayesian t regression model with adaptive Lasso penalty, show the corresponding Bayesian hierarchical model, and obtain the Gibbs sampling estimation procedure. In Sections 4 and 5, we carry out simulation studies and a real data analysis to illustrate the performance of our new method. Some conclusions and remarks are contained in Section 6. We also discuss the Bayesian t regression with group Lasso penalty and its Gibbs sampler in the Appendices.

t REGRESSION MODEL AND L1 NORM REGULARIZATION

t Regression Model

The 𝑡 distribution as an alternative to the normal distribution has frequently been suggested in the literature; for example, Lange et al. [15] introduced regression models with 𝑡 distributed errors as a robust extension of the traditional normal models; Liu and Rubin [16] discussed the estimation of the 𝑡 distribution by the EM algorithm and its extensions; and Lin et al. [17] studied the heteroscedasticity diagnostics problem by score tests for 𝑡 linear regression models. It has often been used in many real applications, especially in finance. The probability density function of a 𝑡 distributed variable 𝑦 with location parameter 𝜇, dispersion parameter 𝜎, and degrees of freedom ν, denoted as 𝑡(𝜇, 𝜎², ν), is

(3)

where Γ(⋅) denotes the gamma function, 𝑦, 𝜇 ∈ ℝ, and 𝜎 > 0. The mean and variance of 𝑦 are 𝜇 (ν > 1) and ν𝜎²/(ν − 2) (ν > 2), respectively. When ν = ∞ and ν = 1, the 𝑡 distribution reduces to the normal and Cauchy distributions, respectively. The univariate linear Student-𝑡 regression model can be expressed as


(4)

where the x𝑖's are known 𝑝-vectors of covariates and 𝛽 = (𝛽1,...,𝛽𝑝)ᵀ is the unknown 𝑝-vector of parameters.
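Since the displayed equations (3) and (4) did not survive extraction, the standard Student-t density with this parameterization and the corresponding regression model are reproduced below for reference (a reconstruction consistent with the moments stated above, not a transcription of the original display):

\begin{aligned}
p(y \mid \mu, \sigma^2, \nu) &= \frac{\Gamma\!\left(\tfrac{\nu+1}{2}\right)}{\Gamma\!\left(\tfrac{\nu}{2}\right)\sqrt{\pi\nu}\,\sigma}\left[1 + \frac{(y-\mu)^2}{\nu\sigma^2}\right]^{-(\nu+1)/2},\\
y_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \epsilon_i, \qquad \epsilon_i \sim t(0, \sigma^2, \nu), \quad i = 1,\dots,n.
\end{aligned}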

t Regression with Lasso Penalty

As discussed in Section 1, the 𝐿2 loss in the Lasso model is not robust to heavy-tailed error distributions and/or outliers. This indicates that the Lasso is not an ideal goodness-of-fit criterion in the presence of noise with large variance. We consider the following regularized 𝑡 regression, defined as

(5)

where 𝜃 = (𝛽ᵀ, 𝜙)ᵀ and 𝑙𝑖(𝛽, 𝜎²; 𝑦𝑖, x𝑖) is the log-likelihood function of 𝑦𝑖:

(6)

Comparing (5) with (2), we take the parameter 𝜎² into the optimization of the penalized maximum likelihood estimator. Though we are penalizing only the parameter 𝛽, the variance parameter estimate is influenced indirectly by the amount of shrinkage 𝜆 in (5). There is a main drawback of the estimator in (5) [18]: it is not equivariant under scaling of the response [19]. More precisely, consider the transformation

(7) which leaves model (2) invariant, but this is not the case for the estimator in (5). We address this drawback by using the penalty term 𝜆‖𝛽‖1/𝜎, leading to the following estimator:


(8) Noting the form of the regularized estimation in (8), the Lasso estimates can be interpreted as posterior mode estimates when the regression parameters have independent Laplace (i.e., double-exponential) priors: (9) Park and Casella [11] argued that conditioning on 𝜎2 is important because it guarantees a unimodal full posterior.

BAYESIAN FORMULATION OF ROBUST T ADAPTIVE LASSO

Bayesian Adaptive Lasso t Regression

It is well known that the ordinary Lasso has a conflict between consistency for model selection and optimal prediction. As a solution to achieve both estimation and model selection consistency, the adaptive Lasso [20] was proposed, and its model selection consistency and asymptotic normality under a certain rate of the shrinkage parameter 𝜆𝑛 were proved. To extend the idea of the adaptive Lasso [20] to our proposed robust 𝑡 Lasso regression, we define the robust 𝑡 adaptive Lasso as

(10)

where the 𝜆𝑗's allow unequal penalties for the coefficients and we define the precision parameter 𝜙 = 𝜎⁻². Motivated by the connection with Bayesian estimation and (9), we consider a fully Bayesian analysis using a conditional Laplace prior on 𝛽𝑗:

(11)

For any 𝑎 > 0, we have the following equality [21]:



(12)


Using this equality, the Laplace prior (11) on 𝛽 can be written as

(13)

On the other hand, from Lange et al. [15] and Liu and Rubin [16], 𝑡(𝜇, 𝜙, ν) can be treated as a mixture of normal and gamma distributions; that is, if 𝑢 ∼ Γ(ν/2, ν/2) and 𝑧|𝑢 ∼ 𝑁(𝜇, (𝜙𝑢)⁻¹), then 𝑧 ∼ 𝑡(𝜇, 𝜙, ν). If we further put gamma priors on the penalization parameters and use the improper prior density on 𝜙, then we have the following Bayesian hierarchical model:

(14)

where u = (𝑢1,...,𝑢𝑛), z = (𝑧1,...,𝑧𝑛), and s = (𝑠1,...,𝑠𝑝).
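The following short numerical check illustrates this normal-gamma mixture representation of the t distribution (the parameter values are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, phi, nu, n = 0.0, 1.0, 4.0, 100_000

u = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)        # u ~ Gamma(nu/2, rate = nu/2)
z = rng.normal(mu, np.sqrt(1.0 / (phi * u)))             # z | u ~ N(mu, (phi*u)^{-1})

t_direct = mu + rng.standard_t(nu, size=n) / np.sqrt(phi)   # direct draw from t(mu, 1/phi, nu)
print(stats.ks_2samp(z, t_direct).pvalue)                   # large p-value: same distribution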

Remark 1. The amounts of shrinkage, the 𝜆𝑗's, depend on the values of the hyperparameters 𝑎 and 𝑏. Because a larger 𝑎 and a smaller 𝑏 lead to bigger penalization, it is important to treat 𝑎 and 𝑏 as unknown parameters to avoid giving specific values which affect the estimates of the regression coefficients.

Remark 2. Similar to Park and Casella [11], there is also another existing method to obtain estimates of the adaptive Lasso parameters besides sampling from the posterior distribution of the 𝜆𝑗's. We can use the empirical Bayes estimation method to update the 𝜆𝑗's without setting a prior for them in the Bayesian hierarchical model. Our experience shows that the two methods perform comparably well, but the empirical Bayes estimation method can save some computing time; for more details see Park and Casella [11] and Leng et al. [22].

Gibbs Sampler for All Parameters

Let 𝛽−𝑖 be the parameter vector 𝛽 excluding the component 𝛽𝑖, s−𝑗 the vector s excluding the component 𝑠𝑗, and u−𝑖 the vector u excluding the component 𝑢𝑖. Based on the Bayesian hierarchical model (14), the posterior distribution of all parameters can be given by

(15)

Based on the expression (15), we can obtain the posterior full conditional distributions of all parameters, which yield a tractable and efficient Gibbs sampler that works as follows. The full conditional distribution of 𝑢𝑖 is

(16)

Thus, the full conditional distribution of 𝑢𝑖 is a gamma distribution. The full conditional distribution of 𝑠𝑗 is

(17)

Thus, the full conditional distribution of 𝑠𝑗 is a generalized inverse Gaussian distribution. Now we consider the full conditional distribution of 𝛽𝑗, which is given by

(18)

It follows that the full conditional distribution of 𝛽𝑗 is a normal distribution. The full conditional distribution of 𝜙 is

(19)

So the full conditional distribution of 𝜙 is a gamma distribution. The full conditional distribution of each penalization parameter is

(20)

That is, this full conditional distribution is a gamma distribution. At last, the full conditional distributions of 𝑎 and 𝑏 are

(21)

Obviously, the full conditional distribution of 𝑏 is just a gamma distribution. However, the full conditional posterior distribution of 𝑎 does not have a closed form. Fortunately, we note that the full conditional posterior distribution of 𝑎 is a log-concave function, so the adaptive rejection sampling algorithm [23] can be used to sample from this distribution. The adaptive rejection sampling algorithm is an efficient method for sampling from any univariate probability density function which is log-concave.

Remark 3. The Lasso was originally designed for variable selection: the posterior mode of the 𝛽𝑗's can be exactly zero due to the nature of the Laplace prior in (11). However, the posterior draws of the 𝛽𝑗's cannot be exactly zero since the 𝛽𝑗's are drawn from a continuous posterior distribution. A post hoc thresholding rule may overcome this difficulty. Alternatively, Kyung et al. [24] recommended using the credible interval of the posterior mean to guide variable selection.

Remark 4. For variable selection, sometimes several explanatory variables may be represented by a group of derived input variables. In this case the selection of important variables corresponds to the group of variables. The group Lasso penalty [13, 25] takes the group structure into account and can do variable selection at the group level. For the Bayesian 𝑡 regression model with adaptive group Lasso penalty, we give the details in the Appendices.
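The sketch below implements one Gibbs sampler that is consistent with this hierarchy (t errors as a normal-gamma mixture, conditional Laplace priors written as scale mixtures). The update formulas are a standard derivation under these priors with the hyperparameters a, b and the degrees of freedom nu held fixed; they may differ in detail from (16)–(21), and the sampling of a and b by adaptive rejection is omitted.

import numpy as np
from scipy.stats import geninvgauss

def gibbs_bat_lasso(y, X, nu=4.0, a=1.0, b=0.1, n_iter=5000, seed=0):
    """Sketch of a Gibbs sampler for Bayesian adaptive-Lasso t regression.
    Augmentation: y_i | u_i ~ N(x_i'beta, 1/(phi*u_i)), u_i ~ Gamma(nu/2, nu/2);
    priors: beta_j | s_j, phi ~ N(0, s_j/phi), s_j ~ Exp(lam2_j/2), lam2_j ~ Gamma(a, b)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, phi = np.zeros(p), 1.0
    u, s, lam2 = np.ones(n), np.ones(p), np.ones(p)
    draws = np.empty((n_iter, p))
    for it in range(n_iter):
        r = y - X @ beta
        # u_i | . ~ Gamma((nu+1)/2, rate = (nu + phi*r_i^2)/2)
        u = rng.gamma((nu + 1) / 2, 2.0 / (nu + phi * r**2))
        # beta | . ~ N(m, (phi*A)^{-1}), A = X'UX + diag(1/s)
        A = X.T @ (u[:, None] * X) + np.diag(1.0 / s)
        m = np.linalg.solve(A, X.T @ (u * y))
        beta = rng.multivariate_normal(m, np.linalg.inv(A) / phi)
        # s_j | . ~ GIG(1/2, lam2_j, phi*beta_j^2), sampled via scipy's geninvgauss
        chi = np.maximum(phi * beta**2, 1e-12)
        s = geninvgauss.rvs(0.5, np.sqrt(lam2 * chi), scale=np.sqrt(chi / lam2), random_state=rng)
        # phi | . ~ Gamma((n+p)/2, rate = (sum u_i r_i^2 + sum beta_j^2/s_j)/2), improper 1/phi prior
        r = y - X @ beta
        rate = 0.5 * (np.sum(u * r**2) + np.sum(beta**2 / s))
        phi = rng.gamma((n + p) / 2, 1.0 / rate)
        # lam2_j | . ~ Gamma(a + 1, rate = b + s_j/2), one penalty per coefficient (adaptive)
        lam2 = rng.gamma(a + 1.0, 1.0 / (b + s / 2.0))
        draws[it] = beta
    return draws

Posterior means and credible intervals of beta can then be read off the returned draws (after discarding a burn-in period).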

SIMULATION STUDIES

The main goal of this section is to assess the performance of our method (BAT.L) through simulations, in comparison with several Bayesian and non-Bayesian methods. The Bayesian methods include the Bayesian Lasso (BLS.L) [11] and the Bayesian adaptive Lasso (BALS.L) [22]. The non-Bayesian methods include the Lasso (LASSO) [1] and the adaptive Lasso (ALASSO) [20]. The data in the simulation studies are generated by

(22)

x𝑖 ∼ 𝑁(0, Σ), where the covariance matrix Σ is an AR(1) matrix, that is, Σ𝑗𝑘 = 𝜌^|𝑗−𝑘| with correlation 𝜌 = 0.5, and the error distributions of 𝜖𝑖 considered here are the following six situations:

• The first choice is a standard normal distribution, 𝑁(0, 1).
• The second choice is a normal distribution with variance 9, 𝑁(0, 3²).
• The third choice is a 𝑡 distribution with 3 degrees of freedom, 𝑡(3).
• The fourth choice is a Laplace distribution, Lp(0, 1).
• The last two choices are a mixture of two normal distributions, 0.8𝑁(0, 1) + 0.2𝑁(0, 6²), and a mixture of two Laplace distributions, 0.9Lp(0, 1) + 0.1Lp(0, 5).

For each simulation study and each choice of the error distribution, we run 50 simulations. In each simulation, we generate a training set with 20 observations, a validation set with 20 observations, and a testing set with 200 observations. The validation set is used to choose the penalty parameters in LASSO and ALASSO. After the penalty parameters are selected, we combine the training set and validation set together to estimate 𝛽. BAT.L, BLS.L, and BALS.L do not need the validation set since they estimate the penalty automatically, so for these methods we also combine the training set and validation set together for estimation. The testing set is used to evaluate the performance of these methods. Within each simulation study, we consider three different settings for 𝛽 (a data-generation sketch in Python follows the list).

• Simulation 1: 𝛽 = (3, 1.5, 0, 0, 2, 0, 0, 0).
• Simulation 2: 𝛽 = (5, 0, 0, 0, 0, 0, 0, 0).
• Simulation 3: 𝛽𝑗 = 5 if 𝑗 = 5, 10, 15, 20; 𝛽𝑗 = 0 otherwise, 𝑗 ≤ 20.
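A compact sketch of this data-generating process is shown below; the function name and its interface are ours.

import numpy as np

def make_data(n, beta, rho=0.5, error="t3", rng=np.random.default_rng(0)):
    """Generate (X, y) with AR(1) covariates and one of the six error laws above."""
    p = len(beta)
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))   # Sigma_jk = rho^|j-k|
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    if error == "normal":
        eps = rng.normal(0, 1, n)
    elif error == "normal9":
        eps = rng.normal(0, 3, n)
    elif error == "t3":
        eps = rng.standard_t(3, n)
    elif error == "laplace":
        eps = rng.laplace(0, 1, n)
    elif error == "normal_mix":                       # 0.8 N(0,1) + 0.2 N(0,6^2)
        comp = rng.random(n) < 0.8
        eps = np.where(comp, rng.normal(0, 1, n), rng.normal(0, 6, n))
    else:                                             # 0.9 Lp(0,1) + 0.1 Lp(0,5)
        comp = rng.random(n) < 0.9
        eps = np.where(comp, rng.laplace(0, 1, n), rng.laplace(0, 5, n))
    return X, X @ np.asarray(beta) + eps

beta1 = [3, 1.5, 0, 0, 2, 0, 0, 0]                    # Simulation 1 setting
X_train, y_train = make_data(40, beta1, error="t3")   # training + validation combined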

Simulations 1, 2, and 3 correspond to the sparse case, the very sparse case, and the sparse recovery problem in the predictors, respectively. Note that we intentionally choose error distributions that are different from the 𝑡 distribution and the Laplace distribution to see how the Bayesian 𝑡 regression adaptive Lasso estimation depends on the error assumption. Our simulation results show that, in terms of parameter estimation accuracy, the Bayesian 𝑡 adaptive Lasso method still performs well even when this error distribution assumption is violated. Unless otherwise specified, ν = 4 degrees of freedom are used in all simulations for BAT.L. Since the true model is known, we can compute the median of the mean absolute deviations (MMAD), where the median is taken over the 50 simulations. We present the results in Table 1. The values in parentheses are the standard deviations of the MMAD obtained by 500 bootstrap resamplings of the 50 mean absolute deviations.

Table 1: Simulation results of MMAD for different error distributions†

Method | N(0, 1) | N(0, 3²) | t(3) | Lp(0, 1) | Normal mixture | Laplace mixture

Simulation 1
BAT.L  | 0.2321 (0.1211) | 1.0274 (0.4014) | 0.3012 (0.1287) | 0.2721 (0.2201) | 0.3734 (0.4105) | 0.4168 (0.3710)
BLS.L  | 0.3613 (0.1068) | 0.9694 (0.3339) | 0.4980 (0.2126) | 0.4958 (0.1556) | 0.8623 (0.3160) | 0.8498 (0.5587)
BALS.L | 0.2147 (0.1145) | 0.9433 (0.4071) | 0.3517 (0.2010) | 0.3349 (0.1770) | 0.7209 (0.3799) | 0.6407 (0.6589)
LASSO  | 0.3221 (0.1311) | 0.9755 (0.4875) | 0.4396 (0.2771) | 0.4306 (0.1635) | 0.8073 (0.3775) | 0.7993 (0.5934)
ALASSO | 0.2219 (0.1165) | 0.9609 (0.4882) | 0.3504 (0.2628) | 0.3265 (0.1962) | 0.8352 (0.4046) | 0.6840 (0.6998)

Simulation 2
BAT.L  | 0.1362 (0.0658) | 0.3932 (0.2607) | 0.2188 (0.1658) | 0.1745 (0.0906) | 0.2587 (0.2985) | 0.2330 (0.1060)
BLS.L  | 0.3081 (0.0877) | 0.6778 (0.2811) | 0.4881 (0.2249) | 0.4709 (0.1606) | 0.7045 (0.2552) | 0.7356 (0.2784)
BALS.L | 0.1308 (0.1100) | 0.4518 (0.3194) | 0.2479 (0.2349) | 0.2015 (0.1759) | 0.3700 (0.2961) | 0.4355 (0.2568)
LASSO  | 0.1910 (0.1089) | 0.6835 (0.3459) | 0.3506 (0.2245) | 0.3112 (0.1935) | 0.5540 (0.3064) | 0.4977 (0.3626)
ALASSO | 0.0932 (0.1219) | 0.4152 (0.3220) | 0.2187 (0.2509) | 0.1508 (0.2113) | 0.2429 (0.3322) | 0.3974 (0.3141)

Simulation 3
BAT.L  | 0.3253 (0.1049) | 0.8987 (0.4659) | 0.3108 (0.1538) | 0.3610 (0.3654) | 0.3915 (0.1939) | 0.4196 (0.0112)
BLS.L  | 0.6694 (0.1443) | 1.7382 (0.3742) | 0.8894 (0.3500) | 0.9042 (0.2057) | 1.6136 (0.6524) | 1.3718 (0.0775)
BALS.L | 0.3183 (0.1374) | 0.9895 (0.3686) | 0.5150 (0.2234) | 0.4676 (0.1962) | 0.9364 (0.5522) | 0.6973 (0.0660)
LASSO  | 0.4469 (0.1449) | 1.3959 (0.4807) | 0.7231 (0.3199) | 0.6462 (0.2530) | 1.2599 (0.6710) | 1.1223 (0.1315)
ALASSO | 0.2803 (0.1463) | 0.9267 (0.4087) | 0.4601 (0.2306) | 0.4513 (0.2290) | 0.9204 (0.8101) | 0.6814 (0.0987)

† In parentheses are the standard deviations of the MMADs, obtained by 500 bootstrap resamplings. The bold numbers correspond to the smallest MMAD in each category.

The simulations show that, in terms of the MMAD, our new regularized method performs better than the other four methods in general, especially for the non-normal error distributions.

LAND RENT DATA

In this section, we demonstrate the performance of BAT.L together with the other methods on a real data set. Weisberg [26] reported a land rent data set. The data was collected by Douglas Tiffany to study the variation in rent paid in 1977 for agricultural land planted with alfalfa. The variables are the average rent per acre planted to alfalfa (𝑦), the average rent paid for all tillable land (𝑥1), the density of dairy cows (number per square mile) (𝑥2), the proportion of farmland used as pasture (𝑥3), and 𝑥4 = 1 if liming is required to grow alfalfa and 𝑥4 = 0 otherwise. The unit of analysis is a county in Minnesota; the 67 counties with appreciable rented farmland are included. The response variable alfalfa, which has mean 42.1661 and standard deviation 22.5866, is a high-protein crop that is suitable feed for dairy cows. It is thought that rent for land planted with alfalfa, relative to rent for other agricultural purposes, would be higher in areas with a high density of dairy cows, and that rents would be lower in counties where liming is required, since that would mean additional expense. We first center all the variables so that the intercept is not considered, standardize the predictors, and then use the above five regularized methods to fit the whole data set. For the three Bayesian methods, we use the Bayesian credible interval to guide variable selection. We found that the five different methods all select 𝑥1 and 𝑥2 as important variables. Furthermore, in order to evaluate the performance of BAT.L together with the other methods, we randomly select 40 samples as a training set and the remaining 27 samples as a test set; the mean square errors on the test set are reported in Table 2 for the five regularized methods.

Table 2: MSE of five different regularized methods for the land rent data

Method | MSE
BAT.L  | 94.0764
LASSO  | 114.1111
ALASSO | 104.3413
BLS.L  | 109.9957
BALS.L | 104.8580

As we can see from Table 2, BAT.L outperforms the other variable selection procedures on this data set.

CONCLUSION REMARKS In this paper, we studied the regularized t regression model based on Bayesian method. This new method extended the Bayesian Lasso [11] through replacing the least square loss function by the log-likelihood function of t distribution and the Lasso penalty by adaptive Lasso penalty. Bayesian hierarchical models are developed and Gibbs samplers are derived for our new method. Both the simulation studies and the real data analysis show that BAT.L performs better than other existing Bayesian and non-Bayesian methods when the error distribution has heavy tails and/or outliers. We


also discussed the Bayesian 𝑡 regression model with the adaptive group Lasso penalty. There is room to improve our methods. In this paper, we treat the dispersion parameter 𝜎 in the 𝑡 regression model as a constant. In fact, the dispersion parameter 𝜎 may be different for different observations, which requires us to consider the problem of regularized 𝑡 regression with a varying dispersion model; that is,

(23)

where z𝑖's are covariates, which constitute, in general, although not necessarily, a subset of the x𝑖's. For the varying dispersion 𝑡 regression model, it is important to consider how to simultaneously perform variable selection for both regression coefficients 𝛽 and 𝛾 based on Bayesian or non-Bayesian methods. Furthermore, it would also be interesting to consider the problem of variable selection for varying dispersion 𝑡 regression models with high-dimensional covariates. Research in this aspect is ongoing.

APPENDICES

A. Bayesian t Regression with Adaptive Group Lasso Penalty

In the Appendices, we consider regression problems where some explanatory variables may be represented by a group of derived input variables. In this case, the selection of important variables corresponds to the group of variables. The group Lasso penalty [13, 25] takes the group structure into account and can do variable selection at the group level. Suppose that the predictors are grouped into 𝐺 groups and 𝛽𝑔 is the coefficient vector of the 𝑔th group predictors x𝑖𝑔. Then $x_i^{T}\beta = \sum_{g=1}^{G} x_{ig}^{T}\beta_g$. Let 𝑑𝑔 be the dimension of the vector 𝛽𝑔 and let K𝑔 be a known 𝑑𝑔 × 𝑑𝑔 positive definite matrix (𝑔 = 1, . . . , 𝐺). Define $\|\beta_g\|_{K_g} = (\beta_g^{T} K_g \beta_g)^{1/2}$. We consider the following adaptive group Lasso 𝑡 regression:

(A.1)

Pick the prior of 𝛽𝑔 as the Laplace prior:

(A.2)

where $C_g = 2^{-(d_g+1)/2}(2\pi)^{-(d_g-1)/2}/\Gamma((d_g+1)/2)$. By the equality (12), we have

(A.3) If we further put gamma priors on the parameter , then we have the following Bayesian hierarchical model:


(A.4)

B. The Gibbs Sampler for Bayesian Adaptive Group Lasso t Regression

Based on the Bayesian hierarchical model (A.4), the posterior distribution of all parameters can be given by

(B.1)

By the expression (B.1), we can obtain the posterior full conditional distribution for all parameters, which can yield a tractable and efficient Gibbs sampler. It is easy to see that the full conditional distribution of 𝑢𝑖 is the same as in the adaptive Lasso 𝑡 regression. The full conditional distribution of 𝑠𝑔 is

(B.2)

Thus, the full conditional distribution of 𝑠𝑔 is a generalized inverse Gaussian distribution. The full conditional distribution of 𝛽𝑔 is given by

where

(B.3)

Let  . Then the full conditional distribution of 𝛽𝑔 is just the normal distribution  . The full conditional distribution of  is

(B.4)

That is, the full conditional distribution of  is a gamma distribution.

The full conditional distribution of 𝜙 is

(B.5) At last, the full conditional distributions of 𝑎 and 𝑏 are


(B.6)

Again, the full conditional distribution of 𝑏 is a gamma distribution, and we can use the adaptive rejection sampling algorithm [23] to sample 𝑎.
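The hierarchy in Appendix A rests on the scale-mixture representation of the Laplace prior (the equality (12), due to Andrews and Mallows [21] and also used by the Bayesian Lasso [11]): a normal variable whose variance is exponentially distributed is marginally Laplace. The snippet below is a Monte Carlo check of that identity only; it is not an implementation of the Gibbs sampler, whose full conditionals (B.1)–(B.6) are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0          # Laplace rate: target density (lam/2) * exp(-lam * |beta|)
n = 200_000

# Hierarchical draw: tau2 ~ Exp(rate = lam^2 / 2), then beta | tau2 ~ N(0, tau2).
tau2 = rng.exponential(scale=2.0 / lam**2, size=n)
beta = rng.normal(0.0, np.sqrt(tau2))

# Direct Laplace draw with the same rate, for comparison.
beta_direct = rng.laplace(loc=0.0, scale=1.0 / lam, size=n)

# The quantiles of the two samples should agree up to Monte Carlo error.
qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(beta, qs).round(3))
print(np.quantile(beta_direct, qs).round(3))
```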

ACKNOWLEDGMENT

The research was supported in part by the Research Project of Social Science and Humanity Fund of the Ministry of Education (14YJC910007).


REFERENCES

1. R. Tibshirani, "Regression shrinkage and selection via the LASSO," Journal of the Royal Statistical Society, Series B: Methodological, vol. 58, no. 1, pp. 267–288, 1996.
2. W. J. Fu, "Penalized regressions: the bridge versus the LASSO," Journal of Computational and Graphical Statistics, vol. 7, no. 3, pp. 397–416, 1998.
3. K. Knight and W. Fu, "Asymptotics for lasso-type estimators," The Annals of Statistics, vol. 28, no. 5, pp. 1356–1378, 2000.
4. J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its oracle properties," Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.
5. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," The Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.
6. Y. Li and J. Zhu, "L1-norm quantile regression," Journal of Computational and Graphical Statistics, vol. 17, no. 1, pp. 163–185, 2008.
7. H. Wang, G. Li, and G. Jiang, "Robust regression shrinkage and consistent variable selection through the LAD-Lasso," Journal of Business and Economic Statistics, vol. 25, no. 3, pp. 347–355, 2007.
8. Y. Wu and Y. Liu, "Variable selection in quantile regression," Statistica Sinica, vol. 19, no. 2, pp. 801–817, 2009.
9. H. Zou and M. Yuan, "Composite quantile regression and the oracle model selection theory," The Annals of Statistics, vol. 36, no. 3, pp. 1108–1126, 2008.
10. X. Chen, Z. J. Wang, and M. J. McKeown, "Asymptotic analysis of robust LASSOs in the presence of noise with large variance," IEEE Transactions on Information Theory, vol. 56, no. 10, pp. 5131–5149, 2010.
11. T. Park and G. Casella, "The Bayesian lasso," Journal of the American Statistical Association, vol. 103, no. 482, pp. 681–686, 2008.
12. C. Hans, "Bayesian lasso regression," Biometrika, vol. 96, no. 4, pp. 835–845, 2009.
13. Q. Li, R. Xi, and N. Lin, "Bayesian regularized quantile regression," Bayesian Analysis, vol. 5, no. 3, pp. 533–556, 2010.
14. M. Kyung, J. Gill, M. Ghosh, and G. Casella, "Penalized regression, standard errors, and Bayesian lassos," Bayesian Analysis, vol. 5, no. 2, pp. 369–411, 2010.
15. K. L. Lange, R. J. Little, and J. M. Taylor, "Robust statistical modeling using the t distribution," Journal of the American Statistical Association, vol. 84, no. 408, pp. 881–896, 1989.
16. C. Liu and D. Rubin, "ML estimation of the t distribution using EM and its extensions, ECM and ECME," Statistica Sinica, vol. 5, no. 1, pp. 19–39, 1995.
17. J.-G. Lin, L.-X. Zhu, and F.-C. Xie, "Heteroscedasticity diagnostics for t linear regression models," Metrika, vol. 70, no. 1, pp. 59–77, 2009.
18. N. Städler, P. Bühlmann, and S. Van de Geer, "L1-penalization for mixture regression models," TEST, vol. 19, no. 2, pp. 209–256, 2010.
19. E. Lehmann, Theory of Point Estimation, Wadsworth and Brooks/Cole, Pacific Grove, Calif, USA, 1983.
20. H. Zou, "The adaptive lasso and its oracle properties," Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418–1429, 2006.
21. D. F. Andrews and C. L. Mallows, "Scale mixtures of normal distributions," Journal of the Royal Statistical Society Series B: Methodological, vol. 36, pp. 99–102, 1974.
22. C. Leng, M. Tran, and D. Nott, "Bayesian adaptive lasso," http://arxiv.org/abs/1009.2300.
23. W. R. Gilks and P. Wild, "Adaptive rejection sampling for Gibbs sampling," Journal of the Royal Statistical Society Series C: Applied Statistics, vol. 41, no. 2, pp. 337–348, 1992.
24. M. Kyung, J. Gill, M. Ghosh, and G. Casella, "Penalized regression, standard errors, and Bayesian lassos," Bayesian Analysis, vol. 5, no. 2, pp. 369–411, 2010.
25. M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society Series B, vol. 68, no. 1, pp. 49–67, 2006.
26. S. Weisberg, Applied Linear Regression, Wiley, New York, NY, USA, 1985.

CHAPTER 3

Robust Quadratic Regression and Its Application to Energy-Growth Consumption Problem

Yongzhi Wang1, Yuli Zhang2, Fuliang Zhang3, and Jining Yi3,4

1 College of Instrumentation & Electrical Engineering, Jilin University, Changchun 130061, China
2 Department of Automation, TNList, Tsinghua University, Beijing 100084, China
3 Development and Research Center of China Geological Survey, Beijing 100037, China
4 School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, China

ABSTRACT

We propose a robust quadratic regression model to handle the statistics inaccuracy. Unlike the traditional robust statistic approaches that mainly focus on eliminating the effect of outliers, the proposed model employs the recently developed robust optimization methodology and tries to minimize the worst-case residual errors. First, we give a solvable equivalent semidefinite programming for the robust least square model with ball uncertainty set. Then the result is generalized to robust models under 𝑙1- and 𝑙∞-norm criteria with general ellipsoid uncertainty sets. In addition, we establish a robust regression model for per capital GDP and energy consumption in the energy-growth problem under the conservation hypothesis. Finally, numerical experiments are carried out to verify the effectiveness of the proposed models and demonstrate the effect of the uncertainty perturbation on the robust models.

Citation (APA): Wang, Y., Zhang, Y., Zhang, F., & Yi, J. (2013). Robust quadratic regression and its application to energy-growth consumption problem. Mathematical Problems in Engineering, 2013. (10 pages). DOI: http://dx.doi.org/10.1155/2013/210510

Copyright: 2013 Yongzhi Wang et al. This is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

Traditional regression analysis is a useful tool to model the linear or nonlinear relationship between the observed data. In the simplest linear regression model, there is only one explanatory variable 𝑥 (the regressor) and one dependent variable 𝑦 (the regressand) that is assumed to be an affine function of 𝑥; it is further extended to a polynomial regression model where 𝑦 is an 𝑛th order polynomial of 𝑥. In this case, the corresponding multivariate regression model contains more than one explanatory variable. To make the regression models work well, there are several specific assumptions on the model and the observed data. Consider the following standard multivariate linear regression model:

$$Y = X\beta + \varepsilon, \qquad (1)$$

where 𝑋 and 𝑌 are given observed data and 𝜀 is a random error vector. Assuming that the random errors have zero mean and constant variance, they are independent of each other. Besides the assumption on the random errors, there is another important weak exogeneity assumption that the explanatory variables are known deterministic values. Under this assumption, one can arbitrarily transform their values and construct any complex function relationship between the regressors and the regressand. For example, in this case the polynomial regression is merely a linear regression with regressors $x, x^{2}, \ldots, x^{n}$.
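As a concrete illustration of this point (a generic sketch, not code from the paper), a cubic polynomial regression is obtained by ordinary least squares on the transformed regressors 1, x, x², x³:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1.0 - 0.5 * x + 0.3 * x**3 + rng.normal(scale=0.2, size=x.size)

# Build the design matrix [1, x, x^2, x^3] and solve a linear least squares problem.
X = np.vander(x, N=4, increasing=True)          # columns: 1, x, x^2, x^3
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print("estimated coefficients (beta_0..beta_3):", coef.round(3))
```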

Although this weak exogeneity assumption makes the linear regression model very powerful to fit the given data or predict the regressand for given known regressors, it may lead to overfitting or inconsistent estimations [1]. Actually this assumption may be quite unreasonable in some cases. For instance, in the process of collecting data, there is often unavoidable


observation noise that makes the observed data quite inaccurate. Furthermore, in statistics the incomplete sampling approach sometimes can only give an approximation of the real values. Research on regression models with imprecise data has been reported. One way to handle noisy observations is the measurement error model or errors-in-variables model, where it is assumed that there exist some unknown latent (or true) variables that follow the true functional relationship and the actual observations are affected by certain random noise [2]. Based on different assumptions about the random noise, there are a variety of regression models, such as the method of moments [3], which is based on the third- (or higher-) order joint cumulants of observable variables, and the Deming regression [4], which assumes that the ratio of the noise variances is known. A brief historical overview of linear regression with errors in variables can be found in [5]. In addition to the errors-in-variables models, motivated by robust optimization theory under uncertainty, studies on robust regression models have been reported. In such cases, the perturbations are deterministic and unknown but bounded. El Ghaoui and Lebret [6] study robust linear regression with bounded uncertainty sets under the least squares criterion. They utilize second-order cone programming (SOCP) and semidefinite programming to minimize the worst-case residual errors. Shivaswamy et al. [7] propose SOCP formulations for robust linear classification and regression models when the first two moments of the uncertain data are computable. Ben-Tal et al. [8] provide an excellent framework for robust linear classification and regression. Based on general assumptions on the uncertainty sets, they provide explicit equivalent formulations for robust least squares, 𝑙1, 𝑙∞, and Huber penalty regressions. For more results regarding robust classification and regression that are similar to this work, we refer the readers to [9–11]. Also, according to [8], the traditional robust statistic approaches (see [12]), which try to reject the outliers in the data, differ from the point of view of this paper, as the authors here intend to minimize the maximal (worst-case) residual errors. However, in order to overcome this conflict, a two-step approach can be easily implemented. First, the outliers are identified and the related data are removed. Then, our proposed method is applied in order to safely eliminate the effect generated from imprecise data. We employ this approach in the real energy-growth regression problem in Section 3. Besides the regression models, there are a wide variety of forecasting models, such as support vector machines, decision trees, neural networks, and Bayes classifiers. For example, [13] utilizes a support vector machine based on a trend-based segmentation method for financial time series forecasting.


The proposed models have been tested by using various stocks from the American stock market with different trends. [14] proposes a new adaptive local linear prediction method to reduce the parameter uncertainties in the prediction of a chaotic time series. Real hydrological time series are used to validate the effectiveness of the proposed methods. More related literature can be found in [15] (chaotic time series analysis), [16] (fractal time series), and [17] (knowledge-based Green's kernel for support vector regression). Compared with these models, we focus on the handling of the statistics inaccuracy. The regression model is an appropriate basis to develop effective and tractable robust models. In this paper, we try to extend the robust linear regression model to general multivariate quadratic regression and provide equivalent tractable formulations. Different from the simple extension from the classical linear model to classical polynomial (even general nonlinear) models under the weak exogeneity assumption, the perturbation of explanatory variables in the quadratic terms will affect the model in a complex nonlinear manner. Although [8, 12] have discussed the robust polynomial interpolation problem, only an upper bound and the corresponding suboptimal coefficients are given. They further conjecture that the proposed problem cannot be solved exactly in polynomial time. Our proposed robust multivariate quadratic regression model in this paper also needs to solve a complex biquadratic min-max optimization problem. However, under certain assumptions on the uncertainty sets, we can obtain a series of equivalent semidefinite programming formulations for robust quadratic regression under different residual error criteria. In particular, we first extend the traditional quadratic regression model by introducing the separable ball (2-norm) uncertainty set and formulate the optimal robust regression problem as a min-max problem that tries to minimize the maximal residual error. By utilizing the S-lemma [18] and the Schur complement lemma, we provide an equivalent semidefinite programming formulation for the robust least square quadratic regression model with ball uncertainty set. This result is then generalized to models with general ellipsoid uncertainty sets and under the 𝑙1-, 𝑙∞-norm criteria. Furthermore, the robust quadratic regression models are applied to the economic growth and energy consumption regression problem. We take the per capital GDP as the explanatory variable and the per capital energy consumption as the dependent variable. Under the conservation hypothesis, we establish a corresponding robust model. Finally, we test the proposed model on different history data sets and compare our models with the classical regression models.


The paper proceeds as follows. In Section 2, we present a general robust quadratic regression model, give a solvable equivalent semi-definite programming for the robust least square quadratic regression model with ball uncertainty set, and further generalize the result. In Section 3, the proposed models are applied to the energy-growth problem. Numerical experiments are carried out in Section 4, and Section 5 concludes this paper and gives future research directions.

ROBUST QUADRATIC REGRESSION MODELS

General Robust Models

Consider the standard multivariate quadratic regression model:

$$y = x^{T}Qx + 2\alpha^{T}x + \beta, \qquad (2)$$

where 𝑥 ∈ 𝑅𝑛 denotes the 𝑛-dimensional explanatory data, 𝑦 ∈ 𝑅 denotes the dependent data, and 𝑄 ∈ 𝑅𝑛×𝑛, 𝛼 ∈ 𝑅𝑛, and 𝛽 ∈ 𝑅 are unknown coefficients that will be determined based on certain minimal criteria. Given a set of data 𝐷 = [𝑋; 𝑌𝑇] ∈ 𝑅(𝑛+1)×𝑚, where 𝑋 = [𝑥1, . . . , 𝑥𝑚] ∈ 𝑅𝑛×𝑚 and 𝑌 = [𝑦1; . . . ; 𝑦𝑚] ∈ 𝑅𝑚, we utilize the 𝑝-norm to measure the prediction error:

$$\left(\sum_{i=1}^{m}\left|y_i - x_i^{T}Qx_i - 2\alpha^{T}x_i - \beta\right|^{p}\right)^{1/p}. \qquad (3)$$

In traditional regression models, we assume that the explanatory data are precise and reliable. Based on this weak exogeneity assumption, the quadratic regression can be expressed as the following linear regression:

(4)

where  are the problem data and the linear operator ∘ for matrices 𝐴 and 𝐵 ∈ 𝑅𝑠×𝑙 is defined as $A \circ B = \sum_{i=1}^{s}\sum_{j=1}^{l} A_{ij}B_{ij}$. Therefore, we can easily solve the above linear regression model for 𝑝 = 1, 2 (the least square regression), and +∞. To relax the weak exogeneity assumption, we assume that the real data are contained in the following uncertainty set:


(5) To minimize the worst-case residual error, we establish the following robust quadratic regression model:

(6)

From the computational perspective, although the robust linear regression problem (where the coefficients 𝑄 are set to zero) with a large variety of uncertainty sets can be efficiently solved, the robust quadratic regression problems are much more difficult. Actually, for general uncertainty sets and the least square criterion, even the inner maximization problem, which has a convex biquadratic polynomial as the objective function and a general convex set as the feasible set, is in general not solvable in polynomial run time. Next we will introduce some meaningful uncertainty sets and provide the corresponding tractable equivalences.

Separable Ball Uncertainty Sets Model

In this subsection, we consider the following separable ball uncertainty set:

(7)

where

(8)

and 𝛿𝑖 ≥ 0. Thus the inner problem (IP) is of the following form (here we first consider the square of the original objective function):

(9) Note that for the inner problem, the separable uncertainty set and the summation form of the objective function allow us to decompose it into 𝑚 small scale subproblems with quadratic objective function and ball


constraints. The quadratic objective function and constraints motivate us to use the following S-lemma to obtain an equivalent solvable reformulation.

Lemma 1 (inhomogeneous version of the S-lemma [8]). Let 𝐴, 𝐵 be symmetric matrices of the same size, and let the quadratic form $x^{T}Ax + 2a^{T}x + \alpha$ be strictly positive at some point. Then the implication

$$x^{T}Ax + 2a^{T}x + \alpha \ge 0 \ \Longrightarrow\ x^{T}Bx + 2b^{T}x + \beta \ge 0 \qquad (10)$$

holds true if and only if

$$\exists\,\lambda \ge 0:\quad \begin{bmatrix} B - \lambda A & b - \lambda a \\ (b - \lambda a)^{T} & \beta - \lambda\alpha \end{bmatrix} \succeq 0. \qquad (11)$$

We can obtain the following equivalent semidefinite programming for the separable robust least square quadratic regression model.

Proposition 2. The robust least square quadratic regression model with separable uncertainty set 𝑈𝑠 is equivalent to the following semidefinite programming:

(12) where


(13) Proof. First consider the inner maximization subproblem. It is obvious that



(14)

If 𝛿𝑖 = 0, we have that

(15) where  If 𝛿𝑖 > 0, we can utilize the S-lemma as follows:

(16)

Note that in the last step, if , then there exists (Δ𝑦𝑖; Δ𝑥𝑖) = (0; 0) such that the quadratic form is strictly positive; thus the condition of the S-lemma holds true. Similarly, we have that



(17)

Thus the inner maximization problem is equivalent to the following semidefinite programming:


(18)

Note that based on the Schur complement lemma, the second-order cone constraint  can also be formalized as the following semidefinite constraint:

(19)

Thus we complete the proof by embedding the equivalent semi-definite programming into the outer problem. Due to the advances in interior-point algorithms for conic programming, the above semidefinite programming can be efficiently solved in polynomial run time. There are several efficient and free software packages for solving semidefinite programs, such as SDPT3 [19]. Next we make several extensions based on the separable robust least square quadratic regression model.
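Since the explicit LMI blocks of (12)–(13) are not reproduced above, the snippet below is only a generic illustration of the tooling: a small semidefinite program solved with CVXPY and the open-source SCS solver (SDPT3 [19] is the MATLAB package used by the authors).

```python
import cvxpy as cp
import numpy as np

# Toy SDP: minimize <C, X> subject to X positive semidefinite and trace(X) = 1.
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
X = cp.Variable((3, 3), symmetric=True)
constraints = [X >> 0, cp.trace(X) == 1]
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve(solver=cp.SCS)

print("optimal value:", round(prob.value, 4))   # equals the smallest eigenvalue of C
```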

Ellipsoid Uncertainty Set and More Norm Criteria

The above result on the standard ball uncertainty set can be further extended to the following general ellipsoid uncertainty set:

(20)

where 𝑃𝑖 ∈ 𝑅𝑘×(𝑛+1). The linear transformation operator 𝑃𝑖 allows us to impose more restrictions on the uncertainty set. For example, if we choose the diagonal matrix 𝑃𝑖 = Diag{𝜎1, . . . , 𝜎𝑛+1}, we can put different weights on the deviation of the components of (𝑥𝑖; 𝑦𝑖); a general matrix can further restrict the correlated deviation of different components. To obtain the corresponding reformulation, we only need to modify the first two constraints based on the S-lemma as follows:


(21) We further consider the robust quadratic regression models with 𝑙∞-norm and 𝑙1-norm criterion. Note that for 𝑙∞-norm criteria, the inner maximization problem is of the following form:

(22) And for 𝑙1-norm criteria, we have the following equivalent reformulation:

(23)

Using a similar approach as in Proposition 2, both can be further reformulated as semidefinite programs.

Proposition 3. The separable robust quadratic regression models under the 𝑙∞-norm and 𝑙1-norm criteria are equivalent to the following semidefinite programs, respectively:

(24)

where

(25)
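To make the three residual criteria concrete, the following sketch fits the nominal (non-robust) quadratic model EC = qG² + 2αG + β under the 𝑙1, 𝑙2, and 𝑙∞ norms with CVXPY. It illustrates only the choice of criterion on synthetic data; it is not the robust reformulation (24)–(25).

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
G = np.linspace(0.5, 2.0, 40)                      # hypothetical per-capita GDP ($10000)
EC = -4.0 * G**2 + 13.0 * G - 6.0 + rng.normal(scale=0.3, size=G.size)

# Design matrix for EC = q*G^2 + 2*alpha*G + beta, with theta = (q, alpha, beta).
A = np.column_stack([G**2, 2.0 * G, np.ones_like(G)])

def fit(norm):
    theta = cp.Variable(3)
    cp.Problem(cp.Minimize(cp.norm(A @ theta - EC, norm))).solve()
    return theta.value

for norm in (1, 2, "inf"):
    print(f"l{norm}-criterion coefficients (q, alpha, beta):", np.round(fit(norm), 3))
```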

ROBUST ENERGY-GROWTH REGRESSION MODELS

Studies have been reported on the causal relationship between economic growth and energy consumption. In this section, we try to apply the proposed robust quadratic regression model to the energy-growth problem. The seminal paper of J. Kraft and A. Kraft [20] first studies the causal relationship for the USA. In a recent survey, Ilhan [21] categorizes the causal relationships into four types: no causality, unidirectional causality running from economic growth to energy consumption, the reverse case, and bidirectional causality. Note that the resulting relationships depend on the selected data and analysis approaches. Sometimes the results obtained from


different approaches conflict with each other even when using data from the same country. For example, using the Toda-Yamamoto causality test method, Bowden and Payne [22] show that energy consumption plays an important role in economic growth in the USA based on history data from 1949 to 2006, while, using the same method, Soytas and Sari [23] find that no causality exists between them based on USA data from 1960 to 2006. On the other hand, based on the same USA data from 1947 to 1990, Cheng [24] and Stern [25] conclude different causalities by utilizing different analysis approaches. Unlike the previous energy-growth studies, we attempt to provide a long-run stationary regression model between the per capital GDP (G) and per capital energy consumption (EC). The underlying assumption of our model is similar to the traditional “conservation hypothesis,” which means that an increase in real GDP will cause an increase in energy consumption [21]. The “per capital” perspective provides us with a new insight on the causality and new regression models. Figures 1 and 2 demonstrate the relationship between per capital energy consumption and per capital GDP in the USA and Germany, respectively. From the subfigures on the left-hand side, we can see that in both countries there is a gradual increase in the economy while the per capital energy consumption may decrease after reaching a certain level; the subfigures on the right-hand side inspire us to establish a nonlinear regression model to characterize the relationship.


Figure 1: Germany data from 1960 to 2006.


Figure 2: USA data from 1870 to 2006.

To eliminate the effect of the imprecise statistics data, we employ the proposed robust quadratic regression model and put different weights on the residual errors at different time points. Specifically, we establish the following weighted robust quadratic regression model:

$$\min_{q,\alpha,\beta}\ \max_{(G_t,\mathrm{EC}_t)\in U_t^{\varepsilon},\ t=1,\ldots,T}\ \left(\sum_{t=1}^{T}\bigl(w_t\bigl|\mathrm{EC}_t - qG_t^{2} - 2\alpha G_t - \beta\bigr|\bigr)^{p}\right)^{1/p}, \qquad (26)$$

where the weight factor 𝑤𝑡 ∈ [0, 1] represents the relative importance of the predicted residual error in the 𝑡th year. We could set 𝑤𝑡 = 0 for an abnormal data point and set 𝑤𝑡 as an increasing function of 𝑡 to emphasize the importance of recent data. The uncertainty set is defined as

$$U_t^{\varepsilon} = \left\{\,(G_t,\mathrm{EC}_t)\ :\ \bigl\|\bigl(G_t-\bar G_t,\ \mathrm{EC}_t-\overline{\mathrm{EC}}_t\bigr)\bigr\|_2 \le \delta_t\,\right\}, \qquad (27)$$

where $\delta_t = \varepsilon\sqrt{\bar G_t^{2} + \overline{\mathrm{EC}}_t^{2}}$. The parameter 𝜀 controls the relative amplitude of the fluctuation in the observed data.

The weighted robust quadratic regression model can be summarized as follows.
(1) Solve the classical quadratic regression model using the nominal values $(\bar G_t, \overline{\mathrm{EC}}_t)_{t=1}^{T}$.
(2) Based on the quadratic regression, remove the data with the first 𝑘 largest residual errors and set the weight values 𝑤𝑡.
(3) Solve the equivalent semi-definite programming problem and return the final weighted robust quadratic regression model.
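A minimal sketch of steps (1)–(2) of this procedure, under assumed synthetic data: a nominal quadratic fit, removal of the 𝑘 largest residuals, and assignment of increasing weights 𝑤𝑡. Step (3), the weighted robust semidefinite program, is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 47                                             # e.g. annual data 1960-2006
G = np.linspace(0.6, 2.0, T)                       # nominal per-capita GDP ($10000)
EC = -4.0 * G**2 + 13.0 * G - 6.0 + rng.normal(scale=0.3, size=T)
EC[10] += 2.5                                      # one artificial outlier

# Step (1): classical quadratic regression on the nominal values.
A = np.column_stack([G**2, 2.0 * G, np.ones(T)])   # model EC = q G^2 + 2 alpha G + beta
q, alpha, beta = np.linalg.lstsq(A, EC, rcond=None)[0]
resid = np.abs(EC - A @ np.array([q, alpha, beta]))

# Step (2): drop the k = 3% largest residuals, then set weights w_t
# (here increasing in t to emphasize recent years; w_t = 0 is reserved
# for points that were removed).
k = max(1, int(round(0.03 * T)))
keep = np.argsort(resid)[:-k]
w = np.linspace(0.5, 1.0, T)[keep]

print("nominal fit (q, alpha, beta):", np.round([q, alpha, beta], 3))
print("removed indices:", np.sort(np.argsort(resid)[-k:]))
# Step (3) would solve the weighted robust SDP on (G[keep], EC[keep], w).
```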


NUMERICAL EXPERIMENTS

In this section, we verify the effectiveness of the proposed robust quadratic regression models on several data sets. The equivalent semi-definite programming problem is solved by the SDPT3 solver [19]. Numerical experiments are implemented using MATLAB 7.7.0 and run on an Intel(R) Core(TM)2 CPU E7400. First we test the proposed robust least square quadratic regression (LS-RQR) model with Germany data from 1960 to 2006. As previously discussed, after the preliminary quadratic regression analysis, we will remove the data with the first 𝑘 largest residual errors, where 𝑘 = 3% × data size. Then, for the rest of the data, we establish the classical least square quadratic regression (LS-CQR) and LS-RQR models, respectively.

Table 1 lists the computation results for LS-CQR and LS-RQR with a series of 𝜖 values. The listed Err value represents the mean square error from the nominal value, and 𝑇 represents the run time for solving the optimization problem.

Table 1: LS-CQR and LS-RQR models with different 𝜖.

Model   𝜖      𝑄        𝛼       𝛽        Err     𝑇 (s)
CQR     0.00   −4.254   6.721   −6.099   1.688   0.000
RQR     0.01   −3.899   6.225   −5.433   1.621   0.500
RQR     0.02   −3.690   5.938   −5.063   1.663   0.500
RQR     0.03   −3.423   5.561   −4.564   1.735   0.500
RQR     0.04   −2.900   4.755   −3.363   2.029   0.516
RQR     0.05   −2.243   3.719   −1.817   2.574   0.484

It is seen that the resulting robust model exhibits smaller absolute values of 𝑄, 𝛼, and 𝛽 with the increase of the 𝜖 value; that is, the regression curve becomes flatter as the model parameters are less precise. It is obvious that one drawback of the robust model is that the mean square error will increase as uncertainty increases. Figure 3 plots the regression curves for different models and also supports our analysis of the effect of increasing data uncertainty on robust regression.

Figure 3: LS-CQR and LS-RQR models on Germany data.

To demonstrate the effectiveness of the robust models, we test the worst-case performance of the resulting models when 𝜖 varies from 0 to 0.1. Specifically, for each 𝜖 value, we randomly generate 500 groups of data from the defined uncertainty set at each data point and then calculate the maximal residual error. Figure 4 plots the worst-case error of the LS-CQR model and the LS-RQR models with 𝜖 = 0.01, 0.03, and 0.05. It is seen that the error of the LS-CQR model increases rapidly, and LS-RQR with 𝜖 = 0.05 has the flattest error curve. Figure 4 also indicates that it is critical to accurately estimate the variability of the data and set a proper value for 𝜖. In our case, we recommend LS-RQR with 𝜖 = 0.03, which is almost always better than the traditional LS-CQR model.

Figure 4: Mean square error of LS-CQR and LS-RQR models when 𝜖 varies.

Next we test the proposed RQR models under the 𝑙1- (L1-RQR) and 𝑙∞- (LI-RQR) norm criteria on the same data set. Figure 5 plots the corresponding regression curves for the same uncertainty set 𝜖 = 0.02. For the same 𝜖 value, the LI-RQR model can be considered as the most robust one, and the L1-RQR and L2-RQR models are similar. It is noticeable that this contradicts the


traditional robust regression terminology. For example, [26] refers to the 𝑙1-norm regression as the robust regression model in the sense that the corresponding model is insensitive to large residual errors (corresponding to the outliers). However, after removing the possible abnormal data points, here we try to make our regression analysis insensitive to the worst-case residual errors at each data point.
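The worst-case comparison behind Figure 4 can be emulated with a simple sampler: for each nominal point, draw perturbations inside the ball of radius 𝛿𝑡 defined in (27) and record the largest residual of a fitted model. The sketch below uses synthetic data, a plain least-squares fit standing in for LS-CQR, and uniform sampling in the disk; the sampling distribution is an assumption, since the paper only says the points are generated randomly from the uncertainty set.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 47
Gbar = np.linspace(0.6, 2.0, T)
ECbar = -4.0 * Gbar**2 + 13.0 * Gbar - 6.0 + rng.normal(scale=0.2, size=T)

def quad(coef, g):
    q, alpha, beta = coef
    return q * g**2 + 2 * alpha * g + beta

def worst_case_error(coef, eps, n_samples=500):
    """Max residual over random draws from the ball ||(dG, dEC)||_2 <= delta_t,
    with delta_t = eps * sqrt(Gbar_t^2 + ECbar_t^2) as in (27)."""
    delta = eps * np.sqrt(Gbar**2 + ECbar**2)
    worst = 0.0
    for _ in range(n_samples):
        d = rng.normal(size=(T, 2))                       # random directions
        d *= (delta * rng.uniform(size=T) ** 0.5 / np.linalg.norm(d, axis=1))[:, None]
        G, EC = Gbar + d[:, 0], ECbar + d[:, 1]
        worst = max(worst, np.max(np.abs(EC - quad(coef, G))))
    return worst

# Nominal least-squares fit standing in for LS-CQR.
A = np.column_stack([Gbar**2, 2 * Gbar, np.ones(T)])
coef_cqr = np.linalg.lstsq(A, ECbar, rcond=None)[0]

for eps in (0.0, 0.02, 0.05, 0.1):
    print(f"eps={eps:.2f}  worst-case residual:", round(worst_case_error(coef_cqr, eps), 3))
```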

Figure 5: RQR models under 𝑙1-, 𝑙2-, and 𝑙∞-norm criteria.

Finally, we apply the proposed RQR model on more data sets, including USA data from 1870 to 2006, Switzerland data from 1965 to 2006, and Belgium data from 1960 to 2006. Figures 6, 7, and 8 give the resulting regression models and the worst-case residual errors for different 𝜖 values.

Figure 6: USA data from 1870 to 2006.


It is seen that the proposed RQR models still almost always outperform the CQR model, especially for large uncertainty sets. Based on the robust quadratic regression models, these three countries reach their highest per capital energy consumption at a per capital GDP value around $23,000, while the peak values vary from 5.7 to 8.5 tons.
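For a fitted concave quadratic EC = 𝑄G² + 2𝛼G + 𝛽 (with 𝑄 < 0), the peak location and peak value follow directly from the coefficients: G* = −𝛼/𝑄 and EC* = 𝛽 − 𝛼²/𝑄. A minimal check with hypothetical coefficients, not the fitted values reported in the chapter:

```python
def quadratic_peak(Q, alpha, beta):
    """Vertex of EC = Q*G^2 + 2*alpha*G + beta for Q < 0."""
    G_star = -alpha / Q
    EC_star = beta - alpha**2 / Q
    return G_star, EC_star

# Hypothetical coefficients (per-capita GDP in $10,000, energy in tons).
G_star, EC_star = quadratic_peak(Q=-4.0, alpha=6.5, beta=-6.0)
print(f"peak at GDP ~ ${G_star * 10000:,.0f}, energy ~ {EC_star:.1f} tons")
```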

Figure 7: Switzerland data from 1965 to 2006.


Figure 8: Belgium data from 1960 to 2006.

CONCLUSIONS AND FUTURE WORKS

In this paper, we studied the multivariate quadratic regression model with imprecise statistical data. Unlike the traditional robust statistic approaches that focus on the detection of the outliers and the elimination of their effects,


we employed the recently developed robust optimization framework and uncertainty set theory. In particular, we first extended the existing robust linear regression results to the robust least square quadratic regression model with the separable ball uncertainty set. The specific form of the uncertainty set allowed us to use the well-known S-lemma and give the tractable equivalent semidefinite programming. We further generalized the result to robust models under 𝑙1- and 𝑙∞-norm criteria with general ellipsoid uncertainty sets. Next, the proposed robust models were applied to the energy-growth problem. Under the classical conservation hypothesis, we employed the traditional quadratic regression model to remove the abnormal data and established a robust quadratic regression model for the per capital GDP and per capital energy consumption. Finally, the proposed models were tested on the history data of Germany, USA, Switzerland, and Belgium. From the numerical experiments, we found that (1) the amplitude of the uncertainty perturbation 𝛿 plays a critical role in the robust models; (2) with the increase of 𝛿, the robust model has a flatter curve; (3) for the same 𝛿 value, compared with the 𝑙1- and 𝑙2-norm models, the 𝑙∞-norm model is the most robust one; (4) as expected, the robust approach provides a series of robust regression models that can reduce the worst-case residual errors when the observed data contain noise. For further research, robust polynomial (nonlinear) regression models are interesting in their own right. Although we may always reduce them to the linear regression model with a polynomially (or nonlinearly) transformed uncertainty data set, it is still worth studying whether the resulting regression models are solvable for quadratic regression with coupled uncertainty sets.

ACKNOWLEDGMENT

This work was supported by the Geological Survey Project of China (nos. 1212010881801, 1212011120995).


REFERENCES

1. Z. Griliches and V. Ringstad, "Errors-in-the-variables bias in nonlinear contexts," Econometrica, vol. 38, no. 2, pp. 368–370, 1970.
2. W. A. Fuller, Measurement Error Models, John Wiley & Sons, New York, NY, USA, 1987.
3. T. Erickson and T. M. Whited, "Two-step GMM estimation of the errors-in-variables model using high-order moments," Econometric Theory, vol. 18, no. 3, pp. 776–799, 2002.
4. P. J. Cornbleet and N. Gochman, "Incorrect least-squares regression coefficients," Clinical Chemistry, vol. 25, no. 3, pp. 432–438, 1979.
5. J. W. Gillard, "An historical overview of linear regression with errors in both variables," Tech. Rep., Cardiff University School of Mathematics, Cardiff, UK, 2006.
6. L. El Ghaoui and H. Lebret, "Robust solutions to least-squares problems with uncertain data," SIAM Journal on Matrix Analysis and Applications, vol. 18, no. 4, pp. 1035–1064, 1997.
7. P. K. Shivaswamy, C. Bhattacharyya, and A. J. Smola, "Second order cone programming approaches for handling missing and uncertain data," Journal of Machine Learning Research, vol. 7, pp. 1283–1314, 2006.
8. A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
9. T. B. Trafalis and R. C. Gilbert, "Robust classification and regression using support vector machines," European Journal of Operational Research, vol. 173, no. 3, pp. 893–909, 2006.
10. H. Xu, C. Caramanis, and S. Mannor, "Robustness and regularization of support vector machines," Journal of Machine Learning Research, vol. 10, pp. 1485–1510, 2009.
11. T. B. Trafalis and R. C. Gilbert, "Robust support vector machines for classification and computational issues," Optimization Methods & Software, vol. 22, no. 1, pp. 187–198, 2007.
12. P. J. Huber, Robust Statistics, John Wiley & Sons, New York, NY, USA, 1981.
13. J. L. Wu and P. C. Chang, "A trend-based segmentation method and the support vector regression for financial time series forecasting," Mathematical Problems in Engineering, vol. 2012, Article ID 615152, 20 pages, 2012.
14. D. X. She and X. H. Yang, "A new adaptive local linear prediction method and its application in hydrological time series," Mathematical Problems in Engineering, vol. 2010, Article ID 205438, 15 pages, 2010.
15. Z. Liu, "Chaotic time series analysis," Mathematical Problems in Engineering, vol. 2010, Article ID 720190, 31 pages, 2010.
16. M. Li, "Fractal time series: a tutorial review," Mathematical Problems in Engineering, vol. 2010, Article ID 157264, 26 pages, 2010.
17. T. Farooq, A. Guergachi, and S. Krishnan, "Knowledge-based Green's kernel for support vector regression," Mathematical Problems in Engineering, vol. 2010, Article ID 378652, 16 pages, 2010.
18. I. Pólik and T. Terlaky, "A survey of the S-lemma," SIAM Review, vol. 49, no. 3, pp. 371–418, 2007.
19. K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3: a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://ecommons.library.cornell.edu/handle/1813/15133.
20. J. Kraft and A. Kraft, "On the relationship between energy and GNP," Journal of Energy and Development, vol. 3, no. 2, pp. 401–403, 1978.
21. O. Ilhan, "A literature survey on energy growth nexus," Energy Policy, vol. 38, pp. 340–349, 2010.
22. N. Bowden and J. E. Payne, "The causal relationship between US energy consumption and real output: a disaggregated analysis," Journal of Policy Modeling, vol. 31, no. 2, pp. 180–188, 2009.
23. U. Soytas and R. Sari, "Energy consumption, economic growth, and carbon emissions: challenges faced by an EU candidate member," Ecological Economics, vol. 68, no. 6, pp. 1667–1675, 2009.
24. B. Cheng, "An investigation of cointegration and causality between energy consumption and economic growth," Journal of Energy Development, vol. 21, no. 1, pp. 73–84, 1995.
25. D. I. Stern, "Energy and economic growth in the USA: a multivariate approach," Energy Economics, vol. 15, no. 2, pp. 137–150, 1993.
26. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004.

CHAPTER 4

Intuitionistic Fuzzy Weighted Linear Regression Model with Fuzzy Entropy under Linear Restrictions

Gaurav Kumar1 and Rakesh Kumar Bajaj2

1 Singhania University, Pacheri Bari, Jhunjhunu, Rajasthan 333515, India
2 Jaypee University of Information Technology, Waknaghat 173234, India

ABSTRACT

In fuzzy set theory, it is well known that a triangular fuzzy number can be uniquely determined through its position and entropies. In the present communication, we extend this concept to the triangular intuitionistic fuzzy number for its one-to-one correspondence with its position and entropies. Using the concept of fuzzy entropy, the estimators of the intuitionistic fuzzy regression coefficients have been obtained in the unrestricted regression model. An intuitionistic fuzzy weighted linear regression (IFWLR) model with some restrictions in the form of prior information has been considered. Further, the estimators of regression coefficients have been obtained with the help of fuzzy entropy for the restricted/unrestricted IFWLR model by assigning some weights in the distance function.

Citation (APA): Kumar, G., & Bajaj, R. K. (2014). Intuitionistic Fuzzy Weighted Linear Regression Model with Fuzzy Entropy under Linear Restrictions. International Scholarly Research Notices, 2014. (10 pages). DOI: http://dx.doi.org/10.1155/2014/358439

Copyright: 2014 Gaurav Kumar and Rakesh Kumar Bajaj. This is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

In statistical analysis, regression is used to explore the relationship between 𝑘 input variables x1, x2, . . . , x𝑘 (also known as independent variables or explanatory variables) and the output variable y (also called the dependent variable or response variable) from 𝑛 sets of observations. In linear regression, the method of least squares is applied to find the regression coefficients 𝛽𝑗, 𝑗 = 0, 1, . . . , 𝑘, which describe the contribution of the corresponding independent variable x𝑗 in explaining the dependent variable y. The aim of regression analysis is to estimate the parameters on the basis of available/observed empirical data. Traditional studies on regression assume the observations to have crisp values. In the crisp linear regression model, the parameters (regression coefficients) are crisp and appear in a linear form; that is,

(1)

Once the coefficients 𝛽0, 𝛽1, 𝛽2, . . . , 𝛽𝑘 are determined from the observed samples, the responses are estimated for any given set of x1, x2, . . . , x𝑘 values.

Fuzzy set theory, developed by Zadeh [1], has the capability to describe uncertain situations containing ambiguity and vagueness. It may be recalled that a fuzzy set A defined on a universe of discourse 𝑋 is characterized by a membership function 𝜇𝐴(𝑥) which takes values in the interval [0, 1] (i.e., 𝜇𝐴 : 𝑋 → [0, 1]). The value 𝜇𝐴(𝑥) represents the grade of membership of 𝑥 ∈ 𝑋 in 𝐴. This grade corresponds to the degree to which that element or individual is similar or compatible with the concept represented by the fuzzy set. Thus, the elements may belong to the fuzzy set to a greater or lesser degree as indicated by a larger or smaller membership grade. Tanaka et al. [2, 3] initiated the research in the area of linear regression analysis in a fuzzy environment, where a fuzzy linear system is used as a regression model. They consider a regression model in which the relations of the variables are subject to fuzziness, that is, the model with crisp input and fuzzy parameters. In general, fuzzy regression can be classified into two categories: (i) when the relations of the variables are subject to fuzziness, and (ii) when the variables themselves are fuzzy.


There exist several conceptual and methodological approaches to fuzzy regression with respect to the characterization mentioned above. Tanaka and Watada [4], Tanaka et al. [5], and Tanaka and Ishibuchi [6] considered more general models in fuzzy regression. In their approaches, Tanaka et al. considered L-R fuzzy data and minimized the index of fuzziness of the fuzzy linear regression model. As described by Tanaka and Watada [4], “A fuzzy number is a fuzzy subset of the real line whose highest membership values are clustered around a given real number called the mean value; the membership function is monotonic on both sides of this mean value.” Hence, a fuzzy number can be decomposed into position and fuzziness, where the position is represented by the element with the highest membership value and the fuzziness is represented by the membership function. The comparison among various fuzzy regression models and the differences between fuzzy regression analysis and conventional regression analysis have been presented by Redden and Woodall [7]. Chang and Lee [8] and Redden and Woodall [7] pointed out some weaknesses of the approaches proposed by Tanaka et al. A fuzzy linear regression model based on Tanaka’s approach, formulated as a fuzzy linear programming problem, was also introduced by Peters [9].

In fuzzy set theory, the entropy is a measure of the degree of fuzziness which expresses the average ambiguity/difficulty in deciding whether an element belongs to a set or not. The following four properties, introduced by de Luca and Termini [10], are widely accepted as criteria for defining any new fuzzy entropy measure 𝐻(⋅) of a fuzzy set 𝐴:

(i) P1 (sharpness): 𝐻(𝐴) is minimum if and only if 𝐴 is a crisp set; that is, 𝜇𝐴(𝑥) = 0 or 1 for all 𝑥;
(ii) P2 (maximality): 𝐻(𝐴) is maximum if and only if 𝜇𝐴(𝑥) = 0.5 for all 𝑥;
(iii) P3 (resolution): 𝐻(𝐴) ≥ 𝐻(𝐴∗), where 𝐴∗ is a sharpened version of 𝐴;
(iv) P4 (symmetry): 𝐻(𝐴) = 𝐻(𝐴̄), where 𝐴̄ is the complement of 𝐴; that is, 𝜇𝐴̄(𝑥) = 1 − 𝜇𝐴(𝑥).

Dubois and Prade [11, 12] interpreted the measure of fuzziness 𝐻(𝐴) as the quantity of information which is lost in going from a crisp number to a fuzzy number. It may be noted that the entropy of an element with membership grade 𝜇𝐴(𝑥) is increasing if 𝜇𝐴(𝑥) is in [0, 0.5] and decreasing if 𝜇𝐴(𝑥) is in [0.5, 1]. We accept the definition of a fuzzy number given by Tanaka and Watada [4], where the mean value is also called the apex.


Let 𝑋 = (𝑥1, 𝑥2, . . . , 𝑥𝑛) be a discrete random variable with probability distribution 𝑃 = (𝑝1, 𝑝2, . . . , 𝑝𝑛) in an experiment; then, according to Shannon [13], the information contained in this experiment is given by

$H(P) = -\sum_{i=1}^{n} p_i \log p_i$.  (2)

Based on this famous Shannon entropy, de Luca and Termini [10] introduced the following measure of fuzzy entropy:

$H(A) = -K \sum_{i=1}^{n} \big[ \mu_A(x_i)\ln \mu_A(x_i) + (1-\mu_A(x_i))\ln(1-\mu_A(x_i)) \big], \quad K > 0$.  (3)

Kumar et al. [14] studied the fuzzy linear regression (FLR) model with some restrictions in the form of prior information and obtained the estimators of the regression coefficients with the help of fuzzy entropy for the restricted FLR model. Here, we propose an intuitionistic fuzzy regression model whose general form in the triangular intuitionistic fuzzy setup is given by

(4)

where the value of the output variable ỹ defined by (4) is a triangular intuitionistic fuzzy number, the vector of intuitionistic fuzzy parameters has components which are triangular intuitionistic fuzzy numbers for 𝑗 = 0, 1, . . . , 𝑘, and the explanatory variables are triangular intuitionistic fuzzy variables.
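As a small computational illustration of the fuzzy entropy measure (3) (our own sketch, not code from the paper), the function below evaluates the de Luca and Termini entropy of a finite fuzzy set from its membership grades; the constant K and the sample grades are arbitrary.

```python
import numpy as np

def fuzzy_entropy(mu, K=1.0):
    """de Luca-Termini entropy of a finite fuzzy set with membership grades mu."""
    mu = np.asarray(mu, dtype=float)
    # The convention 0*log(0) = 0 is enforced by clipping away exact 0 and 1.
    m = np.clip(mu, 1e-12, 1.0 - 1e-12)
    return -K * np.sum(m * np.log(m) + (1.0 - m) * np.log(1.0 - m))

# A crisp set has (near) zero entropy; grades of 0.5 give the maximum entropy.
print(fuzzy_entropy([0.0, 1.0, 1.0]))   # approximately 0 (crisp set)
print(fuzzy_entropy([0.5, 0.5, 0.5]))   # maximal for three elements
```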

Intuitionistic Fuzzy Sets: Basic Definitions and Notations

It may be recalled that a fuzzy set 𝐴 in 𝑋, given by Zadeh [1], is as follows:

$A = \{\langle x, \mu_A(x)\rangle : x \in X\}$,  (5)

where 𝜇𝐴 : 𝑋 → [0, 1] is the membership function of the fuzzy set 𝐴 and 𝜇𝐴(𝑥) is the grade of belongingness of 𝑥 in 𝐴. Thus, in fuzzy set theory, the grade of nonbelongingness of an element 𝑥 in 𝐴 is equal to 1 − 𝜇𝐴(𝑥). However, while expressing the degree of membership of an element in a fuzzy set, the corresponding degree of nonmembership is not always equal to one minus the degree of belongingness; in real life, linguistic negation does not always coincide with logical negation. Therefore, Atanassov [15–18] suggested a generalization of the classical fuzzy set, called the intuitionistic fuzzy set (IFS).


Atanassov’s IFS 𝐴̃ on the universal set 𝑋 is defined as

$\tilde{A} = \{\langle x, \mu_{\tilde{A}}(x), \nu_{\tilde{A}}(x)\rangle : x \in X\}$,  (6)

where 𝜇𝐴̃, 𝜈𝐴̃ : 𝑋 → [0, 1] are the membership and nonmembership functions such that 0 ≤ 𝜇𝐴̃(𝑥) + 𝜈𝐴̃(𝑥) ≤ 1 for all 𝑥 ∈ 𝑋. The numbers 𝜇𝐴̃(𝑥) and 𝜈𝐴̃(𝑥) denote the degrees of membership and nonmembership of an element 𝑥 ∈ 𝑋 in the set 𝐴̃ ⊂ 𝑋, respectively. For each element 𝑥 ∈ 𝑋, the amount 𝜋𝐴̃(𝑥) = 1 − 𝜇𝐴̃(𝑥) − 𝜈𝐴̃(𝑥) is called the degree of indeterminacy (hesitation part); it is the degree of uncertainty whether 𝑥 belongs to 𝐴̃ or not.

Intuitionistic Fuzzy Numbers (IFNs)

In the literature, Burillo and Bustince [19], Lee [20], Liu and Shi [21], and Grzegorzewski [22] proposed various notions of intuitionistic fuzzy numbers. In this section, the notion of IFNs is presented taking these works into account.

Definition 1. An intuitionistic fuzzy subset 𝐴̃ = {⟨𝑥, 𝜇𝐴̃(𝑥), 𝜈𝐴̃(𝑥)⟩ : 𝑥 ∈ 𝑋} of the real line is called an intuitionistic fuzzy number (IFN) if the following axioms hold:

(i) 𝐴̃ is normal; that is, there exists 𝑚 ∈ ℝ (sometimes called the mean value of 𝐴̃) such that 𝜇𝐴̃(𝑚) = 1 and 𝜈𝐴̃(𝑚) = 0;
(ii) the membership function 𝜇𝐴̃ is fuzzy-convex; that is,

$\mu_{\tilde{A}}(\lambda x_1 + (1-\lambda)x_2) \ge \min\{\mu_{\tilde{A}}(x_1), \mu_{\tilde{A}}(x_2)\}, \quad x_1, x_2 \in \mathbb{R},\ \lambda \in [0,1]$;  (7)

(iii) the nonmembership function 𝜈𝐴̃ is fuzzy-concave; that is,

$\nu_{\tilde{A}}(\lambda x_1 + (1-\lambda)x_2) \le \max\{\nu_{\tilde{A}}(x_1), \nu_{\tilde{A}}(x_2)\}, \quad x_1, x_2 \in \mathbb{R},\ \lambda \in [0,1]$;  (8)

(iv) the membership and nonmembership functions of 𝐴̃, satisfying the conditions 0 ≤ 𝑓1(𝑥) + 𝑔1(𝑥) ≤ 1 and 0 ≤ 𝑓2(𝑥) + 𝑔2(𝑥) ≤ 1, have the following form:

$\mu_{\tilde{A}}(x) = \begin{cases} f_1(x), & m-\alpha \le x \le m, \\ f_2(x), & m \le x \le m+\beta, \\ 0, & \text{otherwise}, \end{cases}$  (9)

where the functions 𝑓1(𝑥) and 𝑓2(𝑥) are strictly increasing and strictly decreasing in [𝑚 − 𝛼, 𝑚] and [𝑚, 𝑚 + 𝛽], respectively, and

$\nu_{\tilde{A}}(x) = \begin{cases} g_1(x), & m-\alpha' \le x \le m, \\ g_2(x), & m \le x \le m+\beta', \\ 1, & \text{otherwise}, \end{cases}$  (10)

where the functions 𝑔1(𝑥) and 𝑔2(𝑥) are strictly decreasing and strictly increasing in [𝑚 − 𝛼′, 𝑚] and [𝑚, 𝑚 + 𝛽′], respectively. Here 𝛼 and 𝛽 are called the left and right spreads of the membership function 𝜇𝐴̃, and 𝛼′ and 𝛽′ are called the left and right spreads of the nonmembership function 𝜈𝐴̃. Symbolically, an intuitionistic fuzzy number is represented as 𝐴̃IFN = (𝑚; 𝛼, 𝛽; 𝛼′, 𝛽′).

Definition 2. An IFN 𝐴̃IFN = (𝑚; 𝛼, 𝛽; 𝛼′, 𝛽′) is called a triangular intuitionistic fuzzy number (TIFN) if and only if its membership and nonmembership functions take the following form:



$\mu_{\tilde{A}}(x) = \begin{cases} (x-m+\alpha)/\alpha, & m-\alpha \le x \le m, \\ (m+\beta-x)/\beta, & m \le x \le m+\beta, \\ 0, & \text{otherwise}, \end{cases}$  (11)

$\nu_{\tilde{A}}(x) = \begin{cases} (m-x)/\alpha', & m-\alpha' \le x \le m, \\ (x-m)/\beta', & m \le x \le m+\beta', \\ 1, & \text{otherwise}. \end{cases}$  (12)

It may be noted that a TIFN 𝐴̃ = (𝑚; 𝛼, 𝛽; 𝛼′, 𝛽′) degenerates to a triangular fuzzy number 𝐴̃ = (𝑚; 𝛼, 𝛽) if 𝛼 = 𝛼′, 𝛽 = 𝛽′, and 𝜈𝐴̃(𝑥) = 1 − 𝜇𝐴̃(𝑥) for all 𝑥 ∈ ℝ. Further, a TIFN 𝐴̃ = {⟨𝑥, 𝜇𝐴̃(𝑥), 𝜈𝐴̃(𝑥)⟩ : 𝑥 ∈ ℝ}, that is, 𝐴̃ = (𝑚; 𝛼, 𝛽; 𝛼′, 𝛽′), is a conjunction of two fuzzy numbers: 𝐴+ = (𝑚; 𝛼, 𝛽) with membership function 𝜇𝐴+(𝑥) = 𝜇𝐴̃(𝑥) and 𝐴− = (𝑚; 𝛼′, 𝛽′) with membership function 𝜇𝐴−(𝑥) = 1 − 𝜈𝐴̃(𝑥). The entropy calculated using (3) from the membership function of the TIFN given by (11) can be expressed as follows:


$H(\tilde{A}) = H_L(\tilde{A}) + H_R(\tilde{A})$,  (13)

where 𝐻𝐿(𝐴̃) = 𝐾𝛼/2 and 𝐻𝑅(𝐴̃) = 𝐾𝛽/2. It follows that 𝐻(𝐴̃) = 𝐾(𝛼 + 𝛽)/2, which does not depend on 𝑚. It may be observed that, in the case of a symmetric TIFN, the left and right entropies are identical; for a nonsymmetric TIFN, the left entropy is a function of 𝛼 and the right entropy is a function of 𝛽. Similarly, the left and right entropies computed from the nonmembership function (which we call the left to left and right to right entropies) of the TIFN are functions of 𝛼′ and 𝛽′, respectively. Hence, a triangular intuitionistic fuzzy number can be characterized by five attributes: the position parameter 𝑚, the left entropy (a function of 𝛼), the right entropy (a function of 𝛽), the left to left entropy (a function of 𝛼′), and the right to right entropy (a function of 𝛽′). There is a one-to-one correspondence between a triangular intuitionistic fuzzy number and its position and entropies: given a TIFN, one can determine its unique position and entropies, and conversely, given a position and entropies, one can construct a unique TIFN.

Sometimes the experimenter’s past experience may be available as prior information about the unknown regression coefficients and can be used to obtain more efficient estimators. Here, we assume that such prior information is provided in the form of exact linear restrictions on the regression coefficients. In the present work, we first find the unrestricted estimators of the regression coefficients with the help of fuzzy entropy. Next, we introduce the restricted intuitionistic fuzzy linear regression model with fuzzy entropy. Further, the restricted estimators of the regression coefficients are obtained by incorporating the prior information in the form of linear restrictions.
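To make the five-attribute characterization concrete, the sketch below (our own illustration based on the triangular forms (11) and (12)) stores a TIFN as (m; α, β; α′, β′), evaluates its membership and nonmembership functions, and returns its left/right entropies H_L = Kα/2 and H_R = Kβ/2; the left to left and right to right entropies Kα′/2 and Kβ′/2 are assumed here by analogy.

```python
from dataclasses import dataclass

@dataclass
class TIFN:
    """Triangular intuitionistic fuzzy number (m; alpha, beta; alpha_p, beta_p)."""
    m: float        # position (apex)
    alpha: float    # left spread of the membership function
    beta: float     # right spread of the membership function
    alpha_p: float  # left spread of the nonmembership function
    beta_p: float   # right spread of the nonmembership function

    def mu(self, x):
        # Triangular membership: rises on [m-alpha, m], falls on [m, m+beta].
        if self.m - self.alpha <= x <= self.m:
            return (x - (self.m - self.alpha)) / self.alpha
        if self.m < x <= self.m + self.beta:
            return (self.m + self.beta - x) / self.beta
        return 0.0

    def nu(self, x):
        # Triangular nonmembership: falls on [m-alpha_p, m], rises on [m, m+beta_p].
        if self.m - self.alpha_p <= x <= self.m:
            return (self.m - x) / self.alpha_p
        if self.m < x <= self.m + self.beta_p:
            return (x - self.m) / self.beta_p
        return 1.0

    def entropies(self, K=1.0):
        # Left/right entropies from the membership function and (by analogy)
        # the left to left / right to right entropies from the nonmembership function.
        return {"H_L": K * self.alpha / 2, "H_R": K * self.beta / 2,
                "H_LL": K * self.alpha_p / 2, "H_RR": K * self.beta_p / 2}

A = TIFN(m=5.0, alpha=1.0, beta=2.0, alpha_p=1.5, beta_p=2.5)
print(A.mu(5.5), A.nu(5.5), A.entropies())
```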

RESTRICTED IFWLR MODEL WITH FUZZY ENTROPY

Without loss of generality, suppose that all observations in the regression analysis are triangular intuitionistic fuzzy numbers. The notion of regression using fuzzy entropy


is to construct five conventional regression equations (one for the apex, one for the left entropy of the membership function, one for the right entropy of the membership function, one for the left entropy of the nonmembership function, and one for the right entropy of the nonmembership function) for the response variable ỹ using the corresponding attributes of the 𝑘 fuzzy explanatory variables x̃𝑗. In order to be specific, for ỹ and each x̃𝑗 we work with the corresponding vectors of apexes, left entropies, right entropies, left to left entropies, and right to right entropies, respectively. Therefore, the five fundamental regression equations in a nonrecursive (nonadaptive) setup may be written as



(14)

where the corresponding error terms are vectors of dimension 𝑛 × 1. The compact form of the above nonrecursive or nonadaptive equations is given by

(15)

where

(16)

In many real life situations where measurements are carried out (for example, car speed or astronomical distance), it is natural to think that the spread (vagueness) in the measure of a phenomenon is proportional to its intensity. D’Urso and Gastaldi [23] performed several simulations and observed that an adaptive (recursive) regression model and a nonadaptive (nonrecursive) regression model yield identical solutions when there is only one independent variable. However, if there is more than one independent variable, the estimated values of the left and right entropies obtained through the recursive fuzzy regression model have smaller variance than those from the nonrecursive fuzzy regression model. With this consideration, we rewrite the proposed intuitionistic fuzzy linear regression model (15) in a recursive/adaptive setup, where the dynamics of the entropies depend on the magnitude of the estimated apexes, as follows:



(17)

where X is the 𝑛 × (5𝑘 + 1) matrix containing the values of the input variables (the data matrix); 𝛽 is a (5𝑘 + 1) × 1 vector containing the regression coefficients for the apexes of the first model (referred to as the core regression model); ya and ya∗ are the vector of observed apexes and the vector of interpolated apexes, respectively, both of dimension 𝑛 × 1; the vectors of observed and interpolated left entropies, right entropies, left to left entropies, and right to right entropies likewise each have dimension 𝑛 × 1; 1 is an (𝑛 × 1) vector of ones; 𝑏 and 𝑑 are the regression parameters of the second regression model (referred to as the left entropy regression model); 𝑓 and 𝑔 are the regression parameters of the third regression model (the right entropy regression model); 𝑝 and 𝑞 are the regression parameters of the fourth regression model (the left to left entropy regression model); and 𝑢 and 𝑣 are the regression parameters of the fifth regression model (the right to right entropy regression model). The error term in the regression equation of the apexes remains the same, while the error terms in the regression equations of the entropies may differ; the error vectors for the left, right, left to left, and right to right entropies are each of dimension (𝑛 × 1).

If some prior information about the unknown regression coefficients is available on the basis of past experience, then it may be used to obtain more efficient estimators. We assume that such prior information is in the form of exact linear restrictions on the regression coefficients. In the present model, we associate such restrictions with the equations for the estimation of the regression coefficients in the intuitionistic fuzzy linear relations between the size of the entropies and the magnitude of the estimated apexes. Moreover, we assume that the regression coefficients 𝛽 are subject to 𝑗 (𝑗 < 5𝑘 + 1) exact linear restrictions, which are given by

$H\beta = h$,  (18)

where h and H are known and the matrix H is of full row rank.
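As an informal, simplified sketch of the adaptive structure of model (17) (not the authors' algorithm), the function below fits the core apex regression by weighted least squares and then regresses each observed entropy vector on the magnitude of the interpolated apexes; the paper's full recursive scheme iterates the coupled normal equations derived in the next section to a tolerance, which this one-pass illustration omits. All names are hypothetical.

```python
import numpy as np

def fit_adaptive_ifwlr_sketch(X, y_a, y_l, y_r, y_ll, y_rr, w=None):
    """One-pass sketch of the adaptive (recursive) structure of model (17).

    X    : design matrix for the apexes (first column of ones).
    y_a  : observed apexes of the response.
    y_l, y_r, y_ll, y_rr : observed left, right, left to left, right to right entropies.
    w    : optional observation weights.
    """
    n = len(y_a)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    W = np.diag(w)

    # Core regression model: weighted least squares for the apexes.
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_a)
    y_a_hat = X @ beta                         # interpolated apexes

    # Each entropy vector is regressed on the magnitude of the interpolated apexes.
    Z = np.column_stack([np.ones(n), np.abs(y_a_hat)])
    coefs = {}
    for name, target in [("left", y_l), ("right", y_r),
                         ("left_to_left", y_ll), ("right_to_right", y_rr)]:
        coefs[name], *_ = np.linalg.lstsq(Z, target, rcond=None)
    return beta, coefs
```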

ESTIMATION OF REGRESSION COEFFICIENTS

In many applications, the values of the variables may be on completely different scales of measurement, and variables with larger variation produce larger intersample differences and therefore dominate the calculation of Euclidean distances. Hence, some form of standardization is necessary to balance the individual contributions. Consider the weighted Euclidean distance between two triangular intuitionistic fuzzy numbers 𝐴̃ = (𝑚𝐴; 𝛼𝐴, 𝛽𝐴; 𝛼′𝐴, 𝛽′𝐴) and 𝐵̃ = (𝑚𝐵; 𝛼𝐵, 𝛽𝐵; 𝛼′𝐵, 𝛽′𝐵) with weights 𝑤1, 𝑤2, 𝑤3, 𝑤4, and 𝑤5:

$d^2(\tilde{A}, \tilde{B}) = w_1 (m_A - m_B)^2 + w_2 (\alpha_A - \alpha_B)^2 + w_3 (\beta_A - \beta_B)^2 + w_4 (\alpha'_A - \alpha'_B)^2 + w_5 (\beta'_A - \beta'_B)^2$.  (19)

It may be observed that we compute the usual squared differences between the values of the variables on their original scales, as in the ordinary Euclidean distance, but then multiply these squared differences by their corresponding weights. Next, as in common linear regression (based on crisp data), the regression parameters are estimated by minimizing the following sum of squared errors (in compact matrix notation):


(20)

Differentiating the objective function (20) partially with respect to 𝛽 and equating the derivative to zero, we get

(21)

Similarly, differentiating (20) partially with respect to 𝑏, 𝑑, 𝑓, 𝑔, 𝑝, 𝑞, 𝑢, and 𝑣, we get

(22)

(23)


(24)

(25) (26) (27) (28) (29)

respectively. Equations (21)–(29) are recursive solutions for the problem of least squares estimation with intuitionistic fuzzy data. Therefore, we rewrite the system of equations explicitly in a recursive way as follows:


(30)

In order to initiate the recursive process of obtaining the estimators, we take some initial values for 𝑏, 𝑑, 𝑓, 𝑔, 𝑝, 𝑞, 𝑢, 𝑣, and 𝛽. After a number of iterations, the values of the estimators converge to within a predefined error tolerance. We denote these values by 𝛽̂ in order to distinguish them from the restricted estimator 𝛽̃ obtained in the next computation. In a more general setup, if in the linear regression model (17) we consider 𝑘1 crisp and 𝑘2 intuitionistic fuzzy input variables, then the dimensions of X and 𝛽 become 𝑛 × (𝑘1 + 5𝑘2 + 1) and (𝑘1 + 5𝑘2 + 1) × 1, respectively. The core structure of the solution remains the same, and we obtain a similar kind of estimators.

Remark. If a TIFN degenerates to a triangular fuzzy number 𝐴 = (𝑚; 𝛼, 𝛽), then our nonsymmetric intuitionistic fuzzy weighted linear regression model reduces to the nonsymmetric fuzzy linear regression model defined by Kumar et al. [24].

Next, we assume that the regression coefficients are subject to the linear restrictions given by (18). It may be noted that the unrestricted estimator obtained in (21) does not satisfy the given restrictions (18). We aim to obtain the restricted estimator which satisfies the given restrictions under the regression model (17). For this, we propose to minimize the following score function:


(31)

where 2𝜆 is the vector of Lagrange multipliers. Differentiating the score function partially with respect to 𝛽 and equating the derivative to zero, we get

(32)

Here, we relabel the computed restricted estimator as 𝛽̃. Therefore, in view of (21) and (32), we get




(33)

Similarly, differentiating the score function partially with respect to 𝜆 and equating the derivative to zero, we get

(34)

From (33) and (34), we have

(35)

Also, differentiating (31) partially with respect to 𝑏, 𝑑, 𝑓, 𝑔, 𝑝, 𝑞, 𝑢, and 𝑣 and equating all to zero, we get




(36)



(37)


respectively. From (35) we see that 𝐻𝛽̃ = h.

Therefore, the estimator 𝛽̃ satisfies the given restrictions (18).
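Since (33)–(35) are not reproduced here, the following sketch illustrates the restricted estimator using the standard Lagrange-multiplier (restricted least squares) correction, which is an assumption about the exact closed form; it adjusts an unrestricted weighted estimator so that the restrictions Hβ = h in (18) hold exactly.

```python
import numpy as np

def restricted_estimator(X, y, H, h, w=None):
    """Sketch: adjust the unrestricted weighted LS estimator so that H beta = h holds.

    Uses the textbook restricted least squares formula
      beta_tilde = beta_hat - (X'WX)^{-1} H' [H (X'WX)^{-1} H']^{-1} (H beta_hat - h).
    """
    n = len(y)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    W = np.diag(w)

    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    beta_hat = XtWX_inv @ X.T @ W @ y          # unrestricted estimator
    correction = XtWX_inv @ H.T @ np.linalg.inv(H @ XtWX_inv @ H.T) @ (H @ beta_hat - h)
    beta_tilde = beta_hat - correction
    return beta_hat, beta_tilde

# By construction, H @ beta_tilde equals h (up to numerical precision),
# so the restricted estimator satisfies the given restrictions.
```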

NUMERICAL EXAMPLES

We consider the following numerical examples to illustrate the proposed model.

Example 1. We apply our procedure to estimate the intuitionistic fuzzy output value for data consisting of crisp input and intuitionistic fuzzy output (where the left entropy and the right entropy are equal); the data are tabulated in Table 1. We obtain the estimated regression coefficients, where the number of iterations required is 125.

Table 1: Crisp input-int. fuzzy output data

Example 2. We apply our procedure to estimate the intuitionistic fuzzy output value for data consisting of crisp input and intuitionistic fuzzy output (where the left and right entropies are not equal); the data are tabulated in Table 2. We obtain the estimated regression coefficients, where the number of iterations required is 113.

Table 2: Crisp input-int. fuzzy output data

Example 3. We apply our procedure to estimate the intuitionistic fuzzy output value for data consisting of crisp input, intuitionistic fuzzy input, and intuitionistic fuzzy output (where the left and right entropies are not equal); the data are tabulated in Table 3. We obtain the estimated regression coefficients, where the number of iterations required is 51.

Table 3: Crisp and int. fuzzy input-int. fuzzy output data


Example 4. We apply our procedure to estimate the intuitionistic fuzzy output value for data consisting of intuitionistic fuzzy input and intuitionistic fuzzy output (where the left and right entropies are not equal); the data are tabulated in Table 4. We obtain the estimated regression coefficients, where the number of iterations required is 255.

Table 4: Intuitionistic fuzzy input-intuitionistic fuzzy output data

CONCLUSIONS

An intuitionistic fuzzy weighted linear regression (IFWLR) model with and without linear restrictions in the form of prior information has been studied. The estimators of the regression coefficients have been obtained with the help of fuzzy entropy for the restricted/unrestricted IFWLR model by assigning weights in the distance function. It has been observed that the restricted estimator is better than the unrestricted estimator in some sense. Thus, whenever prior information is available in terms of exact linear restrictions on the regression coefficients, it is advisable to use the restricted estimator 𝛽̃ in place of the unrestricted estimator 𝛽̂.


REFERENCES

1. L. A. Zadeh, “Fuzzy sets,” Information and Computation, vol. 8, pp. 338–353, 1965.
2. H. Tanaka, S. Uejima, and K. Asai, “Fuzzy linear regression model,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 10, pp. 2933–2938, 1980.
3. H. Tanaka, S. Uejima, and K. Asai, “Linear regression analysis with fuzzy model,” IEEE Transactions on Systems, Man and Cybernetics, vol. 12, no. 6, pp. 903–907, 1982.
4. H. Tanaka and J. Watada, “Possibilistic linear systems and their application to the linear regression model,” Fuzzy Sets and Systems, vol. 27, no. 3, pp. 275–289, 1988.
5. H. Tanaka, I. Hayashi, and J. Watada, “Possibilistic linear regression analysis for fuzzy data,” European Journal of Operational Research, vol. 40, no. 3, pp. 389–396, 1989.
6. H. Tanaka and H. Ishibuchi, “Identification of possibilistic linear systems by quadratic membership functions of fuzzy parameters,” Fuzzy Sets and Systems, vol. 41, no. 2, pp. 145–160, 1991.
7. D. T. Redden and W. H. Woodall, “Properties of certain fuzzy linear regression methods,” Fuzzy Sets and Systems, vol. 64, no. 3, pp. 361–375, 1994.
8. P.-T. Chang and E. S. Lee, “Fuzzy linear regression with spreads unrestricted in sign,” Computers and Mathematics with Applications, vol. 28, no. 4, pp. 61–70, 1994.
9. G. Peters, “Fuzzy linear regression with fuzzy intervals,” Fuzzy Sets and Systems, vol. 63, no. 1, pp. 45–55, 1994.
10. A. de Luca and S. Termini, “A definition of a nonprobabilistic entropy in the setting of fuzzy sets theory,” Information and Control, vol. 20, no. 4, pp. 301–312, 1972.
11. D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, NY, USA, 1980.
12. D. Dubois and H. Prade, Fuzzy Sets and Statistical Possibility Theory, Plenum Press, New York, NY, USA, 1988.
13. C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, pp. 379–656, 1948.
14. T. Kumar, N. Gupta, and R. K. Bajaj, “Fuzzy entropy on restricted fuzzy linear regression model with cross validation and applications,” in Proceedings of the International Conference on Advances in Computing and Communications (ICACC ’12), pp. 5–8, August 2012.
15. K. T. Atanassov, “Intuitionistic fuzzy sets,” Fuzzy Sets and Systems, vol. 20, no. 1, pp. 87–96, 1986.
16. K. T. Atanassov, “More on intuitionistic fuzzy sets,” Fuzzy Sets and Systems, vol. 33, no. 1, pp. 37–45, 1989.
17. K. T. Atanassov, Intuitionistic Fuzzy Sets: Theory and Applications, vol. 35 of Studies in Fuzziness and Soft Computing, Physica, 1999.
18. K. T. Atanassov, “New operations defined over the intuitionistic fuzzy sets,” Fuzzy Sets and Systems, vol. 61, no. 2, pp. 137–142, 1994.
19. P. Burillo and H. Bustince, “Some definitions of intuitionistic fuzzy number,” in Proceedings of the 3rd Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT ’03), pp. 223–227, Zittau, Germany, September 2003.
20. F. Lee, Fuzzy Information Processing System, Peking University Press, Beijing, China, 1998.
21. H. Liu and K. Shi, “Intuitionistic fuzzy numbers and intuitionistic distribution numbers,” Journal of Fuzzy Mathematics, vol. 8, no. 4, pp. 909–918, 2000.
22. P. Grzegorzewski, “Distances and orderings in a family of intuitionistic fuzzy numbers,” in Proceedings of the 3rd Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT ’03), pp. 223–227, Zittau, Germany, September 2003.
23. P. D’Urso and T. Gastaldi, “A least-squares approach to fuzzy linear regression analysis,” Computational Statistics and Data Analysis, vol. 34, no. 4, pp. 427–440, 2000.
24. T. Kumar, R. K. Bajaj, and N. Gupta, “Fuzzy entropy in fuzzy weighted linear regression model under linear restrictions with simulation study,” International Journal of General Systems, vol. 43, no. 2, pp. 135–148, 2014.

CHAPTER 5

A New Method of Hypothesis Test for Truncated Spline Nonparametric Regression Influenced by Spatial Heterogeneity and Application

Sifriyani1 , I. N. Budiantara2 , S. H. Kartiko3 , and Gunardi3 Department of Mathematics, Faculty of Mathematics and Natural Sciences, Mulawarman University, Samarinda, Indonesia 1

Department of Statistics, Faculty of Mathematics, Computing and Data Sciences, Sepuluh Nopember Institute of Technology, Surabaya, Indonesia 2

Department of Mathematics, Faculty of Mathematics and Natural Sciences, Gadjah Mada University, Yogyakarta, Indonesia 3

ABSTRACT This study developed a new method of hypothesis testing of model conformity between truncated spline nonparametric regression influenced

Citation (APA): Budiantara, I. N., & Kartiko, S. H. (2018). A New Method of Hypothesis Test for Truncated Spline Nonparametric Regression Influenced by Spatial Heterogeneity and Application. In Abstract and Applied Analysis (Vol. 2018). Hindawi. (13 pages). Copyright: 2018 Sifriyani et al. This is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


by spatial heterogeneity and truncated spline nonparametric regression. This hypothesis test aims to determine the most appropriate model to use in the analysis of spatial data. The test statistic for the model conformity hypothesis test was constructed based on the likelihood ratio of the parameter set under H0, whose components consist of parameters not influenced by the geographical factor, and the set under the population, whose components consist of parameters influenced by the geographical factor. We proved the distribution of the test statistic V and verified that the numerator and the denominator of V are each chi-square distributed. Since there is a symmetric and idempotent matrix S, the distribution of the numerator could be proved directly. The matrix appearing in the denominator is positive semidefinite and contains a weighting matrix with different values at every location; therefore this matrix is not idempotent. Since this matrix is not idempotent and Ỹ is a normally distributed random vector, there exist constants such that the denominator is approximately chi-square distributed; it was therefore concluded that the test statistic V follows an F distribution. The modeling is implemented to find the factors that influence the unemployment rate in 38 areas of Java, Indonesia.

INTRODUCTION This study examines theoretically the multivariate nonparametric regression influenced by spatial heterogeneity with the truncated spline approach. The model is a development of truncated spline nonparametric regression that takes geographic or spatial factors into account. A truncated spline is a function constructed from polynomial components and truncated components, i.e., polynomial pieces with knot points, which can accommodate changing patterns in data behavior. The truncated spline approach is used to address spatial data modeling problems in which the relationship between the response variable and the predictor variables does not follow a specific pattern and changes over certain subintervals. In the model, the regression coefficients of the predictor variables depend on the location where the data are observed, because of differences in environmental and geographic characteristics between observation sites; therefore each observation has a different variance (spatial heterogeneity). Spatial data are a type of dependent data in which data at one location are influenced by data at other locations (spatial dependency).


This study determines the model conformity hypothesis test between multivariable nonparametric regression influenced by spatial heterogeneity with the truncated spline approach and multivariable nonparametric regression in general. This hypothesis test aims to determine the model that is most suitable for spatial data analysis. The test statistic was derived using the maximum likelihood ratio test (MLRT) method. The first step was to formulate the hypothesis to be tested and then to define the set of parameters under H0, whose components consist of parameters not influenced by geographical factors, and the set under the population, whose components consist of parameters influenced by geographical factors. The likelihood ratio was constructed from the maximum of the likelihood function under H0 as the numerator and the maximum under the population as the denominator. Based on the likelihood ratio, the test statistic V was obtained, and its distribution was then determined. To prove the distribution of the test statistic V, we first proved that the numerator and the denominator are each chi-square distributed. The purpose of this study is to obtain a new method for the hypothesis test of model conformity between multivariate nonparametric truncated spline regression influenced by spatial heterogeneity and multivariate nonparametric truncated spline regression in general.

TRUNCATED SPLINE NONPARAMETRIC REGRESSION INFLUENCED BY SPATIAL HETEROGENEITY

Truncated spline nonparametric regression influenced by spatial heterogeneity is a development of nonparametric regression for spatial data with parameter estimators that are local to each observation location. The truncated spline approach is used to solve spatial analysis problems in which the regression curve is unknown [1]. The regression model assumes normally distributed errors with mean zero and a location-specific variance. The location coordinates are an important factor in determining the weights used to estimate the parameters of the model. Given the data and the relationship between the response and the predictors, it is assumed to follow the multivariate nonparametric regression model as follows:


(1)

where the response variable is related to an unknown regression curve which is assumed to be additive and is approached with a truncated spline function. Mathematically, the relation between the response variable and the predictor variables at the i-th location for the multivariate nonparametric truncated spline regression model can be expressed as follows [2]:

(2)

with truncated function:

(3)

Equation (2) is a multivariate nonparametric truncated spline regression model of degree m for n areas. The components in (2) are described as follows: the response variable at the i-th location, where i = 1, 2, …, n; the p-th predictor variable at the i-th location, with p = 1, 2, …, l; the h-th knot point Kph in the p-th predictor variable component, with h = 1, 2, …, r; the polynomial component parameters of the multivariate nonparametric truncated spline regression; the k-th parameter of the p-th predictor variable at the i-th location; the truncated components of the multivariate nonparametric truncated spline regression; and the (l + h)-th parameter at the h-th knot point of the p-th predictor variable at the i-th location. The multivariate nonparametric truncated spline regression in (2) is written out as follows:

(4)

Equation (4) can also be expressed as follows:

(5)

Thus (5) can be expressed by (6), whose vector contains the truncated spline functions with geographical weighting; the response variable and the error are, respectively, given by the following vectors:

(7)

These vectors are of the corresponding dimensions. Meanwhile, the matrices X and P are, respectively, given by

(8)

The corresponding vectors are, respectively, given by

(9)

Matrix X contains the polynomial components of the predictor variables, and matrix P contains the truncated-function components of the predictor variables; the corresponding parameter vectors contain the polynomial parameters and the truncated-function parameters, respectively. The complete forms of the estimators are given in Theorem 1 and Corollary 2 [2].

Theorem 1. If the regression model (2) has normally distributed errors with zero mean and location-specific variance, then Maximum Likelihood Estimation (MLE) is used to obtain the estimator as follows.

(10)

where

(11)

Corollary 2. If the estimators are given by Theorem 1, then the estimator for the regression curve is given by

(12)

where

(13)

The estimator of the regression curve contains the polynomial components represented by matrix X and the truncated components represented by matrix P [3]. If P = 0, then the estimator of the multivariable nonparametric regression curve in the Geographically Weighted Regression (GWR) model with the truncated spline approach reduces to the estimator of the polynomial parametric regression curve in the GWR model. Furthermore, if P = 0 and matrix X contains a linear function, the estimator of the multivariable spline nonparametric regression curve in the GWR model reduces to the estimator of the linear parametric regression curve in the GWR model, or multiple linear regression in the GWR model, developed by many researchers such as Brunsdon, Fotheringham, and Charlton [4], Fotheringham, Brunsdon, and Charlton (2003), Demsar, Fotheringham, and Charlton [5], Li, Jiao, and Browder [6], Wu, Yang, Guo, and Han [7], and Benassi and Naccarato [8]. This study continues the previous research [2]; here, the test statistic to be used for truncated spline nonparametric regression influenced by spatial heterogeneity is derived, together with the distribution of the test statistic and the rejection region.
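To illustrate how such a model can be computed in practice, the sketch below (a hypothetical illustration, not the authors' code) builds a one-predictor truncated power basis with knots, in the spirit of the truncated function (3), and computes a location-wise weighted least squares estimator with Gaussian kernel spatial weights, following the idea of Theorem 1; the bandwidth, knots, and synthetic data are arbitrary assumptions.

```python
import numpy as np

def truncated_basis(x, knots, degree=1):
    """Columns: 1, x, ..., x^degree, (x - K_1)_+^degree, ..., (x - K_r)_+^degree."""
    x = np.asarray(x, dtype=float)
    cols = [x ** d for d in range(degree + 1)]
    cols += [np.maximum(x - K, 0.0) ** degree for K in knots]
    return np.column_stack(cols)

def gaussian_weights(coords, loc, bandwidth):
    """Spatial weights of all observations with respect to one location (u, v)."""
    d = np.linalg.norm(coords - loc, axis=1)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def fit_local(x, y, coords, loc, knots, degree=1, bandwidth=1.0):
    """Weighted least squares estimator at a single location."""
    B = truncated_basis(x, knots, degree)
    W = np.diag(gaussian_weights(coords, loc, bandwidth))
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return theta, B @ theta   # local coefficients and fitted curve

# Illustrative use: one predictor, two knots, 38 synthetic locations.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 38)
coords = rng.uniform(0, 5, size=(38, 2))
y = 1.0 + 0.5 * x + np.maximum(x - 5, 0) + rng.normal(0, 0.3, 38)
theta_i, yhat_i = fit_local(x, y, coords, coords[0], knots=[3.0, 5.0])
```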

METHOD

The hypothesis test for model conformity between multivariate nonparametric truncated spline regression influenced by spatial heterogeneity and nonparametric truncated spline regression is derived in the following steps.

Step 1. Formulating the hypothesized model.
Step 2. Defining the set of parameters under the population.
Step 3. Determining the estimators of the parameters in the space under the population.
Step 4. Obtaining the maximum likelihood function under the population.
Step 5. Defining the parameter space under H0, i.e., w.
Step 6. Determining the estimators of the parameters under H0.
Step 7. Obtaining the maximum likelihood function under the space H0.
Step 8. Obtaining the likelihood ratio.
Step 9. Obtaining the test statistic for model conformity testing.
Step 10. Specifying the distribution of the numerator of the test statistic V.
Step 11. Specifying the distribution of the denominator of the test statistic V.
Step 12. Specifying the distribution of the test statistic V.
Step 13. Deciding the rejection area of H0 and writing the conclusion.

PARAMETER ESTIMATION UNDER SPACE H0 AND SPACE POPULATION IN THE MODEL

A hypothesis test of model conformity for nonparametric spline regression with spatial heterogeneity was designed using the following hypothesis formulation:

This hypothesis test is derived using the maximum likelihood ratio test method by defining the parameter spaces under H0 (w) and under the population. The parameter space under H0 (w) is given by

(14)

where

(15)

while the parameter space under the population is given by

(16)

Obtaining the test statistic for the hypothesis above requires the following lemmas.

Lemma 3. If the parameter under the population from the nonparametric spline regression with spatial heterogeneity (2) is considered, then its estimator is given by

(17)

Proof. To obtain the estimator, we form the likelihood function under the population parameter space. Therefore, y_i has a normal distribution with mean

(18)

and the corresponding variance; then the probability functions are given by

(19)

The likelihood function obtained is as follows:

(20)

Equation (20) in matrix form is

(21)

The estimator is obtained on the basis of the following derivative results:

(22)

Then the following is obtained:

(23)

Therefore,

(24)

Furthermore, the estimator of the variance is shown in Lemma 4.

Lemma 4. If the variance parameter in the space under the population from the nonparametric spline regression with spatial heterogeneity (2) is considered, then its estimator, which is obtained from the likelihood function

(25)

is given by

(26)

Proof. The estimator is obtained using the likelihood function:

(27)

The ln likelihood function is given by

(28)

Furthermore, the estimator is obtained on the basis of the following derivative results:

(29)

Then the following is obtained:

(30)

Therefore,

(31)

Based on the estimators given by Lemmas 3 and 4, the following is obtained:

(32)

Lemma 5. If the parameters under H0 from the multivariate nonparametric truncated spline model influenced by spatial heterogeneity (2) are considered, then their estimators are given by

(33)

and

(34)

Proof. To obtain the estimators, we form the likelihood function under the parameter space w. Therefore, y_i has a normal distribution with mean

(35)

and the corresponding variance; then the probability functions are given by

(36)

The following likelihood function is obtained:

(37)

The equation in matrix form is

(38)

The estimators are obtained:

(39)

Based on Lemma 5, the maximum likelihood function is obtained as follows:

(40)

where the quantities involved are the parameter estimators under H0 from the multivariate nonparametric regression with the truncated spline approach.

STATISTICS TEST FOR TRUNCATED SPLINE NONPARAMETRIC REGRESSION WITH SPATIAL HETEROGENEITY

The test statistic for the model conformity hypothesis test can be obtained by using Lemmas 3, 4, and 5. In the next step, we present the likelihood ratio for the test statistic in Lemma 6.

Lemma 6. If the maximum likelihood functions are, respectively, given by (32) and (40), then the likelihood ratio is given by

(41)

where

(42)

Proof. Based on Lemmas 3, 4, and 5, and also (32) and (40), the likelihood ratio is obtained:

(43)

Based on (3) and (5), the likelihood ratio is

(44)

The test statistic for the model conformity hypothesis is presented in Theorem 7.

Theorem 7. If the likelihood ratio is given by Lemma 6, then the test statistic for H0 versus H1 in (2) is given by

(45)

Proof. Based on Lemma 6, the likelihood ratio is as follows:


(46)

Based on the MLRT method, H0 is rejected if

(47)

For a constant c, (43) is equivalent to

(48)

Dividing the numerator and the denominator of the inequality above by the appropriate quantities, the following inequality is obtained:

(49)

Based on (44), the test statistic for H0 versus H1 is given by

(50)

Furthermore, the distribution of the test statistic V will be derived. The test statistic given in Theorem 7 is developed from the truncated spline approach in the GWR model, unlike the statistics developed by Leung, Mei, and Zhang [9, 10] and Mennis and Jordan [11], which use GWR without the truncated spline approach.

DISTRIBUTION OF TEST STATISTIC AND CRITICAL AREA OF HYPOTHESIS

To prove the distribution of the test statistic V, we first prove the distributions of its numerator and denominator. The proofs are presented in Theorems 8 and 9 as follows.

Theorem 8. If S is the matrix given by Lemma 6, then the statistic of the numerator is distributed as

(51)

Proof. To prove this theorem, the following steps are taken. It is shown that matrix S is symmetric and idempotent, as follows:

(52)

Based on the equation above, it is proved that matrix S is symmetric.

(53)

It is proved that matrix S is idempotent. Furthermore, tr(S) is calculated as follows:

(54)

Therefore, it is proved that

(55)

Theorem 9. If the matrix given by Lemma 6 is considered, then the statistic of the denominator is distributed as

(56)

Proof. Based on (24), we obtain

(57)

in which the estimator is obtained:

(58)

Further, the error vector is given as follows:

(59)

(60)

The Sum of Squares of Error (SSE) of the model is obtained by squaring the error vector as follows:

(61)

Furthermore,

(62)

Since the SSE is a quadratic form in the random variable

(63)

the matrix involved is positive semidefinite but not idempotent. Next, we obtain

(64)

Since the matrix is not idempotent, the distribution of the statistic is

(65)

For constants k and r, based on (55), we obtain

(66)

Since the matrix is symmetric and positive semidefinite, there is an orthogonal matrix such that the transformed matrix is diagonal, in which

(67)

are the eigenvalues of the matrix. Hence

(68)

in which the transformed random variables are independent, identically, and normally distributed; therefore

(69)

with mean 1 and variance 2; therefore

(70)

Since

(71)

the values of k and r are obtained as follows; as a result,

(72)

Hence

(73)

Corollary 10. If the statistic V is given by Theorem 7, then

(74)

Proof. Based on Theorem 8, the statistic of the numerator is obtained:

(75)

Based on Theorem 9, the statistic of the denominator is obtained:

(76)

Hence


(77)

The critical area for the model conformity hypothesis is derived and given by Lemma 11.

Lemma 11. If the test statistic V is as given in Theorem 7, then the critical area for H0 is given by

(78)

A constant c is obtained according to

(79)

in which α is the determined level of significance and

(80)

Proof. Based on Theorem 7, the following relationship is obtained:

(81)

for a constant c*. According to Corollary 10, for a level of significance α, the statistic is obtained:

(82)

so that H0 is rejected if

(83)

After finding the hypothesis test formulation, the suitability of the model


between the truncated spline nonparametric regression model which is influenced by spatial heterogeneity and nonparametric regression (global) will then be implemented on unemployment rate data in 38 regions in Java Indonesia.

EMPIRICAL STUDY ON UNEMPLOYMENT RATE IN JAVA INDONESIA

Description of Research Data

In this study, the nonparametric truncated spline regression model influenced by spatial heterogeneity was applied to Open Unemployment Rate (OUR) data for Java, Indonesia, and to predictor variables suspected to affect it, namely, population density (X1), percentage of the poor (X2), percentage of population with low education (X3), percentage of population working in the agriculture sector (X4), area of agricultural land (X5), economic growth rate (X6), regional minimum wage (X7), and ratio of the number of large industries to the labor force (X8). The data consist of 38 observations (districts/cities) on 8 predictor variables. Table 1 shows the description of the research data and the predictor variables.

Table 1: Description of the research data and the predictor variables

Variable | Data | Minimum | Maximum | Mean | Standard Deviation
Y  | 38 | 0,61 | 8,59 | 4,3939 | 1,81385
X1 | 38 | 4,59 | 25,80 | 12,0963 | 4,99263
X2 | 38 | 0,2105 | 7,6445 | 2,631579 | 1,6160512
X3 | 38 | 0,07 | 7,19 | 5,5747 | 1,22772
X4 | 38 | 35,10 | 33.548,70 | 3.378,7658 | 6.497,23435
X5 | 38 | 851582 | 2507632 | 1516816.21 | 420415.074
X6 | 38 | 0,0074281 | 0,1640892 | 0,052024430 | 0,0474488599
X7 | 38 | 0,0405726 | 12,4704588 | 2,631578947 | 2,8535682531
X8 | 38 | 474 | 85122 | 28730,32 | 22372,865

Source: BPS (2017a, 2017b, and 2017c). The spread of Open Unemployment Rate in East Java is shown by Figure 1. It shows the percentage of East Java unemployment rate in 2015.


Figure 1: Open Unemployment Mapping in province of Java, Indonesia.

Spatial Heterogeneity Test

Each region has different characteristics, different parameters, and different functional forms; this indicates a spatial effect. Breusch-Pagan testing is used to assess the spatial heterogeneity of each location. Table 2 shows the Breusch-Pagan test result.

Table 2: Breusch-Pagan test

Test | Significance Value | Decision
Breusch-Pagan | 0.002414 | Reject H0

Since the spatial effect test is significant, i.e., there are effects of spatial heterogeneity, the case can be handled using the point (location-wise) approach. Furthermore, an analysis was performed using the nonparametric truncated spline regression model influenced by spatial heterogeneity.
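As an illustration of how such a spatial heterogeneity check can be run in practice, the sketch below applies the Breusch-Pagan test to the residuals of a preliminary global OLS fit using statsmodels; the synthetic data and variable names are placeholders, not the study's data.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
X = rng.normal(size=(38, 3))                 # placeholder predictors
# Error variance depends on the first predictor, inducing heteroscedasticity.
y = 1 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=1 + np.abs(X[:, 0]))

exog = sm.add_constant(X)
ols_fit = sm.OLS(y, exog).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols_fit.resid, exog)
if lm_pvalue < 0.05:
    print(f"Breusch-Pagan p = {lm_pvalue:.4f}: reject H0, heteroscedasticity present")
else:
    print(f"Breusch-Pagan p = {lm_pvalue:.4f}: fail to reject H0")
```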

Model Conformity Test Hypotheses for model conformity test between multivariate nonparametric truncated spline regression model influenced by heterogeneity spatial and multivariate nonparametric truncated spline regression model (global) are as follows:


The test statistic is given by Theorem 7 as follows:

(84)

Matrix S was constructed from the multivariate nonparametric truncated spline regression, and the second matrix was constructed from the multivariate nonparametric truncated spline regression influenced by spatial heterogeneity. Hence, the numerator is obtained:

(85)

with its degree of freedom. Meanwhile, the denominator is obtained:

(86)

with its degree of freedom. The test statistic V = 2.06 was obtained with level of significance α = 0.05, and it was concluded to reject H0 since 2.06 > 1.88. Therefore, there is a significant difference between the multivariate nonparametric truncated spline regression influenced by spatial heterogeneity and the nonparametric truncated spline regression. Due to the influence of geographical factors on the model, the appropriate model is the multivariate nonparametric truncated spline regression influenced by spatial heterogeneity.

The modeling application used Open Unemployment Rate (OUR) data in 38 districts/cities in East Java. The results of the empirical study showed that the OUR data have a geographical influence, namely, spatial heterogeneity, and, based on the results of the model conformity hypothesis test, the appropriate model is a multivariable nonparametric truncated spline regression model influenced by spatial heterogeneity with the Gaussian kernel weight function. The modeling produced a coefficient of determination of 80.42%.
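The rejection decision described above can be reproduced mechanically once the statistic and its degrees of freedom are available; the sketch below compares a computed V with the F critical value from SciPy. The degrees of freedom shown are placeholders, not the values used in the study.

```python
from scipy import stats

def f_conformity_decision(V, df_num, df_den, alpha=0.05):
    """Compare an F-type conformity statistic with its critical value."""
    f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
    return V > f_crit, f_crit

# Placeholder degrees of freedom; the study reports V = 2.06 against a
# critical value of about 1.88 at alpha = 0.05, which leads to rejecting H0.
reject, f_crit = f_conformity_decision(V=2.06, df_num=10, df_den=200)
print(f"critical value = {f_crit:.2f}, reject H0: {reject}")
```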


CONCLUSION

The multivariate nonparametric regression with the truncated spline approach influenced by spatial heterogeneity is given as follows:

(87)

Based on the results of the discussion and data analysis, the following conclusions can be drawn.

(1) The hypotheses for model conformity between the multivariate nonparametric truncated spline regression model influenced by spatial heterogeneity and the nonparametric truncated spline regression (global) are as formulated above. The test statistic derived using the Maximum Likelihood Ratio Test (MLRT) is obtained as follows:

(88)

(2) The distribution of the test statistic for the multivariate nonparametric truncated spline regression model influenced by spatial heterogeneity is as follows:

(89)

with level of significance α; therefore H0 is rejected if

(90)


ACKNOWLEDGMENTS The authors thank The Ministry of Research, Technology and Higher Education, Republic of Indonesia/Kementerian Riset, Teknologi dan Pendidikan Tinggi Republik Indonesia (Kemenristekdikti RI) for funding this work.


REFERENCES

1.

Sifriyani, Haryatmi, I. N. Budiantara, and Gunardi, “Geographically weighted regression with spline approach,” Far East Journal of Mathematical Sciences, vol. 101, no. 6, pp. 1183–1196, 2017. 2. Sifriyani, Sri Haryatmi Kartiko, I. Nyoman Budiantara, and Gunardi, “Development of nonparametric geographically weighted regression using truncated spline approach,” Songklanakarin Journal of Science and Technology, vol. 40, no. 4, pp. 909–920, 2018. 3. Sifriyani, Multivariable Nonparametric Regression Truncated Spline in the Geographically Weighted Regression Models, [Dissertation], Universitas Gadjah Mada, 2018. 4. C. Brunsdon, A. S. Fotheringham, and M. Charlton, “Some notes on parametric significance tests for geographically weighted regression,” Journal of Regional Science, vol. 39, no. 3, pp. 497–524, 1999. 5. U. Demšar, A. S. Fotheringham, and M. Charlton, “Exploring the spatiotemporal dynamics of geographical processes with geographically weighted regression and geovisual analytics,” Information Visualization, vol. 7, no. 3-4, pp. 181–197, 2008. 6. Y. Li, Y. Jiao, and J. A. Browder, “Modeling spatially-varying ecological relationships using geographically weighted generalized linear model: A simulation study based on longline seabird bycatch,” Fisheries Research, vol. 181, pp. 14–24, 2016. 7. S.-S. Wu, H. Yang, F. Guo, and R.-M. Han, “Spatial patterns and origins of heavy metals in Sheyang River catchment in Jiangsu, China based on geographically weighted regression,” Science of the Total Environment, vol. 580, pp. 1518–1529, 2017. 8. F. Benassi and A. Naccarato, “Households in potential economic distress. A geographically weighted regression model for Italy, 2001-2011,” Spatial Statistics, vol. 21, no. part B, pp. 362–376, 2017. 9. Y. Leung, C.-L. Mei, and W.-X. Zhang, “Statistical tests for spatial nonstationarity based on the geographically weighted regression model,” Environment and Planning A, vol. 32, no. 1, pp. 9–32, 2000. 10. Y. Leung, C.-L. Mei, and W.-X. Zhang, “Testing for spatial autocorrelation among the errors of the geograhically weighted regression,” Environment and Planning A, vol. 32, no. 1, pp. 871–890, 2000. 11. J. L. Mennis and L. Jordan, “The distribution of environmental equity:


Exploring spatial nonstationarity in multivariate models of air toxic releases,” Annals of the Association of American Geographers, vol. 95, no. 2, pp. 249–268, 2005.

CHAPTER 6

A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements Suduan Chen1 , Yeong-Jia James Goo2 , and Zone-De Shen2 Department of Accounting Information, National Taipei University of Business, 321 Jinan Road, Section 1, Taipei 10051, Taiwan 1

Department of Business Administration, National Taipei University, No. 67, Section 3, Ming-shen East Road, Taipei 10478, Taiwan 2

ABSTRACT As fraudulent financial statements of enterprises become increasingly serious with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question

Citation (APA): Chen, S., Goo, Y. J. J., & Shen, Z. D. (2014). A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements. The Scientific World Journal, 2014. (9 pages). Copyright: 2014 Suduan Chen et al. This is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, the support vector machine, and the decision tree to construct classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the model for forecasting fraudulent financial statements. The research objects are companies that experienced fraudulent or nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and the decision tree C5.0 has the best classification accuracy of 85.71%.

INTRODUCTION

The financial statement is the main basis for decision-making by investors, creditors, and other users of accounting information and is also the concrete expression of the management performance, financial condition, and social responsibility of listed and OTC companies. However, fraudulent financial statements (FFS) have become increasingly serious in recent years [1–8]. This behavior not only exposes the investing public to vast losses but also, more seriously, disturbs the order of the capital market. Because fraud cases have become increasingly serious, the United States Congress passed the Sarbanes-Oxley Act in 2002, mainly in the hope of improving the accuracy and reliability of corporate financial statements and disclosures so that auditors can detect the omens of FFS before fraud occurs. In Taiwan, there are fairly strict norms for audit staff when examining corporate financial statements for fraud that has led to significant misstatement [9]. FFS detection can be regarded as a typical classification problem [10]. A classification problem computes, from the attribute values of given labeled data, the classification rule of every class and then applies the rule to unlabeled data to obtain the final classification result. Many authors have applied logistic regression to fraud classification in the FFS literature [3, 6, 7, 11–13]. Data mining is an analytical tool used to handle complicated data analysis; it discovers previously unknown information from large amounts of data and induces structured models that can serve as references for decision-making, with many different functions, such as classification, association, clustering, and forecasting [4, 5, 8, 14]. The "classification" function is used most often, and its results can serve as the basis for decisions and prediction. However, whether every application of data mining to FFS is superior to traditional classification models is controversial. The purpose of this study is to present a better method of forecasting fraudulent financial statements, so as to detect the omens of FFS and reduce the damage to investors and auditors. The study adopts logistic regression, the support vector machine (SVM), and the decision tree (DT) C5.0 from data mining, combined with stepwise regression, to separately establish classification models for comparison. The study first reviews the relevant literature on the fraudulent financial statement issue to determine the research variables and sample, then takes logistic regression, SVM, and DT C5.0 as the bases for establishing the FFS classification models, and finally presents the conclusions and suggestions.

LITERATURE REVIEW

Fraudulent Definition

The FFS is a kind of intentional or illegal behavior whose result directly causes seriously misleading financial statements or financial disclosures [2, 15]. Pursuant to the provisions of SAS No. 99, one pattern of fraud is the dishonest financial report: an intentional misstatement, or omission of amounts or disclosures, which makes the financial statement misleading [6].

Research Method

A classification problem computes, from the attribute values of given labeled data, the classification rule of every class and then applies the rule to unlabeled data to obtain the final classification result. Many authors have applied logistic regression to fraud classification in the FFS literature [3, 11, 12, 15–17]. However, traditional statistical methods have the limitation that the data must satisfy specific assumptions.


As a result, the machine learning way which does not require any statistic assumption about data portfolio rises abruptly. Many scholars recently try to adopt the machine learning way as the classification machine to conduct a research. The empirical result also points out that it possesses an excellent classification effect. Chen et al. [13] applied the neural network and SVM to forecast network invasion, and the research result indicates that the SVM has excellent classification ability. Huang et al. [18] applied the neural network and SVM to explore the classification model of credit evaluation. Shin et al. [19] conducted a relevant research of bankruptcy prediction. Yeh et al. [4] apply it in prediction of enterprise failure. On the other hand, Kotsiantis et al. [3] and Kirkos et al. [10] apply DT C5.0 in the relevant research to acquire the excellent classification result. Thus, the study will adopt the foresaid logistic regression, SVM, and DT C5.0 as the classifier construction classification model.

Variable Selection
In the literature, some authors adopt financial variables as research variables [3, 10], others adopt nonfinancial variables [12, 16, 17], and still others adopt both financial and nonfinancial variables [15, 20]. Because financial statement data are often suspected of manipulation, considering only financial variables may increase the possibility of misclassification. Therefore, this study adopts not only financial variables but also nonfinancial variables to construct the fraudulent financial prediction model.

METHODOLOGY
The purpose of this study is to present a two-stage research model that integrates financial and nonfinancial variables to establish an early warning model for enterprise fraud. The procedure is to apply stepwise regression to the data, obtain the important FFS variables after screening, and then take these variables as the input variables of the logistic regression and SVM. Finally, the study compares and analyzes the results to obtain a better FFS classification.


Stepwise Regression
The study selects the variable with the greatest classification ability in accordance with forward selection and incorporates predictors into the model stepwise. At each step, the p value of the statistical test is used to screen the variables: if the p value is less than or equal to 0.05, the variable enters the regression model, and the selected variables become the independent variables of the regression model.
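To make the screening procedure concrete, the following minimal Python sketch illustrates forward stepwise selection with the p ≤ 0.05 entry rule described above. It is not the authors' code; the data frame X, the target y, and the use of an ordinary least-squares fit for the p values are assumptions for illustration.

```python
import statsmodels.api as sm

def forward_stepwise(X, y, alpha=0.05):
    """Forward selection: at each step add the candidate predictor whose
    coefficient has the smallest p value, provided that p value <= alpha."""
    selected, candidates = [], list(X.columns)
    while candidates:
        pvals = {}
        for var in candidates:
            model = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
            pvals[var] = model.pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha:
            break                    # no remaining variable passes the 0.05 rule
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical usage: X holds the 29 candidate variables, y flags FFS firms.
# important_vars = forward_stepwise(X, y)
```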

Logistic Regression
Logistic regression resembles linear regression, but whereas the response and explanatory variables of a general linear regression are usually continuous, the response variable in logistic regression is discrete; that is, it handles problems with a binary qualitative response (e.g., yes or no, success or failure). The model uses a cumulative probability density function to convert the real-valued combination of the explanatory variables into a probability between 0 and 1. Its basic assumptions differ from those of other multivariate analyses: the influence of the explanatory variables on the response variable varies in exponential form, which means that logistic regression does not require the normal distribution assumption. In other words, it can handle nonnormal populations, nonlinear models, and nonmetric variables. The general logistic regression model is as follows:



𝑌∗ = 𝑥𝛽 + 𝜀, with 𝑌 = 1 if 𝑌∗ > 0 and 𝑌 = 0 otherwise, so that 𝑃(𝑌 = 1 | 𝑥) = 𝑒^{𝑥𝛽}/(1 + 𝑒^{𝑥𝛽}),    (1)

where 𝑌 is the observed response variable (𝑌 = 1 if a financial crisis event occurs and 𝑌 = 0 otherwise), 𝑌∗ is the unobserved latent variable, 𝑥 is the matrix of explanatory variables, 𝛽 is the vector of explanatory variable parameters, and 𝜀 is the error term.
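The following short sketch fits a model of form (1) by maximum likelihood with statsmodels. The synthetic data, the number of predictors, and the use of the Logit routine are illustrative assumptions; the sketch only shows how the latent-variable formulation above maps onto a fitted probability of fraud.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical screened predictors (x) and the FFS indicator
# (y = 1 for a fraudulent statement, 0 otherwise).
rng = np.random.default_rng(0)
x = rng.normal(size=(132, 8))            # 132 firms, 8 screened variables
y = (x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=132) > 0).astype(int)

# Maximum-likelihood fit of P(Y = 1 | x) = exp(x*beta) / (1 + exp(x*beta)).
model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
print(model.summary())
print("P(FFS) for the first firm:", model.predict(sm.add_constant(x))[0])
```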

Support Vector Machine (SVM)

The operation model of the SVM projects the initial input vectors into a high-dimensional feature space using linear or nonlinear kernel functions and uses a separating hyperplane to distinguish samples of two or more different classes. In other words, the SVM classifies the data with a hyperplane classifier.

Linearly Separable Case
When the training sample data are linearly separable, consider training vectors 𝑥𝑖 belonging to two classes 𝑦𝑖 ∈ {−1, +1}. In order to distinguish the two classes clearly, it is necessary to find the optimal separating hyperplane. If the hyperplane 𝑤⋅𝑥+𝑏 can separate the training sample, this is expressed as
(2)

(3)
By adjusting 𝑤 and 𝑏 properly, (2) and (3) can be rewritten as
(4)
or, equivalently, as
(5)
Pursuant to statistical learning theory, the best separating hyperplane not only separates the two classes of samples correctly but also maximizes the classification margin. The class margin of the hyperplane 𝑤⋅𝑥+𝑏 is shown in (6), and (7) can be obtained from (4):
(6)
(7)
So the problem of maximizing the class margin over (𝑤, 𝑏) is transformed into minimizing ‖𝑤‖²/2 under constraint condition (5). Pursuant to Lagrangian relaxation, the problem must satisfy conditions (8) and (9); under these conditions, the minimization problem is given by (10):
(8)


(9) (10) Every 𝛼𝑖 corresponds to a training sample 𝑥𝑖, and a training sample with 𝛼𝑖 > 0 is called a support vector. The final classification function is shown as

(11) where 𝑁𝑠 is the number of support vectors.

Linearly Inseparable Case
If the training sample is not linearly separable, (4) can be rewritten as (12)

where 𝜉𝑖 ≥ 0, 𝑖 = 1, . . . , 𝑛, are slack variables. If 𝑥𝑖 is misclassified, then 𝜉𝑖 > 1, so the number of misclassifications is bounded above by ∑𝑖 𝜉𝑖. A penalty parameter is added to the objective function so that maximizing the class margin is balanced against minimizing the misclassified samples; that is, minimizing ‖𝑤‖²/2 + 𝐶∑𝑖 𝜉𝑖 yields the SVM for the linearly inseparable case. Pursuant to Lagrangian relaxation, the problem must satisfy conditions (13) and (14); under these conditions, the minimization problem is given by (15): (13) (14)

(15)
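As a concrete counterpart to the derivation above, the sketch below trains a linear soft-margin SVM on toy two-class data with scikit-learn and reads off w, b, the support vectors, and the margin 2/‖w‖. The toy data and the choice of scikit-learn's SVC are assumptions; the chapter does not name an implementation.

```python
import numpy as np
from sklearn.svm import SVC

# Two toy classes labelled y in {-1, +1}, as in the margin derivation above.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 1.0, size=(40, 2)),
               rng.normal(+1.5, 1.0, size=(40, 2))])
y = np.array([-1] * 40 + [+1] * 40)

# Linear soft-margin SVM: C weights the slack terms sum(xi_i) against ||w||^2/2.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w:", w, "b:", b)
print("number of support vectors:", clf.support_vectors_.shape[0])
print("margin width 2/||w||:", 2 / np.linalg.norm(w))
```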

Decision Tree (DT)
The decision tree (DT) is the simplest of the inductive learning methods [21]. It is a data mining tool that can handle both continuous and discrete variables. It builds a tree-structured diagram from labeled classification data and induces rules from it; the rules are mutually exclusive, and the resulting DT can also make out-of-sample predictions. The most frequently used DT algorithms include CART, CHAID, and C5.0 [22]. C5.0 is an improvement on ID3 [23]: because ID3 cannot handle continuous numerical data, Quinlan improved it, and C5.0 was developed to handle both continuous and discrete values. The DT C5.0 procedure is mainly separated into two parts. The first part is the classification criterion, which is calculated according to the gain ratio, and the complete DT is constructed accordingly. The information gain in (16) measures the pretest and posttest gain of the data set and is defined as the "pretest information" minus the "posttest information" from (17). The entropy in (16) is used to calculate impurity, also called randomness; in other words, it measures the randomness of the data set, and when the data set is in its most disordered state the value is 1. Therefore, the lower the randomness of the posttest data set, the larger the computed information gain and the more favorable it is for DT construction:
(16)
(17)
The second part is the pruning criterion. Pursuant to error-based pruning (EBP), the DT is properly pruned to enhance the classification accuracy. EBP evolved from pessimistic error pruning (PEP), and both pruning methods were presented by Quinlan. The main concept of EBP is to judge by the error ratio: calculate the error ratio of every node, identify nodes that raise the error ratio of the overall DT, and prune these nodes to further enhance the accuracy of the DT.
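A small sketch of the entropy and information-gain computation described by (16) and (17) follows; C5.0 additionally divides the gain by the split information to obtain the gain ratio, which is omitted here for brevity. The example labels and the split are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label vector; equals 1.0 for a balanced binary set."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, groups):
    """'Pretest information' minus the weighted 'posttest information'
    after the data set is partitioned into the given groups."""
    n = sum(len(g) for g in groups)
    post = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - post

# Hypothetical split of 8 FFS / 8 non-FFS firms by one candidate attribute.
labels = np.array([1] * 8 + [0] * 8)
left, right = labels[:10], labels[10:]     # one branch mostly FFS, one all non-FFS
print("pretest entropy:", entropy(labels))           # 1.0, the most disordered state
print("information gain:", information_gain(labels, [left, right]))
```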

Definition of Type I Error and Type II Error
In order to establish a valid model for forecasting fraudulent financial statements, it is important to measure the type I and type II errors of the study. A type I error is to mistakenly judge a company with normal financial statements as an FFS company. This judgment does not damage investors, but it produces an overly conservative and erroneous audit opinion and further harms the credit of the audited company. A type II error is to mistake an FFS enterprise for a normal enterprise. This classification error leads to auditing failure and to investment losses or erroneous judgments for investors.
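The two error types can be computed directly from a confusion matrix, as in the sketch below. The illustrative counts are chosen only to match the shape of the cross-validation tables later in the chapter (28 firms per class), and the function name is ours.

```python
def error_rates(actual, predicted):
    """Type I error: non-FFS firms wrongly flagged as FFS.
       Type II error: FFS firms wrongly passed as non-FFS."""
    non_ffs = [(a, p) for a, p in zip(actual, predicted) if a == 0]
    ffs     = [(a, p) for a, p in zip(actual, predicted) if a == 1]
    type1 = sum(p == 1 for _, p in non_ffs) / len(non_ffs)
    type2 = sum(p == 0 for _, p in ffs) / len(ffs)
    hit   = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    return type1, type2, hit

# Hypothetical labels for 28 non-FFS and 28 FFS firms.
actual    = [0] * 28 + [1] * 28
predicted = [0] * 25 + [1] * 3 + [1] * 23 + [0] * 5
print(error_rates(actual, predicted))   # roughly (0.107, 0.179, 0.857)
```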

EMPIRICAL ANALYSIS
Data Collection and Variables
The research samples are FFS enterprises from 1998 to 2012. Sixty-six enterprises are selected from the listed and OTC companies in the Taiwan Economic Journal Data Bank (TEJ), and a one-to-one matching approach is used to select 66 normal enterprises, so there are 132 enterprises in total as research samples. For the research variables, the study selects 29 variables in total, including 24 financial variables and 5 nonfinancial variables (see the appendix). Considering the number of samples, to avoid having too few samples in the test group and to improve test accuracy, 50% of the samples are used as the training sample to establish the regression classification models, and the remaining 50% serve as the test sample to verify the validity of the established classification models.

Figure 1: Train and test subsets design.


In addition, to test the stability of the proposed research model, this study randomly selects three groups, each consisting of 80% of the test data, as cross-validation test samples. The partitioning and sampling of the data are shown in Figure 1.
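A minimal sketch of this sampling design is given below, under the assumption that scikit-learn's splitting utilities stand in for whatever tooling the authors used: a stratified 50/50 train-test split followed by three random 80% subsamples of the test half.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical pool of 132 paired firms (66 FFS, 66 normal).
ids = np.arange(132)
labels = np.array([1] * 66 + [0] * 66)

# 50% of the samples train the models, the other 50% test them.
train_ids, test_ids = train_test_split(ids, test_size=0.5,
                                        stratify=labels, random_state=1)

# Three random 80% subsamples of the test half serve as cross-validation sets.
rng = np.random.default_rng(2)
cv_sets = [rng.choice(test_ids, size=int(0.8 * len(test_ids)), replace=False)
           for _ in range(3)]
print(len(train_ids), [len(cv) for cv in cv_sets])   # 66, [52, 52, 52]
```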

Model Development
To begin with, the financial and nonfinancial variables are screened using the stepwise regression method, and the screened variables serve as the input variables of the logistic regression and SVM. Next, each method is used for model training and testing. Finally, the study compares the classification accuracy of the methods and gives suggestions based on the analytic results. The model construction is divided into three parts: the first part is variable screening; the second part is classification; the third part compares the test results of the classification models. The research process of the study is shown in Figure 2.

Figure 2: Research model.

Important Variable Screening
When constructing a classification model, there may be many candidate variables, but not every variable is important. Therefore, unimportant variables need to be eliminated to construct a simpler classification model. Among the many variable screening methods, stepwise regression is used most frequently [24]. Following the suggestion of Pudil et al. [24], this study therefore screens the variables using stepwise regression to retain the more influential research variables. The input variables are screened via stepwise regression to obtain the results shown in Table 1, including 7 financial variables and 1 nonfinancial variable. Subsequently, the study takes these 8 variables as the new input variables to construct the classification models.
Table 1: Results of stepwise regression variable screening

Classification Model
The prediction accuracy of the three types of models using the train datasets is displayed in Table 2.

Table 2: Hit ratio of three models using the train datasets

Research model   Hit ratio
C5.0             93.94%
Logistic         83.33%
SVM              78.79%

As shown in Table 2, C5.0 has the best performance in the establishment of the prediction model and its accuracy rate is 93.94%. The traditional logistic model is the second best. The accuracy rate of the SVM model, at 78.79%, is the lowest of the three. The cross-validation results of the proposed three prediction models are shown in Tables 3 to 5.

Table 3: C5.0 cross-validation results

C5.0 model                        Predicted value            Hit ratio   Type I error   Type II error
                                  Non-FFS      FFS
Actual value   CV1      Non-FFS   25           3             83.93%      10.71%         21.42%
                        FFS       6            22
               CV2      Non-FFS   25           3             87.50%      10.71%         14.28%
                        FFS       4            24
               CV3      Non-FFS   25           3             85.71%      10.71%         17.85%
                        FFS       5            23
               Average  Non-FFS   25           3             85.71%      10.71%         17.85%
                        FFS       5            23

Decision Tree (DT)
The study constructs the DT C5.0 model, sets the EBP significance level at α = 5%, and adopts the binary partition principle to obtain the optimal spanning tree. The prediction results of the DT C5.0 classification model are shown in Table 3. On average, 25 of the 28 non-FFS cases are correctly classified as non-FFS and three of them are incorrectly classified as FFS; the type I error is 10.71%. On the other hand, 23 of the 28 FFS cases are correctly classified, and the remaining five FFS cases are incorrectly classified as non-FFS; the type II error is 17.85%.

Logistic Regression
Table 4 shows the empirical results of the logistic classification model: 25 of the 28 non-FFS cases are correctly classified and three of them are incorrectly classified as FFS. The overall type I error is 9.52%. In addition, 20 of the 28 FFS cases are correctly classified, and the remaining eight FFS cases are incorrectly classified as non-FFS; the type II error is 28.57%.

Table 4: Logistic regression cross-validation results

Logistic regression model         Predicted value            Hit ratio   Type I error   Type II error
                                  Non-FFS      FFS
Actual value   CV1      Non-FFS   25           3             80.36%      10.71%         28.57%
                        FFS       8            20
               CV2      Non-FFS   26           2             82.14%      7.14%          28.57%
                        FFS       8            20
               CV3      Non-FFS   25           3             80.36%      10.71%         28.57%
                        FFS       8            20
               Average  Non-FFS   25           3             80.95%      9.52%          28.57%
                        FFS       8            20

Table 5: SVM cross-validation results

SVM model                         Predicted value            Hit ratio   Type I error   Type II error
                                  Non-FFS      FFS
Actual value   CV1      Non-FFS   26           2             73.21%      7.14%          46.42%
                        FFS       13           15
               CV2      Non-FFS   26           2             71.43%      7.14%          50.00%
                        FFS       14           14
               CV3      Non-FFS   26           2             71.43%      7.14%          50.00%
                        FFS       14           14
               Average  Non-FFS   26           2             72.02%      7.14%          48.81%
                        FFS       14           14

Support Vector Machine (SVM)
The kernel function is set to RBF when constructing the SVM model. As for the parameters, the search range for C is set from 2^−10 to 2^10, and γ is set to 0.1. The SVM classification results are shown in Table 5. Here, 26 of the 28 non-FFS cases are correctly classified and two of them are incorrectly classified as FFS; the type I error is 7.14%. In addition, 14 of the 28 FFS cases are correctly classified, and the remaining 14 FFS cases are incorrectly classified as non-FFS; the type II error is 48.81%.
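Under the assumption that a scikit-learn SVC stands in for the authors' SVM implementation, the settings described above (RBF kernel, γ = 0.1, C searched over 2^−10 to 2^10) translate roughly into the following sketch; the synthetic inputs are placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Hypothetical screened inputs X (8 variables) and FFS labels y.
rng = np.random.default_rng(3)
X = rng.normal(size=(66, 8))
y = rng.integers(0, 2, size=66)

# RBF kernel with gamma fixed at 0.1 and C searched over 2^-10 ... 2^10.
param_grid = {"C": [2.0 ** k for k in range(-10, 11)]}
search = GridSearchCV(SVC(kernel="rbf", gamma=0.1), param_grid, cv=3)
search.fit(X, y)
print("best C:", search.best_params_["C"], "CV accuracy:", search.best_score_)
```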

Comprehensive Comparison and Analysis
Kirkos et al. [10] pointed out that evaluating the merits of a model must also consider the type I and type II errors. A type I error means classifying a nonfraudulent company as fraudulent, whereas a type II error means that the auditors classify a fraudulent company as nonfraudulent. Both types of error cause different loss costs and can result in auditing failure, so auditors must avoid both. Comparing the results of the three models, we conclude that the classification ability of the DT C5.0 is the best, followed by the logistic regression and then the SVM. The classification accuracies of the three models are summarized in Table 6.

Table 6: Summary of classification results

Model                 Type I error   Type II error   Hit ratio   Ranking
Logistic regression   9.52%          28.57%          80.95%      2
SVM                   7.14%          48.81%          72.02%      3
DT C5.0               10.71%         17.85%          85.71%      1


The comparison shows that, although the logistic classification model performs best for type I errors, the DT C5.0 has the best classification performance for both type II errors and the hit ratio: its correct classification ratio is 85.71%, followed by 80.95% for the logistic model and 72.02% for the SVM model. Unlike general studies, which use type I errors to judge the performance of prediction models, FFS studies use type II errors to evaluate prediction models. For the sake of prudence, we conduct a statistical test of the type II errors in the above cross-validation results to confirm whether the differences between the models are significantly different from zero. The analysis results are shown in Table 7: the t-values of the type II error differences between the prediction models are −5.201 (C5.0 − Logistic), −16.958 (Logistic − SVM), and 9.823 (SVM − C5.0), and all of them reach the significance level.

Table 7: Paired-samples t test

Model             t-value    DF    Significance (two-tailed)
C5.0 − Logistic   −5.201     2     0.35
Logistic − SVM    −16.958    2     0.03
SVM − C5.0        9.823      2     0.10
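The paired-samples t test in Table 7 can be reproduced from the per-run type II error rates of Tables 3-5, as sketched below; with three cross-validation runs the degrees of freedom are 2. The recomputed t-values agree with Table 7, while the two-tailed p values are reported here as computed by SciPy and may be rounded differently from the table.

```python
from scipy import stats

# Type II error rates of the three cross-validation runs (from Tables 3-5).
type2_c50      = [0.2142, 0.1428, 0.1785]
type2_logistic = [0.2857, 0.2857, 0.2857]
type2_svm      = [0.4642, 0.5000, 0.5000]

# Paired-samples t tests on the per-run differences (df = n - 1 = 2).
for name, a, b in [("C5.0 - Logistic", type2_c50, type2_logistic),
                   ("Logistic - SVM", type2_logistic, type2_svm),
                   ("SVM - C5.0", type2_svm, type2_c50)]:
    t, p = stats.ttest_rel(a, b)
    print(f"{name}: t = {t:.3f}, two-tailed p = {p:.3f}")
```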

CONCLUSION AND SUGGESTION
As fraudulent financial statements (FFS) have increased steadily in recent years, the risk of auditing failure for auditors has also risen. Therefore, much research focuses on developing good classification models to reduce this risk. In the past, the accuracy of forecasting FFS purely by regression analysis has been relatively low, and many scholars have pointed out that prediction with data mining can improve the accuracy rate. Thus, this study adopts stepwise regression to screen the important financial and nonfinancial variables and combines them with data mining techniques to establish a more accurate FFS forecasting model. A total of eight critical variables are screened via the stepwise regression analysis, comprising financial variables (accounts receivable/total assets, inventory/current assets, interest protection multiples, cash flow ratio, accounts payable turnover, operating profit/last year's operating profit > 1.1) and a nonfinancial variable (pledge ratio of shares of the directors and supervisors).


The financial variables cover operating capability, profitability, debt solvency, and financial structure; the nonfinancial variables concern the stock rights and scale of the enterprise's directors and supervisors. The results indicate that when auditors investigate FFS, they must heed the alerts provided by nonfinancial information as well as financial information. For the classification models, the study adopts the traditional logistic regression together with the DT C5.0 and SVM from data mining. The empirical results indicate that the SVM model performs best on the type I error, while the DT C5.0 has the best classification performance on the type II error and the overall classification accuracy. One purpose of this research is to provide auditors with an additional auditing aid besides traditional analysis methods, but research on forecasting FFS is still insufficient; subsequent researchers can therefore adopt other methods to forecast FFS and provide better references. In addition, future researchers can try different variable screening methods to enhance classification accuracy. As for the variables, some nonfinancial variables are difficult to measure and their data are difficult to acquire, so this study does not include them. Finally, as for the sample, the study focuses on identified FFS cases, and some FFS may remain undetected; the matched companies could therefore turn out to be FFS companies in a subsequent year, which may influence the accuracy of the study. The findings of this study can provide a reference for auditors, certified public accountants (CPAs), securities analysts, company managers, and future academic studies.

Appendix See Table 8. Table 8: Selection of the research variables


REFERENCES
1. C. Spathis, M. Doumpos, and C. Zopounidis, "Detecting falsified financial statements: a comparative study using multi-criteria analysis and multivariate statistical techniques," The European Accounting Review, vol. 11, pp. 509–535, 2002.
2. Z. Rezaee, "Causes, consequences, and deterrence of financial statement fraud," Critical Perspectives on Accounting, vol. 16, no. 3, pp. 277–298, 2005.
3. S. Kotsiantis, E. Koumanakos, D. Tzelepis, and V. Tampakas, "Forecasting fraudulent financial statements using data mining," Transactions on Engineering Computing and Technology, vol. 12, pp. 283–288, 2006.
4. C.-C. Yeh, D.-J. Chi, and M.-F. Hsu, "A hybrid approach of DEA, rough set and support vector machines for business failure prediction," Expert Systems with Applications, vol. 37, no. 2, pp. 1535–1541, 2010.
5. W. Zhou and G. Kapoor, "Detecting evolutionary financial statement fraud," Decision Support Systems, vol. 50, no. 3, pp. 570–575, 2011.
6. S. L. Humpherys, K. C. Moffitt, M. B. Burns, J. K. Burgoon, and W. F. Felix, "Identification of fraudulent financial statements using linguistic credibility analysis," Decision Support Systems, vol. 50, no. 3, pp. 585–594, 2011.
7. K. A. Kamarudin, W. A. W. Ismail, and W. A. H. W. Mustapha, "Aggressive financial reporting and corporate fraud," Procedia-Social Behavioral Sciences, vol. 65, pp. 638–643, 2012.
8. P.-F. Pai, M.-F. Hsu, and M.-C. Wang, "A support vector machine-based model for detecting top management fraud," Knowledge-Based Systems, vol. 24, no. 2, pp. 314–321, 2011.
9. Accounting Research and Development Foundation, Audit the Financial Statements of the Considerations for Fraud, Accounting Research and Development Foundation, Taipei, Taiwan, 2013.
10. S. Kirkos, C. Spathis, and Y. Manolopoulos, "Data mining techniques for the detection of fraudulent financial statements," Expert Systems with Applications, vol. 32, no. 4, pp. 995–1003, 2007.
11. T. B. Bell and J. V. Carcello, "A decision aid for assessing the likelihood of fraudulent financial reporting," Auditing, vol. 19, pp. 169–178, 2000.
12. V. D. Sharma, "Board of director characteristics, institutional ownership, and fraud: evidence from Australia," Auditing, vol. 23, no. 2, pp. 105–117, 2004.
13. W.-H. Chen, S.-H. Hsu, and H.-P. Shen, "Application of SVM and ANN for intrusion detection," Computers and Operations Research, vol. 32, no. 10, pp. 2617–2634, 2005.
14. J. W. Seifert, "Data mining and the search for security: challenges for connecting the dots and databases," Government Information Quarterly, vol. 21, no. 4, pp. 461–480, 2004.
15. M. S. Beasley, "An empirical analysis of the relation between the board of director composition and financial statement fraud," Accounting Review, vol. 71, no. 4, pp. 443–465, 1996.
16. P. Dunn, "The impact of insider power on fraudulent financial reporting," Journal of Management, vol. 30, no. 3, pp. 397–412, 2004.
17. G. Chen, "Positive research on the financial statement fraud factors of listed companies in China," Journal of Modern Accounting and Auditing, vol. 2, pp. 25–34, 2006.
18. Z. Huang, H. Chen, C.-J. Hsu, W.-H. Chen, and S. Wu, "Credit rating analysis with support vector machines and neural networks: a market comparative study," Decision Support Systems, vol. 37, no. 4, pp. 543–558, 2004.
19. K.-S. Shin, T. S. Lee, and H.-J. Kim, "An application of support vector machines in bankruptcy prediction model," Expert Systems with Applications, vol. 28, no. 1, pp. 127–135, 2005.
20. S. L. Summers and J. T. Sweeney, "Fraudulently misstated financial statements and insider trading: an empirical analysis," The Accounting Review, vol. 73, no. 1, pp. 131–146, 1998.
21. G. Arminger, D. Enache, and T. Bonne, "Analyzing credit risk data: a comparison of logistic discrimination, classification tree analysis, and feedforward networks," Computational Statistics, vol. 12, no. 2, pp. 293–310, 1997.
22. S. Viaene, G. Dedene, and R. A. Derrig, "Auto claim fraud detection using Bayesian learning neural networks," Expert Systems with Applications, vol. 29, no. 3, pp. 653–666, 2005.
23. J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
24. P. Pudil, K. Fuka, K. Beránek, and P. Dvorák, "Potential of artificial intelligence based feature selection methods in regression models," in Proceedings of the IEEE 3rd International Conference on Computational Intelligence and Multimedia Application, pp. 159–163, 1999.

CHAPTER 7

Dynamical Analysis in Explicit Continuous Iteration Algorithm and its Applications

Qingyi Zhan1 , Zhifang Zhang2 and Xiangdong Xie3 College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, P.R. China 1

Department of Sciences and Education, Fujian Center for Disease Control and Prevention, Fuzhou, P.R. China 2

Ningde Normal University, Ningde, P.R. China

3

ABSTRACT This article is devoted to the dynamical analysis of an explicit continuous iteration algorithm, describing its construction, relationship with the explicit

Citation (APA): Zhan, Q., Zhang, Z., & Xie, X. (2018). Dynamical analysis in explicit continuous iteration algorithm and its applications. Advances in Difference Equations, 2018(1), 457. (10 pages). DOI: https://doi.org/10.1186/s13662-018-1909-z Copyright: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


trapezoid method, and error analysis. A theorem demonstrating the equality of these methods is also established. The accuracy of the theoretical results and universality of the explicit continuous iteration algorithm are proved by numerical experiments. Keywords: Error analysis; Explicit continuous iteration algorithm; Implicit method; Numerical simulation

INTRODUCTION
With developments in society and the economy, scientific computing has become increasingly widespread. It is essential to derive high-order, efficient numerical methods for solving differential equations, which are widely used in physical problems; in particular, it is very important to construct fast algorithms for solving practical problems. It is well known that many numerical methods are applied to mathematical models to investigate the solution space. Some of these methods are explicit, such as the Euler, Adams, and Runge–Kutta schemes; we refer the reader to [9, 10] and the references therein. Others employ implicit methods. However, implicit approaches have many shortcomings, such as being overly complex, relatively slow, and requiring excessive internal memory, so explicit methods have become more widely used. Because of the many uncertainties and practical difficulties involved in models for solving differential equations, there are relatively few reports in the literature up to now. In [1], Butcher stated that the classic finite-stage Runge–Kutta methods could be expanded to infinite-stage Runge–Kutta methods and suggested that the finite summation should be changed to definite integration over finite intervals; however, he did not make further progress in this field. In 2010, Haier built on this important concept and provided an expression for a continuous stage Runge–Kutta method [2]. We expand this to the case of ordinary differential equations (ODEs), which describe many natural phenomena in meteorology, biology, and so on [7, 8]. To the best of our knowledge, there are no previous reports of explicit continuous iterative methods in the literature. The main motivations for this work are twofold. On the one hand, the classical results on explicit numerical methods are the basis for this research: a variety of numerical methods have been applied to different aspects of differential equations, and many important results have revealed the mechanisms of dynamical behavior. On the other hand, our earlier


work [10, 11] on stability analysis and numerical simulations of stochastic differential equations has inspired further study in this direction. For example, there has been some research on the numerical analysis [5, 6, 11] and numerical simulations [10] of stochastic differential equations; these studies established the foundation of numerical analysis. In this study, we first construct a class of explicit continuous iterative (ECI) algorithms and then compare it with existing classes of numerical methods in terms of error analysis. Numerical examples are presented to illustrate the feasibility of the ECI algorithm and to provide accurate solutions within a reasonable time. These results show that, under some appropriate conditions, the ECI algorithm can solve some nonlinear ODEs more accurately than some existing numerical approximations. The remainder of this paper is organized as follows. Section 2 describes the construction of the ECI algorithm and introduces some relevant concepts and norms that will be used later. Section 3 is devoted to the theoretical analysis of the ECI algorithm, i.e., the error analysis of the solution and its equivalence properties. Section 4 presents numerical experiments in some given areas, including illustrative numerical results for the main theorem. Section 5 provides the conclusions of this study.

CONSTRUCTION OF EXPLICIT CONTINUOUS ITERATIVE ALGORITHM
We consider the following test equation:

where Y = . The norm of a variable is defined as follows:

For simplicity of notation, this norm is usually written in its abbreviated form unless otherwise stated in the sequel.
Motivated by Haier's work [2] on the continuous stage Runge–Kutta method, we subdivide the time axis into the union of subintervals [nh, (n + 1)h], i.e.,


and utilize the step function and Haier’s construction method to form the following ECI algorithm: (1) Theorem 2.1 Yn and Yn+1 obtained by scheme (1) are the same as those by the trapezoid formula. Proof On the one hand, it follows from scheme (1) that we obtain the following results. When n = 0, we have (2) And when n = 1, we have (3) By the continuity of (2) and (3), we can get Therefore, we have

Similarly, when n = 2, we obtain (4) By the continuity of (3) and (4), we have

Therefore, we obtain


It follows from the same method that we have

Therefore, we obtain

(5)

On the other hand, by the trapezoid formula, we have

i.e.,
(6)
Combining (5) and (6), we complete our proof.
Remark 1. As the above results show, the essence of the ECI algorithm is that explicit iteration is applied to quickly obtain approximate solutions to the true solutions that are continuous and more accurate.
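For reference, and assuming only the standard trapezoid rule applied to the scalar test equation Y′ = aY with constant step h, Theorem 2.1 says the ECI iterates satisfy the familiar recursion:

```latex
% Trapezoid rule for the test equation Y' = aY with step h:
\[
Y_{n+1} = Y_n + \frac{h}{2}\bigl(aY_n + aY_{n+1}\bigr)
\quad\Longrightarrow\quad
Y_{n+1} = \frac{1 + \tfrac{1}{2}ah}{1 - \tfrac{1}{2}ah}\,Y_n
        = \Bigl(\frac{1 + \tfrac{1}{2}ah}{1 - \tfrac{1}{2}ah}\Bigr)^{\,n+1} Y_0 .
\]
```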

ERROR ANALYSIS Lemma 3.1 The function U(t) satisfies the following vector ordinary differential equation:

and initial conditions U(nh) = Yn. Furthermore,

Proof Firstly, it follows from the expression of function U(t) that

The derivative of function U(t) is given by


Secondly, we have U(nh) = Yn. Lastly, we integrate the derivative

and obtain

We make the transform t – nh = τ and have

Therefore, we can obtain the claim of Lemma 3.1 as follows:

This completes our proof. Lemma 3.2 Let the local error be En = Yn – Y(nh). Then it satisfies the following equality:

(7)

Proof. If t satisfies the condition (n + 1)h ≤ t ≤ (n + 2)h, we have

Then we can obtain U((n + 1)h) = Yn+1. It follows from Lemma 3.1 that

By the vector differential equation Y′ = aY with initial value Y(0) = Y0, its solution is Y(t) = e^{at} Y0. By the definition of the local error En, we have


This completes our proof. By the conclusions of Lemmas 3.1 and 3.2, we obtain the following error control theorem. Theorem 3.1 If a < 0, the ECI algorithm (1) satisfies the following error propagation inequality:

Proof When a < 0, it follows from the condition 0 < τ < h that we have the conclusion 1 – 0.5aτ > 1. By the properties of the integral, we obtain

Therefore, by Lemma 3.2 and the triangle inequality of the norm, we obtain

Therefore, the conclusion of Theorem 3.1 follows from Lemma 3.1.
Remark 2. The advantages of this method lie not only in its convergence, that is, in the iteration error being confined to a small interval, but also in its ability to simulate the solutions of ODEs continuously and explicitly, which helps approximate the true solutions more accurately.


NUMERICAL EXPERIMENTS
Comparison with Classic Methods
As for the test equation in Sect. 2, we only consider the scalar case with a = −4.0 and Y(0) = 1.0. We compare the numerical solutions obtained by the ECI algorithm with those yielded by some classic methods, such as the Euler method and the implicit trapezoid method. We choose the step size h = 0.01, and the results are shown below. From the data in Table 1, we see that the results obtained by the ECI algorithm and the trapezoid method are almost the same, and with an increasing number of iterations the solutions approach zero. It follows from Table 2 and Figs. 1–2 that the accuracy of the numerical solutions obtained by the ECI algorithm is much higher than that obtained by the Euler method, and the error approaches zero much faster. Meanwhile, Fig. 3 shows that this algorithm is stable for different initial values. All these facts verify the theoretical results.

Table 1: Comparison of numerical solutions for different numbers of steps, N, obtained by different methods

N      Euler method   ECI algorithm   Trapezoid method
0      1.0            1.0             1.0
50     0.1353         0.1408          0.1408
100    0.0176         0.0191          0.0191
200    2.9647e−04     3.4878e−04      3.4878e−04
500    1.4235e−09     2.1396e−09      2.1396e−09
1000   1.9452e−18     4.3982e−18      4.3982e−18

Table 2: Comparison of errors for different numbers of steps, N, obtained by different methods

N      Euler method   ECI algorithm   Trapezoid method
0      0              0               0
50     3.7582e−05     0.0055          0.0055
100    7.4239e−04     7.3741e−04      7.3741e−04
200    3.8996e−05     1.3320e−05      1.3320e−05
500    6.3769e−10     7.8414e−11      7.8414e−11
1000   2.3032e−18     1.4988e−19      1.4988e−19

Figure 1: Comparison of the numerical solutions in the time interval [0, 2] for two methods.

Figure 2: Comparison of the numerical solutions in the time interval [1, 2] for two methods.


Figure 3: Stability of the ECI algorithm.
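A minimal reference computation for this comparison is sketched below. Because formula (1) is not legible in this copy, the ECI scheme itself is not reproduced; instead, by Theorem 2.1 the trapezoid iterates stand in for the ECI values, and the Euler iterates and the exact solution are included for contrast. The loop structure and variable names are ours.

```python
import math

# Test problem from this section: Y' = aY with a = -4.0, Y(0) = 1.0, step h = 0.01.
a, h, y0 = -4.0, 0.01, 1.0

def euler(n):
    y = y0
    for _ in range(n):
        y += h * a * y
    return y

def trapezoid(n):
    # By Theorem 2.1 the ECI iterates coincide with the trapezoid iterates,
    # so this also stands in for the ECI values.
    y = y0
    factor = (1 + 0.5 * a * h) / (1 - 0.5 * a * h)
    for _ in range(n):
        y *= factor
    return y

for n in (0, 50, 100, 200, 500, 1000):
    print(n, euler(n), trapezoid(n), y0 * math.exp(a * n * h))
```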

Applications in Numerical Simulations We consider a nonlinear ordinary differential equation with initial value

Firstly, we make a transformation as follows. Let Z = Y + t, then we have

Therefore, the analytic solution is

Secondly, the ECI algorithm is applied to this equation and the numerical solution is obtained as follows:

We choose the step size h = 0.01 and the number of iterative steps N = 600. The numerical results are shown in Figs. 4 and 5, and we record the computational time required to reach an error tolerance of ‖E‖ = 4.91e−06. The results are as follows. Figures 4–5 and Table 3 demonstrate that the accuracy of the ECI algorithm is much higher than that of the trapezoid method and that the number of iterations decreases markedly, so the computational efficiency of the ECI algorithm is better than that of the Euler and trapezoid methods. Altogether, the ECI algorithm is an excellent and appropriate method for some nonlinear ODEs.

Figure 4: Comparison of numerical solution given by the ECI algorithm and analytic solution.

Figure 5: Comparison of errors from Euler scheme and ECI algorithm.


Table 3: Comparison of various characteristics for different methods

Main properties           Euler method   ECI algorithm   Trapezoid method
Computational time        4.431          3.852           4.231
Number of iterations      890            773             1369
Average iteration times   2.011          2.008           3.237
Used CPU time             0.527          0.471           0.508

Remark 3 As we see from this numerical experiment, although the ECI algorithm is constructed for simple test equations, it can be extended to some nonlinear ODEs, which can generate dynamical systems by some parameter transformations. However, the conditions, which should be satisfied for such nonlinear ODEs, are still to be investigated, and the associated algorithm will be revised, if needed. All these questions will be tackled in our future work.

CONCLUSION The main result of this paper is the dynamical analysis of the ECI algorithm and its applications in simulating the solutions of ODEs. The results show that this algorithm is effective and the numerical results can match the results of theoretical analysis. Although some progress is made, more practical models and methods, which are needed to solve a system of ODEs or stochastic differential equations, will be shown in our future work.

ACKNOWLEDGEMENTS The authors would like to express their gratitude to the referees for giving strong and very useful suggestions for improving the article.


REFERENCES
1. Butcher, J.C.: The Numerical Analysis of Ordinary Differential Equations: Runge–Kutta and General Linear Methods. Wiley, New York (1987)
2. Haier, E.: Energy-preserving variant of collocation methods. J. Numer. Anal. Ind. Appl. Math. 5, 73–84 (2010)
3. Khasminskii, R.: Stochastic Stability of Differential Equations, 2nd edn. Springer, Berlin (2011)
4. Milstein, G.: Numerical Integration of Stochastic Differential Equations. Kluwer Academic, Dordrecht (1995)
5. Wang, P.: A-stable Runge–Kutta methods for stiff stochastic differential equations with multiplicative noise. Comput. Appl. Math. 34(2), 773–792 (2015)
6. Wang, T.: Optimal point-wise error estimate of a compact difference scheme for the coupled Gross–Pitaevskii equations in one dimension. J. Sci. Comput. 59, 158–186 (2014)
7. Xie, X., Chen, F.: Uniqueness of limit cycle and quality of infinite critical point of a class of cubic system. Ann. Differ. Equ. 21, 3 (2005)
8. Xie, X., Zhan, Q.: Uniqueness of limit cycles for a class of cubic system with an invariant straight line. Nonlinear Anal. TMA 70(12), 4217–4225 (2009)
9. Yang, Q.: Numerical Analysis, 2nd edn. Tsinghua University Press (2008)
10. Zhan, Q.: Mean-square numerical approximations to random periodic solutions of stochastic differential equations. Adv. Differ. Equ. 2015, 292, 1–17 (2015)
11. Zhan, Q.: Shadowing orbits of stochastic differential equations. J. Nonlinear Sci. Appl. 9, 2006–2018 (2016)
12. Zhan, Q., Xie, X., Zhang, Z.: Stability results of a class of differential equations and application in medicine. Abstr. Appl. Anal. 2009, Article ID 187021 (2009)

CHAPTER 8

A New Stability Analysis of Uncertain Delay Differential Equations

Xiao Wang1,2 and Yufu Ning3 School of Economics and Management, Beijing Institute of Petrochemical Technology, Beijing 102617, China 1

2

Beijing Academy of Safety Engineering and Technology, Beijing 102617, China

School of Information Engineering, Shandong Youth University of Political Science, Jinan 250103, China 3

ABSTRACT This paper first provides a concept of almost sure stability for uncertain delay differential equations and analyzes this new sort of stability. In addition, this paper derives three sufficient conditions for uncertain delay differential equations being stable almost surely. Finally, the relationship

Citation (APA): Wang, X., & Ning, Y. (2019). A New Stability Analysis of Uncertain Delay Differential Equations. Mathematical Problems in Engineering, 2019. (8 pages). DOI: https://doi.org/10.1155/2019/1257386 Copyright: 2019 Xiao Wang and Yufu Ning. This is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


between almost sure stability and stability in measure for uncertain delay differential equations is discussed.

INTRODUCTION
In order to deal with nondeterministic phenomena in dynamic systems, Ito [1] proposed the stochastic differential equation, driven by a Wiener process. From then on, stochastic differential equations were employed to study dynamic systems with perturbations and were applied in the fields of finance, control, and aerospace engineering. In the study of social systems, the data used to describe a dynamic system may come from domain experts, and such expert data cannot be regarded as random variables. How to deal with these expert data in such dynamic systems is a pressing problem. To tackle it, Liu [2] established uncertainty theory and proposed the uncertain variable to describe expert data. In addition, Liu [3] proposed the concept of the uncertain process to describe the evolution of an uncertain phenomenon. As a counterpart of the Wiener process, the Liu process was designed by Liu [4]. Based on the Liu process, uncertain calculus [4] was proposed to define the integral and differential of an uncertain process. Driven by a Liu process, the uncertain differential equation was proposed [3] to deal with dynamic systems in uncertain environments. On the theoretical side, Chen and Liu [5] and Gao [6] proved two existence and uniqueness theorems for uncertain differential equations. Since it is difficult to obtain analytic solutions for the vast majority of uncertain differential equations, several numerical methods have been proposed, such as the Milne method [7], the Adams-Simpson method [8], the Euler method [9], and the Hamming method [10]. With regard to stability analysis, stability in measure of uncertain differential equations was put forward by Liu [4], and stability in measure of linear uncertain differential equations was discussed by Yao, Gao, and Gao [11]. In addition, other types of stability have been studied, such as almost sure stability [12], stability in moment [13], exponential stability [14], and stability in inverse distribution [15]. Some researchers have employed uncertain differential equations to model financial markets: uncertain stock models [16, 17], uncertain interest rate models [18, 19], and uncertain currency models [20, 21] have become a focus of attention for many scholars. In addition, uncertain differential equations have been introduced into string vibration [22], differential games [23], optimal control [24], and so on.


By using an uncertain differential equation, we can establish a mathematical model to describe a dynamic system in an uncertain environment in which the velocity of the system depends only on its state at a given instant of time. In many real phenomena, however, the velocity of the dynamic system depends not only on the current state but also on previous states. In such cases it is inappropriate to model with an ordinary uncertain differential equation, and an extended type, the uncertain delay differential equation, is needed to describe such systems. Uncertain delay differential equations have been widely used in engineering, especially in automatic control systems; in the natural sciences, such as ecosystems, infectious diseases, and population dynamics; and in the social sciences to describe economic phenomena, commercial sales, and transportation scheduling [25–28]. Regarding theoretical research on uncertain delay differential equations, Barbacioru [29] and Ge and Zhu [30] established existence and uniqueness theorems, respectively. As for the stability of uncertain delay differential equations, Wang and Ning [31] defined stability in measure, stability in mean, and stability in moment and proved the corresponding stability theorems, and Jia and Sheng [32] recently discussed stability in distribution. In this paper, we propose a new type of stability, called almost sure stability, for uncertain delay differential equations and give sufficient conditions for it. The structure of this paper is organized as follows. Section 2 introduces some basic knowledge of uncertain delay differential equations. Section 3 gives the definition of almost sure stability for an uncertain delay differential equation. Section 4 presents three sufficient conditions for an uncertain delay differential equation and linear uncertain delay differential equations to be stable almost surely. Section 5 analyzes the relationship between almost sure stability and stability in measure. The last section gives a brief conclusion.

UNCERTAIN DELAY DIFFERENTIAL EQUATION
This section briefly introduces uncertain delay differential equations based on uncertain variables and uncertain processes; the definitions of uncertain variables and uncertain processes are given in Appendixes A and B.
Definition 1 (Barbacioru [29]). Let Ct be a Liu process, and let f and g be two real-valued functions.

(1)


is called an uncertain delay differential equation, where is called time delay. Theorem 2 (Ge and Zhu [30]). Uncertain delay differential equation (1) with initial states has a unique solution if the coefficients satisfy



(2)

and (3) for some positive constant L.
Definition 3 (Wang and Ning [31]). Uncertain delay differential equation (1) is said to be stable in measure if, for any two solutions Xt and Yt with different initial states xj and yj, respectively, we have
(4)
where 𝓜 denotes the uncertain measure (see Appendix A).

Definition 4 (Liu [4]). Let Xt be an uncertain process and Ct a Liu process, respectively. For any partition of the closed interval [a, b], the mesh is written as (5). Then the uncertain integral of Xt with respect to Ct is defined by

(6) provided that the limit exists almost surely and is finite. Theorem 5 (Chen and Liu [5]). Supposing that Ct is a Liu process and Xt is an integrable uncertain process on [a, b] with respect to t, then

(7)
holds, where the constant involved is the Lipschitz constant of the corresponding sample path of Ct.

ALMOST SURE STABILITY Uncertain delay differential equation (1) is equivalent to the uncertain delay integral equation



(8)

For the sake of simplicity, we set the initial time t0 to zero. Then, the above equation can be simplified as

(9) Now let us present a definition of almost sure stability for uncertain delay differential equation (1). Definition 6. Supposing that Xt and Yt are two solutions of uncertain delay differential equation (1) with different initial states , respectively, uncertain delay differential equation (1) is said to be stable almost surely if

(10)
Example 7. Consider an uncertain delay differential equation
(11)
The analytical solution of uncertain delay differential equation (11) with the two initial states is
(12)
and
(13)
respectively. Then
(14)

and, therefore, we have

(15)
By using Definition 6, we have that uncertain delay differential equation (11) is stable almost surely.
Example 8. Consider an uncertain delay differential equation
(16)
The analytical solution of uncertain delay differential equation (16) with the two initial states is

(17) and

(18) respectively. Then



(19)

and, therefore, we have

(20) This means that uncertain delay differential equation (16) is stable almost surely by using Definition 6.

STABILITY THEOREM A sufficient condition for uncertain delay differential equation (1) being stable almost surely is discussed and shown by the following theorem.


Theorem 9. Supposing that uncertain delay differential equation (1) has a unique solution for each given initial state, then uncertain delay differential equation (1) is stable almost surely if the coefficients satisfy
(21)

Proof. Let Xt and Yt denote two solutions of uncertain delay differential equation (1) with different initial states, respectively. That is,
(22)
and
(23)
Then, for any Lipschitz continuous sample, we have
(24)


By using formula (21) and Theorem 5, the inequality



(25)

(26) holds, where the constant involved is the Lipschitz constant of the corresponding sample path. By using Gronwall's inequality [33], we have


(27)

for any t>0. Since

(28) and

is finite, we have that

as long as

, which implies that (29) This means that uncertain delay differential equation (1) is almost surely stable under formula (21). Example 10. Consider an uncertain delay differential equation Take common upper bound of

and

(30) . Let N denote a

with t>0. The inequalities

(31)

(32) hold. According to Theorem 2, we obtain that uncertain delay differential equation (30) with initial states has a unique solution. In addition, by using inequality (32) and (33)


uncertain delay differential equation (30) is stable almost surely by Theorem 9. Example 11 (uncertain cell population growth model). The initial cell population growth model was provided by the following equation [34]: (34) where Nt is the number of cells in cell population, growth rate, and is the delayed growth rate.

is the instantaneous

If these biological systems operate in uncertain environment, the population Nt is an uncertain process and its growth is described by the uncertain delay differential equation

(35)

where is a constant and Ct is a Liu process. Uncertain delay differential equation (35) is called uncertain cell population growth model. Take common upper bound of

. Let M denote a . The inequalities (36)

And



(37)

hold. According to Theorem 2, we have that uncertain delay differential equation (35) with the given initial states has a unique solution. However, Theorem 9 cannot be used here to judge whether the solution of uncertain delay differential equation (35) is almost surely stable or not. From Examples 10 and 11, we can see that Theorem 9 gives only a sufficient condition, not a necessary and sufficient condition, for an uncertain delay differential equation to be almost surely stable. On the basis of Theorem 9, we now present a corollary giving a sufficient condition for a linear uncertain delay differential equation to be stable almost surely.


Corollary 12. Supposing that the coefficient functions are real-valued functions, then the linear uncertain delay differential equation

(38) is almost surely stable if

are bounded,

(39)

and

(40) Proof.

Take . Let N denote a common upper bound of . The inequalities

and

(41)

(42) hold. According to Theorem 2, we have that linear uncertain delay differential equation (38) with initial states has a unique solution. Since

we take

which is integrable on

(43) . Hence, we have


and


(44)

(45) By using Theorem 9, the linear uncertain delay differential equation (38) is almost surely stable. Example 13. Consider a linear uncertain delay differential equation (30): The real-valued functions , and

(46)

are bounded on the interval

(47) By using Corollary 12, we also have that uncertain delay differential equation (30) is stable almost surely. Up to now, we have discussed the almost sure stability of linear uncertain delay differential equation (38). In what follows, let us consider another type of uncertain delay differential equation. Theorem 14. Supposing that the uncertain delay differential equation

are real-valued functions, then (48)

is almost surely stable if

are bounded, and

(49) Proof. According to Theorem 2, we have that uncertain delay differential equation (48) has a unique solution with given initial states when are bounded. It is supposed that are the solutions of uncertain delay differential equation (48) with different initial states , respectively. That is,

and

(50)



Then, we have and

(51) (52)



(53)



(54)

By using the Gronwall’s inequality [33], we have

for any . Hence, uncertain delay differential equation (48) is almost surely stable if (55) The above inequality is equivalent to the following inequality

A New Stability Analysis of Uncertain Delay Differential Equations

177

(56) Therefore,

(57)

holds, and uncertain delay differential equation (48) is almost surely stable following from Definition 6. Example 15. Consider an uncertain delay differential equation

(58)

It follows from Theorem 2 that uncertain delay differential equation (58) has a unique solution with given initial states. In addition, the real-valued functions

are bounded on the interval

, and

(59) According to Theorem 14, uncertain delay differential equation (58) is stable almost surely.

COMPARISON
The relationship between almost sure stability and stability in measure for uncertain delay differential equation (1) is shown below.
Theorem 16. If uncertain delay differential equation (1) is almost surely stable, then it is stable in measure.
Proof. Supposing that Xt and Yt are two solutions of uncertain delay differential equation (1) with different initial states xj and yj, respectively, according to Definition 6 we have

(60) That is, there exists a set

such that, for any

,



(61) According to (61), for any as long as

. It directly leads to

(62) However,

Hence, for any



(63)



(64)

, we have

Thus almost sure stability implies the stability in measure.

CONCLUSION
In this paper, we proposed a concept of almost sure stability for the analytical solutions of uncertain delay differential equations. Meanwhile, we provided three sufficient conditions for uncertain delay differential equations to be stable almost surely. Finally, we analyzed the relationship between almost sure stability and stability in measure and found that almost sure stability implies stability in measure for uncertain delay differential equations. In the future, we will focus on applications of uncertain delay differential equations in biological systems and finance.

APPENDIX
A. Uncertain Variable
Definition A.1 (Liu [2, 4]). Let 𝓛 be a σ-algebra on a nonempty set Γ. An uncertain measure 𝓜 is a set function on 𝓛 satisfying the following axioms:



.

Axiom 1. Axiom 2.

.

Axiom 3. For every countable sequence of

, we have

Axiom 4. Let product uncertain measure

be uncertainty spaces for k = 1, 2, . . . The is an uncertain measure satisfying

where

(A.1)

(A.2)

respectively.

Definition A.2 (Liu [2]). An uncertain variable is a measurable function from an uncertainty space

to the set of real numbers.

B. Uncertain Process Definition B.1 (Liu [3]). Let be an uncertainty space and T be a totally ordered set. An uncertain process Xt is a function from T× to the set of real numbers such that (B.1) is an event for any Borel set B at each t. Definition B.2 (Liu [4]). An uncertain process is called a Liu process if (1)C0=0 and almost all sample paths are Lipschitz continuous; (2)Ct is a stationary independent increment process; (3)every increment is a normal uncertain variable with expected value 0 and variance .
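For readers who want to experiment numerically, the sketch below evaluates the inverse uncertainty distribution of a Liu process at a few confidence levels α. It assumes Liu's standard formulas: a normal uncertain variable N(e, σ) has inverse distribution e + (σ√3/π)ln(α/(1−α)), and the increment of a Liu process over a time span t is N(0, t); since the variance expression is cut off in this copy, that convention should be treated as an assumption.

```python
import math

def normal_inv_dist(alpha, e=0.0, sigma=1.0):
    """Inverse uncertainty distribution of a normal uncertain variable N(e, sigma)
    in Liu's uncertainty theory (assumed formula)."""
    return e + sigma * math.sqrt(3) / math.pi * math.log(alpha / (1 - alpha))

def liu_process_alpha_path(t, alpha):
    """alpha-path value of a Liu process at time t, assuming C_t ~ N(0, t)."""
    return normal_inv_dist(alpha, e=0.0, sigma=t)

for alpha in (0.1, 0.5, 0.9):
    print(alpha, [round(liu_process_alpha_path(t, alpha), 4) for t in (0.5, 1.0, 2.0)])
```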



ACKNOWLEDGMENTS This work was funded by National Natural Science Foundation of China (11701338) and a Project of Shandong Province Higher Educational Science and Technology Program (J17KB124).



REFERENCES
1. K. Ito, "Stochastic integral," in Proceedings of the Japan Academy, pp. 519–524, Tokyo, Japan, 1944.
2. B. Liu, Uncertainty Theory, Springer-Verlag, Berlin, Germany, 2nd edition, 2007.
3. B. Liu, "Fuzzy process, hybrid process and uncertain process," Journal of Uncertain Systems, vol. 2, no. 1, pp. 3–16, 2008.
4. B. Liu, "Some research problems in uncertainty theory," Journal of Uncertain Systems, vol. 1, pp. 3–10, 2009.
5. X. W. Chen and B. Liu, "Existence and uniqueness theorem for uncertain differential equations," Fuzzy Optimization and Decision Making, vol. 9, no. 1, pp. 69–81, 2010.
6. Y. Gao, "Existence and uniqueness theorem on uncertain differential equations with local Lipschitz condition," Journal of Uncertain Systems, vol. 6, no. 3, pp. 223–232, 2012.
7. R. Gao, "Milne method for solving uncertain differential equations," Applied Mathematics and Computation, vol. 274, pp. 774–785, 2016.
8. X. Wang, Y. Ning, T. A. Moughal, and X. Chen, "Adams-Simpson method for solving uncertain differential equation," Applied Mathematics and Computation, vol. 271, pp. 209–219, 2015.
9. K. Yao and X. W. Chen, "A numerical method for solving uncertain differential equations," Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, vol. 25, no. 3, pp. 825–832, 2013.
10. Y. Zhang, J. Gao, and Z. Huang, "Hamming method for solving uncertain differential equations," Applied Mathematics and Computation, vol. 313, pp. 331–341, 2017.
11. K. Yao, J. Gao, and Y. Gao, "Some stability theorems of uncertain differential equation," Fuzzy Optimization and Decision Making, vol. 12, no. 1, pp. 3–13, 2013.
12. H. Liu, H. Ke, and W. Fei, "Almost sure stability for uncertain differential equation," Fuzzy Optimization and Decision Making, vol. 13, no. 4, pp. 463–473, 2014.
13. Y. Sheng and C. Wang, "Stability in p-th moment for uncertain differential equation," Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, vol. 26, no. 3, pp. 1263–1271, 2014.
14. Y. Sheng and J. Gao, "Exponential stability of uncertain differential equation," Soft Computing, vol. 20, no. 9, pp. 3673–3678, 2016.
15. X. Yang, Y. Ni, and Y. Zhang, "Stability in inverse distribution for uncertain differential equations," Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, vol. 32, no. 3, pp. 2051–2059, 2017.
16. Y. Sun and T. Su, "Mean-reverting stock model with floating interest rate in uncertain environment," Fuzzy Optimization and Decision Making, vol. 16, no. 2, pp. 235–255, 2017.
17. X. Yu, "A stock model with jumps for uncertain markets," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 20, no. 3, pp. 421–432, 2012.
18. X. W. Chen and J. Gao, "Uncertain term structure model of interest rate," Soft Computing, vol. 17, no. 4, pp. 597–604, 2013.
19. Y. Sun, K. Yao, and Z. Fu, "Interest rate model in uncertain environment based on exponential Ornstein–Uhlenbeck equation," Soft Computing, vol. 22, no. 2, pp. 465–475, 2018.
20. Y. Shen and K. Yao, "A mean-reverting currency model in an uncertain environment," Soft Computing, vol. 20, no. 10, pp. 4131–4138, 2016.
21. X. Wang and Y. Ning, "An uncertain currency model with floating interest rates," Soft Computing, vol. 21, no. 22, pp. 6739–6754, 2017.
22. R. Gao, "Uncertain wave equation with infinite half-boundary," Applied Mathematics and Computation, vol. 304, pp. 28–40, 2017.
23. X. Yang and J. Gao, "Linear-quadratic uncertain differential game with application to resource extraction problem," IEEE Transactions on Fuzzy Systems, vol. 24, no. 4, pp. 819–826, 2016.
24. Y. Zhu, "Uncertain optimal control with application to a portfolio selection model," Cybernetics and Systems, vol. 41, no. 7, pp. 535–547, 2010.
25. Y. Kuang, Delay Differential Equations with Applications in Population Dynamics, Academic Press, Boston, Mass, USA, pp. 117–350, 1993.
26. M. Li and J. Wang, "Finite time stability of fractional delay differential equations," Applied Mathematics Letters, vol. 64, pp. 170–176, 2017.
27. M. Z. Liu and D. Li, "Properties of analytic solution and numerical solution of multi-pantograph equation," Applied Mathematics and Computation, vol. 155, no. 3, pp. 853–871, 2004.
28. J. Richard, "Time-delay systems: an overview of some recent advances and open problems," Automatica, vol. 39, no. 10, pp. 1667–1694, 2003.
29. I. C. Barbacioru, "Uncertainty functional differential equations for finance," Surveys in Mathematics and its Applications, vol. 5, pp. 275–284, 2010.
30. X. Ge and Y. Zhu, "Existence and uniqueness theorem for uncertain delay differential equations," Journal of Computational Information Systems, vol. 8, no. 20, pp. 8341–8347, 2012.
31. X. Wang and Y. Ning, "Stability of uncertain delay differential equations," Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology, vol. 32, no. 3, pp. 2655–2664, 2017.
32. L. Jia and Y. Sheng, "Stability in distribution for uncertain delay differential equation," Applied Mathematics and Computation, vol. 343, pp. 49–56, 2019.
33. T. H. Gronwall, "Note on the derivatives with respect to a parameter of the solutions of a system of differential equations," Annals of Mathematics, vol. 20, no. 4, pp. 292–296, 1919.
34. E. Buckwar, "Introduction to the numerical analysis of stochastic delay differential equations," Journal of Computational and Applied Mathematics, vol. 125, no. 1-2, pp. 297–307, 2000.

CHAPTER 9

Dynamical Analysis and Chaos Control of a Discrete SIS Epidemic Model

Zengyun Hu1,2, Zhidong Teng2, Chaojun Jia1, Chi Zhang1 and Long Zhang2

1 State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Beijing Road, Urumqi, 830011, China

2 College of Mathematics and System Sciences, Xinjiang University, Shengling Road, Urumqi, 830046, China

ABSTRACT
The dynamical behaviors of a discrete-time SIS epidemic model are investigated in this paper. The results indicate that the model undergoes a flip bifurcation and a Hopf bifurcation, as found by using the center manifold

Citation (APA): Hu, Z., Teng, Z., Jia, C., Zhang, C., & Zhang, L. (2014). Dynamical analysis and chaos control of a discrete SIS epidemic model. Advances in Difference Equations, 2014(1), 58. (20 pages). DOI: https://doi.org/10.1186/1687-1847-2014-58 Copyright: 2014 Hu et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


theorem and bifurcation theory. Numerical simulations not only illustrate our results but also exhibit complex dynamical behaviors, such as period-doubling bifurcations into period-2, -4, and -8 orbits, quasi-periodic orbits, and chaotic sets. Specifically, when the parameters A, d1, d2, r, λ are fixed at certain values and the bifurcation parameter h takes different values, there exist local stability, Hopf bifurcation, 3-periodic orbits, 7-periodic orbits, period-doubling bifurcation, and chaotic sets. Although the discrete epidemic model is simple, these results reveal far richer dynamical behaviors than those of the corresponding continuous epidemic models. Finally, the feedback control method is used to stabilize chaotic orbits at an unstable endemic equilibrium.
Keywords: discrete epidemic model, bifurcation, chaos, feedback control

INTRODUCTION
In the theoretical studies of epidemic dynamical models, there are two kinds of mathematical models: continuous-time models described by differential equations and discrete-time models described by difference equations. In recent years, discrete-time epidemic models have been discussed in many papers. Usually, there are two ways to construct a discrete-time epidemic model: (i) by directly making use of the properties of the epidemic disease (see [1, 2]), and (ii) by discretizing a continuous-time epidemic model using techniques such as the forward Euler scheme and Mickens' non-standard discretization (see [3]). In [3] the authors first used the non-standard (Mickens-type) discretization in an explicitly epidemiological context. The details of Mickens-type discretization can be found in [4, 5]. Up to now, a considerable amount of work has been done on discrete-time epidemic models (see, for example, [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21] and the references cited therein). These works mainly focused on the computation of the basic reproduction number, the local and global stability of the disease-free equilibrium and the endemic equilibrium, and the extinction and persistence of the disease. The authors in [6, 7, 8, 9] discussed the stability of the disease-free equilibrium and the endemic equilibrium for some SI, SIS, SIR, and SIRS type discrete-time epidemic models. In [9] we obtained the conditions for the existence and local stability of the disease-free equilibrium and the endemic equilibrium in a class of three-dimensional discrete SIRS epidemic models. The oscillation and stability have been discussed in [10, 11, 12, 13, 14]. The authors in [10, 11, 14] all used the


non-standard discretization to obtain their discrete epidemic models. In [14], sufficient conditions for the global dynamics of the solutions of the discrete SIRS epidemic model were obtained, matching those of the original continuous model. A new way to study the basic reproduction number for some discrete-time epidemic models has been given in [15]. Li and Wang in [16] discussed the dynamical behaviors, including a bifurcation, but did not give a proof of the bifurcation. In general, the discrete epidemic models obtained by Mickens-type discretization have the same features as the original continuous-time models [10, 11, 14]. For the Rössler system [3], the difference equations obtained by the non-standard (Mickens-type) method also show that the solutions of the discrete models are topologically equivalent to the solutions of the continuous-time system as long as the time step is less than a threshold value. For the discrete population models [22, 23, 24] obtained by the forward Euler scheme, there exist a flip bifurcation, a Hopf bifurcation, and chaotic dynamical behaviors that differ from the dynamical behaviors of the corresponding continuous-time models. In [9] the authors used the forward Euler scheme to obtain a class of discrete SIRS epidemic models. They claimed that when the time step h is small (h < h∗) the dynamical behaviors of the discrete model are consistent with those of the continuous model, whereas when it is large (h > h∗) the discrete epidemic model exhibits a flip bifurcation, a Hopf bifurcation, chaos, and more complex dynamical behaviors in the numerical simulations. Therefore, motivated by the above studies, we will focus on the complex dynamical behaviors of a simple discrete SIS epidemic model obtained by the forward Euler scheme.
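To see how the choice of step size alone can create such differences, consider the following short Python sketch, added here purely for illustration and not taken from the paper; the logistic example and parameter values are ours. Applying the forward Euler scheme to the scalar logistic equation x′ = rx(1 − x) gives the map x_{n+1} = x_n + hrx_n(1 − x_n), which converges to the equilibrium for small h but produces period doubling and chaotic-looking behavior as h grows.

def euler_logistic_tail(x0, r, h, n_transient=1000, n_tail=4):
    # Iterate x_{k+1} = x_k + h*r*x_k*(1 - x_k) and return the last n_tail iterates.
    x = x0
    tail = []
    for k in range(n_transient + n_tail):
        x = x + h * r * x * (1.0 - x)
        if k >= n_transient:
            tail.append(round(x, 4))
    return tail

r, x0 = 1.0, 0.5
for h in (0.1, 2.2, 2.6):
    print(f"h = {h}: {euler_logistic_tail(x0, r, h)}")
# h = 0.1: the iterates sit at the equilibrium x = 1;
# h = 2.2: a period-2 cycle; h = 2.6: irregular, chaotic-looking values.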

Now, we consider the following continuous-time SIS epidemic model described by differential equations:
$$\begin{cases} \dfrac{dS(t)}{dt}=A-d_{1}S(t)-\dfrac{\lambda S(t)I(t)}{S(t)+I(t)}+rI(t),\\[1mm] \dfrac{dI(t)}{dt}=\dfrac{\lambda S(t)I(t)}{S(t)+I(t)}-(d_{2}+r)I(t), \end{cases}\qquad(1)$$
where S(t) and I(t) denote the numbers of susceptible and infective individuals at time t, respectively; A is the recruitment rate of the population; d1 is the natural death rate of the population; d2 is the death rate of infective individuals, which includes the natural death rate and the disease-related death rate; r is the recovery rate of the infective individuals; and λ is the standard incidence rate. It is clear (see [25]) that model (1) has the basic reproduction number R0 = λ/(d2 + r); if R0 ≤ 1, then the disease-free equilibrium (A/d1, 0) of model (1) is globally asymptotically stable, and if R0 > 1, then the endemic equilibrium E+(S+, I+) of model (1) is locally asymptotically stable. Applying the forward Euler scheme to model (1), we obtain the following discrete-time SIS epidemic model:

$$\begin{cases} S_{n+1}=S_{n}+h\Bigl[A-d_{1}S_{n}-\dfrac{\lambda S_{n}I_{n}}{S_{n}+I_{n}}+rI_{n}\Bigr],\\[1mm] I_{n+1}=I_{n}+h\Bigl[\dfrac{\lambda S_{n}I_{n}}{S_{n}+I_{n}}-(d_{2}+r)I_{n}\Bigr], \end{cases}\qquad(2)$$
where h is the time step size, and A, λ, d1, d2, and r are defined as in model (1). It is assumed that the initial values satisfy S0 > 0, I0 > 0 and that all the parameters are positive. In this paper, we study the existence and stability of the disease-free equilibrium and the endemic equilibrium of model (2). To detect the complex dynamical behaviors, the time step h is selected as the bifurcation parameter of model (2). Furthermore, we use numerical simulations to display the flip bifurcation, the Hopf bifurcation, and other complex dynamical behaviors. Finally, chaos control for model (2) is achieved by the feedback control method. The paper is organized as follows. In the second section, we discuss the existence and local stability of the equilibria of model (2). In the third section, we study the flip bifurcation and the Hopf bifurcation of model (2) by choosing h as a bifurcation parameter. In the fourth section, we present numerical simulations, which not only illustrate the theoretical results but also exhibit complex dynamical behaviors such as the cascade of period-doubling bifurcations in period-2, 4, 8, quasi-periodic orbits, 3-periodic orbits, 7-periodic orbits, and chaotic sets. In the fifth section, the feedback control method is used to control chaotic orbits at an unstable endemic equilibrium. The conclusion is given in the last section.
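As a concrete illustration of iterating map (2), the following Python sketch (added here; the parameter values and function name are ours and purely illustrative) advances the discrete SIS model step by step and reports the corresponding value of R0. For the small step size used, the iterates settle near the endemic equilibrium when R0 > 1.

def sis_step(S, I, A, d1, d2, r, lam, h):
    # One forward-Euler step of model (2) with standard incidence.
    infections = lam * S * I / (S + I)
    S_next = S + h * (A - d1 * S - infections + r * I)
    I_next = I + h * (infections - (d2 + r) * I)
    return S_next, I_next

A, d1, d2, r, lam, h = 1.0, 0.1, 0.2, 0.1, 0.5, 0.01
S, I = 5.0, 1.0
for _ in range(10000):
    S, I = sis_step(S, I, A, d1, d2, r, lam, h)
R0 = lam / (d2 + r)
print(f"R0 = {R0:.2f}, (S, I) after 10000 steps = ({S:.4f}, {I:.4f})")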

ANALYSIS OF EQUILIBRIA
Let R0 = λ/(d2 + r) (the basic reproduction number), and we have the following result as regards the existence of the equilibria of model (2).
Lemma 2.1
1. If R0 ≤ 1, then model (2) has only the disease-free equilibrium E1(A/d1, 0).
2. If R0 > 1, then model (2) has two equilibria: the disease-free equilibrium E1(A/d1, 0) and the endemic equilibrium E2(S∗, I∗), where
$$S^{*}=\frac{A}{d_{1}+d_{2}(R_{0}-1)},\qquad I^{*}=(R_{0}-1)S^{*}.$$
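As a quick numerical consistency check (illustrative only; it relies on the expressions for S∗ and I∗ stated above, which follow from setting the increments in (2) to zero, and on parameter values of our choosing), E2(S∗, I∗) remains fixed under one step of map (2) for any step size h:

# Verify that E2(S*, I*) from Lemma 2.1 is a fixed point of map (2).
A, d1, d2, r, lam, h = 1.0, 0.1, 0.2, 0.1, 0.5, 0.3
R0 = lam / (d2 + r)
S_star = A / (d1 + d2 * (R0 - 1.0))
I_star = (R0 - 1.0) * S_star
infections = lam * S_star * I_star / (S_star + I_star)
S_next = S_star + h * (A - d1 * S_star - infections + r * I_star)
I_next = I_star + h * (infections - (d2 + r) * I_star)
print(abs(S_next - S_star) < 1e-10, abs(I_next - I_star) < 1e-10)  # True True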


Now, we study the stability of the equilibria E1 and E2 of model (2). The Jacobian matrix of model (2) at an equilibrium E(S, I) is
$$J(E)=\begin{pmatrix} 1-h\Bigl(d_{1}+\dfrac{\lambda I^{2}}{(S+I)^{2}}\Bigr) & h\Bigl(r-\dfrac{\lambda S^{2}}{(S+I)^{2}}\Bigr)\\[2mm] \dfrac{h\lambda I^{2}}{(S+I)^{2}} & 1+h\Bigl(\dfrac{\lambda S^{2}}{(S+I)^{2}}-d_{2}-r\Bigr) \end{pmatrix}.$$
The corresponding characteristic equation of J(E) can be written as
$$\mu^{2}-\operatorname{tr}J(E)\,\mu+\det J(E)=0.\qquad(3)$$
After simple computation, we obtain the local stability result for the disease-free equilibrium E1, which is shown in the following.

Theorem 2.1 If R0 < 1, then the disease-free equilibrium E1 of model (2) is locally asymptotically stable whenever the step size h is sufficiently small. If R0 > 1, then

(1) E2(S∗, I∗) is a sink if one of the following conditions holds: (A) Δ ≥ 0 and 0 <