Using Artificial Neural Networks for Timeseries Smoothing and Forecasting: Case Studies in Economics (Studies in Computational Intelligence, 979) [1st ed. 2021] 3030756483, 9783030756482

The aim of this publication is to identify and apply suitable methods for analysing and predicting the time series of go


English Pages 199 [197] Year 2021


Table of contents :
Introduction
Contents
1 Time Series and Their Importance to the Economy
1.1 Trend
1.2 Seasonality
1.3 Non-linearity
1.4 Conditioned Heteroscedasticity
References
2 Econometrics—Selected Models
2.1 Linear Models
2.2 Arima
2.2.1 Case
2.3 Exponential Smoothing
2.3.1 Case
2.4 Interrupted Time Series
2.5 Fourier Transformation
2.5.1 Case
2.6 Volatility Models
References
3 Artificial Neural Networks—Selected Models
3.1 Radial Basis Function Neural Networks (Explanation, Usage)
3.1.1 Mathematical Background
3.1.2 Case
3.2 Multi-Layer Perceptron Neural Networks
3.2.1 Mathematical Background
3.2.2 Case
3.3 Long Short Term Memory Neural Networks
3.3.1 Mathematical Background
3.3.2 Case
References
4 Comparison of Different Methods
4.1 Neural networks
4.1.1 Case
4.2 Decision Tree
4.2.1 Case
4.3 Gaussian Process
4.3.1 Case
4.4 Gradient Boosted Trees
4.4.1 Case
4.5 Linear Regression
4.5.1 Case
4.6 Nearest Neighbours
4.6.1 Case
4.7 Random Forest
4.7.1 Case
4.8 Mathematica: Comparison
References
5 Conclusion

Studies in Computational Intelligence 979

Jaromír Vrbka

Using Artificial Neural Networks for Timeseries Smoothing and Forecasting Case Studies in Economics

Studies in Computational Intelligence Volume 979

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence, quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/7092


Jaromír Vrbka, Institute of Technology and Business in České Budějovice, České Budějovice, Czech Republic

ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-3-030-75648-2 ISBN 978-3-030-75649-9 (eBook) https://doi.org/10.1007/978-3-030-75649-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Introduction

The majority of collected data show a certain time structure. In some cases, this structure is hidden; however, there are methods to extract relevant information from the available data using this time structure. The knowledge of time series modelling is a basic skill in the field of data science, since there are specific structures for this data type which can be examined in all situations. Time series analysis enables the examination of the main patterns, such as trends, seasonality, cyclicality, or data irregularities. It is applied, e.g., to the analysis of stock markets, pattern recognition, earthquake prediction, economic predictions, censuses, etc. Time series forecasts use historical empirical regularities for creating future projections, which are guided by a theoretical understanding of economic processes. Economic forecasts are used for a wide range of activities, including setting monetary and fiscal policy, state and local budgets, financial management and financial engineering. The key elements of economic forecasts include selecting the prediction model(s) suitable for a given problem, assessing and communicating the uncertainty related to forecasts, and guarding against model instability. For example, in the case of investments, a time series follows the movement of selected data points, such as a security price, over a specified period of time, where the data points are recorded at regular intervals. There is no minimum or maximum period, which enables the collection of the data in a way that provides the information required by the investor or analyst examining the activity. Time series analysis can be useful to determine how a given asset, security or economic variable changes over time. Time series prediction is most reliable when the data cover a longer period of time.
The information about the conditions can be obtained by measuring the data at various time intervals: hourly, daily, monthly, quarterly, yearly, or at any other time interval. Predictions are most accurate if they are based on a large number of observations over a longer period of time, so that it is possible to measure the patterns in given conditions. The main advantages of time series analysis include the fact that there are certain features that cannot be captured by means of ordinary regression models, while the methods connected with time series can capture them. Furthermore, there are certain dependencies that cannot be captured by means of conventional models, especially factors such as seasonality, where the effects of trends cannot be fully explained by models such as linear regression, but can be explained by time series models. This shows that time series analysis is useful if we try to capture aspects that usually change over time. For example, data on prices at the end of the day are very difficult to model if only regression models are used; however, time series analysis uses a wide range of models which have been proved to capture such complex phenomena as the dependence between random variables, also known as serial correlation. The most important advantage of time series analysis consists in its ability to predict the future value of data. There are various methods to obtain a prediction, e.g. regression models, models based on the decomposition of time series, models that smooth the series by means of a function, or models based on neural networks (machine learning). Before examining the methods of machine learning for time series, it is recommended to make sure that conventional methods of predicting linear time series cannot be used. Such models are focussed on linear relationships; however, they are sophisticated and work well in a wide range of problems, provided that the data are prepared properly and the method is well configured. The conventional methods include, e.g., the autoregressive model (AR), moving average (MA), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), seasonal autoregressive integrated moving average with exogenous regressors (SARIMAX), vector autoregression (VAR), vector autoregressive moving average (VARMA), vector autoregressive moving average with exogenous regressors (VARMAX), simple exponential smoothing (SES), and Holt-Winters exponential smoothing (HWES). Time series predictions have been dominated by linear methods for decades. Linear methods are easy to develop and implement and relatively easy to understand and interpret. However, it is important to understand the limitations of linear models; they are not able to capture nonlinear relationships in data.
Nevertheless, neural networks have recently appeared as an alternative tool for prediction. The naturally nonlinear structure of neural networks is particularly useful for capturing the complex underlying relationships in many real-world problems. Neural networks are perhaps more versatile methods for prediction, as they are able to find nonlinear structures in a problem but can also model linear processes. The disadvantages of time series analysis include the fact that the method assumes that past trends will go on forever and that the extrapolation of data based on historical information provides valid results. In fact, e.g., sales of products may be influenced by competition, especially in relation to the availability of new products in the market. A big problem related to time series prediction is the fact that models of the ARIMA (autoregressive integrated moving average) type are not capable of long-term predictions, since they are dependent on previous values. Time series prediction thus cannot reflect what is happening now, since it does not use up-to-date information. One possibility is to add regressors containing up-to-date information, so that it is possible to combine the time series for the variable in question with other information reflecting what is considered the current period in the research. It is important, however, not to include too many regressors, as their overuse and an overly complex combined model may result in incorrect and confusing predictions. Another problem is the fact that most phenomena that need to be predicted are affected by factors at different levels. Therefore, model components need to be added which take seasonal influences into account. All methods that could be used for time series predictions have certain advantages and disadvantages, and so do their models; however, the only thing that primarily matters is the nature of the data, as a model that was the most efficient in one situation is not always the most suitable one for another application. It is therefore necessary to consider the possible application of various methods, followed by an evaluation of which of them is the most suitable solution for a given problem. In the case of time series analysis and prediction, e.g., the free programming language Python can be used as a software solution; however, there are several slightly more user-friendly programmes, such as Tibco’s Statistica or Wolfram Mathematica.
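As an illustration of one of the conventional methods named above, the following minimal Python sketch implements simple exponential smoothing (SES). It assumes the common recursion s_t = alpha * x_t + (1 - alpha) * s_{t-1}, initialised with the first observation; the function name, the smoothing constant and the sample prices are hypothetical and are not taken from the book.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed levels s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not series:
        return []
    # Assumption: the level is initialised with the first observation,
    # which is one common convention among several.
    levels = [float(series[0])]
    for x in series[1:]:
        levels.append(alpha * x + (1.0 - alpha) * levels[-1])
    return levels


# Hypothetical illustrative data (not from the book).
prices = [10.0, 12.0, 11.0, 13.0]
levels = simple_exponential_smoothing(prices, alpha=0.5)

# The SES forecast for any future period is the last smoothed level.
forecast = levels[-1]
```

Because SES carries no trend or seasonal component, its forecast is flat at the last smoothed level, which is one reason why the trend-aware and seasonal variants listed above exist.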


Chapter 1

Time Series and Their Importance to the Economy

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 J. Vrbka, Using Artificial Neural Networks for Timeseries Smoothing and Forecasting, Studies in Computational Intelligence 979, https://doi.org/10.1007/978-3-030-75649-9_1

There is no doubt that data record the results of everyday human activities and contain useful quantitative information, although only historical. Time series are one example of such data (Zhou et al. 2020). According to Rafsanjani and Samareh (2016), time series, i.e. numerical values of a specific indicator changing in time, constitute an integral part of our lives and exist throughout the real world. Typical examples are EEG records in medicine, data on population development within a particular demographic territory, or economic time series that monitor price inflation, GNP, export, population and other findings revealing causal relations (Pravilovic et al. 2017). Chandra (2015) argues that it is currently impossible to make crucial economic decisions without thoroughly analysing key economic indicators and their relations. The author suggests that the economy, irrespective of whether it relates to a business or a country, sees the importance of time series in planning and in predicting the development of prices and of the economic indicators already mentioned. According to Wang and Han (2015), the economy strives to describe the development of the monitored economic indicator in previous periods, to reveal the causes of fluctuation and to predict the future. Rostan and Rostan (2018) claim that it is prediction that plays the central role in analysing time series, presenting an intricate task that requires employing conventional methods. The reliability of prediction is also disputable, as the value of indicators depends on various factors, other indicators, conditions and the decision-making of living organisms, i.e. people, whose behaviour is hard to predict (Reid et al. 2014). A wide range of scientific fields apply time series and analyse them in various studies relating to the economy. Horák et al. (2019) compared the accuracy of smoothing time series through regression analysis and artificial neural networks on the export from the USA to the PRC. The study focused on the application of artificial neural structures and their contribution to practice. The results showed that ANN could effectively learn correlations in time series. Vrbka et al. (2019) arrived at analogous findings, exploring the trade balance of the countries of the EU and the PRC.


Kayalar et al. (2017) examined the correlation of oil prices, stock market indexes and exchange rates in various economies, divided into oil exporters and developing, oil-importing markets. The authors concluded that the exchange rates and stock market indexes of most oil-exporting countries show greater dependence on the oil price than those of developing, oil-importing markets. Baek and Kim (2019) observed how oil prices influence the exchange rates of selected countries of Sub-Saharan Africa (SSA). The authors thoroughly analyzed the asymmetric effects of changes in oil prices within their modelling process, a non-linear autoregressive distributed lag model. The results show that changes in oil prices asymmetrically affect real exchange rates on a long-term basis, which means that the fluctuation of real exchange rates reflects the growth in oil prices more than their fall. Kodama et al. (2017) applied a recurrent neural network to time series of Bitcoin exchange rates denominated in EUR. The results considered only a single market over a relatively short period and therefore require a more comprehensive survey to provide a more valid conclusion. Šrenkel and Smorada (2014) explored the development of the financial sector in Slovakia before, during and after the crisis. The authors compared the data with overall economic development, carefully analyzing debt, liquidity and profit ratios within 2007–2012. They concluded that 2009 showed the lowest values, after which there was an improvement in 2010–2011, while 2012 saw another drop. Arlt and Arltová (2003) claim that experts who know the characteristics of time series, and can classify them accordingly, will better understand the specifics of their calculation. The authors further suggest several groups of time series widely applied in practice: descending and ascending, interval and instantaneous, short-term and long-term, stochastic and deterministic, equidistant and non-equidistant, monetary and naturally expressed, and absolute and derived time series. Wen et al. (2019) state that the course of time series shows certain specifics including trend, seasonality, non-linearity and conditioned heteroscedasticity.

1.1 Trend

De Leo et al. (2020) define the term ‘trend’ through long-term changes in the behaviour of a time series, i.e. through the marked tendency in the development of the monitored phenomenon. They further refer to it as a result of factors acting in one direction over the long term, such as market conditions, production technology, demographic conditions etc. Proietti and Grassi (2015) argue that a trend can take various forms: declining, growing, smoothing, variable, consistent or steep, which are liable to change over time; i.e. it shows signs of a cycle.
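The simplest of the trend forms mentioned above, a steadily growing or declining one, can be estimated by fitting a straight line to the series with ordinary least squares. The sketch below is only an illustration under that assumption (a linear trend in the time index 0, 1, 2, ...); the function name and the sample values are hypothetical.

```python
def fit_linear_trend(values):
    """Least-squares fit of values[t] = intercept + slope * t for t = 0, 1, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    # Sum of cross-products and sum of squares of the centred time index.
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


# Hypothetical series with an approximately linear upward trend.
series = [2.0, 4.1, 5.9, 8.0, 10.0]
intercept, slope = fit_linear_trend(series)
```

A positive slope indicates a growing trend and a negative slope a declining one; trends that change form over time, as described above, would need a more flexible model than a single straight line.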


1.2 Seasonality

Song and Chung (2014) claim that modelling seasonality in time series is an essential topic in the field of statistics. Leaving seasonality out of the model may result in considerable distortion. If we intend to model the average monthly temperature, excluding seasonal figures may lead to bias in the forecasted values. The term ‘seasonality’ implies a periodical and systematic fluctuation within the time series. Arlt and Arltová (2007) declare that the fluctuation takes place within one calendar year, repeating annually in the same or a modified form. The main reason for these periodical changes mostly consists in changing seasons and various human habits. Seasonality, which can be either regular or irregular, is most evident in short-term high-frequency time series. The parameters indicating seasonality are seasonal factors whose sum equals zero.
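The property that seasonal factors sum to zero can be seen in a small sketch. It assumes an additive model without trend (or a series that has already been detrended) and a whole number of seasonal cycles; each factor is then the mean of one season minus the overall mean. The function name and the quarterly figures are hypothetical.

```python
def additive_seasonal_factors(values, period):
    """Return one additive factor per season: season mean minus overall mean."""
    if period < 1 or not values or len(values) % period != 0:
        raise ValueError("values must cover a whole number of seasonal cycles")
    overall_mean = sum(values) / len(values)
    factors = []
    for season in range(period):
        # Every period-th observation belongs to the same season.
        season_values = values[season::period]
        factors.append(sum(season_values) / len(season_values) - overall_mean)
    return factors


# Hypothetical quarterly data covering two years, with no trend.
quarterly = [10.0, 20.0, 30.0, 20.0, 10.0, 20.0, 30.0, 20.0]
factors = additive_seasonal_factors(quarterly, period=4)
```

Since every season contributes the same number of observations, the season means average to the overall mean, so the factors necessarily sum to zero.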

1.3 Non-linearity

Tsay (2005) argues that economic or financial relations are non-linear. The most distinctive traits of almost all economic time series are structural breaks, trend changes and variability. At times, this may lead to significant changes in their auto-correlation structures. The author further claims that non-linearity can consist in different average differences or different average growth coefficients for specific periods. Where conventional linear models fail to identify this behaviour in economic time series, non-linear models can capture it (Galka and Ozaki 2001).

1.4 Conditioned Heteroscedasticity

Kokoszka et al. (2017) claim that conditioned heteroscedasticity mostly occurs in financial and economic time series, where the logarithm of the yield is normally distributed with a variance that changes over time. Conditioned heteroscedasticity adopts the following formula:

(\ln X_t - \ln X_{t-1})^2 = a + p\,(\ln X_t - \ln X_{t-2})^2 + u_t \qquad (1.1)

where:
X_t and X_{t-1}: values of the time series at time t and at time t shifted by one unit
a: parameter calculated by the least squares method
u_t: random element


If parameter p equals zero, no conditioned heteroscedasticity exists in the time series. Tsay (2005) argues that empirical and theoretical analyses of financial time series apply the fundamental principle that the logarithms of yields (logarithms of growth coefficients) are normally distributed with a constant mean value and constant variance. This is so because prices cannot have negative values; i.e. time series are presumed to follow a log-normal distribution, particularly financial time series of higher frequency. The author further states that, compared with a normal distribution, the distribution of logarithmic yields is more sharply peaked. Spearman’s rank correlation test is presumably the most applicable method for detecting heteroscedasticity.
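The role of parameter p can be illustrated with a small least-squares sketch. It regresses each squared log return on the previous period's squared log return, which is the usual ARCH(1)-type reading of such a test; this lag structure is an assumption of the sketch and may differ from the indices printed in Eq. (1.1). The function name and the synthetic price series are hypothetical, and a real analysis would also test whether the estimate of p differs significantly from zero.

```python
import math


def arch_regression(prices):
    """OLS estimates (a, p) in r_t^2 = a + p * r_{t-1}^2 + u_t,
    where r_t = ln X_t - ln X_{t-1} is the log return."""
    squared = [(math.log(b) - math.log(a)) ** 2
               for a, b in zip(prices, prices[1:])]
    x, y = squared[:-1], squared[1:]   # lagged and current squared returns
    n = len(y)
    if n < 2:
        raise ValueError("at least four prices are needed")
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("squared returns are constant; p is not identified")
    p = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    a = y_mean - p * x_mean
    return a, p


# Synthetic prices built so that squared log returns follow
# s_t = 0.01 + 0.5 * s_{t-1} exactly (no random element).
log_prices = [0.0]
for s in [0.04, 0.03, 0.025, 0.0225]:
    log_prices.append(log_prices[-1] + math.sqrt(s))
prices = [math.exp(v) for v in log_prices]

a, p = arch_regression(prices)   # by construction a is near 0.01, p near 0.5
```

An estimate of p close to zero would be consistent with the absence of conditioned heteroscedasticity; here p is clearly non-zero because the synthetic data were constructed that way.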

References

Arlt, J., and M. Arltová. 2003. Finanční časové řady: Vlastnosti, metody modelování, příklady a aplikace [Financial time series: Properties, modeling methods, examples and applications]. Prague: Grada Publishing.
Arlt, J., and M. Arltová. 2007. Ekonomické časové řady [Economic time series]. Prague: Grada Publishing.
Baek, J., and H.Y. Kim. 2019. On the relation between crude oil prices and exchange rates in sub-Saharan African countries. A nonlinear ARDL approach. 1–12.
Chandra, R. 2015. Competition and collaboration in cooperative coevolution of Elman recurrent neural networks for time-series prediction. IEEE Transactions on Neural Networks and Learning Systems 26 (12): 3123–3136.
De Leo, F., A. De Leo, G. Besio, and R. Briganti. 2020. Detection and quantification of trends in time series of significant wave heights: An application in the Mediterranean Sea. Ocean Engineering 202.
Galka, A., and T. Ozaki. 2001. Testing for nonlinearity in high-dimensional time series from continuous dynamics. Physica D 158 (1–4): 32–44.
Horák, J., P. Šuleř, and J. Vrbka. 2019. Comparison of neural networks and regression time series when predicting the export development from the USA to PRC. In Proceedings of 6th International Scientific Conference Contemporary Issues in Business, Management and Economics Engineering.
Kayalar, D.E., C.C. Küçüközmen, and A.S. Selcuk-Kestel. 2017. The impact of crude oil prices on financial market indicators: Copula approach. Energy Economics 61: 162–173.
Kodama, O., L. Pichl, and T. Kaizoji. 2017. Regime change and trend prediction for bitcoin time series data. CBU International Conference Proceedings 5: 384–388.
Kokoszka, P., G. Rice, and H.L. Shang. 2017. Inference for the autocovariance of a functional time series under conditional heteroscedasticity. Journal of Multivariate Analysis 162: 32–50.
Pravilovic, S., M. Bilancia, A. Appice, and D. Malerba. 2017. Using multiple time series analysis for geosensor data forecasting. Information Sciences 380: 31–52.
Proietti, T., and S. Grassi. 2015. Stochastic trends and seasonality in economic time series: New evidence from Bayesian stochastic model specification search. Empirical Economics 48 (3): 983–1011.
Rafsanjani, M.K., and M. Samareh. 2016. Chaotic time series prediction by artificial neural network. Journal of Computational Methods in Sciences and Engineering 16 (3): 599–615.
Reid, D., A. Hussain, H. Tawfik, and E. Ben-Jacob. 2014. Financial time series prediction using spiking neural networks. PLoS ONE 9 (8).
Rostan, P., and A. Rostan. 2018. The versatility of spectrum analysis for forecasting financial time series. Journal of Forecasting 37 (3): 327–339.
Song, Q., and J.W. Chung. 2014. On a new framework of autoregressive fuzzy time series models. Industrial Engineering and Management Systems 13 (4): 357–368.
Šrenkel, Ľ., and M. Smorada. 2014. Financial ratios analysis of financial institutions in the Slovak Republic during and after the global financial crisis. In Managing and Modelling of Financial Risks: Proceedings: 7th International Scientific Conference: 8–9th September Ostrava, 793–798. Bratislava, Slovakia: VŠB - Technical University of Ostrava Univ Econ Bratislava, Fac Business Management, Dept Corp Finance.
Tsay, R.S. 2005. Analysis of financial time series, 2nd ed. Hoboken, New Jersey: Wiley Interscience.
Vrbka, J., P. Šuleř, V. Machová, and J. Horák. 2019. Considering seasonal fluctuations in equalizing time series by means of artificial neural networks for predicting development of USA and People's Republic of China trade balance. Littera Scripta 12 (2): 178–193.
Wang, X.Y., and M. Han. 2015. Improved extreme learning machine for multivariate time series online sequential prediction. Engineering Applications of Artificial Intelligence 40: 28–36.
Wen, M., P. Li, L. Zhang, and Y. Chen. 2019. Stock market trend prediction using high-order information of time series. IEEE Access 7: 28299–28308.
Zhou, M.N., J.Z. Yi, J.Q. Yang, and Y. Sima. 2020. Characteristic representation of stock time series based on trend feature points. IEEE Access 8: 97016–97031.

Chapter 2

Econometrics—Selected Models

Econometrics is currently a rapidly developing field of study, referring not only to common economics (macroeconomy and microeconomy), but also specialized economic areas such as financial and spatial economics (Andrikopulos and Gkountanis 2011). According to Zheng and Hurn (2019), econometrics has recently been enjoying a reputation of fertile grounds for research with the availability of reliable information via high-frequency data. The authors further claim that econometrics predominantly consists in systematically seeking quantitative answers to various problems and issues from economics, building on economic theories related to the problem at issue. We need to say that econometric techniques and tools may also apply to areas other than economics, e.g. sociology. The scientific literature informs that professional experience provides various, yet conceptually equivalent definitions of econometrics. Magnusson (2016) declares that econometrics involves unorthodox methods for assessing mathematic economics of numeral models, validating economic theories by modern statistical techniques. Pineda (2006) regards econometrics as a quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation relating to convenient conclusive methods. Geweke et al. (2007) consider econometrics as a technique to analyse empirical economic relationships for the ex-post testing of economic theories, prognoses, decision-making and policy assessment. Irrespective of an accepted definition, Pinto (2011) considers econometrics as a most applicable, accurate and convenient method in economics. The author further regards the discipline as integration of findings from various fields of economy, statistics and mathematics. It covers different spheres of applied economics, testing economic theories, informing policy creators or predicting future behaviour. 
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 J. Vrbka, Using Artificial Neural Networks for Timeseries Smoothing and Forecasting, Studies in Computational Intelligence 979, https://doi.org/10.1007/978-3-030-75649-9_2

Andrikopoulos and Gkountanis (2011) argue that the fundamental principle of econometrics consists in creating models of the monitored phenomena. A model illustrates a real situation, but it depends on the complexity of the examined phenomenon and cannot capture every detail by the available theoretical means. The model subsequently demonstrates the distinctive traits of the economic phenomenon, disregarding uninteresting features. Econometrics literally means 'measuring the economy'. The measuring takes place through various econometric models and methods. Bendic et al. (2016) characterize econometric models as economic principles established between the variables of an interdependent system. Pineda (2006) points out that econometric models involve directly measurable variables. However, econometric theory also discusses non-measurable quantities, which are applicable in specific phases of econometric analysis. We call them dummies. They capture non-economic factors specifying the traits of statistical units and thus provide useful information on the principles of econometric phenomena and processes. Examples are characteristics of individuals (nationality, sex, skin colour etc.), seasonality in time series (seasonal variables) and location (east or west, or individual regions within a state). Given the demanding nature of the method, Magnusson (2016) recommends carrying out econometric modelling with econometric software. It is essential to use a relevant data source, since unreliable, manipulated or distorted information may compromise the reliability of the model. Seeking a valuable data source can be very time-consuming and, at times, impossible. The following text briefly describes econometric models and related methods.
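The dummies mentioned above can be illustrated with a minimal Python sketch. It assumes the common convention of one 0/1 column per category with the first category dropped as the reference level; the function name `dummy_encode` and the quarterly labels are hypothetical and serve only as an illustration of a seasonal variable.

```python
def dummy_encode(values, drop_first=True):
    """Encode a categorical variable as 0/1 dummy columns (one list per category)."""
    categories = sorted(set(values))
    if drop_first:
        # The first category becomes the reference level and gets no column.
        categories = categories[1:]
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical seasonal variable: the quarter in which each observation falls.
quarters = ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2"]
dummies = dummy_encode(quarters)
for name, column in dummies.items():
    print(name, column)
```

Dropping one category avoids perfect collinearity with the intercept when the dummies enter a regression model.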

2.1 Linear Models

Hossain and Ghahramani (2016) argue that linear models, or linear regression models, are used to model (explain the examined values of) a continuous outcome variable (dependent variable) by means of one or more predictors (independent variables). The classical linear regression model is the least complicated type, and its parameters are obtained from a simple formula. The input data, however, must comply with stringent requirements. In practice, the available data often fail to meet these prerequisites, as the limited number of cases to which the classical linear model applies cannot cover the complexity of reality. It is, therefore, necessary to relax some requirements of the classical linear model and examine the properties of the resulting models. Linear models include an integrated linear regression model and a generalized linear regression model. A generalized integrated linear regression model is an even more general type, comprising both the linear integrated and the generalized linear model (Eguchi 2018).
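The "simple formula" of the classical linear regression model can be sketched for the case of a single predictor. The example below is only an illustration under that assumption: it uses the ordinary least squares estimates (slope = S_xy / S_xx, intercept = mean(y) − slope · mean(x)), and the function name and the data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and sum of cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: one predictor and one continuous outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope = fit_simple_ols(x, y)
print(f"y = {intercept:.3f} + {slope:.3f} * x")
```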

2.2 Arima

Pai and Lin (2005) argue that the ARIMA model (autoregressive integrated moving average, also called the Box-Jenkins method) is one of the most popular techniques of time series prediction, commonly applied thanks to its forecasting ability. ARIMA is, however, a rather complex model that imposes stringent requirements on its use. The model results depend mostly on the experience, knowledge and decisions of the persons conducting the analysis (Franses 2000). Mélard and Pasteels (2000) state that ARIMA models contribute to understanding time series and enable predictions of their future behaviour. The methods are convenient for short-term forecasting when data on explanatory variables are not available or when a model based on them has little predictive power.
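To make the idea of "autoregressive" and "integrated" concrete, the following sketch fits the simplest non-trivial member of the family, an ARIMA(1,1,0) model without a constant. It is a simplified illustration under stated assumptions, not the procedure of any particular statistical package: the series is differenced once, the single autoregressive coefficient is estimated by conditional least squares, and the function names and toy numbers are hypothetical.

```python
def fit_arima_110(series):
    """Estimate phi in d_t = phi * d_(t-1) + e_t, where d_t = y_t - y_(t-1)."""
    diffs = [b - a for a, b in zip(series, series[1:])]  # the "integrated" step
    num = sum(d_prev * d_next for d_prev, d_next in zip(diffs, diffs[1:]))
    den = sum(d_prev ** 2 for d_prev in diffs[:-1])
    return num / den


def forecast_arima_110(series, phi, steps):
    """Iterate the fitted recursion forward and undo the differencing."""
    level = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff      # predicted next change
        level = level + diff   # cumulate the change back to the price level
        forecasts.append(level)
    return forecasts


# Toy series whose successive changes halve each period (8, 4, 2, 1).
prices = [100.0, 108.0, 112.0, 114.0, 115.0]
phi = fit_arima_110(prices)
print(phi, forecast_arima_110(prices, phi, steps=2))
```

Full ARIMA software additionally handles moving-average terms, constants, seasonal components and maximum likelihood estimation, which this sketch omits.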

2.2.1 Case

Experiment 1

Experiment 1 deals with the use of the ARIMA method for predicting the price of gold. Figure 2.1 shows the parameter settings of the ARIMA model.

Fig. 2.1 Setup ARIMA (Tibco 2020)

Fig. 2.2 Results ARIMA (a) (Tibco 2020)

Fig. 2.3 Results ARIMA (a) (Tibco 2020)

Fig. 2.4 Forecasts ARIMA (Tibco 2020)

Fig. 2.5 Plot of variable ARIMA (Tibco 2020)

Fig. 2.6 Histogram variable ARIMA (Tibco 2020)

Fig. 2.7 Normal Probability Plot ARIMA (Tibco 2020)

Fig. 2.8 Detrended Normal Probability Plot (Tibco 2020)

Fig. 2.9 Half-Normal Probability Plot ARIMA

Fig. 2.10 Autocorrelation Function ARIMA

Fig. 2.11 Partial Autocorrelation Function (Tibco 2020)

Figures 2.2 and 2.3 show the results for the model settings, and Fig. 2.4 shows the forecasts. Figure 2.5 is a graph of the residuals, i.e. the differences between the original and the predicted values. From the shape of the curve connecting the residuals, it can be concluded that the differences between the predictions and the actual development of the price of gold are not excessively large and that the proposed model could be suitable. In the extremes, the residuals reach approximately +80 and −120. Figure 2.6 shows that the distribution of the residuals is approximately normal (following the Gaussian curve). The peak of the curve is correctly placed at the value of 0 CZK; the curve is shifted neither to the left (towards negative values) nor to the right (towards positive values), so the sum of the residuals will probably be close to zero. This is very good news for the model used. The small number of residuals and their relatively low values can be considered the most important finding, which also supports the accuracy of the result. Figure 2.7 shows a normal probability plot, in which the blue circles represent the values that should lie as close as possible to the red curve. From the figure it can be concluded that there are no excessive deviations from the red curve. Figure 2.8 shows a detrended normal probability plot, and the half-normal probability plot is shown in Fig. 2.9. The graphical representation of the autocorrelation values ρ_k for individual lags k ≥ 0 is called a correlogram. An example of the correlogram of the autocorrelation function is shown in Fig. 2.10. In the correlogram, the emphasis is mainly on significant autocorrelations, as these indicate between which values of the time series a relationship exists. Figure 2.10 demonstrates the range of possible occurrence of the correct value. The figure shows the standard errors, the correlations and, in particular, the confidence limits highlighted in red. At lag 5, the value exceeds the confidence limit; at the other lags, the values fit within the confidence limits. The detail of the overlap can be seen in Fig. 2.11.
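The correlogram described above can be reproduced numerically. The sketch below computes the sample autocorrelations ρ_k and compares them with approximate 95 % limits of ±1.96/√n, which hold under the assumption that the series is white noise; the software used for the figures may compute its limits differently. The function names and the toy residuals are hypothetical.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    c0 = sum(v * v for v in centred)  # lag-0 sum of squares
    return [
        sum(centred[t] * centred[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def white_noise_limit(n, z=1.96):
    """Approximate confidence limit for autocorrelations of white noise."""
    return z / math.sqrt(n)


# Toy residual series (hypothetical values).
residuals = [4.0, -3.0, 5.0, -6.0, 2.0, -1.0, 3.0, -4.0, 6.0, -5.0]
rho = autocorrelation(residuals, max_lag=3)
limit = white_noise_limit(len(residuals))
for k, r in enumerate(rho):
    flag = "significant" if k > 0 and abs(r) > limit else ""
    print(k, round(r, 3), flag)
```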

2.3 Exponential Smoothing

Qiao et al. (2020) point out that smoothing methods are commonly applied to smooth a time series in order to predict its future development. Moosa (2000) states that exponential smoothing is one of the most common techniques for forecasting the exchange rate. The international literature abbreviates this method as SES (simple exponential smoothing); it is mostly applied to stationary time series. For the calculation of the forthcoming period, we use the following formula (Lawrence et al. 2009):

$$\hat{S}_{k+1} = p\,S_k + (1 - p)\,\hat{S}_k \tag{2.1}$$

where:
$p$ is the smoothing parameter,
$\hat{S}_{k+1}$ is the predicted value for period $k+1$, calculated at time $k$,
$S_k$ is the actual value in time $k$.

The smoothing parameter may take values in the interval (0; 1). Some authors recommend limiting the smoothing parameter to the interval [0.7; 1). Moosa (2000) emphasizes the relation $\hat{S}_1 = S_1$, which means that the first prediction equals the first actual value. The first value is usually taken as the arithmetic mean of the first six values. Simple exponential smoothing, double exponential smoothing and its generalization, often referred to as Holt's method, are among the most common types of smoothing.
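The recursion of Eq. (2.1) can be sketched in a few lines of Python. The example is a minimal illustration: the function name, the optional `init_window` argument (covering the variant in which the first value is the mean of the first observations) and the toy data are assumptions, not part of the cited sources.

```python
def simple_exponential_smoothing(actual, p, init_window=None):
    """One-step-ahead predictions following Eq. (2.1).

    result[k] is the prediction for actual[k]; the final element is the
    forecast for the next, not yet observed period.
    """
    if not 0 < p < 1:
        raise ValueError("smoothing parameter p must lie in (0, 1)")
    if init_window is None:
        first = actual[0]  # first prediction equals the first actual value
    else:
        first = sum(actual[:init_window]) / init_window
    predictions = [first]
    for value in actual:
        # new prediction = p * actual value + (1 - p) * previous prediction
        predictions.append(p * value + (1 - p) * predictions[-1])
    return predictions


prices = [10.0, 12.0, 11.0, 13.0]
print(simple_exponential_smoothing(prices, p=0.5))
```

A value of p close to 1 makes the prediction follow the most recent observation closely, whereas a small p gives a smoother curve that reacts slowly to changes.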

2.3.1 Case

Experiment 2

The second experiment focuses on the method of exponential smoothing, i.e. the smoothing of the gold price curve and its prediction. Exponential smoothing is one of the simple methods used for smoothing and short-term prediction of time series. The weight of a data point in the predicted value decreases exponentially with its time distance from the prediction (the age of the data point). With this method, it is not necessary to determine the length of the window from which the smoothed value is calculated because, in contrast to moving averages, the calculation of the smoothed values is based on all previous observations. Eight different settings were used for this experiment. The first setting for exponential smoothing can be seen in Fig. 2.12. Figure 2.13 shows the course of the price of gold: the blue curve represents the original time series of the gold price, and the red curve shows the smoothed time series together with the prediction for 44 trading days. The green curve represents the resulting residuals, i.e. the differences between the original and the smoothed values. The second setting for exponential smoothing can be seen in Fig. 2.14. Figure 2.15 shows the blue curve of the original gold price time series compared with the red curve, which represents the smoothed time series with a subsequent prediction for 44 trading days. Again, a green curve shows the size of the residuals. In this case, the residuals are larger than with the first setting. Figure 2.16 shows the third setting for exponential smoothing.
Fig. 2.12 Setup Smoothing (Tibco 2020)

Fig. 2.13 Exponential smoothing 1 (Tibco 2020)

Figure 2.17 shows the development of the gold price again: the blue curve represents the original time series, and the red curve shows the smoothed time series together with the prediction for 44 trading days. The green curve represents the resulting residuals, i.e. the differences between the original and the smoothed values. The residuals in this graph are very similar to those obtained with the second setting. Figure 2.18 presents the fourth setting for exponential smoothing. Figure 2.19 shows the original gold price time series compared with the smoothed curve, shown in red, which again includes a prediction for 44 trading days. The green curve again represents the residuals, i.e. the deviations of the smoothed values from the original values. The fifth setting can be seen in Fig. 2.20. Figure 2.21 shows the original time series (blue curve) compared with the time series smoothed by exponential smoothing (red curve). In this case too, a prediction was made for 44 trading days; it can be seen in the last part of the graph and is also shown in red. The green curve shows the resulting residuals, which in this case are relatively small. Figure 2.22 shows the sixth setting for exponential smoothing, and the resulting graph is shown in Fig. 2.23, where the original time series of the gold price and the smoothed time series can be observed. In this case, the residuals are relatively large, reaching values beyond −200. The seventh (next-to-last) setting can be seen in Fig. 2.24. Figure 2.25 shows the original time series, drawn in blue, compared with the smoothed time series. There is also a prediction for 44 trading days. The residuals are marked with a green curve and, in the extremes, reach values of up to −250. Figure 2.26 shows the last setting used with the exponential smoothing method, and the corresponding graph is shown in Fig. 2.27. Again, the original time series is drawn as a blue curve and the smoothed time series as a red curve, which also contains the prediction for 44 trading days. The residuals are marked with a green curve.

Fig. 2.14 Setup smoothing 2 (Tibco 2020)

2.4 Interrupted Time Series

The literature usually defines a time series as a continuous sequence of observations of a specific monitored phenomenon taken repeatedly (often at regular intervals) over time (Bernal et al. 2017). Turner et al. (2020) consider interrupted time series (ITS) one of the best quasi-experimental designs for assessing the effect of an intervention. Bernal et al. (2017) also argue that an ITS study uses the time series of a specific outcome of interest to establish a trend that is 'interrupted' by an intervention at a particular moment. Ramsay et al. (2003) state that an ITS design collects data at a larger number of equally spaced time points (annually, monthly or weekly) before and after the intervention, while it is essential to know the moment of the intervention. The intervention effects may be assessed by changes in the slope and level of the time series and by the confidence intervals of the intervention parameters. ITS is also applied in various research areas such as health care, economics, political science and marketing research.
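The level and slope changes mentioned above can be sketched as follows. This is a deliberately simple illustration, assuming that separate least squares lines are fitted before and after a known intervention time `t0`; the level change is the gap between the two lines at `t0` and the slope change is the difference of their slopes. Confidence intervals and autocorrelation corrections, which a full ITS analysis would include, are omitted, and all names and data are hypothetical.

```python
def fit_line(t, y):
    """Least squares line y = intercept + slope * t."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    stt = sum((ti - mean_t) ** 2 for ti in t)
    sty = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    slope = sty / stt
    return mean_y - slope * mean_t, slope


def its_effects(times, values, t0):
    """Return (level_change, slope_change) for an intervention at time t0."""
    pre = [(t, v) for t, v in zip(times, values) if t < t0]
    post = [(t, v) for t, v in zip(times, values) if t >= t0]
    a0, b0 = fit_line(*zip(*pre))    # trend before the intervention
    a1, b1 = fit_line(*zip(*post))   # trend after the intervention
    level_change = (a1 + b1 * t0) - (a0 + b0 * t0)
    slope_change = b1 - b0
    return level_change, slope_change


# Hypothetical series: trend 10 + t before t0 = 5, then a jump and a steeper trend.
times = list(range(10))
values = [10.0, 11.0, 12.0, 13.0, 14.0, 20.0, 22.0, 24.0, 26.0, 28.0]
print(its_effects(times, values, t0=5))
```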

2.5 Fourier Transformation

The Fourier transformation is one of the principal instruments for processing and analysing signals. The method is currently highly popular, also due to its ability to speed up calculations with large numbers, the solution of differential equations and image processing. The Fourier transformation consists in transferring a signal from the time domain to the frequency domain, i.e. in analysing its frequency spectrum (the signal content).

Fig. 2.15 Exponential Smoothing (Tibco 2020)

The literature also uses the term inverse Fourier transformation (the inverse of the original method), which transfers the signal from the frequency domain back to the time domain (Radchenko and Stefanska 2016).
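The transfer between the two domains can be illustrated with a direct implementation of the discrete Fourier transform and its inverse. The sketch below is written for clarity rather than speed (it needs on the order of n² operations, whereas fast Fourier transform algorithms compute the same result much faster); the function names and the test signal are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform: time domain -> frequency domain."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform: frequency domain -> time domain."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# One full cosine cycle sampled at 8 points.
n = 8
signal = [math.cos(2 * math.pi * t / n) for t in range(n)]
spectrum = dft(signal)
recovered = inverse_dft(spectrum)
print([round(abs(c), 6) for c in spectrum])
```

For this signal, the spectrum is concentrated at the frequency index corresponding to one cycle per record (and at its mirror image), and the inverse transform returns the original samples up to rounding error.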

Fig. 2.16 Setup Smoothing 3 (Tibco 2020)

Fig. 2.17 Exponential Smoothing 3 (Tibco 2020)

Fig. 2.18 Setup Smoothing 4 (Tibco 2020)

Fig. 2.19 Exponential Smoothing 4 (Tibco 2020)

Fig. 2.20 Setup Smoothing 5 (Tibco 2020)

Fig. 2.21 Exponential smoothing 5 (Tibco 2020)

Fig. 2.22 Setup Smoothing 6 (Tibco 2020)

Fig. 2.23 Exponential Smoothing 6 (Tibco 2020)

Fig. 2.24 Setup Smoothing 7 (Tibco 2020)

Fig. 2.25 Exponential Smoothing 7 (Tibco 2020)

Fig. 2.26 Setup Smoothing 8 (Tibco 2020)

Fig. 2.27 Exponential Smoothing 8 (Tibco 2020)

Fig. 2.28 Setup single series Fourier analysis results (Tibco 2020)

Fig. 2.29 Spectral analysis 1 Periodogram Values Frequency (Tibco 2020)

Fig. 2.30 Spectral analysis 2 Log-periodogram Frequency (Tibco 2020)

Fig. 2.31 Spectral analysis 3 Spectral Density Frequency (Tibco 2020)

Fig. 2.32 Spectral analysis 4 Log-spectral analysis Frequency (Tibco 2020)

Fig. 2.33 Spectral analysis 5 cosine coefficients frequency (Tibco 2020)

2.5.1 Case

Experiment 3

The third experiment involves the Fourier transform method. This procedure makes it possible to describe the periodic behaviour of a time series explicitly and to identify the significant components that contribute to the properties of the investigated process. Any time series can be expressed as a combination of cosine (or sine) waves with different periods (how long it takes to complete a whole cycle) and amplitudes (the maximum/minimum value during the cycle). This fact can be used to study periodic (cyclical) behaviour in a time series. Figure 2.28 shows the settings of the results of the Fourier analysis of a single time series. The graphs in Figs. 2.29, 2.30, 2.31, 2.32 and 2.33 indicate the frequencies of the residual values based on the use of three time series adjustment options. The periodogram shown in Fig. 2.29 can be described as a time-limited realization of a periodic random process. In this case, it shows the frequency of the residual values, i.e. the deviations from the actual measured values. In the first case, the Periodogram Values method was used. The periodogram in Fig. 2.30 shows the frequency of the residual values, i.e. the deviations from the curve of actual values; the log-periodogram of the time series was calculated for non-negative Fourier frequencies. Figure 2.31 also shows the frequency of the residual values, using the second time series adjustment method; in this case, the Spectral Density method was used. Figure 2.32 also shows residual values, using the second time series adjustment method. In this case, the Spectral Density method was used, and a log-spectral analysis of the time series was performed for non-negative Fourier frequencies. In the case of Fig. 2.33, cosine coefficients were used for the spectral analysis. Here, the values even become negative, because they show the dispersion of the residuals. In Fig. 2.33, a range with boundary points (extremes) from −50 to 50 can be observed, with the exception of two isolated circles, which show a very large deviation from the actual values. In the area of time series analysis called spectral analysis, the time series is viewed as a sum of cosine waves with different amplitudes and frequencies. One of the goals of the analysis is to identify important frequencies (or periods) in the observed series. The initial tool for this is the periodogram. The periodogram shows a measure of the relative importance of the possible frequency values that could explain the oscillating pattern of the observed data. In the case of Fig. 2.34, the Periodogram Values method was used to display the relative importance of the possible period values. In the case of Fig. 2.35, the Spectral Density method was used for the same purpose. Figure 2.36 shows the use of the Log Periodogram method, which was likewise used to display the relative importance of the possible period values.
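How a periodogram singles out the important frequencies can be shown on a synthetic series. The sketch below assumes one common scaling of the periodogram, |DFT coefficient|² / n, evaluated at the positive Fourier frequencies k/n; statistical packages may scale the values differently, so only the relative heights should be compared. The function name and the test series are hypothetical.

```python
import cmath
import math


def periodogram(series):
    """Return a list of (frequency, power) pairs for k = 1 .. n // 2."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]  # remove the mean (zero frequency)
    result = []
    for k in range(1, n // 2 + 1):
        coef = sum(
            centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        result.append((k / n, abs(coef) ** 2 / n))
    return result


# Synthetic series: a strong 12-period cycle plus a weaker 4-period cycle.
series = [
    math.sin(2 * math.pi * t / 12) + 0.5 * math.sin(2 * math.pi * t / 4)
    for t in range(48)
]
frequency, power = max(periodogram(series), key=lambda fp: fp[1])
period = 1 / frequency
print(round(period, 6), round(power, 6))
```

The largest periodogram value corresponds to the dominant cycle, and the reciprocal of its frequency gives the period, which is the quantity displayed on the horizontal axis of the period-based graphs.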

2.6 Volatility Models

Wu and Wang (2020) point out that volatility is a fundamental concept expressing instability and fluctuation that changes over time. It is mostly used for measuring risk and assessing the variability of the examined quantity, and it relates to the conditional variance. As volatility is not an observable quantity, Horvath et al. (2020) emphasize the stringent requirements for its estimation and for the subsequent comparison of the related models. In practice, a wide range of volatility models is used, each striving to capture the distinctive features of volatility. Historical volatility models and the subsequent ARIMA processes were the first to serve these purposes.

Fig. 2.34 Spectral analysis 6 Periodogram Values Period (Tibco 2020)

Fig. 2.35 Spectral analysis 7 Spectral Density Period (Tibco 2020)

Fig. 2.36 Spectral analysis 8 Log Periodogram Period (Tibco 2020)

These techniques, however, were not used for long, as ARCH models soon emerged, followed by various modifications of these methods. GARCH, EGARCH and GJR-GARCH models enjoy widespread popularity today.
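As an illustration of what a conditional variance model computes, the following sketch evaluates the standard GARCH(1,1) recursion, σ²_t = ω + α·r²_(t−1) + β·σ²_(t−1), for given parameters. The parameters are hypothetical and are not estimated from data; in practice they would be obtained by maximum likelihood. Starting the recursion from the unconditional variance ω / (1 − α − β) is one common convention and is an assumption here.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) model for given parameters.

    sigma2[t] is the variance for period t given the returns up to t - 1,
    so the result has len(returns) + 1 elements.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1 - alpha - beta)]  # unconditional variance as the start
    for r in returns:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Hypothetical returns: a calm period followed by a large shock.
returns = [0.0, 2.0]
print(garch11_variance(returns, omega=0.1, alpha=0.1, beta=0.8))
```

A large return raises the next conditional variance through the α term, and the β term makes the elevated variance decay only gradually, which is the volatility clustering these models are designed to capture.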

References

Andrikopoulos, A.A., and D.C. Gkountanis. 2011. Issues and models in applied econometrics: A partial survey. South-Eastern Europe Journal of Economics 2: 107–165.
Bendic, V., C. Mohora, D. Tilina, E. Turcu, and A. Nita. 2016. Multifunctional econometrics models of turnover dynamics of using primary factors of the economic process. In Vision 2020: Innovation Management, Development Sustainability, and Competitive Economic Growth.
Bernal, J.L., S. Cummins, and A. Gasparrini. 2017. Interrupted time series regression for the evaluation of public health interventions: A tutorial. International Journal of Epidemiology 46 (1): 348–355.
Eguchi, S. 2018. Model comparison for generalized linear models with dependent observations. Econometrics and Statistics 5 (1): 171–188.
Franses, P.H.B.F. 2000. The econometric modelling of financial time series. International Journal of Forecasting 16 (3): 426–427.
Geweke, J.F., J.L. Horowitz, and H. Pesaran. 2007. Econometrics: A bird's eye view. The New Palgrave Dictionary.
Horvath, B., A. Jacquier, and P. Tankov. 2020. Volatility options in rough volatility models. SIAM Journal on Financial Mathematics 11 (2): 437–469.
Hossain, S.G. 2016. Shrinkage estimation of linear regression models with GARCH errors. Journal of Statistical Theory and Applications 15 (4): 405–423.
Lawrence, K.D., R.K. Klimberg, and S.M. Lawrence. 2009. Fundamentals of Forecasting Using Excel. New York: IP.
Magnusson, L.M. 2016. Econometrics in a formal science of economics: Theory and measurement of economic relations. Economic Record 92 (298): 509–511.
Mélard, G., and J.M. Pasteels. 2000. Automatic ARIMA modeling including interventions, using time series expert software. International Journal of Forecasting 16 (4): 497–508.
Moosa, I.A. 2000. Exchange rate forecasting: Techniques and applications. London: Macmillan Business.
Pai, P., and C. Lin. 2005. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega 33 (6): 497–505.
Pineda, J.A.P. 2006. Spatial econometrics and regional science. Investigacion Economica 65 (258): 129.
Pinto, H. 2011. The role of econometrics in economic science. An essay about the monopolization of economic methodology by econometrics methods. The Journal of Socio-Economics 40 (4): 436–443.
Qiao, L., D. Liu, X. Yuan, Q. Wang, and Q. Ma. 2020. Generation and prediction of construction and demolition waste using exponential smoothing method: A case study of Shandong Province China. Sustainability 12.
Radchenko, V.M., and N.O. Stefanska. 2016. Fourier transformation of general stochastic measures. Theory of Probability and Mathematical Statistics 94: 143–149.
Ramsay, C.R., L. Matowe, R. Grilli, J.M. Grimshaw, and R.E. Thomas. 2003. Interrupted time series designs in health technology assessment: Lessons from two systematic reviews of behaviour change strategies. International Journal of Technology Assessment in Health Care 19 (4): 613–623.
Tibco Statistica, v. 14.0.0. 2020. TIBCO Software Inc, Palo Alto, CA USA. Available from: https://docs.tibco.com/products/tibco-statistica
Turner, S.L., A. Karahalios, A. Forbes, M. Taljaard, J.M. Grimshaw, E. Korevaar, A.C. Cheng, L. Bero, and J.E. McKenzie. 2020. Creating effective interrupted time series graphs: Review and recommendations. Research Synthesis Methods.
Wu, X., and X. Wang. 2020. Forecasting volatility using realized stochastic volatility model with time-varying leverage effect. Finance Research Letters 34.
Zheng, X., and S. Hurn. 2019. Editorial for the special issue on financial econometrics. China Finance Review International 9 (3): 310–311.

Chapter 3

Artificial Neural Networks—Selected Models

There is no doubt that artificial neural networks have been attracting the attention of a large number of scientists, experts and laypeople from all over the world for many years. This is due to their ability to imitate the central nervous systems of living organisms far better than previous techniques (Guresen and Kayakuthula 2011). The artificial neural network is currently an essential concept, which Vochozka (2017) considers an algorithm imitating the human brain. The author further argues that artificial neural networks rank amongst non-parametric, flexible instruments that are based on historical data and able to predict the future. Hamid and Habib (2014) define neural networks as a flexible and reliable field composed of principal and dominant elements, the neurons. They allow the connection of various neural inputs and outputs, benefits, inhibition, while minimizing the influence of a defective neuron on the result. Vochozka et al. (2016) regard neural networks as a computer system based on integrated units, whose outputs usually process input data. Klieštik (2013) argues that artificial neural networks are currently one of the most popular techniques, applicable in a wide range of fields. Apart from artificial intelligence, Ashoori and Mohammadi (2011) point out social and natural sciences, neuroscience, linguistics, technology, cognitive science and the business environment, e.g. for modelling costs, enterprise valuation or bankruptcy prediction. Altun et al. (2007) state that ANNs rank amongst the most attractive methods of the operational research of informatics; their practical application extensively exploits up-to-date technologies. Pao and Kaykutlu (2008) claim that ANNs are mostly suitable for analytically intensive, hard-to-identify operations or for modelling complex strategic decisions. The authors further suggest that the structural model and activity of the biological network are the cornerstones of designing artificial neural networks. Bas et al. (2016) examined the structure of artificial neural networks, arguing that their type and geometry determine their composition. The system comprises three main constituents: artificial neurons, the interconnection of artificial neurons and neural network layers.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 J. Vrbka, Using Artificial Neural Networks for Timeseries Smoothing and Forecasting, Studies in Computational Intelligence 979, https://doi.org/10.1007/978-3-030-75649-9_3

The author suggests that artificial neural networks consist of interconnected artificial neurons that pass on signals, which are further transformed by specific transfer functions. Considering the pros and cons of ANNs, Sayadi et al. (2012) see their advantages in the ability to generalize, to learn from examples and to identify hidden non-linear dependences. Mileris and Boguslauskas (2011) state that neural networks are mostly applied to analyses of large volumes of data against given sets of criteria. The system works on a similar principle as a human; people, however, are not always able to process and interpret such an amount of information. Artificial neural networks never fail to meet established criteria or commit a human error. Kiruthka and Dilsha (2015) consider artificial neural networks a fast system allowing the identification of complex relations among the individual input data. These relations are not linear and require more than correlation and regression analysis. Echávarri et al. (2014) argue that the pros of artificial neural networks include the careful execution of each task and the ability to work with incomplete data. Liu et al. (2004) observe that the structures use classical algorithms, allowing parallel calculations in several neurons at a time. Other advantages include the ability to learn, generalize or produce new data. The listed benefits are mostly relevant to the business environment. Rowland and Vrbka (2016) see disadvantages in illogical behaviour and the requirement for high-quality data. On the other hand, Echávarri et al. (2004) see the main drawbacks in the time-consuming preparation of the network, involving learning and strict requirements for handling the input data. Slavici et al. (2012) point out the extreme sensitivity to the organization, preparation and overall configuration. The method is applicable only with high computing power and involves a longer processing time. According to Lahsasna et al. (2008), other cons of ANNs include the easy overtraining of the network, which causes poor prediction results. The authors further suggest that it is essential to estimate the right moment to end the training. A large number of artificial neural network types are used in practice. The following text describes three frequently applied structures.

3.1 Radial Basis Function Neural Networks (Explanation, Usage)

Pazouki et al. (2015) argue that the RBF (radial basis function) network is composed of three layers with feed-forward signal propagation and supervised learning (learning with a teacher). The system can learn very quickly and express the degree of similarity between the model and the prototype. In contrast to other networks, the structure employs radial basis functions as activation functions. Jingfei et al. (2016) consider the RBF neural network a simple method, although its generalizing ability is fairly strong. Gubana (2015) observes that the network topology consists of an input layer, a hidden layer and an output layer, which comprises linear processing units. Guan et al. (2016) emphasize the grave difficulty of determining the number of neurons in the hidden layer of the RBF network. First-hand experience and previous findings point to the necessity of testing the number of neurons repeatedly. Apart from that, there are doubts about whether RBF neural networks are convergent. Hashemiho and Aghamohammadí (2013) see the basic concept of RBF in transforming the input data into a higher-dimensional space. RBF networks are applied in approximating functions, predicting time series and in clustering or classification tasks (3D object modelling, data fusion, image recovery, interpolation, language identification, chaotic time series modelling etc.) (Pazouki et al. 2015).

3.1.1 Mathematical Background According to Olej (2003), RBF (Radial basis function) networks are so-called threelayer neural networks. The first layer (input layer) is intended for the transmission of input values. The second one is the hidden layer, which consists of RBF units performing individual radial basis functions (RBF). The last (third) layer is the output layer, which usually has a linear character. Like MLP, RBF neural networks are also feed forward neural networks. In this type of network, the signal is transmitted from the inputs through the hidden to the output layer. For good network functionality, it is necessary that all neurons are connected in all layers. This means that neurons in adjacent layers are interconnected so that the out of one neuron is distributed in the inputs of neurons in the following layer. Gubana (2015) adds that the parameters of neural networks indicate the number of neurons and layers, depending on the nature of a relevant problem. Pazouki et al. (2015) state that RBF use radial basis function as an activation function; its basic principle is learning with a teacher. The authors also claim that in terms of approximation, those specified neural networks are natural, which is mainly due to the fact that they are approximated by means of functions affecting the resulting function only around the core of a relevant RBF neuron, not in its complete functional scope. In terms of classification, it can be stated that the application of RBF is also appropriate, since in many cases, a certain group of input vectors belongs to one of the classes which are sought for by means this neural network. According to Haykin (2001), RBF neurons are in the hidden layer. The author further adds that the radial basic functions (RBF) can be included in a special group of mathematic functions that increase or decrease monotonically with an increasing distance from the centre. 
A radial basis function can be described using the following formula:

$$h_i(x) = \phi\left(\frac{(x - c)^T (x - c)}{|s_c|}\right) \tag{3.1}$$

where:
x is the input vector,
c is the centre of the radial basis function,
s_c is the radius of the radial basis function.

If only a one-dimensional input vector is entered, the formula is as follows:

$$h_i(x) = \phi\left(\left|\frac{x - c}{s_c}\right|\right) \tag{3.2}$$

The radial basis function φ can be, for example, a linear, cubic, multiquadric, or Gaussian function. The figure below (Fig. 3.1) shows an example of an RBF network topology.

Fig. 3.1 RBF network (Moreno et al. 2011)
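The one-dimensional formula (3.2) can be illustrated with a minimal sketch of a forward pass through a small RBF network. The Gaussian form exp(−r²), the function names, and all parameter values below are assumptions chosen for illustration; they are not taken from the experiments described in this chapter.

```python
import math

def gaussian(r):
    # Assumed Gaussian form of phi; the text only names the function family.
    return math.exp(-r * r)

def rbf_forward(x, centres, radii, weights, bias=0.0):
    """Output of a 1-H-1 RBF network for a scalar input x.

    Each hidden unit evaluates phi(|x - c| / s_c); the output layer is
    linear (identity activation), as described for the output layer above.
    """
    hidden = [gaussian(abs(x - c) / s) for c, s in zip(centres, radii)]
    return bias + sum(w * h for w, h in zip(weights, hidden))

# Hypothetical parameters for a tiny 1-3-1 network.
centres = [0.0, 0.5, 1.0]
radii = [0.25, 0.25, 0.25]
weights = [1.0, -2.0, 0.5]

for x in (0.0, 0.5, 2.0):
    print(x, rbf_forward(x, centres, radii, weights))
```

Each hidden unit responds strongly only near its own centre, which is the locality property mentioned by Pazouki et al. (2015): far from all centres, the output approaches the bias.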

3.1.2 Case

The method used for these experiments was the radial basis function (RBF) neural network; specifically, it was used for predicting the development of the gold price time series. The dataset was divided into three basic subsets: training, testing, and validation.

Experiment 1

1. Input variable (time, i.e. date), 1 output variable (price of gold), time series lag 1 day

The first experiment includes one input variable, time (date); one output variable, which in this case is the price of gold; and a time series lag of 1 day. Table 3.1 gives a basic overview of the neural networks used for predicting the development of the price of gold. It lists the five most efficient and most successful networks (those with the

Table 3.1 Overview of networks: RBF (Tibco 2020)

| Index | Net name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | RBF 1-24-1 | 0.985283 | 0.984936 | 0.983279 | 1458.937 | 1466.988 | 1500.297 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 1-25-1 | 0.984290 | 0.984337 | 0.983711 | 1556.676 | 1523.381 | 1461.534 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 1-30-1 | 0.985971 | 0.984687 | 0.985232 | 1391.248 | 1488.792 | 1326.705 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 1-28-1 | 0.985352 | 0.985628 | 0.984634 | 1452.197 | 1396.228 | 1379.799 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 1-22-1 | 0.984569 | 0.984367 | 0.983187 | 1529.233 | 1517.664 | 1508.264 | RBFT | SOS | Gaussian | Identity |

best characteristics). Note that the network performance is about 0.98 (98%), which is considered very good. In all cases, the same training algorithm (RBFT) was used, the sum of squares (SOS) was used as the error function, and the Gaussian function was used as the activation function of the hidden layer. The identity function was used for the activation of the neurons in the output layer. The weights of the neural networks are listed in Table 3.2; because of its large extent, only part of the table is shown for demonstration, namely the first four weight IDs. For all five networks, correlation coefficients were determined for the individual networks and data subsets. A correlation between two processes indicates that they are likely to be mutually dependent. Table 3.3 shows that the correlation coefficients are high for all networks, at around 0.98. The correlation coefficient should be as high as possible (as close to 1 as possible) and, if possible, identical for each network across all datasets (training, testing, and validation). The differences between the values for the neural networks presented in Table 3.3 are minimal. Table 3.4 shows the prediction statistics by individual neural network and dataset. The table shows that the maximum predictions on the training, testing, and validation datasets are very similar for all networks except 3. RBF 1-30-1. This also applies to the minimum predictions. As for the maximum and minimum residuals, the smallest possible difference between the maximum and the minimum is required. From the perspective of residuals, the best network is 2. RBF 1-25-1, which is characterized by the smallest residuals. The data statistics are given in Table 3.5.
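The two quantities used above to judge the networks, the correlation coefficient between targets and network outputs and the range of the residuals, can be computed as in the following minimal sketch. The function names and the sample values are made up for illustration, and the residual is assumed here to be the target minus the prediction; the software's exact definitions are not given in the text.

```python
import math

def pearson(targets, outputs):
    """Pearson correlation coefficient between targets and network outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return cov / math.sqrt(var_t * var_o)

def residual_range(targets, outputs):
    """Minimum and maximum residual (assumed: target minus prediction)."""
    residuals = [t - o for t, o in zip(targets, outputs)]
    return min(residuals), max(residuals)

# Made-up gold prices (USD/oz) and hypothetical network outputs.
targets = [1200.0, 1210.0, 1195.0, 1230.0, 1250.0]
outputs = [1190.0, 1215.0, 1200.0, 1220.0, 1245.0]

print(round(pearson(targets, outputs), 4))
print(residual_range(targets, outputs))
```

A coefficient close to 1 on all three subsets, together with a narrow residual range, corresponds to the selection criteria described above.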
Figure 3.2 shows the price of gold in USD per troy ounce (marked blue) compared with the predictions of the five retained networks. The sample comprises the training, testing, and validation datasets. Figure 3.3 shows the time series prediction for USD/oz by network 1. RBF 1-24-1, with the testing dataset used as the sample. Figure 3.4 shows the time series prediction for USD/oz by network 2. RBF 1-25-1, again with the testing dataset as the sample. Figure 3.5 shows the time series prediction for USD/oz by network 3. RBF 1-30-1, with the testing dataset as the sample. The test is marked red; the network is represented by a red curve. Figure 3.6 shows the time series prediction for USD/oz by network 4. RBF 1-28-1, with the testing dataset as the sample. The test is marked blue; the network is represented by a red curve. The comparison of the course of the testing dataset (the sample) with the result provided by network 5. RBF 1-22-1 is shown in Fig. 3.7. In comparison with the testing dataset, all networks show very good results. In this case, all of them could be used for prediction while ensuring high performance

0.306415 Date-1 → hidden neuron 1

0.717736 Date-1 → hidden neuron 2

0.682075 Date-1 → hidden neuron 3

0.408113 Date-1 → hidden neuron 4

Date-1 → hidden neuron 1

Date-1 → hidden neuron 2

Date-1 → hidden neuron 3

Date-1 → hidden neuron 4

2

3

4

0.248113 Date-1 → hidden neuron 4

0.371509 Date-1 → hidden neuron 3

0.693962 Date-1 → hidden neuron 2

0.251321 Date-1 → hidden neuron 4

0.020000 Date-1 → hidden neuron 3

0.600000 Date-1 → hidden neuron 2

0.452830 Date-1 → hidden neuron 4

0.495283 Date-1 → hidden neuron 3

0.172830 Date-1 → hidden neuron 2

0.654151

0.438679

0.000566

0.698491

Connections 5. RBF Weight 1-22-1 values 5. RBF 1-22-1 0.358491 Date-1 → hidden neuron 1

Connections 4. RBF Weight 1-28-1 values 4. RBF 1-28-1

0.388679 Date-1 → hidden neuron 1

Connections 3. RBF Weight 1-30-1 values 3. RBF 1-30-1

0.211698 Date-1 → hidden neuron 1

Connections 2. RBF Weight 1-25-1 values 2. RBF 1-25-1

1

Weight Connections 1. RBF Weight ID 1-24-1 values 1. RBF 1-24-1

Table 3.2 Weights of networks—experiment 1 RBF (Tibco 2020)


Table 3.3 Correlation coefficients: RBF NN (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
| --- | --- | --- | --- |
| 1. RBF 1-24-1 | 0.985283 | 0.984936 | 0.983279 |
| 2. RBF 1-25-1 | 0.984290 | 0.984337 | 0.983711 |
| 3. RBF 1-30-1 | 0.985971 | 0.984687 | 0.985232 |
| 4. RBF 1-28-1 | 0.985352 | 0.985628 | 0.984634 |
| 5. RBF 1-22-1 | 0.984569 | 0.984367 | 0.983187 |

of the network. In terms of the minimum residuals, the best network appears to be 2. RBF 1-25-1.

Experiment 2

1. Input variable (time, i.e. date), 1 output variable (price of gold), time series lag 5 days

The second experiment includes one input variable, time (date); one output variable, which in our case is the price of gold; and a time series lag of five days. Table 3.6 gives a basic overview of the neural networks used for predicting the development of the price of gold, listing the five most efficient and most successful networks (those with the best characteristics). Note that the performance of most networks is only about 20–30% in this case, which is a very poor result compared with the previous experiment, where the performance was about 98%. In all cases, the same training algorithm (RBFT) is used, the sum of squares (SOS) is used as the error function, and the Gaussian function is used as the activation function of the hidden layer. The identity function is used for the activation of the neurons in the output layer. The weights of the neural networks are shown in Table 3.7. Because of its large extent, only a sample of the table is shown, namely the first four weight IDs. For all five neural networks, correlation coefficients were determined for the individual data subsets. A correlation between two processes indicates that they are likely to be mutually dependent. Table 3.8 shows that the correlation coefficient is low for all networks: none of the coefficients exceeds 0.5, and in one case (network 1. RBF 5-22-1) the values for the training and testing datasets are even lower than 0.1, which indicates very low performance. The correlation coefficient should be as high as possible (as close to 1 as possible) and ideally identical for each network across all datasets (training, testing, and validation).
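The text does not spell out how the software constructs the lagged inputs. The network names in this experiment (for example RBF 5-22-1) suggest five input neurons for a five-day lag, which is consistent with a sliding window over the lagged variable. The following sketch shows that reading only; the function name and the sample values are hypothetical.

```python
def make_lagged(series, lag):
    """Pair each window of `lag` consecutive values with the value that follows it."""
    inputs, targets = [], []
    for i in range(lag, len(series)):
        inputs.append(series[i - lag:i])   # the `lag` preceding observations
        targets.append(series[i])          # the observation to be predicted
    return inputs, targets

# Made-up series of seven daily observations.
series = [1200.0, 1205.0, 1198.0, 1210.0, 1215.0, 1220.0, 1218.0]

X, y = make_lagged(series, lag=5)
for window, target in zip(X, y):
    print(window, "->", target)
```

With a lag of 1 the same function yields one input per row, which would correspond to the 1-H-1 networks of Experiment 1.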
The differences between the datasets for a given network are minimal (with one exception); however, the performance is not sufficient. The prediction statistics of the individual networks for the training, testing, and validation datasets are given in Table 3.9. The data statistics are given in Table 3.10. Table 3.9 also indicates that the values are not satisfactory and that poor functionality

Table 3.4 Prediction statistics: RBF NN (Tibco 2020)

| Statistics | 1. RBF 1-24-1 | 2. RBF 1-25-1 | 3. RBF 1-30-1 | 4. RBF 1-28-1 | 5. RBF 1-22-1 |
| --- | --- | --- | --- | --- | --- |
| Minimum prediction (train) | 599.783 | 558.861 | 528.905 | 539.406 | 565.750 |
| Maximum prediction (train) | 1706.367 | 1736.064 | 1924.269 | 1714.813 | 1695.447 |
| Minimum prediction (test) | 599.839 | 558.960 | 532.364 | 539.446 | 565.971 |
| Maximum prediction (test) | 1706.304 | 1735.661 | 1691.276 | 1714.812 | 1695.428 |
| Minimum prediction (validation) | 599.784 | 558.860 | 529.154 | 539.545 | 565.769 |
| Maximum prediction (validation) | 1705.554 | 1736.028 | 1871.601 | 1713.824 | 1695.424 |
| Minimum residual (train) | −218.360 | −171.607 | −188.519 | −207.971 | −217.905 |
| Maximum residual (train) | 236.738 | 222.109 | 286.407 | 268.306 | 222.579 |
| Minimum residual (test) | −169.120 | −189.760 | −159.727 | −181.682 | −163.171 |
| Maximum residual (test) | 234.254 | 217.991 | 285.369 | 227.624 | 220.972 |
| Minimum residual (validation) | −208.887 | −148.253 | −197.139 | −209.565 | −161.857 |
| Maximum residual (validation) | 174.060 | 152.346 | 221.675 | 252.781 | 207.416 |
| Minimum standard residual (train) | −5.717 | −4.349 | −5.054 | −5.457 | −5.572 |
| Maximum standard residual (train) | 6.198 | 5.629 | 7.679 | 7.041 | 5.692 |
| Minimum standard residual (test) | −4.416 | −4.862 | −4.140 | −4.862 | −4.188 |
| Maximum standard residual (test) | 6.116 | 5.585 | 7.396 | 6.092 | 5.672 |
| Minimum standard residual (validation) | −5.393 | −3.878 | −5.412 | −5.642 | −4.168 |
| Maximum standard residual (validation) | 4.494 | 3.985 | 6.086 | 6.805 | 5.341 |

Table 3.5 Data statistics: RBF NN (Tibco 2020)

| Samples | Date input | USD/oz target |
| --- | --- | --- |
| Minimum (train) | 38,720.00 | 529.500 |
| Maximum (train) | 44,020.00 | 1895.000 |
| Mean (train) | 41,340.06 | 1201.595 |
| Standard deviation (train) | 1523.42 | 316.367 |
| Minimum (test) | 38,728.00 | 538.750 |
| Maximum (test) | 44,015.00 | 1895.000 |
| Mean (test) | 41,515.31 | 1218.531 |
| Standard deviation (test) | 1590.14 | 313.031 |
| Minimum (validation) | 38,722.00 | 524.750 |
| Maximum (validation) | 44,019.00 | 1834.000 |
| Mean (validation) | 41,440.15 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 543.600 |
| Minimum (overall) | 38,720.00 | 524.750 |
| Maximum (overall) | 44,020.00 | 1895.000 |
| Mean (overall) | 41,381.33 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 313.589 |

Fig. 3.2 Time series prediction for USD/oz (Tibco 2020)

Fig. 3.3 Smoothed time series and prediction for 44 trading days (two calendar months): 1. RBF 1-24-1 (Tibco 2020)

Fig. 3.4 Smoothed time series and prediction for 44 trading days (two calendar months): 2. RBF 1-25-1 (Tibco 2020)

Fig. 3.5 Smoothed time series and prediction for 44 trading days (two calendar months): 3. RBF 1-30-1 (Tibco 2020)

Fig. 3.6 Smoothed time series and prediction for 44 trading days (two calendar months): 4. RBF 1-28-1 (Tibco 2020)

Fig. 3.7 Smoothed time series and prediction for 44 trading days (two calendar months): 5. RBF 1-22-1 (Tibco 2020)

of the networks can be expected already at this point. In the case of residuals, the sum of the maximum and minimum values should ideally be 0; otherwise, the highest negative number is sought for the minimum and the highest positive number for the maximum. Figure 3.8 shows the time series prediction for USD/oz, with the curves of all networks. It is clear from the figure that network 1. RBF 5-22-1 shows meaningless behaviour; judging by the displayed curve, it can be concluded that this network is not applicable for predicting the price of gold time series. In this case, the sample comprised the training, testing, and validation datasets. Figure 3.9 presents the prediction curve of network 1. RBF 5-22-1 together with the curve of the testing dataset; the graph gives no information that would indicate the possibility of applying this network in practice for predicting the price of gold. The network shows confusing behaviour. Figure 3.10 presents the prediction of the price of gold by network 2. RBF 5-29-1 together with the curve of the testing dataset. In this case, too, the graph gives no information indicating the possibility of applying this network in practice for predicting the price of gold. This network shows confusing behaviour as well. Figure 3.11 presents the time series prediction for USD/oz, with the testing dataset used as the sample. The graph clearly shows that

Table 3.6 Overview of networks: Experiment 2 RBF (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | RBF 5-22-1 | 0.072128 | 0.059483 | 0.252078 | 2.255188E+21 | 1.633586E+21 | 5.942022E+18 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 5-29-1 | 0.271311 | 0.271381 | 0.263919 | 4.139185E+17 | 3.873458E+17 | 3.135597E+17 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 5-22-1 | 0.314613 | 0.346376 | 0.303653 | 5.236097E+17 | 6.031110E+17 | 3.848165E+17 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 5-22-1 | 0.265853 | 0.280137 | 0.298474 | 3.032763E+17 | 2.617176E+17 | 2.303932E+17 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 5-24-1 | 0.225869 | 0.268852 | 0.263909 | 1.353818E+16 | 1.960738E+16 | 1.535418E+16 | RBFT | SOS | Gaussian | Identity |

Date-1 → 0.055849 hidden neuron 1

Date-1 → 0.056038 hidden neuron 2

Date-1 → 0.056604 hidden neuron 3

Date-1 → 0.056792 hidden neuron 4

Date-1 → 0.077736 hidden neuron 1

Date-1 → 0.077925 hidden neuron 2

Date-1 → 0.078113 hidden neuron 3

Date-1 → 0.078302 hidden neuron 4

1

2

3

4

Date-1 → 0.276226 hidden neuron 4

Date-1 → 0.276038 hidden neuron 3

Date-1 → 0.275849 hidden neuron 2

Date-1 → 0.274906 hidden neuron 1

Date-1 → 0.600755 hidden neuron 4

Date-1 → 0.600189 hidden neuron 3

Date-1 → 0.600000 hidden neuron 2

Date-1 → 0.599811 hidden neuron 1

Date-1 → 0.071321 hidden neuron 4

Date-1 → 0.071132 hidden neuron 3

Date-1 → 0.070566 hidden neuron 2

Date-1 → 0.070377 hidden neuron 1

Weight ID | Connections 1. RBF 5-22-1 | Weight values 1. RBF 5-22-1 | Connections 2. RBF 5-29-1 | Weight values 2. RBF 5-29-1 | Connections 3. RBF 5-22-1 | Weight values 3. RBF 5-22-1 | Connections 4. RBF 5-22-1 | Weight values 4. RBF 5-22-1 | Connections 5. RBF 5-24-1 | Weight values 5. RBF 5-24-1

Table 3.7 Weights of networks—Experiment 2 RBF (Tibco 2020)


Table 3.8 Correlation coefficients: Experiment 2 RBF NN (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
| --- | --- | --- | --- |
| 1. RBF 5-22-1 | 0.072128 | 0.059483 | 0.252078 |
| 2. RBF 5-29-1 | 0.271311 | 0.271381 | 0.263919 |
| 3. RBF 5-22-1 | 0.314613 | 0.346376 | 0.303653 |
| 4. RBF 5-22-1 | 0.265853 | 0.280137 | 0.298474 |
| 5. RBF 5-24-1 | 0.225869 | 0.268852 | 0.263909 |

network 3. RBF 5-22-1 provides confusing values and is thus not suitable for predicting the price of gold time series. The curve of network 4. RBF 5-22-1 follows a similar course, so this network is likewise not suitable for predicting the price of gold (see Fig. 3.12). Similarly, Fig. 3.13 makes clear that the course of network 5. RBF 5-24-1 is not satisfactory and there are no indications that the network could be used for predicting the price of gold time series. Experiment 2 was not as successful as Experiment 1. Instead of a 1-day lag of the time series, a five-day lag was used. This had a significant impact on the results, as none of the five best neural networks showed behaviour and results indicating that it could be used for predicting the price of gold time series.

Experiment 3

1. Input variable (time, i.e. date), 1 output variable (price of gold), time series lag 10 days

The third experiment includes one input variable, time (date); one output variable, which in our case is the price of gold; and a 10-day lag of the time series. Table 3.11 gives a basic overview of the neural networks used for predicting the development of the price of gold, listing the five most efficient and most successful networks (those with the best characteristics). Note that the performance of the networks in this case is about 19–23%, which is very low. This also confirms a high probability that none of the networks will be applicable for accurate prediction of the development of the price of gold. In all cases, the same training algorithm (RBFT) is used. The sum of squares (SOS) is used as the error function, and the Gaussian function is used as the activation function of the hidden layer. The identity function is used for the activation of the neurons in the output layer. The weights of the individual networks are presented in Table 3.12. Because of the large number of data rows, the table contains only the first four rows of weights for the individual networks. Correlation coefficients of the individual networks and data subsets were determined in this experiment as well. They are given in Table 3.13, which shows that the correlation coefficient is very low for all networks, confirming their poor performance. The

Table 3.9 Prediction statistics: Experiment 2 RBF NN (Tibco 2020)

| Statistics | 1. RBF 5-22-1 | 2. RBF 5-29-1 | 3. RBF 5-22-1 | 4. RBF 5-22-1 | 5. RBF 5-24-1 |
| --- | --- | --- | --- | --- | --- |
| Minimum prediction (train) | −1.281067E+12 | −6.635312E+09 | −7.793424E+09 | −5.213781E+09 | −1.502860E+09 |
| Maximum prediction (train) | 1.097143E+11 | 8.113126E+09 | 7.246438E+07 | 1.080400E+08 | 3.764504E+08 |
| Minimum prediction (test) | −1.303212E+12 | −6.461783E+09 | −0.793213E+09 | −5.137560E+09 | −1.489543E+09 |
| Maximum prediction (test) | 4.188166E+09 | 8.124846E+09 | 7.192468E+07 | 1.084183E+08 | 3.759987E+08 |
| Minimum prediction (validation) | −5.326226E+10 | −6.567627E+09 | −7.393738E+09 | −5.058970E+09 | −1.502743E+09 |
| Maximum prediction (validation) | 2.398019E+09 | 7.614332E+09 | 7.244796E+07 | 1.084231E+08 | 3.751454E+08 |
| Minimum residual (train) | −1.097143E+11 | −8.113124E+09 | −7.246266E+07 | −1.080390E+08 | −3.764493E+08 |
| Maximum residual (train) | 1.281067E+12 | 6.635312E+09 | 7.793425E+09 | 5.213782E+09 | 1.502861E+09 |
| Minimum residual (test) | −4.188165E+09 | −8.124844E+09 | −7.192296E+07 | −1.084173E+08 | −3.759975E+08 |
| Maximum residual (test) | 1.303212E+12 | 6.461783E+09 | 7.793213E+09 | 5.137561E+09 | 1.489543E+09 |
| Minimum residual (validation) | −2.398018E+09 | −7.614330E+09 | −7.244625E+07 | −1.084221E+08 | −3.751443E+08 |
| Maximum residual (validation) | 5.326226E+10 | 6.567628E+09 | 7.393738E+09 | 5.058970E+09 | 1.502743E+09 |
| Minimum standard residual (train) | −2.310319E+00 | −1.261045E+01 | −1.001407E-01 | −1.961830E-01 | −3.235387E+00 |
| Maximum standard residual (train) | 2.697619E+01 | 1.031345E+01 | 1.077022E+01 | 9.467463E+00 | 1.291631E+01 |
| Minimum standard residual (test) | −1.036222E-01 | −1.305466E+01 | −9.261236E-02 | −2.119250E-01 | −2.685191E+00 |
| Maximum standard residual (test) | 3.224364E+01 | 1.038252E+01 | 1.003501E+01 | 1.004247E+01 | 1.063759E+01 |
| Minimum standard residual (validation) | −9.837515E-01 | −1.359789E+01 | −1.167855E-01 | −2.258828E-01 | −3.027506E+00 |
| Maximum standard residual (validation) | 2.185005E+01 | 1.172866E+01 | 1.191893E+01 | 1.053968E+01 | 1.212751E+01 |

Table 3.10 Data statistics: Experiment 2 RBF NN (Tibco 2020)

| Samples | Date input | USD/oz target |
| --- | --- | --- |
| Minimum (train) | 38,720.00 | 529.500 |
| Maximum (train) | 44,020.00 | 1895.000 |
| Mean (train) | 41,340.06 | 1201.595 |
| Standard deviation (train) | 1523.42 | 316.367 |
| Minimum (test) | 38,728.00 | 538.750 |
| Maximum (test) | 44,015.00 | 1895.000 |
| Mean (test) | 41,515.31 | 1218.531 |
| Standard deviation (test) | 1590.14 | 313.031 |
| Minimum (validation) | 38,722.00 | 524.750 |
| Maximum (validation) | 44,019.00 | 1834.000 |
| Mean (validation) | 41,440.15 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 543.600 |
| Minimum (overall) | 38,720.00 | 524.750 |
| Maximum (overall) | 44,020.00 | 1895.000 |
| Mean (overall) | 41,381.33 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 313.589 |

Fig. 3.8 Predicting time series for USD/oz: Experiment 2 RBF, all networks (Tibco 2020)

Fig. 3.9 Smoothed time series and prediction for 44 trading days (two calendar months): 1. RBF 5-22-1 (Tibco 2020)

Fig. 3.10 Smoothed time series and prediction for 44 trading days (two calendar months): 2. RBF 5-29-1 (Tibco 2020)

Fig. 3.11 Smoothed time series and prediction for 44 trading days (two calendar months): 3. RBF 5-22-1 (Tibco 2020)

Fig. 3.12 Smoothed time series and prediction for 44 trading days (two calendar months): 4. RBF 5-22-1 (Tibco 2020)

Fig. 3.13 Smoothed time series and prediction for 44 trading days (two calendar months): 5. RBF 5-24-1 (Tibco 2020)

lowest coefficient value in the table is 0.128979, while the highest is 0.228095; however, even this value is not sufficient to ensure good performance. The correlation coefficient should be as high as possible (as close to 1 as possible) and, if possible, identical for each network across all datasets (training, testing, and validation). Table 3.14 presents the prediction statistics by individual neural network and dataset. The table shows that the maximum predictions for the training, testing, and validation datasets differ significantly in all networks except 3. RBF 10-28-1, 5. RBF 10-24-1, and 1. RBF 10-21-1. This applies to the minimum predictions as well. As for the maximum and minimum residuals, the smallest possible difference between the minimum and the maximum is required. In terms of residuals, the best network appears to be 3. RBF 10-28-1, which has the smallest residuals. By contrast, networks 2. RBF 10-29-1 and 4. RBF 10-23-1 show a large difference between the minimum and the maximum. Table 3.15 presents the data statistics evaluated in Experiment 3. Figure 3.14 clearly shows large fluctuations of the curves for networks 2 and 4. It can thus be concluded that the course of the curves indicates the inability of the networks to predict the price of gold correctly. The same conclusion is indicated by Fig. 3.15, which shows graphs comparing the predictions made by the individual networks with the curve of the testing dataset. None of the networks represented by the graphs below appears to be

Table 3.11 Overview of networks: Experiment 3 RBF (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | RBF 10-21-1 | 0.147261 | 0.128979 | 0.210811 | 7.264679E+33 | 4.404266E+33 | 1.351736E+34 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 10-29-1 | 0.228095 | 0.218430 | 0.216735 | 1.915493E+37 | 1.890416E+37 | 1.281073E+37 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 10-28-1 | 0.227991 | 0.222300 | 0.224916 | 6.679056E+31 | 5.915984E+31 | 4.084511E+31 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 10-23-1 | 0.173611 | 0.154072 | 0.198337 | 1.012423E+36 | 1.065013E+35 | 9.559485E+35 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 10-24-1 | 0.205615 | 0.216607 | 0.211115 | 6.303151E+34 | 6.935049E+34 | 7.667120E+34 | RBFT | SOS | Gaussian | Identity |

Table 3.12 Overview of networks: Experiment 3 RBF (Tibco 2020)

| Weight ID | Connections | Weight values 1. RBF 10-21-1 | Weight values 2. RBF 10-29-1 | Weight values 3. RBF 10-28-1 | Weight values 4. RBF 10-23-1 | Weight values 5. RBF 10-24-1 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Date-1 → hidden neuron 1 | 0.011887 | 0.243396 | 0.356792 | 0.688302 | 0.295849 |
| 2 | Date-1 → hidden neuron 2 | 0.012075 | 0.243585 | 0.356981 | 0.689245 | 0.296038 |
| 3 | Date-1 → hidden neuron 3 | 0.012264 | 0.244151 | 0.357170 | 0.689434 | 0.296226 |
| 4 | Date-1 → hidden neuron 4 | 0.012453 | 0.244340 | 0.357736 | 0.689623 | 0.296415 |

Table 3.13 Correlation coefficients: Experiment 3 RBF NN (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
| --- | --- | --- | --- |
| 1. RBF 10-21-1 | 0.147261 | 0.128979 | 0.210811 |
| 2. RBF 10-29-1 | 0.228095 | 0.218430 | 0.216735 |
| 3. RBF 10-28-1 | 0.227991 | 0.222300 | 0.224916 |
| 4. RBF 10-23-1 | 0.173611 | 0.154072 | 0.198337 |
| 5. RBF 10-24-1 | 0.205615 | 0.216607 | 0.211115 |

applicable and able to make an accurate prediction of the development of the price of gold.

Experiment 4

1. Input variables (time, i.e. date; day of the week; day of the month; month; year), 1 output variable (price of gold), 1-day lag of the time series

The fourth experiment includes five input variables, specifically time (date), day of the week, day of the month, month, and year. It includes one output variable, the price of gold. The lag of the time series is one day. An overview of the individual networks used in Experiment 4 is given in Table 3.16, which presents the five most successful networks with the best characteristics. An important value for comparing the individual networks is the network performance, which exceeds 0.80 on the training, testing, and validation datasets for all networks. On the training dataset, the highest performance is achieved by network 2. RBF 5-28-1, with a value of 0.861124. The same network also achieves the highest performance on the testing and validation datasets, so network 2. RBF 5-28-1 appears to be the most efficient. Table 3.17 presents the individual weights of the networks; because of the large extent of the table, only part of the weights is shown. Correlation coefficients were determined for the individual networks and datasets in Experiment 4 as well. In this case, Table 3.18 shows that the values for the training, testing, and validation datasets are similar to each other. In all cases, the correlation coefficients exceed 0.80, which indicates high network performance, as a value as close to 1 as possible is required. The prediction statistics of the individual networks are presented in Table 3.19. Table 3.20 shows the data statistics. The graph of the USD/oz time series and the individual networks is shown in Fig. 3.16. The figure indicates that most of the networks follow the USD/oz curve. The graphs of the smoothed time series and the prediction for 44 trading days (two calendar months) for all networks are presented in Fig. 3.17.

Based on the graph, it can be concluded that all five networks have a similar course of the prediction curve. The test curve (marked blue) is followed most closely by the curve of network 2. RBF 5-28-1.
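The five input variables of Experiment 4 can all be derived from the date itself. The following sketch shows one way to do so. It assumes that the date input reported in the data statistics tables (values between 38,720 and 44,020) is a spreadsheet-style serial day number counted from 30 December 1899; this interpretation, the function name, and the weekday numbering are assumptions, not something the text states.

```python
from datetime import date, timedelta

# Assumption: dates are stored as spreadsheet-style serial day numbers.
SERIAL_EPOCH = date(1899, 12, 30)

def calendar_features(serial_day):
    """Derive the five Experiment 4 style inputs from one serial day number."""
    d = SERIAL_EPOCH + timedelta(days=serial_day)
    return {
        "date": serial_day,
        "day_of_week": d.isoweekday(),  # 1 = Monday ... 7 = Sunday (assumed coding)
        "day_of_month": d.day,
        "month": d.month,
        "year": d.year,
    }

# The smallest and largest date values reported in the data statistics.
for serial in (38720, 44020):
    print(calendar_features(serial))
```

Features of this kind give the network explicit access to weekly, monthly, and yearly position, which a single monotonically increasing date value does not expose directly.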

Table 3.14 Prediction statistics: Experiment 3 RBF NN (Tibco 2020)

| Statistics | 1. RBF 10-21-1 | 2. RBF 10-29-1 | 3. RBF 10-28-1 | 4. RBF 10-23-1 | 5. RBF 10-24-1 |
| --- | --- | --- | --- | --- | --- |
| Minimum prediction (train) | −2.537848E+17 | −1.060662E+18 | −1.269437E+16 | −2.133268E+19 | −4.244400E+18 |
| Maximum prediction (train) | 1.588005E+18 | 8.161645E+19 | 1.333200E+17 | 8.922255E+14 | 2.343720E+18 |
| Minimum prediction (test) | −2.529363E+17 | −1.072797E+18 | −1.267755E+16 | −8.029206E+18 | −4.148635E+18 |
| Maximum prediction (test) | 1.413615E+18 | 7.684836E+19 | 1.280708E+17 | 8.911898E+14 | 2.313699E+18 |
| Minimum prediction (validation) | −2.438901E+17 | −6.525574E+15 | −1.258918E+16 | −1.869556E+19 | −4.174989E+18 |
| Maximum prediction (validation) | 1.575207E+18 | 8.117379E+19 | 1.308188E+17 | 8.598735E+14 | 2.341160E+18 |
| Minimum residual (train) | −1.588005E+18 | −8.161645E+19 | −1.333200E+17 | −8.922255E+14 | −2.343720E+18 |
| Maximum residual (train) | 2.537848E+17 | 1.060662E+18 | 1.269437E+16 | 2.133268E+19 | 4.244400E+18 |
| Minimum residual (test) | −1.413615E+18 | −7.684836E+19 | −1.280708E+17 | −8.911898E+14 | −2.313699E+18 |
| Maximum residual (test) | 2.529363E+17 | 1.072797E+18 | 1.267755E+16 | 8.029206E+18 | 4.148635E+18 |
| Minimum residual (validation) | −1.575207E+18 | −8.117379E+19 | −1.308188E+17 | −8.598735E+14 | −2.341160E+18 |
| Maximum residual (validation) | 2.438901E+17 | 6.525574E+15 | 1.258918E+16 | 1.869556E+19 | 4.174989E+18 |
| Minimum standard residual (train) | −1.863132E+01 | −1.864822E+01 | −1.631314E+01 | −8.867346E-04 | −9.335269E+00 |
| Maximum standard residual (train) | 2.977538E+00 | 2.423464E-01 | 1.553294E+00 | 2.120139E+01 | 1.690587E+01 |
| Minimum standard residual (test) | −2.130072E+01 | −1.767486E+01 | −1.665085E+01 | −2.730818E-03 | −8.785817E+00 |
| Maximum standard residual (test) | 3.811311E+00 | 2.467397E-01 | 1.648246E+00 | 2.460340E+01 | 1.575363E+01 |
| Minimum standard residual (validation) | −1.354852E+01 | −2.267926E+01 | −2.046917E+01 | −8.794625E-04 | −8.455025E+00 |
| Maximum standard residual (validation) | 2.097723E+00 | 1.823189E-03 | 1.969824E+00 | 1.912147E+01 | 1.507784E+01 |

Table 3.15 Data statistics: Experiment 3 RBF NN (Tibco 2020)

| Samples | Date input | USD/oz target |
| --- | --- | --- |
| Minimum (train) | 38,720.00 | 529.500 |
| Maximum (train) | 44,020.00 | 1895.000 |
| Mean (train) | 41,340.06 | 1201.595 |
| Standard deviation (train) | 1523.42 | 316.367 |
| Minimum (test) | 38,728.00 | 538.750 |
| Maximum (test) | 44,015.00 | 1895.000 |
| Mean (test) | 41,515.31 | 1218.531 |
| Standard deviation (test) | 1590.14 | 313.031 |
| Minimum (validation) | 38,722.00 | 524.750 |
| Maximum (validation) | 44,019.00 | 1834.000 |
| Mean (validation) | 41,440.15 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 543.600 |
| Minimum (overall) | 38,720.00 | 524.750 |
| Maximum (overall) | 44,020.00 | 1895.000 |
| Mean (overall) | 41,381.33 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 313.589 |

Fig. 3.14 Prediction of time series for USD/oz—experiment 3 (Tibco 2020)


Fig. 3.15 Smoothed time series and prediction for 44 trading days (two calendar months)—1. RBF 10-21-1, 2. RBF 10-29-1, 3. RBF 10-28-1, 4. RBF 10-23-1, 5. RBF 10-24-1 Experiment 3 (Tibco 2020)

Table 3.16 Overview of networks: Experiment 4 RBF (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RBF 5-22-1 | 0.855625 | 0.853390 | 0.850385 | 13,357.16 | 13,428.16 | 12,530.36 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 5-28-1 | 0.861124 | 0.858894 | 0.855178 | 12,886.54 | 12,977.59 | 12,192.90 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 5-29-1 | 0.846489 | 0.850674 | 0.844050 | 14,132.54 | 13,595.14 | 13,040.51 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 5-30-1 | 0.855557 | 0.854172 | 0.840979 | 13,362.97 | 13,359.35 | 13,269.61 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 5-26-1 | 0.853527 | 0.854253 | 0.837543 | 13,535.98 | 13,314.64 | 13,505.04 | RBFT | SOS | Gaussian | Identity |


Date-1 → 0.48623 hidden neuron 1

Date-1 → 0.50000 hidden neuron 2

Date-1 → 0.73333 hidden neuron 3

Date-1 → 0.00000 hidden neuron 4

Date-1 → 0.14038 hidden neuron 1

Date-1 → 0.75000 hidden neuron 2

Date-1 → 0.53333 hidden neuron 3

Date-1 → 0.00000 hidden neuron 4

2

3

4

Date-1 → 0 hidden neuron 4

Date-1 → 0 hidden neuron 3

Date-1 → 1 hidden neuron 2

Date-1 → -7 hidden neuron 1

Connections Weight values Connections Weight values 2. RBF 5-28-1 2. RBF 5-28-1 3. RBF 5-29-1 3. RBF 5-29-1

1

Weight ID Connections Weight values 1. RBF 5-22-1 1. RBF 5-22-1

Table 3.17 Overview of networks’ weights—RBF Experiment 4 (Tibco 2020)

Date-1 → 0.36364 hidden neuron 4

Date-1 → 0.50000 hidden neuron 3

Date-1 → 1.00000 hidden neuron 2

Date-1 → 0.57642 hidden neuron 1

Date-1 → 0.1818 hidden neuron 4

Date-1 → 0.1333 hidden neuron 3

Date-1 → 0.0000 hidden neuron 2

Date-1 → 0.4251 hidden neuron 1

Connections Weight values Connections Weight values 4. RBF 5-30-1 4. RBF 5-30-1 5. RBF 5-26-1 5. RBF 5-26-1


Table 3.18 Correlation coefficients: Experiment 4 RBF NN (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. RBF 5-22-1 | 0.855625 | 0.853390 | 0.850385 |
| 2. RBF 5-28-1 | 0.861124 | 0.858894 | 0.855178 |
| 3. RBF 5-29-1 | 0.846489 | 0.850674 | 0.844050 |
| 4. RBF 5-30-1 | 0.855557 | 0.854172 | 0.840979 |
| 5. RBF 5-26-1 | 0.853527 | 0.854253 | 0.837543 |

Experiment 5

1. Input variables (time as date, day of the week, day of the month, month, year), 1 output variable (price of gold), 5-day time lag of the time series

The fifth experiment uses five input variables: time as date, day of the week, day of the month, month, and year. It has one output variable, the price of gold, and the time lag of the time series is five days. An overview of the individual networks for Experiment 5 is given in Table 3.21, which lists the five most successful networks, i.e. those with the best characteristics. A key indicator of the quality of the individual networks is the network performance, which exceeds 0.60 for all networks on the training, testing, and validation datasets. On the training dataset, the most efficient network is 3. RBF 25-29-1, with a performance value of 0.689605. The same network is also the most efficient on the testing and validation datasets, and it shows the smallest error on all three datasets. A part of the networks' weights is shown in Table 3.22; because of the large extent of the table, only a small part of it is presented for illustration. Correlation coefficients were also determined for Experiment 5. It follows from Table 3.23 that the correlation coefficients of the individual networks exceed 0.6 on the training, testing, and validation datasets, which indicates an average performance. Prediction statistics for the fifth experiment are given in Table 3.24, which shows that the values differ significantly both among the datasets and among the networks. In the case of residuals, the range is similar for most of the networks. The data statistics are given in Table 3.25. Figure 3.18 shows the development of the individual networks' curves in comparison with the curve of USD/oz (marked blue).
The figure clearly shows that the curves of the individual networks at least partially follow the shape of the USD/oz curve; however, it is not possible to identify which of the curves is closest to the test curve. The graphs of smoothed time series and prediction for 44 trading days (two calendar months) are presented in Fig. 3.19. It follows from the figure that only the network 4. RBF 25-27-1 is able to predict the development of the price of gold accurately, as in the last third of the graph, it was able to predict the upward trend

1653.310 −662.257 530.989 −314.606 514.756 −387.392 491.509 −5.834 4.678 −2.762 4.519 −3.508

1610.134

326.541

1580.539

333.125

1606.888

−623.079

574.318

−327.082

607.187

−310.017

581.582

−5.391

4.969

−2.823

5.240

−2.770

Maximum prediction (train)

Minimum prediction (test)

Maximum prediction (test)

Minimum prediction (validation)

Maximum prediction (validation)

Minimum residual (train)

Maximum residual (train)

Minimum residual (test)

Maximum residual (test)

Minimum residual (validation)

Maximum residual (validation)

Minimum standard residual (train)

Maximum standard residual (train)

Minimum standard residual (test)

Maximum standard residual (test)

Minimum standard residual (validation)

2. RBF 5-28-1

Minimum prediction (train)

354.589

1658.111

368.072

1686.544

325.207

1. RBF 5-22-1

315.495

Statistics

Table 3.19 Prediction statistics—Experiment 4 RBF NN (Tibco 2020)

−2.805

4.701

−2.861

6.928

−3.354

506.220

−320.347

548.175

−333.631

823.616

−398.691

1734.012

329.929

1758.465

315.697

1820.519

−4.153

5.728

−3.062

5.482

−5.568

513.179

−478.385

662.005

−353.896

633.697

−643.668

1596.790

314.374

1615.591

287.725

1665.795

4. RBF 5-30-1 222.561

3. RBF 5-29-1 −281.116

−3.218

5.379

−2.890

5.280

−7.183

512.095

−373.984

620.717

−333.529

614.251

−835.680

1583.979

287.997

1626.715

249.682

1951.241

203.087

5. RBF 5-26-1


Table 3.20 Data statistics: Experiment 4 RBF NN (Tibco 2020)

| Samples | Date input | Day in week input | Day in month input | Month input | Year input | USD/oz target |
|---|---|---|---|---|---|---|
| Minimum (train) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 529.500 |
| Maximum (train) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (train) | 41,340.06 | 4.013255 | 15.63885 | 6.34191 | 2012.696 | 1201.595 |
| Standard deviation (train) | 1523.42 | 1.391633 | 8.65048 | 3.41621 | 4.172 | 316.367 |
| Minimum (test) | 19.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 538.750 |
| Maximum (test) | 44,015.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (test) | 41,444.80 | 4.034608 | 15.28415 | 6.51730 | 2013.162 | 1218.531 |
| Standard deviation (test) | 2377.31 | 1.441273 | 8.90170 | 3.42252 | 4.367 | 313.031 |
| Minimum (validation) | 38,722.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (validation) | 44,019.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1834.000 |
| Mean (validation) | 41,440.15 | 4.027322 | 15.83789 | 6.52095 | 2012.954 | 1207.606 |
| Standard deviation (validation) | 2927.85 | 1.396892 | 8.69007 | 3.57921 | 6.354 | 543.600 |
| Minimum (overall) | 19.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (overall) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (overall) | 41,370.76 | 4.018564 | 15.61551 | 6.39503 | 2012.805 | 1205.034 |
| Standard deviation (overall) | 1677.88 | 1.400899 | 8.70486 | 3.42037 | 4.201 | 313.589 |


Fig. 3.16 Time series prediction for USD/oz—Experiment 4 RBF all networks (Tibco 2020)

of the curve, and its curve thus partially followed the testing dataset curve. All the remaining networks predicted the opposite trend in the last third of the graph. All networks show large residuals.

Experiment 6

1. Input variables (time as date, day of the week, day of the month, month, year), 1 output variable (price of gold), 10-day time lag of time series

For the sixth experiment, five input variables were selected (time as date, day of the week, day of the month, month, year) together with one output variable. The time lag of the time series is ten days in this case. Table 3.26 presents the networks' performance and level of error, including the training algorithm, which is the same for all networks (RBFT), the error function (sum of squares), the activation function of the hidden layer of neurons (Gaussian curve), and the activation function of the neurons in the output layer (Identity). Even at first glance, it can be seen that the performance of the networks in Experiment 6 is far from ideal. On all datasets (training, testing, validation), the performance values are well below 50% (none exceeds 0.26); some of the networks (2, 4, and 5) show a performance lower than twenty percent on the training dataset. This is evidence of their inapplicability for the prediction of the time series. The weights of the individual networks are presented in Table 3.27; however, only the first four rows describing the networks' weights are shown due to the large extent of the table. Table 3.27 serves only as an illustration.
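The input dimensions of the networks in Experiments 4 to 6 (5, 25, and 50) equal the five date-derived variables multiplied by the time lag (1, 5, and 10 days). The Python sketch below shows one way such lagged input rows could be assembled. It is only an illustration under stated assumptions, not the procedure used by the software: the helper names `calendar_features` and `lagged_samples` are hypothetical, the date is assumed to be encoded as a spreadsheet-style serial day number, the week is assumed to be numbered from Sunday = 1, and the sample prices are invented.

```python
from datetime import date, timedelta


def calendar_features(d):
    """Return the five date-derived inputs for one trading day."""
    # Assumption: the date is stored as a spreadsheet-style serial day number.
    serial = (d - date(1899, 12, 30)).days
    # Assumption: week numbered Sunday=1, Monday=2, ..., Saturday=7.
    day_in_week = d.isoweekday() % 7 + 1
    return [serial, day_in_week, d.day, d.month, d.year]


def lagged_samples(dates, prices, lag):
    """Pair the features of the previous `lag` days with the current price."""
    samples = []
    for t in range(lag, len(dates)):
        row = []
        for k in range(lag, 0, -1):  # oldest lagged day first
            row.extend(calendar_features(dates[t - k]))
        samples.append((row, prices[t]))
    return samples


# Invented series for illustration only (weekdays from 6 January 2020).
start = date(2020, 1, 6)
days = [start + timedelta(days=i) for i in range(15)]
dates = [d for d in days if d.isoweekday() <= 5]   # keep Monday to Friday
prices = [1500.0 + i for i in range(len(dates))]   # made-up prices

samples = lagged_samples(dates, prices, lag=10)
print(len(samples[0][0]))  # number of network inputs per sample
```

With five variables and a 10-day lag, each row has 50 entries, which agrees with the first number in network names such as RBF 50-21-1.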


Fig. 3.17 Smoothed time series and prediction for 44 trading days (two calendar months)—1. RBF 5-22-1, 2. RBF 5-28-1, 3. RBF 5-29-1, 4. RBF 5-30-1, 5. RBF 5-26-1 Experiment 4 (Tibco 2020)

The correlation coefficients of the datasets and individual networks determined within Experiment 6 are presented in Table 3.28. For correlation coefficients, the most desirable values are those closest to 1. Within this experiment, the correlation coefficients of the five most successful networks are all below 0.3, which is considered a very low value. It can thus be said that none of the five aforementioned networks is expected to give a significant positive result in predicting the price of gold. Prediction statistics are given in Table 3.29. It can be seen from the table that four out of the five networks show confusing behaviour, which can be evaluated on

Table 3.21 Overview of networks: Experiment 5 RBF (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RBF 25-30-1 | 0.653413 | 0.641030 | 0.635410 | 28,569.35 | 28,879.62 | 26,730.07 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 25-28-1 | 0.638106 | 0.668792 | 0.645141 | 29,450.91 | 27,549.12 | 26,199.73 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 25-29-1 | 0.689605 | 0.681925 | 0.653480 | 26,072.60 | 26,313.67 | 25,751.06 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 25-27-1 | 0.655645 | 0.646763 | 0.622212 | 28,321.89 | 28,529.86 | 27,495.60 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 25-30-1 | 0.629755 | 0.648300 | 0.645767 | 29,975.10 | 28,401.28 | 26,144.18 | RBFT | SOS | Gaussian | Identity |


Weight values 1. RBF 25-30-1

1

1

0

1

Connections 1. RBF 25-30-1

Date-1 → hidden neuron 1

Date-1 → hidden neuron 2

Date-1 → hidden neuron 3

Date-1 → hidden neuron 4

Weight ID

1

2

3

4

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Connections 2. RBF 25-28-1

1

1

0

1

Weight values 2. RBF 25-28-1

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Connections 3. RBF 25-29-1

Table 3.22 Overview of networks’ weights—Experiment 5 RBF (Tibco 2020)

1

0

0

1

Weight values 3. RBF 25-29-1

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Connections 4. RBF 25-27-1

1

1

0

1

Weight values 4. RBF 25-27-1

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Connections 5. RBF 25-30-1

1

1

1

0

Weight values 5. RBF 25-30-1


Table 3.23 Correlation coefficients: Experiment 5 RBF (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. RBF 25-30-1 | 0.653413 | 0.641030 | 0.635410 |
| 2. RBF 25-28-1 | 0.638106 | 0.668792 | 0.645141 |
| 3. RBF 25-29-1 | 0.689605 | 0.681925 | 0.653480 |
| 4. RBF 25-27-1 | 0.655645 | 0.646763 | 0.622212 |
| 5. RBF 25-30-1 | 0.629755 | 0.648300 | 0.645767 |

the basis of the data presented in the table (see the repeated numbers for networks 2, 3, 4, and 5 in the prediction rows). Data statistics are presented in Table 3.30. Figure 3.20 shows the development of the individual networks' curves in comparison with the USD/oz curve (marked blue). The figure clearly shows that the curves of the individual networks do not follow the course of the USD/oz curve at all. It can thus already be concluded that none of the five networks mentioned above is a suitable tool for predicting the price of gold time series. The graphs of smoothed time series and prediction for 44 trading days (two calendar months) can be seen in Fig. 3.21, which compares the prediction curve of each individual network with the curve of the testing dataset. None of the networks shown in the graphs indicates that it could be used for accurate prediction.
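The prediction statistics discussed above (minimum and maximum prediction, residual, and standard residual) can be sketched in a few lines of Python. This is a minimal illustration, not the software's implementation. The residual is taken as target minus prediction. The scaling of the standard residual is an assumption: by default the sketch divides by the sample standard deviation of the residuals, and a different scale can be passed in. The function names and the sample numbers are hypothetical.

```python
import math


def prediction_statistics(targets, predictions, scale=None):
    """Summarise predictions, residuals, and standardised residuals."""
    residuals = [y - p for y, p in zip(targets, predictions)]
    if scale is None:
        # Assumption: standardise by the sample standard deviation of residuals.
        n = len(residuals)
        mean_r = sum(residuals) / n
        scale = math.sqrt(sum((r - mean_r) ** 2 for r in residuals) / (n - 1))
    standardized = [r / scale if scale > 0 else 0.0 for r in residuals]
    return {
        "min_prediction": min(predictions),
        "max_prediction": max(predictions),
        "min_residual": min(residuals),
        "max_residual": max(residuals),
        "min_standard_residual": min(standardized),
        "max_standard_residual": max(standardized),
    }


def is_constant_predictor(stats, tol=1e-6):
    """True when minimum and maximum prediction coincide (a flat output)."""
    return stats["max_prediction"] - stats["min_prediction"] <= tol


# Invented numbers: a network that always returns the same value.
stats = prediction_statistics([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
print(stats)
print(is_constant_predictor(stats))
```

A network whose minimum and maximum predictions coincide returns the same value for every input, which is one way to read the repeated numbers noted for Experiment 6.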

3.2 Multi-Layer Perceptron Neural Networks

Kumar and Yadav (2011) rank the MLP network (Multi-Layer Perceptron Neural Network) amongst the most frequently used perceptron networks. Michal et al. (2015) point out that not only MLP networks but also other types of neural structures consist of processing units called nodes, linked by weighted connections. The nodes are usually arranged in three or more layers. The input layer receives the values of the predictor variables presented to the network, while one or more output nodes represent the estimated output. Kumar and Yadav (2011) describe the MLP as a type of feedforward artificial neural network. The structure consists of no fewer than two layers of neurons, i.e. perceptrons. The inputs of each neuron in a layer are connected to the outputs of all neurons in the previous layer, and its output is always directed at the following layer. The authors observe that this network is a modification of the classical linear perceptron that is able to distinguish data that are not linearly separable. For training, multilayer perceptron networks employ the backpropagation method.

1741.494 −639.805 896.162 −611.122

1596.171

−618.257

687.671

−602.560

Maximum prediction (validation)

Minimum residual (train)

Maximum residual (train)

Minimum residual (test)

3.939

Maximum standard residual (validation)

3.750

3.554

Minimum standard residual (test) −3.511

−3.682

−3.546

Maximum standard residual (train)

4.106

5.222

4.068

Minimum standard residual (train)

−3.536

−3.728

−3.658

Maximum residual (validation)

Minimum standard residual (validation)

606.932

643.996

Minimum residual (validation)

Maximum standard residual (test)

589.927 −568.361

697.809

−578.055

Maximum residual (test)

54.610

1781.606

0.073

1687.824

442.209

Minimum prediction (test)

3.823

−3.204

4.343

−2.775

4.305

−3.172

613.516

−514.184

704.459

−450.149

695.158

−512.248

1684.403

489.972

1643.146

245.615

1696.635

308.490

−273.412 1873.700

353.381

1669.395

Maximum prediction (train)

Minimum prediction (validation)

298.157

Minimum prediction (train)

3. RBF 25-29-1

2. RBF 25-28-1

Maximum prediction (test)

1. RBF 25-30-1

Statistics

Table 3.24 Prediction statistics—Experiment 5 RBF (Tibco 2020)

3.501

−3.569

4.068

−3.261

4.333

−3.799

580.566

−591.875

687.062

−550.816

729.182

−639.269

1738.462

339.162

1759.656

381.849

1913.607

297.634

4. RBF 25-27-1

4.187

−3.957

3.714

−3.642

4.623

−3.771

677.031

−639.836

625.925

−613.706

800.322

−652.936

1563.618

362.001

1595.895

251.368

1626.059

201.858

5. RBF 25-30-1


Table 3.25 Data statistics: Experiment 5 RBF (Tibco 2020)

| Samples | Date input | Day in week input | Day in month input | Month input | Year input | USD/oz target |
|---|---|---|---|---|---|---|
| Minimum (train) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 529.500 |
| Maximum (train) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (train) | 41,340.06 | 4.013255 | 15.64016 | 6.34191 | 2012.696 | 1201.595 |
| Standard deviation (train) | 1523.42 | 1.391633 | 8.64905 | 3.41621 | 4.172 | 316.367 |
| Minimum (test) | 38,728.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 538.750 |
| Maximum (test) | 44,015.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (test) | 41,515.31 | 4.034608 | 15.28415 | 6.51730 | 2013.162 | 1218.531 |
| Standard deviation (test) | 1590.14 | 1.441273 | 8.90170 | 3.42252 | 4.367 | 313.031 |
| Minimum (validation) | 38,722.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (validation) | 44,019.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1834.000 |
| Mean (validation) | 41,440.15 | 4.027322 | 15.83789 | 6.52095 | 2012.954 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 1.396892 | 8.69112 | 3.57921 | 6.354 | 543.600 |
| Minimum (overall) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (overall) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (overall) | 41,381.33 | 4.018564 | 15.61643 | 6.39503 | 2012.805 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 1.400899 | 8.70385 | 3.42037 | 4.201 | 313.589 |


Fig. 3.18 Time series prediction for USD/oz—Experiment 5 (Tibco 2020)

3.2.1 Mathematical Background

In practice, various types of artificial neural networks are used, differing in the neural model, the learning process, or the topology of synapses. According to Michal et al. (2015), the most commonly applied and important neural networks include MLP networks (Multi-Layer Perceptron), whose topology consists of the input and output layers and an unspecified number of hidden layers. Neurons are connected between adjacent layers, but there are no connections inside a layer. The MLP network is a feedforward unidirectional network trained by a "teacher", i.e. by supervised learning. Tučková (2003) adds that feedforward neural networks are the most common and simplest type of neural networks, especially due to their acyclic topology, where the neurons are arranged in layers. This ensures that the signals spread in one direction only. Neurons are connected directly by means of their inputs and outputs: the outputs of several neurons in one layer form the input of a neuron in the next layer. The connections thus exist only between the layers, not between the neurons within the same layer. Bishop (1995) states that the multilayer perceptron network is able to model functional relationships by applying non-linear activation functions, which, compared to single-layer perceptron networks, also enables non-linear classification. Kumar and Yadav (2011) state that the structure of this type of artificial neural network is based on a directed graph. Each neuron has a simple task: to process information by converting the received inputs into processed outputs. As already mentioned, the information flow in MLP networks is one-directional, that


Fig. 3.19 Smoothed time series and prediction for 44 trading days (two calendar months): 1. RBF 25-30-1, 2. RBF 25-28-1, 3. RBF 25-29-1, 4. RBF 25-27-1, 5. RBF 25-30-1, Experiment 5 (Tibco 2020)

is, from the input layer through the hidden layers to the output layer. The connection is shown in the figure below (Fig. 3.22). According to Haykin (2001), this type of connection enables forward signal propagation. The signal passed to a given neural network causes excitation of the neurons in the input layer. The author adds that the excitation is brought to the first hidden layer by means of synapses, where the strength of the signal is adjusted by the individual weights. Each neuron in the first hidden layer thus obtains information from every neuron in the input layer; the differences in the information received are caused by the different weights. After the signals are transmitted, they are summed, and the resulting excitation of the neuron is determined by the activation function. This process takes place in all layers; due to this process, it is

Table 3.26 Overview of networks: Experiment 6 RBF (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RBF 50-21-1 | 0.214528 | 0.221858 | 0.246554 | 47,159.91 | 46,192.61 | 41,998.28 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 50-23-1 | 0.199671 | 0.154004 | 0.251010 | 49,427.77 | 48,623.18 | 44,416.53 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 50-23-1 | 0.207340 | 0.186163 | 0.257506 | 49,427.77 | 48,623.57 | 44,416.69 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 50-23-1 | 0.128000 | 0.182207 | 0.252961 | 49,427.77 | 48,623.75 | 44,416.76 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 50-30-1 | 0.163494 | 0.176402 | 0.255059 | 49,427.77 | 48,623.75 | 44,416.76 | RBFT | SOS | Gaussian | Identity |


Weight values 1. RBF 50-21-1

0

1

1

1

Connections 1. RBF 50-21-1

Date-1 → hidden neuron 1

Date-1 → hidden neuron 2

Date-1 → hidden neuron 3

Date-1 → hidden neuron 4

Weight ID

1

2

3

4

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Connections 2. RBF 50-23-1

0.727

0.067

0.250

0.528

Weight values 2. RBF 50-23-1

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Connections 3. RBF 50-23-1

Table 3.27 Overview of networks’ weights—Experiment 6 RBF (Tibco 2020)

0.5455

0.2333

1.0000

0.7243

Weight values 3. RBF 50-23-1

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Connections 4. RBF 50-23-1

0.818182

0.200000

0.000000

0.948113

Weight values 4. RBF 50-23-1

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Connections 5. RBF 50-30-1

0.636364

1.000000

0.750000

0.045283

Weight values 5. RBF 50-30-1


Table 3.28 Correlation coefficients: Experiment 6 RBF (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. RBF 50-21-1 | 0.214528 | 0.221858 | 0.246554 |
| 2. RBF 50-23-1 | 0.199671 | 0.154004 | 0.251010 |
| 3. RBF 50-23-1 | 0.207340 | 0.186163 | 0.257506 |
| 4. RBF 50-23-1 | 0.128000 | 0.182207 | 0.252961 |
| 5. RBF 50-30-1 | 0.163494 | 0.176402 | 0.255059 |

possible to obtain a response to the input signal in the output layer of a given neural network. The following formula describes the output of the i-th neuron in the k-th layer:

$$y_{i,k} = \phi\left(\sum_{j=1}^{N} y_{j,k-1}\, w_{j,i,k}\right) \tag{3.3}$$

where N is the number of neurons in the (k − 1)-th layer. This output also depends on the shape of the activation function ϕ, which affects the behaviour of the neuron. According to Vochozka et al. (2016), the typical learning algorithm for MLP networks is backpropagation. This is a method based on gradient descent, which adjusts the weight values so as to minimize the error between the network output and the expected output. The authors add that MLP networks are trained iteratively, which requires much more time when the amount of data or the number of hidden neurons is large.
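As an illustration of Eq. (3.3), the following Python sketch propagates a signal forward through a small network, layer by layer. It is a minimal sketch under stated assumptions: the function names and the 1-2-1 network with its weight values are hypothetical and are not taken from the experiments. Like Eq. (3.3), it has no bias term, although practical implementations usually add one. Training by backpropagation is not shown.

```python
import math


def layer_output(prev_outputs, weights, activation):
    """Eq. (3.3): y[i] = phi(sum over j of prev_outputs[j] * weights[j][i])."""
    n_neurons = len(weights[0])
    return [
        activation(sum(prev_outputs[j] * weights[j][i]
                       for j in range(len(prev_outputs))))
        for i in range(n_neurons)
    ]


def forward(inputs, layers):
    """Pass the signal from the input layer through every following layer."""
    signal = inputs
    for weights, activation in layers:
        signal = layer_output(signal, weights, activation)
    return signal


def identity(s):
    """Identity activation, as used in some output layers."""
    return s


# Hypothetical 1-2-1 network; weights[j][i] links neuron j of the previous
# layer to neuron i of the current layer.
layers = [
    ([[0.5, -0.5]], math.tanh),    # 1 input -> 2 hidden neurons (tanh)
    ([[1.0], [2.0]], identity),    # 2 hidden neurons -> 1 output (identity)
]

output = forward([1.0], layers)
print(output)
```

Each layer only reads the outputs of the layer before it, which is the one-directional information flow described above.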

3.2.2 Case

For these experiments, multilayer perceptron (MLP) networks were used as a method of predicting the development of the price of gold time series. The dataset was divided into three basic subsets: testing, training, and validation dataset.

Experiment 1

1. Input variable (time as date), 1 output variable (price of gold), 1-day time lag of time series

The first experiment includes one input variable (time as date) and one output variable, which is the price of gold in our case; the time lag of the time series is one day. Table 3.31 presents a basic overview of the neural networks used for predicting the price of gold, showing the five most efficient and most successful networks (those with the best characteristics). It can be noticed that the network performance is about 98% on the training, testing, and validation datasets, which is considered a very

1203.437

−11.367

1217.216

226.843

Minimum prediction (train)

Maximum prediction (train)

Minimum prediction (test)

1203.437 −668.437 691.563 −664.687

1217.216

−680.796

679.980

−678.039

Maximum prediction (validation)

Minimum residual (train)

Maximum residual (train)

Minimum residual (test)

1203.437

Minimum prediction (validation)

3.010

Maximum standard residual (validation)

2.992

3.136

Minimum standard residual (test) −3.072

−3.014

−3.155

Maximum standard residual (train)

3.154

3.111

3.131

Minimum standard residual (train)

−3.226

−3.007

−3.135

Maximum residual (validation)

Minimum standard residual (validation)

630.563

616.828

Minimum residual (validation)

Maximum standard residual (test)

691.563 −647.437

677.800

−661.042

Maximum residual (test)

1203.437

1217.216

652.462

Maximum prediction (test)

1203.437

1203.437

2. RBF 50-23-1

1. RBF 50-21-1

Statistics

Table 3.29 Prediction statistics—Experiment 6 RBF (Tibco 2020)

2.992

−3.072

3.136

−3.014

3.111

−3.006

630.587

−647.413

691.587

−664.663

691.587

−668.413

1203.413

1203.413

1203.413

1203.413

1203.413

1203.413

3. RBF 50-23-1

2.992

−3.072

3.136

−3.014

3.111

−3.006

630.598

−647.402

691.598

−664.652

691.598

−668.402

1203.402

1203.402

1203.402

1203.402

1203.402

1203.402

4. RBF 50-23-1

2.992

−3.072

3.136

−3.014

3.111

−3.006

630.598

−647.402

691.598

−664.652

691.598

−668.402

1203.402

1203.402

1203.402

1203.402

1203.402

1203.402

5. RBF 50-30-1


Table 3.30 Data statistics: Experiment 6 RBF (Tibco 2020)

| Samples | Date input | Day in week input | Day in month input | Month input | Year input | USD/oz target |
|---|---|---|---|---|---|---|
| Minimum (train) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 529.500 |
| Maximum (train) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (train) | 41,340.06 | 4.013255 | 15.64016 | 6.34191 | 2012.696 | 1201.595 |
| Standard deviation (train) | 1523.42 | 1.391633 | 8.64905 | 3.41621 | 4.172 | 316.367 |
| Minimum (test) | 38,728.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 538.750 |
| Maximum (test) | 44,015.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (test) | 41,515.31 | 4.034608 | 15.28415 | 6.51730 | 2013.162 | 1218.531 |
| Standard deviation (test) | 1590.14 | 1.441273 | 8.90170 | 3.42252 | 4.367 | 313.031 |
| Minimum (validation) | 38,722.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (validation) | 44,019.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1834.000 |
| Mean (validation) | 41,440.15 | 4.027322 | 15.83789 | 6.52095 | 2012.954 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 1.396892 | 8.69112 | 3.57921 | 6.354 | 543.600 |
| Minimum (overall) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (overall) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (overall) | 41,381.33 | 4.018564 | 15.61643 | 6.39503 | 2012.805 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 1.400899 | 8.70385 | 3.42037 | 4.201 | 313.589 |


Fig. 3.20 Predicting time series for USD/oz—Experiment 6 (Tibco 2020)

high performance. In all cases, the same training algorithm, BFGS, was used (with a different numerical designation for each network). The sum of squares (SOS) was used as the error function in all cases. For the activation of the hidden layer of neurons, the hyperbolic tangent is used in all cases except the first one, where the logistic function was used. For the activation of the neurons in the output layer, the identity function is used in two cases, the logistic function in two, and the exponential function in one. The weights of the individual neural networks are presented in Table 3.32; because of its extent, it is not possible to present the whole table, so only the first four weight IDs are given for illustration. For all five neural networks and all data subsets, correlation coefficients were determined; a high correlation between two processes indicates that they are likely to be mutually dependent. It can be seen from Table 3.33 that the value of the correlation coefficient is high for all networks (about 0.98). The value of the correlation coefficient should be as close to 1 as possible, and in the ideal case the values for each network should be identical across the datasets (training, testing, validation). The differences between the values of the neural networks presented in Table 3.33 are minimal. Table 3.34 captures the prediction statistics according to individual neural networks and datasets (training, testing, validation).
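The performance values in Table 3.31 coincide with the correlation coefficients in Table 3.33, so the two quantities discussed here can be illustrated together. The Python sketch below computes a Pearson correlation coefficient between targets and predictions and a sum-of-squares error. It is only a sketch: the function names and the sample values are hypothetical, and the software may normalize its reported error differently from the plain sum used here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one series is constant
    return cov / math.sqrt(var_x * var_y)


def sum_of_squares(targets, predictions):
    """Plain sum of squared residuals (no normalization assumed)."""
    return sum((y - p) ** 2 for y, p in zip(targets, predictions))


# Invented values for illustration only.
targets = [1200.0, 1210.0, 1195.0, 1230.0]
predictions = [1198.0, 1212.0, 1199.0, 1225.0]

print(pearson(targets, predictions))
print(sum_of_squares(targets, predictions))
```

A constant prediction has zero variance, so its correlation with the target is undefined; the sketch returns NaN in that case rather than a number.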


Fig. 3.21 Smoothed time series and prediction for 44 trading days (two calendar months)—1. RBF 50-21-1, 2. RBF 50-23-1, 3. RBF 50-23-1, 4. RBF 50-23-1, 5. RBF 50-30-1 Experiment 6 (Tibco 2020)

Fig. 3.22 Topology of multilayer perceptron

Table 3.31 Overview of networks: MLP NN Experiment 1 (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MLP 1-8-1 | 0.987049 | 0.986649 | 0.985175 | 1285.374 | 1303.063 | 1331.725 | BFGS 831 | SOS | Logistic | Exponential |
| 2 | MLP 1-7-1 | 0.983705 | 0.983220 | 0.982813 | 1614.390 | 1627.513 | 1541.637 | BFGS 220 | SOS | Tanh | Logistic |
| 3 | MLP 1-8-1 | 0.984474 | 0.984876 | 0.983161 | 1538.584 | 1469.196 | 1510.424 | BFGS 765 | SOS | Tanh | Identity |
| 4 | MLP 1-8-1 | 0.984128 | 0.983588 | 0.983372 | 1572.533 | 1592.331 | 1491.808 | BFGS 557 | SOS | Tanh | Identity |
| 5 | MLP 1-8-1 | 0.985260 | 0.984405 | 0.984205 | 1461.219 | 1513.991 | 1418.593 | BFGS 294 | SOS | Tanh | Logistic |


Date-1 → 7.4541 hidden neuron 1

Date-1 → −14.6916 hidden neuron 2

Date-1 → −53.1244 hidden neuron 3

Date-1 → −30.6890 hidden neuron 4

Date-1 → 21.436 hidden neuron 2

Date-1 → −30.753 hidden neuron 3

Date-1 → −14.799 hidden neuron 4

2

3

4

Date-1 → 12.5521 hidden neuron 4

Date-1 → 16.6189 hidden neuron 3

Date-1 → −0.8886 hidden neuron 2

Date-1 → −14.1168 hidden neuron 4

Date-1 → −43.0248 hidden neuron 3

Date-1 → −0.3166 hidden neuron 2

Weight values 5. MLP 1-8-1

Date-1 → 11.444 hidden neuron 4

Date-1 → −68.878 hidden neuron 3

Date-1 → −20.537 hidden neuron 2

Date-1 → −52.490 hidden neuron 1

Weight values Connections 4. MLP 1-8-1 5. MLP 1-8-1

Date-1 → 44.7786 hidden neuron 1

Weight values Connections 3. MLP 1-8-1 4. MLP 1-8-1

Date-1 → 12.9752 hidden neuron 1

Weight values Connections 2. MLP 1-7-1 3. MLP 1-8-1

Date-1 → 14.337 hidden neuron 1

Weight values Connections 1. MLP 1-8-1 2. MLP 1-7-1

1

Weight ID Connections 1. MLP 1-8-1

Table 3.32 Network weights—MLP NN experiment 1 (Tibco 2020)


Table 3.33 Correlation coefficients—MLP NN experiment 1 (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
| --- | --- | --- | --- |
| 1. MLP 1-8-1 | 0.987049 | 0.986649 | 0.985175 |
| 2. MLP 1-7-1 | 0.983705 | 0.983220 | 0.982813 |
| 3. MLP 1-8-1 | 0.984474 | 0.984876 | 0.983161 |
| 4. MLP 1-8-1 | 0.984128 | 0.983588 | 0.983372 |
| 5. MLP 1-8-1 | 0.985260 | 0.984405 | 0.984205 |
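The correlation coefficients in Table 3.33 coincide with the performance values in the network overview, so the performance of a network on a data subset can be read as the correlation between the target prices and the network outputs. A minimal sketch of this calculation follows, assuming the Pearson correlation coefficient is meant; the function name `pearson` and the short price series are hypothetical and serve only as an illustration.

```python
import math


def pearson(targets, outputs):
    """Pearson correlation coefficient between targets and network outputs."""
    n = len(targets)
    if n != len(outputs) or n < 2:
        raise ValueError("both series must have the same length (at least 2)")
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return cov / math.sqrt(var_t * var_o)


# Hypothetical target prices and network outputs (USD/oz) for each subset.
subsets = {
    "train": ([1200.0, 1210.0, 1250.0, 1190.0], [1195.0, 1215.0, 1240.0, 1200.0]),
    "test": ([1300.0, 1280.0, 1320.0], [1290.0, 1285.0, 1310.0]),
    "validation": ([1100.0, 1150.0, 1120.0], [1110.0, 1140.0, 1125.0]),
}

performance = {name: pearson(t, o) for name, (t, o) in subsets.items()}
for name, value in performance.items():
    print(name, round(value, 6))
```

Values close to 1 on all three subsets, with only small differences between them, correspond to the pattern described for the retained networks.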

It can be seen from Table 3.34 that the maximum predictions on the training, testing, and validation datasets are very similar for all networks. This applies to the minimum predictions as well. For the residuals, the difference between the minimum and maximum should be as small as possible. In terms of residuals, the best network appears to be 3. MLP 1-8-1, which shows the smallest residuals. Statistics of the individual data are given in Table 3.35. Figure 3.23 shows the development of the price per troy ounce of gold in USD (marked blue) in comparison with the five best retained networks. The sample comprises the training, testing, and validation datasets. In terms of accuracy, the most successful network appears to be 3. MLP 1-8-1, since its curve best follows the blue curve, that is, the actual development of the price of gold. The graphs of the smoothed time series, including the prediction for 44 trading days (two calendar months), are presented in Fig. 3.24. The most successful networks in terms of the curve trend compared to the curve of the actual prices of gold are 1. MLP 1-8-1 and 3. MLP 1-8-1. These networks were able to predict the trend of the curve even at some extreme points. All networks show very good results in comparison with the testing dataset, and given their high performance, all of them could be applied for prediction. In terms of the minimum residuals, the best network appears to be 3. MLP 1-8-1.

Experiment 2

1 input variable (time—date), 1 output variable (price of gold), 5-day time lag of time series

The second experiment includes one input variable (time—date) and one output variable, the price of gold; the time lag of the time series is five days. Table 3.36 presents a basic overview of the neural networks used for predicting the development of the price of gold. It includes the five most efficient and most successful networks (the networks with the best characteristics). The performance of the networks is again around 98 percent for all types of datasets, which is considered a very high performance. In all cases, the same training algorithm, BFGS, is used (only with a different numerical designation). The error function in all cases is the sum of squares (SOS). To activate the hidden layer of neurons, the hyperbolic tangent is used in one case,

Table 3.34 Prediction statistics—MLP NN Experiment 1 (Tibco 2020)

Predictions statistics (gold - data_2), Target: USD/oz

| Statistics | 1. MLP 1-8-1 | 2. MLP 1-7-1 | 3. MLP 1-8-1 | 4. MLP 1-8-1 | 5. MLP 1-8-1 |
| --- | --- | --- | --- | --- | --- |
| Minimum prediction (train) | 573.829 | 611.727 | 551.261 | 576.539 | 619.125 |
| Maximum prediction (train) | 1789.557 | 1802.031 | 1745.934 | 1711.634 | 1741.367 |
| Minimum prediction (test) | 574.330 | 611.757 | 553.310 | 576.828 | 619.129 |
| Maximum prediction (test) | 1787.357 | 1793.820 | 1745.058 | 1709.420 | 1738.325 |
| Minimum prediction (validation) | 573.900 | 611.731 | 551.554 | 576.578 | 619.126 |
| Maximum prediction (validation) | 1789.468 | 1800.408 | 1745.913 | 1711.196 | 1740.766 |
| Minimum residual (train) | −195.980 | −204.508 | −170.599 | −177.085 | −180.329 |
| Maximum residual (train) | 237.105 | 267.279 | 229.342 | 288.340 | 290.923 |
| Minimum residual (test) | −187.133 | −185.462 | −177.053 | −180.744 | −182.234 |
| Maximum residual (test) | 234.323 | 265.230 | 226.799 | 286.679 | 289.439 |
| Minimum residual (validation) | −172.635 | −165.465 | −178.892 | −180.177 | −182.024 |
| Maximum residual (validation) | 169.689 | 201.535 | 172.626 | 223.491 | 226.488 |
| Minimum standard residual (train) | −5.466 | −5.090 | −4.349 | −4.466 | −4.717 |
| Maximum standard residual (train) | 6.613 | 6.652 | 5.847 | 7.271 | 7.611 |
| Minimum standard residual (test) | −5.184 | −4.597 | −4.619 | −4.529 | −4.683 |
| Maximum standard residual (test) | 6.491 | 6.574 | 5.917 | 7.184 | 7.439 |
| Minimum standard residual (validation) | −4.731 | −4.214 | −4.603 | −4.665 | −4.833 |
| Maximum standard residual (validation) | 4.650 | 5.133 | 4.442 | 5.786 | 6.013 |
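The relationship between the residuals and the standard residuals in Table 3.34 is not stated explicitly. The tabulated numbers are, however, consistent with dividing each residual by the square root of the error reported for the same network and dataset in Table 3.31. The short sketch below checks this for 1. MLP 1-8-1 on the training dataset; the function name `standardized` is hypothetical, and the relationship itself is an observation on the published figures, not a documented formula of the software.

```python
import math


def standardized(residual, error):
    """Residual divided by the square root of the reported dataset error."""
    return residual / math.sqrt(error)


# Values reported for 1. MLP 1-8-1 (Tables 3.31 and 3.34), training dataset.
training_error = 1285.374
min_residual_train = -195.980
max_residual_train = 237.105

min_std = round(standardized(min_residual_train, training_error), 3)
max_std = round(standardized(max_residual_train, training_error), 3)

# Table 3.34 lists -5.466 and 6.613 for these two standard residuals.
print(min_std, max_std)
```

The same check can be repeated for the other networks and datasets by substituting the corresponding error and residual values.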


Table 3.35 Data statistics—MLP NN experiment 1 (Tibco 2020)

Data statistics (gold—data_2)

| Samples | Date input | USD/oz target |
| --- | --- | --- |
| Minimum (train) | 38,720.00 | 529.500 |
| Maximum (train) | 44,020.00 | 1895.000 |
| Mean (train) | 41,340.06 | 1201.595 |
| Standard deviation (train) | 1523.42 | 316.367 |
| Minimum (test) | 38,728.00 | 538.750 |
| Maximum (test) | 44,015.00 | 1895.000 |
| Mean (test) | 41,515.31 | 1218.531 |
| Standard deviation (test) | 1590.14 | 313.031 |
| Minimum (validation) | 38,722.00 | 524.750 |
| Maximum (validation) | 44,019.00 | 1834.000 |
| Mean (validation) | 41,440.15 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 543.600 |
| Minimum (overall) | 38,720.00 | 524.750 |
| Maximum (overall) | 44,020.00 | 1895.000 |
| Mean (overall) | 41,381.33 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 313.589 |

Fig. 3.23 Prediction of time series USD/oz—MLP NN Experiment 1 (Tibco 2020)


Fig. 3.24 Smoothed time series and prediction for 44 trading days 1. MLP 1-8-1, 2. MLP 1-7-1, 3. MLP 1-8-1, 4. MLP 1-8-1, 5. MLP 1-8-1—Experiment 1 (Tibco 2020)

while in the remaining cases, the logistic function is used. To activate the neurons in the output layer, the hyperbolic tangent was used in two cases, the exponential function in one case, and the logistic function in the two remaining cases. The weights of the neural networks are shown in Table 3.37; because of the table’s extent, only the first four weight IDs are presented for illustration. For all five networks, correlation coefficients of the individual networks and data subsets were determined; a correlation between two processes indicates that they are likely to be mutually dependent. Table 3.38 clearly shows that the values of the correlation coefficient are high for all networks, since all of them exceed 0.98. This indicates a very high

Table 3.36 Overview of networks—MLP NN Experiment 2 (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | MLP 5-6-1 | 0.984210 | 0.984757 | 0.982695 | 1555.911 | 1481.867 | 1536.321 | BFGS 616 | SOS | Logistic | Tanh |
| 2 | MLP 5-8-1 | 0.983543 | 0.983431 | 0.982565 | 1621.418 | 1607.253 | 1547.872 | BFGS 465 | SOS | Tanh | Tanh |
| 3 | MLP 5-8-1 | 0.983964 | 0.983055 | 0.982257 | 1580.052 | 1643.916 | 1575.105 | BFGS 451 | SOS | Logistic | Exponential |
| 4 | MLP 5-7-1 | 0.984502 | 0.983561 | 0.982901 | 1527.502 | 1596.290 | 1518.669 | BFGS 260 | SOS | Logistic | Logistic |
| 5 | MLP 5-8-1 | 0.985145 | 0.984557 | 0.983557 | 1464.947 | 1501.249 | 1460.720 | BFGS 246 | SOS | Logistic | Logistic |

Date-1 → −4.1419 hidden neuron 2

Date-1 → −3.0355 hidden neuron 3

Date-1 → −1.9745 hidden neuron 4

Date-1 → 19.6686 hidden neuron 3

Date-1 → 14.6392 hidden neuron 4

3

4

Date-1 → 8.0327 hidden neuron 4

Date-1 → 5.4789 hidden neuron 3

Date-1 → 25.2519 hidden neuron 4

Date-1 → 25.9877 hidden neuron 3

Weight values 5. MLP 5-8-1

Date-1 → −3.7047 hidden neuron 4

Date-1 → −3.2985 hidden neuron 3

Date-1 → −3.4150 hidden neuron 2

Weight values Connections 4. MLP 5-7-1 5. MLP 5-8-1

Date-1 → 26.6139 hidden neuron 2

Weight values Connections 3. MLP 5-8-1 4. MLP 5-7-1

Date-1 → 9.3501 hidden neuron 2

Weight values Connections 2. MLP 5-8-1 3. MLP 5-8-1

Date-1 → 24.0077 hidden neuron 2

Weight values Connections 1. MLP 5-6-1 2. MLP 5-8-1

2

Weight ID Connections 1. MLP 5-6-1

Table 3.37 Network weights—MLP NN Experiment 2 (Tibco 2020)


Table 3.38 Correlation coefficients—MLP NN Experiment 2 (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
| --- | --- | --- | --- |
| 1. MLP 5-6-1 | 0.984210 | 0.984757 | 0.982695 |
| 2. MLP 5-8-1 | 0.983543 | 0.983431 | 0.982565 |
| 3. MLP 5-8-1 | 0.983964 | 0.983055 | 0.982257 |
| 4. MLP 5-7-1 | 0.984502 | 0.983561 | 0.982901 |
| 5. MLP 5-8-1 | 0.985145 | 0.984557 | 0.983557 |

performance of the network. The correlation coefficient should be as high as possible (close to 1) and, ideally, the same for all networks in all datasets (training, testing, and validation). The differences between the datasets’ values for any one network are minimal, and the performance is sufficient. Prediction statistics for the individual networks on the training, testing, and validation datasets are presented in Table 3.39, while the data statistics are presented in Table 3.40. Table 3.39 shows that the values are satisfactory and that high performance of the networks can be expected. For the residuals, the sum of the maximum and minimum should ideally be 0; in other words, the minimum should be the negative number closest to zero and the maximum the smallest positive number. As for the predictions, the values on the training, testing, and validation datasets are very similar for all networks. In terms of the residuals, the best network appears to be 1. MLP 5-6-1. Figure 3.25 presents a graph of the prediction of the USD/oz time series, showing the course of all networks’ curves. It is evident from the figure that all networks behave similarly; a closer look at the curves shows that their course is very similar in many parts of the graph. However, the most suitable network for predicting the price of gold time series is 1. MLP 5-6-1, specifically because of its lowest residuals. The sample comprises the training, testing, and validation datasets. The graphs of the smoothed time series and the prediction for 44 trading days for all networks are shown in Fig. 3.26. The networks whose curves at least partially follow the curve of the actual prices of gold are 1. MLP 5-6-1 (in terms of the estimated shape of the curve in extreme cases) and 3. MLP 5-8-1, which shows a very high performance in the first third of the graph.

Experiment 3

1 input variable (time—date), 1 output variable (price of gold), 10-day time lag of time series

The third experiment includes one input variable (time—date) and one output variable, the price of gold; the time lag of the time series is 10 days.

Table 3.39 Prediction statistics—MLP NN Experiment 2 (Tibco 2020)

| Statistics | 1. MLP 5-6-1 | 2. MLP 5-8-1 | 3. MLP 5-8-1 | 4. MLP 5-7-1 | 5. MLP 5-8-1 |
| --- | --- | --- | --- | --- | --- |
| Minimum prediction (train) | 553.991 | 558.298 | 601.198 | 601.219 | 584.629 |
| Maximum prediction (train) | 1755.883 | 1764.778 | 1728.186 | 1739.444 | 1756.388 |
| Minimum prediction (test) | 555.929 | 563.497 | 601.206 | 601.183 | 584.461 |
| Maximum prediction (test) | 1752.756 | 1759.851 | 1727.797 | 1736.327 | 1752.810 |
| Minimum prediction (validation) | 555.365 | 561.993 | 600.998 | 601.063 | 584.292 |
| Maximum prediction (validation) | 1755.786 | 1762.896 | 1726.746 | 1738.178 | 1755.181 |
| Minimum residual (train) | −170.570 | −179.764 | −214.404 | −164.480 | −201.833 |
| Maximum residual (train) | 244.839 | 277.204 | 302.189 | 314.984 | 274.896 |
| Minimum residual (test) | −177.474 | −181.386 | −164.231 | −145.938 | −170.416 |
| Maximum residual (test) | 241.658 | 276.514 | 289.432 | 309.355 | 273.220 |
| Minimum residual (validation) | −171.589 | −171.893 | −176.654 | −183.057 | −179.358 |
| Maximum residual (validation) | 173.289 | 212.557 | 205.048 | 238.365 | 207.823 |
| Minimum standard residual (train) | −4.324 | −4.464 | −5.394 | −4.208 | −5.273 |
| Maximum standard residual (train) | 6.207 | 6.884 | 7.602 | 8.059 | 7.182 |
| Minimum standard residual (test) | −4.610 | −4.524 | −4.051 | −3.653 | −4.398 |
| Maximum standard residual (test) | 6.278 | 6.897 | 7.138 | 7.743 | 7.052 |
| Minimum standard residual (validation) | −4.378 | −4.369 | −4.451 | −4.697 | −4.693 |
| Maximum standard residual (validation) | 4.421 | 5.403 | 5.167 | 6.117 | 5.438 |

Table 3.40 Data statistics—MLP NN experiment 2 (Tibco 2020)

| Samples | Date input | USD/oz target |
| --- | --- | --- |
| Minimum (train) | 38,720.00 | 529.500 |
| Maximum (train) | 44,020.00 | 1895.000 |
| Mean (train) | 41,340.06 | 1201.595 |
| Standard deviation (train) | 1523.42 | 316.367 |
| Minimum (test) | 38,728.00 | 538.750 |
| Maximum (test) | 44,015.00 | 1895.000 |
| Mean (test) | 41,515.31 | 1218.531 |
| Standard deviation (test) | 1590.14 | 313.031 |
| Minimum (validation) | 38,722.00 | 524.750 |
| Maximum (validation) | 44,019.00 | 1834.000 |
| Mean (validation) | 41,440.15 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 543.600 |
| Minimum (overall) | 38,720.00 | 524.750 |
| Maximum (overall) | 44,020.00 | 1895.000 |
| Mean (overall) | 41,381.33 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 313.589 |

Fig. 3.25 Prediction of USD/oz time series—MLP NN experiment 2 (Tibco 2020)


Fig. 3.26 Graphs of smoothed time series and prediction for 44 trading days 1. MLP 5-6-1, 2. MLP 5-8-1, 3. MLP 5-8-1, 4. MLP 5-7-1, 5. MLP 5-8-1—Experiment 2 (Tibco 2020)

Table 3.41 offers a basic overview of the neural networks used for predicting the development of the price of gold, presenting the five most efficient and most successful networks (the networks with the best characteristics). The performance of the networks is about 98 percent in this case, which is a very high performance, with 3. MLP 10-8-1 showing the best performance on the training, testing, and validation datasets. The networks’ performance indicates a high probability of successful application to predicting the development of the price of gold. In all cases, the same training algorithm (BFGS) was used (the only difference is in its numerical designation). As an error function, the sum of squares (SOS) was used,

Table 3.41 Overview of networks—MLP NN experiment 3 (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | MLP 10-7-1 | 0.983571 | 0.984685 | 0.982238 | 1609.527 | 1478.157 | 1560.952 | BFGS 284 | SOS | Tanh | Identity |
| 2 | MLP 10-7-1 | 0.983994 | 0.984014 | 0.981359 | 1568.038 | 1543.388 | 1637.143 | BFGS 328 | SOS | Tanh | Exponential |
| 3 | MLP 10-8-1 | 0.988940 | 0.988240 | 0.988279 | 1085.921 | 1134.527 | 1033.320 | BFGS 344 | SOS | Tanh | Logistic |
| 4 | MLP 10-8-1 | 0.982963 | 0.981962 | 0.981537 | 1668.345 | 1733.591 | 1621.498 | BFGS 473 | SOS | Tanh | Identity |
| 5 | MLP 10-8-1 | 0.984047 | 0.983372 | 0.981882 | 1562.988 | 1598.650 | 1592.048 | BFGS 569 | SOS | Tanh | Identity |

while the hyperbolic tangent is used to activate the hidden layer of neurons. To activate the neurons in the output layer, the identity function was used in three cases, the logistic function in one case, and the exponential function in the remaining case. The weights of the individual networks are shown in Table 3.42. Because of the large number of rows, the table presents only the first four weights of each network. In Experiment 3, too, the correlation coefficients of the individual networks and data subsets were determined. They are presented in Table 3.43, which shows that the values of the correlation coefficient are high for all networks, confirming their high performance. The lowest value, 0.981359, is recorded for 2. MLP 10-7-1 on the validation dataset, while the highest value, 0.988940, is recorded for 3. MLP 10-8-1, which achieves the highest correlation coefficients in all datasets. The correlation coefficient should be as high as possible (close to 1) and, ideally, identical across each network’s datasets (training, testing, and validation). The prediction statistics for the individual networks on the training, testing, and validation datasets are given in Table 3.44, while the data statistics are listed in Table 3.45. Table 3.44 shows that the values are highly satisfactory, so high performance of the networks can already be assumed. For the residuals, the sum of the maximum and minimum should ideally be 0; in other words, the minimum should be the negative number closest to zero and the maximum the smallest positive number. As for the predictions, the values are very similar for all networks and for the training, testing, and validation datasets.
In terms of the residuals, the best network appears to be 2. MLP 10-7-1 on the training and testing datasets. On the validation dataset, the most successful network in terms of the smallest residuals was 5. MLP 10-8-1. Figure 3.27 presents the graph of the prediction of the USD/oz time series, displaying the course of all networks’ curves. The figure shows that all networks behave similarly, as the courses of their curves are very similar in many parts of the graph. Nevertheless, 3. MLP 10-8-1 is the most suitable network for predicting the price of gold time series due to its best performance. The sample in this case includes the training, testing, and validation datasets. Graphs of the smoothed time series and the prediction for 44 trading days are presented in Fig. 3.28. The figure shows that the best networks in terms of predicting the time series are 1. MLP 10-7-1 and the most efficient network, 3. MLP 10-8-1. The curve of the third network copies the curve of the actual development of the price of gold in many parts of the graph.
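The criterion that the sum of the minimum and maximum residual should ideally be 0 can be turned into a simple ranking. The sketch below applies it to the training and testing residuals reported in Table 3.44; the function names `asymmetry` and `most_symmetric` are hypothetical, and the ranking is only one possible reading of the criterion described in the text.

```python
# Minimum and maximum residuals (USD/oz) as reported in Table 3.44.
residuals = {
    "train": {
        "1. MLP 10-7-1": (-176.668, 232.737),
        "2. MLP 10-7-1": (-245.810, 216.087),
        "3. MLP 10-8-1": (-149.229, 246.758),
        "4. MLP 10-8-1": (-196.261, 286.303),
        "5. MLP 10-8-1": (-180.784, 248.903),
    },
    "test": {
        "1. MLP 10-7-1": (-196.447, 230.718),
        "2. MLP 10-7-1": (-188.106, 205.882),
        "3. MLP 10-8-1": (-140.732, 246.074),
        "4. MLP 10-8-1": (-177.708, 293.569),
        "5. MLP 10-8-1": (-183.867, 252.824),
    },
}


def asymmetry(min_residual, max_residual):
    """Absolute value of min + max; 0 means the two extremes are symmetric."""
    return abs(min_residual + max_residual)


def most_symmetric(extremes):
    """Name of the network whose residual extremes sum closest to zero."""
    return min(extremes, key=lambda name: asymmetry(*extremes[name]))


for subset, extremes in residuals.items():
    print(subset, most_symmetric(extremes))
```

Other readings of "smallest residuals", such as the smallest spread between minimum and maximum, may rank the networks differently, so the result should be treated as an illustration of the calculation only.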

−3.0649

−3.7112

−3.1577

−2.4925

Date-1 → hidden neuron 1

Date-1 → hidden neuron 2

Date-1 → hidden neuron 3

Date-1 → hidden neuron 4

1

2

3

4

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Weight values Connections 1. MLP 10-7-1 2. MLP 10-7-1

Connections 1. MLP 10-7-1

Weight ID

−1.3852

−4.9083

−2.2151

−8.2046

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Weight values Connections 2. MLP 10-7-1 3. MLP 10-8-1

Table 3.42 Network weights—MLP NN Experiment 3 (Tibco 2020)

5.1852

6.3569

6.1811

7.3485

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Weight values Connections 3. MLP 10-8-1 4. MLP 10-8-1

−3.7492

−5.9227

−14.3696

0.0373

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Weight values Connections 4. MLP 10-8-1 5. MLP 10-8-1

−12.0061

−3.9044

0.7629

−3.1976

Weight values 5. MLP 10-8-1


Table 3.43 Correlation coefficients—MLP NN experiment 3 (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
| --- | --- | --- | --- |
| 1. MLP 10-7-1 | 0.983571 | 0.984685 | 0.982238 |
| 2. MLP 10-7-1 | 0.983994 | 0.984014 | 0.981359 |
| 3. MLP 10-8-1 | 0.988940 | 0.988240 | 0.988279 |
| 4. MLP 10-8-1 | 0.982963 | 0.981962 | 0.981537 |
| 5. MLP 10-8-1 | 0.984047 | 0.983372 | 0.981882 |

Experiment 4

5 input variables (time—date, day of the week, day of the month, month, year), 1 output variable (price of gold), 1-day time lag of time series

This experiment includes five input variables, specifically time—date, day of the week, day of the month, month, and year, and one output variable, the price of gold. The time lag of the time series is one day. Table 3.46 gives an overview of the five most successful networks, those with the best characteristics, retained in the fourth experiment. An important value for comparing the individual networks is the network performance, which exceeds 0.98 on the training, testing, and validation datasets. On the training dataset, the best network appears to be 1. MLP 5-10-1, with a performance of 0.987487. This network is also the most efficient one on the testing dataset. On the validation dataset, the highest performance, 0.985807, is achieved by 3. MLP 5-10-1. The individual weights of the networks are given in Table 3.47. Because of its large extent, only a small part of the table is presented. In Experiment 4, too, the correlation coefficients for the individual networks and datasets were determined. Table 3.48 clearly shows that the values on the training, testing, and validation datasets are very similar. The correlation coefficients exceed 0.98 in all cases, which indicates good network performance; the value should be as close to 1 as possible. The prediction statistics of the individual networks are given in Table 3.49, and Table 3.50 presents the data statistics. In terms of the smallest residuals, the most successful network appears to be 1. MLP 5-10-1 for all datasets. The graph of the USD/oz time series and the individual networks is shown in Fig. 3.29. The figure shows that most of the networks closely follow the curve of USD/oz.
The graphs of the smoothed time series and the prediction for 44 trading days (two calendar months) for all networks are presented in Fig. 3.30. It can be concluded from the graphs that all five networks have a similar prediction course. The test curve (marked blue) is most closely followed by the curve of 1. MLP 5-10-1.
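The five input variables of this experiment can all be derived from the date itself. The following sketch shows one way to do so, under two assumptions that are not stated in the text: that the numeric date values (38,720 to 44,020 in Table 3.50) are spreadsheet-style serial day numbers counted from 30 December 1899, and that the day of the week is coded with Sunday = 1, which would explain the range 2 to 6 for trading days. The function name `calendar_features` is hypothetical.

```python
from datetime import date, timedelta

# Assumption: serial day numbers count days from 30 December 1899,
# as in common spreadsheet software.
SERIAL_EPOCH = date(1899, 12, 30)


def calendar_features(serial_day):
    """Return the five Experiment 4 style inputs for one serial day number."""
    day = SERIAL_EPOCH + timedelta(days=serial_day)
    return {
        "date": serial_day,
        # Assumed coding: Sunday = 1 ... Saturday = 7, so Monday to Friday map to 2..6.
        "day_in_week": day.isoweekday() % 7 + 1,
        "day_in_month": day.day,
        "month": day.month,
        "year": day.year,
    }


first = calendar_features(38720)  # smallest date value in Table 3.50
last = calendar_features(44020)   # largest date value in Table 3.50
print(first)
print(last)
```

Under these assumptions the smallest and largest date values fall in 2006 and 2020, which agrees with the range of the year input reported in Table 3.50.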

Table 3.44 Prediction statistics—MLP NN Experiment 3 (Tibco 2020)

| Statistics | 1. MLP 10-7-1 | 2. MLP 10-7-1 | 3. MLP 10-8-1 | 4. MLP 10-8-1 | 5. MLP 10-8-1 |
| --- | --- | --- | --- | --- | --- |
| Minimum prediction (train) | 595.647 | 586.225 | 577.475 | 606.397 | 566.783 |
| Maximum prediction (train) | 1779.177 | 1721.376 | 1716.653 | 1788.079 | 1771.600 |
| Minimum prediction (test) | 593.275 | 587.034 | 578.219 | 606.678 | 568.868 |
| Maximum prediction (test) | 1770.735 | 1719.120 | 1714.025 | 1783.126 | 1767.646 |
| Minimum prediction (validation) | 591.288 | 588.106 | 579.202 | 606.105 | 570.893 |
| Maximum prediction (validation) | 1772.094 | 1720.420 | 1715.616 | 1788.217 | 1770.771 |
| Minimum residual (train) | −176.668 | −245.810 | −149.229 | −196.261 | −180.784 |
| Maximum residual (train) | 232.737 | 216.087 | 246.758 | 286.303 | 248.903 |
| Minimum residual (test) | −196.447 | −188.106 | −140.732 | −177.708 | −183.867 |
| Maximum residual (test) | 230.718 | 205.882 | 246.074 | 293.569 | 252.824 |
| Minimum residual (validation) | −171.543 | −183.657 | −175.164 | −158.886 | −177.601 |
| Maximum residual (validation) | 190.379 | 146.130 | 182.773 | 221.070 | 188.422 |
| Minimum standard residual (train) | −4.404 | −6.208 | −4.529 | −4.805 | −4.573 |
| Maximum standard residual (train) | 5.801 | 5.457 | 7.488 | 7.009 | 6.296 |
| Minimum standard residual (test) | −5.110 | −4.788 | −4.178 | −4.268 | −4.599 |
| Maximum standard residual (test) | 6.001 | 5.241 | 7.306 | 7.051 | 6.323 |
| Minimum standard residual (validation) | −4.342 | −4.539 | −5.449 | −3.946 | −4.451 |
| Maximum standard residual (validation) | 4.819 | 3.612 | 5.686 | 5.490 | 4.722 |

Table 3.45 Data statistics—Experiment 3 MLP NN (Tibco 2020)

| Samples | Date input | USD/oz target |
| --- | --- | --- |
| Minimum (train) | 38,720.00 | 529.500 |
| Maximum (train) | 44,020.00 | 1895.000 |
| Mean (train) | 41,340.06 | 1201.595 |
| Standard deviation (train) | 1523.42 | 316.367 |
| Minimum (test) | 38,728.00 | 538.750 |
| Maximum (test) | 44,015.00 | 1895.000 |
| Mean (test) | 41,515.31 | 1218.531 |
| Standard deviation (test) | 1590.14 | 313.031 |
| Minimum (validation) | 38,722.00 | 524.750 |
| Maximum (validation) | 44,019.00 | 1834.000 |
| Mean (validation) | 41,440.15 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 543.600 |
| Minimum (overall) | 38,720.00 | 524.750 |
| Maximum (overall) | 44,020.00 | 1895.000 |
| Mean (overall) | 41,381.33 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 313.589 |

Fig. 3.27 Prediction of USD/oz time series—MLP NN Experiment 3 (Tibco 2020)


Fig. 3.28 Smoothed time series and prediction for 44 trading days 1. MLP 10-7-1, 2. MLP 10-7-1, 3. MLP 10-8-1, 4. MLP 10-8-1, 5. MLP 10-8-1—Experiment 3 (Tibco 2020)

The network is also the most successful one in terms of performance and the size of the residuals.

Experiment 5

5 input variables (time—date, day of the week, day of the month, month, year), 1 output variable (price of gold), 5-day time lag of time series

This experiment includes five input variables, specifically time—date, day of the week, day of the month, month, and year, and one output variable, the price of gold. The time lag of the time series is five days. An overview of the individual networks used in Experiment 5 is given in Table 3.51, which presents the five most successful networks with the best characteristics.
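The number of input neurons in the network names of the preceding experiments equals the number of input variables multiplied by the time lag: MLP 5-6-1 for one variable with a 5-day lag, MLP 10-7-1 for one variable with a 10-day lag, and MLP 5-10-1 for five variables with a 1-day lag. One construction consistent with this pattern is a sliding window that packs the input variables of several consecutive days into a single input vector. The sketch below illustrates it; the function name `lagged_inputs`, the choice to include the current day in the window, and the sample data are assumptions, since the way the software builds its lagged inputs is not described in the text.

```python
def lagged_inputs(rows, targets, lag):
    """Build (features, target) pairs from a multivariate daily series.

    rows    -- list of input vectors, one per trading day
    targets -- list of target values aligned with rows
    lag     -- number of consecutive days packed into one input vector
    """
    if lag < 1 or len(rows) != len(targets):
        raise ValueError("lag must be >= 1 and rows/targets must align")
    samples = []
    for t in range(lag - 1, len(rows)):
        window = rows[t - lag + 1 : t + 1]  # days t-lag+1 .. t (assumed window)
        features = [value for day in window for value in day]
        samples.append((features, targets[t]))
    return samples


# Hypothetical data: 12 trading days with 5 input variables each.
rows = [[float(day), 2.0, 3.0, 1.0, 2006.0] for day in range(12)]
targets = [1200.0 + day for day in range(12)]

samples = lagged_inputs(rows, targets, lag=5)
print(len(samples), len(samples[0][0]))
```

With five variables and a lag of five days each input vector has 25 elements; with a lag of one day the function returns the original rows unchanged.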

Table 3.46 Overview of networks—MLP NN Experiment 4 (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | MLP 5-10-1 | 0.987487 | 0.987378 | 0.985182 | 1241.913 | 1228.974 | 1331.405 | BFGS 533 | SOS | Tanh | Exponential |
| 2 | MLP 5-10-1 | 0.986726 | 0.986805 | 0.984748 | 1317.137 | 1283.801 | 1369.677 | BFGS 297 | SOS | Tanh | Tanh |
| 3 | MLP 5-10-1 | 0.986425 | 0.986850 | 0.985807 | 1346.576 | 1277.803 | 1275.916 | BFGS 708 | SOS | Tanh | Tanh |
| 4 | MLP 5-11-1 | 0.987266 | 0.987362 | 0.985766 | 1264.658 | 1229.756 | 1278.896 | BFGS 419 | SOS | Logistic | Exponential |
| 5 | MLP 5-10-1 | 0.986565 | 0.985645 | 0.984734 | 1332.921 | 1396.626 | 1372.477 | BFGS 451 | SOS | Tanh | Logistic |

−32.1759

−0.0117

0.1151

1.2548

Date-1 → hidden neuron 1

Date-1 → hidden neuron 2

Date-1 → hidden neuron 3

Date-1 → hidden neuron 4

1

2

3

4

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Weight values Connections 1. MLP 5-10-1 2. MLP 5-10-1

Connections 1. MLP 5-10-1

Weight ID

0.2381

−0.0399

0.0077

−6.7166

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Weight values Connections 2. MLP 5-10-1 3. MLP 5-10-1

Table 3.47 Network weights—MLP NN Experiment 4 (Tibco 2020)

12.259

−8.821

1.006

28.805

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Weight values Connections 3. MLP 5-10-1 4. MLP 5-11-1

0.034

0.044

0.003

−17.432

Date-1 → hidden neuron 4

Date-1 → hidden neuron 3

Date-1 → hidden neuron 2

Date-1 → hidden neuron 1

Weight values Connections 4. MLP 5-11-1 5. MLP 5-10-1

1.4382

0.1220

−0.0242

−15.2650

Weight values 5. MLP 5-10-1


Table 3.48 Correlation coefficients—MLP NN Experiment 4 (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
| --- | --- | --- | --- |
| 1. MLP 5-10-1 | 0.987487 | 0.987378 | 0.985182 |
| 2. MLP 5-10-1 | 0.986726 | 0.986805 | 0.984748 |
| 3. MLP 5-10-1 | 0.986425 | 0.986850 | 0.985807 |
| 4. MLP 5-11-1 | 0.987266 | 0.987362 | 0.985766 |
| 5. MLP 5-10-1 | 0.986565 | 0.985645 | 0.984734 |

An important value for comparing the individual networks is the network performance, which exceeds 0.97 on the training, testing, and validation datasets of all networks. On the training dataset, the best performance, 0.985793, is achieved by 1. MLP 25-11-1. On the testing dataset, the most successful network appears to be 4. MLP 25-11-1, with a performance of 0.984157. On the validation dataset, the most efficient network is 1. MLP 25-11-1. As for the error level on the training, testing, and validation datasets, the network with the smallest error is 1. MLP 25-11-1. A part of the networks’ weights is presented in Table 3.52; because of its large extent, only a small part of the table is shown for illustration. Correlation coefficients were also determined for Experiment 5. Table 3.53 shows that on the training, testing, and validation datasets, the correlation coefficients of all individual networks exceed 0.9, which indicates a very good network performance. Prediction statistics for the fifth experiment are given in Table 3.54, which clearly shows that the values for the datasets of each network do not differ significantly. Data statistics are presented in Table 3.55. Figure 3.31 shows the trend of the individual networks’ curves in comparison with the curve of USD/oz (marked blue). The curves of the individual networks at least partially follow the shape of the USD/oz curve. However, it is not clear which of the curves is closest to the test curve. The graphs of the smoothed time series and the prediction for 44 trading days (two calendar months) are presented in Fig. 3.32. The figure shows that the networks 1. MLP 25-11-1 and 4. MLP 25-11-1 predicted the development of the price of gold most precisely, although with some deviations, e.g. in the case of the first network.

Experiment 6

5 input variables (time—date, day of the week, day of the month, month, and year), 1 output variable (price of gold), 10-day time lag of time series

For Experiment 6, 5 input variables were selected, specifically time—date, day of the week, day of the month, month, and year. As an output variable, one particular

Table 3.49 Prediction statistics—MLP NN Experiment 4 (Tibco 2020)

| Statistics | 1. MLP 5-10-1 | 2. MLP 5-10-1 | 3. MLP 5-10-1 | 4. MLP 5-11-1 | 5. MLP 5-10-1 |
| --- | --- | --- | --- | --- | --- |
| Minimum prediction (train) | 587.187 | 538.792 | 533.790 | 586.162 | 529.500 |
| Maximum prediction (train) | 1749.257 | 1819.319 | 1718.273 | 1761.783 | 1801.645 |
| Minimum prediction (test) | 588.022 | 543.238 | 535.615 | 586.143 | 529.500 |
| Maximum prediction (test) | 1738.183 | 1810.578 | 1713.292 | 1766.714 | 1800.901 |
| Minimum prediction (validation) | 585.089 | 540.056 | 534.561 | 586.129 | 529.500 |
| Maximum prediction (validation) | 1735.300 | 1820.055 | 1714.721 | 1755.186 | 1799.460 |
| Minimum residual (train) | −173.015 | −186.732 | −150.861 | −201.539 | −229.225 |
| Maximum residual (train) | 186.798 | 202.530 | 267.656 | 221.156 | 234.199 |
| Minimum residual (test) | −150.867 | −207.486 | −148.837 | −221.621 | −269.901 |
| Maximum residual (test) | 157.519 | 188.085 | 265.500 | 188.266 | 212.004 |
| Minimum residual (validation) | −193.503 | −174.122 | −162.088 | −196.405 | −192.960 |
| Maximum residual (validation) | 159.023 | 144.634 | 204.359 | 151.619 | 144.564 |
| Minimum standard residual (train) | −4.910 | −5.145 | −4.111 | −5.667 | −6.279 |
| Maximum standard residual (train) | 5.301 | 5.581 | 7.294 | 6.219 | 6.415 |
| Minimum standard residual (test) | −4.304 | −5.791 | −4.164 | −6.320 | −7.222 |
| Maximum standard residual (test) | 4.493 | 5.249 | 7.427 | 5.369 | 5.673 |
| Minimum standard residual (validation) | −5.303 | −4.705 | −4.538 | −5.492 | −5.209 |
| Maximum standard residual (validation) | 4.358 | 3.908 | 5.721 | 4.240 | 3.902 |

3.2 Multi-Layer Perceptron Neural Networks 105

Table 3.50 Data statistics—MLP NN Experiment 4 (Tibco 2020)

| Samples | Date input | Day in week input | Day in month input | Month input | Year input | USD/oz target |
|---|---|---|---|---|---|---|
| Minimum (train) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 529.500 |
| Maximum (train) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (train) | 41,340.06 | 4.013255 | 15.64016 | 6.34191 | 2012.696 | 1201.595 |
| Standard deviation (train) | 1523.42 | 1.391633 | 8.64905 | 3.41621 | 4.172 | 316.367 |
| Minimum (test) | 38,728.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 538.750 |
| Maximum (test) | 44,015.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (test) | 41,515.31 | 4.034608 | 15.28415 | 6.51730 | 2013.162 | 1218.531 |
| Standard deviation (test) | 1590.14 | 1.441273 | 8.90170 | 3.42252 | 4.367 | 313.031 |
| Minimum (validation) | 38,722.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (validation) | 44,019.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1834.000 |
| Mean (validation) | 41,440.15 | 4.027322 | 15.83789 | 6.52095 | 2012.954 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 1.396892 | 8.69112 | 3.57921 | 6.354 | 543.600 |
| Minimum (overall) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (overall) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (overall) | 41,381.33 | 4.018564 | 15.61643 | 6.39503 | 2012.805 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 1.400899 | 8.70385 | 3.42037 | 4.201 | 313.589 |

106 3 Artificial Neural Networks—Selected Models


Fig. 3.29 Graph of prediction of USD/oz time series—MLP NN Experiment 4 (Tibco 2020)

output variable was selected—the price of gold. The time lag of the time series is ten days. Table 3.56 presents the performance and the error level of the networks, together with the training algorithm, which is BFGS for all networks. The error function is the sum of squares. The hidden layer uses the logistic activation function in three cases and the hyperbolic tangent in two cases; the output layer uses the identity function in one case, the logistic function in two cases, the exponential function in one case, and the sine function in one case. The network performance in Experiment 6 is again high, although in some cases slightly lower than in the previous experiments: on all datasets (training, testing, validation) the values exceed 0.97. The highest performance on the testing and validation datasets is achieved by network 2. MLP 50-11-1. The weights of the individual networks are presented in Table 3.57; because of the size of the table, only the first four rows are shown, so Table 3.57 serves illustrative purposes only. The correlation coefficients of the datasets and individual networks determined within the sixth experiment are listed in Table 3.58. For correlation coefficients, the best values are those close to 1. In this experiment, all networks achieve values higher than 0.97, which is considered very high. Good results can therefore be expected when predicting the price of gold with some of these five networks.
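The input layout of Experiment 6 (5 calendar variables with a 10-day lag, hence 50 input neurons in the MLP 50-x-1 networks) can be illustrated with a short sketch. The encoding is an assumption inferred from the ranges in the data statistics tables (date values from 38,720 to 44,020 suggest spreadsheet-style serial dates; day-of-week values from 2 to 6 suggest a Sunday = 1 numbering). How the software arranges lagged inputs internally is not stated in the text, and the names `date_features` and `lagged_inputs` are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1899, 12, 30)  # assumed origin of the spreadsheet-style date serial


def date_features(d):
    """Return [date serial, day of week, day of month, month, year] for one day."""
    serial = (d - EPOCH).days
    weekday = d.isoweekday() % 7 + 1  # assumed numbering: Sunday = 1, ..., Saturday = 7
    return [serial, weekday, d.day, d.month, d.year]


def lagged_inputs(days, lag=10):
    """Build one input row per day from the features of the `lag` preceding days."""
    feats = [date_features(d) for d in days]
    rows = []
    for t in range(lag, len(feats)):
        row = []
        for past in feats[t - lag:t]:
            row.extend(past)
        rows.append(row)
    return rows


# Hypothetical calendar: the weekdays of January 2020 stand in for trading days.
start = date(2020, 1, 1)
trading_days = [start + timedelta(days=n) for n in range(31)]
trading_days = [d for d in trading_days if d.isoweekday() <= 5]

X = lagged_inputs(trading_days, lag=10)
print(len(X), len(X[0]))  # number of rows, and 5 variables x 10 lags per row
```

Each row would be paired with the gold price of the day that follows its ten lagged days; the first ten days of the series produce no row because their history is incomplete.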


Fig. 3.30 Smoothed time series and prediction for 44 trading days 1. MLP 5-10-1, 2. MLP 5-10-1, 3. MLP 5-10-1, 4. MLP 5-11-1, 5. MLP 5-10-1—Experiment 4 (Tibco 2020)

Prediction statistics are given in Table 3.59. The best values of residuals in this case are achieved by the network 1. MLP 50-11-1 for the training and testing datasets. For the validation dataset, the best network in terms of residuals appears to be the network 3. MLP 50-7-1. Data statistics are given in Table 3.60. Figure 3.33 shows the trends of the individual networks’ curves in comparison with the USD/oz curve (marked blue). It can be seen from the figure that the curves of the individual networks quite successfully follow the course of the USD/oz curve. It can thus be assumed that most of the aforementioned five networks will serve as a suitable tool for predicting the price of gold time series.
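The quantities reported for each network can be related to one another in a small sketch. The performance values in the overview tables coincide with the correlation coefficients between target and output (Tables 3.56 and 3.58 list the same numbers). The printed standard residuals are also consistent with dividing each residual by the square root of the corresponding error value: for network 5. MLP 50-8-1, 254.086 / 6.413 ≈ 39.6 ≈ √1569.636. The exact definitions used by the software are not given in the text, so the scaling below is an inference, and the data are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residual_statistics(targets, outputs):
    """Residuals (target minus output) and their standardized counterparts."""
    residuals = [t - o for t, o in zip(targets, outputs)]
    mean_square = sum(r * r for r in residuals) / len(residuals)
    scale = math.sqrt(mean_square)  # assumed scaling for standard residuals
    standard = [r / scale for r in residuals]
    return {
        "performance": pearson(targets, outputs),
        "mean_square_error": mean_square,
        "min_residual": min(residuals),
        "max_residual": max(residuals),
        "min_standard_residual": min(standard),
        "max_standard_residual": max(standard),
    }


# Hypothetical prices (USD/oz) and network outputs, for illustration only.
targets = [1200.0, 1210.0, 1195.0, 1230.0, 1250.0]
outputs = [1195.0, 1215.0, 1200.0, 1220.0, 1255.0]
stats = residual_statistics(targets, outputs)
print(stats)
```

Computing the same summary separately for the training, testing, and validation subsets would give one column of a prediction statistics table.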

Table 3.51 Overview of networks—MLP NN Experiment 5 (Tibco 2020)

| Index | Net. name | Training perf. | Test perf. | Validation perf. | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MLP 25-11-1 | 0.985793 | 0.983585 | 0.980758 | 1401.024 | 1594.560 | 1710.007 | BFGS 111 | SOS | Tanh | Logistic |
| 2 | MLP 25-7-1 | 0.983858 | 0.983318 | 0.980422 | 1590.428 | 1622.432 | 1744.616 | BFGS 217 | SOS | Logistic | Tanh |
| 3 | MLP 25-11-1 | 0.983354 | 0.981746 | 0.979989 | 1639.858 | 1781.792 | 1775.321 | BFGS 187 | SOS | Tanh | Exponential |
| 4 | MLP 25-11-1 | 0.984736 | 0.984157 | 0.980520 | 1512.783 | 1555.678 | 1740.351 | BFGS 110 | SOS | Tanh | Exponential |
| 5 | MLP 25-7-1 | 0.983270 | 0.981714 | 0.979585 | 1648.149 | 1780.760 | 1811.187 | BFGS 152 | SOS | Tanh | Logistic |

Table 3.52 Network weights—MLP NN Experiment 5 (Tibco 2020)

| Weight ID | Connection | 1. MLP 25-11-1 | 2. MLP 25-7-1 | 3. MLP 25-11-1 | 4. MLP 25-11-1 | 5. MLP 25-7-1 |
|---|---|---|---|---|---|---|
| 1 | Date-1 → hidden neuron 1 | 1.6902 | 1.4174 | 0.4656 | 1.0665 | −6.6613 |
| 2 | Date-1 → hidden neuron 2 | −0.3760 | −0.0907 | −1.3749 | −0.7477 | 1.0247 |
| 3 | Date-1 → hidden neuron 3 | −0.0819 | −0.5840 | 0.8235 | 0.2016 | 0.2829 |
| 4 | Date-1 → hidden neuron 4 | 0.2821 | −4.1968 | 3.2987 | −0.6413 | 2.7451 |

Table 3.53 Correlation coefficients—MLP NN Experiment 5 (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. MLP 25-11-1 | 0.985793 | 0.983585 | 0.980758 |
| 2. MLP 25-7-1 | 0.983858 | 0.983318 | 0.980422 |
| 3. MLP 25-11-1 | 0.983354 | 0.981746 | 0.979989 |
| 4. MLP 25-11-1 | 0.984736 | 0.984157 | 0.980520 |
| 5. MLP 25-7-1 | 0.983270 | 0.981714 | 0.979585 |

The graphs of the smoothed time series and predictions for 44 trading days (two calendar months) are shown in Fig. 3.34. The blue curve represents the actual development of the price of gold over a given period of time. Red curves represent the individual networks. It follows from the figure that the most accurate prediction of the development is provided by the network 1. MLP 50-11-1.

3.3 Long Short Term Memory Neural Networks

Sak et al. (2014) describe the LSTM (Long Short Term Memory) neural network as a specific architecture of recurrent neural networks (RNN), designed to model time sequences and their long-range dependencies more accurately than conventional RNNs. The authors further state that no activation function is applied within the recurrent components of the structure, so the stored values are not repeatedly transformed over time and the gradient does not vanish during training. LSTM units are usually implemented in "blocks" that include several components. These blocks contain three or four gates, for example the input gate, the output gate, and the forget gate, which use the logistic function to control the information flow.

3.3.1 Mathematical Background

According to Gers et al. (2000), the LSTM (Long Short Term Memory) network is a special type of recurrent network that is able to learn long-term dependencies. The model was first introduced by Hochreiter and Schmidhuber (1997) and became quite popular. LSTM networks are very versatile, are able to solve a wide range of problems, and are currently among the most commonly used recurrent networks. This type of network was created specifically to avoid difficulties with long-term dependencies. For this reason, the ability to remember information over long periods of time is embedded in its structure and does not have to be learned in any complicated way. According to Sak et al. (2014), the architecture of LSTM networks is based on cells arranged into chains, where a cell (Fig. 3.35) consists of several gates that

1778.752 −231.815 248.118 −187.488

1785.068

−172.894

231.648

−332.830

Maximum prediction (validation)

Minimum residual (train)

Maximum residual (train)

Minimum residual (test)

3.780

Maximum standard residual (validation)

4.217

6.352

Minimum standard residual (test) −4.870

−4.655

−8.335

Maximum standard residual (train)

6.115

6.222

6.189

Minimum standard residual (train)

−4.313

−5.813

−4.619

Maximum residual (validation)

Minimum standard residual (validation)

176.128

156.306

Minimum residual (validation)

Maximum standard residual (test)

255.864 −203.397

244.170

−178.349

Maximum residual (test)

577.991

1768.793

563.033

1764.446

571.273

Minimum prediction (test)

1778.137

560.289

577.483

1786.214

Maximum prediction (train)

Minimum prediction (validation)

566.260

Minimum prediction (train)

2. MLP 25-7-1

Maximum prediction (test)

1. MLP 25-11-1

Statistics

Table 3.54 Prediction statistics—MLP NN Experiment 5 (Tibco 2020)

3.826

−4.247

6.068

−3.996

6.742

−5.430

161.186

−178.950

256.145

−168.666

273.029

−219.890

1932.811

541.517

1832.368

529.864

1892.556

529.952

3. MLP 25-11-1

4.306

−4.389

5.194

−4.998

5.436

−4.687

179.642

−183.113

204.881

−197.135

211.413

−182.292

1847.954

564.257

1828.056

563.947

1853.302

563.568

4. MLP 25-11-1

3.623

−4.897

5.737

−4.397

5.872

−5.841

154.175

−208.415

242.102

−185.532

238.397

−237.112

1817.944

581.502

1780.957

580.407

1817.080

580.492

5. MLP 25-7-1


Table 3.55 Data statistics—MLP NN Experiment 5 (Tibco 2020)

| Samples | Date input | Day in week input | Day in month input | Month input | Year input | USD/oz target |
|---|---|---|---|---|---|---|
| Minimum (train) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 529.500 |
| Maximum (train) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (train) | 41,340.06 | 4.013255 | 15.64016 | 6.34191 | 2012.696 | 1201.595 |
| Standard deviation (train) | 1523.42 | 1.391633 | 8.64905 | 3.41621 | 4.172 | 316.367 |
| Minimum (test) | 38,728.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 538.750 |
| Maximum (test) | 44,015.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (test) | 41,515.31 | 4.034608 | 15.28415 | 6.51730 | 2013.162 | 1218.531 |
| Standard deviation (test) | 1590.14 | 1.441273 | 8.90170 | 3.42252 | 4.367 | 313.031 |
| Minimum (validation) | 38,722.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (validation) | 44,019.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1834.000 |
| Mean (validation) | 41,440.15 | 4.027322 | 15.83789 | 6.52095 | 2012.954 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 1.396892 | 8.69112 | 3.57921 | 6.354 | 543.600 |
| Minimum (overall) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (overall) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (overall) | 41,381.33 | 4.018564 | 15.61643 | 6.39503 | 2012.805 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 1.400899 | 8.70385 | 3.42037 | 4.201 | 313.589 |

Fig. 3.31 Prediction of USD/oz time series—MLP NN Experiment 5 (Tibco 2020)

help this layer maintain internal memory. The authors add that LSTM cells can be classified into three types: forget, remember, and output. Sakti et al. (2015) state that the architecture of the LSTM model includes an input gate, whose task is to decide what to receive from the input, an output gate, which reduces the perturbation of the output error to other blocks, and a forget gate, whose task is to decide what remains in the state. They are called gates because the sigmoid limits its output to the interval (0, 1), and multiplying another vector by this output decides what will be passed on from the input. According to the authors, the cell state is the cornerstone. It depends on only a few linear operations and is retained over time. The operation ∗ denotes the element-wise product:

$$C^{(t)} = f^{(t)} \ast C^{(t-1)} + i^{(t)} \ast \bar{C}^{(t)} \tag{3.4}$$

The first step is the forget gate. In case a new important input appears, the original one needs to be forgotten.

$$f^{(t)} = \sigma\left(W_f h^{(t-1)} + U_f x^{(t)}\right) \tag{3.5}$$

In the following step, it is necessary to decide which new information will be accepted. This step consists of two stages: the input gate and the tanh layer, which is used to calculate the specific value of $\bar{C}^{(t)}$:

$$i^{(t)} = \sigma\left(W_i h^{(t-1)} + U_i x^{(t)} + b_i\right) \tag{3.6}$$

$$\bar{C}^{(t)} = \tanh\left(W_c h^{(t-1)} + U_c x^{(t)} + b_c\right) \tag{3.7}$$

Fig. 3.32 Graphs of smoothed time series and prediction for 44 trading days 1. MLP 25-11-1, 2. MLP 25-7-1, 3. MLP 25-11-1, 4. MLP 25-11-1, 5. MLP 25-7-1—Experiment 5 (Tibco 2020)

The last step consists in calculating what will be used for the output. In conclusion, LSTM networks exist in a wide range of variants, for example with peephole connections.
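The gate equations can be sketched as a single time step in plain Python. This is a minimal illustration, not the implementation used in the case study. The forget gate, input gate, candidate state, and cell state follow Eqs. (3.4) to (3.7). The output step is not written out in the text, so the commonly used form (a sigmoid output gate multiplied by the tanh of the cell state) is assumed here. The parameter names (`Wf`, `Uf`, `bf`, and so on) and the helper `zero_params` are hypothetical; Eq. (3.5) as printed has no bias term, which corresponds to `bf` being zero.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, h, U, x, b):
    """Compute W h + U x + b with plain lists (W and U are lists of rows)."""
    return [
        sum(w * hj for w, hj in zip(w_row, h))
        + sum(u * xj for u, xj in zip(u_row, x))
        + bk
        for w_row, u_row, bk in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; returns the new hidden state and cell state."""
    f = [sigmoid(z) for z in affine(p["Wf"], h_prev, p["Uf"], x, p["bf"])]        # Eq. (3.5)
    i = [sigmoid(z) for z in affine(p["Wi"], h_prev, p["Ui"], x, p["bi"])]        # Eq. (3.6)
    c_bar = [math.tanh(z) for z in affine(p["Wc"], h_prev, p["Uc"], x, p["bc"])]  # Eq. (3.7)
    c = [fk * ck + ik * cbk for fk, ck, ik, cbk in zip(f, c_prev, i, c_bar)]      # Eq. (3.4)
    # Assumed output step (not given in the text): gate the squashed cell state.
    o = [sigmoid(z) for z in affine(p["Wo"], h_prev, p["Uo"], x, p["bo"])]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def zero_params(n_hidden, n_input):
    """Hypothetical helper: all weights and biases set to zero."""
    p = {}
    for gate in "fico":
        p["W" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["U" + gate] = [[0.0] * n_input for _ in range(n_hidden)]
        p["b" + gate] = [0.0] * n_hidden
    return p


params = zero_params(n_hidden=1, n_input=1)
h, c = lstm_step(x=[0.3], h_prev=[0.0], c_prev=[2.0], p=params)
print(h, c)  # with zero weights every gate equals 0.5, so the cell state is halved
```

With a large positive forget bias the gate stays close to 1 and the previous cell state passes through almost unchanged, which is the mechanism behind the long-term memory described above.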

Table 3.56 Overview of networks—MLP NN Experiment 6 (Tibco 2020)

| Index | Net. name | Training perf. | Test perf. | Validation perf. | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MLP 50-11-1 | 0.985725 | 0.982392 | 0.977933 | 1399.572 | 1694.422 | 1938.685 | BFGS 158 | SOS | Logistic | Identity |
| 2 | MLP 50-11-1 | 0.985717 | 0.982615 | 0.980854 | 1400.356 | 1673.367 | 1684.065 | BFGS 152 | SOS | Logistic | Logistic |
| 3 | MLP 50-7-1 | 0.984449 | 0.980179 | 0.978733 | 1523.802 | 1906.181 | 1865.874 | BFGS 139 | SOS | Tanh | Logistic |
| 4 | MLP 50-8-1 | 0.984565 | 0.981232 | 0.978746 | 1512.697 | 1808.432 | 1867.791 | BFGS 95 | SOS | Tanh | Exponential |
| 5 | MLP 50-8-1 | 0.983978 | 0.981748 | 0.978069 | 1569.636 | 1755.150 | 1923.831 | BFGS 166 | SOS | Logistic | Sine |

Table 3.57 Network weights—MLP NN Experiment 6 (Tibco 2020)

| Weight ID | Connection | 1. MLP 50-11-1 | 2. MLP 50-11-1 | 3. MLP 50-7-1 | 4. MLP 50-8-1 | 5. MLP 50-8-1 |
|---|---|---|---|---|---|---|
| 1 | Date-1 → hidden neuron 1 | −0.3180 | −1.5491 | −0.7466 | −0.27740 | 1.1943 |
| 2 | Date-1 → hidden neuron 2 | 0.7532 | −0.1786 | −0.0802 | 0.01382 | 0.0578 |
| 3 | Date-1 → hidden neuron 3 | −0.1653 | −0.0480 | −0.0196 | 0.01234 | −0.0067 |
| 4 | Date-1 → hidden neuron 4 | −0.6646 | −0.3159 | −0.0234 | −0.15187 | 0.0870 |

Table 3.58 Correlation coefficients—MLP NN Experiment 6 (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. MLP 50-11-1 | 0.985725 | 0.982392 | 0.977933 |
| 2. MLP 50-11-1 | 0.985717 | 0.982615 | 0.980854 |
| 3. MLP 50-7-1 | 0.984449 | 0.980179 | 0.978733 |
| 4. MLP 50-8-1 | 0.984565 | 0.981232 | 0.978746 |
| 5. MLP 50-8-1 | 0.983978 | 0.981748 | 0.978069 |

3.3.2 Case

LSTM networks can be used for the same purposes as convolutional neural networks (CNN), i.e. to solve classification and regression problems. In this example, a long short-term memory network will be built.

Built Neural Network

Figure 3.36 shows the basic structure of an LSTM neural network with the output size 3350. Figure 3.37 shows the parameters of the elementary layer (No. 2), where the Ramp function was used as the basic function. Figure 3.38 shows the parameters of the elementary layer where the hyperbolic tangent function was used. Figure 3.39 shows the parameters of the threading layer, where the plus function was used. Figure 3.40 shows the fifth (elementary) layer of the LSTM network, where the hyperbolic tangent function is used again. Figure 3.41 shows the penultimate part, in which the network data processing takes place, namely the linear layer of the LSTM network. Figure 3.42 shows the last phase of the LSTM network, where the output of the network is represented in the form of a vector.

Trained Neural Network (With Observable Changes)

Figure 3.43 shows the structure of an already trained neural network in the first phase. The output size here is 3550. Figure 3.44 gives the basic information about the elementary layer of the trained LSTM network for which the Ramp function was selected. Figure 3.45 shows additional parameters of the elementary layer of the trained LSTM network for which the hyperbolic tangent function was used.

1779.998 −173.967 228.190 −155.304 225.857 −167.260 223.765 −4.649 6.098 −3.797 5.521 −4.076

1844.443

548.838

1917.035

553.319

1961.064

−177.434

207.138

−203.057

207.555

−198.030

271.945

−4.743

5.537

−4.933

5.042

−4.498

Maximum prediction (train)

Minimum prediction (test)

Maximum prediction (test)

Minimum prediction (validation)

Maximum prediction (validation)

Minimum residual (train)

Maximum residual (train)

Minimum residual (test)

Maximum residual (test)

Minimum residual (validation)

Maximum residual (validation)

Minimum standard residual (train)

Maximum standard residual (train)

Minimum standard residual (test)

Maximum standard residual (test)

Minimum standard residual (validation)

2. MLP 50-11-1

Minimum prediction (train)

537.058

1767.279

538.278

1766.978

537.407

1. MLP 50-11-1

539.391

Statistics

Table 3.59 Prediction statistics—MLP NN Experiment 6 (Tibco 2020) 3. MLP 50-7-1

−5.734

5.599

−7.481

5.560

−4.659

242.820

−247.667

244.431

−326.636

217.034

−181.870

1755.507

579.686

1853.882

579.445

1764.938

579.091

−4.068

3.577

−9.645

4.440

−5.386

186.910

−175.797

152.101

−410.162

172.693

−209.476

1889.312

581.259

1982.162

580.306

1828.392

574.115

4. MLP 50-8-1

−4.167

6.295

−4.829

6.413

−4.988

336.292

−182.758

263.728

−202.315

254.086

−197.618

1734.646

516.751

1736.740

552.371

1740.535

531.270

5. MLP 50-8-1


Table 3.60 Data statistics—MLP NN Experiment 6 (Tibco 2020)

| Samples | Date input | Day in week input | Day in month input | Month input | Year input | USD/oz target |
|---|---|---|---|---|---|---|
| Minimum (train) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 529.500 |
| Maximum (train) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (train) | 41,340.06 | 4.013255 | 15.64016 | 6.34191 | 2012.696 | 1201.595 |
| Standard deviation (train) | 1523.42 | 1.391633 | 8.64905 | 3.41621 | 4.172 | 316.367 |
| Minimum (test) | 38,728.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 538.750 |
| Maximum (test) | 44,015.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (test) | 41,515.31 | 4.034608 | 15.28415 | 6.51730 | 2013.162 | 1218.531 |
| Standard deviation (test) | 1590.14 | 1.441273 | 8.90170 | 3.42252 | 4.367 | 313.031 |
| Minimum (validation) | 38,722.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (validation) | 44,019.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1834.000 |
| Mean (validation) | 41,440.15 | 4.027322 | 15.83789 | 6.52095 | 2012.954 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 1.396892 | 8.69112 | 3.57921 | 6.354 | 543.600 |
| Minimum (overall) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (overall) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (overall) | 41,381.33 | 4.018564 | 15.61643 | 6.39503 | 2012.805 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 1.400899 | 8.70385 | 3.42037 | 4.201 | 313.589 |

Fig. 3.33 Predictions of USD/oz time series—MLP NN Experiment 6 (Tibco 2020)

The threading layer of the trained LSTM network is again characterized by a plus function, as shown in Fig. 3.46. Figure 3.47 shows the parameters of an elementary layer of the trained LSTM network, for which the hyperbolic tangent function was used. Figure 3.48 shows the sixth part of the trained LSTM network in the form of a linear layer. Figure 3.49 shows the output of the trained LSTM network in the form of a vector.

Information about the Neural Network

The basic structure of the LSTM network, including the designation of its individual parts, can be observed in Fig. 3.50. The individual layers of the network can be seen in Fig. 3.51, which gives a detailed specification of each layer of the LSTM network. A brief flow chart of the neural network is shown in Fig. 3.52. Figure 3.53 shows a diagram of the neural network nodes and Fig. 3.54 a graph of the LSTM neural network nodes.

Results

Figure 3.55 shows the predicted price of gold in comparison with the real price of gold; the predicted price is indicated by the blue curve and the actual development of the price of gold in Czech crowns by the red curve.


Fig. 3.34 Smoothed time series and prediction for 44 trading days 1. MLP 50-11-1, 2. MLP 50-11-1, 3. MLP 50-7-1, 4. MLP 50-8-1, 5. MLP 50-8-1—Experiment 6 (Tibco 2020)

The graph of the residuals, i.e. the deviations of the measured values from the prediction of the LSTM neural network, is shown in Fig. 3.56. The aim is the smallest possible deviation, i.e. the smallest possible amplitude. Figure 3.57 compares the balanced time series of the price of gold in Czech crowns (red curve) with the price of gold predicted by the LSTM neural network (blue curve). Apart from small deviations, the curves are almost identical, which indicates a good prediction result.


Fig. 3.35 Scheme of LSTM layer cells (Sak et al. 2014)

Figure 3.58 shows how the gold price prediction curve moved in the case of using LSTM neural network over a period of two months. Figure 3.59 shows the development of the price of gold over a period of two months, followed by an example of the prediction of the price of gold for two months ahead. Figure 3.60 shows the development of the gold price over the entire period under review, together with an example of the development of the gold price prediction curve marked in blue. Statistical characteristics that provide information in a concentrated form about the essential statistical properties of the studied group are shown in Fig. 3.61.

Fig. 3.36 Basic information about the LSTM network (Wolfram Research 2020)

Fig. 3.37 Elementary layer—Ramp (Wolfram Research 2020)

Fig. 3.38 Elementary layer—Tanh (Wolfram Research 2020)

Fig. 3.39 Threading layer—parameters (Wolfram Research 2020)

Fig. 3.40 Elementary layer—Tanh (Wolfram Research 2020)

Fig. 3.41 Linear layer of the LSTM network (Wolfram Research 2020)

Fig. 3.42 LSTM output (Wolfram Research 2020)

Fig. 3.43 Basic information about the neural network—trained LSTM (Wolfram Research 2020)

Fig. 3.44 Elementary layer—Ramp of a trained LSTM (Wolfram Research 2020)

Fig. 3.45 Elementary layer—Tanh of a trained LSTM (Wolfram Research 2020)

Fig. 3.46 Threading layer—trained LSTM (Wolfram Research 2020)

Fig. 3.47 Elementary layer—Tanh of a trained LSTM (Wolfram Research 2020)

Fig. 3.48 Linear layer—trained LSTM (Wolfram Research 2020)

Fig. 3.49 Output—trained LSTM (Wolfram Research 2020)

Fig. 3.50 Basic structure of the LSTM neural network (Wolfram Research 2020)

Fig. 3.51 Individual layers of the LSTM neural network (Wolfram Research 2020)

Fig. 3.52 Flow chart of the LSTM neural network (Wolfram Research 2020)

Fig. 3.53 Diagram of LSTM neural network nodes (Wolfram Research 2020)

Fig. 3.54 Graph of LSTM neural network nodes (Wolfram Research 2020)

[Node legend recovered from the graph: Tensor, RNN, elemwise add, slice axis, SwapAxis, relu, FullyConnected, Reshape, expand dims, tanh, copy, BlockGrad, squeeze]

Fig. 3.55 Predicted and actual gold price in CZK (Wolfram Research 2020)

Fig. 3.56 LSTM neural network residues (Wolfram Research 2020)

Fig. 3.57 Balanced time series and prediction—a comparison (Wolfram Research 2020)

Fig. 3.58 Development of the gold price prediction curve for 2 months (Wolfram Research 2020)

Fig. 3.59 Development of the gold price curve for two months plus the prediction of the gold price for the next 2 months (Wolfram Research 2020)

Fig. 3.60 Overall development of the time series plus its prediction (Wolfram Research 2020)

Fig. 3.61 Statistical characteristics of the data file (Wolfram Research 2020)

References

Altun, H., A. Bilgil, and B.C. Fidan. 2007. Treatment of multi-dimensional data to enhance neural network estimators in regression problems. Expert Systems with Applications 32 (2): 599–605.
Ashoori, S., and S. Mohammadi. 2011. Compare failure prediction models based on feature selection technique: Empirical case from Iran. Procedia Computer Science 3 (1): 568–573.
Bas, E., V.R. Uslu, and E. Egrioglu. 2016. Robust learning algorithm for multiplicative neuron model artificial neural networks. Expert Systems with Applications 56 (3): 80–88.
Bishop, Ch.M. 1995. Neural networks for pattern recognition. New York, NY United States: Oxford University Press.
Echávarri Otero, J., E. de la Guerra Ochoa, Chacón Tanarro, P. Lafont Morgado, A. Díaz Lantada, and J.L. Muñoz Sanz. 2014. Artificial neural network approach to predict the lubricated friction coefficient: From standards chips to embedded systems on chip. Lubrication Science 26 (3): 141–162.
Gers, F.A., J. Schmidhuber, and F. Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural Computation 12 (10): 2451–2471.
Guan, X., Y. Zhu, and W. Song. 2016. Application of RBF neural network improved by peak density function in intelligent color matching of wood dyeing. Chaos, Solitons & Fractals 89: 485–490.
Gubana, A. 2015. State-of-the-art report on high reversible timber to timber strengthening interventions on wooden floors. Construction and Building Materials 97: 25–33.
Guresen, E., and G. Kayakutlu. 2011. Definition of artificial neural networks with comparison to other networks. Procedia Computer Science 3: 426–433.
Hamid, A., and A. Habib. 2014. Financial forecasting with neural networks. Academy of Accounting and Financial Studies Journal 18 (4): 37–55.
Hashemi, S., and M.R. Aghamohammadi. 2013. Wavelet based feature extraction of voltage profile for online voltage stability assessment using RBF neural network. International Journal of Electrical Power 49: 86–94.
Haykin, S. 2001. Neural networks. A comprehensive foundation. Pearson Prentice Hall.
Hochreiter, S., and J. Schmidhuber. 1997. Long short-term memory. Neural Computation 9 (8): 1735–1780.
Jingfei, J., C. Dengqing, and C. Huatao. 2016. Boundary value problems for fractional differential equation with causal operators. Applied Mathematics and Nonlinear Sciences 1 (1): 11–22.
Kiruthika, M., and M. Dilsha. 2015. A neural network approach for microfinance credit scoring. Journal of Statistics and Management Systems 18 (1): 121–138.
Klieštik, T. 2013. Models of autoregression conditional heteroskedasticity garch and arch as a tool for modeling the volatility of financial time series. Ekonomicko-Manažerské Spektrum 7 (1): 2–10.
Kumar, M., and N. Yadav. 2011. Multilayer perceptrons and radial basis function neural network methods for the solution of differential equations: A survey. Computers & Mathematics with Applications 62 (10): 3796–3811.
Lahsasna, A., R.N. Ainon, and Y.W. Teh. 2008. Intelligent credit scoring model using soft computing approach. International Conference on Computer and Communication Engineering, 396–402.
Liu, W., X.P. Li, H.-O. Mao, and T.-Y. Chai. 2004. Neural network cost prediction model based on real-coded genetic algorithm and its application. Kongzhi Lilun yu Yingyong/Control Theory & Applications (China) 21 (3): 423–426.
Michal, P., A. Vagaská, M. Gombár, J. Kmec, E.A. Spišák, and D. Kučerka. 2015. Usage of neural network to predict aluminium oxide layer thickness. The Scientific World Journal, 1–10.
Mileris, R., and V. Boguslauskas. 2011. Credit risk estimation model development process: Main steps and model improvement. Engineering Economics 22 (2): 126–133.
Moreno, J.J.M., A.P. Pol, and P.M. Gracia. 2011. Artificial neural networks applied to forecasting time series. Psicothema 23 (2): 322–329.
Olej, V. 2003. Modelovanie ekonomických procesov na báze výpočtovej inteligencie [Modeling of economic processes based on computational intelligence], Miloš Vognar MV.
Pao, H.T., and G. Kayakutlu. 2008. A comparison of neural network and multiple regression analysis in modeling capital structure. Expert Systems with Applications 35 (3): 720–727.
Pazouki, M., Z. Wu, Z. Yang, and D.P.F. Moeller. 2015. An efficient learning method for RBF neural networks. Proceedings of the International Joint Conference on Neural Networks, 1–6.
Rowland, Z., and J. Vrbka. 2016. Optimization of a company’s property structure aiming at maximization of its profit using neural networks with the example of a set of construction companies. Mathematical Modeling in Economics 3–4 (7): 36–41.
Sakti, S., F. Ilham, G. Neubig, T. Toda, A. Purwarianti, and S. Nakamura. 2015. Incremental sentence compression using LSTM recurrent networks. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 252–258.
Sak, H., A. Senior, and F. Beaufays. 2014. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 338–342.
Sayadi, A.R., S.M.M. Tavassoli, M. Monjezi, and M. Rezaei. 2012. Application of neural networks to predict net present value in mining projects. Arabian Journal of Geosciences 7 (3): 1067–1072.
Slavici, T., D. Mnerie, and S. Kosutic. 2012. Some applications artificial neural networks in agricultural management. In: Actual Tasks on Agricultural Engineering. Proceedings of the 40. International Symposium on Agricultural Engineering, 363–373.
Tučková, J. 2003. Introduction to the theory and applications of artificial neural networks. Prague: ČVUT Publishing.
Vochozka, M. 2017. Formation of complex company evaluation method through neural networks based on the example of construction companies’ collection. AD ALTA-Journal of Interdisciplinary Research 7 (2): 232–239.
Vochozka, M., Z. Rowland, V. Stehel, P. Šuleř, and J. Vrbka. 2016. Business cost modeling using neural networks. České Budějovice: Institute of Technology and Business.

Chapter 4

Comparison of Different Methods

4.1 Neural networks

4.1.1 Case

Basic information related to the artificial neural network method is shown in Fig. 4.1. The predictor in this case is a neural network; the number of test examples is 1221 and the number of training examples is 2442. The report related to NN is shown in Fig. 4.2.

Figure 4.3 shows a comparison plot of actual gold prices against those predicted using artificial neural networks. A perfect prediction is represented by the dashed line, while the predicted points are shown as blue circles. The predicted values lie very close to the actual values, so the model proves to be suitable for prediction. The Probability Density Histogram for the NN method is shown in Fig. 4.4.

Figure 4.5 shows the histogram of the residuals, i.e. the frequency of the differences between the actual and predicted values. A residual can be understood as the size of the error made at the selected point of the estimation. With the NN model, the bin 0–5 is the most represented. The extremes, residuals of about −60 and 60, make up the smallest share of the histogram, as they occur least often. The residual plot can be seen in Fig. 4.6.

The Standard Deviation for this experiment is 20.7404. The Mean Cross Entropy is 4.45335, the Mean Deviation is 14.9052, the Mean Square is 430.163 and the Evaluation Time is 0.00566.
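The error measures quoted above can be illustrated with a minimal Python sketch. It assumes that a residual is the actual value minus the predicted value, that Mean Square is the mean of the squared residuals, that Mean Deviation is the mean absolute residual, and that Standard Deviation is the square root of Mean Square. These definitions are assumptions rather than the documented behaviour of the software used in the book, although the last one agrees with the reported figures (the square root of 430.163 is approximately 20.7404). The function name and the sample prices are hypothetical.

```python
import math


def residual_metrics(actual, predicted):
    """Return the residuals and three summary error measures."""
    # Residual = actual value minus predicted value (assumed sign convention).
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean_square = sum(r * r for r in residuals) / n
    mean_deviation = sum(abs(r) for r in residuals) / n
    # Assumption: the reported "Standard Deviation" is the root of the mean square.
    standard_deviation = math.sqrt(mean_square)
    return residuals, standard_deviation, mean_deviation, mean_square


# Hypothetical prices, not taken from the data set used in the book.
actual = [100.0, 102.0, 105.0, 103.0]
predicted = [101.0, 100.0, 105.0, 106.0]

residuals, sd, md, ms = residual_metrics(actual, predicted)
print("residuals:", residuals)
print("standard deviation:", sd, "mean deviation:", md, "mean square:", ms)
```

The same function could be applied to the 1221 test examples once the actual and predicted prices are available as two lists of equal length.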

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 J. Vrbka, Using Artificial Neural Networks for Timeseries Smoothing and Forecasting, Studies in Computational Intelligence 979, https://doi.org/10.1007/978-3-030-75649-9_4


Fig. 4.1 Report (Wolfram Research 2020)

Fig. 4.2 Report—NN (Wolfram Research 2020)

The time series, including the prediction for two months, is shown in Fig. 4.7. The chart shows that the accuracy of the gold price prediction is very high when the neural network method is used: the curve of the predicted price (blue) closely follows the curve of the observed price of gold in CZK (red). The size of the residuals over the observed years can be seen in Fig. 4.8. The two-month prediction of gold prices is shown in Fig. 4.9. It indicates a significant decline at the beginning of August, followed by a gradual increase and a further decline in gold prices at the end of August. The time series of the observed gold price, including the prediction for the next two months, is shown in Fig. 4.10.


Fig. 4.3 Comparison Plot—NN (Wolfram Research 2020)

Fig. 4.4 Probability Density Histogram—NN (Wolfram Research 2020)

A more detailed view of the observed price of gold in CZK, followed by the gold price prediction obtained with NN, is shown in Fig. 4.11.


Fig. 4.5 Residual Histogram—NN (Wolfram Research 2020)

Fig. 4.6 Residual Plot—NN (Wolfram Research 2020)

Fig. 4.7 Time series with prediction—NN (Wolfram Research 2020)

Fig. 4.8 Residues—NN (Wolfram Research 2020)

Fig. 4.9 Prediction for two months—NN (Wolfram Research 2020)

Fig. 4.10 Time series with prediction—NN (Wolfram Research 2020)

Fig. 4.11 Time series with prediction 2—NN (Wolfram Research 2020)

4.2 Decision Tree

Currently, more and more scientific disciplines use a tree structure for storing knowledge. This also applies to the field of artificial intelligence, in which decision trees are ranked as well (Andone and Sireteanu 2009). Decision trees can be seen as a non-linear, hierarchical system for storing knowledge. Sagi and Rokach (2020) consider decision trees to be a tool for representing and supporting multi-stage decision-making processes under conditions of uncertainty and risk. The authors add that decision trees are the most important graphical means of decision analysis that applies the conceptual apparatus of graph theory. According to Xiao and Xu (2020), decision trees outline the individual variants, the risk factors along with their development, and the potential consequences of the variants that carry the risk.

Decision trees can be characterized quite precisely as a sequence of nodes and edges of a directed graph. Their basic structure consists of a combination of chance nodes (Fig. 4.12) and decision nodes (Fig. 4.13).

Fig. 4.12 Chance node (Author according to Fotr 2006)

Fig. 4.13 Decision nodes (Author according to Fotr 2006)

Chance nodes represent the stage of problem solving at which the given alternative is selected “by nature”, regardless of the will of the decision-maker. Fotr (2006) refers to these variants as situational: they show the values of the individual risk factors, which the decision-maker cannot modify in any way. Chance nodes are drawn as circles, with edges leaving the node. Decision nodes are most commonly drawn as diamonds, rectangles, or squares. They represent the stage of a decision-making process at which the decision-maker can choose one particular variant from all the possible variants. Logically, the decision-maker chooses the variant that is most acceptable at the moment. The edges leaving a decision node represent the individual decision variants. Figures 4.12 and 4.13 show chance and decision nodes.

In many situations, decision trees combine both chance and decision nodes. In practice, however, there are also decision trees with decision nodes only. These trees are referred to as deterministic decision trees.
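The distinction between the two node types can be sketched in Python. The sketch below is an illustration only: the class names, the payoffs and the probabilities are hypothetical, and the evaluation rule (expected value at chance nodes, maximum at decision nodes) is an assumed stand-in for "the variant that is most acceptable" to the decision-maker, since the text does not prescribe a specific criterion.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Outcome:
    value: float  # payoff at the end of a branch (hypothetical unit)


@dataclass
class ChanceNode:
    # Each branch is (probability, subtree); "nature" selects the branch.
    branches: List[Tuple[float, "Node"]]


@dataclass
class DecisionNode:
    # Each branch is (label, subtree); the decision-maker selects the branch.
    branches: List[Tuple[str, "Node"]]


Node = Union[Outcome, ChanceNode, DecisionNode]


def evaluate(node: Node) -> float:
    """Evaluate a tree from the leaves back to the root."""
    if isinstance(node, Outcome):
        return node.value
    if isinstance(node, ChanceNode):
        # Assumed criterion: probability-weighted (expected) value.
        return sum(p * evaluate(child) for p, child in node.branches)
    # Decision node: assume the variant with the highest value is chosen.
    return max(evaluate(child) for _, child in node.branches)


# A tree combining a decision node with a chance node.
mixed_tree = DecisionNode([
    ("invest", ChanceNode([(0.6, Outcome(100.0)), (0.4, Outcome(-50.0))])),
    ("wait", Outcome(20.0)),
])

# A deterministic decision tree: decision nodes only.
deterministic_tree = DecisionNode([("a", Outcome(5.0)), ("b", Outcome(7.0))])

print(evaluate(mixed_tree), evaluate(deterministic_tree))
```

In the mixed tree, the "invest" branch is worth 0.6 · 100 + 0.4 · (−50), which is compared with the certain payoff of the "wait" branch.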

4.2.1 Case

Using the Decision Tree method, the development of the price of gold is predicted in this experiment. Basic information on the Decision Tree method used is shown in Fig. 4.14. The predictor in this case is the Decision Tree; the number of test examples is 1221 and the number of training examples is 2442. The report related to this method is shown in Fig. 4.15.

Figure 4.16 shows a comparison plot of actual gold prices, labelled “perfect prediction”, against those predicted using the Decision Tree method, labelled “predictions”. The perfect prediction is shown as a dashed line, while the predicted points are shown as blue circles.


Fig. 4.14 Basic information—Decision Tree (Wolfram Research 2020)

Fig. 4.15 Report—Decision Tree (Wolfram Research 2020)

The accuracy of the predicted values compared to the actual values is not very high, and large residuals can be seen. This model therefore proves to be less suitable for prediction. The Probability Density Histogram for the Decision Tree method is shown in Fig. 4.17.

Fig. 4.16 Comparison Plot—Decision Tree (Wolfram Research 2020)

Fig. 4.17 Probability Density Histogram—Decision Tree (Wolfram Research 2020)

Figure 4.18 shows a histogram of residuals, i.e. the frequency of the differences between the actual and predicted values. A residual can be understood as the size of the error made at the selected point of the prediction. With the Decision Tree method, the most represented range of values is between −20 and 0. The extreme residuals reach −200 and 200 with this method, compared to −50 and 50 for NN. These extremes make up the smallest share of the whole histogram, because they occur least often.
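How such a residual histogram is assembled can be sketched as follows. The bin width of 20 and the residual values are hypothetical and chosen only so that the most represented bin is the range from −20 to 0, as described above; the actual bin width used in the figure is not stated in the text.

```python
import math
from collections import Counter


def residual_histogram(residuals, bin_width):
    """Count residuals per half-open bin [k * bin_width, (k + 1) * bin_width)."""
    counts = Counter(math.floor(r / bin_width) for r in residuals)
    return {
        (k * bin_width, (k + 1) * bin_width): count
        for k, count in sorted(counts.items())
    }


# Hypothetical residuals (actual minus predicted), not the book's data.
residuals = [-35.0, -12.0, -8.0, -3.0, 4.0, 18.0, 150.0]

hist = residual_histogram(residuals, 20)
modal_bin = max(hist, key=hist.get)  # the most represented bin

for (low, high), count in hist.items():
    print(f"[{low}, {high}): {count}")
print("most represented bin:", modal_bin)
```

Bins far from zero that contain only a single residual correspond to the extremes discussed in the text.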


Fig. 4.18 Residual Histogram—Decision Tree (Wolfram Research 2020)

The residuals mentioned above are plotted in Fig. 4.19. The Standard Deviation for this experiment is 58.5175. The Mean Cross Entropy is 5.4493, the Mean Deviation is 45.9167, the Mean Square is 3424.3 and the Evaluation Time is 0.0049. The time series, including the two-month prediction, is shown in Fig. 4.20. The chart shows that the gold price prediction is inaccurate when the Decision Tree method is used. The curve of the predicted price (green) does not follow the curve of the observed price of gold in CZK (red). The Decision Tree method was able to predict the direction of the curve at least partially, but a closer look reveals that the prediction is very inaccurate. The size of the residuals over the observed years can be seen in Fig. 4.21.

Fig. 4.19 Residual Plot—Decision Tree (Wolfram Research 2020)


Fig. 4.20 Time series with prediction—Decision Tree (Wolfram Research 2020)

Fig. 4.21 Residues—Decision Tree (Wolfram Research 2020)

The two-month prediction of gold prices obtained with the Decision Tree method is shown in Fig. 4.22. The figure makes clear that it is not appropriate to predict future values using the Decision Tree method. The time series including the two-month prediction is shown in Fig. 4.23. The time series with prediction 2, shown in Fig. 4.24, confirms this conclusion.


Fig. 4.22 Prediction—Decision Tree (Wolfram Research 2020)

Fig. 4.23 Time series with prediction—Decision Tree (Wolfram Research 2020)

Fig. 4.24 Time series with prediction 2—Decision Tree (Wolfram Research 2020)


4.3 Gaussian Process

The Gaussian process is a technique whose popularity has grown significantly in recent years. By definition, a Gaussian process is a generalization of the Gaussian probability distribution. However, while a probability distribution describes random variables or vectors, a random process describes a function (Klyuchnikov and Burnaev 2020). An example is $X \sim GP(m, k)$, which states that the random function $X$ is distributed as a Gaussian process with mean function $m$ and covariance function $k$. In general, a Gaussian process represents a distribution over all suitable functions, where the properties of these functions are determined by the covariance function. The existence of many covariance functions enables a flexible setting of the Gaussian process.

Another possibility is to interpret the objective function as an unknown scalar function $f$ of $x \in \mathbb{R}^n$ in $n$-dimensional space. Evaluating such a function on a set of solution points $X_N = \{x_1, \dots, x_N\}$ then yields the set of function values $t_N = \{t_1, \dots, t_N\}$, where $t_i = f(x_i)$ for all $i = 1, \dots, N$. In simpler terms, the input for the Gaussian process is the training set $D$ of $N$ data points with the corresponding function values at those points. The training set can also be described as follows:

\[ D = \{(x_i, t_i) \mid i = 1, \dots, N\} = (X, t) \tag{4.1} \]

Each evaluated point $x$, together with the corresponding function value $f(x)$, can be seen as a point of certainty for the distribution. In other words, the functions whose properties and structure are inconsistent with the evaluations are eliminated. The biggest problem within the Gaussian process consists in finding a suitable combination of properties for the selected covariance function (Rasmussen and Williams 2006). Bajer et al. (2015) state that many different studies have shown that the application of models based on the Gaussian process can result in a significant acceleration compared to other types of models.
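The idea of conditioning a Gaussian process on a training set D = (X, t) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by the software in the case study: the mean function m is assumed to be zero, the covariance function k is assumed to be the squared-exponential kernel with a hypothetical length scale of 1, the inputs are scalars, and a small noise term is added to the diagonal for numerical stability. The helper names are hypothetical.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance k(a, b) for scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def gp_posterior_mean(X, t, x_new, noise=1e-6):
    """Posterior mean at x_new of a zero-mean GP conditioned on D = (X, t)."""
    K = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve(K, t)  # alpha = (K + noise * I)^{-1} t
    return sum(rbf_kernel(x_new, x) * a for x, a in zip(X, alpha))


# Hypothetical training set: four evaluations of an "unknown" function.
X = [0.0, 1.0, 2.0, 3.0]
t = [math.sin(x) for x in X]

print(gp_posterior_mean(X, t, 1.0))   # close to the observed value t[1]
print(gp_posterior_mean(X, t, 1.5))   # prediction between two observations
```

At an evaluated point the posterior mean reproduces the observed value almost exactly, which is the "point of certainty" behaviour described above; far away from all observations it falls back to the assumed zero mean.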

4.3.1 Case

Using the Gaussian Process method, the development of the price of gold is predicted in this experiment. The basic information about the Gaussian Process method used is shown in Fig. 4.25. The predictor in this case is the Gaussian Process, the number of test examples is 1221 and the number of training examples is 2442.


Fig. 4.25 Basic information—Gaussian Process (Wolfram Research 2020)

The report related to the Gaussian Process method used is shown in Fig. 4.26.

Fig. 4.26 Report—Gaussian Process (Wolfram Research 2020)

Figure 4.27 shows a comparison plot of actual gold prices, labelled “perfect prediction”, against those predicted using the Gaussian Process method, labelled “predictions”. The perfect prediction is shown as a dashed line, while the predicted points are shown as blue circles. The accuracy of the predicted values compared to the actual values is high, and only a slight deviation of the points from the dashed line can be observed. Therefore, this model has so far proved to be suitable for prediction. The Probability Density Histogram for the Gaussian Process method is shown in Fig. 4.28.


Fig. 4.27 Comparison Plot—Gaussian Process (Wolfram Research 2020)

Fig. 4.28 Probability Density Histogram—Gaussian Process (Wolfram Research 2020)

Figure 4.29 shows a histogram of residuals, i.e. the frequency of the differences between the actual and predicted values. A residual can be understood as the size of the error made at the selected point of the prediction. With the Gaussian Process model, the bin 0–10 has the largest share. The extremes, residuals of about −100 and 100, have the smallest share of the entire histogram, as they occur least often. The residuals are then plotted in Fig. 4.30.

Fig. 4.29 Residual Histogram—Gaussian Process (Wolfram Research 2020)

Fig. 4.30 Residual Plot (Wolfram Research 2020)

The Standard Deviation for this experiment is 28.4139. The Mean Cross Entropy is 4.81866, the Mean Deviation is 21.1516, the Mean Square is 807.351 and the Evaluation Time is 0.020. The time series, including the two-month prediction, is shown in Fig. 4.31. The graph shows that the gold price prediction is very accurate when the Gaussian Process method is used. The curve of the predicted price (green) for the most part follows the curve of the actual price of gold in CZK (red). The Gaussian Process method was able to predict the development of the curve quite accurately, except for some deviations. This model appears to be suitable for time series prediction. Residuals for the observed time period are shown in Fig. 4.32. The prediction for two calendar months using the Gaussian Process method is shown in Fig. 4.33. The time series of the actual values of the price of gold, including the prediction for two calendar months, is shown in Fig. 4.34.

Fig. 4.31 Time series with prediction—Gaussian Process (Wolfram Research 2020)

Fig. 4.32 Residues—Gaussian Process (Wolfram Research 2020)


Fig. 4.33 Prediction—Gaussian Process (Wolfram Research 2020)

Fig. 4.34 Time series with prediction—Gaussian Process (Wolfram Research 2020)

Fig. 4.35 Time series with prediction 2—Gaussian Process (Wolfram Research 2020)


Figure 4.35 (time series with prediction 2) shows part of the time series of actual gold prices, followed by the prediction for two calendar months.

4.4 Gradient Boosted Trees

A recently popular algorithm for solving classification and regression tasks by means of decision trees is the gradient boosting machine (GBM), also known as Gradient Boosted Trees. Gradient boosting was popularised at the end of the 1990s by the statistician Leo Breiman. According to Natekin and Knoll (2013), the principle of the gradient boosting method consists in the gradual creation of models, where each successive model is created with regard to the errors of the preceding models. According to Friedman (2002), gradient boosting is a machine learning method that combines weak learners, which in the case of GBM are decision trees of a limited size.

If $T$ is the number of end nodes of a partial decision tree, its value is determined by the level of interaction between the input variables. A tree with $T = 2$ cannot capture any interaction between the variables; these special cases are referred to as decision stumps. Empirically, values in the range $4 \le T \le 8$ have been found to work well for gradient boosting. Figure 4.36 shows the decision-making process by means of weak classification decision trees in the GBM algorithm (Hastie 2009).

Fig. 4.36 Decision-making process in the GBM algorithm (Author according to Hastie 2009)

When a model $M$ is trained to predict values $\hat{y} = M(x)$, a primary objective function can be applied, the so-called mean squared error (MSE). It is calculated as follows: for each input in the training dataset, the squared difference between the predicted and the expected value, $(\hat{y} - y)^2$, is calculated. The squared differences are summed and the result is divided by the number of elements in the training dataset. Hastie (2009) states that each iteration $i$ of the Gradient Boosting algorithm contains a model $M_i$ that is assumed to be imperfect. Gradient Boosting aims to take the model $M_i$ and adjust it so that better accuracy is achieved. This relationship can be written generally as follows:

\[ M_{i+1}(x) = M_i(x) + h(x) = y \tag{4.2} \]

On the basis of a simple consideration, the ideal function $h$ can be expressed as follows:

\[ h(x) = y - M_i(x) \tag{4.3} \]

where $y - M_i(x)$ is the error in prediction between $y$ and $\hat{y}$.
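The update in (4.2) and (4.3) can be sketched in Python with decision stumps (T = 2) as weak learners. This is a simplified illustration under stated assumptions: a single scalar input, the mean of y as the initial model, squared error as the objective, and a hypothetical learning rate of 0.5 that scales each correction h(x). The function names and the data are hypothetical and do not come from the case study.

```python
def fit_stump(xs, targets):
    """Fit a decision stump (T = 2 end nodes) by minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, targets) if x < threshold]
        right = [r for x, r in zip(xs, targets) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def mse(ys, preds):
    """Mean squared error between expected and predicted values."""
    return sum((p - y) ** 2 for y, p in zip(ys, preds)) / len(ys)


def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Return the boosted model and the training MSE after each round."""
    base = sum(ys) / len(ys)  # initial model M_0: the mean of y
    stumps = []

    def model(x):
        return base + learning_rate * sum(h(x) for h in stumps)

    history = [mse(ys, [model(x) for x in xs])]
    for _ in range(rounds):
        # Errors of the current model, y - M_i(x), as in (4.3).
        residuals = [y - model(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))  # weak learner h(x)
        history.append(mse(ys, [model(x) for x in xs]))
    return model, history


# Hypothetical data with three levels.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

model, history = gradient_boost(xs, ys)
print("MSE before boosting:", history[0])
print("MSE after boosting: ", history[-1])
```

Each round fits a new stump to the errors left by the preceding models, so the training MSE recorded in `history` does not increase from one round to the next.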

4.4.1 Case

Using the Gradient Boosted Trees method, the development of the price of gold is predicted in this experiment. Basic information on the Gradient Boosted Trees method used is shown in Fig. 4.37. The predictor in this case is Gradient Boosted Trees; the number of test examples is 1221 and the number of training examples is 2442. The report related to the Gradient Boosted Trees method is shown in Fig. 4.38.

Fig. 4.37 Basic information—Gradient Boosted Trees (Wolfram Research 2020)

Fig. 4.38 Report—Gradient Boosted Trees (Wolfram Research 2020)

Figure 4.39 shows a comparison plot of actual gold prices, labelled “perfect prediction”, against those predicted using the Gradient Boosted Trees method, labelled “predictions”. The perfect prediction is shown as a dashed line, while the predicted points are shown as blue circles. The blue points lie close to the line, with partial deviations, and the accuracy of the predicted values compared to the actual values is quite high. This model therefore proves to be quite suitable for prediction. The Probability Density Histogram for the Gradient Boosted Trees method is shown in Fig. 4.40.

Fig. 4.39 Comparison Plot—Gradient Boosted Trees (Wolfram Research 2020)

Fig. 4.40 Probability Density Histogram—Gradient Boosted Trees (Wolfram Research 2020)

Figure 4.41 shows a histogram of the residuals, i.e. the frequency of the differences between the observed and predicted values. A residual can be understood as the size of the error made at the selected point of the estimation. With the Gradient Boosted Trees method, the most represented range of values is −20 to 0. The extreme residual values reach −100 and 100 with this method, compared to −200 and 200 for the Decision Tree method. The extremes, residuals around −100 and 100, have the smallest share of the entire histogram, as they occur least often. The residuals for the Gradient Boosted Trees method are plotted in Fig. 4.42.

Fig. 4.41 Residual Histogram—Gradient Boosted Trees (Wolfram Research 2020)

Fig. 4.42 Residual Plot—Gradient Boosted Trees (Wolfram Research 2020)

The Standard Deviation for this experiment is 27.3378. The Mean Cross Entropy is 4.79768, the Mean Deviation is 19.7399, the Mean Square is 747.353 and the Evaluation Time is 0.0085. The time series, including the two-month prediction, is shown in Fig. 4.43. The graph shows that the gold price prediction is quite accurate when the Gradient Boosted Trees method is used, with a few exceptions. The predicted price curve (gold) closely follows the observed gold price curve in CZK (red). The Gradient Boosted Trees method was able to predict the direction of the curve, with high accuracy at some points. Deviations are noticeable, but negligible compared to the previous Decision Tree method. The size of the residuals over the observed years can be seen in Fig. 4.44. The prediction for two calendar months, created using the Gradient Boosted Trees method, is shown in Fig. 4.45. The time series with a two-month prediction is shown in Fig. 4.46. With the Gradient Boosted Trees method, it is no longer possible to say that the prediction is unusable and incorrect, as was the case with the previous Decision Tree method.

Fig. 4.43 Time series with prediction—Gradient Boosted Trees (Wolfram Research 2020)


Fig. 4.44 Residues—Gradient Boosted Trees (Wolfram Research 2020)

Fig. 4.45 Prediction—Gradient Boosted Trees (Wolfram Research 2020)

Figure 4.47 shows the curve of observed values, followed by the curve of values predicted for two calendar months.

Fig. 4.46 Time series with a prediction for two months—Gradient Boosted Trees (Wolfram Research 2020)

Fig. 4.47 Time series with prediction 2—Gradient Boosted Trees (Wolfram Research 2020)

4.5 Linear Regression

Hendl (2004) claims that the term “regression” was first used at the end of the nineteenth century by Francis Galton from Great Britain, who used it within his specialization when studying the dependence of the height of offspring on the height of their parents. Valášková et al. (2018) state that regression analysis is among the most commonly used methods worldwide. The authors add that the aim of this method is to examine the dependencies and relations between variables, its main objective being the description of the shape of the relationship between two or more variables. According to Moravčíková et al. (2017), this type of analysis is used to determine a suitable formula for predicting a dependent variable (Y) from the factors that affect it, the independent variables (X), and to evaluate the prediction error. Variables Y and X are connected by means of a regression function with several parameters. If the function is linear in its parameters, it is referred to as a linear regression model; if it is not, it is referred to as a non-linear regression model. When the behaviour of variable Y is modelled by means of a single variable X, we speak of simple regression; otherwise, we speak of multiple regression analysis, which models the behaviour of one variable by means of two or more explanatory variables. According to Hindls (2007), the classical linear regression model has the following form:

\[ y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i \tag{4.4} \]

where:
$y_i$: value of the random variable $Y$,
$x_{i1}, \dots, x_{ik}$: values of the explanatory variables,
$\beta_0, \beta_1, \dots, \beta_k$: the model’s parameters,
$\varepsilon_i$: random component (its mean value is 0),
$i = 1, 2, \dots, n$: observation index.

The simplest example of a regression function is the so-called regression line. Its function $\eta(x)$ is expressed by means of the following formula:

\[ \eta(x) = \beta_0 + \beta_1 x \tag{4.5} \]
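The parameters β0 and β1 of the regression line in (4.5) can be estimated, for example, by ordinary least squares. The text does not state which estimation method is used, so the following Python sketch is an assumption; the function name and the sample data are hypothetical.

```python
def fit_regression_line(xs, ys):
    """Estimate beta0 and beta1 of y = beta0 + beta1 * x by least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Hypothetical observations lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

beta0, beta1 = fit_regression_line(xs, ys)
print("beta0 =", beta0, "beta1 =", beta1)

# Residuals (the random component) for each observation.
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
print("residuals:", residuals)
```

With real data the observations do not lie exactly on a line, and the residuals computed in the last step are the estimates of the random component of the model.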

Hendl (2004) states that regression analysis is applied in a wide range of fields and disciplines. As an example, he mentions technology, where regression analysis is used to estimate the risk of failure or defect under certain conditions. In medicine, it is used for modelling the efficacy of drugs; it is also important in marketing, where it is used for determining the probability of customers switching to a competitor. Last but not least, in finance, regression analysis is used for forecasting a customer’s creditworthiness or for predicting the future development of a business entity depending on economic indicators.

4.5.1 Case

Using the Linear Regression method, the development of the gold price is predicted in this experiment. Basic information on the Linear Regression method used is shown in Fig. 4.48. The predictor in this case is Linear Regression; the number of test examples is 1221 and the number of training examples is 2442. The report on the Linear Regression method is shown in Fig. 4.49.

Fig. 4.48 Basic information—Linear Regression (Wolfram Research 2020)

Fig. 4.49 Report—Linear Regression (Wolfram Research 2020)

Figure 4.50 shows a comparison plot of actual gold prices, labelled “perfect prediction”, against those predicted using the Linear Regression method, labelled “predictions”. The perfect prediction is shown as a dashed line, while the predicted points are shown as blue circles. The accuracy of the predicted values compared to the actual values is not very high, and large residuals, i.e. deviations from the dashed line, can be seen. This model therefore proves to be less suitable for prediction. The Probability Density Histogram for the Linear Regression method is shown in Fig. 4.51.


Fig. 4.50 Comparison Plot—Linear Regression (Wolfram Research 2020)

Fig. 4.51 Probability Density Histogram—Linear Regression (Wolfram Research 2020)

The histogram of the resulting residuals is shown in Fig. 4.52 and indicates how far the predicted values deviate from the actual values. The extremes range from −300 to 500, which is an especially wide range. The largest share of residuals falls in the range of −100 to −50. The residuals are plotted in Fig. 4.53.


Fig. 4.52 Residual Histogram—Linear Regression (Wolfram Research 2020)

Fig. 4.53 Residual Plot—Linear Regression (Wolfram Research 2020)

The Standard Deviation for this experiment is 179.88. The Mean Cross Entropy is 6.61125, the Mean Deviation is 130.294, the Mean Square is 32356.9 and the Evaluation Time is 0.0058. The time series, including the two-month prediction, is shown in Fig. 4.54. The chart shows that the gold price prediction is very inaccurate when the Linear Regression method is used. The curve of the predicted price (black) does not follow the curve of the actual price of gold in CZK (red). In some sections of the chart, the Linear Regression method could not predict the direction of the curve, and the prediction is highly inaccurate at some points. The deviations are not negligible. The residuals produced by the Linear Regression method over the monitored years are shown in Fig. 4.55.


Fig. 4.54 Time series with prediction—Linear Regression (Wolfram Research 2020)

Fig. 4.55 Residues—Linear Regression (Wolfram Research 2020)

The prediction for two calendar months using the Linear Regression method is shown in Fig. 4.56. The chart shows that the forecast indicates a large increase in the price of gold at the turn of August and September. The time series of the actual development of the price of gold, including the two-month prediction, is shown in Fig. 4.57. The prediction curve is marked in black and the actual gold price curve in red.


Fig. 4.56 Prediction—Linear Regression (Wolfram Research 2020)

Fig. 4.57 Time series with prediction—Linear Regression (Wolfram Research 2020)

The time series with the subsequent prediction for two calendar months is shown in detail in Fig. 4.58.

Fig. 4.58 Time series with prediction 2—Linear Regression (Wolfram Research 2020)

4.6 Nearest Neighbours

The Nearest Neighbours method, also known as k-NN (k-Nearest Neighbours), is one of the basic and commonly applied algorithms of artificial intelligence. It is applied mainly to problems of regression and classification (Mocnik 2020). Figure 4.59 shows the basic principle of this algorithm.

Fig. 4.59 Principle of k-NN algorithm, panels k = 3 and k = 7 (Author according to Yu and Yu 2006)

According to Yu and Yu (2006), the algorithm works by storing the values of the training elements during training; during testing, the distance (Manhattan distance, Euclidean distance, etc.) between the attributes of each test element and the attributes of every training element is calculated. The distances are then sorted, and the k samples with the shortest distance are considered the nearest neighbours. The test element is classified on the basis of the classes of its k nearest training elements.

Recently, a number of studies have mentioned the application of modified or optimized versions of the k-NN algorithm. For example, Garcia et al. (2008) compared the implementation of a GPU version of k-NN with several CPU versions. Liang et al. (2009) introduced several special methods in order to maximize the use of the graphics accelerator. It is also necessary to mention Garcia et al. (2008), who dealt with the description of the k-NN algorithm for texture analysis.
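The steps described above (store the training elements, compute distances, sort, take the k nearest, classify by their classes) can be sketched in Python. The function names and the two-dimensional sample data are hypothetical; classification by majority vote among the k neighbours is an assumption about how the classes of the neighbours are combined.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan distance between two attribute vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(train, query, k=3, distance=euclidean):
    """Classify query from a list of (attributes, label) training pairs."""
    # Sort all training elements by their distance to the query element
    # and keep the k nearest ones.
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    # Assumed rule: the most frequent class among the neighbours wins.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training elements with two attributes and a class label.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.5), "B"), ((6.5, 7.0), "B"),
]

print(knn_classify(train, (1.2, 1.1), k=3))
print(knn_classify(train, (6.4, 6.4), k=3, distance=manhattan))
```

For a regression task such as price prediction, the vote would be replaced by an average of the neighbours' target values; that variant is not shown here.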


4.6.1 Case

Using the Nearest Neighbours method, gold price development is predicted in this experiment. The basic information on the Nearest Neighbours method used is shown in Fig. 4.60. The predictor in this case is Nearest Neighbours; the number of test examples is 1221 and the number of training examples is 2442. The report related to the Nearest Neighbours method is shown in Fig. 4.61.

Fig. 4.60 Basic information—Nearest Neighbours (Wolfram Research 2020)

Fig. 4.61 Report—Nearest Neighbours (Wolfram Research 2020)


Fig. 4.62 Comparison Plot—Nearest Neighbours (Wolfram Research 2020)

Figure 4.62 shows a comparison graph comparing actual gold prices, in this case “perfect prediction”, with those predicted using the Nearest Neighbours method, shown by a curve labelled “predictions". The exact prediction is shown here by a dashed line, as opposed to the predicted points, which are shown in the form of blue circles. It should be noted that the accuracy of the predicted values compared to the actual values is high and only slight deviations from the dashed line can be seen. This model therefore proves to be quite suitable for prediction. Probability Density Histogram related to the Nearest Neighbours method is shown in Fig. 4.63. Figure 4.64 shows a histogram of residues, i.e. the frequency of the difference between the actual and predicted value. The residue can be understood as the size of the error that we make at the selected point in the estimation. It can be noticed that in the case of using the Nearest Neighbours model, the values of bin −5 to 0 have the largest representation. Extremes, in the form of residues −70 and 70, have the smallest representation of the entire histogram, as they have occurred the least amount of times. Figure 4.65 then shows the residues in a chart. The value of Standard Deviation for this experiment is 37.1113. The value of Mean Cross Entropy is 5.03393, Mean Deviation is 20.2955, Mean Square is 1377.25 and Evaluation Time is 0.0048. The time series, including the two-month prediction, is shown in Fig. 4.66. The chart shows that the gold price prediction is very accurate when using the Nearest Neighbours method. The curve of the predicted price (orange) accurately copies the curve of the actual price of gold in CZK (red). The Nearest


Fig. 4.63 Probability Density Histogram—Nearest Neighbours (Wolfram Research 2020)

Fig. 4.64 Residual Histogram—Nearest Neighbours (Wolfram Research 2020)

Neighbours method was able to predict the development of the curve very accurately; the deviations are negligible. The resulting residuals for the observed years are shown in Fig. 4.67. The prediction for two calendar months is shown in Fig. 4.68, where sudden changes in the development of the curve can be observed. The time series of the actual gold price together with the subsequent prediction using the Nearest Neighbours method is shown in Fig. 4.69. Figure 4.70 then shows a detail of part of the time series including the prediction.
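As a rough illustration of how a nearest-neighbour prediction of this kind can be produced and evaluated, the following sketch is written in Python rather than Mathematica. It uses a synthetic series, a single lagged value as the predictor, k = 3 and a chronological 2:1 split. All of these are assumptions made for the example, since the settings actually used in the experiment are not stated here, and the names `knn_predict`, `prices` and the three error measures are illustrative only.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to `query`."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / k


# Synthetic stand-in for a price series (NOT the gold data used in the book).
prices = [100 + 10 * math.sin(t / 20) + 0.05 * t for t in range(300)]

# Assumed setup: yesterday's price is the only predictor of today's price.
x = prices[:-1]
y = prices[1:]

# Chronological 2:1 split, mirroring the 2442 : 1221 proportion of the case.
split = len(x) * 2 // 3
train_x, train_y = x[:split], y[:split]
test_x, test_y = x[split:], y[split:]

predictions = [knn_predict(train_x, train_y, q) for q in test_x]
residuals = [actual - pred for actual, pred in zip(test_y, predictions)]

# Error measures computed from the residuals of the test examples.
mean_square = sum(r * r for r in residuals) / len(residuals)
mean_deviation = sum(abs(r) for r in residuals) / len(residuals)
root_mean_square = math.sqrt(mean_square)
```

Because a nearest-neighbour estimate is an average of training targets, it cannot leave the range of values seen in training, which is worth keeping in mind when the series has a trend.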


Fig. 4.65 Residual Plot—Nearest Neighbours (Wolfram Research 2020)

Fig. 4.66 Time series with prediction—Nearest Neighbours (Wolfram Research 2020)

4.7 Random Forest

The Random Forest method was created by Leo Breiman, who, in his study, created a forest using a combination of decision trees in order to improve classification or prediction (Isobe and Tamada 2018). According to Breiman (2001), Random Forest can be characterized as an extension and particular implementation of decision trees that applies several methods in order to increase applicability to real data. The Random Forest method does not differ fundamentally from the method of decision trees. The main difference is that instead of a single tree, a collection of trees (a forest) is created. The


Fig. 4.67 Residues—Nearest Neighbours (Wolfram Research 2020)

Fig. 4.68 Prediction—Nearest Neighbours (Wolfram Research 2020)

task of the forest is to assign a data point to a class by voting (in the case of classification) or to calculate the average of the target value from the estimates of the individual trees (in the case of regression). It follows that the aim is to reduce the overall error of the forest as a whole rather than the inaccuracy of individual trees. In this context, the term “bagging” (“bootstrap aggregating”) is used: each tree is created from a random sample drawn with replacement from the training dataset, and the observations left out of that sample can be used to verify the created forest model (Ma et al. 2020).
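The two ingredients described here, random resampling of the training data and aggregation of the individual tree outputs, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used in the experiments: bagging is taken in its usual sense of drawing a bootstrap sample with replacement for each tree, the left-out (out-of-bag) observations are kept for verification, and all function names are hypothetical.

```python
import random


def bootstrap_sample(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def aggregate_regression(tree_outputs):
    """Regression: average the estimates of the individual trees."""
    return sum(tree_outputs) / len(tree_outputs)


def aggregate_classification(tree_votes):
    """Classification: majority vote over the individual tree decisions."""
    return max(set(tree_votes), key=tree_votes.count)


rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
in_bag, out_of_bag = bootstrap_sample(10, rng)
```

Each tree of the forest would be trained on the rows listed in its own `in_bag` sample, while the rows in `out_of_bag` give an estimate of the error without a separate validation set.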


Fig. 4.69 Time series with prediction—Nearest Neighbours (Wolfram Research 2020)

Fig. 4.70 Time series with prediction 2—Nearest Neighbours (Wolfram Research 2020)

According to Tang and Ishwaran (2017), the mechanism of random forests is sufficiently universal to be used for solving classification and regression tasks. The authors further add that this method could be used to eliminate some problems that may arise from the application of trees, e.g. instability of trees. Random forest is created by a set of trees $T_1, \ldots, T_S$, whose classification function is expressed using the following formula:

$$\{\, d(x, \Theta_k),\ k = 1, \ldots, S \,\} \tag{4.6}$$

where:

$x$ is the vector of predictors' values,
$\Theta_1, \ldots, \Theta_S$ are independent, identically distributed random vectors.
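For a regression task such as the prediction of a price, the aggregation over the trees of the forest can be written out explicitly. The following is a sketch in notation that is partly assumed: $\Theta_k$ denotes the random vector that generates the $k$-th tree (following Breiman 2001), and the names $\hat{f}_S$ and $\hat{c}_S$ for the forest outputs are introduced here only for illustration.

```latex
% Regression: the forest estimate is the average of the S tree estimates.
\hat{f}_S(x) = \frac{1}{S} \sum_{k=1}^{S} d(x, \Theta_k)

% Classification: the forest returns the class c that receives the most votes.
\hat{c}_S(x) = \arg\max_{c} \sum_{k=1}^{S} \mathbf{1}\left[ d(x, \Theta_k) = c \right]
```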


The Random Forest method applies binary trees of the CART type. Its advantages consist primarily in the simple training and debugging of the trees, which is also the reason for its general popularity and common implementation in various fields. Random forest is able to improve accuracy (reduce variance) and to limit the effect of excessively complex trees; it does not prune the trees, but maintains a suitable variance by combining the results of the individual trees. Originally, this method was created for datasets with a large number of predictors; however, it turned out to work well even in the case of small datasets (Ramo and Chuvieco 2017).
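To make the notion of a binary CART-type tree more concrete, the sketch below searches for a single binary split of one numeric predictor that minimises the summed squared error of the two resulting groups, which is the usual splitting criterion for regression trees. It is a simplified illustration in Python with a hypothetical function name `best_split`; a full tree would apply the same search recursively, and a random forest would in addition restrict each search to a random subset of predictors.

```python
def best_split(xs, ys):
    """Return (threshold, sse) of the binary split x <= threshold that
    minimises the summed squared error around the two child means."""

    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = (None, float("inf"))
    candidates = sorted(set(xs))
    # Candidate thresholds lie midway between consecutive distinct x values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Toy data with an obvious break between x = 3 and x = 10.
threshold, error = best_split([1, 2, 3, 10, 11, 12], [5.0, 5.0, 5.0, 9.0, 9.0, 9.0])
```

On the toy data the search places the threshold between the two groups, where both child nodes become perfectly homogeneous.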

4.7.1 Case

In this experiment, the development of the price of gold is predicted using the Random Forest method. The basic information on the method is shown in Fig. 4.71. The prediction method in this case is Random Forest, the number of test examples is 1221 and the number of training examples is 2442. The report for the Random Forest method is shown in Fig. 4.72. Figure 4.73 compares the actual gold prices, labelled “perfect prediction”, with the values predicted using the Random Forest method, labelled “predictions”. The perfect prediction is shown by a dashed line, whereas the predicted points are shown as blue circles. The accuracy of the predicted values relative to the actual values is not very high, and a large deviation of the circles from the dashed line can be observed. This model therefore proves to be less suitable for prediction. The Probability Density Histogram for the Random Forest method is shown in Fig. 4.74. Figure 4.75 shows a histogram of residuals, i.e. the frequency of the differences between the actual and predicted values. A residual can be understood as the size of the error made in the prediction at a given point. With the Random Forest model, the bin from 0 to 20 has the highest frequency. The extremes, residuals around −70 and 70, have the smallest proportion of the

Fig. 4.71 Basic information—Random Forest (Wolfram Research 2020)


Fig. 4.72 Report—Random Forest (Wolfram Research 2020)

Fig. 4.73 Comparison Plot—Random Forest (Wolfram Research 2020)


Fig. 4.74 Probability Density Histogram—Random Forest (Wolfram Research 2020)

Fig. 4.75 Residual Histogram—Random Forest (Wolfram Research 2020)

entire histogram, as they occurred least often. Figure 4.76 then plots the residuals. The Standard Deviation for this experiment is 68.7212, the Mean Cross Entropy is 7.14466, the Mean Deviation is 57.0969, the Mean Square is 4722.6 and the Evaluation Time is 0.011. The time series, including the two-month prediction, is shown in Fig. 4.77. The chart shows that the gold price prediction is not very accurate when using the Random Forest method. The curve of the predicted price (purple) does not closely follow the curve of the actual price of gold in CZK (red). The Random Forest method failed to predict the development of the curve accurately, having only partially predicted the shape of the curve and the


Fig. 4.76 Residual Plot—Random Forest (Wolfram Research 2020)

Fig. 4.77 Time series with prediction—Random Forest (Wolfram Research 2020)

direction of development. The deviations are very noticeable, and this model seems unsuitable for time series prediction. The resulting residuals for the monitored period are shown in Fig. 4.78. Figure 4.79 shows a detailed view of the prediction for two calendar months using the Random Forest method. The time series of actual values, including the added prediction for two calendar months, can be seen in Fig. 4.80.


Fig. 4.78 Residues—Random Forest (Wolfram Research 2020)

Fig. 4.79 Prediction—Random Forest (Wolfram Research 2020)

Figure 4.81 shows a detailed view of the time series of actual gold prices together with the prediction made using the Random Forest method.
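The error measures reported for the Nearest Neighbours and Random Forest cases can be checked against each other with a little arithmetic. The reported figures are consistent with the Standard Deviation being the square root of the Mean Square, i.e. the root mean square of the residuals; this reading is an inference from the numbers, not a statement taken from the text. The short Python sketch below performs the check, and the dictionary layout and variable names are illustrative.

```python
import math

# Values as reported in the Nearest Neighbours and Random Forest cases.
reported = {
    "Nearest Neighbours": {"standard_deviation": 37.1113, "mean_square": 1377.25},
    "Random Forest": {"standard_deviation": 68.7212, "mean_square": 4722.6},
}

# If Standard Deviation is the root of Mean Square, these roots should match
# the reported standard deviations up to the rounding of the figures.
roots = {name: math.sqrt(m["mean_square"]) for name, m in reported.items()}

# How much larger the typical Random Forest error is than the k-NN error.
ratio = (
    reported["Random Forest"]["standard_deviation"]
    / reported["Nearest Neighbours"]["standard_deviation"]
)
```

Under this reading, the typical error of the Random Forest model on the test examples is a little under twice that of the Nearest Neighbours model.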


Fig. 4.80 Time series with prediction—Random Forest (Wolfram Research 2020)

Fig. 4.81 Time series with prediction 2—Random Forest (Wolfram Research 2020)

4.8 Mathematica: Comparison

A comparison of the time series of actual gold prices in CZK with the time series predicted using seven different methods is shown in Fig. 4.82. It is clear from the figure that the Decision Tree, Linear Regression and Random Forest methods did not come close to the course of the red curve indicating the actual development of the price of gold. Therefore, these methods are not suitable for predicting this time series. Figure 4.83 shows an overview of the individual time series compared to the time series of observed gold prices (actual values), whose curve is marked in red. Neural Networks, Gradient Boosted Trees, Nearest Neighbours and Gaussian Process appear to be quite suitable prediction methods. The


Fig. 4.82 Time series—comparison (Wolfram Research 2020)

smallest extent of residuals was recorded for the Neural Networks method, which thus ranked first in the accuracy of time series prediction. Therefore, it can be said that the Neural Networks method is the most suitable for time series prediction. Predictions for two calendar months using seven different methods are shown in Fig. 4.84. A very similar course of the prediction curve can be observed for the Random Forest and Gaussian Process methods, starting from the middle of the chart. The biggest fluctuations in the predicted prices are shown by the curve of the Nearest Neighbours method. The Decision Tree method can be ruled out as a suitable prediction tool on the basis of its prediction curve. Figure 4.85 shows the individual curves predicted using the seven different methods. Time series statistics for all methods are shown in Fig. 4.86. Prediction statistics for all methods are shown in Fig. 4.87.
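A comparison of this kind, ranking methods by the extent of their residuals, can be reproduced outside Mathematica with a short script. The sketch below is written in Python under stated assumptions: the root mean square of the residuals is used as the ranking criterion, the function names `rmse` and `rank_methods` are hypothetical, and the numbers are toy values for illustration only, not results from the experiments.

```python
import math


def rmse(actual, predicted):
    """Root mean square of the residuals (actual - predicted)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rank_methods(actual, predictions_by_method):
    """Return (method, rmse) pairs sorted from smallest to largest error."""
    scores = {name: rmse(actual, pred) for name, pred in predictions_by_method.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Toy numbers for illustration only; they are not results from the book.
actual = [10.0, 11.0, 12.0, 13.0]
toy_predictions = {
    "method_a": [10.5, 11.5, 12.5, 13.5],  # constant error of 0.5
    "method_b": [12.0, 13.0, 14.0, 15.0],  # constant error of 2.0
}
ranking = rank_methods(actual, toy_predictions)
```

A single summary number hides the shape of the residuals, so a ranking like this is best read alongside the residual plots and histograms of the individual methods.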


Fig. 4.83 Time series and residues—comparison (Wolfram Research 2020)


Fig. 4.84 Prediction—comparison (Wolfram Research 2020)


Fig. 4.85 Prediction—methods individually (Wolfram Research 2020)


Fig. 4.86 Time series statistics—comparison (Wolfram Research 2020)

Fig. 4.87 Prediction statistics—comparison (Wolfram Research 2020)

References

Andone, I., and N.A. Sireteanu. 2009. A combination of two classification techniques for businesses bankruptcy prediction. SSRN Electronic Journal [online]. Available at https://ssrn.com/abstract=1527726
Bajer, L., Z. Pitra, and M. Holeňa. 2015. Benchmarking Gaussian processes and random forests surrogate models on the BBOB noiseless testbed. In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation, 1143–1150. ACM.
Breiman, L. 2001. Random forests. Machine Learning 45: 5–32.
Fotr, J. 2006. Manažerské rozhodování: Postupy, metody a nástroje [Managerial decision making: Procedures, methods and tools]. Prague: Ekopress.
Friedman, J.H. 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis, 367–378.
Garcia, V., E. Debreuve, and M. Barlaud. 2008. Fast k nearest neighbour search using GPU. In Workshops IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 1–6.
Hastie, T. 2009. The elements of statistical learning. New York: Springer Publishing.
Hendl, J. 2004. Přehled statistických metod zpracování dat: analýza a metaanalýza dat [Overview of statistical methods of data processing: analysis and meta-analysis of data]. Prague: Portál.
Hindls, R. 2007. Statistika pro ekonomy [Statistics for economists]. Prague: Professional Publishing.
Isobe, Y., and H. Tamada. 2018. Are identifier renaming methods secure? An evaluation focuses on opcodes using random forest. In 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 322–328.
Klyuchnikov, N., and E. Burnaev. 2020. Gaussian process classification for variable fidelity data. Neurocomputing 397: 345–355.
Liang, D., B. Liu, J. Wang, and L. Ying. 2009. Accelerating SENSE using compressed sensing. Magnetic Resonance in Medicine 62: 1574–1584.
Ma, W., G. Lin, and J.L. Liang. 2020. Estimating dynamics of central hardwood forests using random forests. Ecological Modelling, 419.
Mocnik, F. 2020. An improved algorithm for dynamic nearest-neighbour models. Journal of Spatial Science. https://doi.org/10.1080/14498596.2020.1739575.
Moravčíková, D., A. Križanová, J. Klieštiková, and M. Rypáková. 2017. Green marketing as the source of the competitive advantage of the business. Sustainability, 9(12).
Natekin, A., and A. Knoll. 2013. Gradient boosting machines. Frontiers in Neurobotics, 1–21.
Ramo, R., and E. Chuvieco. 2017. Developing a random forest algorithm for MODIS global burned area classification. Remote Sensing, 9(11).
Rasmussen, C.E., and C.K. Williams. 2006. Gaussian processes for machine learning. The MIT Press.
Sagi, O., and L. Rokach. 2020. Explainable decision forest: Transforming a decision forest into an interpretable tree. Information Fusion 61: 124–138.
Tang, F., and H. Ishwaran. 2017. Random forest missing data algorithms. Statistical Analysis and Data Mining 10 (6): 363–377.
Valášková, K., T. Klieštik, L. Švábová, and P. Adamko. 2018. Financial risk measurement and prediction modelling for sustainable development of business entities using regression analysis. Sustainability, 10(7).
Wolfram Research, Inc. 2020. Mathematica, Version 12.1. Champaign, IL.
Xiao, H., and G. Xu. 2020. Neural decision tree towards fully functional neural graph. Unmanned Systems 8 (3): 203–210.

Chapter 5

Conclusion

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. J. Vrbka, Using Artificial Neural Networks for Timeseries Smoothing and Forecasting, Studies in Computational Intelligence 979, https://doi.org/10.1007/978-3-030-75649-9_5

In this publication, time series were addressed within case studies by means of experiments, with a primary focus on their analysis and subsequent prediction. The publication aimed to provide an overview of predictive models that can be used for predicting selected time series, to apply them, and to evaluate their applicability to the given problem. Here, the time series consisted of daily prices of gold (USD/oz) over a longer period of time. When predictive models are applied in the real world, thorough pre-processing, feature construction and feature selection are necessary. For time series prediction, the case studies used both econometric models and models based on artificial neural networks.

Among traditional statistical methods, one case study used linear regression, one of the basic methods of time series analysis. Regression analysis is among the most widely used methods worldwide. However, in terms of performance and error rate in the case studies of this publication, which deals with the suitability of various predictive models for predicting the time series of the price of gold, linear regression is one of the least suitable methods. This method is unable to work with non-linearity in data, which makes it an unsuitable method for analysing and predicting the price of gold.

Another statistical method used for analysing and predicting the price of gold time series is the Gaussian process. This method is defined by a distribution over all suitable functions, where the characteristics of these functions are determined by covariance functions. The Gaussian process is an interesting technique whose popularity has grown significantly in recent years. This method was used for one case study; when applied to the price of gold time series, its extent of residuals was one of the smallest compared to other methods, and its prediction was very accurate. It is a very accurate and suitable statistical method for time series analysis and prediction.

Another statistical method used within the experiments is exponential smoothing, where the price of gold prediction curve is smoothed. Its application showed that this method is quite suitable for time series prediction at different curve settings. The extent of residuals was not large; however,


the predictions for 44 trading days differ significantly at different curve settings. One of the most widely used methods for time series analysis and prediction is the ARIMA model, which, like exponential smoothing, is an econometric model. ARIMA models are suitable especially for short-term predictions, in situations where the data of explanatory variables are not available or where the model shows poor predictive ability. When the model was applied to the price of gold time series within a case study, it was found to be very suitable, because the residuals sum to nearly zero and the normal distribution of the residuals peaks at zero. This model can thus be used as a predictive model for time series, which is not surprising in the case of the ARIMA model.

As for the methods of artificial intelligence, the first to be mentioned among the data-mining techniques used within the case studies is the method of decision trees, as it is the basis for other methods used in the publication, such as Gradient Boosted Trees or Random Forest. Nevertheless, the Decision Tree method did not appear to be suitable in the experiment comparing models applicable to predicting the price of gold time series. The difference between the actual values and the values predicted using this method was striking at individual data points. Compared with the actual curve of the price of gold, the curve of the predicted values successfully captures the long-term trend at some points of the graph, but in spite of this, the method does not appear to be suitable for predicting the price of gold time series. The same holds for the application of Random Forest to predicting the price of gold. Another data-mining method, Gradient Boosted Trees, appears to be much better. Its principle consists in the gradual creation of models, where each following model is created with respect to the errors of the previous models. The model used within one of the experiments appeared to be very suitable, with accurate prediction and low residuals.

A similar case is the Nearest Neighbours method, referred to as k-NN (k-Nearest Neighbours), which is one of the machine learning methods included among the artificial intelligence methods. It is applied especially for regression and classification. For the experiments carried out within the case studies, this method appears to be suitable, since the predicted time series (the curve) followed the curve of the actual values of the price of gold. The residuals of this model were not large.

The most successful group of methods used within the case studies for time series analysis and prediction appeared to be neural networks. Three types of networks were used, specifically Multi-Layer Perceptron (MLP), Long Short Term Memory (LSTM), and Radial Basis Function (RBF) networks. In the case of RBF networks, the most successful network achieved a performance of more than 98% with 1 input variable (time, given as the date), 1 output variable (price of gold), and a 1-day delay of the time series. The MLP networks show a very similar performance of more than 98% in the training, testing, and validation datasets. The most successful MLP network in terms of time series prediction was the network with 5 input variables (time given as the date, day of the week, day of the month, month, year), 1 output variable


(price of gold), and a 1-day delay of the time series. The best evaluation in terms of the smallest extent of residuals was achieved by LSTM networks. All the aforementioned methods based on neural networks appear to be suitable for predicting the time series of commodity prices. In conclusion, not all methods used are suitable for time series prediction; specifically, not all statistical methods or methods based on artificial intelligence are a suitable tool for it. However, the most suitable methods definitely include those based on neural networks, provided that the networks are set correctly and an appropriate number of variables is used for each group of networks.
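The input configuration described for the most successful MLP network (date, day of the week, day of the month, month, year, and a 1-day delay) can be prepared from a dated price series as in the following Python sketch. It rests on assumptions that are not confirmed by the text: the date is encoded as an ordinal number, the 1-day delay is interpreted as adding the previous observation as a further input, and the function names and the three prices are hypothetical.

```python
from datetime import date, timedelta


def calendar_features(day):
    """Five inputs: ordinal date, day of week, day of month, month, year."""
    return [day.toordinal(), day.isoweekday(), day.day, day.month, day.year]


def build_dataset(days, prices, lag=1):
    """Pair the inputs of each day (calendar features plus the price observed
    `lag` steps earlier) with that day's price as the target."""
    rows = []
    for i in range(lag, len(days)):
        inputs = calendar_features(days[i]) + [prices[i - lag]]
        rows.append((inputs, prices[i]))
    return rows


# Hypothetical prices for three consecutive days (not data from the book).
days = [date(2020, 1, 6) + timedelta(days=i) for i in range(3)]
prices = [1500.0, 1510.0, 1505.0]
rows = build_dataset(days, prices)
```

The first observation is dropped because it has no preceding value, so a series of n observations yields n minus `lag` training rows.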