English Pages 199 [197] Year 2021
Studies in Computational Intelligence 979
Jaromír Vrbka
Using Artificial Neural Networks for Timeseries Smoothing and Forecasting Case Studies in Economics
Studies in Computational Intelligence Volume 979
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence, quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/7092
Jaromír Vrbka, Institute of Technology and Business in České Budějovice, České Budějovice, Czech Republic
ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-3-030-75648-2 ISBN 978-3-030-75649-9 (eBook) https://doi.org/10.1007/978-3-030-75649-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Introduction
The majority of collected data show a certain time structure. In some cases, this structure is hidden; however, there are methods to extract relevant information from the available data using this time structure. Knowledge of time series modelling is a basic skill in the field of data science, since there are specific structures for this data type which can be examined in all situations. Time series analysis enables the examination of the main patterns, such as trends, seasonality, cyclicality, or data irregularities. It is applied, e.g., to the analysis of stock markets, pattern recognition, earthquake prediction, economic predictions, censuses, etc. Time series forecasts use historical empirical regularities to create future projections, which are guided by a theoretical understanding of economic processes. Economic forecasts are used for a wide range of activities, including setting monetary and fiscal policy, state and local budgeting, financial management and financial engineering. The key elements of economic forecasting include selecting the prediction model(s) suitable for a given problem, assessing and communicating the uncertainty related to forecasts, and protecting against model instability. For example, in the case of investments, a time series follows the movement of selected data points, such as a security price, over a specified period of time, with the data points recorded at regular intervals. There is no minimum or maximum period, so the data can be collected in a way that provides the information required by the investor or analyst examining the activity. Time series analysis can be useful for determining how a given asset, security or economic variable changes over time. Time series prediction is most reliable when the data cover a longer period of time.
The information about the conditions can be obtained by measuring the data at various time intervals: hourly, daily, monthly, quarterly, yearly, or at any other time interval. Predictions are most accurate if they are based on a large number of observations over a longer period of time, so that the patterns in the given conditions can be measured. The main advantages of time series analysis include the fact that certain features cannot be captured by ordinary regression models, while time series methods can capture them. Furthermore, certain dependencies cannot be captured by conventional models; this concerns especially factors such as seasonality, where the effects of trends cannot be fully explained by models such as linear regression but can be explained by time series models. This shows that time series analysis is useful when we try to capture aspects that usually change over time. For example, end-of-day price data are very difficult to model if only regression models are used; however, time series analysis offers a wide range of models which have been shown to capture such complex phenomena as the dependence of a variable on its own past values, also known as serial correlation.
The most important advantage of time series analysis consists in its ability to predict future values of the data. There are various methods of obtaining a prediction, e.g. regression models, models based on the decomposition of time series, models that smooth the series by means of a function, or models based on neural networks (machine learning). Before examining machine learning methods for time series, it is recommended to make sure that conventional methods of predicting linear time series cannot be used. Such models may focus on linear relationships; however, they are sophisticated and work well in a wide range of problems, provided that the data are prepared properly and the method is well configured. The conventional methods include, e.g., the autoregressive model (AR), moving average (MA), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), seasonal autoregressive integrated moving average with exogenous regressors (SARIMAX), vector autoregression (VAR), vector autoregressive moving average (VARMA), vector autoregressive moving average with exogenous regressors (VARMAX), simple exponential smoothing (SES), and Holt-Winters exponential smoothing (HWES). Time series predictions have been dominated by linear methods for decades. Linear methods are easy to develop and implement and are relatively easy to understand and interpret. However, it is important to understand the limitations of linear models: they are not able to capture nonlinear relationships in data.
Neural networks, however, have recently emerged as an alternative prediction tool. The naturally nonlinear structure of neural networks is particularly useful for capturing the complex underlying relationships in many real-world problems. Neural networks are perhaps more versatile prediction methods, as they are able to find nonlinear structures in a problem but can also model linear processes. The disadvantages of time series analysis include the fact that the method assumes that past trends will go on forever and that the extrapolation of data based on historical information provides valid results. In fact, sales of products, for example, may be influenced by competition, especially in relation to the availability of new products in the market. A major problem related to time series prediction is the fact that models of the autoregressive integrated moving average (ARIMA) type are not capable of long-term predictions, since they depend on previous values. Time series prediction thus cannot reflect what is happening now, since it does not use up-to-date information. One possibility is to add regressors containing up-to-date information, so that the time series of the variable in question can be combined with other information reflecting what is considered the current period in the research. It is important, however, not to include too many regressors, as their overuse and an overly complex combined model may result in incorrect and confusing predictions. Another problem is the fact that most phenomena that need to be predicted are affected by factors at different levels. Therefore, model components that take seasonal influences into account need to be added.
All methods that could be used for time series prediction have certain advantages and disadvantages, and so do their models; however, what primarily matters is the nature of the data, as the model that proved most efficient for a problem in one situation is not always the most suitable one for another application. It is therefore necessary to consider applying various methods and then to evaluate which of them is the most suitable solution for a given problem. For time series analysis and prediction, the free programming language Python, for example, can be used as a software solution; however, there are several slightly more user-friendly programmes, such as Tibco’s Statistica or Wolfram Mathematica.
Contents
1 Time Series and Their Importance to the Economy
   1.1 Trend
   1.2 Seasonality
   1.3 Non-linearity
   1.4 Conditioned Heteroscedasticity
   References
2 Econometrics—Selected Models
   2.1 Linear Models
   2.2 ARIMA
      2.2.1 Case
   2.3 Exponential Smoothing
      2.3.1 Case
   2.4 Interrupted Time Series
   2.5 Fourier Transformation
      2.5.1 Case
   2.6 Volatility Models
   References
3 Artificial Neural Networks—Selected Models
   3.1 Radial Basis Function Neural Networks (Explanation, Usage)
      3.1.1 Mathematical Background
      3.1.2 Case
   3.2 Multi-Layer Perceptron Neural Networks
      3.2.1 Mathematical Background
      3.2.2 Case
   3.3 Long Short Term Memory Neural Networks
      3.3.1 Mathematical Background
      3.3.2 Case
   References
4 Comparison of Different Methods
   4.1 Neural Networks
      4.1.1 Case
   4.2 Decision Tree
      4.2.1 Case
   4.3 Gaussian Process
      4.3.1 Case
   4.4 Gradient Boosted Trees
      4.4.1 Case
   4.5 Linear Regression
      4.5.1 Case
   4.6 Nearest Neighbours
      4.6.1 Case
   4.7 Random Forest
      4.7.1 Case
   4.8 Mathematica: Comparison
   References
5 Conclusion
Chapter 1
Time Series and Their Importance to the Economy
There is no doubt that data result from everyday human activities and contain useful quantitative information, although only historical. Time series are one example of such data (Zhou et al. 2020). According to Rafsanjani and Samareh (2016), time series, i.e. numerical values of a specific indicator changing over time, constitute an integral part of our lives and exist throughout the real world. Typical examples are EEG records in medicine, data on population development within a particular territory, or economic time series that monitor price inflation, GNP, exports, population and other findings revealing causal relations (Pravilovic et al. 2017). Chandra (2015) argues that it is currently impossible to make crucial economic decisions without thoroughly analysing key economic indicators and their relations. The author suggests that the economy, whether that of a business or a country, relies on time series in planning and in predicting price development and the economic indicators already mentioned. According to Wang and Han (2015), economics strives to describe the development of the monitored economic indicator in previous periods, reveal the causes of fluctuations and predict the future. Rostan and Rostan (2018) claim that prediction plays the central role in analysing time series and presents an intricate task that requires employing conventional methods. The reliability of prediction is also disputable, as the value of indicators depends on various factors, other indicators, conditions and the results of decision-making by people, whose behaviour is hard to predict (Reid et al. 2014). A large number of scientific fields apply time series and analyse them in various studies relating to the economy. Horák et al. (2019) compared the accuracy of smoothing time series through regression analysis and artificial neural networks on exports from the USA to the PRC. The study focused on the application and practical contribution of artificial neural networks. The results showed that ANNs could effectively learn correlations in time series. Vrbka et al. (2019) arrived at analogous findings, exploring the trade balance of the EU countries and the PRC.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 J. Vrbka, Using Artificial Neural Networks for Timeseries Smoothing and Forecasting, Studies in Computational Intelligence 979, https://doi.org/10.1007/978-3-030-75649-9_1
Kayalar et al. (2017) examined the correlation of oil prices, stock market indexes and exchange rates in various economies, divided into developing oil-importing markets and oil exporters. The authors concluded that the exchange rates and stock market indexes of most oil-exporting countries show greater dependence on the oil price than those of developing, oil-importing markets. Baek and Kim (2019) observed how oil prices influence the exchange rates of selected countries of Sub-Saharan Africa (SSA). The authors thoroughly analysed the asymmetric effects of changes in oil prices using a non-linear autoregressive distributed lag model. The results show that changes in oil prices affect real exchange rates asymmetrically in the long term, which means that the fluctuation of real exchange rates reflects growth in oil prices more than their fall. Kodama et al. (2017) applied a recurrent neural network to time series of Bitcoin exchange rates denominated in EUR. The results considered only a single market over a relatively short period, so a more comprehensive survey is required to reach a more valid conclusion. Šrenkel and Smorada (2014) explored the development of the financial sector in Slovakia before, during and after the crisis. The authors compared the data with overall economic development, carefully analysing debt, liquidity and profit ratios in 2007–2012. They concluded that 2009 showed the lowest values, after which there was an improvement in 2010–2011, while 2012 saw another drop. Arlt and Arltová (2003) claim that experts who know the characteristics by which time series can be classified will better understand the specifics of their calculation. The authors further list several groups of time series widely applied in practice: descending and ascending, interval and instantaneous, short-term and long-term, stochastic and deterministic, equidistant and non-equidistant, monetary and naturally expressed, or absolute and derived time series. Wen et al. (2019) state that the course of a time series shows certain specifics, including trend, seasonality, non-linearity and conditioned heteroscedasticity.
1.1 Trend
De Leo et al. (2020) define the term ‘trend’ as long-term changes in the behaviour of a time series, i.e. the marked tendency in the development of the monitored phenomenon. They further refer to it as the result of factors acting in one direction over the long term, such as market conditions, production technology, demographic conditions etc. Proietti and Grassi (2015) argue that a trend can take various forms: declining, growing, smoothing, variable, consistent or steep, and that these forms are liable to change over time, i.e. the trend shows signs of a cycle.
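One common way of making a trend visible is to average out short-term fluctuations with a centred moving average. The following Python sketch is only an illustration under stated assumptions: the function name `moving_average_trend` and the series are hypothetical, and the moving average is a generic smoothing technique rather than a method taken from the cited authors.

```python
def moving_average_trend(values, window):
    """Centred moving average; `window` must be a positive odd integer."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    trend = []
    for i in range(half, len(values) - half):
        segment = values[i - half:i + half + 1]
        trend.append(sum(segment) / window)
    return trend


# Hypothetical series: growth of 2 per period plus a repeating
# disturbance (+1, 0, -1) that averages out over three periods.
disturbance = (1, 0, -1)
series = [2 * t + disturbance[t % 3] for t in range(12)]

trend = moving_average_trend(series, window=3)
steps = [b - a for a, b in zip(trend, trend[1:])]
print(trend)   # smoothed values for periods 1..10
print(steps)   # period-to-period change of the smoothed series
```

Because the window length equals the length of the disturbance cycle, the smoothed series in this constructed example grows by the same amount in every period; with real data the window length is a modelling choice.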
1.2 Seasonality
Song and Chung (2014) claim that modelling seasonality in time series is an essential topic in statistics. Omitting seasonality from the model may result in considerable distortion. If, for instance, we intend to model the average monthly temperature, excluding the seasonal component may bias the forecast values. The term ‘seasonality’ implies a periodical and systematic fluctuation within the time series. Arlt and Arltová (2007) state that the fluctuation takes place within one calendar year and repeats annually in the same or a modified form. The main reasons for these periodical changes are the changing seasons and various human habits. Seasonality, which can be either regular or irregular, is most evident in short-term, high-frequency time series. Seasonality is described by seasonal factors whose sum equals zero.
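The statement that seasonal factors sum to zero can be illustrated by a short Python sketch. It assumes an additive model without trend, in which the factor of each season is the mean of that season minus the overall mean; the function name `seasonal_factors` and the quarterly figures are hypothetical.

```python
def seasonal_factors(values, period):
    """Additive seasonal factors: mean of each season minus the overall mean.

    Assumes the series has no trend and covers complete cycles only.
    """
    if len(values) % period != 0:
        raise ValueError("series length must be a multiple of the period")
    overall_mean = sum(values) / len(values)
    factors = []
    for season in range(period):
        season_values = values[season::period]  # every `period`-th observation
        factors.append(sum(season_values) / len(season_values) - overall_mean)
    return factors


# Hypothetical quarterly observations covering three years.
quarterly = [10, 14, 18, 6, 11, 15, 17, 5, 12, 13, 19, 4]

factors = seasonal_factors(quarterly, period=4)
print(factors)        # [-1.0, 2.0, 6.0, -7.0]
print(sum(factors))   # 0.0
```

With complete cycles the seasonal means average to the overall mean, which is why the factors cancel out. For a series with a trend, the trend would have to be removed first.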
1.3 Non-linearity
Tsay (2005) argues that economic or financial relations are non-linear. The most distinctive traits of almost all economic time series are structural breaks, trend changes and variability. At times, these may lead to significant changes in their autocorrelation structures. The author further claims that non-linearity can consist in different average differences or different average growth coefficients in specific periods. Where conventional linear models fail to capture this behaviour in economic time series, non-linear models can do so (Galka and Ozaki 2001).
1.4 Conditioned Heteroscedasticity
Kokoszka et al. (2017) claim that conditioned heteroscedasticity mostly occurs in financial and economic time series, where the logarithm of the yield is normally distributed with a variance that changes over time. Conditioned heteroscedasticity can be expressed by the following formula:

$$(\ln X_t - \ln X_{t-1})^2 = a + p\,(\ln X_{t-1} - \ln X_{t-2})^2 + u_t \tag{1.1}$$

where:
$X_t$, $X_{t-1}$: values of the time series at time $t$ and at time $t$ shifted by one unit ($X_{t-2}$ analogously by two units)
$a$, $p$: parameters estimated by the least squares method
$u_t$: random element
If parameter p equals zero, no conditioned heteroscedasticity exists in the time series. Tsay (2005) argues that empirical and theoretical analyses of financial time series apply the fundamental principle that logarithms of yields (logarithms of growth coefficients) are normally distributed with a constant mean value and constant variance. This is so because prices cannot take negative values; i.e. the time series are presumed to follow a log-normal distribution, which is encountered more often in financial time series. The author further states that, compared with a normal distribution, the distribution of logarithmic yields is more sharply peaked. Spearman’s rank correlation test is presumably the most applicable method for testing heteroscedasticity.
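The estimation of the parameters a and p by the least squares method can be sketched in Python as follows. The sketch assumes that the regressor is the squared logarithmic difference of the preceding period (the usual ARCH-type form); the function names and the price series are hypothetical, and the prices are constructed so that the relation holds exactly with a = 0.0001 and p = 0.5.

```python
import math


def squared_log_returns(prices):
    """Squared differences of consecutive log prices."""
    return [(math.log(b) - math.log(a)) ** 2 for a, b in zip(prices, prices[1:])]


def ols_simple(x, y):
    """Least-squares intercept and slope of y = a + p * x + u."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("regressor is constant; slope cannot be estimated")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    p = sxy / sxx
    a = mean_y - p * mean_x
    return a, p


def estimate_a_p(prices):
    """Regress each squared log return on the previous squared log return."""
    r2 = squared_log_returns(prices)
    return ols_simple(r2[:-1], r2[1:])


# Hypothetical prices built so that the squared log returns follow
# s_t = 0.0001 + 0.5 * s_(t-1) exactly (no random element).
true_a, true_p = 0.0001, 0.5
s = 0.001
log_price = math.log(100.0)
prices = [100.0]
for _ in range(12):
    log_price += math.sqrt(s)
    prices.append(math.exp(log_price))
    s = true_a + true_p * s

a_hat, p_hat = estimate_a_p(prices)
print(a_hat, p_hat)  # approximately 0.0001 and 0.5
```

An estimate of p close to zero would indicate no conditioned heteroscedasticity; with real data the estimate is subject to sampling error, so a formal significance test would normally accompany it.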
References
Arlt, J., and M. Arltová. 2003. Finanční časové řady: Vlastnosti, metody modelování, příklady a aplikace [Financial time series: Properties, modeling methods, examples and applications]. Prague: Grada Publishing.
Arlt, J., and M. Arltová. 2007. Ekonomické časové řady [Economic time series]. Prague: Grada Publishing.
Baek, J., and H.Y. Kim. 2019. On the relation between crude oil prices and exchange rates in sub-Saharan African countries. A nonlinear ARDL approach. 1–12.
Chandra, R. 2015. Competition and collaboration in cooperative coevolution of Elman recurrent neural networks for time-series prediction. IEEE Transactions on Neural Networks and Learning Systems 26 (12): 3123–3136.
De Leo, F., A. De Leo, G. Besio, and R. Briganti. 2020. Detection and quantification of trends in time series of significant wave heights: An application in the Mediterranean Sea. Ocean Engineering 202.
Galka, A., and T. Ozaki. 2001. Testing for nonlinearity in high-dimensional time series from continuous dynamics. Physica D 158 (1–4): 32–44.
Horák, J., P. Šuleř, and J. Vrbka. 2019. Comparison of neural networks and regression time series when predicting the export development from the USA to PRC. In Proceedings of 6th International Scientific Conference Contemporary Issues in Business, Management and Economics Engineering.
Kayalar, D.E., C.C. Küçüközmen, and A.S. Selcuk-Kestel. 2017. The impact of crude oil prices on financial market indicators: Copula approach. Energy Economics 61: 162–173.
Kodama, O., L. Pichl, and T. Kaizoji. 2017. Regime change and trend prediction for bitcoin time series data. CBU International Conference Proceedings 5: 384–388.
Kokoszka, P., G. Rice, and H.L. Shang. 2017. Inference for the autocovariance of a functional time series under conditional heteroscedasticity. Journal of Multivariate Analysis 162: 32–50.
Pravilovic, S., M. Bilancia, A. Appice, and D. Malerba. 2017. Using multiple time series analysis for geosensor data forecasting. Information Sciences 380: 31–52.
Proietti, T., and S. Grassi. 2015. Stochastic trends and seasonality in economic time series: New evidence from Bayesian stochastic model specification search. Empirical Economics 48 (3): 983–1011.
Rafsanjani, M.K., and M. Samareh. 2016. Chaotic time series prediction by artificial neural network. Journal of Computational Methods in Sciences and Engineering 16 (3): 599–615.
Reid, D., A. Hussain, H. Tawfik, and E. Ben-Jacob. 2014. Financial time series prediction using spiking neural networks. PLoS ONE 9 (8).
Rostan, P., and A. Rostan. 2018. The versatility of spectrum analysis for forecasting financial time series. Journal of Forecasting 37 (3): 327–339.
Song, Q., and J.W. Chung. 2014. On a new framework of autoregressive fuzzy time series models. Industrial Engineering and Management Systems 13 (4): 357–368.
Šrenkel, Ľ., and M. Smorada. 2014. Financial ratios analysis of financial institutions in the Slovak Republic during and after the global financial crisis. In Managing and Modelling of Financial Risks: Proceedings: 7th International Scientific Conference: 8–9th September Ostrava, 793–798. Bratislava, Slovakia: VŠB - Technical University of Ostrava Univ Econ Bratislava, Fac Business Management, Dept Corp Finance.
Tsay, R.S. 2005. Analysis of financial time series, 2nd ed. Hoboken, New Jersey: Wiley Interscience.
Vrbka, J., P. Šuleř, V. Machová, and J. Horák. 2019. Considering seasonal fluctuations in equalizing time series by means of artificial neural networks for predicting development of USA and People's Republic of China trade balance. Littera Scripta 12 (2): 178–193.
Wang, X.Y., and M. Han. 2015. Improved extreme learning machine for multivariate time series online sequential prediction. Engineering Applications of Artificial Intelligence 40: 28–36.
Wen, M., P. Li, L. Zhang, and Y. Chen. 2019. Stock market trend prediction using high-order information of time series. IEEE Access 7: 28299–28308.
Zhou, M.N., J.Z. Yi, J.Q. Yang, and Y. Sima. 2020. Characteristic representation of stock time series based on trend feature points. IEEE Access 8: 97016–97031.
Chapter 2
Econometrics—Selected Models
Econometrics is currently a rapidly developing field of study, covering not only general economics (macroeconomics and microeconomics) but also specialized areas such as financial and spatial economics (Andrikopoulos and Gkountanis 2011). According to Zheng and Hurn (2019), econometrics has recently gained a reputation as fertile ground for research thanks to the availability of reliable information in high-frequency data. The authors further claim that econometrics predominantly consists in systematically seeking quantitative answers to various economic problems, building on the economic theories related to the problem at issue. Econometric techniques and tools may also be applied in areas other than economics, e.g. sociology. The scientific literature provides various, yet conceptually equivalent, definitions of econometrics. Magnusson (2016) declares that econometrics involves unorthodox methods for assessing numerical models of mathematical economics, validating economic theories by modern statistical techniques. Pineda (2006) regards econometrics as a quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference. Geweke et al. (2007) consider econometrics a technique for analysing empirical economic relationships for the ex-post testing of economic theories, forecasting, decision-making and policy assessment. Irrespective of the accepted definition, Pinto (2011) considers econometrics one of the most applicable, accurate and convenient methods in economics. The author further regards the discipline as an integration of findings from economics, statistics and mathematics. It covers different spheres of applied economics: testing economic theories, informing policy makers and predicting future behaviour.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 J. Vrbka, Using Artificial Neural Networks for Timeseries Smoothing and Forecasting, Studies in Computational Intelligence 979, https://doi.org/10.1007/978-3-030-75649-9_2
Andrikopoulos and Gkountanis (2011) argue that the fundamental principle of econometrics consists in creating models of monitored phenomena. A model illustrates a real situation, although it depends on the complexity of the examined phenomenon and cannot set out every detail by the available theoretical means. The model subsequently demonstrates the distinctive traits of the economic phenomenon, disregarding uninteresting features. Econometrics literally means ‘measuring the economy’. The measuring takes place through various econometric models and methods. Bendic et al. (2016) characterize econometric models as economic relationships established between the variables of an interdependent system. Pineda (2006) points out that econometric models involve directly measurable variables. However, econometric theory also deals with non-measurable quantities, which are applicable in specific phases of econometric analysis. They are called dummy variables (dummies). They represent non-economic factors specifying traits of statistical units and thus provide useful information on the principles of econometric phenomena and processes. Examples are characteristics of individuals (nationality, sex, skin colour etc.), seasonal adjustment in time series (seasonal variables), and location (east or west, or individual regions within a state). Given the demanding nature of the method, Magnusson (2016) recommends carrying out econometric modelling with econometric software. It is essential to use a relevant data source. Unreliable, manipulated or distorted information may compromise the reliability of the model. Seeking a valuable data source can be very time-consuming and, at times, impossible. The following text briefly describes econometric models and related methods.
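The seasonal dummy variables mentioned above can be sketched in Python as follows. The sketch assumes quarterly data and the usual convention of omitting one baseline category so that the dummies are not perfectly collinear with the intercept of a regression model; the function name `seasonal_dummies` and the data are hypothetical.

```python
def seasonal_dummies(quarters, baseline=4):
    """Encode quarter labels (1-4) as 0/1 columns, omitting the baseline quarter.

    Each row has three entries, one per non-baseline quarter; the baseline
    quarter is represented by a row of zeros.
    """
    categories = [q for q in (1, 2, 3, 4) if q != baseline]
    return [[1 if q == c else 0 for c in categories] for q in quarters]


# Hypothetical sequence of six consecutive quarters.
quarters = [1, 2, 3, 4, 1, 2]
rows = seasonal_dummies(quarters)
for quarter, row in zip(quarters, rows):
    print(quarter, row)
```

In a regression, the coefficient of each dummy column would then measure the average difference between the given quarter and the baseline quarter. The same encoding applies to other qualitative traits, such as region.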
2.1 Linear Models

Hossain and Ghahramani (2016) argue that linear models, or linear regression models, are used to model (explain the examined values of) a continuous outcome variable (dependent variable) by means of one or more predictors (independent variables). The classical linear regression model is the least complicated type; its parameters are obtained from a simple formula. The input data, however, must comply with stringent requirements. In practice, the available data often fail to meet these prerequisites, as the limited number of situations to which the classical linear model applies cannot cover the complexity of reality. It is, therefore, necessary to relax some requirements of the classical linear model and examine the traits of the newly created models. Linear models include the integrated linear regression model and the generalized linear regression model. The generalized integrated linear regression model is an even more general type of model, comprising both the linear integrated and the generalized linear model (Eguchi 2018).
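To make the "simple formula" of the classical linear regression model concrete, the following minimal sketch fits a line with one predictor by ordinary least squares. The function name `fit_simple_ols` and the data are illustrative assumptions and do not come from the book.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form estimates: slope = S_xy / S_xx, intercept = mean_y - slope * mean_x
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line y = 2 + 3x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, 8.0, 11.0, 14.0]
intercept, slope = fit_simple_ols(x, y)
```

With real data the points do not lie on a line, and the quality of the estimates then depends on the requirements mentioned above (for example uncorrelated errors with constant variance).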
2.2 ARIMA

Pai and Lin (2005) argue that the ARIMA model (also called the Box-Jenkins method), or autoregressive integrated moving average, is one of the most popular techniques of time series prediction, commonly applied thanks to its forecasting ability. ARIMA is, however, a rather complex model imposing stringent requirements on its use. The model results depend largely on the experience, knowledge and decisions of the persons conducting the analysis (Franses 2000). Mélard and Pasteels (2000) state that ARIMA models contribute to understanding time series and enable predictions of their future behaviour. The methods are convenient for short-term forecasting when data on explanatory variables are not available or when a model based on them lacks predictive ability.
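The idea behind ARIMA can be illustrated with its simplest non-trivial case, ARIMA(1,1,0): the series is differenced once (the "integrated" part) and the differences are modelled by a first-order autoregression. The sketch below is only an illustration under simplifying assumptions (no intercept, no moving-average term, least-squares instead of maximum-likelihood estimation); the function names and data are hypothetical, and this is not the procedure used by the software in the experiment.

```python
def difference(series):
    """First differences z_t = y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(z):
    """Least-squares estimate of phi in z_t = phi * z_{t-1} + e_t (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(z, z[1:]))
    den = sum(prev * prev for prev in z[:-1])
    return num / den


def forecast_arima_110(series, steps):
    """Forecast `steps` values ahead with a hand-rolled ARIMA(1,1,0)."""
    z = difference(series)
    phi = fit_ar1(z)
    last_level, last_diff = series[-1], z[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff          # predicted next difference
        last_level = last_level + last_diff  # undo the differencing
        forecasts.append(last_level)
    return phi, forecasts


# Hypothetical series whose increments halve at every step
prices = [100.0, 108.0, 112.0, 114.0, 115.0]
phi, forecasts = forecast_arima_110(prices, steps=2)
```

Choosing the orders of the model and checking its residuals is the part that, as noted above, depends on the experience of the analyst.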
2.2.1 Case

Experiment 1

Experiment 1 deals with the use of the ARIMA method for predicting the price of gold. Figure 2.1 shows the parameter settings of the ARIMA model.
Fig. 2.1 Setup ARIMA (Tibco 2020)
Fig. 2.2 Results ARIMA (a) (Tibco 2020)
Fig. 2.3 Results ARIMA (a) (Tibco 2020)
Fig. 2.4 Forecasts ARIMA (Tibco 2020)
Fig. 2.5 Plot of variable ARIMA (Tibco 2020)
Fig. 2.6 Histogram variable ARIMA (Tibco 2020)
Fig. 2.7 Normal Probability Plot ARIMA (Tibco 2020)
Fig. 2.8 Detrended Normal Probability Plot (Tibco 2020)
Fig. 2.9 Half-Normal Probability Plot ARIMA
Fig. 2.10 Autocorrelation Function ARIMA
Fig. 2.11 Partial Autocorrelation Function (Tibco 2020)

Figures 2.2 and 2.3 show the results of the model settings, and Fig. 2.4 shows the forecasts. Figure 2.5 is a graph of the residuals, i.e. the differences between the original and the predicted values. From the shape of the curve connecting the residuals, it can be concluded that the differences between the predictions and the actual development of the price of gold are not excessively large and that the proposed model could be a suitable one. In the extremes, the residuals reach approximately +80 and −120.

Figure 2.6 shows the distribution of the residuals, which follows the normal (Gaussian) curve. The peak of the curve is correctly placed at the value of 0 CZK; the curve is shifted neither to the left (towards negative values) nor to the right (towards positive values). The sum of the residuals will therefore probably be close to zero, which is very good news for the model used. The small number of residuals and their relatively low values can be considered the most important finding, which also supports the accuracy of the result. Figure 2.7 shows a normal probability plot, where the blue circles represent the plotted values, which should lie as close as possible to the red line. From the figure it can be concluded that there are no excessive deviations from the red line. Figure 2.8 shows a detrended normal probability plot, and the half-normal probability plot is shown in Fig. 2.9.

The graphical expression of the values ρ_k for individual lags k ≥ 0 is called a correlogram. An example of the autocorrelation function correlogram is shown in Fig. 2.10. In the correlogram, the emphasis is mainly on significant autocorrelations, as these suggest between which values of the time series there is a relationship. Figure 2.10 indicates the range within which the correct value can be expected to lie. The figure shows standard errors, autocorrelations, and especially the confidence limits highlighted in red. At lag 5, the value exceeds the confidence limit; at the other lags, the values fit within the confidence limits. The detail of the overlap can be seen in Fig. 2.11.
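The reading of the correlogram described above can be sketched in code: compute the sample autocorrelations and flag the lags that fall outside an approximate confidence band. The band used here, ±1.96/√n, is the usual large-sample white-noise approximation and is an assumption of this sketch; the software used in the experiment may compute its limits differently. Function names and data are hypothetical.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags whose |r_k| exceeds the approximate 95% bound 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(series))
    acf = autocorrelation(series, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# Hypothetical residual-like series that alternates with period 2
residuals = [1.0, -1.0] * 10
acf = autocorrelation(residuals, 2)
flagged = significant_lags(residuals, 2)
```

For well-behaved residuals, most lags should stay inside the band; an isolated exceedance such as the one at lag 5 is then judged in context.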
2.3 Exponential Smoothing

Qiao et al. (2020) point out that smoothing methods are commonly applied to smooth a time series in order to predict its future development. Moosa (2000) states that exponential smoothing is one of the most common techniques for forecasting the exchange rate. The international literature abbreviates the basic variant as SES (simple exponential smoothing); it applies mostly to stationary time series. To calculate the prediction for the upcoming period, we use the following formula (Lawrence et al. 2009):

$$\hat{S}_{k+1} = p\,S_k + (1 - p)\,\hat{S}_k \tag{2.1}$$

where:
$p$ is the smoothing parameter,
$\hat{S}_{k+1}$ is the prediction value for time k + 1,
$\hat{S}_k$ is the prediction value in time k,
$S_k$ is the actual value in time k.

The smoothing parameter takes values in the interval (0, 1). Some authors recommend limiting it to the interval [0.7, 1). Moosa (2000) emphasizes the relation $\hat{S}_1 = S_1$, which means that the first prediction equals the first actual value. Alternatively, the first value is often set to the arithmetic mean of the first six values. Simple exponential smoothing, double exponential smoothing and its generalization, often referred to as Holt's method, belong to the most common types of smoothing.
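The recursion in Eq. (2.1), with the initialization in which the first prediction equals the first actual value, can be sketched as follows. The function name and the data are illustrative assumptions.

```python
def simple_exponential_smoothing(actual, p):
    """Return the predictions for times 1 .. n+1 of a series of n actual values.

    Uses prediction[k+1] = p * actual[k] + (1 - p) * prediction[k].
    """
    if not 0.0 < p < 1.0:
        raise ValueError("smoothing parameter p must lie in (0, 1)")
    predictions = [actual[0]]  # first prediction equals the first actual value
    for value in actual:
        predictions.append(p * value + (1.0 - p) * predictions[-1])
    return predictions


# Hypothetical values; the last prediction is the one-step-ahead forecast
gold = [10.0, 20.0, 30.0]
pred = simple_exponential_smoothing(gold, p=0.5)
```

A larger p makes the predictions follow the most recent actual values more closely, while a smaller p produces a smoother curve.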
2.3.1 Case

Experiment 2

The second experiment focuses on the method of exponential smoothing, i.e. on smoothing the gold price curve and predicting its further course. Exponential smoothing is one of the simple methods used for smoothing and short-term prediction of time series. The weight of a data point in the prediction decreases exponentially with its distance in time from the prediction (the age of the given data point). With this method, it is not necessary to determine the length of the window from which the smoothed value is calculated, because the calculation of smoothed values is, in contrast to moving averages, based on all previous observations. Eight different curve settings were used for this experiment.

The first setting for exponential smoothing can be seen in Fig. 2.12. Figure 2.13 is a graph showing the course of the price of gold, where the blue curve represents the original time series of gold price development and the red curve shows the smoothed time series together with the prediction for 44 trading days. The green curve represents the resulting residuals, i.e. the size of the difference between the original and the smoothed value. The second setting for exponential smoothing can be seen in Fig. 2.14. Figure 2.15 presents a graph showing the blue curve of the original gold price time series compared to the red curve, which represents the smoothed time series with a subsequent prediction for 44 trading days. Again, a green curve showing the size of the residuals can be observed. In this case, the residuals are larger than with the first setting. Figure 2.16 shows the setting for the third exponential smoothing curve.
Fig. 2.12 Setup Smoothing (Tibco 2020)
Fig. 2.13 Exponential smoothing 1 (Tibco 2020)

Figure 2.17 shows a graph in which the course of the gold price can be observed again: the blue curve represents the original time series, and the red curve shows the smoothed time series, including a prediction for 44 trading days. The green curve represents the resulting residuals, i.e. the size of the difference between the original and the smoothed value. The residuals in this graph are very similar to those of the second setting. Figure 2.18 presents the settings for the fourth exponential smoothing curve. Figure 2.19 is a graph showing the original gold price time series compared to the smoothed curve, drawn in red. This curve also includes a prediction for 44 trading days. The green curve again represents the residuals, i.e. the deviations of the smoothed values from the original values.

The settings of the fifth smoothing can be seen in Fig. 2.20. Figure 2.21 shows a graph of the original time series (blue curve) compared to the time series smoothed using exponential smoothing (red curve). In this case too, a prediction was made for 44 trading days, which can be seen in the last part of the graph and is also shown in red. Green shows the resulting residuals, which in this case have a relatively small extent. Figure 2.22 shows the settings for the sixth exponential smoothing curve. Using the settings from Fig. 2.22 produces the graph shown in Fig. 2.23, in which the original gold price time series and the smoothed time series can be observed. In this case, the residuals are relatively large, reaching values beyond −200.

The setting of the next-to-last exponential smoothing can be seen in Fig. 2.24. Figure 2.25 shows a graph of the original time series, drawn in blue, compared to the smoothed time series. There is also a prediction for 44 trading days. The residuals are marked with a green curve and, in the extremes, reach values down to −250. Figure 2.26 shows the last setting of the exponential smoothing method. The last graph concerning the exponential smoothing method is shown in Fig. 2.27. Again, the original time series is drawn as a blue curve, compared to the red curve, which represents the smoothed time series. Here, too, there is a prediction in the form of a red curve for 44 trading days. The residuals are marked with a green curve.
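The statement in this experiment that the weight of a data point decreases exponentially with its age can be made explicit by unrolling the simple exponential smoothing recursion. The derivation below is a sketch that assumes the basic recursion with smoothing parameter $p$ and initial prediction $\hat{S}_1$; the settings actually used in the eight experiments are those shown in the figures.

```latex
\begin{aligned}
\hat{S}_{k+1} &= p\,S_k + (1-p)\,\hat{S}_k \\
              &= p\,S_k + p(1-p)\,S_{k-1} + (1-p)^2\,\hat{S}_{k-1} \\
              &\;\;\vdots \\
              &= p \sum_{j=0}^{k-1} (1-p)^{j}\, S_{k-j} \;+\; (1-p)^{k}\,\hat{S}_1 .
\end{aligned}
```

The observation that is $j$ periods old therefore enters with the weight $p(1-p)^j$, which shrinks geometrically, and every past observation contributes, in contrast to a moving average of fixed length.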
Fig. 2.14 Setup smoothing 2 (Tibco 2020)

2.4 Interrupted Time Series

The literature usually defines a time series as a sequence of observations of a specific monitored phenomenon taken repeatedly (often at regular intervals) over time (Bernal et al. 2017). Turner et al. (2020) consider interrupted time series (ITS) one of the best quasi-experimental designs for assessing the effect of an intervention. Bernal et al. (2017) also state that an ITS study uses the time series of a specific outcome of interest to establish a trend that is "interrupted" by an intervention at a particular moment. Ramsay et al. (2003) state that the ITS design collects data at a larger number of equally spaced time points (annually, monthly, or weekly) before and after the intervention, and that it is essential to know the moment of the intervention. The intervention effects can be assessed through changes in the slope and the level of the time series and through the confidence intervals of the intervention parameters. ITS is applied in various research areas such as health care, economics, political science, marketing research etc.
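The level and slope changes mentioned above are commonly estimated with a segmented regression. The specification below is one standard form, given here as an illustration; the symbols ($t_0$ for the known intervention time, $D_t$ for the intervention indicator) are assumptions of this sketch and are not notation from the cited authors.

```latex
Y_t = \beta_0 + \beta_1\, t + \beta_2\, D_t + \beta_3\,(t - t_0)\, D_t + \varepsilon_t,
\qquad
D_t =
\begin{cases}
0, & t < t_0,\\
1, & t \ge t_0.
\end{cases}
```

Here $\beta_1$ is the slope before the intervention, $\beta_2$ is the change in level at the moment of the intervention, and $\beta_3$ is the change in slope afterwards; confidence intervals for $\beta_2$ and $\beta_3$ quantify the intervention effect.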
2.5 Fourier Transformation

The Fourier transformation is one of the principal instruments for processing and analysing signals. The method is currently highly popular, among other things because it speeds up calculations with large numbers, the solution of differential equations and image processing. The Fourier transformation consists in transferring a signal from the time domain to the frequency domain, i.e. in analysing its frequency spectrum (the signal content).
Fig. 2.15 Exponential Smoothing (Tibco 2020)
The literature also uses the term inverse Fourier transformation (the inverse of the original method), which transfers the signal from the frequency domain back to the time domain (Radchenko and Stefanska 2016).
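For reference, the transformation and its inverse can be written as the following pair. This uses one common convention (ordinary frequency $f$, no normalizing constants); other texts use angular frequency and different scaling, so the exact form is an assumption of this sketch.

```latex
X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt,
\qquad
x(t) = \int_{-\infty}^{\infty} X(f)\, e^{2\pi i f t}\, df .
```

The first integral moves the signal $x(t)$ from the time domain to the frequency domain, and the second one moves the spectrum $X(f)$ back.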
Fig. 2.16 Setup Smoothing 3 (Tibco 2020)
Fig. 2.17 Exponential Smoothing 3 (Tibco 2020)
Fig. 2.18 Setup Smoothing 4 (Tibco 2020)
Fig. 2.19 Exponential Smoothing 4 (Tibco 2020)
Fig. 2.20 Setup Smoothing 5 (Tibco 2020)
Fig. 2.21 Exponential smoothing 5 (Tibco 2020)
Fig. 2.22 Setup Smoothing 6 (Tibco 2020)
Fig. 2.23 Exponential Smoothing 6 (Tibco 2020)
Fig. 2.24 Setup Smoothing 7 (Tibco 2020)
Fig. 2.25 Exponential Smoothing 7 (Tibco 2020)
Fig. 2.26 Setup Smoothing 8 (Tibco 2020)
Fig. 2.27 Exponential Smoothing 8 (Tibco 2020)
Fig. 2.28 Setup single series Fourier analysis results (Tibco 2020)
Fig. 2.29 Spectral analysis 1 Periodogram Values Frequency (Tibco 2020)
Fig. 2.30 Spectral analysis 2 Log-periodogram Frequency (Tibco 2020)
Fig. 2.31 Spectral analysis 3 Spectral Density Frequency (Tibco 2020)
Fig. 2.32 Spectral analysis 4 Log-spectral analysis Frequency (Tibco 2020)
Fig. 2.33 Spectral analysis 5 cosine coefficients frequency (Tibco 2020)

2.5.1 Case

Experiment 3

The third experiment involves the Fourier transform method. This procedure makes it possible to describe the periodic behaviour of the time series explicitly and to identify significant components that contribute to the properties of the investigated process. Any time series can be expressed as a combination of cosine (or sine) waves with different periods (how long it takes to complete a whole cycle) and amplitudes (maximum/minimum value during the cycle). This fact can be used to study periodic (cyclical) behaviour in a time series. Figure 2.28 shows the settings of the results of the Fourier analysis of a single time series. The graphs in Figs. 2.29, 2.30, 2.31, 2.32 and 2.33 indicate the frequencies of the residual values based on the use of three time series adjustment options.

The periodogram shown in Fig. 2.29 can be described as a time-limited realization of a periodic random process. In this case, it shows the frequency of the residual values, i.e. the deviations from the actual measured values. In the first case, the Periodogram Values method was used. The periodogram in Fig. 2.30 shows the frequency of the residual values, i.e. the deviations from the curve of actual values; here the log-periodogram of the time series was calculated for non-negative Fourier frequencies. Figure 2.31 also shows the frequency of residual values, using the second time series adjustment method; in this case, the Spectral Density method was used. Figure 2.32 also shows residual values, using the second time series adjustment method. In this case, the Spectral Density method was used, and a log-spectral analysis of the time series for non-negative Fourier frequencies was performed. In the case of Fig. 2.33, cosine coefficients were used for the spectral analysis. Here, the values even become negative, because they show the dispersion of the residuals. In Fig. 2.33, a range with boundary points (extremes) from −50 to 50 can be observed, with the exception of two separate circles, which show a very large deviation.

In the area of time series analysis called spectral analysis, we see the time series as a sum of cosine waves with different amplitudes and frequencies. One of the goals of the analysis is to identify important frequencies (or periods) in the observed series. The initial tool for this is the periodogram. The periodogram shows a measure of the relative importance of possible frequency values that could explain the oscillating pattern of the observed data. In Fig. 2.34, the Periodogram Values method was used to display the degree of relative importance of possible period values. In Fig. 2.35, the Spectral Density method was used to show the degree of relative importance of the possible values of the period. Figure 2.36 shows the use of the Log Periodogram method, which was also used to display the degree of relative importance of possible period values.
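The role of the periodogram as a measure of the relative importance of candidate frequencies can be sketched with a direct (slow) discrete Fourier transform. The scaling |coefficient|² / n used below is one common convention and is an assumption of this sketch; statistical packages may scale periodogram values differently. Function names and data are hypothetical.

```python
import cmath
import math


def periodogram(series):
    """Return (frequencies, values) at the Fourier frequencies j/n, j = 1 .. n//2."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]  # remove the mean (zero frequency)
    freqs, values = [], []
    for j in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * math.pi * j * t / n)
            for t, c in enumerate(centered)
        )
        freqs.append(j / n)
        values.append(abs(coeff) ** 2 / n)
    return freqs, values


# Hypothetical series: a cosine wave completing 3 cycles in 24 observations
n = 24
wave = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
freqs, values = periodogram(wave)
peak_frequency = freqs[values.index(max(values))]
peak_period = 1 / peak_frequency  # length of one cycle in observations
```

Plotting the values against frequency corresponds to the frequency-based graphs, and plotting them against 1/frequency corresponds to the period-based ones.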
2.6 Volatility Models

Wu and Wang (2020) point out that volatility is a fundamental concept expressing instability and fluctuation that change over time. It is mostly applied to measuring risk and to assessing the variability of the examined quantity, and it relates to the conditional variance. As volatility is not an observable quantity, Horvath et al. (2020) emphasize the stringent requirements for its estimation and for the subsequent comparison of related models. Practice works with a wide range of volatility models, each striving to capture the distinctive features of volatility. Historical volatility models and subsequent ARIMA processes were the first to serve these purposes.
Fig. 2.34 Spectral analysis 6 Periodogram Values Period (Tibco 2020)
Fig. 2.35 Spectral analysis 7 Spectral Density Period (Tibco 2020)
Fig. 2.36 Spectral analysis 8 Log Periodogram Period (Tibco 2020)
These techniques, however, were not used for long, as ARCH models soon emerged, followed by various modifications of these methods. The GARCH, EGARCH and GJR-GARCH models enjoy widespread popularity today.
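To illustrate what "conditional variance" means in this family of models, the sketch below evaluates the variance recursion of a GARCH(1,1) model for given parameters. The parameter values and the data are hypothetical and are not estimated from anything; in practice the parameters are obtained by maximum likelihood, which is not shown here.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    # Start the recursion at the unconditional (long-run) variance
    sigma2 = [omega / (1.0 - alpha - beta)]
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2


# Hypothetical returns: a single shock in the second period
returns = [0.0, 2.0, 0.0]
sigma2 = garch11_variance(returns, omega=0.1, alpha=0.1, beta=0.8)
```

The shock raises the variance of the following period, after which the variance decays back towards its long-run level; this clustering is what distinguishes these models from a constant-variance ARIMA model.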
References

Andrikopoulos, A.A., and D.C. Gkountanis. 2011. Issues and models in applied econometrics: A partial survey. South-Eastern Europe Journal of Economics 2: 107–165.
Bendic, V., C. Mohora, D. Tilina, E. Turcu, and A. Nita. 2016. Multifunctional econometrics models of turnover dynamics of using primary factors of the economic process. In Vision 2020: Innovation Management, Development Sustainability, and Competitive Economic Growth.
Bernal, J.L., S. Cummins, and A. Gasparrini. 2017. Interrupted time series regression for the evaluation of public health interventions: A tutorial. International Journal of Epidemiology 46 (1): 348–355.
Eguchi, S. 2018. Model comparison for generalized linear models with dependent observations. Econometrics and Statistics 5 (1): 171–188.
Franses, P.H.B.F. 2000. The econometric modelling of financial time series. International Journal of Forecasting 16 (3): 426–427.
Geweke, J.F., J.L. Horowitz, and H. Pesaran. 2007. Econometrics: A bird's eye view. The New Palgrave Dictionary.
Horvath, B., A. Jacquier, and P. Tankov. 2020. Volatility options in rough volatility models. SIAM Journal on Financial Mathematics 11 (2): 437–469.
Hossain, S., and M. Ghahramani. 2016. Shrinkage estimation of linear regression models with GARCH errors. Journal of Statistical Theory and Applications 15 (4): 405–423.
Lawrence, K.D., R.K. Klimberg, and S.M. Lawrence. 2009. Fundamentals of Forecasting Using Excel. New York: IP.
Magnusson, L.M. 2016. Econometrics in a formal science of economics: Theory and measurement of economic relations. Economic Record 92 (298): 509–511.
Mélard, G., and J.M. Pasteels. 2000. Automatic ARIMA modeling including interventions, using time series expert software. International Journal of Forecasting 16 (4): 497–508.
Moosa, I.A. 2000. Exchange rate forecasting: Techniques and applications. London: Macmillan Business.
Pai, P., and C. Lin. 2005. A hybrid ARIMA and support vector machines model in stock price forecasting. Omega 33 (6): 497–505.
Pineda, J.A.P. 2006. Spatial econometrics and regional science. Investigacion Economica 65 (258): 129.
Pinto, H. 2011. The role of econometrics in economic science. An essay about the monopolization of economic methodology by econometrics methods. The Journal of Socio-Economics 40 (4): 436–443.
Qiao, L., D. Liu, X. Yuan, Q. Wang, and Q. Ma. 2020. Generation and prediction of construction and demolition waste using exponential smoothing method: A case study of Shandong Province China. Sustainability 12.
Radchenko, V.M., and N.O. Stefanska. 2016. Fourier transformation of general stochastic measures. Theory of Probability and Mathematical Statistics 94: 143–149.
Ramsay, C.R., L. Matowe, R. Grilli, J.M. Grimshaw, and R.E. Thomas. 2003. Interrupted time series designs in health technology assessment: Lessons from two systematic reviews of behaviour change strategies. International Journal of Technology Assessment in Health Care 19 (4): 613–623.
Tibco Statistica, v. 14.0.0. 2020. TIBCO Software Inc, Palo Alto, CA USA. Available from: https://docs.tibco.com/products/tibco-statistica
Turner, S.L., A. Karahalios, A. Forbes, M. Taljaard, J.M. Grimshaw, E. Korevaar, A.C. Cheng, L. Bero, and J.E. McKenzie. 2020. Creating effective interrupted time series graphs: Review and recommendations. Research Synthesis Methods.
Wu, X., and X. Wang. 2020. Forecasting volatility using realized stochastic volatility model with time-varying leverage effect. Finance Research Letters 34.
Zheng, X., and S. Hurn. 2019. Editorial for the special issue on financial econometrics. China Finance Review International 9 (3): 310–311.
Chapter 3
Artificial Neural Networks—Selected Models
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 J. Vrbka, Using Artificial Neural Networks for Timeseries Smoothing and Forecasting, Studies in Computational Intelligence 979, https://doi.org/10.1007/978-3-030-75649-9_3

There is no doubt that artificial neural networks have been attracting the attention of a large number of scientists, experts and laypeople from all over the world for many years. This is due to their ability to imitate the central nervous systems of living organisms far better than previous techniques (Guresen and Kayakuthula 2011). The artificial neural network is currently an essential concept, which Vochozka (2017) considers an algorithm imitating the human brain. The author further argues that artificial neural networks rank amongst non-parametric, flexible instruments that use historical data to predict the future. Hamid and Habib (2014) define neural networks as a flexible and reliable structure composed of principal and dominant elements, the neurons. They allow the connection of various neural inputs and outputs, benefits and inhibition, while minimizing the influence of a defective neuron on the result. Vochozka et al. (2016) regard neural networks as a computer system based on interconnected units whose outputs result from processing the input data.

Klieštik (2013) argues that artificial neural networks are currently one of the most popular techniques, applicable in a wide range of fields. Apart from artificial intelligence, Ashoori and Mohammadi (2011) point out social and natural sciences, neuroscience, linguistics, technology, cognitive science and the business environment, e.g. modelling costs, enterprise valuation or bankruptcy prediction. Altun et al. (2007) state that ANNs rank amongst the most attractive methods of operational research and informatics; their practical application extensively exploits up-to-date technologies. Pao and Kaykutlu (2008) claim that ANNs are mostly suitable for analytically intensive, hard-to-identify operations or for modelling complex strategic decisions. The authors further suggest that the structure and activity of the biological neural network are the cornerstones of designing artificial neural networks. Bas et al. (2016) examined the structure of artificial neural networks, arguing that their type and geometry determine their composition. The system comprises three main constituents: artificial neurons, the interconnections between them, and the neural network layers. The authors suggest that artificial neural networks consist of interconnected artificial neurons passing on signals that are further transformed through specific transfer functions.

Considering the pros and cons of ANNs, Sayadi et al. (2012) see advantages in their ability to generalize, to learn from examples and to uncover hidden non-linear dependences. Mileris and Boguslauskas (2011) state that neural networks are mostly applied to analyses of large amounts of data against sets of criteria. The system works on a similar principle as a human. People, however, are not always able to process and interpret such an amount of information, whereas artificial neural networks do not overlook the established criteria or commit a human error. Kiruthka and Dilsha (2015) consider artificial neural networks a fast system allowing the identification of complex relations among individual input data. These relations are not linear and require more than correlation and regression analysis. Echávarri et al. (2014) argue that the pros of artificial neural networks include carrying out each task carefully and being able to work with incomplete data. Liu et al. (2004) observe that the structures use classical algorithms, allowing parallel calculations in several neurons at a time. Other advantages include the ability to learn, generalize or produce new data etc. The listed benefits are relevant mostly to the business environment.

Rowland and Vrbka (2016) see disadvantages in illogical behaviour and the requirement for high-quality data. Echávarri et al. (2004), in turn, see the main drawbacks in the time-consuming preparation of the network, which involves learning, and in the strict requirements for handling the input data. Slavici et al. (2012) point out the extreme sensitivity to the organization, preparation and overall configuration. The method requires high computing power and a longer processing time. According to Lahsasna et al. (2008), other cons of ANNs include easy overtraining of the network, causing poor prediction results. The authors further suggest that it is essential to estimate the right moment to end the training. Practice employs a large number of artificial neural network types. The following text presents three frequently applied structures.
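One common way of estimating the right moment to end the training is early stopping on a validation set: training is halted once the validation error has not improved for a given number of epochs. The text does not say which rule the cited authors have in mind, so the sketch below is only an illustration; the function name, the `patience` parameter and the error values are hypothetical.

```python
def early_stopping_epoch(validation_errors, patience):
    """Return the epoch with the lowest validation error seen before stopping.

    Scanning stops once `patience` consecutive epochs bring no improvement.
    """
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error, waited = epoch, error, 0
        else:
            waited += 1
            if waited >= patience:
                break  # the network is assumed to have started overtraining
    return best_epoch


# Hypothetical validation errors recorded after each training epoch
val_errors = [5.0, 4.0, 3.5, 3.6, 3.7, 3.4, 3.9]
stop_short = early_stopping_epoch(val_errors, patience=2)
stop_long = early_stopping_epoch(val_errors, patience=3)
```

A small patience stops early and may miss a later improvement, while a large one trains longer; the weights saved at the returned epoch are the ones kept.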
3.1 Radial Basis Function Neural Networks (Explanation, Usage)

Pazouki et al. (2015) argue that the RBF (radial basis function) network is composed of three layers with feed-forward signalling and supervised learning (learning with a teacher). The system can learn very quickly and express the degree of similarity between the model and the prototype. In contrast to other networks, the structure employs radial basis functions as activation functions. Jingfei et al. (2016) consider RBF neural networks a simple method with a strong generalizing ability. Gubana (2015) observes that the network topology consists of the input layer, the hidden layer and the output layer, which comprises linear processing units. Guan et al. (2016) emphasize the considerable difficulty of determining the number of neurons in the hidden layer of the RBF network. Experience and previous findings point to the necessity of repeatedly testing the number of neurons. Apart from that, there are doubts as to whether RBF neural networks converge. Hashemiho and Aghamohammadí (2013) see the basic concept of RBF in transforming the input data into a higher-dimensional space. RBF networks are applied in approximating functions, predicting time series, and in clustering or classification tasks (3D object modelling, data fusion, image recovery, interpolation, language identification, chaotic time series modelling etc.) (Pazouki et al. 2015).
3.1.1 Mathematical Background

According to Olej (2003), RBF (radial basis function) networks are so-called three-layer neural networks. The first layer (input layer) is intended for the transmission of input values. The second one is the hidden layer, which consists of RBF units implementing the individual radial basis functions. The last (third) layer is the output layer, which usually has a linear character. Like MLP, RBF neural networks are feed-forward neural networks: the signal is transmitted from the inputs through the hidden layer to the output layer. For good network functionality, it is necessary that all neurons are connected across all layers. This means that neurons in adjacent layers are interconnected so that the output of one neuron is distributed to the inputs of the neurons in the following layer. Gubana (2015) adds that the parameters of a neural network are the numbers of neurons and layers, which depend on the nature of the problem at hand.

Pazouki et al. (2015) state that RBF networks use a radial basis function as the activation function and that their basic principle is learning with a teacher. The authors also claim that these networks are natural for approximation, mainly because they approximate by means of functions that affect the resulting function only around the centre of the relevant RBF neuron, not over its whole domain. The application of RBF networks is also appropriate for classification, since in many cases a certain group of input vectors belongs to one of the classes sought by means of the neural network. According to Haykin (2001), the RBF neurons are in the hidden layer. The author further adds that radial basis functions form a special group of mathematical functions that increase or decrease monotonically with increasing distance from the centre.

A radial basis function can be described using the following formula:

$$h_i(x) = \phi\!\left(\frac{(x - c)^{T}\,(x - c)}{|s_c|}\right) \tag{3.1}$$

where:
$x$ is the input vector,
$c$ is the centre of the radial basis function,
$s_c$ is the radius of the radial basis function.

If only a one-dimensional input vector is entered, the formula is as follows:

$$h_i(x) = \phi\!\left(\frac{x - c}{s_c}\right) \tag{3.2}$$

The radial basis function $\phi$ can be, e.g., a linear, cubic, multiquadric, or Gaussian function. Figure 3.1 shows an example of an RBF network topology.

Fig. 3.1 RBF network (Moreno et al. 2011)
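The three-layer structure described above can be sketched as a forward pass: each hidden unit evaluates a radial basis function of the distance between the input and its centre, and the output layer combines the hidden activations linearly. The Gaussian form exp(−‖x − c‖² / s²) used here is one common parametrization and is an assumption of this sketch; all names, centres, radii and weights are hypothetical.

```python
import math


def gaussian_rbf(x, centre, radius):
    """Gaussian unit exp(-||x - c||^2 / radius^2) (assumed parametrization)."""
    squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-squared_distance / radius ** 2)


def rbf_network_output(x, centres, radii, weights, bias):
    """Linear (identity) output layer on top of the hidden RBF activations."""
    hidden = [gaussian_rbf(x, c, s) for c, s in zip(centres, radii)]
    return bias + sum(w * h for w, h in zip(weights, hidden))


# Hypothetical 1-2-1 network: one input, two hidden RBF units, one output
centres = [[0.0], [2.0]]
radii = [1.0, 1.0]
weights = [3.0, -1.0]
at_first_centre = rbf_network_output([0.0], centres, radii, weights, bias=0.5)
```

Each hidden unit responds strongly only near its own centre, which is the local behaviour around the core of an RBF neuron mentioned in the text.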
3.1.2 Case

The method used for these experiments was the radial basis function (RBF) neural network; specifically, it was used for predicting the development of the gold price time series. The dataset was divided into three basic groups: training, testing, and validation datasets.

Experiment 1

1 input variable (time, i.e. date), 1 output variable (price of gold), time series lag 1 day
The first experiment includes one input variable—time (date), one output variable, which is, in this case, price of gold, and time series lag of 1 day. Table 3.1 shows the basic overview of the neural networks used for predicting the development of price of gold. There are listed five most efficient and most successful networks (with the
Table 3.1 Overview of networks: RBF (Tibco 2020)

| Index | Net name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RBF 1-24-1 | 0.985283 | 0.984936 | 0.983279 | 1458.937 | 1466.988 | 1500.297 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 1-25-1 | 0.984290 | 0.984337 | 0.983711 | 1556.676 | 1523.381 | 1461.534 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 1-30-1 | 0.985971 | 0.984687 | 0.985232 | 1391.248 | 1488.792 | 1326.705 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 1-28-1 | 0.985352 | 0.985628 | 0.984634 | 1452.197 | 1396.228 | 1379.799 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 1-22-1 | 0.984569 | 0.984367 | 0.983187 | 1529.233 | 1517.664 | 1508.264 | RBFT | SOS | Gaussian | Identity |
3.1 Radial Basis Function Neural Networks (Explanation, Usage) 39
best characteristics). Note that the network performance is about 0.98 (98%), which is considered very good. In all cases, the same training algorithm (RBFT) was used, the sum of squares (SOS) served as the error function, and the Gaussian function was used for the activation of the hidden layer of neurons. The Identity function was used for the activation of neurons in the output layer. The weights of the neural networks are listed in Table 3.2; because of the table's large extent, only a part of it is shown for demonstration, namely the first four weight IDs. For all five networks, correlation coefficients were determined for the individual data subsets. A high correlation between two processes indicates that they are likely to be mutually dependent. Table 3.3 shows that the correlation coefficients are high for all networks, at around 0.98. The coefficients should be as high as possible (as close to 1 as possible) and, ideally, identical for each network across all datasets (training, testing, and validation). The differences between the values of the neural networks presented in Table 3.3 are minimal. Table 3.4 shows the prediction statistics by individual neural network and dataset. For the maximum prediction on the training, testing, and validation datasets, the values are very similar for all networks except for 3. RBF 1-30-1. The same applies to the minimum prediction. As for the maximum and minimum residuals, the smallest possible difference between the maximum and the minimum is required. From the perspective of residuals, the best network is 2. RBF 1-25-1, which is characterized by the smallest residuals. The data statistics are given in Table 3.5.
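The quantities discussed above can be sketched in a few lines. The example below computes a correlation coefficient between targets and predictions, the residuals (taken here as target minus prediction), and a standardized residual. Dividing a residual by the square root of the reported error is an assumption: it is not a documented definition, but it reproduces the tabulated values of this experiment to three decimals. The function names are hypothetical; the numbers 286.407 and 1391.248 are the maximum training residual and the training error reported for network 3. RBF 1-30-1.

```python
import math

def pearson_r(targets, predictions):
    """Correlation coefficient between observed and predicted values."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)

def residuals(targets, predictions):
    """Residual = observed value minus predicted value."""
    return [t - p for t, p in zip(targets, predictions)]

def standardized_residual(residual, error):
    """Residual scaled by the square root of the network error (assumed relation)."""
    return residual / math.sqrt(error)

# Maximum training residual and training error of network 3. RBF 1-30-1
max_std_res = standardized_residual(286.407, 1391.248)
print(round(max_std_res, 3))
```

A network whose minimum and maximum residuals lie close together, and whose correlation coefficients are close to 1 on all three datasets, is preferred under the criteria described in the text.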
Figure 3.2 shows the course of the price of gold in USD per troy ounce (marked blue) compared with the predictions of the five retained networks. The sample comprises the training, testing, and validation datasets. Figure 3.3 shows the time series prediction for USD/oz by network 1. RBF 1-24-1, with the testing dataset used as the sample. Figure 3.4 shows the corresponding prediction by network 2. RBF 1-25-1, again on the testing dataset. Figure 3.5 shows the prediction by network 3. RBF 1-30-1 on the testing dataset; the test is marked red, and the network is represented by a red curve. Figure 3.6 shows the prediction by network 4. RBF 1-28-1 on the testing dataset; the test is marked blue, and the network is represented by a red curve. Figure 3.7 compares the course of the testing dataset with the result provided by network 5. RBF 1-22-1. In comparison with the testing dataset, all networks show very good results. In this case, all of them could be used for prediction while ensuring high performance
0.306415 Date-1 → hidden neuron 1
0.717736 Date-1 → hidden neuron 2
0.682075 Date-1 → hidden neuron 3
0.408113 Date-1 → hidden neuron 4
Date-1 → hidden neuron 1
Date-1 → hidden neuron 2
Date-1 → hidden neuron 3
Date-1 → hidden neuron 4
2
3
4
0.248113 Date-1 → hidden neuron 4
0.371509 Date-1 → hidden neuron 3
0.693962 Date-1 → hidden neuron 2
0.251321 Date-1 → hidden neuron 4
0.020000 Date-1 → hidden neuron 3
0.600000 Date-1 → hidden neuron 2
0.452830 Date-1 → hidden neuron 4
0.495283 Date-1 → hidden neuron 3
0.172830 Date-1 → hidden neuron 2
0.654151
0.438679
0.000566
0.698491
Connections 5. RBF Weight 1-22-1 values 5. RBF 1-22-1 0.358491 Date-1 → hidden neuron 1
Connections 4. RBF Weight 1-28-1 values 4. RBF 1-28-1
0.388679 Date-1 → hidden neuron 1
Connections 3. RBF Weight 1-30-1 values 3. RBF 1-30-1
0.211698 Date-1 → hidden neuron 1
Connections 2. RBF Weight 1-25-1 values 2. RBF 1-25-1
1
Weight Connections 1. RBF Weight ID 1-24-1 values 1. RBF 1-24-1
Table 3.2 Weights of networks—experiment 1 RBF (Tibco 2020)
Table 3.3 Correlation coefficients: RBF NN (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. RBF 1-24-1 | 0.985283 | 0.984936 | 0.983279 |
| 2. RBF 1-25-1 | 0.984290 | 0.984337 | 0.983711 |
| 3. RBF 1-30-1 | 0.985971 | 0.984687 | 0.985232 |
| 4. RBF 1-28-1 | 0.985352 | 0.985628 | 0.984634 |
| 5. RBF 1-22-1 | 0.984569 | 0.984367 | 0.983187 |
of the network. In terms of the minimum residuals, the best network appears to be 2. RBF 1-25-1.
Experiment 2
1 input variable (time, i.e. date), 1 output variable (price of gold), time series lag 5 days
The second experiment includes one input variable, time (date), one output variable, the price of gold, and a time series lag of five days. Table 3.6 gives a basic overview of the neural networks used for predicting the development of the price of gold; it lists the five most efficient and most successful networks (those with the best characteristics). Note that the performance of the networks is only about 20–35% in this case (and below 10% on the training and testing datasets for the first network), which is a very bad result compared with the roughly 98% achieved in the previous experiment. In all cases, the same training algorithm (RBFT) is used, the sum of squares (SOS) is the error function, and the Gaussian function is used for the activation of the hidden layer of neurons. The Identity function is used for the activation of neurons in the output layer. The weights of the neural networks are shown in Table 3.7. Because of the table's large extent, only a sample is shown, namely the first four weight IDs. For all five neural networks, correlation coefficients were determined for the individual data subsets; a high correlation between two processes indicates that they are likely to be mutually dependent. Table 3.8 shows that the correlation coefficient is low for all networks: none of the coefficients exceeds 0.5, and in one case (network 1. RBF 5-22-1) the values for the training and testing datasets are even lower than 0.1, which indicates very low performance. The correlation coefficients should be as high as possible (as close to 1 as possible) and, ideally, identical for each network across all datasets (training, testing, and validation).
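How the time series lag enters the network is not spelled out here. The network names (for example RBF 5-22-1 for the five-day lag) suggest that the lag determines the number of input neurons. Under that assumption, each sample pairs the target value with the preceding `lag` values of the input variable. The sketch below illustrates this windowing; the function name and the toy values are hypothetical and are not taken from the dataset.

```python
def make_lagged_samples(input_series, target_series, lag):
    """Pair each target value with the `lag` preceding input values."""
    if len(input_series) != len(target_series):
        raise ValueError("input and target series must have equal length")
    samples = []
    for t in range(lag, len(target_series)):
        window = list(input_series[t - lag:t])  # inputs at steps t-lag .. t-1
        samples.append((window, target_series[t]))
    return samples

# Illustrative values only (serial day numbers and prices are made up)
dates = [38720, 38721, 38722, 38723, 38726, 38727, 38728]
prices = [530.0, 533.5, 531.0, 538.0, 541.5, 545.0, 543.0]
samples = make_lagged_samples(dates, prices, lag=5)
```

With seven observations and a lag of five, only two complete samples can be formed; a longer lag therefore also reduces the number of usable training cases.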
The differences between the values of the datasets for one network are minimal (except for one network); however, the performance is not sufficient. Prediction statistics for the individual networks on the training, testing, and validation datasets are given in Table 3.9, and the data statistics in Table 3.10. Table 3.9 also shows that the values are not satisfactory, and the poor functionality
1736.028 −171.607 222.109 −189.760
1705.554
−218.360
236.738
−169.120
Maximum prediction (validation)
Minimum residual (train)
Maximum residual (train)
Minimum residual (test)
4.494
Maximum standard residual (validation)
3.985
5.585 −3.878
6.116
−5.393
Minimum standard residual (validation)
−4.862
−4.416
Minimum standard residual (test)
Maximum standard residual (test)
5.629
−4.349
−5.717
6.198
Maximum residual (validation)
Maximum standard residual (train)
152.346
174.060
Minimum residual (validation)
Minimum standard residual (train)
217.991 −148.253
234.254
−208.887
Maximum residual (test)
558.860
1735.661
558.960
1706.304
599.784
Minimum prediction (test)
558.861 1736.064
599.839
1706.367
Maximum prediction (train)
Minimum prediction (validation)
599.783
Minimum prediction (train)
2. RBF 1-25-1
Maximum prediction (test)
1. RBF 1-24-1
Statistics
Table 3.4 Predictions statistics—RBF NN (Tibco 2020)
6.086
−5.412
7.396
−4.140
7.679
−5.054
221.675
−197.139
285.369
−159.727
286.407
−188.519
1871.601
529.154
1691.276
532.364
1924.269
528.905
3. RBF 1-30-1
6.805
−5.642
6.092
−4.862
7.041
−5.457
252.781
−209.565
227.624
−181.682
268.306
−207.971
1713.824
539.545
1714.812
539.446
1714.813
539.406
4. RBF 1-28-1
5.341
−4.168
5.672
−4.188
5.692
−5.572
207.416
−161.857
220.972
−163.171
222.579
−217.905
1695.424
565.769
1695.428
565.971
1695.447
565.750
5. RBF 1-22-1
Table 3.5 Data statistics: RBF NN (Tibco 2020)

| Samples | Date (input) | USD/oz (target) |
|---|---|---|
| Minimum (train) | 38,720.00 | 529.500 |
| Maximum (train) | 44,020.00 | 1895.000 |
| Mean (train) | 41,340.06 | 1201.595 |
| Standard deviation (train) | 1523.42 | 316.367 |
| Minimum (test) | 38,728.00 | 538.750 |
| Maximum (test) | 44,015.00 | 1895.000 |
| Mean (test) | 41,515.31 | 1218.531 |
| Standard deviation (test) | 1590.14 | 313.031 |
| Minimum (validation) | 38,722.00 | 524.750 |
| Maximum (validation) | 44,019.00 | 1834.000 |
| Mean (validation) | 41,440.15 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 543.600 |
| Minimum (overall) | 38,720.00 | 524.750 |
| Maximum (overall) | 44,020.00 | 1895.000 |
| Mean (overall) | 41,381.33 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 313.589 |
Fig. 3.2 Time series prediction for USD/oz (Tibco 2020)
Fig. 3.3 Smoothed time series and prediction for 44 trading days (two calendar months)—1. RBF 1-24-1 (Tibco 2020)
Fig. 3.4 Smoothed time series and prediction for 44 trading days (two calendar months)—2. RBF 1-25-1 (Tibco 2020)
Fig. 3.5 Smoothed time series and prediction for 44 trading days (two calendar months)—3. RBF 1-30-1 (Tibco 2020)
Fig. 3.6 Smoothed time series and prediction for 44 trading days (two calendar months)—4. RBF 1-28-1 (Tibco 2020)
Fig. 3.7 Smoothed time series and prediction for 44 trading days (two calendar months): 5. RBF 1-22-1 (Tibco 2020)
of the networks can be expected even at this point. In the case of residuals, the sum of the maximum and minimum values should ideally be 0; otherwise, the highest negative number is sought for the minimum and the highest positive number for the maximum. Figure 3.8 shows the time series prediction for USD/oz with the curves of all networks. It is clear from the figure that network 1. RBF 5-22-1 behaves meaninglessly; given the displayed curve, it can be concluded that this network is not applicable for predicting the price of gold time series. The sample here comprises the training, testing, and validation datasets. Figure 3.9 presents the prediction curve of network 1. RBF 5-22-1 together with the curve of the testing dataset; the graph provides no information indicating that this network could be applied in practice for predicting the price of gold. The network shows confusing behaviour. Figure 3.10 presents the prediction of the price of gold by network 2. RBF 5-29-1 together with the curve of the testing dataset. Here, too, the graph provides no information indicating that the network could be applied in practice for predicting the price of gold. This network shows confusing behaviour as well. Figure 3.11 presents the development of the time series prediction for USD/oz, with the testing dataset used as the sample. The graph clearly shows that
Table 3.6 Overview of networks: Experiment 2 RBF (Tibco 2020)

| Index | Net name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RBF 5-22-1 | 0.072128 | 0.059483 | 0.252078 | 2.255188E+21 | 1.633586E+21 | 5.942022E+18 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 5-29-1 | 0.271311 | 0.271381 | 0.263919 | 4.139185E+17 | 3.873458E+17 | 3.135597E+17 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 5-22-1 | 0.314613 | 0.346376 | 0.303653 | 5.236097E+17 | 6.031110E+17 | 3.848165E+17 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 5-22-1 | 0.265853 | 0.280137 | 0.298474 | 3.032763E+17 | 2.617176E+17 | 2.303932E+17 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 5-24-1 | 0.225869 | 0.268852 | 0.263909 | 1.353818E+16 | 1.960738E+16 | 1.535418E+16 | RBFT | SOS | Gaussian | Identity |
Date-1 → 0.055849 hidden neuron 1
Date-1 → 0.056038 hidden neuron 2
Date-1 → 0.056604 hidden neuron 3
Date-1 → 0.056792 hidden neuron 4
Date-1 → 0.077736 hidden neuron 1
Date-1 → 0.077925 hidden neuron 2
Date-1 → 0.078113 hidden neuron 3
Date-1 → 0.078302 hidden neuron 4
1
2
3
4
Date-1 → 0.276226 hidden neuron 4
Date-1 → 0.276038 hidden neuron 3
Date-1 → 0.275849 hidden neuron 2
Date-1 → 0.274906 hidden neuron 1
Date-1 → 0.600755 hidden neuron 4
Date-1 → 0.600189 hidden neuron 3
Date-1 → 0.600000 hidden neuron 2
Date-1 → 0.599811 hidden neuron 1
Date-1 → 0.071321 hidden neuron 4
Date-1 → 0.071132 hidden neuron 3
Date-1 → 0.070566 hidden neuron 2
Date-1 → 0.070377 hidden neuron 1
Weight ID Connections Weight values Connections Weight values Connections Weight values Connections Weight values Connections Weight values 1. RBF 5-22-1 1. RBF 5-22-1 2. RBF 5-29-1 2. RBF 5-29-1 3. RBF 5-22-1 3. RBF 5-22-1 4. RBF 5-22-1 4. RBF 5-22-1 5. RBF 5-24-1 5. RBF 5-24-1
Table 3.7 Weights of networks—Experiment 2 RBF (Tibco 2020)
Table 3.8 Correlation coefficients: Experiment 2 RBF NN (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. RBF 5-22-1 | 0.072128 | 0.059483 | 0.252078 |
| 2. RBF 5-29-1 | 0.271311 | 0.271381 | 0.263919 |
| 3. RBF 5-22-1 | 0.314613 | 0.346376 | 0.303653 |
| 4. RBF 5-22-1 | 0.265853 | 0.280137 | 0.298474 |
| 5. RBF 5-24-1 | 0.225869 | 0.268852 | 0.263909 |
the network 3. RBF 5-22-1 provides confusing values and is thus not suitable for predicting the price of gold time series. The curve of network 4. RBF 5-22-1 follows a similar course, so this network is not suitable for predicting the price of gold either (see Fig. 3.12). Similarly, Fig. 3.13 makes clear that the course of network 5. RBF 5-24-1 is not satisfactory, and there are no indications that the network could be used for predicting the price of gold time series. Experiment 2 was not as successful as Experiment 1. The time lag of the time series was five days instead of one day. This had a significant impact on the results, as none of the five best neural networks showed behaviour and results indicating that it could be used for predicting the price of gold time series.
Experiment 3
1 input variable (time, i.e. date), 1 output variable (price of gold), time series lag 10 days
The third experiment includes one input variable, time (date), one output variable, the price of gold, and a 10-day time lag of the time series. Table 3.11 presents a basic overview of the neural networks used for predicting the development of the price of gold. The five most efficient and successful networks (those with the best characteristics) are listed. Note that the networks' performance in this case is only about 13–23%, which is very low. This also confirms a high probability that none of the networks will be applicable for accurate prediction of the development of the price of gold. In all cases, the same training algorithm (RBFT) is used. The sum of squares (SOS) is used as the error function, and the Gaussian function is used for the activation of the hidden layer of neurons. The Identity function is used for the activation of neurons in the output layer. The weights of the individual networks are presented in Table 3.12. Because of the large number of data rows, the table contains only the first four rows of weights for each network. In this experiment, too, the correlation coefficients of the individual networks and data subsets were determined. These are given in Table 3.13, which shows that the correlation coefficient is very low for all networks, confirming their poor performance. The
1.038252E + 01 −1.359789E + 01 1.172866E + 01
2.185005E + 01
Maximum standard residual (validation)
−1.305466E + 01
−1.036222E-01
Minimum standard residual (test) 3.224364E + 01
1.031345E + 01
2.697619E + 01
Maximum standard residual (train)
−9.837515E-01
−1.261045E + 01
−2.310319E + 00
Minimum standard residual (train)
Minimum standard residual (validation)
6.567628E + 09
5.326226E + 10
Maximum residual (validation)
Maximum standard residual (test)
6.461783E + 09 −7.614330E + 09
−8.124844E + 09
−4.188165E + 09
Minimum residual (test) 1.303212E + 12
6.635312E + 09
1.281067E + 12
Maximum residual (train)
−2.398018E + 09
−8.113124E + 09
−1.097143E + 11
Minimum residual (train)
Minimum residual (validation)
7.614332E + 09
2.398019E + 09
Maximum prediction (validation)
Maximum residual (test)
8.124846E + 09 −6.567627E + 09
4.188166E + 09
−6.461783E + 09
−1.303212E + 12
Minimum prediction (test) −5.326226E + 10
8.113126E + 09
1.097143E + 11
Maximum prediction (train)
Minimum prediction (validation)
−6.635312E + 09
−1.281067E + 12
Minimum prediction (train)
Maximum prediction (test)
2. RBF 5-29-1
1. RBF 5-22-1
Statistics
Table 3.9 Prediction statistics—Experiment 2 RBF NN (Tibco 2020)
1.191893E + 01
−1.167855E-01
1.003501E + 01
−9.261236E-02
1.077022E + 01
−1.001407E-01
7.393738E + 09
−7.244625E + 07
7.793213E + 09
−7.192296E + 07
7.793425E + 09
−7.246266E + 07
7.244796E + 07
−7.393738E + 09
7.192468E + 07
−0.793213E + 09
7.246438E + 07
−7.793424E + 09
3. RBF 5-22-1
1.053968E + 01
−2.258828E-01
1.004247E + 01
−2.119250E-01
9.467463E + 00
−1.961830E-01
5.058970E + 09
−1.084221E + 08
5.137561E + 09
−1.084173E + 08
5.213782E + 09
−1.080390E + 08
1.084231E + 08
−5.058970E + 09
1.084183E + 08
−5.137560E + 09
1.080400E + 08
−5.213781E + 09
4. RBF 5-22-1
1.212751E + 01
−3.027506E + 00
1.063759E + 01
−2.685191E + 00
1.291631E + 01
−3.235387E + 00
1.502743E + 09
−3.751443E + 08
1.489543E + 09
−3.759975E + 08
1.502861E + 09
−3.764493E + 08
3.751454E + 08
−1.502743E + 09
3.759987E + 08
−1.489543E + 09
3.764504E + 08
−1.502860E + 09
5. RBF 5-24-1
Table 3.10 Data statistics: Experiment 2 RBF NN (Tibco 2020)

| Samples | Date (input) | USD/oz (target) |
|---|---|---|
| Minimum (train) | 38,720.00 | 529.500 |
| Maximum (train) | 44,020.00 | 1895.000 |
| Mean (train) | 41,340.06 | 1201.595 |
| Standard deviation (train) | 1523.42 | 316.367 |
| Minimum (test) | 38,728.00 | 538.750 |
| Maximum (test) | 44,015.00 | 1895.000 |
| Mean (test) | 41,515.31 | 1218.531 |
| Standard deviation (test) | 1590.14 | 313.031 |
| Minimum (validation) | 38,722.00 | 524.750 |
| Maximum (validation) | 44,019.00 | 1834.000 |
| Mean (validation) | 41,440.15 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 543.600 |
| Minimum (overall) | 38,720.00 | 524.750 |
| Maximum (overall) | 44,020.00 | 1895.000 |
| Mean (overall) | 41,381.33 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 313.589 |
Fig. 3.8 Predicting time series for USD/oz—Experiment 2 RBF all networks (Tibco 2020)
Fig. 3.9 Smoothed time series and prediction for 44 trading days (two calendar months)—1. RBF 5-22-1 (Tibco 2020)
Fig. 3.10 Smoothed time series and prediction for 44 trading days (two calendar months): 2. RBF 5-29-1 (Tibco 2020)
Fig. 3.11 Smoothed time series and prediction for 44 trading days (two calendar months)—3. RBF 5-22-1 (Tibco 2020)
Fig. 3.12 Smoothed time series and prediction for 44 trading days (two calendar months)—4. RBF 5-22-1 (Tibco 2020)
Fig. 3.13 Smoothed time series and prediction for 44 trading days (two calendar months)—5. RBF 5-24-1 (Tibco 2020)
lowest coefficient value in the table is 0.128979, while the highest is 0.228095; even the latter is not sufficient to ensure good performance. The correlation coefficients should be as high as possible (as close to 1 as possible) and, if possible, identical for each network across all datasets (training, testing, and validation). Table 3.14 presents the prediction statistics by individual neural network and dataset. The table shows that, for the maximum prediction on the training, testing, and validation datasets, the values differ significantly in all networks except for 3. RBF 10-28-1, 5. RBF 10-24-1, and 1. RBF 10-21-1. The same applies to the minimum prediction. As for the maximum and minimum residuals, the smallest possible difference between the minimum and the maximum is required. In terms of residuals, the best network appears to be 3. RBF 10-28-1, which has the smallest residuals. In contrast, networks 2. RBF 10-29-1 and 4. RBF 10-23-1 show a large difference between the minimum and the maximum. Table 3.15 presents the data statistics evaluated within Experiment 3. Figure 3.14 clearly shows large fluctuations of the curves for networks 2 and 4. It can thus be concluded that the course of the curves indicates the inability of the networks to predict the price of gold correctly. The same conclusion is indicated by Fig. 3.15, which shows the graphs comparing the curve of predictions made by the individual networks with the curve of the testing dataset. None of the networks represented by the graphs below appears to be
Table 3.11 Overview of networks: Experiment 3 RBF (Tibco 2020)

| Index | Net name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RBF 10-21-1 | 0.147261 | 0.128979 | 0.210811 | 7.264679E+33 | 4.404266E+33 | 1.351736E+34 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 10-29-1 | 0.228095 | 0.218430 | 0.216735 | 1.915493E+37 | 1.890416E+37 | 1.281073E+37 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 10-28-1 | 0.227991 | 0.222300 | 0.224916 | 6.679056E+31 | 5.915984E+31 | 4.084511E+31 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 10-23-1 | 0.173611 | 0.154072 | 0.198337 | 1.012423E+36 | 1.065013E+35 | 9.559485E+35 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 10-24-1 | 0.205615 | 0.216607 | 0.211115 | 6.303151E+34 | 6.935049E+34 | 7.667120E+34 | RBFT | SOS | Gaussian | Identity |
Table 3.12 Weights of networks: Experiment 3 RBF (Tibco 2020)

| Weight ID | Connections | Weight values 1. RBF 10-21-1 | Weight values 2. RBF 10-29-1 | Weight values 3. RBF 10-28-1 | Weight values 4. RBF 10-23-1 | Weight values 5. RBF 10-24-1 |
|---|---|---|---|---|---|---|
| 1 | Date-1 → hidden neuron 1 | 0.011887 | 0.243396 | 0.356792 | 0.688302 | 0.295849 |
| 2 | Date-1 → hidden neuron 2 | 0.012075 | 0.243585 | 0.356981 | 0.689245 | 0.296038 |
| 3 | Date-1 → hidden neuron 3 | 0.012264 | 0.244151 | 0.357170 | 0.689434 | 0.296226 |
| 4 | Date-1 → hidden neuron 4 | 0.012453 | 0.244340 | 0.357736 | 0.689623 | 0.296415 |
Table 3.13 Correlation coefficients: Experiment 3 RBF NN (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. RBF 10-21-1 | 0.147261 | 0.128979 | 0.210811 |
| 2. RBF 10-29-1 | 0.228095 | 0.218430 | 0.216735 |
| 3. RBF 10-28-1 | 0.227991 | 0.222300 | 0.224916 |
| 4. RBF 10-23-1 | 0.173611 | 0.154072 | 0.198337 |
| 5. RBF 10-24-1 | 0.205615 | 0.216607 | 0.211115 |
applicable and able to make an accurate prediction of the development of the price of gold curve.
Experiment 4
5 input variables (date, day of the week, day of the month, month, year), 1 output variable (price of gold), time series lag 1 day
The fourth experiment includes five input variables: time (date), day of the week, day of the month, month, and year. It includes one output variable, the price of gold. The time lag of the time series is one day. An overview of the individual networks used for Experiment 4 is given in Table 3.16. The table presents the five most successful networks with the best characteristics. An important value for comparing the individual networks is the network performance, which exceeds 0.80 on the training, testing, and validation datasets for all networks. On the training dataset, the highest performance is achieved by network 2. RBF 5-28-1, with a value of 0.861124. The same applies to the testing and validation datasets, so network 2. RBF 5-28-1 appears to be the most efficient. Table 3.17 presents the individual weights of the networks; because of the table's large extent, only a part of the weights is shown. In Experiment 4, too, the correlation coefficients were determined for the individual networks and datasets. Table 3.18 shows that the values for the training, testing, and validation datasets are similar to each other. In all cases, the correlation coefficients exceed 0.80, which indicates high network performance, as a value as close to 1 as possible is required. Prediction statistics of the individual networks are presented in Table 3.19, and the data statistics in Table 3.20. The graph of the USD/oz time series and the individual networks is shown in Fig. 3.16. The figure indicates that most of the networks follow the curve of USD/oz. The graphs of the smoothed time series and the prediction for 44 trading days (two calendar months) for all networks are presented in Fig. 3.17.
It can be concluded from the graph that all five networks have a similar prediction curve. The test curve (marked blue) is followed most closely by the curve of network 2. RBF 5-28-1.
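The additional calendar inputs of this experiment can be derived from the date itself. The sketch below assumes that the date values listed in the data statistics tables (38,720 to 44,020) are spreadsheet-style serial day numbers counted from 30 December 1899; this is an assumption, as the encoding is not stated. The function name, the dictionary keys, and the choice of ISO weekday numbering (Monday = 1) are hypothetical.

```python
from datetime import date, timedelta

# Assumed epoch of spreadsheet-style serial day numbers
SERIAL_EPOCH = date(1899, 12, 30)

def calendar_features(serial_day):
    """Derive the five inputs (date, weekday, day of month, month, year)."""
    d = SERIAL_EPOCH + timedelta(days=serial_day)
    return {
        "date": serial_day,
        "day_of_week": d.isoweekday(),  # Monday = 1 ... Sunday = 7
        "day_of_month": d.day,
        "month": d.month,
        "year": d.year,
    }

first = calendar_features(38720)  # smallest date value in the data statistics
last = calendar_features(44020)   # largest date value in the data statistics
```

Under this assumption, the series would run from 3 January 2006 to 8 July 2020.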
2.467397E-01 −2.267926E + 01 1.823189E-03
2.097723E + 00
Maximum standard residual (validation)
−1.767486E + 01
−2.130072E + 01
Minimum standard residual (test) 3.811311E + 00
2.423464E-01
2.977538E + 00
Maximum standard residual (train)
−1.354852E + 01
−1.864822E + 01
−1.863132E + 01
Minimum standard residual (train)
Minimum standard residual (validation)
6.525574E + 15
2.438901E + 17
Maximum residual (validation)
Maximum standard residual (test)
1.072797E + 18 −8.117379E + 19
−7.684836E + 19
−1.413615E + 18
Minimum residual (test) 2.529363E + 17
1.060662E + 18
2.537848E + 17
Maximum residual (train)
−1.575207E + 18
−8.161645E + 19
−1.588005E + 18
Minimum residual (train)
Minimum residual (validation)
8.117379E + 19
1.575207E + 18
Maximum prediction (validation)
Maximum residual (test)
7.684836E + 19 −6.525574E + 15
1.413615E + 18
−1.072797E + 18
−2.529363E + 17
Minimum prediction (test) −2.438901E + 17
8.161645E + 19
1.588005E + 18
Maximum prediction (train)
Minimum prediction (validation)
−1.060662E + 18
−2.537848E + 17
Minimum prediction (train)
Maximum prediction (test)
2. RBF 10-29-1
1. RBF 10-21-1
Statistics
Table 3.14 Prediction statistics—experiment 3 RBF NN (Tibco 2020)
1.969824E + 00
−2.046917E + 01
1.648246E + 00
−1.665085E + 01
1.553294E + 00
−1.631314E + 01
1.258918E + 16
−1.308188E + 17
1.267755E + 16
−1.280708E + 17
1.269437E + 16
−1.333200E + 17
1.308188E + 17
−1.258918E + 16
1.280708E + 17
−1.267755E + 16
1.333200E + 17
−1.269437E + 16
3. RBF 10-28-1
1.912147E + 01
−8.794625E-04
2.460340E + 01
−2.730818E-03
2.120139E + 01
−8.867346E-04
1.869556E + 19
−8.598735E + 14
8.029206E + 18
−8.911898E + 14
2.133268E + 19
−8.922255E + 14
8.598735E + 14
−1.869556E + 19
8.911898E + 14
−8.029206E + 18
8.922255E + 14
−2.133268E + 19
4. RBF 10-23-1
1.507784E + 01
−8.455025E + 00
1.575363E + 01
−8.785817E + 00
1.690587E + 01
−9.335269E + 00
4.174989E + 18
−2.341160E + 18
4.148635E + 18
−2.313699E + 18
4.244400E + 18
−2.343720E + 18
2.341160E + 18
−4.174989E + 18
2.313699E + 18
−4.148635E + 18
2.343720E + 18
−4.244400E + 18
5. RBF 10-24-1
Table 3.15 Data statistics: Experiment 3 RBF NN (Tibco 2020)

| Samples | Date (input) | USD/oz (target) |
|---|---|---|
| Minimum (train) | 38,720.00 | 529.500 |
| Maximum (train) | 44,020.00 | 1895.000 |
| Mean (train) | 41,340.06 | 1201.595 |
| Standard deviation (train) | 1523.42 | 316.367 |
| Minimum (test) | 38,728.00 | 538.750 |
| Maximum (test) | 44,015.00 | 1895.000 |
| Mean (test) | 41,515.31 | 1218.531 |
| Standard deviation (test) | 1590.14 | 313.031 |
| Minimum (validation) | 38,722.00 | 524.750 |
| Maximum (validation) | 44,019.00 | 1834.000 |
| Mean (validation) | 41,440.15 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 543.600 |
| Minimum (overall) | 38,720.00 | 524.750 |
| Maximum (overall) | 44,020.00 | 1895.000 |
| Mean (overall) | 41,381.33 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 313.589 |
Fig. 3.14 Prediction of time series for USD/oz—experiment 3 (Tibco 2020)
Fig. 3.15 Smoothed time series and prediction for 44 trading days (two calendar months)—1. RBF 10-21-1, 2. RBF 10-29-1, 3. RBF 10-28-1, 4. RBF 10-23-1, 5. RBF 10-24-1 Experiment 3 (Tibco 2020)
Table 3.16 Overview of networks: Experiment 4 RBF (Tibco 2020)

| Index | Net name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RBF 5-22-1 | 0.855625 | 0.853390 | 0.850385 | 13,357.16 | 13,428.16 | 12,530.36 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 5-28-1 | 0.861124 | 0.858894 | 0.855178 | 12,886.54 | 12,977.59 | 12,192.90 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 5-29-1 | 0.846489 | 0.850674 | 0.844050 | 14,132.54 | 13,595.14 | 13,040.51 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 5-30-1 | 0.855557 | 0.854172 | 0.840979 | 13,362.97 | 13,359.35 | 13,269.61 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 5-26-1 | 0.853527 | 0.854253 | 0.837543 | 13,535.98 | 13,314.64 | 13,505.04 | RBFT | SOS | Gaussian | Identity |
Date-1 → 0.48623 hidden neuron 1
Date-1 → 0.50000 hidden neuron 2
Date-1 → 0.73333 hidden neuron 3
Date-1 → 0.00000 hidden neuron 4
Date-1 → 0.14038 hidden neuron 1
Date-1 → 0.75000 hidden neuron 2
Date-1 → 0.53333 hidden neuron 3
Date-1 → 0.00000 hidden neuron 4
2
3
4
Date-1 → 0 hidden neuron 4
Date-1 → 0 hidden neuron 3
Date-1 → 1 hidden neuron 2
Date-1 → -7 hidden neuron 1
Connections Weight values Connections Weight values 2. RBF 5-28-1 2. RBF 5-28-1 3. RBF 5-29-1 3. RBF 5-29-1
1
Weight ID Connections Weight values 1. RBF 5-22-1 1. RBF 5-22-1
Table 3.17 Overview of networks’ weights—RBF Experiment 4 (Tibco 2020)
Date-1 → 0.36364 hidden neuron 4
Date-1 → 0.50000 hidden neuron 3
Date-1 → 1.00000 hidden neuron 2
Date-1 → 0.57642 hidden neuron 1
Date-1 → 0.1818 hidden neuron 4
Date-1 → → 0.1333 hidden neuron 3
Date-1 → 0.0000 hidden neuron 2
Date-1 → 0.4251 hidden neuron 1
Connections Weight values Connections Weight values 4. RBF 5-30-1 4. RBF 5-30-1 5. RBF 5-26-1 5. RBF 5-26-1
Table 3.18 Correlation coefficients: Experiment 4 RBF NN (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. RBF 5-22-1 | 0.855625 | 0.853390 | 0.850385 |
| 2. RBF 5-28-1 | 0.861124 | 0.858894 | 0.855178 |
| 3. RBF 5-29-1 | 0.846489 | 0.850674 | 0.844050 |
| 4. RBF 5-30-1 | 0.855557 | 0.854172 | 0.840979 |
| 5. RBF 5-26-1 | 0.853527 | 0.854253 | 0.837543 |
Experiment 5
1. Input variables (time—date, day of the week, day of the month, month, year), 1 output variable (price of gold), 5-day time lag of the time series
The fifth experiment includes five input variables: time-date, day of the week, day of the month, month, and year. It includes one output variable, the price of gold. The time lag of the time series is five days. The overview of the individual networks for Experiment 5 is shown in Table 3.21. The table shows the five most successful networks with the best characteristics. An important value in terms of the dependency on the individual networks is the network performance, which exceeds 0.60 for all networks on the training, testing, and validation datasets. As for the training dataset performance, the most efficient network appears to be network 3. RBF 25-29-1, with a performance value of 0.689605. The same applies to the testing and validation datasets, where network 3. RBF 25-29-1 also appears to be the most efficient one. As for the error on the training, testing, and validation datasets, the network showing the smallest error is again network 3. RBF 25-29-1. The weights of the networks are shown in Table 3.22; due to the large extent of the table, only a small part of it is presented for illustration. Correlation coefficients were also determined for Experiment 5. It follows from Table 3.23 that on the training, testing, and validation datasets, the correlation coefficients of the individual networks exceed the value of 0.6, which indicates an average performance. Prediction statistics for the fifth experiment are given in Table 3.24, which shows that the values differ significantly both among the datasets and the networks. In the case of residuals, the range is similar for most of the networks. The data statistics are given in Table 3.25. Figure 3.18 shows the development of the individual networks' curves in comparison with the curve of USD/oz (marked blue).
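The network names in Table 3.21 (RBF 25-x-1) have 25 inputs, which is consistent with five calendar variables observed over a 5-day lag window. The following is a minimal sketch of how such lagged input vectors could be assembled. It is an illustration only: the function names, the spreadsheet-style serial-date origin, the Sunday = 1 weekday convention, and the sample prices are all assumptions, not details taken from the software used in the study.

```python
from datetime import date, timedelta

# Assumed spreadsheet-style origin for the numeric "Date" input.
EPOCH = date(1899, 12, 30)


def calendar_features(d):
    """Return the five inputs for one day: date, day of week, day of month, month, year."""
    serial = (d - EPOCH).days
    # Assumed weekday convention: Sunday = 1, Monday = 2, ..., Saturday = 7.
    day_of_week = d.isoweekday() % 7 + 1
    return [serial, day_of_week, d.day, d.month, d.year]


def build_lagged_samples(days, prices, lag):
    """Pair the features of the previous `lag` days with the price on day t."""
    samples = []
    for t in range(lag, len(days)):
        inputs = []
        for d in days[t - lag:t]:
            inputs.extend(calendar_features(d))
        samples.append((inputs, prices[t]))
    return samples


# Hypothetical data: seven consecutive days with made-up prices in USD/oz.
days = [date(2006, 1, 3) + timedelta(days=i) for i in range(7)]
prices = [530.0, 533.5, 526.0, 541.0, 544.5, 549.0, 552.25]

samples = build_lagged_samples(days, prices, lag=5)
print(len(samples), len(samples[0][0]))  # 2 samples, 25 inputs each
```

With a lag of 10 the same function yields 50 inputs per sample, which matches the RBF 50-x-1 networks of Experiment 6.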
The figure clearly shows that the curves of the individual networks at least partially follow the shape of the USD/oz curve; however, it is not possible to identify which of the curves is closest to the test curve. The graphs of smoothed time series and prediction for 44 trading days (two calendar months) are presented in Fig. 3.19. It follows from the figure that only network 4. RBF 25-27-1 is able to predict the development of the price of gold accurately, as in the last third of the graph, it was able to predict the upward trend
1653.310 −662.257 530.989 −314.606 514.756 −387.392 491.509 −5.834 4.678 −2.762 4.519 −3.508
1610.134
326.541
1580.539
333.125
1606.888
−623.079
574.318
−327.082
607.187
−310.017
581.582
−5.391
4.969
−2.823
5.240
−2.770
Maximum prediction (train)
Minimum prediction (test)
Maximum prediction (test)
Minimum prediction (validation)
Maximum prediction (validation)
Minimum residual (train)
Maximum residual (train)
Minimum residual (test)
Maximum residual (test)
Minimum residual (validation)
Maximum residual (validation)
Minimum standard residual (train)
Maximum standard residual (train)
Minimum standard residual (test)
Maximum standard residual (test)
Minimum standard residual (validation)
2. RBF 5-28-1
Minimum prediction (train)
354.589
1658.111
368.072
1686.544
325.207
1. RBF 5-22-1
315.495
Statistics
Table 3.19 Prediction statistics—Experiment 4 RBF NN (Tibco 2020)
−2.805
4.701
−2.861
6.928
−3.354
506.220
−320.347
548.175
−333.631
823.616
−398.691
1734.012
329.929
1758.465
315.697
1820.519
−4.153
5.728
−3.062
5.482
−5.568
513.179
−478.385
662.005
−353.896
633.697
−643.668
1596.790
314.374
1615.591
287.725
1665.795
4. RBF 5-30-1 222.561
3. RBF 5-29-1 −281.116
−3.218
5.379
−2.890
5.280
−7.183
512.095
−373.984
620.717
−333.529
614.251
−835.680
1583.979
287.997
1626.715
249.682
1951.241
203.087
5. RBF 5-26-1
Table 3.20 Data statistics—Experiment 4 RBF NN (Tibco 2020)

| Samples | Date input | Day in week input | Day in month input | Month input | Year input | USD/oz target |
|---|---|---|---|---|---|---|
| Minimum (train) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 529.500 |
| Maximum (train) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (train) | 41,340.06 | 4.013255 | 15.63885 | 6.34191 | 2012.696 | 1201.595 |
| Standard deviation (train) | 1523.42 | 1.391633 | 8.65048 | 3.41621 | 4.172 | 316.367 |
| Minimum (test) | 19.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 538.750 |
| Maximum (test) | 44,015.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (test) | 41,444.80 | 4.034608 | 15.28415 | 6.51730 | 2013.162 | 1218.531 |
| Standard deviation (test) | 2377.31 | 1.441273 | 8.90170 | 3.42252 | 4.367 | 313.031 |
| Minimum (validation) | 38,722.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (validation) | 44,019.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1834.000 |
| Mean (validation) | 41,440.15 | 4.027322 | 15.83789 | 6.52095 | 2012.954 | 1207.606 |
| Standard deviation (validation) | 2927.85 | 1.396892 | 8.69007 | 3.57921 | 6.354 | 543.600 |
| Minimum (overall) | 19.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (overall) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (overall) | 41,370.76 | 4.018564 | 15.61551 | 6.39503 | 2012.805 | 1205.034 |
| Standard deviation (overall) | 1677.88 | 1.400899 | 8.70486 | 3.42037 | 4.201 | 313.589 |
Fig. 3.16 Time series prediction for USD/oz—Experiment 4 RBF all networks (Tibco 2020)
of the curve, and its curve thus partially followed the testing dataset curve. All the remaining networks predicted the opposite trend in the last third of the graph. All networks show large residuals.
Experiment 6
1. Input variables (time—date, day of the week, day of the month, month, year), 1 output variable (price of gold), 10-day time lag of time series
For the sixth experiment, five input variables were selected (time-date, day of the week, day of the month, month, year) together with one output variable. The time lag of the time series is ten days in this case. Table 3.26 presents the networks' performance and level of error, including the training algorithm, which is the same for all networks (RBFT), the error function (sum of squares), the activation function of the hidden layer of neurons (Gaussian curve), and the activation function of the neurons in the output layer (Identity). Even at first glance, it can be estimated that the performance of the networks in Experiment 6 will not be ideal. On all datasets (training, testing, validation), the performance values are below 50%; some of the networks (2, 4, and 5) show a performance lower than twenty percent on the training dataset. This is evidence of their inapplicability for the prediction of time series. The weights of the individual networks are presented in Table 3.27; however, only the first four rows describing the networks' weights are shown due to the large extent of the table. Table 3.27 serves only as an illustration.
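The configuration named above (Gaussian hidden activation, identity output activation, sum-of-squares error) can be illustrated with a minimal sketch of an RBF network's forward pass. This is not the implementation used by the software in the study: the function names, the parameterisation of the Gaussian by a centre and a width, and the unscaled form of the error sum are assumptions made for illustration.

```python
import math


def rbf_predict(x, centers, widths, out_weights, bias):
    """Forward pass of an RBF network with Gaussian hidden units and identity output."""
    hidden = []
    for center, width in zip(centers, widths):
        # Squared Euclidean distance between the input and the unit's centre.
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        # Gaussian activation: 1 at the centre, decaying with distance.
        hidden.append(math.exp(-dist2 / (2.0 * width ** 2)))
    # Identity output activation: a plain weighted sum plus bias.
    return bias + sum(w * h for w, h in zip(out_weights, hidden))


def sum_of_squares(targets, predictions):
    """Sum-of-squares (SOS) error; any scaling factor is omitted here (assumption)."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))


# Hypothetical one-input network with two hidden units.
centers = [[0.0], [1.0]]
widths = [0.5, 0.5]
out_weights = [2.0, -1.0]

predictions = [rbf_predict([x], centers, widths, out_weights, bias=0.1)
               for x in (0.0, 0.5, 1.0)]
print(predictions)
print(sum_of_squares([2.0, 0.7, -0.6], predictions))
```

Because each hidden unit responds only near its centre, the output for inputs far from every centre approaches the bias, so an RBF network whose centres cover the data poorly tends to return nearly constant predictions.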
Fig. 3.17 Smoothed time series and prediction for 44 trading days (two calendar months)—1. RBF 5-22-1, 2. RBF 5-28-1, 3. RBF 5-29-1, 4. RBF 5-30-1, 5. RBF 5-26-1 Experiment 4 (Tibco 2020)
The correlation coefficients of the datasets and individual networks determined within Experiment 6 are presented in Table 3.28. For correlation coefficients, the most desirable values are those closest to 1. Within this experiment, the correlation coefficients of the five most successful networks are below 0.3, which is considered to be a very low value. It can thus be said that no significant positive result concerning the prediction of the price of gold is expected from any of the five aforementioned networks. Prediction statistics are given in Table 3.29. It can be seen from the table that four out of the five networks show confusing behaviour, which can be evaluated on
Table 3.21 Overview of networks—Experiment 5 RBF (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RBF 25-30-1 | 0.653413 | 0.641030 | 0.635410 | 28,569.35 | 28,879.62 | 26,730.07 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 25-28-1 | 0.638106 | 0.668792 | 0.645141 | 29,450.91 | 27,549.12 | 26,199.73 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 25-29-1 | 0.689605 | 0.681925 | 0.653480 | 26,072.60 | 26,313.67 | 25,751.06 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 25-27-1 | 0.655645 | 0.646763 | 0.622212 | 28,321.89 | 28,529.86 | 27,495.60 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 25-30-1 | 0.629755 | 0.648300 | 0.645767 | 29,975.10 | 28,401.28 | 26,144.18 | RBFT | SOS | Gaussian | Identity |
Weight values 1. RBF 25-30-1
1
1
0
1
Connections 1. RBF 25-30-1
Date-1 → hidden neuron 1
Date-1 → hidden neuron 2
Date-1 → hidden neuron 3
Date-1 → hidden neuron 4
Weight ID
1
2
3
4
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 2. RBF 25-28-1
1
1
0
1
Weight values 2. RBF 25-28-1
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 3. RBF 25-29-1
Table 3.22 Overview of networks’ weights—Experiment 5 RBF (Tibco 2020)
1
0
0
1
Weight values 3. RBF 25-29-1
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 4. RBF 25-27-1
1
1
0
1
Weight values 4. RBF 25-27-1
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 5. RBF 25-30-1
1
1
1
0
Weight values 5. RBF 25-30-1
Table 3.23 Correlation coefficients—Experiment 5 RBF (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. RBF 25-30-1 | 0.653413 | 0.641030 | 0.635410 |
| 2. RBF 25-28-1 | 0.638106 | 0.668792 | 0.645141 |
| 3. RBF 25-29-1 | 0.689605 | 0.681925 | 0.653480 |
| 4. RBF 25-27-1 | 0.655645 | 0.646763 | 0.622212 |
| 5. RBF 25-30-1 | 0.629755 | 0.648300 | 0.645767 |
the basis of the data presented in the table (see the repeated numbers for networks 2, 3, 4, and 5 in the prediction rows). Data statistics are presented in Table 3.30. Figure 3.20 shows the development of the individual networks' curves in comparison with the USD/oz curve (marked blue). The figure clearly shows that the curves of the individual networks do not follow the course of the USD/oz curve at all. Even at this point, it can thus be concluded that none of the five networks mentioned above is a suitable tool for predicting the price of gold time series. The graphs of smoothed time series and prediction for 44 trading days (two calendar months) can be seen in Fig. 3.21, which compares the prediction curves of the individual networks with the curve of the testing dataset. None of the networks shown in the graphs indicates that it could be used for accurate prediction.
3.2 Multi-Layer Perceptron Neural Networks
Kumar and Yadav (2011) rank the MLP network (Multi-Layer Perceptron Neural Network) amongst the most frequently used perceptron networks. Michal et al. (2015) point out that not only MLP networks but also other types of neural structures consist of processing units called nodes linked by weighted connections. Nodes are usually arranged in three or more layers. The input layer receives the values of the predictor variables presented to the network, while one or more output nodes represent the estimated output. Kumar and Yadav (2011) describe MLP as a feedforward artificial neural network. The structure consists of no fewer than two layers of neurons, i.e. perceptrons. In each layer, the inputs of the individual neurons are connected to the outputs of the previous layer, and the outputs always feed the following layer. The authors observe that this network is a modification of the classical linear perceptron with the ability to distinguish data that are not linearly separable. For systematic training, multilayer perceptron networks employ the backpropagation method.
1741.494 −639.805 896.162 −611.122
1596.171
−618.257
687.671
−602.560
Maximum prediction (validation)
Minimum residual (train)
Maximum residual (train)
Minimum residual (test)
3.939
Maximum standard residual (validation)
3.750
3.554
Minimum standard residual (test) −3.511
−3.682
−3.546
Maximum standard residual (train)
4.106
5.222
4.068
Minimum standard residual (train)
−3.536
−3.728
−3.658
Maximum residual (validation)
Minimum standard residual (validation)
606.932
643.996
Minimum residual (validation)
Maximum standard residual (test)
589.927 −568.361
697.809
−578.055
Maximum residual (test)
54.610
1781.606
0.073
1687.824
442.209
Minimum prediction (test)
3.823
−3.204
4.343
−2.775
4.305
−3.172
613.516
−514.184
704.459
−450.149
695.158
−512.248
1684.403
489.972
1643.146
245.615
1696.635
308.490
−273.412 1873.700
353.381
1669.395
Maximum prediction (train)
Minimum prediction (validation)
298.157
Minimum prediction (train)
3. RBF 25-29-1
2. RBF 25-28-1
Maximum prediction (test)
1. RBF 25-30-1
Statistics
Table 3.24 Prediction statistics—Experiment 5 RBF (Tibco 2020)
3.501
−3.569
4.068
−3.261
4.333
−3.799
580.566
−591.875
687.062
−550.816
729.182
−639.269
1738.462
339.162
1759.656
381.849
1913.607
297.634
4. RBF 25-27-1
4.187
−3.957
3.714
−3.642
4.623
−3.771
677.031
−639.836
625.925
−613.706
800.322
−652.936
1563.618
362.001
1595.895
251.368
1626.059
201.858
5. RBF 25-30-1
Table 3.25 Data statistics—Experiment 5 RBF (Tibco 2020)

| Samples | Date input | Day in week input | Day in month input | Month input | Year input | USD/oz target |
|---|---|---|---|---|---|---|
| Minimum (train) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 529.500 |
| Maximum (train) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (train) | 41,340.06 | 4.013255 | 15.64016 | 6.34191 | 2012.696 | 1201.595 |
| Standard deviation (train) | 1523.42 | 1.391633 | 8.64905 | 3.41621 | 4.172 | 316.367 |
| Minimum (test) | 38,728.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 538.750 |
| Maximum (test) | 44,015.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (test) | 41,515.31 | 4.034608 | 15.28415 | 6.51730 | 2013.162 | 1218.531 |
| Standard deviation (test) | 1590.14 | 1.441273 | 8.90170 | 3.42252 | 4.367 | 313.031 |
| Minimum (validation) | 38,722.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (validation) | 44,019.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1834.000 |
| Mean (validation) | 41,440.15 | 4.027322 | 15.83789 | 6.52095 | 2012.954 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 1.396892 | 8.69112 | 3.57921 | 6.354 | 543.600 |
| Minimum (overall) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (overall) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (overall) | 41,381.33 | 4.018564 | 15.61643 | 6.39503 | 2012.805 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 1.400899 | 8.70385 | 3.42037 | 4.201 | 313.589 |
Fig. 3.18 Time series prediction for USD/oz—Experiment 5 (Tibco 2020)
3.2.1 Mathematical Background
In practice, various types of artificial neural networks are used, differing in the neural model, the learning process, or the topology of synapses. According to Michal et al. (2015), the most commonly applied and important neural networks include MLP networks (Multi-Layer Perceptron), whose topology consists of the input and output layers and an unspecified number of hidden layers. Neurons are connected between the layers, but there are no connections inside a layer. The MLP network is a feedforward unidirectional network trained by a "teacher". Tučková (2003) adds that feedforward neural networks are the most common and simplest type of neural networks, especially due to their acyclic topology, where the neurons are arranged in layers. This ensures that the signals spread in one direction only. Neurons are then connected directly by means of their inputs and outputs. The outputs of several neurons in one layer thus form the input of a neuron in another layer. The connections thus exist only between the layers, not between the neurons within the same layer. Bishop (1995) states that a multilayer perceptron network is able to model functional relationships by applying non-linear activation functions, which, compared to single-layer perceptron networks, also enables non-linear classification. Kumar and Yadav (2011) state that the structure of this type of artificial neural network is based on a directed graph. Each neuron has the simple task of processing information by converting the obtained inputs into processed outputs. As already mentioned, the information flow in MLP networks is one-directional, that
Fig. 3.19 Smoothed time series and prediction for 44 trading days (two calendar months)—1. RBF 25-30-1, 2. RBF 25-28-1, 3. RBF 25-29-1, 4. RBF 25-27-1, 5. RBF 25-30-1 Experiment 5 (Tibco 2020)
is, from the input layer through the hidden layers to the output layer. The connection is described in the figure below (Fig. 3.22). According to Haykin (2001), this type of connection enables forward signal propagation. The signal passed to a given neural network causes excitation of the neurons in the input layer. The author adds that the excitation is brought to the first hidden layer by means of synapses, where the strength of the signal is adjusted by means of individual weights. Each individual neuron in the first hidden layer thus obtains information from each neuron in the input layer; it should be noted that the differences in the information are caused by the different weights. Subsequently, after the signals are transmitted, they are summed and the neuron is excited as given by the activation function. This process takes place in all layers; due to this process, it is
Table 3.26 Overview of networks—Experiment 6 RBF (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RBF 50-21-1 | 0.214528 | 0.221858 | 0.246554 | 47,159.91 | 46,192.61 | 41,998.28 | RBFT | SOS | Gaussian | Identity |
| 2 | RBF 50-23-1 | 0.199671 | 0.154004 | 0.251010 | 49,427.77 | 48,623.18 | 44,416.53 | RBFT | SOS | Gaussian | Identity |
| 3 | RBF 50-23-1 | 0.207340 | 0.186163 | 0.257506 | 49,427.77 | 48,623.57 | 44,416.69 | RBFT | SOS | Gaussian | Identity |
| 4 | RBF 50-23-1 | 0.128000 | 0.182207 | 0.252961 | 49,427.77 | 48,623.75 | 44,416.76 | RBFT | SOS | Gaussian | Identity |
| 5 | RBF 50-30-1 | 0.163494 | 0.176402 | 0.255059 | 49,427.77 | 48,623.75 | 44,416.76 | RBFT | SOS | Gaussian | Identity |
Weight values 1. RBF 50-21-1
0
1
1
1
Connections 1. RBF 50-21-1
Date-1 → hidden neuron 1
Date-1 → hidden neuron 2
Date-1 → hidden neuron 3
Date-1 → hidden neuron 4
Weight ID
1
2
3
4
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 2. RBF 50-23-1
0.727
0.067
0.250
0.528
Weight values 2. RBF 50-23-1
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 3. RBF 50-23-1
Table 3.27 Overview of networks’ weights—Experiment 6 RBF (Tibco 2020)
0.5455
0.2333
1.0000
0.7243
Weight values 3. RBF 50-23-1
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 4. RBF 50-23-1
0.818182
0.200000
0.000000
0.948113
Weight values 4. RBF 50-23-1
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 5. RBF 50-30-1
0.636364
1.000000
0.750000
0.045283
Weight values 5. RBF 50-30-1
Table 3.28 Correlation coefficients—Experiment 6 RBF (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. RBF 50-21-1 | 0.214528 | 0.221858 | 0.246554 |
| 2. RBF 50-23-1 | 0.199671 | 0.154004 | 0.251010 |
| 3. RBF 50-23-1 | 0.207340 | 0.186163 | 0.257506 |
| 4. RBF 50-23-1 | 0.128000 | 0.182207 | 0.252961 |
| 5. RBF 50-30-1 | 0.163494 | 0.176402 | 0.255059 |
possible to obtain a response to the input signal in the output layer of a given neural network. The following formula describes the output of the i-th neuron in the k-th layer:

$$y_{i,k} = \varphi\left(\sum_{j=1}^{N} y_{j,k-1}\, w_{j,i,k}\right) \tag{3.3}$$

where N is the number of neurons in the (k − 1)-th layer. This output also depends on the shape of the activation function ϕ, which affects the behaviour of the neuron. According to Vochozka et al. (2016), the typical algorithm for training MLP networks is backpropagation. This is a method based on gradient descent, which adjusts the weight values to minimize the error between the network output and the expected output. The authors add that MLP networks are trained iteratively, which requires much more time for larger amounts of data or larger numbers of hidden neurons.
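Formula (3.3) can be illustrated with a short sketch of the forward pass through an MLP. The code follows the formula literally, so it has no bias term; the function names, the nested-list weight layout (`weights[j][i]` standing for w_{j,i,k}), and the example weights are assumptions introduced for illustration and are not taken from the networks in the case study.

```python
import math


def identity(v):
    """Identity activation, as used in some output layers."""
    return v


def layer_output(prev_outputs, weights, phi):
    """Outputs y_{i,k} of one layer according to formula (3.3).

    prev_outputs: outputs y_{j,k-1} of the previous layer (length N)
    weights:      weights[j][i] is the weight from neuron j to neuron i
    phi:          activation function of the layer
    """
    n_neurons = len(weights[0])
    outputs = []
    for i in range(n_neurons):
        total = sum(prev_outputs[j] * weights[j][i]
                    for j in range(len(prev_outputs)))
        outputs.append(phi(total))
    return outputs


def mlp_forward(x, layers):
    """Propagate an input through a list of (weights, activation) layers."""
    y = x
    for weights, phi in layers:
        y = layer_output(y, weights, phi)
    return y


# Hypothetical 1-2-1 network: tanh hidden layer, identity output layer.
layers = [
    ([[0.8, -0.4]], math.tanh),   # 1 input -> 2 hidden neurons
    ([[1.5], [0.5]], identity),   # 2 hidden neurons -> 1 output
]
print(mlp_forward([0.3], layers))
```

Backpropagation would then adjust the entries of each weight matrix along the negative gradient of the error; that step is not shown here.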
3.2.2 Case
For these experiments, multilayer perceptron (MLP) networks were used as a method of predicting the development of the price of gold time series. The dataset was divided into three basic subsets: the training, testing, and validation datasets.
Experiment 1
1. Input variable (time—date), 1 output variable (price of gold), 1-day time lag of time series
The first experiment includes one input variable (time—date) and one output variable, which is the price of gold in our case; the time lag of the time series is one day. Table 3.31 presents a basic overview of the neural networks used for predicting the price of gold, showing the five most efficient and most successful networks (with the best characteristics). It can be noticed that the network performance is about 98% on the training, testing, and validation datasets, which is considered a very
1203.437
−11.367
1217.216
226.843
Minimum prediction (train)
Maximum prediction (train)
Minimum prediction (test)
1203.437 −668.437 691.563 −664.687
1217.216
−680.796
679.980
−678.039
Maximum prediction (validation)
Minimum residual (train)
Maximum residual (train)
Minimum residual (test)
1203.437
Minimum prediction (validation)
3.010
Maximum standard residual (validation)
2.992
3.136
Minimum standard residual (test) −3.072
−3.014
−3.155
Maximum standard residual (train)
3.154
3.111
3.131
Minimum standard residual (train)
−3.226
−3.007
−3.135
Maximum residual (validation)
Minimum standard residual (validation)
630.563
616.828
Minimum residual (validation)
Maximum standard residual (test)
691.563 −647.437
677.800
−661.042
Maximum residual (test)
1203.437
1217.216
652.462
Maximum prediction (test)
1203.437
1203.437
2. RBF 50-23-1
1. RBF 50-21-1
Statistics
Table 3.29 Prediction statistics—Experiment 6 RBF (Tibco 2020)
2.992
−3.072
3.136
−3.014
3.111
−3.006
630.587
−647.413
691.587
−664.663
691.587
−668.413
1203.413
1203.413
1203.413
1203.413
1203.413
1203.413
3. RBF 50-23-1
2.992
−3.072
3.136
−3.014
3.111
−3.006
630.598
−647.402
691.598
−664.652
691.598
−668.402
1203.402
1203.402
1203.402
1203.402
1203.402
1203.402
4. RBF 50-23-1
2.992
−3.072
3.136
−3.014
3.111
−3.006
630.598
−647.402
691.598
−664.652
691.598
−668.402
1203.402
1203.402
1203.402
1203.402
1203.402
1203.402
5. RBF 50-30-1
Table 3.30 Data statistics—Experiment 6 RBF (Tibco 2020)

| Samples | Date input | Day in week input | Day in month input | Month input | Year input | USD/oz target |
|---|---|---|---|---|---|---|
| Minimum (train) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 529.500 |
| Maximum (train) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (train) | 41,340.06 | 4.013255 | 15.64016 | 6.34191 | 2012.696 | 1201.595 |
| Standard deviation (train) | 1523.42 | 1.391633 | 8.64905 | 3.41621 | 4.172 | 316.367 |
| Minimum (test) | 38,728.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 538.750 |
| Maximum (test) | 44,015.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (test) | 41,515.31 | 4.034608 | 15.28415 | 6.51730 | 2013.162 | 1218.531 |
| Standard deviation (test) | 1590.14 | 1.441273 | 8.90170 | 3.42252 | 4.367 | 313.031 |
| Minimum (validation) | 38,722.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (validation) | 44,019.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1834.000 |
| Mean (validation) | 41,440.15 | 4.027322 | 15.83789 | 6.52095 | 2012.954 | 1207.606 |
| Standard deviation (validation) | 2335.57 | 1.396892 | 8.69112 | 3.57921 | 6.354 | 543.600 |
| Minimum (overall) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750 |
| Maximum (overall) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000 |
| Mean (overall) | 41,381.33 | 4.018564 | 15.61643 | 6.39503 | 2012.805 | 1205.034 |
| Standard deviation (overall) | 1533.01 | 1.400899 | 8.70385 | 3.42037 | 4.201 | 313.589 |
Fig. 3.20 Predicting time series for USD/oz—Experiment 6 (Tibco 2020)
high performance. In all cases, the same training algorithm, BFGS, was used (with a different numerical designation). As an error function, the sum of squares (SOS) was used in all cases. For the activation of the hidden layer of neurons, the hyperbolic tangent is used in all cases except the first one, where the logistic function was used. For the activation of neurons in the output layer, Identity is used in two cases, the logistic function in two cases, and the exponential function in one case. The weights of the individual neural networks are presented in Table 3.32; however, only a small part of the data is shown, since the table is too extensive to present in full. Therefore, only the first four weight IDs are specified for illustration. For all five neural networks and the data subsets, correlation coefficients were determined; a correlation between two processes indicates that they are likely to be mutually dependent. It can be seen from Table 3.33 that the value of the correlation coefficient is high for all networks (about 0.98). The correlation coefficient should be as close to 1 as possible, and ideally the values for each network should be identical across the datasets (training, testing, validation). The differences between the values of the neural networks presented in Table 3.33 are minimal. Table 3.34 captures the prediction statistics according to individual neural networks and datasets (training, testing, validation).
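The performance values in Table 3.31 coincide with the correlation coefficients in Table 3.33, so a coefficient of this kind can be sketched as the correlation between the actual prices and the network outputs on one dataset. The text does not state which coefficient the software reports; the sketch below assumes the Pearson correlation coefficient, and the function name and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Hypothetical actual prices and network outputs in USD/oz.
actual = [1201.0, 1215.5, 1198.0, 1230.0, 1242.5]
predicted = [1195.0, 1210.0, 1205.0, 1224.0, 1238.0]
print(round(pearson(actual, predicted), 6))
```

The coefficient is undefined when either sequence is constant, because one of the deviations in the denominator is then zero; a network that returns the same prediction for every input would trigger this case.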
Fig. 3.21 Smoothed time series and prediction for 44 trading days (two calendar months)—1. RBF 50-21-1, 2. RBF 50-23-1, 3. RBF 50-23-1, 4. RBF 50-23-1, 5. RBF 50-30-1 Experiment 6 (Tibco 2020)
Fig. 3.22 Topology of multilayer perceptron
Table 3.31 Overview of networks—MLP NN Experiment 1 (Tibco 2020)

| Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MLP 1-8-1 | 0.987049 | 0.986649 | 0.985175 | 1285.374 | 1303.063 | 1331.725 | BFGS 831 | SOS | Logistic | Exponential |
| 2 | MLP 1-7-1 | 0.983705 | 0.983220 | 0.982813 | 1614.390 | 1627.513 | 1541.637 | BFGS 220 | SOS | Tanh | Logistic |
| 3 | MLP 1-8-1 | 0.984474 | 0.984876 | 0.983161 | 1538.584 | 1469.196 | 1510.424 | BFGS 765 | SOS | Tanh | Identity |
| 4 | MLP 1-8-1 | 0.984128 | 0.983588 | 0.983372 | 1572.533 | 1592.331 | 1491.808 | BFGS 557 | SOS | Tanh | Identity |
| 5 | MLP 1-8-1 | 0.985260 | 0.984405 | 0.984205 | 1461.219 | 1513.991 | 1418.593 | BFGS 294 | SOS | Tanh | Logistic |
Date-1 → 7.4541 hidden neuron 1
Date-1 → −14.6916 hidden neuron 2
Date-1 → −53.1244 hidden neuron 3
Date-1 → −30.6890 hidden neuron 4
Date-1 → 21.436 hidden neuron 2
Date-1 → −30.753 hidden neuron 3
Date-1 → −14.799 hidden neuron 4
2
3
4
Date-1 → 12.5521 hidden neuron 4
Date-1 → 16.6189 hidden neuron 3
Date-1 → −0.8886 hidden neuron 2
Date-1 → −14.1168 hidden neuron 4
Date-1 → −43.0248 hidden neuron 3
Date-1 → −0.3166 hidden neuron 2
Weight values 5. MLP 1-8-1
Date-1 → 11.444 hidden neuron 4
Date-1 → −68.878 hidden neuron 3
Date-1 → −20.537 hidden neuron 2
Date-1 → −52.490 hidden neuron 1
Weight values Connections 4. MLP 1-8-1 5. MLP 1-8-1
Date-1 → 44.7786 hidden neuron 1
Weight values Connections 3. MLP 1-8-1 4. MLP 1-8-1
Date-1 → 12.9752 hidden neuron 1
Weight values Connections 2. MLP 1-7-1 3. MLP 1-8-1
Date-1 → 14.337 hidden neuron 1
Weight values Connections 1. MLP 1-8-1 2. MLP 1-7-1
1
Weight ID Connections 1. MLP 1-8-1
Table 3.32 Network weights—MLP NN experiment 1 (Tibco 2020)
Table 3.33 Correlation coefficients—MLP NN experiment 1 (Tibco 2020)

| Network | USD/oz train | USD/oz test | USD/oz validation |
|---|---|---|---|
| 1. MLP 1-8-1 | 0.987049 | 0.986649 | 0.985175 |
| 2. MLP 1-7-1 | 0.983705 | 0.983220 | 0.982813 |
| 3. MLP 1-8-1 | 0.984474 | 0.984876 | 0.983161 |
| 4. MLP 1-8-1 | 0.984128 | 0.983588 | 0.983372 |
| 5. MLP 1-8-1 | 0.985260 | 0.984405 | 0.984205 |
It can be seen from the table that in the case of the maximum prediction on the training, testing, and validation datasets, the values of all networks are very similar. This applies to the minimum prediction as well. In terms of minimum and maximum residuals, the difference between the minimum and the maximum should be as small as possible. In terms of residuals, the best network appears to be network 3. MLP 1-8-1, which shows the smallest residuals. Statistics of the individual data are given in Table 3.35. Figure 3.23 shows the development of the curve of the price per Troy ounce of gold in USD (marked blue) in comparison with the five best retained networks. As a sample, the training, testing, and validation datasets were used. In terms of accuracy, the most successful network appears to be 3. MLP 1-8-1, since its curve best follows the blue curve, that is, the actual development of the price of gold. The graphs of smoothed time series including the prediction for 44 trading days (two calendar months) are presented in Fig. 3.24. The most successful networks in terms of the curve trend compared to the curve of the actual prices of gold are networks 1. MLP 1-8-1 and 3. MLP 1-8-1. These networks were able to predict the trend of the curve even at some extreme points. All networks show very good results in comparison with the testing dataset, and all of them could be applied for prediction while ensuring high network performance. In terms of the minimum residuals, the best network appears to be 3. MLP 1-8-1.
Experiment 2
1. Input variable (time—date), 1 output variable (price of gold), 5-day time lag of time series
The second experiment includes one input variable (time—date) and one output variable, which is the price of gold in this case; the time lag of the time series is five days. Table 3.36 presents the basic overview of the neural networks used for predicting the development of the price of gold. It includes the five most efficient and most successful networks (networks with the best characteristics). It can be noticed that the performance of the networks in this case is again around 98 percent for all types of datasets, which is considered a very high performance. In all cases, the same training algorithm, BFGS, is used (only with a different numerical designation). The error function in all cases is the sum of squares (SOS). As a function for the activation of the hidden layer of neurons, the hyperbolic tangent is used in one case,
−204.508 267.279 −185.462 265.230 −165.465 201.535 −5.090 6.652 −4.597 6.574 −4.214
−195.980
237.105
−187.133
234.323
−172.635
169.689
−5.466
6.613
−5.184
6.491
−4.731
4.650
Minimum residual (train)
Maximum residual (train)
Minimum residual (test)
Maximum residual (test)
Minimum residual (validation)
Maximum residual (validation)
Minimum standard residual (train)
Maximum standard residual (train)
Minimum standard residual (test)
Maximum standard residual (test)
Minimum standard residual (validation)
Maximum standard residual (validation)
5.133
1800.408
611.731
1793.820
573.900
1787.357
Maximum prediction (test)
611.757
1802.031
1789.468
574.330
Minimum prediction (test)
Maximum prediction (validation)
1789.557
Maximum prediction (train)
2. MLP 1-7-1 611.727
Minimum prediction (validation)
573.829
1. MLP 1-8-1
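Table 3.34 Prediction statistics: MLP NN Experiment 1 (Tibco 2020)
Predictions statistics (gold - data_2), target: USD/oz
Statistics | 3. MLP 1-8-1 | 4. MLP 1-8-1 | 5. MLP 1-8-1
Minimum prediction (train) | 551.261 | 576.539 | 619.125
Maximum prediction (train) | 1745.934 | 1711.634 | 1741.367
Minimum prediction (test) | 553.310 | 576.828 | 619.129
Maximum prediction (test) | 1745.058 | 1709.420 | 1738.325
Minimum prediction (validation) | 551.554 | 576.578 | 619.126
Maximum prediction (validation) | 1745.913 | 1711.196 | 1740.766
Minimum residual (train) | −170.599 | −177.085 | −180.329
Maximum residual (train) | 229.342 | 288.340 | 290.923
Minimum residual (test) | −177.053 | −180.744 | −182.234
Maximum residual (test) | 226.799 | 286.679 | 289.439
Minimum residual (validation) | −178.892 | −180.177 | −182.024
Maximum residual (validation) | 172.626 | 223.491 | 226.488
Minimum standard residual (train) | −4.349 | −4.466 | −4.717
Maximum standard residual (train) | 5.847 | 7.271 | 7.611
Minimum standard residual (test) | −4.619 | −4.529 | −4.683
Maximum standard residual (test) | 5.917 | 7.184 | 7.439
Minimum standard residual (validation) | −4.603 | −4.665 | −4.833
Maximum standard residual (validation) | 4.442 | 5.786 | 6.013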
86 3 Artificial Neural Networks—Selected Models
3.2 Multi-Layer Perceptron Neural Networks
Table 3.35 Data statistics: MLP NN Experiment 1 (Tibco 2020)
Data statistics (gold - data_2)
Samples | Date input | USD/oz target
Minimum (train) | 38,720.00 | 529.500
Maximum (train) | 44,020.00 | 1895.000
Mean (train) | 41,340.06 | 1201.595
Standard deviation (train) | 1523.42 | 316.367
Minimum (test) | 38,728.00 | 538.750
Maximum (test) | 44,015.00 | 1895.000
Mean (test) | 41,515.31 | 1218.531
Standard deviation (test) | 1590.14 | 313.031
Minimum (validation) | 38,722.00 | 524.750
Maximum (validation) | 44,019.00 | 1834.000
Mean (validation) | 41,440.15 | 1207.606
Standard deviation (validation) | 2335.57 | 543.600
Minimum (overall) | 38,720.00 | 524.750
Maximum (overall) | 44,020.00 | 1895.000
Mean (overall) | 41,381.33 | 1205.034
Standard deviation (overall) | 1533.01 | 313.589
Fig. 3.23 Prediction of time series USD/oz—MLP NN Experiment 1 (Tibco 2020)
Fig. 3.24 Smoothed time series and prediction for 44 trading days 1. MLP 1-8-1, 2. MLP 1-7-1, 3. MLP 1-8-1, 4. MLP 1-8-1, 5. MLP 1-8-1—Experiment 1 (Tibco 2020)
while in the remaining cases the logistic function is used. As the activation function of the neurons in the output layer, hyperbolic tangent was used in two cases, the exponential function in one case, and the logistic function in the remaining two cases. The weights of the neural networks are shown in Table 3.37; because the complete table is too large to present, only the first four weight IDs are shown for illustration. For all five networks, correlation coefficients were determined for the individual networks and data subsets; a high correlation between two processes suggests that they are likely to be mutually dependent. Table 3.38 clearly shows that the correlation coefficient is high for all networks, exceeding 0.98 in every case. This indicates a very high
Table 3.36 Overview of networks: MLP NN Experiment 2 (Tibco 2020)
Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation
1 | MLP 5-6-1 | 0.984210 | 0.984757 | 0.982695 | 1555.911 | 1481.867 | 1536.321 | BFGS 616 | SOS | Logistic | Tanh
2 | MLP 5-8-1 | 0.983543 | 0.983431 | 0.982565 | 1621.418 | 1607.253 | 1547.872 | BFGS 465 | SOS | Tanh | Tanh
3 | MLP 5-8-1 | 0.983964 | 0.983055 | 0.982257 | 1580.052 | 1643.916 | 1575.105 | BFGS 451 | SOS | Logistic | Exponential
4 | MLP 5-7-1 | 0.984502 | 0.983561 | 0.982901 | 1527.502 | 1596.290 | 1518.669 | BFGS 260 | SOS | Logistic | Logistic
5 | MLP 5-8-1 | 0.985145 | 0.984557 | 0.983557 | 1464.947 | 1501.249 | 1460.720 | BFGS 246 | SOS | Logistic | Logistic
Date-1 → −4.1419 hidden neuron 2
Date-1 → −3.0355 hidden neuron 3
Date-1 → −1.9745 hidden neuron 4
Date-1 → 19.6686 hidden neuron 3
Date-1 → 14.6392 hidden neuron 4
3
4
Date-1 → 8.0327 hidden neuron 4
Date-1 → 5.4789 hidden neuron 3
Date-1 → 25.2519 hidden neuron 4
Date-1 → 25.9877 hidden neuron 3
Weight values 5. MLP 5-8-1
Date-1 → −3.7047 hidden neuron 4
Date-1 → −3.2985 hidden neuron 3
Date-1 → −3.4150 hidden neuron 2
Weight values Connections 4. MLP 5-7-1 5. MLP 5-8-1
Date-1 → 26.6139 hidden neuron 2
Weight values Connections 3. MLP 5-8-1 4. MLP 5-7-1
Date-1 → 9.3501 hidden neuron 2
Weight values Connections 2. MLP 5-8-1 3. MLP 5-8-1
Date-1 → 24.0077 hidden neuron 2
Weight values Connections 1. MLP 5-6-1 2. MLP 5-8-1
2
Weight ID Connections 1. MLP 5-6-1
Table 3.37 Network weights—MLP NN Experiment 2 (Tibco 2020)
Table 3.38 Correlation coefficients: MLP NN Experiment 2 (Tibco 2020)
Network | USD/oz train | USD/oz test | USD/oz validation
1. MLP 5-6-1 | 0.984210 | 0.984757 | 0.982695
2. MLP 5-8-1 | 0.983543 | 0.983431 | 0.982565
3. MLP 5-8-1 | 0.983964 | 0.983055 | 0.982257
4. MLP 5-7-1 | 0.984502 | 0.983561 | 0.982901
5. MLP 5-8-1 | 0.985145 | 0.984557 | 0.983557
performance of the network. The correlation coefficient should be as high as possible (close to 1) and ideally the same across the training, testing, and validation datasets. The differences between the datasets for any one network are minimal, and the performance is sufficient. Prediction statistics for the individual networks on the training, testing, and validation datasets are presented in Table 3.39, while the data statistics are presented in Table 3.40. Table 3.39 shows that the values are satisfactory, so high performance of the networks can be expected. For the residuals, the maximum and minimum should ideally sum to 0; alternatively, the minimum should be the highest negative number and the maximum the lowest positive number. The predictions of the individual networks on the training, testing, and validation datasets are very similar. In terms of the residuals, the best network appears to be 1. MLP 5-6-1. Figure 3.25 presents a graph of the predicted USD/oz time series, showing the curves of all networks; the sample comprises the training, testing, and validation datasets. All networks behave similarly, and a closer look shows that their curves nearly coincide in many parts of the graph. Nevertheless, the most suitable network for predicting the gold price time series appears to be 1. MLP 5-6-1, specifically because it has the lowest residuals. The smoothed time series and the predictions for 44 trading days for all networks are shown in Fig. 3.26. The networks whose curves at least partially follow the curve of the actual gold prices are 1. MLP 5-6-1 (in terms of the estimated shape of the curve at the extremes) and 3. MLP 5-8-1, which performs very well in the first third of the graph.

Experiment 3
1 input variable: time (date); 1 output variable: price of gold; 10-day time lag of time series

The third experiment uses one input variable (time, given as the date) and one output variable (the price of gold), with a time lag of the time series of 10 days.
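The text does not spell out how the software turns a time lag into network inputs. The network labels are at least consistent with one input neuron per variable per lagged day (for example, MLP 10-8-1 for one variable and a 10-day lag). The following minimal sketch shows one common way of building such lagged samples from a series; the function name `make_lagged_samples` and the example values are hypothetical and are not taken from the gold dataset.

```python
from typing import List, Sequence, Tuple


def make_lagged_samples(
    series: Sequence[float], lag: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (inputs, targets) pairs that use `lag` past values."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for t in range(lag, len(series)):
        inputs.append(list(series[t - lag:t]))  # the `lag` values before step t
        targets.append(series[t])               # the value to be predicted at step t
    return inputs, targets


# Made-up values, for illustration only.
values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
X, y = make_lagged_samples(values, lag=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

With a lag of 10, each sample would hold ten consecutive past values, which corresponds to the ten input neurons of the networks in this experiment under the assumption stated above.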
1762.896 −179.764 277.204 −181.386
1755.786
−170.570
244.839
−177.474
Maximum prediction (validation)
Minimum residual (train)
Maximum residual (train)
Minimum residual (test)
4.421
Maximum standard residual (validation)
5.403
6.897
Minimum standard residual (test) −4.369
−4.524
−4.610
Maximum standard residual (train)
6.278
6.884
6.207
Minimum standard residual (train)
−4.378
−4.464
−4.324
Maximum residual (validation)
Minimum standard residual (validation)
212.557
173.289
Minimum residual (validation)
Maximum standard residual (test)
276.514 −171.893
241.658
−171.589
Maximum residual (test)
561.993
1759.851
563.497
1752.756
555.929
Minimum prediction (test)
558.298 1764.778
553.991
1755.883
Maximum prediction (train)
Minimum prediction (validation)
555.365
Minimum prediction (train)
2. MLP 5-8-1
Maximum prediction (test)
1. MLP 5-6-1
Statistics
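Table 3.39 Prediction statistics: MLP NN Experiment 2 (Tibco 2020)
Statistics | 3. MLP 5-8-1 | 4. MLP 5-7-1 | 5. MLP 5-8-1
Minimum prediction (train) | 601.198 | 601.219 | 584.629
Maximum prediction (train) | 1728.186 | 1739.444 | 1756.388
Minimum prediction (test) | 601.206 | 601.183 | 584.461
Maximum prediction (test) | 1727.797 | 1736.327 | 1752.810
Minimum prediction (validation) | 600.998 | 601.063 | 584.292
Maximum prediction (validation) | 1726.746 | 1738.178 | 1755.181
Minimum residual (train) | −214.404 | −164.480 | −201.833
Maximum residual (train) | 302.189 | 314.984 | 274.896
Minimum residual (test) | −164.231 | −145.938 | −170.416
Maximum residual (test) | 289.432 | 309.355 | 273.220
Minimum residual (validation) | −176.654 | −183.057 | −179.358
Maximum residual (validation) | 205.048 | 238.365 | 207.823
Minimum standard residual (train) | −5.394 | −4.208 | −5.273
Maximum standard residual (train) | 7.602 | 8.059 | 7.182
Minimum standard residual (test) | −4.051 | −3.653 | −4.398
Maximum standard residual (test) | 7.138 | 7.743 | 7.052
Minimum standard residual (validation) | −4.451 | −4.697 | −4.693
Maximum standard residual (validation) | 5.167 | 6.117 | 5.438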
Table 3.40 Data statistics: MLP NN Experiment 2 (Tibco 2020)
Samples | Date input | USD/oz target
Minimum (train) | 38,720.00 | 529.500
Maximum (train) | 44,020.00 | 1895.000
Mean (train) | 41,340.06 | 1201.595
Standard deviation (train) | 1523.42 | 316.367
Minimum (test) | 38,728.00 | 538.750
Maximum (test) | 44,015.00 | 1895.000
Mean (test) | 41,515.31 | 1218.531
Standard deviation (test) | 1590.14 | 313.031
Minimum (validation) | 38,722.00 | 524.750
Maximum (validation) | 44,019.00 | 1834.000
Mean (validation) | 41,440.15 | 1207.606
Standard deviation (validation) | 2335.57 | 543.600
Minimum (overall) | 38,720.00 | 524.750
Maximum (overall) | 44,020.00 | 1895.000
Mean (overall) | 41,381.33 | 1205.034
Standard deviation (overall) | 1533.01 | 313.589
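The data statistics report the minimum, maximum, mean, and standard deviation for the training, testing, and validation subsets and for the data overall. As a minimal sketch of how such a summary can be computed, the snippet below uses made-up values; the helper name `describe` is hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the source does not say which variant the software reports.

```python
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics of one subset, in the layout of the data statistics tables."""
    return {
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.fmean(values),
        # Sample standard deviation; which variant the software uses is an assumption.
        "Standard deviation": statistics.stdev(values),
    }


# Made-up values per subset, for illustration only.
subsets = {
    "train": [1.0, 2.0, 3.0, 4.0],
    "test": [2.0, 4.0, 6.0],
    "validation": [1.0, 5.0, 9.0],
}
# The overall statistics are computed on the pooled data of all three subsets.
subsets["overall"] = [v for values in subsets.values() for v in values]

table = {name: describe(values) for name, values in subsets.items()}
for name, stats in table.items():
    print(name, stats)
```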
Fig. 3.25 Prediction of USD/oz time series—MLP NN experiment 2 (Tibco 2020)
Fig. 3.26 Graphs of smoothed time series and prediction for 44 trading days 1. MLP 5-6-1, 2. MLP 5-8-1, 3. MLP 5-8-1, 4. MLP 5-7-1, 5. MLP 5-8-1—Experiment 2 (Tibco 2020)
Table 3.41 gives a basic overview of the neural networks used for predicting the development of the price of gold, listing the five most successful networks (those with the best characteristics). The performance of the networks is again about 98 percent, which is very high; network 3. MLP 10-8-1 performs best on the training, testing, and validation datasets. This level of performance indicates a high probability that the networks can be applied successfully to predicting the development of the price of gold. In all cases, the same training algorithm (BFGS) was used (only its numerical designation differs). As an error function, the sum of squares (SOS) was used,
Table 3.41 Overview of networks: MLP NN Experiment 3 (Tibco 2020)
Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation
1 | MLP 10-7-1 | 0.983571 | 0.984685 | 0.982238 | 1609.527 | 1478.157 | 1560.952 | BFGS 284 | SOS | Tanh | Identity
2 | MLP 10-7-1 | 0.983994 | 0.984014 | 0.981359 | 1568.038 | 1543.388 | 1637.143 | BFGS 328 | SOS | Tanh | Exponential
3 | MLP 10-8-1 | 0.988940 | 0.988240 | 0.988279 | 1085.921 | 1134.527 | 1033.320 | BFGS 344 | SOS | Tanh | Logistic
4 | MLP 10-8-1 | 0.982963 | 0.981962 | 0.981537 | 1668.345 | 1733.591 | 1621.498 | BFGS 473 | SOS | Tanh | Identity
5 | MLP 10-8-1 | 0.984047 | 0.983372 | 0.981882 | 1562.988 | 1598.650 | 1592.048 | BFGS 569 | SOS | Tanh | Identity
while hyperbolic tangent is used as the activation function of the hidden layer of neurons. For the activation of the neurons in the output layer, the identity function was used in three cases, the logistic function in one case, and the exponential function in the remaining case. The weights of the individual networks are shown in Table 3.42. Because of the large number of rows, the table presents only the first four rows of weights for each network. In Experiment 3, too, the correlation coefficients of the individual networks and data subsets were determined. They are presented in Table 3.43, which shows that the correlation coefficient is high for all networks, confirming their high performance. The lowest value, 0.981359, is recorded for network 2. MLP 10-7-1 on the validation dataset, while the highest value, 0.988940, is recorded for network 3. MLP 10-8-1, which achieves the highest correlation coefficients on all datasets. The correlation coefficient should be as high as possible (close to 1) and ideally identical across each network's datasets (training, testing, and validation). The prediction statistics for the individual networks on the training, testing, and validation datasets are given in Table 3.44, while the data statistics are listed in Table 3.45. Table 3.44 shows that the values are highly satisfactory, so high performance of the networks can already be assumed. For the residuals, the maximum and minimum values should ideally sum to 0; alternatively, the minimum should be the highest negative number and the maximum the lowest positive number. The predictions are very similar for all networks on the training, testing, and validation datasets.
In terms of the residuals, the best network appears to be 2. MLP 10-7-1 on the training and testing datasets. On the validation dataset, the most successful network in terms of the smallest residuals was 5. MLP 10-8-1. Figure 3.27 presents the graph of the predicted USD/oz time series, displaying the curves of all networks; the sample comprises the training, testing, and validation datasets. All networks behave similarly, and their curves are very close in many parts of the graph. Nevertheless, 3. MLP 10-8-1 can be regarded as the most suitable network for predicting the gold price time series because of its best performance. The smoothed time series and predictions for 44 trading days are presented in Fig. 3.28. The figure shows that the best networks in terms of predicting the time series are 1. MLP 10-7-1 and the most efficient network, 3. MLP 10-8-1. The curve of the third network copies the curve of the actual development of the price of gold in many parts of the graph.
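The residual criteria used above (extreme residuals close to zero and roughly symmetric around it) can be illustrated with a short sketch. It assumes that a residual is the actual value minus the predicted value and that a standard residual is the residual divided by the root mean square of the residuals of the same subset; neither definition is stated in the text. The tabulated values are at least consistent with this reading: for network 3 on the training dataset in Experiment 3, 246.758 / 7.488 ≈ 32.95, which matches the square root of the training error 1085.921. The function names and the data below are hypothetical.

```python
import math
from typing import List, Sequence


def residuals(targets: Sequence[float], predictions: Sequence[float]) -> List[float]:
    """Residual = actual value minus predicted value (sign convention assumed)."""
    return [t - p for t, p in zip(targets, predictions)]


def standardize(res: Sequence[float]) -> List[float]:
    """Divide each residual by the root mean square of all residuals (assumed scale)."""
    scale = math.sqrt(sum(r * r for r in res) / len(res))
    if scale == 0.0:
        raise ValueError("all residuals are zero; nothing to standardize")
    return [r / scale for r in res]


# Made-up values for illustration, not taken from the gold dataset.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]

res = residuals(actual, predicted)
std_res = standardize(res)
summary = {
    "Minimum residual": min(res),
    "Maximum residual": max(res),
    "Sum of extremes": min(res) + max(res),  # ideally close to 0
    "Minimum standard residual": min(std_res),
    "Maximum standard residual": max(std_res),
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under the criteria described in the text, a network whose minimum and maximum residuals are smaller in magnitude, and whose sum of extremes is closer to zero, would be preferred.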
Table 3.42 Network weights: MLP NN Experiment 3 (Tibco 2020)
Weight ID | Connections | Weight values 1. MLP 10-7-1 | Weight values 2. MLP 10-7-1 | Weight values 3. MLP 10-8-1 | Weight values 4. MLP 10-8-1 | Weight values 5. MLP 10-8-1
1 | Date-1 → hidden neuron 1 | −3.0649 | −8.2046 | 7.3485 | 0.0373 | −3.1976
2 | Date-1 → hidden neuron 2 | −3.7112 | −2.2151 | 6.1811 | −14.3696 | 0.7629
3 | Date-1 → hidden neuron 3 | −3.1577 | −4.9083 | 6.3569 | −5.9227 | −3.9044
4 | Date-1 → hidden neuron 4 | −2.4925 | −1.3852 | 5.1852 | −3.7492 | −12.0061
Table 3.43 Correlation coefficients: MLP NN Experiment 3 (Tibco 2020)
Network | USD/oz train | USD/oz test | USD/oz validation
1. MLP 10-7-1 | 0.983571 | 0.984685 | 0.982238
2. MLP 10-7-1 | 0.983994 | 0.984014 | 0.981359
3. MLP 10-8-1 | 0.988940 | 0.988240 | 0.988279
4. MLP 10-8-1 | 0.982963 | 0.981962 | 0.981537
5. MLP 10-8-1 | 0.984047 | 0.983372 | 0.981882
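The correlation coefficients in Table 3.43 coincide with the performance values listed for the same networks in Table 3.41, which suggests that network performance is reported as the correlation between actual and predicted values on each subset. A minimal sketch of that computation follows; it assumes the Pearson correlation coefficient is meant, and the function name `pearson_r` and the data are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up actual and predicted values for one subset, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(actual, predicted)
print(f"correlation: {r:.6f}")
```

Applied separately to the training, testing, and validation subsets, this yields one value per subset, in the layout of Table 3.43; values close to 1 that differ little between subsets are the outcome the text describes as desirable.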
Experiment 4
5 input variables: time (date), day of the week, day of the month, month, year; 1 output variable: price of gold; 1-day time lag of time series

This experiment uses five input variables: time (date), day of the week, day of the month, month, and year. It has one output variable, the price of gold, and the time lag of the time series is one day. Table 3.46 gives an overview of the networks used in the fourth experiment, presenting the five most successful networks with the best characteristics. An important value for comparing the individual networks is the network performance, which exceeds 0.98 on the training, testing, and validation datasets. On the training dataset, the best network appears to be 1. MLP 5-10-1, with a performance of 0.987487. The same network, 1. MLP 5-10-1, is also the most efficient on the testing dataset. On the validation dataset, the most successful network is 3. MLP 5-10-1, with a performance of 0.985807. The individual weights of the networks are given in Table 3.47; because of its size, only a small part of the table is presented. In Experiment 4, too, the correlation coefficients for the individual networks and datasets were determined. Table 3.48 clearly shows that the values for the training, testing, and validation datasets are very similar. The correlation coefficients exceed 0.98 in all cases, which indicates good network performance; the value should be as close to 1 as possible. The prediction statistics of the individual networks are given in Table 3.49, and Table 3.50 presents the data statistics. In terms of the smallest residuals, the most successful network appears to be 1. MLP 5-10-1 on all datasets. The graph of the USD/oz time series and the individual networks is shown in Fig. 3.29, from which it can be seen that most of the networks closely follow the USD/oz curve.
The smoothed time series and predictions for 44 trading days (two calendar months) for all networks are presented in Fig. 3.30. The graph suggests that all five networks produce a similar prediction. The test curve (marked blue) is followed most closely by the curve of network 1. MLP 5-10-1.
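The four calendar inputs of this experiment can be derived from the date itself. The sketch below assumes that the Date input is stored as an Excel-style serial day number; the range 38,720 to 44,020 reported in the data statistics would then run from early January 2006 to early July 2020, which agrees with the year range 2006 to 2020 in Table 3.50. It also assumes that days of the week are numbered with Sunday = 1, which would explain a range of 2 to 6 for trading days. Both conventions, and the helper name `calendar_features`, are assumptions rather than something the text states.

```python
from datetime import date, timedelta
from typing import Dict

# Day 0 of the Excel-style serial date system (an assumed convention).
SERIAL_EPOCH = date(1899, 12, 30)


def calendar_features(serial_day: int) -> Dict[str, int]:
    """Derive the calendar inputs of Experiment 4 from a serial day number."""
    d = SERIAL_EPOCH + timedelta(days=serial_day)
    return {
        "date": serial_day,
        "day_in_week": d.isoweekday() % 7 + 1,  # Sunday = 1, ..., Saturday = 7 (assumed)
        "day_in_month": d.day,
        "month": d.month,
        "year": d.year,
    }


first = calendar_features(38720)  # smallest Date input in the data statistics
last = calendar_features(44020)   # largest Date input in the data statistics
print(first)
print(last)
```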
1720.420 −245.810 216.087 −188.106
1772.094
−176.668
232.737
−196.447
Maximum prediction (validation)
Minimum residual (train)
Maximum residual (train)
Minimum residual (test)
4.819
Maximum standard residual (validation)
3.612
5.241
Minimum standard residual (test) −4.539
−4.788
−5.110
Maximum standard residual (train)
6.001
5.457
5.801
Minimum standard residual (train)
−4.342
−6.208
−4.404
Maximum residual (validation)
Minimum standard residual (validation)
146.130
190.379
Minimum residual (validation)
Maximum standard residual (test)
205.882 −183.657
230.718
−171.543
Maximum residual (test)
588.106
1719.120
587.034
1770.735
593.275
Minimum prediction (test)
586.225 1721.376
595.647
1779.177
Maximum prediction (train)
Minimum prediction (validation)
591.288
Minimum prediction (train)
2. MLP 10-7-1
Maximum prediction (test)
1. MLP 10-7-1
Statistics
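Table 3.44 Prediction statistics: MLP NN Experiment 3 (Tibco 2020)
Statistics | 3. MLP 10-8-1 | 4. MLP 10-8-1 | 5. MLP 10-8-1
Minimum prediction (train) | 577.475 | 606.397 | 566.783
Maximum prediction (train) | 1716.653 | 1788.079 | 1771.600
Minimum prediction (test) | 578.219 | 606.678 | 568.868
Maximum prediction (test) | 1714.025 | 1783.126 | 1767.646
Minimum prediction (validation) | 579.202 | 606.105 | 570.893
Maximum prediction (validation) | 1715.616 | 1788.217 | 1770.771
Minimum residual (train) | −149.229 | −196.261 | −180.784
Maximum residual (train) | 246.758 | 286.303 | 248.903
Minimum residual (test) | −140.732 | −177.708 | −183.867
Maximum residual (test) | 246.074 | 293.569 | 252.824
Minimum residual (validation) | −175.164 | −158.886 | −177.601
Maximum residual (validation) | 182.773 | 221.070 | 188.422
Minimum standard residual (train) | −4.529 | −4.805 | −4.573
Maximum standard residual (train) | 7.488 | 7.009 | 6.296
Minimum standard residual (test) | −4.178 | −4.268 | −4.599
Maximum standard residual (test) | 7.306 | 7.051 | 6.323
Minimum standard residual (validation) | −5.449 | −3.946 | −4.451
Maximum standard residual (validation) | 5.686 | 5.490 | 4.722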
Table 3.45 Data statistics: MLP NN Experiment 3 (Tibco 2020)
Samples | Date input | USD/oz target
Minimum (train) | 38,720.00 | 529.500
Maximum (train) | 44,020.00 | 1895.000
Mean (train) | 41,340.06 | 1201.595
Standard deviation (train) | 1523.42 | 316.367
Minimum (test) | 38,728.00 | 538.750
Maximum (test) | 44,015.00 | 1895.000
Mean (test) | 41,515.31 | 1218.531
Standard deviation (test) | 1590.14 | 313.031
Minimum (validation) | 38,722.00 | 524.750
Maximum (validation) | 44,019.00 | 1834.000
Mean (validation) | 41,440.15 | 1207.606
Standard deviation (validation) | 2335.57 | 543.600
Minimum (overall) | 38,720.00 | 524.750
Maximum (overall) | 44,020.00 | 1895.000
Mean (overall) | 41,381.33 | 1205.034
Standard deviation (overall) | 1533.01 | 313.589
Fig. 3.27 Prediction of USD/oz time series—MLP NN Experiment 3 (Tibco 2020)
Fig. 3.28 Smoothed time series and prediction for 44 trading days 1. MLP 10-7-1, 2. MLP 10-7-1, 3. MLP 10-8-1, 4. MLP 10-8-1, 5. MLP 10-8-1—Experiment 3 (Tibco 2020)
This network is also the most successful one in terms of performance and the size of the residuals.

Experiment 5
5 input variables: time (date), day of the week, day of the month, month, year; 1 output variable: price of gold; 5-day time lag of time series

This experiment uses five input variables: time (date), day of the week, day of the month, month, and year. It has one output variable, the price of gold, and the time lag of the time series is five days. Table 3.51 gives an overview of the networks used in Experiment 5, presenting the five most successful networks with the best characteristics.
Table 3.46 Overview of networks: MLP NN Experiment 4 (Tibco 2020)
Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation
1 | MLP 5-10-1 | 0.987487 | 0.987378 | 0.985182 | 1241.913 | 1228.974 | 1331.405 | BFGS 533 | SOS | Tanh | Exponential
2 | MLP 5-10-1 | 0.986726 | 0.986805 | 0.984748 | 1317.137 | 1283.801 | 1369.677 | BFGS 297 | SOS | Tanh | Tanh
3 | MLP 5-10-1 | 0.986425 | 0.986850 | 0.985807 | 1346.576 | 1277.803 | 1275.916 | BFGS 708 | SOS | Tanh | Tanh
4 | MLP 5-11-1 | 0.987266 | 0.987362 | 0.985766 | 1264.658 | 1229.756 | 1278.896 | BFGS 419 | SOS | Logistic | Exponential
5 | MLP 5-10-1 | 0.986565 | 0.985645 | 0.984734 | 1332.921 | 1396.626 | 1372.477 | BFGS 451 | SOS | Tanh | Logistic
Table 3.47 Network weights: MLP NN Experiment 4 (Tibco 2020)
Weight ID | Connections | Weight values 1. MLP 5-10-1 | Weight values 2. MLP 5-10-1 | Weight values 3. MLP 5-10-1 | Weight values 4. MLP 5-11-1 | Weight values 5. MLP 5-10-1
1 | Date-1 → hidden neuron 1 | −32.1759 | −6.7166 | 28.805 | −17.432 | −15.2650
2 | Date-1 → hidden neuron 2 | −0.0117 | 0.0077 | 1.006 | 0.003 | −0.0242
3 | Date-1 → hidden neuron 3 | 0.1151 | −0.0399 | −8.821 | 0.044 | 0.1220
4 | Date-1 → hidden neuron 4 | 1.2548 | 0.2381 | 12.259 | 0.034 | 1.4382
Table 3.48 Correlation coefficients: MLP NN Experiment 4 (Tibco 2020)
Network | USD/oz train | USD/oz test | USD/oz validation
1. MLP 5-10-1 | 0.987487 | 0.987378 | 0.985182
2. MLP 5-10-1 | 0.986726 | 0.986805 | 0.984748
3. MLP 5-10-1 | 0.986425 | 0.986850 | 0.985807
4. MLP 5-11-1 | 0.987266 | 0.987362 | 0.985766
5. MLP 5-10-1 | 0.986565 | 0.985645 | 0.984734
An important value for comparing the individual networks is the network performance, which exceeds 0.97 on the training, testing, and validation datasets for all networks. On the training dataset, the best performance, 0.985793, is achieved by network 1. MLP 25-11-1. On the testing dataset, the most successful network appears to be 4. MLP 25-11-1, with a performance of 0.984157. On the validation dataset, the most efficient network is 1. MLP 25-11-1. As for the error level on the training, testing, and validation datasets, the most successful network with the smallest error is 1. MLP 25-11-1. Part of the network weights is presented in Table 3.52; because of its size, only a small part of the table is shown for illustration. Correlation coefficients were also determined for Experiment 5. Table 3.53 shows that on the training, testing, and validation datasets the correlation coefficients of all networks exceed 0.97, which indicates very good network performance. Prediction statistics for the fifth experiment are given in Table 3.54, which clearly shows that, for each network, the values do not differ significantly between the datasets. Data statistics are presented in Table 3.55. Figure 3.31 shows the curves of the individual networks in comparison with the USD/oz curve (marked blue). The curves of the individual networks at least partially follow the shape of the USD/oz curve; however, it is not clear which of them is closest to the test curve. The smoothed time series and predictions for 44 trading days (two calendar months) are presented in Fig. 3.32. The figure shows that networks 1. MLP 25-11-1 and 4. MLP 25-11-1 predicted the development of the price of gold most precisely, although with some deviations, for example in the case of the first network.

Experiment 6
5 input variables: time (date), day of the week, day of the month, month, year; 1 output variable: price of gold; 10-day time lag of time series

For Experiment 6, five input variables were selected: time (date), day of the week, day of the month, month, and year. As an output variable, one particular
1820.055 −186.732 202.530 −207.486
1735.300
−173.015
186.798
−150.867
Maximum prediction (validation)
Minimum residual (train)
Maximum residual (train)
Minimum residual (test)
4.358
Maximum standard residual (validation)
3.908
5.249
Minimum standard residual (test) −4.705
−5.791
−4.304
Maximum standard residual (train)
4.493
5.581
5.301
Minimum standard residual (train)
−5.303
−5.145
−4.910
Maximum residual (validation)
Minimum standard residual (validation)
144.634
159.023
Minimum residual (validation)
Maximum standard residual (test)
188.085 −174.122
157.519
−193.503
Maximum residual (test)
540.056
1810.578
543.238
1738.183
588.022
Minimum prediction (test)
538.792 1819.319
587.187
1749.257
Maximum prediction (train)
Minimum prediction (validation)
585.089
Minimum prediction (train)
2. MLP 5-10-1
Maximum prediction (test)
1. MLP 5-10-1
Statistics
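Table 3.49 Prediction statistics: MLP NN Experiment 4 (Tibco 2020)
Statistics | 3. MLP 5-10-1 | 4. MLP 5-11-1 | 5. MLP 5-10-1
Minimum prediction (train) | 533.790 | 586.162 | 529.500
Maximum prediction (train) | 1718.273 | 1761.783 | 1801.645
Minimum prediction (test) | 535.615 | 586.143 | 529.500
Maximum prediction (test) | 1713.292 | 1766.714 | 1800.901
Minimum prediction (validation) | 534.561 | 586.129 | 529.500
Maximum prediction (validation) | 1714.721 | 1755.186 | 1799.460
Minimum residual (train) | −150.861 | −201.539 | −229.225
Maximum residual (train) | 267.656 | 221.156 | 234.199
Minimum residual (test) | −148.837 | −221.621 | −269.901
Maximum residual (test) | 265.500 | 188.266 | 212.004
Minimum residual (validation) | −162.088 | −196.405 | −192.960
Maximum residual (validation) | 204.359 | 151.619 | 144.564
Minimum standard residual (train) | −4.111 | −5.667 | −6.279
Maximum standard residual (train) | 7.294 | 6.219 | 6.415
Minimum standard residual (test) | −4.164 | −6.320 | −7.222
Maximum standard residual (test) | 7.427 | 5.369 | 5.673
Minimum standard residual (validation) | −4.538 | −5.492 | −5.209
Maximum standard residual (validation) | 5.721 | 4.240 | 3.902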
Table 3.50 Data statistics: MLP NN Experiment 4 (Tibco 2020)
Samples | Date input | Day in week input | Day in month input | Month input | Year input | USD/oz target
Minimum (train) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 529.500
Maximum (train) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000
Mean (train) | 41,340.06 | 4.013255 | 15.64016 | 6.34191 | 2012.696 | 1201.595
Standard deviation (train) | 1523.42 | 1.391633 | 8.64905 | 3.41621 | 4.172 | 316.367
Minimum (test) | 38,728.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 538.750
Maximum (test) | 44,015.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000
Mean (test) | 41,515.31 | 4.034608 | 15.28415 | 6.51730 | 2013.162 | 1218.531
Standard deviation (test) | 1590.14 | 1.441273 | 8.90170 | 3.42252 | 4.367 | 313.031
Minimum (validation) | 38,722.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750
Maximum (validation) | 44,019.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1834.000
Mean (validation) | 41,440.15 | 4.027322 | 15.83789 | 6.52095 | 2012.954 | 1207.606
Standard deviation (validation) | 2335.57 | 1.396892 | 8.69112 | 3.57921 | 6.354 | 543.600
Minimum (overall) | 38,720.00 | 2.000000 | 1.00000 | 1.00000 | 2006.000 | 524.750
Maximum (overall) | 44,020.00 | 6.000000 | 31.00000 | 12.00000 | 2020.000 | 1895.000
Mean (overall) | 41,381.33 | 4.018564 | 15.61643 | 6.39503 | 2012.805 | 1205.034
Standard deviation (overall) | 1533.01 | 1.400899 | 8.70385 | 3.42037 | 4.201 | 313.589
Fig. 3.29 Graph of prediction of USD/oz time series—MLP NN Experiment 4 (Tibco 2020)
output variable was selected: the price of gold. The time lag of the time series is ten days. Table 3.56 presents the performance and the error level of the networks, together with the training algorithm, which is BFGS for all networks. The error function is the sum of squares. The activation function of the hidden layer of neurons is the logistic function in three cases and hyperbolic tangent in two cases; for the activation of neurons in the output layer, identity is used in one case, the logistic function in two cases, the exponential function in one case, and sine in one case. It is apparent at first glance that network performance in Experiment 6 is again high, although in some cases slightly lower than in the previous experiments: on all datasets (training, testing, validation), the performance exceeds 90%. The highest performance is achieved by network 2. MLP 50-11-1. The weights of the individual networks are presented in Table 3.57; because of the size of the table, only the first four rows of weights are shown, so Table 3.57 serves for illustration only. The correlation coefficients of the datasets and individual networks determined in the sixth experiment are listed in Table 3.58. For correlation coefficients, the best values are those close to 1. In this experiment, all networks achieve values higher than 0.97, which are considered very high. At this point, good results can be expected when predicting the price of gold with some of the aforementioned five networks.
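To make the roles of the hidden and output activation functions and of the sum-of-squares error concrete, the following is a minimal sketch of the forward pass of a perceptron with a single hidden layer. It is not the implementation used by the software: the function names, the made-up weights, and the omission of any input or output scaling are assumptions, and writing the exponential activation as exp(−z) is an assumed sign convention.

```python
import math
from typing import Callable, Dict, Sequence

# Activation functions named in the text. Writing "exponential" as exp(-z)
# is an assumption about the sign convention.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "identity": lambda z: z,
    "logistic": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "exponential": lambda z: math.exp(-z),
    "sine": math.sin,
}


def mlp_forward(
    inputs: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],  # one row of weights per hidden neuron
    hidden_biases: Sequence[float],
    output_weights: Sequence[float],            # one weight per hidden neuron
    output_bias: float,
    hidden_activation: str = "tanh",
    output_activation: str = "identity",
) -> float:
    """Forward pass of an n-h-1 perceptron with a single hidden layer."""
    f_hidden = ACTIVATIONS[hidden_activation]
    f_output = ACTIVATIONS[output_activation]
    hidden = [
        f_hidden(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return f_output(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


def sum_of_squares(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Sum-of-squares (SOS) error; any scaling factor is left out."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs))


# Hypothetical 2-2-1 network with made-up weights.
y_hat = mlp_forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.5, -0.5], [0.0, 0.0]],
    hidden_biases=[0.5, 0.0],
    output_weights=[2.0, 3.0],
    output_bias=1.0,
    hidden_activation="logistic",
    output_activation="identity",
)
print(y_hat)
print(sum_of_squares([1.0, 2.0], [1.5, 1.0]))
```

A network labelled MLP 50-11-1 would, in this notation, have 50 inputs, 11 rows in `hidden_weights`, and a single output; a training algorithm such as BFGS would then adjust the weights and biases to reduce the sum-of-squares error.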
Fig. 3.30 Smoothed time series and prediction for 44 trading days 1. MLP 5-10-1, 2. MLP 5-10-1, 3. MLP 5-10-1, 4. MLP 5-11-1, 5. MLP 5-10-1—Experiment 4 (Tibco 2020)
Prediction statistics are given in Table 3.59. The best residual values are achieved by network 1. MLP 50-11-1 on the training and testing datasets; on the validation dataset, the best network in terms of residuals appears to be 3. MLP 50-7-1. Data statistics are given in Table 3.60. Figure 3.33 shows the curves of the individual networks in comparison with the USD/oz curve (marked blue). The curves of the individual networks follow the course of the USD/oz curve quite successfully, so most of the aforementioned five networks can be assumed to be suitable tools for predicting the gold price time series.
Table 3.51 Overview of networks: MLP NN Experiment 5 (Tibco 2020)
Index | Net. name | Training perf | Test perf | Validation perf | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation
1 | MLP 25-11-1 | 0.985793 | 0.983585 | 0.980758 | 1401.024 | 1594.560 | 1710.007 | BFGS 111 | SOS | Tanh | Logistic
2 | MLP 25-7-1 | 0.983858 | 0.983318 | 0.980422 | 1590.428 | 1622.432 | 1744.616 | BFGS 217 | SOS | Logistic | Tanh
3 | MLP 25-11-1 | 0.983354 | 0.981746 | 0.979989 | 1639.858 | 1781.792 | 1775.321 | BFGS 187 | SOS | Tanh | Exponential
4 | MLP 25-11-1 | 0.984736 | 0.984157 | 0.980520 | 1512.783 | 1555.678 | 1740.351 | BFGS 110 | SOS | Tanh | Exponential
5 | MLP 25-7-1 | 0.983270 | 0.981714 | 0.979585 | 1648.149 | 1780.760 | 1811.187 | BFGS 152 | SOS | Tanh | Logistic
Weight values 1. MLP 25-11-1
1.6902
−0.3760
−0.0819
0.2821
Connections 1. MLP 25-11-1
Date-1 → hidden neuron 1
Date-1 → hidden neuron 2
Date-1 → hidden neuron 3
Date-1 → hidden neuron 4
Weight ID
1
2
3
4
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 2. MLP 25-7-1
−4.1968
−0.5840
−0.0907
1.4174
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Weight values Connections 2. MLP 25-7-1 3. MLP 25-11-1
Table 3.52 Network weights—MLP NN Experiment 5 (Tibco 2020)
3.2987
0.8235
−1.3749
0.4656
Weight values 3. MLP 25-11-1
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 4. MLP 25-11-1
−0.6413
0.2016
−0.7477
1.0665
Weight values 4. MLP 25-11-1
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 5. MLP 25-7-1
2.7451
0.2829
1.0247
−6.6613
Weight values 5. MLP 25-7-1
Table 3.53 Correlation coefficients - MLP NN Experiment 5 (Tibco 2020)
Network           USD/oz train   USD/oz test   USD/oz validation
1. MLP 25-11-1    0.985793       0.983585      0.980758
2. MLP 25-7-1     0.983858       0.983318      0.980422
3. MLP 25-11-1    0.983354       0.981746      0.979989
4. MLP 25-11-1    0.984736       0.984157      0.980520
5. MLP 25-7-1     0.983270       0.981714      0.979585
The graphs of the smoothed time series and predictions for 44 trading days (two calendar months) are shown in Fig. 3.34. The blue curve represents the actual development of the price of gold over a given period of time. Red curves represent the individual networks. It follows from the figure that the most accurate prediction of the development is provided by the network 1. MLP 50-11-1.
3.3 Long Short Term Memory Neural Networks
Sak et al. (2014) describe the LSTM (Long Short Term Memory) neural network as a specific architecture of recurrent neural networks (RNN) designed to model time sequences and their long-range dependencies more accurately than conventional RNNs. The authors further claim that the structure applies no activation function within its recurrent components, that the stored values are therefore not modified, and that the gradient does not vanish during training. LSTM units are usually implemented in "blocks" containing several components. These blocks contain three or four gates, for example the input gate, the output gate, and the forget gate, which control the information flow using the logistic function.
3.3.1 Mathematical Background
According to Gers et al. (2000), the LSTM (Long Short Term Memory) network is a special type of recurrent network that is able to learn long-term dependencies. This model was first introduced by Hochreiter and Schmidhuber (1997) and became quite popular. LSTM networks are very versatile: they are able to solve a wide range of problems and are currently among the most commonly used recurrent networks. This type of network was created specifically to avoid difficulties with long-term dependencies. The ability to remember information over long periods of time is therefore embedded in their structure, so there is no need to learn it in any complicated way. According to Sak et al. (2014), the architecture of LSTM networks is based on cells arranged into chains, where a cell (Fig. 3.35) consists of several gates that
1778.752 −231.815 248.118 −187.488
1785.068
−172.894
231.648
−332.830
Maximum prediction (validation)
Minimum residual (train)
Maximum residual (train)
Minimum residual (test)
3.780
Maximum standard residual (validation)
4.217
6.352
Minimum standard residual (test) −4.870
−4.655
−8.335
Maximum standard residual (train)
6.115
6.222
6.189
Minimum standard residual (train)
−4.313
−5.813
−4.619
Maximum residual (validation)
Minimum standard residual (validation)
176.128
156.306
Minimum residual (validation)
Maximum standard residual (test)
255.864 −203.397
244.170
−178.349
Maximum residual (test)
577.991
1768.793
563.033
1764.446
571.273
Minimum prediction (test)
1778.137
560.289
577.483
1786.214
Maximum prediction (train)
Minimum prediction (validation)
566.260
Minimum prediction (train)
2. MLP 25-7-1
Maximum prediction (test)
1. MLP 25-11-1
Statistics
Table 3.54 Prediction statistics—MLP NN Experiment 5 (Tibco 2020)
3.826
−4.247
6.068
−3.996
6.742
−5.430
161.186
−178.950
256.145
−168.666
273.029
−219.890
1932.811
541.517
1832.368
529.864
1892.556
529.952
3. MLP 25-11-1
4.306
−4.389
5.194
−4.998
5.436
−4.687
179.642
−183.113
204.881
−197.135
211.413
−182.292
1847.954
564.257
1828.056
563.947
1853.302
563.568
4. MLP 25-11-1
3.623
−4.897
5.737
−4.397
5.872
−5.841
154.175
−208.415
242.102
−185.532
238.397
−237.112
1817.944
581.502
1780.957
580.407
1817.080
580.492
5. MLP 25-7-1
Table 3.55 Data statistics - MLP NN Experiment 5 (Tibco 2020)
Samples                           Date input   Day in week input   Day in month input   Month input   Year input   USD/oz target
Minimum (train)                   38,720.00    2.000000            1.00000              1.00000       2006.000     529.500
Maximum (train)                   44,020.00    6.000000            31.00000             12.00000      2020.000     1895.000
Mean (train)                      41,340.06    4.013255            15.64016             6.34191       2012.696     1201.595
Standard deviation (train)        1523.42      1.391633            8.64905              3.41621       4.172        316.367
Minimum (test)                    38,728.00    2.000000            1.00000              1.00000       2006.000     538.750
Maximum (test)                    44,015.00    6.000000            31.00000             12.00000      2020.000     1895.000
Mean (test)                       41,515.31    4.034608            15.28415             6.51730       2013.162     1218.531
Standard deviation (test)         1590.14      1.441273            8.90170              3.42252       4.367        313.031
Minimum (validation)              38,722.00    2.000000            1.00000              1.00000       2006.000     524.750
Maximum (validation)              44,019.00    6.000000            31.00000             12.00000      2020.000     1834.000
Mean (validation)                 41,440.15    4.027322            15.83789             6.52095       2012.954     1207.606
Standard deviation (validation)   2335.57      1.396892            8.69112              3.57921       6.354        543.600
Minimum (overall)                 38,720.00    2.000000            1.00000              1.00000       2006.000     524.750
Maximum (overall)                 44,020.00    6.000000            31.00000             12.00000      2020.000     1895.000
Mean (overall)                    41,381.33    4.018564            15.61643             6.39503       2012.805     1205.034
Standard deviation (overall)      1533.01      1.400899            8.70385              3.42037       4.201        313.589
Fig. 3.31 Prediction of USD/oz time series—MLP NN Experiment 5 (Tibco 2020)
help this layer maintain internal memory. The authors add that LSTM cells can be classified into three types: forget, remember, and output. Sakti et al. (2015) say that the architecture of the LSTM model includes the so-called input gate, whose task is to decide what to receive from the input; the so-called output gate, which reduces the perturbation of the output error to other blocks; and the forget gate, whose task consists in deciding what remains in the state. They are called gates because the sigmoid limits its input to (0, 1), and the multiplication by another vector thus decides what will be passed on from the input. According to the authors, the cell state is the cornerstone. It depends on several linear operations and is retained over time. The operation ∗ represents the component-wise product:
$$C^{(t)} = f^{(t)} * C^{(t-1)} + i^{(t)} * \bar{C}^{(t)} \tag{3.4}$$
The first step is the forget gate. If a new important input appears, the original one needs to be forgotten:
$$f^{(t)} = \sigma\left(W_f h^{(t-1)} + U_f x^{(t)}\right) \tag{3.5}$$
where σ denotes the sigmoid (logistic) function.
In the following step, it is necessary to decide which new information will be accepted. This step consists of two stages: the input gate and a tanh layer, which is used to calculate the specific value of $\bar{C}^{(t)}$:
Fig. 3.32 Graphs of smoothed time series and prediction for 44 trading days 1. MLP 25-11-1, 2. MLP 25-7-1, 3. MLP 25-11-1, 4. MLP 25-11-1, 5. MLP 25-7-1—Experiment 5 (Tibco 2020)
$$i^{(t)} = \sigma\left(W_i h^{(t-1)} + U_i x^{(t)} + b_i\right) \tag{3.6}$$
$$\bar{C}^{(t)} = \tanh\left(W_c h^{(t-1)} + U_c x^{(t)} + b_c\right) \tag{3.7}$$
The last step is the calculation of what will be used as the output. In conclusion, LSTM networks exist in a wide range of variants, for example with peephole connections.
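The gate equations (3.4) to (3.7) can be traced in a few lines of code. The following is a minimal sketch of a single LSTM time step in plain Python, not the implementation used in the case study. The parameter dictionary and its key names are hypothetical. The forget-gate bias `bf` is an assumption, since Eq. (3.5) as printed has no bias term, and it can be set to zero to match the text. The output step, with an output gate `o` and hidden state `h = o * tanh(C)`, is the commonly used formulation and is likewise an assumption, because the text does not give it.

```python
import math


def sigmoid(z):
    """Logistic function, written as sigma in Eqs. (3.5) and (3.6)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, h, U, x, b):
    """Compute W h + U x + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * hj for w, hj in zip(w_row, h))
        + sum(u * xj for u, xj in zip(u_row, x))
        + bk
        for w_row, u_row, bk in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; `p` is a dict of weights (hypothetical layout)."""
    f = [sigmoid(z) for z in affine(p["Wf"], h_prev, p["Uf"], x, p["bf"])]        # Eq. (3.5)
    i = [sigmoid(z) for z in affine(p["Wi"], h_prev, p["Ui"], x, p["bi"])]        # Eq. (3.6)
    c_bar = [math.tanh(z) for z in affine(p["Wc"], h_prev, p["Uc"], x, p["bc"])]  # Eq. (3.7)
    c = [fk * ck + ik * cbk for fk, ck, ik, cbk in zip(f, c_prev, i, c_bar)]      # Eq. (3.4)
    # Assumed output step (not given in the text): o = sigma(...), h = o * tanh(c)
    o = [sigmoid(z) for z in affine(p["Wo"], h_prev, p["Uo"], x, p["bo"])]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# One hidden unit and one input with all weights zero: every gate equals 0.5
zeros = {k: [[0.0]] for k in ("Wf", "Uf", "Wi", "Ui", "Wc", "Uc", "Wo", "Uo")}
zeros.update({k: [0.0] for k in ("bf", "bi", "bc", "bo")})
h, c = lstm_step(x=[1.0], h_prev=[0.0], c_prev=[2.0], p=zeros)
print(h, c)
```

In the zero-weight example the forget gate keeps half of the previous cell state and the candidate value is zero, so the new cell state is half of the old one.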
Table 3.56 Overview of networks - MLP NN Experiment 6 (Tibco 2020)
Index                1             2             3             4             5
Net. name            MLP 50-11-1   MLP 50-11-1   MLP 50-7-1    MLP 50-8-1    MLP 50-8-1
Training perf        0.985725      0.985717      0.984449      0.984565      0.983978
Test perf            0.982392      0.982615      0.980179      0.981232      0.981748
Validation perf      0.977933      0.980854      0.978733      0.978746      0.978069
Training error       1399.572      1400.356      1523.802      1512.697      1569.636
Test error           1694.422      1673.367      1906.181      1808.432      1755.150
Validation error     1938.685      1684.065      1865.874      1867.791      1923.831
Training algorithm   BFGS 158      BFGS 152      BFGS 139      BFGS 95       BFGS 166
Error function       SOS           SOS           SOS           SOS           SOS
Hidden activation    Logistic      Logistic      Tanh          Tanh          Logistic
Output activation    Identity      Logistic      Logistic      Exponential   Sine
Weight values 1. MLP 50-11-1
−0.3180
0.7532
−0.1653
−0.6646
Connections 1. MLP 50-11-1
Date-1 → hidden neuron 1
Date-1 → hidden neuron 2
Date-1 → hidden neuron 3
Date-1 → hidden neuron 4
Weight ID
1
2
3
4
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 2. MLP 50-11-1
−0.3159
−0.0480
−0.1786
−1.5491
Weight values 2. MLP 50-11-1
Table 3.57 Network weights—MLP NN Experiment 6 (Tibco 2020)
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Connections 3. MLP 50-7-1
−0.0234
−0.0196
−0.0802
−0.7466
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Weight values Connections 3. MLP 50-7-1 4. MLP 50-8-1
-0.15187
0.01234
0.01382
-0.27740
Date-1 → hidden neuron 4
Date-1 → hidden neuron 3
Date-1 → hidden neuron 2
Date-1 → hidden neuron 1
Weight values Connections 4. MLP 50-8-1 5. MLP 50-8-1
0.0870
−0.0067
0.0578
1.1943
Weight values 5. MLP 50-8-1
Table 3.58 Correlation coefficients - MLP NN Experiment 6 (Tibco 2020)
Network           USD/oz train   USD/oz test   USD/oz validation
1. MLP 50-11-1    0.985725       0.982392      0.977933
2. MLP 50-11-1    0.985717       0.982615      0.980854
3. MLP 50-7-1     0.984449       0.980179      0.978733
4. MLP 50-8-1     0.984565       0.981232      0.978746
5. MLP 50-8-1     0.983978       0.981748      0.978069
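The coefficients in Table 3.58 express how closely the network outputs follow the USD/oz target on each dataset. As an illustration, the sketch below computes a Pearson correlation coefficient between observed and predicted values. That the software reports exactly this coefficient is an assumption, and the function name `pearson` and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


observed = [1200.0, 1210.0, 1195.0, 1220.0, 1230.0]   # hypothetical USD/oz values
predicted = [1198.0, 1212.0, 1199.0, 1216.0, 1228.0]  # hypothetical network outputs
print(round(pearson(observed, predicted), 4))
```

A value close to 1 means that the predictions rise and fall together with the observations; it does not by itself guarantee small residuals.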
3.3.2 Case
LSTM networks can be used for the same purposes as convolutional neural networks (CNN), i.e. to solve classification and regression problems. In this example, a long short-term memory network will be built.
Built Neural Network
Figure 3.36 shows the basic structure of an LSTM neural network with the output size 3350. Figure 3.37 shows the parameters of the elementary layer (No. 2), where the Ramp function was used as the basic function. Figure 3.38 shows the parameters of the elementary layer where the hyperbolic tangent function was used. Figure 3.39 shows the parameters of the threading layer, where the plus function was used. In Fig. 3.40, the fifth (elementary) layer of the LSTM network can be observed, where the hyperbolic tangent function is used again. Figure 3.41 shows the penultimate part, in which the network data processing takes place, namely the linear layer of the LSTM network. Figure 3.42 shows the last phase of the LSTM network, where the output of the network is represented in the form of a vector.
Trained Neural Network (With Observable Changes)
Figure 3.43 shows the structure of an already trained neural network in the first phase. The output size here is 3550. Figure 3.44 illustrates the basic information about the elementary layer for which the Ramp function was selected in the case of a trained LSTM network. Figure 3.45 shows additional parameters related to the elementary layer for which the hyperbolic tangent function was used in the case of an already trained LSTM network.
1779.998 −173.967 228.190 −155.304 225.857 −167.260 223.765 −4.649 6.098 −3.797 5.521 −4.076
1844.443
548.838
1917.035
553.319
1961.064
−177.434
207.138
−203.057
207.555
−198.030
271.945
−4.743
5.537
−4.933
5.042
−4.498
Maximum prediction (train)
Minimum prediction (test)
Maximum prediction (test)
Minimum prediction (validation)
Maximum prediction (validation)
Minimum residual (train)
Maximum residual (train)
Minimum residual (test)
Maximum residual (test)
Minimum residual (validation)
Maximum residual (validation)
Minimum standard residual (train)
Maximum standard residual (train)
Minimum standard residual (test)
Maximum standard residual (test)
Minimum standard residual (validation)
2. MLP 50-11-1
Minimum prediction (train)
537.058
1767.279
538.278
1766.978
537.407
1. MLP 50-11-1
539.391
Statistics
Table 3.59 Prediction statistics—MLP NN Experiment 6 (Tibco 2020) 3. MLP 50-7-1
−5.734
5.599
−7.481
5.560
−4.659
242.820
−247.667
244.431
−326.636
217.034
−181.870
1755.507
579.686
1853.882
579.445
1764.938
579.091
−4.068
3.577
−9.645
4.440
−5.386
186.910
−175.797
152.101
−410.162
172.693
−209.476
1889.312
581.259
1982.162
580.306
1828.392
574.115
4. MLP 50-8-1
−4.167
6.295
−4.829
6.413
−4.988
336.292
−182.758
263.728
−202.315
254.086
−197.618
1734.646
516.751
1736.740
552.371
1740.535
531.270
5. MLP 50-8-1
Table 3.60 Data statistics - MLP NN Experiment 6 (Tibco 2020)
Samples                           Date input   Day in week input   Day in month input   Month input   Year input   USD/oz target
Minimum (train)                   38,720.00    2.000000            1.00000              1.00000       2006.000     529.500
Maximum (train)                   44,020.00    6.000000            31.00000             12.00000      2020.000     1895.000
Mean (train)                      41,340.06    4.013255            15.64016             6.34191       2012.696     1201.595
Standard deviation (train)        1523.42      1.391633            8.64905              3.41621       4.172        316.367
Minimum (test)                    38,728.00    2.000000            1.00000              1.00000       2006.000     538.750
Maximum (test)                    44,015.00    6.000000            31.00000             12.00000      2020.000     1895.000
Mean (test)                       41,515.31    4.034608            15.28415             6.51730       2013.162     1218.531
Standard deviation (test)         1590.14      1.441273            8.90170              3.42252       4.367        313.031
Minimum (validation)              38,722.00    2.000000            1.00000              1.00000       2006.000     524.750
Maximum (validation)              44,019.00    6.000000            31.00000             12.00000      2020.000     1834.000
Mean (validation)                 41,440.15    4.027322            15.83789             6.52095       2012.954     1207.606
Standard deviation (validation)   2335.57      1.396892            8.69112              3.57921       6.354        543.600
Minimum (overall)                 38,720.00    2.000000            1.00000              1.00000       2006.000     524.750
Maximum (overall)                 44,020.00    6.000000            31.00000             12.00000      2020.000     1895.000
Mean (overall)                    41,381.33    4.018564            15.61643             6.39503       2012.805     1205.034
Standard deviation (overall)      1533.01      1.400899            8.70385              3.42037       4.201        313.589
Fig. 3.33 Predictions of USD/oz time series—MLP NN Experiment 6 (Tibco 2020)
The threading layer of the trained LSTM network is again characterized by a plus function, as shown in Fig. 3.46. Figure 3.47 shows the parameters of an elementary layer of the trained LSTM network. In this layer, the hyperbolic tangent function was used. Figure 3.48 shows the sixth part of the trained LSTM network in the form of a linear layer. Figure 3.49 shows the output of the trained LSTM network in the form of a vector.
Information about the Neural Network
The basic structure of the LSTM network, including the designation of its individual parts, can be observed in Fig. 3.50. The individual layers of the network can be seen in Fig. 3.51, which gives a detailed specification of each layer of the LSTM network. A brief flow chart of the neural network is shown in Fig. 3.52. Figure 3.53 shows a diagram of the neural network nodes and Fig. 3.54 a graph of the LSTM neural network nodes.
Results
Figure 3.55 shows the predicted price of gold in comparison with the real price of gold, where the predicted price is indicated by a blue curve and the red curve indicates the actual development of the price of gold in Czech crowns.
Fig. 3.34 Smoothed time series and prediction for 44 trading days 1. MLP 50-11-1, 2. MLP 50-11-1, 3. MLP 50-7-1, 4. MLP 50-8-1, 5. MLP 50-8-1—Experiment 6 (Tibco 2020)
The graph of the residuals, i.e. the deviations of the measured values from the prediction of the LSTM neural network, is shown in Fig. 3.56. The aim is the smallest possible deviation, i.e. the smallest possible amplitude. Figure 3.57 shows the balanced time series of the price of gold in Czech crowns (red curve) together with the price of gold predicted by the LSTM neural network (blue curve). Except for small deviations, the curves are almost identical, which indicates a good prediction result.
Fig. 3.35 Scheme of LSTM layer cells (Sak et al. 2014)
Figure 3.58 shows how the gold price prediction curve moved in the case of using LSTM neural network over a period of two months. Figure 3.59 shows the development of the price of gold over a period of two months, followed by an example of the prediction of the price of gold for two months ahead. Figure 3.60 shows the development of the gold price over the entire period under review, together with an example of the development of the gold price prediction curve marked in blue. Statistical characteristics that provide information in a concentrated form about the essential statistical properties of the studied group are shown in Fig. 3.61.
Fig. 3.36 Basic information about the LSTM network (Wolfram Research 2020)
Fig. 3.37 Elementary layer—Ramp (Wolfram Research 2020)
Fig. 3.38 Elementary layer—Tanh (Wolfram Research 2020)
Fig. 3.39 Threading layer—parameters (Wolfram Research 2020)
Fig. 3.40 Elementary layer—Tanh (Wolfram Research 2020)
Fig. 3.41 Linear layer of the LSTM network (Wolfram Research 2020)
Fig. 3.42 LSTM output (Wolfram Research 2020)
Fig. 3.43 Basic information about the neural network—trained LSTM (Wolfram Research 2020)
Fig. 3.44 Elementary layer—Ramp of a trained LSTM (Wolfram Research 2020)
Fig. 3.45 Elementary layer—Tanh of a trained LSTM (Wolfram Research 2020)
Fig. 3.46 Threading layer—trained LSTM (Wolfram Research 2020)
Fig. 3.47 Elementary layer—Tanh of a trained LSTM (Wolfram Research 2020)
Fig. 3.48 Linear layer—trained LSTM (Wolfram Research 2020)
Fig. 3.49 Output—trained LSTM (Wolfram Research 2020)
Fig. 3.50 Basic structure of the LSTM neural network (Wolfram Research 2020)
Fig. 3.51 Individual layers of the LSTM neural network (Wolfram Research 2020)
Fig. 3.52 Flow chart of the LSTM neural network (Wolfram Research 2020)
Fig. 3.53 Diagram of LSTM neural network nodes (Wolfram Research 2020)
[Node graph; legend entries: Tensor, RNN, elemwise add, slice axis, SwapAxis, relu, FullyConnected, Reshape, expand dims, tanh, copy, BlockGrad, squeeze]
Fig. 3.54 Graph of LSTM neural network nodes (Wolfram Research 2020)
Fig. 3.55 Predicted and actual gold price in CZK (Wolfram Research 2020)
Fig. 3.56 LSTM neural network residues (Wolfram Research 2020)
Fig. 3.57 Balanced time series and prediction—a comparison (Wolfram Research 2020)
Fig. 3.58 Development of the gold price prediction curve for 2 months (Wolfram Research 2020)
Fig. 3.59 Development of the gold price curve for two months plus the prediction of the gold price for the next 2 months (Wolfram Research 2020)
Fig. 3.60 Overall development of the time series plus its prediction (Wolfram Research 2020)
Fig. 3.61 Statistical characteristics of the data file (Wolfram Research 2020)
References
Altun, H., A. Bilgil, and B.C. Fidan. 2007. Treatment of multi-dimensional data to enhance neural network estimators in regression problems. Expert Systems with Applications 32 (2): 599–605.
Ashoori, S., and S. Mohammadi. 2011. Compare failure prediction models based on feature selection technique: Empirical case from Iran. Procedia Computer Science 3 (1): 568–573.
Bas, E., V.R. Uslu, and E. Egrioglu. 2016. Robust learning algorithm for multiplicative neuron model artificial neural networks. Expert Systems with Applications 56 (3): 80–88.
Bishop, Ch.M. 1995. Neural networks for pattern recognition. New York, NY, United States: Oxford University Press.
Echávarri Otero, J., E. de la Guerra Ochoa, Chacón Tanarro, P. Lafont Morgado, A. Díaz Lantada, and J.L. Muñoz Sanz. 2014. Artificial neural network approach to predict the lubricated friction coefficient: From standards chips to embedded systems on chip. Lubrication Science 26 (3): 141–162.
Gers, F.A., J. Schmidhuber, and F. Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural Computation 12 (10): 2451–2471.
Guan, X., Y. Zhu, and W. Song. 2016. Application of RBF neural network improved by peak density function in intelligent color matching of wood dyeing. Chaos, Solitons & Fractals 89: 485–490.
Gubana, A. 2015. State-of-the-art report on high reversible timber to timber strengthening interventions on wooden floors. Construction and Building Materials 97: 25–33.
Guresen, E., and G. Kayakutlu. 2011. Definition of artificial neural networks with comparison to other networks. Procedia Computer Science 3: 426–433.
Hamid, A., and A. Habib. 2014. Financial forecasting with neural networks. Academy of Accounting and Financial Studies Journal 18 (4): 37–55.
Hashemi, S., and M.R. Aghamohammadi. 2013. Wavelet based feature extraction of voltage profile for online voltage stability assessment using RBF neural network. International Journal of Electrical Power 49: 86–94.
Haykin, S. 2001. Neural networks. A comprehensive foundation. Pearson Prentice Hall.
Hochreiter, S., and J. Schmidhuber. 1997. Long short-term memory. Neural Computation 9 (8): 1735–1780.
Jingfei, J., C. Dengqing, and C. Huatao. 2016. Boundary value problems for fractional differential equation with causal operators. Applied Mathematics and Nonlinear Sciences 1 (1): 11–22.
Kiruthika, M., and M. Dilsha. 2015. A neural network approach for microfinance credit scoring. Journal of Statistics and Management Systems 18 (1): 121–138.
Klieštik, T. 2013. Models of autoregression conditional heteroskedasticity garch and arch as a tool for modeling the volatility of financial time series. Ekonomicko-Manažerské Spektrum 7 (1): 2–10.
Kumar, M., and N. Yadav. 2011. Multilayer perceptrons and radial basis function neural network methods for the solution of differential equations: A survey. Computers & Mathematics with Applications 62 (10): 3796–3811.
Lahsasna, A., R.N. Ainon, and Y.W. Teh. 2008. Intelligent credit scoring model using soft computing approach. International Conference on Computer and Communication Engineering, 396–402.
Liu, W., X.P. Li, H.-O. Mao, and T.-Y. Chai. 2004. Neural network cost prediction model based on real-coded genetic algorithm and its application. Kongzhi Lilun yu Yingyong/Control Theory & Applications (China) 21 (3): 423–426.
Michal, P., A. Vagaská, M. Gombár, J. Kmec, E.A. Spišák, and D. Kučerka. 2015. Usage of neural network to predict aluminium oxide layer thickness. The Scientific World Journal, 1–10.
Mileris, R., and V. Boguslauskas. 2011. Credit risk estimation model development process: Main steps and model improvement. Engineering Economics 22 (2): 126–133.
Moreno, J.J.M., A.P. Pol, and P.M. Gracia. 2011. Artificial neural networks applied to forecasting time series. Psicothema 23 (2): 322–329.
Olej, V. 2003. Modelovanie ekonomických procesov na báze výpočtovej inteligencie [Modeling of economic processes based on computational intelligence], Miloš Vognar MV.
Pao, H.T., and G. Kayakutlu. 2008. A comparison of neural network and multiple regression analysis in modeling capital structure. Expert Systems with Applications 35 (3): 720–727.
Pazouki, M., Z. Wu, Z. Yang, and D.P.F. Moeller. 2015. An efficient learning method for RBF neural networks. Proceedings of the International Joint Conference on Neural Networks, 1–6.
Rowland, Z., and J. Vrbka. 2016. Optimization of a company’s property structure aiming at maximization of its profit using neural networks with the example of a set of construction companies. Mathematical Modeling in Economics 3–4 (7): 36–41.
Sakti, S., F. Ilham, G. Neubig, T. Toda, A. Purwarianti, and S. Nakamura. 2015. Incremental sentence compression using LSTM recurrent networks. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 252–258.
Sak, H., A. Senior, and F. Beaufays. 2014. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 338–342.
Sayadi, A.R., S.M.M. Tavassoli, M. Monjezi, and M. Rezaei. 2012. Application of neural networks to predict net present value in mining projects. Arabian Journal of Geosciences 7 (3): 1067–1072.
Slavici, T., D. Mnerie, and S. Kosutic. 2012. Some applications artificial neural networks in agricultural management. In Actual Tasks on Agricultural Engineering. Proceedings of the 40. International Symposium on Agricultural Engineering, 363–373.
Tučková, J. 2003. Introduction to the theory and applications of artificial neural networks. Prague: ČVUT Publishing.
Vochozka, M. 2017. Formation of complex company evaluation method through neural networks based on the example of construction companies’ collection. AD ALTA-Journal of Interdisciplinary Research 7 (2): 232–239.
Vochozka, M., Z. Rowland, V. Stehel, P. Šuleř, and J. Vrbka. 2016. Business cost modeling using neural networks. České Budějovice: Institute of Technology and Business.
Chapter 4
Comparison of Different Methods
4.1 Neural networks
4.1.1 Case
Basic information related to the artificial neural network method is shown in Fig. 4.1. The predictors in this case are neural networks, the number of test examples is 1221 and the number of training examples is 2442. The report related to the NN is shown in Fig. 4.2. Figure 4.3 shows a chart comparing actual gold prices with those predicted using artificial neural networks. The exact prediction is shown by a dashed line, whereas the predicted points are shown as blue circles. The accuracy of the predicted values compared to the actual values is very high. This model therefore proves to be suitable for prediction. The Probability Density Histogram related to the NN method is shown in Fig. 4.4. Figure 4.5 shows the histogram of the residuals, i.e. the frequency of the differences between the actual and predicted values. A residual can be understood as the size of the error made at the selected point of the estimation. With the NN model, the bin 0–5 has the largest representation. The extremes, residuals of about −60 and 60, have the smallest share of the entire histogram, as they occurred least often. The residual chart can be seen in Fig. 4.6. The value of Standard Deviation for this experiment is 20.7404. The value of Mean Cross Entropy is 4.45335, Mean Deviation is 14.9052, Mean Square is 430.163 and Evaluation Time is 0.00566.
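The reported measures can be related to each other with a small sketch. It assumes that Mean Square is the mean of the squared residuals, that Mean Deviation is the mean absolute residual, and that Standard Deviation is the square root of Mean Square. The last assumption agrees with the reported figures, since the square root of 430.163 is approximately 20.7404. The function name `regression_report` and the toy data are hypothetical and do not reproduce the Wolfram report.

```python
import math


def regression_report(actual, predicted):
    """Summarize prediction errors under the assumed metric definitions."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean_square = sum(r * r for r in residuals) / n
    return {
        "mean_deviation": sum(abs(r) for r in residuals) / n,
        "mean_square": mean_square,
        "standard_deviation": math.sqrt(mean_square),
    }


# Hypothetical observed and predicted prices
report = regression_report([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
print(report)

# Relation between the two reported values: sqrt(430.163) is close to 20.7404
print(round(math.sqrt(430.163), 4))
```

The example counts also imply a two-to-one split between training and test data (2442 to 1221).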
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 J. Vrbka, Using Artificial Neural Networks for Timeseries Smoothing and Forecasting, Studies in Computational Intelligence 979, https://doi.org/10.1007/978-3-030-75649-9_4
Fig. 4.1 Report (Wolfram Research 2020)
Fig. 4.2 Report—NN (Wolfram Research 2020)
The time series, including the prediction for two months, is shown in Fig. 4.7. It can be observed from the chart that the level of accuracy of gold price prediction is very high in the case of using the neural network method. The curve of the predicted price (blue) accurately copies the curve of the observed price of gold in CZK (red). The size of residues over the observed years can be seen in Fig. 4.8. The aforementioned prediction of gold prices for two months is shown in Fig. 4.9. It is possible to observe a prediction of a significant decline in the trend at the beginning of August with a subsequent gradual increase and further decline in gold prices at the end of August. The time series of the observed development of the gold price, including the prediction of the development of the gold price for the next two months, is shown in Fig. 4.10.
Fig. 4.3 Comparison Plot—NN (Wolfram Research 2020)
Fig. 4.4 Probability Density Histogram—NN (Wolfram Research 2020)
A more detailed view of the observed development of the price of gold in CZK, joined with the gold price prediction obtained using the NN, is shown in Fig. 4.11.
Fig. 4.5 Residual Histogram—NN (Wolfram Research 2020)
Fig. 4.6 Residual Plot—NN (Wolfram Research 2020)
4.2 Decision Tree
A growing number of scientific disciplines use a tree structure for storing knowledge. This also applies to the field of artificial intelligence, to which decision trees belong as well (Andone and Sireteanu 2009). Decision trees can be seen as a non-linear, hierarchical system that makes it possible to store knowledge. Sagi and Rokach (2020) consider decision trees to be a tool for representing and supporting multi-stage decision-making processes under conditions of uncertainty and risk. The authors also add that decision trees are the most important
Fig. 4.7 Time series with prediction—NN (Wolfram Research 2020)
Fig. 4.8 Residues—NN (Wolfram Research 2020)
graphical means of decision analysis that applies the conceptual apparatus of graph theory. According to Xiao and Xu (2020), decision trees outline the individual variants, their risk factors and the development of those factors, as well as the potential consequences of the variants that carry the risk. Decision trees can be characterized quite precisely as a sequence of nodes and edges of an oriented graph. Their basic structure consists of a combination of decision nodes (Fig. 4.13) and chance nodes (Fig. 4.12),
Fig. 4.9 Prediction for two months—NN (Wolfram Research 2020)
Fig. 4.10 Time series with prediction—NN (Wolfram Research 2020)
Fig. 4.11 Time series with prediction 2—NN (Wolfram Research 2020)
Fig. 4.12 Chance node (Author according to Fotr 2006)
Fig. 4.13 Decision nodes (Author according to Fotr 2006)
where chance nodes represent the stage of the problem-solving process at which the given alternative is selected “by nature”, regardless of the will of the decision-maker. Fotr (2006) refers to the individual variants as situational; they show the values of the individual risk factors, which the decision-maker is not able to modify in any way. Chance nodes are drawn as circles, with edges coming out of the given nodes. Decision nodes are most commonly drawn as diamonds, rectangles, or squares. They represent the stage of a decision-making process at which the decision-maker can choose one particular variant from all possible variants. Logically, the decision-maker chooses the variant that is most acceptable to them at the moment. The edges coming out of a decision node represent the individual decision variants. Figures 4.12 and 4.13 show chance and decision nodes. In many situations, decision trees combine both chance and decision nodes. In practice, however, there are also decision trees with decision nodes only. These trees are referred to as deterministic decision trees.
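The interplay of decision and chance nodes can be illustrated by evaluating a small tree from its leaves back to the root. The sketch below uses the expected-value criterion common in decision analysis: a chance node takes the probability-weighted average of its branches, while a decision node takes the best branch. This criterion, the dictionary layout, the function name `rollback` and all payoffs are assumptions made for illustration and are not taken from the source.

```python
def rollback(node):
    """Evaluate a decision tree from the leaves back to the root."""
    kind = node["kind"]
    if kind == "outcome":
        return node["value"]
    if kind == "chance":
        # Nature selects the branch: probability-weighted average
        return sum(prob * rollback(child) for prob, child in node["branches"])
    if kind == "decision":
        # The decision-maker selects the most favourable variant
        return max(rollback(child) for _, child in node["branches"])
    raise ValueError(f"unknown node kind: {kind}")


# Hypothetical example: invest (risky) or hold (certain payoff)
invest = {
    "kind": "chance",
    "branches": [
        (0.5, {"kind": "outcome", "value": 100.0}),
        (0.5, {"kind": "outcome", "value": -50.0}),
    ],
}
hold = {"kind": "outcome", "value": 10.0}
tree = {"kind": "decision", "branches": [("invest", invest), ("hold", hold)]}
print(rollback(tree))
```

A deterministic decision tree in the sense described above would contain only `decision` and `outcome` nodes, so the evaluation reduces to repeated maximization.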
4.2.1 Case

In this experiment, the development of the price of gold is predicted using the Decision Tree method. Basic information on the method is shown in Fig. 4.14. The predictor in this case is the Decision Tree, the number of test examples is 1221 and the number of training examples is 2442. The report related to this method is shown in Fig. 4.15. Figure 4.16 shows a comparison plot of the actual gold prices (the “perfect prediction”) against those predicted using the Decision Tree method (labelled “predictions”). The perfect prediction is drawn as a dashed line, whereas the predicted points are drawn as blue circles.
Fig. 4.14 Basic information—Decision Tree (Wolfram Research 2020)
Fig. 4.15 Report—Decision Tree (Wolfram Research 2020)
It should be noted that the accuracy of the predicted values compared to the actual values is not very high and large residues can be seen. This model therefore proves to be less suitable for prediction. The Probability Density Histogram for the Decision Tree method is shown in Fig. 4.17. Figure 4.18 shows a histogram of residues, i.e. the frequency of the differences between the actual and predicted values. A residue can be understood as the size of the error made at a given point of the prediction. When the Decision Tree method is used, the most represented is the range
Fig. 4.16 Comparison Plot—Decision Tree (Wolfram Research 2020)
Fig. 4.17 Probability Density Histogram—Decision Tree (Wolfram Research 2020)
of values between −20 and 0. With this method, the extreme values of the residues reach −200 and 200, compared to −50 and 50 for NN. These extremes have the smallest representation in the whole histogram, because they occurred least often.
Fig. 4.18 Residual Histogram—Decision Tree (Wolfram Research 2020)
The residues mentioned above are shown graphically in Fig. 4.19. The value of Standard Deviation for this experiment is 58.5175. The value of Mean Cross Entropy is 5.4493, Mean Deviation is 45.9167, Mean Square is 3424.3 and Evaluation Time is 0.0049. The time series, including the two-month prediction, is shown in Fig. 4.20. The chart shows that the gold price prediction is inaccurate when using the Decision Tree method. The curve of the predicted price (green) does not copy the curve of the observed price of gold in CZK (red). The Decision Tree method was able to predict the direction of the curve at least partially, but a closer look reveals that the prediction is very inaccurate. The size of the residues over the observed years can be seen in Fig. 4.21.
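The relationship between the residual measures quoted above can be sketched in a few lines of Python. The sketch assumes that Standard Deviation is the square root of Mean Square and that Mean Deviation is the mean absolute residual; the text does not define these measures, but the assumption agrees with the reported figures (58.5175² ≈ 3424.3). The function name and the sample data are hypothetical.

```python
import math


def residual_measures(actual, predicted):
    """Summarise residues (actual minus predicted) under the assumptions above."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean_square = sum(r * r for r in residuals) / n
    return {
        "mean_square": mean_square,
        # Assumed: root of the mean squared residual.
        "standard_deviation": math.sqrt(mean_square),
        # Assumed: mean of the absolute residuals.
        "mean_deviation": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical prices, not the gold data used in the experiment.
measures = residual_measures([10, 12, 14, 16], [11, 12, 12, 19])
print(measures)
```

Because Mean Square penalises large residues quadratically, a model with a few extreme residues (such as ±200 here) scores worse on it than on Mean Deviation.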
Fig. 4.19 Residual Plot—Decision Tree (Wolfram Research 2020)
Fig. 4.20 Time series with prediction—Decision Tree (Wolfram Research 2020)
Fig. 4.21 Residues—Decision Tree (Wolfram Research 2020)
The aforementioned prediction of gold prices for two months using the Decision Tree method is shown in Fig. 4.22. It is clear from the figure that the Decision Tree method is not appropriate for predicting future values. The time series including the two-month prediction is shown in Fig. 4.23. The detail of the time series with prediction in Fig. 4.24 confirms this conclusion.
Fig. 4.22 Prediction—Decision Tree (Wolfram Research 2020)
Fig. 4.23 Time series with prediction—Decision Tree (Wolfram Research 2020)
Fig. 4.24 Time series with prediction 2—Decision Tree (Wolfram Research 2020)
4.3 Gaussian Process

The Gaussian process is a technique whose popularity has grown significantly in recent years. By definition, a Gaussian process is a generalization of the Gaussian probability distribution: while a probability distribution describes random vectors or variables, a random process describes a function (Klyuchnikov and Burnaev 2020). For example, X ∼ GP(m, k) means that the random function X is distributed as a Gaussian process with the mean function m and the covariance function k. In general, a Gaussian process can be understood as a distribution over all suitable functions, where the properties of these functions are determined by the covariance function. The existence of many covariance functions enables a flexible setting of the Gaussian process. Another possibility is to interpret the objective function as an unknown scalar function f of $x \in \mathbb{R}^n$ in n-dimensional space. Evaluating such a function on a set of points $X_N = \{x_1, \ldots, x_N\}$ then results in the set of function values $t_N = \{t_1, \ldots, t_N\}$, where $t_i = f(x_i)$ for all $i = 1, \ldots, N$. In simpler terms, the input for the Gaussian process is the training set D of N data points with the corresponding function values at those points. The training set can also be described as follows:

$$D = \{(x_i, t_i) \mid i = 1, \ldots, N\} = (X, t) \tag{4.1}$$
Each observed point x together with its function value f(x) can be seen as a point at which the distribution becomes certain. In other words, functions whose properties and structure are inconsistent with the evaluated points are eliminated. The biggest problem with the Gaussian process is finding a suitable combination of properties for the selected covariance function (Rasmussen and Williams 2006). Bajer et al. (2015) state that many studies have shown that models based on the Gaussian process can achieve a significant acceleration compared to other types of models.
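How a training set D = (X, t) narrows down the set of admissible functions can be sketched in plain Python. The sketch assumes a zero mean function m, a squared-exponential covariance function k and a small noise term on the diagonal; none of these choices is given in the text, and the data are hypothetical. It computes only the posterior mean at a new point.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential covariance k(a, b) for scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def gp_posterior_mean(X, t, x_new, noise=1e-6, length_scale=1.0):
    """Posterior mean k(x_new, X) (K + noise * I)^-1 t of a zero-mean GP."""
    K = [[rbf(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve(K, t)
    return sum(rbf(x_new, xi, length_scale) * ai for xi, ai in zip(X, alpha))


# Hypothetical training set D = (X, t).
X = [0.0, 1.0, 2.0, 3.0]
t = [0.0, 0.8, 0.9, 0.1]
print(gp_posterior_mean(X, t, 1.0), gp_posterior_mean(X, t, 1.5))
```

At a training point the posterior mean practically reproduces the observed value, while far away from the data it falls back to the assumed prior mean of zero. The `length_scale` parameter is one example of the covariance-function properties that have to be tuned.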
4.3.1 Case

Using the Gaussian Process method, the development of the price of gold is predicted in this experiment. The basic information about the Gaussian Process method used is shown in Fig. 4.25. The predictor in this case is the Gaussian Process, the number of test examples is 1221 and the number of training examples is 2442.
Fig. 4.25 Basic information—Gaussian Process (Wolfram Research 2020)
The report related to the Gaussian Process method is shown in Fig. 4.26. Figure 4.27 shows a comparison plot of the actual gold prices (the “perfect prediction”) against those predicted using the Gaussian Process method (labelled “predictions”). The perfect prediction is drawn as a dashed line, whereas the predicted points are drawn as blue circles. The accuracy of the predicted values compared to the actual values is high, and only a slight deviation of the points from the dashed line can be observed. This model has therefore so far proved to be suitable for prediction. The Probability Density Histogram for the Gaussian Process method is shown in Fig. 4.28.
Fig. 4.26 Report—Gaussian Process (Wolfram Research 2020)
Fig. 4.27 Comparison Plot—Gaussian Process (Wolfram Research 2020)
Fig. 4.28 Probability Density Histogram—Gaussian Process (Wolfram Research 2020)
Figure 4.29 shows a histogram of residues, i.e. the frequency of the differences between the actual and predicted values. A residue can be understood as the size of the error made at a given point of the prediction. When the Gaussian Process model is used, the bin of values 0–10 has the largest proportion. Extremes, in the form of residues of −100 and 100, have the
Fig. 4.29 Residual Histogram—Gaussian Process (Wolfram Research 2020)
smallest proportion of the entire histogram, as they occurred least often. The residues are shown graphically in Fig. 4.30. The value of Standard Deviation for this experiment is 28.4139. The value of Mean Cross Entropy is 4.81866, Mean Deviation is 21.1516, Mean Square is 807.351 and Evaluation Time is 0.020. The time series, including the two-month prediction, is shown in Fig. 4.31. The graph shows that the gold price prediction is very accurate when using the Gaussian Process method. The curve of the predicted price (green) for the most part copies the curve of the actual price of gold in CZK (red). The Gaussian Process method was able to predict the development of the curve quite
Fig. 4.30 Residual Plot (Wolfram Research 2020)
Fig. 4.31 Time series with prediction—Gaussian Process (Wolfram Research 2020)
accurately, except for some deviations. This model appears to be suitable for use in time series prediction. Residues for the observed time period are recorded in the chart in Fig. 4.32. The prediction for two calendar months using the Gaussian Process method is shown in the chart in Fig. 4.33. The time series of the actual values of the price of gold, including the prediction for two calendar months, is shown in Fig. 4.34.
Fig. 4.32 Residues—Gaussian Process (Wolfram Research 2020)
Fig. 4.33 Prediction—Gaussian Process (Wolfram Research 2020)
Fig. 4.34 Time series with prediction—Gaussian Process (Wolfram Research 2020)
Fig. 4.35 Time series with prediction 2—Gaussian Process (Wolfram Research 2020)
Figure 4.35 (time series with prediction 2) shows part of the time series of actual gold prices followed by the prediction for two calendar months.
4.4 Gradient Boosted Trees

A recently popular algorithm for solving classification and regression tasks by means of decision trees is the so-called gradient boosting machine (GBM), also known as Gradient Boosted Trees. Gradient boosting was popularised at the end of the 1990s by the statistician Leo Breiman. According to Natekin and Knoll (2013), the principle of the gradient boosting method consists in the gradual creation of models, where each successive model is created with regard to the errors of the preceding models. According to Friedman (2002), gradient boosting is a machine learning method that combines weak learners, which in the case of GBM are decision trees of a limited size. If T is the number of end nodes in a partial decision tree, its value is chosen according to the level of interaction between the input variables. In a tree with T = 2, there cannot be any interaction between the variables; these special cases are referred to as decision stumps. Empirically, values of T between 4 and 8 have been found to work well for gradient boosting. Figure 4.36 shows the decision-making process by means of weak classification decision trees in the GBM algorithm (Hastie 2009). If the aim is to teach a model M to predict values ŷ = M(x), a primary objective function can be applied, the so-called mean squared error (MSE). It is calculated as follows: for each input in the training dataset, the squared difference between the predicted and the expected value, (ŷ − y)², is calculated; the sum of these squared differences is then divided by the number of elements in the training dataset. Hastie (2009) states that each iteration i of the Gradient Boosting algorithm contains a model M_i, which is assumed to be imperfect. Gradient Boosting aims to take the model M_i and adjust it so that better accuracy is achieved. This relationship can be written in general as follows:
Fig. 4.36 Decision-making process in the GBM algorithm (Author according to Hastie 2009)
$$M_{i+1}(x) = M_i(x) + h(x) = y \tag{4.2}$$

On the basis of simple consideration, the ideal function h can be expressed as follows:

$$h(x) = y - M_i(x) \tag{4.3}$$

where $y - M_i(x)$ is the error in prediction between $y$ and $\hat{y}$.
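The update rule in Eqs. (4.2) and (4.3) can be sketched in Python as follows. The sketch assumes one-dimensional inputs, decision stumps (T = 2) as weak learners and a shrinkage factor `learning_rate`, which does not appear in the equations (setting it to 1 gives Eq. 4.2 literally). All names and data are hypothetical, and this is not the Wolfram implementation used in the experiments.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump (T = 2 end nodes) for 1-D inputs."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def mse(ys, preds):
    """Mean squared error: average of (y_hat - y)^2 over the dataset."""
    return sum((p - y) ** 2 for y, p in zip(ys, preds)) / len(ys)


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit each new stump h to the current errors y - M_i(x)."""
    base = sum(ys) / len(ys)  # M_0: constant model
    stumps = []

    def model(x):
        return base + learning_rate * sum(h(x) for h in stumps)

    history = [mse(ys, [model(x) for x in xs])]
    for _ in range(n_rounds):
        residuals = [y - model(x) for x, y in zip(xs, ys)]  # Eq. (4.3)
        stumps.append(fit_stump(xs, residuals))             # Eq. (4.2)
        history.append(mse(ys, [model(x) for x in xs]))
    return model, history


# Hypothetical data: a quadratic trend.
xs = list(range(10))
ys = [float(x * x) for x in xs]
model, history = gradient_boost(xs, ys)
print(history[0], history[-1])
```

The training MSE recorded in `history` does not increase from one round to the next, because every stump is fitted by least squares to the errors left by the models before it.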
4.4.1 Case

Using the Gradient Boosted Trees method, the development of the price of gold is predicted in this experiment. Basic information on the Gradient Boosted Trees method used is shown in Fig. 4.37. The predictor in this case is Gradient Boosted Trees, the number of test examples is 1221 and the number of training examples is 2442. The report related to the Gradient Boosted Trees method is shown in Fig. 4.38. Figure 4.39 shows a comparison plot of the actual gold prices (the “perfect prediction”) against those predicted using the Gradient Boosted Trees method (labelled “predictions”). The perfect prediction is drawn as a dashed line, whereas the predicted points are drawn as blue circles. The graph shows that the blue circles lie close to the dashed line, with partial deviations, so the accuracy of the predicted values compared to the actual values is quite high. This model therefore proves to be quite suitable for prediction. The Probability Density Histogram for the Gradient Boosted Trees method is shown in Fig. 4.40. Figure 4.41 shows a histogram of the residues, i.e. the frequency of the differences between the observed and predicted values. A residue can be understood as the size of the error made at a given point of the estimation. When the Gradient Boosted Trees method is used, the most represented range of values is −20 to 0. The extreme residue values reach −100 and 100, compared to −200 and 200 for the Decision Tree method. Extremes, in the
Fig. 4.37 Basic information—Gradient Boosted Trees (Wolfram Research 2020)
Fig. 4.38 Report—Gradient Boosted Trees (Wolfram Research 2020)
Fig. 4.39 Comparison Plot—Gradient Boosted Trees (Wolfram Research 2020)
Fig. 4.40 Probability Density Histogram—Gradient Boosted Trees (Wolfram Research 2020)
Fig. 4.41 Residual Histogram—Gradient Boosted Trees (Wolfram Research 2020)
form of residues around the values of −100 and 100, have the smallest representation in the entire histogram, as they occurred least often. The residues for the Gradient Boosted Trees method are shown graphically in Fig. 4.42. The value of Standard Deviation for this experiment is 27.3378. The value of Mean Cross Entropy is 4.79768, Mean Deviation is 19.7399, Mean Square is 747.353 and Evaluation Time is 0.0085. The time series, including the two-month prediction, is shown in Fig. 4.43. The graph shows that the gold price prediction is quite accurate when using the Gradient Boosted Trees method, with a few exceptions. The predicted price curve (gold) copies the observed gold price curve in CZK (red) closely. The
Fig. 4.42 Residual Plot—Gradient Boosted Trees (Wolfram Research 2020)
Fig. 4.43 Time series with prediction—Gradient Boosted Trees (Wolfram Research 2020)
Gradient Boosted Trees method was able to predict the direction of the curve, with high accuracy at some points. Deviations are noticeable, but negligible compared to the previous Decision Tree method. The size of the residues over the observed years can be seen in Fig. 4.44. The prediction for two calendar months, created using the Gradient Boosted Trees method, is shown in Fig. 4.45. The time series with a two-month prediction is shown in Fig. 4.46. With the Gradient Boosted Trees method, it is no longer possible to say that the prediction is unusable and incorrect, as was the case with the previous Decision Tree method.
Fig. 4.44 Residues—Gradient Boosted Trees (Wolfram Research 2020)
Fig. 4.45 Prediction—Gradient Boosted Trees (Wolfram Research 2020)
Figure 4.47 shows a graph of the curve of observed values with the connecting curve of predicted values for two calendar months.
4.5 Linear Regression

Hendl (2004) states that the term “regression” was first used at the end of the nineteenth century by the British scientist Francis Galton, who used it when studying the dependence of the height of offspring on the height
Fig. 4.46 Time series with a prediction for two months—Gradient Boosted Trees (Wolfram Research 2020)
Fig. 4.47 Time series with prediction 2—Gradient Boosted Trees (Wolfram Research 2020)
of their parents. Valášková et al. (2018) state that regression analysis is among the most commonly used methods worldwide. The authors add that the aim of this method is to examine the dependencies and relations between variables, the main objective being to describe the shape of the relationship between two or more variables. According to Moravčíková et al. (2017), this type of analysis is used to determine a suitable formula for predicting a dependent variable (Y) from the factors that affect it, i.e. the independent variables (X), and to evaluate the prediction error. Variables Y and X are connected by a regression function that includes several parameters. If the function is linear in its parameters, the model is referred to as a linear regression model; if it is not, the model is referred to as a non-linear regression model. If the behaviour of variable Y is modelled by means of one variable X, we speak about so-called simple regression; otherwise, we speak about the so-called multiple
regression analysis, which determines the parameters of the relationship between variables, much as the mean and standard deviation characterize the behaviour of a single variable. According to Hindls (2007), the classical linear regression model has the following form:

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i \tag{4.4}$$

where:
$y_i$ is the value of the random variable Y,
$x_{i1}, \ldots, x_{ik}$ are the values of the explanatory variables,
$\beta_0, \beta_1, \ldots, \beta_k$ are the model’s parameters,
$\varepsilon_i$ is the random component (its mean value = 0),
$i = 1, 2, \ldots, n$ is the observation index.
The simplest example of a regression function is the so-called regression line. Its function n(x) is expressed by means of the following formula:

$$n(x) = \beta_0 + \beta_1 x \tag{4.5}$$
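The parameters of the regression line in Eq. (4.5) can be estimated with a few lines of Python. The sketch assumes ordinary least squares as the estimation method, which the text does not name; the function name and the data are hypothetical.

```python
def fit_regression_line(xs, ys):
    """Ordinary least-squares estimates of beta_0 and beta_1 (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Hypothetical observations lying exactly on y = 1 + 2x.
beta0, beta1 = fit_regression_line([0, 1, 2, 3], [1, 3, 5, 7])
print(beta0, beta1)
```

With real data the points do not lie exactly on a line, and the differences between the observed values and β0 + β1·x play the role of the random component ε in Eq. (4.4).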
Hendl (2004) states that regression analysis is applied in a wide range of various fields and disciplines. As an example, he mentions technologies, where regression analysis is used to estimate the risk of failure or defect under certain conditions. In the field of medicine, it is used for modelling the efficacy of medicine; it is also important for marketing, where it is used for determining the probability of customers’ switch to competition. Last but not least, it is the field of finance, where regression analysis is used for forecasting customer’s creditworthiness or for predicting future development of a business entity in dependence on economic indicators.
4.5.1 Case

Using the Linear Regression method, the development of gold price is predicted in this experiment.
Fig. 4.48 Basic information—Linear Regression (Wolfram Research 2020)
Basic information on the Linear Regression method used is shown in Fig. 4.48. The predictor in this case is Linear Regression, the number of test examples is 1221 and the number of training examples is 2442. The report on the Linear Regression method is shown in Fig. 4.49. Figure 4.50 shows a comparison plot of the actual gold prices (the “perfect prediction”) against those predicted using the Linear Regression method (labelled “predictions”). The perfect prediction is drawn as a dashed line, whereas the predicted points are drawn as blue circles. It should be noted that the accuracy of the predicted values compared to the actual values is not very high and large residues, i.e. deviations from the dashed line, can be seen. This model therefore proves to be less suitable for prediction. The Probability Density Histogram for the Linear Regression method is shown in Fig. 4.51.
Fig. 4.49 Report—Linear Regression (Wolfram Research 2020)
Fig. 4.50 Comparison Plot—Linear Regression (Wolfram Research 2020)
Fig. 4.51 Probability Density Histogram—Linear Regression (Wolfram Research 2020)
The histogram of the resulting residues is shown in Fig. 4.52 and indicates the deviation of the predicted values from the actual values. In this histogram, the extremes range from −300 to 500, which is an especially wide range. The highest proportion of residues can be observed in the range of −100 to −50. The residues are shown graphically in Fig. 4.53.
Fig. 4.52 Residual Histogram—Linear Regression (Wolfram Research 2020)
Fig. 4.53 Residual Plot—Linear Regression (Wolfram Research 2020)
The value of Standard Deviation for this experiment is 179.88. The value of Mean Cross Entropy is 6.61125, Mean Deviation is 130.294, Mean Square is 32356.9 and Evaluation Time is 0.0058. The time series, including the two-month prediction, is shown in Fig. 4.54. The chart shows that the gold price prediction is very inaccurate when using the Linear Regression method. The curve of the predicted price (black) does not copy the curve of the actual price of gold in CZK (red). In some sections of the chart, the Linear Regression method could not predict the direction of the curve, and it was highly inaccurate at some points. The deviations are not negligible. The residues produced by the Linear Regression method over the monitored years are shown in Fig. 4.55.
Fig. 4.54 Time series with prediction—Linear Regression (Wolfram Research 2020)
Fig. 4.55 Residues—Linear Regression (Wolfram Research 2020)
The prediction for two calendar months using the Linear Regression method is shown in Fig. 4.56. The chart shows that, according to the forecast, there was a large increase in the price of gold at the turn of August and September. The time series of the actual development of the price of gold, including the previously mentioned two-month prediction, is shown in Fig. 4.57. The prediction curve is marked in black here. The actual gold price curve is marked in red.
Fig. 4.56 Prediction—Linear Regression (Wolfram Research 2020)
Fig. 4.57 Time series with prediction—Linear Regression (Wolfram Research 2020)
Time series with a subsequent prediction for two calendar months is shown in detail in Fig. 4.58.
4.6 Nearest Neighbours

The Nearest Neighbours method, also known as k-NN (k-Nearest Neighbours), is one of the basic and commonly applied algorithms of artificial intelligence. It is applied mainly to problems of regression and classification (Mocnik 2020). Figure 4.59 shows the basic principle of this algorithm. According to Yu and Yu (2006), it works on the principle of storing the values of the training elements during training, while during testing, for each test element,
Fig. 4.58 Time series with prediction 2—Linear Regression (Wolfram Research 2020)
Fig. 4.59 Principle of k-NN algorithm, shown for k = 3 and k = 7 (Author according to Yu and Yu 2006)
a distance (Manhattan distance, Euclidean distance, etc.) is calculated between the attributes of this element and the attributes of each training element. In this way, the distances to all training samples are computed. Subsequently, the distances are sorted and the k samples with the shortest distances are considered the nearest neighbours. The test element is then classified on the basis of the classes of its k nearest training elements. A number of recent studies mention the application of modified or optimized versions of the k-NN algorithm. For example, Garcia et al. (2008) compared the implementation of a GPU version of k-NN with several CPU versions. Liang et al. (2009) introduced several special methods in order to maximize the utilization of the graphics accelerator. It is also necessary to mention Garcia et al. (2008), who dealt with the description of the k-NN algorithm for texture analysis.
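The procedure described above (compute distances, sort them, take the k nearest training elements) can be sketched in Python. The sketch covers both classification by majority vote and regression by averaging the neighbours' target values; the averaging rule, the function names and the data are assumptions made for illustration only.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def k_nearest(train, query, k, distance=euclidean):
    """Return the k training pairs (attributes, target) closest to query."""
    return sorted(train, key=lambda pair: distance(pair[0], query))[:k]


def knn_classify(train, query, k, distance=euclidean):
    """Majority vote among the classes of the k nearest neighbours."""
    labels = [label for _, label in k_nearest(train, query, k, distance)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k, distance=euclidean):
    """Average target value of the k nearest neighbours (assumed rule)."""
    values = [value for _, value in k_nearest(train, query, k, distance)]
    return sum(values) / len(values)


# Hypothetical training elements: three of class "a", four of class "b".
points = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
          ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b")]
print(knn_classify(points, (1, 1), k=3), knn_classify(points, (1, 1), k=7))
```

The two calls mirror the k = 3 and k = 7 cases of Fig. 4.59: with a small k only the local neighbourhood votes, whereas with k = 7 all training elements vote and the larger class wins, so the choice of k can change the result.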
4.6.1 Case

Using the Nearest Neighbours method, gold price development is predicted in this experiment. The basic information on the Nearest Neighbours method used is shown in Fig. 4.60. The predictor in this case is Nearest Neighbours, the number of test examples is 1221 and the number of training examples is 2442. The report related to the Nearest Neighbours method is shown in Fig. 4.61.
Fig. 4.60 Basic information—Nearest Neighbours (Wolfram Research 2020)
Fig. 4.61 Report—Nearest Neighbours (Wolfram Research 2020)
Fig. 4.62 Comparison Plot—Nearest Neighbours (Wolfram Research 2020)
Figure 4.62 shows a comparison plot of the actual gold prices (the “perfect prediction”) against those predicted using the Nearest Neighbours method (labelled “predictions”). The perfect prediction is drawn as a dashed line, whereas the predicted points are drawn as blue circles. The accuracy of the predicted values compared to the actual values is high and only slight deviations from the dashed line can be seen. This model therefore proves to be quite suitable for prediction. The Probability Density Histogram for the Nearest Neighbours method is shown in Fig. 4.63. Figure 4.64 shows a histogram of residues, i.e. the frequency of the differences between the actual and predicted values. A residue can be understood as the size of the error made at a given point of the estimation. When the Nearest Neighbours model is used, the bin of values −5 to 0 has the largest representation. Extremes, in the form of residues of −70 and 70, have the smallest representation in the entire histogram, as they occurred least often. Figure 4.65 then shows the residues in a chart. The value of Standard Deviation for this experiment is 37.1113. The value of Mean Cross Entropy is 5.03393, Mean Deviation is 20.2955, Mean Square is 1377.25 and Evaluation Time is 0.0048. The time series, including the two-month prediction, is shown in Fig. 4.66. The chart shows that the gold price prediction is very accurate when using the Nearest Neighbours method. The curve of the predicted price (orange) accurately copies the curve of the actual price of gold in CZK (red). The Nearest
Fig. 4.63 Probability Density Histogram—Nearest Neighbours (Wolfram Research 2020)
Fig. 4.64 Residual Histogram—Nearest Neighbours (Wolfram Research 2020)
Neighbours method was able to predict the development of the curve very accurately. Deviations are negligible. The resulting residues for the observed years are shown in Fig. 4.67. The prediction for two calendar months is shown in Fig. 4.68. Sudden changes in the development of the curve can be observed. The time series of the actual development of the gold price together with the subsequent prediction, using the Nearest Neighbours method, is shown in Fig. 4.69. Figure 4.70 then shows a detail of the development of a part of the time series including the prediction.
Fig. 4.65 Residual Plot—Nearest Neighbours (Wolfram Research 2020)
Fig. 4.66 Time series with prediction—Nearest Neighbours (Wolfram Research 2020)
4.7 Random Forest

The Random Forest method was created by Leo Breiman, who, in his study, created a forest using a combination of decision trees in order to improve classification or prediction (Isobe and Tamada 2018). According to Breiman (2001), Random Forest can be characterized as an extension and particular implementation of decision trees that applies several methods in order to increase their applicability to real data. The Random Forest method is basically not very different from the decision tree method. The main difference is that instead of a single tree, a collection of trees (a forest) is created. The
Fig. 4.67 Residues—Nearest Neighbours (Wolfram Research 2020)
Fig. 4.68 Prediction—Nearest Neighbours (Wolfram Research 2020)
task of the forest is to assign a data point to a class by voting (in the case of classification) or to calculate the average of the target value from the estimates of the individual trees (in the case of regression). It follows that the effort is to reduce the overall error of the forest as a whole rather than the inaccuracy of the individual trees. In this connection, the term “bagging” or “bootstrap aggregating” is used: each tree is built on a random sample drawn with replacement from the training dataset, and the observations left out of the sample can be used to verify the created forest model (Ma et al. 2020).
Fig. 4.69 Time series with prediction—Nearest Neighbours (Wolfram Research 2020)
Fig. 4.70 Time series with prediction 2—Nearest Neighbours (Wolfram Research 2020)
According to Tang and Ishwaran (2017), the mechanism of random forests is sufficiently universal to be used for solving both classification and regression tasks. The authors further add that this method can eliminate some problems that may arise from the application of single trees, e.g. their instability. A random forest is formed by a set of trees $T_1, \ldots, T_S$, whose classification function is expressed using the following formula:

$$\{d(x, \Theta_k),\ k = 1, \ldots, S\} \tag{4.6}$$

where $x$ is the vector of predictors’ values and $\Theta_1, \ldots, \Theta_S$ are independent, identically distributed random vectors.
The Random Forest method applies binary trees of the CART type. Its main advantages are that the trees are simple to train and debug, which is also the reason for their general popularity and common use in various fields. A random forest can improve accuracy (reduce variance) without restricting the complexity of the individual trees: the trees are not pruned, and the variance is kept low by combining the results of the individual trees. Originally, this method was created for datasets with a large number of predictors; however, it turned out to work well even in the case of small datasets (Ramo and Chuvieco 2017).
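The idea of combining many trees by averaging can be sketched in Python. This is a deliberately reduced illustration and not a full Random Forest: it assumes one-dimensional inputs, uses decision stumps instead of fully grown CART trees, and omits the random selection of predictors at each split. The names, the seed and the data are hypothetical.

```python
import random


def fit_stump(points):
    """Least-squares regression stump on (x, y) pairs; constant if no split exists."""
    xs = sorted({x for x, _ in points})
    if len(xs) < 2:
        mean = sum(y for _, y in points) / len(points)
        return lambda x: mean
    best = None
    for threshold in xs[:-1]:  # keeps both sides non-empty
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def fit_forest(points, n_trees=25, seed=0):
    """Bagging: every tree sees its own bootstrap sample; predictions are averaged."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(points) observations with replacement.
        sample = [rng.choice(points) for _ in points]
        trees.append(fit_stump(sample))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


# Hypothetical data: a step from 0 to 10 at x = 5.
data = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]
forest = fit_forest(data)
print(forest(0), forest(9))
```

Because each stump is fitted to a different bootstrap sample, the individual trees disagree slightly, and averaging their outputs smooths out these differences. This is the variance-reducing effect attributed to the forest above.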
4.7.1 Case

Using the Random Forest method, the development of the price of gold is predicted in this experiment. The basic information on the Random Forest method used is shown in Fig. 4.71. The predictor in this case is Random Forest, the number of test examples is 1221 and the number of training examples is 2442. The report for the Random Forest method is shown in Fig. 4.72. Figure 4.73 shows a comparison plot of the actual gold prices (the “perfect prediction”) against those predicted using the Random Forest method (labelled “predictions”). The perfect prediction is drawn as a dashed line, whereas the predicted points are drawn as blue circles. It should be noted that the accuracy of the predicted values compared to the actual values is not very high and a large deviation of the circles from the dashed line can be observed. This model therefore proves to be less suitable for prediction. The Probability Density Histogram for the Random Forest method is shown in Fig. 4.74. Figure 4.75 shows a histogram of residues, i.e. the frequency of the differences between the actual and predicted values. A residue can be understood as the size of the error made at a given point of the prediction. When the Random Forest model is used, the bin of values 0 to 20 has the largest representation. Extremes, in the form of residues of −70 and 70, have the smallest proportion of the
Fig. 4.71 Basic information—Random Forest (Wolfram Research 2020)
176
4 Comparison of Different Methods
Fig. 4.72 Report—Random Forest Wolfram Research (2020)
Fig. 4.73 Comparison Plot—Random Forest (Wolfram Research 2020)
4.7 Random Forest
177
Fig. 4.74 Probability Density Histogram—Random Forest (Wolfram Research 2020)
Fig. 4.75 Residual Histogram—Random Forest (Wolfram Research 2020)
entire histogram, as they have occurred the least amount of times. Figure 4.76 then captures the residues graphically. The value of Standard Deviation for this experiment is 68.7212. The value of Mean Cross Entropy is 7.14466, Mean Deviation is 57.0969, Mean Square is 4722.6 and Evaluation Time is equal to 0.011. The time series, including the two-month prediction, is shown in Fig. 4.77. The chart shows that the gold price prediction is not very accurate when using the Random Forest method. The curve of the predicted price (purple) does not accurately copy the curve of the actual price of gold in CZK (red). The Random Forest method failed to predict the development of the curve’s motion accurately, having only partially predicted the shape of the curve and the
178
4 Comparison of Different Methods
Fig. 4.76 Residual Plot—Random Forest (Wolfram Research 2020)
Fig. 4.77 Time series with prediction—Random Forest (Wolfram Research 2020)
direction of development. Deviations are very noticeable, and this model seems unsuitable for use in time series prediction. The resulting residues are captured in the chart for the monitored period in Fig. 4.78. Figure 4.79 shows a detailed view of the time series prediction for two calendar months using the Random Forest method. The time series of actual values, including the added prediction for two calendar months, can be seen in Fig. 4.80.
Fig. 4.78 Residues—Random Forest (Wolfram Research 2020)
Fig. 4.79 Prediction—Random Forest (Wolfram Research 2020)
Figure 4.81 shows a detailed view of the time series of actual gold prices, followed by the prediction made using the Random Forest method.
Fig. 4.80 Time series with prediction—Random Forest (Wolfram Research 2020)
Fig. 4.81 Time series with prediction 2—Random Forest (Wolfram Research 2020)
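The residual statistics quoted for the Random Forest experiment can be related to one another with a short calculation. The sketch below is an illustration under stated assumptions, not a description of how Mathematica computes its report: it takes a residual to be the actual value minus the predicted value, reads Mean Deviation as the mean absolute residual, and bins residuals into intervals of width 20 as in the residual histogram. The function name and the toy values are made up for the example. The last lines check one property of the reported figures: the Mean Square of 4722.6 equals the square of the Standard Deviation of 68.7212 to the precision given, which suggests that the reported standard deviation is the root mean square of the residuals.

```python
import math
from collections import Counter


def residual_summary(actual, predicted, bin_width=20.0):
    """Summarise residuals (actual minus predicted)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean_square = sum(r * r for r in residuals) / n
    # Assumption: "Mean Deviation" is read as the mean absolute residual.
    mean_deviation = sum(abs(r) for r in residuals) / n
    # Histogram keyed by the left edge of each bin [edge, edge + bin_width).
    histogram = Counter(
        math.floor(r / bin_width) * bin_width for r in residuals
    )
    return {
        "residuals": residuals,
        "mean_square": mean_square,
        "root_mean_square": math.sqrt(mean_square),
        "mean_deviation": mean_deviation,
        "histogram": histogram,
    }


# Made-up toy values, not data from the experiment.
actual = [100.0, 120.0, 90.0, 150.0]
predicted = [110.0, 100.0, 95.0, 130.0]
summary = residual_summary(actual, predicted)
print(summary["mean_square"], summary["mean_deviation"])

# Consistency check on the figures reported in the text.
reported_sd = 68.7212
reported_mean_square = 4722.6
gap = abs(reported_sd ** 2 - reported_mean_square)
print(gap)
```

Because large errors are squared, the root mean square is always at least as large as the mean absolute residual, which matches the reported pair of 68.7212 and 57.0969.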
4.8 Mathematica: Comparison

A comparison of the time series of actual gold prices in CZK with the time series predicted using seven different methods is shown in Fig. 4.82. It is clear from the figure that the Decision Tree, Linear Regression and Random Forest methods did not come close to the course of the red curve indicating the actual development of the price of gold. Therefore, these methods are not suitable for time series prediction.

Fig. 4.82 Time series—comparison (Wolfram Research 2020)

Figure 4.83 shows each predicted time series individually against the time series of observed gold prices (actual values), whose curve is marked in red. Neural Networks, Gradient Boosted Trees, Nearest Neighbours and Gaussian Process appear to be quite suitable prediction methods. The smallest extent of residuals was recorded for the Neural Networks method, which thus ranked first in the accuracy of time series prediction. Therefore, the Neural Networks method can be said to be the most suitable for time series prediction.

Predictions for two calendar months using the seven methods are shown in Fig. 4.84. From the middle of the chart onwards, the prediction curves of the Random Forest and Gaussian Process methods follow a very similar course. The biggest fluctuations in the predicted prices are shown by the curve of the Nearest Neighbours method. Judging by its prediction curve, the Decision Tree method can be ruled out as a suitable prediction tool. Figure 4.85 shows the curves predicted by the seven methods individually. Time series statistics for all methods are shown in Fig. 4.86, and prediction statistics for all methods in Fig. 4.87.
Fig. 4.83 Time series and residues—comparison (Wolfram Research 2020)
Fig. 4.84 Prediction—comparison (Wolfram Research 2020)
Fig. 4.85 Prediction—methods individually (Wolfram Research 2020)
Fig. 4.86 Time series statistics—comparison (Wolfram Research 2020)
Fig. 4.87 Prediction statistics—comparison (Wolfram Research 2020)
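The comparison in this section ranks the methods by the extent of their residuals. One way such a ranking could be produced is sketched below. This is a hedged illustration: the extent is assumed here to mean the range of the residuals (largest minus smallest), the root mean square error is added only as a second yardstick, and the method names and numbers are invented placeholders rather than results from the experiments.

```python
import math


def rank_by_residual_extent(actual, predictions):
    """Return (method, extent, rmse) tuples, smallest residual extent first."""
    ranking = []
    for method, predicted in predictions.items():
        residuals = [a - p for a, p in zip(actual, predicted)]
        # Assumption: "extent of residuals" is the range max - min.
        extent = max(residuals) - min(residuals)
        rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
        ranking.append((method, extent, rmse))
    return sorted(ranking, key=lambda row: row[1])


# Invented placeholder data, not values from the case studies.
actual = [10.0, 12.0, 11.0, 13.0]
predictions = {
    "method_a": [10.5, 11.5, 11.0, 13.5],
    "method_b": [8.0, 15.0, 9.0, 16.0],
}

ranking = rank_by_residual_extent(actual, predictions)
for method, extent, rmse in ranking:
    print(method, extent, round(rmse, 3))
```

Evaluating every method against the same series of actual values keeps the figures directly comparable.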
References

Andone, I., and N.A. Sireteanu. 2009. A combination of two classification techniques for businesses bankruptcy prediction. SSRN Electronic Journal [online]. Available at https://ssrn.com/abstract=1527726
Bajer, L., Z. Pitra, and M. Holeňa. 2015. Benchmarking Gaussian processes and random forests surrogate models on the BBOB noiseless testbed. In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation, 1143–1150. ACM.
Breiman, L. 2001. Random forests. Machine Learning 45: 5–32.
Fotr, J. 2006. Manažerské rozhodování: Postupy, metody a nástroje [Managerial decision making: Procedures, methods and tools]. Prague: Ekopress.
Friedman, J.H. 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis, 367–378.
Garcia, V., E. Debreuve, and M. Barlaud. 2008. Fast k nearest neighbour search using GPU. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 1–6.
Hastie, T. 2009. The elements of statistical learning. New York: Springer Publishing.
Hendl, J. 2004. Přehled statistických metod zpracování dat: analýza a metaanalýza dat [Overview of statistical methods of data processing: analysis and meta-analysis of data]. Prague: Portál.
Hindls, R. 2007. Statistika pro ekonomy [Statistics for economists]. Prague: Professional Publishing.
Isobe, Y., and H. Tamada. 2018. Are identifier renaming methods secure? An evaluation focuses on opcodes using random forest. In 19th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 322–328.
Klyuchnikov, N., and E. Burnaev. 2020. Gaussian process classification for variable fidelity data. Neurocomputing 397: 345–355.
Liang, D., B. Liu, J. Wang, and L. Ying. 2009. Accelerating SENSE using compressed sensing. Magnetic Resonance in Medicine 62: 1574–1584.
Ma, W., G. Lin, and J.L. Liang. 2020. Estimating dynamics of central hardwood forests using random forests. Ecological Modelling 419.
Mocnik, F. 2020. An improved algorithm for dynamic nearest-neighbour models. Journal of Spatial Science. https://doi.org/10.1080/14498596.2020.1739575.
Moravčíková, D., A. Križanová, J. Klieštiková, and M. Rypáková. 2017. Green marketing as the source of the competitive advantage of the business. Sustainability 9(12).
Natekin, A., and A. Knoll. 2013. Gradient boosting machines. Frontiers in Neurobotics, 1–21.
Ramo, R., and E. Chuvieco. 2017. Developing a random forest algorithm for MODIS global burned area classification. Remote Sensing 9(11).
Rasmussen, C.E., and C.K. Williams. 2006. Gaussian processes for machine learning. The MIT Press.
Sagi, O., and L. Rokach. 2020. Explainable decision forest: Transforming a decision forest into an interpretable tree. Information Fusion 61: 124–138.
Tang, F., and H. Ishwaran. 2017. Random forest missing data algorithms. Statistical Analysis and Data Mining 10 (6): 363–377.
Valášková, K., T. Klieštik, L. Švábová, and P. Adamko. 2018. Financial risk measurement and prediction modelling for sustainable development of business entities using regression analysis. Sustainability 10(7).
Wolfram Research, Inc. 2020. Mathematica, version 12.1. Champaign, IL.
Xiao, H., and G. Xu. 2020. Neural decision tree towards fully functional neural graph. Unmanned Systems 8 (3): 203–210.
Chapter 5
Conclusion
In this publication, time series were addressed in case studies by means of experiments, with a primary focus on their analysis and subsequent prediction. The publication aimed to provide an overview of predictive models that can be used for predicting selected time series, to apply them, and to evaluate their applicability to the given problem. Here, the time series consisted of daily prices of gold (USD/oz) over a longer period of time. Real-world application requires thorough pre-processing as well as the construction and selection of functions for the predictive models. For time series prediction, the case studies used both econometric models and models based on artificial neural networks.

Among the traditional statistical methods, one case study used linear regression, one of the basic methods of time series analysis. Regression analysis ranks among the most widely used methods worldwide. However, in terms of performance and error rate, the case studies in this publication, which deal with the suitability of various predictive models for predicting the time series of the price of gold, show linear regression to be one of the least suitable methods. The method is unable to handle non-linearity in the data, which makes it unsuitable for analysing and predicting the price of gold.

Another statistical method used for analysing and predicting the gold price time series is the Gaussian process. This method is defined as a distribution over all admissible functions, whose characteristics are determined by covariance functions. The Gaussian process is an interesting technique whose popularity has grown significantly in recent years. It was used in one case study; when applied to the gold price time series, its extent of residuals was one of the smallest among the methods compared, and its prediction was very accurate.
The Gaussian process is thus a very accurate and suitable statistical method for time series analysis and prediction.

Another statistical method used in the experiments is exponential smoothing, in which the gold price prediction curve is smoothed. Its application showed that the method is quite suitable for time series prediction at different curve settings. The extent of residuals was not large; however, the predictions for 44 trading days differ significantly between the curve settings.

One of the most widely used methods for time series analysis and prediction is the ARIMA model, which, like exponential smoothing, is an econometric model. ARIMA models are suitable especially for short-term predictions in situations where data on explanatory variables are not available or where a model based on explanatory variables shows poor predictive ability. When the model was applied to the gold price time series in a case study, it proved very suitable: the residuals sum to nearly zero, and the normal distribution fitted to the residuals peaks at zero. The model can thus be used as a predictive model for time series, which is not surprising in the case of ARIMA.

Among the artificial intelligence methods, the data-mining techniques used in the case studies are dealt with first. The first to be mentioned is the Decision Tree method, as it is the basis for other methods used in the publication, such as Gradient Boosted Trees and Random Forest. Nevertheless, the Decision Tree method did not prove suitable in the experiment comparing models for predicting the gold price time series. The difference between the actual values and the values predicted by this method was striking at individual data points. A comparison with the actual curve of the gold price shows that the curve of predicted values captures the long-term trend at some points of the graph; in spite of this, the method does not appear suitable for predicting the gold price time series. The same applies to the use of Random Forest for predicting the price of gold.

Another data-mining method, Gradient Boosted Trees, appears to perform much better.
The principle of Gradient Boosted Trees consists in the gradual creation of models, where each subsequent model is built with respect to the errors of the previous models. The model used in one of the experiments proved very suitable, with accurate prediction and low residuals. The same holds for the Nearest Neighbours method, referred to as k-NN (k-Nearest Neighbours), a machine learning method that belongs to the artificial intelligence methods. It is applied especially to regression and classification. In the experiments carried out within the case studies, the method appears suitable, since the predicted time series (the curve) followed the curve of the actual gold prices. The residuals of this model were not large.

The most successful group of methods used in the case studies for time series analysis and prediction proved to be neural networks. Three types of networks were used: Multi-Layer Perceptron (MLP), Long Short-Term Memory (LSTM), and Radial Basis Function (RBF). Among the RBF networks, the most successful network achieved a performance of more than 98% with 1 input variable (time as a date), 1 output variable (price of gold), and a 1-day delay of the time series. The MLP networks show a very similar performance of more than 98% on the training, testing, and validation datasets. The most successful MLP network in terms of time series prediction was the network with 5 input variables (time as a date, day of the week, day of the month, month, year), 1 output variable (price of gold), and a 1-day delay of the time series. The best result in terms of the smallest extent of residuals was achieved by the LSTM networks. All the aforementioned methods based on neural networks appear suitable for predicting the time series of commodity prices.

In conclusion, not all the methods used are suitable for time series prediction; specifically, neither all statistical methods nor all methods based on artificial intelligence are suitable tools for it. However, the most suitable methods definitely include those based on neural networks, provided that each group of networks is set up correctly and given an appropriate number of variables.
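The input configuration described for the most successful MLP network (five calendar variables and a 1-day delay) could be prepared along the following lines. This is a sketch under assumptions rather than the procedure used in the case studies: the text does not specify how the delay enters the network, so it is assumed here that the price from the previous observation is supplied alongside the five calendar variables. The function name, the dictionary keys and the toy prices are hypothetical.

```python
from datetime import date


def build_samples(dates, prices, delay=1):
    """Pair calendar inputs and a lagged price with each day's target price."""
    samples = []
    for i in range(delay, len(prices)):
        day = dates[i]
        calendar_inputs = {
            "time": day.toordinal(),         # the date as a running day number
            "day_of_week": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "day_of_month": day.day,
            "month": day.month,
            "year": day.year,
        }
        # Assumption: the delay is the price observed `delay` steps earlier.
        lagged_price = prices[i - delay]
        target = prices[i]
        samples.append((calendar_inputs, lagged_price, target))
    return samples


# Made-up toy prices for four consecutive trading days.
dates = [date(2021, 1, 4), date(2021, 1, 5), date(2021, 1, 6), date(2021, 1, 7)]
prices = [1940.0, 1950.5, 1918.0, 1913.5]

samples = build_samples(dates, prices, delay=1)
for calendar_inputs, lagged_price, target in samples:
    print(calendar_inputs, lagged_price, target)
```

The first observation has no predecessor and is therefore dropped, so a series of n prices yields n minus the delay samples.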