English Pages 88 Year 2015
Copyright © 2015. Diplomica Verlag. All rights reserved.
Becker, Jan: Big Data Investments: Effects of Internet Search Queries on German Stocks. Hamburg, Diplomica Verlag GmbH, 2015. Book ISBN: 978-3-95934-597-2; PDF eBook ISBN: 978-3-95934-097-7. Printing/production: Diplomica® Verlag GmbH, Hamburg, 2015. Cover image: © buchachon - Fotolia.com. Bibliographic information of the Deutsche Nationalbibliothek: the Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available online at http://dnb.d-nb.de.
Copyright © 2015. Diplomica Verlag. All rights reserved.
This work, including all of its parts, is protected by copyright. Any use beyond the limits of copyright law without the consent of the publisher is prohibited and punishable. This applies in particular to reproduction, translation, microfilming, and storage and processing in electronic systems. The use of common names, trade names, product designations, etc. in this work does not, even without special marking, justify the assumption that such names are to be regarded as free under trademark and brand protection legislation and may therefore be used by anyone. The information in this work has been compiled with care. Nevertheless, errors cannot be completely ruled out, and Diplomica Verlag, the authors or translators accept no legal responsibility or any liability for any remaining incorrect information and its consequences. All rights reserved © Diplomica Verlag GmbH, Hermannstal 119k, 22119 Hamburg, http://www.diplomica-verlag.de, Hamburg 2015. Printed in Germany
Foreword
Dear reader,
in 2010 the Bundesverband Alternative Investments e. V. (BAI) decided to promote academic work in the field of so-called alternative investments. For this purpose, the BAI Wissenschaftspreis (BAI Science Award) was established. One of the main reasons for this support, and its intention, was and still is that knowledge of alternative investments, in both breadth and depth, unfortunately remains very rudimentary. Large parts of the public, politics and the media, but also investors, often hold numerous misconceptions about the benefits and risks of alternative investments. With the award, the BAI aims to give students and researchers in Germany an incentive to conduct research in this field, which is becoming increasingly important for institutional investors. Many German universities immediately agreed to support the BAI in announcing the award. As a result, the BAI received numerous high-quality submissions in the four categories “Dissertations”, “Master's/Diploma Theses”, “Bachelor's Theses” and “Other Academic Works”. Each year, the winners were offered an award together with prize money of 10,000 euros. We are very pleased that Diplomica Verlag has launched the series “Alternative Investments”. This publication will certainly help to bring the topic of alternative investments to a wide audience. We wish the reader an exciting read!
Yours, Bundesverband Alternative Investments e. V.
Abstract
This study examines the impact of internet search queries on the German stock market. It answers two research questions. First, does an increase in search queries drive individual stock returns? Second, do queries affect the implied volatility of stock options? After controlling for seasonality, autocorrelation and general market risk, the further analysis examines the price-to-book valuation, one-year performance and historical volatility in interaction with internet search queries. Overall, the analyzed statistical relationships are not stable, and no significant relationship is found.
Table of Contents
Foreword
Abstract
List of Abbreviations
List of Tables
List of Figures
Part 1 – Framework
1.1 Introduction
1.2 Contribution to the current State of Research
1.3 Structure
1.4 Definition of Internet Search Queries
1.5 Literature Review
1.5.1 Stocks
1.5.2 Implied Volatility
1.5.3 Unemployment, Consumer Sentiment and Revenue
1.6 Data Scope of Analysis
1.6.1 Timeframe
1.6.2 Necessary Adjustments in Sample Selection
1.7 Search Engines – Gateways to Information
1.7.1 The Google Tool
1.7.2 Grouping
1.7.3 Multiple Counting is automatically avoided by Google
1.7.4 Synthetic Index rather than actual Numbers
1.7.5 Empty Values
1.7.6 Limited by German Language
1.7.7 Exact Wording of Search Terms and Search Term Combinations
1.8 Control Variables
1.8.1 Market Index
1.8.2 Turnover
1.8.3 Price-to-Book Ratio (PB)
1.8.4 Prior Price Changes
1.8.5 The 52 Week High or Low
1.8.6 Implied Volatility
1.8.7 3m implied Volatilities
1.8.8 Historical Volatility
Part 2 – Regressions
2.1 Model Selection
2.1.1 Goodness of Fit
2.1.2 Level versus Changes
2.1.3 Seasonality
2.2 Return Regressions
2.2.1 Standardization
2.2.2 Correlation Analysis
2.2.3 Pooled Regression
2.2.4 DAX, MDAX and SDAX Panel
2.2.5 Search Intensity
2.2.6 Price-to-Book and GSI
2.2.7 52 week High or Low
2.2.8 Interim Conclusion
2.3 Implied Volatility Regressions
2.3.1 Groups of Search Intensity
2.3.2 Cross Check for historical Volatility
2.3.3 Single Stock Regressions on implied Volatility
2.3.4 Interim Conclusion
Part 3 – Vector Autoregressive Model (VAR)
3.1 BMW VAR
3.2 General VAR
Part 4 – Limitations of Research
4.1 Initial Intention for Query
4.2 Discussion about Transparency versus Endogeneity
4.3 Language Barrier
4.4 Technical Limitations
4.5 Alternatives
4.6 Correction for Seasonality
4.7 Limitation of Trading Signal
4.8 Missing Interaction with other Controls
4.9 External Data Manipulation
4.9.1 Micro Impact Field Study
4.10 Access Channel
4.11 Missing Controls
Part 5
5.1 Further Research
5.2 Conclusion
Bibliography
Appendix
A.1 Frequency Distributions
A.2 Q-Q-Plot
A.3 Table of aggregate Inputs
A.4 Alternative Model Specifications
A.5 Seasonality of Returns
A.6 Seasonality of Search Queries
A.7 Trading Strategies
4.7 Limitation of Trading Signal
At the time of research (05 May 2013), it was not yet possible to set up a real-time trading strategy with Google data for the German market. The highest-frequency observations, daily data, are published only for a rolling 90-day window. The publication lag is usually one day. The data are published the next morning in US Pacific Time (UTC-8, or UTC-7 during daylight saving time). For German stocks this means that it is already around noon in Germany when the data appear, so the data are delayed by roughly 1.25 days. This is a disadvantage compared with other media sources such as Twitter or press releases.
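The timing can be checked with a short Python sketch. It assumes, hypothetically, that daily data for a given day are released at 03:00 US Pacific local time on the following day. The text itself only says "the next morning". The sketch converts that moment into Berlin local time with the standard-library `zoneinfo` module. The function name `release_time_in_berlin` is illustrative only.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")
BERLIN = ZoneInfo("Europe/Berlin")

def release_time_in_berlin(data_day: date, release_hour_pt: int = 3) -> datetime:
    """Berlin local time at which daily search data for `data_day` become available.

    Hypothetical assumption: data for a day are released on the next calendar day
    at `release_hour_pt` o'clock US Pacific local time (PST or PDT as applicable).
    """
    next_day = data_day + timedelta(days=1)
    release_pt = datetime.combine(next_day, time(hour=release_hour_pt), tzinfo=PACIFIC)
    return release_pt.astimezone(BERLIN)

for day in (date(2013, 1, 15), date(2013, 3, 12), date(2013, 5, 5)):
    print(day, "->", release_time_in_berlin(day).strftime("%Y-%m-%d %H:%M %Z"))
```

For 5 May 2013, the sketch yields noon in Berlin on 6 May. The March date falls in the weeks between the US and European daylight saving changes. During that period the two clocks are eight rather than nine hours apart, so the release lands at 11:00 instead.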
4.8 Missing Interaction with other Controls
One reason why this study finds no clear directional effect of queries on stock price changes is a gap in the data. The data are not controlled for the initial intention of the person conducting a search query, nor for his or her investment position. Without such controls, Google data do not allow clear conclusions about a directional effect. Several controls could help. First, it may be interesting to see how sentiment on Twitter, combined with high search activity, influences stock prices. Second, controls from news coverage, for example on TV or in newspapers, may help to construct a basic sentiment indicator. Third, most financial websites (e.g., exchanges or online brokers) offer a search function for their own sites. Such data could be much more relevant, because they target the stock itself rather than the company's products, as discussed before. However, no freely available tool currently provides such data.
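The role of such controls can be illustrated with a minimal sketch that uses synthetic data only. Suppose search intensity amplifies price moves but sentiment sets their direction. A regression on search intensity alone then finds roughly no effect, while a model with a search × sentiment interaction recovers it. The data-generating process, the ±1 sentiment variable and the coefficient 0.5 are hypothetical assumptions, not estimates from this study.

```python
import numpy as np

def ols(y, regressors):
    """OLS coefficients for y on a constant plus the given regressor column(s)."""
    X = np.column_stack([np.ones(len(y)), regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 5000
gsi = rng.standard_normal(n)                  # hypothetical standardized search intensity
sentiment = rng.choice([-1.0, 1.0], size=n)   # hypothetical sentiment sign (e.g. from tweets)
# Assumed process: attention scales the move, sentiment sets its direction.
returns = 0.5 * gsi * sentiment + rng.standard_normal(n)

b_gsi_only = ols(returns, gsi)[1]
b_full = ols(returns, np.column_stack([gsi, sentiment, gsi * sentiment]))
print(f"search intensity alone:         {b_gsi_only:+.3f}")
print(f"search x sentiment interaction: {b_full[3]:+.3f}")
```

Under these assumptions, the search-only coefficient stays close to zero even though search intensity matters. This mirrors the argument that directional conclusions need sentiment or intention controls.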
4.9 External Data Manipulation
Because the Google search index is a normalized index rather than a count of queries, some facts remain unknown. First, the index does not reveal how many queries actually took place. Second, we do not know exactly how the index was derived through normalization. Third, the index is open to manipulation: a computer program could enter terms into the search engine many times under different IP addresses.
Many researchers have recently studied search queries. It therefore cannot be determined how many of the recorded queries were generated by researchers themselves. Nor can it be determined how far such queries influenced the statistics used in later research.
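The first and third point can be illustrated with a Python sketch. It rescales query counts in the way Google publicly describes its Trends index: each period's share of all searches, scaled so that the peak equals 100. The exact production procedure, including sampling and rounding, is not disclosed. The functions `trends_index` and `inject` and all counts are therefore hypothetical.

```python
def trends_index(term_counts, total_counts):
    """Hypothetical Trends-style index: share of all searches, peak rescaled to 100."""
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * share / peak) for share in shares]

def inject(counts, period, extra):
    """Hypothetical manipulation: add `extra` automated queries in one period.

    The extra queries are not added to the totals, a negligible simplification here.
    """
    manipulated = list(counts)
    manipulated[period] += extra
    return manipulated

totals = [1_000_000] * 5                  # hypothetical total searches per period
small = [2, 3, 2, 3, 2]                   # low-volume term
large = [2000, 3000, 2000, 3000, 2000]    # same pattern, 1000x the queries

print(trends_index(small, totals), trends_index(large, totals))
print(trends_index(inject(small, 2, 20), totals))
print(trends_index(inject(large, 2, 20), totals))
```

Two series that differ a thousandfold in absolute queries yield identical indices. Twenty injected queries dominate the low-volume term but leave the high-volume term unchanged after rounding.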
4.9.1 Micro Impact Field Study
On 5 June 2013 I tested whether a small number of queries would tilt Google's search index time series. To this end, friends were invited via Facebook to google the terms “balda aktie”, “baywa aktie” or “gildemeister aktie”. These three stocks were selected because they had little or no search activity in the prior weeks. On 5 and 6 June, participants googled Baywa eight times, Balda 11 times and Gildemeister 11 times in total. Surprisingly the
List of Tables
Table 1: List of Stocks in Sample
Table 2: F-Test for Seasonality of GSI of three Categories
Table 3: Regression of Model_mix_GSI_Name for Y=price_r_t0
Table 4: Regression of Model_mix_GSI_AG for Y=price_r_t0
Table 5: Regression of Model_mix_GSI_Aktie for Y=price_r_t0
Table 6: Jarque-Bera-Test for “Name” seasonally adjusted and standardized Queries
Table 7: Correlation Analysis of Signals
Table 8: Correlation Analysis of Signals and dependent Variables
Table 9: Regression of Model_3Categories_Zmix for Y=price_r_t0
Table 10: Regression of Model_3Indices_Zmix for Y=price_r_t0
Table 11: Regression of Model_GSI_Quantile_ZMix for Y=price_r_t0
Table 12: Regression of Model_PBxGSI_ZMix for Y=price_r_t0
Table 13: Regression of Model_52weekHighLow_Zmix for Y=price_r_t0
Table 14: Regression of Model_3Categories_Zmix for Y=d_3m_impl_vola_t0
Table 15: Regression of Model_GSI_Quantile_ZMix for Y=d_3m_impl_vola_t0
Table 16: Regression of Model_HistVola&GSI_ZMix for Y=d_implied_vola_t0
Table 17: Regression of Model_mix_GSI_Name for Y=d_3m_impl_vola_t0
Table 18: Regression of Model_mix_GSI_AG for Y=d_3m_impl_vola_t0
Table 19: Regression of Model_mix_GSI_Aktie for Y=d_3m_impl_vola_t0
Table 20: Number of available Data Points by Index and Search Term
Table 21: Regression of Price Changes for different Setups of non-linear Specification
Table 22: Example BMW: Correction for Stock Price Seasonality
Table 23: BMW Example: Correction for Seasonality in GSI
Table 24: Regression of Model_PB_ZMix for Y=price_r_t+1
Table 25: Regression of Model_52weekHighLow_Zmix for Y=price_r_t+1
Table 26: Regression of Model_HistVola&GSI_ZMix for Y=d_implied_vola_t+1
Table 27: VAR_BMW
List of Figures
Figure 1: Evolution of available Signals
Figure 2: Monthly Averages of Internet Search Queries for BMW, Daimler and VW
Figure 3: Impulse Response of BMW returns to Search Queries
Figure 4: Impulse Response Graph of Price Changes by “Name” Queries
Figure 5: Impulse Response Graph of implied Volatility Changes by “Name” Queries
Figure 6: Impulse Response Graph of Price Changes by “AG” Queries
Figure 7: Impulse Response Graph of implied Volatility Changes by “AG” Queries
Figure 8: Impulse Response Graph of Price Changes by “Aktie” Queries
Figure 9: Impulse Response Graph of implied Volatility Changes by “Aktie” Queries
Figure 10: Frequency Distribution of mixed “Name” Search Queries over the whole sample
Figure 11: Dot-plot Chart of standardized “Name” Search Queries and % Price Changes
Figure 12: Frequency Distribution of mixed “AG” Search Queries over the whole sample
Figure 13: Dot-plot Chart of standardized “AG” Search Queries and % Price Changes
Figure 14: Frequency Distribution of mixed “Aktie” Search Queries over the whole sample
Figure 15: Dot-plot Chart of standardized “Aktie” Search Queries and % Price Changes
Figure 16: Q-Q-plot of standardized mixed Queries
Part 1 – Framework
1.1 Introduction
In recent years the internet has developed very quickly and has become a major source of information for people all over the world. Many questions can be answered by typing a few letters into a computer, and time and distance no longer delay the answers. The internet has even reached the mobile phones of the youngest children. For many years, sophisticated investor information was not publicly available to a broad audience, and information flowed very slowly. Today many websites update investor education material, quotes and news within milliseconds. In this context, retail and institutional investors have to stay informed and rebalance their positions according to developments around the world. This matters all the more because globalization has linked economies very closely. Nowadays many people no longer rely solely on the marketing material of their bank advisors but manage their investments themselves. In doing so, they use search engines as a tool to find relevant research, investment products and capital market prices. Since 2004 it has also been possible to see what internet users actually search for. This study is by no means the first to examine the implications for capital markets. Many researchers have used search engine query data to forecast economic time series. Examples include consumer confidence indicators, unemployment rates, retail sales, house price indices, stock prices, stock volatility, commodity prices and even the implied volatility skew of currency options. Building on this prior research, this study analyzes the impact of internet search engine data on capital markets. This is a broad field. Many authors have studied index-level data, mostly for the US market, but only a few studies cover the German market. This paper aims to fill this gap.
The main objective is to answer two questions. First, the study tests the hypothesis that internet search queries have a directional effect on German stocks. Second, it examines whether search queries directionally influence the implied volatilities of options on German stocks. Accordingly, the first part of the analysis covers the effect of stock-specific internet search queries on stock prices. The second part relates internet search queries to implied volatility:
1st Hypothesis: “An Increase of Internet Search Queries for a specific Stock is a directional Indicator of its later Price Movements”.
2nd Hypothesis: “An Increase of Internet Search Queries for a specific Stock increases its implied Option Volatility Premium”.
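In its simplest form, the 1st Hypothesis can be written as a predictive regression r(t+1) = α + β · ΔGSI(t) + ε(t+1). The hypothesis corresponds to β ≠ 0. The Python sketch below estimates β and its t-statistic by ordinary least squares. It deliberately omits the control variables used in the actual regressions, such as the market index, turnover and the price-to-book ratio. The planted effect of 0.01 in the synthetic data and the function names are hypothetical.

```python
import math
import random

def slope_t_stat(x, y):
    """OLS of y on a constant and x; returns (slope, t-statistic of the slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / std_err

def h1_regression(search_signal, returns):
    """Pair the search signal at t with the return at t+1 and run the regression."""
    return slope_t_stat(search_signal[:-1], returns[1:])

# Synthetic example with a hypothetical planted effect of 0.01 per unit of signal.
random.seed(1)
T = 500
signal = [random.gauss(0, 1) for _ in range(T)]
noise = [random.gauss(0, 0.01) for _ in range(T)]
returns = [noise[0]] + [0.01 * signal[t - 1] + noise[t] for t in range(1, T)]

beta_hat, t_stat = h1_regression(signal, returns)
print(f"beta = {beta_hat:.4f}, t = {t_stat:.1f}")
```

Lagging the signal by one period keeps the test predictive: only search activity known at t is used to explain the return at t+1.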
1.2 Contribution to the current State of Research
This study differs from the current state of research mainly in one respect. It tests different search terms, such as “AG”, “Aktie” (German for “share”) or “News”, in combination with the common name of each German stock, referred to as “Name”. Most US studies used the ticker symbols of the Russell 3000 stocks as search terms (cf. Da, Engelberg and Gao, 2011). By contrast, some European studies (Latoeiro, Ramos and Veiga, 2013) used the standard names of the EURO STOXX 50 stocks. Research on the German Xetra market has mainly focused on trading volatility, liquidity and trading volume (Bank, Larch and Peter, 2010; Fink and Johann, 2013). No prior research has examined Google search queries in relation to the implied volatility of individual German stocks. Therefore, to the best of my knowledge, this aspect is new and adds to the current state of research.
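These term combinations can be generated mechanically. Below is a minimal Python sketch. It assumes that each query is simply the common name followed by the suffix, separated by a space. The function name `search_terms` is illustrative only and is not taken from the study.

```python
SUFFIXES = ("", "AG", "Aktie", "News")  # "" stands for the plain "Name" query

def search_terms(name: str) -> dict[str, str]:
    """Return one query string per search-term category for a stock's common name."""
    return {(suffix or "Name"): f"{name} {suffix}".strip() for suffix in SUFFIXES}

for stock in ("BMW", "Daimler"):
    print(search_terms(stock))
```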
1.3 Structure
The first part defines the basics of internet search queries and sets the structural framework. Building on this framework, internet search queries are obtained from the most widely used internet search engine, and the specific data structure is discussed. After dealing with seasonality and other control variables, the dataset is prepared for the regressions. Once the specifics of the data are understood, a pooled regression is conducted. It is then split into subsets, testing for directional effects on price returns and on changes in implied volatility. A VAR model on the pooled data for returns and implied volatility changes concludes the empirical part. Finally, drawbacks and pitfalls of the data quality and model selection are explained, and further areas of research are introduced.
1.4 Definition of Internet Search Queries
Before the overview of the current literature, internet search queries are briefly defined. In this study, an “internet search query” is an active information-gathering activity by a human being. The person types a full sentence or individual words into a web search engine to obtain additional information (e.g., about an exchange-traded stock or an equity option). The query can be structured with the operators “and/or” and “not”.
1.5 Literature Review
One of the first papers to apply internet search queries was published by Ginsberg et al. (2009). The authors showed that Google® data were a leading indicator of influenza outbreaks, with a two-week edge over the official statistics. One year earlier, Polgreen, Chen, Pennock and Nelson (2008) had found that Yahoo® searches for the words “flu” and “influenza” correlated with virologic and mortality surveillance data. In recent years researchers have mainly used the publicly available Google data in areas other than medicine and have started to draw conclusions for capital markets.
1.5.1 Stocks
Bank, Larch and Peter (2010) were the first to associate Google search queries with a rise in trading activity and liquidity. Their sample of all Xetra®-listed stocks was based on queries for the plain company names without the suffix “AG”, which denotes the legal form of German stock corporations. They excluded all query series with periods of zero searches. Their analysis includes the Carhart (1997) factors as control variables. They assume that the higher search volume is mainly attributable to uninformed investors. On that basis, they conclude that the resulting reduction in asymmetric information costs improves liquidity and leads to higher future returns. In 2011, new findings also emerged from other social media. Bollen, Mao and Zeng (2011) used Twitter data to forecast the Dow Jones Industrial Average. Inferring collective mood states from tweets, they predicted the daily direction of the index with an accuracy of 87.6%. They split the Twitter data into mood categories (calm, sure, alert, vital, kind and happy). Unlike Twitter data, Google queries do not, as of today, allow mood states to be extracted, which is an advantage of Twitter. Returning to Google data, Mondria and Wu (2011) found higher returns for stocks whose local search queries exceeded national queries. Their sample consists of the Standard & Poor's 500 (S&P 500) stocks and distinguishes local from remote areas by the location of company headquarters. They conjecture that the outperformance stems from an initial non-public information advantage. This advantage becomes visible when investors search to verify the information. Da, Engelberg and Gao (2011) analyzed the stocks of the Russell 3000 index, using their ticker symbols as search terms. They found that the Google Search Volume Index (SVI) leads prices by about two weeks. They report a positive relationship in the short run, with a price reversal within a year.
Second, in an event study on IPOs, they find that higher search activity is associated with higher first-day returns, followed by long-run underperformance of the stock price. Taking an index perspective, Dimpfl and Jank (2012) analyzed daily data for the search terms “dow”, “s&p500” and “s&p”. They found bi-directional causality between internet search queries and stock index volatility. Other authors, such as Wang (2012), distinguished between the four possible combinations of information supply and demand. Wang shows that stock returns are affected only when information supply meets information demand. In his sample of 5607 US firms, information supply is measured by the number of news articles and information demand by Google queries. Fink and Johann (2013) studied a sample of DAX, MDAX, SDAX and TecDAX stocks. They found evidence that high attention triggers positive returns and improves the liquidity of small-cap stocks by narrowing the bid-ask spread. They used daily data for queries of the stock names and applied the category filter “Finance” to all their search queries.
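Da, Engelberg and Gao (2011) work with an abnormal search measure (ASVI). It is defined as the log of the current weekly SVI minus the log of the median SVI over the previous eight weeks. The Python sketch below computes it. Returning `None` for the first eight weeks, and for zero values where the log is undefined, is an assumption of this sketch and not part of their definition.

```python
import math
from statistics import median

def abnormal_svi(svi, window=8):
    """ASVI_t = ln(SVI_t) - ln(median(SVI_{t-window}, ..., SVI_{t-1})).

    Returns None where fewer than `window` prior observations exist or where a
    log would be undefined (zero index values), a handling chosen for this sketch.
    """
    out = []
    for t, value in enumerate(svi):
        if t < window:
            out.append(None)
            continue
        base = median(svi[t - window:t])
        if value <= 0 or base <= 0:
            out.append(None)
        else:
            out.append(math.log(value) - math.log(base))
    return out

print(abnormal_svi([10] * 8 + [20, 10]))
```

A doubling of search volume relative to the recent median yields an ASVI of ln 2, about 0.69. The median makes the benchmark robust to single spikes in the prior weeks.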
1.5.2 Implied Volatility
Vlastakis and Markellos (2012) showed that there is no directional relationship between search queries for the 30 largest NYSE stocks and the supply of news in the media. However, they found that search queries increase whenever the expected variance risk premium of S&P 500 index options rises. They therefore claim to be the first to verify the information demand hypothesis empirically. In addition, Kita and Wang (2012) examined seven currency pairs (combinations of EUR, GBP, USD, JPY and AUD). They studied the relationship between currency risk premia and carry trade returns. Controlling for macroeconomic uncertainty via newspaper articles, they show that higher investor attention, proxied by Google search queries, is associated with higher future volatility and lower carry trade returns. The study also associates higher investor attention with the variance risk premium and offers further explanations for implied volatility skews. Taking a different approach, Da, Engelberg and Gao (2013) constructed a sentiment index called the “FEARS Index”. They started from 139 search terms related to positive or negative sentiment (e.g., “recession”, “unemployment” and “bankruptcy”) and used rolling regressions to retain 30 terms. They associated these terms with lower returns for the cross section of S&P 500 stocks, with an increase in implied volatility as measured by the VIX®, and with inflows into mutual bond funds. These patterns reversed over the following two days. Their paper differs in that it does not analyze direct queries for individual stocks. Instead, it applies conclusions from an aggregate search index to individual stocks. Returning to the stock level in the European market, Latoeiro, Ramos and Veiga (2013) split their analysis into three parts: the VSTOXX®, the EURO STOXX 50®, and the 36 individual stocks that had been part of the index over the whole sample period.
They found an increase in search queries leading to an increase in volume and volatility with a rapid reversal in the following week.
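The variance risk premium recurring in these studies is commonly defined as the difference between option-implied variance and realized variance. Some authors use the opposite sign, or expected rather than historical realized variance. Below is a minimal Python sketch. It assumes 252 trading days per year and uses trailing realized variance as a simple proxy. The function names are illustrative.

```python
import math

TRADING_DAYS = 252  # annualization convention assumed for this sketch

def realized_variance(prices):
    """Annualized realized variance: scaled sum of squared daily log returns."""
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return TRADING_DAYS / len(rets) * sum(r * r for r in rets)

def variance_risk_premium(implied_vol, prices):
    """Implied variance minus realized variance (positive: options price in more variance)."""
    return implied_vol ** 2 - realized_variance(prices)

prices = [100 * math.exp(0.01 * i) for i in range(64)]  # hypothetical: 63 returns of 1%
print(realized_variance(prices), variance_risk_premium(0.20, prices))
```

With 63 daily log returns of exactly 1%, realized variance is 0.0252 per year. An implied volatility of 20% (variance 0.04) then implies a premium of 0.0148.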
1.5.3 Unemployment, Consumer Sentiment and Revenue
The rest of the literature review is dedicated to studies in general economics and on revenues that apply Google search queries. These contribute only marginally to the paper's main research question. However, the variables they explain affect capital markets on a large scale and therefore deserve discussion. Economists adapted the technique to forecast economic time series with long publication lags, which is known as nowcasting. Choi and Varian (2009b) showed how Google Trends can be used to predict initial claims for US unemployment benefits. D'Amuri and Marcucci (2010) forecast the US unemployment rate with a Google job search index. Several other papers predict unemployment in Belgium, Israel, Italy and Norway. For Germany, Askitas and Zimmermann (2009) explained the unemployment statistics published by the German Federal Employment Agency. They used search terms such as “unemployment office” and “personnel consultant”, names of popular job search engines such as “Jobscout” or “Stepstone”, and other terms. Their model helps in nowcasting the official statistics. For nowcasting private consumption in the U.S., see Kholodilin, Podstawski and Siliverstovs (2010) and Vosen and Schmidt (2011). The latter find that Google search query data outperform survey-based indicators. Other topics have also been studied. On inflation, Guzman (2011) analyzed inflation expectations with Google data and found significant results. A study by Wu and Brynjolfsson (2009) supports Choi and Varian for real estate. They found that each percent increase in the search index for the subcategory “Real Estate Agencies” correlates with an additional 67,220 house sales in the respective US state during the next quarter. Choi and Varian (2009a) covered three research topics in their paper. The largest improvement was achieved for tourism, with an R² of 98%, followed by a 12% improvement for new housing statistics and 3% for monthly car sales. These improvements compare a seasonal autoregressive model including the Google variable with the same model without it. In their tourism case for Hong Kong, they matched the official visitor statistics by country of residence with Google queries filtered by location. They also controlled for currency effects and the Olympic Games. In their study of US home sales, they likewise used US state data. To explain monthly car sales as reported by Automotive News, they focused on Google's subcategory for the brand Ford. Overall, their Google-based estimates were available earlier than the official publications. Goel et al. (2010) predicted three outcomes with an R² of 70 to 80%: opening-weekend box-office revenues of films, first-month sales of newly released video games, and the chart rank of new songs. Their autoregressive model controlled for production costs, predecessors and the number of screens. Forecasts of future sales can help to reduce uncertainty around earnings announcements.
Drake, Roulstone, and Thornock (2012) found search queries to increase two weeks before announcements, with a spike at announcement day.
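The model comparison behind the Choi and Varian (2009a) figures, a seasonal autoregressive baseline versus the same model extended by a Google variable, can be illustrated with a short sketch. It is a minimal example on simulated monthly data; the lag structure (lags 1 and 12), the function names and the simulated coefficients are assumptions for illustration, not their exact specification.

```python
import numpy as np


def ols_r2(y: np.ndarray, X: np.ndarray) -> float:
    """In-sample R² of an OLS regression of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))


def compare_models(y: np.ndarray, google: np.ndarray) -> tuple[float, float]:
    """R² of a seasonal AR model (lags 1 and 12) without and with a Google regressor."""
    t = np.arange(12, len(y))                      # first 12 months are lost to the lags
    base = np.column_stack([y[t - 1], y[t - 12]])   # seasonal autoregressive part
    extended = np.column_stack([base, google[t]])   # same model plus the Google variable
    return ols_r2(y[t], base), ols_r2(y[t], extended)


# Simulated monthly data with hypothetical coefficients.
rng = np.random.default_rng(0)
google = rng.normal(size=120)
y = np.zeros(120)
for i in range(12, 120):
    y[i] = 0.3 * y[i - 1] + 0.4 * y[i - 12] + 0.8 * google[i] + rng.normal(scale=0.5)

r2_base, r2_google = compare_models(y, google)
print(f"R² without Google: {r2_base:.2f}, with Google: {r2_google:.2f}")
```

Because the baseline is nested in the extended model and both are fitted on the same observations, the in-sample R² can only rise when the Google regressor is added; a comparison of out-of-sample forecast errors is the more demanding test.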
1. Data Scope of Analysis
The data scope of this study covers the German stock indices DAX® (Deutscher Aktien IndeX), MDAX® (Mid-Cap-DAX) and SDAX® (Small-Cap-DAX). Together, they form the German Prime Standard indices for large, medium-sized and small exchange-listed companies. The blue-chip index DAX covers 80% of Germany's free-float market and consists of the 30 largest companies in terms of market capitalization and exchange turnover. MDAX and SDAX each comprise 50 stocks: MDAX ranks directly below the DAX constituents and SDAX directly below MDAX. The full Prime Standard would be completed by adding TECDAX® to the sample, which consists of the 30 largest technology shares. The composition of all indices is continuously reviewed and rebalanced on a quarterly basis, except for new listings, deletions or mergers, which are taken into account immediately (Deutsche Börse 2013, p. 19). The TECDAX was not included in the sample for two reasons: first, to keep the sample size manageable; second, because of the online business models of some of its companies (e.g. Xing or Freenet), the correlation between search queries and company success was assumed ex ante to be high. For a generalization of the theory, the hypothesis should therefore hold for standard companies as well.
1..1 Timeframe
The overall timeframe of nine years and four months, from 10 January 2004 to 4 May 2013, runs from the first observation publicly available for download from Google to the time of this study. All regressions are based on this timeframe. Note that two major macroeconomic crises fall into this period: the global financial crisis of 2008/09 and the European sovereign debt crisis, which has been ongoing since 2010. Both crises affected the global economy and led to a slowdown in production. The financial crisis of 2008 is sometimes also referred to as the “Great Recession”.
1..2 Necessary Adjustments in Sample Selection
Not all stocks could be included in the analysis over this period, which introduces a small selection bias. The final set of constituents as of 6 May 2013 is modified relative to the initial setup of January 2004 as follows. First, every stock included in the indices in 2013 must also have been a member of one of the three indices at the start of the analysis in 2004, to ensure that all control variables are available and that the stock was already listed and tradable. This implies a survivorship bias, as companies that defaulted or merged in the meantime are excluded. Second, companies that were taken private are also excluded, because their prices are no longer quoted. Third, companies listed after 2004 are not included in the sample; several event studies on IPOs cover this topic (cf. Da, Engelberg and Gao, 2011). The main argument for these adjustments is to obtain a continuous and comparable data set.
1. Search Engines – Gateways to Information
The internet search query data are downloaded from Google. According to webhits.de, the American company Google Inc. had a German market share of 80.4% (Webhits, 2013), and netmarketshare.com reported a slightly higher global share of 83.18% (Netmarketshare, 2013). This suggests that Google data are, to a certain extent, suitable for testing the hypotheses and cover enough of overall search activity to draw statistically significant conclusions about it.
1..1 The Google Tool
Historically, there were two services, “Google Trends” and “Google Insights for Search”, which were merged into “Google Trends” in September 2012 (Google, 2012b). Since then, the combined interface under Google Trends has been the only remaining platform. The service is provided by Google Inc. (“Google”), located at 1600 Amphitheatre Parkway, Mountain View, CA 94043, United States, and can be accessed via http://www.google.com/trends/. Access to the data is free of charge, and all time series can be downloaded after registering and logging in with a free Google Account. The tool returns an index for a specific search query from 2004 until today and covers worldwide data. The user interface offers four options to specify a query: search type, region, time frame and category. For this study, “Web Search” is relevant; it could also be switched to “Product” or “News Search”, whereas “YouTube” and “Image Search” are not relevant. Regional searches can be restricted to a specific country, state or, in some cases, city. For example, Germany can be broken down into the state of “Hessen”, but the city of Frankfurt is not yet available as a time series, although Google already displays current search activity by city. Depending on the query's frequency, some time series are only available monthly, whereas the most common downloadable frequency is weekly. Daily time series can be downloaded for the last 90 days; this can be extended further into the past by downloading one-month windows with the “Select dates” function and then chaining the parts together manually. The category filter offers 26 categories with 241 subcategories. Using the example of Choi and Varian (cf. 2009a, p. 4): “…query [car tire] would be assigned to category Vehicle Tires which is a subcategory of Auto Parts which is a subcategory of Automotive.” For stocks, the categories “Business & Industrial” and “Finance” are of major interest. However, these categories do not deliver time series for all queries. The later analysis therefore uses the most general setting across all categories instead of the “Finance” filter, to maximize the chance of actually obtaining a time series to analyze. Other research has focused on these specific categories (e.g. Fink and Johann, 2013). A query can also be compared with its category. In this case, the time series is expressed as a percentage of its initial starting value and is thus a growth rate (Google, 2012c). The category can add important information with respect to seasonality.
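Chaining daily windows can be sketched as follows. The sketch assumes, as an illustration, that each downloaded window is scaled separately (so the same day can carry different index values in different windows) and that consecutive windows overlap by at least one day; each new window is rescaled so that it matches the already chained series on the overlapping days. The function name and the overlap requirement are assumptions, not a Google feature.

```python
import pandas as pd


def chain_windows(windows: list[pd.Series]) -> pd.Series:
    """Chain separately scaled daily windows (DatetimeIndex, chronological order).

    Each later window is multiplied by the ratio of mean values on the dates it
    shares with the series chained so far; only its new dates are appended.
    The result is expressed on the scale of the first window.
    """
    chained = windows[0].astype(float)
    for window in windows[1:]:
        overlap = chained.index.intersection(window.index)
        nonzero = ((chained.loc[overlap] > 0) & (window.loc[overlap] > 0)).to_numpy()
        overlap = overlap[nonzero]
        if len(overlap) == 0:
            raise ValueError("consecutive windows must share a non-zero date")
        factor = chained.loc[overlap].mean() / window.loc[overlap].mean()
        new_part = window[~window.index.isin(chained.index)].astype(float) * factor
        chained = pd.concat([chained, new_part])
    return chained


# Hypothetical example: two overlapping windows of the same underlying activity.
dates = pd.date_range("2013-01-01", periods=6, freq="D")
activity = pd.Series([10, 20, 40, 20, 80, 10], index=dates, dtype=float)
w1 = activity.iloc[:4] / activity.iloc[:4].max() * 100   # 25, 50, 100, 50
w2 = activity.iloc[3:] / activity.iloc[3:].max() * 100   # 25, 100, 12.5
print(chain_windows([w1, w2]).tolist())  # [25.0, 50.0, 100.0, 50.0, 200.0, 25.0]
```

The chained series can exceed 100 and may be renormalized afterwards; with noisy windows, a longer overlap makes the scaling factor more stable.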
1..2 Grouping
It is possible to group up to 25 search terms with a “+” sign. The items are then displayed as separate graphs. To query a combination of terms, quotation marks (“x+y”) have to be set at the beginning and end of each request.
1..3 Multiple Counting is automatically avoided by Google
To avoid multiple counting, requests are filtered by their IP address. The IP address (Internet Protocol address) is a numerical label assigned to each computer that uses the Internet Protocol for communication; it serves for host or network interface identification and location addressing. Because each user is identified via an IP address, only the sum of a user's daily queries becomes part of the search volume index. If a user searches from more than one computer (IP address), the queries are counted multiple times. Publicly available data do not reveal how many of the cross-sectional queries stem from the same user; in principle, one user could be responsible for generating all the signals over time.
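The effect of filtering by IP address can be illustrated with a toy log of queries. This is a simplified, hypothetical reading of the rule described above (each IP address counts at most once per search term and day); Google does not publish its actual filtering procedure, and the log format and function name are invented for illustration.

```python
from collections import defaultdict


def daily_counts(log: list[tuple[str, str, str]]) -> dict[tuple[str, str], int]:
    """Count queries per (day, term), keeping at most one query per IP address.

    `log` holds hypothetical (day, ip, term) records.
    """
    seen: set[tuple[str, str, str]] = set()
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for day, ip, term in log:
        if (day, ip, term) not in seen:  # repeated queries from the same IP are dropped
            seen.add((day, ip, term))
            counts[(day, term)] += 1
    return dict(counts)


log = [
    ("2013-05-02", "10.0.0.1", "BMW"),
    ("2013-05-02", "10.0.0.1", "BMW"),      # same user, same IP: counted once
    ("2013-05-02", "10.0.0.2", "BMW"),      # same user on a second device: counted again
    ("2013-05-02", "10.0.0.3", "Daimler"),
]
print(daily_counts(log))  # {('2013-05-02', 'BMW'): 2, ('2013-05-02', 'Daimler'): 1}
```

The second device shows the limitation named in the text: one person with two IP addresses is indistinguishable from two users.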
1..4 Synthetic Index rather than actual Numbers
Google does not publish the absolute number of search queries but calculates an index. This index is bounded between 0 and 100 and is recalculated in specific situations: whenever a new maximum of search queries occurs, this quantity is set to 100, and subsequent quantities are scaled by it (divided by the maximum and multiplied by 100) until a new high is reached. The old values are not recalculated and remain scaled by their old maxima. The index can be interpreted as a percentage index (Google, 2012a). For this reason, it is difficult to compare different stocks by their search intensity. As the actual quantities are not available, an increase in queries for one stock cannot be attributed to a decrease for another, which would be of interest, for example, for competitors' actual sales and shipped units. Moreover, the rescaling does not allow conclusions about the original query quantity for a company, because the basis is constantly shifting.
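The rescaling rule as described here can be written in a few lines. This is a sketch of the description only, under the assumption that raw query counts were observable; Google does not publish its exact procedure, and the function name is hypothetical.

```python
import numpy as np


def running_max_index(queries) -> np.ndarray:
    """Scale raw query counts to 0-100 by the highest count observed so far.

    A new maximum is set to 100; later values are divided by it and multiplied
    by 100 until a new high occurs. Earlier values keep their old scaling.
    """
    queries = np.asarray(queries, dtype=float)
    running_max = np.maximum.accumulate(queries)
    safe_max = np.where(running_max > 0, running_max, 1.0)  # avoid division by zero
    return np.where(running_max > 0, 100.0 * queries / safe_max, 0.0)


raw = np.array([10, 20, 10, 40, 20])
print(running_max_index(raw).tolist())       # [100.0, 100.0, 50.0, 100.0, 50.0]
print(running_max_index(raw * 10).tolist())  # identical: the level of the raw counts is lost
```

Multiplying all raw counts by ten leaves the index unchanged, which is why the original quantities cannot be recovered from it.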
1..5 Empty Values
Another drawback is Google's practice of publishing an index value of “0” instead of a very small number whenever the search queries fall below a certain threshold. To date, Google has not transparently explained how this threshold is determined (Google, 2010).
1..6 Limited by German Language
As the data will show later, most of the relevant data originate from the German-speaking area and from queries within Germany. This also holds for most queries referring to international exporters such as car manufacturers (e.g. BMW) and contrasts with previous studies on individual stocks from the US market. At the DAX level there are comparatively more international queries than for the smaller companies in the MDAX and SDAX. This may hint at a home bias and at a local degree of familiarity with smaller stocks. When analyzing queries that combine a stock's name with a second word, the language barriers become more obvious: for terms like “Aktie” (engl. stock) or “Dividende” (engl. dividend), even small changes in wording can shift the origin of the data from German- to English-speaking countries. A study by Mondria and Wu (2011) showed that home bias delivers higher returns owing to the advantage of a higher information density. Therefore, the study uses the German terms in the regression models. A comparable study by Bank, Larch and Peter (2010) on all Xetra-listed stocks of the German market used the names of the companies without “AG” and restricted the data to German queries only. Fink and Johann (2013) apply the category filter “Finance” in addition to the names of the German companies when downloading the data. This takes advantage of a particular Google feature that classifies a query by the website finally accessed after the query was submitted. This departure from other studies adds information to the query, and the authors show that it improves query quality. As it is not transparent how Google classifies “Finance” queries, this study nevertheless uses the standard query method and narrows the focus via the additional terms “AG” and “Aktie”.
1..7 Exact Wording of Search Terms and Search Term Combinations
When collecting search engine data, one question arises: what do people actually type into the search engine? Most users start by typing in just one search term (cf. Spink et al., 2001). This seems to be common practice and is also supported by the data set later. To set up the list of words, the most common reference name for each company is used (e.g. BMW for Bayerische Motoren Werke Aktiengesellschaft). This approach has some minor flaws, because the common German names of some stocks coincide with English words of a different meaning. For example, “MAN” is a German producer in the automotive industry and “Metro” a large retailer of consumer products. In these cases, a more stock-related perspective was introduced by searching for the stock's name combined with the German abbreviation for a public limited company (PLC), namely “AG” (Aktiengesellschaft). Altogether, four main search combinations evolved: the common search name, denoted “Name”, the name plus “AG”, the name plus “Aktie” and the name plus “News”. Note that data availability decreases dramatically when terms are combined. The initial setup included many more terms, but not enough data sets could be extracted for them. These terms were: name plus “Report”, name plus “Return”, name plus “Rendite”, name plus “HV” (engl. shareholders' meeting), name plus “IR”, name plus “Investor Relations”, name plus “Bilanz” (engl. balance sheet), name plus “P&L” and name plus “GuV” (engl. P&L).
The fact that only top-level search terms are available may support the initial assumption that users prefer one-word queries over full sentences, or it may be due to Google's policy not to publish time series that fall below a certain threshold. Some of the time series available in 2013 did not start in 2004 but later. The graph below shows how the number of available signals evolved over time. Most signals come from “Name” and the fewest from “News”. “News” was later dropped from the analysis because of its low data availability.
Figure 1: Evolution of available Signals, 2004 to 2013 (series: GSI_common_name, GSI_name_AG, GSI_name_aktie, GSI_name_news)
The availability of data differs from index to index. Starting from the 90 index stocks, only a fraction remained, depending on the exact wording of the search term. Especially in the SDAX, the availability of weekly data dropped significantly. In general, most search terms were available for the common name of a company. The suffixes “AG” and “Aktie” yielded 27 and 18 results, respectively, but “News” in combination with a stock's name yielded only six results for the full set. The following table lists the terms that were finally available:
Table 1: List of Stocks in Sample
Common name
DAX: Adidas, Allianz, BASF, Bayer, BMW, Commerzbank, Continental, Daimler, Deutsche Bank, Deutsche Börse, Deutsche Post, Deutsche Telekom, E.ON, Fresenius Medical Care, Henkel, Infineon, Linde, Lufthansa, Münchener Rück, RWE, SAP, Siemens, ThyssenKrupp, TUI, Volkswagen
MDAX: Aareal Bank, Aurubis, Beiersdorf, Bilfinger, Celesio, Comdirect, EADS, Fielmann, Fraport, GEA, Hannover Rück, HeidelbergCement, Heidelberger Druckmaschinen, Hochtief, IVG Immobilien, KUKA, Merck, ProSieben, Puma, Salzgitter, SGL Carbon, Stada, Südzucker, Vossloh
SDAX: Baader Bank, Balda, BayWa, Cewe, Deutz, Dürr, ElringKlinger, Fuchs Petrolub, Gerry Weber, GFK, Gildemeister, Grenkeleasing, Hawesko, Hornbach, Jungheinrich, MVV Energie, Sixt

AG
DAX: "Adidas AG", "Allianz AG", "Bayer AG", "BMW AG", "Commerzbank AG", "Continental AG", "Daimler AG", "Deutsche Bank AG", "Deutsche Post AG", "Deutsche Telekom AG", "Henkel AG", "Linde AG", "Lufthansa AG", "RWE AG", "SAP AG", "Siemens AG", "ThyssenKrupp AG", "TUI AG", "Volkswagen AG", "MAN AG", "Metro AG"
MDAX: "Beiersdorf AG", "Fraport AG", "Salzgitter AG", "MLP AG"
SDAX: "BayWa AG", "Deutz AG"

Aktie
DAX: "Allianz Aktie", "BASF Aktie", "Bayer Aktie", "BMW Aktie", "Commerzbank Aktie", "Daimler Aktie", "Deutsche Bank Aktie", "EON Aktie", "Infineon Aktie", "Lufthansa Aktie", "RWE Aktie", "SAP Aktie", "Siemens Aktie", "ThyssenKrupp Aktie", "TUI Aktie"
MDAX: "Heidelberger Druckmaschinen Aktie"
SDAX: "Dürr Aktie", "Sixt Aktie"

News
DAX: "BMW News", "Commerzbank News", "Daimler News", "E.ON News", "Siemens News"
MDAX: "Hochtief News"
SDAX: -
The number of “News” series is far too low to make significant statements about the search power and the overall quality of the full sample, and for MDAX and SDAX clear statements can only be made for the category “Name”.
For option-implied volatilities, statements can only be made with respect to the DAX index. Options data are mostly available for the DAX, because many SDAX and MDAX stocks had almost no publicly traded options outstanding in 2004 (five option time series are available from Bloomberg for MDAX and none for SDAX). The data quality would lead to misleading interpretations if the analysis did not focus exclusively on the DAX.
1. Control Variables
To check the regressions for robustness and for interaction with other known influencing factors, the following control variables are added to the models: lags of the stock performance to capture mean reversion and autocorrelation, lags of the search queries themselves, historical market volatility, stock turnover, the price-to-book ratio and one-year high or low dummies. This list does not aim to be complete or to fully reflect the true influencing factors; a complete list would be a broad academic field of research in its own right. The following control variables are chosen:
1..1 Market Index
The market index is added to account for macroeconomic influences that are not directly attributable to the stock itself. In finance, this distinction is known as systematic versus idiosyncratic risk (cf. Sharpe, 1964). In this study, the market is represented by the average performance of the stocks in the sample; alternatively, the HDAX or DAX could serve as an overall proxy for the German market. The variable is important, as it normally explains most of the daily variation of a stock. The average beta of German stocks is typically very close to one; technology companies, for example, show higher betas and healthcare companies lower betas, reflecting their different sensitivity to the business cycle. In the later models, the average performance of the week is added to the regression as a control variable.
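The market control and the beta concept can be sketched as follows. The sketch assumes weekly returns arranged as a weeks × stocks array, an equal-weighted average as the "average performance of the stocks in the sample", and an OLS beta; the function names are hypothetical.

```python
import numpy as np


def market_return(returns: np.ndarray) -> np.ndarray:
    """Equal-weighted market return per week: mean across stocks, ignoring NaNs.

    `returns` has shape (weeks, stocks); NaN marks a missing observation.
    """
    return np.nanmean(returns, axis=1)


def beta(stock: np.ndarray, market: np.ndarray) -> float:
    """OLS beta of a stock's returns on the market return: Cov(r_i, r_m) / Var(r_m)."""
    cov = np.cov(stock, market)
    return float(cov[0, 1] / cov[1, 1])


weekly = np.array([
    [0.010, 0.030, 0.020],
    [-0.020, 0.000, np.nan],   # third stock missing this week
    [0.015, 0.005, 0.010],
])
m = market_return(weekly)
print(m.tolist())
print(beta(2 * m, m))          # a stock moving twice as much as the market
```

A stock whose returns are twice the market's has a beta of 2 (up to rounding), while the market itself has a beta of 1.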
1..2 Turnover
The control variable turnover is important, as it reflects the current demand and supply situation of stocks in capital markets. In basic economic theory, high supply is associated with lower prices and vice versa. The model uses turnover instead of volume, because turnover relates trading to the firm value adjusted for the free float. Some companies are exchange listed and valued, but a large part of their shares is not freely tradable, which enables volume squeezes and shortages of supply. The formula adjusts for a smaller free float wherever necessary and can be interpreted as the percentage of the free float traded that week (e.g. 5% of the tradable shares changed ownership this week). An exhaustive discussion of illiquidity and of which formula to use can be found in Bank, Larch and Peter (2010, p. 6).

$$\text{Turnover in \%} = \frac{\text{Volume} \cdot \text{Price}}{\text{MV} \cdot \text{Free float in \%}} \tag{1}$$

where MV denotes the market value of the company.
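Equation (1) translates directly into code. The sketch assumes that Volume is the number of shares traded in the week, Price the corresponding share price (so that the numerator is the traded value), MV the company's total market value and the free float given in percent; the function name is hypothetical.

```python
def turnover_pct(volume: float, price: float, market_value: float, free_float_pct: float) -> float:
    """Weekly turnover in % of the free-float market value, following equation (1).

    volume         -- shares traded during the week
    price          -- share price used to value the traded shares
    market_value   -- total market value (MV) of the company
    free_float_pct -- free float in percent of all shares, e.g. 50.0 for 50%
    """
    free_float_value = market_value * free_float_pct / 100.0
    return 100.0 * volume * price / free_float_value


# 2 million shares traded at EUR 50; MV of EUR 4 bn with a 50% free float.
print(turnover_pct(2_000_000, 50.0, 4e9, 50.0))  # 5.0
```

This matches the interpretation in the text: 5% of the tradable shares changed hands that week.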
1..3 Price-to-Book Ratio (PB)
The price-to-book ratio (PB), the inverse of the book-to-market ratio, is part of the Fama-French model (Fama and French, 1993). It compares the market's valuation of a company with the book value of the equity on its balance sheet. The liabilities side of the balance sheet consists of debt and equity. Equity is the money stockholders have invested and retained in the company for future growth. The market price of the traded stock, in turn, reflects the future payouts to the company's equity owners. A high P/B thus implies that market participants expect high future payouts for this company. The ratio therefore allows comparing companies by their implied valuation. In this context, it is important to note that some industries require more equity than others to survive cyclical downturns as capital buffers shrink.
1..4 Prior Price Changes
The price changes of the previous trading days are sometimes assumed to have a major impact on the current trading direction. In economic theory, this can be due to two opposing effects, autocorrelation and mean reversion, because market participants can act rationally as well as irrationally. Trading momentum over many weeks can be interpreted as herding behavior in the sense of behavioral finance (Grinblatt, Titman and Wermers, 1995), as a genuine change in fundamentals or simply as a random effect. For example, autocorrelation could be induced by a large investment fund investing a big sum over a couple of weeks, and mean reversion by other fund managers who believe that the stock is overpriced and therefore sell it. How the effects are measured mostly depends on the model construction itself: momentum studies have found that five- to ten-year horizons imply mean reversion (De Bondt and Thaler, 1985) and one-year horizons autocorrelation (Jegadeesh and Titman, 1993). Very short horizons of one to two days are assumed to be mean-reverting, and horizons of three to four weeks to be autocorrelated. This study focuses on very short-run effects and therefore includes the one-week lagged return to capture mean reversion.
1..5 The 52 Week High or Low
The 52-week high or low relates the current price of a stock to its prices over the past year. Latoeiro, Ramos and Veiga (2013) use this control variable as a dummy variable in their working paper and found some significant relations to search queries. In this study, the dummy variable is set with a 10% tolerance: a stock is classified as high once it reaches 90% of its one-year high, and as low if it lies in the bottom 10% quantile. In line with the discussion of autocorrelation in the section on prior price changes, one-year returns are assumed to be autocorrelated.
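The dummy construction can be sketched with pandas rolling windows. The text leaves the exact rule open, so this sketch assumes one literal reading: a week is "high" if the price is at least 90% of the maximum of the last 52 weekly prices and "low" if it is at or below their 10% quantile. The function name and the weekly frequency are assumptions.

```python
import pandas as pd


def high_low_dummies(prices: pd.Series, window: int = 52, tolerance: float = 0.10) -> pd.DataFrame:
    """52-week high/low dummies with a 10% tolerance (one possible reading of the rule).

    high = 1 if the price is at least (1 - tolerance) times the rolling maximum,
    low  = 1 if the price is at or below the rolling `tolerance` quantile.
    Weeks without a full window of history get 0 in both columns.
    """
    rolling = prices.rolling(window, min_periods=window)
    high = prices >= (1 - tolerance) * rolling.max()
    low = prices <= rolling.quantile(tolerance)
    return pd.DataFrame({"high": high.astype(int), "low": low.astype(int)})


rising = pd.Series(range(1, 61), dtype=float)   # 60 weeks of steadily rising prices
print(high_low_dummies(rising).tail(3))
```

With an almost flat price history both dummies can equal 1 at the same time, so an application may need a tie-breaking rule.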
1..6 Implied Volatility
The implied volatility needs some explanation, as it is also used as a dependent variable (output) in the later analysis. It is therefore explained in this section together with the control variables. Black and Scholes (1973) formulated a function for option prices in which, given all other parameters, the only unobservable input is the volatility:

$$\text{Call Price}(S,t) = N(d_+)\,S - N(d_-)\,K\,e^{-rT} \tag{2}$$

with

$$d_+ = \frac{\ln\left(\frac{S}{K}\right) + \left(r + \frac{\sigma^2}{2}\right)T}{\sigma\sqrt{T}} \tag{3}$$

$$d_- = d_+ - \sigma\sqrt{T} \tag{4}$$
where N(·) is the cumulative standard normal distribution function, T the time to maturity, S the spot price of the underlying stock, K the strike price, r the risk-free, continuously compounded annual interest rate and σ the implied volatility of returns. Whenever the market quotes the price of an option, the implied volatility can be extracted from this price by solving the equation for σ. As there is no closed-form solution, this is done numerically, for example with a Newton algorithm or simply by trial and error. The implied volatility can be interpreted as the average uncertainty over the lifetime of the option. A call option, which pays off if the spot is above the strike at maturity, has a convex payoff, so higher volatility increases its value. High implied volatility thus represents market participants' expectation of high hedging costs, and vice versa.
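The extraction of implied volatility described above can be sketched in Python with equations (2) to (4) and a Newton iteration that uses the option's vega (the derivative of the price with respect to σ). This is a minimal sketch without the safeguards a production solver would need (for example, bounds on σ for deep in- or out-of-the-money options); the function names and the starting value are assumptions.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution N(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes call price, equations (2) to (4)."""
    d_plus = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d_minus = d_plus - sigma * math.sqrt(T)
    return norm_cdf(d_plus) * S - norm_cdf(d_minus) * K * math.exp(-r * T)


def implied_vol(price: float, S: float, K: float, T: float, r: float,
                sigma0: float = 0.3, tol: float = 1e-10, max_iter: int = 100) -> float:
    """Solve bs_call(sigma) = price for sigma with Newton's method (vega as derivative)."""
    sigma = sigma0
    for _ in range(max_iter):
        d_plus = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
        vega = S * math.sqrt(T) * math.exp(-0.5 * d_plus**2) / math.sqrt(2.0 * math.pi)
        diff = bs_call(S, K, T, r, sigma) - price
        if abs(diff) < tol:
            return sigma
        sigma -= diff / vega  # Newton step
    raise RuntimeError("Newton iteration did not converge")


# Round trip: price a 3-month at-the-money call at sigma = 20%, then recover sigma.
price = bs_call(S=100.0, K=100.0, T=0.25, r=0.01, sigma=0.20)
print(round(implied_vol(price, S=100.0, K=100.0, T=0.25, r=0.01), 6))  # 0.2
```

Near the money the price is almost linear in σ, so Newton's method typically converges in a few steps from a reasonable starting value.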
1..7 3m Implied Volatilities
In this study, the implied volatilities of options with a maturity of three months serve as the output (dependent variable) of the implied volatility regressions. The maturity is rolled weekly so that it stays constant at three months. The options have a strike price close to the forward (“at-the-money”). Holding all other inputs constant, an increase in implied volatility thus corresponds to an increase in the option price. All data come from Bloomberg LP.
1..8 Historical Volatility
To control for the effect of new stock market volatility, the 30-day rolling volatility is added as a control variable. The historical volatility is not the options' implied volatility but the volatility of the underlying stock itself: whenever the stock starts to swing, the historical volatility increases. In this setup, the historical volatility is calculated over the last 30 trading days. Large swings therefore remain in the calculation for 30 trading days and then drop out. This helps explain why measured volatility spikes sharply in times of market shocks and stays elevated for some time thereafter. The historical volatility can be higher or lower than the implied volatility. The gap between implied and historical (realized) variance is known as the variance risk premium (VRP). Kita and Wang (2012) used this VRP in their study of foreign exchange (FX) options and FX fluctuations induced by search queries.
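The 30-day historical volatility and its gap to implied volatility can be sketched with pandas. The sketch assumes daily closing prices, log returns, the sample standard deviation and annualization with 252 trading days, none of which the text specifies; the function names are hypothetical.

```python
import numpy as np
import pandas as pd


def historical_vol(prices: pd.Series, window: int = 30, periods_per_year: int = 252) -> pd.Series:
    """Annualized rolling volatility of daily log returns over `window` trading days.

    Every return enters the window with equal weight and drops out `window` days later.
    """
    log_returns = np.log(prices).diff()
    return log_returns.rolling(window).std() * np.sqrt(periods_per_year)


def variance_risk_premium(implied: pd.Series, historical: pd.Series) -> pd.Series:
    """Implied minus historical variance, both given as annualized volatilities."""
    return implied**2 - historical**2


# A single 10% jump after 20 flat days stays in the window for exactly 30 days.
prices = pd.Series([100.0] * 20 + [110.0] * 60)
hv = historical_vol(prices)
print(hv.iloc[[30, 49, 50]].round(4).tolist())
```

The example shows the equal-weighting property described in the text: the jump raises the measured volatility for 30 trading days and then disappears from it at once.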
Having defined the control variables, the next chapter turns to the regressions.
Part 2 - Regressions
2.1 Model Selection
The overall model is a pooled regression with dummy variables for stock-specific or group-specific intercepts. This method allows differentiating between the control variables, which are level data, and the dependent variable, which is a price change or, in some cases, a change in volatility. The method can be customized by adding group intercepts. For the regressions of price changes, three control groups were analyzed: one for membership in one of the indices, one for the price-to-book ratio and one for the 52-week high or low attribute. For the regressions of implied volatility, the historical volatility is a further control. In both regression frameworks, last week's trading return and the market return are added to account for the autocorrelation of the price changes themselves; for the implied volatility regression, the current market return is used instead of last week's return. As an alternative to this method, a multilevel hierarchical regression could have been used, but this method does not provide confidence intervals for the coefficients without a Markov chain Monte Carlo simulation and reports Bayesian probabilities instead. It is based on Bayesian inference and differs from traditional statistics. Other authors, such as Latoeiro, Ramos and Veiga (2013), also used GARCH models to explain stock volatility. As this study does not focus on the stocks' volatility, a GARCH model could instead be applied to the implied volatility. The historical volatility is indeed part of the models, but, unlike in GARCH, no long-run mean is included. Other authors, such as Mondria and Wu (2011), set up a portfolio framework and measured the performance of trades to test their hypotheses against each other. They simulate investment strategies according to their hypotheses as either long or short trades and analyze the results. In the later stages of this study, a similar method is used to differentiate between groups of stocks according to their quantile of valuation (BP ratio) and historical volatility.
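The pooled regression with group-specific intercepts can be sketched as a least-squares dummy-variable regression in NumPy. The sketch assumes stacked observations of all stocks, a numeric regressor matrix and one intercept per group (for example, index membership) instead of a common constant; it omits the robust standard errors used in the study, and all names and simulated values are hypothetical.

```python
import numpy as np


def pooled_dummy_regression(y: np.ndarray, X: np.ndarray, groups: np.ndarray):
    """Pooled OLS with one intercept dummy per group and common slopes.

    y      -- (n,) dependent variable, e.g. weekly price changes of all stocks stacked
    X      -- (n, k) regressors, e.g. delta GSI, last week's return, market return
    groups -- (n,) group label of each observation, e.g. "DAX", "MDAX", "SDAX"
    Returns (slopes, intercepts), where intercepts maps each group label to its constant.
    """
    labels = np.unique(groups)
    # One 0/1 column per group; no common constant, so the dummy trap is avoided.
    dummies = (groups[:, None] == labels[None, :]).astype(float)
    design = np.column_stack([X, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    k = X.shape[1]
    return coef[:k], dict(zip(labels.tolist(), coef[k:]))


# Simulated noise-free data to check that the coefficients are recovered.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))
groups = rng.choice(np.array(["DAX", "MDAX", "SDAX"]), size=n)
true_intercepts = {"DAX": 0.01, "MDAX": 0.02, "SDAX": -0.01}
y = X @ np.array([0.5, -0.2]) + np.array([true_intercepts[g] for g in groups])
slopes, intercepts = pooled_dummy_regression(y, X, groups)
print(slopes.round(3), {g: round(float(c), 3) for g, c in intercepts.items()})
```

Dropping the common constant and keeping all group dummies makes each dummy coefficient directly interpretable as that group's intercept.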
2.1.1 Goodness of Fit
A multiple regression has some very basic prerequisites. For a detailed explanation, please refer to a basic econometrics textbook (e.g. Brooks, 2008); the following sentences report only the most basic ones. A Jarque-Bera test checks whether a variable is normally distributed. This is especially important for the residuals, which should contain no further information than white noise and should have a mean of zero. Furthermore, the residuals of a time series should not be autocorrelated, which can be tested with a Durbin-Watson test. To account for seasonality, a time series can be regressed on monthly dummies. Heteroscedasticity is of particular importance: it means that the variance of the residuals is not constant but changes, for example growing with the size of the explanatory variables. A quick fix for this problem is to use robust standard errors. All regressions in this study show heteroscedasticity. The model specification can be checked for non-linearities with a Ramsey RESET test. Non-linear variables can be transformed by taking logarithms, roots or exponents until they are specified correctly. A significant regression shows a high R² together with a significant F-test for the variables. To verify the apparent relationship, the data set can be split into subsets and checked with a Chow test (e.g. into two parts, as in this study).
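Two of the residual diagnostics named above have simple closed forms and can be sketched in NumPy: the Durbin-Watson statistic and the Jarque-Bera statistic. Critical values (Durbin-Watson bounds, the χ²(2) distribution for Jarque-Bera) are not computed here; the function names are hypothetical.

```python
import numpy as np


def durbin_watson(resid: np.ndarray) -> float:
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid**2))


def jarque_bera(resid: np.ndarray) -> float:
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4) from sample skewness S and kurtosis K."""
    e = resid - resid.mean()
    n = len(e)
    s2 = np.mean(e**2)
    skew = np.mean(e**3) / s2**1.5
    kurt = np.mean(e**4) / s2**2
    return float(n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0))


resid = np.array([1.0, -1.0, 1.0, -1.0])   # strongly alternating residuals
print(durbin_watson(resid), jarque_bera(resid))
```

Durbin-Watson values close to 2 indicate no first-order autocorrelation, values near 0 positive and values near 4 negative autocorrelation; the Jarque-Bera statistic is compared with the χ²(2) distribution, whose 5% critical value is about 5.99.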
2.1.2 Level versus Changes
This study analyzes the effect of queries on the price changes and volatility changes of stocks. Therefore, the dependent variable is not a level but a change, and to explain changes the input data must also be changes. Thus, the stock price changes over one week are regressed on the changes of the Google search volume index (GSI):

$$\Delta GSI_t = GSI_t - GSI_{t-1} \tag{5}$$
Some studies use abnormal search queries instead of changes. They relate the current value to the four-week average (e.g. Latoeiro, Ramos and Veiga, 2013, p. 10). This procedure was not adopted, because some time series first had to be seasonally adjusted. Other studies, such as Mondria and Wu (2011), used the difference between the current month and the quarterly median of monthly data. Their lower-frequency data were not corrected for monthly seasonality.
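Equation (5) and the abnormal-search alternative mentioned above can be compared side by side. The sketch assumes a weekly pandas series and defines abnormal search as the current value relative to the mean of the four preceding weeks (expressed as a ratio minus one); the exact definition in the cited studies may differ, and the function names are hypothetical.

```python
import pandas as pd


def delta_gsi(gsi: pd.Series) -> pd.Series:
    """First difference of the weekly search volume index, equation (5)."""
    return gsi.diff()


def abnormal_gsi(gsi: pd.Series, weeks: int = 4) -> pd.Series:
    """Current value relative to the mean of the previous `weeks` weeks, minus one."""
    prior_mean = gsi.shift(1).rolling(weeks).mean()  # excludes the current week
    return gsi / prior_mean - 1.0


gsi = pd.Series([10.0, 10.0, 10.0, 10.0, 20.0])
print(pd.DataFrame({"delta": delta_gsi(gsi), "abnormal": abnormal_gsi(gsi)}))
```

The first difference needs only one prior observation, whereas the abnormal measure needs a full four-week history; in the example, the jump from 10 to 20 appears as a change of +10 and as a 100% deviation from the prior average.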
2.1.3 Seasonality
The following figure shows, as an example, the seasonal increase in search queries in July and December for three major German car manufacturers.
Figure 2: Averages from 2004 to 2013 (monthly averages from January to December for DAI, VW, BMW and Category)
Choi and Varian (2009, p. 2) address seasonality by simply subtracting the previous year's value from the current value. A second solution is to regress the queries on the respective month and to continue all further calculations with the remaining residual. Another method would be to subtract the category queries. In this paper, however, a regression and F-test is conducted for every single query, and only those time series whose F-test exceeds the critical value are corrected for seasonality. The resulting data set is called “mixed”, because it contains both seasonally corrected and uncorrected time series. The correction is done by subtracting the monthly average. Not all time series are seasonal. The following table shows the F-test values for the three search categories. Empty values (#) denote that not enough data points were available from Google. Significance of seasonality is marked with three stars (***) at the 99% level, two stars (**) at the 95% level and one star (*) at the 90% level.
Other researchers, for example Vlastakis and Markellos (2012), proceeded in testing for seasonality via an F-test. Prior to that they conducted a test for unit roots and found out that most of the 30 stocks were stationary around a time trend. Their finding of stationarity is not surprising, because Google algorithmically adjusts the data to stay between the minimum of zero and the maximum of 100 (Part 1.8.4).
31
Table 2: F-Test for Seasonality of GSI and the three Categories

Variable Y              Name          AG            Aktie
Adidas AG               7.8341 ***    0.623         0.9068
Allianz SE              4.711 ***     0.362         0.6616
BASF SE                 6.2559 ***    #             #
Bayer AG                2.8058 ***    0.9973        0.4109
BMW                     1.8796 **     #             0.3714
Commerzbank AG          1.3131        0.3567        1.0603
Daimler AG              4.0417 ***    2.47 ***      0.9594
Deutsche Bank AG        0.5934        0.5102        1.2115
Deutsche Börse AG       0.819         1.2437        #
Deutsche Lufthansa AG   1.2413        0.353         0.4941
Deutsche Post AG        0.589         #             #
Deutsche Telekom AG     1.407         0.8943        #
E.ON SE                 1.3668        #             0.3884
Fresenius Medical       1.8548 **     #             #
Henkel AG & Co KGaA     4.869 ***     0.4458        #
Infineon                2.113 **      #             0.3829
Linde AG                5.1076 ***    0.5907        #
RWE AG                  3.2797 ***    0.6013        0.7733
SAP AG                  11.305 ***    0.7409        0.4303
Siemens AG              2.1649 **     1.2921        2.2991 ***
ThyssenKrupp AG         4.4175 ***    0.3926        0.2664
TUI AG                  2.2068 **     #             #
Volkswagen AG           4.3701 ***    1.1945        #
Aareal Bank AG          0.5713        #             #
Aurubis AG              1.1102        #             #
Beiersdorf AG           2.7825 ***    1.4523        #
Bilfinger SE            1.325         #             #
Celesio AG              0.9766        #             #
Comdirect Bank AG       1.5681        #             #
EADS                    1.1117        #             #
Fielmann AG             3.6435 ***    #             #
Fraport                 0.4942        0.4322        #
GEA Group AG            2.0096 **     #             #
Hannover Rück           0.932         #             #
HeidelbergCement        0.1824        #             #
Heidelberger Druc       1.4281        #             #
Hochtief AG             2.0046 **     #             #
IVG Immobilien AG       0.5842        #             #
KUKA AG                 2.7243 ***    #             #
Merck KGaA              2.2281 **     #             #
ProSiebenSat.1          1.2059        #             #
Puma SE                 5.816 ***     #             #
Salzgitter AG           1.6905 *      0.8975        #
SGL Carbon SE           2.0479 **     #             #
Stada                   0.6451        #             #
Südzucker AG            1.227         #             #
Vossloh AG              1.3495        #             #
Baader Bank AG          0.5848        #             #
Balda AG                0.2624        #             #
BayWa AG                2.2817 **     0.3862        #
Cewe Color Holding AG   2.2542 **     #             #
Deutz AG                0.2944        0.6548        #
Dürr AG                 0.4235        #             #
ElringKlinger AG        2.432 ***     #             #
Fuchs Petrolub AG       0.4678        #             #
Gerry Weber             0.0133        #             #
GFK SE                  2.8285 ***    #             #
Gildemeister AG         1.5575        #             #
Grenkeleasing AG        0.5976        #             #
Hawesko Holding AG      1.1106        #             #
Hornbach Holding AG     7.065 ***     #             #
Jungheinrich AG         3.371 ***     #             #
MVV Energie AG          0.178         #             #
Sixt AG                 4.4702 ***    #             #
As described above, the F-test examines the monthly search query changes under the joint hypothesis of seasonal effects of the monthly dummy variables (January to December). The table above marks the jointly significant series with stars. For example, BMW “Name” queries show seasonality at the 95% confidence level. After controlling for seasonality, the regressions of the individual stocks are conducted.
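The seasonality screen can be sketched in Python as follows. Testing the month dummies jointly against an intercept-only model is equivalent to a one-way ANOVA of the weekly changes across calendar months, which is what this sketch computes. The function names and the convention of passing in a critical value are assumptions for illustration, not the study's code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def _month_stats(values: Sequence[float], months: Sequence[int]) -> Tuple[Dict[int, float], Dict[int, int]]:
    """Mean and number of observations per calendar month."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for v, m in zip(values, months):
        sums[m] += v
        counts[m] += 1
    return {m: sums[m] / counts[m] for m in sums}, dict(counts)


def seasonal_f_test(values: Sequence[float], months: Sequence[int]) -> Tuple[float, int, int]:
    """F statistic for H0: all calendar-month means are equal (joint test of month dummies)."""
    n = len(values)
    grand_mean = sum(values) / n
    means, counts = _month_stats(values, months)
    k = len(means)  # number of months actually observed
    ss_between = sum(counts[m] * (means[m] - grand_mean) ** 2 for m in means)
    ss_within = sum((v - means[m]) ** 2 for v, m in zip(values, months))
    df1, df2 = k - 1, n - k
    if ss_within == 0:
        return (float("inf") if ss_between > 0 else 0.0), df1, df2
    return (ss_between / df1) / (ss_within / df2), df1, df2


def adjust_if_seasonal(values: Sequence[float], months: Sequence[int],
                       f_critical: float) -> Tuple[List[float], bool]:
    """'Mixed' treatment: subtract the monthly mean only if F exceeds f_critical."""
    f_stat, _, _ = seasonal_f_test(values, months)
    if f_stat <= f_critical:
        return list(values), False
    means, _ = _month_stats(values, months)
    return [v - means[m] for v, m in zip(values, months)], True
```

The critical value would come from the F(df1, df2) distribution at the chosen level; if SciPy is available, `scipy.stats.f.ppf(0.95, df1, df2)` returns the 95% quantile.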
2.2 - Return Regressions
The single stock regressions are constructed as a linear model of the form:

$$\text{Price return in \%}_{i,t} = \text{coefficient}_i \cdot \Delta GSI_{i,t} + \text{const}_i \tag{6}$$
The pooled regression model contains dummies that identify the intercept (const_i) of each stock in the pooled data set: the dummy is set to one for observations of that stock and to zero otherwise, so the regression delivers a separate intercept const_i for each stock. Alternatively, each stock could be regressed individually and independently. The advantage of the pooled regression is that it saves time while still providing a separate intercept and beta for every stock. However, different distributions of the individual stocks could skew the results. For this reason, a double check was conducted for the first 15 stocks: the single, independent regressions led to the same betas and intercepts, so the pooled regression is used from here on.
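The double check described above can be illustrated with a short numpy sketch on simulated data: a pooled OLS with one intercept dummy and one slope dummy (dummy × ΔGSI) per stock reproduces the coefficients of separate per-stock regressions. The function name, the stock labels and the simulated data are hypothetical.

```python
import numpy as np


def pooled_dummy_ols(stock_ids, d_gsi, returns):
    """Pooled OLS with a stock-specific intercept dummy and slope dummy for each stock:
    return_{i,t} = const_i + coefficient_i * dGSI_{i,t} + e_{i,t}.
    Returns {stock: (const_i, coefficient_i)}."""
    ids = np.asarray(stock_ids)
    x = np.asarray(d_gsi, dtype=float)
    y = np.asarray(returns, dtype=float)
    stocks = sorted(set(ids.tolist()))
    columns = []
    for s in stocks:
        dummy = (ids == s).astype(float)  # 1 for observations of stock s, else 0
        columns.append(dummy)             # intercept const_i
        columns.append(dummy * x)         # slope coefficient_i
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {s: (beta[2 * j], beta[2 * j + 1]) for j, s in enumerate(stocks)}


# Simulated weekly data for three hypothetical stocks (no real data).
rng = np.random.default_rng(0)
ids = np.repeat(["A", "B", "C"], 50)
d_gsi = rng.normal(size=ids.size)
returns = 0.3 + 0.02 * d_gsi + rng.normal(scale=2.0, size=ids.size)

pooled = pooled_dummy_ols(ids, d_gsi, returns)
for s, (const_i, coef_i) in pooled.items():
    mask = ids == s
    slope, intercept = np.polyfit(d_gsi[mask], returns[mask], 1)  # separate OLS per stock
    print(s, np.isclose(const_i, intercept), np.isclose(coef_i, slope))  # True True
```

The point estimates coincide because the dummy design is block-diagonal. What can differ are classical standard errors, since the pooled model estimates a single residual variance for all stocks.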
In the first model the “Name” queries are the inputs; in the two later regressions “Name” is replaced by “AG” and “Aktie”. The findings are presented in Table 3:
Table 3: Model_mix_GSI_Name (Y = price_r_t0, robust SE)

Variable X        coefficient  t-ratio          constant   t-ratio
AdidasAG           0.01623     (0.1685)          0.34477   (1.4129)
AllianzSE         -0.01402     (-0.258)          0.15197   (0.6228)
BASFSE            -0.04693     (-1.325)          0.32822   (1.3451)
BayerAG            0.03176     (0.8552)          0.33483   (1.3721)
BayerischeMotor   -0.05492     (-0.5076)         0.24872   (1.0193)
CommerzbankAG     -0.05205     (-1.3478)        -0.24342   (-0.9975)
DaimlerAG          0.0267      (0.609)           0.16935   (0.694)
DeutscheBankAG     0.03791     (0.9503)          0.0906    (0.3713)
DeutscheBorseAG   -0.01356     (-0.4397)         0.26575   (1.089)
DeutscheLufthan   -0.04352     (-0.9314)         0.12348   (0.506)
DeutschePostAG     0.01921     (0.5481)          0.11016   (0.4514)
DeutscheTelekom   -0.03557     (-0.584)         -0.049     (-0.2008)
E_ONSE            -0.03742     (-0.5832)         0.06393   (0.262)
FreseniusMedica    0.00132     (0.0697)          0.25213   (1.0332)
HenkelAG_CoKGaA   -0.00709     (-0.1674)         0.31199   (1.2785)
InfineonTechnol   -0.0433      (-0.969)          0.20233   (0.8292)
LindeAG            0.02173     (0.5178)          0.31352   (1.2848)
RWEAG             -0.0193      (-0.5557)         0.06538   (0.2679)
SAPAG             -0.01107     (-0.2221)         0.18275   (0.7489)
SiemensAG         -0.02816     (-0.1696)         0.12466   (0.5109)
ThyssenKruppAG     0.04746     (1.59)            0.12388   (0.5077)
TUIAG             -0.06185     (-1.2631)         0.05161   (0.2115)
VolkswagenAG       0.59852     (6.2127) ***      0.53887   (2.2083) **
AarealBankAG      -0.13225     (-3.9437) ***     0.20806   (0.8482)
AurubisAG         -0.03194     (-1.2258)         0.46706   (1.2847)
BeiersdorfAG      -0.0066      (-0.2159)         0.20084   (0.8231)
BilfingerSE       -0.01279     (-0.4858)         0.35795   (1.4669)
CelesioAG         -0.00797     (-0.2704)        -0.08604   (-0.3286)
ComdirectBankAG    0.0213      (0.5278)          0.07846   (0.3215)
EADS              -0.08054     (-1.7586) *       0.29671   (1.2159)
FielmannAG        -0.00275     (-0.0887)         0.33945   (1.3911)
FraportAGFrankf    0.01794     (0.715)           0.20877   (0.8556)
GEA GroupAG        0.01881     (0.5145)          0.29909   (1.2257)
HannoverRuckver   -0.08196     (-2.4392) **      0.39457   (1.2576)
HeidelbergCemen    0.04181     (1.758) *         0.09645   (0.3316)
HeidelbergerDru   -0.01176     (-0.5518)        -0.26583   (-1.0747)
HochtiefAG        -0.02536     (-1.0596)         0.33303   (1.3648)
IVGImmobilienAG   -0.05969     (-3.1675) ***    -0.95628   (-2.9585) ***
KUKAAG             0.02806     (0.9806)          0.29079   (1.1917)
MerckKGaA          0.06884     (1.2436)          0.33279   (1.3638)
ProSiebenSat_1M    0.0226      (0.6778)          0.48714   (1.9963) **
PumaSE             0.04924     (0.5502)          0.19235   (0.7883)
SalzgitterAG      -0.05128     (-1.7005) *       0.43204   (1.7705) *
SGLCarbonSE        0.04878     (1.4004)          0.38078   (1.5605)
StadaArzneimitt   -0.06229     (-2.734) ***      0.20274   (0.8308)
SudzuckerAG        0.01021     (0.4395)          0.20085   (0.8171)
VosslohAG         -0.02484     (-1.2253)         0.21863   (0.896)
BaaderBankAG       0.0394      (1.3374)         -0.19926   (-0.4858)
BaldaAG            0.16149     (4.7891) ***      0.56433   (2.3126) **
BayWaAG           -0.04032     (-1.3413)         0.32498   (1.3318)
CeweColorHoldin   -0.04319     (-0.4268)         0.23485   (0.9624)
DeutzAG            0.04269     (1.1659)          0.25891   (1.061)
DurrAG            -0.10695     (-2.454) **       0.45743   (1.8745) *
ElringKlingerAG   -0.00072     (-0.0416)         0.41602   (1.2684)
FuchsPetrolubAG   -0.01383     (-0.7177)         0.65126   (1.8159) *
GerryWeberInter   -0.0093      (-0.3257)         0.56595   (2.3193) **
GFKSE              0.00292     (0.097)           0.26356   (1.0801)
GildemeisterAG    -0.09588     (-2.4839) **      0.36119   (1.4801)
GrenkeleasingAG   -0.00421     (-0.2039)         0.2098    (0.594)
HaweskoHoldingA    0.02277     (0.815)           0.37623   (1.5306)
HornbachHolding    0.01891     (0.3491)          0.17132   (0.7021)
JungheinrichAG    -0.00459     (-0.2215)         0.29337   (1.2022)
MVVEnergieAG       0.00543     (0.2461)         -0.07027   (-0.2394)
SixtAG            -0.09107     (-1.4873)         0.34982   (1.4336)

N = 29000; Adj. R² = 0.002488; AIC = 180016.8; F(127, 28872) = 1.5695***; CHOW p-Value = 0.3588; u~N(0,1) p-Value = 0; Homoscedasticity: Robust SE
Surprisingly, no pattern is immediately visible that could help to explain the first hypothesis. The “Name” queries do not behave consistently: for some stocks the impact is negative, for others it is positive. For example, “Adidas” has a coefficient of 0.01623, which would imply that an increase of one unit in search queries for Adidas increases the stock price by 0.01623%. However, the coefficient’s t-ratio of 0.1685 is far below any significance level, so the coefficient cannot be distinguished from zero; at this low level of significance the sign could just as well be negative. Most stocks are not significantly explained, and the eleven stocks with significant coefficients are neither uniformly positive nor uniformly negative. This is a second indication that the data may not contain any information beyond white noise. In general, at a 5% significance level, 11 out of 220 stocks would be expected to show statistical significance by chance alone (11 = 5% × 220). Some significant stocks were therefore expected ex ante from randomness, which is what we observe. Moreover, the overall Adj. R² is below 1%, a sign of very poor model quality. As the regression uses the same time interval and no lags, the relation could also run in reverse: an increase in the price could induce higher search query volumes. Whether past values of one series help to predict the other (Granger causality) can only be examined with lagged models; since the data are weekly, effects on later time windows are analyzed in part 3 of this study with VAR models. For now, the weekly data should already carry a signal irrespective of timing or the direction of causality: if the dataset contained information, the coefficients should already be significant. This is not the case.
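The "expected by chance" argument can be made explicit with a small sketch. It assumes the individual tests are independent, so the number of false positives is binomial; the stock regressions share market movements, so this is only a rough benchmark. The function names are hypothetical.

```python
from math import comb


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of 'significant' coefficients when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k: int, n_tests: int, alpha: float) -> float:
    """P(at least k significant results) under a Binomial(n_tests, alpha) count,
    i.e. assuming independent tests that all have a true null hypothesis."""
    return sum(
        comb(n_tests, j) * alpha ** j * (1 - alpha) ** (n_tests - j)
        for j in range(k, n_tests + 1)
    )


# Numbers taken from the text: 220 tests at the 5% significance level.
print(expected_false_positives(220, 0.05))
print(round(prob_at_least(11, 220, 0.05), 3))
```

A count of significant stocks close to the expected value is consistent with pure noise; only a count far in the upper tail of the binomial distribution would suggest genuine information.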
One possible reason for the lack of significance is the model specification itself. For instance, web traffic could additionally have been measured as a control variable: Rajgopal et al. (2000) show that the profitability of internet firms can be approximated by the amount of web traffic to their sites. Search query data do not reveal how many users actually accessed the company’s site after the query, and some users may even know the web address by heart and bypass the search engine altogether.
For now, the following tables present the results for the two other query types for individual stocks, “AG” and “Aktie”:

Table 4: Model_mix_GSI_AG
Y = price_r_t0 (robust SE)

Variable X       coefficient  t-ratio        constant   t-ratio
AdidasAG         -0.01289     (-0.876)        0.59862   (2.311)**
AllianzSE         0.01483     (1.416)         0.19064   (0.603)
BayerAG           0.02072     (1.297)         0.33718   (1.971)**
CommerzbankAG    -0.01366     (-0.644)       -0.36393   (-0.927)
DaimlerAG         0.01832     (0.607)         0.05591   (0.164)
DeutscheBankA    -0.01833     (-0.965)        0.07152   (0.253)
DeutscheBorse    -0.00254     (-0.139)        0.26685   (1.252)
DeutscheLufth    -0.00436     (-0.283)        0.09616   (0.368)
DeutscheTelek     0.01246     (0.717)        -0.05719   (-0.343)
HenkelAG_CoKG    -0.01015     (-0.434)        0.48801   (2.145)**
LindeAG           0.00633     (0.553)         0.29847   (1.889)*
RWEAG            -0.03327     (-2.145)**     -0.24141   (-0.941)
SAPAG             0.00004     (0.004)         0.19662   (1.144)
SiemensAG         0.00401     (0.252)         0.12532   (0.658)
ThyssenKruppA     0.0069      (0.39)          0.0045    (0.011)
VolkswagenAG      0.1795      (1.265)         0.54397   (1.374)
BeiersdorfAG     -0.00839     (-0.833)        0.09441   (0.531)
FraportAGFran    -0.00488     (-0.278)       -0.04583   (-0.173)
SalzgitterAG      0.00109     (0.049)         0.42247   (1.392)
BayWaAG           0.00217     (0.126)         0.32606   (1.141)
DeutzAG           0.01543     (0.73)          0.0792    (0.23)

N = 7623; Adj. R² = 0.004267; AIC = 46923.89; F-Test = 0.971743; CHOW p-Value = 0.82; u~N(0,1) p-Value = 0; Homoscedasticity: Robust SE
At first sight, the number of available stocks drops when the query type changes from “Name” to “AG” (21 instead of 64 stocks). The adjusted R² rises only marginally (0.43% versus 0.25%) and remains very low. The F-statistic decreased, while the sample split (Chow) test improved slightly. The signs of the individual stocks are again mixed and the t-ratios remain low, so the picture is not very different from the first regressions for “Name” queries, and statistical significance is far out of reach. Barber and Odean (2001) hypothesized that increased internet usage provides an illusion of control and results in overconfidence among online investors. If the “AG”, “Aktie” and “Name” queries are attributed to those investors, the regression tables should support their hypothesis by showing evidence of a directional influence, but they do not. As the next table shows, the signs are mixed for “Aktie” queries as well:
Table 5: Model_mix_GSI_Aktie (Y = price_r_t0, robust SE)

Variable X       coefficient  t-ratio              constant   t-ratio
AdidasAG         -0.01176     (-0.384)              0.40203   (1.466)
AllianzSE        -0.0327      (-0.701)              0.14805   (0.379)
BayerAG          -0.05983     (-1.672)*             0.24488   (0.769)
BayerischeMot     0.01746     (0.301)               0.52301   (1.41)
CommerzbankAG     0.03428     (0.391)              -0.68498   (-1.162)
DaimlerAG        -0.03889     (-0.372)              0.11527   (0.339)
DeutscheBankA     0.01923     (0.274)               0.07706   (0.158)
DeutscheLufth    -0.00079     (-0.03)               0.22287   (0.602)
E_ONSE           -0.09199     (-2.927)***          -0.29358   (-0.904)
InfineonTechn     0.0356      (0.578)               0.80025   (1.3)
RWEAG            -0.08384     (-2.994)***          -0.28331   (-0.932)
SAPAG            -0.04013     (-0.635)              0.13281   (0.432)
SiemensAG        -0.02783     (-0.67)               0.12494   (0.657)
ThyssenKruppA     0.08333     (2.222)**            -1.04474   (-1.772)*
HeidelbergerD     0.04724     (465.199)***          6.18284   (564.47)***
DurrAG            0.03369     (1.998)**             0.63424   (0.491)
SixtAG           -0.01353     (-2.143e+15)***       3.27073   (5.18e+15)***

N = 3422; Adj. R² = 0.005809; AIC = 22143.69; F-Test = 9.25e+29***; CHOW p-Value = 0.041884; u~N(0,1) p-Value = 0; Homoscedasticity: Robust SE
The “Aktie” queries show very little significance, with an adjusted R² of 0.58%. The F-statistic rises to a much higher level, but the signs of the coefficients are mixed, and the sample split test indicates that the coefficients switch when the time frame is changed. The queries for “Aktie” do not help to explain variations of the stocks. So far there is no difference between “AG”, “Aktie” and “Name” queries: none of them significantly explains the stock returns of the same week (1st hypothesis). Wang (2012) analyzed this question for 5607 firms using Factiva news data and Google. He found that prices are affected permanently only when information demand and information supply match. His finding may explain why the results above are so dispersed. His study controls for good and bad news and also includes a proxy for investor learning via Wikipedia website traffic. Earlier, Rubin and Rubin (2010) concluded that the update frequency of Wikipedia entries also affects investor behavior by reducing uncertainty and thus volatility. Differentiating between all these effects and adding more control variables seems necessary to evaluate transparently the effect of search queries and their interactions on capital markets. Deciding whether news is good or bad, and how the recipient interprets it, is a subject of its own and not part of this study. Note, however, that price movements without news tend to reverse, while price movements accompanied by news persist (e.g. Chan, 2003). News is therefore an important control and should be addressed in any further analysis. Without stepping out of the scope of this study, in the next step the individual stocks are standardized by their volatility to make them comparable with each other.
2.2.1 Standardization
To prepare the dataset for pooled regression, the different standard deviations of the series have to be made comparable via standardization. Dividing by the standard deviation changes the interpretation of the tables: the coefficients are no longer scaled in ‘unit’ increases or decreases but in ‘one standard deviation’ changes. Why is this done? The individual stocks are pooled in regressions to extract their combined information content and to make transparent what regressions on the individual stocks cannot show. The stocks are clustered by attributes such as the Price-to-Book ratio, the intensity of search queries or the historical volatility. Splitting the stocks into these groups orders them in a new way and may add predictive power on new questions. The standardization is done for every single time series with its overall (full-sample) standard deviation. Note that this is normally done only over past values, or the last 52 values, to avoid look-ahead bias in out-of-sample testing. Because out-of-sample testing is not an issue in this study, the data can be standardized over the full sample. A further motivation for standardizing is the very fat tails of the query data.

$$Z^{mix}_{i,t} = \frac{\Delta GSI_{i,t} - \operatorname{avg}_m(\Delta GSI_i)\,\mathbb{1}_F}{\operatorname{std}(\Delta GSI_i)} \tag{7}$$
where $i$ denotes the specific stock, $\operatorname{avg}_m$ the monthly average of month $m$ used for seasonal adjustment, and std the standard deviation. $\mathbb{1}_F$ is an indicator variable that equals one for the seasonal series and zero for all others. After standardization, the Zmix data can be used in a pooled regression, because all series now show the same scale of variation.
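A minimal sketch of equation (7) in Python, assuming the sample standard deviation of the unadjusted changes is used as the denominator (the text does not say whether the sample or population version was used). The second function is a hypothetical look-ahead-free variant based on the previous 52 observations, mentioned above but not used in the study. All names are illustrative.

```python
from collections import defaultdict
from statistics import stdev
from typing import Dict, List, Optional, Sequence


def z_mix(d_gsi: Sequence[float], months: Sequence[int], seasonal: bool) -> List[float]:
    """Equation (7): (dGSI_t - avg_m(dGSI) * 1_F) / std(dGSI), using full-sample moments.
    `seasonal` plays the role of the indicator 1_F from the F-test screen."""
    sd = stdev(d_gsi)  # sample standard deviation (assumption)
    if sd == 0:
        raise ValueError("a constant series cannot be standardized")
    if not seasonal:
        return [v / sd for v in d_gsi]
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for v, m in zip(d_gsi, months):
        sums[m] += v
        counts[m] += 1
    return [(v - sums[m] / counts[m]) / sd for v, m in zip(d_gsi, months)]


def z_past_only(d_gsi: Sequence[float], window: int = 52) -> List[Optional[float]]:
    """Hypothetical look-ahead-free variant (not used in the study): scale each value
    by the standard deviation of the previous `window` observations only."""
    out: List[Optional[float]] = [None] * len(d_gsi)
    for t in range(window, len(d_gsi)):
        sd = stdev(d_gsi[t - window:t])
        out[t] = d_gsi[t] / sd if sd > 0 else None
    return out
```

The full-sample version uses information from the whole period for every week, which is harmless for in-sample regressions but would leak future information into an out-of-sample test.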
To test whether the fat tails have been reduced, a Jarque-Bera test is applied to all time series, both without standardization (“Name plain”) and after standardization (“Name Z mix”):
Table 6: Jarque-Bera-Test

Variable Y              Name plain      Name Z mix
Adidas AG               330.225 ***     339.433 ***
Allianz SE              56789.2 ***     69879.287 ***
BASF SE                 608.537 ***     487.184 ***
Bayer AG                35182.6 ***     27920.70 ***
BMW                     54374.9 ***     66595.41 ***
Commerzbank AG          105.176 ***     113.241 ***
Daimler AG              682.167 ***     1764.686 ***
Deutsche Bank AG        19.503 ***      24.431 ***
Deutsche Börse AG       5828.1 ***      7791.778 ***
Lufthansa               10160.1 ***     7785.185 ***
Deutsche Post AG        4741.39 ***     6186.836 ***
Deutsche Telekom        179.942 ***     95.981 ***
E.ON SE                 33373.5 ***     25142.55 ***
Fresenius Medical       52.054 ***      16.948 ***
Henkel AG               66.539 ***      66.784 ***
Infineon                533.645 ***     303.398 ***
Linde AG                1391.07 ***     1890.964 ***
RWE AG                  154.452 ***     120.766 ***
SAP AG                  760.705 ***     139.125 ***
Siemens AG              340.744 ***     142.143 ***
ThyssenKrupp AG         428.065 ***     821.333 ***
TUI AG                  2951.86 ***     2792.273 ***
Volkswagen AG           145.116 ***     218.693 ***
Aareal Bank AG          23653.9 ***     334.32 ***
Aurubis AG              2.876           2.876
Beiersdorf AG           59.723 ***      23.927 ***
Bilfinger SE            267.639 ***     302.298 ***
Celesio AG              11231 ***       18321.732 ***
Comdirect Bank AG       298.691 ***     346.713 ***
EADS                    212665 ***      512.524 ***
Fielmann AG             28.678 ***      1.81
Fraport                 2476.8 ***      3473.305 ***
GEA Group AG            18783.1 ***     17007.657 ***
Hannover Rück           42493 ***       42493.588 ***
HeidelbergCement        15126.2 ***     15125.868 ***
Heidelberger Druc       902.917 ***     129.599 ***
Hochtief AG             46.862 ***      60.426 ***
IVG Immobilien AG       2204.3 ***      2204.273 ***
KUKA AG                 103.516 ***     58.392 ***
Merck KGaA              47521.4 ***     6212.577 ***
ProSiebenSat.1          1747.55 ***     1604.638 ***
Puma SE                 474.724 ***     654.731 ***
Salzgitter AG           59.526 ***      59.833 ***
SGL Carbon SE           961.115 ***     306.511 ***
Stada                   201.735 ***     32.317 ***
Südzucker AG            925.666 ***     564.632 ***
Vossloh AG              435.081 ***     25.277 ***
Baader Bank AG          253.292 ***     253.292 ***
Balda AG                15680.7 ***     14727.096 ***
BayWa AG                15.548 ***      19.041 ***
Cewe Color Holding      854.938 ***     868.878 ***
Deutz AG                150.723 ***     110.808 ***
Dürr AG                 392188 ***      412871.66 ***
ElringKlinger AG        189.385 ***     122.522 ***
Fuchs Petrolub AG       839.726 ***     839.722 ***
Gerry Weber             35551 ***       38232.488 ***
GFK SE                  70.157 ***      33.513 ***
Gildemeister AG         5655.27 ***     8403.723 ***
Grenkeleasing AG        1100.83 ***     1100.934 ***
Hawesko Holding         1015.72 ***     210.62 ***
Hornbach Holding        6.344 ***       11.116 ***
Jungheinrich AG         56.566 ***      27.476 ***
MVV Energie AG          9186.53 ***     4905.099 ***
Sixt AG                 264.034 ***     314.403 ***
A closer look at the Jarque-Bera values shows that not all stocks move towards the normal distribution. For example, the value for “Fielmann” drops from 28.678 to 1.81, but the value for “Fraport” increases from 2476.8 to 3473.305. In general, time series with low query volumes over some periods, and thus many zeros, remain leptokurtic after standardization and show higher Jarque-Bera values, whereas time series with sufficient data tend towards the normal distribution after standardization.
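For reference, a minimal Python sketch of the Jarque-Bera statistic behind Table 6, using moment-based skewness and kurtosis; implementations differ slightly in small-sample corrections, and the function name is illustrative. Under normality the statistic is asymptotically chi-squared with 2 degrees of freedom, so values above about 5.99 reject normality at the 5% level.

```python
from typing import Sequence


def jarque_bera(x: Sequence[float]) -> float:
    """JB = n/6 * (S^2 + (K - 3)^2 / 4) with moment-based skewness S and kurtosis K."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


# A query series with mostly zeros and a few spikes is strongly skewed and leptokurtic.
zero_heavy = [0.0] * 95 + [10.0] * 5
print(round(jarque_bera(zero_heavy), 1))
```

If SciPy is available, `scipy.stats.jarque_bera` returns the same statistic together with its p-value.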
2.2.2 Correlation Analysis
The correlation analysis of the different search term categories shows that the three query types are positively related, but far from perfectly correlated. “Aktie” queries are only weakly, but positively, correlated with “AG” queries (13.01%). The same holds for “Aktie” and “Name” queries (22.38%), while “AG” and “Name” queries share a correlation of 19.4%. This suggests that people mostly do not add “AG” when entering a company’s name, and that searches for “Aktie” are not the same as searches for “AG”.
Table 7: Correlation Analysis of Signals

              Mixed_name  Mixed_AG  Mixed_Aktie
Mixed_name    1
Mixed_AG      0.194       1
Mixed_Aktie   0.2238      0.1301    1
The correlation of the signals can be extended to the correlation with the dependent variables. The following table lists the seasonally adjusted and standardized queries together with the price changes of the prior week (t-1), the current week (t0) and the week ahead (t+1). The same time-lag notation is used for the implied volatility changes. Unlike the stock returns, the implied volatilities are measured as percentage point changes (delta) rather than percentage returns: if the implied volatility drops from 15% to 10%, this is a change of -5 percentage points, not a return of -33.3%. In the table, “AG” queries are positively correlated with the current week’s returns (t0), whereas “Aktie” and “Name” queries are negatively correlated. One week ahead, the correlation is negative for “Name” and “AG” queries and positive for “Aktie” queries. This could hint that “Aktie” queries are related to the stock and induce buying behavior, which is analyzed later on. All correlations with returns are very small in absolute terms (below 6%), and their significance has yet to be tested. For the implied volatility, all three queries show a positive correlation in the current week. These correlations are higher than those for returns, with a maximum of 14.16% for “Aktie”. For the following week, the correlation drops and is negative for “Name” and “Aktie” and marginally positive for “AG”. There is no consistent pattern in the correlation table that would allow firm conclusions. One possible explanation is that attention to the stock leads to option purchases, which raise option prices in the current week, followed by a normalization in the following week. This correlation table is a starting point for examining the pooled regression in closer detail.
Table 8: Correlation Analysis of Signals and dependent Variables

Signal             price_r_t-1  price_r_t0  price_r_t+1  d_impl_v_t0  d_impl_v_t+1
Z_mix_GSI_name      0.0124      -0.0073     -0.0151       0.0463      -0.019
Z_mix_GSI_AG       -0.0244       0.0227     -0.008        0.0577       0.0001
Z_mix_GSI_aktie    -0.0585      -0.0329      0.0007       0.1416      -0.0456
2.2.3 Pooled Regression
The next model includes control variables for the previous trading week’s return, the current market movement and the market turnover; all control variables were described in part one. The model differentiates between the queries for “Name”, “AG” and “Aktie” and is set up to explain the current price changes of the stocks.
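A numpy sketch of such a regression with heteroscedasticity-robust standard errors. The study reports "Robust SE" without naming the variant, so White's HC0 estimator is an assumption here, and all variable names and the simulated data are hypothetical stand-ins for the standardized query signal and the three controls.

```python
import numpy as np


def ols_hc0(y, X):
    """OLS with White (HC0) heteroscedasticity-robust standard errors.
    X must already contain a column of ones for the constant."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)  # sum over t of e_t^2 * x_t x_t'
    cov = xtx_inv @ meat @ xtx_inv
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se


# Simulated stand-ins for the regressors (hypothetical names, no real data).
rng = np.random.default_rng(1)
n = 2000
z_gsi = rng.normal(size=n)               # standardized query signal (Zmix)
d_turnover = rng.normal(size=n)          # change in turnover
ret_lag = rng.normal(scale=3, size=n)    # previous week's return
mkt = rng.normal(scale=2, size=n)        # current average market return
noise = rng.normal(size=n) * (1 + np.abs(mkt))  # heteroscedastic errors
y = 0.2 + 0.1 * d_turnover - 0.02 * ret_lag + 1.0 * mkt + noise

X = np.column_stack([np.ones(n), z_gsi, d_turnover, ret_lag, mkt])
beta, se, t = ols_hc0(y, X)
print(np.round(beta, 3), np.round(t, 2))
```

Robust standard errors do not remove heteroscedasticity; they only make the t-ratios valid in its presence. With statsmodels, `OLS(y, X).fit(cov_type="HC0")` gives the same estimator.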
The “Aktie” query seems to be the best estimator of the three, with a t-value of -1.064, but its significance decreases to a t-value of -0.459 once the control variables are added. With controls, “AG” has the best t-value (1.436), but its coefficient of 0.186114% has the opposite sign from the other two queries. This result would mean that the price return increases by about 0.19% for every increase of one standard deviation in “AG” queries. This is a very small effect, and the predictive power of the models without control variables is very low (Adj. R² < 0.1%). When the sample is split into two parts, the coefficient signs switch (Chow test). Moreover, the residuals are heteroscedastic, so robust standard errors are used, and the tests for non-linearity are significant in most specifications. The residuals of the regression are not normally distributed and show very fat tails. The model lacks significance for every type of search query input. This outcome is not surprising, as the single stock regressions were not significant either.
Table 9: Model_3Categories_Zmix Y=price_r_t0 Variable X Constant
Value 0.239559 (7.007)***
Name
-0.042707 (-0.864)
AG
Value 0.15967 (2.512)**
Value 0.05461 (0.466)
0.135408 (1.034) -0.206085 (-1.064)
Price_r_t-1 Avg_Price_r_t0 26270 0.000015 164516.9 0.746687 0.561926 0 Robust SE 0.4152 0.2026
Value -0.20635 (-2.472)**
0.186114 (1.436)
Δ Turnover
Value -0.070747 (-1.37)
-0.033718 (-0.904)
Aktie
N Adj. R² AIC F-Test CHOW p-Value u~N(0,1) p-Value Homoscedasticity Reset-Test p-Value Non-lin p-Value
Value -0.000463 (-0.016)
7120 0.000376 44126.94 1.068153 0.813389 0 Robust SE 0.004141 0.05407
2864 0.000733 18684.13 1.1329 2.18e-6 0 Robust SE 1.63e-19 1.244e-16
-0.059974 (-0.459) 0.112611 (1.987)** -0.014985 (-1.384) 1.007975 (56.847)***
-0.047601 (-2.007)** 1.014868 (32.899)*** 0.094493 (0.931)
0.047681 (0.412) 0.037406 (1.411) 1.22945 (37.337)***
24024 0.350007 141491.9 1023.466 0.7965 0 Robust SE 0.0186 0.00109
6747 0.400139 38619.56 369.877 0.0022521 0 Robust SE 9.5e-11 2.81e-26
2823 0.507344 16549.27 365.8371 6.78e-66 0 Robust SE 0.09398 5.74e-6
In general, the model is not in line with other authors such as Da et al. (cf. 2011, p. 1465), who found an outperformance of 30 basis points (BP) in the first two weeks after abnormal search queries for US Russell 3000 ticker data. Their abnormal search volume is defined as the log of the current search volume minus the log of the median over the previous eight weeks, and their sample ends in June 2008. As the coefficient for “AG” queries is positive at 18.6 BP for the first-week returns, the magnitudes are not entirely out of line. Moreover, some of the regressions for “Name” and “AG” switch signs when the time frame is cut and then align with Da et al.’s findings. This hints that the data may be more random than causal and differ from regime to regime. Recall that most stocks tumbled during the financial crisis and the European debt crisis. Some of this negative variation is absorbed by the query variables simply because no dummy variable is included to capture these events.
Note: non-linearity, non-normal residuals and heteroscedasticity occurred in every single model. This applies to all following regressions and will not be repeated for every regression with fat tails hereafter; any exceptions will be stated explicitly.
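For comparison, the abnormal search volume measure of Da et al. (2011) can be sketched as follows: the log of this week's search volume minus the log of the median of the previous eight weeks. Weeks with zero values (frequent in sparse GSI series) have no defined logarithm and are returned as None; that handling and the function name are assumptions of this sketch.

```python
from math import log
from statistics import median
from typing import List, Optional, Sequence


def asvi(svi: Sequence[float], window: int = 8) -> List[Optional[float]]:
    """Abnormal search volume in the style of Da et al. (2011):
    log(SVI_t) - log(median(SVI_{t-1}, ..., SVI_{t-window})).
    Returns None where the history is too short or a log is undefined (zero values)."""
    out: List[Optional[float]] = [None] * len(svi)
    for t in range(window, len(svi)):
        base = median(svi[t - window:t])
        if svi[t] > 0 and base > 0:
            out[t] = log(svi[t]) - log(base)
    return out


weekly_svi = [10] * 8 + [20, 0]  # made-up values: a spike, then a week without data
print(asvi(weekly_svi)[8:])      # first value is log(2), second is None
```

Unlike the first differences of equation (5), this measure is scale-free, because the log difference does not depend on how Google rescales the series.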
2.2.4 DAX, MDAX and SDAX Panel
When the model is split into the three indices DAX, MDAX and SDAX for each of the three search terms, the structure of the data set becomes problematic and potentially misleading: “AG” queries are available for only four stocks in MDAX and two stocks in SDAX, and “Aktie” queries for only one stock in MDAX and two stocks in SDAX. Thus only the “Name” queries, which are available for more than 25 stocks in every index, can give meaningful hints, and only they allow a Chow test for a sample split. For “AG” and “Aktie”, too few time series are available to generalize any conclusions.
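The Chow test used throughout for the sample split can be sketched with numpy: compare the residual sum of squares of the pooled regression with those of the two sub-samples. The split point, function names and simulated data are illustrative assumptions.

```python
import numpy as np


def _rss(y: np.ndarray, X: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chow_f(y, X, split: int):
    """Chow F statistic for a structural break after observation `split`.
    Under H0 (equal coefficients in both sub-samples) it follows F(k, n - 2k)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    rss_pooled = _rss(y, X)
    rss_1 = _rss(y[:split], X[:split])
    rss_2 = _rss(y[split:], X[split:])
    f_stat = ((rss_pooled - rss_1 - rss_2) / k) / ((rss_1 + rss_2) / (n - 2 * k))
    return f_stat, k, n - 2 * k


rng = np.random.default_rng(2)
x = rng.normal(size=200)
X = np.column_stack([np.ones(200), x])
stable = 0.1 + 0.5 * x + rng.normal(size=200)
broken = np.where(np.arange(200) < 100, 0.5 * x, -0.5 * x) + rng.normal(scale=0.3, size=200)
print(chow_f(stable, X, 100))  # small F: no evidence of a break
print(chow_f(broken, X, 100))  # large F: the slope switches sign
```

The p-value follows from the F(k, n - 2k) distribution, for example via `scipy.stats.f.sf(f_stat, k, n - 2 * k)` if SciPy is available.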
For “Name” queries, the coefficient for MDAX stocks is negative and significant at the 5% level, whereas the DAX and SDAX coefficients are not significant.
Table 10: Model_3Indices_Zmix Y=price_r_t0 Variable X DAX_dummy MDAX_dummy SDAX_dummy
Value 0.190068 (4.360)*** 0.220672 (4.557)*** 0.33778 (5.342)*** .
Value -0.036815 (-0.922) -0.020597 (-0.453) 0.1011 (1.585)
DAX_and_name
Value -0.045557 (-1.051) -0.026359 (-0.555) 0.106623 (1.547)
Value -0.079841 (-1.334) -0.072404 (-0.612) 0.010591 (0.057)
0.023671 (0.326) -0.142178 (-1.975)** 0.014741 (0.172)
MDAX_and_name SDAX_and_name DAX_and_AG
0.188657 (1.379) -0.164812 (-1.117) -0.151109 (-0.616)
MDAX_and_AG SDAX_and_AG DAX_and_Aktie
-0.073375 (-0.561) 4.191439 (8.661)*** 1.62753 (1.543)
MDAX_and_Aktie SDAX_and_Aktie Δ Turnover Price_r_t-1 Avg_Price_r_t0
N Adj. R² AIC F-Test CHOW p-Value u~N(0,1) p-Value Homoscedasticity Reset-Test p-Value Non-lin p-Value
Value -0.218035 (-2.601)*** 5.06689 (17.847)*** 0.615165 (0.62)
33534 0.000063 207167.7 22.7739 0.6639 0 Robust SE 1 ERROR
0.108015 (2.074)** -0.018026 (-1.827)* 1.003498 (61.63)***
0.113531 (2.006)** -0.014985 (-1.386) 1.008179 (56.784)***
0.184244 (1.433) -0.047051 (-2.005)** 1.015498 (32.803)***
0.047518 (0.41) 0.035327 (1.328) 1.229172 (37.307)***
27186 0.350208 159662.8 815.71 0.635239 0 Robust SE 0.267763 0.0001929
24024 0.350168 141489.9 484.4427 0.562149 0 Robust SE 0.0141031 0.004509
24024 0.400363 38621.04 165.0933 5.86e-5 0 Robust SE 1.82e-11 1.78688e-27
24024 0.508287 16458.04 369.5106 3.065e-41 0 Robust SE 0.0972611 2.44e-5
Bank, Larch and Peter (2010, p. 21) reported positive signs for large stocks and negative signs for small stocks. If their small stocks are taken to correspond to MDAX stocks, the sample above supports their finding. In addition, the coefficient here is significant at the 95% confidence level, a higher level than in their study: Bank, Larch and Peter (2010) could only report tendencies and were not able to show significance at higher levels. The additional data of this study for the later years up to 2013 may explain this difference. Also, their data set covers four-week returns rather than weekly frequencies. It remains puzzling that the SDAX coefficient is positive in the weekly framework. In the next section, the overall non-linearity is addressed before moving on to the analysis of the control variable quantiles. For this purpose, the search queries themselves are split into quantiles.
2.2.5 Search Intensity
One way to address the non-linearity of the prior regressions is to split the query intensity into groups. The groups are constructed by the size of the search query changes: the highest search spikes fall into quantile five and the most negative changes into quantile one.
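A possible construction of the quantile regressors is sketched below. The text does not state the exact specification; because a constant plus all five plain quantile dummies would be perfectly collinear (the dummies sum to one), this sketch assumes each "Quantile q" regressor is the signal itself inside quantile q and zero elsewhere. It also ranks a single series, whereas the study may rank the pooled data. All names are hypothetical.

```python
from typing import Dict, List, Sequence


def quintile_labels(values: Sequence[float]) -> List[int]:
    """Rank-based groups 1 (most negative changes) .. 5 (largest spikes), about 20% each."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


def quintile_interactions(z: Sequence[float]) -> Dict[int, List[float]]:
    """One regressor per quintile: the signal inside that quintile, 0 elsewhere."""
    labels = quintile_labels(z)
    return {q: [float(v) if lab == q else 0.0 for v, lab in zip(z, labels)]
            for q in range(1, 6)}


z = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(quintile_labels(z))           # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(quintile_interactions(z)[5])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.0, 7.0]
```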
The table below shows that none of the quantiles is significant for “Name” queries. Quantiles one and two of “Aktie” show negative signs at the 5% significance level, and quantile four of “AG” is significant at the 10% level. For a good regression, all quantiles would be expected to be significant. Still, one may conjecture that the outer quantiles go with negative signs and the inner quantiles, where few search queries occurred, with positive signs. This may reflect the fact that people search for additional information when prices drop and do nothing when they observe prices rising. However, the regression below does not significantly support this hypothesis, and more research is needed. Secondly, there is an unwanted effect: the time series typically dropped not only when no queries occurred but also very sharply after new highs had been reached. Thus, very negative quantiles are highly correlated with new tops. Google sets the highest value to a maximum level of 100 and caps it there, so the variations are correct in their direction but the scaling changes: large variations in the past can appear tiny in the time series as it is downloaded today. Therefore, the quantiles cannot be described by a functional relationship; judging from the coefficients, no functional form, such as an exponential or logarithmic function, is obvious:
Table 11: Model_GSI_Quantile_ZMix
Y = price_r_t0
Variable X | Name | AG | Aktie
Constant | 0.012978 (0.231) | 0.01985 (0.624) | 0.014866 (0.509)
Quantile 1 | 0.021483 (0.262) | -0.110135 (-0.989) | -0.321485 (-2.067)**
Quantile 2 | -0.068406 (-0.843) | 0.053148 (0.518) | -0.289346 (-1.98)**
Quantile 3 | 0.054054 (0.653) | -0.069254 (-0.689) | 0.088706 (0.512)
Quantile 4 | -0.00331 (-0.038) | -0.217735 (-1.683)* | -0.031445 (-0.14)
Quantile 5 | -0.109081 (-0.826) | -0.006675 (-0.034) | -0.045313 (-0.182)
Δ Turnover | 0.109222 (2.118)** | 0.108264 (2.085)** | 0.107505 (2.06)**
Price_r_t-1 | -0.017962 (-1.822)* | -0.017969 (-1.822)* | -0.017982 (-1.823)*
Avg_Price_r_t0 | 1.003589 (61.686)*** | 1.003458 (61.648)*** | 1.003496 (61.615)***
N | 27186 | 27186 | 6417
Adj. R² | 0.3501 | 0.350129 | 0.350158
AIC | 159669.78 | 159669.1 | 159667.89
F-Test | 621.19 | 597.1055 | 592.14
CHOW p-Value | 0.6324 | 0.0696336 | 3.58e-206
u~N(0,1) p-Value | 0 | 0 | 0
Homoscedasticity | robust SE | robust SE | robust SE
Reset-Test p-Value | 0.2643 | 0.22258 | 0.21871
Non-lin p-Value | 0.0002137 | 0.000238 | 0.0002796
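To make the rescaling problem concrete, the following minimal Python sketch mimics the normalization described above under simplifying assumptions: the maximum of the downloaded window is set to 100 and all other weeks are scaled proportionally and rounded to integers. The raw volumes are hypothetical (Google does not publish them), and `scale_to_100` is an illustrative helper, not part of any Google API.

```python
def scale_to_100(raw):
    """Scale a raw series so that its maximum becomes 100 (values rounded to integers).

    Simplified stand-in for the normalization described in the text; not a Google API.
    """
    peak = max(raw)
    return [round(100 * x / peak) for x in raw]


# Hypothetical raw weekly query volumes (not real data).
history = [40, 44, 60, 48, 41]

before = scale_to_100(history)          # history downloaded before a new peak
after = scale_to_100(history + [400])   # same history downloaded after a much higher peak

# The jump from week 2 to week 3 is +27 index points in the first download ...
jump_before = before[2] - before[1]
# ... but only +4 index points once the new maximum has rescaled the series.
jump_after = after[2] - after[1]

print(before)                   # [67, 73, 100, 80, 68]
print(after)                    # [10, 11, 15, 12, 10, 100]
print(jump_before, jump_after)  # 27 4
```

The same weekly movement therefore looks large or negligible depending on when the series was downloaded, which is one reason why the quantile coefficients above do not follow a stable functional form.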
Fink and Johann (2013) analyzed a very similar data set of DAX, MDAX, SDAX and TecDAX stocks in one estimation with groups of attention quantiles. Note that their data frequency is daily and that they applied the Google category filter “Finance” to their time series before downloading. Their outcomes agree with the ones above with respect to the positive effect of 3.2 BP for quantile 3 (p-value 1.24%) and the negative quantile 2 results, but the rest of their dataset had opposite signs. Additionally, their data more clearly indicate that an increase in queries is followed by short-term performance, while few queries relate to negative returns. Overall, there is no clear message in the table above. As discussed, the way the data are stored and overwritten (rescaled) does not allow for a sensible functional relationship. One quick fix for the sharp drops after new tops is to analyze only positive search query changes, as Latoeiro, Ramos and Veiga (2013) did. The analysis of the positive queries did not reveal any other significant patterns and was therefore dropped. What can be addressed in the next step of the analysis is the Price-to-Book ratio.
2.2.6 Price-to-Book and GSI
The regression can be controlled for the Price-to-Book ratio (PB). This ratio relates the company’s exchange-traded market valuation to its balance sheet equity. By assigning every stock to a valuation quantile, five groups each holding 20% of the observations are constructed. Quantile 1 includes the companies with the lowest ratio and quantile 5 those with the highest ratio. The regression table below shows that none of the quantile-specific query (GSI) coefficients is significant. The signs differ between “Name”, “AG” and “Aktie” queries. One might expect mixed signs for the medium quantiles and mean reversion for the highest and lowest quantiles, but this property cannot be found for any of the three terms. “Aktie” may show this pattern, but a Chow test on a split sample rejected the assumption of stable coefficients.

Table 12: Model_PBxGSI_ZMix
Y = price_r_t0
Variable X | Name | AG | Aktie
Constant | -0.564512 (-4.76)*** | -0.591992 (-5.137)*** | -0.617516 (-5.531)***
Quantile 1 intercept | 0.484988 (3.61)*** | 0.492599 (3.308)*** | 0.640518 (3.137)***
Quantile 2 intercept | 0.563271 (4.195)*** | 0.611272 (3.971)*** | 0.410056 (2.175)**
Quantile 3 intercept | 0.541319 (4.068)*** | 0.711862 (4.101)*** | 0.435735 (1.831)*
Quantile 4 intercept | 0.846115 (6.33)*** | 0.576701 (3.567)*** | 0.631401 (3.501)***
Quantile 5 intercept | 0.939344 (6.135)*** | 0.744787 (3.298)*** | 0.630433 (2.225)**
Quantile 1 and GSI | -0.092316 (-1.45) | 0.010457 (0.097) | 0.103897 (0.391)
Quantile 2 and GSI | -0.142223 (-1.405) | 0.03724 (0.36) | -0.205004 (-1.055)
Quantile 3 and GSI | 0.0802 (1.115) | -0.099969 (-0.604) | -0.007651 (-0.027)
Quantile 4 and GSI | 0.061769 (0.944) | -0.123786 (-0.741) | 0.018753 (0.079)
Quantile 5 and GSI | 0.25891 (1.049) | 1.140306 (1.553) | -0.031359 (-0.1)
Δ Turnover | 0.110618 (1.972)** | 0.125362 (1.151) | 0.049181 (0.436)
Price_r_t-1 | -0.016524 (-1.575) | -0.026364 (-1.57) | 0.005659 (0.31)
Avg_Price_r_t0 | 1.007643 (57.072)*** | 1.121275 (28.822)*** | 1.293641 (25.188)***
N | 24307 | 9062 | 5489
Adj. R² | 0.35208 | 0.356371 | 0.394831
AIC | 142946.29 | 55381.2 | 34470.26
F-Test | 351.91*** | 107.2851*** | 89.52464***
CHOW p-Value | 0.108925 | 0.0025078 | 1.27e-107
u~N(0,1) p-Value | 0 | 0 | 0
Homoscedasticity | robust SE | robust SE | robust SE
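The grouping and the interaction terms used in Table 12 can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the five groups are formed by ranking all observations of the pooled sample (the study may instead sort within each week), uses hypothetical Price-to-Book and query values, and stops before the regression itself. `quintile_labels` and `group_regressors` are hypothetical helper names.

```python
def quintile_labels(values):
    """Assign each observation to one of five equally sized groups by rank.

    Group 1 holds the lowest 20% of `values`, group 5 the highest 20%.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // n + 1
    return labels


def group_regressors(labels, gsi):
    """Return five group dummies D_k and five interaction columns D_k * GSI."""
    dummies = [[1.0 if lab == k else 0.0 for lab in labels] for k in range(1, 6)]
    interactions = [[d * g for d, g in zip(col, gsi)] for col in dummies]
    return dummies, interactions


# Hypothetical Price-to-Book ratios and standardized query changes for ten observations.
pb = [0.8, 2.5, 1.1, 3.9, 0.5, 1.7, 2.9, 1.3, 4.4, 0.9]
gsi = [0.5, -1.0, 0.2, 1.5, -0.3, 0.0, 0.7, -0.8, 2.0, 0.1]

labels = quintile_labels(pb)
dummies, interactions = group_regressors(labels, gsi)
print(labels)  # [1, 4, 2, 5, 1, 3, 4, 3, 5, 2]
# "Quantile 5 and GSI" is non-zero only for the two highest-PB observations.
print(interactions[4])
```

Each observation falls into exactly one group, so each "Quantile k and GSI" column measures the query effect only within its own valuation group.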
Compared with Bank, Larch and Peter (2010, p. 21), the coefficients here for high Price-to-Book stocks are not negatively related to the queries, nor are those for low Price-to-Book stocks positively related. As already mentioned, the return period in their analysis is four weeks, which differs from the one-week frequency used in this regression. [Table 27] of the appendix shows the analysis with lagged inputs, in other words the one-week-ahead returns regressed on the currently known values. In that analysis the regression coefficients slowly start to move in the direction of Bank, Larch and Peter (2010) and become negative for the high Price-to-Book stocks, which would support the first hypothesis. In this prediction the quantile 3 coefficient is significant at the 95% level. The coefficients for the other search terms, “AG” and “Aktie”, were not significant. Bank, Larch and Peter also used “Name” queries. Overall, the data show a tendency, though not a fully statistically significant one, for investors to search for information about highly priced securities in the short run and to sell those securities over the following four weeks. This is an interesting finding with respect to the first hypothesis. In the next step, returns are analyzed with respect to the 52-week highs and lows of the stock prices.
2.2.7 52 week High or Low
Latoeiro, Ramos and Veiga (2013) already analyzed yearly highs and lows and their impact on prices, so this is not an entirely new insight. Nevertheless, the topic is worth analyzing further. The definition of the 52-week high and low used here is somewhat broader: to allow for some tolerance, stocks trading in the upper 10% or the lower 10% of their range carry the respective dummy for the yearly high or low.
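A minimal sketch of the broadened 52-week dummies follows. It assumes that a stock's range is the minimum-to-maximum band of its trailing 52 weekly closing prices including the current week; the exact window convention of the study is not documented here. The helper name and the price series are hypothetical, and a 5-week window is used in the example only to keep it short.

```python
def high_low_dummies(prices, window=52, tolerance=0.10):
    """Flag weeks in which the price lies in the top or bottom `tolerance` share
    of its trailing `window`-week range (the current week included).

    Returns two 0/1 lists that start at index `window - 1`, the first week
    for which a full window of history is available.
    """
    high, low = [], []
    for t in range(window - 1, len(prices)):
        past = prices[t - window + 1 : t + 1]
        lo, hi = min(past), max(past)
        span = hi - lo
        price = prices[t]
        high.append(1 if span > 0 and price >= hi - tolerance * span else 0)
        low.append(1 if span > 0 and price <= lo + tolerance * span else 0)
    return high, low


# Hypothetical weekly closing prices; a 5-week window keeps the example short.
prices = [10, 12, 11, 15, 14, 15.8, 9, 13]
high, low = high_low_dummies(prices, window=5)
print(high)  # [0, 1, 0, 0]
print(low)   # [0, 0, 1, 0]
```

The interaction regressors High10%_and_GSI and Low10%_and_GSI would then be the element-wise products of these dummies with the query series.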
The regression table below for “Name”, “AG” and “Aktie” queries reports positive price changes for searched stocks trading near their 52-week high and negative performance for searched stocks trading near their 52-week low. Within the split samples both variables keep their signs, but the overall Chow test rejects parameter stability because of the onset of the financial crisis in 2008.
Table 13: Model_52weekHighLow_Zmix
Y = price_r_t0
Variable X | Name | AG | Aktie
Constant | -0.039872 (-1.076) | -0.065968 (-1.08) | -0.000575 (-0.006)
High10% | 1.138011 (17.59)*** | 1.062765 (8.67)*** | 0.630612 (3.794)***
Low10% | -1.815374 (-17.078)*** | -1.62496 (-9.539)*** | -1.423122 (-5.699)***
High10%_and_GSI | 0.170641 (2.074)** | 0.34915 (1.262) | 0.496178 (2.765)***
Low10%_and_GSI | -0.734299 (-4.501)*** | -0.668378 (-3.202)*** | -0.832998 (-4.642)***
Δ Turnover | 0.108672 (1.975)** | 0.173353 (1.394) | 0.101489 (0.918)
Price_r_t-1 | -0.054469 (-5.135)*** | -0.083534 (-3.621)*** | -0.001601 (-0.059)
Avg_Price_r_t0 | 0.947744 (52.559)*** | 0.959663 (31.958)*** | 1.173774 (34.317)***
N | 24024 | 6747 | 2823
Adj. R² | 0.3764 | 0.4253 | 0.5256
AIC | 140498.83 | 38332.94 | 16355.39
F-Test | 925.98 | 340.23 | 246.77
CHOW p-Value | 3.59e-7 | 5.66e-6 | 2.64e-10
u~N(0,1) p-Value | 0 | 0 | 0
Homoscedasticity | Robust SE | Robust SE | Robust SE
Reset-Test p-Value | 2.98e-26 | 8.46e-48 | 0.000424
Non-lin p-Value | 4.88e-11 | 1.28e-30 | 1.537e-5
Latoeiro, Ramos and Veiga (2013, p. 47) come to the same conclusion: they found that prices are positively affected when stocks trade at their 52-week high, and vice versa. The data thus confirm that the price level significantly changes the way stock queries are reflected in returns. Remarkably, all query types report the same signs. The pattern supports Barber and Odean (2008) in that investors probably need to update their view on the business case at high prices, while under negative scenarios stocks are sold blindly.
[Table 25] in the appendix shows the regression with lagged inputs and the return one week ahead. The coefficients mostly keep their signs but are no longer significant; the only sign change occurs for the “Name” coefficient at the 52-week high. As the regression has very low explanatory power, with an adjusted R² below 1%, an active trading strategy is not advisable at this stage of research.
2.2.8 Interim Conclusion
The first hypothesis could not be verified in this study. Many authors had previously found statistically significant relationships, mainly at the index level and for macroeconomic variables. At the individual stock level the relationship did not hold for the above data sample, and other researchers did not find stable relationships at the individual stock level that could support the hypothesis either. This may be attributed to information that is lost or not controlled for, such as news flow, investors’ portfolio positions, mood and opinions about a stock. Some already known effects are supported by the data, e.g. valuation effects (via market value and Price-to-Book ratio) and the 52-week high and low effects. The general statement of the hypothesis without any restrictions, however, cannot be verified in this study for individual stocks.
2.3 - Implied Volatility Regressions
Most of the papers were more successful in explaining squared returns and volatility with GARCH models than in explaining directional price effects (see e.g. Dimpfl and Jank (2011) or Fink and Johann (2013)). This made it natural to conjecture that search queries might also influence implied volatility.
The regressions on implied volatility cover the same timeframe as the regressions on price performance. As mentioned before, the implied volatility regressions mostly represent DAX stocks and some MDAX stocks, because no options were quoted on the market for most SDAX stocks in prior years. In the next model, the search terms “Name” and “Aktie” are statistically significant at the 99% confidence level and “AG” at the 95% level. This is very interesting, because these regressions remained stable after splitting the sample in two parts and after checking for non-linear effects. Although the “AG” queries were significant only at the 95% level, they passed the RESET test with the highest p-value, i.e. with the least indication of a missing higher-order specification. All signs are positive: an increase of one standard deviation in the queries is associated with a 0.28 percentage point increase in implied volatility for “Name”, 0.31 percentage points for “AG” and 0.76 percentage points for “Aktie”. When control variables are introduced, the coefficients shrink and “AG” becomes insignificant, while “Name” and “Aktie” remain slightly informative, above the 95% and 90% confidence levels respectively. Their impact is cut approximately by half, to 0.13 and 0.30 percentage points. The adjusted R² values are far too low to explain the second hypothesis. The regression table is shown below:
Table 14: Model_3Categories_Zmix Y= d_3m_impl_vola_t0 Variable X Constant
Value 0.0145633 (0.366)
Name
Value 0.021521 (0.484)
Value -0.00466 (-0.082)
Value -0.009418 (-0.095)
0.28775 (3.886)***
AG
Value 0.11925 (2.106)**
0.306537 (2.065)**
0.11913 (1.232) 0.76221 (3.3)***
Δ Turnover Price_r_t0 Avg_Price_r_t0 19272 0.000000 120561.9 NA 0.7556 0 Robust SE 1 NA
16549 0.002087 104701.8 15.1029 0.69436 0 Robust SE 0.0703 0.37378
Value 0.144977 (1.735)*
0.137931 (2.382)**
Aktie
N Adj. R² AIC F-Test CHOW p-Value u~N(0,1) p-Value Homoscedasticity Reset-Test p-Value Non-lin p-Value
Value 0.124436 (2.613)***
6551 0.003178 38566.44 4.263221 0.992193 0 Robust SE 0.301807 0.427006
2860 0.019711 17733.24 10.89216 0.295506 0 Robust SE 3.907e-28 3.859e-23
0.29884 (1.79)* 0.306682 (4.413)*** -0.007509 (-0.116) -0.47033 (-7.52)***
0.525731 (4.614)*** 0.070847 (0.566) -0.64637 (-5.78)***
0.344345 (4.831)*** -0.167404 (-4.707)*** -0.637379 (-11.69)***
15466 0.0893 97361.62 88.40 4.48e-8 0 Robust SE 1.44e-122 7.37e-125
6213 0.2531 35015.46 77.50 0.000541 0 Robust SE 3.1e-278 5.7e-169
2819 0.39104 0.390175 16181.18 0.120245 0 Robust SE 2.65e-68 2.71e-21
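The coefficients above are read as percentage-point changes in implied volatility per one-standard-deviation increase in queries. The sketch below shows why, assuming the query variable is z-standardized with the population standard deviation (the study's exact standardization is an assumption here). It is a univariate OLS on hypothetical data, without the control variables and robust standard errors of the table.

```python
from statistics import mean, pstdev


def zscore(xs):
    """Standardize a series to mean 0 and (population) standard deviation 1."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def ols_simple(x, y):
    """Intercept and slope of y = a + b*x estimated by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical weekly query changes and implied-volatility changes (percentage points).
gsi_raw = [2.0, -1.0, 0.5, 3.0, -2.5, 1.0]
d_impl_vola = [0.9, -0.2, 0.4, 1.1, -0.6, 0.3]

_, b_raw = ols_simple(gsi_raw, d_impl_vola)
_, b_std = ols_simple(zscore(gsi_raw), d_impl_vola)

# The standardized slope equals the raw slope times the standard deviation of the regressor.
print(round(b_std, 6), round(b_raw * pstdev(gsi_raw), 6))
```

Because the two slopes differ only by the factor σ(GSI), the standardized coefficient can be read directly as the effect of a one-standard-deviation change in queries.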
Naturally the search queries can be sorted into quantiles. The next analysis focuses on that issue.
2.3.1 Groups of Search Intensity
The search query changes are clustered into groups of different intensity. The most searched stocks fall into group five and the least searched ones into group one. The a priori assumption is that highly searched stocks should show an increase in implied volatility, while rarely searched stocks, which drop out of attention, should vary only slightly or decrease in implied volatility. The data tend to reflect this assumption for the “Name” queries but contradict it for the “Aktie” queries. None of the groups is statistically significant, and the signs reverse in the Chow-test sample split. Therefore, the groups sorted by intensity carry no significant information, and no judgment about the initial assumption can be made.
Table 15: Model_GSI_Quantile_ZMix
Y = d_3m_impl_vola_t0
Variable X | Name | AG | Aktie
Constant | 0.150007 (1.398) | 0.137292 (2.39)** | 0.133088 (2.737)***
Quantile 1 | -0.145802 (-1.08) | -0.144981 (-1.134) | 0.19604 (1.043)
Quantile 2 | 0.031436 (0.256) | 0.001836 (0.017) | 0.058558 (0.291)
Quantile 3 | -0.001199 (-0.009) | -0.032678 (-0.295) | -0.189523 (-1.125)
Quantile 4 | -0.089521 (-0.521) | -0.018696 (-0.139) | -0.108781 (-0.506)
Quantile 5 | 0.075864 (0.485) | -0.035862 (-0.184) | -0.27178 (-1.244)
Δ Turnover | 0.294014 (4.614)*** | 0.294933 (4.612)*** | 0.297219 (4.604)***
Price_r_t0 | -0.022794 (-0.389) | -0.023006 (-0.393) | -0.022725 (-0.388)
Avg_Price_r_t0 | -0.458151 (-8.078)*** | -0.457809 (-8.086)*** | -0.457962 (-8.093)***
N | 17538 | 17538 | 17538
Adj. R² | 0.0923 | 0.0922 | 0.0923
AIC | 109509.73 | 109511.79 | 109509.58
F-Test | 61.36 | 53.35 | 53.52
CHOW p-Value | 5.9417e-5 | 3.56e-6 | 2.77e-6
u~N(0,1) p-Value | 0 | 0 | 0
Homoscedasticity | robust SE | robust SE | robust SE
Reset-Test p-Value | 4.406e-142 | 2.827e-141 | 7.3078e-141
Non-lin p-Value | 7.92e-144 | 4.9853e-144 | 1.92e-144
As the above table does not provide any significant information, the next regression adds historical volatility to the setup.
2.3.2 Cross Check for historical Volatility
In the regression of implied volatility, the sample is split into groups of historical volatility. Stocks with high historical volatility are assigned to group five and those with low historical volatility to group one; each of the five groups holds 20% of the observations. As the regression table below shows, in the highest quantile of historical volatility a one-standard-deviation increase in “Name” queries is associated with a 1.58 percentage point increase in implied volatility, significant at the 99% level. This is by far the largest effect. The 3.51 percentage point coefficient of the “AG” queries is significant only at the 90% level, and the “Aktie” queries are not significant at all. The lowest quantile does not have a negative sign for any of the three terms, which could reflect mean reversion: stocks with very low historical volatility will gain in variance at some point in time due to new information or other effects. Searching for those stocks via “Aktie” may lead to the decision to buy options to hedge an existing position before new information increases the stock’s volatility. Moreover, the second-lowest quantile shows a negative sign for all three search terms. This is notable with respect to typical volatility behavior: whenever volatility spikes sharply, it normally takes some time until it decays, and this process is slow. Hence, when historical volatility is already low, implied volatility is still high and search queries reveal no new information, implied volatility drops and closes the gap between the overpriced option volatility and the realized volatility. This would fit this quantile.

Table 16: Model_HistVola&GSI_ZMix
Y = d_implied_vola_t0
Variable X | Name | AG | Aktie
Constant | 0.091059 (1.488) | 0.095569 (1.502) | 0.220909 (1.829)*
Quantile 1 intercept | 0.022729 (0.225) | 0.074121 (0.812) | -0.167137 (-0.842)
Quantile 2 intercept | 0.015119 (0.101) | 0.004803 (0.047) | -0.200552 (-0.93)
Quantile 3 intercept | 0.019335 (0.205) | -0.053043 (-0.426) | -0.157969 (-0.674)
Quantile 4 intercept | 0.047655 (0.445) | 0.028415 (0.156) | 0.026465 (0.156)
Quantile 5 intercept | 0.269508 (0.953) | 0.132343 (0.264) | -0.158268 (-0.523)
Quantile 1 and GSI | 0.025544 (0.329) | 0.016779 (0.246) | 0.462018 (1.27)
Quantile 2 and GSI | -0.201119 (-1.143) | -0.232615 (-2.371)** | -0.010978 (-0.041)
Quantile 3 and GSI | 0.137938 (1.408) | 0.001098 (0.009) | 0.252947 (0.547)
Quantile 4 and GSI | 0.165992 (1.377) | 0.262016 (0.885) | 0.264956 (0.869)
Quantile 5 and GSI | 1.577076 (2.759)*** | 3.513334 (1.91)* | 0.658491 (0.901)
Δ Turnover | 0.26827 (4.641)*** | 0.439982 (5.941)*** | 0.34649 (4.652)***
Price_r_t0 | -0.033525 (-1.741)* | 0.02901 (0.293) | -0.161133 (-4.069)***
Avg_Price_r_t0 | -0.480347 (-17.993)*** | -0.571367 (-6.805)*** | -0.498521 (-8.642)***
N | 15731 | 7343 | 4610
Adj. R² | 0.094694 | 0.251208 | 0.256033
AIC | 98756.33 | 40770.76 | 25508.6
F-Test | 40.44292*** | 28.4488*** | 25.26048***
CHOW p-Value | 0 | 0.0001523 | 0
u~N(0,1) p-Value | 0 | 0 | 0
Homoscedasticity | robust SE | robust SE | robust SE
This finding could be a major breakthrough if the signs were not inverted when the sample is split for the Chow test. Ex post the result is very sensible, but ex ante a trading strategy based on it could not deliver stable returns. The sample split falls into the financial crisis of 2008/09 and the monetary crisis of 2011/12, and the fifth quantile reverses its sign over the later period up to 2013. As historical volatilities were very high, implied option volatilities were high too. Starting from these high levels, implied volatilities could do nothing but decrease for quantile five. The ongoing soft landing is not reflected in the regression constant, because the sample was split too early and implied volatilities jumped twice, in late 2008 and late 2012. Therefore the negative sign change is explained by the macroeconomic conditions. These macroeconomic twists could be captured by adding a control variable for the macroeconomic environment to the regression model, but in reality the macroeconomic regime is not easy to judge a priori. Ex post the regime is evident, but the model must also work without this information, which is not the case. The lagged regression in [Table 26] of the appendix shows how the one-week-ahead implied volatility changes fit into the picture. The “Name” queries, which lifted implied volatility in the prior week, show a negative sign in the following week. This could be mean reversion following the initial shock, with first profit taking. The significance drops sharply, to slightly above the 90% confidence level, and the overall R² is 0.1%. The pattern is not consistent across “AG” and “Aktie” queries, where some mixed effects appear in the intermediate quantiles. To sum up, implied volatility is sensitive to internet search queries, and the effects can be explained by existing economic theory. Following this finding, the question arises whether the implied volatility of individual stocks can also be explained by their own search queries. This is discussed in the next chapter.
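The Chow test used throughout compares the fit of one pooled regression with that of two separate regressions before and after a break point. A minimal sketch for a regression with one regressor and a constant follows; the data are hypothetical and constructed so that the slope flips sign in the second subperiod, mimicking the sign reversal discussed above. Converting F into a p-value requires the F(k, n − 2k) distribution, e.g. `scipy.stats.f.sf(F, k, n - 2 * k)` if SciPy is available; that step is omitted to keep the sketch dependency-free.

```python
def ols_rss(x, y):
    """Residual sum of squares of y = a + b*x fitted by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def chow_f(x, y, split, k=2):
    """Chow F statistic for a structural break after observation `split`.

    k is the number of coefficients per regime (here intercept and slope).
    Under the null of stable coefficients, F follows F(k, n - 2k).
    """
    n = len(x)
    rss_pooled = ols_rss(x, y)
    rss_parts = ols_rss(x[:split], y[:split]) + ols_rss(x[split:], y[split:])
    return ((rss_pooled - rss_parts) / k) / (rss_parts / (n - 2 * k))


# Hypothetical data: the same x values in both subperiods.
x = [1, 2, 3, 4, 5, 6] * 2
y_stable = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8] + [1.0, 2.1, 2.9, 4.1, 4.9, 6.1]
y_flipped = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8] + [-0.9, -2.1, -3.0, -4.1, -4.9, -6.2]

print(round(chow_f(x, y_stable, split=6), 2))   # small: no evidence of a break
print(round(chow_f(x, y_flipped, split=6), 2))  # large: the slope changes sign
```

A pooled coefficient can look significant even when the two subperiods point in opposite directions; the Chow statistic makes that instability explicit.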
2.3.3 Single Stock Regressions on implied Volatility
For implied volatilities the regressions are more informative than for the percentage changes of stock prices. They are conducted on a smaller data set, because many of the SDAX stocks had no publicly traded options outstanding in prior years.
The regressions are shown in the three tables below. Some stocks are indeed significant with respect to the search queries, and most of the statistically significant stocks carry a positive sign. This holds for all three search terms. The R² values are below 2% and thus very small. “Name” and “AG” queries survive a Chow test. In contrast to the directional regressions, the implied volatility regressions do not differ between the three terms. The tables can be read as follows: an increase of one standard deviation in search queries for “Adidas” increases the implied volatility of Adidas at-the-money options with a constant three-month maturity by 0.03559 percentage points (e.g. from 15% to 15.03559%).
Table 17: Model_mix_GSI_Name
Y = d_3m_impl_vola_t0
Variable X | coefficient (t-ratio) | constant (t-ratio)
AdidasAG | 0.03559 (0.785) | 0.00765 (0.053)
AllianzSE | 0.08125 (1.593) | -0.00363 (-0.015)
BASFSE | 0.08959 (2.374)** | 0.01057 (0.064)
BayerAG | -0.00452 (-0.229) | 0.00296 (0.016)
BayerischeMotor | -0.04391 (-0.961) | 0.00439 (0.03)
CommerzbankAG | 0.21871 (2.812)*** | 0.01391 (0.053)
DaimlerAG | 0.0262 (0.44) | 0.00147 (0.007)
DeutscheBankAG | 0.01015 (0.147) | 0.01139 (0.037)
DeutscheBorseAG | 0.10456 (1.76)* | 0.00429 (0.029)
DeutscheLufthan | 0.00258 (0.128) | -0.00231 (-0.017)
DeutschePostAG | 0.00993 (0.521) | -0.00177 (-0.014)
DeutscheTelekom | 0.07814 (1.047) | 0.01275 (0.087)
E_ONSE | 0.01967 (0.297) | 0.01417 (0.09)
FreseniusMedica | -0.00139 (-0.175) | -0.01081 (-0.092)
HenkelAG_CoKGaA | 0.05689 (2.166)** | 0.00299 (0.024)
InfineonTechnol | 0.08688 (1.647)* | -0.01451 (-0.05)
LindeAG | 0.0398 (2.139)** | -0.00323 (-0.026)
RWEAG | 0.02794 (1.61) | 0.00022 (0.002)
SAPAG | 0.06975 (1.867)* | -0.02096 (-0.116)
SiemensAG | 0.07584 (0.618) | -0.02003 (-0.104)
ThyssenKruppAG | 0.02867 (1.502) | 0.03181 (0.189)
TUIAG | 0.06371 (1.883)* | 0.00325 (0.013)
VolkswagenAG | 0.83383 (1.557) | -0.02157 (-0.056)
AarealBankAG | 0.23126 (0.662) | 0.00987 (0.021)
AurubisAG | 0.02557 (1.006) | 0.10877 (0.309)
BeiersdorfAG | -0.0072 (-0.614) | -0.00525 (-0.051)
BilfingerSE | 0.02304 (1.005) | 0.06917 (0.275)
CelesioAG | 0.03495 (1.387) | -0.02736 (-0.197)
EADS | 0.07937 (1.567) | 0.00035 (0.003)
FielmannAG | -0.0253 (-0.541) | -0.20277 (-1.009)
FraportAGFrankf | 0.002 (0.119) | 0.06605 (0.291)
GEAGroupAG | 0.03129 (1.85)* | -0.0407 (-0.209)
HannoverRuckver | 0.00216 (0.135) | -0.03064 (-0.153)
HeidelbergCemen | -0.16728 (-1.813)* | 0.11644 (0.208)
HeidelbergerDru | -0.10022 (-0.474) | 0.17753 (0.195)
HochtiefAG | 0.04365 (1.505) | -0.02354 (-0.101)
IVGImmobilienAG | 0.43883 (2.674)*** | 1.56551 (1.615)
KUKAAG | -0.04659 (-0.825) | 0.24408 (0.542)
MerckKGaA | 0.04165 (1.23) | -0.01845 (-0.146)
ProSiebenSat_1M | -0.03562 (-0.572) | 0.30694 (0.443)
PumaSE | -0.36959 (-0.989) | -0.02676 (-0.024)
SalzgitterAG | 0.02055 (0.501) | -0.02021 (-0.104)
SGLCarbonSE | 0.04771 (0.707) | 0.07965 (0.241)
StadaArzneimitt | 0.04616 (1.58) | 0.00413 (0.019)
SudzuckerAG | 0.02178 (0.755) | -0.14614 (-1.035)
VosslohAG | -0.01667 (-0.576) | -0.0942 (-0.533)
BayWaAG | -0.00435 (-0.114) | -0.16571 (-0.888)
DeutzAG | 0.48808 (1.457) | 0.25279 (0.455)
DurrAG | -0.01031 (-0.832) | 0.62643 (0.969)
ElringKlingerAG | 0.00094 (0.058) | -0.02327 (-0.089)
FuchsPetrolubAG | 0.04588 (0.726) | 0.18127 (0.406)
GerryWeberInter | 0.08627 (3.472)*** | -0.12535 (-0.437)
GildemeisterAG | -0.025 (-0.257) | -0.03029 (-0.098)
SixtAG | 0.22384 (1.675)* | -0.05396 (-0.149)
N | 16913
Adj. R² | 0.0078
AIC | 106662.7
F-Test | 1.03275
CHOW p-Value | 0.4558
u~N(0,1) p-Value | 0
Homoscedasticity | Robust SE
Surprisingly few stocks’ implied volatility changes are significantly explained by search queries. Ten out of the eleven stocks with significance above 90% carry a positive sign. For the “AG” queries the number of available signals dropped; in this sample four out of five significant stocks carry a positive sign. The F-test is very low at 1.26 and the R² is below 2%.

Table 18: Model_mix_GSI_AG
Y = d_3m_impl_vola_t0
Variable X | coefficient (t-ratio) | constant (t-ratio)
AdidasAG | 0.03757 (2.334)** | -0.09995 (-0.432)
AllianzSE | -0.01289 (-1.279) | -0.09793 (-0.285)
BayerAG | 0.02002 (0.699) | 0.00488 (0.027)
CommerzbankAG | 0.04871 (3.103)*** | 0.0101 (0.034)
DaimlerAG | 0.00987 (0.327) | -0.02822 (-0.097)
DeutscheBankA | 0.01995 (0.714) | 0.0106 (0.034)
DeutscheBorse | 0.03175 (1.986)** | 0.00416 (0.028)
DeutscheLufth | -0.00995 (-1.458) | -0.10939 (-0.636)
DeutscheTelek | 0.01224 (0.792) | 0.00994 (0.067)
HenkelAG_CoKG | -0.01274 (-0.781) | -0.06609 (-0.389)
LindeAG | 0.0161 (1.834)* | 0.00436 (0.035)
RWEAG | 0.00398 (0.348) | 0.04839 (0.256)
SAPAG | 0.02911 (1.549) | -0.0159 (-0.084)
SiemensAG | -0.00879 (-0.65) | -0.01956 (-0.101)
ThyssenKruppA | -0.00086 (-0.107) | -0.06874 (-0.292)
VolkswagenAG | 0.26051 (1.398) | 0.02658 (0.066)
BeiersdorfAG | -0.01557 (-3.248)*** | 0.01604 (0.119)
FraportAGFran | 0.0346 (1.505) | 0.10269 (0.441)
SalzgitterAG | 0.01871 (0.682) | 0.01129 (0.058)
BayWaAG | -0.00993 (-0.617) | -0.16877 (-0.91)
DeutzAG | -0.08732 (-1.147) | 0.2092 (0.366)
N | 6818
Adj. R² | 0.013806
AIC | 39930.42
F-Test | 1.26472
CHOW p-Value | 0.06723
u~N(0,1) p-Value | 0
Homoscedasticity | Robust SE
Also for the “Aktie” queries, most stocks carry small positive coefficients, and most of them are not significant. A statistical relationship between an increase in search queries and the implied volatility change is not present for every stock. The R² again is very small.
Table 19: Model_mix_GSI_Aktie
Y = d_3m_impl_vola_t0
Variable X | coefficient (t-ratio) | constant (t-ratio)
AdidasAG | 0.10255 (2.352)** | -0.0815 (-0.369)
AllianzSE | 0.05802 (0.885) | -0.05243 (-0.123)
BayerAG | 0.09578 (1.556) | 0.07215 (0.23)
BayerischeMot | 0.04991 (0.989) | 0.02372 (0.093)
CommerzbankAG | 0.18539 (2.625)*** | -0.0451 (-0.098)
DaimlerAG | 0.14701 (1.163) | -0.029 (-0.104)
DeutscheBankA | -0.00154 (-0.016) | 0.0246 (0.048)
DeutscheLufth | 0.04543 (2.354)** | -0.05314 (-0.204)
E_ONSE | 0.04221 (1.744)* | 0.01149 (0.043)
InfineonTechn | 0.0372 (0.564) | -0.33744 (-0.694)
RWEAG | 0.05322 (2.215)** | -0.00992 (-0.041)
SAPAG | 0.08266 (0.998) | 0.10466 (0.291)
SiemensAG | 0.05583 (0.887) | -0.01895 (-0.099)
ThyssenKruppA | 0.02401 (0.878) | 0.24799 (0.602)
HeidelbergerD | -0.00034 (-0.035) | 3.69333 (3.565)***
DurrAG | 0.00224 (0.07) | 0.26158 (0.243)
N | 3381
Adj. R² | 0.021582
AIC | 20974.86
F-Test | 1.715991
CHOW p-Value | 0.433539
u~N(0,1) p-Value | 0
Homoscedasticity | Robust SE
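Each row of Tables 17 to 19 reports a slope coefficient, its t-ratio and a constant, estimated with robust standard errors. The sketch below computes these quantities for a single stock by simple OLS with White (HC0) heteroskedasticity-robust standard errors. Which robust estimator the study used is not stated, so HC0 is an assumption. The study appears to estimate all stocks within one mixed model (a single N is reported per table); if that model contains only stock-specific constants and slopes, its point estimates coincide with separate per-stock regressions like this one. The data and the helper name `ols_hc0` are hypothetical.

```python
from math import sqrt


def ols_hc0(x, y):
    """Simple OLS y = a + b*x with a White (HC0) heteroskedasticity-robust
    standard error and t-ratio for the slope b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (b - my) for d, b in zip(dx, y)) / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    # HC0 sandwich variance of the slope: sum(dx^2 * e^2) / sxx^2
    var_slope = sum(d * d * e * e for d, e in zip(dx, resid)) / sxx ** 2
    se = sqrt(var_slope)
    return intercept, slope, se, slope / se


# Hypothetical data for one stock: query changes and implied-volatility changes.
gsi = [1, 2, 3, 4]
d_vola = [1, 3, 2, 5]
a, b, se, t = ols_hc0(gsi, d_vola)
print(round(b, 4), round(se, 4), round(t, 3))
```

Unlike the classical formula, the HC0 variance weights each squared residual by the leverage of its observation, so a few volatile weeks cannot make the t-ratio look more precise than the data allow.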
Overall, the results are mixed and quite surprising. Kita and Wang (2012) found that investors’ attention is significantly related to the FX market, with time-varying risk aversion measured by the variance risk premium (= implied volatility minus historical volatility). In their analysis they controlled for macroeconomic uncertainty and news. Vlastakis and Markellos (2012), who also controlled for news, analyzed the 30 largest stocks on the NYSE and confirmed that investors demand more information as their level of risk rises. Based on this research, the above results should display much more significance and larger positive coefficients. One reason could be the absence of external news proxies in the model; adding news proxies to the analysis seems indispensable.
2.3.4 Interim Conclusion
The second hypothesis cannot be verified for individual stocks in general. In the pooled regression over the full sample, a small positive effect of internet search queries on implied volatility is present for “Name” and “Aktie” queries. Moreover, the grouping into historical volatility clusters yielded some predictive power for the one-week-ahead volatility changes. The second hypothesis therefore does not hold for individual stock options and is at most tangentially supported for the aggregated queries, at a low significance level.
To test further time lags and the autocorrelation of both the stock returns and the query data itself, the final part of the study applies Vector Autoregressive Models (VAR).
Part 3 - Vector Autoregressive Model (VAR)
Vector Autoregressive Models were popularized by Sims (1980) as generalizations of univariate autoregressive models. To introduce the method, an example for the BMW stock is presented first, and then the generalization over the full sample is analyzed.
3.1 BMW VAR
The BMW example is used to show the seasonality adjustment. In the VAR model, the BMW return lags, the DAX return lags (as a control for general index performance) and the lags of the seasonally adjusted BMW search query deviations are tested for significance. Based on the AIC, the best model fit uses two lags. The lag order is optimized automatically by the software, and the different lag lengths are compared by their AIC. In general, the AIC is an indicator of model quality: it is calculated from the unexplained (residual) variance plus a penalty for the number of estimated parameters, and a lower AIC represents a better model. Prior to generating the impulse response functions, the lag combinations were checked for the best AIC. Surprisingly, the lag structure is no longer than two weeks. BMW performance is significantly more related to its own lags and the index performance than to search queries, which are not significant. The impulse response graph below, showing the influence of search queries on BMW returns, illustrates this relationship:
Figure 3: Impulse Response of BMW returns to Search Queries
In the graph, the 90% confidence interval lies slightly above the zero line. There could be a tendency for the average to be above zero in the second simulated week, with a return of roughly 0.25% for every standard deviation of new queries. This would imply that search queries are positively related to the stock’s performance within a two-week window, but this cannot be stated with reliable statistical confidence. Overall, a longer lag structure was expected, to account for the decision process involved in buying a new car: several weeks of prior information gathering, deciding on one product and final delivery are not reflected by an AIC-optimal lag order of two. A two-week window for the purchase decision and the final realization of sales in the company’s accounts, thus influencing the stock price, seems very short to produce the above effect. Stock-related queries would make more sense than product-related queries in this setup. Overall, there is no statistical significance to prove this finding. Nonetheless, the graph is in line with Barber and Odean (2008), who argued that stocks are more likely to rise than to fall as a result of attention. Their argument is deduced from the assumption that nonprofessional investors need to learn about companies before they can make an educated selection of one stock, whereas sell decisions are rather spontaneous. If this assumption were true, it should also show up in the pooled data model later. The table of coefficients for the BMW returns is shown below:

Table 27: VAR_BMW
Y = BMW_return%_t0
Variable X | coefficient | t-ratio | p-value
Constant | 0.230319 | 1.1435 | 0.25342
ΔGSI_BMW_lag1 | 0.114614 | 1.2165 | 0.22441
ΔGSI_BMW_lag2 | 0.10718 | 1.1385 | 0.25547
DAX_return%_lag1 | 0.107956 | 1.1158 | 0.26508
DAX_return%_lag2 | 0.332172*** | 3.4497 | 0.00061
BMW_return%_lag1 | -0.0942824 | -1.3838 | 0.16706
BMW_return%_lag2 | -0.178032*** | -2.6303 | 0.00881
N | 484
Adj. R² | 0.018104
AIC | 14.45
F-Test | 2.4842**
Durbin-Watson | 1.98
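The AIC-based lag selection used for the BMW model can be illustrated with a univariate simplification: autoregressive models of order 1 to 4 are fitted by OLS on a common sample, and the order with the lowest AIC = n·ln(RSS/n) + 2k is chosen. This is a sketch on simulated data, not the study's software. A VAR applies the same idea with the log-determinant of the residual covariance matrix (for example via `select_order` on a statsmodels `VAR` model), and AIC values reported by different programs can differ by constants and scaling, so they are comparable only within one program.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ar_aic(series, p, max_lag):
    """AIC = n*ln(RSS/n) + 2k of an AR(p) with constant, fitted by OLS.

    All candidate orders use the same sample (t >= max_lag) so their AICs are comparable.
    """
    rows = [[1.0] + [series[t - j] for j in range(1, p + 1)]
            for t in range(max_lag, len(series))]
    ys = series[max_lag:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    n = len(ys)
    return n * math.log(rss / n) + 2 * k


def select_lag(series, max_lag):
    """Return the AR order with the lowest AIC and the AIC of every candidate order."""
    aics = {p: ar_aic(series, p, max_lag) for p in range(1, max_lag + 1)}
    return min(aics, key=aics.get), aics


# Simulated AR(2) series (hypothetical data, fixed seed for reproducibility).
rng = random.Random(42)
y = [0.0, 0.0]
for _ in range(400):
    y.append(0.5 * y[-1] - 0.3 * y[-2] + rng.gauss(0.0, 1.0))

best, aics = select_lag(y, max_lag=4)
print(best, {p: round(v, 1) for p, v in aics.items()})
```

With a clearly non-zero second lag, the AIC rejects the AR(1) model; it may occasionally pick a slightly longer order, which is a known tendency of this criterion.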
3.2 General VAR
In a last step, a mixed model over the full sample is analyzed with respect to the average search queries in relation to the average performance and volatility. For the price change, the average percentage change is calculated across all stocks in the sample, whereas for volatility an index proxy, the VDAX, is chosen. The VDAX is a synthetically calculated index of the implied volatility of options on the German DAX 30 index, with a constant maturity of one month and an at-the-money strike. This proxy is helpful because many smaller stocks in the sample had options outstanding neither in the past nor today. Prior to generating the impulse response functions, the lag combinations were checked for the best AIC. As a result, the VAR for “Name” queries uses a lag order of six weeks, that for “AG” queries ten weeks and that for “Aktie” queries a smaller lag order of two. For the charts below, a time window of 16 weeks shows the main effects. The vector autoregressive models show high volatility and little to no significance. The average prediction of the impulse response function may indicate a positive relationship for directional price movements, but the confidence interval is very broad. This implies no significance, and the same holds for “AG” and “Name” queries. In the case of “Aktie”, the very first responses are negative, inversely to the other two query types. This may reflect the fact that people who search for the company’s product by “Name” are more likely to increase product sales and thus stock prices than people who monitor their portfolio positions via “AG” or “Aktie” queries. This finding is supported by the impulse response for the implied volatility: in the case of “Aktie”, implied volatility increases in the first week of search queries, with a response above zero at more than 99% confidence.
The impulse response of the implied volatility tends to increase in week six for “Name” and “AG”. This increase is mirrored by negative directional performance of the stocks. However, the relationship shows high variation and low statistical significance; it is not stable.
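The lag selection described above can be sketched as follows. This is a minimal NumPy illustration, not the software used in the study: it fits a VAR(p) by equation-wise OLS for p = 1, ..., max_lag on a common estimation sample and picks the p with the lowest multivariate AIC, here taken in the common form ln|Σ| + 2·(number of coefficients)/T. The function names and the simulated two-variable system used to check it are hypothetical.

```python
import numpy as np


def lagged_design(y, p, max_lag):
    """Regressand Y and regressors X (intercept + p lags) for a VAR(p).

    The first max_lag rows are dropped for every p, so all candidate
    models are estimated on the same sample and their AICs are comparable.
    """
    T = y.shape[0]
    Y = y[max_lag:]
    blocks = [np.ones((T - max_lag, 1))]
    for lag in range(1, p + 1):
        blocks.append(y[max_lag - lag:T - lag])  # y_{t-lag} aligned with y_t
    return Y, np.hstack(blocks)


def var_aic(y, p, max_lag):
    """Multivariate AIC: ln|Sigma_ML| + 2 * (number of coefficients) / T."""
    Y, X = lagged_design(y, p, max_lag)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # equation-by-equation OLS
    resid = Y - X @ B
    n_obs = Y.shape[0]
    sigma = resid.T @ resid / n_obs  # ML estimate of the residual covariance
    _, logdet = np.linalg.slogdet(sigma)
    return logdet + 2.0 * B.size / n_obs


def select_lag_order(y, max_lag):
    """Return the lag order with the lowest AIC and all AIC values."""
    aics = {p: var_aic(y, p, max_lag) for p in range(1, max_lag + 1)}
    return min(aics, key=aics.get), aics
```

In the study's setting, the columns of `y` would be the weekly GSI series and the average price change (or the VDAX change); the upper bound `max_lag` (for example 12 weeks) is an assumption, since the text does not state which range was searched.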
The charts below show the impulse response functions for the directional VAR and the implied volatility model. The grey shaded area is the 99% confidence band.
Figure 4: Price Change t0 in % – GSI Name, lag order 6
Figure 5: delta_implied_vola_in_%__vdax_t0 – GSI Name, lag order 6
Figure 6: Price Change t0 – GSI AG, lag order 10
Figure 7: d_implied_vola__vdax_t0 – GSI AG, lag order 10
Figure 8: Price Change t0 – GSI Aktie, lag order 2
Figure 9: d_implied_vola__vdax_t0 – GSI Aktie, lag order 2
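The impulse responses in these figures trace how a one-off shock to search queries propagates through the estimated VAR. The text does not state which identification was used; the sketch below assumes the common orthogonalized (Cholesky) impulse responses, computed from hypothetical coefficient matrices A_1, ..., A_p and residual covariance Σ via the recursion Φ_0 = I, Φ_i = Σ_{j=1}^{min(i,p)} Φ_{i-j} A_j. Confidence bands such as the grey 99% band would additionally require bootstrap or asymptotic standard errors, which are not shown here.

```python
import numpy as np


def ma_coefficients(lag_matrices, horizon):
    """MA matrices Phi_0..Phi_horizon of a VAR with coefficients A_1..A_p.

    Uses the recursion Phi_0 = I, Phi_i = sum_{j=1}^{min(i,p)} Phi_{i-j} A_j.
    """
    k = lag_matrices[0].shape[0]
    p = len(lag_matrices)
    phi = [np.eye(k)]
    for i in range(1, horizon + 1):
        acc = np.zeros((k, k))
        for j in range(1, min(i, p) + 1):
            acc += phi[i - j] @ lag_matrices[j - 1]
        phi.append(acc)
    return phi


def orthogonalized_irf(lag_matrices, sigma, horizon):
    """Responses to one-standard-deviation Cholesky shocks.

    Entry [h, i, j] is the response of variable i, h periods after a shock
    to variable j. The variable ordering fixes the identification: a
    variable ordered first does not react on impact to later ones.
    """
    P = np.linalg.cholesky(sigma)  # lower triangular with sigma = P @ P.T
    return np.array([phi @ P for phi in ma_coefficients(lag_matrices, horizon)])
```

With the 16-week window used in the charts, `horizon` would be 16; in a hypothetical query-first ordering, `irf[:, 1, 0]` is the response of the second variable (the price change) to a shock in the first (the GSI).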
Da et al. (2011, p.1465) found a positive price shift of 30 basis points (BP) in the first two weeks and a price reversal by the end of the year. This pattern is not visible in the charts above, which show a negative shift in the first two weeks, followed by an upward movement and finally mean reversion up to week 16. As noted, these shifts are not statistically stable. The charts may support Barber and Odean (2008, p.3), who argued that nonprofessional investors need to learn before buying a stock. Excluding the short-run negative effect, the positive returns around week two for “AG” and “Name” tentatively support their argument: the delay would match the time needed to learn and reach an informed decision before investing in a stock and driving its price upwards. Furthermore, Latoeiro, Ramos and Veiga (cf. 2013, p.15) found results supportive of Da et al. (2011). They report a positive return of 11 BP for the first week after and the two weeks before the actual query, negative cumulative returns of -24 BP in week four, and an insignificant but negative sign eight weeks thereafter. Judged by the confidence levels, only their negative four-week returns were significant, at a p-value of 1.3% (see p. 41). It is not clear why stocks should drop four weeks after search queries. The data above also show negative returns in later weeks for “Name” and “AG” queries; the timing is just somewhat later than for Da et al. and for Latoeiro, Ramos and Veiga. From the data it is difficult to make clear statements about the direction of the impact. The statistical significance of all the models cited is weak, as is that of the model above. Hence there is no clear answer with respect to the direction of returns. For the implied volatility, on the other hand, the effect is positive, and the 90% confidence interval lies above the zero line in the first weeks. Queries for “Name” and “Aktie” are clearly positive, while the “AG” response is less clear-cut.
Latoeiro, Ramos and Veiga (2013, p.50) had already shown the positive effect on the EURO STOXX implied volatility index. This effect can also be seen in the German implied volatility.
This is a good point to revisit the limitations already mentioned in the text and to add further points for completeness.
Part 4 – Limitations of Research
This study does not claim to cover all aspects important to the field. Given the time constraints, the following limitations and shortcomings are known:
4.1 Initial Intention for Query
People search for a term for a variety of reasons, and many outcomes are possible. The initial intention could be a desire to buy the product of the respective company. Or a competitor may want to learn about the products, which could have a negative influence on the stock price in the long run. The same negative effect would arise if an annoyed customer wanted to sue the company. Next, an investor may want to buy the stock or sell an existing position. This desire to buy or sell can be distinguished from cases in which the investor already holds a position (long or short), and from cases in which the investor originally gathered information about a second or third company for a peer group review (Peng and Xiong, 2006) and became interested in the searched stock. In all cases the investor may learn or misinterpret. Based on this information, it is uncertain whether the investor will act immediately or later; the cross-learning effect can carry over to future decisions. In addition to the initial intention, external shocks can induce search queries as well. These may be abnormal price changes of the stock or of related stocks, and macro- or microeconomic shocks. Fundamental changes concerning the company or the economy may be announced through news channels such as television, web or print. The news may trigger search queries and lead the investor to rethink his or her positioning, provided that the investor grasps the impact of the news on the stock in a timely manner. With respect to insider information, there are papers which claim that local insider information is first verified by Google queries before being monetized (Mondria and Wu, 2011, p.9). In all cases, the investor's prior level of information can significantly change the outcome of information gathering via search queries, depending on the cognitive capabilities to analyze the data and the actual quality of the information found.
This list alone opens up more than 70 possible cases: to learn, buy, sell, hold, cross-learn, shift in time, or not act at all. Finally, a search query could also be triggered at random by a researcher analyzing the topic, or for some other unknown reason.
4.2 Discussion about Transparency versus Endogeneity
Dimpfl and Jank (2011) already discussed the possibility that the availability of search engine data is not influential at all, because it merely makes visible what was there anyway. That is, people might have used other channels of information before and now substitute Google search queries for these sources. Search data could be understood as a hidden layer, fingerprint or shadow that becomes traceable but remains a random walk. The act of searching represents an internal desire to grasp information about a stock, but whether it results in buying or selling a security is open. Thus the core issue is that search engine data may not directly influence stock prices, because it does not reveal the intention of the user.
4.3 Language Barrier
Other researchers, such as Da et al. (2008), have analyzed the US market and found significant results, which differ from the results of this study. One reason could be differences in language and home bias. Americans may search for American companies, but are unlikely to search for German companies with terms like “Aktie” or “AG”, because they do not speak German. The language barrier skews the search tags toward German investors. In China, Google is not used intensively; Baidu is the main search engine there, so a study of Asian stocks using the Google database would not be representative. In sum, language and culture can be a limitation.
4.4 Technical Limitations
Another source of error is the quality of the Google data, which shows four kinds of weakness. First, the scaling between zero and 100 obscures the real number of queries. Second, search spikes are rescaled retrospectively: a new peak changes the scaled values of earlier periods. Third, the practice of reporting zeros for low search frequencies blanks out large time frames in the overall set, which limits access to the true data. Fourth, data points are not consistently available on a weekly basis, and some series are switched to a monthly frequency. These practices hide the true nature of the data and leave room for speculation.
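The first three weaknesses can be illustrated with a deliberately simplified model of the 0-100 scaling: each value is divided by the series maximum, multiplied by 100 and rounded to an integer. Google's actual procedure additionally normalizes by total search volume and is not documented in full, so the hypothetical function `scale_0_100` below only mimics the effect, not the exact method.

```python
def scale_0_100(raw_counts):
    """Simplified Google-Trends-style index in which the peak maps to 100.

    Assumption: plain max-normalization with rounding to integers; the real
    procedure also divides by total searches and is not public in detail.
    """
    peak = max(raw_counts)
    if peak == 0:
        return [0 for _ in raw_counts]
    return [round(100 * value / peak) for value in raw_counts]


# Four weeks of hypothetical raw query counts.
before = scale_0_100([200, 400, 60, 1])        # [50, 100, 15, 0]
# A later spike of 1,000 queries rescales every earlier week.
after = scale_0_100([200, 400, 60, 1, 1000])   # [20, 40, 6, 0, 100]
```

The week with a single query is reported as 0 in both cases, and the first week falls from 50 to 20 purely because of the later spike, even though its raw count did not change.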
4.5 Alternatives
It would be very valuable to verify the findings by triangulating with data from other search engines, e.g. Yahoo, Bing, Baidu, AOL or others. Although Google covers 83% of initial search queries, such triangulation could support or change the findings. At present, however, comparable data are not publicly available from the other providers.
4.6 Correction for Seasonality
The mathematical correction for seasonality opens room for error. The apparent seasonality could have been just a statistical outlier, or the correction may have been missed inadvertently for some series. A further analysis should check whether the seasonality of Google queries explains the seasonal patterns of the stocks. In this paper it was assumed that most of the stocks' seasonality is absorbed by the control variable for overall market movement.
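Appendix A.6 states that each query series is corrected by its own monthly averages, and the regressions measure query changes in standard deviations. A minimal standard-library sketch of these two steps follows; the function names and data are hypothetical, and the monthly means are taken over the full sample, which would introduce look-ahead if the adjusted series were used for real-time forecasting.

```python
from collections import defaultdict
from statistics import mean, pstdev


def deseasonalize_monthly(values, months):
    """Subtract from each observation the full-sample mean of its calendar month."""
    by_month = defaultdict(list)
    for value, month in zip(values, months):
        by_month[month].append(value)
    month_mean = {m: mean(vs) for m, vs in by_month.items()}
    return [value - month_mean[month] for value, month in zip(values, months)]


def zscore(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(value - mu) / sd for value in values]


gsi = [10, 20, 30, 12, 22, 32]   # hypothetical index values
months = [1, 2, 3, 1, 2, 3]      # calendar month of each observation
adjusted = deseasonalize_monthly(gsi, months)   # [-1, -1, -1, 1, 1, 1]
standardized = zscore(adjusted)                 # [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
```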
queries did not show up in the results. As of 10 June, all three values were reported as “0” for the preceding week. This result opens the discussion as to who is searching: it may matter a great deal whether a star investor or a handful of money managers are searching for stocks. Money managers control billions in capital, and their actions would remain undiscovered in the example above. Moreover, institutional investors do not search with Google: Da et al. (2011, p.1475) argue that they have access to superior research material.
4.10 Access Channel
The most intriguing argument for why internet search queries misrepresent interest is that many people access a known company directly via a browser link. Some may even have saved the company as a “favorite” in their browser toolbar or know the company's web address by heart. Such visits are not registered by the internet search query method.
4.11 Missing Controls
Other control variables which could make search query information more valuable were intentionally left out: news controls, sentiment controls and positioning controls. The opinion an investor holds, in combination with his or her stock position, can make a very big difference to how the information is reflected and which actions are taken thereafter. Having these controls ex ante could substantially change the information content of the search queries.
Having discussed these constraints, the next chapter lists areas for further research and finishes with a short conclusion.
Part 5
5.1 Further Research
There are many areas which could be examined after this study. One of the most important aspects of future research is how to differentiate between queries related to buy intentions and queries related to sell intentions. The control variables listed in the limitations part are very important to include in the framework. One application for investment managers could be a sector ETF rotation strategy steered by Google's category filter; perhaps automobile or airline stocks are more sensitive to their category than to the individual stock's query. In addition, the geographic origin of search queries could give insightful hints, as home bias versus international queries could have an impact. Moreover, an international equity portfolio manager could use country data and relative comparisons of local consumer index data to rebalance an international portfolio wherever abnormally high search activity for consumer products arises. The class of inflation-linked bonds could be affected by nowcasting of unemployment and consumer products, as in the studies reported in the literature review in the first part. In the area of commodities, too, there is not much research yet. As commodities are simpler to search for, e.g. with terms like “oil” or “fuel prices”, the effects on the market could be easier to identify than with difficult company names. With respect to the real economy, the information content of search queries could help individual companies plan the capacity utilization of their production facilities. Whenever a company is searched for, the probability of subsequent sales increases; many searches may lead to higher sales numbers, and production could be initiated earlier.
5.2 Conclusion
The first and second hypotheses about the directional effect of internet search queries on capital markets were analyzed within this study and cannot be verified at a statistically significant level. The first hypothesis stated: “An increase of internet search queries for a specific stock is a directional indicator of its later price movements” and the second hypothesis stated: “An increase of internet search queries for a specific stock, increases its implied option volatility premium”. Several studies of the US market found significant relationships between queries and price movements and motivated further analysis of the topic. Therefore, German stocks from a sample of DAX, MDAX and SDAX constituents were chosen and analyzed with different search terms: the “Name” of a company, and the name in combination with the suffix “AG” or “Aktie”. The assumption was that “Name” queries relate to the products of a company, while “AG” and “Aktie” queries relate to the stock. The outcomes are contradictory and differ among the three groups, owing to data peculiarities and the different number of available time series: the “Name” queries were available at the highest frequency, while the other two terms are apparently searched less frequently. The quality deficiency becomes obvious when the samples are split for stability tests: the signs of the regression coefficients often flip. In contrast to prior studies, the effects in this study are thus generally not statistically significant. Overall, the VAR models, despite including more than one lag, did not give a better indication of the directional effects than the simpler regression models with only one lag. In addition to the direct control variables of transaction volume, index performance and autocorrelation via last week's returns, the simpler regression models were analyzed with respect to level controls by splitting the data into quantile groups. The quantile regressions are conducted for Google search query intensity, for Price-to-Book valuation in interaction with queries, for 52-week highs and lows in interaction with queries and, finally, for historical volatility in interaction with queries. As shown by other authors, the effects of Price-to-Book and 52-week highs and lows could be reproduced within acceptable bounds of variation. As a new finding, the query regressions on implied option volatility hint at a small positive relationship of 0.1379% at the 95% confidence level, induced by a one-standard-deviation increase in seasonally adjusted queries for the company's name. Moreover, stocks with very high historical volatility showed an increase of +1.58% related to queries at 95% confidence, and a reversion of -0.61% in the following week. However, an adjusted R² below 0.1% does not allow for a trading strategy based on this model, and the relationship broke down when the sample was split in half.
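The level-control design summarized above can be sketched as an OLS regression with one intercept and one query slope per quantile group of a level variable (for example historical volatility), similar in layout to Table 24. This is a hypothetical NumPy illustration: it omits the other controls (turnover change, lagged return), uses five groups, and reports the adjusted R² mentioned above.

```python
import numpy as np


def quantile_groups(level, n_groups=5):
    """Label observations 0..n_groups-1 by the empirical quantile of `level`."""
    inner_edges = np.quantile(level, np.linspace(0.0, 1.0, n_groups + 1)[1:-1])
    return np.searchsorted(inner_edges, level, side="right")


def interaction_design(level, gsi, n_groups=5):
    """Group intercepts followed by group-specific query slopes (no common constant)."""
    groups = quantile_groups(level, n_groups)
    dummies = (groups[:, None] == np.arange(n_groups)).astype(float)
    return np.hstack([dummies, dummies * gsi[:, None]])


def ols_with_adj_r2(X, y):
    """OLS coefficients, R^2 and adjusted R^2.

    The centered R^2 is valid here because the group dummies sum to a constant.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return beta, r2, adj_r2
```

The last five coefficients are the group-specific query effects; a significant slope only in the top group would correspond to the high-volatility finding, while the adjusted R² shows how little of the return variation it explains.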
To sum up, the directional effects of internet search queries on capital markets cannot be statistically verified in this study. More research with more external control variables is needed. Part of the model's insignificance can be explained by not including news and investor sentiment ex ante in the framework. It can be said that simple internet search queries do not carry the information content in a form simple enough to reveal the directional effects. To extract information about future directional movements, more details about the entities searching via internet queries have to be integrated into multidimensional regressions.
Bibliography
Askitas, N. and Zimmermann, K. F., 2009. Google econometrics and unemployment forecasting. Applied Economics Quarterly 55(2), 107-120.
Asur, S. and Huberman, B. A., 2010. Predicting the Future with Social Media. arXiv:1003.5699v1, available at http://arxiv.org/pdf/1003.5699v1, accessed May 2013.
Bank, M., Larch, M. and Peter, G., 2010. Google Search Volume and Its Influence on Liquidity and Returns of German Stocks. Financial Markets and Portfolio Management 25, 239-264.
Barber, B. M. and Odean, T., 2001. The Internet and the Investor. Journal of Economic Perspectives 15, 41-54.
Barber, B. M. and Odean, T., 2008. All That Glitters: The Effect of Attention and News on the Buying Behavior of Individual and Institutional Investors. Review of Financial Studies 21, 785-818.
Black, F. and Scholes, M., 1973. The Pricing of Options and Corporate Liabilities. Journal of Political Economy 81(3), 637-654.
Bollen, J., Mao, H. and Zeng, X. J., 2011. Twitter mood predicts the stock market. Journal of Computational Science 2(1), 1-8.
Bollerslev, T., 1986. Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics 31, 307-327.
Brooks, C., 2008. Introductory Econometrics for Finance. Cambridge University Press.
Carhart, M. M., 1997. On persistence in mutual fund performance. Journal of Finance 52(1), 57-82.
Chan, W. S., 2003. Stock price reaction to news and no-news: Drift and reversal after headlines. Journal of Financial Economics 70(2), 223-260.
Choi, H. and Varian, H., 2009a. Predicting the present with Google Trends. Google Inc., available at http://google.com/googleblogs/pdfs/google_predicting_the_present.pdf, accessed May 2013.
Choi, H. and Varian, H., 2009b. Predicting initial claims for unemployment benefits. Google Inc., available at http://research.google.com/archive/papers/initialclaimsUS.pdf, accessed May 2013.
D’Amuri, F. and Marcucci, J., 2010. Google it! Forecasting the US unemployment rate with a Google job search index. MPRA paper, University Library of Munich, Germany.
Da, Z., Engelberg, J. and Gao, P., 2010. In Search of Earnings Predictability. Working Paper, available at SSRN: http://ssrn.com/abstract=15898057, accessed May 2013.
Da, Z., Engelberg, J. and Gao, P., 2011. In Search of Attention. Journal of Finance 66, 1461-1499.
Da, Z., Engelberg, J. and Gao, P., 2013. The Sum of All FEARS: Investor Sentiment and Asset Prices. Available at SSRN: http://ssrn.com/abstract=1509162, accessed May 2013.
Daniel, K. D., Hirshleifer, D. and Subrahmanyam, A., 1998. Investor Psychology and Security Market Under- and Overreactions. The Journal of Finance 53, 1839-1885.
DeBondt, W. F. M. and Thaler, R. H., 1985. Does the Stock Market Overreact? Journal of Finance 40, 793-805.
Deutsche Börse, 2013. Guide to the Equity Indices of Deutsche Börse. Available at http://www.dax-indices.com/EN/MediaLibrary/Document/Equity_L_6_19_e.pdf, accessed May 2013.
Dickey, D. A. and Fuller, W. A., 1979. Distribution of the Estimators for Autoregressive Time Series with a Unit Root. Journal of the American Statistical Association 74, 427-431.
Dimpfl, T. and Jank, S., 2011. Can internet search queries help to predict stock market volatility? CFR Working Papers 11-15, University of Cologne, Centre for Financial Research, CFR.
Drake, S., Roulstone, D. and Thornock, J., 2012. Investor Information Demand: Evidence from Google Searches Around Earnings Announcements. Journal of Accounting Research 50, 1001-1040.
Engle, R. F. and Granger, C. W. J., 1987. Co-integration and error correction: Representation, estimation, and testing. Econometrica 55, 251-276.
Fama, E. F. and French, K. R., 1993. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33(1), 3-56.
Fink, C. and Johann, T., 2013. May I Have Your Attention, Please: The Market Microstructure of Investor Attention. Available at SSRN: http://ssrn.com/abstract=2139313, accessed May 2013.
Ginsberg, J., Mohebbi, M., Patel, R., Brammer, L., Smolinski, M. and Brilliant, L., 2009. Detecting influenza epidemics using search engine query data. Nature 457, 1012-1014.
Goel, S., Hofman, J. M., Lahaie, S., Pennock, D. M. and Watts, D. J., 2010. Predicting Consumer Behavior with Web Search. Proceedings of the National Academy of Sciences of the United States of America (PNAS) 107, 17486-17490.
Google, 2010. What does a flat line on the graph indicate? Google Inc., available at https://support.google.com/trends/answer/87286?hl=en&ref_topic=13975, accessed May 2013.
Google, 2012a. How is the data scaled? Google Inc., available at http://support.google.com/trends/answer/87282?hl=en&ref_topic=13975, accessed May 2013.
Google, 2012b. Insights into what the world is searching for -- the new Google Trends. Inside Search – The official Google Search blog, Google Inc., available at http://insidesearch.blogspot.co.il/2012/09/insights-into-what-world-is-searching.html, accessed May 2013.
Google, 2012c. What do the numbers on the graph mean? Google Inc., available at http://support.google.com/trends/answer/87285?hl=en&uls=en, accessed May 2013.
Grinblatt, M., Titman, S. and Wermers, R., 1995. Momentum investment strategies, portfolio performance, and herding: A study of mutual fund behavior. American Economic Review 85(5), 1088-1105.
Guzman, G., 2011. Internet Search Behavior as an Economic Forecasting Tool: The Case of Inflation Expectations. Journal of Economic and Social Measurement 36(3).
Hirshleifer, D. and Shumway, T., 2003. Good Day Sunshine: Stock Returns and the Weather. Journal of Finance 58, 1009-1032.
Jegadeesh, N. and Titman, S., 1993. Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency. The Journal of Finance 48(1), 65-91.
Kahneman, D. and Tversky, A., 1979. Prospect theory: An analysis of decision under risk. Econometrica 47(2), 263-291.
Kholodilin, A., Podstawski, M. and Siliverstovs, B., 2010. Do Google Searches Help in Nowcasting Private Consumption? A Real-Time Evidence for the US. Discussion Paper No. 997 (April), available at http://www.diw.de/documents/publikationen/73/diw_01.c.356220.de/dp997.pdf, accessed May 2013.
Kita, A. and Wang, Q., 2012. Investor Attention and FX Market Volatility. Available at SSRN: http://ssrn.com/abstract=2022100, accessed May 2013.
Lynn, W. and Brynjolfsson, E., 2009. The Future of Prediction: How Google Searches Foreshadow Housing Prices and Sales. NBER Conference Technological Progress & Productivity Measurement, WISE, 2009; ICIS, 2009, available at http://www.nber.org/confer/2009/PRf09/Wu_Brynjolfsson.pdf, accessed May 2013.
Merton, R. C., 1987. A simple model of capital market equilibrium with incomplete information. Journal of Finance 42(3), 483-510.
Mondria, J. and Wu, T., 2010. The Puzzling Evolution of the Home Bias, Information Processing and Financial Openness. Journal of Economic Dynamics and Control 34, 875-896.
Mondria, J. and Wu, T., 2011. Asymmetric Attention and Stock Returns. AFA 2012 Chicago Meetings Paper, available at SSRN: http://ssrn.com/abstract=1772821, accessed May 2013.
Netmarketshare, 2013. Total Market Share. Available at http://www.netmarketshare.com/ → “Market Share Reports” → “Search Engines”, accessed May 2013.
Peng, L. and Xiong, W., 2006. Investor attention, overconfidence and category learning. Journal of Financial Economics 80(3), 563-602.
Polgreen, P. M., Chen, Y., Pennock, D. M. and Forrest, N. D., 2008. Using internet searches for influenza surveillance. Clinical Infectious Diseases 47, 1443-1448.
Rajgopal, S., Kotha, S. and Venkatachalam, M., 2000. The Relevance of Web Traffic for Internet Stock Prices. Working Paper, University of Washington.
Ramos, S. B., Veiga, H. and Latoeiro, P., 2013. Predictability of stock market activity using Google search queries. Statistics and Econometrics Working Papers, Universidad Carlos III, Departamento de Estadística y Econometría, available at http://EconPapers.repec.org/RePEc:cte:wsrepe:ws130605, accessed May 2013.
Rubin, A. and Rubin, E., 2010. Informed investors and the Internet. Journal of Business Finance & Accounting 37(7/8), 841-865.
Schmidt, T. and Vosen, S., 2011. Forecasting Private Consumption: Survey-based Indicators vs. Google Trends. Journal of Forecasting 30(6), 565-578.
Sharpe, W., 1964. Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk. Journal of Finance, 425-442.
Sims, C. A., 1980. Macroeconomics and Reality. Econometrica 48, 1-48.
Smith, G. P., 2012. Google Internet Search Activity and Volatility Prediction in the Market for Foreign Currency. Finance Research Letters 9, 103-110.
Spink, A., Wolfram, D., Jansen, B. J. and Saracevic, T., 2001. Searching the web: The public and their queries. Journal of the American Society for Information Science and Technology 52(3), 226-234.
Tetlock, P. C., 2007. Giving Content to Investor Sentiment: The Role of Media in the Stock Market. Journal of Finance 62, 1139-1168.
Tversky, A. and Kahneman, D., 1974. Judgment under uncertainty: Heuristics and biases. Science 185, 1124-1130.
Vlastakis, N. and Markellos, R. N., 2012. Information demand and stock market volatility. Journal of Banking & Finance 36(6), 1808-1821.
Wang, Y., 2012. Media and Google: The Impact of Information Supply and Demand on Stock Returns. Available at SSRN: http://ssrn.com/abstract=2180409, accessed May 2013.
Webhits, 2013. Suchmaschinen [Search engines]. Available at http://www.webhits.de/artwork/ws_engines_druck.png, accessed May 2013.
Appendix
A.1 Frequency Distributions
Figure 10: Mixed Name
Figure 11: Z mixed Name
Figure 12: Mixed AG
Figure 13: Z mixed AG
Figure 14: Mixed Aktie
Figure 15: Z mixed Aktie
A.2 Q-Q-Plot
When the stock returns and the GSI inputs are plotted, the fat tails of the GSI data become visible. On visual inspection, no linear relationship is apparent at first sight, and the GSIs cannot be translated directly into a function. The “Aktie” data appear to be the most normally distributed of the three. The Q-Q-plots below show the deviation from the normal distribution in the outer tails.
Figure 16: Q-Q-plot of standardized mixed Queries
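The points of such a Q-Q-plot can be computed as below: the ordered, standardized sample is paired with standard normal quantiles at the plotting positions (i - 0.5)/n. This is a minimal standard-library sketch with a hypothetical function name; the plotting-position convention is one common choice and not necessarily the one used for Figure 16. Fat tails show up as empirical values lying beyond the theoretical ones at both ends.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pairs (theoretical N(0,1) quantile, standardized order statistic)."""
    ordered = sorted(sample)
    n = len(ordered)
    mu, sd = mean(ordered), stdev(ordered)  # sample standard deviation
    normal = NormalDist()
    return [
        (normal.inv_cdf((i - 0.5) / n), (x - mu) / sd)
        for i, x in enumerate(ordered, start=1)
    ]
```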
A.3 Table of Aggregate Inputs
After controlling for seasonality and data quality, the following numbers of Google search query time series remained:
Table 20: Number of available Data Points by Index and Search Term

                 Number of terms available
Search term      DAX   MDAX   SDAX   Stocks   Data Points
Common Name       25     27     28       80         29000
AG                21      4      2       27          7639
Aktie             15      1      2       18          3283
News               5      1      0        6           856
A.4 Alternative Model Specifications
Most of the models above show fat tails and do not seem to be adequately specified (with respect to the Ramsey RESET test for non-linearity). The regression for the “Name” queries might therefore be more stable under other specifications. These alternatives are tested below:
Table 21: Regression of Price Changes for different Setups of non-linear Specification

Model Specification  Variable       Y = Price_%   Y = Price_%^2   Y = Abs(Price_%)
Z_NAME               Constant       0.2396***     30.7663***      3.6631***
                     Zmix_name      -0.0427       10.1682***      0.271***
                     R^2            0.0000        0.0011          0.0037
                     AIC            164516.88     372116.24       149417.31
                     F              1.39          29.07           99.21
Abs Z_name           Constant       0.2083***     22.7514***      3.4118***
                     absZ_name      0.0608        11.3855***      0.26***
                     R^2            0.0000        0.0009          0.0020
                     AIC            207167.79     468070.05       187579.78
                     F              2.04          30.85           67.00
3.root Z_name        Constant       0.2375***     28.2488***      3.5378***
                     Z_name_w3      -0.0227       9.1119***       0.2677***
                     R^2            0.0000        0.0006          0.0023
                     AIC            207169.53     468079.54       187569.90
                     F              0.31          21.35           76.90
2.root Z_name        Constant       0.242***      18.3989***      3.22***
                     Z_name_w2orO   -0.0115       27.6608***      0.8304***
                     R^2            0.0000        0.0015          0.0081
                     AIC            131808.53     304298.96       120046.32
                     F              0.02          32.92           173.46
Z_name^2             Constant       0.2302***     27.0493***      3.5122***
                     Z_name_^2      0.0105        1.6773***       0.0353***
                     R^2            0.0000        0.0005          0.0009
                     AIC            207168.26     468083.64       187614.81
                     F              1.57          17.25           31.92
Evaluation: The model specification improves for the square-root specification and for abs(y) as the dependent variable. This is useful to know, but it does not help explain the direction of returns induced by queries.
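The grid of specifications in Table 21 can be reproduced in outline with the sketch below, which fits y = a + b·x for every combination of regressor transform and dependent-variable transform and reports the constant, slope, R², AIC and F statistic. It is a hypothetical NumPy illustration: the AIC uses the Gaussian log-likelihood convention (one common choice), and “2.root” is implemented as a signed square root so that negative z-scores are kept; how the study treated negative values is not stated.

```python
import numpy as np


def ols_stats(x, y):
    """Fit y = a + b*x by OLS; return a, b, R^2, AIC and the F statistic."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    ssr = resid @ resid
    r2 = 1.0 - ssr / np.sum((y - y.mean()) ** 2)
    # Gaussian log-likelihood at the ML variance ssr / n (assumed AIC convention).
    loglik = -0.5 * n * (np.log(2.0 * np.pi) + np.log(ssr / n) + 1.0)
    aic = -2.0 * loglik + 2.0 * k
    f_stat = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))
    return beta[0], beta[1], r2, aic, f_stat


REGRESSORS = {
    "Z_NAME": lambda z: z,
    "Abs Z_name": np.abs,
    "3.root Z_name": np.cbrt,
    "2.root Z_name": lambda z: np.sign(z) * np.sqrt(np.abs(z)),  # assumption: signed root
    "Z_name^2": np.square,
}
TARGETS = {
    "Price_%": lambda r: r,
    "Price_%^2": np.square,
    "Abs(Price_%)": np.abs,
}


def specification_grid(z, returns):
    """OLS statistics for every (regressor transform, target transform) pair."""
    return {
        (x_name, y_name): ols_stats(fx(z), fy(returns))
        for x_name, fx in REGRESSORS.items()
        for y_name, fy in TARGETS.items()
    }
```

Note that AIC values are only comparable across models with the same dependent variable and the same sample, so the comparison in Table 21 is best read within each column.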
A.5 Seasonality of Returns
Stock returns sometimes differ statistically from month to month. The table below shows these monthly effects and indicates that part of the seasonality can be removed by adding the market index to the regression, assuming that some of the stock's seasonality is induced by the seasonality of the overall market.
Table 22: Model_Seasonalities (Y = price_r_t0, dependent variable named per column; coefficients with t-statistics in parentheses)

Variable X       DAX_return_t0         BMW_return_t0         BMW_return_t0         BMW_alpha_t0
Constant         -                     -                     -                     -
January          -0.186744 (-0.387)    -0.149937 (-0.221)    0.045400 (0.099)      0.036279 (0.079)
February         0.092500 (0.185)      0.154376 (0.219)      0.057620 (0.122)      0.062500 (0.132)
March            0.252045 (0.529)      0.927142 (1.380)      0.663500 (1.471)      0.675227 (1.497)
April            0.823953 (1.710) *    1.142450 (1.681) *    0.280593 (0.613)      0.318140 (0.697)
May              -0.121951 (-0.247)    -0.131856 (-0.190)    -0.004294 (-0.009)    -0.009269 (-0.019)
June             -0.137895 (-0.269)    -0.303368 (-0.420)    -0.159129 (-0.328)    -0.165000 (-0.339)
July             0.405122 (0.821)      0.927558 (1.333)      0.503798 (1.077)      0.526317 (1.120)
August           -0.480526 (-0.937)    -1.06339 (-1.471)     -0.560754 (-1.154)    -0.582368 (-1.200)
September        0.287436 (0.568)      0.718719 (1.007)      0.418059 (0.872)      0.432051 (0.902)
October          -0.059024 (-0.120)    -0.184720 (-0.265)    -0.122980 (-0.263)    -0.125366 (-0.268)
November         0.475135 (0.914)      -0.250118 (-0.341)    -0.747113 (-1.518)    -0.724595 (-1.473)
December         0.841951 (1.706) *    0.959569 (1.379)      0.078882 (0.168)      -0.116829 (0.250)
DAX_return_t0    -                     -                     1.04601 (24.05) ***   -
R²               0.015463              0.021811              0.55999               0.017559
AIC              2509.393              2843.502              2457.230              2456.423
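The effect of adding the market index can be checked with a small sketch: with a full set of month dummies and no constant, each month coefficient equals that month's mean return, and once the market return enters with slope β, the stock's month coefficient becomes its own month mean minus β times the market's month mean. This is why the third column of Table 22 follows from the first two, e.g. for April 1.142450 - 1.04601 × 0.823953 ≈ 0.280593. The function names and the simulated data in the check are hypothetical.

```python
import numpy as np


def month_dummies(months):
    """One 0/1 column per calendar month 1..12 (used instead of a constant)."""
    months = np.asarray(months)
    return (months[:, None] == np.arange(1, 13)).astype(float)


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def month_effects_with_market(stock, market, months):
    """Month effects of the stock without and with the market return as control."""
    D = month_dummies(months)
    plain = ols(D, stock)                            # month means of the stock
    market_means = ols(D, market)                    # month means of the market
    full = ols(np.column_stack([D, market]), stock)  # month dummies + market beta
    return plain, market_means, full[:12], full[12]
```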
A.6 Seasonality of Search Queries
Using BMW as an example, the middle column of Table 23 shows that search queries are seasonally affected: the January, March and October coefficients are significant. The columns to the left and right present two methods of adjustment. On the left, the BMW queries are simply corrected by the overall search queries for “cars”; on the right, the time series is corrected by its own monthly averages. As the table shows, the correction by the series' own monthly averages works better than the category method: all monthly coefficients are close to zero and R² falls to 0.000113, whereas the category-adjusted series still has a significant September coefficient and an R² of 0.022661. For this reason, the monthly method is used for every sample in the study.
Table 23: Model_Seasonalities, Y = GSI (t-statistics in parentheses)
Variable X | GSI_BMW_minus_category | GSI_BMW              | GSI_BMW_minus_monthly_average
Constant   | -                      | -                    | -
January    | -0.343905 (-0.8152)    | 0.767442 (2.2031)**  | 0.050775 (0.1458)
February   | -0.387752 (-0.8865)    | 0.075 (0.2077)       | 2.19E-17 (0)
March      | 0.475143 (1.1393)      | 0.772727 (2.2439)**  | -0.032273 (-0.0937)
April      | -0.516716 (-1.2248)    | -0.395349 (-1.1349)  | 0.004651 (0.0134)
May        | 0.092571 (0.2143)      | -0.121951 (-0.3418)  | -0.038618 (-0.1082)
June       | -0.177281 (-0.395)     | -0.289474 (-0.7812)  | -0.00614 (-0.0166)
July       | -0.278156 (-0.6438)    | 0.439024 (1.2306)    | -0.00542 (-0.0152)
August     | 0.547847 (1.2208)      | -0.157895 (-0.4261)  | -0.002339 (-0.0063)
September  | 0.814324 (1.8383)*     | -0.025641 (-0.0701)  | -0.003419 (-0.0093)
October    | -0.248108 (-0.5743)    | -0.731707 (-2.051)** | -0.009485 (-0.0266)
November   | 0.011184 (0.0246)      | -0.594595 (-1.5833)  | 0.005405 (0.0144)
December   | 0.428091 (0.9908)      | -0.097561 (-0.2735)  | 0.035772 (0.1003)
R²         | 0.022661               | 0.041283             | 0.000113
AIC        | 2380.134               | 2193.991             | 2193.991
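Both adjustments are simple transformations of the search series. The following is a minimal NumPy sketch, assuming the GSI is an array aligned with calendar months and that a category series (hypothetically, searches for “cars”) is on the same scale. The study's exact scaling of the GSI is not reproduced, and the data below are synthetic. When the monthly averages are computed on the same sample, the adjusted series has a zero mean in every calendar month by construction.

```python
import numpy as np

def adjust_by_monthly_average(gsi, months):
    """'Monthly' method: subtract from each observation the mean of its calendar month."""
    gsi = np.asarray(gsi, dtype=float)
    months = np.asarray(months)
    adjusted = gsi.copy()
    for m in np.unique(months):
        in_month = months == m
        adjusted[in_month] -= gsi[in_month].mean()
    return adjusted

def adjust_by_category(gsi, category_gsi):
    """'Category' method: subtract the search index of a broader category (e.g. 'cars')."""
    return np.asarray(gsi, dtype=float) - np.asarray(category_gsi, dtype=float)

# Synthetic weekly data (hypothetical): a seasonal BMW series and a category series
# whose seasonal pattern only partly matches it
weeks = np.arange(5 * 52)
months = (weeks // 4) % 12 + 1      # crude week-to-month mapping, for illustration only
season = np.sin(2 * np.pi * months / 12)
rng = np.random.default_rng(1)
gsi_bmw = 2.0 * season + rng.normal(0.0, 0.5, weeks.size)
gsi_cars = 1.0 * season + rng.normal(0.0, 0.5, weeks.size)

monthly = adjust_by_monthly_average(gsi_bmw, months)
category = adjust_by_category(gsi_bmw, gsi_cars)
```

In this synthetic example the category method leaves part of the seasonal pattern in the series because the category's seasonality is weaker than BMW's own, which mirrors the residual September effect in the left column of Table 23.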
A.7 Trading Strategies
The next three regression tables could be read as trading strategies if the regressions were statistically significant. The regressions aim to explain the one-week-ahead performance (t+1). If a regression coefficient were significant, one could buy the stock when the coefficient is positive and profit the following week, or sell the stock short when the coefficient is negative. The last table (Table 26) covers the change in implied option volatility; the same principle would apply there, but trading the stock's options with a delta hedge in the underlying. Since the regressions are not significant, this is not a valid basis for action. Nevertheless, the results are reported below.
Table 24: Model_PB_ZMix, Y = price_r_t+1 (t-statistics in parentheses)
Variable X           | Name                  | AG                     | Aktie
Constant             | 0.421736 (2.883)***   | 0.43044 (2.907)***     | 0.442094 (2.949)***
Quantile 1 intercept | -0.215069 (-1.287)    | -0.134965 (-0.695)     | -0.57877 (-1.971)**
Quantile 2 intercept | -0.171145 (-1.039)    | -0.379649 (-1.941)*    | -0.262012 (-0.993)
Quantile 3 intercept | -0.196974 (-1.217)    | -0.296895 (-1.374)     | -0.101259 (-0.322)
Quantile 4 intercept | -0.281837 (-1.711)*   | -0.064775 (-0.277)     | -0.305858 (-1.356)
Quantile 5 intercept | -0.247355 (-1.456)    | -0.469019 (-2.214)**   | -0.371463 (-1.086)
Quantile 1 and GSI   | -0.073986 (-0.825)    | 0.038916 (0.247)       | 0.254419 (0.831)
Quantile 2 and GSI   | -0.080287 (-0.845)    | 0.081183 (0.483)       | -0.06621 (-0.279)
Quantile 3 and GSI   | -0.173647 (-2.049)**  | -0.24463 (-1.518)      | -0.14928 (-0.41)
Quantile 4 and GSI   | -0.022303 (-0.245)    | 0.151431 (0.513)       | -0.282886 (-1.456)
Quantile 5 and GSI   | -0.110488 (-1.008)    | -0.089537 (-0.44)      | -0.00751 (-0.024)
Δ Turnover           | -0.048343 (-1.752)*   | -0.088388 (-1.831)*    | -0.094518 (-1.411)
Price_r_t-1          | 0.055912 (5.32)***    | 0.05186 (3.144)***     | 0.068848 (3.346)***
Avg_Price_r_t0       | -0.076129 (-3.97)***  | -0.123164 (-3.075)***  | -0.15252 (-2.752)***
N                    | 24243                 | 9037                   | 5471
Adj. R²              | 0.005525              | 0.007530               | 0.010212
AIC                  | 153044.6              | 59129.08               | 36963.75
F-Test               | 4.7888093             | 2.611346***            | 2.334066***
CHOW p-Value         | 0.6814                | 0.650125               | 7.06e-118
u~N(0,1) p-Value     | 0                     | 0                      | 0
Homoscedasticity     | robust SE             | robust SE              | robust SE
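The trading rule described at the start of this section can be sketched as follows. This is a minimal illustration of the mechanics only, assuming a two-sided 5% critical value of 1.96, a single stock and no transaction costs; the helper names are hypothetical, and the author explicitly does not regard these results as a valid basis for trading.

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    """A regression coefficient and its t-statistic, as reported in the tables."""
    coefficient: float
    t_stat: float

def position(est: Estimate, critical_t: float = 1.96) -> int:
    """+1 = buy, -1 = sell short, 0 = no trade because the coefficient is not significant."""
    if abs(est.t_stat) < critical_t:
        return 0
    return 1 if est.coefficient > 0 else -1

def one_week_pnl(pos: int, price_t0: float, price_t1: float, notional: float = 1.0) -> float:
    """Profit from holding `pos` with the given notional from t0 to t+1 (no costs)."""
    return pos * notional * (price_t1 / price_t0 - 1.0)

# Two coefficients from Table 24 (column 'Name'):
q3_gsi = Estimate(-0.173647, -2.049)   # Quantile 3 and GSI
q1_gsi = Estimate(-0.073986, -0.825)   # Quantile 1 and GSI
print(position(q3_gsi), position(q1_gsi))   # -1 0
```

Only the Quantile 3 interaction clears the threshold here, so the sketch would short the stock in that case and stay out otherwise; a single significant coefficient among many does not, of course, make the overall regression significant.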
Table 25: Model_52weekHighLow_Zmix, Y = price_r_t+1 (t-statistics in parentheses)
Variable X         | Name                   | AG                     | Aktie
Constant           | 0.325864 (7.105)***    | 0.232593 (2.716)***    | 0.323586 (2.425)**
High10%            | 0.016758 (0.245)       | 0.117386 (0.979)       | -0.051054 (-0.244)
Low10%             | -0.587202 (-3.776)***  | -0.599956 (-2.295)**   | -0.767232 (-1.956)*
High10%_and_GSI    | -0.029818 (-0.598)     | 0.133711 (1.262)       | 0.044375 (0.21)
Low10%_and_GSI     | -0.125695 (-0.694)     | -0.517215 (-1.593)     | 0.147915 (0.539)
Δ Turnover         | -0.061862 (-2.331)**   | -0.037387 (-0.849)     | -0.151155 (-1.579)
Price_r_t0         | -0.035501 (-2.132)**   | -0.082471 (-1.885)*    | 0.0967 (2.081)**
Avg_Price_r_t0     | -0.058792 (-2.525)**   | -0.053645 (-1.297)     | -0.286785 (-3.93)***
N                  | 23960                  | 6727                   | 2810
Adj. R²            | 0.0043                 | 0.0108                 | 0.0152
AIC                | 151363.12              | 41869.52               | 18237.17
F-Test             | 7.17                   | 3.31                   | 3.01
CHOW p-Value       | 0.70                   | 0.9038                 | 0.0420
u~N(0,1) p-Value   | 0                      | 0                      | 0
Homoscedasticity   | Robust SE              | Robust SE              | Robust SE
Reset-Test p-Value | 8.20e-5                | 4.58e-25               | 2.961e-6
Non-lin p-Value    | 1.19e-5                | 0.01495                | 0.02746
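The definitions of High10% and Low10% are not restated in this appendix. One plausible reading, used here purely as an assumption, flags weeks in which the price is within 10% of its trailing 52-week high or low, and forms the "_and_GSI" terms as the product of the flag and the GSI. A minimal NumPy sketch under those assumptions, with hypothetical function names:

```python
import numpy as np

def high_low_flags(prices, window=52, band=0.10):
    """Flag weeks in which the price lies within `band` of its trailing `window`-week high or low.

    Hypothetical reading of High10% / Low10%; the first window - 1 weeks are never flagged
    because a full 52-week history is not yet available.
    """
    prices = np.asarray(prices, dtype=float)
    high = np.zeros(prices.size, dtype=bool)
    low = np.zeros(prices.size, dtype=bool)
    for t in range(window - 1, prices.size):
        past = prices[t - window + 1 : t + 1]          # trailing window including week t
        high[t] = prices[t] >= (1.0 - band) * past.max()
        low[t] = prices[t] <= (1.0 + band) * past.min()
    return high, low

def interact(flag, gsi):
    """Interaction term such as High10%_and_GSI: the GSI value where the flag is set, else 0."""
    return np.asarray(flag, dtype=float) * np.asarray(gsi, dtype=float)

prices = np.linspace(100.0, 151.0, 52)   # steadily rising price over one year
high, low = high_low_flags(prices)
print(high[-1], low[-1])                 # True False
```

Note that a flat price series is flagged as both near its high and near its low, since both bands then contain the current price.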
Table 26: Model_HistVola&GSI_ZMix, Y = d_implied_vola_t+1 (t-statistics in parentheses)
Variable X           | Name                  | AG                    | Aktie
Constant             | 0.160691 (2.205)**    | 0.158549 (2.157)***   | 0.092543 (0.715)
Quantile 1 intercept | -0.153415 (-1.353)    | -0.088293 (-0.792)    | 0.086776 (0.336)
Quantile 2 intercept | -0.060945 (-0.39)     | -0.027732 (-0.219)    | -0.327038 (-1.524)
Quantile 3 intercept | -0.210271 (-2.021)**  | -0.313818 (-2.244)**  | -0.195052 (-0.764)
Quantile 4 intercept | -0.264291 (-2.199)**  | -0.372487 (-1.834)*   | -0.14045 (-0.694)
Quantile 5 intercept | -0.011822 (-0.039)    | -0.4142 (-0.69)       | -0.175415 (-0.493)
Quantile 1 and GSI   | -0.091358 (-0.952)    | -0.019749 (-0.222)    | -0.067979 (-0.171)
Quantile 2 and GSI   | -0.297231 (-0.755)    | -0.060799 (-0.542)    | 0.014743 (0.057)
Quantile 3 and GSI   | -0.041321 (-0.48)     | 0.241266 (1.76)*      | -0.531692 (-1.395)
Quantile 4 and GSI   | 0.059396 (0.408)      | 0.387662 (1.367)      | 0.138081 (0.54)
Quantile 5 and GSI   | -0.606191 (-1.846)*   | 0.172116 (0.149)      | -0.380484 (-0.532)
Δ Turnover           | -0.023752 (-0.557)    | -0.134488 (-2.266)**  | -0.07839 (-1.135)
Price_r_t0           | -0.009984 (-0.739)    | -0.043423 (-0.699)    | 0.024032 (0.773)
Avg_Price_r_t0       | -0.025228 (-0.991)    | 0.027011 (0.504)      | 0.000706 (0.014)
N                    | 15701                 | 7322                  | 4591
Adj. R²              | 0.001015              | 0.005043              | 0.001474
AIC                  | 100161.4              | 42855.05              | 26376.23
F-Test               | 1.203358              | 1.658110*             | 0.657282
CHOW p-Value         | 0.4666864             | 0.836559              | 7.8e-38
u~N(0,1) p-Value     | 0                     | 0                     | 0
Homoscedasticity     | robust SE             | robust SE             | robust SE
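For the implied-volatility version of the strategy, one would buy a call when the coefficient points to rising implied volatility and sell delta shares of the underlying. This neutralizes first-order exposure to the stock price, so the position mainly reflects changes in volatility. A minimal sketch using the Black-Scholes delta follows, assuming European calls on a non-dividend-paying stock and a continuously compounded rate; the appendix does not state which pricing model or hedging frequency the study would use, and the function names are hypothetical.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_delta(spot: float, strike: float, vol: float, t_years: float, rate: float = 0.0) -> float:
    """Black-Scholes delta of a European call on a non-dividend-paying stock."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * math.sqrt(t_years))
    return norm_cdf(d1)

def hedge_shares(n_calls: float, delta: float, shares_per_call: float = 1.0) -> float:
    """Shares to trade so that a long call position becomes delta-neutral (negative = sell)."""
    return -n_calls * shares_per_call * delta

# At-the-money call, 20% implied volatility, one year to expiry, zero interest rate
delta = bs_call_delta(spot=100.0, strike=100.0, vol=0.20, t_years=1.0)
print(round(delta, 4))                  # 0.5398
shares = hedge_shares(10, delta)        # sell about 5.4 shares against 10 calls
```

The sketch computes a single static hedge; in practice the hedge would be rebalanced as delta changes with the stock price and time to expiry.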
Frederik Bruns
Windfall Profit in Portfolio Diversification? An Empirical Analysis of the Potential Benefits of Renewable Energy Investments
Diplomica 2013 / 112 pages / EUR 39.50
ISBN 978-3-8428-8799-2, EAN 9783842887992
Modern Portfolio Theory, introduced by Markowitz, suggests building a portfolio from assets with low or, ideally, negative correlation. In times of financial crisis, however, the diversification effect of a portfolio can fail when traditional assets become highly correlated. Many investors therefore look for alternative asset classes, such as renewable energies, that tend to perform independently of the capital markets. 'Windfall Profit in Portfolio Diversification?' discusses the potential role of renewable energy investments in an institutional investor’s portfolio by applying the main concepts of Modern Portfolio Theory. The empirical analysis uses a unique data set from one of the largest institutional investors in the field of renewable energies, covering several wind and solar parks. The study received the Science Award 2012 of the German Alternative Investments Association ('Bundesverband Alternative Investments e.V.').
Christian Deger
Behind the Curve: An Analysis of the Investment Behavior of Private Equity Funds
Diplomica 2013 / 80 pages / EUR 29.50
ISBN 978-3-8428-8910-1, EAN 9783842889101
In aviation, getting “behind the power curve” usually refers to a situation in which an aircraft is flying slowly at low altitude without enough power to re-establish controlled flight. The pilot's only way to recover is to push the nose down and dive to regain airspeed. In private equity, especially in leveraged buyouts, fund managers regularly face a less dangerous but similar situation: low fund performance or an overhang of uninvested capital puts them “behind the curve” and calls for recovery measures. This study investigates the behavior of fund managers in such distressed situations by analyzing the effects on both the financing structure and the pricing of portfolio investments.
Bernhard Särve
PIPE Investments of Private Equity Funds: The temptation of public equity investments to private equity firms
Diplomica 2013 / 80 pages / EUR 29.50
ISBN 978-3-8428-8911-8, EAN 9783842889118
Private equity firms usually take control of privately held companies and tend to operate out of public view. In recent years, however, a growing number of private investments in publicly listed companies, so-called PIPEs, has been observed. This seems inconsistent at first, but it could prove an effective way to generate good returns. This book gives an overview of the PIPE market and then focuses on the role of private equity funds. How do they invest in publicly listed firms, and what are their motivations? Is the overall performance of PIPE deals superior to that of traditional private deals? PIPE deals have much in common with typical venture capital deals with regard to the young, high-risk nature of the target companies and the minority ownership position. Surprisingly, buyout funds are relatively more engaged in PIPEs than venture funds are. Using a unique transaction-level data sample, the author analyzes deal sizes, industry sectors, holding periods, IRRs and multiples of public deals and comparable private deals. Finally, he discusses other possible motives for private equity firms to engage in these deals: improved liquidity, fast deal execution, access to certain markets, avoidance of takeover premiums, and the thesis of an escape strategy for surplus investment money.
Christina Halder
Finanzierung von M&A-Transaktionen: Vendor Loans und Earnout-Strukturen
Diplomica 2013 / 56 pages / EUR 19.50
ISBN 978-3-8428-8913-2, EAN 9783842889132
During the boom in M&A transactions, a large number of experts from academia and research addressed the questions arising in the M&A context. The present study deals with the financing of M&A transactions by the seller. Its objective is to set out the various ways in which the seller can finance M&A deals. It critically examines what opportunities such a solution offers each party and whether these opportunities outweigh the possible risks. After an introduction to the topic, it gives a general overview of typical financing instruments in M&A transactions. This is followed by an explanation of the various ways in which the seller can finance the purchase price. Finally, it examines current developments in the M&A market, particularly regarding the financing structure of transactions, in order to discuss whether seller financing can revive the transaction market under the assumption of restricted, strict lending policies at financial institutions.
Mihaela Butu
Shareholder Activism by Hedge Funds: Motivations and Market’s Perceptions of Hedge Fund Interventions
Diplomica 2013 / 60 pages / EUR 19.50
ISBN 978-3-8428-8914-9, EAN 9783842889149
In recent years, hedge funds’ successful interventions in some large public companies have revealed their critical role in the corporate governance landscape of the United States and Europe. In public debate, this new form of shareholder activism attracts much polemic. This study examines the nature and types of hedge fund activism and the market’s perception of interventions in the United States. Starting with a distinction between shareholder activism by traditional institutions and activism by hedge funds, the study explains why the latter may be more effective in monitoring management and reducing agency costs. Analysing the Schedule 13D filings submitted to the U.S. Securities and Exchange Commission, the study classifies activists’ demands into ten distinct categories, arguing that hostile forms of activism are not central for hedge funds and that some more aggressive types of activism may be used as a negotiating tool to achieve the activist’s agenda. Using the event study methodology, the author estimates the stock returns around the announcement date. For a better understanding of hedge fund activism and its demands on target companies, the reader will find two original Schedule 13D filings accompanied by letters to management. Finally, the paper concludes by viewing the subject through the prism of the 2007/2008 financial crisis, outlining some trends in the aftermath of the financial market turmoil.
Sarah Kumpf
Listed Private Equity: Investment Strategies and Returns
Diplomica 2013 / 92 pages / EUR 29.50
ISBN 978-3-8428-8948-4, EAN 9783842889484
Current research on PE and buyout investments addresses the question of how PE firms generate value by investing in a portfolio company. Drivers of value generation are typically classified into the governance, financial and operational capabilities of PE firms. Beyond these direct drivers, investment and portfolio management strategies differ in how a portfolio company is acquired and divested, and these different entry and exit channels can in turn offer distinct potential for value generation. This paper therefore first presents the investment and portfolio management strategies of PE firms in a coherent way. Its second objective is to link different investment strategies to the expected returns generated at the investor level. Listed PE makes it possible to analyze the market's reaction to announcements of investments and divestments in an event study, and hypotheses are derived for both types of event.