Individual Retweeting Behavior on Social Networking Sites: A Study on Individual Information Disseminating Behavior on Social Networking Sites [1st ed.] 9789811573750, 9789811573767

This book explores and analyzes influential predictors and the underlying mechanisms of individual content sharing/retwe


English Pages XVII, 132 [143] Year 2020


Table of contents:
Front Matter
Introduction (Juan Shi, Kin Keung Lai, Gang Chen)
Literature Review and Theoretical Foundation (Juan Shi, Kin Keung Lai, Gang Chen)
Research Scheme Design (Juan Shi, Kin Keung Lai, Gang Chen)
Dominating Factors Affecting Individual Retweeting Behavior (Juan Shi, Kin Keung Lai, Gang Chen)
Direct Effect and Mediating Effect of Individual Retweeting Behavior on SNS (Juan Shi, Kin Keung Lai, Gang Chen)
Moderating Effect of Individual Retweeting Behavior on SNS (Juan Shi, Kin Keung Lai, Gang Chen)
Conclusion and Discussion (Juan Shi, Kin Keung Lai, Gang Chen)

Juan Shi, Kin Keung Lai, Gang Chen

Individual Retweeting Behavior on Social Networking Sites: A Study on Individual Information Disseminating Behavior on Social Networking Sites


Juan Shi, International Business School, Shaanxi Normal University, Xi’an, Shaanxi, China

Kin Keung Lai, College of Economics, Shenzhen University, Shenzhen, Guangdong, China

Gang Chen, Beijing Sankuai Online Technology Co Ltd, Beijing, China

ISBN 978-981-15-7375-0
ISBN 978-981-15-7376-7 (eBook)
https://doi.org/10.1007/978-981-15-7376-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

For my husband Chen Gang and my sweetheart son Chen Chen.

Juan Shi

Preface

The development of information technology and the emergence of Social Networking Sites (SNS) have changed the way information is created, transmitted, and consumed. Every SNS user can report what happens around them as soon as it occurs, and can analyze, recommend, and comment on news anywhere and anytime. Besides, they can retweet (also called forward, share, or disseminate in this monograph) any post they like. As a result, SNS not only decentralize information production and consumption, but also expedite information dissemination through interpersonal interactions such as retweeting. Indeed, individual retweeting behavior plays a pivotal role in information diffusion on social networking sites; without it, tweets cannot be delivered to audiences other than the author’s followers.

It is quite natural to ask a series of questions about this behavior, which most of us probably repeat on SNS at home and abroad. For example, is individual retweeting behavior a random occurrence? If not, how can that be proven? What factors lead to an individual’s retweeting decision, and are there any dominating ones? How do these factors affect individual information dissemination behavior on SNS? In other words, what are the underlying mechanisms of individual retweeting behavior? Furthermore, providing personalized advertising has become one of the hottest trends in online advertising. Are individuals equally influenced by personalized content when making retweeting decisions? If not, what factors moderate the relationship between topical relevance and individual retweeting behavior?

Answering these questions is necessary and important. Platforms cannot decide what content to recommend to different users unless they understand what kind of information each user prefers to share on the platform.
If there is a match between the recommended content and users’ expectations, user satisfaction and loyalty will be enhanced, and the probability that the content is shared by the user may increase remarkably. These results are desirable for social media platforms, as the sustainability and prosperity of a platform rely not only on the number of users it has but also on how active these users are as content contributors. In addition, marketers have recently exerted themselves to elicit Word of Mouth (WOM) from SNS users, as WOM, especially from their friends, has a decisive impact on consumers’ purchasing decisions. Thus, figuring out the influential factors that affect individual retweeting behavior is vital for marketers to position their products and services and, accordingly, devise effective marketing strategies on SNS.

To answer these questions, we crawled more than 60 million posts from Twitter using the Twitter API and carried out empirical research on individual disseminating behavior. Specifically, we examine whether individual information dissemination behavior is random or not from two different perspectives. After that, we rank the relative importance of factors using various feature selection algorithms and identify a subset of dominating features. We then examine the determinants of individual retweeting behavior from the perspective of an information receiver and construct a conceptual model based on the Elaboration-Likelihood Model (ELM). Finally, we focus on the relationship between topical relevance and individual retweeting behavior.

The main innovations of this monograph are as follows:

1. We verify that individual information dissemination behavior on SNS is not random and propose a new feature selection algorithm which considers both the relevance and the redundancy of features. Using this algorithm, we pick out the dominating features which have an influential impact on individual dissemination behavior on SNS. Our research finds that among the six most dominating features, topical relevance and social tie strength are the most important factors, followed by #mention(@), #URL, the number of times a message has been retweeted, and #hashtag. However, author-related factors are of the lowest importance and almost negligible. Comparison experiments show that under SVC or logistic regression, using dominating features can even improve the prediction performance to some extent.
By picking out dominating features, this research not only reduces the cost of collecting features and helps us better understand individual forwarding behavior, but also ensures that the prediction performance does not deteriorate. Therefore, the curse of dimensionality is avoided effectively.

2. Based on the above work, we carry out a comprehensive investigation of individual retweeting behavior. From the perspective of an information receiver, we consider all involved aspects of a social communication process. Based on ELM, we propose a conceptual model of individual retweeting behavior on SNS and then verify this model using Twitter data. In this model, topical relevance and information richness (#URL, #hashtag) belong to the central route, as both factors require effortful elaboration. Social tie strength, informational social influence, and other factors belong to the peripheral route, as these factors do not require individuals to scrutinize the message arguments and allow them to make quick decisions. Analysis results show that both routes have significant effects on individual retweeting decisions. Among the factors, topical relevance, social tie strength, and value homophily are the most important ones, followed by information richness, #mention, and informational social influence. Author-related factors such as source trustworthiness have trivial impacts. Besides, we validate that social tie strength partially mediates the effect that value homophily has on individual retweeting behavior. This study expands the application area of ELM and offers at least one explanation for the contradictory findings about the effect of homophily on individual sharing behavior.

3. We investigate three types of moderators of the effect of topical relevance on individual retweeting decisions: individual characteristics, characteristics of tweets, and interpersonal relationships. A hierarchical linear model is employed to test the moderating effects using Twitter panel data. The comparative experiment and robustness test show the superiority and stability of the hierarchical model. We find that the impact of topical relevance is stronger for individuals with a larger number of followers. However, the moderating effects of individual cumulative experience and gender are not significant. Users who tend to produce longer original tweets are more likely to expend more cognitive effort and give more consideration to topical relevance when making retweeting decisions. Besides, the effect of topical relevance is stronger for shorter time intervals, that is, for active individuals on SNS. When a tweet comes from a followee with similar tastes and preferences, the individual is prone to rely on the peripheral route, and thus the impact of topical relevance is weaker. However, the impact of topical relevance is stronger for tweets coming from strong ties. This research expands the ELM theory and deepens our understanding of the impact that topical relevance has on individual forwarding behavior. It reveals that this impact is influenced by individual characteristics, characteristics of tweets, and interpersonal relationships, and it can provide guidelines on how to take advantage of topical relevance in online marketing.

Xi’an City, Shaanxi Province, China
May 2020

Juan Shi

Acknowledgements

The research in this monograph is supported by the Fundamental Research Funds for the Central Universities (Program No. 19SZYB28), the Natural Science Basic Research Program of Shaanxi (Program No. 2020JQ-427), and the Xi’an Social Science Fund project (Program No. WL108).


Contents

1 Introduction
  1.1 Research Background and Research Questions
    1.1.1 Research Background
    1.1.2 Research Questions
  1.2 Research Significance
    1.2.1 Theoretical Significance
    1.2.2 Practical Significance
  1.3 Research Content and Technical Route
    1.3.1 Research Content
    1.3.2 Structure of the Monograph
    1.3.3 Technical Route
  1.4 Main Innovation and Contributions
  References

2 Literature Review and Theoretical Foundation
  2.1 Introduction
  2.2 Definition of Concepts
    2.2.1 Social Networking Sites
    2.2.2 Twitter
    2.2.3 Tweet
    2.2.4 User’s Behavior on SNS
  2.3 Explanation-Oriented Studies on Individual Retweeting Behavior
    2.3.1 Research Emphasizing the Information Carrier
    2.3.2 Research Emphasizing both the Information Source and the Information Carrier
    2.3.3 Research Emphasizing Individual Preferences
    2.3.4 Research Emphasizing Relationships
  2.4 Prediction-Oriented Studies on Individual Retweeting Behavior
  2.5 Theoretical Foundation
    2.5.1 The Elaboration-Likelihood Model
    2.5.2 Why Using ELM in the Current Research
  2.6 Commentary on Related Literature
  References

3 Research Scheme Design
  3.1 Introduction
  3.2 A Conceptual Framework of Individual Retweeting Behavior on SNS
    3.2.1 Analyzing Factors on the Central Route
    3.2.2 Analyzing Factors on the Peripheral Route
    3.2.3 The Mediating Role of Social Tie Strength
    3.2.4 Analyzing Moderators on the Central Route
  3.3 Method
    3.3.1 Data Collection
    3.3.2 Measure
    3.3.3 Data Analysis Methods
  3.4 Conclusion
  References

4 Dominating Factors Affecting Individual Retweeting Behavior
  4.1 Introduction
  4.2 Verification of the Non-randomness of Individual Retweeting Behavior
    4.2.1 Verification from Social Interaction Perspective
    4.2.2 Verification from Topical Perspective
  4.3 Ranking Factors Affecting Individual Retweeting Behavior: An Example
    4.3.1 A Highly Discriminating Feature: Topic_distance
    4.3.2 Ranking Factors on a Specific User
  4.4 Ranking Factors on a Large Sample
    4.4.1 Feature Selection Using Filter Models on a Large Sample
    4.4.2 Feature Selection Using Hybrid Model on a Large Sample
    4.4.3 Feature Selection Using Other Method on a Large Sample
    4.4.4 Analysis of Feature Selection Results
  4.5 Prediction Performance of Salient Factors
  4.6 Conclusion and Discussion
  References

5 Direct Effect and Mediating Effect of Individual Retweeting Behavior on SNS
  5.1 Introduction
  5.2 Data Pre-processing and Descriptive Analysis
  5.3 Methodology
  5.4 Results
    5.4.1 Hypotheses Test of Factors on the Central Route
    5.4.2 Hypotheses Test of Factors on the Peripheral Route
    5.4.3 Hypotheses Test of the Mediating Effect
    5.4.4 Ranking of Factors
  5.5 Understanding Users with Different Retweeting Behavior
  5.6 Conclusion and Discussion
  References

6 Moderating Effect of Individual Retweeting Behavior on SNS
  6.1 Introduction
  6.2 Data Pre-processing and Descriptive Analysis
  6.3 Methodology
  6.4 Results
    6.4.1 Hypotheses Test of Moderating Factors: Individual Characteristics
    6.4.2 Hypotheses Test of Moderating Factors: Interpersonal Relationships
    6.4.3 Hypotheses Test of Moderating Factors: Tweet Characteristics
    6.4.4 Model Diagnostics: Individual Heterogeneity
    6.4.5 Robustness Test
  6.5 Conclusion and Discussion
  References

7 Conclusion and Discussion
  7.1 Summary of Findings
  7.2 Contribution and Implications
    7.2.1 Implications for Research
    7.2.2 Managerial Contribution
  7.3 Limitations
  7.4 Directions for Future Research
  References

Acronyms

eWOM    Electronic Word of Mouth, a form of buzz marketing that focuses on person-to-person contacts that happen on the Internet. eWOM can become viral if the message is persuasive or funny enough, and it has been recognized as one of the most influential resources of information transmission.

ELM     Elaboration-Likelihood Model, a dual-processing model of information processing which proposes that attitude change and consequent behavior change among individuals may be caused by two routes of influence: the central route and the peripheral route. ELM has been considered the most popular and useful persuasion model in consumer research and social psychology.

LSA     Latent Semantic Analysis, an information retrieval technique in natural language processing, sometimes called Latent Semantic Indexing (LSI). It analyzes relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.

SNS     Social Networking Sites, which provide free Web space and tools for their community members to build profiles, interact, share, connect, create, and publish content.

UGC     User Generated Content, alternatively known as User-Created Content (UCC), refers to any form of content, such as images, videos, text, and audio, which has been posted by users on online platforms such as social networking sites and wikis.


Chapter 1

Introduction

1.1 Research Background and Research Questions

1.1.1 Research Background

Web 2.0 technology emphasizes user-generated content (UGC), usability, and interoperability for end users. Based on Web 2.0 technology, Social Networking Sites (SNS) enable users to publish their daily activities and express their opinions, likes, and dislikes at the speed of thought, giving them the ability to express themselves and connect with other people, to be heard, and to feel a sense of worth and importance. Recently, with the development of internet technology and the popularity of smart devices, SNS have experienced a tremendous surge in user base. For instance, as of the first quarter of 2017, Facebook had 1.94 billion monthly active users¹ and Twitter had 328 million monthly active users.² Sina Weibo,³ China’s leading microblogging service provider, had 340 million monthly active users as of the end of March 2017, overtaking Twitter in active user totals.⁴ Besides, WeChat had 889 million monthly active users as of the end of March 2017.⁵

The popularity and penetration of SNS have revolutionized the way information is sought, produced, and disseminated in modern society. Unlike in the age of Web 1.0, when users passively received information from portal websites without being given the opportunity to post reviews, comments, and feedback, every SNS user is not only an information consumer but also an information producer, as they can report what happens around them as soon as it occurs, and analyze, recommend, and comment on news anywhere and anytime. As a result, SNS can even beat mainstream media in terms of speed, scope, and flexibility, especially in tracking events as they unfold in real time [1]. For example, Osama bin Laden’s death was posted by Mr. Keith Urbahn on Twitter one hour and eleven minutes earlier than the formal announcement by U.S. President Barack Obama in 2011.⁶ In China, public events and political scandals are often first disclosed on Sina Weibo.⁷ Liu [2] sampled 110 negative public events which broke out in 2013 in China and found that 47% of these events were first exposed by SNS users. This is because SNS offer people at the grassroots level opportunities to articulate their voices and to engage closely with the mainstream public. It is not an exaggeration to say that SNS provide arenas for democratic participation and citizen journalism [3], and have transformed the one-way transmission mode into a multi-party, multi-directional, radiating interactive mode [2].

SNS have manifested their great power not only in the public sphere, but also in the online marketing domain. Nowadays, more and more individuals use online discussion forums, social networking sites, and other Web 2.0 tools to communicate their experience and opinions about products and/or services. Researchers have found that users depend on online reviews written by unknown consumers more than they depend on traditional media.⁸ Electronic Word-of-Mouth (eWOM) is found to significantly influence consumer purchasing decisions, loyalty, engagement, and total retail sales [4–6]. As a result, companies are embracing SNS, such as Facebook, Twitter, Sina Weibo, and WeChat, as a marketing tool to increase brand awareness, foster relationships, and promote the company’s presence and reputation on the internet. Take Oreo, an American company famous for its sandwich cookies, as an example: it triggered a social epidemic on Twitter by tweeting “you can still dunk in the dark” during the third quarter of Super Bowl XLVII when a power outage happened.
Figure 1.1 shows this tweet, which was retweeted almost 15,000 times on Twitter and garnered nearly 20,000 likes on Facebook.⁹ Given that advertisers were spending nearly 4 million US dollars to run a spot during the Big Game, Oreo achieved success on SNS at low or almost no cost.

Figure 1.2 shows the result of a survey carried out in December 2016, which aimed to investigate the development of the internet in China. Among the companies which have ever adopted internet marketing, 83.3% are embracing the mobile internet for marketing business, almost double the 2015 ratio of 46%. In particular, as many as 67.8% of them have employed paid promotion. It can be expected that more and more companies will rely on the mobile internet for marketing business and that the market size will keep growing at a very fast clip. Figure 1.3 demonstrates the usage ratio of different mobile internet marketing channels. Considering that WeChat had 889 million monthly active users as of the end of March 2017, it should come as no surprise that WeChat is the most popular mobile internet marketing channel for enterprises to promote their products and services.

¹ https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/.
² https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/.
³ http://weibo.com.
⁴ http://www.chinadaily.com.cn/business/tech/2017-05/18/content_29393533.htm.
⁵ http://b2b.toocle.com/detail--6394894.html.
⁶ @keithurbahn, http://twitter.com/keithurbahn.
⁷ http://yuqing.people.com.cn/n/2014/0318/c364391-24662668.html.
⁸ http://www.bazaarvoice.com/research-and-insight/social-commerce-statistics/.
⁹ https://www.wired.com/2013/02/oreo-twitter-super-bowl/.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020. J. Shi et al., Individual Retweeting Behavior on Social Networking Sites, https://doi.org/10.1007/978-981-15-7376-7_1


Fig. 1.1 A tweet posted by Oreo during the Super Bowl when a power outage happened

Fig. 1.2 Survey on enterprises using mobile internet for marketing business

Fig. 1.3 Usage ratio of different mobile internet marketing channels

1.1.2 Research Questions

When browsing SNS, we often encounter advertisements published by companies. People can respond to such online content in different ways. For example, one can retweet an advertisement to one’s followers, comment on it, or add it to a favorites list for later use. From Fig. 1.4, we can see that compared with other behaviors, such as commenting on a post or adding it to a favorites list, the most frequent response to advertisements on Sina Weibo is retweeting. On SNS, an individual can easily retweet other-sourced posts to all his or her followers (i.e., people who subscribe to this individual’s account to receive this individual’s updates) with a single click on a computer or mobile device.¹⁰ Indeed, nearly 60% of people report that they frequently share online content with others [7].

Users’ retweeting behavior plays a pivotal role in information diffusion on social networking sites [8]; without it, tweets cannot be delivered to audiences other than the author’s followers. Through such interpersonal relaying behavior, the original post is able to travel from one distinct subgroup of referral actors to another, reaching an audience far beyond the author’s own followers. When a post gains many people’s attention and gets retweeted tens of thousands of times, we can say that a social epidemic has formed on SNS. From the above-mentioned Oreo example, we know that manufacturers and retailers can benefit a lot from triggering a social epidemic related to their products or services, including promoting their brand and presence on the internet and reaching actual or prospective customers quickly, widely, and at a lower cost [9]. Not surprisingly, SNS are viewed as cheaper and more effective marketing channels than traditional media.

¹⁰ In this work, the terms “disseminate”, “retweet”, and “share” are used interchangeably, referring to sharing an other-sourced tweet with all of one’s followers.


Fig. 1.4 The distribution of users’ responses to advertisements on Sina Weibo

However, not every company is as successful as Oreo in inducing individuals to retweet online content to their followers [10]. Promotional messages delivered directly to all available consumers may be treated as spam, resulting in user dissatisfaction and high advertising costs for merchants [11]. Thus, figuring out which factors lead to individual information dissemination behavior can help us understand the driving force behind social epidemics, devise sensible online marketing strategies, and intervene effectively in online rumors during emergencies. In this monograph, we study individual retweeting behavior from the following aspects:

Research Question 1: Is individual retweeting behavior a random occurrence? All existing research on retweeting rests on an unstated premise: an individual’s retweeting behavior is not a random occurrence. This premise is taken as given and left unsubstantiated. We argue that re-examining this assumption is necessary, because any prediction work is rendered meaningless without it. Specifically, if randomness dominates a user’s retweeting decisions, the user will randomly retweet the posts that appear in his or her home timeline.11 In that case, there is no need for researchers to figure out the underlying mechanisms of individual retweeting behavior, as it is a random phenomenon. Alternatively, individual retweeting behavior is not random, and certain antecedents contribute to it. For example, tweets that are not appealing to the individual are filtered out, and only posts deemed interesting, important, or entertaining are retweeted to the individual’s followers. If this is the case, we need to dig out the causal behavioral mechanisms that explain individual retweeting decisions. Thus, in this monograph, we examine whether individuals’ retweeting behavior is random or not.

11 Taking Twitter as an example, the home timeline is what every Twitter user sees on the home page by default: a stream of tweets from all the people the user follows, updated in real time.

Research Question 2: What are the salient factors leading to individual retweeting decisions? In other words, are all factors equally important in affecting individual retweeting decisions? When launching campaigns on social networking sites, marketers always expect to attract as many people as possible within a certain budget. To achieve this goal, should marketers rely on web celebrities to trigger a “cascade”, or should they focus on the message content, for example by catering to customers’ topics of interest or paying special attention to the sentiment of the advertisement? To answer this question, we need to investigate the relative importance of the features (i.e., factors) that have an impact on individual retweeting decisions. While some research has been carried out to predict or explain individual retweeting behavior, to the best of our knowledge, virtually no scholarly effort has been undertaken to understand the relative importance of those factors. Instead, many features are indiscriminately introduced into the prediction model without examining their relevance. As a result, irrelevant or redundant features not only increase the data collection cost and tend to produce an overfitted model that predicts poorly on future observations not used in model training, known as the curse of dimensionality [12], but also hinder us from understanding which factors actually dominate an individual’s retweeting behavior. As with other social behaviors, many factors may have an impact on individual retweeting decisions; some may be very important and others may have only trivial influence. James et al.
[12] point out that it is often the case that only a small fraction of independent factors are substantially associated with the response variable. Thus, in this monograph, we identify which factors have a substantial influence on individual retweeting decisions. We believe that knowledge about the priorities of those influential factors gives us an in-depth understanding of individual retweeting behavior and thus guides marketers in devising effective and efficient marketing strategies on SNS.

Research Question 3: How do determinants affect individual information dissemination behavior on SNS? A number of researchers have examined how message content affects individual retweeting decisions. They find that characteristics of the content, such as the amount of sentiment [10, 13], argument quality and information quality [14–17], and popularity of the content [18], have significant impacts on the number of times a message is retweeted or on individual retweeting decisions. In addition, source-related (i.e., author-related) factors, such as source credibility [14, 16] and source attractiveness [15], are also found to influence individual retweeting decisions. Furthermore, researchers have studied the impacts of individual heterogeneity and preferences on the motivation for eWOM sharing behavior [19, 20]. However, the scope of prior studies on the determinants of individual information dissemination decisions is large and fragmented, and little has been done to integrate existing findings. It is also worth noting that although the retweeting action is technically easy to perform, individual retweeting behavior is quite spontaneous and arbitrary, and different individuals may react differently to the same post. When an individual receives a new post from his or her followees,12 he or she may either retweet it so that all followers see the post immediately, or do nothing and ignore it. This makes it more difficult for enterprises to induce users to retweet online advertisements. We argue that it is better to investigate the retweeting decision from the information receiver’s perspective. After all, the retweeting decision is made by the information receiver, who has his or her own preferences, interests and needs. Surprisingly, most existing studies pay more attention to the information source and the tweet itself, and few researchers highlight the receiver’s role in this issue [21]. Thus, in this monograph, we adopt the information receiver’s perspective to figure out which determinants affect individual information dissemination behavior and how. We believe that a systematic examination of individual information dissemination behavior from the information receiver’s perspective helps us grasp the whole picture of this issue.

Research Question 4: Are individuals equally influenced by personalized content when making retweeting decisions? Prior studies have found that an advertisement’s relevance influences individual reactions, including paying closer attention to the ad [22], showing favorable attitudes towards the ad [23–25], and disseminating the message by sharing it with others [20]. It is therefore understandable that personalized advertising has become one of the hottest trends in online advertising.
However, in terms of information diffusion, it is not known whether the relationship between the relevance of a post and individual retweeting behavior applies equally or differentially across different user groups. To put it another way, are individuals equally influenced by the content’s relevance when making sharing decisions? We claim that the answer to this question is important for theoretical as well as practical reasons. Theoretically, such research can enrich the online marketing literature by examining moderating factors that mitigate the effect of content relevance on individual retweeting behavior, and can deepen our understanding of such behavior. For practitioners, the research findings provide guidance on strategically customizing advertisements on SNS and thus help marketers improve the effectiveness and efficiency of their online marketing campaigns. In this monograph, we investigate three types of factors that moderate the impact of topical relevance on individual retweeting decisions.

12 An individual can subscribe to other users to receive their updates in a timely manner. These users are the individual’s followees, also called friends in this monograph.


1.2 Research Significance

1.2.1 Theoretical Significance

Social networking sites are the online version of social networks. With the development of internet technology and smart devices, social networking sites have become an indispensable part of our daily life. Given that individual information dissemination behavior plays a pivotal role in information diffusion on SNS, investigating individual retweeting behavior on SNS is one of the key issues in SNS studies and has attracted tremendous interest from both academia and industry. Specifically, investigating individual information dissemination behavior has the following theoretical significance:

• It helps us understand individual behaviors and information diffusion on social networking sites. The mechanisms underlying individual-level information dissemination behavior can shed some light on the formation of social epidemics, as such a macro-level collective outcome is the aggregate result of countless individuals’ retweeting decisions and thus also depends on micro-level individual decisions.

• The Elaboration-Likelihood Model (ELM) is primarily employed to examine information adoption behavior, such as making purchase decisions or becoming interested in a certain product. We argue that it can also be used to investigate individual information dissemination decisions, because information dissemination behavior is a clear and visible sign that the message is actively processed by individuals and hence a behavioral indicator of message salience. Using the ELM as the overarching theoretical framework, we provide empirical evidence that its applicability extends to the context of information dissemination on SNS, thus paving the way for subsequent research along those lines. Under this ELM framework, we identify influential determinants based on information processing theory, social ties, the bandwagon effect, etc., and integrate them into the theoretical model to examine the influence process involved in individual retweeting decision making. As a result, our study promotes the integration of the ELM and other research areas.

• Existing studies disagree about the impact that interpersonal relationships have on individual retweeting behavior. Chu and Kim [26] reveal that a negative relationship exists between perceived attitude homophily and individual opinion seeking or opinion passing behavior. However, Harrigan et al. [27] show that users are more likely to retweet messages from close relations/community structures, and Ma et al. [28] find that homophily exerts no significant influence on the intention to share news in social media. Based on the homophily principle, we propose that social tie strength mediates the effect of value homophily on individual information dissemination behavior. The validated empirical result partially explains the conflicting findings in prior research about the impact that interpersonal relationships have on individual retweeting decisions.


• Although the ELM illustrates well how personal characteristics can change the decision-making process, it does not examine how characteristics of tweets or interpersonal relationships between the information source and the information receiver can also affect this process. Motivated by electronic word-of-mouth (eWOM) research [29] and practical observations, we extend the ELM by investigating three types of moderators: individual characteristics, characteristics of tweets, and interpersonal relationships. This deepens our understanding of how personalized content affects individual retweeting decisions and provides empirical support for classifying potential customers.

1.2.2 Practical Significance

Companies often launch online advertisement campaigns in the hope that people will share the content with others, but some of these efforts take off while others fail. This is largely due to the wealth of information on SNS and the scarcity of attention [30]. Customers are becoming less receptive to unsolicited marketing communication, especially if it is irrelevant or impersonal [31]. Thus, investigating individual information dissemination behavior has the following practical significance:

• It deepens enterprises’ understanding of customers’ responses to online content and their behaviors on social networking sites. Understanding which influence processes motivate individuals to share online content helps companies devise effective and efficient online marketing strategies, leverage social networking sites to promote their brands and products on the internet, and reach actual or prospective customers quickly, widely and at a lower cost. In addition, knowledge about the relative importance of features helps enterprises focus on salient features and make the most of their online advertisement campaigns on SNS.

• Nowadays, personalized advertising is widely used in online advertising. Our study of the moderating effects that mitigate the effect of content relevance on individual sharing behavior can help enterprises recognize and utilize the characteristics of different groups of customers. Thus, the research findings provide guidance for market segmentation and precision marketing.

• Knowledge about individual retweeting behavior helps to identify the factors behind the dissemination of online public opinion and provides support for the government in managing and directing the transmission of public opinion on SNS. Especially in emergency events, the government can intervene in the transmission of rumors on SNS and use SNS to spread useful information based on the findings of our study.

• The sustainable growth of SNS depends on users’ active participation behaviors, such as contributing original content, sharing information with their followers, commenting on posts, and so on. The mechanisms behind users’ information dissemination behavior can provide insight to SNS service providers regarding the factors that may retain active and regular users and turn infrequent users into committed ones.


1.3 Research Content and Technical Route

1.3.1 Research Content

In order to answer the research questions proposed in Sect. 1.1.2, the research content of this monograph is arranged as shown in Fig. 1.5.

1. First, we explore individual retweeting behavior on SNS from the following aspects.

• We prove that individual retweeting behavior is not random from two perspectives: (1) the existence of an individual’s favorite topics; (2) the existence of an individual’s close friends.

• We evaluate the importance of features based on seven different feature evaluation methods, which fall into three categories: (1) filter model; (2) hybrid model; (3) other methods. Based on the evaluation results, we pick out salient features which have substantial influence on individual retweeting decisions.

• We compare the prediction performance of those salient features with that of the full feature set using multiple performance metrics and multiple classification techniques.

Fig. 1.5 Research content of this monograph


2. Next, we try to explain individual retweeting decisions by focusing on the influence process involved in an individual’s decision making. We ground our explanation in the underlying causal relationships between different variables.

• We identify factors which have an impact on individual retweeting behavior based on prior studies and various theories such as information processing theory, the bandwagon effect, etc. Then, by adopting the ELM as the overarching theoretical framework, we integrate and classify those factors into the central and peripheral routes and finally formulate our research model.

• Based on the homophily principle, we propose and validate that social tie strength partially mediates the effect of value homophily on individual information dissemination behavior.

3. Finally, we look into the central route of the research model and investigate moderating factors on the central route based on the ELM. Specifically, we propose three types of moderators that mitigate the effect of content relevance on individual retweeting behavior on SNS. These moderators include:

• Individual characteristics such as the user’s cumulative experience on SNS, the user’s gender, and the user’s social connectedness.

• Tweet characteristics such as tweet length and tweet interval.

• Interpersonal characteristics such as social tie strength and value homophily.
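As a rough illustration of the filter-type feature evaluation mentioned in item 1 above, the following sketch ranks candidate features by the absolute Pearson correlation between each feature and a binary retweet label. The feature names, the toy data, and the choice of correlation as the relevance score are all assumptions made for illustration; this is not the evaluation procedure used in this monograph, which relies on seven different methods.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(features, labels):
    """Return (name, score) pairs sorted by descending |correlation| with the label."""
    scores = {name: abs(pearson(values, labels)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: six (user, tweet) pairs; label 1 means "retweeted".
labels = [1, 1, 1, 0, 0, 0]
features = {
    "topical_relevance": [0.9, 0.8, 0.7, 0.2, 0.3, 0.1],
    "tie_strength": [5, 1, 4, 2, 0, 1],
    "author_followers": [100, 100, 100, 100, 100, 100],
}

ranking = rank_features(features, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A filter score of this kind looks at each feature in isolation, so it cannot detect redundancy between features; that limitation is one reason to combine it with other evaluation methods.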

1.3.2 Structure of the Monograph

The monograph consists of seven chapters. The content of each chapter is briefly introduced as follows.

1. In this chapter, we first elaborate on the research background and then propose the research questions addressed in the monograph. The theoretical and practical significance of the research is then presented. Next, we introduce the research content, structure, and technical route of this monograph in sequence, together with a brief introduction to the main research methods. Finally, we highlight the main innovations and contributions of this monograph.

2. Chapter 2 first defines the related concepts used in the monograph and then reviews an extensive body of literature on individual retweeting behavior. The literature is divided into two main categories according to research purpose: (1) explanation-oriented research; (2) prediction-oriented research.

3. In Chapter 3, we introduce the research scheme design of this monograph. First, we develop a conceptual framework of individual information retweeting behavior on SNS based on the ELM and related theories. Then we elaborate on data collection, preprocessing and the measurement of variables. Finally, the statistical analysis methods used in this monograph are briefly introduced.

4. Chapter 4 explores individual retweeting behavior from several aspects. First, we validate the non-randomness of individual retweeting behavior. Then we prioritize the importance of features using seven feature evaluation methods and thereby pick out a subset of features which have substantial effects on individual retweeting behavior. Finally, we examine the prediction performance of these salient features.

5. Chapter 5 validates hypotheses about the direct effect and mediating effect of individual retweeting behavior using data crawled from Twitter. The relative importance of different determinants is then examined. Finally, we verify that social tie strength partially mediates the effect of value homophily on individual information dissemination behavior.

6. Using data crawled from Twitter, Chapter 6 analyzes and tests three types of moderators that mitigate the effect of content relevance on individual retweeting behavior on SNS.

7. Chapter 7 presents the overall findings of this monograph on individual information dissemination behavior on SNS, summarizes the contributions and implications of this research, and finally discusses the limitations and future research directions.

1.3.3 Technical Route

The technical route of this monograph is shown in Fig. 1.6. Here we give a brief introduction to the research methods adopted in this monograph.

1. Literature survey

The literature survey method refers to searching, analyzing, grouping, and synthesizing an extensive body of related studies, which creates a firm foundation for advancing knowledge. A thorough literature survey helps to gain a good understanding of the state of research on a certain topic, identify key concepts, and find discrepancies among existing studies, and thereby provides the basis for the current research. In this monograph, we adopt the literature survey method to analyze and summarize prior research, find its gaps, identify key constructs, and thereby support the hypotheses proposed in this monograph.

2. Content analysis

Content analysis is a means of analyzing texts. This includes all sorts of texts, from newspaper articles to transcripts of interviews and from descriptions of pictures to written recollections [32]. A number of variants are subsumed under the term content analysis, such as systematic content analysis [33], meaning analysis, quantitative content analysis, qualitative content analysis and hermeneutic-classificatory content analysis [34]. Berelson [35] points out that objectivity, systematic procedure, and quantification applied to the manifest content were characteristic of the methodological approach of quantitative content analysis. Over time, however, it has expanded to include interpretations of latent content.


Fig. 1.6 Structure of this monograph

Nowadays, computer programs provide a variety of techniques for the management and analysis of textual data. In this monograph, based on the Latent Semantic Indexing algorithm [36] and natural language understanding of Twitter posts, we extract each user’s topics of interest and the topic distribution of a particular tweet. We also measure the information richness of a tweet by counting the number of URLs and hashtags it contains. In addition, with the help of automated sentiment analysis techniques, we extract the sentiment of a post efficiently and represent it in two dimensions: polarity and emotionality.

3. Statistical analysis

Statistical analysis refers to collecting, presenting, exploring and analyzing large amounts of data to discover underlying relationships, patterns and trends, which can be used to interpret and predict real-world phenomena and to support decision making. This monograph mainly uses Stata to analyze the secondary data crawled from Twitter and tests the hypotheses through descriptive statistical analysis and appropriate statistical models. For example, we employ the panel logit model, negative binomial regression, and the multiple linear model to validate the conceptual model.
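The information-richness measure described under content analysis above (counting the URLs and hashtags in a tweet) can be sketched as follows. This is a minimal illustration, not the extraction code used for this monograph: the regular expressions are simplified assumptions, the function name `count_cues` and the example tweet are hypothetical, and the mention count is included only because #mention appears among the features studied.

```python
import re

# Simplified patterns chosen for illustration; production tokenizers handle more edge cases.
URL_PATTERN = re.compile(r"https?://\S+")
HASHTAG_PATTERN = re.compile(r"(?<!\w)#\w+")
MENTION_PATTERN = re.compile(r"(?<!\w)@\w+")


def count_cues(text):
    """Count URLs, hashtags and mentions in a tweet's text."""
    # Strip URLs first so that a fragment such as "page#section" is not counted as a hashtag.
    without_urls = URL_PATTERN.sub(" ", text)
    return {
        "urls": len(URL_PATTERN.findall(text)),
        "hashtags": len(HASHTAG_PATTERN.findall(without_urls)),
        "mentions": len(MENTION_PATTERN.findall(without_urls)),
    }


# Hypothetical tweet text used only to exercise the function.
example = "New flavor out today #snacks #launch via @some_brand http://example.com/a#frag"
counts = count_cues(example)
print(counts)
```

Removing URLs before counting hashtags is a small design choice that avoids inflating the hashtag count when a link contains a `#` fragment.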

1.4 Main Innovation and Contributions

Investigating the factors affecting individual retweeting behavior and its underlying mechanisms has important theoretical and practical significance. The research findings can provide insight for sustaining the growth of SNS, developing online marketing strategies and managing online public opinion. This monograph focuses on individual information dissemination behavior on SNS and makes several important contributions to research and practice.

First, we carry out exploratory research on individual retweeting behavior on SNS. The main innovations are as follows.

1. We verify that individual retweeting behavior is not a random occurrence, whether from the social interaction perspective or from the perspective of individual topic preference. In addition, the retweeting history reveals that individuals show preferences for certain topics when retweeting posts.

2. We propose a new feature selection method which considers both the relevance of a feature to the target class and the redundancy between different features. Using our method, we pick out a subset of dominating features. We also cross-check the result of our method against other feature selection methods. The conclusion is very consistent: topical relevance and social tie strength are the most important factors, followed by #mention, #URL, the number of times a message has been retweeted, and #hashtag. Author-related factors, however, are of the lowest importance and almost negligible.

3. By comparing the prediction performance of the salient features with that of the full feature set on multiple performance metrics, we confirm that using only the salient features not only saves the cost of measuring irrelevant features but also moderately improves prediction performance under some classification algorithms such as SVC or logistic regression.

The exploratory study is preliminary: it focuses on correlation rather than causation.
Second, based on the findings of the first part, we carry out explanation-oriented research to examine the underlying mechanisms of individual retweeting behavior on SNS. Considering that retweeting decisions are made by specific individuals, in this monograph we adopt the information receiver’s perspective and focus on the information receiver’s decision-making process when investigating this issue. Specifically, the main innovations are as follows.

1. The scope of prior studies on the determinants of individual information dissemination decisions is large and fragmented, and little has been done to integrate existing findings. Based on the Elaboration-Likelihood Model, we propose an overarching research framework of individual retweeting behavior on SNS. This conceptual model integrates and classifies influential features into two information processing routes: the central route and the peripheral route. Specifically, topical relevance and information richness (#URL, #hashtag) belong to the central route, as these two factors require individuals to concentrate on the content and thus call for more effortful elaboration. Social tie strength, informational social influence, value homophily and other factors belong to the peripheral route, as these factors do not require individuals to scrutinize the message arguments and allow them to make quick decisions.

2. Analysis of Twitter data shows that both the central route and the peripheral route have significant effects on individual retweeting decisions. Among the factors, topical relevance, social tie strength and value homophily are the most important, followed by information richness (#URL, #hashtag), #mention and informational social influence. Author-related factors such as source trustworthiness have trivial impacts.

3. The impacts of the relationships between the source and the receiver on the receiver’s information retweeting behavior are still controversial.
We propose and validate that social tie strength partially mediates the effect of value homophily on individual retweeting behavior, which provides knowledge about how value homophily works to affect individual retweeting behavior on SNS. It also offers at least one explanation for the contradictory findings in previous research about the effect of homophily on individual sharing behavior.

Unlike existing research, which ignores the receiver’s role and does not fully consider the influencing factors, this study adopts the information receiver’s perspective to examine the information receiver’s decision-making process, comprehensively considers all aspects of information dissemination in the model, and reveals the significant impact of topical relevance on individual retweeting behavior. By adopting the ELM as the overarching theoretical framework, we provide empirical evidence that its applicability extends to the context of information dissemination on SNS, thus paving the way for subsequent research along those lines. In addition, the study of the mediating effect of social tie strength offers at least one explanation for the contradictory findings in previous research about the effect of homophily on individual sharing behavior.

Third, information processing theory indicates that topical relevance plays an important role in individual retweeting behavior, which is also supported by previous studies and the current research. However, prior research has not adequately investigated the relationship between topical relevance and individual retweeting behavior. Understanding how individual retweeting behavior is affected by topical relevance is crucial to gauging the extent to which content relevance affects individual sharing decisions on SNS, especially given the significant effect that individual retweeting decisions have on information diffusion on SNS and thus on the effectiveness and efficiency of online marketing campaigns. This monograph, through its investigation of the three types of moderators affecting the relationship between topical relevance and individual retweeting behavior, extends the ELM, deepens our understanding of the impact that topical relevance has on individual forwarding behavior, and constitutes a step toward filling this gap in the literature. Specifically, the main innovations are as follows.

1. Individuals with a larger number of followers give more weight to the topical relevance of a tweet when they make retweeting decisions. This is reasonable, as users with a larger number of followers experience an intensified awareness of other members within the network and are thus more likely to maintain their self-image within the network. This interpersonal pressure motivates individuals to spend more thoughtful effort and carefully consider the message content when deciding whether to retweet. Thus, message content-related cues such as topical relevance have a stronger impact on users with more followers. Ma et al. [29] verify that females tend to rely more on the central route (i.e., independently give their own subsequent rating instead of relying on prior reviews). However, our research does not find that females are more likely to follow the central route when making retweeting decisions, and there is no significant difference between genders in the impact that topical relevance has on retweeting decisions.
This may be explained by the different research settings of the two studies. In addition, the moderation effect of cumulative experience on the impact of a tweet’s topical relevance on individual retweeting behavior is not significant in our research.

2. Lengthier tweets are found to positively moderate the relationship between topical relevance and individual retweeting behavior. We conjecture that users who tend to produce longer original tweets are willing to expend more cognitive effort in composing the content and enjoy thinking deliberately. Thus they are more likely to scrutinize the content and judge the topical relevance of a tweet when making retweeting decisions. We also find support for a negative moderation effect of the time interval on the impact of a tweet’s topical relevance on individual retweeting behavior. This is reasonable, as active users, as quantified by a shorter lag between retweets, are more involved and more likely to be independent when making retweeting decisions, and thus deliberate in a more central fashion [37].

3. The impact that topical relevance has on individual retweeting behavior is weaker for tweets coming from homophilic followees. This means that when receiving a tweet from a homophilic friend, instead of doing extensive cognitive work, individuals may rely on a variety of peripheral cues (such as “liking” or “authority”) that allow them to make quick decisions. Consequently, the effect of central-route factors such as topical relevance on the individual’s retweeting response decreases. That is to say, the more value homophily exists between the tweet’s author and the individual, the less likely the individual is to effortfully elaborate on the degree of topical relevance when making retweeting decisions. In addition, our regression result shows that the moderating effect of social tie strength is positive and significant at the 1‰ level, which is contrary to our expectation. This means that an involving message has a greater likelihood of being shared when it comes from a followee who interacts with the individual on Twitter more frequently.

4. In addition to these findings, we confirm that the hierarchical/multilevel linear model (HLM) adopted in our research has several advantages over previous models employed to study individual retweeting behavior on SNS. First, this is the first time that an HLM has been used to analyze individual retweeting behavior on SNS in the information systems (IS) field. Second, the retweets of one individual tend to be dramatically distinct from those of another individual, especially in terms of the intra-individual correlation [38]. As such, unless a hierarchical model is employed, estimation results are likely to be inconsistent [39], especially for the cross-level moderating effect [29]. Third, the hierarchical linear model yields a better fit with the data, generates stable estimation results across different samples, and significantly boosts the hit rate.
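To make the idea of a cross-level moderating effect in item 4 more concrete, a generic two-level specification can be written as below. This is a textbook-style sketch under stated assumptions, not necessarily the exact model estimated in this monograph: $i$ indexes tweets received by individual $j$, $R_{ij}$ stands for the topical relevance of tweet $i$ to individual $j$, $W_j$ stands for an individual-level moderator (for example, the number of followers), and $\eta_{ij}$ is assumed to be the linear predictor of the retweeting outcome. All symbols are introduced here for illustration only.

```latex
% Level 1 (tweet level): the effect of topical relevance may differ by individual j
\eta_{ij} = \beta_{0j} + \beta_{1j} R_{ij}

% Level 2 (individual level): intercept and slope depend on the moderator W_j
\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}
\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}

% Combined form obtained by substituting Level 2 into Level 1
\eta_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10} R_{ij}
          + \gamma_{11} W_j R_{ij} + u_{0j} + u_{1j} R_{ij}
```

In this notation, $\gamma_{11}$ captures the cross-level moderation: a positive value would mean that topical relevance matters more for individuals with a larger $W_j$. The random terms $u_{0j}$ and $u_{1j}$ allow retweets from the same individual to be correlated, which is the intra-individual dependence that a single-level model ignores.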

References 1. Shi, Z., Rui, H., Whinston, A.B.: Content sharing in a social broadcasting environment: evidence from twitter. Manage. Inf. Syst. Q. 38(1), 123–142 (2014) 2. Liu, P.f.: The trend of public opinion in the network and the pattern of public opinion in 2013. Journalists, 21–28 (2014). (in Chinese) 3. Goode, L.: Social news, citizen journalism and democracy. New Media Soc. 11(8), 1287–1305 (2009) 4. Riegner, C.: Word of mouth on the web: the impact of web 2.0 on consumer purchase decisions. J. Adv. Res. 47(4), 436–447 (2007) 5. Forman, C., Ghose, A., Wiesenfeld, B.: Examining the relationship between reviews and sales: the role of reviewer identity disclosure in electronic markets. Inf. Syst. Res. 19(3), 291–313 (2008) 6. Rapp, A., Beitelspacher, L.S., Grewal, D., Hughes, D.E.: Understanding social media effects across seller, retailer, and consumer interactions. J. Acad. Market Sci. 41(5), 547–566 (2013) 7. Allsop, D.T., Bassett, B.R., Hoskins, J.A.: Word-of-mouth research: principles and applications. J. Adv. Res. 47(4), 398–411 (2007) 8. Petrovic, S., Osborne, M., Lavrenko, V.: Rt to win! Predicting message propagation in Twitter. In: ICWSM (2011) 9. Baird, C.H., Parasnis, G.: From social media to social customer relationship management. Strat. Leader. 39(5), 30–37 (2011)


10. Berger, J., Milkman, K.L.: What makes online content viral? J. Market. Res. 49(2), 192–205 (2012) 11. Li, Y.M., Lee, Y.L., Lien, N.J.: Online social advertising via influential endorsers. Int. J. Electron. Commer. 16(3), 119–154 (2012) 12. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning. Springer (2013) 13. Stieglitz, S., Dang-Xuan, L.: Emotions and information diffusion in social media – sentiment of microblogs and sharing behavior. J. Manage. Inf. Syst. 29(4), 217–248 (2013) 14. Ha, S., Ahn, J.: Why are you sharing others’ tweets?: the impact of argument quality and source credibility on information sharing behavior. In: Proceedings of the 32nd International Conference on Information Systems (2011) 15. Liu, Z., Liu, L., Li, H.: Determinants of information retweeting in microblogging. Internet Res. 22(4), 443–466 (2012) 16. Yan, W., Huang, J.: Microblogging reposting mechanism: an information adoption perspective. Tsinghua Sci. Technol. 19(5), 531–542 (2014) 17. Zhang, Y., Moe, W.W., Schweidel, D.A.: Modeling the role of message content and influencers in social media rebroadcasting. Int. J. Res. Market. 34(1), 100–119 (2017) 18. Rudat, A., Buder, J.: Making retweeting social: the influence of content and context information on sharing news in twitter. Comput. Hum. Behav. 46, 75–84 (2015) 19. Ho, J.Y., Dempsey, M.: Viral marketing: motivations to forward online content. J. Bus. Res. 63(9), 1000–1006 (2010) 20. Harvey, C.G., Stewart, D.B., Ewing, M.T.: Forward or delete: what drives peer-to-peer message propagation across social networks? J. Consum. Behav. 10(6), 365–372 (2011) 21. Sweeney, J.C., Soutar, G.N., Mazzarol, T.: Factors influencing word of mouth effectiveness: receiver perspectives. Eur. J. Market. 42(3/4), 344–364 (2008) 22. Pechmann, C., Stewart, D.W.: The effects of comparative advertising on attention, memory, and purchase intentions. J. Consum. Res. 17(2), 180–191 (1990) 23.
Campbell, D.E., Wright, R.T.: Shut-up i don’t care: understanding the role of relevance and interactivity on customer attitudes toward repetitive online advertising. J. Electron. Commer. Res. 9(1), 62 (2008) 24. Liang, T.P., Lai, H.J., Ku, Y.C.: Personalized content recommendation and user satisfaction: theoretical synthesis and empirical findings. J. Manage. Inf. Syst. 23(3), 45–70 (2006) 25. Zhu, Y.Q., Chang, J.H.: The key role of relevance in personalized advertisement: examining its impact on perceptions of privacy invasion, self-awareness, and continuous use intentions. Comput. Hum. Behav. 65, 442–447 (2016) 26. Chu, S.C., Kim, Y.: Determinants of consumer engagement in electronic word-of-mouth (ewom) in social networking sites. Int. J. Adv. 30(1), 47–75 (2011) 27. Harrigan, N., Achananuparp, P., Lim, E.P.: Influentials, novelty, and social contagion: the viral power of average friends, close communities, and old news. Soc. Netw. 34(4), 470–480 (2012) 28. Ma, L., Sian Lee, C., Hoe-Lian Goh, D.: Understanding news sharing in social media: an explanation from the diffusion of innovations theory. Online Inf. Rev. 38(5), 598–615 (2014) 29. Ma, X., Khansa, L., Deng, Y., Kim, S.S.: Impact of prior reviews on the subsequent review process in reputation systems. J. Manage. Inf. Syst. 30(3), 279–310 (2013) 30. Goldhaber, M.: The value of openness in an attention economy. First Monday 11(6), (2006) 31. Patalano, C.: Punk marketing-get off your ass and join the revolution. J. Appl. Manage. Entrep. 13(1), 87 (2008) 32. Bos, W., Tarnai, C.: Content analysis in empirical social research. Int. J. Educ. Res. 31(8), 659–671 (1999) 33. Silbermann, A.: Systematische Inhaltsanalyse. Handbuch der empirischen Sozialforschung, vol. 4. Enke, Stuttgart (1974) 34. Graneheim, U.H., Lundman, B.: Qualitative content analysis in nursing research: concepts, procedures and measures to achieve trustworthiness. Nurse Educ. Today 24(2), 105–112 (2004) 35. 
Berelson, B.: Content analysis in communication research. Am. Sociol. Rev. 17(4), 515 (1952)


36. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391 (1990) 37. Kim, D., Benbasat, I.: Trust-related arguments in internet stores: a framework for evaluation. J. Electron. Commer. Res. 4(2), 49–64 (2003) 38. Phillips, D.M., Baumgartner, H.: The role of consumption emotions in the satisfaction response. J. Consum. Psychol. 12(3), 243–252 (2002) 39. Gelman, A., Hill, J.: Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press (2006)

Chapter 2

Literature Review and Theoretical Foundation

2.1 Introduction

Beyond all doubt, SNS have become an indispensable part of our daily lives. Both at home and abroad, hundreds of millions of people spend countless hours on SNS to search, share, communicate, interact, and create user-generated data at an unprecedented rate. Many researchers have begun to study SNS, including users’ behavior on SNS and the information diffusion phenomenon on SNS. Specifically, SNS-related research revolves around community detection and analysis, information diffusion on SNS, user influence on SNS, recommendations on SNS, and behavior analysis including individual-level and collective-level analysis [1]. This monograph focuses on individual retweeting behavior on SNS, which plays a pivotal role in information diffusion on SNS. Investigating individual retweeting behavior on SNS has become one of the key issues in SNS studies and has drawn tremendous interest from multiple disciplines such as computer science [2–7], communication [8–10], marketing [11, 12], and management [13, 14]. Using the citation report function on Web of Science, we examine the research trend of users’ retweeting behavior on social networking sites. The result is illustrated in Fig. 2.1. The number of citations has increased year by year since 2007, demonstrating the vitality of this research area.

In this chapter, we first introduce the key concepts involved in this monograph in Sect. 2.2. We then spend two sections (Sects. 2.3 and 2.4) elaborating on research on individual retweeting behavior, followed by Sect. 2.5, which introduces the theoretical foundation of this monograph. Finally, Sect. 2.6 comments on existing research and concludes this chapter.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 J. Shi et al., Individual Retweeting Behavior on Social Networking Sites, https://doi.org/10.1007/978-981-15-7376-7_2


Fig. 2.1 Statistics on research topic “retweeting/disseminating/sharing on social networking sites”

2.2 Definition of Concepts

2.2.1 Social Networking Sites

According to [15], Social Networking Sites (SNS) are “web-based services that allow individuals to (1) construct a public or semi-public profile within a bounded system, (2) articulate a list of other users with whom they share a connection, and (3) view and traverse their list of connections and those made by others within the system”. Thus, SNS enable users to present themselves, connect to a social network, and develop and maintain relationships with others [16, 17]. SNS encourage participation, collaboration, and information sharing, allowing users to interact freely with each other [18]. Facebook, MySpace, Twitter, and Sina Weibo are successful examples of social networking sites. Among them, we adopt Twitter to carry out our research and thus the following concepts are elaborated based on Twitter.

2.2.2 Twitter

Twitter, one of the most popular social networking and micro-blogging service providers in the world, has grown in popularity since its emergence in 2006. Within a 140-character tweet, people can freely publish their daily activities and express their opinions, likes, and dislikes, which may prompt other people to forward or reply. Moreover, Twitter’s asymmetric following model allows users to keep abreast of the latest happenings of any other user, even though that user may not choose to follow them back or even be aware of their existence. This contrasts with Facebook and LinkedIn, which require mutual acceptance between users. Consequently, it is not surprising that Twitter has attracted wide interest from both academia and industry: its abundant real-time user data offer rich information about people from every walk of life, and its clean, well-documented API and rich developer tooling have made accessing and analyzing social networking data easy and convenient. For example, Twitter data have been exploited and researched in areas such as politics [19–21], enterprise marketing [22–24], medical and health care [25, 26], financial forecasting [27], and so forth.

Fig. 2.2 A tweet on Twitter

2.2.3 Tweet

Tweets are the basic atomic building blocks of all things on Twitter. Tweets are also known as “status updates”. Tweets can be embedded, replied to, liked, unliked, and deleted. In this monograph, we use “tweet”, “post”, and “message” interchangeably. Figure 2.2 shows a tweet published by the account “Wall Street Journal”, which includes a URL link as well as one sentence. Note that the maximum length of a post is restricted to 140 characters on Twitter. URLs in a post can redirect people to videos, interesting web pages, and other web content for further information, and thus expand the informativeness of a post. A hashtag, a word beginning with the # symbol, is added to posts to aggregate messages that revolve around the same topic (e.g., #intel-powered, #Christmas). Thus, hashtags categorize messages and facilitate users’ searches on SNS.
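The elements of a tweet described above (length limit, URLs, hashtags) can be pulled out of raw text with a few lines of code. The following is only a rough sketch: the function name `tweet_features` and the regular expressions are illustrative assumptions, and the patterns are deliberately simplified (for instance, a hashtag is taken to be `#` followed by word characters), so they do not reproduce Twitter's own tokenization rules.

```python
import re

MAX_TWEET_LENGTH = 140  # character limit stated in the text

# Simplified, illustrative patterns (assumptions, not Twitter's official rules).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")


def tweet_features(text):
    """Return basic surface features of a tweet's text."""
    urls = URL_RE.findall(text)
    # Strip URLs before looking for hashtags so that a '#fragment'
    # inside a link is not mistaken for a hashtag.
    text_without_urls = URL_RE.sub(" ", text)
    hashtags = HASHTAG_RE.findall(text_without_urls)
    return {
        "length": len(text),
        "within_limit": len(text) <= MAX_TWEET_LENGTH,
        "urls": urls,
        "hashtags": hashtags,
    }
```

For example, calling `tweet_features("Merry #Christmas! Details: https://example.com/a#top")` reports one hashtag and one URL, and the `#top` fragment of the link is not counted as a hashtag.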

2.2.4 User’s Behavior on SNS

1. Follow

As illustrated in Fig. 2.3, on Twitter, once userA follows (i.e., subscribes to) userB, userA will receive userB’s updates in real time. We call userA a follower of userB, and userB a friend (followee) of userA. According to the definition given by [28], social communication is a “process by which an individual (the information source) transmits stimuli (the information carrier) to modify the behavior of other individuals (the information receivers)”. Thus, userB is the information source; userB’s updates are stimuli; userA and other followers of userB are the information receivers. Upon receiving userB’s update, those followers may ignore it, read it, reply to it, or retweet it to their own followers. Figure 2.4 lists some accounts suggested for a user to follow, which is one of Twitter’s services to improve customer satisfaction.

Fig. 2.3 Following relationship on Twitter

2. Retweet

In the current research, retweet refers to re-posting other-sourced tweets. Twitter’s retweet feature helps users quickly share a tweet with all of their followers. Just click the “retweet” button marked by the rectangle in Fig. 2.2 and all of your followers will see the tweet on their timelines. Retweeting is thus a quick and easy way to relay other-sourced information to a large audience and is of great importance for spreading news. In this monograph, we use “retweet”, “disseminate”, “share”, and “forward” interchangeably. It is worth noting that on Facebook, users can control the scope of information available to each friend. This is the function of lists. That is to say, users can introduce filters for their posts and selectively display them on Facebook. However, on Twitter, there are no such features to manage the scope of information. As a result, once a user retweets a post, all of the user’s followers will see the message.

3. Mention

A mention is a tweet that contains another person’s @username anywhere in the body of the tweet. “Mentioning” is the practice of referring to another user in a post via the use of “@username”. Thus, it is a form of “addressivity” aiming to gain the target person’s attention, which is essential for conversation to occur.
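The asymmetric following relationship and the reach of a retweet can be illustrated with a small directed graph. The class below is a hypothetical sketch written for this purpose (the names `FollowGraph`, `follow`, `followers_of`, and `audience_after_retweets` are invented here); it assumes, as the text states, that a retweet is shown to all followers of the retweeting user.

```python
from collections import defaultdict


class FollowGraph:
    """Directed follow relationships: follower -> followee."""

    def __init__(self):
        self._followers = defaultdict(set)  # user -> users who follow that user

    def follow(self, follower, followee):
        # Following is one-directional: no reverse edge is created.
        self._followers[followee].add(follower)

    def followers_of(self, user):
        return set(self._followers.get(user, set()))

    def audience_after_retweets(self, author, retweeters):
        """Users who can see the author's tweet on their timeline once
        each user in `retweeters` has retweeted it."""
        audience = self.followers_of(author)
        for user in retweeters:
            audience |= self.followers_of(user)
        audience.discard(author)
        return audience
```

With `userA` following `userB` and `userC` following `userA`, the tweet of `userB` initially reaches only `userA`; once `userA` retweets it, `userC` sees it as well, even though `userC` does not follow `userB`.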

Fig. 2.4 Twitter suggests some accounts for you to follow

2.3 Explanation-Oriented Studies on Individual Retweeting Behavior

According to Shmueli and Koppius [29], existing studies on individual retweeting behavior can be divided into two classes, namely prediction-oriented research and explanation-oriented research. The former aims at accurately predicting new observations by establishing predictive models (statistical models and other methods such as data mining algorithms), while the latter focuses on explanation and is grounded in underlying causal relationships between theoretical constructs. In this section, we present explanation-oriented studies on individual retweeting behavior, and in the next section we give a brief introduction to prediction-oriented research on individual retweeting behavior.

From the perspective of a social communication process [28], communication is a process by which an individual (the information source) transmits stimuli (the information carrier, for example, a message) to modify the behavior of other individuals (the information receivers). In addition, Rogers and Bhowmik [30] point out that the relationship between the source and the receiver is an aspect of communication that cannot be ignored, as relationships account for many aspects of communication, such as credibility, empathy, and attraction, and ultimately for the effectiveness of communication. Thus, the relationship between the source and the receiver should also be taken into account in the communication process. Figure 2.5 shows the basic elements in a communication process.

Fig. 2.5 Sketch for elements in a communication process

In Fig. 2.5, we can see that the “contextual factor” is also included in the communication process. This is because, unlike in traditional interpersonal communication in face-to-face settings, receivers on SNS are mostly unfamiliar with the credentials of the information source. They tend to look for contextual factors, such as the credibility of the website/platform [31] and the decisions of other people (also called the bandwagon effect or herd behavior [32]), to judge the credibility of the stimuli and the information source. Thus, contextual factors also have an impact on the receiver’s reaction. In total, five different aspects of factors can influence the receiver’s decisions in a communication process, as illustrated in Fig. 2.5. According to Fig. 2.5, we can roughly classify existing studies focusing on factors behind the retweeting behavior into several categories. Table 2.1 lists these studies and their main findings.

2.3.1 Research Emphasizing the Information Carrier

The first category of research emphasizes the impact of the information carrier on the dissemination behavior. For example, based on two data sets of more than 165,000 political tweets in total, Stieglitz and Dang-Xuan [14] demonstrate that the more sentiment a political Twitter message exhibits, the more often it is retweeted. By analyzing a data set of all online New York Times articles published over a three-month period, Berger and Milkman [10] indicate that positive articles are more likely to be shared (by e-mail) than negative articles. Besides, articles that evoke high-arousal positive (awe) or negative (anger or anxiety) emotions are more viral than articles that evoke low-arousal or deactivating emotions. In laboratory experiments with 64 German-speaking students, Rudat and Buder [33] show that both the information value of the tweet and agent awareness (i.e., bandwagon effect) have significant effects on people’s retweeting intention.


2.3.2 Research Emphasizing both the Information Source and the Information Carrier

The second category of research examines the impact of both the information source and the information carrier on the retweetability of a tweet or on individual retweeting intention. Based on two data sets of 360,000 tweets in total, Liu et al. [34] examine the determinants of information retweeting in emergency events (i.e., earthquake and mudslide) and show that both the characteristics of the information source (e.g., source expertise, source attractiveness) and the number of multimedia items in a tweet have positive impacts on the number of times a tweet is retweeted on Sina micro-blogging. In a field experiment with 216 respondents, Yan and Huang [35] investigate individual retweeting intention based on the Information Adoption Model (IAM) [36] and social presence theory and show that information quality, source credibility, and perceived enjoyment have significant impacts on individual perceptions regarding the tweet and thereby indirectly influence retweeting intention. Based on a survey of 84 Twitter users, Ha and Ahn [37] reach similar conclusions.

2.3.3 Research Emphasizing Individual Preferences

The third category of research focuses on the impacts of individual heterogeneity, preferences, and motivations on information retweeting behavior. Based on an online survey of 586 respondents, Ho and Dempsey [38] examine internet users’ motivations to pass along online content and find that (1) the need to be individualistic, (2) the need to be altruistic, and (3) the consumption of online content all have positive impacts on individual eWOM forwarding behavior. Using three YouTube videos as exemplars of viral peer-to-peer stimuli, Harvey et al. [9] investigate the forwarding behavior of 173 respondents in peer-to-peer communication and demonstrate that the relevance of a YouTube video to the sender has a positive impact on the likelihood of forwarding it across a tie. Based on an individual-level split hazard model, Zhang et al. [39] verify that the rebroadcasting of a message depends not only on message content but also on the message’s fit with a user.

2.3.4 Research Emphasizing Relationships

In addition to the above research, the impact of the relationship between the information source and the receiver on the receiver’s dissemination behavior has also been examined in prior research. Shen et al. [40] show that social tie moderates the effects of the message format and advertising literacy on communication effectiveness. Using social exchange theory and a Twitter data set, Shi et al. [13] investigate relationships between users’ social network characteristics and their retweeting actions and find that weak ties (in the form of unidirectional links) are more likely to engage in the social exchange process of content sharing than bidirectional followers. In contrast, others verify that social tie strength has a positive effect on users’ eWOM behavior such as opinion seeking or opinion passing [41, 42], or on news sharing intention [43]. Researchers also find inconsistent evidence on the impact of homophily. Chu and Kim [42] reveal that a negative relationship exists between perceived homophily and individual opinion seeking or opinion passing behavior. However, Harrigan et al. [44] show that users are more likely to retweet messages from close relations/community structures. Ma et al. [43], in turn, find that homophily exerts no significant influence on the intention to share news in social media.

2.4 Prediction-Oriented Studies on Individual Retweeting Behavior

Based on a Twitter dataset, Suh et al. [2] find that the presence of hashtags, URLs, and mentions in a tweet is the most important factor for the retweetability of the tweet, followed by the indegree and outdegree of the author. Nagarajan et al. [45] analyze over 1 million tweets relating to three real-world events and the properties of the retweet behavior surrounding the most viral content pieces. They find that tweets containing hyperlinks to informative posts, videos, and images generate a denser retweet network. Xu and Yang [3] leverage four different types of features in their prediction model: social-based, content-based, tweet-based, and author-based features. By performing “leave-one-feature-out” comparisons, they find that “the number of times the author has been retweeted by the user” is the most important feature in predicting an individual’s retweeting behavior. Using a Twitter data set consisting of over 768,000 tweets, Macskassy and Michelson [4] compare four competing retweeting models and show that a retweet model that takes users’ homophily or similarity into account fits the observed retweet behavior much better than generic models. However, there are obvious drawbacks in their research. First, both model fitting and model evaluation are based on the same dataset. Therefore, it is doubtful whether the conclusions will hold for new, unseen tweets. Second, only capitalized non-stop words in tweets are used to detect a user’s topics of interest, which may not capture the full picture of the tweets. Tang et al. [5] study relationships between users by considering social similarity in the prediction model, and cast the prediction problem as a multi-task learning problem. Zhang et al. [6] verify that prediction performance improves significantly by incorporating social influence locality into the model. Feng and Wang [7] model individual retweet behavior as a graph made up of three types of nodes: users, publishers, and tweets. Based on the graph, they propose a feature-aware factorization model to predict individual retweeting behavior on Twitter.
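The first drawback noted above, fitting and evaluating a model on the same data, is usually addressed by holding out part of the data. The sketch below illustrates the idea only; it does not reproduce any of the cited models. It assumes a hypothetical record format of `(author, retweeted)` pairs for a single user, and the "model" is a deliberately trivial rule that predicts a retweet when the user retweeted at least half of that author's tweets in the training data, loosely inspired by the author-retweet-count feature discussed for [3].

```python
import random
from collections import defaultdict


def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle the records reproducibly and hold out a test portion."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_author_rates(train):
    """Estimate, per author, the share of that author's tweets the user retweeted."""
    counts = defaultdict(lambda: [0, 0])  # author -> [retweets, exposures]
    for author, retweeted in train:
        counts[author][0] += int(retweeted)
        counts[author][1] += 1
    return {author: hits / total for author, (hits, total) in counts.items()}


def predict(rates, author, default=0.0):
    """Predict a retweet when the estimated rate is at least 0.5.

    Authors never seen in training fall back to `default`.
    """
    return rates.get(author, default) >= 0.5


def hit_rate(rates, records):
    """Share of records for which the prediction matches the observed outcome."""
    hits = sum(predict(rates, author) == bool(retweeted) for author, retweeted in records)
    return hits / len(records)
```

The point of the design is that `fit_author_rates` only ever sees the training portion, while `hit_rate` is computed on the held-out portion, so the reported accuracy refers to tweets the model has not been fitted on.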


Table 2.1 Classification of related studies based on a social communication perspective

| Studies | Source-related factors | Information-carrier-related factors | Receiver-related factors | Source-receiver relationships | Findings |
| --- | --- | --- | --- | --- | --- |
| [14] | | Sentiment | | | Amount of sentiment (a) → retweeted times. The association between amount of sentiment and retweet quantity is stronger for tweets with negative sentiment (b) |
| [10] | | Sentiment | | | Positive content is more viral than negative content (a). Content that evokes high-arousal positive or negative emotions is more viral (a) |
| [45] (c) | | Characteristics of content | | | Tweets containing hyperlinks to informative posts, videos, images generate a denser retweet network (c) |
| [33] | | Information value, bandwagon effect | | | Information value (a) → people’s retweeting intention. Bandwagon effect (a) → people’s retweeting intention |
| [2] (c) | Followers of author | Characteristics of content | | | Having URLs, hashtags (c) → retweeted times. #Followers of author (c) → retweeted times |
| [37] | Source credibility | Argument quality | | | Source credibility indirectly affects sharing intention (a). Argument quality indirectly affects sharing intention (a) |
| [34] | Characteristics of source | Amount of information | | | Source expertise (a) → retweeted times. Source attractiveness (a) → retweeted times. Source trustworthiness (a) → retweeted times. #multimedia (a) → retweeted times |
| [35] | Characteristics of source | Information quality | | | Source credibility indirectly affects retweeting intention (a). Information quality indirectly affects retweeting intention (a). Perceived enjoyment (a) → retweeting intention |
| [38] | | | Individual need | | The need to be individualistic (a) → eWOM forwarding behavior. The need to be altruistic (a) → eWOM forwarding behavior |
| [9] | | | Relevance | | Relevance with individual (a) → forwarding behavior |
| [39] | | Content | Content-user fit | | Content (a) → retweeting behavior. Content-user fit (a) → retweeting behavior |
| [41] | | | Individual characteristics | Social tie | Internet social connection indirectly affects online forwarding behavior (a). Individual characteristics (i.e., innovativeness, internet usage experience, music involvement) indirectly affect online forwarding behavior (a) |
| [13] | | | | Social tie | Weak ties (a) → content sharing |
| [42] | | | | Social tie, homophily | Strong ties (a) → opinion seeking or passing behavior. Perceived attitude homophily negatively affects opinion seeking or passing behavior (a) |
| [43] | | Perceived credibility, perceived preference | Role of the individual | Social tie, homophily | Perceived preference (a), opinion leadership (a), tie strength (a) → news sharing behavior. Perceived credibility (b), opinion seeking (b), homophily (b) → news sharing behavior |
| [40] | | Advertising format | Advertising literacy | Social tie | Message-sharing intention is higher in an interactive advertising format than in a non-interactive format (a). Consumers with higher advertising literacy have less intention to share ads if they come from a weak tie than from a strong tie (a) |

Note: (a) Significant effect. (b) Insignificant effect. (c) Correlation relationship instead of causal relationship is examined in the study.

2.5 Theoretical Foundation

2.5.1 The Elaboration-Likelihood Model

The Elaboration-Likelihood Model (ELM) was developed by [46] and has been considered the most popular and useful persuasion model in consumer research and social psychology. Elaboration is “the extent to which a person carefully thinks about issue-relevant arguments contained in a persuasive communication”. As a dual-processing model of information processing, the ELM proposes that attitude change and consequent behavior change among individuals may be caused by two routes of influence: the central route and the peripheral route.

The central route requires an individual to think effortfully about issue-related arguments in a message and reflect on the relative merits and relevance of those arguments before forming an informed decision about the target behavior. In the context of making retweeting decisions on SNS, such arguments may refer to the trustworthiness and truthfulness of the message, the relevance of the message to himself/herself, the potential risks and benefits of sharing the message with followers, etc. The central route thus involves a high level of message elaboration, which demands concentrating on the content of the message, deliberating on and assessing its content, and reflecting on issues relevant to the message. In an attempt to process new information rationally, people use the central route to scrutinize the ideas, figure out whether they have true merit, and mull over their implications.

The peripheral route involves less cognitive effort: individuals accept or reject a message “without any active thinking about the attributes of the issue or the object of consideration”. Instead of doing extensive cognitive work, recipients rely on a variety of cues that allow them to make quick decisions. For example, Cialdini [47] lists six cues that trigger a “click, whirr” response: (1) reciprocation; (2) consistency; (3) social proof; (4) liking; (5) authority; (6) scarcity. In the context of making retweeting decisions on SNS, such cues may refer to source credibility, source attractiveness, how many times the tweet has been retweeted by others, relationships with the information source, etc. These cues do not require individuals to scrutinize the message arguments and thus involve less cognitive effort. Therefore, the peripheral route represents a quantitative difference in elaborative processing relative to the central route [48].
According to the ELM, information receivers can differ greatly in their ability and motivation to elaborate on an argument’s central merits, which in turn may constrain how a given influence process affects their attitude formation or change. This ability and motivation to elaborate is captured in ELM by the elaboration likelihood construct. People in the high elaboration likelihood state are more likely to undertake the cognitive effort to carefully scrutinize the information they are exposed to and, therefore, tend to be more persuaded by the central route than by peripheral cues. In contrast, those in the low elaboration likelihood state, lacking the motivation or ability to deliberate thoughtfully, will be less likely to engage in elaboration and more likely to be influenced by peripheral cues such as source credibility [49].
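The role of elaboration likelihood described above can be illustrated with a toy numerical sketch. This is not a formalization taken from the ELM literature or from this monograph: the logistic form, the function name `retweet_probability`, and all weights are assumptions chosen purely for illustration. The only idea it encodes is that a higher elaboration likelihood shifts weight from peripheral cues toward central arguments.

```python
import math


def retweet_probability(argument_quality, cue_strength, elaboration,
                        w_central=2.0, w_peripheral=2.0, bias=-2.0):
    """Toy dual-route model (illustrative assumption, not an estimated model).

    argument_quality: strength of central-route arguments, e.g. topical relevance (0..1)
    cue_strength:     strength of peripheral cues, e.g. source credibility (0..1)
    elaboration:      elaboration likelihood in [0, 1]; higher values weight the
                      central route more and the peripheral route less
    """
    if not 0.0 <= elaboration <= 1.0:
        raise ValueError("elaboration must lie in [0, 1]")
    score = (bias
             + elaboration * w_central * argument_quality
             + (1.0 - elaboration) * w_peripheral * cue_strength)
    return 1.0 / (1.0 + math.exp(-score))  # logistic link
```

Under these assumed weights, a strong argument with no supporting cue is more likely to be shared by a high-elaboration receiver than by a low-elaboration one, whereas a strong cue with a weak argument shows the opposite pattern.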

2.5.2 Why Using ELM in the Current Research

Various theories have been adopted to investigate the information diffusion phenomenon on SNS or people’s eWOM behavior, as shown in Table 2.2. However, these theories are suitable for investigating only a limited set of factors and are not comprehensive and inclusive enough to support a thorough examination of individual information dissemination behavior. Although the Heuristic Systematic Model (HSM), adopted in the study of [34], is another dual-processing theory, the empirical literature supporting the validity of HSM is limited [52].

Table 2.2 Classification of related studies based on data collection methods

| Data collection methods | Authors (year) | Scale of data | Theories |
| --- | --- | --- | --- |
| Twitter | [14] | Two data sets of more than 165,000 tweets in total | Emotions-related social psychology |
| Twitter | [13] | 65 top tweets, 24,403 users | Social exchange theory |
| Twitter | [39] (a) | 3074 retweets, 728,986 non-retweets | Prior studies such as eWOM studies and social influence studies |
| Sina Weibo | [34] | The first sample: 302,918 messages, the second sample: 58,164 messages | Heuristic Systematic Model (HSM) [50] |
| New York Times | [10] | 6956 articles | Self-presentation theory and theory related to psychological arousal |
| Experiment | [33] | 65 participants, 36 self-created fictive tweets | News value theory and bandwagon effect |
| Experiment | [40] | 100 participants in study 1, 246 participants in study 2 | Social capital and communication theory |
| Field experiment | [35] | 216 participants, #tweets depends on each participant’s account | Information Adoption Model [36] and social presence theory |
| Survey | [38] | 582 respondents | Fundamental interpersonal relations orientation [51] |
| Survey | [9] | 173 respondents, 3 YouTube videos as exemplars of stimuli | Prior studies |
| Survey | [37] | 84 respondents, the first tweet on each one’s Twitter account | Information Adoption Model and social efficacy theory |
| Survey | [42] | 363 respondents | Prior studies such as social network studies |
| Survey | [41] | 250 undergraduate students from two colleges | Diffusion of innovation |
| Survey | [43] | 309 respondents | Diffusion of innovation |

Note: Nagarajan et al. [45] and Suh et al. [2] are not included in the table, as causal relationships are not their focus. (a) Zhang et al. [39] focuses on modeling instead of testing explanatory hypotheses.

From Subsect. 2.5.1, we know that the ELM is a persuasion theory, which models how the message a person is exposed to influences the person’s attitude formation and, subsequently, his or her behavior. That is to say, the ELM can answer questions about the influence process itself. For instance, what determinants may impact the individual’s retweeting decision? Which determinants are more important in affecting the decision-making process? Using the ELM as the theoretical framework, we can integrate and classify relevant factors into the central and peripheral routes and thereby examine their relative importance in affecting individual retweeting decisions. Thus, the ELM lends itself naturally to the examination of individual information dissemination behavior on SNS.

Although the ELM is primarily employed to examine information adoption behavior, such as making purchase decisions or becoming interested in a certain product, we argue that it can also be used to investigate individual information dissemination decisions. This is because information dissemination behavior is a clear and visible sign that the message is actively processed by twitterers and hence a behavioral indicator of message salience [53]. After assessing and acknowledging the validity and value of the information, individuals retweet the information to their followers, demonstrating that this message has some intrinsic value (being, at the very least, seen as worth sharing with others).

2.6 Commentary on Related Literature

Social networking sites have become an important avenue for information transmission. Investigating the drivers of individual information dissemination behavior is of great importance to understanding the mechanisms of information diffusion on SNS. It can shed light on the formation of social epidemics and rumors, help marketers devise online marketing strategies, and provide guidance for governments in managing online public opinion. While extensive research has been carried out on predicting or explaining individual retweeting behavior, some questions still need further exploration.

First, none of the existing studies has verified that individual retweeting behavior does not occur randomly; rather, they take this unstated premise as a fact. We claim that re-examining this assumption is necessary, because any prediction work is rendered meaningless without it.

Second, from Table 2.1, we can see that the scope of prior studies on the determinants of individual information dissemination decisions is large and fragmented, and little has been done to integrate existing findings. Furthermore, it is worth noting that people vary in how they react to the same tweet. For example, a tweet published by Donald J. Trump (the 45th President of the USA) that has been retweeted millions of times may still be totally disregarded by a twitterer who shows little interest in politics. After all, the retweeting decision is made by the information receiver, who has his or her own preferences, interests, and needs. However, most existing studies pay more attention to the impact that the information source and the tweet itself have on individual retweeting decisions, and few researchers highlight the information receiver's role in this issue [54].

Third, to the best of our knowledge, virtually no scholarly effort has been undertaken to understand the relative importance of those factors in affecting individual retweeting decisions. Instead, as prediction-oriented studies illustrate, a large number of features are indiscriminately introduced into the prediction model without examining their relevance. As a result, the existence of irrelevant or redundant features not only increases the data collection cost and tends to generate an overfitted model that predicts poorly on future observations not used in model training (a problem known as the curse of dimensionality [55]), but also hinders us from understanding which factors actually dominate an individual's retweeting behavior.

Fourth, the impact of the relationship between the source and the receiver on the receiver's information retweeting behavior is still controversial.

Fifth, although Harvey et al. [9] and Zhang et al. [39] verify that the relevance of online content to an individual has an impact on that individual's forwarding behavior, neither study has investigated whether the identified causal relationship applies equally or differentially across user populations. To put it another way, are individuals equally affected by the content's relevance when making retweeting decisions?

Last but not least, from Table 2.2, we find that surveys are the mainstream data collection method in prior explanation-oriented empirical studies. However, self-reported approaches are limited by sample size, recall bias, and social desirability bias. Besides, the typical retweeting experiment employs a very limited number of readily available messages to examine how external factors affect individual retweeting intention. This lack of generalizability across topics may preclude researchers from gaining an in-depth understanding of the driving forces behind individual retweeting behavior.

The above-mentioned gaps merit a systematic and integrated examination of individual information dissemination behavior on SNS. In this study, we aim to address these gaps by proposing an integrated conceptual model from the information receiver's perspective. This conceptual model integrates various factors shown to influence individual information dissemination behavior on SNS and thus allows us to examine the relative importance of those factors. In the following chapter, we focus on the development of this systematic and integrated conceptual model of individual retweeting behavior on SNS.

References

1. Zafarani, R., Abbasi, M.A., Liu, H.: Social Media Mining: An Introduction. Cambridge University Press, New York, NY, USA (2014)
2. Suh, B., Hong, L., Pirolli, P., Chi, E.H.: Want to be retweeted? Large scale analytics on factors impacting retweet in twitter network. In: 2010 IEEE Second International Conference on Social Computing (Socialcom), pp. 177–184. IEEE (2010)
3. Xu, Z., Yang, Q.: Analyzing user retweet behavior on twitter. In: Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), pp. 46–50. IEEE Computer Society (2012)
4. Macskassy, S.A., Michelson, M.: Why do people retweet? Anti-homophily wins the day! In: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, pp. 209–216 (2011)
5. Tang, X., Miao, Q., Quan, Y., Tang, J., Deng, K.: Predicting individual retweet behavior by user similarity: a multi-task learning approach. Knowl.-Based Syst. 89, 681–688 (2015)
6. Zhang, J., Tang, J., Li, J., Liu, Y., Xing, C.: Who influenced you? Predicting retweet via social influence locality. ACM Trans. Knowl. Disc. Data (TKDD) 9(3), 25 (2014)
7. Feng, W., Wang, J.: Retweet or not?: personalized tweet re-ranking. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pp. 577–586. ACM (2013)
8. Steffes, E.M., Burgee, L.E.: Social ties and online word of mouth. Internet Res. 19(1), 42–59 (2009)
9. Harvey, C.G., Stewart, D.B., Ewing, M.T.: Forward or delete: what drives peer-to-peer message propagation across social networks? J. Consum. Behav. 10(6), 365–372 (2011)
10. Berger, J., Milkman, K.L.: What makes online content viral? J. Mark. Res. 49(2), 192–205 (2012)
11. Ho, J.Y.C., Dempsey, M.: Viral marketing: motivations to forward online content. J. Bus. Res. 63(9–10), 1000–1006 (2010)
12. Walker, L., Baines, P., Dimitriu, R., Macdonald, E.K.: Antecedents of retweeting in a (political) marketing context. Psychol. Mark. 34(3), 275–293 (2017)
13. Shi, Z., Rui, H., Whinston, A.B.: Content sharing in a social broadcasting environment: evidence from twitter. Manag. Inform. Syst. Quart. 38(1), 123–142 (2014)
14. Stieglitz, S., Dang-Xuan, L.: Emotions and information diffusion in social media: sentiment of microblogs and sharing behavior. J. Manag. Inform. Syst. 29(4), 217–248 (2013)
15. Ellison, N.B., et al.: Social network sites: definition, history, and scholarship. J. Comput.-Mediated Commun. 13(1), 210–230 (2007)
16. Ellison, N.B., Steinfield, C., Lampe, C.: The benefits of facebook "friends:" social capital and college students' use of online social network sites. J. Comput.-Mediated Commun. 12(4), 1143–1168 (2007)
17. Kane, G.C., Fichman, R.G., Gallaugher, J., Glaser, J.: Community relations 2.0. Harv. Bus. Rev. 87(11), 45–50 (2009)
18. O'Reilly, T.: What is web 2.0? Design patterns and business models for the next generation of software. Commun. Strateg. 1, 17 (2007)
19. Lotan, G., Graeff, E., Ananny, M., Gaffney, D., Pearce, I., et al.: The Arab Spring|The revolutions were tweeted: information flows during the 2011 Tunisian and Egyptian revolutions. Int. J. Commun. 5, 31 (2011)
20. Hermida, A., Lewis, S.C., Zamith, R.: Sourcing the Arab Spring: a case study of Andy Carvin's sources on Twitter during the Tunisian and Egyptian revolutions. J. Comput.-Mediated Commun. 19(3), 479–499 (2014)
21. Stieglitz, S., Dang-Xuan, L.: Political communication and influence through microblogging: an empirical analysis of sentiment in twitter messages and retweet behavior. In: 2012 45th Hawaii International Conference on System Science (HICSS), pp. 3500–3509. IEEE (2012)
22. Smith, A.N., Fischer, E., Yongjian, C.: How does brand-related user-generated content differ across Youtube, Facebook, and Twitter? J. Interact. Mark. 26(2), 102–113 (2012)
23. Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral marketing in large-scale social networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1029–1038. ACM (2010)
24. Swani, K., Brown, B.P., Milne, G.R.: Should tweets differ for B2B and B2C? An analysis of fortune 500 companies' Twitter communications. Ind. Mark. Manag. 43(5), 873–881 (2014)
25. Krieck, M., Dreesman, J., Otrusina, L., Denecke, K.: A new age of public health: identifying disease outbreaks by analyzing tweets. In: Proceedings of Health Web-science Workshop, ACM Web Science Conference (2011)
26. Signorini, A., Segre, A.M., Polgreen, P.M.: The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic. PLoS One 6(5), e19467 (2011)
27. Asur, S., Huberman, B.A.: Predicting the future with social media. In: 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), vol. 1, pp. 492–499. IEEE (2010)
28. Hovland, C.I.: Social communication. Proc. Am. Philos. Soc. 92(5), 371–375 (1948)
29. Shmueli, G., Koppius, O.R.: Predictive analytics in information systems research. MIS Quart. 553–572 (2011)
30. Rogers, E.M., Bhowmik, D.K.: Homophily-Heterophily: relational concepts for communication research. Public Opinion Quar. 34(4), 523–538 (1970)
31. Park, C., Lee, T.M.: Information direction, website reputation and eWOM effect: a moderating role of product type. J. Bus. Res. 62(1), 61–67 (2009)
32. Salganik, M.J., Dodds, P.S., Watts, D.J.: Experimental study of inequality and unpredictability in an artificial cultural market. Science 311(5762), 854–856 (2006)
33. Rudat, A., Buder, J.: Making retweeting social: the influence of content and context information on sharing news in Twitter. Comput. Hum. Behav. 46, 75–84 (2015)
34. Liu, Z., Liu, L., Li, H.: Determinants of information retweeting in microblogging. Internet Res. 22(4), 443–466 (2012)
35. Yan, W., Huang, J.: Microblogging reposting mechanism: an information adoption perspective. Tsinghua Sci. Technol. 19(5), 531–542 (2014)
36. Sussman, S.W., Siegal, W.S.: Informational influence in organizations: an integrated approach to knowledge adoption. Inform. Syst. Res. 14(1), 47–65 (2003)
37. Ha, S., Ahn, J.: Why are you sharing others' tweets?: The impact of argument quality and source credibility on information sharing behavior. In: Proceedings of the 32nd International Conference on Information Systems (2011)
38. Ho, J.Y., Dempsey, M.: Viral marketing: motivations to forward online content. J. Bus. Res. 63(9), 1000–1006 (2010)
39. Zhang, Y., Moe, W.W., Schweidel, D.A.: Modeling the role of message content and influencers in social media rebroadcasting. Int. J. Res. Mark. 34(1), 100–119 (2017)
40. Shen, G.C.C., Chiou, J.S., Hsiao, C.H., Wang, C.H., Li, H.N.: Effective marketing communication via social networking site: the moderating role of the social tie. J. Bus. Res. 69(6), 2265–2270 (2016)
41. Sun, T., Youn, S., Wu, G., Kuntaraporn, M.: Online word-of-mouth (or mouse): an exploration of its antecedents and consequences. J. Comput.-Mediated Commun. 11(4), 1104–1127 (2006)
42. Chu, S.C., Kim, Y.: Determinants of consumer engagement in electronic word-of-mouth (eWOM) in social networking sites. Int. J. Advert. 30(1), 47–75 (2011)
43. Ma, L., Sian Lee, C., Hoe-Lian Goh, D.: Understanding news sharing in social media: an explanation from the diffusion of innovations theory. Online Inform. Rev. 38(5), 598–615 (2014)
44. Harrigan, N., Achananuparp, P., Lim, E.P.: Influentials, novelty, and social contagion: the viral power of average friends, close communities, and old news. Soc. Netw. 34(4), 470–480 (2012)
45. Nagarajan, M., Purohit, H., Sheth, A.P.: A qualitative examination of topical tweet and retweet practices. In: Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media, vol. 2, no. 10, pp. 295–298 (2010)
46. Petty, R.E., Cacioppo, J.T.: The elaboration likelihood model of persuasion. Adv. Exp. Soc. Psychol. 19, 123–205 (1986)
47. Cialdini, R.B.: Influence: Science and Practice, vol. 4. Pearson Education, Boston (2009)
48. Petty, R.E.: The evolution of theory and research in social psychology: from single to multiple effect and process models of persuasion. In: McGarty, C., Alexander Haslam, S. (eds.) The Message of Social Psychology: Perspectives on Mind in Society, pp. 268–290 (1997)
49. Petty, R.E., Cacioppo, J.T., Goldman, R.: Personal involvement as a determinant of argument-based persuasion. J. Pers. Soc. Psychol. 41(5), 847–855 (1981)
50. Chaiken, S.: Heuristic versus systematic information processing and the use of source versus message cues in persuasion. J. Pers. Soc. Psychol. 39(5), 752 (1980)
51. Schulz, W.: A three-dimensional theory of interpersonal behavior. Holt, Rhinehart, & Winston, New York (1958)
52. Angst, C.M., Agarwal, R.: Adoption of electronic health records in the presence of privacy concerns: the elaboration likelihood model and individual persuasion. MIS Quart. 33(2), 339–370 (2009)
53. Sutton, J., Gibson, C.B., Spiro, E.S., League, C., Fitzhugh, S.M., Butts, C.T.: What it takes to get passed on: message content, style, and structure as predictors of retransmission in the Boston Marathon bombing response. PLoS One 10(8), e0134452 (2015)
54. Sweeney, J.C., Soutar, G.N., Mazzarol, T.: Factors influencing word of mouth effectiveness: receiver perspectives. Europ. J. Mark. 42(3/4), 344–364 (2008)
55. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning. Springer (2013)

Chapter 3

Research Scheme Design

3.1 Introduction

In order to systematically explain individual retweeting behavior on SNS, we seek an overarching theoretical perspective from which to examine this issue. One theoretical perspective that can inform a thorough understanding of individual information dissemination behavior is the Elaboration-Likelihood Model (ELM) [1], which explains individuals' reactions to online content by focusing on the influence process involved in an individual's decision making [2]. The ELM classifies the influence mechanisms into central and peripheral routes based on the type of information processed by a given individual (e.g., issue-relevant arguments or simple cues), allowing us to examine retweeting behavior from the perspective of the retweeting decision maker. As a decision maker, the individual may follow the central route, such as weighing the benefits and risks of retweeting a certain message, or rely on peripheral cues, such as the relationship between the information source and himself or herself, or other people's opinions on the message. Thus, this theoretical framework provides a lens for understanding the influence process involved in an individual's retweeting decision. Using this framework, we integrate various influential factors into the model and thereby assess their relative importance in affecting individual retweeting decisions.

This chapter presents the research design of this monograph. In Sect. 3.2, we develop a conceptual framework of individual retweeting behavior based on the ELM. Specifically, in Sects. 3.2.1–3.2.2, we examine factors that affect individual retweeting decisions based on the ELM and other theories and classify them into two influence routes. Sections 3.2.3 and 3.2.4 investigate the mediating effect and the moderating effect, respectively. Section 3.3 introduces the research methods, including the data collection, variable measurement, and data analysis models involved in this monograph. Section 3.4 concludes this chapter.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 J. Shi et al., Individual Retweeting Behavior on Social Networking Sites, https://doi.org/10.1007/978-981-15-7376-7_3


3.2 A Conceptual Framework of Individual Retweeting Behavior on SNS

3.2.1 Analyzing Factors on the Central Route

3.2.1.1 Topical Relevance with the Information Receiver

Topical relevance refers to the extent to which an other-sourced post is related to a twitterer's topics of interest. With the increasing popularity of micro-blogging services, users often find themselves overwhelmed by a large number of posts on all kinds of topics. Researchers have found that internet users rarely read web pages in detail but rather scan the pages to find the information they need [3]. This can be explained by information processing theory [4], according to which individuals have a limited capacity to attend to all the information they are exposed to in their environment and to process what they do perceive. When individuals are surrounded by a great deal of stimuli, only a small fraction of the available information can attract their attention, since attention is highly selective. Individual motivation and ability are the major individual factors affecting attention. Individual motivation is a drive state created by individual interests and needs. Interests are a reflection of overall lifestyle as well as a result of goals (e.g., being an accomplished guitar player) and needs (e.g., hunger). Ability refers to the capacity of individuals to attend to and process information, and is related to knowledge of and familiarity with the stimulus. Based on these discussions, we propose that a twitterer's topical preferences and interests may influence his or her attention and decision making when skimming through web pages on SNS. A post in accord with the user's topics of interest can easily attract his or her attention and thus has a larger chance of being forwarded. Because judging the topical relevance between a tweet and oneself requires the individual to concentrate on the content and comprehend the information clearly, this task calls for effortful elaboration on the tweet. Thus, we include topical relevance in the central route.

Based on a survey of 154 users, Cheung et al. [5] show that relevance has a significant impact on information usefulness and thereby indirectly influences individual information adoption behavior. Harvey et al. [6] investigate the forwarding behavior of 173 respondents in peer-to-peer communication and demonstrate that the relevance of a YouTube video to the sender has a positive impact on the likelihood of forwarding it across a tie. Given that retweeting is not only a way of passing a post to one's followers but also an efficient way of posting on SNS [7], we can use the perspective of impression management (IM) to interpret such behavior. IM refers to the process by which individuals strive to control how they are perceived by others [8]. On SNS, individuals can create and manage their impression by constructing a profile, voicing their opinions, and disseminating original or other-sourced messages about the self, such as hobbies and tastes in music, books, and movies. For example, Naaman et al. [9] examine the content of 3,379 tweets and find that 80% of the 350 users in their study post messages relating to themselves or their thoughts, as opposed to sharing general news. Therefore, we infer that tweets closely relevant to an individual's topics of interest may help the individual construct his or her self-image on SNS and thus have a higher chance of being noticed and finally retweeted. We therefore propose the following:

Hypothesis 3.1 Topical relevance has a positive impact on individual information dissemination behavior.
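To make the notion of topical relevance more concrete, the following minimal sketch scores a tweet against a user's past posts with a bag-of-words cosine similarity. This is only an illustrative assumption: the function names (`term_counts`, `topical_relevance`), the whitespace tokenization, and the sample texts are hypothetical, and the measurement actually used in this monograph is the one introduced in Sect. 3.3, which may differ.

```python
import math
from collections import Counter
from typing import List


def term_counts(text: str) -> Counter:
    """Lower-case a text and count its whitespace-separated terms."""
    return Counter(text.lower().split())


def topical_relevance(tweet: str, user_history: List[str]) -> float:
    """Cosine similarity between a tweet and the user's pooled past posts.

    Hypothetical proxy for topical relevance: returns a value in [0, 1],
    and 0.0 when either the tweet or the history contains no terms.
    """
    tweet_vec = term_counts(tweet)
    profile_vec = Counter()
    for post in user_history:
        profile_vec.update(term_counts(post))

    # Counter returns 0 for terms that are absent from the profile.
    dot = sum(count * profile_vec[term] for term, count in tweet_vec.items())
    norm_tweet = math.sqrt(sum(c * c for c in tweet_vec.values()))
    norm_profile = math.sqrt(sum(c * c for c in profile_vec.values()))
    if norm_tweet == 0 or norm_profile == 0:
        return 0.0
    return dot / (norm_tweet * norm_profile)


# Illustrative (made-up) data: a user who mostly posts about guitars.
history = ["guitar lessons tonight", "new guitar strings"]
on_topic = topical_relevance("guitar chords for beginners", history)
off_topic = topical_relevance("election results tonight", history)
```

Under Hypothesis 3.1, a tweet with a higher score of this kind would be expected to have a higher chance of being retweeted by that user; here the guitar-related tweet scores higher than the political one.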

3.2.1.2 Information Richness

Information richness refers to the extent to which the amount of information in a post is sufficient for decision making [10]. Micro-blogging services usually limit the maximum length of a message to 140 characters and thus restrict the information richness of a tweet to some extent. Given that users are mostly averse to losses, richer information indicates lower uncertainty and tends to be more helpful for users in making decisions [11]. By assembling user-marketer interaction content data on Facebook and consumer transaction data, Goh et al. [12] find that the information richness of User-Generated Content (UGC) has a significant impact on users' purchasing behavior. Following the same logic, tweets with richer information are believed to have higher information quality and to be more useful to the audience [13], and thus prompt twitterers to share them with their followers. For example, the amount of information has been employed as an important central factor in shaping individual information dissemination intention on SNS [13], in influencing the number of times a tweet is retweeted on Sina Weibo [14], and in affecting individual purchasing intention [15]. Therefore, we include information richness as another central factor in the conceptual model and hypothesize:

Hypothesis 3.2 Information richness has a positive impact on individual information dissemination behavior.

3.2.2 Analyzing Factors on the Peripheral Route

According to the ELM, the peripheral route influences individuals typically through very simple decision criteria and cues such as celebrity endorsements, charisma, or the attractiveness of the sender [1, 16]. Prior research has verified the impact of source credibility and source attractiveness on the number of times a tweet is retweeted [14, 17] and on individual retweeting intentions [13, 18]. Therefore, we include source trustworthiness and source attractiveness in the peripheral route. Besides, other twitterers' retweeting choices are also an important cue indicating the message's popularity and credibility and thus significantly affect individual disseminating decisions [19]. Our research model also includes this cue, termed "informational social influence". Another influential but controversial factor in affecting individual retweeting decisions is the relationship between the message source and the decision-maker [20]. Since this factor does not require the decision-maker to carefully comprehend or evaluate the content and thus involves less elaboration, we include it in the peripheral route. We investigate the relationship in two dimensions. The first dimension is value homophily [21], defined as the similarity between two people's values, likes, and dislikes [22]. The second dimension is social tie strength, which is related to, but conceptually distinct from, value homophily and refers to the level of intensity of the linkage between individuals [23].

3.2.2.1 Source-Related Factors

Source trustworthiness. Source trustworthiness refers to the extent to which the information source is perceived to be trustworthy by information recipients [1]. On SNS, every user has almost unlimited freedom to publish and express opinions on certain issues without disclosing his or her real identity. It is therefore left to information receivers to determine the trustworthiness of the message source and the message itself when they decide whether to retweet the message. Wilson et al. [24] have shown that the credibility of information is often positively related to the trustworthiness of the information source. Authors with better reputations help decrease uncertainty about a message's quality. Prior research has shown that information receivers are more likely to accept message arguments when those arguments come from a trustworthy source [25], and to retweet the message to their followers [14]. Thus, we hypothesize:

Hypothesis 3.3 Source trustworthiness has a positive impact on individual information dissemination behavior.

Source attractiveness. In the context of SNS, source attractiveness refers to the extent to which a person is welcomed and liked by others, measured here by the number of his or her followers. The more followers a user has, the more attractive this user is. Indeed, the attractiveness of individuals in micro-blogging is another manifestation of their status in the real world [14]. In the two-step model proposed by Lazarsfeld et al. [21], users with a large number of followers on SNS are opinion leaders and have great influence in spreading ideas to others. Using a unique Twitter data set, Suh et al. [17] find that the number of followers of the author has a significant impact on the retweetability of a tweet. Based on two data sets of 360,000 tweets in total, Liu et al. [14] examine the determinants of information retweeting in emergency events (i.e., earthquake and mudslide) and show that source attractiveness has a significant positive impact on the number of times a tweet is retweeted. Therefore, we hypothesize:

Hypothesis 3.4 Source attractiveness has a positive impact on individual information dissemination behavior.

3.2.2.2 Value Homophily

Value homophily is the degree to which pairs of individuals are similar in attributes such as attitudes, tastes, information, and beliefs, which are internal states presumed to shape our orientation toward future behavior [21]. Brown and Reingen [26] investigate the impact of homophily on offline WOM communication and find that homophilic sources are more likely to be used as information sources. Furthermore, Steffes and Burgee [23] collect survey data from 482 college students and verify that homophilic sources are not only preferred as information sources but are also more influential in consumers' decision-making process. This is because a message originating from a source with similar likes and dislikes is likely to evoke more interest than one from a source with dissimilar tastes [22]. Based on prior research, we envisage that, on SNS, posts generated by homophilic sources are more likely to attract individuals' attention and to be shared with their followers. Hence, we propose the following hypothesis:

Hypothesis 3.5 Value homophily between the source node and the receiver has a positive impact on the receiver's information dissemination behavior.

3.2.2.3 Social Tie Strength

Social tie refers to the linkage between individuals, and social tie strength is the level of intensity of that linkage [23], which depends on the number and types of resources the individuals exchange, the frequency of exchanges, etc. [27]. In traditional WOM research, Brown and Reingen [26] find that, at the micro level, strong ties are more likely to be employed as sources of information and bear greater influence on the receiver's decision making than weak ties. These findings are further confirmed in eWOM research. For example, Bansal and Voyer [28] carry out a survey at a Canadian Forces Base and find that the effect of interpersonal forces (e.g., tie strength) on the influence of the sender's WOM on the receiver's purchase decisions is significantly positive. Consistently, based on an online survey of 363 college students, Chu and Kim [20] show that social tie strength has a positive effect on users' eWOM behavior, such as opinion seeking or opinion passing. This is because strong-tie sources are perceived as more credible and trustworthy than weak-tie sources, and tie strength can operate through trust [29]. Frenzen and Nakamoto [30] demonstrate that strong ties are more likely than weak ties to transmit information of higher economic value. In the context of SNS, unsolicited new posts are more likely to be noticed [22] and shared if they come from close and trusted sources, while posts coming from friends with fewer interactions will be expected to contain potentially less valuable or more suspicious information. Thus, we propose the following:

Hypothesis 3.6 Social tie strength has a positive impact on individual information dissemination behavior.

3.2.2.4 Informational Social Influence

Social influence refers to the conformity of going along with or agreeing with others or a visible majority [31]. Deutsch and Gerard [32] distinguish two forms of social influence: normative social influence and informational social influence. The former is one's tendency to comply with the expectations of other individuals, often to gain acceptance. The latter is one's tendency to conform to the opinions of others, treating information obtained from them as evidence in judgment. Note that the conformity effects resulting from these two forms of social influence are driven by different needs. The former springs from our desire to be liked and accepted by others, and the latter from our desire to be right. In this study, we focus on informational social influence, which has been studied under the bandwagon effect (and other related concepts such as herd behavior and social proof). On Twitter, other twitterers' attitudes toward a particular tweet are reflected by the number of times the tweet has been retweeted and (or) liked. Therefore, individuals may be influenced by other people's reactions to the post. Specifically, the more times it has been retweeted, the more value the post is perceived to have, and the more motivated the user becomes to retweet it to his or her followers. Rudat and Buder [19] find in an experiment that tweets marked with star icons, which indicate that other students retweet the tweet very often, are retweeted more often than tweets without star icons. That is to say, the bandwagon effect has a significant effect on people's retweeting intention. Accordingly, we hypothesize:

Hypothesis 3.7 Informational social influence has a positive impact on individual information dissemination behavior.

3.2.3 The Mediating Role of Social Tie Strength

The homophily principle tells us that contact between similar people occurs at a higher rate than contact between dissimilar people [33]. Attitude, belief, and value similarity lead to attraction and interaction. Taking Twitter as an example, Weng et al. [34] have verified that some users do seriously "follow" others because of common topical interests. After subscribing to others' accounts, users can scan these friends' updates and comment on or retweet these followees' posts on SNS.¹ Therefore, value homophily not only contributes to the establishment of linkages between individuals, but is also beneficial to information transfer and interactions between them. Hence, we propose:

Hypothesis 3.8 Value homophily between individuals has a positive impact on social tie strength between them.
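The mediating role posited here, read together with Hypotheses 3.5 and 3.6, can be summarized as a pair of equations. The following is only a sketch under assumed notation: $VH$ denotes value homophily, $TS$ social tie strength, $p$ the probability that the receiver retweets, and $a$, $b$, $c'$, $\alpha_0$, $\beta_0$ are hypothetical coefficients. The logit link is likewise an assumption, and the estimation models actually used in this monograph are those introduced in Sect. 3.3.

```latex
\begin{align}
TS &= \alpha_0 + a\,VH + \varepsilon, \\
\log\frac{p}{1-p} &= \beta_0 + c'\,VH + b\,TS .
\end{align}
```

In this notation, Hypothesis 3.8 corresponds to $a > 0$ and Hypothesis 3.6 to $b > 0$, while $c'$ captures the direct effect of value homophily that remains once tie strength is accounted for. Social tie strength acts as a mediator when both $a$ and $b$ are nonzero.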

¹ On SNS such as Twitter, an individual can subscribe to other users to receive their updates in a timely manner. These users are the individual's followees, also called friends in this paper.


3.2.4 Analyzing Moderators on the Central Route

The ELM posits that individual characteristics, specifically "motivation, ability, [and] personality trait", determine the route a person chooses for information processing. For instance, Tam and Ho [35] employ the ELM to examine the moderating effect of personal disposition on users' elaboration of personalized content and their acceptance decisions. We propose that individual characteristics, namely prior experience, social connectedness, and gender, can moderate the relationship between topical relevance and individual retweeting decisions. Specifically, experienced individuals have the knowledge and skills, and thus the ability, to deliberate by using the central route. Social connectedness is also relevant because individuals with many followers are more motivated to care about their self-image on SNS, and thus think centrally when publishing tweets. We treat gender as an agglomeration of personality traits that tend to affect individuals' deliberation route.

We argue that tweet characteristics, namely tweet length and the time interval since the last retweeting behavior, are important moderators of how topical relevance affects individual retweeting decisions. This effect is supported by prior research. For example, Ma et al. [36] validate that consumers think more when they write longer reviews and tend to deliberate centrally. This makes sense because writing longer content calls for individuals to reflect on a larger amount of information and detail, and thus these individuals are prone to think more and follow the central route. Similarly, the time interval since the last retweeting behavior indicates an individual's activity level and affects his or her need, or lack thereof, to adopt the central deliberation route.

Finally, we claim that interpersonal relationships are important moderators of how topical relevance affects individual retweeting decisions. Prior research shows that the more important a consumer feels the source of the information is, the greater the influence the source exerts on the consumer's decision when searching for and passing on product-focused information in online social media [20], because consumers trust strong ties to be benevolent and competent to transfer useful knowledge or information [37]. Steffes and Burgee [23] show that homophilic sources are more likely to be used as information sources and, furthermore, that information from homophilic sources is more influential in consumers' decision making.

3.2.4.1 Individual Characteristics

Cumulative experience. From the perspective of impression management [8], retweeting not only disseminates a post to one’s followers but also serves as a means of constructing one’s self-image on SNS. Thus, individuals need to weigh several issues when deciding whether to retweet, and judging the credibility of the tweet is a critical one. Morris et al. [38] find that “users are poor at judging the truthfulness based on content alone, and instead are influenced by heuristics such as author-related information”. They refer to users “authoring tweets more frequently than others” as experienced Twitter users and show that these users are better at judging tweet credibility than less experienced users. Therefore, we expect that experienced users are more skillful in assessing the true merits and the relevance of a tweet when forming attitudes toward it. Drawing on the ELM, the topical relevance of a message may have a greater influence on experienced users than on inexperienced users. Thus, we hypothesize:

Hypothesis 3.9a. The impact that topical relevance has on individual retweeting behavior is stronger for more experienced individuals.

Social connectedness. Social connectedness refers to the number of followers one has in social networks. Accordingly, social connectedness affects the extent to which users feel socially connected [39]. Compared with less popular users, those with a larger number of followers have an intensified awareness of other members within the network [40], and thus are more motivated to maintain their self-image within the network [36]. This interpersonal pressure motivates individuals to invest more cognitive effort and thoughtfully consider the message content when deciding whether to retweet. Therefore, we infer that message content-related cues should have a stronger impact on users with more followers. Furthermore, users with a larger number of followers are more skillful in interacting with followers, have more accurate expectations of followers’ responses to the shared content, and have established a clear understanding of their roles within the network [41]. Therefore, they are adept at assessing whether retweeting certain kinds of tweets can help enhance their self-image in the network. This confidence makes them more likely to judge the topical relevance of the content when making retweeting decisions. Therefore, we propose:

Hypothesis 3.9b. The impact that topical relevance has on individual retweeting behavior is stronger for individuals with a larger number of followers.
Gender. It has been reported that when processing information, females, compared with males, often “engage in more detailed elaboration of specific message content” [42] and “exhibit greater sensitivity to the particulars of relevant information when forming judgements than males” [43]. Meyers et al. [44] suggest that females should be more likely to store message material elaborately and to employ a detailed strategy at recognition, which involves a thorough and deliberate search of memory. In contrast, males’ processing is more likely to be driven by a less effortful strategy, which consists of identifying an overall theme and making judgements accordingly. Ma et al. [36] verify that males tend to rely more on the peripheral route, namely on prior reviews, when giving their own subsequent ratings. Based on these arguments, we expect that females are more likely to follow the central route when making retweeting decisions, as females are prone to engage in detailed elaboration of the content. Thus, we hypothesize:

Hypothesis 3.9c. The impact that topical relevance has on individual retweeting behavior is stronger for female users.

3.2.4.2 Tweet Characteristics

Tweet length. Ma et al. [36] claim that consumers tend to deliberate centrally when they write longer reviews, because longer reviews require reflecting on a larger amount of information and encourage reviewers to think more about their experiences. The same logic can be applied to the SNS context. When publishing original tweets, users articulate their ideas to make the content consistent, accurate, and complete. Those who tend to produce longer original tweets expend more cognitive effort in composing the content and enjoy thinking deliberately and carefully. According to [45], these users should score high on the Need for Cognition (NFC) Scale, whereas persons scoring low on the scale tend to avoid effortful cognitive work. Previous research on the ELM indicates that NFC serves as a moderator of elaboration via the central route [46]. Therefore, we propose that people who produce longer original tweets are more likely to scrutinize the content and judge the topical relevance of the tweet.

Hypothesis 3.10a. The impact that topical relevance has on individual retweeting behavior is stronger for users producing longer original tweets.

Time interval. The time interval between retweets indicates users’ level of activity and involvement, because more active users have shorter waits between retweets on SNS. According to the ELM, active users, as quantified by a shorter lag between retweets, are more involved and more likely to be independent when making retweeting decisions, and thus deliberate in a more central fashion [47]. We therefore conjecture that the effect that topical relevance has on individual retweeting behavior is stronger for shorter time intervals, that is, for active individuals on SNS. Thus, we hypothesize:

Hypothesis 3.10b. The impact that topical relevance has on individual retweeting behavior is weaker for longer time intervals between retweets.

3.2.4.3 Interpersonal Relationships

Social tie strength. Researchers have verified that strong ties, typified by closeness and frequent interactions, help to stimulate trust [37, 48]. Furthermore, Priester et al. [49] demonstrate that information presented by trustworthy sources is likely to be accepted unthinkingly, whereas information coming from untrustworthy sources tends to be carefully scrutinized. That is to say, social tie strength may influence the extent of message scrutiny. Shen et al. [50] find that when consumers receive an advertisement from a weak tie, they spend more effort analyzing the content. In contrast, when an advertisement comes from a strong tie, consumers’ advertising literacy plays a minimal role in verifying the truth and purpose of the advertisement. Thus, we envisage that when a tweet comes from a strong tie, individuals are prone to accept it unthinkingly and may share it without deliberating over the content carefully. Stated formally,

Hypothesis 3.11a. The impact that topical relevance has on individual retweeting behavior is weaker for tweets coming from strong ties.

Value homophily. Value homophily is the extent to which pairs of individuals share similarities in terms of attitudes, tastes, and beliefs, which are internal states presumed to shape our orientation toward future behavior [21]. Value homophily is inversely related to the level of uncertainty and positively related to relational safety; because relational safety is one dimension of interpersonal trust, value homophily contributes to interpersonal trust in the communication process [21, 51]. Therefore, when a tweet comes from a followee (i.e., a person whom the individual follows) with similar tastes and preferences, it is very likely that the individual shows greater interest in and a more favorable attitude toward the tweet. In such a situation, instead of doing extensive cognitive work, individuals may rely on a variety of peripheral cues, such as “liking” or “authority”, that allow them to make quick decisions. Consequently, the effect of central-route cues, such as the true merits and the relevance of the arguments, on the individual’s response decreases. That is to say, individuals are prone to save their cognitive effort in evaluating the content and make retweeting decisions without considering the content too much. Stated formally,

Hypothesis 3.11b. The impact that topical relevance has on individual retweeting behavior is weaker for tweets coming from homophilic followees.

To sum up, we have proposed eleven hypotheses, which are illustrated in Fig. 3.1.

Fig. 3.1 Research model

Table 3.1 Data set description

Users    Users’ tweets    Followees    Followees’ tweets    Crawling period
1250     1,479,310        54,322       58,566,602           2016.01.13 ∼ 2016.03.13

3.3 Method

3.3.1 Data Collection

Twitter has become one of the most influential social networking and micro-blogging websites since its emergence in 2006 and had 317 million monthly active users as of the third quarter of 2016. From January 13 to March 13, 2016, we randomly selected 1250 Twitter members and crawled the posts published by each member since he or she created the account. Using the Twitter API, we crawled each user’s profile, including his or her id, number of posts, verification status, number of followers, number of followees, and account creation time, together with all of the user’s historical posts, including retweets. Twitter limits the number of tweets returned for each user to 3200, even if the twitterer has published more than 3200 tweets. We also crawled the tweets published by each user’s followees. In total, there are 54,322 followees in the data set, and the number of followees’ tweets is 58,566,602. Details about the data set are shown in Table 3.1.

We conduct some high-level analyses on this data set. The results are illustrated in Fig. 3.2, where all distributions are drawn on a log-log scale. As can be seen from Fig. 3.2a, b, a small portion of users have a huge number of followers or followees. In fact, only 0.043% of users have more than 2 million followers, which is in line with the finding that 0.05% of the user population attracts almost 50% of all attention within Twitter [52]. Figure 3.2c, d show the distributions of the number of tweets posted and retweeted by each user, respectively. All four counts follow a power-law distribution, similar to the results in [53] and [34]. To the best of our knowledge, we are the first to reveal that the number of retweets on Twitter also follows a power-law distribution.
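The text does not state how the distributions in Fig. 3.2 were computed. As a minimal sketch, assuming one simply wants the points to draw on log-log axes, the empirical complementary cumulative distribution P(X ≥ x) of a list of per-user counts can be obtained as follows. The function name and the sample counts are hypothetical and only serve as illustration; under a power law these points fall roughly on a straight line when both axes are logarithmic.

```python
from collections import Counter


def empirical_ccdf(counts):
    """Return sorted (value, P(X >= value)) pairs for a list of counts."""
    n = len(counts)
    freq = Counter(counts)
    remaining = n  # number of observations >= the current value
    points = []
    for value in sorted(freq):
        points.append((value, remaining / n))
        remaining -= freq[value]
    return points


# Hypothetical follower counts for a handful of users (not from the data set).
follower_counts = [1, 1, 2, 4]
ccdf_points = empirical_ccdf(follower_counts)
```

Each pair in `ccdf_points` can then be passed to any plotting tool with logarithmic x and y axes.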

3.3.2 Measure

There is one dependent variable and seven latent independent variables belonging to two different routes. Based on the data collected from Twitter, each variable is measured as follows.


Fig. 3.2 Statistics of the dataset

3.3.2.1 Dependent Variable

We use a twitterer’s information dissemination behavior to represent his or her retweeting decision when confronted with a particular tweet. This variable is dichotomous, with “1” denoting that the tweet was disseminated by the individual and “0” denoting that it was not.

3.3.2.2 Independent Variables

• Topical relevance. Topical relevance is calculated using cosine similarity based on latent topic distributions [54, 55]. The definition is formulated in Eq. (3.1). We use a user’s retweets to generate his or her topic model with the Latent Semantic Analysis algorithm [56]. Then, in order to obtain the topic distribution $P_{s_j}$ for the jth twitterer $s_j$, all of $s_j$’s retweets are aggregated into one large document, and the distribution of this document is estimated using the topic model just generated. To obtain the topic distribution $P_{tweet_i}$ for a new tweet $tweet_i$, $tweet_i$ is treated as a document and transformed using the same topic model.

Definition 3.1 Topical relevance between the ith tweet and the jth twitterer $s_j$ can be calculated as

$$ Topical\_relevance(tweet_i, s_j) = \frac{P_{tweet_i} \cdot P_{s_j}}{\lVert P_{tweet_i} \rVert \, \lVert P_{s_j} \rVert} \tag{3.1} $$

where $P_{tweet_i}$ is the topic distribution for the ith tweet and $P_{s_j}$ is the topic distribution for the jth twitterer $s_j$.
• Information richness. We operationalize the information richness of a tweet using two measures. Number of URLs. The first measure is the number of URLs in a post, which has also been used in prior studies to measure the information richness and information quality of posts in micro-blogging [13, 14]. Note that the maximum length of a post is restricted to 140 characters on Twitter. URLs in a post can redirect people to videos, interesting web pages, and other web content to obtain further information, and thus expand the informativeness of a post. Number of hashtags. The second measure is the number of hashtags. A hashtag, a word beginning with the # symbol, is added to posts to aggregate messages that revolve around the same topic. Hashtags categorize messages and facilitate users’ searches. By highlighting keywords or topics (e.g., #intelpowered, #Christmas), hashtags make the post more informative and help people grasp the theme of a post quickly.
• Source trustworthiness. In the study of Liu et al. [14], the trustworthiness of a user is determined by whether the user’s status is verified. Micro-blogging platforms use an authentication mechanism to assure the authenticity of user identity, and a verified user is signified by a blue tick beside the screen name, as illustrated in Fig. 3.3. Therefore, this variable is dichotomous, with “1” standing for a trustworthy user and “0” otherwise.
• Source attractiveness. On Twitter, users can follow any other user they are interested in. The number of followers reflects the attractiveness and likeability of a user, and is another manifestation of his or her status in the real world. Therefore, as in prior research [14], we employ the number of followers to represent the attractiveness of the source node.
• Value homophily.
We operationalize value homophily between individuals using their homophily in topical preferences, which can be inferred from their retweeting histories. For example, Tang et al. [55] demonstrate the usefulness of topical homophily between individuals in predicting users’ retweeting behavior on SNS, and cosine similarity is used in their study to calculate the topical homophily between two individuals. The definition of value homophily is formulated in Eq. (3.2).

Definition 3.2 Value homophily between the ith twitterer $s_i$ and the jth twitterer $s_j$ can be calculated as

$$ Value\_homophily(s_i, s_j) = \frac{P_{s_i} \cdot P_{s_j}}{\lVert P_{s_i} \rVert \, \lVert P_{s_j} \rVert} \tag{3.2} $$

where $P_{s_i}$ is the topic distribution for the ith twitterer $s_i$ and $P_{s_j}$ is the topic distribution for the jth twitterer $s_j$.
• Social tie strength. On SNS, individuals can interact with each other in many ways, such as commenting on others’ tweets, directly messaging others, or disseminating others’ tweets to their own followers. Based on our analysis, we find that twitterers are rather picky in retweeting their followees’ tweets. On average, followees whose tweets have ever been retweeted by a user account for less than 5.5% of that user’s followees. This is understandable, as retweeting not only disseminates information but also presents one’s self-image to others [57]. Thus, retweeting a followee is a strong indicator of their relationship, implying trust, endorsement, and support of claims made by that followee. We therefore employ “the number of times the user has retweeted a friend” to denote the tie strength between them.

Fig. 3.3 The profile photo of Donald J. Trump (the 45th President of USA) on Twitter


• Informational social influence. We employ the number of retweeted times of a particular tweet to represent other people’s opinions about this tweet.
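Both topical relevance (Eq. 3.1) and value homophily (Eq. 3.2) are cosine similarities between topic distributions. The following is a minimal sketch of that computation, assuming the topic distributions are already available as equal-length lists of non-negative weights. The function name, the convention of returning 0.0 for an all-zero vector, and the three example vectors are assumptions made for illustration and are not taken from the study.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic-weight vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same number of topics")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        # Assumed convention: an empty topic profile is unrelated to anything.
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical three-topic distributions.
tweet_topics = [0.7, 0.2, 0.1]      # P_tweet_i
user_topics = [0.6, 0.3, 0.1]       # P_s_j
followee_topics = [0.1, 0.1, 0.8]   # P_s_i

topical_relevance = cosine_similarity(tweet_topics, user_topics)   # Eq. (3.1)
value_homophily = cosine_similarity(followee_topics, user_topics)  # Eq. (3.2)
```

In this made-up example the tweet is close to the user's own topical profile, while the followee's profile is concentrated on a different topic, so the first score is higher than the second.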

3.3.2.3 Moderating Variables

• Individual characteristics. Based on prior studies [36, 58], we employ the cumulative number of original tweets contributed by individual i during the six months before time t (CumExp_it) to represent the individual’s level of cumulative experience at time t (see footnote 2). We use the log value of the measure to control for possible nonlinearity. Social connectedness is measured by the log-transformed number of followers (Followers_i) an individual has on Twitter. We inferred individual i’s gender (Gender_i) from the user’s profile photo and double-checked the result using Gender-API, an online database that decides gender based on the user’s first name (see footnote 3). Gender is a binary variable, with “1” denoting female and “0” denoting male.
• Tweet characteristics. Tweet length is the average number of words in each original tweet contributed by individual i during the six months before time t (TwtAveLen_it). Time interval (Elapse_it) is the log-transformed number of days that have elapsed since the last time the user retweeted an other-sourced tweet.
• Interpersonal relationships. We operationalize interpersonal relationships between the individual (the receiver) and the author (the source) using two measures. The first measure is social tie strength. As mentioned above, retweeting a followee is a strong indicator of their relationship, implying trust, endorsement, and support of claims made by that followee. We therefore employ the log-transformed number of times the user has retweeted a followee to denote the tie strength between them [54]. The second measure is value homophily between individuals, which is assessed based on homophily in topical preferences and can be inferred from their retweeting histories. It is computed as the cosine similarity between the latent topic distributions of two individuals, as formulated in Eq. (3.2).

2 Actually, the following variables are highly correlated with one another (correlation coefficients of 0.8∼0.9): the number of original tweets, the frequency of authoring original tweets, the number of tweets (including both original tweets and retweets), and the frequency of tweeting.
3 https://www.gender-api.com/en. They use data from publicly available governmental sources and combine them with data crawled from social networks. Each name has to be verified by different sources to be added to their database.

Note that to reduce multicollinearity in the regression, we calculate all interaction terms using centered variables. For example, we include the interaction term between the centered log number of Followers and the centered Toprele to assess how social connectedness moderates the effect of topical relevance on an individual’s retweeting behavior.
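The centering step can be sketched as follows. This is a minimal illustration and not the authors' code: the follower counts and topical relevance scores are hypothetical, and the natural logarithm is assumed because the text does not state the base of the log transform.

```python
import math


def center(values):
    """Subtract the sample mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical observations for three retweeting decisions.
followers = [10, 100, 1000]          # number of followers (assumed > 0)
toprele = [0.2, 0.5, 0.8]            # topical relevance scores

log_followers = [math.log(f) for f in followers]   # natural log is an assumption
c_followers = center(log_followers)
c_toprele = center(toprele)

# Interaction term entered in the regression: product of the centered variables.
interaction = [f * t for f, t in zip(c_followers, c_toprele)]
```

Centering leaves the interaction coefficient interpretable in the same way but typically lowers the correlation between the product term and its two components, which is the multicollinearity concern mentioned in the text.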

3.3.2.4 Control Variables

• Argument sentiment. With the help of automated sentiment analysis techniques, the sentiment of each post is represented in two dimensions: the polarity (i.e., valence) and the emotionality (i.e., affect ladenness) [59]. These techniques are well established [60] and increase coding ease and objectivity. TextBlob is a Python library for processing textual data (see footnote 4). It provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and classification. The emotionality score is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective. The polarity score is a float within the range [−1.0, 1.0], where −1.0 is extremely negative and 1.0 is extremely positive.
• Number of mentions. “Mentioning” is the practice of referring to another user in a post via the use of “@username”. Thus, it is a form of “addressivity” aiming to gain the target person’s attention, which is essential for conversation to occur.
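Several of the measures above (number of URLs, number of hashtags, number of mentions) are simple counts over the tweet text. The sketch below shows one way such counts could be extracted with regular expressions. The patterns are simplified assumptions and do not reproduce Twitter's own entity parsing, and the example tweet and user name are made up.

```python
import re

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def surface_features(text):
    """Count URLs, hashtags, and mentions in a tweet (simplified patterns)."""
    urls = URL_RE.findall(text)
    # Remove URLs first so that '#' or '@' inside a link is not counted.
    stripped = URL_RE.sub(" ", text)
    return {
        "n_urls": len(urls),
        "n_hashtags": len(HASHTAG_RE.findall(stripped)),
        "n_mentions": len(MENTION_RE.findall(stripped)),
    }


# Hypothetical tweet text.
example = "New laptop day #intelpowered #Christmas via @someuser http://example.com/a"
features = surface_features(example)
```

When tweets are collected through an API that already returns parsed entities, those fields would normally be preferable to hand-written patterns such as these.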

3.3.3 Data Analysis Methods

This subsection introduces the data analysis methods primarily employed in this monograph.

3.3.3.1 Correlation Analysis

Correlation analysis is the statistical tool used to study the closeness of the linear association between two variables. The sample correlation coefficient, denoted r, ranges between −1 and +1 and quantifies the direction and strength of the linear association between the two variables. A correlation coefficient of +1 indicates that two variables are perfectly related in a positive linear sense; a correlation coefficient of −1 indicates that two variables are perfectly related in a negative linear sense; and a correlation coefficient of 0 indicates that there is no linear relationship between the two variables. In this monograph, we use correlation analysis to detect linear associations between different variables. Strongly correlated independent variables may raise the issue of multicollinearity [61].

4 http://textblob.readthedocs.org/en/dev/.


Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. However, correlation analysis cannot be interpreted as establishing cause-and-effect relationships. It can indicate only how, or to what extent, variables are associated with each other, and the correlation coefficient measures only the degree of linear association between two variables. Any conclusions about a cause-and-effect relationship must be based on the judgment of the analyst.
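As a minimal sketch of the sample correlation coefficient r described above, the function below implements the usual Pearson formula with the standard library only. The function name and the small data vectors are hypothetical. The last example illustrates the point that r captures only linear association: y depends on x exactly (y = x²), yet r is zero.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])          # perfect positive
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])          # perfect negative
r_nonlinear = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x**2
```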

3.3.3.2 Panel Logit Model

A longitudinal, or panel, data set is one that follows a given sample of individuals over time, and thus provides multiple observations on each individual in the sample [62]. A panel is said to be balanced if we have the same time periods, t = 1, ..., T, for each cross-section observation. For an unbalanced panel, the time dimension, denoted $T_i$, is specific to each individual. Panel data allow one to control for omitted (unobserved or mismeasured) variables and thus possess several major advantages over conventional cross-sectional or time-series data sets. In this monograph, our analysis is based on a panel data set crawled from Twitter.com across four years, the details of which are listed in Table 3.1. For panel data, if the dependent variable is dichotomous, then the model is called a “binary choice model for panel data”. We can use a latent variable to denote the net yield of the binary decision, as formulated in Eq. (3.3):

$$ y_{it}^{*} = x_{it}\beta + \mu_i + \varepsilon_{it} \quad (i = 1, \ldots, n;\ t = 1, \ldots, T), \tag{3.3} $$

where $y_{it}^{*}$ is unobservable, $\mu_i$ denotes individual effects, and $x_{it}$ usually does not contain the constant term. If the latent variable is larger than 0, then the choice is “Yes”. Otherwise, the choice is “No”. Namely, an individual’s choice $y_{it}$ follows this principle:

$$ y_{it} = \begin{cases} 1, & \text{if } y_{it}^{*} > 0 \\ 0, & \text{otherwise} \end{cases} \tag{3.4} $$
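The latent-variable rule in Eqs. (3.3) and (3.4) can be illustrated with a small simulation. The sketch below is not the estimation procedure used in this monograph: it assumes logistic errors (the panel logit case), uses hypothetical values for $x_{it}$, $\beta$, and $\mu_i$, and simply checks that the share of simulated “Yes” decisions is close to the logistic function evaluated at $\mu_i + x_{it}\beta$.

```python
import math
import random


def logistic_cdf(z):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_choice(x, beta, mu_i, rng):
    """Draw one binary decision from the latent-variable rule y = 1[y* > 0]."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)   # keep the log finite
    eps = math.log(u / (1.0 - u))                     # standard logistic draw
    y_star = sum(xk * bk for xk, bk in zip(x, beta)) + mu_i + eps
    return 1 if y_star > 0 else 0


# Hypothetical covariates, coefficients, and individual effect.
x_it = [0.8, 1.0]
beta = [1.5, -0.4]
mu_i = -0.3

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n_draws = 20000
freq = sum(simulate_choice(x_it, beta, mu_i, rng) for _ in range(n_draws)) / n_draws

index = sum(xk * bk for xk, bk in zip(x_it, beta)) + mu_i
p = logistic_cdf(index)   # closed-form probability of choosing "Yes"
```

With these made-up numbers the index is 0.5, so `p` is about 0.62, and `freq` should be close to it up to simulation noise.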

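For given $x_{it}$, $\beta$, $\mu_i$, we have the following:

$$ \begin{aligned} P(y_{it} = 1 \mid x_{it}, \beta, \mu_i) &= P(y_{it}^{*} > 0 \mid x_{it}, \beta, \mu_i) \\ &= P(\varepsilon_{it} > -\mu_i - x_{it}\beta \mid x_{it}, \beta, \mu_i) \\ &= P(\varepsilon_{it} < \mu_i + x_{it}\beta \mid x_{it}, \beta, \mu_i) \\ &= F(\mu_i + x_{it}\beta), \end{aligned} $$

where $F(\cdot)$ is the cumulative distribution function of $\varepsilon_{it}$, and we assume that the probability density function of $\varepsilon_{it}$ is symmetric with respect to the origin. If $\varepsilon_{it} \sim N(0, 1)$, then we have a Probit model: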

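$$ P(y_{it} = 1 \mid x_{it}, \beta, \mu_i) = \Phi(\mu_i + x_{it}\beta), \tag{3.5} $$

where $\Phi(\cdot)$ is the standard normal cumulative distribution function. If $\varepsilon_{it}$ follows the logistic distribution, then we have a panel logit model:

$$ P(y_{it} = 1 \mid x_{it}, \beta, \mu_i) = \Lambda(\mu_i + x_{it}\beta) = \frac{e^{\mu_i + x_{it}\beta}}{1 + e^{\mu_i + x_{it}\beta}} \tag{3.6} $$

3.3.3.3 Negative Binomial Regression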

Suppose we are conducting independent Bernoulli trials, so that each trial has two possible outcomes, called “success” and “failure”. In each trial the success probability is p and the failure probability is (1 − p). We observe this sequence until a predefined number r of failures has occurred. Then the random number of successes we have seen, X, follows the negative binomial distribution:

$$ X \sim NB(r; p) \tag{3.7} $$

Negative binomial regression is used for modeling count variables, usually over-dispersed count outcome variables. That is to say, when a variable is a count and its variance is clearly larger than its expectation, we usually use the negative binomial distribution to describe its probability distribution. In this monograph, when validating the mediating effect, the dependent variable is a non-negative integer whose variance is much larger than its mean. Thus, we adopt the negative binomial regression model.
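To make the over-dispersion property concrete, the sketch below evaluates the probability mass function implied by the definition above (the number of successes X observed before the rth failure, with success probability p) and computes the mean and variance numerically. The function names, the truncation point of the sums, and the values r = 3 and p = 0.5 are assumptions chosen for illustration.

```python
import math


def nb_pmf(k, r, p):
    """P(X = k): k successes before the r-th failure, success probability p."""
    return math.comb(k + r - 1, k) * (p ** k) * ((1.0 - p) ** r)


def nb_moments(r, p, k_max=500):
    """Mean and variance of X, approximated by truncating the sums at k_max."""
    mean = sum(k * nb_pmf(k, r, p) for k in range(k_max + 1))
    second_moment = sum(k * k * nb_pmf(k, r, p) for k in range(k_max + 1))
    return mean, second_moment - mean ** 2


mean, variance = nb_moments(r=3, p=0.5)
# Closed forms for this parameterization: mean = r*p/(1-p), variance = r*p/(1-p)**2.
```

For r = 3 and p = 0.5 the closed forms give a mean of 3 and a variance of 6. Because the variance equals the mean divided by (1 − p), it always exceeds the mean, which is why this distribution suits over-dispersed counts, whereas a Poisson model would force the two to be equal.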

3.3.3.4 Multilevel Linear Model

Multilevel data are characterized by a hierarchical structure. A classic example is children nested within classrooms and classrooms nested within schools. Students’ test performance within the same classroom is correlated because of exposure to the same teacher or textbook. Likewise, the average performance of classes within a school may be correlated for reasons such as the students’ similar socioeconomic level and the school’s shared study atmosphere and management style. In this monograph, repeated retweeting decisions are nested within each individual in the sample. Specifically, retweeting activities (i.e., topical relevance, time interval, tweet length) are the bottom-level unit of analysis, and individual characteristics (i.e., cumulative experience, social connectedness, gender) and interpersonal relationships (social tie strength, value homophily) are the top-level unit of analysis [36, 63]. The rationale is that the retweets of one individual tend to be dramatically distinct from those of another individual, especially in terms of the intra-individual correlation [64]. As such, unless a hierarchical model is employed, estimation results are likely to be inconsistent [65], especially for the cross-level moderating effects (H9–H11). Thus, we use a multilevel linear model to validate the moderating effects.
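One common way to gauge how strongly observations cluster within individuals is the intraclass correlation from a one-way analysis of variance. The sketch below is illustrative only and is not a computation reported in this monograph: it assumes balanced groups (the same number of observations per individual), and the function name and the toy data are hypothetical.

```python
def icc_oneway(groups):
    """ICC(1) from a one-way ANOVA, for balanced groups of equal size."""
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equally sized groups")
    k = sizes.pop()            # observations per individual
    g = len(groups)            # number of individuals
    if g < 2 or k < 2:
        raise ValueError("need at least 2 groups with at least 2 observations")
    grand_mean = sum(sum(grp) for grp in groups) / (g * k)
    means = [sum(grp) / k for grp in groups]
    # Between-individual and within-individual mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (g - 1)
    msw = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp) / (g * (k - 1))
    denominator = msb + (k - 1) * msw
    if denominator == 0:
        raise ValueError("ICC is undefined when all observations are identical")
    return (msb - msw) / denominator


# Hypothetical per-decision scores for three individuals.
scores_by_user = [[0.9, 0.8, 0.85], [0.2, 0.3, 0.25], [0.5, 0.55, 0.6]]
icc = icc_oneway(scores_by_user)
```

A value near 1 means that observations from the same individual resemble each other far more than observations from different individuals, which is the situation in which ignoring the nesting is most problematic.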

3.4 Conclusion

Existing research often pays attention to a limited number of factors, and virtually no researchers have systematically investigated the drivers of individual information dissemination behavior. To bridge this gap, we propose an integrated conceptual framework, based on the Elaboration Likelihood Model, to investigate the factors affecting individual information dissemination behavior on SNS. First, we identify influential factors based on information processing theory, the bandwagon effect, and prior studies. These factors are classified into two different information processing routes according to whether they demand a high level of elaboration. Second, research on how relationships between the source and the receiver affect individual sharing behavior is limited and still controversial. Based on the homophily principle, we envisage that social tie strength mediates the effect of homophily on individual retweeting decisions. Finally, considering that personalized advertising has become one of the hottest trends in online advertising, we look into the effect of topical relevance on individual retweeting decisions by examining three kinds of moderators: individual characteristics, tweet characteristics, and interpersonal relationships. Based on the ELM, we investigate their moderating effects on the impact that topical relevance has on individual retweeting decisions.

After formulating the conceptual framework, we present the research methods primarily adopted in the monograph. More specifically, we introduce our data collection method, the data corpus, variable measurement, and the statistical models employed to validate the hypotheses in the conceptual model. In the following chapters, we validate the proposed hypotheses based on the data corpus and the data modeling methods covered in this chapter. Before diving into the validation work, we first conduct some exploratory work to learn more about individual retweeting behavior on SNS.

References

1. Petty, R.E., Cacioppo, J.T.: The elaboration likelihood model of persuasion. Adv. Exp. Soc. Psychol. 19, 123–205 (1986)
2. Sussman, S.W., Siegal, W.S.: Informational influence in organizations: an integrated approach to knowledge adoption. Inf. Syst. Res. 14(1), 47–65 (2003)
3. Nah, F.F.H., Davis, S.: HCI research issues in e-commerce. J. Electron. Commer. Res. 3(3), 98–113 (2002)
4. Miller, G.A.: The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 63(2), 81 (1956)
5. Cheung, C.M., Lee, M.K., Rabjohn, N.: The impact of electronic word-of-mouth: the adoption of online opinions in online customer communities. Int. Res. 18(3), 229–247 (2008)
6. Harvey, C.G., Stewart, D.B., Ewing, M.T.: Forward or delete: what drives peer-to-peer message propagation across social networks? J. Consumer Behav. 10(6), 365–372 (2011)
7. Recuero, R., Araujo, R., Zago, G.: How does social capital affect retweets? In: Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, pp. 305–312 (2011)
8. Goffman, E. et al.: The presentation of self in everyday life. Harmondsworth (1978)
9. Naaman, M., Boase, J., Lai, C.H.: Is it really about me?: message content in social awareness streams. In: Proceedings of the 2010 ACM conference on Computer supported cooperative work, pp. 189–192. ACM (2010)
10. Chen, C.C., Tseng, Y.D.: Quality evaluation of product reviews using an information quality framework. Decis. Support Syst. 50(4), 755–768 (2011)
11. Kahneman, D., Tversky, A.: Prospect theory: an analysis of decision under risk. Econometrica: J. Econ. Soci., 263–291 (1979)
12. Goh, K.Y., Heng, C.S., Lin, Z.: Social media brand community and consumer behavior: quantifying the relative impact of user- and marketer-generated content. Inf. Syst. Res. 24(1), 88–107 (2013)
13. Yan, W., Huang, J.: Microblogging reposting mechanism: an information adoption perspective. Tsinghua Sci. Technol. 19(5), 531–542 (2014)
14. Liu, Z., Liu, L., Li, H.: Determinants of information retweeting in microblogging. Int. Res. 22(4), 443–466 (2012)
15. Park, D.H., Lee, J.: eWOM overload and its effect on consumer behavioral intention depending on consumer involvement. Electron. Commerce Res. Appl. 7(4), 386–398 (2009)
16. Angst, C.M., Agarwal, R.: Adoption of electronic health records in the presence of privacy concerns: the elaboration likelihood model and individual persuasion. MIS Quart. 33(2), 339–370 (2009)
17. Suh, B., Hong, L., Pirolli, P., Chi, E.H.: Want to be retweeted? Large scale analytics on factors impacting retweet in Twitter network. In: Social computing (socialcom), 2010 IEEE Second International Conference on, pp. 177–184. IEEE (2010)
18. Ha, S., Ahn, J.: Why are you sharing others’ tweets?: The impact of argument quality and source credibility on information sharing behavior. In: Proceedings of the 32nd International Conference on Information Systems (2011)
19. Rudat, A., Buder, J.: Making retweeting social: the influence of content and context information on sharing news in Twitter. Comput. Human Behav. 46, 75–84 (2015)
20. Chu, S.C., Kim, Y.: Determinants of consumer engagement in electronic word-of-mouth (eWOM) in social networking sites. Int. J. Advert. 30(1), 47–75 (2011)
21. Lazarsfeld, P.F., Merton, R.K., et al.: Friendship as a social process: a substantive and methodological analysis. Freedom Control in Modern Soc. 18(1), 18–66 (1954)
22. De Bruyn, A., Lilien, G.L.: A multi-stage model of word-of-mouth influence through viral marketing. Int. J. Res. Market. 25(3), 151–163 (2008)
23. Steffes, E.M., Burgee, L.E.: Social ties and online word of mouth. Int. Res. 19(1), 42–59 (2009)
24. Wilson, E.J., Sherrell, D.L.: Source effects in communication and persuasion research: a meta-analysis of effect size. J. Acad. Market. Sci. 21(2), 101–112 (1993)
25. Chu, S.C., Kamal, S.: The effect of perceived blogger credibility and argument quality on message elaboration and brand attitudes: an exploratory study. J. Interact. Advert. 8(2), 26–37 (2008)
26. Brown, J.J., Reingen, P.H.: Social ties and word-of-mouth referral behavior. J. Consumer Res. 14(3), 350–362 (1987)
27. Marsden, P.V., Campbell, K.E.: Measuring tie strength. Soc. Forces 63(2), 482–501 (1984)
28. Bansal, H.S., Voyer, P.A.: Word-of-mouth processes within a services purchase decision context. J. Serv. Res. 3(2), 166–177 (2000)
29. Coleman, J.S.: Foundations of Social Theory. Harvard University Press, Cambridge, MA (1994)
30. Frenzen, J., Nakamoto, K.: Structure, cooperation, and the flow of market information. J. Consumer Res. 20(3), 360–375 (1993)
31. Jahoda, M.: Conformity and independence: a psychological analysis. Human Relat. 12(2), 99–120 (1959)
32. Deutsch, M., Gerard, H.B.: A study of normative and informational social influences upon individual judgment. J. Abnormal Soc. Psychol. 51(3), 629 (1955)
33. McPherson, M., Smith-Lovin, L., Cook, J.M.: Birds of a feather: homophily in social networks. Ann. Rev. Sociol., 415–444 (2001)
34. Weng, J., Lim, E.P., Jiang, J., He, Q.: Twitterrank: finding topic-sensitive influential twitterers. In: Proceedings of the third ACM International Conference on Web Search and Data Mining, pp. 261–270. ACM (2010)
35. Tam, K.Y., Ho, S.Y.: Web personalization as a persuasion strategy: an elaboration likelihood model perspective. Inf. Syst. Res. 16(3), 271–291 (2005)
36. Ma, X., Khansa, L., Deng, Y., Kim, S.S.: Impact of prior reviews on the subsequent review process in reputation systems. J. Manag. Inf. Syst. 30(3), 279–310 (2013)
37. Levin, D.Z., Cross, R.: The strength of weak ties you can trust: the mediating role of trust in effective knowledge transfer. Manag. Sci. 50(11), 1477–1490 (2004)
38. Morris, M.R., Counts, S., Roseway, A., Hoff, A., Schwarz, J.: Tweeting is believing?: understanding microblog credibility perceptions. In: Conference on Computer Supported Cooperative Work, pp. 441–450 (2012)
39. Lin, H., Fan, W., Chau, P.Y.: Determinants of users’ continuance of social networking sites: a self-regulation perspective. Inf. Manag. 51(5), 595–603 (2014)
40. Riedl, C., Köbler, F., Goswami, S., Krcmar, H.: Tweeting to feel connected: a model for social connectedness in online social networks. Int. J. Human-Comput. Interact. 29(10), 670–687 (2013)
41. Tsai, H., Bagozzi, R.P.: Contribution behavior in virtual communities: cognitive, emotional, and social influences. Manag. Inf. Syst. Quart. 38(1), 143–164 (2014)
42.
Carol, G.: In a different voice: Psychological theory and women’s development. Harvard, Cambridge, MA (1982) 43. Meyers-Levy, J., Sternthal, B.: Gender differences in the use of message cues and judgments. J. Market. Res., 84–96 (1991) 44. Meyers-Levy, J., Maheswaran, D.: Exploring differences in males’ and females’ processing strategies. J. Consumer Res. 18(1), 63–70 (1991) 45. Petty, R.E., Briñol, P., Loersch, C., McCaslin, M.J.: The need for cognition. Handbook of individual differences in social behavior, pp. 318–329 (2009) 46. Haugtvedt, C.P., Petty, R.E., Cacioppo, J.T.: Need for cognition and advertising: understanding the role of personality variables in consumer behavior. J. Consumer Psychol. 1(3), 239–260 (1992) 47. Kim, D., Benbasat, I.: Trust-related arguments in internet stores: a framework for evaluation. J. Electron. Commerce Res. 4(2), 49–64 (2003) 48. Tsai, W., Ghoshal, S.: Social capital and value creation: the role of intrafirm networks. Acad. Manag. J. 41(4), 464–476 (1998) 49. Priester, J.R., Petty, R.E.: The influence of spokesperson trustworthiness on message elaboration, attitude strength, and advertising effectiveness. J. Consumer Psychol. 13(4), 408–421 (2003) 50. Shen, G.C.C., Chiou, J.S., Hsiao, C.H., Wang, C.H., Li, H.N.: Effective marketing communication via social networking site: the moderating role of the social tie. J. Bus. Res. 69(6), 2265–2270 (2016) 51. Wheeless, L.R.: A follow-up study of the relationships among trust, disclosure, and interpersonal solidarity. Human Commun. Res. 4(2), 143–157 (1978) 52. Wu, S., Hofman, J.M., Mason, W.A., Watts, D.J.: Who says what to whom on twitter. In: Proceedings of the 20th International Conference on World Wide Web, pp. 705–714. ACM (2011)

60

3 Research Scheme Design

53. Zhang, J., Tang, J., Li, J., Liu, Y., Xing, C.: Who influenced you? predicting retweet via social influence locality. ACM Trans. Knowl. Discov. Data (TKDD) 9(3), 25 (2014) 54. Xu, Z., Yang, Q.: Analyzing user retweet behavior on twitter. In: Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), pp. 46–50. IEEE Computer Society (2012) 55. Tang, X., Miao, Q., Quan, Y., Tang, J., Deng, K.: Predicting individual retweet behavior by user similarity: a multi-task learning approach. Knowl. Based Syst. 89, 681–688 (2015) 56. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Amer. Soc. Inf. Sci. 41(6), 391 (1990) 57. Chung, N., Nam, K., Koo, C.: Examining information sharing in social networking communities: applying theories of social capital and attachment. Telematics Inf. 33(1), 77–91 (2016) 58. Gu, B., Konana, P., Raghunathan, R., Chen, H.M.: Research note: the allure of homophily in social media: Evidence from investor responses on virtual communities. Inf. Syst. Res. 25(3), 604–617 (2014) 59. Stieglitz, S., Dang-Xuan, L.: Emotions and information diffusion in social media—sentiment of microblogs and sharing behavior. J. Manag. Inf. Syst. 29(4), 217–248 (2013) 60. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations Trends Inf. Tetrieval 2(1–2), 1–135 (2008) 61. Evans, J.D.: Straightforward Statistics for the Behavioral Sciences. Brooks/Cole (1996) 62. Hsiao, C.: Analysis of Panel Data. 54. Cambridge University Press (2014) 63. Hofmann, D.A.: An overview of the logic and rationale of hierarchical linear models. J. Manag. 23(6), 723–744 (1997) 64. Phillips, D.M., Baumgartner, H.: The role of consumption emotions in the satisfaction response. J. Consumer Psychol. 12(3), 243–252 (2002) 65. Gelman, A., Hill, J.: Data Analysis Using Regression And Multilevel/Hierarchical Models. Cambridge University Press (2006)

Chapter 4

Dominating Factors Affecting Individual Retweeting Behavior

4.1 Introduction

In this chapter, we carry out an exploratory study on individual retweeting behavior. All prior research rests on an unstated premise: an individual’s retweeting behavior is not a random occurrence. However, this premise has been taken as given and left unsubstantiated. We argue that re-examining it is necessary, because any prediction work is meaningless if the premise does not hold. Thus, in Sect. 4.2, we examine whether individual information dissemination behavior is random from two different perspectives.

After validating the non-randomness of such behavior, the following questions emerge naturally: what factors have an impact on such behavior? Are these factors equally important? If not, which factors are the most important? We try to answer these questions in the subsequent sections. According to our observations and experience, people are willing to spend time on their favorite topics, whether online or offline. On douban (www.douban.com), for example, users can join various groups, each of which usually focuses on a certain topic such as cosmetics, a celebrity, or keeping fit. Moreover, people are often interested in some topics while remaining indifferent to others. Thus we propose a variable, topic_distance, to measure the closeness between a post and a person in Sect. 4.3, and validate the capability of this feature to distinguish different classes of tweets. In addition to this influential factor, many other factors also have an impact on individual retweeting behavior. We take “@ jk_rowling”, a specific twitterer, as an example to examine the priorities of these factors and pick out a subset of dominating factors. To learn whether the findings from this twitterer generalize to other twitterers, we examine the relative importance of all relevant factors on a large sample in Sect. 4.4, using various feature evaluation methods such as filter models, hybrid models, and other methods. Finally, we obtain salient features which have substantial influence on individual retweeting decisions.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 J. Shi et al., Individual Retweeting Behavior on Social Networking Sites, https://doi.org/10.1007/978-981-15-7376-7_4


When investigating individual retweeting behavior on SNS, we also wish to forecast precisely whether an individual will share a specific tweet. High prediction performance is desirable for marketers, who strive to induce individuals to share online content on social networking sites. Thus, in Sect. 4.5, we compare the prediction performance of the salient features with that of the full feature set and find that the salient features not only save the cost of measuring irrelevant features but also moderately improve the prediction performance under some classification methods. Finally, Sect. 4.6 concludes this chapter.

4.2 Verification of the Non-randomness of Individual Retweeting Behavior

If randomness dominates a user’s retweeting decisions, the user simply shares, at random, the posts that run across his or her timeline. Otherwise, the opposite holds: tweets that do not appeal to the user are filtered out, and only posts deemed interesting, important, or entertaining are shared with the followers. In this section, we examine whether such behavior is random from two perspectives: (1) the social interaction perspective; (2) the topical perspective.

4.2.1 Verification from Social Interaction Perspective

As mentioned in Subsec. 2.2.4, people can “follow” others in order to get the latest updates from these friends, and the “retweet” function of Twitter helps users quickly share a tweet with all of their followers. Here we call a followee (i.e., friend) the user’s “close friend” if the user has retweeted this followee’s tweets at least once. Suppose that an individual randomly shares the tweets pushed by his or her friends. Then we should expect each friend to have more or less the same opportunity of being retweeted (i.e., of being that individual’s “close friend”). We analyze #close_friends, #all_friends, and the ratio of the latter to the former using the data set shown in Table 3.1; the result is shown in Table 4.1. As all three statistics are highly right-skewed, we use the median to describe the central tendency. For example, the median of “#all_friends” is 789, almost 20 times the median of “#close_friends”. The median of “ratio” means that one out of 19 followees (5.26%) has been retweeted by the user at least once. Thus, users treat their followees quite differently in terms of how many times they have retweeted them: almost 95% of followees have never been retweeted by the user. We can conclude that Twitter users are rather “picky” in retweeting their followees’ tweets. It is only a very small proportion of followees [...] $TS_{cn}$.

4.2.2.2 Hypothesis Testing

We use the data set shown in Table 3.1 to test Hypothesis 4.1. Denote the set of users in the core network as $S$; thus $|S| = 1250$ according to Table 3.1. Since the effectiveness of the topic model can be severely impaired when the “(aggregated) document” is too short [5], a twitterer who has fewer than 50 retweets or declares himself or herself a non-English user is considered invalid in our research. Denote the set of valid users as $S_a$, with $|S_a| = 463$. An individual statistical hypothesis test is conducted for every twitterer $s_i \in S_a$. In the $k$th round, we first randomly choose 20% of the instances from $s_i$’s retweeting history, yielding $chosenRT_k$ and $restRT_k$, and calculate the topic similarity $TS_{cr\_k}$ according to Eq. (4.1). Next, $nonRT_k$ is randomly chosen from the non-retweets and $TS_{cn\_k}$ is calculated according to Eq. (4.2). By repeating this process 50 times we obtain two groups: $\{TS_{cr\_1}, TS_{cr\_2}, \ldots, TS_{cr\_50}\}$ and $\{TS_{cn\_1}, TS_{cn\_2}, \ldots, TS_{cn\_50}\}$. We denote the averages of the two groups as $TS_{cr}$ and $TS_{cn}$, respectively. Finally, a two-sample t-test (under the assumption of unequal population variances) is conducted on the two groups formed with the above approach.

The results show that for 437 of the 463 twitterers, the null hypothesis is rejected at significance level $\alpha = 0.01$. When the significance level is set to $\alpha = 0.001$, the null hypothesis is rejected for 430 of the 463 twitterers. The result reveals that, for the vast majority of users, retweets are much more similar to each other than they are to non-retweets in terms of the underlying topics. That is to say, we demonstrate the existence of topics of interest in an individual’s retweeting history. In fact, a human experiment in [6] shows that 81% of retweets are related to the twitterer’s interests. Figuratively, just as a thread runs through the pearls of a necklace, topics of interest run through one’s retweeting history.
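The two-sample test with unequal variances used above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `welch_t` and the synthetic similarity values are assumptions made for the example, and only the t statistic and the Welch-Satterthwaite degrees of freedom are computed, since a p-value needs a t-distribution CDF that the Python standard library does not provide.

```python
import math
import random
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Squared standard error of each group mean; the variances are NOT pooled,
    # which is what "unequal population variances" refers to.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical stand-ins for the 50 resampled similarities of one twitterer.
# Real values would come from Eqs. (4.1) and (4.2); these are synthetic.
rng = random.Random(0)
ts_cr = [rng.gauss(0.60, 0.05) for _ in range(50)]  # chosen vs. remaining retweets
ts_cn = [rng.gauss(0.40, 0.08) for _ in range(50)]  # chosen retweets vs. non-retweets

t_stat, dof = welch_t(ts_cr, ts_cn)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A large positive statistic indicates that the first group's mean exceeds the second's. In practice the p-value could be obtained from a statistics package; for instance, SciPy's `scipy.stats.ttest_ind` with `equal_var=False` performs the same unequal-variance test.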

4.3 Ranking Factors Affecting Individual Retweeting Behavior: An Example

Section 4.2 has verified that people do not retweet online content at random but share it rationally and intentionally. Naturally, we raise the following questions: what factors (i.e., features) affect an individual’s retweeting behavior? Which features are more influential? Many researchers have investigated the first question [1–3, 7, 8], and plenty of features have been adopted to predict an individual’s retweeting behavior. Table 4.2 lists these features and their meanings. However, the second question, the relative importance of these features, remains unanswered, and even the reasons for adding these features to the prediction model have rarely been discussed. As a result, a large number of features are indiscriminately introduced into the prediction model without examining their relevance. Irrelevant or redundant features increase the data collection cost and tend to produce an overfitted model that predicts poorly on future observations not used in model training, a problem known as the curse of dimensionality [10]. They also hinder us from understanding which factors actually dominate an individual’s retweeting behavior. Thus, we argue that understanding the relative importance of these features can save the cost of measuring irrelevant features, deepen our understanding of individual retweeting decisions, and, finally, improve the prediction performance. In particular, knowledge about the salient features can guide marketers in making more effective online marketing strategies.

In this section, we use a specific user with the screen name “@ jk_rowling” to illustrate the process of feature evaluation. Before we move on, we give the formal definition of “topic_distance” and examine the discriminative ability of this feature.


Table 4.2 Features and their meanings (dependent variable: retwt)

| Order | Feature | Meaning |
|---|---|---|
| 1 | topic_distance ◦ • | Topical distance between a tweet and the user |
| 2 | interact ◦ • | Number of times the user has retweeted the tweet’s author |
| 3 | tag | Number of hashtags in the tweet |
| 4 | url | Number of URLs in the tweet |
| 5 | men ◦ • | Number of times the tweet has mentioned other users |
| 6 | retwttimes ◦ • | Number of times this tweet has been retweeted by others |
| 7 | favorite_count ◦ • | Number of times this tweet has been “liked” by Twitter users |
| 8 | length | Number of words in the tweet |
| 9 | favorites_count | Number of favorite tweets the author has |
| 10 | statuses_count | Number of tweets published by the author |
| 11 | friends_count | Number of users the author is following |
| 12 | listed_count ◦ | Number of public lists in which the author participates |
| 13 | followers ◦ | Number of followers the tweet’s author has |
| 14 | veri | When true, indicates that the author has a verified account |
| 15 | acc | Number of days since the author created his or her Twitter account |
| 16 | emo | Emotionality of the tweet |
| 17 | pol | Polarity of the tweet |

Studies: [1–3, 7] [2, 3, 7] [2, 3, 7] [2, 3, 7] [2, 3, 7] [2, 3, 7] [3] [2, 3, 7] [2, 3] [1, 2, 7] [2] [1, 2, 7] [1–3, 7] [1, 2, 7] [9] [9]

4.3.1 A Highly Discriminating Feature: Topic_distance

4.3.1.1 The Definition of Topic_distance

Topic_distance is defined as “one minus the cosine similarity between the latent topic distribution of a specific tweet and the latent topic distribution of an individual”, as formulated in Eq. (4.3). We resort to the Latent Semantic Analysis (LSA) algorithm [4] to uncover the hidden topics in the tweets. However, directly applying the topic model to individual tweets suffers from a severe data sparsity problem [5], as a tweet is limited to 140 characters. That is to say, the limited number of words in a tweet makes it difficult for LSA to infer how words are related, compared with lengthy documents containing abundant words. A popular way to address this short-text problem is to aggregate short tweets into a lengthy pseudo-document before training a standard topic model [2, 3, 5, 11].
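The quoted definition, one minus the cosine similarity of two topic vectors, can be illustrated with a short sketch. The function name `topic_distance` and the three-topic vectors below are hypothetical and chosen only for illustration; in the study the vectors would be produced by the LSA model rather than written by hand.

```python
import math


def topic_distance(p_tweet, p_user):
    """One minus the cosine similarity between two topic vectors."""
    dot = sum(a * b for a, b in zip(p_tweet, p_user))
    norm_tweet = math.sqrt(sum(a * a for a in p_tweet))
    norm_user = math.sqrt(sum(b * b for b in p_user))
    if norm_tweet == 0.0 or norm_user == 0.0:
        # Cosine similarity is undefined for an all-zero vector.
        raise ValueError("topic vectors must be non-zero")
    return 1.0 - dot / (norm_tweet * norm_user)


# Hypothetical 3-topic vectors (assumed values, not taken from the data set).
user = [0.7, 0.2, 0.1]
on_topic_tweet = [0.6, 0.3, 0.1]
off_topic_tweet = [0.05, 0.15, 0.8]

print(topic_distance(on_topic_tweet, user))   # small distance: close to the user's interests
print(topic_distance(off_topic_tweet, user))  # larger distance: far from the user's interests
```

A tweet whose topic vector points in nearly the same direction as the user's vector gets a distance near 0, while orthogonal vectors give a distance of 1.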


Fig. 4.1 Examining the discriminative ability of “topic_distance”

Thus, we use a user’s retweets to generate this user’s topic model with the LSA algorithm. We do not use all of a user’s posts to generate the topic model, because a large portion of these posts concern daily routines or social conversations with friends [12], which may have little relationship with the user’s topics of interest. To obtain the topic distribution $P_{s_j}$ for the $j$th twitterer $s_j$, all of $s_j$’s retweets are aggregated into one large document, and the distribution of this document is estimated using the topic model just generated [2, 3, 5, 11]. To obtain the topic distribution $P_{tweet_i}$ for $tweet_i$, $tweet_i$ is treated as a document and transformed using the same topic model.

Definition 4.1 The topic distance between the $i$th tweet $tweet_i$ and the $j$th twitterer $s_j$ is calculated as

$$dist(tweet_i, s_j) = 1 - \frac{P_{tweet_i} \cdot P_{s_j}}{\|P_{tweet_i}\| \, \|P_{s_j}\|} \tag{4.3}$$

where $P_{tweet_i}$ is the topic distribution for $tweet_i$ and $P_{s_j}$ is the topic distribution for the $j$th twitterer $s_j$. Comparing Eqs. (4.3) and (3.1), we find that the following relationship holds:

$$dist(tweet_i, s_j) = 1 - topical\_relevance(tweet_i, s_j) \tag{4.4}$$

4.3.1.2 Examining the Discriminative Ability of “Topic_distance”

Although topic-related features have been widely used in previous research [1–3, 7, 8], their discriminative ability has not received enough attention. Next we examine the discriminative ability of “topic_distance” using the twitterer with the screen name “@ jk_rowling” as an example. The three boxes in Fig. 4.1 denote the three steps of this examination process.

At the first step, 80% of the retweets are randomly chosen, denoted as $chosenRT$, to generate this user’s topic model using the LSA algorithm. In this example, $chosenRT$ contains 381 retweets. The remaining 20% of the retweets are used as positive test instances, denoted as $test\_RT$. The same number of negative test instances are randomly chosen from the non-retweets, denoted as $test\_nonRT$. At the second step, to obtain the topic distribution $P_i$ for a new tweet $tweet_i$, $tweet_i$ is treated as a document and transformed using the topic model generated at the first step. Thus, at the end of the second step, we have the topic distributions for the positive instances, denoted as $\{P_1, P_2, \ldots, P_N\}$, and their counterparts for the negative instances, denoted as $\{Q_1, Q_2, \ldots, Q_N\}$. In this example, $N = 95$. At the final step, to obtain the topic distribution $P_{user}$ for “@ jk_rowling”, we aggregate $chosenRT$ into one large document and estimate the distribution of this document, which is viewed as the topic distribution for “@ jk_rowling” [2, 3, 5, 11]. The topic_distance between the $i$th test tweet and this user can then be computed based on Eq. (4.3). Thus, we get the topic distances for the positive group and their counterparts for the negative group, denoted as $\{dist_1, dist_2, \ldots, dist_N\}_{test\_RT}$ and $\{dist_1, dist_2, \ldots, dist_N\}_{test\_nonRT}$, respectively.

Fig. 4.2 Distributions of topic distances in the two groups

The distributions of topic distances for the two groups are displayed in Fig. 4.2, where we observe a very pronounced relationship between the predictor topic_distance and the response group. The topic distances for the positive group are dispersed, and their average is clearly smaller than that of the negative group. Furthermore, we conduct a two-sample t-test (under the assumption of unequal population variances) on these two groups. Let $dist_{RT}$ and $dist_{nonRT}$ denote the mean topic_distance of the two groups, respectively. Then the null hypothesis is $H_0: dist_{RT} = dist_{nonRT}$ and the alternative hypothesis is $H_1: dist_{RT} < dist_{nonRT}$. The result shows that $H_0$ is rejected at a significance level as small as $\alpha = 3.02 \times 10^{-12}$, indicating that $dist_{RT}$ is statistically significantly smaller than $dist_{nonRT}$.

The examination result shows that “topic_distance” is a highly discriminating feature for distinguishing the two classes. In fact, a similar hypothesis test is

carried out on other individuals who have retweeted no